how to calculate count of no.of pages in pdf file-batch file

sivasriram
Posts: 12
Joined: 30 Aug 2014 10:55

how to calculate count of no.of pages in pdf file-batch file

#1 Post by sivasriram » 02 Sep 2014 10:43

hello,

I just want to know how to write simple code that counts the number of pages in a PDF file and records the count in a separate Excel sheet.

Thanks in advance guys :)

Squashman
Expert
Posts: 4488
Joined: 23 Dec 2011 13:59

Re: how to calculate count of no.of pages in pdf file-batch

#2 Post by Squashman » 02 Sep 2014 12:30

1) Batch files cannot natively process a PDF file. You will need a 3rd party utility.
2) Batch files cannot natively read and write Excel files. You will need to use a different scripting language like VBScript or JScript.

But a quick Google search probably would have given you a really nice solution.
http://www.techtipsforall.com/2013/08/p ... excel.html

ShadowThief
Expert
Posts: 1167
Joined: 06 Sep 2013 21:28
Location: Virginia, United States

Re: how to calculate count of no.of pages in pdf file-batch

#3 Post by ShadowThief » 02 Sep 2014 12:42

Code: Select all

@echo off
cls

set /p "file_name=File name: "
findstr /R /C:"/Type\s*/Page[^s]" "%file_name%"|find /c /v ""

aGerman
Expert
Posts: 4743
Joined: 22 Jan 2010 18:01
Location: Germany

Re: how to calculate count of no.of pages in pdf file-batch

#4 Post by aGerman » 02 Sep 2014 14:35

@ShadowThief
Your code worked for the few files that I tested, but it may have an issue. FINDSTR does not support the \s sequence for whitespace; the backslash has no effect, so the s is treated as a literal character. That means /Type/Page would be matched as well as /Types/Page, while /Type /Page would not.
I'm not familiar with the PDF specification. Hence I don't know if you would ever get into trouble because of that.

Regards
aGerman
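
For comparison, here is a minimal sketch of the same count with a regex engine that can express "optional whitespace". It is written in the JScript-compatible subset of JavaScript, so it could sit inside a hybrid batch/JScript file; loading the PDF into a string is left out. The names `PDF_WS` and `countPageObjects` are made up for illustration, and the sketch assumes the page objects are stored uncompressed.

```javascript
// The six PDF whitespace bytes: NUL, TAB, LF, FF, CR, SPACE.
var PDF_WS = "[\\x00\\t\\n\\f\\r ]";

// Counts "/Type /Page" entries with any amount of whitespace in between.
// The lookahead rejects "/Pages" and any longer name starting with "Page".
function countPageObjects(text) {
  var re = new RegExp("/Type" + PDF_WS + "*/Page(?![A-Za-z0-9])", "g");
  var matches = text.match(re);
  return matches ? matches.length : 0;
}
```

Unlike the FINDSTR pipeline, which counts matching lines, this counts every occurrence, so several page objects on one line are each counted.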

penpen
Expert
Posts: 2009
Joined: 23 Jun 2013 06:15
Location: Germany

Re: how to calculate count of no.of pages in pdf file-batch

#5 Post by penpen » 02 Sep 2014 14:42

Nice idea ShadowThief!

Sadly, a page can be referenced multiple times. I think this happens mostly (only?)
to empty pages: I've never seen a PDF with any other page doubled.

So you should read the value of the "Count" key of the root page tree node (which is unique).
The "Parent" key is prohibited in the root page tree node, so its required keys are only
"Type" (with the value Pages), "Kids", and "Count".

Because "bad" whitespace characters (0x00, 0x09, 0x0A, 0x0C, 0x0D, 0x20) are allowed
between the keys, I fear you have to analyze the content in hex (maybe
using "fc /B"). (At least the NUL byte is "bad".)

Source:
http://www.adobe.com/devnet/pdf/pdf_reference.html
(Third link).

penpen
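
The lookup described above can be sketched roughly as follows, in JScript-compatible JavaScript. The function name `rootPageCount` is hypothetical. It assumes the file was read into a string with a single-byte charset (so every byte, including NUL, survives), and it only looks at dictionaries that contain no nested `<<...>>`.

```javascript
// The six PDF whitespace bytes: NUL, TAB, LF, FF, CR, SPACE.
var WS = "[\\x00\\t\\n\\f\\r ]";

// Returns the /Count of the first page tree node without a /Parent key,
// or -1 if no such dictionary is found in plain text.
function rootPageCount(text) {
  var dictRe = /<<([^<>]*)>>/g;  // flat dictionaries only
  var isPages = new RegExp("/Type" + WS + "*/Pages(?![A-Za-z0-9])");
  var hasParent = new RegExp("/Parent(?![A-Za-z0-9])");
  var countRe = new RegExp("/Count" + WS + "+(\\d+)");
  var m;
  while ((m = dictRe.exec(text)) !== null) {
    var body = m[1];
    if (!isPages.test(body) || hasParent.test(body)) {
      continue;  // not a page tree node, or not the root
    }
    var c = countRe.exec(body);
    if (c) {
      return parseInt(c[1], 10);
    }
  }
  return -1;
}
```

With a regex character class for the whitespace bytes, a hex dump should not be necessary, provided the script language can hold the raw bytes in a string.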

ShadowThief
Expert
Posts: 1167
Joined: 06 Sep 2013 21:28
Location: Virginia, United States

Re: how to calculate count of no.of pages in pdf file-batch

#6 Post by ShadowThief » 02 Sep 2014 14:53

aGerman wrote:@ShadowThief
Your code worked for the few files that I tested, but it may have an issue. FINDSTR does not support the \s sequence for whitespace; the backslash has no effect, so the s is treated as a literal character. That means /Type/Page would be matched as well as /Types/Page, while /Type /Page would not.
I'm not familiar with the PDF specification. Hence I don't know if you would ever get into trouble because of that.

Regards
aGerman

I just did what I could to port http://stackoverflow.com/a/1751348 into batch. I also can't say that I've spent a lot of time looking at PDF files in a hex editor. Googling PDF header specification seems to indicate that /Type /Page is valid syntax. Doesn't findrepl.bat allow for regex searches? Maybe that can be used.

npocmaka_
Posts: 517
Joined: 24 Jun 2013 17:10
Location: Bulgaria

Re: how to calculate count of no.of pages in pdf file-batch

#7 Post by npocmaka_ » 02 Sep 2014 17:22

Somebody asked the same question here before: viewtopic.php?f=3&t=4987


Then I checked how a PDF stores data about its pages.
It always looks something like this (although it's a binary format, it can be read with 'type'):

Code: Select all

<</Count 8/Type/Pages/Kids[blah blah]>>


In other words, there's an entry (I think that's what it's called in the Adobe documentation) with the following keys:

Type - with the value Pages
Count - here it stores the number of pages
Kids

All keys start with "/". The main entry does NOT have a Parent key, starts with "<<", and ends with ">>".

In most cases it is on one line, but the PDF format allows new lines before the slashes, and the order of the keys is not mandatory.

I didn't put much effort into making a robust script that handles all possible layouts, and I'm not sure it's possible... Now it has caught my interest again and I will try to create a more universal script :-)

sivasriram
Posts: 12
Joined: 30 Aug 2014 10:55

Re: how to calculate count of no.of pages in pdf file-batch

#8 Post by sivasriram » 03 Sep 2014 03:13

ShadowThief wrote:

Code: Select all

@echo off
cls

set /p "file_name=File name: "
findstr /R /C:"/Type\s*/Page[^s]" "%file_name%"|find /c /v ""


The code is not working here :(

It runs fine, but no output appears at all...

ShadowThief
Expert
Posts: 1167
Joined: 06 Sep 2013 21:28
Location: Virginia, United States

Re: how to calculate count of no.of pages in pdf file-batch

#9 Post by ShadowThief » 03 Sep 2014 03:33

Oh yeah, you wanted it output to a file, didn't you? Sorry about that...

Code: Select all

@echo off
cls

set /p "file_name=File name: "
echo|set /p=%file_name%,>>pdf_num.csv
findstr /R /C:"/Type\s*/Page[^s]" "%file_name%"|find /c /v "">>pdf_num.csv
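
One thing the batch version above does not handle is a file name containing a comma or a double quote, which would shift the columns when the CSV is opened in Excel. A small sketch of the usual CSV quoting rule, in JScript-compatible JavaScript (the helper names `csvField` and `csvRow` are made up for illustration):

```javascript
// Quotes a field only when it contains a comma, a double quote or a line
// break; embedded double quotes are doubled.
function csvField(value) {
  var s = String(value);
  if (/[",\r\n]/.test(s)) {
    s = '"' + s.replace(/"/g, '""') + '"';
  }
  return s;
}

// Builds one "file name,page count" line without the trailing newline.
function csvRow(fileName, pageCount) {
  return csvField(fileName) + "," + csvField(pageCount);
}
```

For example, `csvRow("a,b.pdf", 3)` yields `"a,b.pdf",3`, which Excel should read as two columns.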

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

Re: how to calculate count of no.of pages in pdf file-batch

#10 Post by foxidrive » 03 Sep 2014 03:39

Test this: EDITED for more speed, but it fails on some files (those with no "Count ") and on others it finds too many "Count " terms.

Code: Select all

@echo off
for %%a in (*.pdf) do (
find "Count " < "%%a"|repl ".*Count ([0-9]*).*" "$1 pages = \q%%a\q" ax
)
pause



This uses a helper batch file called `repl.bat` (by dbenham) - download from: https://www.dropbox.com/s/qidqwztmetbvklt/repl.bat

Place `repl.bat` in the same folder as the batch file or in a folder that is on the path.
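
To see why a plain search for "Count " can report too many hits, the sketch below lists every `/Count` value in a string. It is JScript-compatible JavaScript; `countCandidates` is a hypothetical name, and the sample in the usage note is invented. Outline dictionaries and intermediate page tree nodes carry their own `/Count` entries, so a file can easily contain several.

```javascript
// Returns every integer that follows a /Count key, in file order.
// Outline counts may be negative, so an optional minus sign is accepted.
function countCandidates(text) {
  var re = /\/Count[\x00\t\n\f\r ]+(-?\d+)/g;
  var found = [];
  var m;
  while ((m = re.exec(text)) !== null) {
    found.push(parseInt(m[1], 10));
  }
  return found;
}
```

For a string such as `<</Type/Outlines/Count 5>> <</Type/Pages/Count 12/Kids[]>>` this returns both 5 and 12, and only the second one is a page count.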

npocmaka_
Posts: 517
Joined: 24 Jun 2013 17:10
Location: Bulgaria

Re: how to calculate count of no.of pages in pdf file-batch

#11 Post by npocmaka_ » 03 Sep 2014 04:42

Code: Select all

@if (@X)==(@Y) @end /* JScript comment
@echo off

cscript //E:JScript //nologo "%~f0"  %*

exit /b 0
@if (@X)==(@Y) @end JScript comment */

   var args=WScript.Arguments;
   var filename=args.Item(0);
   var fSize=0;
   var inTag=false;
   var tempString="";
   var pages="";
   
   function getChars(fPath) {

      var ado = WScript.CreateObject("ADODB.Stream");
      ado.Type = 2;  // adTypeText = 2
      ado.CharSet = "iso-8859-1";
      ado.Open();
      ado.LoadFromFile(fPath);               
      var fs = new ActiveXObject("Scripting.FileSystemObject");
      fSize = (fs.getFile(fPath)).size;
                  
      var fBytes = ado.ReadText(fSize);
      var fChars=fBytes.split('');
      ado.Close();
      return fChars;
   }
   
   
   function checkTag(tempString) {
   
   if (tempString.length == 0 ) {
      return;
   }
   
   if (tempString.toLowerCase().indexOf("/count") == -1) {
      return;
   }
   
   if (tempString.toLowerCase().indexOf("/type") == -1) {
      return;
   }
   
   if (tempString.toLowerCase().indexOf("/pages") == -1) {
      return;
   }
   
   if (tempString.toLowerCase().indexOf("/parent") > -1) {
      return;
   }
   
   
   var elements=tempString.split("/");
   // "var" keeps i local, so this loop cannot clobber the counter of the calling loop
   for (var i = 0;i < elements.length;i++) {
      
      if (elements[i].toLowerCase().indexOf("count") > -1) {
         pages=elements[i].split(" ")[1];
         
      }
   }
   }
   
   function getPages(fPath) {
      var fChars = getChars(fPath);
      
      // declare i locally instead of relying on an implicit global
      for (var i=0;i<fSize-1;i++) {
         
         if ( fChars[i] == "<" && fChars[i+1] == "<" ) {
            inTag = true;
            continue;
         }
         
         if (inTag && fChars[i] == "<") {
            continue;
         }
         
         if ( inTag &&
              fChars[i] == ">" &&
             fChars[i+1] == ">" ) {
            
            inTag = false;
            checkTag(tempString);
            if (pages != "" ) {
               return;
            }
            
            tempString="";
            
         }
         
         if (inTag) {
            if (fChars[i] != '\n' && fChars[i] != '\r') {
               tempString += fChars[i];
             }
         }
                  
      }
   
   }
   
   getPages(filename);
   if (pages == "") {
    WScript.Echo("1");
   } else {
   WScript.Echo(pages);
   }
 
 


This needs the path to the .pdf file as its first argument and simply prints the number of pages.

It handles the case where the tag <</Count x/Type/Pages ... >> contains a "Parent" key and excludes it (not sure if this is needed, as it seems that the child tags always come after the parent).
It handles the case where there are new lines inside this tag.
It handles the case where there's no <</Count x/Type/Pages ... >> tag at all (could happen for single-page documents) by printing 1.
It ignores <<>> tags that contain /Count but are not connected to the number of pages (e.g. Outlines, Catalogs, MediaBoxes).

It could fail in the case of nested tags.

It's not very fast, as it reads the PDF symbol by symbol, although it exits as soon as the number of pages is found... It can surely be optimized, but I'm not sure it's worth the effort.
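
The nested-tag case could perhaps be handled by keeping a stack of open dictionaries, so that each `<<...>>` is returned with the contents of its inner dictionaries removed. The following is only a sketch in JScript-compatible JavaScript; `dictionaryBodies` is a hypothetical helper, and it assumes that `<<` and `>>` only appear as dictionary delimiters (binary stream data or a hex string closing right before `>>` can break that assumption).

```javascript
// Returns the own-level text of every dictionary, innermost first.
// Text inside a nested dictionary goes to that dictionary only.
function dictionaryBodies(text) {
  var bodies = [];
  var stack = [];  // one character buffer per currently open dictionary
  var i = 0;
  while (i < text.length) {
    var two = text.substr(i, 2);
    if (two === "<<") {
      stack.push([]);
      i += 2;
    } else if (two === ">>" && stack.length > 0) {
      bodies.push(stack.pop().join(""));
      i += 2;
    } else {
      if (stack.length > 0) {
        stack[stack.length - 1].push(text.charAt(i));
      }
      i += 1;
    }
  }
  return bodies;
}
```

Each returned body could then go through the same /Type, /Pages, /Parent and /Count checks as in the script above, without a /Count from an inner dictionary being mistaken for the page count.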

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

Re: how to calculate count of no.of pages in pdf file-batch

#12 Post by foxidrive » 03 Sep 2014 08:05

npocmaka, I changed this line

Code: Select all

cscript //E:JScript //nologo "%~f0"  %*
 


with this and it seems to be accurate.

Code: Select all

for %%a in (*.pdf) do echo "%%a"&cscript //E:JScript //nologo "%~f0"  "%%a"



I timed a 10 MB PDF file which has no "Count " inside it and it took 30 seconds.
I think this file may not be legitimate as it has 44 pages in Foxit PDF viewer.

npocmaka_
Posts: 517
Joined: 24 Jun 2013 17:10
Location: Bulgaria

Re: how to calculate count of no.of pages in pdf file-batch

#13 Post by npocmaka_ » 03 Sep 2014 08:23

@foxidrive:

Can you tell me what the PDF version of this file is?
There should be something like file properties when it's open.

According to this document, the Count entry is required (page 76) - http://wwwimages.adobe.com/content/dam/ ... 0_2008.pdf :


Key     Type        Value
Type    name        (Required) The type of PDF object that this dictionary describes; shall be Pages for a page tree node.
Parent  dictionary  (Required except in root node; prohibited in the root node; shall be an indirect reference) The page tree node that is the immediate parent of this one.
Kids    array       (Required) An array of indirect references to the immediate children of this node. The children shall only be page objects or other page tree nodes.
Count   integer     (Required) The number of leaf nodes (page objects) that are descendants of this node within the page tree.

npocmaka_
Posts: 517
Joined: 24 Jun 2013 17:10
Location: Bulgaria

Re: how to calculate count of no.of pages in pdf file-batch

#14 Post by npocmaka_ » 03 Sep 2014 08:28

The PDF version is on the first line of the PDF file and looks like %PDF-1.6 ...

The only file that I have without a /Count entry is %PDF-1.4.

Maybe I should add a supportability check to the code above :D
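
Such a check might look like the sketch below, in JScript-compatible JavaScript with hypothetical function names. The second function rests on an assumption about why a valid file can lack a plain-text /Count: PDF 1.5 and later allow objects to be packed into compressed object streams (`/Type /ObjStm`), where a text search cannot see them. That is only one possible explanation and would not account for a 1.4 file.

```javascript
// Returns the version from the "%PDF-x.y" header as a string such as "1.6",
// or null if no header is found near the start of the file.
function pdfVersion(text) {
  var m = /%PDF-(\d+)\.(\d+)/.exec(text.substr(0, 1024));
  return m ? m[1] + "." + m[2] : null;
}

// True if the file declares at least one object stream, in which case
// the page tree may be stored compressed and a text scan can miss it.
function usesObjectStreams(text) {
  return /\/Type[\x00\t\n\f\r ]*\/ObjStm(?![A-Za-z0-9])/.test(text);
}
```

A script could print a warning instead of a page count when `usesObjectStreams` returns true and no root /Count was found.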

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

Re: how to calculate count of no.of pages in pdf file-batch

#15 Post by foxidrive » 03 Sep 2014 08:34

It's %PDF-1.7 here - from a modern Dr Who comic.
