Splitting PDFs by page count with pdfgrep and pdftk

Moderator: DosItHelp

iwishiknew
Posts: 4
Joined: 12 Mar 2019 13:38

Splitting PDFs by page count with pdfgrep and pdftk

#1 Post by iwishiknew » 12 Mar 2019 14:44

Hello!
I'm attempting to put together a batch file that will split a PDF consisting of 1-page and 2-page letters into two PDFs based on page count.

What I've got so far is

Code: Select all

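:: pdfgrep -n -r prints "file:page:match" for every hit below the current folder
pdfgrep -n -r "Id Number: L[0-9]{8}">grep.txt
:: start with an empty pages.txt, then keep only the page number (token 2)
if exist pages.txt del pages.txt
for /f "tokens=2 delims=:" %%G in (grep.txt) DO @echo %%G>>pages.txt
del grep.txt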
I've attached an example output file "pages.txt"

and on the other end

Code: Select all

pdftk original.pdf cat %1page% output 1page.pdf
pdftk original.pdf cat %2page% output 2page.pdf
The part that I'm missing is where it takes the contents of pages.txt and passes those contents into the 1page or 2page variable based on whether it equals the previous number +1 or not.

I'm sure that there are much more efficient ways to do this, and I'm very open to suggestions; I just wanted to put out what I've got so far. Any suggestions/revisions(/derision?) would be greatly appreciated.
Attachment: pages.txt (12 Bytes)

iwishiknew
Posts: 4
Joined: 12 Mar 2019 13:38

Re: Splitting PDFs by page count with pdfgrep and pdftk

#2 Post by iwishiknew » 15 Mar 2019 12:57

To clarify, the part that I need help with is looping through the pages.txt file created by the first bat file and assigning the contents to the correct variable.

Something along the lines of:
line 1 = i
line 2 = j
If line 2 = i+1 then add i to %1page%
else, add i to %2page%
delete line 1, loop until eof.

I hope this makes more sense. The attached pages.txt file is a good sample of what the data will look like. It will always contain numbers between 1 and 400 in ascending order, each one either 1 higher (the previous letter had one page) or 2 higher (the previous letter had two pages) than the number before it.

Thanks!
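
Since everything downstream relies on that pattern (ascending numbers, each step either +1 or +2), it may be worth checking pages.txt before splitting. The following is only a sketch under that assumption; the variable names prev and step are made up for illustration, and it expects pages.txt in the current folder.

```batch
@echo off
setlocal enableExtensions enableDelayedExpansion
:: prev holds the previous start page; it is undefined for the first line
set "prev="
for /f "usebackq" %%a in ("pages.txt") do (
	if defined prev (
		rem distance between this start page and the previous one
		set /a "step=%%a-prev"
		if !step! lss 1 echo Unexpected step from !prev! to %%a
		if !step! gtr 2 echo Unexpected step from !prev! to %%a
	)
	set "prev=%%a"
)
echo Check finished.
```

If nothing but "Check finished." is printed, every letter in the list is one or two pages long, as described above.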

penpen
Expert
Posts: 1991
Joined: 23 Jun 2013 06:15
Location: Germany

Re: Splitting PDFs by page count with pdfgrep and pdftk

#3 Post by penpen » 16 Mar 2019 07:32

It sounds like you are only looking for something like this:

Code: Select all

@echo off
setlocal enableExtensions enableDelayedExpansion
type "pages.txt"
set "_1page="
set "_2page="
<"pages.txt" set /p "line_1="
set /a "next=line_1+1"
:: inside the loop, %%~a plays the role of line_2
for /f "skip=1 usebackq" %%a in ("pages.txt") do (
	echo("!line_1!", "!next!", "%%~a"
	if "!next!" == "%%~a" (
		set "_1page=!_1page! !line_1!"
	) else (
		set "_2page=!_2page! !line_1!"
	)
	set /a "next=(line_1=%%~a)+1"
)

echo(_1page="!_1page!"
echo(_2page="!_2page!"
echo(unassigned: !line_1!
goto :eof
penpen
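
For readers puzzling over two lines of that script: the redirected set /p reads only the first line of a file, and set /a allows an assignment inside an expression, so one statement can update both the current number and the expected next number. A small stand-alone sketch of both (the file demo_pages.txt and the variables first and current are hypothetical and exist only for this demonstration):

```batch
@echo off
setlocal
:: create a throwaway sample file with three start pages
(echo 1&echo 3&echo 4)>demo_pages.txt
:: set /p with redirected input reads only the first line of the file
<demo_pages.txt set /p "first="
:: the inner assignment runs first, then 1 is added to its result
set /a "next=(current=first)+1"
echo first=%first% current=%current% next=%next%
del demo_pages.txt
```

This should print first=1 current=1 next=2, which is the same update the loop performs with line_1 and %%~a.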

iwishiknew
Posts: 4
Joined: 12 Mar 2019 13:38

Re: Splitting PDFs by page count with pdfgrep and pdftk

#4 Post by iwishiknew » 18 Mar 2019 08:14

Awesome, thanks! That is definitely a step in the right direction. Thank you so much for your help, penpen. Something didn't occur to me until I looked at the output: the script splits the list correctly, but it only picks up the first page of each two-page letter. Is there a way to add the next page as well to all the 2 page letters?

Code: Select all

_1page=" 3 6 7 8 9 10 11 12 13 14 15 20 21 22 25 30 31 32 33 38 43 44 45 46 47"
_2page=" 1 4 16 18 23 26 28 34 36 39 41"
I.e., in this example, for each number that was added to the 2 page set, the next number needs to be added as well:
1 2 4 5 16 17 18 19... and so forth

Thanks!

penpen
Expert
Posts: 1991
Joined: 23 Jun 2013 06:15
Location: Germany

Re: Splitting PDFs by page count with pdfgrep and pdftk

#5 Post by penpen » 19 Mar 2019 06:51

iwishiknew wrote:
18 Mar 2019 08:14
is there a way to add the next page as well to all the 2 page letters
Yes, of course:

Code: Select all

@echo off
setlocal enableExtensions enableDelayedExpansion
type "pages.txt"
set "_1page="
set "_2page="
<"pages.txt" set /p "line_1="
set /a "next=line_1+1"
:: line_2 == %%~a
for /f "skip=1 usebackq" %%a in ("pages.txt") do (
	if "!next!" == "%%~a" (
		set "_1page=!_1page! !line_1!"
	) else (
		set "_2page=!_2page! !line_1! !next!"
	)
	set /a "next=(line_1=%%~a)+1"
)

echo(_1page="!_1page!"
echo(_2page="!_2page!"
echo(unassigned: !line_1!
goto :eof
penpen
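
Two loose ends remain after that script: the last number in pages.txt is reported as "unassigned" because there is no following number to compare it with, and the lists still have to be handed to pdftk. The sketch below is one possible way to close both, not part of penpen's answer. It assumes that pdftk is on the PATH, that original.pdf is the file from the first post, and that the "NumberOfPages:" line of pdftk's dump_data output gives the total page count; the three sample values at the top are hypothetical stand-ins for what the loop leaves behind. Note the leading underscore in the variable names: inside a batch file, %1page% would be read as the argument %1 followed by the text "page".

```batch
@echo off
setlocal enableExtensions enableDelayedExpansion
:: hypothetical sample values; in practice these come from the loop above
set "_1page= 3"
set "_2page= 1 2"
set "line_1=4"

:: ask pdftk for the total page count of the source file
set "total="
for /f "tokens=2" %%n in ('pdftk original.pdf dump_data ^| findstr /b /c:"NumberOfPages:"') do set "total=%%n"
if not defined total (
	echo Could not read the page count from original.pdf
	exit /b 1
)

:: the last letter has one page if it starts on the final page, else two
set /a "next=line_1+1"
if "!line_1!" == "!total!" (
	set "_1page=!_1page! !line_1!"
) else (
	set "_2page=!_2page! !line_1! !next!"
)

:: pdftk's cat operation accepts a space separated list of page numbers
if defined _1page pdftk original.pdf cat !_1page! output 1page.pdf
if defined _2page pdftk original.pdf cat !_2page! output 2page.pdf
```

With at most 400 pages the page lists stay well within the command line length limit, so passing them directly to cat should be fine.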

iwishiknew
Posts: 4
Joined: 12 Mar 2019 13:38

Re: Splitting PDFs by page count with pdfgrep and pdftk

#6 Post by iwishiknew » 19 Mar 2019 10:12

Fantastic, thanks for your help!
