
Splitting PDFs by page count with pdfgrep and pdftk

Posted: 12 Mar 2019 14:44
by iwishiknew
Hello!
I'm attempting to put together a batch file that will split a PDF consisting of 1-page and 2-page letters into two PDFs based on page count.

What I've got so far is:

Code:

pdfgrep -n -r "Id Number: L[0-9]{8}">>grep.txt
for /f "tokens=2 delims=:" %%G in (grep.txt) DO @echo %%G>>pages.txt
del grep.txt
I've attached an example output file "pages.txt"

and on the other end

Code:

pdftk original.pdf cat %1page% output 1page.pdf
pdftk original.pdf cat %2page% output 2page.pdf
The part that I'm missing is where it takes the contents of pages.txt and passes those contents into the 1page or 2page variable based on whether it equals the previous number +1 or not.

I'm sure there are much more efficient ways to do this, and I'm very open to suggestions; I just wanted to put out what I've got so far. Any suggestions/revisions(/derision?) would be greatly appreciated.

Re: Splitting PDFs by page count with pdfgrep and pdftk

Posted: 15 Mar 2019 12:57
by iwishiknew
To clarify, the part that I need help with is looping through the pages.txt file created by the first bat file and assigning the contents to the correct variable.

Something along the lines of:
line 1 = i
line 2 = j
If line 2 = i+1 then add i to %1page%
else, add i to %2page%
delete line 1, loop until eof.

I hope this makes more sense. The attached pages.txt file is a good sample of what the data will look like. It will always be numbers between 1 and 400, in ascending order, each one either +1 (one-page letter) or +2 (two-page letter) from the previous number.

Thanks!

Re: Splitting PDFs by page count with pdfgrep and pdftk

Posted: 16 Mar 2019 07:32
by penpen
It sounds like you are only searching for something like this:

Code:

@echo off
setlocal enableExtensions enableDelayedExpansion
type "pages.txt"
set "_1page="
set "_2page="
<"pages.txt" set /p "line_1="
set /a "next=line_1+1"
:: line_2 == %%~a
for /f "skip=1 usebackq" %%a in ("pages.txt") do (
	echo("!line_1!", "!next!", "%%~a"
	if "!next!" == "%%~a" (
		set "_1page=!_1page! !line_1!"
	) else (
		set "_2page=!_2page! !line_1!"
	)
	set /a "next=(line_1=%%~a)+1"
)

echo(_1page="!_1page!"
echo(_2page="!_2page!"
echo(unassigned: !line_1!
goto :eof
penpen

Re: Splitting PDFs by page count with pdfgrep and pdftk

Posted: 18 Mar 2019 08:14
by iwishiknew
Awesome! Thanks, that is definitely a step in the right direction. Thank you so much for your help, penpen. Something that didn't occur to me until I was looking at the output: it's currently splitting the list correctly, but it's only getting the first page of the two-page letters. Is there a way to add the next page as well for all the 2-page letters?

Code:

_1page=" 3 6 7 8 9 10 11 12 13 14 15 20 21 22 25 30 31 32 33 38 43 44 45 46 47"
_2page=" 1 4 16 18 23 26 28 34 36 39 41"
I.e., in this example, for each number that was added to the 2-page set, the next number needs to be added as well:
1 2 4 5 16 17 18 19... and so forth

Thanks!

Re: Splitting PDFs by page count with pdfgrep and pdftk

Posted: 19 Mar 2019 06:51
by penpen
iwishiknew wrote:
18 Mar 2019 08:14
is there a way to add the next page as well to all the 2 page letters
Yes, of course:

Code:

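@echo off
setlocal enableExtensions enableDelayedExpansion
rem Show the input list for reference.
type "pages.txt"
set "_1page="
set "_2page="
rem Read the first line of pages.txt into line_1.
<"pages.txt" set /p "line_1="
set /a "next=line_1+1"
:: line_2 == %%~a
for /f "skip=1 usebackq" %%a in ("pages.txt") do (
	if "!next!" == "%%~a" (
		rem The following letter starts on the very next page, so this is a one-page letter.
		set "_1page=!_1page! !line_1!"
	) else (
		rem Otherwise the letter covers line_1 and the page after it.
		set "_2page=!_2page! !line_1! !next!"
	)
	rem Advance: line_1 becomes the current line, next becomes the page after it.
	set /a "next=(line_1=%%~a)+1"
)

echo(_1page="!_1page!"
echo(_2page="!_2page!"
rem The last line has no successor to compare against, so it stays unclassified.
echo(unassigned: !line_1!
goto :eof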
penpen
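
For anyone finding this thread later: the script above stops at printing the two lists and leaves the last entry of pages.txt "unassigned", because there is no following number to compare it with. Below is a minimal sketch of one way to close both gaps. It is not tested against the original files and rests on these assumptions: the source file is called original.pdf (as in the first post), every letter is either one or two pages long, and pdftk's dump_data report contains a "NumberOfPages:" line. The :classify label and the src, total, prev and end variables are made up for this illustration. The lists keep the leading underscore from the script above, since inside a batch file %1page% would be read as the argument %1 followed by literal text.

```batch
@echo off
setlocal enableExtensions enableDelayedExpansion

rem Assumption: original.pdf and pages.txt are in the current directory.
set "src=original.pdf"

rem Ask pdftk for the total page count, needed to classify the last letter.
set "total="
for /f "tokens=2" %%n in ('pdftk "%src%" dump_data ^| findstr /b /c:"NumberOfPages:"') do set "total=%%n"
if not defined total (
	echo Could not read the page count of "%src%".
	exit /b 1
)

set "_1page="
set "_2page="
set "prev="
rem Compare each start page with the start page of the following letter.
for /f "usebackq" %%a in ("pages.txt") do (
	if defined prev call :classify !prev! %%a
	set "prev=%%a"
)

rem Treat total+1 as the start of an imaginary letter after the last real one.
set /a "end=total+1"
if defined prev call :classify !prev! !end!

rem pdftk's cat operation accepts a space-separated list of page numbers.
if defined _1page pdftk "%src%" cat !_1page! output 1page.pdf
if defined _2page pdftk "%src%" cat !_2page! output 2page.pdf
goto :eof

:classify
rem %1 = first page of a letter, %2 = first page of the letter after it.
set /a "next=%1+1"
if "!next!" == "%2" (
	set "_1page=!_1page! %1"
) else (
	set "_2page=!_2page! %1 !next!"
)
goto :eof
```

With at most 400 pages the two lists stay well below cmd's command-line length limit, so passing them to pdftk in a single call should be fine. If a letter could ever run to three or more pages, the else branch would need to add every page up to %2 minus 1 instead of just two.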

Re: Splitting PDFs by page count with pdfgrep and pdftk

Posted: 19 Mar 2019 10:12
by iwishiknew
Fantastic, thanks for your help!