Converting PDF to Text or Excel

Moderator: DosItHelp

Eduard14
Posts: 2
Joined: 22 Jul 2015 15:56

Converting PDF to Text or Excel

#1 Post by Eduard14 » 22 Jul 2015 16:05

Hi everyone. I have a bunch of PDF files in a folder, with more coming in all the time, and I need them converted, preferably to Excel format, though text works too. I've heard that AHK can run in a loop and launch a .bat file, but that's as far as I got. Is it possible to convert a PDF using a batch file? Whenever I google it, the search results are all about the Adobe Acrobat Pro batch system. I need the files in a readable format so MS Access can analyze the data inside. Would it be as simple as saving them with a different extension? Any thoughts would help!
Thanks,
Eduard

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

Re: Converting PDF to Text or Excel

#2 Post by foxidrive » 22 Jul 2015 21:58

PDF files can store their data internally as images, plain text, etc.,
and the actual layout of the data is part of the picture here.

Plain batch code can't extract data from PDF files but there are utilities that are designed to convert PDF format to other file types, and which can be scripted in a batch file.

Without samples of the files and corresponding samples of the output format you need,
it's not really possible to understand the exact requirements of the task.
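
To illustrate what "scripted in a batch file" could look like, here is a minimal sketch. It assumes the pdftotext command-line utility (shipped with Xpdf and Poppler) is installed and on the PATH; the thread does not name a specific tool, so treat that choice as an assumption. It only helps with PDFs that contain real text, not scanned images.

```batch
@echo off
rem Sketch only: assumes pdftotext.exe (Xpdf or Poppler) is on the PATH.
rem Converts every PDF in the current folder to a .txt file of the same name.
for %%F in (*.pdf) do (
    echo Converting "%%F"
    rem -layout asks pdftotext to keep the physical layout of the page,
    rem which tends to preserve columns of tabular data.
    pdftotext -layout "%%F" "%%~nF.txt"
)
```

Whether the resulting text is usable for Access depends on how the data is laid out inside the PDFs, so it is worth checking one converted file by hand first.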

npocmaka_
Posts: 517
Joined: 24 Jun 2013 17:10
Location: Bulgaria

Re: Converting PDF to Text or Excel

#3 Post by npocmaka_ » 23 Jul 2015 03:21

I don't have Word installed at the moment, but I think the last two versions of Word can open a PDF file.
Here's one of my (pretty) old scripts that converts Word documents to text, and I think it can be used in this case: viewtopic.php?f=3&t=4755
Here's an updated version that should also allow saving a document as docx:

Code:

'>nul 2>&1|| @copy /Y %windir%\System32\doskey.exe '.exe >nul
'&&@echo off && cls &&goto :end_vbs

' --- VBScript part (cscript treats the batch lines as comments) ---
Set WordApp = CreateObject("Word.Application")
WordApp.Visible = False

' Open doc for reading: Documents.Open(FileName, ConfirmConversions, ReadOnly)
Set WordDoc = WordApp.Documents.Open(WScript.Arguments.Item(0), False, True)

' wdFormatText = 2
' wdFormatUnicodeText = 7
' wdFormatDocumentDefault = 16 (docx)
format = CInt(WScript.Arguments.Item(2))
WordDoc.SaveAs WScript.Arguments.Item(1), format
WordDoc.Close()
' Quit Word so that no hidden winword.exe is left running
WordApp.Quit
WScript.Quit

:end_vbs

'& if "%~1" equ "-help" echo %~n0 word_document [ destination [-unicode^|-docx] ] & exit /b 0
'& if "%~1" equ "" echo word document not given & exit /b 1
'& if not exist "%~f1" echo word document does not exist & exit /b 2
'& rem use full paths, Word may resolve relative names against its own default folder
'& if "%~2" equ "" ( set "save_as=%~dpn1.txt") else ( set "save_as=%~f2")
'& if exist "%save_as%" del /q "%save_as%"
'& if /i "%~3" equ "-unicode" ( set "format=7") else ( set "format=2")
'& if /i "%~3" equ "-docx" ( set "format=16")
'& rem warning: the next line force-closes every running Word instance
'& taskkill /im winword* /f >nul 2>&1
'& cscript /nologo /E:vbscript "%~f0" "%~f1" "%save_as%" %format%
'& pause
'& rem del /q "'.exe"


It should be called like this:
doctool.bat "some.pdf" "savedAs.docx" -docx


though I can't test it...
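
Since new PDFs keep arriving in the folder, the script could be wrapped in a loop. A rough sketch, untested like the script itself: it assumes the script above is saved as doctool.bat next to this wrapper, and C:\incoming is a hypothetical folder name to replace with the real one.

```batch
@echo off
rem Sketch only: assumes doctool.bat (the script above) sits in the same
rem folder as this wrapper. C:\incoming is a hypothetical source folder.
set "src=C:\incoming"

for %%F in ("%src%\*.pdf") do (
    rem Skip PDFs that already have a .txt next to them, so the wrapper
    rem can be re-run whenever new files have arrived.
    if not exist "%%~dpnF.txt" (
        echo Converting "%%~nxF"
        rem No third argument means plain text output (format 2).
        rem The nul redirect lets the "pause" at the end of doctool.bat
        rem return immediately instead of waiting for a key press.
        call "%~dp0doctool.bat" "%%~fF" "%%~dpnF.txt" <nul
    )
)
```

Keep in mind that doctool.bat runs taskkill on winword before each conversion, so any Word documents open at the time would be closed without saving.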

Eduard14
Posts: 2
Joined: 22 Jul 2015 15:56

Re: Converting PDF to Text or Excel

#4 Post by Eduard14 » 23 Jul 2015 08:33

foxidrive wrote: PDF files can store their data internally as images, plain text, etc.,
and the actual layout of the data is part of the picture here.

Plain batch code can't extract data from PDF files but there are utilities that are designed to convert PDF format to other file types, and which can be scripted in a batch file.

Without samples of the files and corresponding samples of the output format you need,
it's not really possible to understand the exact requirements of the task.


So if I have Adobe Acrobat Pro, a .bat file can convert all the PDFs automatically using Adobe?

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

Re: Converting PDF to Text or Excel

#5 Post by foxidrive » 23 Jul 2015 08:39

Eduard14 wrote: So if I have Adobe Acrobat Pro, a .bat file can convert all the PDFs automatically using Adobe?


If it has a command-line feature then it can be scripted, but you will first have to check whether it converts your PDF files the way you want.

The script that npocmaka_ posted uses Microsoft Word. Perhaps you can test that script.
