Searching html? Can a batch file do this?

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

Matt20687
Posts: 54
Joined: 02 May 2012 14:42

#1 Post by Matt20687 » 05 May 2012 02:23

Hello,

I am working on a batch file that is being a right pain in the rear end. It reads an ID from a CSV file (soon to be IDs, hopefully) and bungs it into the necessary places in a web address to load in Google Chrome. Not a problem!

The next step is to get onto the second, more detailed page. To do this I need to search for a set string of text within the HTML and then return a 7-digit number. I can see two set bits of text that are common to all of the HTML no matter what ID I am searching. In the line below, PREFIX-ID and BOOKING NUMBER stand for values that are already variables in my script; the text around them is the same each time.

onClick="ii0('op=editexam&patient_id=PREFIX-ID&sps_id=5017671');"><td>BOOKING NUMBER

Is there any way I can have the batch script look for the booking number (which is a variable), count back 16 characters, and return the 7-digit number?

Thanks,
Matt
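Counting back 16 works because the constant text between the number and the booking number, `');"><td>`, is 9 characters long, and 9 + 7 = 16. Here is a minimal sketch of that arithmetic, written in Python purely to check the logic (it is not batch). The names `SEP` and `number_before` are made up for illustration, and it assumes the line layout shown above.

```python
# Constant text between the 7-digit number and the booking number (9 characters).
SEP = "');\"><td>"

def number_before(line, booking_number):
    """Return the 7 digits that start 16 characters before booking_number, or None."""
    idx = line.find(booking_number)
    # 16 = 7 digits + len(SEP), so we need at least 16 characters before the match,
    # and the last 9 of them must be SEP.
    if idx < 16 or line[idx - len(SEP):idx] != SEP:
        return None
    digits = line[idx - 16:idx - len(SEP)]
    return digits if digits.isdigit() else None


sample = "onClick=\"ii0('op=editexam&patient_id=PREFIX-ID&sps_id=5017671');\"><td>BOOKING NUMBER"
print(number_before(sample, "BOOKING NUMBER"))  # 5017671
```

Checking that the 9 characters really are `');"><td>` guards against a booking number that happens to appear somewhere else on the line.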

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

#2 Post by foxidrive » 05 May 2012 05:41

If there is only one number of more than 6 digits on the same line as "BOOKING NUMBER", then the GNU sed command below can help.

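@echo off
:: Needs GNU sed on the PATH. Prints the 7-digit number that comes before
:: BOOKING NUMBER on the same line of file.html.
for /f %%a in ('sed -n "s/.*\([0-9][0-9][0-9][0-9][0-9][0-9][0-9]\).*BOOKING NUMBER.*/\1/p" "file.html"') do echo %%a
pause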
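To see what that sed expression actually captures, here is the same pattern run through Python's `re` module. Python is only a testing stand-in here; the batch file does not need it, and the function name `numbers_before_booking` is hypothetical.

```python
import re

# Same pattern as the sed command: greedy .*, exactly 7 digits (captured),
# anything, then the literal text BOOKING NUMBER, then anything.
PATTERN = re.compile(r".*([0-9]{7}).*BOOKING NUMBER.*")

def numbers_before_booking(lines):
    """Return the captured digits for each matching line, like sed -n 's/.../\\1/p'."""
    found = []
    for line in lines:
        m = PATTERN.match(line)
        if m:
            found.append(m.group(1))
    return found


sample = "onClick=\"ii0('op=editexam&patient_id=PREFIX-ID&sps_id=5017671');\"><td>BOOKING NUMBER"
print(numbers_before_booking(["<p>no match</p>", sample]))  # ['5017671']
```

Because the leading `.*` is greedy, the capture is the last 7 digits before BOOKING NUMBER. In this sketch an 8-digit run such as 12345678 yields 2345678, which is why the "only one number of more than 6 digits" condition matters.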

Matt20687
Posts: 54
Joined: 02 May 2012 14:42

#3 Post by Matt20687 » 05 May 2012 05:54

Hello,

Looks good (I think :S). I may be being stupid, but what do I replace 'sed with? It says it is an unknown command, so I assume I am meant to change it.

Squashman
Expert
Posts: 4471
Joined: 23 Dec 2011 13:59

#4 Post by Squashman » 05 May 2012 07:48

You need to download a version of sed that works with the Windows cmd shell.
Although batch might be able to do this, sed is a much better alternative.

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

#5 Post by dbenham » 05 May 2012 08:50

Here is a pure batch solution.

1) Use FINDSTR to find the line with the matching %bookingNumber% that is prefixed with ');"><td>. The quote and escape rules for the search string are a bit complicated because the command parser and FINDSTR each have their own rules. FINDSTR needs the double quote escaped with \. The command line needs the angle brackets either escaped with ^ or quoted.

2) Use FOR /F to read the results of FINDSTR into a variable. The quote and escape rules are further complicated by an extra layer of parsing. Putting the entire search string in a delayed expansion variable was the simplest way I could figure out. Another alternative would be to capture the FINDSTR results in a temp file and then use FOR /F to read the temp file.

3) Use the variable "search and replace leading part of string" operation to chop off the beginning of the string through &sps_id. Unfortunately we cannot include the = sign in the search string.

4) Use the variable substring operation to extract the 7-digit number, starting with the 2nd character (the 1st character is the = sign).

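@echo off
setlocal enableDelayedExpansion

::Presumably your code sets a bookingNumber variable somehow
::My test case uses "123abc"
set "bookingNumber=123abc"

::Steps 1-2: FINDSTR finds the matching line, FOR /F stores it in ln (last match wins)
set searchStr=^^^"');\"><td>%bookingNumber%"
for /f "delims=" %%A in ('findstr /l /c:!searchStr! test.txt') do set ln=%%A
::Step 3: strip everything up to and including the sps_id text, leaving the = sign and the number
set "ln=!ln:*&sps_id=!"
::Step 4: take 7 characters starting at the 2nd character, skipping the = sign
set "sps_id=!ln:~1,7!"
set sps_id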

My test file was:

onClick="ii0('op=editexam&patient_id=PREFIX-ID&sps_id=5017671');"><td>123abc
onClick="ii0('op=editexam&patient_id=PREFIX-ID&sps_id=9999999');"><td>123xyz
onClick="ii0('op=editexam&patient_id=PREFIX-ID&sps_id=0000000')123abc
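Steps 1 and 2 can be mirrored in a few lines of Python to show which test-file lines match. This is a sketch for checking the logic only, not a batch replacement; `find_line` is a hypothetical name. FINDSTR /L /C: searches for one literal, case-sensitive string, and because the FOR /F body runs once per matching line, `ln` ends up holding the last match.

```python
SEP = "');\"><td>"  # constant text searched for before the booking number

def find_line(lines, booking_number):
    """Sketch of FINDSTR /L /C: plus the FOR /F loop that sets ln.

    Returns the LAST matching line, or None if nothing matches
    (in batch, ln would simply keep its previous value).
    """
    search = SEP + booking_number
    ln = None
    for line in lines:
        if search in line:  # literal, case-sensitive substring test
            ln = line
    return ln


test_file = [
    "onClick=\"ii0('op=editexam&patient_id=PREFIX-ID&sps_id=5017671');\"><td>123abc",
    "onClick=\"ii0('op=editexam&patient_id=PREFIX-ID&sps_id=9999999');\"><td>123xyz",
    "onClick=\"ii0('op=editexam&patient_id=PREFIX-ID&sps_id=0000000')123abc",
]
print(find_line(test_file, "123abc"))  # only the first line matches
```

The third test line contains 123abc but not the `');"><td>` prefix, so it is correctly ignored.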

The search for the booking number might be more reliable if there is some constant text after the booking number that can be included in the search. I worry that a booking number of "123" would falsely match "123abc".
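One way to picture the stricter search is to require that the booking number be followed by something that cannot be part of it. The sketch below, in Python only for checking, assumes (this is not confirmed anywhere in the thread) that the real HTML has a `<` (for example a closing `</td>`) or the end of the line right after the booking number; `find_line_exact` is a hypothetical name.

```python
import re

SEP = "');\"><td>"

def find_line_exact(lines, booking_number):
    """Like the literal search, but the booking number must end there.

    ASSUMPTION: the booking number is followed by '<' or by the end of the line.
    """
    pattern = re.compile(re.escape(SEP + booking_number) + r"(?=<|$)")
    ln = None
    for line in lines:
        if pattern.search(line):
            ln = line  # last match wins, as with FOR /F
    return ln


test_file = [
    "onClick=\"ii0('op=editexam&patient_id=PREFIX-ID&sps_id=5017671');\"><td>123abc",
    "onClick=\"ii0('op=editexam&patient_id=PREFIX-ID&sps_id=9999999');\"><td>123xyz",
]
print(SEP + "123" in test_file[0])       # True: the plain substring search false-matches
print(find_line_exact(test_file, "123"))  # None
```

If the real page has different text after the booking number, that text is what belongs in the search instead of the `<` assumption.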


Dave Benham

Matt20687
Posts: 54
Joined: 02 May 2012 14:42

#6 Post by Matt20687 » 05 May 2012 09:54

Thanks for this Dave, looks good.

In your example, FINDSTR looks at a file rather than a web page. Can it read a web page?

abc0502
Posts: 1007
Joined: 26 Oct 2011 22:38
Location: Egypt

#7 Post by abc0502 » 05 May 2012 10:04

Hi Matt, it works with a text file or an HTML file: as long as the file can be opened and its content read with Notepad, you can run the batch file on it.

Matt20687
Posts: 54
Joined: 02 May 2012 14:42

#8 Post by Matt20687 » 05 May 2012 10:13

I am really struggling with this one. Can someone explain this code so I can try to understand what is going on? I still cannot get it to work!

I am confused by the different variables and what they actually refer to:

set "ln=!ln:*&sps_id=!"
set "sps_id=!ln:~1,7!"
set sps_id

Sorry to be a pain, everyone.

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

#9 Post by dbenham » 05 May 2012 10:51

I already tried to explain the code in my post. The mechanics of all three lines are explained in the on-line help: type HELP SET or SET /? to get the documentation.

set "ln=!ln:*&sps_id=!" is step 3) from my post: It finds "&sps_id" and replaces it and everything before it with nothing.

set "sps_id=!ln:~1,7!" is step 4) from my post: It takes the 7-character substring starting with the 2nd character.

Both of the above lines use delayed expansion, which I enabled at the top of the script with setlocal enableDelayedExpansion.

set sps_id simply displays the result.
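To make the intermediate value visible, here are the two SET lines mirrored in Python (for checking only; `chop_and_slice` is a hypothetical name). Step 3 leaves `ln` starting with the = sign, and step 4 skips that one character and keeps the next 7.

```python
def chop_and_slice(ln):
    """Mirror the two SET lines; return (ln after step 3, sps_id)."""
    # Step 3, like set "ln=!ln:*&sps_id=!": drop everything up to and including
    # the first "&sps_id". If it is not found, batch leaves ln unchanged, and so
    # does this sketch. (Batch matching is case-insensitive; this is not.)
    pos = ln.find("&sps_id")
    if pos != -1:
        ln = ln[pos + len("&sps_id"):]
    # Step 4, like set "sps_id=!ln:~1,7!": 7 characters starting at offset 1,
    # which skips the leading "=".
    sps_id = ln[1:8]
    return ln, sps_id


line = "onClick=\"ii0('op=editexam&patient_id=PREFIX-ID&sps_id=5017671');\"><td>123abc"
after3, sps_id = chop_and_slice(line)
print(after3)  # =5017671');"><td>123abc
print(sps_id)  # 5017671
```

Offset 1 and length 7 in the batch substring `~1,7` correspond to the Python slice `[1:8]`.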


Dave Benham

Matt20687
Posts: 54
Joined: 02 May 2012 14:42

#10 Post by Matt20687 » 05 May 2012 13:40

Hello, thanks for clearing this up. I have got closer now, but the search isn't working quite right. It isn't returning the 7-digit number; instead it's returning &SPS_ID. Any reason why? My code is below:

:next

set /p uacc=<unreportacc.csv

set searchStr=^^^"');\"><td>%uacc%"
for /f "delims=" %%A in ('findstr /l /c:!searchStr! "http:Website?patient_id=%prefix%-%uid%&old_patient_id=%prefix%-%uid%&op=exams"') do set ln=%%A

set "ln=!ln:*&sps_id=!"
set "sps_id=!ln:~1,7!"
set sps_id

start chrome.exe "http://Website?http://ims/booking.php?patient_id=%prefix%-%uid%&old_patient_id=%prefix%-%uid%&op=editexam&patient_id=%prefix%-%uid%&sps_id=%sps_id%"

Matt20687
Posts: 54
Joined: 02 May 2012 14:42

#11 Post by Matt20687 » 05 May 2012 14:04

P.S. I tried your script on its own with the HTML copied and pasted into Notepad. It outputs the correct number every time!!!

There has to be something wrong with my code. I believe it is the following bit; I have just found that it is refusing to open the page.

set searchStr=^^^"');\"><td>%uacc%"
for /f "delims=" %%A in ('findstr /l /c:!searchStr! "http://ims/booking.php?patient_id=%prefix%-%uid%&old_patient_id=%prefix%-%uid%&op=exams"') do set ln=%%A

How should the web address be laid out? I assume this is not correct?

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

#12 Post by dbenham » 05 May 2012 14:38

Batch cannot access Web content directly. You will need a 3rd party tool to capture the content into a file. I've never done this type of thing, but GNU wget seems to be the most popular.

While you are at it, you can download sed as well. You can much more easily be precise with your file processing using sed than you can with pure native batch commands.

Both wget and sed are available at http://sourceforge.net/projects/gnuwin32/files/

Matt20687
Posts: 54
Joined: 02 May 2012 14:42

#13 Post by Matt20687 » 05 May 2012 15:50

Thanks for that, it makes sense!!

I am downloading it now. One last question (I HOPE!): what would the variable be for the end of the web address below? This is the one that needs the sps_id at the end.

start chrome.exe "http://Website?http://ims/booking.php?patient_id=%prefix%-%uid%&old_patient_id=%prefix%-%uid%&op=editexam&patient_id=%prefix%-%uid%&sps_id=%sps_id%"

Matt20687
Posts: 54
Joined: 02 May 2012 14:42

#14 Post by Matt20687 » 05 May 2012 16:59

Thanks for this info. I have downloaded sed and created the following code, which does pull the HTML out, but I cannot get it to export to a text file. Is there an easy way of doing this?

sed -n '/<body>/,/<\/body>/p' "http://WEBSITE=%prefix%-%uid%&old_patient_id=%prefix%-%uid%&op=exams" | sed -e '1s/.*<body>/<body>/' -e '$s/<\/body>.*/<\/body>/'

I am really sorry for all of the questions; I am fairly new to this, and your help is much appreciated!

Thanks again,
Matt

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

#15 Post by foxidrive » 05 May 2012 17:41

Did you even try the SED example I provided?

I did go to the effort of testing it...

EDIT: Oh, you think SED will use a web URL? No such luck.

You will have to download the page using WGET, if you can, and then pass it to SED.
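The two-step idea (save the page to a file first, then search the file rather than the URL) can be sketched in Python with the standard library's `urllib` standing in for wget. This is an illustration of the flow only, not the thread's wget/sed method; the function names are hypothetical, and it assumes the page can be fetched with a plain request.

```python
import re
import urllib.request

def download(url, dest):
    """Save the page at url to the file dest (the job wget does in this thread)."""
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        out.write(resp.read())

def sps_ids_in_file(path, booking_number):
    """Search the saved file, not the URL, for the 7-digit sps_id."""
    pattern = re.compile(r"sps_id=([0-9]{7})" + re.escape("');\"><td>" + booking_number))
    found = []
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            m = pattern.search(line)
            if m:
                found.append(m.group(1))
    return found

# Usage, with a hypothetical URL and file name:
# download(PAGE_URL, "page.html")
# print(sps_ids_in_file("page.html", "123abc"))
```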
