Searching html? Can a batch file do this?

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

Matt20687
Posts: 54
Joined: 02 May 2012 14:42

Re: Searching html? Can a batch file do this?

#16 Post by Matt20687 » 06 May 2012 14:02

foxidrive wrote:Did you even try the SED example I provided?

I did go to the effort of testing it...

EDIT: Oh, you think SED will use a web URL? No such luck.

You will have to download the page using WGET, if you can, and then pass it to SED.


foxidrive, I am having a problem with SED, unfortunately. I have downloaded it, but it still says sed is not recognized as a command!

I am just in the process of downloading WGET.

Squashman
Expert
Posts: 4488
Joined: 23 Dec 2011 13:59

Re: Searching html? Can a batch file do this?

#17 Post by Squashman » 06 May 2012 14:43

Matt20687 wrote:
foxidrive, i am having a problem with sed unfortunately. I have downloaded it but it is still saying sed is an unrecognizable command!

SED has to be in the current folder (normally the folder your batch file runs from) or in a folder listed in your PATH variable.
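A rough Python model of that lookup can help when sed is reported as not recognized. It assumes cmd.exe checks the current folder first, then each folder listed in PATH, and tries each PATHEXT extension when the command is typed without one. The function name and the trimmed PATHEXT default are illustrative assumptions, not cmd's actual implementation.

```python
import os


def find_command(name, cwd, path_var, pathext=".COM;.EXE;.BAT;.CMD"):
    # Rough model (assumption) of how cmd.exe resolves a bare command like "sed":
    # search the current folder first, then every folder listed in PATH.
    folders = [cwd] + [p for p in path_var.split(";") if p]
    # With no extension typed, try each PATHEXT extension in order
    # (the default here is a trimmed subset of the real Windows value).
    if os.path.splitext(name)[1]:
        candidates = [name]
    else:
        candidates = [name + ext for ext in pathext.split(";") if ext]
    for folder in folders:
        for candidate in candidates:
            full_path = os.path.join(folder, candidate)
            if os.path.isfile(full_path):
                return full_path
    # Nothing found: this is the case where cmd says "'sed' is not recognized ...".
    return None
```

Copying sed.exe into a folder such as c:\util only helps once that folder appears in the PATH string passed as path_var.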

Matt20687
Posts: 54
Joined: 02 May 2012 14:42

Re: Searching html? Can a batch file do this?

#18 Post by Matt20687 » 06 May 2012 14:57

Squashman wrote:
SED has to exist in the same folder as your batch file or it has to exist within your PATH variable.

Ah, I did not know that. What is the easiest option, in your opinion?

Squashman
Expert
Posts: 4488
Joined: 23 Dec 2011 13:59

Re: Searching html? Can a batch file do this?

#19 Post by Squashman » 06 May 2012 15:48

Matt20687 wrote:Ah, I did not know that. What is the easiest option in your opinion

I chose to put all my 3rd-party utilities into one specific folder and make sure that folder is listed in my PATH environment variable.

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

Re: Searching html? Can a batch file do this?

#20 Post by foxidrive » 06 May 2012 16:18

Squashman wrote:
I chose to put all my 3rd party utilities into a specific folder and make sure that folder path exists in my PATH environmental variable.


I do too.

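You can use something like c:\util, and in Windows Control Panel > System (Advanced system settings > Environment Variables; the exact route varies between Windows versions) you can edit the master environment and add ;c:\util onto the end of the PATH variable. The leading semicolon is necessary when it goes on the end, because it separates c:\util from the previous entry.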

The alternative is to put all your utilities in c:\windows, which should already be on the PATH, but be careful not to overwrite any files already in there.

Type this at a cmd prompt to see the PATH variable:

SET path

Matt20687
Posts: 54
Joined: 02 May 2012 14:42

Re: Searching html? Can a batch file do this?

#21 Post by Matt20687 » 08 May 2012 03:49

Hello,

I have installed SED and WGET and got them working, but I have hit a brick wall AGAIN! I would like to strip the HTML into a .txt file, which I have done, but I cannot get it to fetch/strip each web page and append it to the .txt file without overwriting it.


for /f "delims=" %%G in (id.txt) do (
wget --http-user=%IMSUSER% --http-password=%IMSPASS% --tries=10 -O C:\Users\MatthewM.MEDICAGROUP\Desktop\UnreportorAssignBatch\ims.txt "http://WEBSITE=%prefix%-%%G&old_patient_id=%prefix%-%%G&op=exams"
)

I know -O will overwrite the file. Is there any way I can use --output-document=ims.txt or something, or can I literally add >> "ims.txt"?

Thanks,
Matt

Matt20687
Posts: 54
Joined: 02 May 2012 14:42

Re: Searching html? Can a batch file do this?

#22 Post by Matt20687 » 08 May 2012 03:55

Ignore my last post; I have managed to get it to work with:

for /f "delims=" %%G in (id.txt) do (
wget --http-user=%IMSUSER% --http-password=%IMSPASS% --tries=10 -O - >> C:\Users\MatthewM.MEDICAGROUP\Desktop\UnreportorAssignBatch\%ims% "http://WEBSITE=%prefix%-%%G&old_patient_id=%prefix%-%%G&op=exams"
)
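For reference, wget's -O is just the short form of --output-document, so switching to the long spelling would not change the overwrite behaviour. The working version uses -O - to send the page to standard output, and cmd's >> then appends it. The Python sketch below shows the same append-instead-of-overwrite idea. build_url and fetch are hypothetical stand-ins for the real URL (elided above as WEBSITE) and for the authenticated download.

```python
def read_ids(path):
    # Roughly what for /f "delims=" %%G in (id.txt) reads:
    # one whole line per ID, blank lines skipped.
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f if line.strip()]


def append_pages(ids, out_path, build_url, fetch):
    # build_url(id) -> URL and fetch(url) -> page text are hypothetical
    # callables supplied by the caller (e.g. wrapping an HTTP library).
    # Mode "a" appends, like cmd's >>; mode "w" would overwrite, like wget -O file.
    with open(out_path, "a", encoding="utf-8") as out:
        for patient_id in ids:
            out.write(fetch(build_url(patient_id)))
```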

Matt20687
Posts: 54
Joined: 02 May 2012 14:42

Re: Searching html? Can a batch file do this?

#23 Post by Matt20687 » 08 May 2012 06:59

I am having a bit of an issue getting your code to work, foxi. It works well when I input one ACC, as below. (FYI, ims.txt holds all of the HTML that I have extracted.)

set ims=ims.txt

for /f %%a in ('sed -n "s/.*\([0-9][0-9][0-9][0-9][0-9][0-9][0-9]\).*ABC123*/\1/p" %ims%') do echo %%a
pause

When I attempt to do it for multiple ACCs, it returns nothing at all. See below:

set acc=acc.txt
set ims=ims.txt

for /f %%a in ('sed -n "s/.*\([0-9][0-9][0-9][0-9][0-9][0-9][0-9]\).*%acc%*/\1/p" %ims%') do echo %%a
pause

Is there anything you can suggest to make this work?

Thanks,
Matt
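A likely reason the second version finds nothing is that %acc% is replaced by the variable's value, which is the literal text acc.txt. So sed searches ims.txt for the pattern acc.txt*, which almost certainly never occurs in the HTML, rather than for the IDs stored inside acc.txt. (In sed, a trailing * repeats only the character before it, so ABC123* means "ABC12 followed by zero or more 3s"; .* is the "anything" wildcard.) One way around it is an outer for /f "delims=" %%b in (%acc%) do ... loop that runs sed once per ID with %%b in the pattern. The Python sketch below shows that one-pattern-per-ID idea; the function name and the 7-digit assumption are illustrative.

```python
import re


def find_numbers(acc_lines, ims_lines):
    # For each accession ID (one per line, as in acc.txt), collect the
    # 7-digit number that appears before that ID on each line of ims.txt.
    # Same idea as running the sed command once per ID instead of passing
    # the file name acc.txt as if it were the pattern.
    results = {}
    for acc in (line.strip() for line in acc_lines):
        if not acc:
            continue
        # re.escape treats the ID as literal text; the leading .* is greedy,
        # as in sed, so the last 7-digit run before the ID wins.
        pattern = re.compile(r".*([0-9]{7}).*" + re.escape(acc))
        results[acc] = [m.group(1) for line in ims_lines if (m := pattern.match(line))]
    return results
```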

Matt20687
Posts: 54
Joined: 02 May 2012 14:42

Re: Searching html? Can a batch file do this?

#24 Post by Matt20687 » 08 May 2012 08:07

Right! I have it searching all of the ACC numbers now. The issue I have is that in a couple of cases a 7-digit number occurs just before the number I actually want to retrieve. Can we add another rule? The number I need always has sps_id= before it and &unreport after it, so is there any way to limit the search to that rather than the whole text file?

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

Re: Searching html? Can a batch file do this?

#25 Post by foxidrive » 09 May 2012 00:44

Matt20687 wrote:The issue i have is there are a couple of examples where a 7 digit number is occurring just before the number i actually want to retrieve. Are we able to add another rule in? The number i need always has sps_id= before and &unreport after, so the question i have is is there anyway we can limit the search from the whole text file?


This should cater for that, with those strings.

sed -n "s/.*sps_id=\([0-9][0-9][0-9][0-9][0-9][0-9][0-9]\)&unreport.*/\1/p"
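Here is the same extraction as a Python regular expression, which can be handy for checking the pattern against a saved copy of ims.txt. The leading .* is greedy, just as in sed, so if one line holds several sps_id=...&unreport pairs, only the last one is reported. That matters when a whole page arrives as a single line.

```python
import re

# Python equivalent (sketch) of:
#   sed -n "s/.*sps_id=\([0-9]...\)&unreport.*/\1/p"
# with the seven [0-9] classes written as [0-9]{7}.
SPS_UNREPORT = re.compile(r".*sps_id=([0-9]{7})&unreport")


def extract_sps(lines):
    # sed -n ... /p prints only lines where the substitution happened,
    # and the replacement \1 leaves just the captured digits.
    return [m.group(1) for line in lines if (m := SPS_UNREPORT.match(line))]
```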

Matt20687
Posts: 54
Joined: 02 May 2012 14:42

Re: Searching html? Can a batch file do this?

#26 Post by Matt20687 » 09 May 2012 04:12

foxidrive wrote:
This should cater for that, with those strings.

sed -n "s/.*sps_id=\([0-9][0-9][0-9][0-9][0-9][0-9][0-9]\)&unreport.*/\1/p"


Doesn't seem to work :(

for /f %%a in ('sed -n "s/.*sps_id=\([0-9][0-9][0-9][0-9][0-9][0-9][0-9]\)&unreport.*/\1/p" %ims%') do echo %%a
pause

It is not returning anything at all, and it is not returning any errors either. Any ideas, foxi?

Thanks,
Matt

Matt20687
Posts: 54
Joined: 02 May 2012 14:42

Re: Searching html? Can a batch file do this?

#27 Post by Matt20687 » 09 May 2012 05:12

Hello,

Ignore this; I realised the information I provided wasn't accurate. I have tweaked it and it now finds the sps number I wanted. The only issue is that I need the sps_id that is on the same line (table row) as the booking number. An example of what you might see in ims.txt is:

</script><h2>Book</h2><b>Procedures for NAME</b><br><b>PREFIX-ID</b><p><table class="infotab" cellpadding="4" cellspacing="1"><tr><th>Procedure ID</th><th>Scheduled Date</th><th>Procedure Code</th><th>Procedure Room</th><th>Modality</th><th>Consultant</th><th>Requester</th><th>Referrer</th><th>Report State</th><th>Images</th></tr><tr class="c1" onClick="ii0('op=editexam&patient_id=PREFIX-ID&sps_id=1234567');"><td>ABC1234567</td><td>Oct 31 2011 3:09PM</td><td>US Abdomen</td><td></td><td>US</td><td></td><td> </td><td></td><td>Do not report</td><td>11</td></tr><tr class="c2" onClick="ii0('op=editexam&patient_id=PREFIX-ID&sps_id=2345678');"><td>ABC2346547</td><td>Jan 6 2012 12:43PM</td><td>MR Derriford</td><td></td><td>MR</td><td></td><td> </td><td></td><td>Do not report</td><td>104</td></tr><tr class="c1" onClick="ii0('op=editexam&patient_id=PREFIX-ID&sps_id=3456789');"><td>BCA1234567

The code i am using is:

set acc=acc.txt
set ims=ims.txt
set "sps=');"

for /f %%a in ('sed -n "s/.*sps_id=\([0-9][0-9][0-9][0-9][0-9][0-9][0-9]\)%sps%.*/\1/p" %ims%') do echo %%a
pause

I need the above code to keep working as it is, but also to look at the numbers I have stored in acc.txt. In there you would have an ID such as

ABC1234567

This ID appears in the first table row of the ims.txt excerpt above. So in this instance I would like the code to do what it already does, looking between sps_id= and %sps% and returning the 7-digit number, but it needs to check that it returns the sps_id number that has ABC1234567 on the same line.

With the code as it is, nothing defines which one it will take; there is a possibility that I could return:

2345678

This has the ID ABC2346547 on the same line.

Does this make sense foxi or anyone?

Thanks,
Matt
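Below is a Python sketch of the pairing Matt describes. It assumes the row layout shown in the excerpt above: sps_id=NNNNNNN'); followed by "><td> and then the accession ID. Instead of a greedy scan of the whole line, it searches for the one spot where the wanted accession ID immediately follows an sps_id, so the number returned always comes from the same table row. The \b after the ID stops ABC123 from matching inside ABC1234567. The function name is hypothetical.

```python
import re


def sps_for_accession(html, accession):
    # Assumes each row looks like the ims.txt excerpt:
    #   ...&sps_id=1234567');"><td>ABC1234567</td>...
    # i.e. the accession ID sits in the first <td> right after the sps_id.
    row_start = r"""sps_id=([0-9]{7})'\);"><td>"""
    # re.escape keeps the ID literal; \b rejects partial IDs such as ABC123.
    pattern = re.compile(row_start + re.escape(accession) + r"\b")
    match = pattern.search(html)
    return match.group(1) if match else None
```

To cover every ID, read ims.txt into one string and call sps_for_accession once per non-blank line of acc.txt.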

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

Re: Searching html? Can a batch file do this?

#28 Post by foxidrive » 09 May 2012 17:56

Matt20687 wrote:
patient_id=PREFIX-ID&sps_id=1234567');"><td>ABC1234567</td>

The code i am using is:

set acc=acc.txt
set ims=ims.txt
set "sps=');"

for /f %%a in ('sed -n "s/.*sps_id=\([0-9][0-9][0-9][0-9][0-9][0-9][0-9]\)%sps%.*Do not report.*/\1/p" %ims%') do echo %%a
pause

With the code as it is there is no definition as to which one it will take, there is a possiblilty that i could return:

2345678

This has the ID ABC2346547 on the same line.



Do you mean you have two instances of the match on the one line and you want the first instance?

Try it with the addition I made in the quoted code above (.*Do not report.*). It relies on the format being exactly as in your example.

Or

for /f %%a in ('sed -n "s/.*sps_id=\([0-9][0-9][0-9][0-9][0-9][0-9][0-9]\)%sps%.*PREFIX-ID.*/\1/p" %ims%') do echo %%a
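Checking both variants in Python against the sample rows from post #27 suggests why they may still pick the wrong row. Every row contains Do not report and PREFIX-ID, so the greedy .* simply settles on the last sps_id that is followed somewhere later by the marker. On the trimmed sample, both variants return 2345678, the number Matt wants to avoid when looking for ABC1234567. This is a sketch; the helper name is hypothetical.

```python
import re


def greedy_sps(html, marker):
    # Mimics foxidrive's variants, e.g. ...sps_id=(7 digits)%sps%.*Do not report.*
    # with %sps% expanded to the literal text ');
    pattern = r".*sps_id=([0-9]{7})'\);.*" + re.escape(marker)
    match = re.match(pattern, html)
    return match.group(1) if match else None
```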

Matt20687
Posts: 54
Joined: 02 May 2012 14:42

Re: Searching html? Can a batch file do this?

#29 Post by Matt20687 » 10 May 2012 07:27

Hello,

There could be anywhere from 1 to 20 instances within one set of HTML. What I need it to do is recognise the number in acc.txt and then give me the 7-digit sps_id that sits between sps_id= and the variable %sps% (set "sps=');").

So if I had ABC1234567 as the number in acc.txt, I would want it to return the sps_id number 1234567. At the moment it is not referring to acc.txt at all and just returns any 7-digit number in ims.txt (the long HTML I provided earlier) that is between sps_id= and the variable %sps% (set "sps=');").

Matt20687
Posts: 54
Joined: 02 May 2012 14:42

Re: Searching html? Can a batch file do this?

#30 Post by Matt20687 » 10 May 2012 09:07

Hello Again,

I have been testing a few things and have found that the sed command is not actually referring to my variable at all! I deleted the line set "sps=');" at the start of my script and it still returned the same values, even though %sps% had nothing to refer to!
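One possible explanation is sketched in Python below. set "sps=');" stores the three characters '); (the outer quotes only protect the value), and in a batch file an undefined %sps% simply expands to nothing. On the sample rows, the pattern with '); and the pattern with nothing there both capture the same number. That is because the greedy .* already lands on the last sps_id on the line, and every sps_id in that HTML is followed by ');. So identical output does not by itself prove the variable is being ignored. The helper name and the trimmed sample are illustrative.

```python
import re


def greedy_capture(html, suffix):
    # Greedy ".*sps_id=(7 digits)SUFFIX" as sed would see it once %sps%
    # has been expanded; suffix "" models an undefined %sps%.
    match = re.match(r".*sps_id=([0-9]{7})" + re.escape(suffix), html)
    return match.group(1) if match else None
```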
