HTML file parsing with batch

miskox · #1 Post by **miskox** » 16 Apr 2012 05:25

I did some testing (with some success) of HTML parsing with a batch program.

The idea was to first dump the HTML file (based on this viewtopic.php?p=14361#p14361 and this viewtopic.php?p=14704#p14704) and then read the dump byte by byte and do whatever we want with it.

I needed the info between

<A

and

</A

To make things a little easier I used gsar.exe (gnuwin32.sourceforge.net/packages/gsar.htm) to replace <A with 0x1 and </A with 0x2. That way I read record by record, and when I reach 0x1 or 0x2 I know what to do without having to check the next one or two records as well.
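For anyone who wants to try the marker idea outside of batch, here is a minimal sketch in Python. It is not the batch program itself. It assumes the literal uppercase strings <A and </A (so it would also hit tags such as <ADDRESS), and the names mark_anchors and collect_marked are made up for illustration; mark_anchors only stands in for the gsar replacement step.

```python
START = 0x01  # single byte standing in for "<A"
END = 0x02    # single byte standing in for "</A"


def mark_anchors(html):
    """Stand-in for the gsar step: swap the two tag prefixes for marker bytes."""
    return html.replace(b"</A", bytes([END])).replace(b"<A", bytes([START]))


def collect_marked(data):
    """Read byte by byte and keep whatever sits between START and END."""
    chunks = []
    current = None  # None while we are outside a marked region
    for byte in data:
        if byte == START:
            current = bytearray()  # a new region begins
        elif byte == END:
            if current is not None:
                chunks.append(bytes(current))
            current = None  # region closed
        elif current is not None:
            current.append(byte)
    return chunks


sample = b'x<A HREF="a.htm">One</A> y <A HREF="b.htm">Two</A>'
for chunk in collect_marked(mark_anchors(sample)):
    print(chunk)
```

Because each marker is a single byte, the loop never has to look ahead, which is the point of the replacement.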

Yes, it is slow, but it works.

Of course, this is just an idea with a limited implementation; maybe somebody will make a science out of it.

Saso

P.S. This post www.dostips.com/forum/viewtopic.php?f=3&t=3210 would help make this batch program run faster.

!k · #2 Post by !k » 16 Apr 2012 08:34

sed -r -n "/<[Aa]\s/{s/.*<[Aa]\s//;s/<\/[Aa]>.*//;p}" index.htm
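Read in plain terms, the sed command above does the following for every line of index.htm that contains <a or <A followed by whitespace: it deletes everything up to and including the last such opening on the line, deletes everything from the first remaining </a> or </A> onward, and prints what is left. The same steps as a rough Python sketch (the function name last_anchor_per_line is hypothetical; like the sed version, it yields at most one anchor per line and misses anchors that span several lines):

```python
import re


def last_anchor_per_line(html_text):
    """Mimic the sed one-liner: attributes plus link text of the last <a ...> on each line."""
    results = []
    for line in html_text.splitlines():
        # /<[Aa]\s/ : only handle lines that contain an opening anchor tag
        if not re.search(r"<[Aa]\s", line):
            continue
        # s/.*<[Aa]\s// : greedy, so it strips through the LAST "<a " on the line
        line = re.sub(r".*<[Aa]\s", "", line, count=1)
        # s/<\/[Aa]>.*// : strip from the first closing tag to the end of the line
        line = re.sub(r"</[Aa]>.*", "", line, count=1)
        results.append(line)  # p : print what remains
    return results


page = '<p><a href="x.htm">First</a> and <A href="y.htm">Second</A></p>\nno link here'
print(last_anchor_per_line(page))
```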

miskox · #3 Post by **miskox** » 16 Apr 2012 09:05

!k wrote:

sed -r -n "/<[Aa]\s/{s/.*<[Aa]\s//;s/<\/[Aa]>.*//;p}" index.htm

I don't have sed installed, so please translate this to English.

My original post lacked some information: I don't need everything that *is* between <A and </A. What I need is in there, but there are other factors to check first, and then only a small piece of information to extract.

Thanks,
Saso

DosTips.com
