Extract data from website (another problem)

Posted: 24 Apr 2018 04:52
by miskox
Hello!

In viewtopic.php?f=3&t=7852#p52292 Hackoo provided a very useful tool to extract links from a website.

I have 107 HTML files like 1200.html (see the attached .zip; the pages are from lenovo.com).

Hackoo's solution gives me this:

Code: Select all

http://support.lenovo.com/en_US/downloads/detail.page?DocID=DS002917 ========> ... Learn more
../../../ibmdl/pub/pc/pccbbs/mobiles/tpafkq98.exe ========>  version 5.12.4028 - Audio driver for Windows 98
../../../ibmdl/pub/pc/pccbbs/mobiles/tpafkq98.txt ========>  Read me
http://support.lenovo.com/en_US/downloads/detail.page?DocID=DS001350 ========> ... Learn more
../../../ibmdl/pub/pc/pccbbs/mobiles/tpaf152k.exe ========>  version 5.12.01.4031 - Audio Features III for Windows 2000
../../../ibmdl/pub/pc/pccbbs/mobiles/tpaf152k.txt ========>  Read me
http://support.lenovo.com/en_US/downloads/detail.page?DocID=DS003130 ========> ... Learn more
../../../ibmdl/pub/pc/pccbbs/mobiles/aftpkw8m.exe ========>  version 5.12.01.4028 - Audio Features III for Windows 98/Me
../../../ibmdl/pub/pc/pccbbs/mobiles/aftpkw8m.txt ========>  Read me
http://support.lenovo.com/en_US/downloads/detail.page?DocID=DS003688 ========> ... Learn more
../../../ibmdl/pub/pc/pccbbs/mobiles/tpafkw2k.exe ========>  Version 5.12.01.4031 - Audio features IV for Windows 2000
../../../ibmdl/pub/pc/pccbbs/mobiles/tpafkw2k.txt ========>  Read me
I need more info:

Each .html page has an option to display only the entries for a selected category. The page has categories such as 'audio', 'bios', 'cd and dvd drive', 'diskette drive'..., and I need the category as well, together with the 'operating system' column. Of course, all of it should come out in a form that lets me match each category and operating system entry to its links.

See attached image.

Hope this makes sense.
Thanks.
Saso

Re: Extract data from website (another problem)

Posted: 25 Apr 2018 14:00
by miskox
I checked the .html file. It looks like I need the following:

- The category is between

Code: Select all

id='table1' name='
and

Code: Select all

'><thead>
- The supported operating systems are between

Code: Select all

<br><br></td><td>
and

Code: Select all

<br></td><td>
If there is more than one, they are separated by

Code: Select all

<br>
However, I don't know how to extract this information.

Anyone?

Saso
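
One possible way to pull out the fields described above, as a minimal Python sketch. It assumes the delimiters quoted in the post appear literally in each file and that every category table starts with the same id='table1' marker. The function name extract_rows is made up for illustration, and Python is used only because the thread does not name a language for this step.

```python
import re

# Delimiters copied from the post above. The real pages may differ in
# whitespace or quoting, so treat both patterns as assumptions.
CATEGORY_RE = re.compile(r"id='table1' name='(.*?)'><thead>", re.DOTALL)
OS_RE = re.compile(r"<br><br></td><td>(.*?)<br></td><td>", re.DOTALL)


def extract_rows(html):
    """Return a list of (category, [operating systems]) tuples.

    The text between one category marker and the next is treated as
    that category's section, so every operating system cell found in
    the section is paired with that category.
    """
    rows = []
    matches = list(CATEGORY_RE.finditer(html))
    for i, match in enumerate(matches):
        # A section ends where the next category marker begins,
        # or at the end of the file for the last category.
        if i + 1 < len(matches):
            end = matches[i + 1].start()
        else:
            end = len(html)
        section = html[match.end():end]
        category = match.group(1).strip()
        for cell in OS_RE.findall(section):
            # Several operating systems in one cell are separated by <br>.
            systems = [s.strip() for s in cell.split("<br>") if s.strip()]
            rows.append((category, systems))
    return rows
```

The rows come back in page order, so if Hackoo's tool also lists the links in page order, the two outputs could probably be lined up entry by entry. To try it on one file, read it into a string first, for example with open('1200.html', encoding='utf-8', errors='replace').read(); the encoding is a guess.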