Extract information from a website to txt

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

Tami
Posts: 10
Joined: 31 Mar 2017 11:01

Extract information from a website to txt

#1 Post by Tami » 03 Apr 2017 08:46

Hello,

I have a big request, and I'm not good at batch scripting at all...
So, here's my problem:
The cmd window should start and ask for a link (Set /P variable=Paste Link:)
I want to extract the title, the episodes, etc., basically everything, from the link. (The site would be https://www.anisearch.de/anime/9357,tokyo-ghoul )
How do I do that? I know that it's possible with Python, but I don't understand Python at all either...

Thanks!

ShadowThief
Expert
Posts: 1163
Joined: 06 Sep 2013 21:28
Location: Virginia, United States

Re: Extract information from a website to txt

#2 Post by ShadowThief » 03 Apr 2017 11:07

Is there an API you can call instead? I had to do something like this once with TheTVDB, and calling the API was easier than grabbing the entire page with curl and parsing it.

aGerman
Expert
Posts: 4654
Joined: 22 Jan 2010 18:01
Location: Germany

Re: Extract information from a website to txt

#3 Post by aGerman » 03 Apr 2017 11:13

The website doesn't seem to have an API. To avoid installing any kind of 3rd party utility, I'd suggest automating Internet Explorer using VBScript or JScript.
http://stackoverflow.com/questions/16629228/extract-text-between-html-tags

Steffen

Tami
Posts: 10
Joined: 31 Mar 2017 11:01

Re: Extract information from a website to txt

#4 Post by Tami » 03 Apr 2017 13:16

Could you tell me how to do that or do you have a tutorial anywhere? :/

aGerman
Expert
Posts: 4654
Joined: 22 Jan 2010 18:01
Location: Germany

Re: Extract information from a website to txt

#5 Post by aGerman » 03 Apr 2017 14:53

That's quite a lot you have to learn. If you want to use it in Batch code, you have to learn Batch and how to write hybrid scripts. For the hybrid scripts you may use JScript, and thus you have to learn JScript. If you want to automate Internet Explorer, you have to learn how to create an InternetExplorer object and how to use its properties and methods. Last but not least, if you want to access elements of an HTML source text, you have to learn how to work with the HTML document object model using JScript.
That said, you can imagine that there isn't a single tutorial that teaches you everything at once.

You want to see an example? Here you are:

Code:

@if (@a)==(@b) @end /* Batch part:

@echo off &setlocal
for /f "delims=" %%i in ('cscript //nologo //e:jscript "%~fs0"') do set "name=%%i"
echo %name%
pause
exit /b


JScript Part : */

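var ie = null;
try {
  // Start an Internet Explorer instance and load the page.
  ie = new ActiveXObject('InternetExplorer.Application');
  ie.Navigate('https://www.anisearch.de/anime/9357,tokyo-ghoul');
  // Wait until the page has finished loading (ReadyState 4 = complete).
  while (ie.Busy || ie.ReadyState != 4) { WScript.Sleep(100); }
  // Walk the DOM down to the span that holds the title.
  var name = ie.document.getElementById('content').getElementsByTagName('header')[0].getElementsByTagName('div')[0].getElementsByTagName('h1')[0].getElementsByTagName('a')[1].getElementsByTagName('span')[0].innerText;
  ie.Quit();
  ie = null;
  // The Batch part captures this line with its for /f loop.
  WScript.Echo(name);
}
catch(e) {
  // Close the browser if it is still open, then report the failure.
  if (ie != null) { ie.Quit(); }
  WScript.Echo('Error!');
}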
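
The example above hardcodes the link. To use a pasted link instead, the Batch part could presumably prompt with set /p "link=Paste Link: " and pass "%link%" as an extra argument after "%~fs0" on the cscript line; the JScript part can then read it from WScript.Arguments. A minimal sketch of that JScript side follows. normalizeLink is a hypothetical helper name, and accepting only http/https links is an assumption, not something the thread specifies.

```javascript
// Hypothetical helper: tidy up a pasted link and accept it only if it
// looks like an http(s) URL; returns null otherwise.
function normalizeLink(raw) {
  // Strip whitespace and stray double quotes from both ends.
  var s = String(raw).replace(/^[\s"]+|[\s"]+$/g, '');
  return /^https?:\/\//i.test(s) ? s : null;
}

// Only runs under Windows Script Host (cscript); skipped elsewhere.
if (typeof WScript !== 'undefined' && WScript.Arguments.length > 0) {
  var url = normalizeLink(WScript.Arguments(0));
  if (url === null) {
    WScript.Echo('Error: not a valid link');
  } else {
    // This value would go to ie.Navigate(url) instead of the fixed address.
    WScript.Echo(url);
  }
}
```

Keeping the check in a small function means the rest of the script only ever sees either a cleaned-up link or null.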


Steffen

igor_andreev
Posts: 16
Joined: 25 Feb 2017 12:55
Location: Russia

Re: Extract information from a website to txt

#6 Post by igor_andreev » 03 Apr 2017 19:25

Trivial job for external tools.
curl(wget) host | grep regex | sed regex
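
For anyone without grep and sed installed, the same filter-then-rewrite idea can be sketched in JScript-compatible JavaScript. grepSed is a hypothetical helper name, and the sample markup is invented for illustration; it is not taken from the anisearch page.

```javascript
// Hypothetical helper mimicking "grep pattern | sed s/pattern/replacement/":
// keeps the lines that match keepRe, then rewrites each kept line.
// Pass regexes without the "g" flag so test() carries no state between lines.
function grepSed(text, keepRe, subRe, replacement) {
  var lines = text.split(/\r?\n/);
  var out = [];
  for (var i = 0; i < lines.length; i++) {
    if (keepRe.test(lines[i])) {
      out.push(lines[i].replace(subRe, replacement));
    }
  }
  return out;
}

// Invented sample markup, for illustration only.
var sample = '<h1>Tokyo Ghoul</h1>\n<p>Episodes: 12</p>\n<p>Other</p>';
var result = grepSed(sample, /Episodes/, /^.*Episodes:\s*(\d+).*$/, '$1');

// Only runs under Windows Script Host (cscript); skipped elsewhere.
if (typeof WScript !== 'undefined') {
  WScript.Echo(result.join('\n'));
}
```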

PaperTronics
Posts: 118
Joined: 02 Apr 2017 06:11

Re: Extract information from a website to txt

#7 Post by PaperTronics » 03 Apr 2017 21:47

The simple solution is: when the user gives that link, you add "view-source:" to the beginning of the link and then download that file using download.exe, which can be found here: http://www.f2ko.de/en/cmd.php
After downloading the HTML code of the website, you can extract the episodes, titles, etc., whatever you want, from that code.
If you don't have HTML knowledge, tell me and I'll extract the titles etc. for you.

Tami
Posts: 10
Joined: 31 Mar 2017 11:01

Re: Extract information from a website to txt

#8 Post by Tami » 04 Apr 2017 05:49

I asked them for an API and they will help me^^ If I get the API, would you help me make the tool? :)

npocmaka_
Posts: 512
Joined: 24 Jun 2013 17:10
Location: Bulgaria
Contact:

Re: Extract information from a website to txt

#9 Post by npocmaka_ » 04 Apr 2017 08:33

You can try with winhttpjs.bat:



Code:

call winhttpjs.bat "https://www.anisearch.de/anime/9357,tokyo-ghoul" -saveto tokyo-ghoul.txt


It will not run the page's JavaScript, though. Also, the downloaded file probably won't be well-formed XML, so you won't be able to process it with an XML tool.
aGerman's option is probably the best you have.

You can also take a look at PhantomJS if the pages are not compatible with IE.
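
Since the saved page is HTML rather than well-formed XML, one way to get at a value is a plain regular expression over the text. A minimal sketch, assuming the page was saved as tokyo-ghoul.txt by the call above and that the wanted text sits in the page's <title> tag (the actual anisearch markup was not checked). extractTitle is a hypothetical helper name. Note that reading the file this way treats it as ANSI text, so non-ASCII characters in a UTF-8 page may come out garbled.

```javascript
// Hypothetical helper: returns the text of the first <title> element, or null.
function extractTitle(html) {
  // [\s\S] matches across line breaks; the "i" flag ignores tag case.
  var m = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  if (!m) { return null; }
  // Collapse runs of whitespace, then trim a leading or trailing space.
  return m[1].replace(/\s+/g, ' ').replace(/^ | $/g, '');
}

// Only runs under Windows Script Host (cscript); skipped elsewhere.
if (typeof WScript !== 'undefined') {
  var fso = new ActiveXObject('Scripting.FileSystemObject');
  var file = fso.OpenTextFile('tokyo-ghoul.txt', 1); // 1 = ForReading
  var html = file.ReadAll();
  file.Close();
  WScript.Echo(extractTitle(html));
}
```

A regex like this is fragile if the site changes its markup, which is one more reason the DOM approach or an API is likely the sturdier choice.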

ShadowThief
Expert
Posts: 1163
Joined: 06 Sep 2013 21:28
Location: Virginia, United States

Re: Extract information from a website to txt

#10 Post by ShadowThief » 04 Apr 2017 09:06

Tami wrote: I asked them for an API and they will help me^^ If I get the API, would you help me make the tool? :)

Honestly, the API will make it so easy that you'll likely be able to figure it out yourself, but sure.
