using for loop and delims to extract a substring

Posted: 25 Apr 2019 16:24
by zedude
Hey guys,

I know this question has been asked several times, but even after digging through different posts I'm not able to apply the answers to my specific case (and I'm certainly not skilled enough with DOS scripting, as a matter of fact :D )

So, here we go

Let's say I have a variable called $html which contains this:
<source src="http://videos.dummysite.com/videos/video01.mp4" type="video/mp4"/>

Now, what I would like to do is extract just the URL between the first two quotes, so that in the end I retrieve just this:
http://videos.dummysite.com/videos/video01.mp4

(Note that it could be a video format other than mp4, which is why I'm talking about extracting the address between the first two quotes, not the quotes after the "type" attribute.)

Does that make any sense to you guys?
How can I do that?

Re: using for loop and delims to extract a substring

Posted: 26 Apr 2019 08:45
by aGerman
Give that a go:

Code: Select all

@echo off &setlocal
set "$html=<source src="http://videos.dummysite.com/videos/video01.mp4" type="video/mp4"/>"
for /f tokens^=2^ delims^=^" %%i in ("%$html%") do echo %%i
pause
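
A short note on why this works, plus a variant that keeps the result in a variable. The option string of for /f is left unquoted so that the double quote itself can be the delimiter; in exchange, every equals sign, space and quote inside it has to be escaped with a caret. This is only a sketch: the variable name url is made up for the example, and it assumes the first double quote in the string is the one that opens the src value.

```batch
@echo off
setlocal
set "$html=<source src="http://videos.dummysite.com/videos/video01.mp4" type="video/mp4"/>"

rem delims is the double quote, so token 1 is the text before the first quote
rem and token 2 is the text between the first and the second quote.
set "url="
for /f tokens^=2^ delims^=^" %%i in ("%$html%") do set "url=%%i"

rem url stays undefined if the string contained no quoted value at all.
if defined url echo %url%
```

Because the split is on quotes rather than on spaces, a value that contains spaces should still come back in one piece.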
Steffen

Re: using for loop and delims to extract a substring

Posted: 26 Apr 2019 09:43
by Aacini

Code: Select all

@echo off
setlocal

set "$html=<source src="http://videos.dummysite.com/videos/video01.mp4" type="video/mp4"/>"

set "x=%$html: =" & set "%"

echo %src:~1,-1%
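
To make the trick easier to follow, here is the same thing with the substitution written out by hand. Each space in $html is replaced by `" & set "`, so the single line turns into three set commands, and the attribute name src ends up as a variable name. This expansion is written for the sample string only; it is an illustration, not something to use for other input.

```batch
@echo off
setlocal

rem What the line with the space replacement looks like after cmd has expanded it:
set "x=<source" & set "src="http://videos.dummysite.com/videos/video01.mp4"" & set "type="video/mp4"/>"

rem src now holds the URL together with its quotes; ~1,-1 drops the first
rem and the last character, which are exactly those two quotes.
echo %src:~1,-1%
```

Note that x and type get defined as a side effect, and that a URL containing a space or an ampersand would be split in the wrong place.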
Antonio

Re: using for loop and delims to extract a substring

Posted: 26 Apr 2019 23:27
by penpen

Code: Select all

@echo off
setlocal
set "$html=<source src="http://videos.dummysite.com/videos/video01.mp4" type="video/mp4"/>"

set %$html: type=<%nul <nul
echo(%src%
penpen

Re: using for loop and delims to extract a substring

Posted: 30 Apr 2019 06:28
by zedude
Thanks a lot for all those replies.

I realize that just posting a question without giving the full picture doesn't allow me to fully use your answers.
I apologize for that.
So, here are the details.

My full script has the following purpose: browse through an ASCII file containing a list of HTML pages, download each page, search it for the unique pattern "source src=" (which gives me the URL of a video), and then download the video itself.

For the moment, my script looks like this:

Code: Select all

set /a PAGE=80
set /a NBVID=1
setlocal ENABLEDELAYEDEXPANSION

rem Pages.txt holds one page URL per line
for /F "tokens=*" %%a in (Pages.txt) do (
	curl -o !PAGE!-!NBVID!.html %%a --cacert cacert.pem -b cookies.txt --silent

	rem token 3 is the quoted src value when splitting on equals signs and spaces
	for /f "tokens=3 delims== " %%b in ('findstr /c:"source src=" !PAGE!-!NBVID!.html') do (
		curl -o !PAGE!-!NBVID!.mp4 %%~b --cacert cacert.pem -b cookies.txt --silent
	)
)
As you can see, the downloaded page is saved locally following the naming convention Page-Nbvid.html (and so is the final video).

My actual problem, now that extracting the name is fixed (which is what I posted this question for), is that if there are one or more spaces in the name of the video, the tokens and delims parameters can't retrieve the full name: it stops after the first space.

Any idea how to mitigate this?

Re: using for loop and delims to extract a substring

Posted: 02 May 2019 01:26
by zedude
Anyone?

Re: using for loop and delims to extract a substring

Posted: 02 May 2019 16:52
by penpen
There are multiple issues here. For example, lines in HTML files are not guaranteed to stay under 8190 characters, and in most cases they are UTF-8 encoded, ... .
You might be better off using text-extraction software, for example Dave Benham's "JREPL.BAT":
viewtopic.php?f=3&t=6044
(An example is given on the linked page; just search for "extracting between html tags".)
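
If the pages do stay short and plain ASCII, one possible way around the space problem in pure batch is to split on the double quote instead of on equals signs and spaces. The following is a minimal sketch under several assumptions: sample.html is a hypothetical stand-in for the downloaded page, the variable name url is made up, the first double quote on the matching line is assumed to open the src value, and replacing spaces with %20 is an assumption about what the server and curl expect.

```batch
@echo off
setlocal EnableDelayedExpansion

rem Hypothetical stand-in for a downloaded page, one source tag on one line.
>"sample.html" echo ^<source src="http://videos.dummysite.com/videos/my video 01.mp4" type="video/mp4"/^>

rem With the double quote as the only delimiter, token 2 is the whole text
rem between the first pair of quotes on the matching line, spaces included.
rem Assumption: spaces should reach the server percent-encoded, hence the replacement.
for /f tokens^=2^ delims^=^" %%b in ('findstr /c:"source src=" "sample.html"') do (
    set "url=%%b"
    set "url=!url: =%%20!"
    echo !url!
)

del "sample.html"
```

In the real loop the echo would be replaced by the curl call, with the address passed as "!url!" so that it stays a single argument. One known limitation: with delayed expansion enabled, an exclamation mark inside the URL would be lost.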


penpen