using for loop and delims to extract a substring

zedude
Posts: 3
Joined: 25 Apr 2019 16:04

#1 Post by zedude » 25 Apr 2019 16:24

Hey guys,

I know this question has been asked several times, but even after digging through different posts I'm not able to apply the answers to my specific case (and I'm certainly not skilled enough with DOS scripting, as a matter of fact :D ).

So, here we go

Let's say I have a variable called $html which contains this:
<source src="http://videos.dummysite.com/videos/video01.mp4" type="video/mp4"/>

Now, what I would like to do is extract just the URL between the first two quotes, so that in the end I retrieve just this:
http://videos.dummysite.com/videos/video01.mp4

(Note that it could be a video format other than mp4, which is why I'm talking about extracting the address between the first two quotes, not the quotes after the "type" attribute.)

Does that make any sense to you guys?
How can I do that?

aGerman
Expert
Posts: 3669
Joined: 22 Jan 2010 18:01
Location: Germany

#2 Post by aGerman » 26 Apr 2019 08:45

Give that a go:

@echo off &setlocal
set "$html=<source src="http://videos.dummysite.com/videos/video01.mp4" type="video/mp4"/>"
for /f tokens^=2^ delims^=^" %%i in ("%$html%") do echo %%i
pause

Steffen
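
A note on why the FOR /F options above are written without surrounding quotes: the double quote itself is the delimiter, so the usual "tokens=... delims=..." string cannot be used, and each space, equals sign and quote is escaped with a caret instead. Below is a minimal sketch under the same assumptions as the post (same sample value), which stores the result in a variable rather than echoing it. The variable name `url` is only an illustration and not part of the original answer.

```batch
@echo off
setlocal

rem Sample value from the question.
set "$html=<source src="http://videos.dummysite.com/videos/video01.mp4" type="video/mp4"/>"

rem Split on the double quote. Token 2 is the text between the first two quotes.
rem url is a hypothetical variable name used for illustration.
set "url="
for /f tokens^=2^ delims^=^" %%i in ("%$html%") do set "url=%%i"

echo %url%
```

For the sample value this should print the bare URL. Because only the double quote is a delimiter, spaces inside the quoted value stay part of token 2.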

Aacini
Expert
Posts: 1611
Joined: 06 Dec 2011 22:15
Location: México City, México

#3 Post by Aacini » 26 Apr 2019 09:43

@echo off
setlocal

set "$html=<source src="http://videos.dummysite.com/videos/video01.mp4" type="video/mp4"/>"

set "x=%$html: =" & set "%"

echo %src:~1,-1%

Antonio
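
For readers puzzling over the `set "x=..."` line: the substitution replaces every space in `$html` with `" & set "`, so the single line turns into three `set` commands, one of which defines `src`. The sketch below is that line written out by hand for the sample value, purely as an illustration of what it amounts to.

```batch
@echo off
setlocal

rem Hand-expanded form of the substitution for the sample value.
set "x=<source" & set "src="http://videos.dummysite.com/videos/video01.mp4"" & set "type="video/mp4"/>"

rem src still carries its surrounding quotes, so drop the first and last character.
echo %src:~1,-1%
```

The trick depends on spaces appearing only between the attributes, so it is presumably not a good fit for values that contain spaces themselves.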

penpen
Expert
Posts: 1712
Joined: 23 Jun 2013 06:15
Location: Germany

#4 Post by penpen » 26 Apr 2019 23:27

@echo off
setlocal
set "$html=<source src="http://videos.dummysite.com/videos/video01.mp4" type="video/mp4"/>"

set %$html: type=<%nul <nul
echo(%src%

penpen

#5 Post by zedude » 30 Apr 2019 06:28

Thanks a lot for all those replies.

I realize that just posting a question without giving the full picture doesn't allow me to fully use your answers.
I apologize for that.
So, here are the details.

My full script has the following purpose: read through an ASCII file containing a list of HTML pages, download each page, search it for the unique pattern "source src=" (which gives me the URL of a video), and then download the video itself.

For the moment, my script looks like this:

set /a PAGE=80
set /a NBVID=1
setlocal ENABLEDELAYEDEXPANSION

for /F "tokens=*" %%a in (Pages.txt) do (
	curl -o !PAGE!-!NBVID!.html %%a --cacert cacert.pem -b cookies.txt --silent

	for /f "tokens=3 delims== " %%b in ('findstr /c:"source src=" !PAGE!-!NBVID!.html') do (
		curl -o !PAGE!-!NBVID!.mp4 %%~b --cacert cacert.pem -b cookies.txt --silent

As you can see, the downloaded page is saved locally following the naming convention Page-Nbvid.html (and so is the final video).

My actual problem, after fixing the extraction of the name (for which I posted this question), is that if there are one or more spaces in the name of the video, the tokens and delims parameters don't retrieve the full name: the result stops after the first space.

Any idea how to mitigate this?

#6 Post by zedude » 02 May 2019 01:26

Anyone ?

#7 Post by penpen » 02 May 2019 16:52

There are multiple issues here. For example, lines in HTML files are not guaranteed to stay within 8190 characters, they are in most cases UTF-8 encoded, and so on.
You might be better off using text-extraction software, for example Dave Benham's "JREPL.BAT":
viewtopic.php?f=3&t=6044
(An example is given on the linked page; just search for "extracting between html tags".)


penpen
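
For pages where the caveats above do not bite (short lines, plain ASCII), the spaces problem from post #5 can probably be handled in pure batch by reusing the quote-delimiter options from post #2 in the inner loop. With `delims== ` the space is itself a delimiter, which is why the name is cut at the first space. The following is a sketch only. It assumes the `src` value is the first quoted string on the matching line, as in the sample, and it keeps the file names and curl options exactly as posted.

```batch
@echo off
setlocal EnableDelayedExpansion

rem PAGE and NBVID follow the naming convention from post #5.
rem Incrementing them is left out here, as in the posted excerpt.
set /a PAGE=80
set /a NBVID=1

for /f "tokens=*" %%a in (Pages.txt) do (
    curl -o "!PAGE!-!NBVID!.html" %%a --cacert cacert.pem -b cookies.txt --silent

    rem Split the matching line on double quotes so that token 2 is the whole src value, spaces included.
    for /f tokens^=2^ delims^=^" %%b in ('findstr /c:"source src=" "!PAGE!-!NBVID!.html"') do (
        curl -o "!PAGE!-!NBVID!.mp4" "%%b" --cacert cacert.pem -b cookies.txt --silent
    )
)
```

Since the quotes are now consumed as delimiters, `%%b` is passed in fresh quotes instead of `%%~b`. Whether curl accepts a URL containing a literal space is a separate question; it may need the space encoded as %20, which this sketch does not attempt.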
