Escaping an ampersand character

pmennen
Posts: 18
Joined: 15 Jul 2011 02:10

#1 Post by pmennen » 31 Dec 2015 21:08

I created a simple batch file that recurses through all subfolders and builds a listing of every video file it finds, including the most important video parameters. These parameters are extracted from the output of ffprobe.exe, which was designed to display such information about video files. A sample output from my batch file is as follows:

Code: Select all

00:27:34.45  1616 kbs  mpeg4  708x480    334 Mb  ---  Alexander the Great.avi
00:03:03.00   264 kbs  wmv3   320x240      6 Mb  ---  Apples
00:57:31.68   605 kbs  h264   640x360    261 Mb  ---  Europe.mp4
00:00:01.00  9213 kbs  msvid  648x520      1 Mb  ---  H21 similarity.avi
00:08:49.67   128 kbs  h264  1000x562      9 Mb  ---  Relational model.mp4
00:04:45.50   304 kbs  flv1   320x240     11 Mb  ---  Robot Chicken.flv

But in fact this sample output demonstrates the one bug I found in my batch file. The file name "Apples" at the end of the 2nd line is really supposed to be "Apples & Oranges.wmv", although the first part of that line correctly specifies the video duration and other parameters. I think the cause of this bug is that sed treats the ampersand as a special character that represents the last matched pattern. To add to my confusion, though, the DOS box also prints this error message:

Code: Select all

'Oranges.wmv' is not recognized as an internal or external command, operable program or batch file.

Since I'm just piping the output to a text file, I don't see how this text gets treated as a command unless DOS's special treatment of the ampersand is also involved. With files that don't contain ampersand characters this problem never happens.

My attempt to fix this bug was to replace the & in the file name with the word "and".
If that had worked (which it didn't), I would then have tried replacing it with \& instead, since that is the way sed escapes this character.

To do that, I pulled the main work out of the loop into a subroutine (Foo) which is called with three arguments (Full file name with path, file name without path, file size)
Then, as you can see, I attempted to replace the & in the file name (2nd argument) using a set command in the fourth line of the Foo routine. However, this string replacement does not work. When I change the string replacement so that it changes some other string to the word "and", then it does work. This indicates that DOS's special use of & is also a problem here. So I tried adding the "setlocal enableDelayedExpansion" command, using exclamation marks instead of percent signs, and adding the quotes, resulting in the code below. I also tried prefixing the & with 1, 2, 3, 4, or 5 ^ characters, which is a way to escape the special meaning of &, but to no avail. In fact I tried many dozens of variations of this set command, but none of them would work. My batch file is shown below.

Code: Select all

for /R %%G IN (*.mp4 *.avi *.mkv *.flv *.wmv *.mpg *.mov) do call :Foo "%%G" "%%~nxG" %%~zG
goto End

:Foo
 set z=%2
 set z=%z:~1,-1%
 setlocal enableDelayedExpansion
 set "z=!z:&= and !"
 set /A sz = (%3 + 500000) / 1000000
 set sz=    %sz%
 set sz=%sz:~-5%
 ffprobe -i %1 2>&1 | sed -n -r^
 "/Duration:/{s/.*tion: ([^,]*).*(.... kb).*/\1  \2sQQ/;h};/Video:/{H;x;s/QQ\n.*eo: (.....)[^,]*,[^,]*.*(....x....).*/  \1 \2 %sz% Mb  ---  %z%/p}" >> vid.txt
 goto :eof
:End

REM - Remove commas
sed -r "s/,(.*)--/ \1--/; s/,(.*)--/ \1--/" vid.txt > vid2.txt

REM - sort alphabetically by file name (which starts in column 47)
sort /+47 vid2.txt > vidtimer.txt

type vidtimer.txt

Thanks in advance for any suggestions you may think of.
~Paul

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

Re: Escaping an ampersand character

#2 Post by foxidrive » 31 Dec 2015 21:18

Without reading all your post, it seems you want to reformat the output from ffprobe.exe

If you provide the raw output from ffprobe for a few files then we may find a more elegant way to do it, but to echo a variable containing poison characters then you can use this:

Code: Select all

for %%a in ("%variable%") do echo %%a

Squashman
Expert
Posts: 4479
Joined: 23 Dec 2011 13:59

Re: Escaping an ampersand character

#3 Post by Squashman » 31 Dec 2015 22:08

You do not need both of these set commands.
set z=%2
set z=%z:~1,-1%

Just change it to this.
set "z=%~2"

You also did not need to pass it as three variables to the function.
You could have just used one variable and used the same modifiers with %1.
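
For illustration, here is a minimal sketch of why the two-step assignment trips over the ampersand and how the one-line form avoids it. After percent expansion, the second set in the original routine becomes set z=Apples & Oranges.wmv, which cmd splits at the & before trying to run the second half as a command; that would account for the error message in the first post. The :Show label and the test file name are assumptions made up for this example.

```bat
@echo off
rem Sketch only: the file name is a made-up test value.
call :Show "Apples & Oranges.wmv"
goto :eof

:Show
rem The tilde strips the quotes from the argument, and the quotes
rem around the whole assignment keep the ampersand literal.
set "z=%~1"
setlocal enabledelayedexpansion
rem Delayed expansion happens after the line is parsed,
rem so the ampersand in z is treated as plain text here.
echo !z!
endlocal
goto :eof
```

Run as a batch file, this should print Apples & Oranges.wmv without the "is not recognized" error.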

pmennen
Posts: 18
Joined: 15 Jul 2011 02:10

Re: Escaping an ampersand character

#4 Post by pmennen » 31 Dec 2015 23:07

foxidrive wrote:Without reading all your post, it seems you want to reformat the output from ffprobe.exe

Hmm ... I'm not sure what that means or your code snippet, but I'll experiment with that and see if I can figure out what it means.

foxidrive wrote:If you provide the raw output from ffprobe for a few files then we may find a more elegant way to do it, but to echo a variable containing poison characters then you can use this:

Ok, I'll do that for all the files that I used to create the sample output shown in my first post. Actually there are lots of lines in the output of ffprobe, but I'll only supply the two lines that I use. The first one contains the string " Duration: " from which I extract the duration and the bit rate. The second line contains the string "Video: " from which I extract the codec type (the next 5 characters) and the horiz x vert pixel resolution (by looking for the "x"). Occasionally there is more than one line containing "Video: " but I pick just the first one which seems to be appropriate.

Code: Select all

------------ Alexander the great.avi -----------------------------------
  Duration: 00:27:34.45, start: 0.000000, bitrate: 1616 kb/s
    Stream #0:0: Video: mpeg4 (Simple Profile) (XVID / 0x44495658), yuv420p, 708x480 [SAR 1:1 DAR 59:40], 1474 kb/s, 29.97 fps, 29.97 tbr, 29.97 tbn, 30k tbc
------------  Apples & Oranges.wmv -------------------------------------
  Duration: 00:03:03.00, start: 0.000000, bitrate: 264 kb/s
    Stream #0:1: Video: wmv3 (Main) (WMV3 / 0x33564D57), yuv420p, 320x240, 223 kb/s, 30 fps, 30 tbr, 1k tbn, 1k tbc
------------ Europe.mp4 --------------------------------------------------
  Duration: 00:57:31.68, start: 0.000000, bitrate: 605 kb/s
    Stream #0:0(und): Video: h264 (Constrained Baseline) (avc1 / 0x31637661), yuv420p, 640x360 [SAR 1:1 DAR 16:9], 501 kb/s, 25 fps, 25 tbr, 1k tbn, 50 tbc (default)
------------- H21 similarity.avi ---------------------------------------------
  Duration: 00:00:01.00, start: 0.000000, bitrate: 9213 kb/s
    Stream #0:0: Video: msvideo1 (CRAM / 0x4D415243), rgb555le, 648x520, 15 fps, 15 tbr, 15 tbn, 15 tbc
------------- Relational model.mp4 ------------------------------------------
  Duration: 00:08:49.67, start: 0.000000, bitrate: 128 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1000x562 [SAR 1:1 DAR 500:281], 88 kb/s, 15 fps, 15 tbr, 15k tbn, 30 tbc (default)
------------- Robot Chicken.flv----------------------------------------------
  Duration: 00:04:45.50, start: 0.000000, bitrate: 304 kb/s
    Stream #0:0: Video: flv1, yuv420p, 320x240, 237 kb/s, 30 fps, 30 tbr, 1k tbn, 1k tbc

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

Re: Escaping an ampersand character

#5 Post by foxidrive » 01 Jan 2016 03:04

BTW, your filesize routine will fail on files over 2GB and show this:
Invalid number. Numbers are limited to 32-bits of precision.

because of a limitation in batch math
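
A sketch of one way around that limit, assuming a truncated size in MB is good enough: set /A works with signed 32-bit integers, so instead of dividing the byte count it can be shortened as a string. The byte count below is an invented value standing in for %%~zG.

```bat
@echo off
rem Made-up byte count of about 3.2 GB, too large for set /A.
set "bytes=3221225472"
rem Dropping the last six digits is a truncating division by 1000000
rem done on the string, so no arithmetic overflow can occur.
set "mb=%bytes:~0,-6%"
rem Files under 1 MB leave an empty string, so fall back to 0.
if not defined mb set "mb=0"
echo %mb% MB
```

With the value above this prints 3221 MB.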

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

Re: Escaping an ampersand character

#6 Post by foxidrive » 01 Jan 2016 04:18

I had posted some code above my last reply, but have removed that if anyone saw it.

I downloaded ffprobe and found there were results from different files that skewed the variables, so I've added a compare for some variations in the ffprobe output - this is tested here and seems to work ok.

Code: Select all

@echo off
set "tempfile=%temp%\ffprobedata.txt"
set "ffprobe=c:\util\ffmpeg\ffprobe.exe"
for /R %%G IN (*.mp4 *.avi *.mkv *.flv *.wmv *.mpg *.mov) do (
   set "filename=%%~nG"
   setlocal enabledelayedexpansion
   "%ffprobe%" -i "%%G" >"%tempfile%" 2>&1
   for /f "tokens=2,6,7 delims=/, " %%a in ('find " Duration: " ^<"%tempfile%"') do (
   rem echo "%%a" "%%b" "%%c"&pause
   for /f "tokens=1-8 delims=," %%d in ('find "  Stream #" ^< "%tempfile%" ^|find " Video: "') do (
   rem echo %%d ~ %%e ~ %%f ~ %%g ~ %%h ~ %%i ~ %%j ~ %%k
   set "res="
   for %%x in ("%%f" "%%g" "%%h") do set "testres=%%~x" & if /i not "!testres:x=!"=="!testres!" set "res=%%~x"
   rem  echo !res!
   for /f %%m in ("!res!") do (
   for /f "tokens=4"  %%o in ("%%d") do (
   set "bitrate=      %%b"
   set "vidcodec=%%o     "
   set "res=        %%m"
   set "filesize=%%~zG"
   set "filesize=         !filesize:~0,-6!"
   echo %%a !bitrate:~-6! %%cs  !vidcodec:~0,5! !res:~-9! !filesize:~-5! MB   %%~xG  ---  !filename!
   ))))
   endlocal
   )
   del "%tempfile%" 2>nul
pause

pmennen
Posts: 18
Joined: 15 Jul 2011 02:10

Re: Escaping an ampersand character

#7 Post by pmennen » 01 Jan 2016 05:16

foxidrive wrote:I downloaded ffprobe and found there were results from different files that skewed the variables, so I've added a compare for some variations in the ffprobe output - this is tested here and seems to work ok.


Wow, foxidrive. That looks like it was a lot of work. Amazing, you didn't even use sed.
I didn't know you could do that much character processing with raw batch commands.
I see you added a column for the file extension, which I agree is a good idea.
I even tested it on files including the dreaded ampersand character and indeed it still worked.

I suspect I will be able to figure out how your batch script works, although it might take me awhile :)
The only change I made so far is to redirect the echo output to a file since I need that.
I did find one problem. I had mentioned that sometimes more than one line appears containing the "Video:" string and in that case the right thing to do was to use the first one. That worked in my script, but with your script it prints two lines as in this example:

Code: Select all

01:55:25.89    766 kbs  h264   1280x534   663 MB   .mp4  ---  The Incredibles 2004 
01:55:25.89    766 kbs  png   1600x2400   663 MB   .mp4  ---  The Incredibles 2004


The output from ffprobe with this particular mp4 file includes the following lines:

Code: Select all

 Duration: 01:55:25.89, start: 0.000000, bitrate: 766 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt709), 1280x534 [SAR 1:1 DAR 640:267], 659 kb/s, 23.98 fps, 23.98 tbr, 90k tbn, 180k tbc (default)
    Stream #0:8: Video: png, rgb24(pc), 1600x2400, 90k tbr, 90k tbn, 90k tbc


It's the last line that causes the problem. I don't know why ffprobe reports this as a video stream since it seems to be a still picture.
The only type other than png that I've seen in a similar way is mjpeg. That also produces an extra unwanted line.
Perhaps the extra lines could be removed by filtering out Video lines containing the strings png or mjpeg although perhaps safer would be to only print out a line for the first video line encountered since I really only want one output line per file. Once I figure out how your script works I think I will be able to fix this (hopefully) last bug.

Many thanks
~Paul

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

Re: Escaping an ampersand character

#8 Post by foxidrive » 01 Jan 2016 06:47

Repost after a number of edits.

Thanks Paul,

The png looks to be an embedded image inside the file - which I assume some players can extract as the media file image/thumbnail for the video. I guess ffprobe just considers it video.

Thanks for posting because ffprobe may be useful for my own media information processing.

I've added some filters to extract the first video stream as you mentioned - it's possible a file may have more than one actual video stream and another one is the primary video stream - but just mentioning that as it's not likely to be something you'd encounter often.


The ampersand itself can be escaped with ^
and delayedexpansion allows me to use it here in a string.

The earlier line I posted also allows & etc to be echoed, but you'd assemble the entire line into a variable and then echo the variable using that method.
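
A small sketch of that second approach, assuming the whole output line has already been assembled into one variable (the variable name and its contents are invented for the example). The value is expanded inside quotes, where & is harmless, and the for variable is substituted only after the line has been parsed. This sketch would not cope with values that contain a double quote, * or ?.

```bat
@echo off
rem Made-up output line containing an ampersand.
set "line=00:03:03.00  264 kbs  ---  Apples & Oranges.wmv"
rem The tilde in the for variable removes the surrounding quotes again.
for %%a in ("%line%") do echo %%~a
```

This should echo the line unchanged, ampersand included.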


I added a line at the start to acquire the filelist - it turned out to be unneeded but I left it there.
The sort is near the bottom - and the output file is recreated every time.


Batch code can process quite a bit - but it's not straightforward :) and it's inherently slow compared to sed etc, and batch code chokes on different characters as it does with & but the data in ffprobe output is safe to process fortunately.



Code: Select all

@echo off
set "ffprobe=c:\Util\ffmpeg\ffprobe.exe"

set "tempfile=%temp%\ffprobedata.txt"
dir /b /s /a-d *.mp4 *.avi *.mkv *.flv *.wmv *.mpg *.mov >"%tempfile%.1"
(
for /f "usebackq delims=" %%G IN ("%tempfile%.1") do (
   echo %%G 1>&2
   set "filename=%%~nG"
   setlocal enabledelayedexpansion
   "%ffprobe%" -i "%%G" >"%tempfile%" 2>&1
   for /f "tokens=2,6,7 delims=/, " %%a in ('find " Duration: " ^<"%tempfile%"') do (
   rem echo "%%a" "%%b" "%%c"&pause
   rem numbers the video stream lines and takes the first video stream
   for /f "tokens=1-8 delims=," %%d in ('find "  Stream #" ^< "%tempfile%" ^|find " Video: " ^|findstr /n "^" ^|findstr "^1:"') do (
   rem echo %%d ~ %%e ~ %%f ~ %%g ~ %%h ~ %%i ~ %%j ~ %%k
   rem checks 3 sequential items in the video stream line to find which one has an x in it
   rem as that is the term to keep with the resolution in it
   set "res="
   for %%y in ("%%f" "%%g" "%%h") do set "testres=%%~y" & if /i not "!testres:x=!"=="!testres!" set "res=%%~y"
   rem  echo !res!
   for /f %%m in ("!res!") do (
   rem extracts the video codec term from the initial portion up to the first comma in the video stream line
   for /f "tokens=5"  %%o in ("%%d") do (
   set "bitrate=      %%b"
   set "vidcodec=%%o     "
   set "res=        %%m"
   set "filesize=%%~zG"
   set "filesize=         !filesize:~0,-6!"
   echo %%a !bitrate:~-6! %%cs  !vidcodec:~0,5! !res:~-9! !filesize:~-5! MB   %%~xG  ---  !filename!
   ))))
   endlocal
   )
) >"%tempfile%.2"
sort /+58 <"%tempfile%.2" >"file-output.txt"
   del "%tempfile%*" 2>nul
pause
goto :eof

pmennen
Posts: 18
Joined: 15 Jul 2011 02:10

Re: Escaping an ampersand character

#9 Post by pmennen » 01 Jan 2016 14:42

foxidrive wrote:Batch code can process quite a bit - but it's not straightforward :) and it's inherently slow compared to sed etc, and batch code chokes on different characters as it does with & but the data in ffprobe output is safe to process fortunately.


It is slightly slower than my old version, but not that noticeable (perhaps 20%)

Here is a modification to add a line at the end to show the total number of files processed as well as the sum of the durations.
Summing the durations is a bit tricky, but perhaps you will think of an easier way.
Also I wanted to watch the output as it is generated, so I used "wtee.exe" for that.
Also I changed the spacing slightly. What do you think?
Watch out, I put ffprobe.exe in a different place than you.

Code: Select all

@echo off
rem Traverses recursively from the current directory while tallying the video durations

set "ffprobe=c:\Util\ffprobe.exe"
set "tempfi1=%temp%\vidtemp1.txt"
set "tempfi2=%temp%\vidtemp2.txt"
set "tempbat=%temp%\tempbat"
del "%tempfi2%" 2>nul

for /R %%G IN (*.mp4 *.avi *.mkv *.flv *.wmv *.mpg *.mov) do (
   rem echo %%G 1>&2
   set "filename=%%~nG"
   setlocal enabledelayedexpansion
   "%ffprobe%" -i "%%G" >"%tempfi1%" 2>&1
   for /f "tokens=2,6,7 delims=/, " %%a in ('find " Duration: " ^<"%tempfi1%"') do (
   rem echo "%%a" "%%b" "%%c"&pause
   rem numbers the video stream lines and takes the first video stream
   for /f "tokens=1-8 delims=," %%d in ('find "  Stream #" ^< "%tempfi1%" ^|find " Video: " ^|findstr /n "^" ^|findstr "^1:"') do (
   rem echo %%d ~ %%e ~ %%f ~ %%g ~ %%h ~ %%i ~ %%j ~ %%k
   rem checks 3 sequential items in the video stream line to find which one has an x in it
   rem as that is the term to keep with the resolution in it
   set "res="
   for %%y in ("%%f" "%%g" "%%h") do set "testres=%%~y" & if /i not "!testres:x=!"=="!testres!" set "res=%%~y"
   rem  echo !res!
   for /f %%m in ("!res!") do (
   rem extracts the video codec term from the initial portion up to the first comma in the video stream line
   for /f "tokens=5"  %%o in ("%%d") do (
   set "bitrate=      %%b"
   set "vidcodec=%%o     "
   set "res=        %%m"
   set "filesize=%%~zG"
   set "filesize=         !filesize:~0,-6!"
   echo %%a !bitrate:~-5! kbs  !vidcodec:~0,5! !res:~-9! !filesize:~-5! MB  %%~xG  ---  !filename! | wtee -a "%tempfi2%"
   ))))
   endlocal
   )

sort /+58 "%tempfi2%" > vidtimer.txt

rem - The fi variable will count the number of files processed
rem - The sec variable will total running time in seconds
rem - We have to handle each digit of the time separately (otherwise we get the leading zero octal problem)
set sec=0 & set fi=0
sed "s_0\(.\):\(.\)\(.\):\(.\)\(.\).*_set /A fi+=1 \& set /A sec+=\1*3600+\2*600+\3*60+\4*10+\5_" vidtimer.txt > "tempbat.bat"

REM - Execute (and delete) the sec batch file to count the files and sum up the durations
call "tempbat"
del "tempbat.bat"

REM - Convert seconds to hours/minutes/seconds
REM - Zeropad minutes and seconds by adding 100 and then removing the first digit

set /A min = sec/60
set /A sec += 100 - 60*min
set /A hour = min/60
set /A min += 100 - 60*hour

REM - Append the running time and file count to the end of the results file
echo ==================== Total run time is %hour%:%min:~1%:%sec:~1%  (%fi% video files) | wtee -a vidtimer.txt

del "%tempfi1%" 2>nul
del "%tempfi2%" 2>nul

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

Re: Escaping an ampersand character

#10 Post by foxidrive » 01 Jan 2016 23:38

pmennen wrote:It is slightly slower than my old version, but not that noticeable (perhaps 20%)

Here is a modification to add a line at the end to show the total number of files processed as well as the sum of the durations.
Also I wanted to watch the output as it is generated, so I used "wtee.exe" for that.
Also I changed the spacing slightly. What do you think?


I've enjoyed playing with this, and learning things along the way, and you clearly have a good grasp of it all...

The file count and duration is a neat extra - I've used Dave Benham's tips on google to make the script count the files as it goes - it's complicated by the way I enable and disable delayed expansion, to make sure that filenames with ! in them are processed too. It adds the seconds-duration as it goes too.
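
The toggling described above can be sketched roughly as follows. It is a simplified illustration, not the script from this post: the two file names are invented, and only a counter is carried across endlocal. The name is assigned while delayed expansion is off, so an exclamation mark in it survives, and the updated counter is handed back to the outer environment through a for variable.

```bat
@echo off
set fi=0
for %%G in ("a!b.txt" "c&d.txt") do (
   rem Assign while delayed expansion is still off.
   set "name=%%~G"
   setlocal enabledelayedexpansion
   echo !name!
   set /A fi+=1
   rem The for variable keeps its value after endlocal discards the local copy.
   for %%A in (!fi!) do endlocal & set "fi=%%A"
)
echo %fi% files
```

This should list both names intact and then report 2 files.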

This modification eliminates the need for a tee filter - and this is just in the spirit of forcing batch to do it all.
If performance was a requirement with millions of files being processed then another way might be better for a great deal of the code. :)

I found an extra method in google to tally the durations, as I got wacky results. The method you used to count the seconds is an eye opener to me - it's elegant and I'd just never considered the 10's factor of the time element. I must have been snoozing that day at skool. ;)

It needs a good test still...
Cheers


Code: Select all

@echo off
rem Traverses recursively from the current directory
rem providing aspects of video file information and tallying the video durations

set "ffprobe=c:\Util\ffprobe.exe"
set "tempfi1=%temp%\vidtemp1.txt"
set "tempfi2=%temp%\vidtemp2.txt"
del "%tempfi2%" 2>nul
set totalsec=0
set fi=0

for /R %%G IN (*.mp4 *.avi *.mkv *.flv *.wmv *.mpg *.mov) do (
   rem echo %%G 1>&2
   set "filename=%%~nG"
   setlocal enabledelayedexpansion
   "%ffprobe%" -i "%%G" >"%tempfi1%" 2>&1
   for /f "tokens=2,6,7 delims=/, " %%a in ('find " Duration: " ^<"%tempfi1%"') do (
   rem echo "%%a" "%%b" "%%c"&pause
   rem numbers the video stream lines and takes the first video stream
   for /f "tokens=1-8 delims=," %%d in ('find "  Stream #" ^< "%tempfi1%" ^|find " Video: " ^|findstr /n "^" ^|findstr "^1:"') do (
   rem echo %%d ~ %%e ~ %%f ~ %%g ~ %%h ~ %%i ~ %%j ~ %%k
   rem checks 3 sequential items in the video stream line to find which one has an x in it
   rem as that is the term to keep with the resolution in it
   set "res="
   for %%y in ("%%f" "%%g" "%%h") do set "testres=%%~y" & if /i not "!testres:x=!"=="!testres!" set "res=%%~y"
   rem  echo !res!
   for /f %%m in ("!res!") do (
   rem extracts the video codec term from the initial portion up to the first comma in the video stream line
   for /f "tokens=5"  %%o in ("%%d") do (
   set "bitrate=      %%b"
   set "vidcodec=%%o     "
   set "res=        %%m"
   set "filesize=%%~zG"
   set "filesize=         !filesize:~0,-6!"
   rem separate the terms in the duration and sum them up
   set "duration=%%a"
   set   "hoursx10=!duration:~0,1!"
   set      "hours=!duration:~1,1!"
   set "minutesx10=!duration:~3,1!"
   set    "minutes=!duration:~4,1!"
   set "secondsx10=!duration:~6,1!"
   set    "seconds=!duration:~7,1!"
   rem The totalsec variable will total the running time in seconds
   set /a totalsec=36000*!hoursx10!+3600*!hours!+600*!minutesx10!+60*!minutes!+10*!secondsx10!+!seconds!+!totalsec!
   rem count the files
   set /a fi+=1
   set output=%%a !bitrate:~-5! kbs  !vidcodec:~0,5! !res:~-9! !filesize:~-5! MB  %%~xG  ---  !filename!
   echo !output!
   echo !output! 1>&2

   ))))
   for /f "tokens=1,* delims=|" %%A in ("!fi!|!totalsec!") do endlocal & set fi=%%A& set totalsec=%%B
   )> "%tempfi2%"

sort /+58 "%tempfi2%" > vidtimer.txt


REM - Convert seconds to hours/minutes/seconds

set /a days=totalsec/86400
set /a hours=(totalsec/3600)-(days*24)
set /a minutes=(totalsec/60)-(days*1440)-(hours*60)
set /a seconds=totalsec %% 60

set days=0%days%&set hours=0%hours%&set minutes=0%minutes%&set seconds=0%seconds%


REM - Append the running time and file count to the end of the results file
set summary===================== Total run time is %days% days %hours:~-2%:%minutes:~-2%:%seconds:~-2%  (%fi% video files)
echo %summary%>>vidtimer.txt
echo %summary%
del "%tempfi1%" 2>nul
del "%tempfi2%" 2>nul
pause
goto :eof

pmennen
Posts: 18
Joined: 15 Jul 2011 02:10

Re: Escaping an ampersand character

#11 Post by pmennen » 02 Jan 2016 06:54

foxidrive wrote: and you clearly have a good grasp of it all...

Perhaps not as good a grasp as you though. My tee filter method using wtee failed miserably when it hit a file name containing an ampersand sign. I hadn't noticed that problem when I posted my last script. So I had to rethink that.
foxidrive wrote:The method you used to count the seconds is an eye opener to me - it's elegant and I'd just never considered the 10's factor of the time element.

I'm not sure what you mean by the 10's factor, but I'm glad you appreciated it. My first try was much simpler but then I scratched my head for awhile when it occasionally failed. It turned out to be a problem where a number containing a leading zero is interpreted as an octal number. That made the final script more complicated.
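
To make the octal problem concrete, here is a sketch using a different, commonly used workaround rather than the digit-by-digit method from the scripts in this thread: prefix each two-digit field with 1 and subtract 100. The time value is an invented example.

```bat
@echo off
rem Made-up duration in hh:mm:ss form.
set "t=00:08:49"
rem Feeding 08 straight to set /A fails because a leading zero means octal.
rem Prefixing 1 gives 108, and 108 - 100 is the intended 8.
set /A "h=1%t:~0,2%-100, m=1%t:~3,2%-100, s=1%t:~6,2%-100"
set /A "sec=h*3600+m*60+s"
echo %sec%
```

For 00:08:49 this prints 529.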

foxidrive wrote:It needs a good test still...

Actually I did find some problems that you may want to consider fixing.

1.) The size parameter is truncated down to the nearest MB instead of rounded
2.) In the latest version, the vidtimer.txt file and the tempfi2 file end up with only a single file in them. I see that you are using the append to file method, so I don't see why it isn't working. It correctly displays all the files on the console. Your previous version didn't have this problem.
3.) I found one video file which didn't report the resolution properly. The abbreviated output from ffprobe is as follows:

Code: Select all

Input #0, mpeg, from 'Secrets and Lies 1996.mpg':
  Duration: 02:21:51.37, start: 0.233367, bitrate: 1662 kb/s
    Stream #0:0[0x1e0]: Video: mpeg2video (Main), yuv420p(tv), 720x480 [SAR 8:9 DAR 4:3], max. 2600 kb/s, 29.97 fps, 29.97 tbr, 90k tbn, 59.94 tbc

The output to the console from either of your latest two versions is as follows:

Code: Select all

02:21:51.37  1662 kbs  mpeg2      max.  1769 MB  .mpg  ---  Secrets and Lies 1996

As you can see, for some reason it is not picking up the resolution properly in this instance.
(If you want the video file, I could send you a link to it.)

I attempted to fix this last problem in your script, but try as I might I couldn't figure out how your resolution extraction code works. Funny, since I could figure out how virtually every other part of your script works. So in an attempt to fix it I went back to some of my old sed processing ways. Perhaps you won't appreciate that method, but that was the only thing I could get working.

I also kept my sed method (and cleaned up the code a bit) for computing the duration total mostly because it is so much shorter than your method. I was concerned about the execution time, although in the end your code ran only 25% slower on a test case of 942 video files, so perhaps execution time is not the overriding concern. Still, my script is faster, shorter, and it doesn't have any of the bugs I mentioned, so in case you want to go with my script, I have included it below:

Code: Select all

@echo off
rem Traverses recursively from the current directory while tallying the video durations

set "ffprobe=c:\Util\ffprobe.exe"
set "tempA=%temp%\vidtemp1.txt"
set "tempB=%temp%\vidtemp2.txt"
set "tempbat=%temp%\tempbat"
type NUL > "%tempB%"

for /R %%G IN (*.mp4 *.avi *.mkv *.flv *.wmv *.mpg *.mov) do (
   rem echo %%G 1>&2
   set "filename=%%~nG"
   setlocal enabledelayedexpansion
   "%ffprobe%" -i "%%G" >"%tempA%" 2>&1
   for /f "tokens=2,6 delims=, " %%a in ('sed -n "/ Duration:/p" "%tempA%"') do (
   rem echo "%%a" "%%b"
   for /f "tokens=1,2 delims=," %%d in ('sed -n -r "0,/Video:/{s_.*Video: (.....).*(..[0-9].x....).*_\1,\2_p}" "%tempA%"') do (
   rem echo ,%%d,%%e,
   set "codec=%%d  " & set "res=%%e    " & set "bitrate=      %%b"
   set sz=%%~zG
   set sz=!sz:~0,-5! & set /A sz = !sz! + 5 & set "sz=    !sz:~0,-1!"
   set "ln=%%a !bitrate:~-5! kbs  !codec:~0,5!  !res:~0,9! !sz:~-5! MB  %%~xG  ---  !filename!"
   echo !ln!  &  echo !ln! >> "%tempB%"
   ))
   endlocal
   )

sort /+58 "%tempB%" > vidtimer.txt

rem - The fi variable will count the number of files processed
rem - The sec variable will total running time in seconds
rem - We have to handle each digit of the time separately (otherwise we get the leading zero octal problem)
set sec=0 & set fi=0
sed -r "s_0(.):(.)(.):(.)(.).*_set /A fi+=1 \& set /A sec+=\1*3600+\2*600+\3*60+\4*10+\5_" vidtimer.txt > "%tempbat%.bat"

rem - Execute the batch file to count the files and sum up the durations
call "%tempbat%"

rem - Convert seconds to hours/minutes/seconds
rem - Zeropad minutes and seconds by adding 100 and then removing the first digit
set /A min = sec/60  &  set /A sec += 100 - 60*min
set /A hour = min/60 &  set /A min += 100 - 60*hour

rem - Append the running time and file count to the end of the results file
set "ln===================== Total run time is %hour%:%min:~1%:%sec:~1%  (%fi% video files)"
echo %ln%  &  echo %ln% >> vidtimer.txt

type NUL > "%tempA%"
type NUL > "%tempB%"
type NUL > "%tempbat%.bat"

This script worked flawlessly on my 942 file test case, but if you run into a video file that causes it to fail, I would certainly appreciate hearing about it.

Many thanks for the collaboration, as I don't think I would have surmounted the ampersand problem on my own.
~Paul

pmennen
Posts: 18
Joined: 15 Jul 2011 02:10

Re: Escaping an ampersand character

#12 Post by pmennen » 02 Jan 2016 21:21

In case there is anyone left following this thread, I made one more useful modification to the batch file.
I found that most often it was useful to have the path name included in front of the filename, so I added that.
However then I found that the part of the path that was common to all the lines was useless clutter so I added a short loop to the script to remove that clutter. This ridiculously inefficient loop uses several sed commands to remove one column at a time from the text file. However as a testament to how fast sed is, after scanning 8500 video files (which took about 20 minutes) it only took another few hundred milliseconds for sed to remove a dozen characters from each line. I didn't see any obvious errors in the list of 8500 video files and property lists suggesting that the script is reasonably robust. The batch file follows:

Code: Select all

@echo off
rem Traverses recursively from the current directory while tallying the video durations

set "ffprobe=c:\Util\ffprobe.exe"
set "tempA=%temp%\vidtemp1.txt"
set "tempB=%temp%\vidtemp2.txt"
set "tempbat=%temp%\tempbat"
set "tempba=%tempbat%.bat"
type NUL > "%tempB%"

for /R %%G IN (*.mp4 *.avi *.mkv *.flv *.wmv *.mpg *.mov) do (
   rem echo %%G 1>&2
   set "filename=%%~pG%%~nG"
   setlocal enabledelayedexpansion
   "%ffprobe%" -i "%%G" >"%tempA%" 2>&1
   for /f "tokens=2,6 delims=, " %%a in ('sed -n "/ Duration:/p" "%tempA%"') do (
   rem echo "%%a" "%%b"
   for /f "tokens=1,2 delims=," %%d in ('sed -n -r "0,/Video:/{s_.*Video: (.....).*(..[0-9].x....).*_\1,\2_p}" "%tempA%"') do (
   rem echo ,%%d,%%e,
   set "codec=%%d  " & set "res=%%e    " & set "bitrate=      %%b"
   set sz=%%~zG
   set sz=!sz:~0,-5! & set /A sz = !sz! + 5 & set "sz=    !sz:~0,-1!"
   set "ln=%%a !bitrate:~-5! kbs  !codec:~0,5!  !res:~0,9! !sz:~-5! MB  %%~xG  ---  !filename!"
   echo !ln!  &  echo !ln! >> "%tempB%"
   ))
   endlocal
   )

rem - This loop removes common characters at the beginning of the file path (column 62).
rem - If all the scanned video files belong to the same folder and if the first letter of the
rem - file name is always the same, then comment out this loop so that letter is not deleted.
:LoopStart
rem - Extract column 62 and remove duplicate lines
sed -r "s_.{61}(.).*_\1_" %tempB% | sed "$!N; /^\(.*\)\n\1$/!P; D" > "%tempba%"
call :filesize "%tempba%"
if %size% GTR 4 goto LoopDone
rem - Remove column 62 if it is always the same character
sed -r "s_(.{61}).(.*)_\1\2_" %tempB% > %tempba%
copy "%tempba%" "%tempB%" > NUL
goto LoopStart
:filesize
set size=%~z1
goto :eof
:LoopDone

sort /+58 "%tempB%" > vidtimer.txt

rem - The fi variable will count the number of files processed
rem - The sec variable will total running time in seconds
rem - We have to handle each digit of the time separately (otherwise we get the leading zero octal problem)
set sec=0 & set fi=0
sed -r "s_0(.):(.)(.):(.)(.).*_set /A fi+=1 \& set /A sec+=\1*3600+\2*600+\3*60+\4*10+\5_" vidtimer.txt > "%tempba%"

rem - Execute the batch file to count the files and sum up the durations
call "%tempbat%"

rem - Convert seconds to hours/minutes/seconds
rem - Zeropad minutes and seconds by adding 100 and then removing the first digit
set /A min = sec/60  &  set /A sec += 100 - 60*min
set /A hour = min/60 &  set /A min += 100 - 60*hour

rem - Append the running time and file count to the end of the results file
set "ln===================== Total run time is %hour%:%min:~1%:%sec:~1%  (%fi% video files)"
echo %ln%  &  echo %ln% >> vidtimer.txt

type NUL > "%tempA%"
type NUL > "%tempB%"
type NUL > "%tempba%"


BTW, foxidrive - If I share this script with others would you like me to give you credit for your contribution, or would you prefer to remain anonymous?

~Paul

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

Re: Escaping an ampersand character

#13 Post by foxidrive » 03 Jan 2016 14:14

Sorry for cocking up the test Paul, you're right - the file has only the last file in it. It requires an extra set of ()

The "setlocal enabledelayedexpansion" line has to be moved down one line - it was excluding the files that were the problem in the first place. Oops!

Something wacky happened yesterday with the conversion from seconds to hours/min/sec and I got 04:31:32620 - which is why I added that different routine. Your code works fine now - I did something wrong then.
pmennen wrote: 1.) The size parameter is truncated down to the nearest MB instead of rounded

I found in my test, comparing your output to mine, that only a few files vary by a single MB - is that significant?
pmennen wrote: 3.) I found one video file which didn't report the resolution properly.

Thanks, I've added a change that should fix that.

The code below checks the comma-separated terms from the video stream line for an x and keeps the last term that contains one.
Checking the extra terms should find the resolution figure when it has been bumped along by extra information in that line.

Code: Select all

 for %%y in ("%%f" "%%g" "%%h" "%%i" "%%j") do set "testres=%%~y" & if /i not "!testres:x=!"=="!testres!" set "res=%%~y"


If you want speed and native tools, then Dave Benham's jrepl.bat will replace SED, and his gettimestamp.bat can calculate all sorts of time data easily - they use native Windows JScript internally for speed.
Aacini created findrepl.bat, which also works using JScript and has formidable processing techniques like Dave's jrepl.bat.

For some extra speed I'd get all the ffprobe data into one file and process it in one hit.
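
A rough sketch of that idea, not tested on a large collection. The ##FILE## marker and the vidprobe_all.txt file name are made up for illustration; the ffprobe path is the one used in the script below. The first pass dumps everything ffprobe prints into a single file, and the second pass pulls out only the lines of interest in one go:

Code: Select all

@echo off
setlocal disabledelayedexpansion
set "ffprobe=c:\Util\ffprobe.exe"
rem allinfo is a hypothetical file name
set "allinfo=%temp%\vidprobe_all.txt"

rem Pass 1: one marker line per file, followed by the raw ffprobe output
(
for /R %%G in (*.mp4 *.avi *.mkv *.flv *.wmv *.mpg *.mov) do (
   echo ##FILE## %%G
   "%ffprobe%" -i "%%G" 2>&1
)
)> "%allinfo%"

rem Pass 2: keep only the marker, duration and video stream lines
findstr /c:"##FILE##" /c:" Duration: " /c:" Video: " "%allinfo%"
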
pmennen wrote: This script worked flawlessly on my 942 file test case, but if you run into a video file that causes it to fail, I would certainly appreciate hearing about it.

The setlocal enabledelayedexpansion line needs to be changed as I mentioned above - Yours works well otherwise.



This is my changed code:

Code: Select all

@echo off
rem Traverses recursively from the current directory
rem providing aspects of video file information and tallying the video durations

set "ffprobe=c:\Util\ffprobe.exe"
set "tempfi1=%temp%\vidtemp1.txt"
set "tempfi2=%temp%\vidtemp2.txt"
del "%tempfi2%" 2>nul
set totalsec=0
set fi=0



if not exist "%ffprobe%" echo can't find ffprobe.exe - check the location &pause&goto:eof

(
for /R %%G IN (*.mp4 *.avi *.mkv *.flv *.wmv *.mpg *.mov) do (
   rem echo %%G 1>&2
   set "filename=%%~nG"
   "%ffprobe%" -i "%%G" >"%tempfi1%" 2>&1
   setlocal enabledelayedexpansion
   for /f "tokens=2,6,7 delims=/, " %%a in ('find " Duration: " ^<"%tempfi1%"') do (
   rem echo "%%a" "%%b" "%%c"&pause
   rem numbers the video stream lines and takes the first video stream
   for /f "tokens=1-8 delims=," %%d in ('find "  Stream #" ^< "%tempfi1%" ^|find " Video: " ^|findstr /n "^" ^|findstr "^1:"') do (
   rem echo %%d ~ %%e ~ %%f ~ %%g ~ %%h ~ %%i ~ %%j ~ %%k
   rem checks five sequential items in the video stream line to find which one has an x in it
   rem as that is the term to keep with the resolution in it
   set "res="
   for %%y in ("%%f" "%%g" "%%h" "%%i" "%%j") do set "testres=%%~y" & if /i not "!testres:x=!"=="!testres!" set "res=%%~y"
   rem  echo !res!
   for /f %%m in ("!res!") do (
   rem extracts the video codec term from the initial portion up to the first comma in the video stream line
   for /f "tokens=5"  %%o in ("%%d") do (
   set "bitrate=      %%b"
   set "vidcodec=%%o     "
   set "res=        %%m"
   set "filesize=%%~zG"
   set "filesize=         !filesize:~0,-6!"
   rem separate the terms in the duration and sum them up
   set "duration=%%a"
   set   "hoursx10=!duration:~0,1!"
   set      "hours=!duration:~1,1!"
   set "minutesx10=!duration:~3,1!"
   set    "minutes=!duration:~4,1!"
   set "secondsx10=!duration:~6,1!"
   set    "seconds=!duration:~7,1!"
   rem The totalsec variable will total the running time in seconds
   set /a totalsec=36000*!hoursx10!+3600*!hours!+600*!minutesx10!+60*!minutes!+10*!secondsx10!+!seconds!+!totalsec!
   rem count the files
   set /a fi+=1
   set output=%%a !bitrate:~-5! kbs  !vidcodec:~0,5! !res:~-9! !filesize:~-5! MB  %%~xG  ---  !filename!
   echo !output!
   echo !output! 1>&2

   ))))
   for /f "tokens=1,* delims=|" %%A in ("!fi!|!totalsec!") do endlocal & set fi=%%A& set totalsec=%%B
   )
   )> "%tempfi2%"

sort /+58 "%tempfi2%" > vidtimer.txt


REM - Convert seconds to hours/minutes/seconds

set /a days=totalsec/86400
set /a hours=(totalsec/3600)-(days*24)
set /a minutes=(totalsec/60)-(days*1440)-(hours*60)
set /a seconds=totalsec %% 60

set days=0%days%&set hours=0%hours%&set minutes=0%minutes%&set seconds=0%seconds%


REM - Append the running time and file count to the end of the results file
set summary===================== Total run time is %days% days %hours:~-2%:%minutes:~-2%:%seconds:~-2%  (%fi% video files)
echo %summary%>>vidtimer.txt
echo %summary%
del "%tempfi1%" 2>nul
del "%tempfi2%" 2>nul
:pause
goto :eof



There's no need to credit me, but thanks for asking.


Regarding speed, I ran a few tests here just for interest's sake:

Code: Select all

==================== Total run time is 1737:00:54  (1052 video files)
0 days 00:01:44.197 <--- this is your code from earlier yesterday
==================== Total run time is 1737:00:54  (1052 video files)
0 days 00:01:42.284 <--- this is your code from yesterday
==================== Total run time is 072 days 09:00:54  (1052 video files)
0 days 00:01:42.207 <--- this is my latest batch
==================== Total run time is 1737:00:54  (1052 video files)
0 days 00:01:42.438 <--- this is your latest batch

pmennen
Posts: 18
Joined: 15 Jul 2011 02:10

Re: Escaping an ampersand character

#14 Post by pmennen » 04 Jan 2016 14:07

foxidrive wrote: I found in my test, comparing your output to mine, that only a few files vary by a single MB - is that significant?

Actually when truncating as you were doing, very close to half the files will be rounded down when rounding up would be more appropriate. You are right that this is not all that significant since these files tend to be hundreds of megabytes anyway. I guess I'm just picky that way :)
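
One way to round instead of truncate, sketched under two assumptions: MB means 10^6 bytes (as in foxidrive's script, which drops the last six digits), and the bytes value is a made-up sample standing in for %%~zG. The size itself is handled with string slicing only, because set /a is limited to 32-bit signed numbers and files over 2 GB would overflow it.

Code: Select all

@echo off
setlocal
rem Hypothetical sample value standing in for %%~zG
set "bytes=350500000"
rem Whole megabytes: everything except the last six digits
set "mb=%bytes:~0,-6%"
rem Files under 1 MB leave nothing behind, so start from 0
if not defined mb set "mb=0"
rem Pad with zeros so the first dropped digit can be read even for tiny files
set "padded=000000%bytes%"
set "digit=%padded:~-6,1%"
rem Round up when the first dropped digit is 5 or more
if %digit% GEQ 5 set /a mb+=1
echo %bytes% bytes rounds to %mb% MB
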
foxidrive wrote: If you want speed and native tools, then Dave Benham's jrepl.bat will replace SED

I didn't look at that closely, but I'm a little confused about how a batch file could be faster than sed. I don't quite understand the motivation for replacing sed anyway, unless the functionality is greater. The GNU sed I'm using seems to be blindingly fast. Its execution time is so small that it doesn't add measurably to the batch file's execution time.

foxidrive wrote: The setlocal enabledelayedexpansion line needs to be changed as I mentioned above - Yours works well otherwise.

I don't understand that comment. Why does that matter? In fact I moved that setlocal command out of the loop (to before the for command) and put the endlocal command after the loop and my script still seems to work as well as before.

foxidrive wrote: Regarding speed, I ran a few tests here just for interest's sake:

Yes, I think in the end, all these scripts will be limited by the file I/O speed once the data sets get fairly large.
I tried adapting my script to mp3 files, still using ffprobe. It's not much different except for how I extract the bit rate, and there is no need to extract a codec type. However the problem is that my mp3 collection has many more files in it (well over 100 thousand). The script starts out running pretty fast but then slows to a crawl, to the point it looks like it will take days to finish. I thought the problem was the file append, so I switched to the single file redirection (using the single >) like you did in your script, but it had the same problem. I think the implementation of the redirect must really be using a file append regardless of whether a single > or double >> is being used. I even tried a more complicated loop where I appended only 100 lines to a temp file, then appended the 100-line file to the main file and cleared out the temp file. That way the temp file never gets big, so appending to it should never slow down. However it does slow down, which is a big mystery to me. Of course the main file will still get big, but I can see that the speed problem is not related to the file append with the big file. The main loop just slows down for a reason I don't understand. It looks like if I want this program to work I will have to switch to a real scripting language, or perhaps C.

~Paul

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

Re: Escaping an ampersand character

#15 Post by foxidrive » 05 Jan 2016 03:36

pmennen wrote:
foxidrive wrote: If you want speed and native tools, then Dave Benham's jrepl.bat will replace SED

I didn't look at that closely, but I'm a little confused about how a batch file could be faster than sed. I don't quite understand the motivation for replacing sed anyway, unless the functionality is greater. The GNU sed I'm using seems to be blindingly fast. Its execution time is so small that it doesn't add measurably to the batch file's execution time.


As I wrote in my reply, jrepl and findrepl use JScript - they are blindingly fast.
The benefit is that they use native Windows code (and that code leaves batch code for dust) so copying the batch file to another windows PC will allow your script to work in the same way, without downloading and installing a third-party tool.

I like GNU sed and cut my regexp teeth on it - but I've learned the small differences in Windows regexp and prefer to have my code far more portable, as many companies will not allow people to install binary files.

That may not be any reason for you to switch - I am just commenting on why I prefer it, when in a support role.
Both Dave's and Antonio's code is very powerful - and there are more tools where they came from.

foxidrive wrote: The setlocal enabledelayedexpansion line needs to be changed as I mentioned above - Yours works well otherwise.

pmennen wrote: I don't understand that comment. Why does that matter? In fact I moved that setlocal command out of the loop (to before the for command) and put the endlocal command after the loop and my script still seems to work as well as before.

See if you can find any files with an ! in the name, in your output file :)
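
A small sketch of why the placement matters. It assumes a scratch folder (bangdemo is a made-up name) holding one empty file with an exclamation mark in its name. When delayed expansion is already on at the moment %%G is expanded, the ! is treated as a variable delimiter and dropped; when it is switched on only after the name has been stored, the name survives intact.

Code: Select all

@echo off
setlocal disabledelayedexpansion
rem Hypothetical scratch folder and file name, used only for this demo
set "demo=%temp%\bangdemo"
md "%demo%" 2>nul
type nul > "%demo%\Wow! Video.avi"
pushd "%demo%"

rem Case 1: store the name first, then enable delayed expansion
for %%G in (*.avi) do (
   set "filename=%%~nG"
   setlocal enabledelayedexpansion
   echo enabled inside the loop: !filename!
   endlocal
)

rem Case 2: delayed expansion is on before the loop, so the ! is lost
setlocal enabledelayedexpansion
for %%G in (*.avi) do (
   set "filename=%%~nG"
   echo enabled before the loop: !filename!
)
endlocal

popd
rd /s /q "%demo%"
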

pmennen wrote: I tried adapting my script to mp3 files, still using ffprobe. It's not much different except for how I extract the bit rate, and there is no need to extract a codec type. However the problem is that my mp3 collection has many more files in it (well over 100 thousand). The script starts out running pretty fast but then slows to a crawl, to the point it looks like it will take days to finish.


Some forms of the for loop have an inherent bug when processing large numbers of files or long paths, but that makes the loop start very slowly rather than slow down later.

If you post the code you are using then I will test it here on a large number of files - I'm interested to see what is making it slow down once it's running, and if there's a workaround.
