eol=; tokens=*

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: eol=; tokens=*

#16 Post by dbenham » 10 Jan 2015 23:59

Squashman wrote:
dbenham wrote:Sponge Belly's point is still valid, as long as you understand the definition of a FOR /F token.

I am not understanding. Are you saying my comment is wrong and his code will work regardless of the number of repeat delimiters and we don't have to use your PARSECSV.bat file to properly parse the number of correct tokens?

No, not at all - My ParseCSV.bat is still very useful.

Your comment is true, but so is Sponge Belly's discovery and description. The two concepts really don't have any bearing on each other. CSV column parsing rules are not compatible with FOR /F.

Sponge Belly
Posts: 221
Joined: 01 Oct 2012 13:32
Location: Ireland

Re: eol=; tokens=*

#17 Post by Sponge Belly » 13 Feb 2015 16:30

Thanks to everyone who replied. Apologies for not responding sooner.

@npocmaka

Imho, the priority of for /f options in most cases is:

  1. usebackq
  2. skip
  3. delims
  4. eol
  5. tokens

Skip counts every line, including blank lines and lines beginning with eol. Delims comes next because for /f has to know which characters split the line into tokens. Then, if the first character of the first token is eol, the line is skipped. Lastly, the tokens option decides which tokens are assigned to the for variables.
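
A minimal sketch of the delims-before-eol ordering, assuming a throwaway test file (the name eol_demo.txt is only for illustration). If the ordering above holds, the first loop should print only [keep me], because the leading spaces are consumed as delimiters before the eol check. The second loop, with empty delims, should also print the indented line, because its first character is then a space rather than the eol character.

```batch
@echo off
setlocal
rem Hypothetical scratch file; the name is only for this demo.
set "f=%temp%\eol_demo.txt"

rem Three test lines: plain text, ; at column 1, and ; after three spaces.
>"%f%"  echo keep me
>>"%f%" echo ;comment at column 1
>>"%f%" echo    ;indented comment

echo Default delims (space and tab):
for /f "usebackq eol=; tokens=*" %%A in ("%f%") do echo [%%A]

echo Empty delims:
for /f "usebackq eol=; delims=" %%A in ("%f%") do echo [%%A]

del "%f%"
endlocal
```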

Thanks to Dave for setting me straight.

@Jeb

You wrote:

Code: Select all

set "line=X"
for /L %%n in (1 1 10) do set "line=!line:~0,500!!line:~0,500!"

(
   for /L %%n in (1 1 100) do (
      set /p ".=!line!" < nul
   )
   echo  param2 param3
) > long.txt


You created var line containing 100k Xs. I understand the second part where line contained 1k Xs and you printed it out 100 times. It’s the first part that has me stumped. You expanded line like an accordion twice, and it magically filled up with 1k Xs. I’ve never seen this done before. Please explain what’s going on here, and more importantly, how you figured out how to do it. My growing suspicion is that you have spooky powers. ;)

@Foxidrive

I wanted to find the last token because I was looking for an alternative method to trim leading and trailing whitespace from a string. Here’s a working draft:

Code: Select all

@echo off & setlocal enableextensions disabledelayedexpansion
:: nasty string padded with lots of tabs and spaces
set ^"str=   ^^^"    ^^^&^^    ^"^^^&^"^& !^^!^^^^! %%   %%OS%%   ^"
:: must double quotes in ddx
if not defined str (>&2 echo(string not defined & goto die
) else set "str=%str:"=""%"

:: count tokens in string
set "nth=0"
:loop
set /a nth+=1
for /f tokens^=%nth%^ eol^= %%A in ("%str%") do goto loop
set /a nth-=1
if %nth% equ 0 (>&2 echo(string consists entirely of whitespace
goto die) else if %nth% equ 1 (
rem edge case of 1 token in string
for /f tokens^=1^ eol^= %%B in ("%str%") do (set "str=%%B"
setlocal enabledelayedexpansion
rem safe to undouble quotes in edx
echo([!str:""="!]
endlocal & goto end))

:: store l&r-trimmed last token in var
for /f tokens^=%nth%^ eol^= %%B in ("%str%") do set "nthtoken=%%B"

:: get length of tail (start of last token to end of string)
set /a nth-=1
for /f tokens^=%nth%*^ eol^= %%C in ("%str%") do (set "tail=%%D"
for /f skip^=1^ delims^=:^ eol^= %%E in ('set tail ^^^& echo( ^| ^
findstr /o "^"') do set /a lentail=%%E-7)

:: chop off leading whitespace with "tokens=*" trick
for /f tokens^=*^ eol^= %%F in ("%str%") do set "str=%%F"
setlocal enabledelayedexpansion
:: Str[0..-LenTail] + NthToken = L&R Trimmed Str
set "str=!str:~0,-%lentail%!!nthtoken!"
:: safe to undouble quotes in edx
echo([!str:""="!]
endlocal & goto end

:die
(call)
:end
endlocal & goto :eof


I’m liking this approach much more than my previous efforts at coming up with a viable alternative solution to this problem, but it still has a ways to go. I want to get rid of the goto loop and I’m not happy about having to find the length of the tail. The former can be eliminated with chicanery, but the latter is proving a hard nut to crack.

If anyone would like to suggest refinements to my code, please feel free to post them here.

Toodles! :)

- SB
Last edited by Sponge Belly on 14 Feb 2015 13:13, edited 1 time in total.

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: eol=; tokens=*

#18 Post by dbenham » 13 Feb 2015 20:31

Sponge Belly wrote:@npocmaka

Imho, the priority of for /f options in most cases is:

  1. usebackq
  2. skip
  3. delims
  4. tokens
  5. eol

The phrase "in most cases" and programming do not play well together. If you must resort to that phrase, then your understanding is wrong.

Sponge Belly wrote:Skip will skip over even blank lines. Delims overrides eol. Tokens can’t be determined until delims are known, and tokens affects eol as demonstrated in the OP.

I fail to see how your OP demonstrates that TOKENS affects EOL. In fact, I am confident that TOKENS has no effect on EOL. You can set TOKENS=1 or 2 or 3, etc., and it will not change which lines are skipped due to EOL. But change DELIMS and it definitely has an effect.
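
A small sketch of that claim, assuming a two-line scratch file (eol_tokens_demo.txt is a made-up name). If the rules hold, the first three loops should each print one line, taken from "a b c" only, since the ";x y z" line is skipped whatever TOKENS is. The last loop puts ; into DELIMS, which should disable EOL so that both lines are printed.

```batch
@echo off
setlocal
rem Hypothetical scratch file; the name is only for this demo.
set "f=%temp%\eol_tokens_demo.txt"
>"%f%"  echo a b c
>>"%f%" echo ;x y z

rem Changing TOKENS changes which token is returned, not which lines are skipped.
for /f "usebackq eol=; tokens=1" %%A in ("%f%") do echo tokens=1: [%%A]
for /f "usebackq eol=; tokens=2" %%A in ("%f%") do echo tokens=2: [%%A]
for /f "usebackq eol=; tokens=3" %%A in ("%f%") do echo tokens=3: [%%A]

rem Changing DELIMS does matter: the leading ; is consumed as a delimiter
rem before the EOL check is made.
for /f "usebackq eol=; tokens=1 delims=;" %%A in ("%f%") do echo delims=;: [%%A]

del "%f%"
endlocal
```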

Reread the rules I posted at viewtopic.php?p=39022#p39022. I have never seen an example that violates the rules as I stated them.

Sponge Belly wrote:@Jeb

You wrote:

Code: Select all

set "line=X"
for /L %%n in (1 1 10) do set "line=!line:~0,500!!line:~0,500!"

(
   for /L %%n in (1 1 100) do (
      set /p ".=!line!" < nul
   )
   echo  param2 param3
) > long.txt


You created var line containing 100k Xs. I understand the second part where line contained 1k Xs and you printed it out 100 times. It’s the first part that has me stumped. You expanded line like an accordion twice, and it magically filled up with 1k Xs. I’ve never seen this done before. Please explain what’s going on here, and more importantly, how you figured out how to do it. My growing suspicion is that you have spooky powers. ;)

jeb may very well have spooky powers, but there is nothing eerie about the code. Pretend the expansion occurs without the substring for now. It simply doubles the length of the line each time. It is initialized to length 1, so the first iteration multiplies by 2 (=2), the 2nd iteration by 2 (=4) etc. In other words, 2 to the 10th power, which yields a length of 1024. But the goal is to get exactly 1000 bytes, so the substring operation is used. If the length of the actual string is less than the substring length, then the entire string is used. This is what happens for all but the last iteration. On the last iteration the length is 512, the substring makes it length 500, times 2 = 1000.
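
To watch the growth step by step, here is a rough sketch that measures the string after each iteration. It writes the string to a scratch file without a trailing newline and reads back the file size; the scratch file name is an assumption made for this demo. The reported sizes should run 2, 4, 8, 16, 32, 64, 128, 256, 512 and finally 1000.

```batch
@echo off
setlocal enabledelayedexpansion
rem Hypothetical scratch file used only to measure the string length.
set "tmpfile=%temp%\len_demo.tmp"
set "line=X"
for /L %%n in (1 1 10) do (
    rem The substring is a no-op until the string is longer than 500.
    set "line=!line:~0,500!!line:~0,500!"
    rem Write the string without a trailing newline, then read the file size.
    <nul set /p "=!line!" >"%tmpfile%"
    for %%F in ("%tmpfile%") do echo after iteration %%n: %%~zF characters
)
del "%tmpfile%"
endlocal
```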


Dave Benham

Sponge Belly
Posts: 221
Joined: 01 Oct 2012 13:32
Location: Ireland

Re: eol=; tokens=*

#19 Post by Sponge Belly » 14 Feb 2015 11:53

Hi Dave,

And a happy St Valentine’s Day to you! ;)

Thanks for explaining Jeb’s string magic. I get it now. The truncation only occurs on the last iteration of the for /l loop. The substring has no effect the rest of the time because the string is still shorter than 500 characters. Devious. :twisted:

You wrote:

When FOR /F processes a line, it first breaks the line into tokens as per DELIMS, and then skips the line if the first character of the first token is the EOL character. This explains why the indented line is skipped when using the default DELIMS value of <space><tab>. It also explains the known behavior that setting EOL to one of the DELIMS characters effectively disables EOL - any EOL character at the start will have been consumed by DELIMS processing by the time the EOL check is made.

Only after all of the above occurs are the appropriate token values assigned to FOR values as per TOKENS. So TOKENS has no impact on whether a line is skipped due to EOL.


Well, I stand corrected. :oops:

The priority should be: usebackq; skip; delims; eol; and tokens. I’ll amend my previous post.

- SB
