Safely parse nearly any CSV with parseCSV.bat v1.3

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

Post Reply
Message
Author
dbenham
Expert
Posts: 2251
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Safely parse nearly any CSV with parseCSV.bat v1.3

#1 Post by dbenham » 19 Jun 2014 00:15

It is fairly common that someone wants to parse CSV using FOR /F. This is a simple task if you know all columns are populated, and there are no commas, newlines, or quotes within values. Assume there are 4 columns:

Code: Select all

@echo off
for /f "tokens=1-4 delims=," %%A in (test.csv) do (
  echo ----------------------
  echo A=%%~A
  echo B=%%~B
  echo C=%%~C
  echo D=%%~D
  echo(
)
But things become more difficult if any of the following conditions occur:

1 - Values may be empty, with consecutive commas. FOR /F treats consecutive delimiters as one, so it will throw off the column assignment.

2 - Quoted values may contain commas. FOR /F will incorrectly treat a quoted comma as a column delimiter.

3 - Quoted values may contain newlines. FOR /F will break the line at the newline and incorrectly treat the one row as two.

4 - Quoted values may contain paired quotes that represent one quote. For example, "He said, ""Hello there"". A method is needed to convert "" into ".

Then there is a secondary problems that can crop up if delayed expansion is enabled.

5 - A FOR variable %%A will be corrupted if it contains ! (or sometimes ^) if delayed expansion is enabled when the variable is expanded.

There are fairly easy solutions for some of these issues, but solving all of them is extremely difficult (and slow) with pure batch.

I have written a hybrid JScript/batch utility called parseCSV.bat that makes it easy and relatively efficient to correctly parse nearly any CSV file with FOR /F.

EDIT 2017-02-19: Updated to version 1.1 to enable CSV transformations for purposes other than parsing by FOR /F. Also now supports escape sequences.
EDIT 2017-02-20: Updated to version 1.2 to fix a bug concerning /E with /Q:N, added the /U option, and improved performance
EDIT 2019-05-25: Updated to version 1.3 to fix nasty bug - /D was not working when combined with /E
parseCSV.bat

Code: Select all

@if (@X)==(@Y) @end /* harmless hybrid line that begins a JScrpt comment

::************ Documentation ***********
::parseCSV.bat version 1.3
:::
:::parseCSV  [/option]...
:::
:::  Parse stdin as CSV and write it to stdout in a way that can be safely
:::  parsed by FOR /F. All columns will be enclosed by quotes so that empty
:::  columns may be preserved. It also supports delimiters, newlines, and
:::  escaped quotes within quoted values. Two consecutive quotes within a
:::  quoted value are converted into one quote by default.
:::
:::  Available options:
:::
:::    /I:string = Input delimiter. Default is a comma (,)
:::
:::    /O:string = Output delimiter. Default is a comma (,)
:::
:::         The entire option must be quoted if specifying poison character
:::         or whitespace literals as a delimiters for /I or /O.
:::
:::         Examples:  pipe = "/I:|"
:::                   space = "/I: "
:::
:::         Standard JScript escape sequences can also be used.
:::
:::         Examples:       tab = /I:\t  or  /I:\x09
:::                   backslash = /I:\\
:::
:::    /E = Encode output delimiter literal within value as \D
:::         Encode newline within value as \N
:::         Encode backslash within value as \S
:::
:::    /D = escape exclamation point and caret for Delayed expansion
:::         ! becomes ^!
:::         ^ becomes ^^
:::
:::    /L = treat all input quotes as quote Literals
:::
:::    /Q:QuoteOutputFormat
:::
:::       Controls output of Quotes, where QuoteOutputFormat may be any
:::       one of the following:
:::
:::         L = all columns quoted, quote Literals output as "   (Default)
:::         E = all columns quoted, quote literals Escaped as ""
:::         N = No columns quoted, quote literals output as "
:::
:::       The /Q:E and /Q:N options are useful for transforming data for
:::       purposes other than parsing by FOR /F
:::
:::    /U = Write unix style lines with newline (\n) instead of the default
:::         Windows style of carriage return and linefeed (\r\n).
:::
:::parseCSV  /?
:::
:::  Display this help
:::
:::parseCSV  /V
:::
:::  Display the version of parseCSV.bat
:::
:::parseCSV.bat was written by Dave Benham. Updates are available at the original
:::posting site: http://www.dostips.com/forum/viewtopic.php?f=3&t=5702
:::

::************ Batch portion ***********
@echo off
if "%~1" equ "/?" (
  setlocal disableDelayedExpansion
  for /f "delims=: tokens=*" %%A in ('findstr "^:::" "%~f0"') do echo(%%A
  exit /b 0
)
if /i "%~1" equ "/V" (
  for /f "delims=:" %%A in ('findstr /bc:"::%~nx0 version " "%~f0"') do echo %%A
  exit /b 0
)
cscript //E:JScript //nologo "%~f0" %*
exit /b 0


************ JScript portion ***********/
var args     = WScript.Arguments.Named,
    stdin    = WScript.Stdin,
    stdout   = WScript.Stdout,
    escape   = args.Exists("E"),
    literalQ = args.Exists("L"),
    escapeQ  = (args.Item("Q")&&args.Item("Q").toUpperCase()=="E"),
    quoteCol = (args.Item("Q")&&args.Item("Q").toUpperCase()=="N") ? '' : '"',
    delayed  = args.Exists("D"),
    inDelim  = args.Item("I") ? eval('"'+args.Item("I")+'"') : ",",
    outDelim = args.Item("O") ? eval('"'+args.Item("O")+'"') : ",",
    newline  = args.Exists("U") ? "\n" : "\r\n",
    quote    = false,
    ln, c, n, out;
while (!stdin.AtEndOfStream) {
  ln=stdin.ReadLine();
  out="";
  if (!quote) stdout.Write(quoteCol);
  for (n=0; n<ln.length; n++ ) {
    c=ln.charAt(n);
    if (c == '"') {
      if (literalQ) {
        if (escapeQ) c+='"';
      } else if (quote && ln.charAt(n+1) == '"') {
        n++;
        if (escapeQ) c+='"';
      } else {
        quote=!quote;
        continue;
      }
    }
    else if (c == inDelim && !quote) c=quoteCol+outDelim+quoteCol;
    else {
      if (escape) {
        if (c == outDelim) c="\\D";
        if (c == "\\") c="\\S";
      }
      if (delayed) {
        if (c == "!") c="^!";
        if (c == "^") c="^^";
      }
    }
    out+=c;
  }
  out += (quote) ? ((escape) ? "\\N" : newline) : quoteCol+newline;
  stdout.Write(out);
}
I have also written a script that defines a macro to assist with parsing the most problematic CSV files. See http://www.dostips.com/forum/viewtopic.php?f=3&t=1827 for background information about batch macros with arguments.
EDIT 2015-01-08: Updated to version 1.1 to fix a silly mistake in the Example usage code

define_csvGetCol.bat

Code: Select all

::define_csvGetCol.bat version 1.1
::
:: Defines variable LF and macro csvGetCol to be used with
:: parseCSV.bat to parse nearly any CSV file.
::
:: This script must be called with delayedExpansion disabled.
::
:: The %csvGetCol% macro must be used with delayedExpansion enabled.
::
:: Example usage:
::
::   @echo off
::   setlocal disableDelayedExpansion
::   call define_csvGetCol
::   setlocal enableDelayedExpansion
::   for /f "tokens=1-3 delims=," %%A in ('parseCSV /d /e ^<test.csv') do (
::     %== Load and decode column values ==%
::     %csvGetCol% A "," %%A
::     %csvGetCol% B "," %%B
::     %csvGetCol% C "," %%C
::     %== Display the result ==%
::     echo ----------------------
::     for %%V in (A B C) do echo %%V=!%%V!
::     echo(
::   )
::
:: Written by Dave Benham
::

:: Delayed expansion must be disabled during macro definition

:: Define LF to contain a linefeed (0x0A) character
set ^"LF=^

^" The empty line above is critical - DO NOT REMOVE

:: define a newline with line continuation
set ^"\n=^^^%LF%%LF%^%LF%%LF%^^"

:: Define csvGetCol
:: %csvGetCol%  envVarName  "Delimiter"  FORvar
set csvGetCol=for %%# in (1 2) do if %%#==2 (%\n%
setlocal enableDelayedExpansion^&for /f "tokens=1,2*" %%1 in ("!args!") do (%\n%
  endlocal^&endlocal%\n%
  set "%%1=%%~3"!%\n%
  if defined %%1 (%\n%
    for %%L in ("!LF!") do set "%%1=!%%1:\N=%%~L!"%\n%
    set "%%1=!%%1:\D=%%~2!"%\n%
    set "%%1=!%%1:\S=\!"%\n%
  )%\n%
)) else setlocal disableDelayedExpansion ^& set args=
Usage is extremely simple if you know there are no commas or newlines in any values, and delayed expansion is not needed:

test1.csv

Code: Select all

"value1 with ""quotes""",value2: No problem!,value3: 2^3=8,value4: (2^2)!=16
value1,,value3,value4
value1,,,value4
value1,,,
,,,value4
test1.bat - no delayed expansion, no commas or newlines in values

Code: Select all

@echo off
for /f "tokens=1-4 delims=," %%A in ('parseCSV ^<test1.csv') do (
  echo -------------
  echo(A=%%~A
  echo(B=%%~B
  echo(C=%%~C
  echo(D=%%~D
  echo(
)
--OUTPUT1--

Code: Select all

-------------
A=value1 with "quotes"
B=value2: No problem!
C=value3: 2^3=8
D=value4: (2^2)!=16

-------------
A=value1
B=
C=value3
D=value4

-------------
A=value1
B=
C=
D=value4

-------------
A=value1
B=
C=
D=

-------------
A=
B=
C=
D=value4
It is also quite simple when commas are in values if you know of a character that does not exist in any value. Simply specify a unique character for the output delimiter.

test2.csv

Code: Select all

"value1 with ""quotes""","value2, No problem!","value3, 2^3=8","value4, (2^2)!=16"
value1,,value3,value4
value1,,,value4
value1,,,
,,,value4
test2.bat - no delayed expansion, no newlines or pipes in values EDIT: Note that the entire option must be quoted if the delimiter is a poison character

Code: Select all

@echo off
for /f "tokens=1-4 delims=|" %%A in ('parseCSV "/o:|" ^<test2.csv') do (
  echo -------------
  echo(A=%%~A
  echo(B=%%~B
  echo(C=%%~C
  echo(D=%%~D
  echo(
)
--OUTPUT2--

Code: Select all

-------------
A=value1 with "quotes"
B=value2, No problem!
C=value3, 2^3=8
D=value4, (2^2)!=16

-------------
A=value1
B=
C=value3
D=value4

-------------
A=value1
B=
C=
D=value4

-------------
A=value1
B=
C=
D=

-------------
A=
B=
C=
D=value4
It only takes a bit more code if values may contain newlines or if you don't know of a character that does not appear in any value. This solution encodes newlines, delimiters, and slashes as \N, \D, and \S. Delayed expansion is needed within the loop to decode the values, so ! and ^ must be escaped as ^! and ^^.

test3.csv

Code: Select all

"2^3=8","(2^2)!=16","Success!",Value4
value1,value2,value3,value4
,,,value4
"value1","value2","value3","value4"
"He said, ""Hey cutie.""","She said, ""Drop dead!""","value3 line1
value3 line2",c:\Windows
test3.bat - Allow virtually any valid CSV, without using a macro.

Code: Select all

@echo off
setlocal enableDelayedExpansion

:: Define LF to contain a linefeed (0x0A) character
set ^"LF=^

^" The empty line above is critical - DO NOT REMOVE

for /f "tokens=1-4 delims=," %%A in ('parseCSV /e /d ^<test3.csv') do (
  %== Load columns with encoded values. The trailing ! is important ==%
  set "A=%%~A"!
  set "B=%%~B"!
  set "C=%%~C"!
  set "D=%%~D"!
  %== Decode values ==%
  for %%L in ("!LF!") do for %%V in (A B C D) do if defined %%V (
    set "%%V=!%%V:\N=%%~L!"
    set "%%V=!%%V:\D=,!"
    set "%%V=!%%V:\S=\!"
  )
  %== Print results ==%
  echo ---------------------
  for %%V in (A B C D) do echo(%%V=!%%V!
  echo(
)
--OUTPUT3--

Code: Select all

---------------------
A=2^3=8
B=(2^2)!=16
C=Success!
D=Value4

---------------------
A=value1
B=value2
C=value3
D=value4

---------------------
A=
B=
C=
D=value4

---------------------
A=value1
B=value2
C=value3
D=value4

---------------------
A=He said, "Hey cutie."
B=She said, "Drop dead!"
C=value3 line1
value3 line2
D=c:\Windows
test4.bat - Allow virtually any valid CSV, but now use the %csvGetCol% macro.

Code: Select all

@echo off

:: Delayed expansion must be disabled during macro definition
setlocal disableDelayedExpansion
call define_csvGetCol

:: Delayed expansion must be enabled when using %csvGetCol%
setlocal enableDelayedExpansion
for /f "tokens=1-4 delims=," %%A in ('parseCSV /e /d ^<test3.csv') do (
  %== Load and decode column values ==%
  %csvGetCol% A "," %%A
  %csvGetCol% B "," %%B
  %csvGetCol% C "," %%C
  %csvGetCol% D "," %%D
  %== Print results ==%
  echo ---------------------
  for %%V in (A B C D) do echo(%%V=!%%V!
  echo(
)
Output is identical to test3.bat


If the CSV file is very large, then it is much more efficient to save the output of parseCSV.bat to a temporary file, and then use the FOR /F loop to read the temporary file.


There are still a couple inherent limitations that are true for all FOR /F usage:

1) A single FOR /F cannot parse more than 32 columns. Update 2019-05-30: See this post for methods to enable parseCSV to parse any number of columns.

2) Batch line length restriction of 8191 characters can still be a problem.


Dave Benham
Last edited by dbenham on 08 Jan 2015 10:14, edited 3 times in total.

Samir
Posts: 345
Joined: 16 Jul 2013 12:00
Location: HSV
Contact:

Re: Safely parse nearly any CSV with parseCSV.bat

#2 Post by Samir » 08 Jul 2014 11:41

Very nice. I need to remember this for an upcoming batch where I'll need to a parse by tabs...

foxidrive
Expert
Posts: 6033
Joined: 10 Feb 2012 02:20

Re: Safely parse nearly any CSV with parseCSV.bat

#3 Post by foxidrive » 03 Sep 2014 08:16

Not sure if I have the right end of the stick - I tend to skim read large texts - but is this the way to handle pipe delimited .csv Dave?

Code: Select all

d:\>type file.csv |ParseCSV.bat /I:"|" >file2.csv
| was unexpected at this time.

Squashman
Expert
Posts: 4106
Joined: 23 Dec 2011 13:59

Re: Safely parse nearly any CSV with parseCSV.bat

#4 Post by Squashman » 03 Sep 2014 09:06

Maybe you can't pipe input to ParseCSV??? All of Dave's examples are redirected input.

dbenham
Expert
Posts: 2251
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Safely parse nearly any CSV with parseCSV.bat

#5 Post by dbenham » 03 Sep 2014 09:50

Whoa :!: :shock: :? :oops: :evil:

Sorry guys. I don't know what I have done. I swear this used to work with a poison character as a delimiter. But now it clearly doesn't. I tested all those examples, and the output was real.

I must have made some change after testing that broke things, without realizing it.

I'll try to fix this to work properly with poison character delimiters within the next couple days.

BTW - It should work with either pipe or redirection, although I think redirection is a bit more efficient.


Dave Benham

dbenham
Expert
Posts: 2251
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Safely parse nearly any CSV with parseCSV.bat

#6 Post by dbenham » 03 Sep 2014 10:12

OK, I must have made an invalid change to the test2.bat example after testing.

When quoting a delimiter, the entire option must be quoted, not just the value. So "/i:|" is correct, and /i:"|" fails. I don't understand the mechanism, but I don't have time to investigate at the moment.

I've edited the top post to fix test2.bat.


Dave Benham

foxidrive
Expert
Posts: 6033
Joined: 10 Feb 2012 02:20

Re: Safely parse nearly any CSV with parseCSV.bat

#7 Post by foxidrive » 03 Sep 2014 10:34

Thanks Dave.

dbenham
Expert
Posts: 2251
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Safely parse nearly any CSV with parseCSV.bat

#8 Post by dbenham » 18 Feb 2017 23:28

Squashman presented a good idea in his comment to this StackOverflow question dealing with CSV format conversion. The OP there wanted to change the delimiter from semicolon to comma. Squashman suggests using parseCSV.bat to do the conversion. I had never thought of this use for my utility, and I think it is a good idea. However, there is one problem in that escaped quote literals are transformed into un-escaped literals for the purpose of FOR /F parsing. This is not desirable for a simple conversion.

So I decided to add some new options to make parseCSV.bat generally useful for nearly any type of CSV format transformation, as long as both the input and output use delimiter separated values. While I was at it, I modified the /I and /O options to accept escape sequences for the delimiters. This is especially useful for specifying tab as the input delimiter using /I:\t.

There are two new options, one of which has multiple forms:

/L - specifies that all quotes within the input are quote literals. This is useful for CSV input that does not follow the standard of quoting values that contain quotes, and doubling quote literals.

/Q:E - Escape quote literals as "" within the output. Also encloses each column within quotes.

/Q:N - Do not quote column values in the output, and quote literals are not escaped (not doubled). This is only useful if you know that none of the column values contain the output delimiter.

For completeness, there is also /Q:L - which is the same as the original default behavior: All output columns are quoted, and quote literals are not escaped (not doubled).

I've updated the code in the first post of this thread to version 1.1, with the above changes.


Dave Benham

dbenham
Expert
Posts: 2251
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Safely parse nearly any CSV with parseCSV.bat

#9 Post by dbenham » 19 Feb 2017 22:53

I updated the first post to version 1.2 with the following changes:

  • Fixed a bug that caused all output delimiters to be escaped if /E was combined /Q:N.
  • Modified behavior to write Windows style lines with \r\n terminators by default, and added the /U option to specify Unix style output with \n terminators.
  • Improved performance by writing entire lines instead of one character at a time.

Dave Benham

dbenham
Expert
Posts: 2251
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Safely parse nearly any CSV with parseCSV.bat

#10 Post by dbenham » 25 May 2019 00:35

OMG :shock: :oops:

I had a nasty bug in version 1.2 that prevented the /D option from working if /E was also used.

I updated the version to 1.3 in the first post of this thread.


Dave Benham

dbenham
Expert
Posts: 2251
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Safely parse nearly any CSV with parseCSV.bat v1.3

#11 Post by dbenham » 30 May 2019 19:42

So parseCSV.bat by itself does not help when you have more than 32 columns to parse. But Aacini began a thread investigating how to access large numbers of columns. Through a group effort, we were able to develop routines to easily access any number of tokens, as many as will fit within the 8191 line length limit. It is not difficult to combine parseCSV.bat with the above such that you can parse any CSV as long as you don't run into the 8191 line length limit. Note that parseCSV can increase the length of the line, so the source line limit may be less than 8191.

Below is a CSV that contains all possible complications when the number of fields is constant:
  • Comma field delimiter within field values
  • Empty field values
  • Quote literals within field values
  • Newlines within field values
  • All possible poison characters within field values
  • ! and ^ within field values, and we want delayed expansion enabled within our parsing loop
  • More than 32 fields per line (36 to be exact)
test1.csv

Code: Select all

C1,C2,C3,C4,C5,C6,C7,C8,C9,C10,C11,C12,C13,C14,C15,C16,C17,C18,C19,C20,C21,C22,C23,C24,C25,C26,C27,C28,C29,C30,C31,C32,C33,C34,C35,C36
;,"Line1
Line2","""quoted ,"" and non quoted ,",(<\S\D\N&|>)^,"(<\S\D\N,&|>)^ ""(<\S\D\N,&|>)^""",(<\S\D\N&|>)^!,"(<\S\D\N,&|>)^! ""(<\S\D\N,&|>)^!""","Line1:(<\S\D\N,&|>)^!
Line2:""(<\S\D\N,
Line3:&|>)^!""",,,,,,,,,,,,,,,,,,,,,,,,,,,,Last Column!
The goal of this demonstration is to create a 2 dimensional array containing all columns of all rows. Given the developments noted above, it is pretty simple to do :)
The code assumes we cannot rule out any character at the beginning of each line, so EOL must be undefinied.

Code: Select all

@echo off

setlocal disableDelayedExpansion
call :define_csvGetCol
setlocal enableDelayedExpansion
call :defineFor For36InA A 36 ","

set "row=0"
for /f delims^=^ eol^= %%A in ('parseCSV /D /E ^<"test1.csv"') do (
  set /a row+=1
  setlocal disableDelayedExpansion
  %For36InA% (
    endlocal
    for /l %%C in (1 1 36) do (
      call :decodeAndStoreToken array.!row!.%%C "," !$%%C!
    )
  )
)

:: Show results
for /l %%R in (1 1 %row%) do for /l %%C in (1 1 36) do echo Row %%R Col %%C = [!array.%%R.%%C!]
exit /b


::==============================================================================================
:: The remainder of this script is constant for any CSV with a fixed number of fields per line,
:: regardless how many columns, or what delimiters are used, etc.

:decodeAndStoreToken  RtnVar  Delim  TokenName
::
:: Decodes FOR /F token %%~TokenName and stores result in RtnVar
::
for %%. in (.) do (
  %csvGetCol% %1 %2 %%~%3
)
exit /b


:defineFor  ForMacroName  InputVar  TokenCount  [DelimChars]
::
:: Defines a macro to be used for parsing an arbitrary number of tokens from
:: a FOR variable string. The macro always parses one additional token to hold
:: any remainder of the line that lies beyond the TokenCount tokens.
::
::    ForMacroName = The name of the macro variable to be created.
::
::    InputVar = The name of the FOR variable that contains the string of tokens.
::
::    TokenCount = The number of tokens to parse.
::                 The maximum value is 2304 (256*9)
::
::    DelimChars = An optional string of one or more characters, each of which
::                 is treated as a token delimiter. Default is "<tab><space>".
::                 If <space> is included in the string, then it must be the
::                 last character in the string.
::
:: Tokens are accessed by $n variables.
:: For example, %%%$45% would represent the 45th token.
::
:: FOR /F modifiers may be freely used. For example, %%~nx%$10% would treat the
:: 10th token as a file path, and would expand to the file name and extension.
::
:: Normally, a single FOR /F is limited to 31 tokens, but the macro supports
:: many more, theoretically as many as 2304. However, each line to be parsed
:: must be less than 8191 characters in length.
::
:: This function may be called with delayed expansion enabled or disabled.
:: It is generally recommended that the macro be used with delayed expansion
:: disabled so that tokens containing ! are not corrupted.
::
:: This function automatically calls :defineForChars to define enough $n
:: variables to satisfy the TokenCount+1 tokens.
::
:: Example usage - Suppose you want to parse a well behaved CSV file named
:: test.csv that contains 300 columns. All lines must have the same number of
:: columns, and no column value may contain a comma.
::
:: The following code will correctly parse each data line of test.csv:
::
::    @echo off
::    setlocal disableDelayedExpansion
::    call :defineFor For300InA A 300 ","
::    for /f "skip=1 delims=" %%A in (test.csv) do %For300InA% (
::      echo token   1 = %%%$1%
::      echo token   2 = %%%$2%
::      echo ...
::      echo token 300 = %%%$300%
::    )
::
:: If the first token might begin with any character, including the default
:: EOL character, then the FOR /F line should be changed as follows:
::
::    for /f skip^=1^ delims^=^ eol^= %%A in (test.csv) do %For300InA% (
::
if %$max%0 gtr %~30 goto :defineForInternal
set /a "$max=(%~3)/256+1"
call :defineForChars %$max%
:defineForInternal
setlocal enableDelayedExpansion
set "delims=%~4"
if not defined delims set "delims= "
set "in=%~2"
set "macro="
set /a max=31, end=0
for /l %%N in (1 31 %~3) do (
  if %%N neq 1 set "in=!$%%N!"
  set /a end+=31
  if !end! gtr %~3 set /a "max=%~3-%%N+1"
  set "macro=!macro! for /f "eol=!delims:~0,1! tokens=1-!max!* delims=!delims!" %%!$%%N! in ("%%!in!") do"
)
for /f "delims=" %%A in ("!macro! ") do endlocal & set "%~1=%%A"
exit /b


:defineForChars  Count
::
:: Defines variables to be used as FOR /F tokens, from $1 to $n, where n = Count*256
:: Also defines $max = Count*256.
:: No other variables are defined or tampered with.
::
:: Once defined, the variables are very useful for parsing lines with many tokens, as
:: the values are guaranteed to be contiguous within the FOR /F mapping scheme.
::
:: For example, you can use $1 as a FOR variable by using %%%$1%.
::
::   FOR /F "TOKENS=1-31" %%%$1% IN (....) DO ...
::
::      %%%$1% = token 1, %%%$2% = token 2, ... %%%$31% = token 31
::
:: This routine never uses SETLOCAL, and works regardless whether delayed expansion
:: is enabled or disabled.
::
:: Three temporary files are created and deleted in the %TEMP% folder, and the active
:: code page is temporarily set to 65001, and then restored to the starting value
:: before returning. Once defined, the $n variables can be used with any code page.
::
for /f "tokens=2 delims=:." %%P in ('chcp') do call :DefineForCharsInternal %1
exit /b
:defineForCharsInternal
set /a $max=%1*256
>"%temp%\forVariables.%~1.hex.txt" (
  echo FF FE
  for %%H in (
    "0 1 2 3 4 5 6 7 8 9 A B C D E F"
  ) do for /l %%N in (1 1 %~1) do for %%A in (%%~H) do for %%B in (%%~H) do (
    echo %%A%%B 0%%N 0D 00 0A 00
  )
)
>nul certutil.exe -decodehex -f "%temp%\forVariables.%~1.hex.txt" "%temp%\forVariables.%~1.utf-16le.bom.txt"
>nul chcp 65001
>"%temp%\forVariables.%~1.utf8.txt" type "%temp%\forVariables.%~1.utf-16le.bom.txt"
<"%temp%\forVariables.%~1.utf8.txt" (for /l %%N in (1 1 %$max%) do set /p "$%%N=")
for %%. in (dummy) do >nul chcp %%P  
del "%temp%\forVariables.%~1.*.txt"
exit /b


:define_csvGetCol
::
:: Defines variable LF and macro csvGetCol to be used with
:: parseCSV.bat to parse nearly any CSV file.
::
:: This script must be called with delayedExpansion disabled.
::
:: The %csvGetCol% macro must be used with delayedExpansion enabled.
::
:: Example usage:
::
::   @echo off
::   setlocal disableDelayedExpansion
::   call define_csvGetCol
::   setlocal enableDelayedExpansion
::   for /f "tokens=1-3 delims=," %%A in ('parseCSV /d /e ^<test.csv') do (
::     %== Load and decode column values ==%
::     %csvGetCol% A "," %%A
::     %csvGetCol% B "," %%B
::     %csvGetCol% C "," %%C
::     %== Display the result ==%
::     echo ----------------------
::     for %%V in (A B C) do echo %%V=!%%V!
::     echo(
::   )
::
:: Written by Dave Benham
::

:: Delayed expansion must be disabled during macro definition

:: Define LF to contain a linefeed (0x0A) character
set ^"LF=^

^" The empty line above is critical - DO NOT REMOVE

:: define a newline with line continuation
set ^"\n=^^^%LF%%LF%^%LF%%LF%^^"

:: Define csvGetCol
:: %csvGetCol%  envVarName  "Delimiter"  FORvar
set csvGetCol=for %%# in (1 2) do if %%#==2 (%\n%
setlocal enableDelayedExpansion^&for /f "tokens=1,2*" %%1 in ("!args!") do (%\n%
  endlocal^&endlocal%\n%
  set "%%1=%%~3"!%\n%
  if defined %%1 (%\n%
    for %%L in ("!LF!") do set "%%1=!%%1:\N=%%~L!"%\n%
    set "%%1=!%%1:\D=%%~2!"%\n%
    set "%%1=!%%1:\S=\!"%\n%
  )%\n%
)) else setlocal disableDelayedExpansion ^& set args=
--OUTPUT--

Code: Select all

Row 1 Col 1 = [C1]
Row 1 Col 2 = [C2]
Row 1 Col 3 = [C3]
Row 1 Col 4 = [C4]
Row 1 Col 5 = [C5]
Row 1 Col 6 = [C6]
Row 1 Col 7 = [C7]
Row 1 Col 8 = [C8]
Row 1 Col 9 = [C9]
Row 1 Col 10 = [C10]
Row 1 Col 11 = [C11]
Row 1 Col 12 = [C12]
Row 1 Col 13 = [C13]
Row 1 Col 14 = [C14]
Row 1 Col 15 = [C15]
Row 1 Col 16 = [C16]
Row 1 Col 17 = [C17]
Row 1 Col 18 = [C18]
Row 1 Col 19 = [C19]
Row 1 Col 20 = [C20]
Row 1 Col 21 = [C21]
Row 1 Col 22 = [C22]
Row 1 Col 23 = [C23]
Row 1 Col 24 = [C24]
Row 1 Col 25 = [C25]
Row 1 Col 26 = [C26]
Row 1 Col 27 = [C27]
Row 1 Col 28 = [C28]
Row 1 Col 29 = [C29]
Row 1 Col 30 = [C30]
Row 1 Col 31 = [C31]
Row 1 Col 32 = [C32]
Row 1 Col 33 = [C33]
Row 1 Col 34 = [C34]
Row 1 Col 35 = [C35]
Row 1 Col 36 = [C36]
Row 2 Col 1 = [;]
Row 2 Col 2 = [Line1
Line2]
Row 2 Col 3 = [quoted ," and non quoted ,]
Row 2 Col 4 = [(<\S\D\N&|>)^]
Row 2 Col 5 = [(<\S\D\N,&|>)^ "(<\S\D\N,&|>)^"]
Row 2 Col 6 = [(<\S\D\N&|>)^!]
Row 2 Col 7 = [(<\S\D\N,&|>)^! "(<\S\D\N,&|>)^!"]
Row 2 Col 8 = [Line1:(<\S\D\N,&|>)^!
Line2:"(<\S\D\N,
Line3:&|>)^!"]
Row 2 Col 9 = []
Row 2 Col 10 = []
Row 2 Col 11 = []
Row 2 Col 12 = []
Row 2 Col 13 = []
Row 2 Col 14 = []
Row 2 Col 15 = []
Row 2 Col 16 = []
Row 2 Col 17 = []
Row 2 Col 18 = []
Row 2 Col 19 = []
Row 2 Col 20 = []
Row 2 Col 21 = []
Row 2 Col 22 = []
Row 2 Col 23 = []
Row 2 Col 24 = []
Row 2 Col 25 = []
Row 2 Col 26 = []
Row 2 Col 27 = []
Row 2 Col 28 = []
Row 2 Col 29 = []
Row 2 Col 30 = []
Row 2 Col 31 = []
Row 2 Col 32 = []
Row 2 Col 33 = []
Row 2 Col 34 = []
Row 2 Col 35 = []
Row 2 Col 36 = [Last Column!]
It should not be difficult to adapt the code to work with a variable number of fields per line.


Dave Benham

Post Reply