How to work with CSV containing many columns

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

Post Reply
Message
Author
SIMMS7400
Posts: 539
Joined: 07 Jan 2016 07:47

How to work with CSV containing many columns

#1 Post by SIMMS7400 » 10 Oct 2019 18:56

Hi Folks -

I wrote a piece of code to check for string lengths over 300 and trim back down to 300.

Code: Select all

::-- Set maximum length of string --::
SET "STRINGLEN=300"
SET "TOT_COLS=6"

IF EXIST "datafiletemp.csv" DEL /F /Q "datafiletemp.csv" >nul 2>&1

FOR /F "USEBACKQ tokens=1-%TOT_COLS% delims=," %%A IN ( "datafiletemp.csv" ) DO (
    SET "STRING=%%F" & CALL :STRLEN RESULT STRING
    IF !RESULT! GTR %STRINGLEN% SET "STRING=!STRING:~0,%STRINGLEN%!"
    ECHO %%~A,%%~B,%%~C,%%~D,%%~E,!STRING!>>"datafiletempz.csv"
)


pause
    
    



:STRLEN <resultVar> <stringVar>
(   
    SET "S=!%~2!#"
    SET "LEN=0"
    FOR %%P IN (4096 2048 1024 512 256 128 64 32 16 8 4 2 1) DO (
        IF "!S:~%%P,1!" NEQ "" ( 
            SET /a "LEN+=%%P"
            SET "S=!S:~%%P!"
        )
    )
)
( 
    ENDLOCAL
    SET "%~1=%LEN%"
    EXIT /B
)
However, I just realized I'm going to be getting files with 50+ columns so the above method wont work as the column# I need to search will be 45 or higher. I was thinking, is there a JSCRIPT solution that could solve this?

Thank you!

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: How to work CSV containing many columns

#2 Post by dbenham » 11 Oct 2019 05:28

Your problems are related to working with large CSV files, having nothing to do with trimming a string to 300 characters. So I edited your title.

You ought to consider doing your entire job with a scripting language better suited to the task - PowerShell, VBS, or JScript would all be better.

But there are techniques that should enable you to work within batch.

Accessing more than 32 columns was investigated at Using many "tokens=..." in FOR /F command in a simple way Within that thread penpen made the critical discovery that utf8 can be used to reliably access any token. Beginning at viewtopic.php?f=3&t=7703&start=30#p51604 I introduce convenient routines to utilize the discovery and access many columns.

But that only solves one aspect of working with CSV in batch.

There are other potentially serious issues with using FOR /F to parse CSV
  • Empty columns
  • Column delimiter literals within quoted values
  • Line Feeds within values
At Safely parse nearly any CSV with parseCSV.bat I provide hybrid batch/JScript routines to conveniently deal with the above issues.

And at viewtopic.php?f=3&t=5702#p59742 I demonstrate how to combine the techniques to truly process nearly any CSV within batch.


Dave Benhaqm

SIMMS7400
Posts: 539
Joined: 07 Jan 2016 07:47

Re: How to work with CSV containing many columns

#3 Post by SIMMS7400 » 12 Oct 2019 06:42

HI Dave -

Thanks for this! I'm trying to incorporate the parseCSV with the macro for more than 32 columns on this link: viewtopic.php?f=3&t=5702&p=59742#p59742

For some reason, it doesn't return anything. Also, is there a way to check a column in X position and if the value is > 300 (in length) then trim back to 300 from the left.

For instance, I have a file of 105 columns and I need to do the length check on column 45:

Code: Select all

2019-07-31T17:00:00.000-07:00,2019-08-30T17:00:00.000-07:00,,,,,,,,,,N,N,N,,"Aug-19Purchase Invoices GW Germany","Payables A 1867883000001 1867888 N",Aug-19,08/29/2019,08/31/2019,45,45-DE,132,"PS - Digital",62030,"Hotels and Accommodations",ProdGP,DE,067350,45.132.62030.000000.067350.00.0,471,471,,EUR,EMPLOYEE,E-100121,"Christopher Catchings E-100121",,,"Journal Import Payables 1867888:",,"Journal Import Created",FF-a1t2T000002vuMbQAI-151504,471,"Hotel|Meals - GW Emp Only|Meals - Self - Dinner|Meals - Self - Lunch|Taxi/Train/Public Transit",E-100121,Payables,,ccatchings_Basler-2019-08-10,thu,,,,,,,,,,,,,,,,,,,2019-08-29T20:52:49.000+00:00,18,thu,,471,471,,EUR,08/31/2019,User,"Default Product",2019-08-31T00:00:00.000+00:00,0,471,,"Journal Import 1867888:",000000,"Insurance Product Germany (IPG) - CC Upgrade","Hotels and Accommodations",,,,,,E-100121,DE,,,,"Purchase Invoices",00,"Default ICO",300000001414021,"GW DE PL",,,
Any ideas? I will keep trying to get it to work...thank you, Dave!

SIMMS7400
Posts: 539
Joined: 07 Jan 2016 07:47

Re: How to work with CSV containing many columns

#4 Post by SIMMS7400 » 14 Oct 2019 05:48

Dave -

I ended up with VBS. What I tried to do is spool all the strings greater than 300 to a temp file and then use JSCRIPT to do a find and replace on the original file, however that was taking a very long time to complete. The file is 105 columns X 47k rows.

Code: Select all

strTrimFile = "Trim_List_" & Year(Date) & "_" & Month(Date) & Day(Date) & "_" & Hour(Now) & Minute(Now) & ".csv"
strInputFile = "C:\TEMP\test_TrimColumn_V1\datafile.csv"
strOutputFile = "C:\TEMP\test_TrimColumn_V1\" & strTrimFile
strTrimLen = 300: strColumnPos = 44

Set objDictionary = CreateObject("Scripting.Dictionary")
Set oFS = CreateObject("Scripting.FileSystemObject")
Set oInputFile = oFS.OpenTextFile(strInputFile, 1, True)

If oFS.FileExists(strOutputFile) Then oFS.DeleteFile(strOutputFile)
Set oOutputFile = oFS.OpenTextFile(strOutputFile, 8, True)

Do While Not oInputFile.AtEndOfStream
    
    sLine = oInputFile.ReadLine : strColumn = Split(sLine, ",")(strColumnPos)
    If Not IsNull(strColumn) And Len(strColumn) > strTrimLen And Not objDictionary.Exists(strColumn) Then objDictionary.Add strColumn, strColumne
      
Loop

For Each objKey In objDictionary.Keys
    oOutputFile.WriteLine objKey
Next

oInputFile.Close: oOutputFile.Close

Set oInputFile = oFS.OpenTextFile(strInputFile, 1)
strNewContents = oInputFile.ReadAll

For Each objKey In objDictionary.Keys
    strNewContents = Replace(strNewContents, objKey, Left(objKey,strTrimLen) & """")
Next
    
oInputFile.Close

strOutputFile = "C:\TEMP\test_TrimColumn_V1\datafiletempnew.csv"
Set oOutputFile = oFS.OpenTextFile(strOutputFile, 2, True)
oOutputFile.Write strNewContents

oOutputFile.Close
Set oFS = Nothing
Set objDictionary = Nothing
Set oInputFile = Nothing
Set oOutputFile = Nothing

Post Reply