Reading Data in a CSV, removing lines, and adding entries.

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

Post Reply
Message
Author
ws_systems
Posts: 4
Joined: 03 Nov 2013 17:35

Reading Data in a CSV, removing lines, and adding entries.

#1 Post by ws_systems » 03 Nov 2013 17:57

Long time reader! Some of the smartest guys on the web are her on dostips.com so hopefully someone skilled can help me out! Tried coding this and gave up. Figured start fresh.

Scenario: Every night my system downloads about 200 csv files. Recently, the format of them has changed. Now appended to every CSV file are the first 4 rows of data which are not in CSV format.
I have always been able to merge them into one csv. However now, I have the repeating 4 rows at the begging of every CSV! I would like to clean up each CSV Files 4 rows at the top which are not in comma format. Heres the tricky part. Some of the data in the 4 rows is different. The account number.

In the 2nd row of the CSVthe exact information "Account Number - XYZ" In past times, I would append the file name of the csv to the first entry in the row of the csv files. That way when all the csv files were merged I could tell which csv account it came from. My format has now changed where the CSV files all have temp names. The system now stores the account number on the 2nd row of the CSV.

Next comes the header. Each file has the same exact header at the start of the CSV. This was originally my first row before the format changed. I was simply able to remove the first row and merge in previous trials. I only need the header for the columns at the very start of the CSV. Since my tech lingo might be off, Ill try to provide a sample of the data I have, and the out put I am looking for. I would be forever grateful.

Heck anyone that can solve this please give your PAYPAL and I will GLADLY! send you a donation thank you :)

-Mike

Sample Data Provided from Downloaded CSV Files.

===================================================
Owner Information
Account Number - XYZ
Owner Address

Heading1, Heading 2, Heading 3
"data1","data2","data3"
"data1","data2","data3"
"data1","data2","data3"

===================================================

**The owner information Line can be removed, but I need to read the account number that follow the hypen after "Account Number -" If at all possible have that appended to its own column before data1. Then removing the rest of the 4 lines in the beginning. There is a line that is completely blank. Line 4. Dont ask me why someone decided to include a blank line in a CSV! The 5th line which contains the heading names is not even needed either. My imports have been the same for the last 20 years until someone changed our bank recently.

Final Product:

=================================================
"XYZ","data1","data2","data3"
"XYZ","data1","data2","data3"
"XYZ","data1","data2","data3"

=================================================

Thanks everyone!

aGerman
Expert
Posts: 4743
Joined: 22 Jan 2010 18:01
Location: Germany

Re: Reading Data in a CSV, removing lines, and adding entrie

#2 Post by aGerman » 03 Nov 2013 18:18

Some things I want to know:
- What's the maximum length of the rows?
- Do you know what characters are used for the line breaks? (Hopefully you could check using a HEX editor)
- Are all the files in the same directory?
- Should the results be written back to the files or should they be merged in a separate file? (If so, what's the filename?)

Regards
aGerman

PS: In the mean time you could try that:

Code: Select all

@echo off &setlocal
set "folder=C:\wherever\your\CSVs\are"
set "outfile=merge.csv"

pushd "%folder%"
>"%outfile%" type nul
for /f "delims=" %%i in ('dir /a-d /b *.csv') do (
  set "infile=%%i"
  setlocal enabledelayedexpansion
  <"!infile!" >>"!outfile!" (
    for /f %%j in ('type "!infile!"^|find /c /v ""') do (
      set /a "n=%%j - 5"
      set /p "="
      set "ln=" &set /p "ln="
      set /p "=" &set /p "=" &set /p "="
      for /f "tokens=3*" %%k in ("!ln!") do set "account=%%l"
      for /l %%k in (1 1 !n!) do (
        set "ln=" &set /p "ln="
        echo("!account!",!ln!
      )
    )
  )
  endlocal
)
popd

ws_systems
Posts: 4
Joined: 03 Nov 2013 17:35

Re: Reading Data in a CSV, removing lines, and adding entrie

#3 Post by ws_systems » 03 Nov 2013 18:54

aGerman wrote:Some things I want to know:
- What's the maximum length of the rows?
- Do you know what characters are used for the line breaks? (Hopefully you could check using a HEX editor)
- Are all the files in the same directory?
- Should the results be written back to the files or should they be merged in a separate file? (If so, what's the filename?)

Regards
aGerman



- What's the maximum length of the rows?
On average each file has anywhere from 5 rows to a max of 1000

- Do you know what characters are used for the line breaks? (Hopefully you could check using a HEX editor)

- Are all the files in the same directory?

All of the files are in the same directory

- Should the results be written back to the files or should they be merged in a separate file? (If so, what's the filename?)
they can be written to each file before merged. The problem is the file names now have been changed to download-date-time etc The format is now always changing.
Last edited by ws_systems on 03 Nov 2013 19:02, edited 1 time in total.

ws_systems
Posts: 4
Joined: 03 Nov 2013 17:35

Re: Reading Data in a CSV, removing lines, and adding entrie

#4 Post by ws_systems » 03 Nov 2013 19:01

aGerman wrote:Some things I want to know:

Code: Select all

@echo off &setlocal
set "folder=C:\wherever\your\CSVs\are"
set "outfile=merge.csv"

pushd "%folder%"
>"%outfile%" type nul
for /f "delims=" %%i in ('dir /a-d /b *.csv') do (
  set "infile=%%i"
  setlocal enabledelayedexpansion
  <"!infile!" >>"!outfile!" (
    for /f %%j in ('type "!infile!"^|find /c /v ""') do (
      set /a "n=%%j - 5"
      set /p "="
      set "ln=" &set /p "ln="
      set /p "=" &set /p "=" &set /p "="
      for /f "tokens=3*" %%k in ("!ln!") do set "account=%%l"
      for /l %%k in (1 1 !n!) do (
        set "ln=" &set /p "ln="
        echo("!account!",!ln!
      )
    )
  )
  endlocal
)
popd



Ok we are making progress!

It made a csv with 9039 lines. The first 4868 lines had the first column with:

"Owner Information", "file_name_of_csv", data1","data2","data3"

which is weird.. dont need.

However the remaining rows were perfectly formated. removing the text "Account Number - XYZ" and Substituting the first column of each row with XYZ. XYZ changes for each csv. Gettin close!

"XYZ","data1","data2","data3"
Last edited by ws_systems on 03 Nov 2013 19:06, edited 1 time in total.

aGerman
Expert
Posts: 4743
Joined: 22 Jan 2010 18:01
Location: Germany

Re: Reading Data in a CSV, removing lines, and adding entrie

#5 Post by aGerman » 03 Nov 2013 19:03

My bad, sorry.

Code: Select all

@echo off &setlocal
set "folder=C:\wherever\your\CSVs\are"
set "outfile=merge.csv"

pushd "%folder%"
if exist "%outfile%" del "%outfile%"
set "tmpfile=%temp%\merge.cs~"
>"%tmpfile%" type nul
for /f "delims=" %%i in ('dir /a-d /b *.csv') do (
  set "infile=%%i"
  setlocal enabledelayedexpansion
  <"!infile!" >>"!tmpfile!" (
    for /f %%j in ('type "!infile!"^|find /c /v ""') do (
      set /a "n=%%j - 5"
      set /p "="
      set "ln=" &set /p "ln="
      set /p "=" &set /p "=" &set /p "="
      for /f "tokens=3*" %%k in ("!ln!") do set "account=%%l"
      for /l %%k in (1 1 !n!) do (
        set "ln=" &set /p "ln="
        echo("!account!",!ln!
      )
    )
  )
  endlocal
)
move "%tmpfile%" "%outfile%"
popd

Regards
aGerman

ws_systems
Posts: 4
Joined: 03 Nov 2013 17:35

Re: Reading Data in a CSV, removing lines, and adding entrie

#6 Post by ws_systems » 03 Nov 2013 19:16

aGerman,

My appologies. Im an Idiot. Both of your codes work flawless! I forgot and left my own merge.csv file that I had been working on. Essentially once your code hit that CSV in the folder it went nuts! Both of your work flawless! I am curious if you can explain which one is more efficient or which you prefer as a safer method. :)

I owe you many thanks!

aGerman
Expert
Posts: 4743
Joined: 22 Jan 2010 18:01
Location: Germany

Re: Reading Data in a CSV, removing lines, and adding entrie

#7 Post by aGerman » 03 Nov 2013 19:25

You're welcome!
Use the 2nd code. It deletes the merge.csv if you forgot it and it prevents the processing of a semi finished merge.csv that would already exist in the folder if you would use the 1st code.

Regards
aGerman

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

Re: Reading Data in a CSV, removing lines, and adding entrie

#8 Post by foxidrive » 03 Nov 2013 22:14

Here's a method that uses two helper batch files - findrepl.bat by Aacini and repl.bat by Dave Benham, both on this site.
Place them in the same folder as this batch file below.

`findrepl.bat` from - https://www.dropbox.com/s/rfdldmcb6vwi9xc/findrepl.bat
`repl.bat` from - https://www.dropbox.com/s/qidqwztmetbvklt/repl.bat

It may be faster than plain batch, and it is certainly more robust with the characters that plain batch can have issues with.

Line 4 below extracts the information from line 2 of each CSV file, which is placed in %%b
Line 5 below extracts line 6 and after of each .CSV file and writes it to 'merged.csv' with the data in %%b as the first field.

Code: Select all

@echo off
del merged.csv 2>nul
for /f "delims=" %%a in (' dir /b /a-d *.csv ') do (
   for /f "delims=" %%b in ('type "%%a"^|findrepl /o:2:2') do (
      type "%%a"|findrepl /o:6|repl "^" "\x22%%b\x22,$&" x >>"merged.csv"
   )
)

Post Reply