Page 1 of 2

Sorting a 'Column' in a CSV or tab-delimited File

Posted: 17 Jun 2014 08:52
by Samir
I somehow remember some code for this on here, but I've searched and searched to no avail. :(

Imaging a txt file with 4 columns of data delimited by tab or comma. Each column has different length of strings. Because each column's stringlength is different, you can't simply use sort /+n to sort the file.

So what can be used to the file using a particular column? I know there's an answer out there. Thank you in advance.

Re: Sorting a 'Column' in a CSV or tab-delimited File

Posted: 17 Jun 2014 14:22
by Aacini

Code: Select all

@echo off
setlocal

rem Assume a .csv file with 4 columns separated by comma
for /F "tokens=1-4 delims=," %%a in (theFile.csv) do (
   rem Define the sorting column in next line: %%a=1, %%b=2, etc...
   set "line["%%c"]=%%a,%%b,%%c,%%d"
)

for /F "tokens=1* delims==" %%a in ('set line[') do echo %%b


Antonio

Re: Sorting a 'Column' in a CSV or tab-delimited File

Posted: 17 Jun 2014 14:30
by Squashman
Antonio,
How big of a file have you tested this with? I work with some rather large csv files some times.
This might come in handy some day.

Re: Sorting a 'Column' in a CSV or tab-delimited File

Posted: 17 Jun 2014 15:10
by einstein1969
Hi Squashman,

Have you a big csv to share for test? I have some idea ... :idea:

einstein1969

Re: Sorting a 'Column' in a CSV or tab-delimited File

Posted: 17 Jun 2014 15:34
by Aacini
Supposedly, the environment may hold a file up to 64 MB, but in practice the process becomes every time slower as the environment grows. Further details on this post.

Antonio

Re: Sorting a 'Column' in a CSV or tab-delimited File

Posted: 17 Jun 2014 19:27
by dbenham
Here is another option. This example sorts by the third comma delimited column.

Code: Select all

@echo off
(
  for /f "tokens=1-3* delims=," %%A in (test.csv) do @echo(%%C,%%A,%%B,%%C,%%D
)|sort|for /f "tokens=1* delims=," %%A in ('findstr "^"') do @echo(%%B

If the file is large, then it will be much more efficient to use a temporary intermediate file

Code: Select all

@echo off
(for /f "tokens=1-3* delims=," %%A in (test.csv) do @echo(%%C,%%A,%%B,%%C,%%D)|sort >test.csv.tmp
for /f "tokens=1* delims=," %%A in (test.csv.tmp) do @echo(%%B
del test.csv.tmp



Dave Benham

Re: Sorting a 'Column' in a CSV or tab-delimited File

Posted: 19 Jun 2014 17:35
by Samir
Wonderful answers gentlemen! Thank you so much!

I'll be using this to parse a csv into another format. 8)

Re: Sorting a 'Column' in a CSV or tab-delimited File

Posted: 27 Aug 2014 22:00
by Samir
Works great! Using the following to sort a 9 column csv:

Code: Select all

@echo off
setlocal

rem Assume a .csv file with 4 columns separated by comma
for /F "tokens=1-9 delims=," %%a in (test.csv) do (
   rem Define the sorting column in next line: %%a=1, %%b=2, etc...
   set "line["%%h"]=%%a,%%b,%%c,%%d,%%e,%%f,%%g,%%h,%%i"
)

for /F "tokens=1* delims==" %%a in ('set line[') do echo %%b


Thank you all very much!

Re: Sorting a 'Column' in a CSV or tab-delimited File

Posted: 07 Sep 2014 16:02
by Samir
dbenham wrote:Here is another option. This example sorts by the third comma delimited column.

Code: Select all

@echo off
(
  for /f "tokens=1-3* delims=," %%A in (test.csv) do @echo(%%C,%%A,%%B,%%C,%%D
)|sort|for /f "tokens=1* delims=," %%A in ('findstr "^"') do @echo(%%B
Dave Benham
I ran across a problem with Aacini's code when multiple rows have a '0' in the column that is being sorted. :(

Dave, I tried modifying your code for 9 columns as such, but it is isn't echoing the first column. What is wrong?

Code: Select all

REM
REM SORT FILE BY CHECK COLUMN
REM
(
  for /f "tokens=1-9 delims=," %%a in (NET.CSV) do @echo(%%a,%%b,%%c,%%d,%%e,%%f,%%g,%%h,%%i)|sort|for /f "tokens=1* delims=," %%a in ('findstr "^"') do @echo(%%b >> SORTED.CSV

Thank you in advance for any assistance. 8)

Re: Sorting a 'Column' in a CSV or tab-delimited File

Posted: 07 Sep 2014 16:18
by Samir
And now I just got another requirement. :( When the value in the column being sorted is the same, a second column needs to be used to sub-sort the rows. Any ideas?

Re: Sorting a 'Column' in a CSV or tab-delimited File

Posted: 08 Sep 2014 05:23
by Samir
Samir wrote:
dbenham wrote:Here is another option. This example sorts by the third comma delimited column.

Code: Select all

@echo off
(
  for /f "tokens=1-3* delims=," %%A in (test.csv) do @echo(%%C,%%A,%%B,%%C,%%D
)|sort|for /f "tokens=1* delims=," %%A in ('findstr "^"') do @echo(%%B
Dave Benham
I ran across a problem with Aacini's code when multiple rows have a '0' in the column that is being sorted. :(

Dave, I tried modifying your code for 9 columns as such, but it is isn't echoing the first column. What is wrong?

Code: Select all

REM
REM SORT FILE BY CHECK COLUMN
REM
(
  for /f "tokens=1-9 delims=," %%a in (NET.CSV) do @echo(%%a,%%b,%%c,%%d,%%e,%%f,%%g,%%h,%%i)|sort|for /f "tokens=1* delims=," %%a in ('findstr "^"') do @echo(%%b >> SORTED.CSV

Thank you in advance for any assistance. 8)
I found the problem--I had left out the sort column in my initial echo statement following the first for. :oops:

So what about the secondary sort column. Any ideas or approaches? I can't fully comprehend what is going on in the code to create a second inner loop or an intermediary file, if that would even be the approach.

Re: Sorting a 'Column' in a CSV or tab-delimited File

Posted: 08 Sep 2014 06:12
by Samir
I don't know what just happened, but the code just made sense!

So the logic is to copy the data from the column that needs to be sorted to the front of each row, and then use regular sort to sort the resulting data.

So to sort based on two columns, data from both sort columns needs to be copied to the front of each row, and then the first two columns need to be stripped off. If my guess is right, the code would look something like this (based on Dave's algorithm):

Code: Select all

@echo off
(
  for /f "tokens=1-3* delims=," %%A in (test.csv) do @echo(%%C,%%B,%%A,%%B,%%C,%%D
)|sort|for /f "tokens=1* delims=," %%A in ('findstr "^"') do @echo(%%C

I'll try to implement this in the next hour or so and see what happens. 8)

Re: Sorting a 'Column' in a CSV or tab-delimited File

Posted: 08 Sep 2014 07:00
by Squashman
If you are sorting on two columns then the tokens need to be "1-2*" for the sort output.

Re: Sorting a 'Column' in a CSV or tab-delimited File

Posted: 08 Sep 2014 07:26
by Samir
The following code correctly sorted a file with 9 columns by 3 columns: 8)

Code: Select all

REM
REM SORT FILE BY CHECK COLUMN
REM
(
  for /f "tokens=1-9 delims=," %%a in (NET.CSV) do @echo(%%h,%%f,%%g,%%a,%%b,%%c,%%d,%%e,%%f,%%g,%%h,%%i)|sort|for /f "tokens=3* delims=," %%a in ('findstr "^"') do @echo(%%b >> SORTED.CSV

Re: Sorting a 'Column' in a CSV or tab-delimited File

Posted: 08 Sep 2014 09:32
by Samir
Squashman wrote:If you are sorting on two columns then the tokens need to be "1-2*" for the sort output.
Thank you Squashman. Did you see the tokens=3* for my 3 column solution? This works, but it is correct?