Sorting a 'Column' in a CSV or tab-delimited File

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

Message
Author
Samir
Posts: 384
Joined: 16 Jul 2013 12:00
Location: HSV
Contact:

Sorting a 'Column' in a CSV or tab-delimited File

#1 Post by Samir » 17 Jun 2014 08:52

I somehow remember some code for this on here, but I've searched and searched to no avail. :(

Imaging a txt file with 4 columns of data delimited by tab or comma. Each column has different length of strings. Because each column's stringlength is different, you can't simply use sort /+n to sort the file.

So what can be used to the file using a particular column? I know there's an answer out there. Thank you in advance.

Aacini
Expert
Posts: 1885
Joined: 06 Dec 2011 22:15
Location: México City, México
Contact:

Re: Sorting a 'Column' in a CSV or tab-delimited File

#2 Post by Aacini » 17 Jun 2014 14:22

Code: Select all

@echo off
setlocal

rem Assume a .csv file with 4 columns separated by comma
for /F "tokens=1-4 delims=," %%a in (theFile.csv) do (
   rem Define the sorting column in next line: %%a=1, %%b=2, etc...
   set "line["%%c"]=%%a,%%b,%%c,%%d"
)

for /F "tokens=1* delims==" %%a in ('set line[') do echo %%b


Antonio

Squashman
Expert
Posts: 4465
Joined: 23 Dec 2011 13:59

Re: Sorting a 'Column' in a CSV or tab-delimited File

#3 Post by Squashman » 17 Jun 2014 14:30

Antonio,
How big of a file have you tested this with? I work with some rather large csv files some times.
This might come in handy some day.

einstein1969
Expert
Posts: 941
Joined: 15 Jun 2012 13:16
Location: Italy, Rome

Re: Sorting a 'Column' in a CSV or tab-delimited File

#4 Post by einstein1969 » 17 Jun 2014 15:10

Hi Squashman,

Have you a big csv to share for test? I have some idea ... :idea:

einstein1969

Aacini
Expert
Posts: 1885
Joined: 06 Dec 2011 22:15
Location: México City, México
Contact:

Re: Sorting a 'Column' in a CSV or tab-delimited File

#5 Post by Aacini » 17 Jun 2014 15:34

Supposedly, the environment may hold a file up to 64 MB, but in practice the process becomes every time slower as the environment grows. Further details on this post.

Antonio

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Sorting a 'Column' in a CSV or tab-delimited File

#6 Post by dbenham » 17 Jun 2014 19:27

Here is another option. This example sorts by the third comma delimited column.

Code: Select all

@echo off
(
  for /f "tokens=1-3* delims=," %%A in (test.csv) do @echo(%%C,%%A,%%B,%%C,%%D
)|sort|for /f "tokens=1* delims=," %%A in ('findstr "^"') do @echo(%%B

If the file is large, then it will be much more efficient to use a temporary intermediate file

Code: Select all

@echo off
(for /f "tokens=1-3* delims=," %%A in (test.csv) do @echo(%%C,%%A,%%B,%%C,%%D)|sort >test.csv.tmp
for /f "tokens=1* delims=," %%A in (test.csv.tmp) do @echo(%%B
del test.csv.tmp



Dave Benham

Samir
Posts: 384
Joined: 16 Jul 2013 12:00
Location: HSV
Contact:

Re: Sorting a 'Column' in a CSV or tab-delimited File

#7 Post by Samir » 19 Jun 2014 17:35

Wonderful answers gentlemen! Thank you so much!

I'll be using this to parse a csv into another format. 8)

Samir
Posts: 384
Joined: 16 Jul 2013 12:00
Location: HSV
Contact:

Re: Sorting a 'Column' in a CSV or tab-delimited File

#8 Post by Samir » 27 Aug 2014 22:00

Works great! Using the following to sort a 9 column csv:

Code: Select all

@echo off
setlocal

rem Assume a .csv file with 4 columns separated by comma
for /F "tokens=1-9 delims=," %%a in (test.csv) do (
   rem Define the sorting column in next line: %%a=1, %%b=2, etc...
   set "line["%%h"]=%%a,%%b,%%c,%%d,%%e,%%f,%%g,%%h,%%i"
)

for /F "tokens=1* delims==" %%a in ('set line[') do echo %%b


Thank you all very much!

Samir
Posts: 384
Joined: 16 Jul 2013 12:00
Location: HSV
Contact:

Re: Sorting a 'Column' in a CSV or tab-delimited File

#9 Post by Samir » 07 Sep 2014 16:02

dbenham wrote:Here is another option. This example sorts by the third comma delimited column.

Code: Select all

@echo off
(
  for /f "tokens=1-3* delims=," %%A in (test.csv) do @echo(%%C,%%A,%%B,%%C,%%D
)|sort|for /f "tokens=1* delims=," %%A in ('findstr "^"') do @echo(%%B
Dave Benham
I ran across a problem with Aacini's code when multiple rows have a '0' in the column that is being sorted. :(

Dave, I tried modifying your code for 9 columns as such, but it is isn't echoing the first column. What is wrong?

Code: Select all

REM
REM SORT FILE BY CHECK COLUMN
REM
(
  for /f "tokens=1-9 delims=," %%a in (NET.CSV) do @echo(%%a,%%b,%%c,%%d,%%e,%%f,%%g,%%h,%%i)|sort|for /f "tokens=1* delims=," %%a in ('findstr "^"') do @echo(%%b >> SORTED.CSV

Thank you in advance for any assistance. 8)

Samir
Posts: 384
Joined: 16 Jul 2013 12:00
Location: HSV
Contact:

Re: Sorting a 'Column' in a CSV or tab-delimited File

#10 Post by Samir » 07 Sep 2014 16:18

And now I just got another requirement. :( When the value in the column being sorted is the same, a second column needs to be used to sub-sort the rows. Any ideas?

Samir
Posts: 384
Joined: 16 Jul 2013 12:00
Location: HSV
Contact:

Re: Sorting a 'Column' in a CSV or tab-delimited File

#11 Post by Samir » 08 Sep 2014 05:23

Samir wrote:
dbenham wrote:Here is another option. This example sorts by the third comma delimited column.

Code: Select all

@echo off
(
  for /f "tokens=1-3* delims=," %%A in (test.csv) do @echo(%%C,%%A,%%B,%%C,%%D
)|sort|for /f "tokens=1* delims=," %%A in ('findstr "^"') do @echo(%%B
Dave Benham
I ran across a problem with Aacini's code when multiple rows have a '0' in the column that is being sorted. :(

Dave, I tried modifying your code for 9 columns as such, but it is isn't echoing the first column. What is wrong?

Code: Select all

REM
REM SORT FILE BY CHECK COLUMN
REM
(
  for /f "tokens=1-9 delims=," %%a in (NET.CSV) do @echo(%%a,%%b,%%c,%%d,%%e,%%f,%%g,%%h,%%i)|sort|for /f "tokens=1* delims=," %%a in ('findstr "^"') do @echo(%%b >> SORTED.CSV

Thank you in advance for any assistance. 8)
I found the problem--I had left out the sort column in my initial echo statement following the first for. :oops:

So what about the secondary sort column. Any ideas or approaches? I can't fully comprehend what is going on in the code to create a second inner loop or an intermediary file, if that would even be the approach.

Samir
Posts: 384
Joined: 16 Jul 2013 12:00
Location: HSV
Contact:

Re: Sorting a 'Column' in a CSV or tab-delimited File

#12 Post by Samir » 08 Sep 2014 06:12

I don't know what just happened, but the code just made sense!

So the logic is to copy the data from the column that needs to be sorted to the front of each row, and then use regular sort to sort the resulting data.

So to sort based on two columns, data from both sort columns needs to be copied to the front of each row, and then the first two columns need to be stripped off. If my guess is right, the code would look something like this (based on Dave's algorithm):

Code: Select all

@echo off
(
  for /f "tokens=1-3* delims=," %%A in (test.csv) do @echo(%%C,%%B,%%A,%%B,%%C,%%D
)|sort|for /f "tokens=1* delims=," %%A in ('findstr "^"') do @echo(%%C

I'll try to implement this in the next hour or so and see what happens. 8)

Squashman
Expert
Posts: 4465
Joined: 23 Dec 2011 13:59

Re: Sorting a 'Column' in a CSV or tab-delimited File

#13 Post by Squashman » 08 Sep 2014 07:00

If you are sorting on two columns then the tokens need to be "1-2*" for the sort output.

Samir
Posts: 384
Joined: 16 Jul 2013 12:00
Location: HSV
Contact:

Re: Sorting a 'Column' in a CSV or tab-delimited File

#14 Post by Samir » 08 Sep 2014 07:26

The following code correctly sorted a file with 9 columns by 3 columns: 8)

Code: Select all

REM
REM SORT FILE BY CHECK COLUMN
REM
(
  for /f "tokens=1-9 delims=," %%a in (NET.CSV) do @echo(%%h,%%f,%%g,%%a,%%b,%%c,%%d,%%e,%%f,%%g,%%h,%%i)|sort|for /f "tokens=3* delims=," %%a in ('findstr "^"') do @echo(%%b >> SORTED.CSV

Samir
Posts: 384
Joined: 16 Jul 2013 12:00
Location: HSV
Contact:

Re: Sorting a 'Column' in a CSV or tab-delimited File

#15 Post by Samir » 08 Sep 2014 09:32

Squashman wrote:If you are sorting on two columns then the tokens need to be "1-2*" for the sort output.
Thank you Squashman. Did you see the tokens=3* for my 3 column solution? This works, but it is correct?

Post Reply