Sorting a 'Column' in a CSV or tab-delimited File
Moderator: DosItHelp
Sorting a 'Column' in a CSV or tab-delimited File
I somehow remember some code for this on here, but I've searched and searched to no avail.
Imaging a txt file with 4 columns of data delimited by tab or comma. Each column has different length of strings. Because each column's stringlength is different, you can't simply use sort /+n to sort the file.
So what can be used to the file using a particular column? I know there's an answer out there. Thank you in advance.
Imaging a txt file with 4 columns of data delimited by tab or comma. Each column has different length of strings. Because each column's stringlength is different, you can't simply use sort /+n to sort the file.
So what can be used to the file using a particular column? I know there's an answer out there. Thank you in advance.
Re: Sorting a 'Column' in a CSV or tab-delimited File
Code: Select all
@echo off
setlocal
rem Assume a .csv file with 4 columns separated by comma
for /F "tokens=1-4 delims=," %%a in (theFile.csv) do (
rem Define the sorting column in next line: %%a=1, %%b=2, etc...
set "line["%%c"]=%%a,%%b,%%c,%%d"
)
for /F "tokens=1* delims==" %%a in ('set line[') do echo %%b
Antonio
Re: Sorting a 'Column' in a CSV or tab-delimited File
Antonio,
How big of a file have you tested this with? I work with some rather large csv files some times.
This might come in handy some day.
How big of a file have you tested this with? I work with some rather large csv files some times.
This might come in handy some day.
-
- Expert
- Posts: 941
- Joined: 15 Jun 2012 13:16
- Location: Italy, Rome
Re: Sorting a 'Column' in a CSV or tab-delimited File
Hi Squashman,
Have you a big csv to share for test? I have some idea ...
einstein1969
Have you a big csv to share for test? I have some idea ...
einstein1969
Re: Sorting a 'Column' in a CSV or tab-delimited File
Supposedly, the environment may hold a file up to 64 MB, but in practice the process becomes every time slower as the environment grows. Further details on this post.
Antonio
Antonio
Re: Sorting a 'Column' in a CSV or tab-delimited File
Here is another option. This example sorts by the third comma delimited column.
If the file is large, then it will be much more efficient to use a temporary intermediate file
Dave Benham
Code: Select all
@echo off
(
for /f "tokens=1-3* delims=," %%A in (test.csv) do @echo(%%C,%%A,%%B,%%C,%%D
)|sort|for /f "tokens=1* delims=," %%A in ('findstr "^"') do @echo(%%B
If the file is large, then it will be much more efficient to use a temporary intermediate file
Code: Select all
@echo off
(for /f "tokens=1-3* delims=," %%A in (test.csv) do @echo(%%C,%%A,%%B,%%C,%%D)|sort >test.csv.tmp
for /f "tokens=1* delims=," %%A in (test.csv.tmp) do @echo(%%B
del test.csv.tmp
Dave Benham
Re: Sorting a 'Column' in a CSV or tab-delimited File
Wonderful answers gentlemen! Thank you so much!
I'll be using this to parse a csv into another format.
I'll be using this to parse a csv into another format.
Re: Sorting a 'Column' in a CSV or tab-delimited File
Works great! Using the following to sort a 9 column csv:
Thank you all very much!
Code: Select all
@echo off
setlocal
rem Assume a .csv file with 4 columns separated by comma
for /F "tokens=1-9 delims=," %%a in (test.csv) do (
rem Define the sorting column in next line: %%a=1, %%b=2, etc...
set "line["%%h"]=%%a,%%b,%%c,%%d,%%e,%%f,%%g,%%h,%%i"
)
for /F "tokens=1* delims==" %%a in ('set line[') do echo %%b
Thank you all very much!
Re: Sorting a 'Column' in a CSV or tab-delimited File
I ran across a problem with Aacini's code when multiple rows have a '0' in the column that is being sorted.dbenham wrote:Here is another option. This example sorts by the third comma delimited column.Dave BenhamCode: Select all
@echo off
(
for /f "tokens=1-3* delims=," %%A in (test.csv) do @echo(%%C,%%A,%%B,%%C,%%D
)|sort|for /f "tokens=1* delims=," %%A in ('findstr "^"') do @echo(%%B
Dave, I tried modifying your code for 9 columns as such, but it is isn't echoing the first column. What is wrong?
Code: Select all
REM
REM SORT FILE BY CHECK COLUMN
REM
(
for /f "tokens=1-9 delims=," %%a in (NET.CSV) do @echo(%%a,%%b,%%c,%%d,%%e,%%f,%%g,%%h,%%i)|sort|for /f "tokens=1* delims=," %%a in ('findstr "^"') do @echo(%%b >> SORTED.CSV
Thank you in advance for any assistance.
Re: Sorting a 'Column' in a CSV or tab-delimited File
And now I just got another requirement. When the value in the column being sorted is the same, a second column needs to be used to sub-sort the rows. Any ideas?
Re: Sorting a 'Column' in a CSV or tab-delimited File
I found the problem--I had left out the sort column in my initial echo statement following the first for.Samir wrote:I ran across a problem with Aacini's code when multiple rows have a '0' in the column that is being sorted.dbenham wrote:Here is another option. This example sorts by the third comma delimited column.Dave BenhamCode: Select all
@echo off
(
for /f "tokens=1-3* delims=," %%A in (test.csv) do @echo(%%C,%%A,%%B,%%C,%%D
)|sort|for /f "tokens=1* delims=," %%A in ('findstr "^"') do @echo(%%B
Dave, I tried modifying your code for 9 columns as such, but it is isn't echoing the first column. What is wrong?Code: Select all
REM
REM SORT FILE BY CHECK COLUMN
REM
(
for /f "tokens=1-9 delims=," %%a in (NET.CSV) do @echo(%%a,%%b,%%c,%%d,%%e,%%f,%%g,%%h,%%i)|sort|for /f "tokens=1* delims=," %%a in ('findstr "^"') do @echo(%%b >> SORTED.CSV
Thank you in advance for any assistance.
So what about the secondary sort column. Any ideas or approaches? I can't fully comprehend what is going on in the code to create a second inner loop or an intermediary file, if that would even be the approach.
Re: Sorting a 'Column' in a CSV or tab-delimited File
I don't know what just happened, but the code just made sense!
So the logic is to copy the data from the column that needs to be sorted to the front of each row, and then use regular sort to sort the resulting data.
So to sort based on two columns, data from both sort columns needs to be copied to the front of each row, and then the first two columns need to be stripped off. If my guess is right, the code would look something like this (based on Dave's algorithm):
I'll try to implement this in the next hour or so and see what happens.
So the logic is to copy the data from the column that needs to be sorted to the front of each row, and then use regular sort to sort the resulting data.
So to sort based on two columns, data from both sort columns needs to be copied to the front of each row, and then the first two columns need to be stripped off. If my guess is right, the code would look something like this (based on Dave's algorithm):
Code: Select all
@echo off
(
for /f "tokens=1-3* delims=," %%A in (test.csv) do @echo(%%C,%%B,%%A,%%B,%%C,%%D
)|sort|for /f "tokens=1* delims=," %%A in ('findstr "^"') do @echo(%%C
I'll try to implement this in the next hour or so and see what happens.
Re: Sorting a 'Column' in a CSV or tab-delimited File
If you are sorting on two columns then the tokens need to be "1-2*" for the sort output.
Re: Sorting a 'Column' in a CSV or tab-delimited File
The following code correctly sorted a file with 9 columns by 3 columns:
Code: Select all
REM
REM SORT FILE BY CHECK COLUMN
REM
(
for /f "tokens=1-9 delims=," %%a in (NET.CSV) do @echo(%%h,%%f,%%g,%%a,%%b,%%c,%%d,%%e,%%f,%%g,%%h,%%i)|sort|for /f "tokens=3* delims=," %%a in ('findstr "^"') do @echo(%%b >> SORTED.CSV
Re: Sorting a 'Column' in a CSV or tab-delimited File
Thank you Squashman. Did you see the tokens=3* for my 3 column solution? This works, but it is correct?Squashman wrote:If you are sorting on two columns then the tokens need to be "1-2*" for the sort output.