Extract rows from text file and slight reformatting


NYTReader123
Posts: 9
Joined: 15 Sep 2008 17:20

Extract rows from text file and slight reformatting

#1 Post by NYTReader123 » 31 Dec 2008 05:41

Hi:

I have a pretty straightforward objective, but I am having problems getting the code to work. Any help or suggestions from the DOS gurus here would be most appreciated!


Short version
---------------
- want to extract a consecutive set of rows from a text file (omitting rows from the top and the bottom of the file)
- add a string to the beginning of each of these rows and then write to a new file


More details
--------------
The original file (tmp.csv) has N+T+2 rows (N and T are integers),

row1
row2
.
.
rowN
City, Visits, Page_Visits,...
City1, Visits1, Page_Visits1,...
.
.
.
CityT, VisitsT, Page_VisitsT,...
# --------------------------------------------------------------------------------

Note that row N+1 is a text string (variable names) while the T rows below it are numbers (the data).
I am interested only in the data rows: the T rows starting just after row N+1 and ending with the second to last row.

I would like to save these data rows to a file (NewData.csv) but with one small change: before each row I would like to add the contents of a variable (%%S). So for example the first row of NewData.csv would be,
%%S, City1, Visits1, Page_Visits1,...

There are two other issues, since I am actually doing this for a series of tmp.csv files:
(i) sometimes tmp.csv has no data (e.g. the bottom rows are,
rowN
City, Visits, Page_Visits,...
# --------------------------------------------------------------------------------
). I am not sure how to keep the batch file from crashing here.
(ii) I would like the first row of the new file NewData.csv to be the variable names (i.e. row N+1). I do not know the variable names in advance (i.e. I do not know the entire string in row N+1), but of course they can be read from the first tmp.csv that is used.

jeb
Expert
Posts: 1041
Joined: 30 Aug 2007 08:05
Location: Germany, Bochum

#2 Post by jeb » 04 Jan 2009 18:07

Hi NYTReader123,

Try using the FOR statement to extract single lines from the file.

Code:

for /f "tokens=*" %%a in (tmp.csv) do @echo %%a


But first you have to count the number of lines (also with FOR); after that it should be simple.

Then you start a second FOR run and prepend your string, like

Code:

echo PRESTR %%a
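
The two passes described above could be sketched roughly as follows. This is only a minimal sketch: the variable names count and PRESTR and the value NY are placeholders chosen for illustration, and it does not yet limit which rows are echoed.

```batch
@echo off
setlocal

rem First pass: count the lines that FOR /F delivers.
rem Note: FOR /F skips empty lines and, by default, lines starting with ";".
set /a count=0
for /f "usebackq tokens=*" %%a in ("tmp.csv") do set /a count+=1
echo tmp.csv has %count% non-empty lines

rem Second pass: put a string in front of every line.
rem PRESTR and its value are placeholders.
set "PRESTR=NY"
for /f "usebackq tokens=*" %%a in ("tmp.csv") do echo %PRESTR%,%%a
```

Because neither loop reads a variable that changes inside a parenthesized block, plain %count% and %PRESTR% are enough here and delayed expansion is not needed.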


hope it helps
jeb

NYTReader123
Posts: 9
Joined: 15 Sep 2008 17:20

#3 Post by NYTReader123 » 09 Jan 2009 03:51

Thanks Jeb for the suggestion (and sorry for my slow response)!

My main stumbling block is how to limit the lines which are echo'd from the tmp.csv file. My strategy was to find the row right before the data starts and also to count the number of rows in the file (since I want to stop echo'ing at the second to last row).

The following does not quite work:

REM FR will hold the row number just before the data begins
For /F %%A in ('Find /V /C "City,Visits" tmp.csv') Do set FR=%%A
REM numrows will hold the number of rows in tmp.csv
set /a numrows=0
for /f %%n in ('type "tmp.csv"|find "" /v /c') do set /a numrows=%%n

Any suggestions on what is going wrong?

And then if I do get this to work, how do I use the two variables here to limit the rows which are echoed? I know how to skip the first FR rows,

For /F "tokens=* skip=%FR%" %%A in (tmp.csv) do echo %%A

but am not sure how to skip the last row.

Any help on these two questions would be most appreciated!

NYTReader123
Posts: 9
Joined: 15 Sep 2008 17:20

follow-up

#4 Post by NYTReader123 » 15 Jan 2009 21:00

Hi everyone:

I hope this is not a violation of the etiquette here, but I am still struggling with this problem. If anyone can pass along some suggestions on my code and questions from my 9 Jan posting I would be greatly indebted!

jeb
Expert
Posts: 1041
Joined: 30 Aug 2007 08:05
Location: Germany, Bochum

#5 Post by jeb » 18 Jan 2009 15:00

Hi NYT,

let me try to give you a hint.

I built my own tmp.csv:

Code:

remark 1
remark 2
remark ...
remark N

City,Visits
Hamburg,1
Berlin,8
Bochum,1000
This is the end, but should not used


Code:

For /F %%A in ('Find /V /C "City,Visits" tmp.csv') Do set FR=%%A


First, I tried "Find /V /C", and on my system (Vista) it prints
---------- tmp.csv: 9

This could be the first problem: FOR /F takes the first space-delimited token, so FR is set to the dashes instead of the number.
But perhaps find.exe works differently on your system.

Also, /C counts the matching lines, and /V selects only the lines that do not contain the text, so together they count the lines without "City,Visits". That's not exactly what you want.
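
To see the difference between the FIND variants, the commands can be run side by side. This is just an illustrative sketch against the tmp.csv shown above; the exact header text may differ between Windows versions.

```batch
@echo off
rem With a file name argument, FIND puts a header in front of the count,
rem e.g. "---------- tmp.csv: 9", so FOR /F would pick up the dashes.
find /V /C "City,Visits" tmp.csv

rem Piping the file in removes the header, so only the number is printed.
rem /V /C "" counts every line, because no line matches the empty string.
type tmp.csv | find /V /C ""

rem /N prints each matching line prefixed with its line number in brackets,
rem in the form [n]City,Visits. That is what "delims=[]" splits on.
type tmp.csv | find /N "City,Visits"
```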

This code works with my tmp.csv, but perhaps your problem is quite different from my solution.

Code:

@ECHO off
setlocal ENABLEDELAYEDEXPANSION

REM FirstLine will hold the row number of the header line (the data begins on the next row)
For /F "delims=[]" %%A in ('type "tmp.csv" ^| find /N "City,Visits"') Do set /a FirstLine=%%A

REM numrows will hold the total number of rows in tmp.csv
set /a numrows=0
for /f %%n in ('type "tmp.csv" ^| find "" /V /C') do set /a numrows=%%n

set /a showRows=numrows-FirstLine-1


echo **** Info  FirstLine=%FirstLine% num=%numrows% ShowRows=%showRows%

set /a row=0
for /F "tokens=* skip=%FirstLine%" %%r in (tmp.csv) do (
  set /a row=row + 1
  if !row! LEQ !showRows! echo %%r
)


hope it helps
Jan Erik

NYTReader123
Posts: 9
Joined: 15 Sep 2008 17:20

#6 Post by NYTReader123 » 20 Jan 2009 04:09

Jan:

Thanks a ton for your comment!
I had actually figured out something along the lines of your first points over the weekend, but I never would have thought of the loop at the end of your code. Really helpful!!

I have been a bit buried at work, but will try your code in the next couple of days. I may have one more short question, but either way will report back.

Again, many thanks for your assistance.

NYTReader123
Posts: 9
Joined: 15 Sep 2008 17:20

code that works (and one more question)

#7 Post by NYTReader123 » 26 Jan 2009 11:36

Jan and board readers:

Following Jan's great suggestions, I got the code working. Here is what I used:

Code:

@ECHO off   
SETLOCAL ENABLEEXTENSIONS
SETLOCAL ENABLEDELAYEDEXPANSION

REM firstrow will hold the row number just before the data begins
REM headers will hold list of variable names
   For /F "tokens=1,2,* delims=[]" %%A in ('type tmp.csv^|Find /N "City,Visits"') Do set /a firstrow=%%A
   For /F "tokens=*"  %%B in ('type tmp.csv^|Find "City,Visits"') Do set headers=%%B
REM numrows will hold the number of rows in tmp.csv
   set /a numrows=0
   FOR /f %%n in ('type tmp.csv^|find "" /v /c') do set /a numrows=%%n
set /a showRows=numrows-firstrow-1
REM check code
   echo Info: row before data=!firstrow!, num rows=!numrows!, num rows with data=!showRows!, headers=!headers!
REM save relevant data (with state name listed in front)
REM add var headers in first row
   IF NOT EXIST "MyFile.csv" (
      echo State,!headers!>>"MyFile.csv"
   )
set /a row=0
for /F "tokens=* skip=%firstrow%" %%r in (tmp.csv) do (
   set /a row=row + 1
   if !row! LEQ !showRows! echo %%S,%%r>>"MyFile.csv"
)


Note that this code works even in the case where showrows is zero (e.g. there is no data to add).


I have one last question about an item which occurs in my code before the chunk above: I need to loop through some numbers and it is important that all leading zeros be included (in my case, there must always be two digits). Is there a way to do it in one step? My attempt listed below fails: basically I do not know how to set a new variable (dd) equal to the counter (d) --> in my code "dd" is never set to any value.

Code:

for /l %%d in (1,1,30) do (
   REM ensures two digits
   if %%d lss 10 (
      set dd=0%%d
   ) else (
      set dd=%%d
   )


Note that I tried setting an initial value for the new variable (set dd=0), but then this value was never changed. I also tried variations discussed on the board,

Code:

   set /a j=%%d
   set dd=0!j!&set dd=!dd:~-2!

but this does not work either.

Does anyone have a suggestion? Again thanks for any help which can be offered.

jeb
Expert
Posts: 1041
Joined: 30 Aug 2007 08:05
Location: Germany, Bochum

#8 Post by jeb » 26 Jan 2009 16:39

Hi NYTReader,


It's nice that your code finally works.

Your minor problem is not really a problem: with delayed expansion enabled and dd read as !dd!, your loop does what you want.

Code:

setlocal ENABLEDELAYEDEXPANSION

for /l %%d in (1,1,30) do (
   REM ensures two digits
   if %%d lss 10 (
      set dd=0%%d
   ) else (
      set dd=%%d
   )
   echo dd=!dd!
)

Works fine.

Or you can solve it this way

Code:

setlocal ENABLEDELAYEDEXPANSION
for /l %%d in (101,1,130) do (
    REM use a name other than temp, which would shadow the TEMP environment variable
    set num=%%d
    set dd=!num:~1!
    echo !dd!
)


Hope it helps
jeb

DosItHelp
Expert
Posts: 239
Joined: 18 Feb 2006 19:54

#9 Post by DosItHelp » 26 Jan 2009 23:21

Or that way:

Code:

setlocal ENABLEDELAYEDEXPANSION
for /l %%d in (1,1,30) do (
    set dd=0%%d
    set dd=!dd:~-2!
    echo !dd!
)

:D

NYTReader123
Posts: 9
Joined: 15 Sep 2008 17:20

thanks again (and one more question: ugh)

#10 Post by NYTReader123 » 27 Jan 2009 10:00

Jeb:

Thanks again. I obviously should not be writing batch files at 4 in the morning. Your tweak of my code worked perfectly (again). I owe you a beer!!!

Unfortunately I have one other issue which I had not noticed before. In the code you suggested earlier, I cannot get the final FOR loop to work:

Code:

set /a row=0
for /F "tokens=* skip=%firstrow%" %%r in (tmp.csv) do (
   set /a row=row + 1
   if !row! LEQ !showRows! echo %%S,%%r>>"MyFile.csv"
)


I get an error,

Code:

" was unexpected at this time

which I am pretty sure means it is not using the value for "firstrow". This is puzzling since in the earlier check,

Code:

echo Info: row before data=!firstrow!

it outputs correct numbers.

I am pretty sure this has to do with the SETLOCAL ENABLEDELAYEDEXPANSION but I have not been able to figure out what I am doing wrong:
- I tried "!firstrow!" in the FOR loop and I get a message that this is unexpected at this time
- I tried creating a new variable (set var temp=!firstrow!) and same error as before

If someone can point out what is probably a very stupid mistake on my end I would be greatly in their debt. I feel badly coming back to the board so many times with questions, but my hope is that I will have something to contribute in the future.

PS Thanks also DosItHelp!

NYTReader123
Posts: 9
Joined: 15 Sep 2008 17:20

#11 Post by NYTReader123 » 04 Feb 2009 16:50

Hi again forum readers:

I have still not had any luck with my question regarding the FOR loop in my last post above. If anyone has a few seconds to take a look at it and can point out my mistake I would be greatly appreciative!
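
One possible explanation, offered as an assumption because the surrounding script is not shown: the snippet uses %%S, so it presumably sits inside an outer FOR block. cmd expands %firstrow% once, when it parses that whole outer block, which is before the SET inside the block has run, so the option string ends up as an empty "skip=". The options of FOR /F cannot take !firstrow! either, because they are parsed before delayed expansion happens. A common workaround is to move the inner loop into a subroutine and CALL it, so that %firstrow% is expanded when the subroutine's own line is parsed. In the sketch below, the label :appendRows, the state list (NY CA) and the per-state input file names are hypothetical.

```batch
@echo off
setlocal EnableDelayedExpansion

rem Hypothetical outer loop: one input file per state (NY.csv, CA.csv).
for %%S in (NY CA) do (
    set "firstrow="
    set "numrows="
    for /f "delims=[]" %%A in ('type "%%S.csv" ^| find /N "City,Visits"') do set /a firstrow=%%A
    for /f %%n in ('type "%%S.csv" ^| find /V /C ""') do set /a numrows=%%n
    set /a showRows=numrows-firstrow-1
    rem Only call when a header line was found, so skip= is never empty.
    if defined firstrow call :appendRows "%%S" "%%S.csv"
)
exit /b

:appendRows
rem %1 = string to put in front of each row, %2 = input file.
rem %firstrow% is expanded when this line is parsed, which happens
rem after the caller has already set it.
set /a row=0
for /f "usebackq tokens=* skip=%firstrow%" %%r in (%2) do (
    set /a row+=1
    if !row! LEQ !showRows! echo %~1,%%r>>"MyFile.csv"
)
exit /b
```

If the loop is not inside an outer block in the real script, it would be worth echoing %firstrow% (with percent signs) directly before the FOR line to check whether it is empty at that point.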
