DosTips.com

Posted: **24 Nov 2014 09:22**

Hi,

I use to do this in Ms Excel now i want to try on batch scripting for its robustness.

Increment the numbers in the rows starting at field 7 using the parameters in field 3(start) and 4(end), and to format each of the cells with leading zeros to have a 6 decimal number, and save as tab delimited text file.

Example:
Field-1{tab}Field-2{tab}Field-3{tab}Field-4{tab}Field-5{tab}Field-6
123456{tab}00010{tab}---1---{tab}---5--{tab}00110{tab}456732
123451{tab}00399{tab}-121---{tab}---4--{tab}00201{tab}346556

Result:
Field-1{tab}Field-2{tab}Field-3{tab}Field-4{tab}Field-5{tab}Field-6{tab}Field-7{tab}Field--8{tab}Field-9{tab}Field10{tab}Field11
123456{tab}00010{tab}---1---{tab}--5---{tab}00110{tab}456732{tab}000001{tab}000002{tab}000003{tab}000004{tab}000005
123451{tab}00399{tab}-121---{tab}--4---{tab}00201{tab}346556{tab}000121{tab}000122{tab}000123{tab}000124

Is the results possible in batch or its better to continue with excel, import file and increment and save.
please suggest.

Posted: **24 Nov 2014 11:35**

Do the numbers really have hyphens around them or were you using them to align your text?

The hard part will be recreating the Header row and how many additional tabs to append depending on what is the largest incremental number. You would basically have to parse all the data twice. Once to get the largest incremental number. This will then allow us to append the correct amount of field labels onto row 1 and it will allow us to determine the number of extra tabs you would append onto each row that has less than the largest incremental number.

Posted: **24 Nov 2014 11:38**

I'm sure this could be done with pure batch, but I don't like working with tab characters in batch scripts.

I decided to see how well the new features of JREPL.BAT could tackle this problem... Turns out, Quite well

The problem would be much easier if you didn't have a header row. The number of columns in the header depends on the max value found in column 4. So the solution requires the file be read at least twice.

A pure batch solution would probably be faster on a small file. But this JREPL solution should be MUCH faster once the input file becomes moderately large.

Code: Select all

@echo off
setlocal
set "inFile=test.txt"
set "outFile=out.txt"
for /f %%N in (
  'jrepl "^(?:.*?\t){3}-*(\d+)-*\t"^
    "if ($1>max) max=Number($1); false"^
    /jmatch /jbeg "max=0;" /jend "output.WriteLine(max);" /f "%inFile%"'
) do call jrepl "^(?:.*?\t){2}-*(\d+)-*\t-*(\d+)-*\t.*|^.*"^
            "for (var n=0; n<(ln==1?%%N:Number($2)); n++) $0+='\t'+(ln==1?'Field-'+(7+n):lpad(n+Number($1),'000000')); $0"^
            /j /f "%inFile%" /o "%outFile%"
type "%outFile%"

Dave Benham

Posted: **24 Nov 2014 11:39**

zagix wrote:now i want to try on batch scripting for its robustness.

Can't say I have ever seen someone use the words BATCH and ROBUSTNESS in the same sentence. Ever!

Posted: **24 Nov 2014 11:41**

Squashman wrote:
zagix wrote:now i want to try on batch scripting for its robustness.

Can't say I have ever seen someone use the words BATCH and ROBUSTNESS in the same sentence. Ever!

I meant to say the same thing in my answer, but forgot to put it in

Dave Benham

Posted: **24 Nov 2014 12:47**

Hi Dave Benham,

The out file does not give any results.
On your suggestion if i remove the header and convert the tab to pipe delimiters will that work.
Test file sample.

2125305|21|407871|10|1706|28374
2125306|344|407881|10|1706|15423
2125307|556|407891|40|1706|28707
2125308|676|407901|25|351|28707
2125309|565|407911|20|1714|28910
2125310|654|407921|1|1576|26139
2125311|54|407931|5|1477|21255
2125312|343|407941|100|1576|25818
2125313|432|407951|10|1723|23997
2125314|323|407961|10|1623|28543
2125315|567|407971|10|1724|15563
2125316|899|407981|10|1293|23743
2125317|987|407991|20|814|26011
2125318|876|408001|50|1567|17806
2125319|776|408011|25|1724|26151
2125320|765|408021|10|1217|27523
2125321|766|408031|10|1527|29083
2125322|654|408041|10|1670|29019
2125323|643|408051|10|368|11721
2125324|523|408061|10|1149|13749

Posted: **24 Nov 2014 12:59**

zagix wrote:The out file does not give any results.

Works for me.

Code: Select all

C:\BatchFiles\Increment>increment.bat
Field-1 Field-2 Field-3 Field-4 Field-5 Field-6 Field-7 Field-8 Field-9 Field-10        Field-11
123456  00010   ---1--- ---5--  00110   456732  000001  000002  000003  000004  000005
123451  00399   -121--- ---4--  00201   346556  000121  000122  000123  000124

C:\BatchFiles\Increment>

zagix wrote:On your suggestion if i remove the header and convert the tab to pipe delimiters will that work.

Shouldn't matter for Dave's JREPL script.

Posted: **24 Nov 2014 21:44**

You may also use my FindRepl.bat program that works in a similar way of Dave's JREPL.BAT, although with a different approach. This is the solution using FindRepl:

Code: Select all

@echo off

rem Get the maximum number of additional fields, for headers line
for /F "skip=1 tokens=4" %%a in ('sort /R /+21 input.txt') do set "Max=%%a" & goto continue
:continue
set Max=%Max:-=%


set "regexp=^.*\t.*\t\D*(\d*)\D*\t\D*(\d*)\D*\t.*\t.*"


rem  FOR ( init, test, inc )
rem     body

set        "init=I=1,J=1000000+$[1]"
set             "test=I<=(Header?Header:$[2])"
set                    "inc=I++,J++"

set     "body=Header?@\tField-@+(I+6):@\t@+J.toString().slice(-6)"

< input.txt FindRepl "%regexp%"  /VAR:Header=%Max%  /Q:@  ^
                     "$0.slice(0,-1)+FOR(arguments, '%init%', '%test%', '%inc%', '%body%')+'\r'.slice(Header=0)" /J

However, I woud like to give some explanations about this code.

I got the maximum number of additional fields in the header using SORT /R /+21. This works with the example data, but if this is not enough for the real data, just an additional FindRepl execution is needed.

The first part in this type of solutions is to obtain the regular expression (regexp) that will process the data. In this case the goal is to write a regexp that match the entire line, but that have the fields to process enclosed in parentheses (subexpressions) so we can extract them. This is done as explained below:

Code: Select all

123456\t00010\t---1---\t---5--\t00110\t456732
123451\t00399\t-121---\t---4--\t00201\t346556
                \ /       \/
                $1        $2
{tab} is written with \t   The desired regexp should do the following:

1- Start at beginning of the line: ^
2- Pass over several characters that ends in a tab (first field): .*\t
3- Pass again over a similar field (second field): .*\t
4- To get the number in the next field, get the digits enclosed by non-digits: \D*(\d*)\D*\t
   this is the third field and first subexpression ($1)
5- The next field is similar (fourth field, second subexpression $2): \D*(\d*)\D*\t
6- And end with fields five and six: .*\t.*

This way, the required regular expression is this:

set "regexp=^.*\t.*\t\D*(\d*)\D*\t\D*(\d*)\D*\t.*\t.*"

You may test this regexp using FindRepl to just show matching lines, and adding /$ switch to show submatched subexpressions instead of complete lines. As a matter of fact, it is very easy to develop a new regexp testing it this way, and adjusting the regexp after each test.

Code: Select all

C:\ type input.txt
Field-1 Field-2 Field-3 Field-4 Field-5 Field-6
123456  00010   ---1--- ---5--  00110   456732
123451  00399   -121--- ---4--  00201   346556

C:\ set "regexp=^.*\t.*\t\D*(\d*)\D*\t\D*(\d*)\D*\t.*\t.*"

C:\ < input.txt FindRepl "%regexp%"
Field-1 Field-2 Field-3 Field-4 Field-5 Field-6
123456  00010   ---1--- ---5--  00110   456732
123451  00399   -121--- ---4--  00201   346556

C:\ < input.txt FindRepl "%regexp%" /$:0
""Field-1       Field-2 Field-3 Field-4 Field-5 Field-6
""123456        00010   ---1--- ---5--  00110   456732
""123451        00399   -121--- ---4--  00201   346556

C:\ < input.txt FindRepl "%regexp%" /$:1:2
 "3" "4"
 "1" "5"
 "121" "4"

The $0 always match the whole regexp. Note that in this example, /$:0 ends in a CR (this is visible becase the two quotes together at beginning of the line). This means that when we output the data, we need to supress the last character from $0 "$0.slice(0,-1)" and add it at the end, after the inserted data: "+'\r'".

In order to generate the new data we use the FindRepl's FOR predefined function that is a functional equivalent of the for statement. The function have this form:

Code: Select all

FOR ( init, test, inc, body )

When the function starts, it initialize its result as an empty string. In each iteration the value of "body" is joined to previous result, and the final result is returned when the function ends. This is the method used:

We use "I" and "J" variables; "I" count the number of additional fields (from 1 to $2, i.e.: from 1 to 4) and "J" have the number of the first additional field plus 1000000 (i.e.: +$1 = 1000121). This allows to get its last 6 digits directly via "J.toString().slice(-6)".
Code: Select all
```
set        "init=I=1,J=1000000+$[1]"
```
In the Header data line, the number of additional fields comes from the Max variable obtained before: "/VAR:Header=%Max%". In the following lines, the number is given by second subexpression:
Code: Select all
```
set             "test=I<=(Header?Header:$[2])"
```
This also requires to clear "Header=0" after the first line was processed.
The increment step just increments both "I" and "J":
Code: Select all
```
set                    "inc=I++,J++"
```
And the body part is similar: in Header line insert a tab, followed by "Field-", followed by "I"+6. In the rest of lines, insert a tab and the number with 6 digits:
Code: Select all
```
set     "body=Header?@\tField-@+(I+6):@\t@+J.toString().slice(-6)"
```
Note that you can not include a quote in any part of the JScript expression, so we use /Q:@ switch in order to use an at-sign instead.

To test previous solution, copy FindRepl.bat program from this site and include the definition of FOR predefined function in "// Predefined functions" section:

Code: Select all

function FOR($,init,test,inc,body){var For=""; for ( eval(init); eval(test); eval(inc) ) For+=eval(body); return(For);};

I like this!

**EDIT**: Oops! I just realized that I used the unreleased version 2.1 of FindRepl when I wrote this solution. The required change is minimal: the line that start FindRepl should be this one:

Code: Select all

< input.txt FindRepl "%regexp%"  /ARG1:%Max%  /Q:@  ^

... instead of this:

Code: Select all

< input.txt FindRepl "%regexp%"  /VAR:Header=%Max%  /Q:@  ^

... and three references to "Header" variable must be replaced by "ARG1". I want not complete these changes in this post because the explanation is very clear in the current form, so please do these changes yourself before you try this solution...

Antonio

Posted: **25 Nov 2014 11:18**

Antonio, Thankyou for explantion, grateful to you.

Thankyou Dave Benham, it was my mistake and your script is 100% and its outputs the results.

Help required Dave,

I want to export/append the data to csv files, the first field will decide the CSV file name, if its 123456 then it will append the data to 010000011.csv, if its 123451 then it will append the data to 010000010.csv with field 2,5,6 & onwards, droping field 1,3 & 4 from the text file. Convert the spaces/tab to Pipe-Delimiter, and if CSV file does not exist create new with headers and if exist append the data.

Thanking you in advance.

Posted: **25 Nov 2014 11:39**

zagix wrote:[b]I want to export/append the data to csv files, the first field will decide the CSV file name, if its 123456 then it will append the data to 010000011.csv, if its 123451 then it will append the data to 010000010.csv with field 2,5,6 & onwards, droping field 1,3 & 4 from the text file. Convert the spaces/tab to Pipe-Delimiter, and if CSV file does not exist create new with headers and if exist append the data.

Thanking you in advance.

How does this relate to your other thread?
viewtopic.php?f=3&t=6057

Are you saying that the script will have to hard code the relationship between the first field and the output file name. Otherwise I don't see any other way to magically know if there will be other variables in the first field and what data file to append the data to.

DosTips.com

Increment the numbers

Increment the numbers

Re: Increment the numbers

Re: Increment the numbers

Re: Increment the numbers

Re: Increment the numbers

Re: Increment the numbers

Re: Increment the numbers

Re: Increment the numbers

Re: Increment the numbers

Re: Increment the numbers