You may also use my FindRepl.bat program that works in a similar way of Dave's JREPL.BAT, although with a different approach. This is the solution using FindRepl:
Code: Select all
@echo off
rem Get the maximum number of additional fields, for headers line
for /F "skip=1 tokens=4" %%a in ('sort /R /+21 input.txt') do set "Max=%%a" & goto continue
:continue
set Max=%Max:-=%
set "regexp=^.*\t.*\t\D*(\d*)\D*\t\D*(\d*)\D*\t.*\t.*"
rem FOR ( init, test, inc )
rem body
set "init=I=1,J=1000000+$[1]"
set "test=I<=(Header?Header:$[2])"
set "inc=I++,J++"
set "body=Header?@\tField-@+(I+6):@\t@+J.toString().slice(-6)"
< input.txt FindRepl "%regexp%" /VAR:Header=%Max% /Q:@ ^
"$0.slice(0,-1)+FOR(arguments, '%init%', '%test%', '%inc%', '%body%')+'\r'.slice(Header=0)" /J
However, I woud like to give some explanations about this code.
I got the maximum number of additional fields in the header using SORT /R /+21. This works with the example data, but if this is not enough for the real data, just an additional FindRepl execution is needed.
The first part in this type of solutions is to obtain the regular expression (regexp) that will process the data. In this case the goal is to write a regexp that match the entire line, but that have the fields to process enclosed in parentheses (subexpressions) so we can extract them. This is done as explained below:
Code: Select all
123456\t00010\t---1---\t---5--\t00110\t456732
123451\t00399\t-121---\t---4--\t00201\t346556
\ / \/
$1 $2
{tab} is written with \t The desired regexp should do the following:
1- Start at beginning of the line: ^
2- Pass over several characters that ends in a tab (first field): .*\t
3- Pass again over a similar field (second field): .*\t
4- To get the number in the next field, get the digits enclosed by non-digits: \D*(\d*)\D*\t
this is the third field and first subexpression ($1)
5- The next field is similar (fourth field, second subexpression $2): \D*(\d*)\D*\t
6- And end with fields five and six: .*\t.*
This way, the required regular expression is this:
set "regexp=^.*\t.*\t\D*(\d*)\D*\t\D*(\d*)\D*\t.*\t.*"
You may test this regexp using FindRepl to just show matching lines, and adding /$ switch to show submatched subexpressions instead of complete lines. As a matter of fact, it is very easy to develop a new regexp testing it this way, and adjusting the regexp after each test.
Code: Select all
C:\ type input.txt
Field-1 Field-2 Field-3 Field-4 Field-5 Field-6
123456 00010 ---1--- ---5-- 00110 456732
123451 00399 -121--- ---4-- 00201 346556
C:\ set "regexp=^.*\t.*\t\D*(\d*)\D*\t\D*(\d*)\D*\t.*\t.*"
C:\ < input.txt FindRepl "%regexp%"
Field-1 Field-2 Field-3 Field-4 Field-5 Field-6
123456 00010 ---1--- ---5-- 00110 456732
123451 00399 -121--- ---4-- 00201 346556
C:\ < input.txt FindRepl "%regexp%" /$:0
""Field-1 Field-2 Field-3 Field-4 Field-5 Field-6
""123456 00010 ---1--- ---5-- 00110 456732
""123451 00399 -121--- ---4-- 00201 346556
C:\ < input.txt FindRepl "%regexp%" /$:1:2
"3" "4"
"1" "5"
"121" "4"
The $0 always match the whole regexp. Note that in this example, /$:0
ends in a CR (this is visible becase the two quotes together at beginning of the line). This means that when we output the data, we need to supress the last character from $0 "$0.slice(0,-1)" and add it at the end, after the inserted data: "+'\r'".
In order to generate the new data we use the FindRepl's FOR
predefined function that is a functional equivalent of the for statement. The function have this form:
When the function starts, it initialize its result as an empty string. In each iteration the value of "body" is joined to previous result, and the final result is returned when the function ends. This is the method used:
- We use "I" and "J" variables; "I" count the number of additional fields (from 1 to $2, i.e.: from 1 to 4) and "J" have the number of the first additional field plus 1000000 (i.e.: +$1 = 1000121). This allows to get its last 6 digits directly via "J.toString().slice(-6)".
- In the Header data line, the number of additional fields comes from the Max variable obtained before: "/VAR:Header=%Max%". In the following lines, the number is given by second subexpression:
Code: Select all
set "test=I<=(Header?Header:$[2])"
This also requires to clear "Header=0" after the first line was processed.
- The increment step just increments both "I" and "J":
- And the body part is similar: in Header line insert a tab, followed by "Field-", followed by "I"+6. In the rest of lines, insert a tab and the number with 6 digits:
Code: Select all
set "body=Header?@\tField-@+(I+6):@\t@+J.toString().slice(-6)"
Note that you can not include a quote in any part of the JScript expression, so we use /Q:@ switch in order to use an at-sign instead.
To test previous solution, copy FindRepl.bat program from
this site and include the definition of FOR predefined function in "// Predefined functions" section:
Code: Select all
function FOR($,init,test,inc,body){var For=""; for ( eval(init); eval(test); eval(inc) ) For+=eval(body); return(For);};
I like this!
**EDIT**: Oops! I just realized that I used the unreleased version 2.1 of FindRepl when I wrote this solution. The required change is minimal: the line that start FindRepl should be this one:
Code: Select all
< input.txt FindRepl "%regexp%" /ARG1:%Max% /Q:@ ^
... instead of this:
Code: Select all
< input.txt FindRepl "%regexp%" /VAR:Header=%Max% /Q:@ ^
... and three references to "Header" variable must be replaced by "ARG1". I want not complete these changes in this post because the explanation is very clear in the current form, so please do these changes yourself before you try this solution...
Antonio