Awk - A nifty little tool for text manipulation and more. - Page 2 - DosTips.com

Moderator: DosItHelp

31 posts

berserker: Posts: 95; Joined: 18 Dec 2013 00:51

Pattern Matching and Substitution

#16 Post by berserker » 06 Jan 2014 09:12

Awk has built-in pattern matching and functions for string substitution. Here I show some basic examples of simple matching and substitution. Regular expressions are a vast topic, so for in-depth coverage please consult a regex book. My favorite is Mastering Regular Expressions from O'Reilly.

Pattern matching
In awk, simple matching uses a /regex/ pattern, or the ~ operator to match against a specific field. (All examples use myFile.txt.)

Code: Select all

C:\> type myFile.txt
yearID,lgID,teamID,Half,divID,DivWin,Rank,G,W,L
1981,NL,ATL,1,W,N,4,54,25,29
1981,NL,ATL,2,W,N,5,52,25,27
1981,AL,BAL,1,E,N,2,54,31,23
1981,AL,BAL,2,E,N,4,51,28,23
1981,AL,BOS,1,E,N,5,56,30,26
1981,AL,BOS,2,E,N,2,52,29,23
1981,AL,CAL,1,W,N,4,60,31,29
1981,AL,CAL,2,W,N,6,50,20,30
1981,AL,CHA,1,W,N,3,53,31,22
1981,AL,CHA,2,W,N,6,53,23,30
1981,NL,CHN,1,E,N,6,52,15,37
1981,NL,CHN,2,E,N,5,51,23,28
1981,NL,CIN,1,W,N,2,56,35,21
1981,NL,CIN,2,W,N,2,52,31,21
1981,AL,CLE,1,E,N,6,50,26,24
1981,AL,CLE,2,E,N,5,53,26,27
1981,AL,DET,1,E,N,4,57,31,26
1981,AL,DET,2,E,N,2,52,29,23
1981,NL,HOU,1,W,N,3,57,28,29
1981,NL,HOU,2,W,N,1,53,33,20
1981,AL,KCA,1,W,N,5,50,20,30
1981,AL,KCA,2,W,N,1,53,30,23
1981,NL,LAN,1,W,N,1,57,36,21
1981,NL,LAN,2,W,N,4,53,27,26
1981,AL,MIN,1,W,N,7,56,17,39
1981,AL,MIN,2,W,N,4,53,24,29

C:\>awk "/divID/" myFile.txt
yearID,lgID,teamID,Half,divID,DivWin,Rank,G,W,L

The above finds every line that contains the string "divID". For pattern matching, the regex to find is usually enclosed in / /.

If you want a case-insensitive search, use the IGNORECASE variable:

Code: Select all

C:\>awk "BEGIN{IGNORECASE=1}/divid/" myFile.txt
yearID,lgID,teamID,Half,divID,DivWin,Rank,G,W,L

Setting IGNORECASE back to 0 restores case-sensitive matching.
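One caveat: IGNORECASE is a gawk extension, so other awk implementations treat it as an ordinary variable and the match stays case-sensitive. A minimal sketch of a portable alternative is to lower-case the record with tolower() before matching. The two sample lines are cut-down rows from myFile.txt, and the quoting is for a POSIX shell (under cmd.exe, use the double-quote style from the examples above).

```shell
# Portable case-insensitive match: lower-case the record, then match a lower-case pattern.
out=$(printf 'yearID,lgID,teamID,Half,divID\n1981,NL,ATL,1,W\n' |
      awk 'tolower($0) ~ /divid/')
echo "$out"
```

Only the header line is printed, because it is the only one that contains "divid" once case is ignored.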

To find all records whose 2nd column starts with "A":

Code: Select all

C:\>awk -F"," "$2 ~ /^A/ {print}" myFile.txt
1981,AL,BAL,1,E,N,2,54,31,23
1981,AL,BAL,2,E,N,4,51,28,23
1981,AL,BOS,1,E,N,5,56,30,26
1981,AL,BOS,2,E,N,2,52,29,23
1981,AL,CAL,1,W,N,4,60,31,29
1981,AL,CAL,2,W,N,6,50,20,30
1981,AL,CHA,1,W,N,3,53,31,22
1981,AL,CHA,2,W,N,6,53,23,30
1981,AL,CLE,1,E,N,6,50,26,24
1981,AL,CLE,2,E,N,5,53,26,27
1981,AL,DET,1,E,N,4,57,31,26
1981,AL,DET,2,E,N,2,52,29,23
1981,AL,KCA,1,W,N,5,50,20,30
1981,AL,KCA,2,W,N,1,53,30,23
1981,AL,MIN,1,W,N,7,56,17,39
1981,AL,MIN,2,W,N,4,53,24,29

First, give the -F"," option because the file is comma-delimited. Then use $2 because it's the 2nd column, and match it against the regex /^A/, where "^" means "starts with". Finally, the "{print}" action prints the matching records.

In awk, you can negate a match using the !~ operator. For example, to find records that don't have "DET" in the 3rd field:

Code: Select all

C:\>awk -F"," "$3 !~ /DET/{print}" myFile.txt
yearID,lgID,teamID,Half,divID,DivWin,Rank,G,W,L
1981,NL,ATL,1,W,N,4,54,25,29
1981,NL,ATL,2,W,N,5,52,25,27
1981,AL,BAL,1,E,N,2,54,31,23
1981,AL,BAL,2,E,N,4,51,28,23
1981,AL,BOS,1,E,N,5,56,30,26
1981,AL,BOS,2,E,N,2,52,29,23
1981,AL,CAL,1,W,N,4,60,31,29
1981,AL,CAL,2,W,N,6,50,20,30
1981,AL,CHA,1,W,N,3,53,31,22
1981,AL,CHA,2,W,N,6,53,23,30
1981,NL,CHN,1,E,N,6,52,15,37
1981,NL,CHN,2,E,N,5,51,23,28
1981,NL,CIN,1,W,N,2,56,35,21
1981,NL,CIN,2,W,N,2,52,31,21
1981,AL,CLE,1,E,N,6,50,26,24
1981,AL,CLE,2,E,N,5,53,26,27
1981,NL,HOU,1,W,N,3,57,28,29
1981,NL,HOU,2,W,N,1,53,33,20
1981,AL,KCA,1,W,N,5,50,20,30
1981,AL,KCA,2,W,N,1,53,30,23
1981,NL,LAN,1,W,N,1,57,36,21
1981,NL,LAN,2,W,N,4,53,27,26
1981,AL,MIN,1,W,N,7,56,17,39
1981,AL,MIN,2,W,N,4,53,24,29

If you just want to find records that don't contain the string "DET" anywhere, use !/DET/ with the "!" operator:

Code: Select all

C:\>awk -F"," "!/DET/" myFile.txt
yearID,lgID,teamID,Half,divID,DivWin,Rank,G,W,L
1981,NL,ATL,1,W,N,4,54,25,29
1981,NL,ATL,2,W,N,5,52,25,27
1981,AL,BAL,1,E,N,2,54,31,23
1981,AL,BAL,2,E,N,4,51,28,23
1981,AL,BOS,1,E,N,5,56,30,26
1981,AL,BOS,2,E,N,2,52,29,23
1981,AL,CAL,1,W,N,4,60,31,29
1981,AL,CAL,2,W,N,6,50,20,30
1981,AL,CHA,1,W,N,3,53,31,22
1981,AL,CHA,2,W,N,6,53,23,30
1981,NL,CHN,1,E,N,6,52,15,37
1981,NL,CHN,2,E,N,5,51,23,28
1981,NL,CIN,1,W,N,2,56,35,21
1981,NL,CIN,2,W,N,2,52,31,21
1981,AL,CLE,1,E,N,6,50,26,24
1981,AL,CLE,2,E,N,5,53,26,27
1981,NL,HOU,1,W,N,3,57,28,29
1981,NL,HOU,2,W,N,1,53,33,20
1981,AL,KCA,1,W,N,5,50,20,30
1981,AL,KCA,2,W,N,1,53,30,23
1981,NL,LAN,1,W,N,1,57,36,21
1981,NL,LAN,2,W,N,4,53,27,26
1981,AL,MIN,1,W,N,7,56,17,39
1981,AL,MIN,2,W,N,4,53,24,29

These are very simple examples of using the regex operators ~ and !~ to search for strings.
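Patterns can also be combined with && and ||, and mixed with plain comparisons. The sketch below is only an illustration: it keeps AL teams other than BAL and DET that won more than 30 games (column 9), and skips the header with NR > 1. The rows are a few lines copied from myFile.txt, and the quoting is written for a POSIX shell rather than cmd.exe.

```shell
# Sample rows copied from myFile.txt.
out=$(printf '%s\n' \
  'yearID,lgID,teamID,Half,divID,DivWin,Rank,G,W,L' \
  '1981,NL,ATL,1,W,N,4,54,25,29' \
  '1981,AL,BAL,1,E,N,2,54,31,23' \
  '1981,AL,CAL,1,W,N,4,60,31,29' \
  '1981,AL,DET,2,E,N,2,52,29,23' |
  awk -F, 'NR > 1 && $2 ~ /^A/ && $3 !~ /^(BAL|DET)$/ && $9 > 30')
echo "$out"
```

With these five input lines only the CAL row passes all four tests.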

String replacement
Awk provides the sub() and gsub() functions to replace strings in the records it reads (the input file itself is not changed).
The syntax for sub() is
sub(regexp, replacement [, target])

For example, to replace "LAN" with "NAL":

Code: Select all

C:\>awk "{sub(\"LAN\",\"NAL\", $0); print }" myFile.txt
yearID,lgID,teamID,Half,divID,DivWin,Rank,G,W,L
1981,NL,ATL,1,W,N,4,54,25,29
1981,NL,ATL,2,W,N,5,52,25,27
1981,AL,BAL,1,E,N,2,54,31,23
1981,AL,BAL,2,E,N,4,51,28,23
1981,AL,BOS,1,E,N,5,56,30,26
1981,AL,BOS,2,E,N,2,52,29,23
1981,AL,CAL,1,W,N,4,60,31,29
1981,AL,CAL,2,W,N,6,50,20,30
1981,AL,CHA,1,W,N,3,53,31,22
1981,AL,CHA,2,W,N,6,53,23,30
1981,NL,CHN,1,E,N,6,52,15,37
1981,NL,CHN,2,E,N,5,51,23,28
1981,NL,CIN,1,W,N,2,56,35,21
1981,NL,CIN,2,W,N,2,52,31,21
1981,AL,CLE,1,E,N,6,50,26,24
1981,AL,CLE,2,E,N,5,53,26,27
1981,AL,DET,1,E,N,4,57,31,26
1981,AL,DET,2,E,N,2,52,29,23
1981,NL,HOU,1,W,N,3,57,28,29
1981,NL,HOU,2,W,N,1,53,33,20
1981,AL,KCA,1,W,N,5,50,20,30
1981,AL,KCA,2,W,N,1,53,30,23
1981,NL,NAL,1,W,N,1,57,36,21
1981,NL,NAL,2,W,N,4,53,27,26
1981,AL,MIN,1,W,N,7,56,17,39
1981,AL,MIN,2,W,N,4,53,24,29

sub() replaces only the first occurrence of the match in each record. For global replacement, use gsub(), which has the same syntax as sub().
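To see the difference side by side, here is a small sketch on a made-up record that contains the string three times. Both functions return the number of replacements made, which is printed in front of each result. The quoting is for a POSIX shell.

```shell
out=$(echo 'LAN,LAN,LAN' | awk '{
    first = $0; n1 = sub(/LAN/, "NAL", first)    # replaces the leftmost match only
    all   = $0; n2 = gsub(/LAN/, "NAL", all)     # replaces every match
    print n1, first
    print n2, all
}')
echo "$out"
```

The first line of output shows one replacement, the second shows three.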

To replace the "BAL" string on the 4th line only, use NR==4 as the "pattern", then use sub().

Code: Select all


C:\>awk "NR==4 { sub(\"BAL\",\"LAB\") } {print}" myFile.txt
yearID,lgID,teamID,Half,divID,DivWin,Rank,G,W,L
1981,NL,ATL,1,W,N,4,54,25,29
1981,NL,ATL,2,W,N,5,52,25,27
1981,AL,LAB,1,E,N,2,54,31,23
1981,AL,BAL,2,E,N,4,51,28,23
1981,AL,BOS,1,E,N,5,56,30,26
.....
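If the string could also appear in other columns, one option is to pass a field as the third argument so that only that column is touched. This is a sketch rather than part of the original example. Note that once a field has been changed awk rebuilds the record using OFS, so OFS is set to a comma here to keep the line comma-delimited. The two input rows are shortened versions of lines from myFile.txt, and the quoting assumes a POSIX shell.

```shell
out=$(printf '1981,AL,BAL,1,E\n1981,NL,ATL,1,W\n' |
      awk 'BEGIN { FS = OFS = "," } { sub(/^BAL$/, "LAB", $3); print }')
echo "$out"
```

Only the row whose 3rd field is exactly BAL changes; the other row is printed as it came in.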

to be continued

- berserker

Last edited by berserker on 06 Jan 2014 20:13, edited 1 time in total.

berserker: Posts: 95; Joined: 18 Dec 2013 00:51

Writing an awk script - Averaging example.

#17 Post by berserker » 06 Jan 2014 09:45

Awk commands are not just for one-liners like the ones we have seen so far. You can put awk commands in a script (i.e., a text file) and have awk run them for you. It's the same as writing a vbscript and having the cscript engine run it for you.

To run an awk script, use the -f option:

Code: Select all

c:\> awk -f myawkscript.awk input_file.csv

Let's round off this part of the primer with an example. Say you have the last 20 days' worth of Google financial data in a comma-delimited CSV file. You want to find the average of the closing price (column 5) and find out which days (records) have a closing price greater than the average.

Code: Select all

Date,Open,High,Low,Close,Volume,Adj Close
2014-01-03,1115.00,1116.93,1104.93,1105.00,1666700,1105.00
2014-01-02,1115.46,1117.75,1108.26,1113.12,1821400,1113.12
2013-12-31,1112.24,1121.00,1106.26,1120.71,1357900,1120.71
2013-12-30,1120.34,1120.50,1109.02,1109.46,1236100,1109.46
2013-12-27,1120.00,1120.28,1112.94,1118.40,1569700,1118.40
2013-12-26,1114.01,1119.00,1108.69,1117.46,1337800,1117.46
2013-12-24,1114.97,1115.24,1108.10,1111.84,734200,1111.84
2013-12-23,1107.84,1115.80,1105.12,1115.10,1721600,1115.10
2013-12-20,1088.30,1101.17,1088.00,1100.62,3261600,1100.62
2013-12-19,1080.77,1091.99,1079.08,1086.22,1665700,1086.22
2013-12-18,1071.85,1084.95,1059.04,1084.75,2210300,1084.75
2013-12-17,1072.82,1080.76,1068.38,1069.86,1535700,1069.86
2013-12-16,1064.00,1074.69,1062.01,1072.98,1602000,1072.98
2013-12-13,1075.40,1076.29,1057.89,1060.79,2162400,1060.79
2013-12-12,1079.57,1082.94,1069.00,1069.96,1595900,1069.96
2013-12-11,1087.40,1091.32,1075.17,1077.29,1695800,1077.29
2013-12-10,1076.15,1092.31,1075.65,1084.66,1853900,1084.66
2013-12-09,1070.99,1082.31,1068.02,1078.14,1482600,1078.14
2013-12-06,1069.79,1070.00,1060.08,1069.87,1428800,1069.87
2013-12-05,1057.20,1059.66,1051.09,1057.34,1133700,1057.34

This is too "complicated" for a one-liner, so we put the commands inside a file. You can use any text editor to create your script.
The basic layout of the script goes like this:

Code: Select all

BEGIN{ 
 # here you can initialize variables
} 

{
  # here you do processing for every record
}

END {
   # here you can do end processing, like printing the final result
}

Here's a snapshot of the script:

Code: Select all

BEGIN{ 
 # here you can initialize variables
 FS = ","      # set the field delimiter to comma
 sum = 0       # set a variable called sum to store the total of column 5
} 

NR>1{
  # use NR > 1 to exclude the header row
  # here you do processing for every record  
  sum += $5 # awk implicitly converts each column 5 value to a number
}


END {
   # here you can do end processing like printing final result
   print "The total sum is " sum
   print "The average is " sum/(NR-1)   # NR-1 because the header row is not a data row
}

NR is the total number of records read, and that count includes the header row. So to average column 5, which is the closing price, divide the sum by NR-1 in the END block.

Running the script gives

Code: Select all

C:\>awk -f average.awk google.csv
The total sum is 21823.6
The average is 1091.18

Next we find which days in the file have a closing price greater than the average. This is the code:

Code: Select all

BEGIN{ 
 # here you can initialize variables
 FS = ","      # set the field delimiter to comma
 sum = 0       # set a variable called sum to store the total of column 5
} 

NR>1{
  # use NR > 1 to exclude the header row
  # here you do processing for every record  
  sum += $5     # awk implicitly converts each column 5 value to a number
  days[$1] = $5 # store the closing price in an array, with the first column as index
}


END {
   # here you can do end processing like printing final result
   average = sum/(NR-1)   # NR-1 because the header row is not a data row
   print "The total sum is " sum
   print "The average is " average
   print "Days greater than average"
   
   for( d in days ) {
         if ( days[d] > average ) {
            print d, days[d]
         }
   }
}

Running the script gives

Code: Select all


C:\>awk -f  average.awk google.csv
The total sum is 21823.6
The average is 1091.18
Days greater than average
2013-12-20 1100.62
2013-12-30 1109.46
2013-12-31 1120.71
2013-12-23 1115.10
2014-01-02 1113.12
2013-12-24 1111.84
2014-01-03 1105.00
2013-12-26 1117.46
2013-12-27 1118.40

A very simple example to end this part of the primer. I hope you now understand how to use simple awk in your batch files.
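A side note on the output order: for (d in days) visits the array in an unspecified order, which is why the dates above come out shuffled. Below is a sketch of one way to keep the file order, by storing the rows under a running index n and counting the data rows yourself instead of relying on NR. The names n, date and price are my own, only four rows of the data are used to keep it short, and the quoting assumes a POSIX shell.

```shell
out=$(printf '%s\n' \
  'Date,Open,High,Low,Close' \
  '2014-01-03,1115.00,1116.93,1104.93,1105.00' \
  '2014-01-02,1115.46,1117.75,1108.26,1113.12' \
  '2013-12-19,1080.77,1091.99,1079.08,1086.22' \
  '2013-12-05,1057.20,1059.66,1051.09,1057.34' |
  awk -F, '
    NR > 1 {
        n++                 # count data rows only, so the header is not averaged in
        date[n]  = $1       # the numeric index remembers the file order
        price[n] = $5
        sum += $5
    }
    END {
        if (n == 0) exit    # nothing to average
        average = sum / n
        printf "Average of %d rows: %.2f\n", n, average
        for (i = 1; i <= n; i++)
            if (price[i] + 0 > average) print date[i], price[i]
    }')
echo "$out"
```

Looping from 1 to n prints the matching days in the same order as they appear in the file.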

- berserker

Last edited by berserker on 07 Jan 2014 01:04, edited 2 times in total.

berserker: Posts: 95; Joined: 18 Dec 2013 00:51

Writing an awk script - Parsing systeminfo example

#18 Post by berserker » 06 Jan 2014 18:49

Let's say you want to get some information from the systeminfo command, e.g. the data for these items:
OS Name
System type
System Up Time
Original Install Date
Total Physical Memory
Available Physical Memory
BIOS Version
OS Version

Here is the code; save it as parse_systeminfo.awk

Code: Select all

BEGIN{ 
 # here you can initialize variables
 FS = ":[ ]+"      # set the field delimiter to : and one or more spaces
 
 # initialize lookup table
 array["OS Name"]=""
 array["System type"] = ""
 array["System Up Time"] = ""
 array["Original Install Date"] = ""
 array["Total Physical Memory"] = ""
 array["Available Physical Memory"] = ""
 array["BIOS Version"] = ""
 array["OS Version"] = "" 
} 

{
    # update table
    if ( $1 in array  ){
        array[$1] = $2
    }
}


END {
    for( item in array ){
        # beautify output by adjusting width using printf
        printf("%-30s  ===> %-30s\n" , item,  array[item])
    }
}

Another way to do it is to use a regex pattern in front of the body, e.g.

Code: Select all

/OS Name|BIOS Version|....../ {
    array[$1] = $2
}

Results:

Code: Select all

C:\>systeminfo | awk -f parse_systeminfo.awk
System Up Time                  ===> 0 Days, 8 Hours, 25 Minutes, 58 Seconds
OS Version                      ===> 5.1.2600 Service Pack 3 Build 2600
System type                     ===> X86-based PC
Available Physical Memory       ===> 244 MB
Total Physical Memory           ===> 575 MB
BIOS Version                    ===> VBOX   - 1
OS Name                         ===> Microsoft Windows XP Professional
Original Install Date           ===> 2013/12/09, 12:04:49 AM
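The results come out in a different order from the lookup table because for (item in array) visits keys in an unspecified order. A sketch of one way to print them in a fixed order is to keep the wanted keys in a numbered list built with split(). The four input lines below are a made-up excerpt in systeminfo's "Name: value" layout, the array names keys, wanted and value are my own, and the quoting assumes a POSIX shell.

```shell
out=$(printf '%s\n' \
  'Host Name:                 MYPC' \
  'OS Name:                   Microsoft Windows XP Professional' \
  'OS Version:                5.1.2600 Service Pack 3 Build 2600' \
  'Original Install Date:     2013/12/09, 12:04:49 AM' |
  awk '
    BEGIN {
        FS = ":[ ]+"
        # keys[1..n] fixes the print order; wanted[] allows a quick membership test
        n = split("OS Name;OS Version;Original Install Date", keys, ";")
        for (i = 1; i <= n; i++) wanted[keys[i]] = 1
    }
    ($1 in wanted) { value[$1] = $2 }
    END {
        for (i = 1; i <= n; i++)
            printf "%s = %s\n", keys[i], value[keys[i]]
    }')
echo "$out"
```

The "Host Name" line is ignored because it is not in the list, and the rest print in the order given to split().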

berserker: Posts: 95; Joined: 18 Dec 2013 00:51

Re: Awk - A nifty little tool for text manipulation and more

#19 Post by berserker » 06 Jan 2014 20:20

foxidrive wrote:
If you want to find all records with 2nd column starting with "A", then

That should read containing an "A"

nice spot. amended. Actually its the ^ that I left out.

Next we find how many days are there in the file that has closing price greater than average. This is the code

This uses the variable average but it's not set anywhere - right?

yes, i left out average=sum/NR. fixed.

foxidrive wrote:Nice work with the primer.

thks for proofreading.

berserker: Posts: 95; Joined: 18 Dec 2013 00:51

Awk User Defined Functions

#20 Post by berserker » 07 Jan 2014 01:03

For this next part of the primer I am going to introduce user-defined functions in awk. Awk is in fact a little "programming language", as you can already see from the features covered so far. As such, you can create user-defined functions inside an awk script. The purpose of functions is to provide a means of running repetitive tasks in the program. The syntax of awk functions is similar to other languages.

Code: Select all

     function name( argument1, argument2 ... )
     {
          body-of-function
          return [expression]
     }

You can put all the function declarations before the BEGIN block. For example, say you want to create a function that prints horizontal lines at various parts of your code:

Code: Select all

function horizontal_line(){
    # function prints 100 dashes
    for(i=0;i<100;i++){
        printf "-"
    }
    print ""    # add final newline (a bare print would output the current record)
}

BEGIN{ 
    print "Initializing..."
    horizontal_line()
    print "After horizontal_line function is called ..."
}

output results:

Code: Select all

C:\>awk -f myScript.awk 
Initializing...
----------------------------------------------------------------------------------------------------
After horizontal_line function is called ...

This is a simple example of a function with no arguments.
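One thing to watch in horizontal_line(): the loop counter i is a global variable, so calling the function would overwrite an i used elsewhere in the script. The usual awk idiom is to list local variables as extra parameters that the caller never passes. Here is a sketch with a hypothetical repeat() function (my own name, not from the primer), quoted for a POSIX shell:

```shell
out=$(awk '
    # Extra parameters after the real ones (here i and s) act as local variables.
    function repeat(ch, n,    i, s) {
        s = ""
        for (i = 0; i < n; i++) s = s ch
        return s
    }
    BEGIN {
        i = 42                       # a global that the function must not disturb
        print repeat("-", 10)
        print "i is still " i
    }')
echo "$out"
```

The global i keeps its value because the i inside repeat() is a separate, local variable.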

In awk, if you pass an array as a function argument, the array is "passed by reference". Any other argument is "passed by value". For example, a string is passed by value.

Code: Select all

function zoo( string ){
    print string
    string = "snake"    # changes only the function's local copy
    print string
}

BEGIN{
    animal = "monkey"
    zoo( animal )
    print animal        # still monkey
}

The function zoo does not change the value of "animal" in the main code. This is called "pass by value".

Arrays are passed by reference, as in this example:

Code: Select all

function zoo(b){
    b[1] = "hippo"   # here we change the item to hippo
}

BEGIN{ 
    # main code
    a[1] = "test"     # define an item in array
    print "a[1] before function is: " a[1]
    zoo(a)             # call zoo function
    print "a[1] after function is: " a[1]
}

result:

Code: Select all


C:\>awk -f myScript.awk
a[1] before function is: test
a[1] after function is: hippo

We can see that the array item is changed in the main code after calling the function zoo.

Values can be passed back to the calling program by using the return keyword.

Code: Select all

function calculate(){
   .. calculation code here...
   result = ....
   return result
}

This is a simple introduction to user defined functions in awk
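As a concrete (made-up) illustration of return, the sketch below defines a hypothetical mean() function that averages the first n elements of an array. The three prices are taken from the Google data earlier in the thread, and the quoting is for a POSIX shell.

```shell
out=$(awk '
    # mean() is a hypothetical helper: average of values[1..n], or 0 when n is 0.
    # i and total are locals, declared as extra parameters.
    function mean(values, n,    i, total) {
        if (n == 0) return 0
        for (i = 1; i <= n; i++) total += values[i]
        return total / n
    }
    BEGIN {
        n = split("1105.00 1113.12 1120.71", price, " ")
        printf "%.2f\n", mean(price, n)
    }')
echo "$out"
```

The value after return becomes the value of the call, so it can be used directly inside printf.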

to be continued...

-berserker

berserker: Posts: 95; Joined: 18 Dec 2013 00:51

Getting User Input and File Reading

#21 Post by berserker » 07 Jan 2014 02:56

In awk, you can get user input using the getline function eg

Code: Select all

BEGIN{ 
    print "Enter something" 
    getline entered
    print "You entered " entered
}

result

Code: Select all

C:\>awk -f test.awk
Enter something
test
You entered test

Here, the variable "entered" will contain whatever the user typed.

Another common use of the getline function is reading a file. Here's an example of how to read a file inside an awk script:

Code: Select all

BEGIN{     
    while ( ( getline line < "myFile.txt" ) > 0 ){
        print "Read: " line
    }
}

result

Code: Select all

C:\>type myFile.txt
dostips.com
is
the
best

C:\>awk -f myScript.awk
Read: dostips.com
Read: is
Read: the
Read: best

Let's dissect the while loop , first use getline to read in the file

Code: Select all

 ( getline line < "myFile.txt" )

getline returns a value greater than 0 for every line that is read successfully.

Code: Select all

( getline line < "myFile.txt" ) > 0

You can then use a while loop to iterate the file,

Code: Select all

    while ( ( getline line < "myFile.txt" ) > 0 ){
        # do something with line
    }

each time checking whether the value is greater than 0. When getline reaches the end of the file it returns 0, and the while loop ends.

Lastly, another common way to use getline is with a pipe. Let's say you want to display the output of the "dir" DOS command inside awk. Here's how to do it, again using a while loop coupled with the getline function:

Code: Select all

BEGIN{ 
    
    while ( ("dir" | getline line ) > 0 ){
        print "Read: " line
    }
    close("dir")    # close the pipe properly for next use in the program
}

result

Code: Select all

C:\>awk -f myScript.awk
Read:  Volume in drive C has no label.
Read:  Volume Serial Number is DCEB-67C9
Read:
Read:  Directory of C:\
Read:
....
... [ too long ] ...

That's how you can call an external DOS command and have it displayed inside awk program itself.

getline returns 1 if it finds a record, and 0 if the end of the file is encountered. If there is some error in getting a record, such as a file that cannot be opened, then getline returns -1. It is generally good practice to always explicitly test for >0 while reading a file or handling input from pipes.
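To make that concrete, here is a sketch that keeps getline's return value in a variable so the -1 case can be reported separately from a normal end of file. The file name is hypothetical and assumed not to exist, and the quoting is for a POSIX shell. Without the > 0 test the loop would never end for a missing file, because -1 counts as true.

```shell
out=$(awk '
    BEGIN {
        file = "no_such_file_for_demo.txt"     # hypothetical name, assumed not to exist
        while ((status = (getline line < file)) > 0)
            print "Read: " line
        if (status < 0)
            print "cannot read " file
        else
            print "reached end of file"
        close(file)
    }')
echo "$out"
```

For an existing file the same script prints each line and then "reached end of file".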

to be continued

-berserker

Last edited by berserker on 08 Jan 2014 09:45, edited 1 time in total.

berserker: Posts: 95; Joined: 18 Dec 2013 00:51

Date and Time

#22 Post by berserker » 07 Jan 2014 18:52

Dealing with dates and times is a fairly common task in batch scripting. Gawk provides simple date and time functions for basic time/date manipulation needs (they are gawk extensions, not part of POSIX awk):
1) systime()
2) strftime()
3) mktime()

1) systime().
systime() returns the current time as the number of seconds since the system epoch. It is commonly used to create a random number seed.

Code: Select all

C:\>awk "BEGIN{ print systime(); } "
1389169226

2) strftime().
This function formats a timestamp according to the format string. It is useful if you want to create a time stamp on Windows. E.g. to get the full 4-digit year, use the "%Y" format:

Code: Select all

C:\>awk "BEGIN{ print strftime(\"%Y\") } "
2014

To get a YYYY-MM-DD-HH-mm-ss timestamp:

Code: Select all

C:\>awk "BEGIN{ print strftime(\"%Y-%m-%d-%H-%M-%S\") } "
2014-01-08-16-24-23

You can then capture the result in the usual DOS for loop.

3) mktime( date specs )
The "date specs" argument to mktime is a string of the form "YYYY MM DD HH mm SS".
YYYY = full year
MM = month, 1 to 12
DD = day, 1 to 31
HH = hour, 0 to 23
mm = minute, 0 to 59
SS = seconds, 0 to 59
mktime returns a timestamp of the same kind as systime(), i.e. seconds since the epoch.
E.g.

Code: Select all

C:\>awk "BEGIN{string=\"2014 01 01 0 0 0\"; print mktime(string) } "
1388505600

mktime is commonly used to get a time difference, e.g. compare the date "2014 01 01 0 0 0" against the current time and get the difference (in seconds):

Code: Select all

C:\>awk "BEGIN{string=\"2014 01 01 0 0 0\"; s=mktime(string); print (systime() - s) } "
664866

This is useful if, for example, you are parsing a log file and filtering the date/time column for a specific date.
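A difference in seconds is not very readable, so here is a small sketch that breaks it down into days, hours, minutes and seconds with int() and the % operator. It uses the value 664866 from the example output and only plain arithmetic, so it does not need the gawk time functions. The quoting is for a POSIX shell.

```shell
out=$(awk 'BEGIN {
    secs  = 664866                          # the difference printed in the example
    days  = int(secs / 86400)               # 86400 seconds per day
    hours = int((secs % 86400) / 3600)
    mins  = int((secs % 3600) / 60)
    printf "%d days, %d hours, %d minutes, %d seconds\n", days, hours, mins, secs % 60
}')
echo "$out"
```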

to be continued

-berserker

berserker: Posts: 95; Joined: 18 Dec 2013 00:51

Merging strings of similar items (keys).

#23 Post by berserker » 08 Jan 2014 04:47

Sometimes you may want to merge a collection of similar items, e.g.

Code: Select all

C_1,KOG0155
C_1,KOG0306
C_2,KOG3259
C_3,KOG0931
C_2,KOG3638
C_4,KOG0956
C_6,KOG0155
C_1,KOG0306
C_3,KOG3259
C_4,KOG0931
C_5,KOG3638
C_1,KOG0956

to become something like this:

Code: Select all

C_1,KOG0155,KOG0306,KOG0306,KOG0956
C_2,KOG3259,KOG3638
C_3,KOG0931,KOG3259
C_4,KOG0956,KOG0931
C_6,KOG0155
C_5,KOG3638

You can make use of associative arrays in awk

Code: Select all


C:\>awk -F"," "{ array[$1] = array[$1]\",\"$2 }END{ for(idx in array) print idx, array[idx]}" 
C_3 ,KOG0931,KOG3259
C_4 ,KOG0956,KOG0931
C_5 ,KOG3638
C_6 ,KOG0155
C_1 ,KOG0155,KOG0306,KOG0306,KOG0956
C_2 ,KOG3259,KOG3638

Endoro: Posts: 244; Joined: 27 Mar 2013 01:29; Location: Bozen

Re: Awk - A nifty little tool for text manipulation and more

#24 Post by Endoro » 08 Jan 2014 07:52

nice work!

short comment, please:

Code: Select all

awk "BEGIN {  while ( (\"dir\" | getline line ) > 0 ) print \"reading:\",line }"

instead of 'while (expression > 0)' we can simply write in awk 'while (expression)', the expression '> 0' is always 'true', and '= 0' is 'false'.

Code: Select all

awk "BEGIN {  while (\"dir\" | getline line ) print \"reading:\",line }"

berserker: Posts: 95; Joined: 18 Dec 2013 00:51

Re: Awk - A nifty little tool for text manipulation and more

#25 Post by berserker » 08 Jan 2014 08:27

Endoro wrote:instead of 'while (expression > 0)' we can simply write in awk 'while (expression)', the expression '> 0' is always 'true', and '= 0' is 'false'.

Please check the manual

"getline returns 1 if it finds a record, and 0 if the end of the file is encountered. If there is some error in getting a record, such as a file that cannot be opened, then getline returns -1. "

Testing explicitly for >0 is always encouraged (for beginners at least), so that you can be sure there really are no more records to read.

you can try this

Code: Select all

awk "BEGIN{ while ( getline < \"ddd\" ) {...}  }"

and see what happens. Make sure ddd is a file that does not exist.

If you really want to omit the explicit test, then it's really up to your discretion...

Endoro: Posts: 244; Joined: 27 Mar 2013 01:29; Location: Bozen

Re: Awk - A nifty little tool for text manipulation and more

#26 Post by Endoro » 08 Jan 2014 08:48

sorry SIR, but you need an update of your manual. 'getline' NEVER returns a negative value. And even beginners should learn, never to read from a non-existing file .... :mrgreen:

berserker: Posts: 95; Joined: 18 Dec 2013 00:51

Re: Awk - A nifty little tool for text manipulation and more

#27 Post by berserker » 08 Jan 2014 09:27

Endoro wrote:sorry SIR, but you need an update of your manual. 'getline' NEVER returns a negative value. And even beginners should learn, never to read from a non-existing file ....

its not my manual. If you have doubts, please write to the contributors and ask them.

sometimes you don't have the luxury to know before hand whether a file exists. Need I explain more?

Squashman: Expert; Posts: 4488; Joined: 23 Dec 2011 13:59

Re: Awk - A nifty little tool for text manipulation and more

#28 Post by Squashman » 08 Jan 2014 11:12

I guess there could be instances when you could not test for the existence of a file but I am not sure when. Every batch file I have ever written I would always use IF EXIST to test for the file.

Now, from what I read on their website it is setting a variable called ERRNO to -1. So that must be internal to AWK. What I am wondering does it report it back to the cmd shell that the command failed so that you can do things like:

Code: Select all

awk "BEGIN{ while ( getline < "ddd" ) {...}  }" && (mycmd) || (othercmd)

What does the variable ERRORLEVEL get set to when that command fails?

berserker: Posts: 95; Joined: 18 Dec 2013 00:51

Re: Awk - A nifty little tool for text manipulation and more

#29 Post by berserker » 08 Jan 2014 18:09

Squashman wrote:I guess there could be instances when you could not test for the existence of a file but I am not sure when.

awk doesn't have a built-in mechanism for checking file existence, such as the -f test on Linux. So most of the time, if you want to do that, you have to make a system call, OR call getline and check for -1.

Code: Select all

C:\>awk "BEGIN{ x=getline < \"ddd\"   ; print x  }"
-1

However, what I am emphasizing is that it's always good practice to check the return value when using getline for pipes and file reading, because it's not impossible (although rare) for something to go wrong while reading the file/pipe, even when the file/pipe exists.

Squashman wrote:What does the variable ERRORLEVEL get set to when that command fails?

ERRNO is just a string internal to awk.

Code: Select all

C:\>awk "BEGIN{ getline < \"ddd\"   ; print ERRNO  }"
No such file or directory

So it doesn't get returned to the DOS errorlevel. You can pass getline's return value on to the errorlevel, though, using exit().

Code: Select all

C:\>awk "BEGIN{ x=getline < \"ddd\"   ; exit(x)  }"
C:\>echo %errorlevel%
-1

Endoro: Posts: 244; Joined: 27 Mar 2013 01:29; Location: Bozen

Re: Awk - A nifty little tool for text manipulation and more

#30 Post by Endoro » 09 Jan 2014 02:27

.. after some research and development I come out with:

If we use 'getline' with a file: result=getline line < "file"
result is

-1: I/O error, Permission denied (file does not exist, file is directory)
0: EOF, file is empty
1: success

If we use 'getline' in a pipe: result="command" | getline line
result is

0: command does not exist, command has no output to STDOUT, drive not ready, path not found
1: success = command has output to STDOUT

If the command prints its error message to STDOUT, 'result=1'.
If the command has success but prints no output to STDOUT, 'result=0'.
We cannot get the cmd %ERRORLEVEL% here.
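If the exit status of a command is what you need, one option is system(), which runs the command and returns its exit status. The catch is that the command's output goes straight to the console instead of into a variable. A minimal sketch, using "exit 3" as a stand-in for a failing command; the quoting is for a POSIX shell, and how the value maps to %ERRORLEVEL% under cmd is an assumption that was not tested here.

```shell
out=$(awk 'BEGIN {
    rc = system("exit 3")      # stand-in for a failing command
    print "exit status: " rc
}')
echo "$out"
```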
