Thursday, November 5, 2009

How to use AWK command

------------------------- How to Use AWK --------------------------


First, suppose you have a file called 'file1' that has 2 columns of numbers, and you want to make a new file called 'file2' that has columns 1 and 2 as before, but also adds a third column which is the ratio of the numbers in columns 1 and 2. Suppose you want the new 3-column file (file2) to contain only those lines with column 1 smaller than column 2. Either of the following two commands does what you want:

awk '$1 < $2 {print $0, $1/$2}' file1 > file2

-- or --

cat file1 | awk '$1 < $2 {print $0, $1/$2}' > file2
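
As a quick check of the command above, here is a small sketch that builds a three-line sample file and runs the same filter on it. The data values and the temporary directory are made up for illustration.

```shell
# Build a small two-column sample file (values are illustrative only).
tmpdir=$(mktemp -d)
printf '1 2\n5 4\n3 6\n' > "$tmpdir/file1"

# Keep lines where column 1 < column 2 and append the ratio as column 3.
awk '$1 < $2 {print $0, $1/$2}' "$tmpdir/file1" > "$tmpdir/file2"

ratio_out=$(cat "$tmpdir/file2")
echo "$ratio_out"
rm -r "$tmpdir"
```

The line "5 4" is dropped because column 1 is not smaller than column 2; the other two lines come out with 0.5 as the new third column.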


Suppose you have several thousand files you want to move into a new directory and rename by appending a .dat to the filenames. You could do this one by one (several hours), or use vi to make a decent command file to do it (several minutes), or use awk (several seconds). Suppose the files are named junk* (* is wildcard for any sequence of characters), and need to be moved to ../iraf and have a '.dat' appended to the name. To do this type

ls junk* | awk '{print "mv "$0" ../iraf/"$0".dat"}' | csh
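
Before piping generated commands into a shell, it can help to look at them first. The sketch below does a dry run in a scratch directory with three hypothetical files (junk1, junk2, junk3), then runs the commands. It uses sh rather than csh, on the assumption that csh may not be installed everywhere, and it assumes the filenames contain no spaces.

```shell
# Create a scratch directory with three hypothetical junk* files.
orig_dir=$PWD
workdir=$(mktemp -d)
mkdir "$workdir/src" "$workdir/iraf"
cd "$workdir/src" || exit 1
touch junk1 junk2 junk3

# Dry run: generate the mv commands but do not execute them yet.
mv_cmds=$(ls junk* | awk '{print "mv "$0" ../iraf/"$0".dat"}')
echo "$mv_cmds"

# Once the list looks right, feed the same commands to a shell.
echo "$mv_cmds" | sh
moved=$(ls ../iraf)
echo "$moved"

# Clean up and return to where we started.
cd "$orig_dir" || exit 1
rm -r "$workdir"
```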


More complex awk scripts need to be run from a file. The syntax for such cases is:

cat file1 | awk -f a.awk > file2

where file1 is the input file, file2 is the output file, and a.awk is a file containing awk commands. Examples below that contain more than one line of awk need to be run from files.
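
A minimal sketch of the script-file workflow, assuming a made-up three-line input file. Here a.awk holds a two-line sum-and-average script, and awk reads file1 directly instead of through cat.

```shell
scratch=$(mktemp -d)
printf '10 x\n20 y\n30 z\n' > "$scratch/file1"

# a.awk: add up column 1, then report the sum and average at end of input.
cat > "$scratch/a.awk" <<'EOF'
{ s += $1 }
END { print "sum is", s, "average is", s/NR }
EOF

# Run the script file against file1 and write the result to file2.
awk -f "$scratch/a.awk" "$scratch/file1" > "$scratch/file2"
script_out=$(cat "$scratch/file2")
echo "$script_out"
rm -r "$scratch"
```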

Some useful built-in variables are NF (the number of fields, i.e. columns, on the current line) and NR (the number of the line awk is currently working on). BEGIN and END are special patterns rather than variables: a BEGIN action runs before awk reads any input, and an END action runs after the last line has been read. length is a built-in function that returns the number of characters in a line or a string. There is also looping capability, a search (/) command, a substring command (extremely useful), and formatted printing. The logical operators || (or) and && (and) can be used in 'pattern'. You can define and manipulate your own variables. Examples are outlined below. The only bug I know of is that Sun's version of awk won't do trig functions, though it does do logs. There is something called gawk (a GNU product), which does a few more things than Sun's awk, but they are basically the same. Note the use of the 'yes' command below. Coupled with 'head' and 'awk', it can save you an hour of typing if you have a lot of files to analyze or rename.
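
To see NR, NF, length, BEGIN and END side by side, here is a small sketch run on two made-up input lines. The words "start" and "lines:" are just labels chosen for this illustration.

```shell
vars_out=$(printf 'alpha beta\ngamma\n' | awk '
BEGIN { print "start" }                 # runs before any input is read
      { print NR, NF, length($0) }      # line number, field count, line length
END   { print "lines:", NR }            # runs after the last line
')
echo "$vars_out"
```

The first input line has 2 fields and 10 characters, the second has 1 field and 5 characters, and NR in the END action holds the total line count.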


EXAMPLES

# is the comment character for awk. 'field' means 'column'.

# Print first two fields in opposite order:
awk '{ print $2, $1 }' file


# Print lines longer than 72 characters:
awk 'length > 72' file


# Print length of string in 2nd column
awk '{print length($2)}' file


# Add up first column, print sum and average:
{ s += $1 }
END { print "sum is", s, " average is", s/NR }


# Print fields in reverse order (one field per output line):
awk '{ for (i = NF; i > 0; --i) print $i }' file


# Print the last line
{line = $0}
END {print line}


# Print the total number of lines that contain the word Pat
/Pat/ {nlines = nlines + 1}
END {print nlines}


# Print all lines between start/stop pairs:
awk '/start/, /stop/' file


# Print all lines whose first field is different from previous one:
awk '$1 != prev { print; prev = $1 }' file


# Print column 3 if column 1 > column 2:
awk '$1 > $2 {print $3}' file


# Print line if column 3 > column 2:
awk '$3 > $2' file


# Count number of lines where col 3 > col 1
awk '$3 > $1 {n++} END {print n+0}' file


# Print sequence number and then column 1 of file:
awk '{print NR, $1}' file


# Print every line after erasing the 2nd field
awk '{$2 = ""; print}' file


# Print hi 28 times
yes | head -28 | awk '{ print "hi" }'


# Print hi.0010 to hi.0099 (NOTE IRAF USERS!)
yes | head -90 | awk '{printf("hi.%04d\n", NR+9)}'

# Print out 4 random numbers between 0 and 1
yes | head -4 | awk '{print rand()}'

# Print out 40 random integers modulo 5
yes | head -40 | awk '{print int(100*rand()) % 5}'


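# Replace every field by its absolute value
{ for (i = 1; i <= NF; i = i+1) if ($i < 0) $i = -$i; print }


# Looping example: print commands that remove print jobs 875 down to 834 from the queue
BEGIN {
    for (i = 875; i > 833; i--) {
        printf "lprm -Plw %d\n", i
    }
    exit
}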


Formatted printouts are of the form printf( "format\n", value1, value2, ... valueN)
e.g. printf("howdy %-8s What it is bro. %.2f\n", $1, $2*$3)
%s = string
%-8s = 8 character string left justified
%.2f = number with 2 places after .
%6.2f = field 6 chars with 2 chars after .
\n is newline
\t is a tab
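
A short sketch of the format codes listed above, applied to one made-up input line. The square brackets are only there to make the padding visible.

```shell
# $1 is a string; $2*$3 is 4.5, printed with two different numeric formats.
fmt_out=$(echo "bob 3 1.5" | awk '{ printf("[%s] [%-8s] [%.2f] [%6.2f]\n", $1, $1, $2*$3, $2*$3) }')
echo "$fmt_out"
```

%-8s pads "bob" on the right out to 8 characters, and %6.2f pads 4.50 on the left out to 6 characters.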


# Print frequency histogram of column of numbers
$2 <= 0.1 {na = na+1}
($2 > 0.1) && ($2 <= 0.2) {nb = nb+1}
($2 > 0.2) && ($2 <= 0.3) {nc = nc+1}
($2 > 0.3) && ($2 <= 0.4) {nd = nd+1}
($2 > 0.4) && ($2 <= 0.5) {ne = ne+1}
($2 > 0.5) && ($2 <= 0.6) {nf = nf+1}
($2 > 0.6) && ($2 <= 0.7) {ng = ng+1}
($2 > 0.7) && ($2 <= 0.8) {nh = nh+1}
($2 > 0.8) && ($2 <= 0.9) {ni = ni+1}
($2 > 0.9) {nj = nj+1}
END {print na, nb, nc, nd, ne, nf, ng, nh, ni, nj, NR}
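
The same kind of histogram can also be sketched with an array of counters instead of ten separate variables. This is an alternative technique, not the script above: the bin index is computed as int($2 * 10), so each bin includes its lower edge rather than its upper edge, and values that sit exactly on a bin boundary may land differently because of floating-point rounding. The array name count and the sample data are made up for illustration.

```shell
hist_out=$(printf 'a 0.05\nb 0.15\nc 0.17\nd 0.95\n' | awk '
{
    b = int($2 * 10)        # bin index 0..9 for values in [0, 1)
    if (b > 9) b = 9        # fold values >= 1.0 into the last bin
    if (b < 0) b = 0        # fold negative values into the first bin
    count[b]++
}
END {
    # Print the ten bin counts followed by the total number of lines.
    for (b = 0; b <= 9; b++) printf "%d ", count[b]
    print NR
}')
echo "$hist_out"
```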


# Find maximum and minimum values present in column 1
NR == 1 {m=$1 ; p=$1}
$1 >= m {m = $1}
$1 <= p {p = $1}
END { print "Max = " m, " Min = " p }

# Example of defining variables, multiple commands on one line
NR == 1 {prev=$4; preva = $1; prevb = $2; n=0; sum=0}
$4 != prev {print preva, prevb, prev, sum/n; n=0; sum=0; prev = $4; preva = $1; prevb = $2}
$4 == prev {n++; sum=sum+$5/$6}
END {print preva, prevb, prev, sum/n}

# Example of defining and using a function, inserting values into an array
# and doing integer arithmetic mod(n). This script finds the number of days
# elapsed since Jan 1, 1901. (from http://www.netlib.org/research/awkbookcode/ch3)
function daynum(y, m, d,    days, i, n)
{   # 1 == Jan 1, 1901
    split("31 28 31 30 31 30 31 31 30 31 30 31", days)
    # 365 days a year, plus one for each leap year
    n = (y-1901) * 365 + int((y-1901)/4)
    if (y % 4 == 0)   # leap year from 1901 to 2099
        days[2]++
    for (i = 1; i < m; i++)
        n += days[i]
    return n + d
}
{ print daynum($1, $2, $3) }
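
To try the function without a separate script file, the sketch below embeds it in a single awk call and feeds it two made-up dates in "year month day" form. The extra parameters days, i and n are the usual awk way of declaring local variables.

```shell
daynum_out=$(printf '1901 1 1\n2000 3 1\n' | awk '
function daynum(y, m, d,    days, i, n) {
    split("31 28 31 30 31 30 31 31 30 31 30 31", days)
    # 365 days a year, plus one for each leap year already passed
    n = (y-1901) * 365 + int((y-1901)/4)
    if (y % 4 == 0)             # leap year rule valid from 1901 to 2099
        days[2]++
    for (i = 1; i < m; i++)     # add the full months before month m
        n += days[i]
    return n + d
}
{ print daynum($1, $2, $3) }')
echo "$daynum_out"
```

Jan 1, 1901 comes out as day 1, and Mar 1, 2000 as day 36220.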

# Example of using substrings
# substr($2,9,7) picks out characters 9 thru 15 of column 2
{print "imarith", substr($2,1,7) " - " $3, "out."substr($2,5,3)}
{print "imarith", substr($2,9,7) " - " $3, "out."substr($2,13,3)}
{print "imarith", substr($2,17,7) " - " $3, "out."substr($2,21,3)}
{print "imarith", substr($2,25,7) " - " $3, "out."substr($2,29,3)}
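
A small sketch of substr on its own, using a made-up 16-character string in column 2 so the character positions are easy to count.

```shell
# substr(s, start, len): positions are counted from 1.
substr_out=$(echo "x abcdefghijklmnop y" | awk '{ print substr($2,1,7), substr($2,9,7), substr($2,5,3) }')
echo "$substr_out"
```

Here substr($2,9,7) returns characters 9 through 15 of column 2, which is "ijklmno".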

