# Lesson 01: Basics of Awk

If you haven't read the Awk [man page](https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man1/awk.1.html), you should start there. It's helpful! Some highlights: 

> awk − pattern-directed scanning and processing language

> `awk [ -F fs ] [ -v var=value ] [ 'prog' | -f progfile ] [ file ... ]`

> _Awk_ scans each input _file_ for lines that match any of a set of patterns specified literally in _prog_ or in one or more files specified as __-f__ _progfile_.

> With each pattern there can be an associated action that will be performed when a line of a _file_ matches the pattern.

> Each line is matched against the pattern portion of every pattern-action statement; the associated action is performed for each matched pattern.

> A pattern-action statement has the form `pattern {action}`.

> A missing `{ action }` means print the line; a missing pattern always matches. 

I created a simple example file to demonstrate basic Awk:

In [1]:
cat data/letters.txt

a
bb
ccc
dddd
ggg
hh
i

### A Basic Pattern

If we match lines longer than two characters and use the implicit print action, we get:

In [2]:
awk 'length($0) > 2' data/letters.txt

ccc
dddd
ggg


`$0` is a built-in variable that holds the entire current line.

### A Basic Function

If we leave out a pattern, we will match every line. A trivial action would be to print each line:

In [3]:
awk '{ print }' data/letters.txt

a
bb
ccc
dddd
ggg
hh
i


Using the `length` function as our action, we can get the length of each line:

In [4]:
awk '{ print length }' data/letters.txt

1
2
3
4
3
2
1


With no argument, `length` implicitly acts on the whole line. We can be more explicit if we want:

In [5]:
awk '{ print length($0) }' data/letters.txt

1
2
3
4
3
2
1
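
A caveat on parentheses, sketched here with inline input rather than the data file: `length` with no parentheses means the length of `$0`, and two expressions placed side by side are concatenated. So `length $0` is expected to print the length immediately followed by the line itself, whereas `length($0)` prints only the length. The variable names `with_parens` and `without_parens` are just for illustration.

```shell
# Input mirrors the first three lines of data/letters.txt.
with_parens=$(printf 'a\nbb\nccc\n' | awk '{ print length($0) }')

# Without parentheses this is parsed as: length (of $0), concatenated with $0.
without_parens=$(printf 'a\nbb\nccc\n' | awk '{ print length $0 }')

echo "$with_parens"
echo "$without_parens"
```

Writing the parentheses out avoids the ambiguity, which is why the later examples use `length($0)`.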


Awk has two special patterns, `BEGIN` and `END`, for executing code before the file input begins and after it is complete.

In [3]:
awk 'BEGIN { print "HI" } { print $0 } END { print "BYE!" }' data/letters.txt

HI
a
bb
ccc
dddd
ggg
hh
i
BYE!
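
A common use of these two blocks is to set something up before the input, update it on every line, and report it at the end. The following is a minimal sketch under that assumption; the variable names `n` and `total` are made up for illustration, and the input is piped in to mirror `data/letters.txt`.

```shell
# Count the lines and the total number of characters across them.
summary=$(printf 'a\nbb\nccc\ndddd\nggg\nhh\ni\n' |
  awk 'BEGIN { n = 0; total = 0 }           # runs once, before any input
       { n++; total += length($0) }         # runs for every line
       END { print n " lines, " total " characters" }')  # runs once, after all input
echo "$summary"
```

For the seven sample lines this should print `7 lines, 16 characters`.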


### Combining Patterns and Functions

Of course, patterns and functions can be combined so that the function is only applied when the pattern is matched. 

From the man page:

> A pattern-action statement has the form

> ```pattern { action }```

We can print the length of all lines longer than 2 characters.

In [6]:
awk 'length($0) > 2 { print length($0) }' data/letters.txt

3
4
3
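
Because each line is tested against every pattern-action statement, one program can hold several statements, and a single line may trigger more than one of them. A minimal sketch, using inline input in place of `data/letters.txt` (the label text is arbitrary):

```shell
# Two pattern-action statements in one program.
labels=$(printf 'a\nbb\nccc\n' |
  awk 'length($0) > 1 { print $0 ": longer than 1" }
       length($0) > 2 { print $0 ": longer than 2" }')
echo "$labels"
```

Here `a` matches neither pattern, `bb` matches only the first, and `ccc` matches both, so it is printed twice.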


### Multiple Fields

Awk is designed for easy handling of data with multiple fields per row. The field delimiter can be specified with the `-F` option.

Here's a simple space-delimited file:

In [7]:
awk '{print }' data/field_data.txt

Roses are red,
Violets are blue,
Sugar is sweet,
And so are you.


If we specify the field separator, we can print the second field from each row:

In [8]:
awk -F " " '{print $2 }' data/field_data.txt

are
are
is
so


We don't get an error if a line doesn't have the referenced field; it just shows up as blank:

In [9]:
awk -F " " '{print $4 }' data/field_data.txt




you.
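
If the blank lines are unwanted, one option is to use a pattern that checks how many fields the line has. The sketch below relies on `NF`, Awk's built-in variable holding the number of fields on the current line; the two input lines are taken from the file above.

```shell
# Print the fourth field only on lines that have at least four fields.
fourth=$(printf 'Roses are red,\nAnd so are you.\n' | awk 'NF >= 4 { print $4 }')
echo "$fourth"
```

Only the second line has a fourth field, so the output is a single line, `you.`.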


The separator expression is interpreted as a regular expression.

In [10]:
awk -F "((so )?are|is) " '{print "Field 1: " $1 "\nField 2: " $2}' data/field_data.txt

Field 1: Roses 
Field 2: red,
Field 1: Violets 
Field 2: blue,
Field 1: Sugar 
Field 2: sweet,
Field 1: And 
Field 2: you.
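
A simpler case of the same idea is a bracket expression, which lets either of two characters end a field. The input string here is invented for illustration:

```shell
# Either a comma or a semicolon acts as the field separator.
fields=$(printf 'a,b;c\n' | awk -F '[,;]' '{ print $1, $2, $3 }')
echo "$fields"
```

The commas in `print $1, $2, $3` join the fields with a single space, so this prints `a b c`.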


### Built-in Functions

Awk comes with a variety of built-in functions. Basic mathematical functions are available:

In [18]:
echo "1 2 3 4 5 6 7" | awk '{print exp($1), log($2), sqrt($3), sin($4), cos($5), atan2($6, $7) }' 

2.71828 0.693147 1.73205 -0.756802 0.283662 0.708626
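
As a small sketch of calling these functions with explicit parentheses and constant arguments, the example below runs entirely inside a `BEGIN` block, so it needs no input. The arguments and the `printf` format are illustrative choices: `atan2(0, -1)` evaluates to pi, and `sqrt(16)` to 4.

```shell
# atan2(0, -1) is a common way to obtain pi in Awk.
pi=$(awk 'BEGIN { printf "%.5f\n", atan2(0, -1) }')
root=$(awk 'BEGIN { print sqrt(16) }')
echo "$pi"
echo "$root"
```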


It can also generate random numbers in the range [0, 1). Using the `BEGIN` pattern, we can generate a random number without having to provide any input to Awk.

In [48]:
awk 'BEGIN { print rand(); print rand() }' 

0.840188
0.394383


By default, Awk starts with the same seed on every run, so running this command twice in a row returns the same result:

In [49]:
awk 'BEGIN { print rand(); print rand() }' 

0.840188
0.394383


The `srand` function can be used to set the seed:

In [52]:
awk 'BEGIN { srand(10); print rand(); print rand() }' 

0.565811
0.61093
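
An explicit seed makes the sequence reproducible: two separate runs with the same seed should produce the same numbers. The sketch below checks this without assuming any particular values, since the numbers themselves differ between Awk implementations.

```shell
# Run the same seeded program twice and compare the results.
first=$(awk 'BEGIN { srand(10); print rand(); print rand() }')
second=$(awk 'BEGIN { srand(10); print rand(); print rand() }')

if [ "$first" = "$second" ]; then
  echo "same sequence"
else
  echo "different sequence"
fi
```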


The `int` function returns "the nearest integer to x, located between x and zero and truncated toward zero".

In [59]:
awk 'BEGIN { print "int(0.9) = " int(0.9); print "int(-0.9) = " int(-0.9) }' 

int(0.9) = 0
int(-0.9) = -0
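
Combining `int` with `rand` gives random integers. A minimal sketch of a six-sided die roll, where the seed 10 and the range 1 to 6 are arbitrary choices for illustration:

```shell
# rand() * 6 lies in [0, 6); int() truncates it to 0..5; adding 1 shifts it to 1..6.
roll=$(awk 'BEGIN { srand(10); print int(rand() * 6) + 1 }')
echo "$roll"
```

The exact value depends on the Awk implementation, but it should always fall between 1 and 6.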
