## Makefile example
Add this to the existing command line notebook after a fresh pull.
Substitute the cell below for the corresponding cell in the updated notebook.

## Command line programs we'll use:

* csvkit by onyxfish, install with pip or conda
* pcregrep, on Debian install with `sudo apt-get install pcregrep`
* find
* cut
* awk
* head
* tail
* make
* xargs
* parallel
* cat
* sed
* time
* date
* read
* while
* for


# Save the output to a file for the future
We'll redirect the output from standard out (terminal display) to a file.

In [1]:
%%bash
#root_dir="/home/daniel/git/Python2.7/DataScience/command_line_data"
root_dir="/home/dcuneo/git/PersonalDS/DataScience/command_line_pres_data"
cat $root_dir/linux_inventory.csv | cut -d, -f 3 | sort | uniq -c | sort -n > $root_dir/ext_list.txt
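
As a small, self-contained sketch of the redirection used above: `>` creates or truncates the file, while `>>` appends to it. The temporary directory and the three sample extensions are made up for illustration; they are not part of the kernel inventory.

```shell
# Work in a throwaway directory so nothing real is overwritten.
tmp_dir=$(mktemp -d)

# Three made-up extensions, one per line, standing in for column 3 of the inventory.
printf 'c\nh\nc\n' | sort | uniq -c | sort -n > "$tmp_dir/ext_list.txt"

# '>' created the file above; '>>' adds a line to the end instead of replacing it.
echo "generated on $(date +%Y_%m_%d)" >> "$tmp_dir/ext_list.txt"

cat "$tmp_dir/ext_list.txt"
```

The file ends up with the two count lines followed by the appended date line.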

## Makefile
Let's try using a makefile. We have 2 steps required to create the sorted list of extensions and their counts.

1. make an inventory
2. sort the inventory by extension

We also have 2 dependencies

1. Linux kernel source
2. inventory

There's one final output, the sorted list of counts by extension

The idea behind the makefile is that if a dependency changes, the steps that produce the target output should run again.
If a file was added to the Linux kernel, then we need a new inventory and then a new file extension count list. In real-world usage I'd make more of an effort to avoid rerunning the entire inventory. Here, that's a bit overkill.


## What is happening.
Make compares the last-modified times of a target and its dependencies. If a dependency was touched more recently than the target, the recipe is invoked to handle that updated information. Make can also use functions and variables, much like shell parameters, although in a slightly different form.
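
The rebuild decision can be pictured with the shell's own `-nt` ("newer than") file test. This is only a rough sketch of the idea, not how Make is implemented; the files `inventory.csv` and `ext_list.txt` in a temporary directory are empty stand-ins, and `needs_rebuild` is a made-up helper name.

```shell
tmp_dir=$(mktemp -d)
dep="$tmp_dir/inventory.csv"      # stand-in dependency
target="$tmp_dir/ext_list.txt"    # stand-in target

# Give both files fixed, different modification times (touch -t CCYYMMDDhhmm).
touch -t 202001010000 "$dep"
touch -t 202001020000 "$target"

needs_rebuild() {
    # Rebuild if the target is missing or the dependency is newer than it.
    [ ! -e "$target" ] || [ "$dep" -nt "$target" ]
}

if needs_rebuild; then echo "rebuild"; else echo "up to date"; fi

# Spoof a change to the dependency by giving it a later timestamp.
touch -t 202001030000 "$dep"
if needs_rebuild; then echo "rebuild"; else echo "up to date"; fi
```

The first check prints `up to date` and the second prints `rebuild`, because only the dependency's timestamp moved forward.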

Makefiles are typically named "makefile" or "Makefile", and each recipe line must begin with a tab character. Make has to parse the makefile, so there's some special syntax that is similar to but distinct from that of the shell.

In [None]:
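%%writefile makefile
# This cell is makefile syntax, not bash. Recipe lines must start with a tab.
root_dir = /home/dcuneo/git/PersonalDS/DataScience/command_line_pres_data
kernel = $(root_dir)/linux-2.6.32.67
inventory = $(root_dir)/linux_inventory.csv
extension_list = $(root_dir)/ext_list.txt

# Final output: sorted counts by extension. Rebuilt when the inventory changes.
$(extension_list): $(inventory)
	cut -d, -f 3 $(inventory) | sort | uniq -c | sort -n > $(extension_list)

# Inventory. Rebuilt when the kernel source directory changes.
$(inventory): $(kernel)
	find $(root_dir) -type f | parallel -n 1 --jobs 2 run_inventory.sh > $(inventory)
	sed -i '1 i\path_,tot_lines' $(inventory)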


## Testing it
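We can test this makefile with the _touch_ program, which updates the last-modified time of a file to the current date and time. Make has related options of its own: -t (--touch) marks targets as up to date without running their recipes, and -W file (--what-if) treats a file as if it had just been modified. (The -f option is something else: it names the makefile to read.)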

So we can spoof an update to our dependency files and see how this thing works.
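
Here is a self-contained way to watch this happen, assuming GNU Make is installed. The file names (`in.txt`, `out.txt`, `build.log`) and the temporary directory are invented for this sketch and are much simpler than the kernel inventory. The makefile is written with `printf` so that the recipe lines are guaranteed to start with a tab.

```shell
tmp_dir=$(mktemp -d)
cd "$tmp_dir"

printf 'b\na\n' > in.txt

# out.txt is the target, in.txt its dependency; \t is the required tab.
printf 'out.txt: in.txt\n\tsort in.txt > out.txt\n\techo built >> build.log\n' > makefile

make          # first run: out.txt does not exist, so the recipe runs
make          # second run: nothing changed, so the recipe is skipped

sleep 1       # make sure the new timestamp is strictly later
touch in.txt  # spoof an update to the dependency
make          # the recipe runs again

wc -l < build.log   # number of times the recipe actually ran
```

The recipe appends a line to `build.log` each time it runs, so the final count shows two builds out of three `make` invocations.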

We may also want to make new directories for the outputs and name them by date. This is easily done with the _date_ program.

## Storing the output of a command in a variable (shell parameter)

We can set a shell parameter from the output of another shell command or program.
You've seen this trick in the simple inventory program earlier. I used it a lot, actually, to
assign the output of commands to shell parameters.

In [28]:
%%bash

echo $USER # shell variable set up when you log in
var=$(echo $USER | cut -c 1-4)
echo $var

dcuneo
dcun


In [2]:
%%bash
date

Mon Sep 28 12:07:36 PDT 2015


In [58]:
%%bash
root_dir="/home/dcuneo/git/PersonalDS/DataScience/command_line_pres_data"

dir_path=${root_dir}/$(date +%Y_%m_%d_%H:%M:%S)
echo $dir_path

/home/dcuneo/git/PersonalDS/DataScience/command_line_pres_data/2015_09_28_13:17:16
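
To actually create the dated directory rather than just print its name, something like the following should work. The root here is a temporary directory standing in for `root_dir`, and `results.txt` is a made-up file name.

```shell
root_dir=$(mktemp -d)    # stand-in for the real root_dir

# Capture the timestamp once so every later use refers to the same directory.
stamp=$(date +%Y_%m_%d_%H:%M:%S)
dir_path="${root_dir}/${stamp}"

# -p: no error if the directory already exists; the quotes guard against spaces.
mkdir -p "$dir_path"

echo "some output" > "$dir_path/results.txt"
ls "$dir_path"
```

Storing the timestamp in `stamp` first matters: calling `date` again later would give a different name if the clock has moved on.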


## Shell Loops

There are two kinds of loops that I tend to use:

1. while
2. for

### Tests

The square brackets are called "tests". This is another topic in shell scripting that I can't really cover here, but you can see a use case for it.
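
A few common tests, as a rough sketch. `[` is a command that exits with status 0 when the condition holds, which is what `if` and `while` check. The values and the temporary file are made up for illustration.

```shell
count=12
name="txt"
tmp_file=$(mktemp)    # mktemp creates an empty file

# Integer comparison: -gt, -lt, -eq, -ne, -ge, -le
if [ "$count" -gt 10 ]; then
    echo "count is greater than 10"
fi

# String comparison: = and !=
if [ "$name" = "txt" ]; then
    echo "name is txt"
fi

# File tests: -f (regular file), -d (directory), -s (non-empty)
if [ -f "$tmp_file" ] && [ ! -s "$tmp_file" ]; then
    echo "file exists and is empty"
fi

# A test that does not hold returns a non-zero exit status.
[ "$count" -lt 10 ] || echo "test failed with exit status $?"
```

Quoting the variables inside the brackets keeps the test from breaking when a value is empty or contains spaces.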

The loop below is rather contrived. We would really just use the _cat_ command to see the contents. But it's good practice because the output is predictable.

In [38]:
%%bash
root_dir="/home/dcuneo/git/PersonalDS/DataScience/command_line_pres_data"

while read f;do
    echo $f
done < $root_dir/ext_list.txt

1 "1"
1 "1992-1997"
1 "1994-2004"
1 "1995-2002"
1 "1996-2002"
1 "5"
1 "act2000"
1 "AddingFirmware"
1 "AdvancedTopics"
1 "agh"
1 "aic79xx"
1 "aic7xxx"
1 "arcmsr"
1 "asp"
1 "au0828"
1 "audio"
1 "auto"
1 "avmb1"
1 "awk"
1 "ax"
1 "binfmt"
1 "bttv"
1 "buddha"
1 "build"
1 "CAPI"
1 "cc"
1 "cert"
1 "ChangeLog"
1 "char"
1 "clean"
1 "Coding"
1 "common"
1 "concap"
1 "Conclusion"
1 "copyright"
1 "cpia"
1 "cpia2"
1 "cputype"
1 "cx23885"
1 "cycladesZ"
1 "DAC960"
1 "dino"
1 "diversion"
1 "DOC"
1 "drm"
1 "drv_ba_resend"
1 "dtc"
1 "dvb-usb"
1 "Early-stage"
1 "em28xx"
1 ext
1 "FIRST"
1 "FlashPoint"
1 "Followthrough"
1 "FPE"
1 "freeze"
1 "freezer"
1 "fwinst"
1 "gate"
1 "gdbinit_200MHz_16MB"
1 "gdbinit_300MHz_32MB"
1 "gdbinit_400MHz_32MB"
1 "generic"
1 "gigaset"
1 "glade"
1 "headersinst"
1 "hfc-pci"
1 "HiSax"
1 "history"
1 "hm12"
1 "host"
1 "hp300"
1 "hysdn"
1 "hz"
1 "i2400m"
1 "icn"
1 "ide"
1 "include"
1 "inc_shipped"
1 "inf"
1 "ini"
1 "Intro"
1 "ioctl"
1 "iosched"
1 "ips"
1 "ipw2100"
1 "ipw2200"
1 "ir"


A for loop over the output of _cat_ needs help from IFS, the internal field separator.
By default, Bash splits the output of _cat_ on spaces, tabs, and newlines (\n), so every word becomes its own loop item.
To get one item per line, we need the output to be split on \n only, thus the IFS syntax below.

_read_ is a better way to work with the contents of a file when you want the data on a per-line basis.
Most of the time we do.

In [52]:
%%bash
root_dir="/home/dcuneo/git/PersonalDS/DataScience/command_line_pres_data"

IFS=$'\n'
for f in $(cat $root_dir/ext_list.txt);do
    echo $f
done

      1 "1"
      1 "1992-1997"
      1 "1994-2004"
      1 "1995-2002"
      1 "1996-2002"
      1 "5"
      1 "act2000"
      1 "AddingFirmware"
      1 "AdvancedTopics"
      1 "agh"
      1 "aic79xx"
      1 "aic7xxx"
      1 "arcmsr"
      1 "asp"
      1 "au0828"
      1 "audio"
      1 "auto"
      1 "avmb1"
      1 "awk"
      1 "ax"
      1 "binfmt"
      1 "bttv"
      1 "buddha"
      1 "build"
      1 "CAPI"
      1 "cc"
      1 "cert"
      1 "ChangeLog"
      1 "char"
      1 "clean"
      1 "Coding"
      1 "common"
      1 "concap"
      1 "Conclusion"
      1 "copyright"
      1 "cpia"
      1 "cpia2"
      1 "cputype"
      1 "cx23885"
      1 "cycladesZ"
      1 "DAC960"
      1 "dino"
      1 "diversion"
      1 "DOC"
      1 "drm"
      1 "drv_ba_resend"
      1 "dtc"
      1 "dvb-usb"
      1 "Early-stage"
      1 "em28xx"
      1 ext
      1 "FIRST"
      1 "FlashPoint"
      1 "Followthrough"
      1 "FPE"
      1 "freeze"
      1 "freezer"
      1 "fwinst"
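
The difference between the two loops shows up on any file with leading spaces or several fields per line. The sketch below uses a small made-up file in a temporary location. `IFS=` and `-r` on `read` are additions that are not in the cells above; they are used here to keep each line exactly as it is in the file.

```shell
tmp_file=$(mktemp)
printf '      1 "awk"\n     34 "sh"\n' > "$tmp_file"

# Default IFS: the for loop splits on spaces as well as newlines,
# so each line falls apart into two words (four iterations in total).
n=0
for f in $(cat "$tmp_file"); do
    n=$((n + 1))
done
echo "for loop iterations: $n"

# read consumes one line per iteration; IFS= keeps leading spaces,
# -r keeps backslashes literal, and the quotes stop echo from re-splitting.
m=0
while IFS= read -r line; do
    m=$((m + 1))
    echo "$line"
done < "$tmp_file"
echo "while loop iterations: $m"
```

With the default IFS the for loop runs four times for two lines, while the `read` loop runs exactly once per line.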
    

Let's do something slightly more interesting and introduce the _test_ while we're at it.

In [39]:
%%bash
root_dir="/home/dcuneo/git/PersonalDS/DataScience/command_line_pres_data"

while read f;do
    count=$(echo $f | awk '{print $1}')
    if [ $count -gt 10 ];then
        echo $f
    fi
done < $root_dir/ext_list.txt

11 "c_shipped"
13 "tst"
14 "ppm"
15 "lds"
23 "pl"
26 "HEX"
28 "debug"
33 "tmpl"
34 "sh"
50 "boot"
79 "gitignore"
105 "xml"
111 "ihex"
115 "dts"
849 "txt"
1080 "S"
2391 "NONE"
11622 "h"
13147 "c"
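
The same filter can be written without calling awk on every line, since `read` can split a line into several variables itself. This is an alternative sketch rather than the cell above; the sample data and the variable names `count`, `ext`, and `kept` are made up.

```shell
tmp_file=$(mktemp)
printf '      1 "awk"\n     34 "sh"\n    849 "txt"\n' > "$tmp_file"

kept=0
# read puts the first field in count and the rest of the line in ext.
while read -r count ext; do
    if [ "$count" -gt 10 ]; then
        echo "$count $ext"
        kept=$((kept + 1))
    fi
done < "$tmp_file"
echo "lines kept: $kept"
```

Only the `sh` and `txt` lines pass the test, so two lines are kept.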


## Use case
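Maybe you run a program in a loop and save each output to a new file named with the date and time. Note that a timestamp with one-second resolution is not unique when the loop iterates faster than that, as the first output below shows. Naming the files by loop index, as in the second cell, avoids the collision.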

In [59]:
%%bash
root_dir="/home/dcuneo/git/PersonalDS/DataScience/command_line_pres_data"

for ind in {1..5};do 
    fname="${root_dir}/$(date +%Y_%m_%d_%H:%M:%S).test"
    echo $fname
done

/home/dcuneo/git/PersonalDS/DataScience/command_line_pres_data/2015_09_28_13:25:16.test
/home/dcuneo/git/PersonalDS/DataScience/command_line_pres_data/2015_09_28_13:25:16.test
/home/dcuneo/git/PersonalDS/DataScience/command_line_pres_data/2015_09_28_13:25:16.test
/home/dcuneo/git/PersonalDS/DataScience/command_line_pres_data/2015_09_28_13:25:16.test
/home/dcuneo/git/PersonalDS/DataScience/command_line_pres_data/2015_09_28_13:25:16.test


In [60]:
%%bash
root_dir="/home/dcuneo/git/PersonalDS/DataScience/command_line_pres_data"

for ind in {1..5};do 
    fname="${root_dir}/test_$ind.txt"
    echo $fname
done

/home/dcuneo/git/PersonalDS/DataScience/command_line_pres_data/test_1.txt
/home/dcuneo/git/PersonalDS/DataScience/command_line_pres_data/test_2.txt
/home/dcuneo/git/PersonalDS/DataScience/command_line_pres_data/test_3.txt
/home/dcuneo/git/PersonalDS/DataScience/command_line_pres_data/test_4.txt
/home/dcuneo/git/PersonalDS/DataScience/command_line_pres_data/test_5.txt
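
If you want both the timestamp and guaranteed-unique names, the two ideas can be combined. A sketch, with a temporary directory standing in for `root_dir` and a made-up one-line payload written to each file:

```shell
root_dir=$(mktemp -d)    # stand-in for the real root_dir

for ind in 1 2 3 4 5; do
    # The index keeps names unique even when all iterations share one second.
    fname="${root_dir}/$(date +%Y_%m_%d_%H:%M:%S)_${ind}.test"
    echo "run $ind" > "$fname"
done

ls "$root_dir" | wc -l
```

Five distinct files are created, whereas a timestamp-only name would have been overwritten on each pass through the loop.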
