# Manipulating Files and Directories

Graphical user interfaces (GUIs) are the usual way to interact with computers, but many repetitive tasks can be done faster from the command line.

In [3]:
# to see where we are in the filesystem
!pwd

/Users/miguel.carvalho/dev/datacamp/5_introduction_to_shell/notes


## Quick intro

- Paths starting with `/` are absolute and paths not starting with `/` are relative
- `~` refers to the home directory
- some basic commands
    - `cd`: `c`hange `d`irectory
    - `ls`: `l`i`s`t files
    - `mv`: `m`o`v`e file (also renames)
    - `mkdir`: `m`a`k`e `dir`ectory
    - `rmdir`: `r`e`m`ove `dir`ectory (only if empty)
    - `cat`: con`cat`enate files (commonly used just to print a file's contents)
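
The commands above can be tried safely in a throwaway directory. The following is a minimal sketch, assuming a POSIX-like shell with `mktemp` available; the names `projects`, `notes.txt`, and `todo.txt` are made up for illustration, and `rm` (not listed above) is used to delete the file.

```shell
# Work inside a temporary directory so nothing important is touched.
workdir=$(mktemp -d)
cd "$workdir"                      # absolute path: starts with /

mkdir projects                     # make a directory
echo "first note" > notes.txt      # create a small file to play with
mv notes.txt projects/todo.txt     # move and rename in one step

cd projects                        # relative path: resolved from the current directory
ls                                 # lists the single file todo.txt
contents=$(cat todo.txt)           # cat prints the file's contents
echo "$contents"

cd ..                              # .. is the parent directory
rm projects/todo.txt               # rmdir only works on empty directories
rmdir projects
remaining=$(ls)                    # empty: everything was cleaned up
```

Working in a temporary directory keeps the experiment away from real files, which matters because `mv` and `rm` do not ask for confirmation by default.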

## `less`

`less` displays the contents of files one page at a time. While viewing, press the spacebar to see the next page, `:n` to move to the next file, and `:q` to quit.

In [4]:
!ls

5_introduction_to_shell.ipynb


In [5]:
!less 5_introduction_to_shell.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Manipulating Files and Directories"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We use graphical displays to interact with computers (GUIs) but a lot of repetitive tasks could be achieved faster if we use the command line."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",

## `head`

`head` displays the first lines of a given file.

In [7]:
!head 5_introduction_to_shell.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Manipulating Files and Directories"
   ]
  },
  {


## Flags

Flags give finer control over what a command does. They usually consist of `-` followed by a single letter that hints at the flag's purpose (e.g. for `head`, `-n` sets the **n**umber of lines to show).

- Flags should come before filenames
- Adding a space after the flag is good style

In [8]:
!head -n 20 5_introduction_to_shell.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Manipulating Files and Directories"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We use graphical displays to interact with computers (GUIs) but a lot of repetitive tasks could be achieved faster if we use the command line."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
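
As a small self-contained sketch of the same idea, the snippet below builds a hypothetical 15-line file (`numbers.txt`, which is not part of these notes) with `seq` and compares `head` with and without `-n`. It assumes `seq` and `mktemp` are available, as they are on most Linux and macOS systems.

```shell
# Build a 15-line sample file in a temporary directory.
workdir=$(mktemp -d)
seq 1 15 > "$workdir/numbers.txt"

# Without flags, head prints the first 10 lines.
default_count=$(head "$workdir/numbers.txt" | wc -l | tr -d ' ')

# With -n 3, only the first 3 lines are printed; the flag comes before the filename.
first_three=$(head -n 3 "$workdir/numbers.txt")

echo "default line count: $default_count"
echo "$first_three"
```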


## Listing everything under a directory

The flag `-R` (recursive) makes `ls` print everything under a directory, including the contents of its subdirectories. The flag `-F` makes the output easier to read by appending `/` to directories and `*` to executable files.

In [11]:
!ls -R -F ~/dev

core/     datacamp/ stuff/

/Users/miguel.carvalho/dev/core:
README.md  packages/  scripts/   venv/

/Users/miguel.carvalho/dev/core/packages:
__init__.py  build/       dist/        mk/          mk.egg-info/ setup.py

/Users/miguel.carvalho/dev/core/packages/build:
bdist.macosx-10.14-x86_64/ lib/

/Users/miguel.carvalho/dev/core/packages/build/bdist.macosx-10.14-x86_64:

/Users/miguel.carvalho/dev/core/packages/build/lib:
mk/

/Users/miguel.carvalho/dev/core/packages/build/lib/mk:
__init__.py  bquery/      crawl/       nlp/         process/
app_radar/   constants/   gcp/         plots/       tools/

/Users/miguel.carvalho/dev/core/packages/build/lib/mk/app_radar:
__init__.py

/Users/miguel.carvalho/dev/core/packages/build/lib

README.md*              project_description.md*
finding_donors.ipynb*   visuals.py*

/Users/miguel.carvalho/dev/stuff/sandbox/miguel/udacity/nanodegree_machine_learning/term_1/projects/image-classification:
ReadMe.md*                  image_classification.ipynb*
helper.py*                  problem_unittests.py*

/Users/miguel.carvalho/dev/stuff/sandbox/miguel/udacity/nanodegree_machine_learning/term_1/projects/intro-to-tensorflow:
environment.yml*                    intro_to_tensorflow.ipynb*
environment_win.yml*                intro_to_tensorflow_solution.ipynb*
image/

/Users/miguel.carvalho/dev/stuff/sandbox/miguel/udacity/nanodegree_machine_learning/term_1/projects/intro-to-tensorflow/image:
Learn Rate Tune - Image.png* network_diagram.png*
Mean Variance - Image.png*   notmnist.png*
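
For a listing short enough to read in full, here is a minimal sketch that builds a tiny, made-up directory tree (`demo`, `sub`, `readme.txt`, `data.csv`, and `run.sh` are all hypothetical names) and lists it with the same two flags. The exact layout of the output differs slightly between BSD and GNU `ls`, so treat it as illustrative.

```shell
# Build a tiny directory tree to list.
workdir=$(mktemp -d)
mkdir -p "$workdir/demo/sub"
touch "$workdir/demo/readme.txt" "$workdir/demo/sub/data.csv"
printf '#!/bin/sh\necho hi\n' > "$workdir/demo/run.sh"
chmod +x "$workdir/demo/run.sh"    # mark run.sh as executable

# -R descends into subdirectories; -F appends / to directories and * to executables.
listing=$(ls -R -F "$workdir/demo")
echo "$listing"
```

In the output, `sub/` is marked as a directory, `run.sh*` as executable, and `readme.txt` and `data.csv` carry no suffix because they are plain files.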


## Getting help with commands

We can get help for a given command by preceding it with `man` (for manual).

In [12]:
!man head


HEAD(1)                   BSD General Commands Manual                  HEAD(1)

NNAAMMEE
     hheeaadd -- display first lines of a file

SSYYNNOOPPSSIISS
     hheeaadd [--nn _c_o_u_n_t | --cc _b_y_t_e_s] [_f_i_l_e _._._.]

DDEESSCCRRIIPPTTIIOONN
     This filter displays the first _c_o_u_n_t lines or _b_y_t_e_s of each of the speci-
     fied files, or of the standard input if no files are specified.  If _c_o_u_n_t
     is omitted it defaults to 10.

     If more than a single file is specified, each file is preceded by a
     header consisting of the string ``==> XXX <=='' where ``XXX'' is the name
     of the file.

EEXXIITT SSTTAATTUUSS
     The hheeaadd utility exits 0 on success, and >0 if an error occurs.

SSEEEE AALLSSOO
     tail(1)

HHIISSTTOORRYY
     The hheeaadd command appeared in PWB UNIX.

BSD                              June 6, 1993            

## How to interpret `man`?

- `NAME` gives a brief explanation of what the command does
- `SYNOPSIS` lists all the flags the command understands
- Optional parts are shown in `[...]`
- Either/or options are separated with `|`
- Things which can be repeated are shown by `...`
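
To make the notation concrete, here is a sketch that exercises each part of the `head` synopsis, `head [-n count | -c bytes] [file ...]`. The files `a.txt` and `b.txt` are hypothetical samples created only for this example.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf 'alpha\nbeta\ngamma\n' > a.txt
printf 'one\ntwo\nthree\n' > b.txt

# [-n count | -c bytes]: the brackets mean the flag is optional,
# and the | means we pick either -n or -c, not both.
by_lines=$(head -n 1 a.txt)        # first line of a.txt
by_bytes=$(head -c 3 a.txt)        # first 3 bytes of a.txt

# [file ...]: the ... means several files may be given.
# With more than one file, head prints a "==> name <==" header before each.
both=$(head -n 1 a.txt b.txt)

echo "$by_lines"
echo "$by_bytes"
echo "$both"
```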

## `cut`

`cut` selects columns from a file. A typical usage would be:

In [19]:
# -f selects the fields (columns) to keep and -d sets the delimiter ("," in a csv)
!cut -f 2-5,8 -d , ../../2_streamlined_data_ingestion_with_pandas/datasets/vt_tax_data_2016.csv

STATE,zipcode,agi_stub,N1,MARS4
VT,0,1,111580,10740
VT,0,2,82760,11310
VT,0,3,46270,3620
VT,0,4,30070,960
VT,0,5,39530,590
VT,0,6,9620,0
VT,05001,1,1340,140
VT,05001,2,1070,140
VT,05001,3,590,70
VT,05001,4,350,30
VT,05001,5,450,0
VT,05001,6,80,0
VT,05031,1,80,0
VT,05031,2,50,0
VT,05031,3,50,0
VT,05031,4,0,0
VT,05031,5,50,0
VT,05031,6,0,0
VT,05032,1,410,50
VT,05032,2,330,30
VT,05032,3,180,20
VT,05032,4,130,0
VT,05032,5,110,0
VT,05032,6,20,0
VT,05033,1,540,60
VT,05033,2,430,90
VT,05033,3,230,20
VT,05033,4,140,0
VT,05033,5,130,0
VT,05033,6,30,0
VT,05034,1,80,0
VT,05034,2,40,0
VT,05034,3,20,0
VT,05034,4,20,0
VT,05034,5,30,0
VT,05034,6,0,0
VT,05035,1,110,0
VT,05035,2,80,0
VT,05035,3,60,0
VT,05035,4,30,0
VT,05035,5,30,0
VT,05035,6,0,0
VT,05036,1,130,0
VT,05036,2,130,30
VT,05036,3,80,0
VT,05036,4,40,0
VT,05036,5,70,0
VT,05036,6,0,0
VT,05037,1,120,0
VT,05037,2,100,40
VT,05037,3,70,0
VT,05037,4,60,0
VT,05037,5,70,0
VT,05037,6,20,0
VT,05038,

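`cut` is a simple tool: it does not understand quoted strings, so a delimiter inside quotes is still treated as a column separator.

For instance, given a file `everyone.csv` containing the lines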

`
Name,Age
"Johel,Ranjit",28
"Sharma,Rupinder",26
`

In [None]:
!cut -f 2 -d , everyone.csv

would produce:

`Age
Ranjit"
Rupinder"`
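
The same pitfall can be reproduced end to end. The sketch below writes the sample to a temporary `everyone.csv` (the file is recreated here as an assumption, since it is not part of these notes) and shows that `cut` splits on every comma, including those inside quotes.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Recreate the sample file: names contain a comma inside the quotes.
cat > everyone.csv <<'EOF'
Name,Age
"Johel,Ranjit",28
"Sharma,Rupinder",26
EOF

# cut sees three comma-separated fields on the data rows,
# so field 2 is the second half of the name, not the age.
second=$(cut -d , -f 2 everyone.csv)
echo "$second"

# On the data rows the age ends up in field 3, but the header row
# has only two fields, so the columns no longer line up.
third=$(cut -d , -f 3 everyone.csv)
echo "$third"
```

For files with quoted fields, a CSV-aware tool is the safer choice; `cut` works best on data whose delimiter never appears inside a value.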

## Running commands multiple times

`history` prints a numbered list of recently run commands.

In [20]:
!history

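In an interactive shell, typing `!` followed by a command name (e.g. `!head`) re-runs the most recent invocation of that command, and `!` followed by a number from the `history` list (e.g. `!55`) re-runs that specific entry.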

## `grep`

`grep` takes a piece of text followed by one or more filenames and prints the lines in those files that contain the text. Common flags are:
- `-c`: print a count of the matching lines rather than the lines themselves
- `-h`: don't print the names of the files when searching multiple files
- `-l`: print the names of the files which match, not the matches themselves
- `-i`: ignore case
- `-n`: print line numbers for matching lines
- `-v`: show lines that don't match

In [26]:
# all lines with the term package
!grep -n package ../../3_software_engineering_for_data_scientists_in_python/notes/3_software_engineering_for_data_scientists_in_python.ipynb

15:    "        - packages, classes and methods (in Python)\n",
114:    "A minimal Python package consists of 2 elements: a directory and a python file. The name of the directory should be the name of the package. According to PEP8, it should be `package_name`, in a way that describes its functionality. The file in the file **must** be called `__init__.py`\n",
116:    "> As of Python 3.3, any directory can be imported as if it were a package without error even if it doesn't follow the structure"
128:    "|-- package_name\n",
142:      "Help on package package_name:\n",
145:      "    package_name\n",
158:    "import package_name\n",
161:    "help(package_name)"
168:    "We can add other files to our package, changing the tree to\n",
173:    "|-- package_name\n",
185:    "`import my_package.utils`"
202:    "import package_name.utils\n",
204:    "package_name.utils.we_need_to_talk(break_up=True)"
211:    "We can also use the package's `__init__.py` file to make our utils fun
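
The flags are easier to compare on a small input. The following sketch uses two made-up files (`a.txt` and `b.txt`, created only for this example) and applies several of the flags listed above to the same search term.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf 'Package one\nno match here\nanother package\n' > a.txt
printf 'nothing relevant\n' > b.txt

count=$(grep -c package a.txt)            # matching is case-sensitive by default
count_i=$(grep -c -i package a.txt)       # -i also matches "Package"
numbered=$(grep -n package a.txt)         # -n prefixes each match with its line number
inverted=$(grep -v -i package a.txt)      # -v keeps only the lines that do not match
files=$(grep -l -i package a.txt b.txt)   # -l prints only the names of matching files

echo "count=$count count_i=$count_i"
echo "$numbered"
echo "$inverted"
echo "$files"
```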

## Combining Tools