## Introduction to Python for Map Librarians

### Basic Data Types

- Numbers: 6, 2.305, -4
- Strings: "Hello", "Reno, Nevada", "3"
- Lists: ['a','b','c','d']

Try the following:

    >>> print (type(5))
    >>> print (type('Reno, Nevada'))
    >>> print (type(['maps','atlases',6]))
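
As a rough guide to what those calls report, here is a small sketch. The variable names are made up for illustration.

```python
# Each value belongs to a built-in type; type() reports which one.
whole_number = 6
decimal_number = 2.305
text = "Reno, Nevada"
digit_text = "3"                 # the quotes make this a string, not a number
items = ['maps', 'atlases', 6]   # a list can mix strings and numbers

print(type(whole_number))    # <class 'int'>
print(type(decimal_number))  # <class 'float'>
print(type(text))            # <class 'str'>
print(type(digit_text))      # <class 'str'>
print(type(items))           # <class 'list'>
```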


### Variables

Create a variable called __conference__ which contains a string value: __"WAML"__

Create a variable called __year__ which contains a number: __2019__

Create a variable called __days__ which contains a list: __["Wednesday","Thursday","Friday","Saturday"]__


    >>> conference = "WAML"
    >>> year = 2019
    >>> days = ["Wednesday","Thursday","Friday","Saturday"]
    >>> print (conference, year, days)
    

Assign a new value to the year variable:

    >>> year = 2018

    >>> print (conference, year, days)

### Lists

Use __append__ to add an item to a list:

    >>> days.append("Sunday")
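
A short sketch of what happens to the list, assuming __days__ holds the four days assigned earlier:

```python
days = ["Wednesday", "Thursday", "Friday", "Saturday"]

# append() changes the list in place and returns None,
# so there is no need to write days = days.append(...).
days.append("Sunday")

print(days)       # ['Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
print(len(days))  # 5
```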

### Concatenating Strings

Use + to concatenate strings:

__>>> print ("Python" + "Workshop")__

*PythonWorkshop*

__>>> print ("Python " + "Workshop")__

*Python Workshop*

__>>> print ("Python " + year)__

*TypeError*

__>>> print ("Python " + str(year))__

*Python 2018*
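
The sketch below repeats the last two steps in one place, assuming __year__ holds 2018 as reassigned earlier. The f-string at the end is an alternative that is not part of the workshop steps.

```python
year = 2018
error_raised = False

# Joining a string and a number with + raises TypeError.
try:
    message = "Python " + year
except TypeError as error:
    error_raised = True
    print("TypeError:", error)

# Converting the number to a string first works.
converted = "Python " + str(year)
print(converted)   # Python 2018

# An f-string (Python 3.6 and later) inserts the value without calling str().
formatted = f"Python {year}"
print(formatted)   # Python 2018
```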


### Indexing

Print the first letter of the variable __conference__:

    >>> print (conference[0])

Print the fourth letter of the word Geospatial:

    >>> print ("Geospatial"[3])

Print the third item from the list __days__:

    >>> print (days[2])

Print the last letter of the word Geospatial:

    >>> print ("Geospatial"[-1])

Print the second-to-last item from the list __days__:

    >>> print (days[-2])

Requesting an index beyond the end of the list raises an __IndexError__:

    >>> print (days[14])
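
A minimal sketch of positive and negative indexes, and of catching the error instead of letting it stop the program. It assumes __days__ already includes "Sunday" from the append step.

```python
days = ["Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

first_day = days[0]    # indexes start at 0
last_day = days[-1]    # negative indexes count from the end
print(first_day)       # Wednesday
print(last_day)        # Sunday

# Asking for a position past the end raises IndexError.
out_of_range = False
try:
    print(days[14])
except IndexError as error:
    out_of_range = True
    print("IndexError:", error)   # IndexError: list index out of range
```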

### Slicing

Access a subset of a string or list using slice notation. A slice runs from the start index up to, but not including, the end index:

    >>> print (conference[0:3])

    >>> print (days[1:4])

    >>> print ("Geospatial"[3:])
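
For comparison, this sketch stores each slice in a variable and shows the value it holds. It assumes __days__ already includes "Sunday".

```python
conference = "WAML"
days = ["Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

first_three = conference[0:3]   # characters at positions 0, 1 and 2
middle_days = days[1:4]         # items at positions 1, 2 and 3
tail = "Geospatial"[3:]         # from position 3 to the end

print(first_three)   # WAM
print(middle_days)   # ['Thursday', 'Friday', 'Saturday']
print(tail)          # spatial
```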

### Splitting Strings

Create a string variable called __place__:

    >>> place = "El Dorado County, California"

Split text using a given delimiter:

    >>> print (place)

    >>> print (place.split(","))

    >>> print (place.split(" C"))

    >>> print (place.split("o", 2))

    >>> print (place.split(",")[0])
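
The sketch below collects the results of each split so they can be compared side by side. Note that the delimiter itself is removed and any surrounding spaces are kept.

```python
place = "El Dorado County, California"

by_comma = place.split(",")        # split at every comma
by_space_c = place.split(" C")     # the delimiter can be longer than one character
by_o_twice = place.split("o", 2)   # the second argument limits the number of splits
county = place.split(",")[0]       # index into the resulting list

print(by_comma)     # ['El Dorado County', ' California']
print(by_space_c)   # ['El Dorado', 'ounty,', 'alifornia']
print(by_o_twice)   # ['El D', 'rad', ' County, California']
print(county)       # El Dorado County
```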

### Loops

Repeat actions multiple times using for loops.

    >>> print (days)
    
    >>> for day in days:
    ...     print (day)
    ...
    >>> print ("Loop finished running!")
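
A loop can also build up a result as it goes. This sketch, with a made-up __lengths__ list, records the length of each day name. Only the indented lines repeat.

```python
days = ["Wednesday", "Thursday", "Friday", "Saturday"]
lengths = []

for day in days:
    # These indented lines run once for every item in the list.
    lengths.append(len(day))
    print(day, len(day))

# This line is not indented, so it runs once, after the loop ends.
print("Loop finished running!")
print(lengths)   # [9, 8, 6, 8]
```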

#### The OS Module

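The __os__ module provides access to operating system functionality, such as finding the current working directory and listing the files in a folder.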

    import os
    print (os.getcwd())
    print (os.listdir())

#### Help Documentation

Use __help( )__ to view documentation for a module or function:

    >>> help(os)

    >>> help(os.getcwd)

Now use __os__ to move up one level in your directory tree to the home folder. "../" refers to the parent of the current folder.

    >>> os.chdir('../')
    >>> print (os.getcwd())
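
To check where a move would land before changing directories, paths can be inspected with __os.path__. This is a sketch only: it does not change the working directory, and the file may not exist on your machine.

```python
import os

current = os.getcwd()

# os.path.dirname() returns the folder that contains the given path,
# which is where os.chdir('../') would take you.
parent = os.path.dirname(current)
print(current)
print(parent)

# os.path.join() builds a path with the correct separator for your system.
data_path = os.path.join(current, "workshopdata", "California_Tahoe_Counties_raw.csv")
print(os.path.exists(data_path))   # True only if the file is really there
```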

#### Pandas

Import the Pandas library using a common shortcut:

    >>> import pandas as pd

#### Reading a File into a Dataframe

Use __read_csv( )__ to import a file into a dataframe:

    >>> pd.read_csv('workshopdata/California_Tahoe_Counties_raw.csv')

Create a variable for a dataframe called __California__ and read a csv file into it.

    >>> California = pd.read_csv("workshopdata/California_Tahoe_Counties_raw.csv", dtype={"FIPS":str})

    >>> California
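
The __dtype__ argument matters because FIPS codes can start with a zero. The sketch below uses a tiny made-up CSV held in memory (not the workshop data) to show the difference.

```python
import io
import pandas as pd

# Made-up sample rows standing in for a real file.
csv_text = "FIPS,Name of Area\n06001,Example County A\n06003,Example County B\n"

# Without dtype, pandas reads the codes as integers and drops the leading zero.
as_numbers = pd.read_csv(io.StringIO(csv_text))

# With dtype={"FIPS": str}, the codes are kept exactly as written.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"FIPS": str})

print(as_numbers["FIPS"].tolist())   # [6001, 6003]
print(as_text["FIPS"].tolist())      # ['06001', '06003']
```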

#### Drop the first row

Use __drop__ to remove the first row:

    >>> California = California.drop([0])

    >>> California

Use __shape__ to view the dimensions (rows, columns) of the dataframe

    >>> California.shape

Use __rename__ to rename a column

    >>> California = California.rename(columns = {'FIPS':'GEOID'})

#### Setting a Column Index

Use __set_index__ to make a column the index of the dataframe:

    >>> California.set_index("GEOID")
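
Note that __set_index__ returns a new dataframe rather than changing the original. A minimal sketch with made-up rows:

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "GEOID": ["06001", "06003"],
    "Name of Area": ["Example County A", "Example County B"],
})

# The result is displayed or discarded; df itself is unchanged.
df.set_index("GEOID")
print(df.index.name)        # None

# Assign the result to keep the new index.
indexed = df.set_index("GEOID")
print(indexed.index.name)   # GEOID
```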

### Finding and Selecting Data

#### Select by Column

    >>> California["GEOID"]

#### Select Multiple Columns

    >>> California[['GEOID','Name of Area']]

#### Select a single row

Use __loc__ to select one or more rows

    >>> California.loc[6]

#### Select a Subset of Rows

    >>> California.loc[0:5]
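
Keep in mind that __loc__ selects by index label, not by position, and that dropping a row does not renumber the labels. The sketch below uses made-up rows; __iloc__, which selects by position, is shown only for contrast.

```python
import pandas as pd

# Made-up rows; the first one stands in for an unwanted header row.
df = pd.DataFrame({"Name of Area": ["header row", "A", "B", "C"]})
df = df.drop([0])

print(df.index.tolist())   # [1, 2, 3]

first_by_label = df.loc[1, "Name of Area"]        # label 1
first_by_position = df.iloc[0]["Name of Area"]    # position 0
print(first_by_label, first_by_position)          # A A

# A label slice with loc includes both ends.
subset = df.loc[1:2]
print(len(subset))   # 2
```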

### Create a new dataframe

Create a variable called __Cal_Cos__ that holds the __California__ dataframe with its index set to __GEOID__:

    >>> Cal_Cos = California.set_index("GEOID")

    >>> Cal_Cos

#### Adding a New Column

Add a new column called __Year__ and assign a value of __2017__

    >>> Cal_Cos["Year"] = 2017

    >>> Cal_Cos
    
The new column is added to the end of the dataframe.

Create a column for the County by extracting the county name from the "Name of Area" column:

    >>> County = Cal_Cos["Name of Area"].str.split(',', n = 2, expand=True)

    >>> Cal_Cos["County Name"] = County[1]

    >>> Cal_Cos
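
With __expand=True__, __str.split__ returns a dataframe with one numbered column per piece. Which column holds the county name depends on how the text is laid out, so it is worth inspecting the pieces first. A sketch with made-up values:

```python
import pandas as pd

# Made-up values for illustration only.
areas = pd.Series(["Example County, California", "Sample County, California"])

parts = areas.str.split(",", n=2, expand=True)
print(parts[0].tolist())   # ['Example County', 'Sample County']
print(parts[1].tolist())   # [' California', ' California']

# str.strip() removes the space left behind after the comma.
cleaned = parts[1].str.strip()
print(cleaned.tolist())    # ['California', 'California']
```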

#### Exporting the Dataframe

Use __to_csv( )__ to write the dataframe to a *.csv* file:

    >>> Cal_Cos.to_csv("California_Counties.csv")

Repeat the same steps for "Nevada_Tahoe_Counties_raw.csv":

    >>> Nevada = pd.read_csv('workshopdata/Nevada_Tahoe_Counties_raw.csv', dtype={"FIPS":str})

    >>> Nevada = Nevada.drop([0])
    
    >>> Nevada = Nevada.rename(columns={'FIPS':'GEOID'})

    >>> Nev_Cos = Nevada.set_index('GEOID')

    >>> Nev_Cos["Year"] = 2017

    >>> County = Nev_Cos["Name of Area"].str.split(',', n = 2, expand=True)

    >>> Nev_Cos["County Name"] = County[1]

    >>> Nev_Cos.to_csv('Nevada_Counties.csv')

    >>> Nev_Cos = pd.read_csv('Nevada_Counties.csv', dtype={"GEOID":str}).set_index('GEOID')

### Repeating Actions with Loops


#### Finding Files

The __glob__ module finds all files in a directory matching a particular pattern.

Create a variable called __raw_files__ and use glob to search the __workshopdata__ folder for all files ending in *raw.csv*

    >>> import glob

    >>> raw_files = glob.glob("workshopdata/*raw.csv")

    >>> print (raw_files)
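
To see how the pattern behaves without touching the workshop data, this sketch creates three empty files in a temporary folder and searches it. The folder and the *notes.txt* file are made up for the example.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Create empty placeholder files.
    for name in ["California_Tahoe_Counties_raw.csv",
                 "Nevada_Tahoe_Counties_raw.csv",
                 "notes.txt"]:
        open(os.path.join(folder, name), "w").close()

    # * matches any run of characters, so this finds names ending in raw.csv.
    pattern = os.path.join(folder, "*raw.csv")
    matches = sorted(glob.glob(pattern))   # glob does not guarantee an order
    names = [os.path.basename(match) for match in matches]

print(names)   # ['California_Tahoe_Counties_raw.csv', 'Nevada_Tahoe_Counties_raw.csv']
```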

Loop over the list of files and read each csv into a dataframe called __data__.

Perform the following actions on each dataframe object:

- Drop the first row
- Rename FIPS to GEOID
- Set the index to "GEOID"
- Add a column for year with the value 2017
- Extract the county name and create a new column with that value
- Create a new filename from the part of the file name before the first underscore + "_Counties"
- Write the csv data to the home directory


    for f in raw_files:
        print(f)
        # Read FIPS as text so leading zeros are preserved
        data = pd.read_csv(f, dtype={"FIPS":str})
        data = data.drop([0])
        data = data.rename(columns={'FIPS':'GEOID'})
        data = data.set_index("GEOID")
        data["Year"] = 2017
        County = data["Name of Area"].str.split(',', n = 2, expand=True)
        data["County Name"] = County[1]
        # Build the output name from the file name alone, ignoring any folder
        fileName = os.path.basename(f).split('_')[0] + "_Counties"
        data.to_csv(fileName + '.csv')
    print("\n" + "Done")
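
The filename step can be tried on its own before running the whole loop. In this sketch, __output_name__ is a hypothetical helper written for the example, not part of pandas or the workshop files.

```python
import os

def output_name(path):
    """Return '<text before the first underscore>_Counties.csv' for a raw file path."""
    base = os.path.basename(path)    # drop any folder from the path
    state = base.split("_")[0]       # text before the first underscore
    return state + "_Counties.csv"

print(output_name("workshopdata/California_Tahoe_Counties_raw.csv"))   # California_Counties.csv
print(output_name("Nevada_Tahoe_Counties_raw.csv"))                    # Nevada_Counties.csv
```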

### Concatenate Data

Use __concat( )__ to concatenate dataframes:


    >>> pd.concat([Cal_Cos, Nev_Cos])

Create a variable called __All_Cos__ and concatenate __Cal_Cos__ and __Nev_Cos__.   

Then write __All_Cos__ to a csv file called "All_Counties.csv"
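
As a hint, here is how __concat__ behaves on two tiny made-up dataframes that share the same columns and index name. The names __cal__, __nev__ and __combined__ are placeholders.

```python
import pandas as pd

# Made-up one-row dataframes for illustration only.
cal = pd.DataFrame(
    {"County Name": ["Example County A"], "Year": [2017]},
    index=pd.Index(["06001"], name="GEOID"),
)
nev = pd.DataFrame(
    {"County Name": ["Example County B"], "Year": [2017]},
    index=pd.Index(["32001"], name="GEOID"),
)

# Rows from the second dataframe are stacked below the first.
combined = pd.concat([cal, nev])

print(combined.shape)            # (2, 2)
print(combined.index.tolist())   # ['06001', '32001']
```

Writing the result to a file then works the same way as the earlier __to_csv( )__ step.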

__Tip__

In IPython or a Jupyter notebook, the command *%whos* lists all variables and modules currently defined in the session.