# Python Introduction

Python is an extremely versatile, easy-to-read scripting language that is particularly popular in data science. Whereas *R* is an excellent language for working interactively with data and generating visualizations, Python excels at more involved data processing, such as building reusable processing pipelines.

This very brief introduction to Python will go through the basics of the language. You can highlight and run cells to see what output they generate. You can also edit the code and re-run it if you'd like to experiment yourself. 

## Working with Variables

Python is a **dynamically typed** language. It does distinguish between strings and numbers, but unlike statically typed languages such as Java or C#, you don't have to *declare* a variable's type. The Python interpreter works it out for you at runtime from the value you assign.
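A quick way to see dynamic typing at work is the built-in `type()` function, which reports the type Python inferred for a value. The variable names below are only illustrations. The point is that the same name can later be re-bound to a value of a different type without any declaration.

```python
# type() reports the type Python has inferred for the value a name refers to
name = 'Michael'
age = 79
print(type(name))  # <class 'str'>
print(type(age))   # <class 'int'>

# No declaration needed: the same name can be re-bound to a value of another type
age = 'seventy-nine'
print(type(age))   # <class 'str'>
```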

```python
first_name = 'Michael'
initial = 'D.'
last_name = 'Higgins'

# adding strings together joins them into one single string
print(first_name + " " + initial + " " + last_name)

# This is a bit of a questionable way of calculating age in months, but now or never...
age_in_years = 79
months_in_year = 12
age_in_months = age_in_years * months_in_year
print('age in months: ')
print(age_in_months)
```

```
Michael D. Higgins
age in months: 
948
```


```python
population_republic_ireland = 4900000  # or thereabouts
square_miles_republic_ireland = 27133

population_density = population_republic_ireland / square_miles_republic_ireland

print(population_density)

# If we want to add a number to a string we need to convert it using str()
print("The population density of Rep. Ireland is roughly " + str(round(population_density)) + " per square mile")
```

```
180.5918991633804
The population density of Rep. Ireland is roughly 181 per square mile
```
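Two details from the cell above are worth spelling out. In Python 3 the `/` operator always produces a float, while `//` performs floor division and keeps whole numbers whole. Joining a number straight onto a string raises a `TypeError`, which is why `str()` is needed. An f-string (Python 3.6+) is a common alternative that does the conversion for you. This is a small sketch reusing the same rough figures:

```python
population = 4900000
area_sq_miles = 27133

density = population / area_sq_miles         # true division: always a float
whole_density = population // area_sq_miles   # floor division: rounds down to an int
print(whole_density)  # 180

# Adding an int to a str is not allowed...
try:
    print("Density: " + whole_density)
except TypeError as err:
    print("TypeError:", err)

# ...so convert explicitly with str(), or let an f-string format the number
print("Density: " + str(round(density)) + " per square mile")
print(f"Density: {density:.0f} per square mile")  # .0f = no decimal places
```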


## Flow Control

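Unlike many programming languages, Python doesn't use curly braces `{ }` to mark out the body of an `if` statement (or a loop). Instead it uses indentation. The `if` line ends with a colon, and everything inside the code block is indented, by convention with 4 spaces as recommended by PEP 8. Tabs also work, but mixing tabs and spaces inconsistently can raise a `TabError` in Python 3, so stick to one. This is one of my least favourite things about Python; it can bite you if you're not careful!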
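To make the rule concrete, here is a minimal sketch (the `temperature` variable is just an example). The indented lines belong to the `if` block. The first line that returns to the outer indentation level is outside it and always runs. The second half uses the built-in `compile()` to show that forgetting to indent a block is a syntax error, not something Python silently guesses about.

```python
temperature = 25
messages = []

if temperature > 30:
    messages.append("It's hot")     # indented: only runs when the condition is True
    messages.append("Drink water")  # same indentation, so still inside the if-block
messages.append("Done checking")    # back at the outer level: always runs

print(messages)  # ['Done checking']

# A block with no indentation at all is rejected before the code ever runs
bad_source = "if True:\nprint('oops')\n"
try:
    compile(bad_source, "<example>", "exec")
except IndentationError as err:
    print("IndentationError:", err.msg)
```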

```python
what_for = 'love'
thing = 'anything'

if what_for == 'love':
    if thing == 'anything':
        print("I'll do it")
    elif thing == 'that':
        print("I won't do it")
    else:
        print("I might do it")
else:
    print("I might do it (but not for love)")
```

```
I'll do it
```
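Nested `if` statements like the one above can often be flattened by combining conditions with `and`. The sketch below wraps the same logic in a function (defined with `def`; the name `will_i_do_it` is made up for this example) so it can be tried with different inputs.

```python
def will_i_do_it(what_for, thing):
    # 'and' requires both conditions to be True, replacing the nested if
    if what_for == 'love' and thing == 'anything':
        return "I'll do it"
    elif what_for == 'love' and thing == 'that':
        return "I won't do it"
    elif what_for == 'love':
        return "I might do it"
    else:
        return "I might do it (but not for love)"

print(will_i_do_it('love', 'anything'))  # I'll do it
print(will_i_do_it('money', 'that'))     # I might do it (but not for love)
```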


```python
count = 1

print("Everybody get up, singing...")

while count < 6:
    print(count)
    count = count + 1

print("Will make you get down now")
```

```
Everybody get up, singing...
1
2
3
4
5
Will make you get down now
```
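When you know in advance how many times to loop, a `for` loop over `range()` is the more idiomatic way to count. `range(1, 6)` produces the numbers 1 to 5, because the end value is excluded. The `+=` operator is also a common shorthand for the counter update in a `while` loop. This is a minimal sketch; the list `counted` only collects the values so they can be printed together.

```python
counted = []
for number in range(1, 6):  # 1, 2, 3, 4, 5 (the end value 6 is excluded)
    counted.append(number)
print(counted)  # [1, 2, 3, 4, 5]

# The same counter in a while loop, using the += shorthand
count = 1
while count < 6:
    count += 1  # same as count = count + 1
print(count)  # 6
```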


```python
frequencies = ['bass', 'mid', 'treble']

for frequency in frequencies:
    if frequency == 'bass':
        print('all about that ' + frequency)
        print("'bout that " + frequency)
    elif frequency == 'treble':
        print('no ' + frequency)
```

```
all about that bass
'bout that bass
no treble
```


## Lists

Python lists are similar to arrays in Java or C#, except that they grow and shrink as you add and remove items (which makes them closer to Java's `ArrayList` or C#'s `List<T>`). A list holds an ordered collection of values. The values can be of different types, though there's not usually a good reason to mix them. We write lists in Python using square brackets.

We can work our way through a list one item at a time using Python's **for** loop.

```python
things_rick_wont_do = ['give you up', 'let you down', 'run around and desert you', 'make you cry', 'say goodbye', 'tell a lie and hurt you']

for thing in things_rick_wont_do:
    print('never gonna ' + thing)
```

```
never gonna give you up
never gonna let you down
never gonna run around and desert you
never gonna make you cry
never gonna say goodbye
never gonna tell a lie and hurt you
```
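Unlike a fixed-size array, a Python list can grow after it is created. The sketch below uses made-up example values together with the list method `append()`, the built-in `len()`, and `enumerate()`, which hands the `for` loop each item's index along with the item itself. It also shows that a single list may mix types.

```python
playlist = ['bass', 'treble']
playlist.append('mid')  # append() adds an item to the end of the list
print(len(playlist))    # 3

# A list can mix types, though it's rarely a good idea
mixed = ['Michael', 79, 180.59, True]
for item in mixed:
    print(type(item).__name__)  # str, int, float, bool (one per line)

# enumerate() yields (index, item) pairs, counting from 0
for index, song in enumerate(playlist):
    print(index, song)  # 0 bass / 1 treble / 2 mid
```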


### Indexing
We can also access individual list items using their **index**. The index is the position of the item in the list (starting from 0).

```python
first_thing_rick_wont_do = things_rick_wont_do[0]

print("the first thing Rick won't do is... " + first_thing_rick_wont_do)

third_thing_rick_wont_do = things_rick_wont_do[2]

print("the third thing Rick won't do is... " + third_thing_rick_wont_do)
```

```
the first thing Rick won't do is... give you up
the third thing Rick won't do is... run around and desert you
```
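Because indexes start at 0, a list of three items has valid indexes 0, 1 and 2, and asking for anything past the end raises an `IndexError`. Going the other way, the list method `index()` returns the position of a value. This is a short sketch with a hypothetical three-item list:

```python
lines = ['give you up', 'let you down', 'run around and desert you']

print(lines[2])  # run around and desert you

try:
    lines[3]  # one past the end: valid indexes are 0, 1 and 2
except IndexError as err:
    print("IndexError:", err)  # IndexError: list index out of range

# index() finds the position of a value (and raises ValueError if it's absent)
print(lines.index('let you down'))  # 1
```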


If we use a negative number as the index then Python will count back from the end of the list. The last item in the list is -1, the second-last is -2 etc.

```python
last_thing_rick_wont_do = things_rick_wont_do[-1]

print("the last thing Rick won't do is... " + last_thing_rick_wont_do)
```

```
the last thing Rick won't do is... tell a lie and hurt you
```
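A negative index is shorthand for counting back from the length of the list: `items[-k]` refers to the same element as `items[len(items) - k]`. The loop below (with a made-up four-item list) checks that for every valid `k`.

```python
items = ['a', 'b', 'c', 'd']

# Check that items[-k] and items[len(items) - k] agree for k = 1 .. len(items)
for k in range(1, len(items) + 1):
    assert items[-k] == items[len(items) - k]

print(items[-1], items[-2])  # d c
```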


### List Slices
We can grab a whole *chunk* of a list using list slices. The colon syntax `my_list[start:end]` lets us take everything between a start and an end index (the start index is **inclusive** while the end index is **exclusive**).

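```python
# A notebook only displays the value of the last expression in a cell,
# so print() each slice to see all three results
print(things_rick_wont_do[0:3])   # items 0 - 2
print(things_rick_wont_do[0:-1])  # everything but the last

print(things_rick_wont_do[3:])    # If we leave the second index blank it takes everything up to the end of the list
```

```
['give you up', 'let you down', 'run around and desert you']
['give you up', 'let you down', 'run around and desert you', 'make you cry', 'say goodbye']
['make you cry', 'say goodbye', 'tell a lie and hurt you']
```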
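Slices accept an optional third value, the *step*, and either end may be left blank. Unlike plain indexing, slicing past the end of a list does not raise an error; the slice is simply cut short. A full slice `[:]` is a common way to take a (shallow) copy. This is a sketch using a list of numbers as a stand-in:

```python
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

print(numbers[:3])     # [0, 1, 2]  blank start = from the beginning
print(numbers[::2])    # [0, 2, 4, 6, 8]  every second item
print(numbers[::-1])   # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]  a negative step reverses
print(numbers[8:100])  # [8, 9]  out-of-range ends are clamped, no IndexError

# A full slice creates a new list, so changing the copy leaves the original alone
copy_of_numbers = numbers[:]
copy_of_numbers.append(10)
print(len(numbers), len(copy_of_numbers))  # 10 11
```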

### Testing for Inclusion

Often we'll want to check whether a list contains a value. In some languages we'd write a loop to do this ourselves, but Python makes it easy: the keyword **in** checks whether a value is in a list.


```python
rights = ["life", "liberty", "security of person"]

if "party" in rights:
    print("nothing to do")
else:
    print("fight!")
```

```
fight!
```
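Here are a few more things `in` can do: `not in` tests for absence, comparisons are exact (so case matters), and on a string `in` looks for a substring, whereas on a list it only matches whole items.

```python
rights = ["life", "liberty", "security of person"]

print("liberty" in rights)      # True
print("Liberty" in rights)      # False: string comparison is case-sensitive
print("party" not in rights)    # True: 'not in' is the negation of 'in'
print("lib" in rights)          # False: a list only matches whole items
print("lib" in "liberty")       # True: on a string, 'in' checks for a substring
```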


## Python Packages

One of Python's biggest advantages in the area of data science is the huge number of high-quality user-contributed packages. A package gives you extra functionality, and will make your life as a data scientist so much easier.

If you want to use a package you'll first need to install it using **pip**, Python's package manager (the name is a recursive acronym for *Pip Installs Packages*). Packages are normally installed from a terminal rather than from inside a Python script. Open a terminal and run the command:

```bash
pip install <package_name>
```

For example, to install the *pandas* package for working with data-frames in Python you would run

```bash
pip install pandas
```
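If you're not sure whether a package is already installed, you can ask Python from inside a script or notebook without installing anything. This sketch uses `importlib.util.find_spec()` from the standard library, which returns `None` when no importable module of that name can be found. The helper name `is_installed` is our own. Note that it checks the *import* name, which is not always the same as the name you give pip.

```python
import importlib.util

def is_installed(module_name):
    """Return True if a top-level module with this name can be imported."""
    # find_spec() looks the module up without importing it; None means not found
    return importlib.util.find_spec(module_name) is not None

print(is_installed("json"))    # True: json is part of the standard library
print(is_installed("pandas"))  # True only once pandas has been installed
```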

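We're using Jupyter Notebook, which supports *magic commands* that start with the `%` character. The `%pip` magic runs pip from inside the notebook and installs the package into the environment used by the current kernel. (To run an arbitrary terminal command from a notebook cell, you prefix it with `!` instead.) The following cell will install the pandas package for you. Make sure you check the output of the command; you may need to restart the kernel before you can use the package (see the **Kernel** menu item above).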

```python
%pip install pandas
```

```
Note: you may need to restart the kernel to use updated packages.
```


Now that we've installed the package on our computer, we need to **import** it to make it available to our script. Importing a package gives us access to the functions it provides, which we call through the module name:

```
<module_name>.<function_name>()
```

For example, to call the pandas **read_csv()** function we would use


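```python
import pandas

# read_csv() needs at least the path of the file to load; "my_data.csv" is a placeholder name
df = pandas.read_csv("my_data.csv")
```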
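The same `<module_name>.<function_name>()` pattern applies to modules in Python's standard library, which ship with Python and need no `pip install`. Here it is with the `math` module. The `from ... import ...` form at the end is an alternative that brings a single function into your script so it can be called without the module prefix.

```python
import math

# <module_name>.<function_name>() with the standard-library math module
print(math.sqrt(16))       # 4.0
print(math.floor(180.59))  # 180

# 'from ... import ...' makes the function available without the prefix
from math import sqrt
print(sqrt(25))            # 5.0
```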

Data scientists use the pandas package a lot, and why type 6 characters when you can type 2? By convention we import pandas using an **alias**: a short name we can use whenever we reference the package in our script.

```python
import pandas as pd

print(pd.__version__)
```

```
0.25.1
```
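Aliases aren't specific to pandas; any module can be imported under a short name. Here is a minimal sketch with the standard-library `statistics` module. The alias `stats` is our own choice, not a widespread convention the way `pd` is for pandas. After `import ... as ...`, only the alias is bound, not the original module name.

```python
import statistics as stats  # only the name 'stats' is bound in this script

ages = [79, 61, 45]
print(stats.median(ages))        # 61
print(stats.mode([1, 2, 2, 3]))  # 2
```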
