## The Basics

At the core of Python (and any programming language) there are some key characteristics of how a program is structured that enable the proper execution of that program. These characteristics include the structure of the code itself, the core data types from which others are built, and core operators that modify objects or create new ones. From these raw materials more complex commands, functions, and modules are built.
For guidance on recommended Python structure refer to the [Python Style Guide](https://www.python.org/dev/peps/pep-0008).

# Examples: Variables and Data Types

## The Interpreter

In [1]:
# The interpreter can be used as a calculator, and can also echo or concatenate strings.

3 + 3

6

In [2]:
3 * 3

9

In [3]:
3 ** 3

27

In [4]:
3 / 2 # true division - output is a floating point number

1.5

In [5]:
# Use quotes around strings

'dogs'

'dogs'

In [6]:
# + operator can be used to concatenate strings

'dogs' + "cats"

'dogscats'

In [7]:
print('Hello World!')

Hello World!


### Try It Yourself

Go to the section _4.4. Numeric Types_ in the Python 3 documentation at <https://docs.python.org/3.4/library/stdtypes.html>. The table in that section describes different operators - try some!

What is the difference between the different division operators (`/`, `//`, and `%`)?

## Variables

Variables allow us to store values for later use. 

In [8]:
a = 5
b = 10
a + b

15

Variables can be reassigned:

In [9]:
b = 38764289.1097
a + b

38764294.1097

The ability to reassign variable values becomes important when iterating through groups of objects for batch processing or other purposes. In the example below, the value of `b` is dynamically updated every time the `while` loop is executed:

In [10]:
a = 5
b = 10
while b > a:
    print("b="+str(b))
    b = b-1

b=10
b=9
b=8
b=7
b=6


Variable data types can be inferred, so Python does not require us to declare the data type of a variable on assignment.

In [11]:
a = 5
type(a)

int

is equivalent to

In [12]:
a = int(5)
type(a)

int

In [13]:
c = 'dogs'
print(type(c))

c = str('dogs')
print(type(c))

<class 'str'>
<class 'str'>


There are cases when we may want to declare the data type, for example to assign a different data type from the default that will be inferred. Concatenating strings provides a good example. 

In [14]:
customer = 'Carol'
pizzas = 2
print(customer + ' ordered ' + pizzas + ' pizzas.')

TypeError: can only concatenate str (not "int") to str

Above, Python has inferred the type of the variable `pizzas` to be an integer. Since strings can only be concatenated with other strings, our `print()` call generates an error. There are two ways we can resolve the error:

1. Declare the `pizzas` variable as type string (`str`) on assignment or
2. Re-cast the `pizzas` variable as a string within the `print` statement.

In [15]:
customer = 'Carol'
pizzas = str(2)
print(customer + ' ordered ' + pizzas + ' pizzas.')

Carol ordered 2 pizzas.


In [16]:
customer = 'Carol'
pizzas = 2
print(customer + ' ordered ' + str(pizzas) + ' pizzas.')

Carol ordered 2 pizzas.
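
A third option, not used in the cells above, is a formatted string literal (f-string), available in Python 3.6 and later. The sketch below reuses the `customer` and `pizzas` variables; the `message` variable is introduced only for illustration.

```python
customer = 'Carol'
pizzas = 2  # still an integer; no explicit str() call is needed

# Expressions inside {} are converted to text when the string is built
message = f'{customer} ordered {pizzas} pizzas.'
print(message)
```

Because the conversion happens inside the string, `pizzas` stays an integer and can still be used in arithmetic afterwards.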


Given the following variable assignments:

```
x = 12
y = str(14)
z = 'donuts'
```

Predict the output of the following:

1. `y + z`
2. `x + y`
3. `x + int(y)`
4. `str(x) + y`

Check your answers in the interpreter.

### Variable Naming Rules

Variable names are case sensitive and:

1. Can only consist of one "word" (no spaces).
2. Must begin with a letter or underscore character ('\_').
3. Can only use letters, numbers, and the underscore character.

We further recommend using variable names that are meaningful within the context of the script and the research.
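
One way to test a candidate name against these rules is the string method `isidentifier()`. The sketch below also uses the standard library `keyword` module to reject reserved words such as `class`, a restriction not listed above. The candidate names are hypothetical and chosen only to illustrate each rule.

```python
import keyword

# Hypothetical candidate names, chosen only to illustrate the rules
candidates = ['egg_count', '_tmp', '2nd_sample', 'egg count', 'egg-count', 'class']

results = {}
for name in candidates:
    # str.isidentifier() checks the character rules; keyword.iskeyword()
    # catches reserved words that Python will not accept as variable names
    results[name] = name.isidentifier() and not keyword.iskeyword(name)

for name, is_valid in results.items():
    print(name, is_valid)
```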


## Reading Files

We can accomplish a lot by assigning variables within our code as demonstrated above, but often we are interested in working with objects and data that exist in other files and directories on our system.

When we want to read data files into a script, we do so by assigning the content of the file to a variable. This stores the data in memory and lets us perform processes and analyses on the data without changing the content of the source file.

There are several ways to read files in Python - many libraries have methods for reading text, Excel and Word documents, PDFs, etc. This morning we're going to demonstrate using the `read()` and `readlines()` methods in the standard library, and the Pandas `read_csv()` function.

In [23]:
# Read unstructured text

# One way is to open the whole file as a block
file_path = "./beowulf" # We can save the path to the file as a variable
file_in = open(file_path, "r") # Options are 'r', 'w', and 'a' (read, write, append)
beowulf_a = file_in.read()
file_in.close()
print(beowulf_a)

BEOWULF.

I.

THE LIFE AND DEATH OF SCYLD.


{The famous race of Spear-Danes.}

          Lo! the Spear-Danes' glory through splendid achievements
          The folk-kings' former fame we have heard of,
          How princes displayed then their prowess-in-battle.

{Scyld, their mighty king, in honor of whom they are often called
Scyldings. He is the great-grandfather of Hrothgar, so prominent in the
poem.}

          Oft Scyld the Scefing from scathers in numbers
        5 From many a people their mead-benches tore.
          Since first he found him friendless and wretched,
          The earl had had terror: comfort he got for it,
          Waxed 'neath the welkin, world-honor gained,
          Till all his neighbors o'er sea were compelled to
       10 Bow to his bidding and bring him their tribute:
          An excellent atheling! After was borne him

{A son is born to him, who receives the name of Beowulf--a name afterwards
made so famous by the hero of the poem.}

          A 
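
If stray characters such as `ï»¿` show up at the start of text read from a file, the file most likely begins with a UTF-8 byte-order mark (BOM) that was decoded with a different encoding. When `open()` is called without an `encoding` argument, the default depends on the platform. The sketch below writes a small stand-in file (the file name and content are made up for this demonstration) and reads it back two ways.

```python
import os
import tempfile

# Write a small stand-in file that starts with a UTF-8 byte-order mark (BOM).
# The file name and its content are made up for this demonstration.
demo_path = os.path.join(tempfile.mkdtemp(), "bom_demo.txt")
with open(demo_path, "wb") as f:
    f.write(b"\xef\xbb\xbfBEOWULF.\n")

# Decoding the three BOM bytes as Windows-1252 produces three stray characters
with open(demo_path, "r", encoding="cp1252") as f:
    as_cp1252 = f.read()

# 'utf-8-sig' decodes UTF-8 and drops a leading BOM if one is present
with open(demo_path, "r", encoding="utf-8-sig") as f:
    as_utf8_sig = f.read()

print(repr(as_cp1252))
print(repr(as_utf8_sig))
```

Stating the encoding explicitly also makes a script behave the same way on different operating systems.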

In [24]:
# Another way is to read the file as a list of individual lines

with open(file_path, "r") as b:
    beowulf_b = b.readlines()

print(beowulf_b)

['BEOWULF.\n', '\n', 'I.\n', '\n', 'THE LIFE AND DEATH OF SCYLD.\n', '\n', '\n', '{The famous race of Spear-Danes.}\n', '\n', "          Lo! the Spear-Danes' glory through splendid achievements\n", "          The folk-kings' former fame we have heard of,\n", '          How princes displayed then their prowess-in-battle.\n', '\n', '{Scyld, their mighty king, in honor of whom they are often called\n', 'Scyldings. He is the great-grandfather of Hrothgar, so prominent in the\n', 'poem.}\n', '\n', '          Oft Scyld the Scefing from scathers in numbers\n', '        5 From many a people their mead-benches tore.\n', '          Since first he found him friendless and wretched,\n', '          The earl had had terror: comfort he got for it,\n', "          Waxed 'neath the welkin, world-honor gained,\n", "          Till all his neighbors o'er sea were compelled to\n", '       10 Bow to his bidding and bring him their tribute:\n', '          An excellent atheling! After was borne him\n', '\n'

In [25]:
# In order to get a similar printout to the first method, we use a for loop
# to print line by line - more on for loops below!

for line in beowulf_b:
    print(line)

BEOWULF.



I.



THE LIFE AND DEATH OF SCYLD.





{The famous race of Spear-Danes.}



          Lo! the Spear-Danes' glory through splendid achievements

          The folk-kings' former fame we have heard of,

          How princes displayed then their prowess-in-battle.



{Scyld, their mighty king, in honor of whom they are often called

Scyldings. He is the great-grandfather of Hrothgar, so prominent in the

poem.}



          Oft Scyld the Scefing from scathers in numbers

        5 From many a people their mead-benches tore.

          Since first he found him friendless and wretched,

          The earl had had terror: comfort he got for it,

          Waxed 'neath the welkin, world-honor gained,

          Till all his neighbors o'er sea were compelled to

       10 Bow to his bidding and bring him their tribute:

          An excellent atheling! After was borne him



{A son is born to him, who receives the name of Beowulf--a name afterwards

made so famous by the hero 
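
The extra blank lines in this printout appear because each item returned by `readlines()` already ends with a newline character (`\n`), and `print()` adds a second one. Two common ways to avoid the double spacing are sketched below, using a short hypothetical list (`sample_lines`) as a stand-in for `beowulf_b`.

```python
# A short stand-in for beowulf_b: each item ends with its own newline
sample_lines = ['BEOWULF.\n', '\n', 'I.\n', '\n', 'THE LIFE AND DEATH OF SCYLD.\n']

# Option 1: tell print() not to add a second newline
for line in sample_lines:
    print(line, end='')

# Option 2: strip the trailing newline and let print() add one back
stripped = [line.rstrip('\n') for line in sample_lines]
for line in stripped:
    print(line)
```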

In [31]:
# We now have two variables with the content of our 'beowulf' file represented using two different data structures.
# Why do you think we get the different outputs from the next two statements?

# Beowulf text stored as one large string
print("As string:", beowulf_a[0])

# Beowulf text stored as a list of lines
print("As list of lines:", beowulf_b[0])

As string: B
As list of lines: BEOWULF.



In [32]:
# Read CSV files using the Pandas read_csv function.
# Note: Pandas also includes functions for reading Excel files.

# First we need to import the pandas library
import pandas as pd

# Create a variable to hold the path to the file
fpath = "aaj1945_DataS1_Egg_shape_by_species_v2.csv"
egg_data = pd.read_csv(fpath)

In [34]:
# We can get all kinds of info about the dataset

# info() provides an overview of the structure
print(egg_data.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1400 entries, 0 to 1399
Data columns (total 9 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Order             1400 non-null   object 
 1   Family            1400 non-null   object 
 2   MVZDatabase       1400 non-null   object 
 3   Species           1396 non-null   object 
 4   Asymmetry         1400 non-null   float64
 5   Ellipticity       1400 non-null   float64
 6   AvgLength (cm)    1400 non-null   float64
 7   Number of images  1400 non-null   int64  
 8   Number of eggs    1400 non-null   int64  
dtypes: float64(3), int64(2), object(4)
memory usage: 98.6+ KB
None


In [47]:
# Look at the first five rows
egg_data.head()

|   | Order | Family | MVZDatabase | Species | Asymmetry | Ellipticity | AvgLength (cm) | Number of images | Number of eggs |
|---|---|---|---|---|---|---|---|---|---|
| 0 | ACCIPITRIFORMES | Accipitridae | Accipiter badius | Accipiter badius | 0.1378 | 0.3435 | 3.8642 | 1 | 2 |
| 1 | ACCIPITRIFORMES | Accipitridae | Accipiter cooperii | Accipiter cooperii | 0.0937 | 0.2715 | 4.9008 | 27 | 103 |
| 2 | ACCIPITRIFORMES | Accipitridae | Accipiter gentilis | Accipiter gentilis | 0.1114 | 0.3186 | 5.9863 | 7 | 18 |
| 3 | ACCIPITRIFORMES | Accipitridae | Accipiter nisus | Accipiter nisus | 0.0808 | 0.2391 | 4.0355 | 13 | 61 |
| 4 | ACCIPITRIFORMES | Accipitridae | Accipiter striatus | Accipiter striatus | 0.0749 | 0.2543 | 3.87 | 15 | 57 |


In [39]:
# Names of columns
print(egg_data.columns.values)

['Order' 'Family' 'MVZDatabase' 'Species' 'Asymmetry' 'Ellipticity'
 'AvgLength (cm)' 'Number of images' 'Number of eggs']


In [42]:
# Dimensions (number of rows and columns)
print(egg_data.shape)

(1400, 9)


In [44]:
# And much more! But as a final example we can perform operations on the data.
# Descriptive statistics on the "Number of eggs" column
print(egg_data["Number of eggs"].describe())

count    1400.000000
mean       35.125000
std        85.790347
min         1.000000
25%         3.000000
50%         8.000000
75%        26.250000
max      1139.000000
Name: Number of eggs, dtype: float64


In [45]:
# Or on all of the columns in the table that have numeric data types:
print(egg_data.describe())

         Asymmetry  Ellipticity  AvgLength (cm)  Number of images  \
count  1400.000000  1400.000000     1400.000000       1400.000000   
mean      0.148230     0.384384        3.426853          9.320714   
std       0.071228     0.089594        2.161549         20.747693   
min       0.001400     0.096700        1.196000          1.000000   
25%       0.104800     0.325775        1.958925          1.000000   
50%       0.141750     0.377400        2.581150          2.000000   
75%       0.184825     0.435075        4.323650          8.000000   
max       0.484700     0.723700       23.870000        300.000000   

       Number of eggs  
count     1400.000000  
mean        35.125000  
std         85.790347  
min          1.000000  
25%          3.000000  
50%          8.000000  
75%         26.250000  
max       1139.000000  


### Structure

Now that we have practiced assigning variables and reading information from files, we will have a look at concepts that are key to developing processes to use and analyze this information.

#### Blocks

The structure of a Python program is pretty simple: blocks of code are defined using indentation. Code that is at a lower level of indentation is not considered part of a block. Indentation can be defined using spaces or tabs (spaces are recommended by the style guide), but be consistent (and prepared to defend your choice). As we will see, code blocks define the boundaries of sets of commands that fit within a given section of code. This indentation model for defining blocks of code significantly increases the readability of Python code.

For example:

    >>> a = 5
    >>> b = 10
    >>> while b > a:
    ...     print("b="+str(b))
    ...     b = b-1
    ...
    >>> print("I'm outside the block")
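
Blocks can also be nested, with each deeper level of indentation marking a block inside another block. A minimal sketch, written as a script rather than an interpreter session and using hypothetical values:

```python
readings = [3, 12, 7, 15]   # hypothetical values for illustration
large = []

for value in readings:          # the colon opens the loop's block
    if value > 10:              # a second, nested block (indented further)
        large.append(value)     # runs only when the test is True
    # back at the loop's level: this line runs for every value
    print("checked", value)

# no indentation: this runs once, after the loop has finished
print("large values:", large)
```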

#### Comments & Documentation

You can (and should) also include documentation and comments in the code you write - both for yourself, and potential future users (including yourself). Comments are pretty much any content on a line that follows a `#` symbol (unless it is between quotation marks). For example:

    >>> # we're going to do some math now
    >>> yea = 5                   # the number of votes in favor
    >>> nay = 10                  # the number of votes against
    >>> ratio = yea / nay         # the ratio of votes in favor to votes against
    >>> print(ratio)


When you are creating functions or classes (a bit more on what these are in a bit) you can also create what are called *doc strings*. A *doc string* provides a defined location for the content that is used to generate the `help()` information highlighted above, and other systems use it to generate documentation automatically for packages that contain *doc strings*. Creating a *doc string* is simple - just create a single- or multi-line text string (more on this soon) that starts on the first indented line following the start of the definition of the function or class. For example:

    >>> # we're going to create a documented function and then access the information about the function
    >>> def doc_demo(some_text="I'll skewer yer gizzard, ye salty sea bass"):
    ...     """This function takes the provided text and prints it out in Pirate
    ...
    ...     If a string is not provided for `some_text` a default message will be displayed
    ...     """
    ...     out_string = "Ahoy Matey. " + some_text
    ...     print(out_string)
    ...
    >>> help(doc_demo)
    >>> doc_demo()
    >>> doc_demo("Sail ho!")
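
The text that `help()` displays is stored on the function object in its `__doc__` attribute, so it can also be read directly. A brief sketch, written as a script and using a shortened version of the function above (the `first_line` variable is introduced only for illustration):

```python
def doc_demo(some_text="I'll skewer yer gizzard, ye salty sea bass"):
    """Print the provided text in Pirate.

    If no string is given for `some_text`, a default message is used.
    """
    print("Ahoy Matey. " + some_text)

# The doc string is stored on the function object itself
first_line = doc_demo.__doc__.splitlines()[0]
print(first_line)
```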

### Standard Objects

Any programming language has at its foundation a collection of *types* or in Python's terminology *objects*. The standard objects of Python consist of the following:

* **Numbers** - integer, floating point, complex, and multiple-base defined numeric values
* **Strings** - **immutable** strings of characters, numbers, and symbols that are bounded by single- or double-quotes
* **Lists** - an ordered collection of objects that is bounded by square-brackets - `[]`. Elements in lists are extracted or referenced by their position in the list. For example, `my_list[0]` refers to the first item in the list, `my_list[5]` the sixth, and `my_list[-1]` to the last item in the list. 
* **Dictionaries** - a collection of objects that are referenced by *keys* rather than by position. Since Python 3.7, dictionaries preserve the order in which items were inserted. Dictionaries are bounded by curly-brackets - `{}` with each element of the dictionary consisting of a *key* (often a string, though numbers, tuples, and other hashable objects also work) and a *value* (any object) separated by a colon `:`. Elements of a dictionary are extracted or referenced using their keys. For example:

        my_dict = {"key1":"value1", "key2":36, "key3":[1,2,3]}
        my_dict['key1']    # returns "value1"
        my_dict['key3']    # returns [1,2,3]

* **Tuples** - **immutable** sequences that are bounded by parentheses - `()`. Referencing elements in a tuple is the same as referencing elements in a list above.
* **Files** - objects that represent external files on the file system. Programs can interact with (e.g. read, write, append) external files through their representative file objects in the program.
* **Sets** - unordered collections of **immutable** objects (e.g. ints, floats, strings, and tuples) where membership in the set and uniqueness within the set are defining characteristics of the member objects. Sets are created using the `set` function on a sequence of objects, or by listing the members between curly-brackets (e.g. `{1, 2, 3}`). A specialized list of operators on sets allows for identifying *union*, *intersection*, and *difference* (among others) between sets.
* **Other core types** - Booleans, types, `None`
* **Program unit types** - *functions*, *modules*, and *classes* for example
* **Implementation-related types** (not covered in this workshop)

These objects have their own sets of related methods (as we saw in the `help()` examples above) that enable their creation, and operations upon them.
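
A few of the behaviors described in the list above can be sketched in a handful of lines. All variable names and values below are hypothetical and chosen only for illustration.

```python
# Tuples are immutable: item assignment raises a TypeError
point = (3, 4)
try:
    point[0] = 10
    tuple_changed = True
except TypeError:
    tuple_changed = False

# Sets keep only unique members and support set algebra
field_a = set(['robin', 'wren', 'robin', 'owl'])
field_b = {'owl', 'hawk'}
both = field_a & field_b          # intersection
either = field_a | field_b        # union
only_a = field_a - field_b        # difference

# Dictionary values are looked up by key rather than by position
counts = {'robin': 2, 'wren': 1}
counts['owl'] = 1                 # add a new key/value pair

print(tuple_changed, sorted(both), sorted(either), sorted(only_a), counts)
```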

In [45]:
# Fun with types

this = 12
that = 15
the_other = "27"
my_stuff = [this,that,the_other,["a","b","c",4]]
more_stuff = {
    "item1": this, 
    "item2": that, 
    "item3": the_other, 
    "item4": my_stuff
}
this + that

# this won't work ...
# this + that + the_other

# ... but this will ...
this + that + int(the_other)

# ...and this too
str(this) + str(that) + the_other

'121527'

## Lists

<https://docs.python.org/3/library/stdtypes.html?highlight=lists#list>

Lists are a type of collection in Python. Lists allow us to store sequences of items that are typically but not always similar. All of the following lists are legal in Python:

In [46]:
# Separate list items with commas!

number_list = [1, 2, 3, 4, 5]
string_list = ['apples', 'oranges', 'pears', 'grapes', 'pineapples']
combined_list = [1, 2, 'oranges', 3.14, 'peaches', 'grapes', 99.19876]

# Nested lists - lists of lists - are allowed.

list_of_lists = [[1, 2, 3], ['oranges', 'grapes', 8], [['small list'], ['bigger', 'list', 55], ['url_1', 'url_2']]]

There are multiple ways to create a list:

In [47]:
# Create an empty list

empty_list = []

# As we did above, by using square brackets around a comma-separated sequence of items

new_list = [1, 2, 3]

# Using the type constructor

constructed_list = list('purple')

# Using a list comprehension

result_list = [i for i in range(1, 20)]

We can inspect our lists:

In [48]:
empty_list

[]

In [49]:
new_list

[1, 2, 3]

In [50]:
result_list

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

In [51]:
constructed_list

['p', 'u', 'r', 'p', 'l', 'e']

The above output for `constructed_list` may seem odd. Referring to the documentation, we see that the argument to the type constructor is an _iterable_, which according to the documentation is "An object capable of returning its members one at a time." In our constructor statement above

```
# Using the type constructor

constructed_list = list('purple')
```

the word 'purple' is the object - in this case a string - that when used to construct a list returns its members (individual letters) one at a time.
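
To see "one at a time" in action, we can request the members ourselves with the built-in `iter()` and `next()` functions. This is a rough sketch of what a constructor such as `list()` does with its argument; the variable names are introduced only for illustration.

```python
word = 'purple'

# iter() asks an iterable for an iterator; next() pulls one member at a time
letters = iter(word)
first = next(letters)
second = next(letters)

# list() collects whatever members the iterator has not yet returned
remaining = list(letters)
print(first, second, remaining)
```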

Compare the outputs below:

In [52]:
constructed_list_int = list(123)

TypeError: 'int' object is not iterable

In [53]:
constructed_list_str = list('123')
constructed_list_str

['1', '2', '3']

Lists in Python are:

* mutable - the list and list items can be changed
* ordered - list items keep the same "place" in the list

_Ordered_ here does not mean sorted. The list below is printed with the numbers in the order we added them to the list, not in numeric order:

In [54]:
ordered = [3, 2, 7, 1, 19, 0]
ordered

[3, 2, 7, 1, 19, 0]

In [55]:
# There is a 'sort' method for sorting list items as needed:

ordered.sort()
ordered

[0, 1, 2, 3, 7, 19]

Info on additional list methods is available at <https://docs.python.org/3/library/stdtypes.html?highlight=lists#mutable-sequence-types>
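
One detail worth knowing is that the `sort()` method changes the list in place and returns `None`, while the built-in `sorted()` function returns a new list and leaves the original alone. A short sketch (the variable names other than `ordered` are introduced only for illustration):

```python
ordered = [3, 2, 7, 1, 19, 0]

# sorted() builds a new list and leaves the original untouched
new_copy = sorted(ordered)

# reverse=True sorts from largest to smallest
descending = sorted(ordered, reverse=True)

# list.sort() changes the list in place and returns None
return_value = ordered.sort()
print(new_copy, descending, return_value, ordered)
```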

Because lists are ordered, it is possible to access list items by referencing their positions. Note that the position of the first item in a list is 0 (zero), not 1!

In [56]:
string_list = ['apples', 'oranges', 'pears', 'grapes', 'pineapples']

In [57]:
string_list[0]

'apples'

In [58]:
# We can use positions to 'slice' or select sections of a list:

string_list[3:]

['grapes', 'pineapples']

In [59]:
string_list[:3]

['apples', 'oranges', 'pears']

In [60]:
string_list[1:4]

['oranges', 'pears', 'grapes']

In [61]:
# If we don't know the position of a list item, we can use the 'index()' method to find out.
# Note that in the case of duplicate list items, this only returns the position of the first one:

string_list.index('pears')

2

In [62]:
string_list.append('oranges')

In [63]:
string_list

['apples', 'oranges', 'pears', 'grapes', 'pineapples', 'oranges']

In [64]:
string_list.index('oranges')

1
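
If we need the positions of every duplicate rather than just the first, one possible approach is to pair each item with its position using the built-in `enumerate()` function. In the sketch below, `orange_positions` and `orange_count` are names introduced only for illustration.

```python
string_list = ['apples', 'oranges', 'pears', 'grapes', 'pineapples', 'oranges']

# enumerate() yields (position, item) pairs, so we can keep every match
orange_positions = [i for i, fruit in enumerate(string_list) if fruit == 'oranges']

# count() reports how many times a value occurs in the list
orange_count = string_list.count('oranges')
print(orange_positions, orange_count)
```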

In [65]:
# one more time with lists and dictionaries
list_ex1 = my_stuff[0] + my_stuff[1] + int(my_stuff[2])
print(list_ex1)

list_ex2 = (
    str(my_stuff[0]) 
    + str(my_stuff[1]) 
    + my_stuff[2] 
    + my_stuff[3][0]
)
print(list_ex2)

dict_ex1 = (
    more_stuff['item1']
    + more_stuff['item2']
    + int(more_stuff['item3'])
)
print(dict_ex1)

dict_ex2 = (
    str(more_stuff['item1'])
    + str(more_stuff['item2'])
    + more_stuff['item3']
)
print(dict_ex2)



54
121527a
54
121527


In [66]:
# Now try it yourself ...
# print out the phrase "The answer: 42" using the following 
# variables and one or more of your own and the 'print()' function
# (remember spaces are characters as well)

start = "The"
answer = 42


### Operators

If *objects* are the nouns, operators are the verbs of a programming language. We've already seen examples of some operators: *assignment* with the `=` operator, *arithmetic* addition *and* string concatenation with the `+` operator, *arithmetic* division and subtraction with the `/` and `-` operators, and *comparison* with the `>` operator. Different object types have different operators that may be used with them. The [Python Documentation](https://docs.python.org/3/library/stdtypes.html) provides detailed information about the operators and their functions as they relate to the standard object types described above.
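
As a small illustration of how the same operator can behave differently depending on the object type, consider the sketch below. The variable names are introduced only for this example.

```python
# '+' adds numbers but concatenates strings and lists
number_sum = 2 + 3
joined_text = 'dog' + 's'
joined_list = [1, 2] + [3]

# '*' multiplies numbers but repeats strings
repeated = 'ab' * 3

# 'in' tests membership; comparison operators return Booleans
has_pears = 'pears' in ['apples', 'pears']
is_larger = 10 > 5

print(number_sum, joined_text, joined_list, repeated, has_pears, is_larger)
```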

###  Flow Control and Logical Tests

Flow control commands allow for the dynamic execution of parts of the program based upon logical conditions, or processing of objects within an *iterable* object (like a list or dictionary). Some key flow control commands in Python include:

* `while-else` loops that continue to run until the termination test is `False` or a `break` command is issued within the loop:

        done = False
        i = 0
        while not done:
            i = i+1
            if i > 5:
                done = True

* `if-elif-else` statements define alternative blocks of code that are executed if a test condition is met:

        do_something = "what?"
        if do_something == "what?":
            print(do_something)
        elif do_something == "where?":
            print("Where are we going?")
        else:
            print("I guess nothing is going to happen")
            
* `for` loops allow for repeated execution of a block of code for each item in a Python iterable such as a list or dictionary. For example:

        my_stuff = ['a', 'b', 'c']
        for item in my_stuff:
            print(item)
        
        a
        b
        c
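
The `while` example above does not use the `else` clause or a `break` command, so here is a minimal sketch of both, followed by a `for` loop over a dictionary. All names and values are hypothetical and chosen only for illustration.

```python
# while-else: the else block runs only if the loop ends without 'break'
target = 7                      # hypothetical value to search for
candidates = [2, 4, 7, 9]       # hypothetical data
position = 0
found = False
while position < len(candidates):
    if candidates[position] == target:
        found = True
        break                   # leaves the loop and skips the else block
    position = position + 1
else:
    print("target not found")

# Looping over a dictionary's .items() gives key/value pairs
inventory = {'apples': 3, 'pears': 5}
total = 0
for fruit, count in inventory.items():
    total = total + count

print(found, position, total)
```

Changing `target` to a value that is not in `candidates` lets the loop run to completion, and in that case the `else` block prints its message.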
