# Intro to Markdown

What you are seeing is a Markdown file. It is a text document format that is fairly easy to write.

Markdown is organized in blocks. Blocks can contain either text or code. This is a text block. You can edit a block by double clicking in it. An editor view will pop up.

You can use the user interface to add elements such as **bold text**, links or pictures. All these elements are converted to plain text, and you can inspect what they look like in the background in edit mode. For instance, if you activate edit mode, you will see that text is bold when it is surrounded by two asterisks on each side, as in `**bold text**`. You will also see that surrounding text in backticks displays it as a code snippet. This is useful for referencing code in code blocks.



# Variables and Functions

The most basic objects in Python are **variables**. You can create a variable by giving it a name, using an equal sign, and assigning it a value. So `a = 2` creates a variable named `a` that stores a value of 2.


Another important concept is a function. Each function performs, well, a function. Most functions you will use have been created and distributed either by Python itself, or through several packages we will use, such as the Natural Language Toolkit (nltk) package.

The code below defines a function named `mean()`. To distinguish functions from variables, function names are written with parentheses. The function we define takes two **arguments**, `number1` and `number2`. Note that these are **local variables**: they only exist within the function.

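The function then creates a new local variable called `sum`, which stores the sum of both numbers. (Python also has a built-in function named `sum()`; the name is reused here only inside the function, for illustration.) Python handles mathematical operations as you would expect. The function creates another local variable called `average`, which stores the sum divided by 2. Lastly, the function returns the value assigned to `average`.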

We **call** a function by its name, and add any arguments inside parentheses. The function returns only what `return` dictates. We typically don't see what's going on in the background of functions.

The following block is a code block. It will introduce some basic Python functionality. Note that the code block has a "run" option. Running a code block will execute the code and print any outputs from that code.

In [1]:
# This is a code block. You can add text as comments by starting a line with a hash sign (#)

'''
You can add multiple lines of text by surrounding your paragraph in three single quotation marks.

Since Markdown allows for text blocks, this is often unnecessary
'''

a = 2
b = 4
c = "Hello!"

def mean(number1, number2):
    sum = number1 + number2
    average = sum / 2
    return average

mean(a, b)

3.0
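
To make the idea of local variables more concrete, here is a minimal sketch that reuses the `mean()` function from the cell above. Two things are assumptions made for illustration: the local variable is called `total` rather than `sum`, and the `try`/`except` lines are only there so the cell prints a message instead of stopping with an error.

```python
def mean(number1, number2):
    # total and average are local: they exist only while the function runs
    total = number1 + number2
    average = total / 2
    return average

# Only the returned value is available outside the function
result = mean(2, 4)
print(result)

# Asking for a local variable outside the function raises a NameError
try:
    print(average)
except NameError as error:
    print("NameError:", error)
```

The first `print` shows `3.0`, the value handed back by `return`. The second one fails because `average` was never created outside the function.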

# Object types

Each object, such as a variable, has a **type**. You can learn what the type of an object is with the `type()` function. You can print messages after the code block with the `print()` function.

The code below uses both functions to print the type of `a`. When `a = 5.2`, `a` is a float (as noted by the `<class 'float'>` output).

We can re-assign values to variables and, in Python, the type changes automatically. So after changing `a` to `"Hello"`, `a` becomes a string variable.

Here are the types of variables you will typically encounter:

* Integers (`int`): Positive or negative whole numbers.
* Floating-point (`float`): Positive or negative numbers with a decimal point.
* Strings (`str`): A single character or a group of characters. Strings are created with double quotes (`" "`) or single quotes (`' '`) in Python.
* Boolean (`bool`): A variable that can only be `True` or `False`.

The following code shows some examples:

In [2]:
a = 5.2
print(type(a))

a = "Hello"
print(type(a))
print(len(a))

<class 'float'>
<class 'str'>
5


In [3]:
# Values can be passed directly to functions, no need to assign them to variables first
print(type(5))

# Python handles object types automatically
print(type(5.0))

print(type("5"))

# You can perform operations with any numeric type.
print(5 + 5)
print(type(5 + 5))

# Integers are converted to float automatically when combined with a float
print(5 + 2.0)
print(type(5 + 2.0))

# You can use the == operator to check if two values are equal. This returns a bool
print(5 == 5)

print(type(5 == 5))

<class 'int'>
<class 'float'>
<class 'str'>
10
<class 'int'>
7.0
<class 'float'>
True
<class 'bool'>
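
As a compact recap of the four types listed above, the sketch below checks one example value of each type and then shows a variable changing type when it is re-assigned. The variable names (`examples`, `type_before`, `type_after`) are hypothetical and chosen only for this illustration.

```python
# One example value for each of the four common types
examples = [5, 5.2, "Hello", True]

for value in examples:
    # type(value).__name__ gives the short type name, such as 'int'
    print(value, "is of type", type(value).__name__)

# Re-assigning a variable changes its type automatically
a = 5
type_before = type(a)
a = "Hello"
type_after = type(a)
print(type_before, "->", type_after)
```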


# Diving deeper into objects!

You will rarely work with single objects alone. Objects are grouped in Python in a few different ways.

You create groups of objects the same way you create a variable: give it a name, add an equal sign, and list the values it should have. The formatting of the inputs will determine the data type. See below for examples.

## Lists (and methods!)
Lists are an ordered collection of objects. As such, you access elements in a list by their position. List objects can be of any data type!

Lists are created with square brackets `[ ]`.

An important point: **Python uses 0 indexing**, which means you start counting from 0, not 1. The first object of a list is in position 0.

Another important concept is introduced here: methods. Methods are functions that belong to an object. The methods of an object are accessed with the dot operator `.`. You will see a few examples below:


In [5]:
# You can declare several variables at the same time
a, b, c = 1, 2, 3
print(a + b)
print(a + b == c)

# You can create a list with square brackets.
first_list = [a, b, c]
print(first_list)
print(type(first_list))

# You can access list objects with square brackets indicating their position
print(first_list[0])

# The len function returns the size of the list
print(len(first_list))

# You can change objects in a list. A list can contain any object types.
first_list[2] = "Hello"
print(first_list)

# You can add objects to a list with the append method

first_list.append(2)
print(first_list)

3
True
[1, 2, 3]
<class 'list'>
1
3
[1, 2, 'Hello']
[1, 2, 'Hello', 2]
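
Because counting starts at 0, the last element of a list sits at position `len(list) - 1`. The sketch below illustrates this with a hypothetical list called `mixed_list`, and calls two list methods, `append()` and `index()`, with the dot operator.

```python
# A list can hold objects of different types
mixed_list = [10, 2.5, "Hello", True]

# Positions run from 0 to len(mixed_list) - 1
first_item = mixed_list[0]
last_item = mixed_list[len(mixed_list) - 1]
print(first_item, last_item)

# Methods are called with the dot operator
mixed_list.append("new item")           # add an object to the end
position = mixed_list.index("Hello")    # find the position of a value
print(mixed_list)
print(position)
```

Note that `"Hello"` is the third element, yet `index()` reports position 2.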


## Dictionaries

Dictionaries are collections of key-value pairs. As such, the main difference between dictionaries and lists is that you access dictionary elements by their name (the key), while you access list elements by their position. Dictionaries are created using curly braces `{ }`.

In [6]:
first_dictionary = {'a': 1,
                    'b': 2,
                    'c': 3}

print(first_dictionary)

# Access an element by its name. Nothing is displayed here because the value is not printed
first_dictionary['c']

# You can change elements by their name
first_dictionary['a'] = "Hello"
# If 'a' is the name (the key), what do the numbers in this dictionary represent?

print(first_dictionary)

{'a': 1, 'b': 2, 'c': 3}
{'a': 'Hello', 'b': 2, 'c': 3}
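
A few more dictionary operations are sketched below. The dictionary `ages` and its contents are made up for illustration. The sketch looks up a value by key, adds a new key-value pair, checks whether a key exists with `in`, and lists the keys and values.

```python
# Hypothetical example data: names are the keys, ages are the values
ages = {"Ana": 31, "Ben": 27}

# Look up a value by its key
print(ages["Ana"])

# Assigning to a key that does not exist yet adds a new pair
ages["Cleo"] = 45

# The in operator checks whether a key is present
has_ben = "Ben" in ages
has_dan = "Dan" in ages
print(has_ben, has_dan)

# keys() and values() give the names and the stored values
print(list(ages.keys()))
print(list(ages.values()))
```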


## Arrays
Arrays work very similarly to lists. However, they are more efficient in the background and are preferred for tasks that demand heavy computation. Importantly, **array elements are all of the same type**.

Python does not have a "native" array format, so we will also learn how to use libraries.


The most popular array library is `numpy`. NumPy is widely regarded as the standard library for numerical operations in Python. You can import a library with the `import` command. We will use various NumPy functions with the dot operator as well.

We can create a NumPy array with the `np.array()` function. It takes a list as an argument.

In [7]:
# We import the NumPy library and give it the short name np
import numpy as np

a, b, c = 1, 2, 3

# Create an array with the values. Note the list notation inside the parenthesis
first_array = np.array([a, b, c])

# It is often cleaner to create a list first
first_list = [a, b, c]
second_array = np.array(first_list)

# Check if the elements in both arrays are the same, as expected
print(first_array == second_array)

# Access elements by their index, as you would with a list
print(first_array[0])

[ True  True  True]
1


# Additional functionality

Here are a few more advanced concepts you may encounter. Since strings are the main unit of analysis in language processing, we also dive deeper into working with them.


In [8]:
'''
Access string characters using their position, as you would with a list.

In fact, it is useful to think of a string as a list of individual characters
'''

sentence = "Hello, my name is Ruby"

print(sentence[0])

# Slicing selects based on a range rather than a single position
print(sentence[0:5])

# Omitting the first number selects from the beginning
print(sentence[:5])

# Omitting the last number selects until the end
print(sentence[6:])

# A negative sign selects from the end
print(sentence[-1])

# You can combine these to select the last n characters
print(sentence[-4:])

# The code above slices from the fourth-to-last character to the end

# You can concatenate strings
print(sentence[:5] + ",", sentence[6:] + ", nice to meet you.")

H
Hello
Hello
 my name is Ruby
y
Ruby
Hello,  my name is Ruby, nice to meet you.
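
Two details of the cell above are worth a closer look: where a slice stops, and how far the comparison between strings and lists goes. The sketch below is a small illustration with hypothetical variable names (`first_word`, `characters`). The `try`/`except` lines are only there so the cell prints a message instead of stopping with an error.

```python
sentence = "Hello, my name is Ruby"

# A slice [start:stop] includes position start but stops before position stop
first_word = sentence[0:5]
print(first_word, len(first_word))

# list() turns a string into a list of its individual characters
characters = list(first_word)
print(characters)

# Unlike a list, a string cannot be changed in place
try:
    sentence[0] = "J"
except TypeError as error:
    print("TypeError:", error)
```

So `sentence[0:5]` returns positions 0 through 4, which is why the result has exactly 5 characters.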


In [9]:
first_array = np.array([1, 2, 3])

# You can apply an operation to every element of an array at once
print(first_array + 10)

# You can also combine two arrays of the same size element by element
second_array = np.array([3, 2, 1])
print(first_array + second_array)

[11 12 13]
[4 4 4]


As short as this intro is, it covers most of the things you will encounter in the language processing project.

New functions will be explained when they are first used, and any other methods used to clean or filter the data, such as slicing, will be explained as well.

To "visualize" how this short intro becomes a large data analysis project, here is an outline of what has to be done:

1. Convert the entire transcript into a Python object. It will be a string, which is a very large collection of ordered characters.
2. Extract each sentence and attribute it to a person. This can be easily done with a **dictionary**, where each person is a key, and their entire speech is the value associated with that key.
3. We can use several methods to break down the entire speech of a person into manageable blocks. For instance, we can create a **list** of words by breaking the single string containing all the text into many individual strings.
4. We can perform an operation on each element of the new list. For instance, we can go over all the words and check whether they are verbs.

The power of programming comes from the fact that each object inside a list or a dictionary can itself be a list or a dictionary. As such, we can build arbitrarily deep structures of organized data. So we will have a dictionary where each person has a list of lists: individual words they used, n-grams (groups of n words), individual sentences, action words, etc.
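
The outline above can be sketched on a toy scale. Everything specific in the code below is an assumption made for illustration: the three-line transcript, its `Name: speech` format, the variable names, and the tiny hand-made set `toy_verbs`, which stands in for a real check of whether a word is a verb. It is not the method used in the project itself, only a picture of how strings, dictionaries and lists fit together.

```python
# Step 1: a hypothetical transcript stored as one string, one "Name: speech" entry per line
transcript = "Ruby: Hello there\nSam: I like to run\nRuby: We walk and talk"

# Step 2: a dictionary with each person as a key and their entire speech as the value
speech_by_person = {}
for line in transcript.split("\n"):
    name, speech = line.split(": ", 1)
    if name in speech_by_person:
        speech_by_person[name] = speech_by_person[name] + " " + speech
    else:
        speech_by_person[name] = speech

# A tiny hand-made set of verbs, standing in for a real verb check
toy_verbs = {"like", "run", "walk", "talk"}

# Steps 3 and 4: split each speech into a list of words, then check every word
analysis = {}
for name, speech in speech_by_person.items():
    words = speech.split()
    verbs = []
    for word in words:
        if word.lower() in toy_verbs:
            verbs.append(word)
    # Each person maps to a dictionary that itself holds lists
    analysis[name] = {"words": words, "verbs": verbs}

print(analysis["Ruby"]["words"])
print(analysis["Ruby"]["verbs"])
```

Here `analysis` is a dictionary of dictionaries of lists, so `analysis["Ruby"]["verbs"]` reads as "the list of verbs used by Ruby".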