# Python for Linguists

Notebook 1: Python basics. Data types and variables.


Venelin Kovatchev

University of Barcelona 2020

This is your first Jupyter notebook. In it we will experiment with different data types and variables.

Jupyter notebooks include two types of cells: ``text`` cells like this one and ``code`` cells where you write your programs.

Text cells are informative; they can include comments, pictures, and references.

Code cells are executed by pressing the ``run`` button at the top of the screen or by pressing Shift + Enter.

In [None]:
# You can add comments to code cells by using the special symbol "#"

# Comments are not executed and make the code easier to read

                    # A comment doesn't have to start at the beginning of the line

In this notebook we will make use of the print() function. 

print() displays a given value or the value of a variable.

We haven't yet learned what functions are or how they work, so for now we will use print() as it is.

In [None]:
# The following line will print the message "Hello World!"
# Note that "Hello World!" is not kept in a variable.
# The program will forget anything it knows about "Hello World!" as soon as it prints it.
print("Hello World!")

# Printing a message is useful for keeping track of the program, for example:
print("My program starts now!")

### Variables

Variables are a way to keep track of information. 

Variables are containers where we keep "information". Each variable has a name and a value.

You can imagine variables as boxes on a kitchen shelf:
   - each box has a label on it ("sugar", "salt", "pepper", etc...) - the label of the box is its ``name``
   - each box has content - some boxes are empty, some boxes contain sugar, salt, or pepper - the content of the box is its ``value``

It is not necessary that the name of the box corresponds to the value - you can put salt in the box that says sugar.
However, it is recommended that there is at least some correspondence between the two.

In [None]:
# Variables are assigned using the "=" sign
# The variable name is on the left, the variable value is on the right
# The following line creates a box with the name "my_text" and content "Hello World Variable!"
my_text = "Hello World Variable!"

# If you use print() and the name of the variable you will get the value
print(my_text)
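
The box analogy can be sketched in code. The names `sugar_box` and `greeting` are made up for this illustration. The sketch assumes nothing beyond what we have seen so far: Python does not check whether a name matches its value, and assigning to a name again replaces the old content.

```python
# The label says "sugar", but nothing stops us from storing "salt" in the box
sugar_box = "salt"
print(sugar_box)

# Assigning a new value to an existing name replaces the old value
greeting = "Hello"
greeting = "Goodbye"
print(greeting)
```

The program runs without complaint, but a reader of the code would be misled by `sugar_box`, which is why matching names and values is recommended.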

Why do we need variables?

Variables allow the program to "remember" things and keep track of them. 

Let's look at two very simple use cases: reusing a variable and continuously modifying a variable.

In [None]:
# Reusing variables

# Imagine there is a long text that you want to process. 
# So far the only function we know is print(), but soon we will learn about many others
my_long_text = "This is a sentence which is much longer than the name of the corresponding variable."



# If you want to print this sentence, all you need to do is run print() and use the name of the variable
print(my_long_text)


In [None]:
# If you want to print this sentence again, you do the same
print(my_long_text)

In [None]:
# In contrast, if you want to print without using variables, you have to provide the whole text every single time:
print("This is a sentence which is much longer than the name of the corresponding variable.")

Now imagine that instead of a sentence, you have a whole paragraph, or a document, or a whole book. 

It's much more convenient to store the text in a variable once and then refer to it by the variable's name.

We as humans also use names to refer to things.

In [None]:
# Continuously modifying a variable

# Let's go back to the cake example from the slides
# Let's initialize our cake variable

# At the beginning of the program, our cake doesn't contain anything
cake = ""
print(cake)


In [None]:
# We add flour
cake = cake + "flour"
print(cake)

In [None]:
# We add sugar
cake = cake + ", sugar"
print(cake)

In [None]:
# We add eggs
cake = cake + ", eggs"
print(cake)

# You can see how our cake keeps changing at every step. 
# Using a variable is the way to keep track of the current state of the cake.
# Computer programs can get very complicated, and it is often hard (and impractical) to remember everything.
# Therefore we use variables to help us.
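
Python also has a shorthand for this pattern that we have not covered in class: for strings, `x += y` gives the same result as `x = x + y`. Here is a minimal sketch. It uses a separate, hypothetical variable `cake_2` so that the cake above is left untouched.

```python
# Start with an empty cake, as before
cake_2 = ""

# Each line adds one ingredient to the current state of the cake
cake_2 += "flour"      # same as: cake_2 = cake_2 + "flour"
cake_2 += ", sugar"
cake_2 += ", eggs"
print(cake_2)          # flour, sugar, eggs
```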

### Data Types

We learned about different types of data: numeric, boolean, string, list

In the next part of the practice we will see some differences between the data types and we will explore some basic operations.

#### Numeric data types

We learned about two numeric data types: integer and float.

We introduced several operations that we can apply to numeric data types:

* Addition / Sum (``+``): ``2 + 3 = 5``
* Subtraction (``-``): ``3 - 2 = 1``
* Multiplication (``*``): ``2 * 3 = 3 * 2 = 6``
* (True) Division (``/``): ``5 / 2 = 2.5``
* (Integer) Division (``//``): ``5 // 2 = 2``
* Power (``**``): ``5 ** 2 = 25``

You can experiment with the different data types and the different operations in the next cell.

Keep the results in variables and use print to see the result as in the example.

In [None]:
# This is an example for addition with integers and floats
result = 2 + 3
print(result)
result_fl = 2.5 + 3.1
print(result_fl)


In [None]:
# Here you can experiment with other operations - subtraction, multiplication, division, power
# Compare the results using integers and floats
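
One possible way to fill in the cell above is sketched here. The variable names are only suggestions. The last few lines compare integers and floats.

```python
# Each operation from the list, with the result kept in a variable
difference = 3 - 2
print(difference)        # 1
product = 2 * 3
print(product)           # 6
true_division = 5 / 2
print(true_division)     # 2.5
int_division = 5 // 2
print(int_division)      # 2
power = 5 ** 2
print(power)             # 25

# True division always gives a float, even when the result is a whole number
print(4 / 2)             # 2.0

# Mixing an integer with a float gives a float
print(2 + 3.0)           # 5.0
print(5.0 // 2)          # 2.0
```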

#### String data types

Strings are a sequence of characters, enclosed by quotes (single or double).

We can do several things with strings:

* We can concatenate strings using the "+" sign.
* We can access a specific character in the string by using square brackets and the position of the character (counting from 0).
* We can ``slice`` a part of the string by using square brackets and providing the start and end positions; the character at the end position is not included.

A special function called len() gives you the number of characters in a string.

You can experiment with strings in the next cell. Can you think of other operations you could use with strings?

In [None]:
# Concatenating strings
text_1 = "some"
text_2 = "word"
text = text_1 + " " + text_2
print(text)


In [None]:
# Accessing a character by its number
# 0, 1, 2, 3 -> 3 is the 4th character
character_4 = text[3]
print("The 4th character of '" + text + "' is")
print(character_4)

In [None]:
# Slicing a part of a string
# The last character is NOT included - 0:4 means 0,1,2,3
word_1 = text[0:4] 
print(word_1)

In [None]:
# A special function to see the length of a string
# The function len() is similar to print - it does "something" with the variable that we provide.
# When used on strings, len() counts the number of characters in the string
print(len(text))
print(len(character_4))
print(len(word_1))
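
As for other operations on strings, here are a few candidates that we have not covered in class. This is a sketch rather than a complete list. The variable `sample` is a hypothetical stand-in that repeats the content of `text`, so the cell can run on its own.

```python
sample = "some word"

# Negative numbers count from the end of the string
print(sample[-1])          # d

# Leaving out the start or the end of a slice means "from the beginning" or "to the end"
print(sample[:4])          # some
print(sample[5:])          # word

# The keyword "in" checks whether one string appears inside another
print("word" in sample)    # True

# Multiplying a string by an integer repeats it
print("ha" * 3)            # hahaha
```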

In [None]:
# Difference between integers and strings
string_4 = "4"
int_4 = 4

# Can you guess what will happen with each of these two actions?
string_add = string_4 + string_4
int_add = int_4 + int_4
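
To check your guess, a sketch like the following prints both results. The variables are defined again so that the cell runs on its own.

```python
string_4 = "4"
int_4 = 4

# "+" between two strings concatenates them
string_add = string_4 + string_4
print(string_add)      # 44

# "+" between two integers adds them
int_add = int_4 + int_4
print(int_add)         # 8

# Mixing the two is an error, because Python will not guess which "+" you mean
# string_4 + int_4     # raises TypeError
```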

In [None]:
# Converting between data types is possible
conv_int_4 = str(int_4)
print(conv_int_4 + conv_int_4)
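
Conversion also works in the other direction. The sketch below uses the built-in functions int() and float(). The variable names are made up for the illustration.

```python
# str() turns a number into a string; int() and float() go the other way
number_from_text = int("4")
print(number_from_text + number_from_text)   # 8

decimal_from_text = float("2.5")
print(decimal_from_text * 2)                 # 5.0

# Conversion only works when the text looks like a number
# int("four")                                # raises ValueError
```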

#### List data types

A list is a collection of values of OTHER data types, surrounded by square brackets [] and separated by commas:

* A list of integers: [1, 2, 3, 4, 5]
* A list of strings: ["some", "words", "in", "a", "list"]
* A list of integers AND strings: [6, "words", "and", 2, "numbers", "in", "a", "list"]

Lists can be concatenated and sliced in a similar manner to strings. You can also use the len() function on lists.

Why do we need lists? Lists allow us to represent multiple different "things". We can choose what these "things" are.

For example, observe the difference between the following string and list:

"This is a sentence containing several words."
["This", "is", "a", "sentence", "containing", "several", "words"]

The textual content of both is the same. 
However, a string is a sequence of characters: it has no further structure, and its only "elements" are the characters.
A list in this example is a sequence of strings. Each "element" of the list corresponds to a word.
From a linguistic point of view it's easier to work with words than with characters.
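
The difference becomes visible as soon as we ask for an element or for the length. Here is a minimal sketch using the two values above, with the hypothetical variable names `sentence` and `words`:

```python
sentence = "This is a sentence containing several words."
words = ["This", "is", "a", "sentence", "containing", "several", "words"]

# The first element of the string is a character; the first element of the list is a word
print(sentence[0])     # T
print(words[0])        # This

# len() counts characters in the string and words in the list
print(len(sentence))   # 44
print(len(words))      # 7
```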

You can experiment with lists in the next cell.

In [None]:
# Initializing a list
int_list = [1, 2, 3, 4, 5]
int_list_2 = [6, 7, 8]

# List addition
int_list_3 = int_list + int_list_2
print(int_list_3)

# You can experiment with referring to a particular element of the list or slicing a portion of the list
# Use the same idea as when working with strings
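
One way to try this out is sketched below. The list `int_list_3` is written out again so the cell runs on its own, and `mixed` is a made-up example of a list that holds both integers and strings.

```python
int_list_3 = [1, 2, 3, 4, 5, 6, 7, 8]

# Positions start at 0, just like with strings
print(int_list_3[0])     # 1

# A slice of a list is again a list; the end position is not included
print(int_list_3[2:5])   # [3, 4, 5]

# len() counts the elements of the list
print(len(int_list_3))   # 8

# Elements of different types are accessed in the same way
mixed = [6, "words", "and", 2, "numbers"]
print(mixed[1])          # words
```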

### Computing = Data + Algorithms

Here is a very simple example of automatic text processing using the NLTK programming library. Don't worry if you don't understand everything; it's just a sneak peek into the future of the course. In the next classes you will learn more about functions, libraries, and text manipulation, and by the end of the course you should know more about how these programs work.

In this example we start with a sentence in a string. We then tokenize it automatically and obtain a list of words. Then we apply Part of Speech tagging to the tokenized version of the sentence.

Remember: computing = data + algorithms

What is the data and what are the algorithms in these examples?

Feel free to experiment with the code if you want.


In [None]:
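# Import the nltk library
# Note: the tokenizer and the tagger need extra data files. If they are missing,
# NLTK raises a LookupError that names the resource to fetch with nltk.download()
import nltk

# Define the sentence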
ex_sentence = "The quick brown fox jumped over the lazy dog."
print(ex_sentence)


In [None]:
# Tokenize the sentence
tok_sentence = nltk.word_tokenize(ex_sentence)
print(tok_sentence)
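
For comparison, here is a sketch of a much simpler "algorithm" for the same data that uses only built-in Python. It relies on the string method `split()`, which we have not covered yet and which cuts a string at whitespace. The variable name `naive_tokens` is made up for this illustration. Unlike the tokenizer above, `split()` leaves punctuation attached to the neighbouring word.

```python
ex_sentence = "The quick brown fox jumped over the lazy dog."

# split() with no arguments cuts the string wherever there is whitespace
naive_tokens = ex_sentence.split()
print(naive_tokens)

# The final period stays attached to the last word
print(naive_tokens[-1])    # dog.
print(len(naive_tokens))   # 9
```

Same data, different algorithm, different result: this is one way to see what "computing = data + algorithms" means in practice.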

In [None]:
# POS tag the sentence
pos_sentence = nltk.pos_tag(tok_sentence)
print(pos_sentence)