# Week 1: Introduction to Python (Part 1)

This is a Python notebook file
- it has the file extension **.ipynb**
- it can be run in a web-based interactive computational environment such as **Google Colab** or Anaconda's Jupyter notebook environment

Cells in a Python notebook can contain **markdown** (like this one) or **code**.
- To run a code cell, press shift+Enter. 
- The output from a cell will be printed beneath it.
- If you run a markdown cell, the text will be formatted according to the markdown instructions.  Click edit on this cell and see how the text was actually input.


The notebooks this week are designed to give you the working knowledge of Python necessary to complete the lab sessions for Natural Language Engineering. 

- Run all of the code cells as you work through the notebook. 
- Try to understand what is happening in each code cell and predict the output before running it.
- Get used to adding your own cells (both code and text) wherever you want to try things out.
- Complete all of the exercises.
- Discuss answers and ask questions!


## 1.1.1 Python types

We are going to start by looking at some basic datatypes in Python.
- String
- Integer
- Float

### Strings
A String is a datatype used to represent text.  In Python, Strings can be enclosed in double or single quotes.
- 'Hello World'
- "Hello World"  

Quite often we want to display or print strings to output - we can do this with Python's built-in *print()* function.  We will look more at functions next time - but for now, *print()* is a function which takes one or more arguments (specified in the () after the keyword *print*).  The arguments will be printed in the output when the cell is run.

Run the code in the cells below by clicking on them and then pressing "shift"+"enter" (or by clicking on the play button next to the cell in Google Colab).

In [None]:
print('Hello World')

Hello World


In [None]:
print("Hello World")

Hello World


In [None]:
# This is a comment (# at the beginning of the line)
# Note that a string enclosed in double quotes can contain single quotes as part of the string:
print("'A reader lives a thousand lives before he dies,' said Jojen. 'The man who never reads lives only one.'")

In [None]:
# ...and a string enclosed in single quotes can contain double quotes as part of the string:
print('"A reader lives a thousand lives before he dies," said Jojen. "The man who never reads lives only one."')

As an alternative to calling `print` explicitly, when a cell is run the notebook displays the value of the **last line of code** in the cell, provided that line is an expression. Only the last line is displayed. Try running the following cell.

In [None]:
"Hello World"
'"A reader lives a thousand lives before he dies," said Jojen. "The man who never reads lives only one."'

### Integers
Integers represent whole numbers

In [None]:
75

75

### Floats
Floats represent decimal or floating point numbers

In [None]:
6.3646

When a string contains just digits, the function `int` will **cast** that string to an integer.

In [None]:
# give the type of the string '623'
type('623')

In [None]:
# cast the string '623' to an integer
int('623')

In [None]:
# give the type that results from casting the string '623' to an integer.
type(int('623'))
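
As a small sketch of the same idea, the built-in functions `float` and `str` cast in other directions, and `int` raises a `ValueError` when the string is not a whole number. The variable names below are only illustrative, and the `try`/`except` lines are just there to catch the error so that the rest of the cell still runs.

```python
# Casting between the basic types (variable names are illustrative)
whole = int('623')          # string -> integer
decimal = float('6.3646')   # string -> float
text = str(75)              # integer -> string

print(whole + 1)
print(decimal * 2)
print(text + '!')

# int() cannot cast a string that does not look like a whole number
try:
    int('6.3646')
except ValueError as error:
    print('ValueError:', error)
```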

## 1.1.2 Basic operations

Now we will look at some operations which can be carried out on the basic datatypes.

Strings can be joined using `+`

In [None]:
"Hello " + "World"

Standard mathematical operators can be used on integers and floats: `+`, `-`, `*`, and `/`.

In [None]:
7 - 3 + 5

In [None]:
100*200*1000000

In [None]:
3.5*8/4

If we want to use floor division (rounded down to nearest integer), we use `//`.

In [None]:
7//2

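Use `**` for exponentiation - e.g. `3**2` is 3 squared, which is 9. (Note that `^` is a different operator in Python and does not mean "to the power of".)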

In [None]:
# This is equivalent to 2*2*2*2*2
2**5

In [None]:
10**4

Use double equals, `==`, to check equality.

In [None]:
5*4 == 2*10

The modulo operator `%` returns the remainder after integer division.  
e.g. 13 divided by 5 is 2 with 3 left over, so `13 % 5` is `3`.

In [None]:
7%3

In [None]:
4 % 2
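
The following sketch, using illustrative variable names, checks how `//` and `%` fit together: multiplying the floor-division result by the divisor and adding the remainder gives back the original number. It also shows that `/` produces a float even when the result is a whole number.

```python
number = 13
divisor = 5

quotient = number // divisor    # floor division
remainder = number % divisor    # what is left over

print(quotient, remainder)
print(quotient * divisor + remainder == number)

# Ordinary division with / gives a float, even for a whole-number result
print(4 / 2)
print(type(4 / 2))
```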

## 1.1.3 Python error reports

Sometimes your code won't work and will generate an error.  You need to get used to looking at error reports and seeing what the type of error is and where it has occurred.

For example, in the code below a `TypeError` occurs when attempting to join a **string** and an **integer**.  This is because the operator `+` is not defined for that combination of types: it can join two strings, or add two numbers (integers and floats can be mixed), but it cannot combine a string with a number.

In [None]:
"Hello" + 3

TypeError: ignored
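
One way to avoid this error, sketched below with illustrative variable names, is to cast the integer to a string with `str` before joining. The second part shows that mixing an integer and a float with `+` is allowed.

```python
# Cast the integer to a string before joining
greeting = "Hello" + str(3)
print(greeting)

# Integers and floats can be mixed; the result is a float
total = 2 + 0.5
print(total)
```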

### **Exercise 1**
In the empty cell below write a single line Python expression to print "Hello world! My name is", joined with another string containing your name

In [None]:
print("Hello world! My name is " + "Agneisa")

Hello world! My name is Agneisa

## 1.1.4 Python identifiers

Normally, we want to store values in variables so we can use them later.

We can assign a variable name to any value (e.g., string, integer, float) using a single equals sign.



In [None]:
student_name = "Adam"

The code above didn't generate any output - it just stored the string value "Adam" in the variable called *student_name*.  To see the current value of any variable, we can use the print function or just run a cell containing the variable name alone (or as the last line of the cell).

In [None]:
print(student_name)

In [None]:
student_name

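You can use almost any name you like for a variable.  However, Python keywords (such as `for` or `if`) cannot be used at all, and you should be careful not to choose names which are also the names of built-in functions (such as `print`).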

In [None]:
print = "Adam"

I have now overwritten the built-in function `print` with a String.  This means I can no longer use the *print()* function.

In [None]:
print(student_name)

Even if you go back and change or delete the offending cell, the print function appears to be gone.  

Go back and try deleting the cell where we assigned the value "Adam" to print, and then calling the print function again.

In [None]:
print(student_name)

You can fix this by using the `del` statement to remove the variable, or by restarting the runtime environment.

In [None]:
del print
print(student_name)

So, when choosing variable names, remember:
- don't use keywords or the names of built-in functions (or anything which might be one).

It is also best to: 
- use meaningful variable names.  This will help you remember what the variables store (and help other people read your code)
- use _ to join separate words to form a single variable name.  This is a Python convention which is different to the convention of using camelCase (e.g., studentName) in other languages such as Java.
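
To check a candidate name before using it, one option is the standard-library `keyword` and `builtins` modules. This is a sketch rather than a required step, and the helper name `is_safe_name` is made up for illustration.

```python
import builtins
import keyword

def is_safe_name(name):
    """Return True if name is neither a keyword nor a built-in name."""
    return not keyword.iskeyword(name) and not hasattr(builtins, name)

# 'for' is a keyword; 'print' is a built-in function rather than a keyword
print(keyword.iskeyword("for"))
print(keyword.iskeyword("print"))

# Neither is a good choice for a variable name
print(is_safe_name("for"))
print(is_safe_name("print"))
print(is_safe_name("student_name"))
```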

In [None]:
student_age = 21
student_age

Operations can be carried out as before, using the variable names.

In [None]:
student_age/2

We can update values associated with a variable using the operators `+=` , `-=` , `/=`, and `*=`.

- For example, `+=` adds the number on the right to the current value.

This is a useful shortcut - take your time to play around and familiarise yourself with this syntax.

In [None]:
#Run this cell multiple times to see what happens.
#Note that each time you run this cell, it will add 5 to the stored value.
student_age += 5
student_age

In [None]:
age_next_year=student_age+1
age_next_year
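
The sketch below, using an illustrative variable called `counter`, shows that each of these operators is shorthand for an ordinary assignment that reuses the current value.

```python
counter = 10

counter += 5    # same as counter = counter + 5, giving 15
counter -= 3    # same as counter = counter - 3, giving 12
counter *= 2    # same as counter = counter * 2, giving 24
counter /= 4    # same as counter = counter / 4, giving 6.0 (division gives a float)

print(counter)
```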

### **Exercise 2a**
In the cell below, assign appropriate values to the variables `my_name`, `my_age`, and `years_at_sussex`.

### **Exercise 2b**
In the cell below subtract `years_at_sussex` from `my_age` and assign this value to a new variable called `age_started_sussex`.

### **Exercise 2c**
In the cell below practice using the `**`,  `+=` , `-=`, `/=`, and `*=` operators to update these values.

## 1.1.5 Dynamic typing
The `type` function is used to get an object's type: `int` for integer, `str` for string, etc.

In [None]:
type(student_name)

In [None]:
type(student_age)

As Python has dynamic typing, if a variable name is assigned to a new value of different type, the variable's type will change accordingly.

In [None]:
student_age = "Twenty"
type(student_age)

str
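
As a further sketch, the built-in function `isinstance` can be used to test whether a value has a particular type. The variable `value` is illustrative. Note that `print(type(...))` shows the type in the form `<class 'int'>`, whereas a notebook cell ending in `type(...)` displays just `int`.

```python
value = 21
print(type(value))               # an integer to begin with
print(isinstance(value, int))

value = "Twenty-one"             # the same name now refers to a string
print(type(value))
print(isinstance(value, int))
print(isinstance(value, str))
```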

### **Exercise 3**
In the cell below reassign your `my_age` and `years_at_sussex` variables, changing them from `int` values to `str` values that give the number in words. Print the type of these variables before and after.

In [None]:
my_age = 20
years_at_sussex = 1
print(type(my_age), type(years_at_sussex))
my_age = "twenty"
years_at_sussex = "one"
print(type(my_age), type(years_at_sussex))

<class 'int'> <class 'int'>
<class 'str'> <class 'str'>


## 1.1.6 Lists

We are now going to look at a more complex data structure.  A list is an ordered collection of other data types.

Lists are initialised using square brackets, with objects separated by commas.

In [None]:
primes = [2,3,5,7,11]
type(primes)

Lists can contain any data type.

In [None]:
list_of_strings =['string','another string','a third string']
list_of_strings

'Empty' lists with no elements can also be initialised.

In [None]:
empty_list = []

Indexing into lists uses square brackets.
- Note that indexing starts from zero.

In [None]:
primes = [2,3,5,7,11]
type(primes)
primes[0]

2

A colon, `:`, can be used to take a slice of a list between two indices.
- Note that this will start from the first index, up to but NOT including the second index.

In [None]:
primes = [2,3,5,7,11]
type(primes)
primes[1:3]

[3, 5]

If either index is omitted, the slice will go to the beginning/end of the list. Slices are half-open intervals: the start index is included and the end index is excluded.

In [None]:
primes = [2,3,5,7,11]
type(primes)
primes[:3]

[2, 3, 5]

To index from the end of the list use negative numbers.

In [None]:
primes = [2,3,5,7,11]
type(primes)
primes[-1]

11

In [None]:
primes = [2,3,5,7,11]
type(primes)
primes[-2:]

[7, 11]
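
The indexing and slicing rules above can be collected into one short sketch. The variable names `first_two`, `middle` and `last_two` are illustrative. Because the end index is excluded, two slices that share a split point rebuild the original list without repeating or dropping an element.

```python
primes = [2, 3, 5, 7, 11]

first_two = primes[:2]      # from the start, up to but not including index 2
middle = primes[1:-1]       # everything except the first and last items
last_two = primes[-2:]      # from the second-to-last item to the end

print(first_two, middle, last_two)

# Splitting at index 3 and joining the two slices gives back the whole list
print(primes[:3] + primes[3:] == primes)
```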

To test for list membership use the keyword `in`.

In [None]:
5 in primes

True

In [None]:
3 in primes

True

The function `len` gives the length of a list.

In [None]:
len(primes)

To append an element to a list use `append`.

In [None]:
primes.append(13)

In [None]:
primes.append(17)

In [None]:
primes

Using `append` with a list as parameter adds the list as a single element - producing a list that contains a list as its last element.

In [None]:
primes = [2, 3, 5, 7, 11, 13]
primes.append([17,19])
primes

[2, 3, 5, 7, 11, 13, [17, 19]]

That probably isn't what we wanted to do.  
If we want to add the elements of one list individually to another list, use the `+=` operator to concatenate the two lists.

In [None]:
primes = [2, 3, 5, 7, 11, 13]
primes += [17,19]
primes
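
The list method `extend` behaves like `+=` here. The sketch below compares it with `append` side by side, using the illustrative list names `nested` and `flat`, and uses `len` to show the difference in the number of elements.

```python
nested = [2, 3, 5]
nested.append([7, 11])      # adds one element, which is itself a list
print(nested, len(nested))

flat = [2, 3, 5]
flat.extend([7, 11])        # adds each element individually, like +=
print(flat, len(flat))
```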

Quite often when we have a list, we want to do the same thing to everything in that list.  That requires us to write some code to **iterate over the list**.  The simplest way to do this is with a **for** loop.

To write a for loop that iterates over a list, we use the keywords `for` and `in`, `:`, and indentation to indicate the scope of the body of the loop.


In [None]:
for prime in primes:
    print(prime,"is a prime")

In the code above we could have used any variable name instead of prime. 

In [None]:
for alien_planet in primes:
    print(alien_planet,"is a prime")

It is usually best practice to consider the loop variable, alien_planet, as local to the loop and not try to access it from outside the loop.  However, if you do, you will get the last value that it had during the iteration.

In [None]:
alien_planet
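
A common pattern, sketched below with the illustrative names `numbers` and `doubled`, is to start with an empty list and use `append` inside a `for` loop to build up a new list from an existing one.

```python
numbers = [2, 3, 5, 7, 11]
doubled = []                    # start with an empty list

for number in numbers:
    doubled.append(number * 2)  # runs once for each item in numbers

print(doubled)
```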

### **Exercise 4a**
In the cell below initialise the variable `squares` to be a list of the square numbers from 1 to 16 inclusive.

In [None]:
# Square each number from 1 to 4 to get the square numbers from 1 to 16 inclusive
def square(num):
    return num*num
squares = list(map(square, range(1, 5)))
print(squares)

[1, 4, 9, 16]


### **Exercise 4b**
In the cell below append the next square number to the list `squares`.

In [13]:
squares = [1, 4, 9, 16]
# The next square number after 16 is 5 squared
squares.append(5**2)
print(squares)

[1, 4, 9, 16, 25]

### **Exercise 4c**
In the cell below make a list of the next two square numbers and concatenate this with `squares`.

### **Exercise 4d**
In the cell  below check how many items are in the list now.

### **Exercise 4e**
In the cell below use indexing to print just the first 3 and last 3 items in the list `squares`

### **Exercise 4f**
In the cell below, use a `for` loop to print each item in the list `squares` on its own line, as part of a sentence. The output should look like this:
```
The first square in the list is  1
The next square in the list is  4
The next square in the list is  9
The next square in the list is  16
The next square in the list is  25
The next square in the list is  36
The last square in the list is  49
```

In [11]:
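squares = [1, 4, 9, 16, 25, 36, 49]
# Print the first item, then the middle items using a slice, then the last item
print("The first square in the list is ", squares[0])
for square_number in squares[1:-1]:
    print("The next square in the list is ", square_number)
print("The last square in the list is ", squares[-1])

The first square in the list is  1
The next square in the list is  4
The next square in the list is  9
The next square in the list is  16
The next square in the list is  25
The next square in the list is  36
The last square in the list is  49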


## 1.1.7 Strings

We are now going to take a bit more of an in-depth look at Strings.  

We often think of a String as an atomic datatype, like an integer or a float, out of which we might make other more complex types (e.g., lists) but which can't be broken down any further.  But actually, a String can be thought of as a complex datatype - it is a **sequence** of **characters**, much like a list. We just have an easier way of writing it (as a String, e.g., 'Adam') rather than using the square brackets notation ['A','d','a','m'].   

However, Python lets us use a lot of list functionality straightforwardly on Strings.
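
The sketch below, using the illustrative names `name` and `letters`, converts a string into a real list of characters with the built-in `list` function. It also shows one difference between the two: a list can be changed in place, whereas a string cannot. The `try`/`except` lines are only there to catch the error so that the cell finishes running.

```python
name = "Adam"
letters = list(name)        # a list with one element per character
print(letters)

letters[0] = "a"            # lists can be changed in place
print(letters)

try:
    name[0] = "a"           # strings cannot be changed in place
except TypeError as error:
    print("TypeError:", error)

print(name)                 # the string is unchanged
```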

In [None]:
# Here we assign the string "Hello World" as the value of a variable called hello_world
hello_world = "Hello World"

String indexing is similar to list indexing, but works on a character-by-character basis.

In [14]:
hello_world[0]

'H'

In [None]:
hello_world[7]

In [None]:
hello_world[-3:]

In [None]:
hello_world[-40]

Can you work out why the error above was generated?

We can also easily test for substring presence using the keyword `in`.

In [15]:
"w" in hello_world

False

In [None]:
"W" in hello_world

In [None]:
"llo" in hello_world

We can also find the length of a string using `len`.

Note that the output value is a count including spaces, tabs and non-alphanumeric characters.

In [16]:
len(hello_world)

11

In [None]:
hello_world+="!"
hello_world

In [17]:
len(hello_world)

12

We can iterate over a string with the same syntax as in normal list iteration.  However, it now works on a character-by-character basis.  In other words, in each iteration of the loop, the loop variable will be the next character in the string.

In [None]:
for char in hello_world:
    print ("the character >>>", char, "<<< is present")

The `split` method provides a simplistic way to parse a string into words.   By default, it separates based on whitespace and returns a list of *tokens*.   We will learn more about tokenisation in week 2.

An optional separator string can be passed to `split` as an argument.  See the difference if you change the following cell so that the second line reads `words = sentence.split('s')`



In [None]:
sentence = "This is a sample sentence"
words = sentence.split()
print(words)
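
One consequence of splitting on whitespace, sketched below with an illustrative string, is that punctuation stays attached to the neighbouring word. Passing a separator such as `", "` is one simple way around this for text with a regular format, and the string method `join` reverses a split.

```python
text = "one, two, three"

tokens = text.split()       # split on whitespace: commas stay attached
print(tokens)
print("one" in tokens)      # the token is 'one,' so this is False

parts = text.split(", ")    # split on a comma followed by a space
print(parts)
print("one" in parts)

print(" ".join(parts))      # join the parts back together with spaces
```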

To check for the presence of a token in a list of words, we use the `in` keyword.

In [None]:
"sample" in words

In [None]:
"Hello" in words

### **Exercise 5a**
In the empty cell below  assign the string `"It was the best of times, it was the worst of times"` to the variable `opening_line`.

In [19]:
opening_line = "It was the best of times, it was the worst of times" 

### **Exercise 5b**
In the empty cell below check whether 'worst' appears in opening_line.

In [20]:
opening_line = "It was the best of times, it was the worst of times" 
"worst" in opening_line

True

### **Exercise 5c**
In the empty cell below make a list of the words in `opening_line`, assigned to the variable `dickens_words`, and iterate over `dickens_words`, printing one word per line.

In [24]:
opening_line = "It was the best of times, it was the worst of times" 
# split() with no argument separates the string on whitespace
dickens_words = opening_line.split()
for word in dickens_words:
  print(word)

It
was
the
best
of
times,
it
was
the
worst
of
times


### **Exercise 5d**
In the empty cell below check whether `'blurst'` appears in the list you made.

## 1.1.8 Conditions and booleans

Finally, we are going to take a quick look at conditional statements.  In the code below, note the use of the keywords `if` and `else` as well as the presence of the colons (`:`) and the indentation.

In [None]:
if 2 > 3:
    print ("yes")
else:
    print ("no")

In [None]:
if len(words) > 10:
    print("it's a long sentence")
else:
    print("it's a short sentence")

There are some useful string *shape* methods, which form part of the String class and can be used to test for certain types of string.  Work out what each of the following tests for:
- astring.isalpha()
- astring.isalnum()
- astring.isdigit()

In [None]:
"This".isalpha()

In [None]:
"This,".isalpha()

In [None]:
"M25".isalpha()

In [None]:
"M25".isalnum()

In [None]:
"463".isdigit()

Boolean statements can be combined using `and`. Both must be true for the combination to be evaluated as `True`.

In [None]:
True and True

In [None]:
False and True

Boolean statements can be combined using `or`. At least one statement must be true for the combination to be evaluated as `True`.

In [None]:
False or True

In [None]:
True or False

A boolean statement can be negated using `not`.

In [None]:
not True

In [None]:
not False
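
As a closing sketch, `and`, `or` and `not` can be combined with the string methods and `if`/`else` from this section. The variable names and the printed messages are illustrative.

```python
token = "M25"

# True only when every part joined by `and` is True
is_mixed = token.isalnum() and not token.isalpha() and not token.isdigit()
print(is_mixed)

word = "sample"
if word.isalpha() and len(word) > 3:
    print(word, "is an alphabetic word with more than 3 characters")
else:
    print(word, "is short or contains non-alphabetic characters")

# True when at least one part joined by `or` is True
print(word.isdigit() or word.isalpha())
```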