# Welcome to Python & Jupyter notebook basics

In this part, we will get familiar with the **Jupyter notebook** and the **Python** programming language.

In [2]:
"Hi! Welcome to scraping with Python."

'Hi! Welcome to scraping with Python.'

<br>➡  This is a Jupyter notebook cell
<br>➡  Each cell needs to be **run** after you write your code: click the `▶` button above, or press `shift + enter` or `ctrl + enter`
<br>➡  If you see a number between the brackets next to the cell, e.g. `[23]`, the cell **has been run**
<br>➡  If you see empty brackets `[ ]`, the cell has **not been run**
<br>➡  If you see `[*]`, the cell is **running**. Any other cell you run will wait until it has finished
<br>➡  **Important**: if you change a cell, you need to **run it again**!

#### Add new cells:
<br>➡ with the `+` sign in the top menu
<br>➡ by pressing ESC and then `a` (above) or `b` (below)


In [22]:
# this is a comment

<br>➡ put a **comment** in your code using the hashtag `#`
<br>➡  Everything after the hashtag won't be read by Python: `# This is a comment`

# Example scraper

In [112]:
import requests
from bs4 import BeautifulSoup as bs
import pandas

data = []
for page in range(1,6):
    URL = "https://www.theguardian.com/news/series/todayinfocus?page=" + str(page)

    website_request = requests.get(URL)
    website_content = website_request.text
    website_read = bs(website_content, "html.parser")  # name the parser explicitly
    
    for headline in website_read.select("span.js-headline-text"):
        data.append(headline.text.strip())

pandas.DataFrame(data).to_csv("headlines.csv")

# **Loops**

![loops](img/loops.png)

<br>➡  a set of instructions that are **continually repeated**

```python
for item in [something, something_else]:
    print(item)
```

<br>➡ `item` is a placeholder name for the 'something' in between the brackets
<br>➡ `print()` is a method that just displays the value 
<br>➡ in scraping, loops are used, for example, to open multiple URLs and extract data from them


In [86]:
# 1. Let's make a shopping list for 'bread' and 'butter' and display the items one by one using a for loop
# 2. expand on the list (add new items, such as blueberries, apples or beer)
# 3. display "We need some" for each item on the shopping list

for item in ["bread", "cheese", "blueberries"]:
    print("We need some " + item)

We need some bread
We need some cheese
We need some blueberries


<br>**Q**: What is happening here?

➡ sometimes we can create collections with a built-in **function**
<br>➡ for example `range(start, end)`, which counts from `start` up to, but not including, `end`

In [87]:
# 1. create a range 1-10 and print the numbers
# 2. add some text to the numbers

for nr in range(1,11):
    print(nr)

1
2
3
4
5
6
7
8
9
10
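
The second step of the exercise (adding some text to the numbers) could look like the sketch below. The sentence itself and the name `lines` are only examples. Note that `range(1, 11)` stops just before 11, so the last number is 10.

```python
# range(start, end) includes start but stops just before end
lines = []
for nr in range(1, 11):
    # str() turns the number into text so it can be glued to other text
    line = "This is number " + str(nr)
    lines.append(line)
    print(line)
```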


# **Debugging**

![bugs](img/bugs.png)

Most common errors:
<br>➡ `NameError` : Python does not recognize the variable name. Check for **typos**
<br>➡ `SyntaxError` : the syntax is not right. You may be **missing** brackets or quotes, or using the wrong ones
<br>➡ `AttributeError` : the method you are using does not exist for this type of variable
<br>...
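
A minimal sketch of how these three errors can be triggered on purpose. The `try` / `except` blocks are not covered in this notebook; they are only used here so that all three examples can run in one cell without stopping it. The names `citty`, `stripp` and `caught` are made up for illustration.

```python
city = "Oslo"
caught = []

# NameError: 'citty' is a typo, no variable with that name exists
try:
    print(citty)
except NameError as error:
    caught.append("NameError")
    print("NameError:", error)

# SyntaxError: the closing quote and bracket are missing.
# The broken code is kept in a string and handed to compile(),
# otherwise this whole cell would refuse to run.
try:
    compile('print("Oslo', "<example>", "exec")
except SyntaxError as error:
    caught.append("SyntaxError")
    print("SyntaxError:", error)

# AttributeError: strings have .strip(), but no method called .stripp()
try:
    city.stripp()
except AttributeError as error:
    caught.append("AttributeError")
    print("AttributeError:", error)
```

In normal use you would not catch these errors. Read the last line of the error message, fix the code, and run the cell again.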

![debug](img/debug.png)

# **Variables**

![variables](img/variables.png)

<br>➡  `variables` = a kind of box with a label on it, in which you can store numbers, names, expressions, and even other variables
<br>➡  the name of a variable is arbitrary, but it is useful if it tells you what the variable stands for
<br>➡  create a variable with `variable_name = value`
<br>➡  a variable name can only contain **lower and uppercase letters, numbers, and underscores**, and it cannot start with a number. **No spaces** or other funky characters!
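
The naming rules can be tried out with the sketch below. It uses the built-in string method `.isidentifier()`, which is not covered in this notebook, to check whether a piece of text could serve as a name. The example names and the list `valid_names` are made up for illustration.

```python
# Valid names: letters, numbers and underscores, not starting with a number
favorite_drink = "tea"
city2 = "Oslo"

# .isidentifier() answers True when the text follows the naming rules
valid_names = []
for name in ["favorite_drink", "city2", "favorite drink", "2city", "city-name"]:
    if name.isidentifier():
        valid_names.append(name)
    print(name, "->", name.isidentifier())
```

Reserved words such as `for` also pass this check, even though they cannot be used as variable names.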

## **Strings**

<br>➡ `string` = text
<br>➡ put it in between single or double quotes. `"text"` or `'text'`

In [27]:
favorite_drink = 'tea'
city = "Oslo"

What happens if you don't put the text between quotes?

<br>➡  you can use the `print()` method to view the content of a variable: `print(variable_name)`
<br>➡  or, in a Jupyter notebook, just type the variable name on the last line of a cell

In [8]:
favorite_drink

'tea'

Print your string. Use `print()`

In [9]:
print(favorite_drink)

tea


<br>➡ You can also do **addition** and **multiplication** with strings

In [10]:
"I love " + favorite_drink + "."

'I love tea.'

In [12]:
50 * city 

'OsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOsloOslo'

<br>➡ there are also things you cannot do, such as subtracting one string from another

In [13]:
city - favorite_drink

TypeError: unsupported operand type(s) for -: 'str' and 'str'

## **Numbers**

<br>➡ `Integer` : whole number, such as `5`, `6` or `2454`

In [20]:
whole_number = 5

<br>➡ `Float` : a number that is not whole, such as `5.67` or `823.12`

In [21]:
floating_number = 5.8

<br>➡ You can do `addition`, `subtraction`, `multiplication`, `division` (and more) with numbers.

In [23]:
whole_number + floating_number

10.8
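
The other operations work the same way. A short sketch, reusing the two variables from above; the names `difference`, `product` and `quotient` are only examples.

```python
whole_number = 5
floating_number = 5.8

total = whole_number + floating_number
difference = whole_number - 2   # subtraction
product = whole_number * 3      # multiplication
quotient = whole_number / 2     # division always gives a float

print(total, difference, product, quotient)
```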

## Collection: **Lists**

<br>➡ A `list` is a **collection** of values **separated by commas**
<br>➡  Uses **brackets** `[]`
<br>➡ `list_of_strings = ["text1","text2"]`
<br>➡  `list_of_numbers = [10, 15, 60]`
<br>➡ `empty_list = []`

Let's make a list!

In [35]:
shopping_list = ["bananas", "bread", "butter", "coffee", "vegetables"]

Let's make a list of numbers

In [38]:
amounts = [2, 1, 2, 2, 3]

## Collection: **Dictionaries**

<br>➡ A `dictionary` is a **collection** of **key** and **value** pairs
<br>➡ Uses **curly brackets** `{}`
<br>➡ `shopping_dictionary = {"bananas": 2, "bread" : 1, "butter": 2 }`
<br>➡ `person = {"name" : "Adriana", "surname": "Homolova", "email": "adriana@homolova.sk"}`
<br>➡ `empty_dictionary = {}`

In [80]:
shopping_dictionary = {"bananas": 2, "bread" : 1, "butter": 2 }

<br>➡ `dictionary[key] = value` : creates a new key / value pair

In [83]:
# make a new item 'beer' and assign it a value of 5
shopping_dictionary["beer"] = 5
shopping_dictionary

{'bananas': 2, 'bread': 1, 'butter': 2, 'beer': 5}

In [84]:
# what happens if you assign beer the value of 60?
shopping_dictionary["beer"] = 60
shopping_dictionary

{'bananas': 2, 'bread': 1, 'butter': 2, 'beer': 60}
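
Reading from a dictionary works with the same square brackets. The sketch below looks up one value and then walks through all key / value pairs with `.items()`, a dictionary method that is not covered elsewhere in this notebook. The names `bread_amount` and `lines` are made up for this example.

```python
shopping_dictionary = {"bananas": 2, "bread": 1, "butter": 2}

# dictionary[key] looks up the value stored under that key
bread_amount = shopping_dictionary["bread"]
print(bread_amount)

# .items() gives every key together with its value
lines = []
for item, amount in shopping_dictionary.items():
    line = "We need " + str(amount) + " x " + item
    lines.append(line)
    print(line)
```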

# **Methods**

![methods](img/methods.png)

**NOTE**
<br>➡ if a method **has a** `.` at the beginning, you put it **after** the variable, e.g. `variable.strip()`
<br>➡ if a method **has no** `.` at the beginning, you put the variable **in between** the brackets, e.g. `len(variable)`

### **String methods**

<br>➡ `len()` will count the characters in a string: `len(string)`

In [28]:
len(city)

4

<br>➡ `.strip()` : removes whitespace before and after a string

In [29]:
dirty_text = "     Oslo    "
clean_text = dirty_text.strip()
clean_text

'Oslo'

### **Number methods**

<br>➡ `str()` : makes a string out of a number

In [43]:
str(5)

'5'

Useful while scraping

In [45]:
"www.website.org/something?page=" + str(10)

'www.website.org/something?page=10'
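
Combined with a loop and `range()`, `str()` lets you build the address of every page you want to visit. A small sketch, using the same placeholder address as above; `base_url` and `urls` are names made up for this example.

```python
base_url = "www.website.org/something?page="  # placeholder address, not a real site

urls = []
for page in range(1, 4):
    # str() is needed because text and numbers cannot be added together directly
    urls.append(base_url + str(page))

print(urls)
```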

### **List methods**

<br>➡ Lists also have a length

In [46]:
len(shopping_list)

5

<br>➡ `.append()` : add new elements to the list. Often used in `for` loops

In [36]:
shopping_list.append("beer!")

In [37]:
shopping_list

['bananas', 'bread', 'butter', 'coffee', 'vegetables', 'beer!']

<br>➡ `.join()` : joins the items of a list into a `string`, with the text in front of `.join()` placed in between them

In [67]:
" and ".join(shopping_list)

'bananas and bread and butter and coffee and vegetables and beer!'

<br>➡ `.split()` : splits a string to create a `list`

In [33]:
string = "milk,butter,bananas"
make_into_list = string.split(",")
make_into_list

['milk', 'butter', 'bananas']
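
Scraped text is often messier than this. The sketch below assumes a made-up string `raw_text` with stray spaces around the commas, and combines `.split()`, `.strip()`, `.append()` and `.join()` to clean it up.

```python
raw_text = "milk, butter ,bananas"  # made-up example with stray spaces

clean_items = []
for item in raw_text.split(","):
    # .strip() removes the spaces left over around each item
    clean_items.append(item.strip())

sentence = " and ".join(clean_items)
print(clean_items)
print(sentence)
```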

# **Slicing**

![slicing](img/bread.png)

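<br>➡ use `[]` with a number after a list or string to pick one item, e.g. `list[4]`
<br>➡ use `[start:end]` to pick several items; the item at position `end` is not included, e.g. `list[3:5]`
<br>➡ *Important* : in Python, we **start counting at 0**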

![hello-string](img/slice.png)

What is happening here?

In [69]:
print(shopping_list)
shopping_list[2]

['bananas', 'bread', 'butter', 'coffee', 'vegetables', 'beer!']


'butter'

In [58]:
print(shopping_list)
shopping_list[3:5]

['bananas', 'bread', 'butter', 'coffee', 'vegetables', 'beer!']


['coffee', 'vegetables']
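
A few more slices to experiment with, as a sketch. It repeats the shopping list so the cell can run on its own; the variable names on the left are only examples. Negative numbers and slices with one side left empty are not covered above, so treat them as extras.

```python
shopping_list = ["bananas", "bread", "butter", "coffee", "vegetables", "beer!"]
city = "Oslo"

first_item = shopping_list[0]      # counting starts at 0
last_item = shopping_list[-1]      # negative numbers count from the end
first_two = shopping_list[:2]      # from the start up to, not including, position 2
from_fourth = shopping_list[3:]    # from position 3 to the end
first_letters = city[0:2]          # strings can be sliced the same way

print(first_item, last_item, first_two, from_fourth, first_letters)
```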

# **Questions ?**

## Still remember how to use loops?



So I heard there is a party on tonight. Let the computer help us prepare what we'll be saying during the evening.
<br>Make a loop that displays:
<br>
```
"I only had 2 beers."
"I only had 3 beers."
"I only had 4 beers."
"I only had 5 beers."
"I only had 6 beers."
"I only had 7 beers."
"I only had 8 beers."
```

In [2]:
for nr in range(2,9):
    print("I only had " + str(nr) + " beers.")

I only had 2 beers.
I only had 3 beers.
I only had 4 beers.
I only had 5 beers.
I only had 6 beers.
I only had 7 beers.
I only had 8 beers.
