
# Working with data
---

Data is at the heart of much of what we do, whether in business or other parts of our lives.

The weather forecast is based on data that is collected, processed and analysed to predict trends and therefore what might happen next.

We measure how our children are doing at school using data about their performance compared with other students in their classes, or with students across the country.

There is a common process for managing data, known as a **life cycle**, which applies to most uses.

![Data life cycle](https://53.fs1.hubspotusercontent-na1.net/hub/53/hubfs/Google%20Drive%20Integration/Data%20Lifecycle%20Management.gif?width=1500&name=Data%20Lifecycle%20Management.gif)

## Why code?

Read through this presentation to help you to start thinking about why learning to use code to work with data is a valuable skill.

[Working with data](https://docs.google.com/presentation/d/e/2PACX-1vR6UxxSQE1toWFWudGgEeq6Mc-qwbcYVRHHcR_3PjifhoGOcmtuJjTWgtXM_m6N-iih8yGeezZI29aN/pub?start=false&loop=false&delayms=3000)

Listen to some reasons why knowing how to code is useful, then come back to this worksheet to start learning how to use Python code to collect data.

[Why code?](https://drive.google.com/file/d/1hj0oFUNDLpRt9ltg35W1Vj12pnUWaIfO/view?usp=drive_link)

# Coding with data

![programming process](https://drive.google.com/uc?id=1kiIsd2v0EU7PgT4avV02g7IBp_A4G8Kd)

Data is read in from a source, processed in some way, and new data is produced.

# Types of data

Data generally falls into two categories:

* Numeric
* Categorical

### Numeric data

* whole numbers
* decimal numbers
* decimal numbers with a high level of precision

### Categorical data

* text
* boolean (True or False)

### How Python stores data while it is using it

Data is stored in **variables**:

* a variable is a holder for storing one piece of data in the computer's memory
* a variable is temporary and only exists while the code that needs it is running

Here are some examples of the types of data that might be stored in a variable during processing.

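**Integer** (_int_) e.g. 1, 32, 32687, -16: whole numbers. Unlike some other languages, which limit whole numbers to a fixed range such as -2147483648 through 2147483647, Python integers can be as large as your computer's memory allows.

**Decimal** (_float_) e.g. 1.5, 31.8, -3267.9999999, -16.00

**Complex** (_complex_): these are used, for example, in option pricing models in economics and finance, but are not covered in this course.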
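The built-in `type()` function tells you which of these types Python has given a value. This short sketch (the values are just examples) also shows that a Python `int` does not overflow past 2147483647.

```python
# type() reports which kind of data a value is
print(type(32687))     # <class 'int'>     - a whole number
print(type(-16.00))    # <class 'float'>   - a decimal number, even though it ends in .00
print(type(3 + 2j))    # <class 'complex'> - a complex number

# Python ints keep growing past 2147483647 instead of overflowing
big = 2147483647 + 1
print(big)             # 2147483648
print(type(big))       # <class 'int'>
```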






## Let's get writing Python functions to collect data
----

Before we start, please watch this video, which explains what a Python function looks like, how to run it, how to add comments and how to deal with errors.

# [Video](https://vimeo.com/988038241/d9fb1a3596?share=copy)

---

You are going to work with 3 **functions** that process whole numbers, decimal numbers and text, then modify them so that they collect the data they use from 3 different sources.

Functions are named sets of instructions that do one particular thing, often creating new data but sometimes just collecting it.  The name of a function should describe what it does, using a verb and a noun (e.g. `add_one` is a function that adds 1 to a number).

A *function* starts with the keyword **def** (short for define or definition). All instructions below the definition line are indented, which shows that they are part of that function. A function only runs when it is *called*, i.e. when its name followed by brackets is used outside the function, without indentation. The indentation is important: in the examples, note where the code is and isn't indented.
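A minimal sketch of these parts: the `def` line, the indented body with a comment, and the un-indented call that runs the function. The name `say_hello` is only an illustration (a verb and a noun) and is not one of the exercise functions.

```python
def say_hello():
    # This comment and the line below are indented, so they belong to the function
    print("Hello")

# This line is not indented, so it is outside the function: it calls (runs) say_hello
say_hello()
```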

1. Functions 1 to 3 will generate their own data
2. Functions 4 to 6 will collect data from the user
3. Functions 7 to 9 will get the data from a file

---
# Generating data within the code

To store data in a variable, you will use the **assignment operator =**

e.g.  


```
num = int(3)
num = float(5.6)
name = "Billy"
```

It is good practice to make clear what type of data each variable should hold.  For numeric data you can use `int()` or `float()`.

For text data, the quotes `""` identify it as text.

### Exercise 1 - numerical data - whole numbers (_int_)
The cell below contains a **function** that:

*  creates a variable called **num** to store a whole number with value 3  
 `num = int(3)`
*  creates a new, second variable called **num_plus_one** to hold the result of adding 1 to `num`  
`num_plus_one = num + 1`
*  prints the new data  
`print(num_plus_one)`

If the function has been written properly, it will print 4 (this is the test).  

Run the function (click on the play button in the top left corner of the code box).
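The code cell for this exercise is not shown here, so this is a sketch of the function as described in the bullets above. The name `add_one` is an assumption, borrowed from the naming example earlier.

```python
def add_one():
    # store the whole number 3 in a variable called num
    num = int(3)
    # add 1 to num and keep the result in a second variable
    num_plus_one = num + 1
    # show the new data
    print(num_plus_one)

add_one()  # prints 4
```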

### Have a play with this function

1. Change the number called num to the number 345, run the code and it should print 346.
2. Change num to another number, run the code and check that it prints the correct answer
3.  Change num to "two" (written exactly as it is).  Run the code - it will crash and show the error message "ValueError: invalid literal for int() with base 10: 'two'" at the bottom of the error text.  This means that it couldn't process the data as it was a word rather than a number.  Change it to the number 2 and the code should run and give the answer 3

----
### Exercise 2 - processing numerical data using decimal numbers (_float_)

Have a go at writing this function (the code is included to help you).  
*  create two variables **num1** and **num2** and assign them each a decimal number  
`num1 = float(28.5)` and `num2 = float(32.67)`   
*  create a third variable **total** which will store the sum of num1 + num2    
`total = num1 + num2`  
*  print the total  
`print(total)`    

Run the function to test that you get the **expected output 61.17**.  

Then change the value of one of the numbers and run the code again to get a new total.

In [5]:
def calculate_total():
    num1 = float(28.5)
    num2 = float(32.67)
    total = num1 + num2
    print(total)

calculate_total()

61.17


### Have a play with this function

1. Change the number called num1 to the number 345, run the code and it should print 377.67.
2. Change num2 to another number (either a whole number or a decimal number - floats can cope with both), run the code and check that it prints the correct answer.
3.  Change num1 to "56.7" (written with the quotes and without `float()`, so the line reads `num1 = "56.7"`).  Run the code - it will crash with a TypeError, with a message such as "TypeError: can only concatenate str (not "float") to str" at the bottom of the error text.  This means that it couldn't add the data, as "56.7" is text rather than a number, even though it looks like a number (the "" identify it as text).  Change it back to a number, e.g. `num1 = float(56.7)`, and the code should run and give a new total.
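A small sketch of the difference the quotes make, using the exercise's variable names with example values. Adding text to a float raises a `TypeError` (the exact wording of the message can vary between Python versions), while `float()` can turn text that looks like a decimal number into a real float.

```python
num2 = float(32.67)

text_num = "56.7"            # the quotes make this text (a str), not a number
print(type(text_num))        # <class 'str'>

# Adding text to a number is not allowed, so Python raises a TypeError
try:
    total = text_num + num2
except TypeError as error:
    print("TypeError:", error)

# float() converts text that looks like a decimal number into a float
num1 = float(text_num)
print(type(num1))            # <class 'float'>
total = num1 + num2
print(total)
```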

---
### Exercise 3 - categorical data - text
*  create a variable called **name** and assign it the value "Billy" with the quotes  
`name = "Billy"`
*  create a variable called **age** and assign it the value "18" with the quotes  
`age = "18"`  
* create a variable called **message** and assign it the value of all the parts of the message joined together  
`message = "Hello " + name + " you are " + age + " years old."`
*  print the message  
`print(message)`

**Expected output**:  
Hello Billy you are 18 years old.

In [1]:
def show_message():
    name = "Billy"
    age = "18"
    message = "Hello " + name + " you are " + age + " years old."
    print(message)

show_message()

### Have a play with this function

1. Change the name to "Indira", run the code to see the change.
2. Change age to another number (keeping the quotes), run the code and check that it prints the correct answer.
3. Change age to 25 (written without the quotes, so it is a number).  Run the code - it will crash and show the error message "TypeError: can only concatenate str (not "int") to str" at the bottom of the error text.  This means that it couldn't join the text together as the age was a number and not text.  

When adding things together, the variables must be of the same type. Try this:  change *age = 25* to *age = str(25)*.  It should now work correctly and give the message "Hello Indira you are 25 years old."
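Here is the fixed version as a standalone sketch: `age` holds a whole number, and `str()` converts it to text so that every part joined with `+` is a string.

```python
name = "Indira"
age = 25                     # a whole number (int), not text

# str() turns the number into text so it can be joined to the other strings
message = "Hello " + name + " you are " + str(age) + " years old."
print(message)               # Hello Indira you are 25 years old.
```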

---
# Collecting data from the user

To collect data from the user, who will use the keyboard to enter it, use the Python function:

`input()`

To make it easier for the user to know that the function is waiting for them to type something, add a message in the brackets:  

`input("Please enter a number: ")`

Assign the input to the variable that will store it. `input()` always gives back text, so wrap it in `int()` or `float()` when you need a number:

`num = int(input("Please enter a whole number: "))`

`decimal_num = float(input("Please enter a decimal number: "))`

`name = input("Please enter a name: ")`
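A sketch showing why the conversion is needed: whatever the user types arrives as a `str`, and only after `int()` can you do arithmetic with it. The function name `show_input_type` is hypothetical, used only for this illustration.

```python
def show_input_type():
    # input() always gives back text (a str), even when digits are typed
    text = input("Please enter a whole number: ")
    print(type(text))
    # int() converts the text into a whole number, so arithmetic works
    num = int(text)
    print(type(num), num + 1)
```

Calling `show_input_type()` and typing 3 should print `<class 'str'>` and then `<class 'int'> 4`.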


---
### Exercise 4 - numerical data collected from the user

The cell below contains the same **function** used in Exercise 1, but this time the data will come from the user as a keyboard input rather than being generated by the code, and it will add 5 instead of 1:

*  creates a variable called **num** to store a whole number entered from the keyboard  
`num = int(input("Please enter a whole number: "))`
*  creates a new, second variable called **num_plus_five** to hold the result of adding 5 to `num` (as in Exercise 1)  
*  prints the new data (as in Exercise 1)  

If the function has been written properly, and you have entered 3 when asked, it will print **8** (this is the **expected output**).  

Run the function (click on the play button in the top left corner of the code box).

In [None]:
def add_five():
    num = int(input("Please enter a whole number: "))
    num_plus_five = num + 5
    print(num_plus_five)

add_five()

### Have a play with this function

1. Run the code with a different whole number entered.
2. Run the code with a decimal number entered (it should crash with a message saying "ValueError" as it can't convert the number from the keyboard if it doesn't look like a whole number)

----
### Exercise 5 - processing numerical data using decimal numbers (_float_) entered by the user

Have a go at writing this function (the code is included to help you).  
*  create two variables **num1** and **num2** and ask the user to enter a decimal number for each  
`num1 = float(input("Please enter a decimal number: "))` and `num2 = float(input("Please enter a decimal number: "))`   
*  create a third variable **total** which will store the sum of num1 + num2    
`total = num1 + num2`  
*  print the total  
`print(total)`    

Run the function and enter 28.5 and 32.67 to test that you get the **expected output 61.17**.  

In [None]:
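def calculate_float_total():
    num1 = float(input("Please enter a decimal number: "))
    num2 = float(input("Please enter a decimal number: "))
    total = num1 + num2
    print(total)

calculate_float_total()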

### Have a play with this function

1. Try different decimal numbers and check that you get the expected output.
2. Try entering whole numbers - it should still work, as `float()` converts a whole number such as 5 to the decimal number 5.0.
3. Try entering words rather than numbers, you will get a ValueError

---
### Exercise 6 - categorical data - text and numbers from the keyboard
*  create a variable called **name** and ask the user to enter a name on the keyboard  
`name = input("Please enter a name: ")`
*  create a variable called **age** and ask the user to enter an age on the keyboard  
`age = input("Please enter an age: ")`  
* create a variable called **message** and assign it the value of all the parts of the message joined together  
`message = "Hello " + name + " you are " + age + " years old."`
*  print the message  
`print(message)`

**Expected output if the user enters Billy and 18**:  
Hello Billy you are 18 years old.

In [None]:
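def show_message_with_keyboard_input():
    name = input("Please enter a name: ")
    age = input("Please enter an age: ")
    message = "Hello " + name + " you are " + age + " years old."
    print(message)

show_message_with_keyboard_input()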

### Have a play with this function

1. Try different names and ages and check that you get the expected output.
2. Try entering a word for the age - it should still work as it is just expecting some text for the age (but the output will not make sense).

---
# Collecting data from a file

Data from files can come in a range of formats.  To make things easier at the beginning, we will read text from plain text files holding lists of data.  Later we will learn how to read Comma Separated Values (CSV) files, which can hold table data, and JSON files which hold structured data.

Three files have been created for you to use.  You will need to upload them to this worksheet before you can work on them.

The code to read the files is already added to the functions.  It will be similar for any other functions you write where a text file is the data source.
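The uploaded files aren't available outside the worksheet, so this sketch first writes a hypothetical file `example_numbers.txt` to a temporary folder, then reads it back one line at a time with `open()` and `readline()`, the same pattern the exercises use. Each line comes back as text that still ends with a newline.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Write a small example file; the name and contents are made up for this sketch
    path = os.path.join(folder, "example_numbers.txt")
    with open(path, "w") as datafile:
        datafile.write("3\n7\n")

    # Read it back one line at a time, as the exercises below do
    with open(path, "r") as datafile:
        first = datafile.readline()    # "3\n": text, still ending in a newline
        second = datafile.readline()   # "7\n"

# int() ignores spaces and newlines around the digits
print(int(first) + int(second))        # 10
```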

### Getting the files into the worksheet

1. Download the zipped folder from this link: https://drive.google.com/file/d/1FIc4R39_6GC2Cbg7srRweNMTZI7HDj7m/view?usp=drive_link

2. Unzip the files on your computer.

3. Click on the folder icon in the left navigation pane ![](https://drive.google.com/uc?id=1qI1Z5UN9ybqpbd6g0HTvkzgwJuJBBGqW)

4. Create a folder - right click next to the root folder icon to get the menu - name the folder "worksheet_data" ![](https://drive.google.com/uc?id=1namV_8k-ULEu_RvqhDx5j8YxMuG-V0D9)

5. Click on the upload icon and upload the three files in the folder you unzipped (whole_number.txt, decimal_numbers.txt, message_info.txt)

6. Drag each of the 3 files into the worksheet_data folder.

You are now ready to use the files.  They are only available while the worksheet is open.  If you close it and open it again another time, you will need to upload the files again.  Later we will access files directly online.

---
### Exercise 7 - numerical data collected from a file

The cell below contains the same **function** used in Exercise 4, but this time the data will be read from a file rather than entered by the user on the keyboard:

*  creates a variable called **num** to store a whole number read from a file  
`num = int(datafile.readline())`
*  creates a new, second variable called **num_plus_five** to hold the result of adding 5 to `num` (as in Exercise 4)  
*  prints the new data (as in Exercise 4)  

If the function has been written properly, and you have accessed the correct file, it will print **8** (this is the **expected output**).  

Run the function (click on the play button in the top left corner of the code box).

In [None]:
def add_five_to_file_num():
    with open("worksheet_data/whole_number.txt", "r") as datafile:
        num = int(datafile.readline())
        num_plus_five = num + 5
        print(num_plus_five)

add_five_to_file_num()


### Have a play with this function

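1. Try using the file `decimal_numbers.txt`, which contains 2 decimal numbers rather than one whole number.  To do this, just change the name of the file from `whole_number.txt` to `decimal_numbers.txt`.  The code will crash with a ValueError, as `int()` can't convert text holding a decimal number into a whole number.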

----
### Exercise 8 - processing numerical data using decimal numbers (_float_) read from a file

Have a go at writing this function (the code is included to help you).  
*  create two variables **num1** and **num2** and read a decimal number from the file for each  
`num1 = float(datafile.readline())` and `num2 = float(datafile.readline())`   
*  create a third variable **total** which will store the sum of num1 + num2    
`total = num1 + num2`  
*  print the total  
`print(total)`    

Run the function to test that you get the **expected output 39.6**.  

In [None]:
def calculate_floats_from_file_total():
    with open("worksheet_data/decimal_numbers.txt", "r") as datafile:
        num1 = float(datafile.readline())
        num2 = float(datafile.readline())
        total = num1 + num2
        print(total)

calculate_floats_from_file_total()

### Have a play with this function

1. Try adding a third variable called `num3` and read the data from the file.  You will get a ValueError: there are no more numbers in the file, so `readline()` gives back an empty (or blank) string, which `float()` can't convert to a number.
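To see this without the uploaded file, this sketch uses `io.StringIO`, which behaves like an open text file. The two numbers in it are made up and are not the contents of `decimal_numbers.txt`.

```python
import io

# io.StringIO acts like an open text file; these two numbers are made up
datafile = io.StringIO("1.5\n2.25\n")

num1 = float(datafile.readline())
num2 = float(datafile.readline())
print(num1 + num2)                  # 3.75

# There are no lines left, so readline() returns an empty string
leftover = datafile.readline()
print(repr(leftover))               # ''

# float() can't convert an empty string, so this raises a ValueError
try:
    num3 = float(leftover)
except ValueError as error:
    print("ValueError:", error)
```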

---
### Exercise 9 - categorical data - text and numbers from a file
*  create a variable called **name** and read a name from the file  
`name = datafile.readline()`
*  create a variable called **age** and read an age from the file  
`age = datafile.readline()`  
* create a variable called **message** and assign it the value of all the parts of the message joined together  
`message = "Hello " + name + " you are " + age + " years old."`
*  print the message  
`print(message)`

**Expected output if the file contains Billy and 18**:  
```
Hello Billy
 you are 18
 years old.
```
The output looks different because each line read from the file still ends with a new line character, so the message is split over three lines.  This can be removed, but we won't do that now.
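A sketch that makes the hidden new line characters visible. It uses `io.StringIO` as a stand-in for `message_info.txt`, and the contents "Billy" and "18" are assumed rather than read from the real file. `repr()` shows the `\n` that `readline()` keeps at the end of each line.

```python
import io

# Stand-in for message_info.txt; the contents "Billy" and "18" are assumed
datafile = io.StringIO("Billy\n18\n")

name = datafile.readline()
age = datafile.readline()
print(repr(name))                   # 'Billy\n': the newline is kept

message = "Hello " + name + " you are " + age + " years old."
print(message)
```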



In [None]:
def show_message_with_file_data():
    with open("worksheet_data/message_info.txt", "r") as datafile:
        name = datafile.readline()
        age = datafile.readline()
        message = "Hello " + name + " you are " + age + " years old."
        print(message)

show_message_with_file_data()

### Have a play with this function

1. Try changing the file so that you read the `decimal_numbers.txt` file instead.  This file also has two pieces of data, and `readline()` always reads data as text, so the code should work, but the output will look strange.

---
# Takeaways from this worksheet

* coding is a valuable skill
* coding is useful for repetitive data tasks and processing that is non-standard
* data can be collected from a range of sources, including self-generated, from users or from files
* organising Python code into functions is good practice
* you can add comments to code
* int and float are numeric data types, and you should use `int()` or `float()` to convert data to the right type when you create a new variable
* strings (text) are indicated by "" but you can use `str()` to force data to be text
* errors are nothing to be worried about, they help you to learn and will eventually help you to write better code.



---
# Your thoughts on what you have learnt
Please add some comments in the box below to reflect on what you have learnt through completing this worksheet, and any problems you encountered while doing so.