# 1. Python Data Types

## Recap on data types

Python has several built-in data types, which fall into the following categories:
- Numeric: `int`, `float`, `complex`
- Sequence: `str`, `list`, `tuple`
- Mapping: `dict`
- Set: `set`
- Boolean: `bool`

### Numeric Data Types
- `int`: Integer, e.g. 1, 2, 3
- `float`: Floating point number, e.g. 1.0, -2.5, 3.14
- `complex`: Complex number, e.g. 1 + 2j, 3 - 4j

In [2]:
# Integer
int_var = 10
print(type(int_var))  # <class 'int'>

# Float
float_var = 10.5
print(type(float_var))  # <class 'float'>

# Complex
complex_var = 10 + 5j
print(type(complex_var))  # <class 'complex'>

<class 'int'>
<class 'float'>
<class 'complex'>


### Sequence Data Types
Sequence data types are **ordered** collections whose elements may share one data type or have different types. The elements in a sequence can be accessed using **indexing**.
- `str`: String, e.g. "hello", 'world'
- `list`: List, e.g. [1, 2, 3], ['a', 'b', 'c']
- `tuple`: Tuple, e.g. (1, 2, 3), ('a', 'b', 'c')

In [3]:
# String
str_var = "Hello, Python!"
print(type(str_var))  # <class 'str'>

# List
list_var = [1, 2, 3, 4, 5]
print(type(list_var))  # <class 'list'>

# Tuple
tuple_var = (1, 2, 3, 4, 5)
print(type(tuple_var))  # <class 'tuple'>

<class 'str'>
<class 'list'>
<class 'tuple'>


#### Indexing
- Indexing in Python starts from 0.
- Negative indexing is also possible, where -1 refers to the last element, -2 refers to the second last element, and so on.
- Slicing can be used to access a range of elements in a sequence.
    - The syntax for slicing is `sequence[start:stop:step]`.
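
The slicing syntax above can be tried out with a short sketch. The variable names are hypothetical and the string is the same text used for `str_var` in this notebook; `stop` is exclusive, and any of the three parts may be omitted.

```python
# Hypothetical example string; valid indices run from 0 to len(text) - 1
text = "Hello, Python!"

word = text[7:13]          # characters 7 to 12, the stop index is excluded
first_five = text[:5]      # an omitted start defaults to the beginning
tail = text[-7:]           # a negative start counts from the end
every_second = text[::2]   # step 2 takes every second character
reversed_text = text[::-1] # a negative step walks the sequence backwards

print(word)           # Python
print(first_five)     # Hello
print(tail)           # Python!
print(every_second)   # Hlo yhn
print(reversed_text)  # !nohtyP ,olleH
```

The same syntax works for lists and tuples, since they are sequences as well.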

In [4]:
# These types are ordered and can be indexed
print(str_var[0])
print(list_var[3])
print(tuple_var[-1])

H
4
5


In [5]:
print(str_var[0:5])

Hello


#### Difference between `list` and `tuple`
- `list` is mutable, i.e. the elements in a list can be changed or modified.
- `tuple` is immutable, i.e. the elements in a tuple cannot be changed or modified.

In [6]:
list_var[0] = 10
print(list_var)  # [10, 2, 3, 4, 5]

[10, 2, 3, 4, 5]


In [7]:
tuple_var[0] = 10  # TypeError: 'tuple' object does not support item assignment

TypeError: 'tuple' object does not support item assignment
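
As a minimal sketch of how to work with this difference (variable names are hypothetical), the error can be caught with `try`/`except`, and a "modified" tuple is obtained by building a new one rather than changing the old one in place.

```python
numbers_list = [1, 2, 3]
numbers_tuple = (1, 2, 3)

# Lists are mutable: item assignment changes the list in place
numbers_list[0] = 10

# Tuples are immutable: item assignment raises a TypeError
try:
    numbers_tuple[0] = 10
except TypeError as error:
    print("TypeError:", error)

# A changed version has to be built as a new tuple
new_tuple = (10,) + numbers_tuple[1:]

print(numbers_list)   # [10, 2, 3]
print(numbers_tuple)  # (1, 2, 3)
print(new_tuple)      # (10, 2, 3)
```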

#### String Methods
- `str` has several built-in methods, such as `upper()`, `lower()`, `strip()`, `split()`, `join()`, `replace()`, and `find()`.
- `str` is immutable, i.e. the characters in a string cannot be changed in place; string methods return new strings.
- String concatenation can be done using the `+` operator.
- String formatting can be done using f-strings.
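
A short sketch of the points above that are easy to overlook: `strip()`, `join()`, concatenation with `+`, and the fact that methods return new strings. The example values are hypothetical.

```python
# Hypothetical input with surrounding whitespace
raw = "  sodium chloride  "

clean = raw.strip()        # removes leading and trailing whitespace
parts = clean.split()      # splits on whitespace into a list of words
joined = "-".join(parts)   # joins the words with a separator string

# Concatenation with + builds a new string from the operands
greeting = "Hello" + ", " + "Python!"

# Methods never modify the original string, they return a new one
shouted = clean.upper()

print(clean)     # sodium chloride
print(parts)     # ['sodium', 'chloride']
print(joined)    # sodium-chloride
print(greeting)  # Hello, Python!
print(shouted)   # SODIUM CHLORIDE
```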

In [8]:
# built in string methods
print(str_var.lower())
print(str_var.upper())
print(str_var.split(","))
print(str_var.replace("Hello", "Hi"))
print(str_var.find("Python"))

hello, python!
HELLO, PYTHON!
['Hello', ' Python!']
Hi, Python!
7


In [9]:
# f string
molecules = 'hydrogen oxide'
atoms = 3
print(f'Water is composed of mostly {molecules} and it has {atoms} atoms')

Water is composed of mostly hydrogen oxide and it has 3 atoms


### Set
A set is an unordered collection of **unique** elements. It is written as comma-separated elements inside curly braces; empty braces `{}` create a `dict`, so use `set()` for an empty set.
- `set`: Set, e.g. {1, 2, 3}, {'a', 'b', 'c'}
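
A minimal sketch of how an empty set is created, since `{}` on its own yields a dictionary. The variable names are hypothetical.

```python
empty_braces = {}   # this is an empty dict, not a set
empty_set = set()   # this is an empty set

print(type(empty_braces))  # <class 'dict'>
print(type(empty_set))     # <class 'set'>

# Elements are added with add(); duplicates are silently ignored
elements = set()
elements.add("H")
elements.add("O")
elements.add("H")

print(len(elements))  # 2
```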

In [10]:
set_var = {1, 2, 3, 4, 5}
print(type(set_var))  # <class 'set'>

<class 'set'>


In [11]:
# sets are unordered and therefore cannot be indexed
print(set_var[0])  # TypeError: 'set' object is not subscriptable

TypeError: 'set' object is not subscriptable

In [12]:
# showcase that unique elements are stored in set
set_var = {1, 2, 3, 4, 5, 5, 5, 5, 5}
print(set_var)

{1, 2, 3, 4, 5}


#### Usage of Sets
- To eliminate duplicate elements from a list. (*See above*)
- To perform mathematical set operations like union, intersection, difference, etc.
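
Both use cases can be sketched as follows, with hypothetical example data. Converting to a set removes duplicates but does not keep the original order; `dict.fromkeys` is one common way to deduplicate while preserving it. The set operations are also available as operators.

```python
# Hypothetical list with repeated values
measurements = [3, 1, 3, 2, 1]

# Deduplicate via a set; sorted() gives a predictable order afterwards
unique_sorted = sorted(set(measurements))

# Deduplicate while keeping the order of first appearance
unique_ordered = list(dict.fromkeys(measurements))

print(unique_sorted)   # [1, 2, 3]
print(unique_ordered)  # [3, 1, 2]

# Operator forms of the mathematical set operations
a = {1, 2, 3, 4, 5}
b = {4, 5, 6, 7, 8}

union = a | b                 # same as a.union(b)
intersection = a & b          # same as a.intersection(b)
difference = a - b            # same as a.difference(b)
symmetric_difference = a ^ b  # same as a.symmetric_difference(b)
```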

In [13]:
# show use cases for sets
set_var1 = {1, 2, 3, 4, 5}
set_var2 = {4, 5, 6, 7, 8}

print(set_var1.union(set_var2))  # {1, 2, 3, 4, 5, 6, 7, 8}
print(set_var1.intersection(set_var2))  # {4, 5}
print(set_var1.difference(set_var2))  # {1, 2, 3}
print(set_var1.symmetric_difference(set_var2))  # {1, 2, 3, 6, 7, 8}

{1, 2, 3, 4, 5, 6, 7, 8}
{4, 5}
{1, 2, 3}
{1, 2, 3, 6, 7, 8}


### Mapping Data Types
A dictionary is a mutable collection of key-value pairs in which values are looked up by key. Since Python 3.7, dictionaries preserve insertion order. In Python, dictionaries are written with curly brackets around `key: value` pairs.
- `dict`: Dictionary, e.g. {'amino acid': ['alanine', 'valine'], 'nucleotide': ['adenine', 'thymine']}

In [14]:
dict_var = {"halogen": "fluorine", "noble_gas": "helium", "alkali_metal": "lithium"}
print(type(dict_var))  # <class 'dict'>
print('keys:', dict_var.keys())
print('values:', dict_var.values())

<class 'dict'>
keys: dict_keys(['halogen', 'noble_gas', 'alkali_metal'])
values: dict_values(['fluorine', 'helium', 'lithium'])
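
Beyond listing keys and values, the typical operations are looking up, adding, and overwriting entries. The following is a small sketch with a hypothetical dictionary named `elements`.

```python
elements = {"halogen": "fluorine", "noble_gas": "helium"}

# Look up a value by its key
first = elements["halogen"]

# Assigning to a new key adds an entry, assigning to an existing key overwrites it
elements["alkali_metal"] = "lithium"
elements["noble_gas"] = "neon"

# get() returns a default instead of raising KeyError for a missing key
missing = elements.get("metalloid", "unknown")

# items() yields key-value pairs in insertion order
for category, element in elements.items():
    print(category, "->", element)

print(first)           # fluorine
print(missing)         # unknown
print(list(elements))  # ['halogen', 'noble_gas', 'alkali_metal']
```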


## Exercises

### Exercise 1.1:
What does this code return?

```python
my_string = "2cfo6njs[pwi2r3adcvy"
my_string[1:20:2]
```
What could the 2 mean in that context?

In [15]:
# type to check your answer
my_string = "2cfo6njs[pwi2r3adcvy"
my_string[1:20:2]


'conspiracy'

### Exercise 1.2:

How can you make this calculation work?

```python
a = 5
b = "6"
a + b
```

In [18]:
a = 5
b = "6"
print(a + int(b))


11
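
The original expression fails because `+` is not defined between an `int` and a `str`. Which conversion is right depends on the intent; the sketch below (variable names beyond `a` and `b` are hypothetical) shows the failure and both directions of conversion.

```python
a = 5
b = "6"

# Mixing int and str with + raises a TypeError
try:
    a + b
except TypeError as error:
    print("TypeError:", error)

numeric_sum = a + int(b)  # convert the string to a number, then add
text_concat = str(a) + b  # convert the number to a string, then concatenate

print(numeric_sum)  # 11
print(text_concat)  # 56
```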


### Exercise 1.3:
Print out the first letter of every word in the string.

```python
sentence = 'Sober Physicists Don’t Find Giraffes Hiding In Kitchens'
```
What do you observe?

In [21]:
sentence = 'Sober Physicists Don’t Find Giraffes Hiding In Kitchens'

words = sentence.split()

first_letters = ""

for word in words:
    first_letters += word[0]

print(first_letters)

SPDFGHIK
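
The loop above can also be written more compactly. This is an alternative sketch, not the required solution: a generator expression picks the first character of each word and `join()` assembles the result.

```python
sentence = 'Sober Physicists Don’t Find Giraffes Hiding In Kitchens'

# Take the first character of every whitespace-separated word and join them
first_letters = "".join(word[0] for word in sentence.split())

print(first_letters)  # SPDFGHIK
```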


### Exercise 1.4:

1. Create a dictionary that represents the following table:
 
| Base | Acid |
|------|------|
| 'NaOH' | 'HCl' |
| 'KOH' | 'H2SO4' |
| 'Ca(OH)2' | 'HNO3' |

2. Add a new base to the dictionary: `NH4OH`.
3. Print out the categories and chemicals. 

In [26]:
dictionary = {'base': ['NaOH', 'KOH', 'Ca(OH)2'], 'acid': ['HCl', 'H2SO4', 'HNO3']}

# NH4OH is a base, so it is appended to the 'base' list
dictionary['base'].append('NH4OH')

print('categories:', dictionary.keys())
print('chemicals:', dictionary.values())

categories: dict_keys(['base', 'acid'])
chemicals: dict_values([['NaOH', 'KOH', 'Ca(OH)2', 'NH4OH'], ['HCl', 'H2SO4', 'HNO3']])


# 2. Setting paths

Setting paths carefully matters when coding. It is good practice to define paths to folders and data in a way that is reproducible and shareable, especially when you share code with others or work on a project that uses data from different sources. Paths also look different across operating systems (Windows, macOS, Linux), so they should be written in a way that works on all of them. Luckily, libraries like `os` and `pathlib` help with that. We will look into `pathlib` in this notebook.
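
To illustrate the operating-system difference, here is a minimal sketch with hypothetical folder and file names. `Path` uses the conventions of the machine the code runs on, while `PurePosixPath` and `PureWindowsPath` show how the same components are rendered on each family of systems without touching the filesystem.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

# Hypothetical relative path, built from components with the / operator
data_file = Path("data") / "raw" / "spectrum.csv"
print(data_file)  # separator depends on the operating system

# The same components rendered with each convention
parts = ("data", "raw", "spectrum.csv")
posix_style = PurePosixPath(*parts)
windows_style = PureWindowsPath(*parts)

print(posix_style)    # data/raw/spectrum.csv
print(windows_style)  # data\raw\spectrum.csv
```

Building paths from components like this avoids hard-coding a separator that only one operating system understands.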

## Introduction to Pathlib
The `pathlib` module provides an object-oriented way to handle filesystem paths.

In [27]:
# Introduction to Pathlib

# Importing the pathlib module
from pathlib import Path

# Creating a Path object
p = Path('.')

# Displaying the current directory
print(p.resolve())

# Listing all entries (files and directories) in the current directory
for file in p.iterdir():
    print(file)

# Creating a new directory
new_dir = p / 'new_directory'
new_dir.mkdir(exist_ok=True)

# Checking if the new directory exists
print(new_dir.exists())

# Creating a new file in the new directory
new_file = new_dir / 'new_file.txt'
new_file.touch()

# Checking if the new file exists
print(new_file.exists())

# Deleting the new file
new_file.unlink()

# Checking if the new file exists
print(new_file.exists())

# Deleting the new directory
new_dir.rmdir()

# Checking if the new directory exists
print(new_dir.exists())

/Users/cedi/ppchem
Untitled1.ipynb
.DS_Store
Untitled.ipynb
03_exercise.ipynb
02_exercise.ipynb
README.md
env.yml
01_exercise-3.ipynb
example.txt
.ipynb_checkpoints
04_exercises.ipynb
.git
serotonin.png
molecule_info.txt
True
True
False
False
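
Besides creating and deleting files, path objects expose the components of a path as attributes. The sketch below uses a pure path, so nothing has to exist on disk; the path is assembled from the working directory and one of the file names in the listing above.

```python
from pathlib import PurePosixPath

# Path taken from the listing above; a pure path never touches the filesystem
example = PurePosixPath("/Users/cedi/ppchem/molecule_info.txt")

print(example.name)    # molecule_info.txt
print(example.stem)    # molecule_info
print(example.suffix)  # .txt
print(example.parent)  # /Users/cedi/ppchem
print(example.parts)   # ('/', 'Users', 'cedi', 'ppchem', 'molecule_info.txt')
```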


## Exercises

### Exercise 2.1:

1. Create a directory called `ex_folder` in the current working directory. 
2. Check after creation if the directory exists.
3. Create a file called `ex_file.txt` in the `ex_folder` directory.

In [28]:
cwd = Path('.')
print(cwd.resolve())

/Users/cedi/ppchem


In [33]:
# create a directory called ex_folder in the current working directory
ex_folder = cwd / 'ex_folder'
ex_folder.mkdir(exist_ok=True)
print(ex_folder.exists())

True


In [34]:
ex_file = ex_folder / 'ex_file.txt'
ex_file.touch()

### Exercise 2.2:

Correct these paths so that they work on all operating systems, if possible.
```python
path1 = 'C:\Path\to\your\working\dir\ex_file.txt'
path2 = 'Path/to/your/working/dir/ex_file.txt'
path3 = '/Users/neeser/Documents/teaching/CH-200_PracticalProgrammingChem/practical-programming-in-chemistry-exercises/week_01/ex_folder/ex_file.txt'
```

What are the issues with these paths?

In [35]:
# Build paths from components with the / operator instead of hard-coding separators
path1 = ex_folder / 'ex_file.txt'
path2 = cwd / 'ex_folder' / 'ex_file.txt'
# An absolute path from another machine stays machine-specific and only exists there
path3 = Path('/Users/neeser/Documents/teaching/CH-200_PracticalProgrammingChem/practical-programming-in-chemistry-exercises/week_01/ex_folder/ex_file.txt')
print(path1.exists())
print(path2.exists())
print(path3.exists())

True
True
False
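
One issue with `path1` in the exercise is that a backslash inside a normal string literal starts an escape sequence, so `\t` becomes a tab character. The sketch below demonstrates this with a short hypothetical string and shows that a raw string (prefix `r`) keeps the backslashes; `PureWindowsPath` is used only to parse the Windows-style path for illustration.

```python
from pathlib import PureWindowsPath

# In a normal string literal, \t is a tab and \n is a newline
plain = "C:\to\new"
# A raw string keeps every backslash as a literal character
raw = r"C:\to\new"

print(len(plain))     # 7
print(len(raw))       # 9
print("\t" in plain)  # True

# Parsing the exercise path as a Windows path, written as a raw string
win = PureWindowsPath(r"C:\Path\to\your\working\dir\ex_file.txt")

print(win.name)        # ex_file.txt
print(win.as_posix())  # C:/Path/to/your/working/dir/ex_file.txt
```

Even when written correctly, a path that starts with a drive letter such as `C:` remains specific to Windows, which is why the solution builds the path relative to the working directory.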


### Exercise 2.3:

Delete the `ex_folder` directory and its contents. Check if the directory exists after deletion.

In [36]:
ex_file.unlink()
ex_folder.rmdir()


In [37]:

print(ex_folder.exists())


False
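
Note that `rmdir()` only removes empty directories, which is why the file is deleted first in the solution above. As an alternative sketch, `shutil.rmtree` from the standard library removes a directory together with its contents. The example runs inside a temporary directory so that nothing in the working directory is touched; use `rmtree` with care, since the deletion is permanent.

```python
import shutil
import tempfile
from pathlib import Path

# Work in a throwaway directory so the example cannot delete real data
with tempfile.TemporaryDirectory() as tmp:
    ex_folder = Path(tmp) / "ex_folder"
    ex_folder.mkdir()
    (ex_folder / "ex_file.txt").touch()

    # rmdir() refuses to remove a directory that still contains files
    try:
        ex_folder.rmdir()
    except OSError as error:
        print("rmdir failed:", type(error).__name__)

    # rmtree() deletes the directory and everything inside it
    shutil.rmtree(ex_folder)
    still_exists = ex_folder.exists()

print(still_exists)  # False
```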
