# Recap: Week 1 + 2

## Week 1: Python Data Types and Paths

### Recap on Data Types

Python has several built-in data types:

* Numeric: `int`, `float`
* Sequence: `str`, `list`, `tuple`
* Mapping: `dict`
* Set: `set`
* Boolean: `bool`

### Numeric Data Types

In [None]:
# Integers and floats
int_example = 3
float_example = 3.14

## Sequence Data Types
# Strings, lists, and tuples
str_example = "Hello, Python!"
list_example = [1, 2, 3]
tuple_example = ("a", "b", "c")

# Indexing and slicing
print(list_example[0])  # First element of list
print(str_example[-1])  # Last character of string
print(tuple_example[1:])  # Slicing tuple from second element to end


## Week 2: Lists, Dictionaries, Loops, and Iterations

### Lists and Dictionaries

In [None]:
# Adding and accessing elements
list_example.append(4)  # Adding an element to a list
dict_example = {"key1": "value1", "key2": "value2"}
print(dict_example["key1"])  # Accessing a value from a dictionary

## Loops and Iterations
# For loop with a list
for item in list_example:
    print(item)


In [None]:
# While loop example
print("While loop...")
i = 0
while i < len(list_example):
    print(list_example[i])
    i += 1

# Or equivalently in a for loop
print("For loop...")
for element in list_example:
    print(element)

In [None]:
# Boolean variables and conditional statements
a = True
b = False
if a and b:
    print("Both are true")
elif a or b:
    print("At least one is true")  # this will be printed!
else:
    print("Neither is true")

# Week 3

With the recap out of the way, welcome to week 3! 

In this exercise, we will learn how to:

1. **write and read files** using what we learned previously about paths and file objects,
1. **use functions** as a re-usable way to perform specific tasks,
1. perform **error handling** for gracefully and descriptively dealing with errors in code,
1. and finally **build classes** as a way to demonstrate object-oriented programming.

## 1. Writing and reading files

First, let's define a `Path` object using the `pathlib` module, as we did last week.
Let's try to read the contents of the file, if it exists, or print a message if it
doesn't. 

We can do this with a conditional statement, using the **`exists()`** method of
the `Path` object that we store in the variable `file_path`.

As we are defining the `Path` object to a file that doesn't yet exist, the following
cell should print a message telling us this!

In [None]:
from pathlib import Path

# Create a Path object for the current directory
current_directory = Path.cwd()
print("Current Directory:", current_directory.resolve())

# Creating a Path object for an example file that does not yet exist
example_file_path = current_directory / "example.txt"

# Reading the contents of the file
if example_file_path.exists():
    with example_file_path.open("r") as file:
        content = file.read()
        print(content)
else:
    print("The file does not exist.")

Now let's create the file and try running the code again. Before running the cell, think
about the expected output.

In [None]:
example_file_path.touch()

# Reading the contents of the file
if example_file_path.exists():
    with example_file_path.open("r") as file:
        content = file.read()
        print(content)
else:
    print("The file does not exist.")

What was the output of the above cell and why?

We created the file using the **`touch()`** method. It exists, but there is no content. Maybe this code would be more useful:

In [None]:
# Reading the contents of the file
if example_file_path.exists():
    with example_file_path.open("r") as file:
        content = file.read()
        if len(content) == 0:
            print("File exists but is empty.")
        else:
            print(content)
else:
    print("The file does not exist.")

Now let's write something to a file, and make it interesting. For the molecule that you
chose to upload to your GitHub repository for the week 1 milestones, go to Wikipedia and
copy the first sentence in its description.

For instance, the first sentence on https://en.wikipedia.org/wiki/Caffeine
is: "Caffeine is a central nervous system (CNS) stimulant of the methylxanthine class."

Paste this into the string variable `molecule_description` below, along with the molecule
name and the URL of the image you uploaded to your GitHub page.

In [None]:
molecule_name = "NanoPutian Chain"
molecule_image_url = "https://github.com/jwa7/ppchem/blob/main/nanoputian_chain.png"
molecule_description = "NanoPutians are a series of organic molecules whose structural formulae resemble human forms."

Now we can write this to file. Let's create a new `Path` object in the variable
`molecule_info_path` and write the description here.

In [None]:
# Define path
molecule_info_path = current_directory / "molecule_info.txt"

# Create file
molecule_info_path.touch()

# Write to file
with molecule_info_path.open("w") as file:
    file.write(
        f"Molecule: {molecule_name}\n"
        f"Image URL: {molecule_image_url}\n"
        f"Description: {molecule_description}"
    )
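
As an aside, opening a file in `"w"` mode creates it if it is missing, so the `touch()` call above is optional. The sketch below illustrates this, together with append mode (`"a"`) and the `Path.read_text()` shortcut. It writes into a temporary directory so that no files are left behind; that choice, and the file name `demo_info.txt`, are assumptions made only for this illustration.

```python
import tempfile
from pathlib import Path

# Work in a throwaway directory so nothing is left behind (illustrative choice)
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo_info.txt"  # hypothetical file name

    # Opening in "w" mode creates the file, so no touch() is needed first
    with demo_path.open("w") as file:
        file.write("Molecule: Caffeine\n")

    # "a" (append) mode adds to the end of the file instead of overwriting it
    with demo_path.open("a") as file:
        file.write("Formula: C8H10N4O2\n")

    # read_text() opens, reads, and closes the file in one call
    demo_content = demo_path.read_text()

print(demo_content)
```

Note that `"w"` mode overwrites any existing content, so use `"a"` when you want to keep what is already in the file.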

Great! Now let's run the code block from above to check whether the file exists, and if
it does, read and print the contents.

In [None]:
# Reading the contents of the file
if example_file_path.exists():
    with example_file_path.open("r") as file:
        content = file.read()
        if len(content) == 0:
            print("File exists but is empty.")
        else:
            print(content)
else:
    print("The file does not exist.")

Huh? Why does it tell us that the file is empty? Have a think why, then carry on to the next section.

## 2. Functions!

So, why did the above cell block tell us that the file was empty?

The reason is that we were checking the existence of the file object stored in the
variable `example_file_path`, which points to the file "example.txt", not the one we just
created, i.e. "molecule_info.txt", whose path is stored in the `molecule_info_path`
variable.

If you scroll up, you'll notice that we copy-and-pasted the code block multiple times.

**Programming tip: when you find yourself copy-paste-reusing code, alarm bells should go
off!** 

Surely there is a better, more general way to do this?

And yes there is - use functions! What if we wrote a function that took as input a generic
file path, checked its existence, and printed its contents? We could then re-use this
whenever we want for different file paths, just by passing different input arguments.

See below for how this is done:

In [None]:
def check_file_existence_and_read_contents(file_path_object: Path):
    """
    Checks the existence of the file whose path is pointed to in the input variable
    `file_path_object`. 
    
    If it exists and is empty, prints "File exists but is empty." If it exists and is
    not empty, prints the contents of the file. If it doesn't exist, prints "The file
    does not exist."

    :param file_path_object: the `pathlib.Path` object pointing to the absolute path of
        the file to be checked and read.
    """
    if file_path_object.exists():  # exists

        # 'with' is what's known as a context manager. It keeps the file open while we
        # are running code inside the `with` block, but closes it when leaving the code
        # block.
        with file_path_object.open("r") as file:
            content = file.read()  # read contents of the file

            if len(content) == 0:  # empty
                print("File exists but is empty.")

            else:  # not empty
                print(content)

    else:  # doesn't exist
        print("The file does not exist.")
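
To see what the context manager in the function above does, here is a minimal sketch that checks the file object's `closed` attribute inside and after the `with` block. The temporary directory and the file name `context_demo.txt` are assumptions made only for this illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "context_demo.txt"  # hypothetical file name
    demo_path.write_text("hello")

    with demo_path.open("r") as file:
        inside_closed = file.closed  # still inside the block: file is open
        content = file.read()

    after_closed = file.closed  # we have left the block: file is closed

print(f"Closed inside the block? {inside_closed}")
print(f"Closed after the block?  {after_closed}")
```

The file is closed for us even if an error is raised inside the block, which is why `with` is generally preferred over calling `open()` and `close()` by hand.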

Now let's use this function on each of our `Path` objects, stored in the variables
`example_file_path` and `molecule_info_path`.

In [None]:
check_file_existence_and_read_contents(example_file_path)  # calls the function

In [None]:
check_file_existence_and_read_contents(molecule_info_path)  # calls the function

Now let's get a bit more chemical. Below is an example function shown in Philippe's
slides from lecture 2 (slide 37). It calculates the percentage reaction yield given two
input variables, corresponding to the theoretical and actual yield.

The inputs to the function (`theoretical_yield` and `actual_yield`) are called
*arguments*, and the output of the function (`percent_yield`) is the *return* value.

In [None]:
def reaction_yield(theoretical_yield, actual_yield):
    """
    Calculate the percent yield of a reaction.
    theoretical_yield: Theoretical yield in grams
    actual_yield: Actual yield obtained from the reaction in grams
    Returns the percent yield as a percentage.
    """
    percent_yield = (actual_yield / theoretical_yield) * 100
    return percent_yield

In [None]:
# Example usage:
yield_percent = reaction_yield(10.0, 8.5)
print(f"Experimental yield: {yield_percent} %")

What's going to happen if we pass non-physical inputs to the function?

For instance, a negative yield, or an actual yield that is higher than the theoretical yield?

In [None]:
yield_percent = reaction_yield(-10.0, 8.5)
print(f"Experimental yield: {yield_percent} %")

In [None]:
yield_percent = reaction_yield(theoretical_yield=10.0, actual_yield=12345)
print(f"Experimental yield: {yield_percent} %")

Or how about if we pass an incorrect ***data type*** to the function?

In [None]:
yield_percent = reaction_yield("I never took chemistry", 101)  # passing a string and an integer
print(f"Experimental yield: {yield_percent} %")

Perhaps we can design our function to check the inputs and provide more insightful error
messages. This can improve the usability of our function for others, and for ourselves.

## Take a moment to think about design...

Writing good code means building things that are easy for a human to read, understand,
and use, as well as being efficient for a computer to run.

There is **always** a balance between getting the job done and writing the perfect code.
We don't want to over-engineer things, or spend unnecessary time making things *too*
perfect. However, spending 20% longer on considering some design details will in the
long run save you time when you revisit your code in 1 month / 1 year / 1 decade, and
save others time too when they want to read and use your code. Practice writing
well-designed functions, and this will translate into well-designed classes, modules,
packages, software for research projects, commercial products, and so on.

So, in this spirit, let's add two things to the above function, with the result below.

### a) Type hints

These tell the user in an easy-to-read way what the expected data types of the inputs
are, i.e. `actual_yield: float`, and what the output data type is, i.e. `...) ->
float:`. As well as forming part of the ***documentation*** of a function, they can be
used in conjunction with type-checkers that will raise warnings if types are not used as
intended.

Python is a ***dynamically typed*** language: the types of variables are determined
when the code is run, rather than being declared in advance. Errors caused by improper
use of variable types are therefore not known until runtime. We saw this above with the
error: ``TypeError: unsupported operand type(s) for /: 'int' and 'str'``. By contrast,
in ***statically typed*** languages such as `C` and `Fortran`, which are also
***compiled*** rather than ***interpreted***, variables have declared types. These
types are checked when the code is compiled, prior to runtime, which catches many type
errors before the program is ever run.

Knowing the types of variables is useful, both for the human and the computer! In
this exercise, however, we will just focus on the former: better documenting our
functions for the human user.
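
A short sketch of what type hints do and do not do at runtime. The function `double` is a made-up example, not part of the exercise: the hints are stored on the function as documentation, but Python itself does not check them when the function is called.

```python
def double(value: float) -> float:
    """Returns twice the input value."""
    return value * 2


# The hints are stored on the function, where tools and readers can find them
print(double.__annotations__)

# Python does not enforce the hints: a str is accepted here, and `*` repeats it
repeated = double("ab")
print(repeated)
```

This is why a separate type-checker, or explicit input checks inside the function, are needed if we want misuse to be flagged.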

### b) Input checks

When our code is run, it would be useful to perform some checks on the inputs to the
function. If a check fails, we can return a descriptive error message to the user.

In this case, we have modified the function to check that the user passes `float` types
as input. It does so using the `isinstance` function. It also checks that they are
non-negative, and that the theoretical yield is not less than the actual yield. In the
case that any of these checks fails, an error is raised with a descriptive message that
tells the user exactly what is wrong. 

Inspect the function so you understand how this is done. Write a descriptive error
message for the check on the relationship between the actual yield and the theoretical
yield.

In [None]:
def reaction_yield(theoretical_yield: float, actual_yield: float) -> float:
    """
    Calculates the percent yield of a reaction.

    :param theoretical_yield: float, the theoretical yield of the reaction, in grams.
    :param actual_yield: float, the actual yield obtained from the reaction, in grams.
    
    :return: float, the yield of the reaction as a percentage.
    """
    # Type checks: input yields should be passed as floats
    if not isinstance(theoretical_yield, float):
        raise TypeError(
            f"Invalid type {type(theoretical_yield)}: `theoretical_yield`"
            " should be passed as a float."
        )
    if not isinstance(actual_yield, float):
        raise TypeError(
            f"Invalid type {type(actual_yield)}: `actual_yield`"
            " should be passed as a float."
        )

    # Input value checks
    if theoretical_yield < 0 or actual_yield < 0:  # yields should be non-negative
        raise ValueError("Input yields must be a non-negative float")

    if actual_yield > theoretical_yield:  # actual yield can't be more than the theoretical
        raise ValueError("Actual yield cannot be more than the theoretical yield")

    # Calculate the yield and return
    percent_yield = (actual_yield / theoretical_yield) * 100
    return percent_yield

Let's re-run the erroneous examples from above and observe the error messages.

In [None]:
# This does (and should) raise an error!
yield_percent = reaction_yield(-10.0, 8.5)
print(f"Experimental yield: {yield_percent} %")

In [None]:
# This gives a much nicer error message than: "TypeError: unsupported operand type(s) for /: 'int' and 'str'"
yield_percent = reaction_yield("I never took chemistry", 101)
print(f"Experimental yield: {yield_percent} %")

What if we pass two integers as input to the function?

In [None]:
yield_percent = reaction_yield(25, 5)  # error!
print(f"Experimental yield: {yield_percent} %")

And how about passing strings that maybe seem like a reasonable input?

In [None]:
yield_percent = reaction_yield("25", "5")  # error!
print(f"Experimental yield: {yield_percent} %")

In both cases, maybe we'd hope that the inputs could be interpreted as floats. Maybe a
good design choice would be to internally try to convert the arguments to floats, and
raise an error if this is not possible.

To do this, we need to learn about type conversion, which will in turn teach us
something about ***error handling***.

## 3. Error Handling



### Type conversion or "casting"

Type conversion in Python, also known as type casting, is the process of converting the
value of one data type (integer, string, float, etc.) into another. This process is
often necessary in programming because certain operations require variables to be of a
specific data type. Python provides several built-in functions for converting types,
including `int()`, `float()`, and `str()`, among others.

**Why Use Type Conversion?**

* Compatibility: Some operations require operands to be of the same type. For instance, you cannot directly concatenate a string and an integer without converting the integer into a string.
* Processing: Data might be read as strings from a file or user input, but you may need to perform arithmetic operations on them, requiring conversion to numeric types.
* Control: Explicit type conversion lets you control how Python interprets and uses the data in your programs.
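
The first two points can be sketched in a few lines. The variable names and values below are made up for illustration.

```python
# Compatibility: convert an int to a str before concatenating
count = 3
label = "Number of samples: " + str(count)
print(label)

# Processing: a value read from a file arrives as a str
mass_text = "2.5"
mass = float(mass_text)  # convert to float before doing arithmetic
doubled_mass = mass * 2
print(doubled_mass)

# str -> int works in the same way
whole = int("42")
print(whole + 1)
```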

You saw this in exercise 1, where we cast a `str` to an `int` in order to perform
an addition:

In [None]:
a = 5
b = "6"
print(a + int(b))  # 11

Whereas if we try and add together the `int` and `str`, we get an error:

In [None]:
a = 5
b = "6"
print(a + b)  # TypeError

How can we handle different types as best we can, while providing meaningful error
messages?

First, write a function called `add_five` that takes an argument `number` (an `int`) and
returns an `int`. Add argument and return type hints to the function signature (i.e.
the `def ...` line), and write a short but descriptive docstring.

In [None]:
def add_five(number: int) -> int:
    """Adds 5 to `number` and returns the result"""
    return 5 + number

Let's observe what happens in certain situations:

In [None]:
# Passing an int --> expected behaviour. Integer 11 returned
add_five(6)

In [None]:
# Passing a str --> TypeError
add_five("6")

In [None]:
# Passing a float --> no error, but returns a float not an int
result = add_five(6.0)
print(result)
print(type(result))

In the latter case, Python detects that the addition of an `int` and a `float` is
happening, so it automatically converts the `int` to a `float` to perform the operation.
The returned value is therefore a number of type `float`.

What if we didn't want this to happen - i.e. if we always wanted to return an int?

Well, we could cast the output to an int, like this:

In [None]:
def add_five(number: int) -> int:
    """Adds 5 to `number` and returns the result"""
    return int(5 + number)

In [None]:
# Passing a float --> the output is now cast, so an int is returned
result = add_five(6.0)
print(result)
print(type(result))
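
One caveat with casting the output: `int()` discards the fractional part rather than rounding it. With the version of `add_five` above, passing `6.7` would therefore give `11`, not `12`. A minimal sketch of the difference between `int()` and the built-in `round()`:

```python
truncated_positive = int(11.9)   # fractional part is dropped
truncated_negative = int(-2.5)   # truncates toward zero, not downwards
rounded = round(11.9)            # rounds to the nearest integer instead

print(truncated_positive, truncated_negative, rounded)
```

Which behaviour is appropriate depends on the problem, so it is worth stating the choice in the docstring.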

Alternatively we could cast the input arguments. At the start of the function body,
let's try to cast the input argument number to an `int`, then use it in the operation:

In [None]:
def add_five(number: int) -> int:
    """Adds 5 to `number` and returns the result"""
    number = int(number)
    return 5 + number

This works now if we pass a castable `str` or a `float`...

In [None]:
# Passing a castable str --> returns an int
result = add_five("6")
print(result)
print(type(result))

In [None]:
# Passing a float --> returns an int
result = add_five(6.0)
print(result)
print(type(result))

... but still doesn't raise a helpful error message when we try and pass an invalid
argument, i.e. in this case a non-castable string:

In [None]:
add_five("hello")

We can use a `try/except` block to handle this. This is a way of 'attempting' to
execute, but ***catching*** the error (otherwise known as an ***exception***) in the
case that it was raised. 

By placing `number = int(number)` within a `try` block, we attempt to cast the input
argument to an `int`, but if a `ValueError` is raised, the `except` block is executed,
where we can raise a meaningful error message.

In [None]:
def add_five(number: int) -> int:
    """Adds 5 to `number` and returns the result"""
    
    try:   # program attempts to execute this code
        number = int(number)

    except ValueError as e:  # unless there's a ValueError: catch it as `e`...
        raise ValueError(  # ... and raise a new ValueError with a meaningful message
            f"Invalid input argument: '{number}'."
            " `number` must be an int, or convertible to an int."
            f" Original error message: {e}"
        )
    
    return 5 + number

In [None]:
# More meaningful error message raised
add_five("hello")
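
Two refinements are worth knowing about, sketched below with a hypothetical helper `to_int` (not part of the exercise). First, `int()` raises a `TypeError` rather than a `ValueError` for some inputs, such as `None`, so both can be caught together. Second, `raise ... from e` attaches the original exception to the new one, so no information is lost in the traceback.

```python
def to_int(value) -> int:
    """Converts `value` to an int, raising a descriptive ValueError on failure."""
    try:
        return int(value)
    except (TypeError, ValueError) as e:
        # int("hello") raises ValueError, whereas int(None) raises TypeError
        raise ValueError(f"Cannot convert {value!r} to an int.") from e


for candidate in ["6", "hello", None]:
    try:
        print(to_int(candidate))
    except ValueError as e:
        # `__cause__` holds the original exception attached by `from e`
        print(f"ERROR: {e} (caused by {type(e.__cause__).__name__})")
```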

Now we return to our `reaction_yield` function from before. In this function, we want
the user to pass the input arguments as floats. However, in the case that they don't, we
want to attempt to cast/convert the inputs to floats. If this fails, we want to raise a
meaningful error message telling them what's wrong.

In [None]:
def reaction_yield(theoretical_yield: float, actual_yield: float) -> float:
    """
    Calculates the percent yield of a reaction.

    :param theoretical_yield: float, the theoretical yield of the reaction, in grams.
    :param actual_yield: float, the actual yield obtained from the reaction, in grams.
    
    :return: float, the yield of the reaction as a percentage.
    """
    # Type checks: input yields should be passed as floats. If not, attempt to convert
    # them to a float. If this fails, raise an error.
    if not isinstance(theoretical_yield, float):
        try:  
            theoretical_yield = float(theoretical_yield)
        except ValueError as e:
            raise ValueError(
                f"Invalid input '{theoretical_yield}': `theoretical_yield`"
                " should be passed as a float, or be convertible to a float."
                f" Original error: {e}"
            )

    if not isinstance(actual_yield, float):
        try:
            actual_yield = float(actual_yield)
        except ValueError as e:
            raise ValueError(
                f"Invalid input '{actual_yield}': `actual_yield`"
                " should be passed as a float, or be convertible to a float."
                f" Original error: {e}"
            )

    # Input value checks
    if theoretical_yield < 0 or actual_yield < 0:  # yields should be non-negative
        raise ValueError("Input yields must be a non-negative float")

    if actual_yield > theoretical_yield:  # actual yield can't be more than the theoretical
        raise ValueError("Actual yield cannot be more than the theoretical yield")

    # Calculate the yield and return
    percent_yield = (actual_yield / theoretical_yield) * 100

    return percent_yield

In [None]:
# Now if we run with strings that are convertible to floats, the function works!
reaction_yield("80", "70")

In [None]:
# But if they are non-convertible, we get a nicer error message
reaction_yield(4.0, "hello")
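
Notice that the two conversion blocks in `reaction_yield` are nearly identical, which is the copy-paste alarm bell from earlier. One possible refactoring, sketched here under the assumption that we are free to add a helper, moves the conversion into its own function. The names `as_float` and `reaction_yield_compact` are made up for this sketch, and unlike the version above it also catches `TypeError` (raised, for example, by `float(None)`).

```python
def as_float(value, name: str) -> float:
    """Returns `value` as a float, or raises a descriptive ValueError."""
    try:
        return float(value)
    except (TypeError, ValueError) as e:
        raise ValueError(
            f"Invalid input {value!r}: `{name}` should be passed as a float,"
            f" or be convertible to a float. Original error: {e}"
        ) from e


def reaction_yield_compact(theoretical_yield, actual_yield) -> float:
    """Calculates the percent yield, with the conversion step factored out."""
    theoretical_yield = as_float(theoretical_yield, "theoretical_yield")
    actual_yield = as_float(actual_yield, "actual_yield")

    if theoretical_yield < 0 or actual_yield < 0:
        raise ValueError("Input yields must be non-negative floats")
    if actual_yield > theoretical_yield:
        raise ValueError("Actual yield cannot be more than the theoretical yield")

    return (actual_yield / theoretical_yield) * 100


print(reaction_yield_compact("80", "70"))
```

Neither this sketch nor the function above guards against a theoretical yield of exactly zero, which passes the checks and then raises a `ZeroDivisionError`. Adding a check for that case would be a natural next step.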

## Problem: bringing together loops, file reading, casting, and error handling

Now we'll try an example. Below is some code that reads a series of pairs of values from
a data file called "reaction_yields.txt". Open and inspect the file.

The first row is the header, which tells us what is in each column. It starts with '#',
so should be treated as a comment line and not as a data line. Each subsequent row
contains 3 values, in order: the name of the scientist who conducted the experiment, the
theoretical yield in grams, and the actual reaction yield in grams.

The following code parses the file to extract the data, and calculates the percentage
yield. It then prints the results.

In [None]:
# Define the path object for the file and check that it exists
reaction_yield_file = current_directory / "reaction_yields.txt"
assert reaction_yield_file.exists()

# Open the file
with reaction_yield_file.open("r") as file:  # "r" is read-mode

    # Read the file and convert to a list, where each element in the list is a str
    # containing each line.
    lines = file.read().splitlines()  

    for line_i, line in enumerate(lines):
        print(f"Line number: {line_i}")
        print(f"    Line read from file: {line}")

        if line.startswith("#"):
            print("    Comment line, not extracting data\n")
        else:
            print("    Data line, extracting data")

            # Extract the values on each row by further splitting the line
            line_data = line.split()
            print(f"    Data on line: {line_data}")

            # Unpack the values in the list
            assert len(line_data) == 3
            name, theoretical, actual = line_data
            print(f"    Data types: {[type(d) for d in line_data]}")

            # Calculate the percent yield and print the results
            percent_yield = reaction_yield(theoretical, actual)  # call our function!
            print(f"    Scientist: {name}, reaction yield (%): {percent_yield}\n")


Notice that the types of the data from each line in the file were of `str` type, but our
function was still able to handle this by converting to a `float` and calculating the
percent yield.

Now suppose the data in the file contains errors, i.e. non-physical values or invalid
(i.e. non-castable) values. Your job is to take the above code and modify it to handle
such cases, allowing the whole file to be read without the code erroring out.

In the cases where the data is invalid for the `reaction_yield` function, your code
should print the error message and carry on to the next line in the file. For this,
you'll need to use a `try/except` block to catch and handle the errors when calling the
`reaction_yield` function.

Some step-by-step hints (remember: the solution is provided, but try to get there on
your own first):

1. First, change the filename pointed to by the variable `reaction_yield_file` to
   "reaction_yields_with_errors.txt".
2. Run the code and see what happens. When is the error raised, what kind of exception
   is it, and why is it raised?
3. Use a `try/except` block to catch the error and print the message.

In [None]:
# Define the path object for the file and check that it exists
reaction_yield_file = current_directory / "reaction_yields_with_errors.txt"
assert reaction_yield_file.exists()

# Open the file
with reaction_yield_file.open("r") as file:  # "r" is read-mode

    # Read the file and convert to a list, where each element in the list is a str
    # containing each line.
    lines = file.read().splitlines()  

    for line_i, line in enumerate(lines):
        print(f"Line number: {line_i}")
        print(f"    Line read from file: {line}")

        if line.startswith("#"):
            print("    Comment line, not extracting data\n")
        else:
            print("    Data line, extracting data")

            # Extract the values on each row by further splitting the line
            line_data = line.split()
            print(f"    Data on line: {line_data}")

            # Unpack the values in the list
            assert len(line_data) == 3
            name, theoretical, actual = line_data
            print(f"    Data types: {[type(d) for d in line_data]}")

            # Calculate the percent yield and print the results
            try:
                percent_yield = reaction_yield(theoretical, actual)  # call our function!
                print(f"    Scientist: {name}, reaction yield (%): {percent_yield}\n")
            except ValueError as e:
                print(f"    ERROR: {e}\n")
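
One thing to be aware of in the code above: a line with the wrong number of values fails the `assert`, which raises an `AssertionError` that `except ValueError` does not catch. The sketch below shows one possible variation that stores results and errors instead of printing them, and lets the unpacking itself raise a `ValueError` for malformed lines. It uses made-up lines in place of a file and a simplified stand-in for `reaction_yield`, so that it can be run on its own.

```python
def percent_yield_from_text(theoretical: str, actual: str) -> float:
    """Simplified stand-in for `reaction_yield`, used only in this sketch."""
    theoretical_value, actual_value = float(theoretical), float(actual)
    if theoretical_value <= 0 or actual_value < 0:
        raise ValueError("Yields must be non-negative, and the theoretical yield non-zero")
    if actual_value > theoretical_value:
        raise ValueError("Actual yield cannot be more than the theoretical yield")
    return (actual_value / theoretical_value) * 100


# Made-up lines standing in for the contents of a data file
lines = [
    "# name theoretical actual",
    "Alice 10.0 8.0",
    "Bob 5.0 six",
    "Carol 4.0",
]

results, errors = {}, {}
for line_i, line in enumerate(lines):
    if line.startswith("#"):
        continue  # skip comment lines
    try:
        # Unpacking raises a ValueError if the line does not hold 3 values
        name, theoretical, actual = line.split()
        results[name] = percent_yield_from_text(theoretical, actual)
    except ValueError as e:
        errors[line_i] = str(e)  # remember which line failed, and why

print(results)
print(errors)
```

Collecting the errors by line number makes it easy to report all problems in a file at once, rather than stopping at the first one.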

## 4. Classes and objects

Above, we mostly used functions to help us write programs that perform a certain
task. Classes, the blueprints from which objects are built, are another fundamental
construct in Python programming, and in programming in general.

### Functions: The Building Blocks
A function is a block of organized, reusable code that is used to perform a single,
related action. Functions provide modularity for your application and a high degree of
code reuse. They allow you to encapsulate a task into a single unit of work that can
be called with different parameters to perform its operation on specific data. Functions
are defined using the `def` keyword and can return results with the `return` statement.

### Classes: Blueprint for Objects
Classes, on the other hand, are blueprints for creating objects, allowing you to
encapsulate data and functionality together. An object's class defines its properties
and behaviors as attributes and methods, respectively. Methods are functions defined
within the context of a class and are used to define the behaviors of objects created
from the class.
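
As a first taste, here is a minimal sketch of a class with two attributes and one method. The class `Atom` and its method `describe` are made up for this illustration and are not used later in the exercise.

```python
class Atom:
    """A minimal, hypothetical class: data (attributes) plus behaviour (methods)."""

    def __init__(self, symbol: str, mass: float):
        self.symbol = symbol  # attribute: data stored on the object
        self.mass = mass      # attribute

    def describe(self) -> str:
        """Method: a function that works on the object's own data."""
        return f"{self.symbol} has an atomic mass of {self.mass}"


hydrogen = Atom("H", 1.008)  # create an object (an "instance") from the class
oxygen = Atom("O", 15.999)   # a second, independent object

print(hydrogen.describe())
print(oxygen.symbol)
```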



## Building a molecule object

We will work our way up to defining a `Molecule` class to allow us to represent the data
associated with different molecules as individual objects.

First, let's start by defining a function that counts the number of atoms in a molecule.
To represent the molecule we will use a dictionary, where the `keys` are the chemical
symbols and the `values` are the number of atoms for that species.

For instance, caffeine $C_8H_{10}N_4O_2$ would be represented as:

In [None]:
caffeine_formula = {"C": 8, "H": 10, "N": 4, "O": 2}

In [None]:
def get_number_of_atoms(chemical_formula_dict: dict) -> int:
    """
    Calculates the total number of atoms from the dictionary representation of the
    chemical formula.

    For instance, caffeine is represented by: {"C": 8, "H": 10, "N": 4, "O": 2}

    :param chemical_formula_dict: dict, where keys are the atomic symbols and values are
        the number of atoms of that type.

    :return: int, the total number of atoms in the chemical formula.
    """
    total_number_of_atoms = 0
    for symbol, count in chemical_formula_dict.items():
        # Check the keys and values of the input
        if not isinstance(symbol, str):
            raise TypeError(
                f"Invalid type {type(symbol)}: atomic symbols should be passed as strings."
            )
        if not isinstance(count, int):
            raise TypeError(
                f"Invalid type {type(count)}: atomic counts should be passed as integers."
            )
        if count < 0:
            raise ValueError("Atomic counts should be non-negative integers.")

        # Accumulate the total number of atoms
        total_number_of_atoms += count
        
    return total_number_of_atoms

In [None]:
get_number_of_atoms(caffeine_formula)

Now let's define another function. Write a function that calculates the molecular mass
of the molecule. Use the dictionary `atomic_masses` defined in the function below to
access the atomic mass (in atomic mass units) of each element (up to argon). Make sure to
fill in the atomic mass of carbon!

In [None]:
def calculate_molecular_mass(chemical_formula_dict: dict) -> float:
    """
    Calculates the molecular mass of a chemical formula.

    :param chemical_formula_dict: dict, where keys are the atomic symbols and values are
        the number of atoms of that type.

    :return: float, the molecular mass of the chemical formula.
    """
    # Dictionary of {symbol: atomic mass} up to Argon.
    atomic_masses = {
        "H": 1.008,
        "He": 4.002602,
        "Li": 6.94,
        "Be": 9.0121831,
        "B": 10.81,
        "C": 12.011,
        "N": 14.007,
        "O": 15.999,
        "F": 18.998403163,
        "Ne": 20.1797,
        "Na": 22.98976928,
        "Mg": 24.305,
        "Al": 26.9815385,
        "Si": 28.085,
        "P": 30.973761998,
        "S": 32.06,
        "Cl": 35.45,
        "Ar": 39.948,
    }
    molecular_mass = 0.0
    for symbol, count in chemical_formula_dict.items():
        # Look up the mass in the `atomic_masses` dictionary defined above
        molecular_mass += atomic_masses[symbol] * count

    return molecular_mass
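
If the formula contains an element that is missing from the dictionary, the lookup raises a bare `KeyError`. The sketch below shows one way to give a more descriptive message, in the spirit of the input checks from earlier. It uses a small subset of the masses and a made-up function name, `molecular_mass_sketch`, so that it can be run on its own.

```python
# Small illustrative subset of the atomic masses listed above
masses_subset = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def molecular_mass_sketch(formula: dict) -> float:
    """Sums (atomic mass * count), with a descriptive error for unknown symbols."""
    total = 0.0
    for symbol, count in formula.items():
        if symbol not in masses_subset:
            raise KeyError(f"No atomic mass available for element '{symbol}'")
        total += masses_subset[symbol] * count
    return total


caffeine_mass = molecular_mass_sketch({"C": 8, "H": 10, "N": 4, "O": 2})
print(f"Molecular mass of caffeine: {caffeine_mass:.3f}")
```

For caffeine this gives a value of about 194.19, which matches the commonly quoted molar mass.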

## Introduction to Chemical Classes

Let's consider a class `Molecule` that represents a chemical molecule. A molecule can have
attributes like its chemical formula, molecular weight, and a list of atoms it consists
of. Methods of this class might include functionalities to calculate the molecular
weight, display its structure, or simulate its behavior in a reaction.

### Why use classes?

* Representation: classes provide a clear and intuitive way to represent and manipulate
  chemical entities in your code.
* Reusability: once defined, you can create many instances of a chemical class, each
  representing a different molecule or chemical entity, without rewriting the code.
* Extendibility: classes can be extended to create more specific types of molecules,
  like organic molecules or polymers, inheriting the basic properties while adding
  unique features.
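
The last point, extending a class, can be sketched as follows. Both classes here are hypothetical and deliberately minimal: `BasicMolecule` is a stand-in base class, and `Polymer` inherits from it while adding a repeat count.

```python
class BasicMolecule:
    """Hypothetical minimal base class, used only for this sketch."""

    def __init__(self, name: str, formula: dict):
        self.name = name
        self.formula = formula

    def number_of_atoms(self) -> int:
        return sum(self.formula.values())


class Polymer(BasicMolecule):
    """Extends the base class with a repeat count for the monomer unit."""

    def __init__(self, name: str, formula: dict, n_repeats: int):
        super().__init__(name, formula)  # reuse the base-class initializer
        self.n_repeats = n_repeats

    def number_of_atoms(self) -> int:
        # Simplification: ignores end groups and just repeats the monomer unit
        return super().number_of_atoms() * self.n_repeats


ethene = BasicMolecule("Ethene", {"C": 2, "H": 4})
polyethene = Polymer("Polyethene", {"C": 2, "H": 4}, n_repeats=100)

print(ethene.number_of_atoms())
print(polyethene.number_of_atoms())
```

`Polymer` gets `name` and `formula` for free from the base class, and only has to define what is different about it.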


### The interplay between functions and classes

Functions and classes are related: a method is simply a function defined inside a class,
and both encapsulate functionality. They differ in scope. A function performs a specific
task and can be called on its own, while a class creates objects that bundle their own
attributes and methods, enabling more complex structures and behaviors. The two work
together: functions can act on objects created from classes, and methods can call
functions to implement part of their behavior.

In essence, functions let you execute tasks and manipulate data, while classes let you
model complex entities by keeping data and functionality together.
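
As a minimal sketch of this relationship, the example below defines a standalone function
and a method that reuses it. The names `count_atoms` and `TinyMolecule` are hypothetical
and exist only for this illustration; the class syntax itself is explained in the next
subsection.

```python
def count_atoms(chemical_formula_dict: dict) -> int:
    """Standalone function: works on any formula dictionary."""
    return sum(chemical_formula_dict.values())


class TinyMolecule:
    """Hypothetical class used only to contrast functions and methods."""

    def __init__(self, chemical_formula_dict: dict):
        self.chemical_formula_dict = chemical_formula_dict

    def get_number_of_atoms(self) -> int:
        # A method is a function defined inside a class; here it reuses
        # the standalone function to do the actual work
        return count_atoms(self.chemical_formula_dict)


water_formula = {"H": 2, "O": 1}
water = TinyMolecule(water_formula)

print(count_atoms(water_formula))  # function called on its own
print(water.get_number_of_atoms())  # method called on an object
```

Both calls give the same number; the difference is that the method reads its data from
the object, so nothing has to be passed in.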


### Chemical Example: defining and using a `Molecule` class


So, a class is a convenient way to build a single object that stores associated data. In
our case, we can build a class called `Molecule` that calculates properties such as the
number of atoms and molecular mass and stores them.

Here's an example of defining a `Molecule` class and using it to create instances
representing different molecules:

In [None]:
class Molecule:
    """
    A class to represent some useful information about a molecule.
    """
    def __init__(self, name: str, chemical_formula_dict: dict):
        """
        Initializes a Molecule object with a name and a chemical formula.
        """
        self.name = name
        self.chemical_formula_dict = chemical_formula_dict
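        # The method is named differently from the attribute so that the attribute
        # does not shadow (hide) the method on the instance
        self.chemical_formula_str = self.get_chemical_formula_str()

    def __repr__(self):
        return f"Molecule(name='{self.name}', formula={self.chemical_formula_str})"

    def get_chemical_formula_str(self) -> str: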
        """
        Returns a string representation of the chemical formula.
        """
        formula_str = ""
        for symbol, count in self.chemical_formula_dict.items():
            formula_str += f"{symbol}{count}"
        return formula_str

And how it is used:

In [None]:
caffeine_formula = {"C": 8, "H": 10, "N": 4, "O": 2}

# "Instantiate" a Molecule object
caffeine_molecule = Molecule("Caffeine", caffeine_formula)

The `__init__` method is one of Python's special methods: it is the initializer for a
class. It lets us initialize (i.e., specify initial values for) new objects created from
the class. Think of a class as a blueprint for creating objects; the `__init__` method
sets up each new object with its own data.

### Key Points about `__init__`:

* Initialization, not creation: the `__init__` method is called automatically every time
  a new object of a class is created. Note that `__init__` does not actually create the
  object; rather, it is called after the object has been created to initialize its
  attributes.

* The `self` parameter: the first parameter of `__init__` is, by convention, always named
  `self`. It is a reference to the current instance of the class, which allows you to set
  attributes on the object when it is created.

* Setting attributes: inside `__init__`, you can define the attributes that every object
  created from the class should have and set them to specific initial values.
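
The points above can be seen in a small sketch. The `Atom` class here is hypothetical and
is used only to show when `__init__` runs and what `self` refers to.

```python
class Atom:
    """Hypothetical class used only to illustrate __init__."""

    def __init__(self, symbol: str, mass: float):
        # By the time this runs the object already exists; `self` refers to it
        print(f"Initializing an Atom with symbol {symbol}")
        self.symbol = symbol
        self.mass = mass


# __init__ is never called by name; creating the objects triggers it
hydrogen = Atom("H", 1.008)
oxygen = Atom("O", 15.999)

# Each instance holds its own attribute values
print(hydrogen.symbol, hydrogen.mass)
print(oxygen.symbol, oxygen.mass)
```

The "Initializing..." message appears once per object, which shows that `__init__` runs
automatically each time the class is instantiated.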


In the `Molecule` class above, `__repr__` is another special method. It tells Python how
to represent the object as text, for example when it is printed. In our case, it is
useful to see the name and formula of the molecule.

In [None]:
print(caffeine_molecule)
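
To see what `__repr__` buys us, the sketch below compares two hypothetical classes,
`WithoutRepr` and `WithRepr`, that differ only in whether they define it. Neither class
is part of the exercise.

```python
class WithoutRepr:
    """Hypothetical class that relies on Python's default representation."""

    def __init__(self, name: str):
        self.name = name


class WithRepr:
    """Hypothetical class that defines its own representation."""

    def __init__(self, name: str):
        self.name = name

    def __repr__(self) -> str:
        return f"WithRepr(name='{self.name}')"


print(WithoutRepr("Water"))  # default: class name plus a memory address
print(WithRepr("Water"))  # custom: WithRepr(name='Water')
print([WithRepr("Water"), WithRepr("Ammonia")])  # lists use __repr__ for their items
```

`print()` actually looks for a `__str__` method first and falls back to `__repr__` when
none is defined, which is why defining `__repr__` alone is enough here.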

Now, let's take our useful ***functions*** from before and add them as ***class
methods***. Modify the class below in the following ways:

1. Copy the functions `get_number_of_atoms()`, `get_symbols()`, and
   `calculate_molecular_mass()` below `__repr__()` in the class below. Make sure the
   indentation is valid. The names of these methods can stay as is.

2. Add the `self` argument as the first argument in each of the new class methods, as is
   the case for `__init__` and `__repr__`. Remove the argument `chemical_formula_dict`
   and instead, within the body of the methods, use `self.chemical_formula_dict`. This
   accesses the `chemical_formula_dict` attribute of the object, without requiring it as
   input to the methods.

3. In the initializer `__init__`, set the new attributes of the class for the number of
   atoms and molecular mass.
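
As an illustration of steps 1 and 2, here is a sketch of the conversion for
`get_symbols()`. The function body shown is an assumption (it simply returns the
dictionary keys as a list) and may differ from the version written earlier; the class
`FormulaHolder` is hypothetical and only stands in for `Molecule`.

```python
# Before: a standalone function that needs the formula passed in
def get_symbols(chemical_formula_dict: dict) -> list:
    # Assumed implementation: return the atomic symbols as a list
    return list(chemical_formula_dict.keys())


# After: the same logic as a method of a (hypothetical) class
class FormulaHolder:
    def __init__(self, chemical_formula_dict: dict):
        self.chemical_formula_dict = chemical_formula_dict

    def get_symbols(self) -> list:
        # `self` replaces the argument; the data comes from the attribute
        return list(self.chemical_formula_dict.keys())


example_formula = {"C": 8, "H": 10, "N": 4, "O": 2}
print(get_symbols(example_formula))
print(FormulaHolder(example_formula).get_symbols())
```

The body of the method is unchanged apart from reading `self.chemical_formula_dict`
instead of the function argument.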

In [None]:
class Molecule:
    """
    A class to represent some useful information about a molecule.
    """
    def __init__(self, name: str, chemical_formula_dict: dict):
        """
        Initializes a Molecule object with a name and a chemical formula.
        """
        self.name = name
        self.chemical_formula_dict = chemical_formula_dict
        # The method is named differently from the attribute so that the attribute
        # does not shadow (hide) the method on the instance
        self.chemical_formula_str = self.get_chemical_formula_str()
        # TODO 3: call the class methods to set the following class attributes
        self.number_of_atoms = self.get_number_of_atoms()
        self.molecular_mass = self.calculate_molecular_mass()

    def __repr__(self):
        return f"Molecule(name='{self.name}', formula={self.chemical_formula_str})"

    def get_chemical_formula_str(self) -> str:
        """
        Returns a string representation of the chemical formula.
        """
        formula_str = ""
        for symbol, count in self.chemical_formula_dict.items():
            formula_str += f"{symbol}{count}"
        return formula_str

    def get_number_of_atoms(self) -> int:
        """
        Calculates the total number of atoms from the dictionary representation of the
        chemical formula.

        For instance, caffeine is represented by: {"C": 8, "H": 10, "N": 4, "O": 2}

        The formula is read from the `chemical_formula_dict` attribute, where keys are
        the atomic symbols and values are the number of atoms of that type.

        :return: int, the total number of atoms in the chemical formula.
        """
        total_number_of_atoms = 0
        for symbol, count in self.chemical_formula_dict.items():
            # Check the keys and values of the input
            if not isinstance(symbol, str):
                raise TypeError(
                    f"Invalid type {type(symbol)}: atomic symbols should be passed as strings."
                )
            if not isinstance(count, int):
                raise TypeError(
                    f"Invalid type {type(count)}: atomic counts should be passed as integers."
                )
            if count < 0:
                raise ValueError("Atomic counts should be non-negative integers.")

            # Accumulate the total number of atoms
            total_number_of_atoms += count

        return total_number_of_atoms

    def calculate_molecular_mass(self) -> float:
        """
        Calculates the molecular mass of a chemical formula.
        """
        # Dictionary of {symbol: atomic mass} up to Argon
        atomic_masses = {
            "H": 1.008,
            "He": 4.002602,
            "Li": 6.94,
            "Be": 9.0121831,
            "B": 10.81,
            "C": 12.011,
            "N": 14.007,
            "O": 15.999,
            "F": 18.998403163,
            "Ne": 20.1797,
            "Na": 22.98976928,
            "Mg": 24.305,
            "Al": 26.9815385,
            "Si": 28.085,
            "P": 30.973761998,
            "S": 32.06,
            "Cl": 35.45,
            "Ar": 39.948,
        }
        molecular_mass = 0
        for symbol, count in self.chemical_formula_dict.items():
            if symbol not in atomic_masses:
                raise ValueError(
                    f"Invalid atomic symbol: {symbol}. Currently can only"
                    " calculate the molecular mass for molecules containing"
                    " atoms up to Argon (Ar)."
                )
            molecular_mass += atomic_masses[symbol] * count

        return molecular_mass

In [None]:
caffeine_molecule = Molecule("Caffeine", {"C": 8, "H": 10, "N": 4, "O": 2})

print(f"Name: {caffeine_molecule.name}")
print(f"Chemical formula: {caffeine_molecule.chemical_formula_str}")
print(f"Number of atoms: {caffeine_molecule.number_of_atoms}")
print(f"Molecular mass: {caffeine_molecule.molecular_mass}")
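
Because `__init__` calls `calculate_molecular_mass()`, a formula containing an
unsupported element raises an error as soon as the object is created. The sketch below
shows this with a trimmed-down, hypothetical stand-in called `MiniMolecule`, which knows
only three elements to keep the example short.

```python
class MiniMolecule:
    """Hypothetical, trimmed-down stand-in for the Molecule class above."""

    def __init__(self, name: str, chemical_formula_dict: dict):
        self.name = name
        self.chemical_formula_dict = chemical_formula_dict
        # Computed at construction time, so bad input fails immediately
        self.molecular_mass = self.calculate_molecular_mass()

    def calculate_molecular_mass(self) -> float:
        # Only a few elements, to keep the sketch short
        atomic_masses = {"H": 1.008, "C": 12.011, "O": 15.999}
        molecular_mass = 0.0
        for symbol, count in self.chemical_formula_dict.items():
            if symbol not in atomic_masses:
                raise ValueError(f"Invalid atomic symbol: {symbol}.")
            molecular_mass += atomic_masses[symbol] * count
        return molecular_mass


try:
    MiniMolecule("Iron oxide", {"Fe": 2, "O": 3})
except ValueError as error:
    print(f"Could not create molecule: {error}")
```

Catching the `ValueError` lets the program report the problem and carry on, rather than
stopping with a traceback.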

Defined below are the names and formulae of 10 common molecules. Within the body of the
`for` loop, write code to create a `Molecule` object and append it to the list.

In [None]:
molecule_names_and_formula = [
    ("Caffeine", {"C": 8, "H": 10, "N": 4, "O": 2}),
    ("Water", {"H": 2, "O": 1}),
    ("Carbon Dioxide", {"C": 1, "O": 2}),
    ("Glucose", {"C": 6, "H": 12, "O": 6}),
    ("Ethanol", {"C": 2, "H": 6, "O": 1}),
    ("Acetic Acid", {"C": 2, "H": 4, "O": 2}),
    ("Ammonia", {"N": 1, "H": 3}),
    ("Methane", {"C": 1, "H": 4}),
    ("Hydrochloric Acid", {"H": 1, "Cl": 1}),
    ("Nitrous Oxide", {"N": 2, "O": 1}),
]

molecules = []
for name, formula in molecule_names_and_formula:
    molecule_object = Molecule(name, formula)
    molecules.append(molecule_object)

In [None]:
# And finally print the information for each molecule
for molecule in molecules:
    print(f"Name: {molecule.name}")
    print(f"Chemical formula: {molecule.chemical_formula_str}")
    print(f"Number of atoms: {molecule.number_of_atoms}")
    print(f"Molecular mass: {molecule.molecular_mass}\n")

And that's a wrap!

Next time we'll be using what we saw today to learn:

* how to organise data into `DataFrames` with `pandas`
* how to speed up code with array operations in `numpy`
* and how to visualize data with `matplotlib`!