# Homework 2.2 - Using Python (.py) files to run code

### The decision to use notebooks or .py files

Jupyter Notebooks and `.py` files serve different purposes and have distinct advantages:

* **Jupyter Notebooks** are interactive and allow for a mix of code, text, and multimedia. They are excellent for data exploration, visualisation, and sharing results. Each cell in a notebook can be run independently, and the output is displayed immediately beneath the cell. This makes it easy to iterate on specific parts of your code without rerunning the entire script. Notebooks also support markdown, enabling you to create rich, well-documented reports that combine your code, findings, and explanations.

* **`.py` files**, by contrast, are plain-text files that contain only Python code. They are typically used for building larger applications, libraries, or scripts. They are better suited to production use and version control because they are plain text and diff cleanly. Unlike notebooks, `.py` files don't display inline outputs or mix code with extensive markdown text.

While notebooks are great for interactive, exploratory analysis and teaching, `.py` files are better suited for building reusable code, larger projects, and production-ready software.

### How to use `.py` files

When using `.py` files, it's important to adhere to certain best practices to ensure your code is clean, efficient, and easy to understand.

**Imports**: Always import packages and modules at the beginning of your file. This makes it easy to see what dependencies a module has. Follow this order: standard library imports, third-party imports, then local application or library-specific imports. Each group should have its own block, separated from the next by a blank line.
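
As a rough illustration of that grouping, here is a minimal sketch. It assumes pandas is installed, and the local module named in the final comment is hypothetical, so it is left commented out.

```python
# Standard library imports
import math
from pathlib import Path

# Third-party imports (assumes pandas is installed)
import pandas as pd

# Local application imports would come last, for example:
# from my_project import helpers  # hypothetical module, for illustration only

# A small use of each import so that none of them is left unused.
output_path = Path("circle_areas.csv")  # hypothetical file name, never written here
radius = 2.0
areas = pd.DataFrame({"radius": [radius], "area": [math.pi * radius ** 2]})
print(f"Would save {len(areas)} row(s) to {output_path}")
```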

**Docstrings**: Docstrings document functions, methods, and classes. They are enclosed in triple quotes (`"""`) and placed immediately after the `def` or `class` line. A good docstring should explain what the function does, its inputs, its outputs, and any exceptions it raises.
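
A docstring covering those four points might look like the sketch below. The function `petal_area` is made up for illustration, and the `Args`/`Returns`/`Raises` layout is one common convention rather than a requirement.

```python
def petal_area(length_cm, width_cm):
    """Approximate the area of a petal by treating it as a rectangle.

    Args:
        length_cm: Petal length in centimetres.
        width_cm: Petal width in centimetres.

    Returns:
        The approximate petal area in square centimetres.

    Raises:
        ValueError: If either measurement is negative.
    """
    if length_cm < 0 or width_cm < 0:
        raise ValueError("Measurements must be non-negative.")
    return length_cm * width_cm


# The docstring is stored on the function and can be read back at runtime.
print(petal_area.__doc__)
```

Calling `help(petal_area)` in an interactive session shows the same text.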

**Code comments**: Comments are an integral part of code documentation and are used to explain the purpose and functionality of sections of code. In Python, a comment starts with the `#` symbol and runs to the end of the line. It can sit on its own line or follow a statement on the same line.
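
The short sketch below shows both placements. The values are a handful of sepal lengths used purely for illustration.

```python
# A few sepal lengths in centimetres (illustrative values, not the full dataset).
sepal_lengths = [5.1, 4.9, 4.7]

# A full-line comment works best when it explains why the code does something:
# rounding keeps the printed report tidy while the raw values stay untouched.
mean_length = round(sum(sepal_lengths) / len(sepal_lengths), 2)

print(mean_length)  # An inline comment follows a statement; keep it short.
```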

**Running code from the command line**: To run a `.py` file from the command line, navigate to the directory containing the file and run `python filename.py`. On some systems the interpreter is invoked as `python3` instead.
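
Scripts meant to be run this way often put their top-level logic behind an `if __name__ == "__main__":` guard, so the same file can also be imported without side effects. The sketch below assumes a hypothetical file called `greet.py`; the file name and both function names are made up for illustration.

```python
"""Hypothetical script, assumed to be saved as greet.py."""
import sys


def build_greeting(name):
    """Return a greeting for the given name."""
    return f"Hello, {name}!"


def main(argv):
    """Greet the first command-line argument, or 'world' if none is given."""
    name = argv[1] if len(argv) > 1 else "world"
    print(build_greeting(name))


# This block runs only when the file is executed directly,
# not when it is imported from another module.
if __name__ == "__main__":
    main(sys.argv)
```

Under those assumptions, `python greet.py Iris` would print `Hello, Iris!`.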

Remember, consistency and readability are key. Following these best practices will make your code easier to read, maintain, and debug. It's also recommended to follow the PEP 8 style guide for Python code.

### Homework challenge

This week's homework will lay the foundations for the capstone project in week 3. One of the most common datasets in the machine learning space is the Iris dataset. The Iris dataset is a collection of 150 samples from three species of iris flowers (setosa, versicolor, and virginica), with 50 samples per species. Each sample records the length and width of the sepal and petal of a flower in centimeters. The dataset is widely used as an example for multivariate analysis and machine learning techniques.

Your task is to perform some exploratory data analysis on the Iris dataset, perhaps exploring the relationships between the features and creating relevant plots to help tell the story! How you choose to interrogate the data is up to you; use the previous notebook for ideas on the different approaches you could take.

The following code cell will create a new `.py` file with the base code needed to load the dataset into a pandas DataFrame. Feel free to run it, create your own file, or continue in a notebook, whichever you feel most comfortable with. Happy analysing!


In [1]:
%%writefile iris_eda.py
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Read the Iris dataset from a URL into a pandas DataFrame
DATA_URL = 'https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/0e7a9b0a5d22642a06d3d5b9bcbad9890c8ee534/iris.csv'
raw_data = pd.read_csv(DATA_URL)

# Print the first 5 rows of data
print(raw_data.head())


Overwriting iris_eda.py


In [8]:
!python iris_eda.py

   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
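
Once the data loads, a first pass at the analysis might look something like the sketch below. It is only one possible starting point, not a model answer. The helper names `summarise_by_species` and `feature_correlations` are made up for illustration, and the sample DataFrame simply re-types the five rows printed above so the example runs without a network connection.

```python
import pandas as pd


def summarise_by_species(data):
    """Return the mean of each numeric feature for every species."""
    return data.groupby("species").mean(numeric_only=True)


def feature_correlations(data):
    """Return pairwise Pearson correlations between the numeric features."""
    return data.select_dtypes("number").corr()


# The five rows shown in the output above, re-typed by hand.
# With the full dataset you would pass in raw_data instead.
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6],
    "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4],
    "petal_width": [0.2, 0.2, 0.2, 0.2, 0.2],
    "species": ["setosa"] * 5,
})

print(summarise_by_species(sample))
print(feature_correlations(sample))
```

Because this sample contains a single species and a constant `petal_width`, the correlations involving that column come out as `NaN`. The full 150-row dataset gives a more informative picture, and scatter plots with matplotlib would be a natural next step.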
