# Introduction to Python for Data Science

## Agenda for Week 3

### Hands-on Workshop: Python for Data Science

Goal: basic Python programming skills with a focus on data analysis.

1. Introduction to the Python programming language
1. Basic Python syntax and data structures (lists, tuples, dictionaries)
1. Introduction to Python libraries for data science (Pandas, NumPy)
1. Reading data into Python and performing basic data cleaning
1. Simple data analysis using Python

## Weekly Meeting Agenda

Each meeting can have its own structure depending on the nature of the session, but a general framework helps keep meetings well-organized and efficient. Below is an example of a simple structure for a one-hour meeting:

1. **Introduction** (5-10 minutes):
   - Welcome participants.
   - Share the agenda for the meeting.
   - Recap the last meeting (if applicable) and note any follow-ups.

2. **Main Agenda** (40-45 minutes):
   - This is where the main activities of the meeting will take place.
   - For a **workshop** or **lecture**, this will include the main presentation and demonstration.
   - For a **discussion**, this may involve presenting the topic and then facilitating a group discussion.
   - For a **hackathon** or **project showcase**, this would include the actual work or presentations.

3. **Q&A/Discussion** (5-10 minutes):
   - Reserve some time for attendees to ask questions or discuss the day's topic further.
   - Encourage participation and interaction.

4. **Conclusion and Looking Ahead** (2-3 minutes):
   - Wrap up the meeting and summarize key points.
   - Briefly mention what the next meeting will entail.
   - Thank everyone for their participation.

This is a very flexible structure and can be adjusted to better suit the type and purpose of each meeting.
For instance, a hands-on workshop might require more time for the main agenda, while a planning meeting might have more time allocated to discussion.
The key is to plan and communicate the schedule in advance so that attendees know what to expect and can prepare accordingly.


## Part 1: Python Basics

### Variables and Data Types

In Python, we can store information in variables. There are several types of data we can store, including integers, floating point numbers, strings, and Booleans.


In [1]:
# Integer
x = 10
print(type(x))

# Float
y = 10.0
print(type(y))

# String
z = "Hello, World!"
print(type(z))

# Boolean
a = True
print(type(a))


<class 'int'>
<class 'float'>
<class 'str'>
<class 'bool'>
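
Values often arrive as one type and are needed as another, for example numbers read from a text file. The sketch below shows conversion with the built-in `int()`, `float()`, and `str()` constructors. The variable names are made up for illustration.

```python
# Convert between basic types with the built-in constructors
count_text = "10"            # a string that happens to contain digits
count = int(count_text)      # str -> int
ratio = float(count)         # int -> float
label = str(count)           # int -> str

print(count + 5)             # arithmetic works on the integer
print(ratio)
print(label + " items")      # concatenation works on the string

# isinstance() checks a value's type and returns a Boolean
print(isinstance(count, int))
```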


#### Arithmetic Operations

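Python supports the basic arithmetic operations: addition, subtraction, multiplication, division, and exponentiation. Note that `/` always returns a float, even when the result is a whole number.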

In [2]:
# Addition
print(5 + 5)

# Subtraction
print(5 - 2)

# Multiplication
print(3 * 3)

# Division
print(10 / 2)

# Exponentiation
print(4 ** 2)

10
3
9
5.0
16
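
Two related operators are not shown above: floor division (`//`) and modulo (`%`). This is a small sketch of how they differ from `/`; the variable names are illustrative only.

```python
# True division always returns a float
true_quotient = 7 / 2

# Floor division rounds down to the nearest whole number
floor_quotient = 7 // 2

# Modulo returns the remainder of the division
remainder = 7 % 2

print(true_quotient, floor_quotient, remainder)

# divmod() returns the floor quotient and the remainder together
quotient, rest = divmod(7, 2)
print(quotient, rest)
```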


#### Logical Operations

Python also supports logical operations, which are often used in conditional statements.

In [3]:
# And operation
print(True and False)

# Or operation
print(True or False)

# Not operation
print(not True)

False
True
False
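
In practice, logical operators usually combine the results of comparisons rather than literal `True` and `False` values. A minimal sketch, using made-up variables:

```python
# Hypothetical values for illustration
age = 25
is_smoker = False

# Comparisons produce Booleans
is_adult = age >= 18

# Combine Booleans with and / not
low_risk = is_adult and not is_smoker
print(low_risk)

# Python also allows chained comparisons
working_age = 18 <= age < 65
print(working_age)
```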


#### Conditional Statements

Conditional statements are used to perform different computations or actions depending on whether a condition evaluates to true or false.

In [4]:
# Define a variable
x = 10

# If statement
if x > 0:
    print("x is positive")

# If-else statement
if x % 2 == 0:
    print("x is even")
else:
    print("x is odd")

# If-elif-else statement
if x < 0:
    print("x is negative")
elif x == 0:
    print("x is zero")
else:
    print("x is positive")

x is positive
x is even
x is positive


#### Loops

Loops are used to repeatedly execute a block of code.


In [5]:
# For loop
for i in range(5):
    print(i)

# While loop
i = 0
while i < 5:
    print(i)
    i += 1

# Loop through a list
fruits = ["apple", "banana", "cherry"]
for fruit in fruits:
    print(fruit)

0
1
2
3
4
0
1
2
3
4
apple
banana
cherry
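
Two common variations on the list loop are sketched below: `enumerate()`, which yields each item with its index, and a list comprehension, which builds a new list in one expression.

```python
fruits = ["apple", "banana", "cherry"]

# enumerate() pairs each item with its position
for index, fruit in enumerate(fruits):
    print(index, fruit)

# A list comprehension builds a new list from an existing one
lengths = [len(fruit) for fruit in fruits]
print(lengths)
```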


#### Functions

Functions are reusable blocks of code that perform a specific task.

In [6]:
# Define a function
def greet(name):
    return "Hello, " + name

# Call the function
print(greet("World"))

# Function with multiple parameters
def power(base, exponent):
    return base ** exponent

# Call the function
print(power(2, 3))

Hello, World
8
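
Parameters can also have default values, so callers may leave them out. The sketch below is a variation on `power` under the hypothetical name `power_with_default`; it is one possible way to write it, not part of the original lab.

```python
def power_with_default(base, exponent=2):
    """Return base raised to exponent (squares the base by default)."""
    return base ** exponent

# The default exponent is used when none is given
squared = power_with_default(3)
print(squared)

# Arguments can be passed by position or by name
cubed = power_with_default(2, 3)
cubed_by_name = power_with_default(exponent=3, base=2)
print(cubed, cubed_by_name)
```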


## Part 2: Python Data Structures

Python has four basic built-in data structures: lists, tuples, dictionaries, and sets.

#### Lists

A list is a collection of items. It is ordered, changeable, and allows duplicate elements.

In [7]:
# Define a list
fruits = ["apple", "banana", "cherry"]
print(fruits)

# Access list items by index
print(fruits[0])

# Change the value of a list item
fruits[1] = "blueberry"
print(fruits)

# Add an item to the list
fruits.append("orange")
print(fruits)

['apple', 'banana', 'cherry']
apple
['apple', 'blueberry', 'cherry']
['apple', 'blueberry', 'cherry', 'orange']
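
A few more list operations come up constantly in data work: length, negative indexing, slicing, membership tests, and removal. A short sketch that starts from the final list above:

```python
fruits = ["apple", "blueberry", "cherry", "orange"]

# Number of items
print(len(fruits))

# Negative indices count from the end
last_fruit = fruits[-1]
print(last_fruit)

# Slicing returns a new list; the end index is excluded
middle = fruits[1:3]
print(middle)

# Membership test
print("cherry" in fruits)

# Remove an item by value
fruits.remove("orange")
print(fruits)
```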


#### Tuples

A tuple is similar to a list: it is ordered and allows duplicate elements. Unlike a list, it is unchangeable (immutable) once created.


In [8]:
# Define a tuple
fruits_tuple = ("apple", "banana", "cherry")
print(fruits_tuple)

# Access tuple items by index
print(fruits_tuple[0])

# Tuples are immutable: uncommenting the next line raises the TypeError shown below
# fruits_tuple[1] = "blueberry"

('apple', 'banana', 'cherry')
apple


TypeError: 'tuple' object does not support item assignment
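
To see the error without stopping the notebook, the assignment can be wrapped in `try`/`except`. The sketch below also shows tuple unpacking, a common use of tuples; the names `first`, `second`, and `third` are illustrative.

```python
fruits_tuple = ("apple", "banana", "cherry")

# Attempting item assignment raises TypeError; catch it to inspect the message
try:
    fruits_tuple[1] = "blueberry"
except TypeError as error:
    message = str(error)
    print("TypeError:", message)

# The tuple is unchanged
print(fruits_tuple)

# Unpacking assigns each item to its own variable
first, second, third = fruits_tuple
print(first, second, third)
```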

#### Dictionaries

A dictionary is a changeable collection of key-value pairs with unique keys. Since Python 3.7, dictionaries preserve insertion order.

In [9]:
# Define a dictionary
fruit_colors = {
    "apple": "red",
    "banana": "yellow",
    "cherry": "red"
}
print(fruit_colors)

# Access dictionary items by key
print(fruit_colors["apple"])

# Change the value of a dictionary item
fruit_colors["banana"] = "green"
print(fruit_colors)

{'apple': 'red', 'banana': 'yellow', 'cherry': 'red'}
red
{'apple': 'red', 'banana': 'green', 'cherry': 'red'}
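
Beyond lookup and update, dictionaries support adding keys, safe lookup with `.get()`, and iteration over key-value pairs. A brief sketch, reusing the original fruit colours as sample data:

```python
fruit_colors = {"apple": "red", "banana": "yellow", "cherry": "red"}

# Assigning to a new key adds an entry
fruit_colors["orange"] = "orange"

# .get() returns a fallback instead of raising KeyError for a missing key
grape_color = fruit_colors.get("grape", "unknown")
print(grape_color)

# .items() yields (key, value) pairs in insertion order
for fruit, color in fruit_colors.items():
    print(fruit, "is", color)

# 'in' tests for the presence of a key
print("apple" in fruit_colors)
```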


#### Sets

A set is an unordered collection of unique items.

In [10]:
# Define a set
fruits_set = {"apple", "banana", "cherry", "apple"}
print(fruits_set)  # Duplicates are removed


{'banana', 'apple', 'cherry'}
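
Sets also support the usual set algebra. In the sketch below, the two sets are made up for illustration and the results are sorted before printing, because set ordering is not guaranteed.

```python
basket_a = {"apple", "banana", "cherry"}
basket_b = {"cherry", "orange"}

# Union: items in either set
union = basket_a | basket_b
print(sorted(union))

# Intersection: items in both sets
common = basket_a & basket_b
print(sorted(common))

# Difference: items in basket_a but not in basket_b
only_a = basket_a - basket_b
print(sorted(only_a))
```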


## Part 3: Introduction to Pandas and NumPy

#### Creating arrays in NumPy

In [11]:
import numpy as np

# Create a one-dimensional array
a = np.array([1, 2, 3])
print(a)

# Create a two-dimensional array
b = np.array([[1, 2, 3], [4, 5, 6]])
print(b)

[1 2 3]
[[1 2 3]
 [4 5 6]]


#### Manipulating arrays in NumPy

In [12]:

# Change an element of the array
b[0, 0] = 10
print(b)

# Get the shape of the array
print(a.shape)
print(b.shape)


[[10  2  3]
 [ 4  5  6]]
(3,)
(2, 3)
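
NumPy arrays also support element-wise arithmetic and aggregation along an axis. The sketch below recreates the array `b` from the cell above so that it runs on its own.

```python
import numpy as np

# Same values as b after the edit above
b = np.array([[10, 2, 3], [4, 5, 6]])

# Arithmetic applies to every element at once
doubled = b * 2
print(doubled)

# Aggregate over the whole array
total = b.sum()
print(total)

# axis=0 aggregates down the rows (one result per column)
column_means = b.mean(axis=0)
print(column_means)

# axis=1 aggregates across the columns (one result per row)
row_maxima = b.max(axis=1)
print(row_maxima)
```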


#### Creating DataFrames in Pandas


In [13]:
import pandas as pd

# Create a DataFrame from a dictionary
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Occupation": ["Doctor", "Engineer", "Teacher"]
})
print(df)


      Name  Age Occupation
0    Alice   25     Doctor
1      Bob   30   Engineer
2  Charlie   35    Teacher


#### Manipulating DataFrames in Pandas

In [14]:
# Select a column
print(df["Name"])

# Add a new column
df["Salary"] = [100000, 120000, 90000]
print(df)

# Delete a column
df = df.drop("Age", axis=1)
print(df)

0      Alice
1        Bob
2    Charlie
Name: Name, dtype: object
      Name  Age Occupation  Salary
0    Alice   25     Doctor  100000
1      Bob   30   Engineer  120000
2  Charlie   35    Teacher   90000
      Name Occupation  Salary
0    Alice     Doctor  100000
1      Bob   Engineer  120000
2  Charlie    Teacher   90000
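
Rows can be selected with a Boolean condition and reordered with `sort_values()`. This sketch rebuilds the final DataFrame from the cell above so that it runs independently; the salary threshold is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Occupation": ["Doctor", "Engineer", "Teacher"],
    "Salary": [100000, 120000, 90000],
})

# A comparison on a column yields a Boolean Series that selects rows
high_earners = df[df["Salary"] >= 100000]
print(high_earners)

# Sort rows by a column, highest salary first
by_salary = df.sort_values("Salary", ascending=False)
print(by_salary)
```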


#### Reading data from CSV files


In [18]:
# Assumes a CSV file exists at ../data/data.csv (relative to the current working directory)
df = pd.read_csv("../data/data.csv")
print(df.head())

   age     sex     bmi  children smoker     region      charges
0   19  female  27.900         0    yes  southwest  16884.92400
1   18    male  33.770         1     no  southeast   1725.55230
2   28    male  33.000         3     no  southeast   4449.46200
3   33    male  22.705         0     no  northwest  21984.47061
4   32    male  28.880         0     no  northwest   3866.85520


**Please note**: this cell needs a CSV file to read. Replace `../data/data.csv` with the path to your own file before running it.
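
The agenda lists basic data cleaning, so here is a minimal sketch of three common steps: counting missing values, dropping duplicate rows, and filling gaps. The data is a small made-up sample read from an in-memory string, not the real `data.csv`. The cleaning choices, such as filling missing `bmi` with the column mean, are assumptions for illustration only.

```python
import io

import pandas as pd

# Hypothetical sample: one missing bmi, one duplicated row, inconsistent text
raw_csv = io.StringIO(
    "age,sex,bmi,charges\n"
    "19,female,27.9,16884.924\n"
    "18,male,,1725.5523\n"
    "28,MALE ,33.0,4449.462\n"
    "28,MALE ,33.0,4449.462\n"
)
sample = pd.read_csv(raw_csv)

# Count missing values per column
missing_counts = sample.isna().sum()
print(missing_counts)

# Drop exact duplicate rows (copy so later edits do not touch `sample`)
cleaned = sample.drop_duplicates().copy()

# Normalise text: strip whitespace and lower-case
cleaned["sex"] = cleaned["sex"].str.strip().str.lower()

# Fill missing bmi values with the column mean
cleaned["bmi"] = cleaned["bmi"].fillna(cleaned["bmi"].mean())
print(cleaned)
```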

#### Basic data exploration


In [19]:
# Get the shape of the DataFrame
print(df.shape)

# Get information about the DataFrame
print(df.info())

# Describe the DataFrame
print(df.describe())


(1338, 7)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1338 entries, 0 to 1337
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   age       1338 non-null   int64  
 1   sex       1338 non-null   object 
 2   bmi       1338 non-null   float64
 3   children  1338 non-null   int64  
 4   smoker    1338 non-null   object 
 5   region    1338 non-null   object 
 6   charges   1338 non-null   float64
dtypes: float64(2), int64(2), object(3)
memory usage: 73.3+ KB
None
               age          bmi     children       charges
count  1338.000000  1338.000000  1338.000000   1338.000000
mean     39.207025    30.663397     1.094918  13270.422265
std      14.049960     6.098187     1.205493  12110.011237
min      18.000000    15.960000     0.000000   1121.873900
25%      27.000000    26.296250     0.000000   4740.287150
50%      39.000000    30.400000     1.000000   9382.033000
75%      51.000000    34.693750     2.000000  16639.
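
A natural next step after `describe()` is to compare groups. The sketch below uses `groupby()` and `value_counts()` on a tiny made-up DataFrame that borrows the `smoker` and `charges` column names from the dataset above. The values are invented and say nothing about the real data.

```python
import pandas as pd

# Hypothetical values for illustration only
sample = pd.DataFrame({
    "smoker": ["yes", "no", "no", "yes"],
    "charges": [16000.0, 2000.0, 4000.0, 20000.0],
})

# Mean charges for each smoker category
mean_by_smoker = sample.groupby("smoker")["charges"].mean()
print(mean_by_smoker)

# Number of rows in each category
smoker_counts = sample["smoker"].value_counts()
print(smoker_counts)
```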

## Conclusion

In this lab, we covered the basics of Python: data types, arithmetic and logical operations, conditional statements, loops, and functions. We also explored Python's basic data structures and introduced the data science libraries Pandas and NumPy. Practice these concepts until you are comfortable with them, as they are the building blocks for more complex data science tasks.

Remember, the aim is to create a learning experience where students can get their hands dirty with code while also understanding the theory behind the actions they perform.