# Python for Data Engineering - Day 1

## 1. Introduction and Course Overview

- Welcome and Introductions
- Course Objectives and Structure
- Importance of Python in Data Engineering

## 2. Setting Up the Environment

### Installing Anaconda

Anaconda (includes Python) - [Download here](https://www.anaconda.com/download/success)

### Setting Up Visual Studio Code (VSCode)

Visual Studio Code (VSCode) - [Download here](https://code.visualstudio.com/Download)

### Introduction to Jupyter Notebooks

In [None]:
# Example: A simple Python code in Jupyter Notebook
print('Hello, Python for Data Engineering!')
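
After installation, it can help to confirm which Python interpreter the notebook is actually running. The cell below is a minimal sketch using only the standard library; the variable name `python_major` is chosen here for illustration.

```python
import sys

# Show the interpreter version and the path of the Python executable in use
print(sys.version)
print(sys.executable)

# Keep the major version in a variable so it can be checked in later cells
python_major = sys.version_info.major
print(python_major)
```

If the printed path does not point into your Anaconda installation, the notebook is probably using a different Python than the one you just installed.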


# Expressions, Variables, and Comments in Python

By understanding expressions, variables, and comments, you can write clear, understandable, and well-documented Python code. These foundational concepts will help you in building more complex programs in the future.


### 1. Expressions


An expression is a combination of values, variables, and operators that Python can evaluate to produce a new value. An expression can be as simple as a single value or more complex, involving multiple operations.


Expressions are building blocks of Python programs. They are used to compute values and perform operations.
Python evaluates expressions and returns a result, which can be stored in variables or used directly.

#### Examples of Expressions

In [20]:
# Simple Expression
2 + 3  # This is an expression that adds two numbers.

5

In [22]:
# Variable-Based Expression
x = 5
y = 3
result = x + y  # The expression x + y evaluates to 8, which is assigned to the variable result.

In [23]:
# Complex Expression
total = (x * 2) + (y / 3)  # This expression involves multiplication, addition, and division.
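
The cells above assign results without displaying them. As a small sketch, the cell below repeats the same values (`x = 5`, `y = 3`) and prints the result and its type, to show that `/` always produces a float. The names `floor_result`, `remainder`, and `no_parens` are added here for illustration.

```python
x = 5
y = 3

total = (x * 2) + (y / 3)   # 10 + 1.0
print(total)                # 11.0
print(type(total))          # <class 'float'>, because / always returns a float

floor_result = x // y       # Floor division keeps an int: 1
remainder = x % y           # Modulo gives the remainder: 2

# * and / bind more tightly than +, so the parentheses above are optional
no_parens = x * 2 + y / 3
print(no_parens == total)   # True
```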


### 2. Variables

A variable is a name that refers to a value stored in memory. Variables are used to store data that can be referenced and manipulated in a program.

- Variables act like containers for storing data values.
- In Python, you create a variable by assigning it a value using the = operator.
- The value of a variable can be changed later in the program.


#### Example of Variables

In [25]:
name = "Alice"  # A string variable
age = 25        # An integer variable
height = 5.7    # A float variable

In [26]:
x = 10          # Assigns 10 to the variable x
y = 20          # Assigns 20 to the variable y
total = x + y   # Adds x and y, stores the result in total
print(total)    # Outputs: 30

30


`x` and `y` are variables that store integers.
`total` is a variable that stores the result of the expression `x + y`. (The name `sum` is avoided here because it would shadow Python's built-in `sum()` function.)
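
The point that a variable's value can be changed later can be sketched as follows. The names `count`, `rows`, and `Rows` are illustrative only.

```python
count = 10
print(type(count))    # <class 'int'>

count = count + 5     # Reassignment: count now refers to 15
print(count)          # 15

count = "fifteen"     # The same name can later refer to a value of another type
print(type(count))    # <class 'str'>

# Variable names are case-sensitive: rows and Rows are two different variables
rows = 100
Rows = 200
print(rows == Rows)   # False
```

Reusing one name for values of different types is legal, but it usually makes code harder to follow, so it is best kept to a minimum.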


### 3. Comments

Comments are lines in your code that are not executed by Python. They are used to explain what the code does or to leave notes for other programmers (or your future self!).


#### Why Use Comments?
- Documentation: Helps explain the purpose of the code.
- Readability: Makes the code easier to read and understand.
- Debugging: Allows developers to "comment out" parts of the code to test functionality.


#### Best Practices for Comments
- Keep comments clear and concise.
- Use comments to explain why something is done, not what is being done (the code itself should be readable enough to explain the "what").
- Avoid over-commenting; too many comments can make code cluttered.




#### Types of Comments

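1. Single-Line Comments:

Created using the `#` symbol.
Everything after the `#` on that line is a comment and is ignored by Python.


2. Multi-Line Comments:

Python has no dedicated multi-line comment syntax. The usual approach is to start each line with `#`; triple-quoted strings (`'''` or `"""`) are also commonly used for this purpose.
Technically these are string literals, not comments: when one is not assigned to a variable, Python evaluates the string and discards it. When a triple-quoted string is the first statement in a module, function, or class, it becomes that object's docstring.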


In [29]:
# This is a single-line comment
x = 5  # This sets x to 5

In [30]:
# Multi-line comment
"""
This program calculates the area of a rectangle.
The width and height are defined as variables.
The area is then calculated and printed.
"""
width = 10
height = 5
area = width * height
print(area)


50
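
To illustrate the difference between a comment and a triple-quoted string, here is a small sketch. The function `rectangle_area` is a hypothetical helper written for this example; it wraps the same calculation as the cell above.

```python
def rectangle_area(width, height):
    """Return the area of a rectangle with the given width and height."""
    # A regular comment: Python ignores this line entirely
    return width * height


print(rectangle_area(10, 5))    # 50

# A triple-quoted string placed first in a function is stored as its docstring
print(rectangle_area.__doc__)
```

Unlike a `#` comment, the docstring stays available at runtime, which is what `help(rectangle_area)` displays.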



# Data Types in Python

In Python, variables are used to store data, and data types define the kind of data that can be stored in these variables. Understanding different data types is essential for effective programming in Python.

### 1. Integer (`int`)


An integer is a whole number that can be positive, negative, or zero; it has no fractional or decimal part.

Integers are used for counting or ordering items and for any arithmetic operations that require whole numbers.


##### Examples

In [1]:
age = 30       # Age is typically represented as a whole number
temperature = -5  # Negative integers are also valid
x = 5
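
A few common integer operations, as a minimal sketch. The names `big`, `quotient`, `remainder`, `true_div`, and `record_count` are illustrative.

```python
age = 30
print(type(age))          # <class 'int'>

# Python integers grow as needed, so large values do not overflow
big = 2 ** 100
print(big)                # 1267650600228229401496703205376

quotient = 7 // 2         # Floor division: 3
remainder = 7 % 2         # Remainder: 1
true_div = 7 / 2          # True division returns a float: 3.5

# Text read from a file often needs converting before arithmetic
record_count = int("42")
print(record_count + 1)   # 43
```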


### 2. Float (`float`)

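A float, or "floating-point number," is a number that has a decimal point. It can represent both positive and negative numbers and is used for values that are not whole numbers. Floats are stored in binary with limited precision, so results are often very close approximations rather than exact values.


Floats are used when fractional values are needed, such as in scientific calculations and measurements. For financial computations that must be exact, Python's `decimal.Decimal` type is usually a better fit.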


##### Examples

In [5]:
y = 3.14  # Float
pi = 3.14159   # A more precise value of Pi
height = 5.8   # Height in meters with a decimal part
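
Because floats are stored in binary, some decimal values cannot be represented exactly. The sketch below shows the effect and two standard-library ways to deal with it; the variable names are illustrative.

```python
import math
from decimal import Decimal

total = 0.1 + 0.2
print(total == 0.3)                # False: the stored result is not exactly 0.3
print(math.isclose(total, 0.3))    # True: compare floats with a tolerance instead

# Decimal does exact decimal arithmetic when built from strings
exact_total = Decimal("0.10") + Decimal("0.20")
print(exact_total)                 # 0.30
```

Comparing with `math.isclose` is usually enough for measurements, while `Decimal` is the safer choice for money.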


### 3. String (`str`)

A string is a sequence of characters enclosed in single (') or double (") quotes. Strings are used to represent text data.


Strings are used for handling text, such as names, addresses, or any other textual information. They are also used in user interfaces and input/output operations.

##### Examples

In [8]:
name = "Alice"  # String
greeting = "Hello, World!"   # A common introductory example
language = 'Python'          # Strings can also use single quotes

##### Common String Operations

In [10]:
# Concatenation: Join strings using +

full_name = "Alice" + " " + "Smith"  # Result: "Alice Smith"

In [11]:
# Repetition: Repeat strings using *

repeated = "ha" * 3  # Result: "hahaha"

In [12]:
# Indexing and Slicing: Access specific parts of a string

first_letter = name[0]  # Result: 'A'
substring = name[1:3]   # Result: 'li'
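
Beyond the operators above, strings come with built-in methods that are handy when cleaning text data. This is a minimal sketch; `raw_name` and the other variable names are made up for the example.

```python
raw_name = "  Alice Smith  "

clean_name = raw_name.strip()      # Remove leading and trailing whitespace
print(clean_name.lower())          # alice smith
print(len(clean_name))             # 11

parts = clean_name.split(" ")      # Split into a list of words
print(parts)                       # ['Alice', 'Smith']

# f-strings embed values directly in text
age = 25
message = f"{parts[0]} is {age} years old"
print(message)                     # Alice is 25 years old
```

Strings are immutable, so methods such as `strip()` and `lower()` return a new string and leave the original unchanged.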



### 4. Boolean (`bool`)

A Boolean is a data type that can only have one of two possible values: True or False. Booleans are used in conditional statements and to represent binary outcomes.


##### Examples

In [14]:
is_student = True  # Boolean
is_adult = False  # A person under 18 might not be considered an adult

##### Common Boolean Operations

In [15]:
# Logical `and`: Returns True if both operands are True
result = (x > 0) and (y < 10)  # Both conditions must be True

In [16]:
# Logical `or`: Returns True if at least one operand is True
result = (x < 0) or (y > 1)  # At least one condition must be True

In [18]:
# Logical `not`: Inverts the Boolean value
result = not is_student  # If is_student is True, result will be False
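
Comparisons produce Booleans, and other values can be tested for truthiness. The sketch below assumes `x = 5`, as in the integer examples; `in_range`, `rows`, and `has_first` are illustrative names.

```python
x = 5

in_range = 0 < x < 10      # Chained comparison, same as (0 < x) and (x < 10)
print(in_range)            # True
print(type(in_range))      # <class 'bool'>

# Empty values and zero count as False; most other values count as True
print(bool(""))            # False
print(bool("data"))        # True
print(bool(0))             # False
print(bool([]))            # False

# `and` short-circuits: rows[0] is never evaluated when the list is empty
rows = []
has_first = len(rows) > 0 and rows[0] is not None
print(has_first)           # False
```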


# Basic Data Structures: Lists, Tuples, Dictionaries, Sets

Python provides several built-in data types for storing collections of items. Understanding these data types will help you effectively organize and manipulate data in your programs.


- `List`: Ordered, mutable collection (useful for storing a sequence of items).
- `Tuple`: Ordered, immutable collection (useful for fixed data that shouldn't change).
- `Dictionary`: Mutable collection of key-value pairs that preserves insertion order in Python 3.7+ (useful for associating keys with values).
- `Set`: Unordered, mutable collection of unique items (useful for membership testing and eliminating duplicates).


Knowing these trade-offs makes it easier to choose the right structure for each task.

## 1. List

A list is an ordered, mutable (changeable) collection of items. Lists are defined using square brackets `[]`, and they can store items of any data type (integers, strings, floats, other lists, etc.).


- Lists are ordered, meaning the items have a defined order that will not change unless explicitly altered.
- Lists are mutable, so you can change, add, or remove items after the list is created.

#### Example
`fruits` is a list containing three string elements: `'apple'`, `'banana'`, and `'cherry'`.

In [32]:
fruits = ['apple', 'banana', 'cherry']  # List

#### Common Operations on Lists

In [None]:
# Accessing element
print(fruits[0])  # Output: apple

In [None]:
# Modifying element
fruits[1] = 'blueberry'  # Changes 'banana' to 'blueberry'

In [33]:
# Adding Element
fruits.append('orange')  # Adds 'orange' to the end of the list

In [None]:
# Remove Element
fruits.remove('cherry')  # Removes 'cherry' from the list
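
A few more list operations, sketched on a fresh copy of the list so the cell runs on its own. The name `lengths` is illustrative.

```python
fruits = ['apple', 'banana', 'cherry', 'orange']

print(len(fruits))     # 4
print(fruits[-1])      # orange (negative indexes count from the end)
print(fruits[1:3])     # ['banana', 'cherry'] (slicing returns a new list)

fruits.insert(1, 'kiwi')   # Insert at position 1, shifting later items right
print(fruits)              # ['apple', 'kiwi', 'banana', 'cherry', 'orange']

# A list comprehension builds a new list from an existing one
lengths = [len(fruit) for fruit in fruits]
print(lengths)             # [5, 4, 6, 6, 6]
```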



## 2. Tuple

A tuple is an ordered, immutable (unchangeable) collection of items. Tuples are defined using parentheses `()`.
Tuples suit read-only data that is meant to be traversed or accessed without modification. They are also somewhat more lightweight than lists, and a tuple of hashable items can be used as a dictionary key.


- Tuples are ordered, meaning the items have a defined order.
- Tuples are immutable, so once a tuple is created, you cannot change, add, or remove items.


#### Example
`vegetables` is a tuple containing three string elements: `'carrot'`, `'lettuce'`, and `'spinach'`.

In [34]:
vegetables = ('carrot', 'lettuce', 'spinach')  # Tuple
weekdays = ('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday')  # Tuple of fixed values


#### Common Operations on Tuples

In [37]:
# Accessing element
print(vegetables[0])  # Output: carrot

carrot


In [42]:
# Immutability
vegetables[1] = 'cabbage'  # Raises TypeError: tuples do not support item assignment
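
The assignment above stops the notebook with an error. The sketch below catches that error so the cell finishes, and shows two common tuple idioms; the names `first`, `second`, `third`, `single`, and `updated` are illustrative.

```python
vegetables = ('carrot', 'lettuce', 'spinach')

try:
    vegetables[1] = 'cabbage'
except TypeError as err:
    print(f"TypeError: {err}")

# Unpacking assigns each item to its own variable
first, second, third = vegetables
print(second)              # lettuce

# A one-item tuple needs a trailing comma
single = ('carrot',)
print(len(single))         # 1

# To "change" a tuple, build a new one from pieces of the old one
updated = vegetables[:1] + ('cabbage',) + vegetables[2:]
print(updated)             # ('carrot', 'cabbage', 'spinach')
```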



## 3. Dictionary

A dictionary is a mutable collection of key-value pairs. Dictionaries are defined using curly braces `{}` and use a colon `:` to separate keys from their values.

- Dictionaries preserve insertion order (guaranteed since Python 3.7), but items are accessed by key rather than by position.
- Dictionaries are mutable, so you can add, remove, or change key-value pairs.

#### Example
`person` is a dictionary with three key-value pairs:
- `'name'` is the key with the value `'Alice'`.
- `'age'` is the key with the value `25`.
- `'city'` is the key with the value `'New York'`.

In [43]:
person = {'name': 'Alice', 'age': 25, 'city': 'New York'}  # Dictionary

In [45]:
# Accessing Values:
print(person['name'])  # Output: Alice

Alice


In [46]:
# Modifying Values:
person['age'] = 26  # Changes the value associated with 'age' to 26

In [47]:
# Adding Key-Value Pairs:
person['email'] = 'alice@example.com'  # Adds a new key-value pair to the dictionary

In [None]:
# Removing Key-Value Pairs:
del person['city']  # Removes the key 'city' and its value
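
Looking up a key that does not exist with `[]` raises a `KeyError`. The sketch below shows safer lookups and iteration, using the dictionary as it stands after the cells above; the variable `city` is illustrative.

```python
person = {'name': 'Alice', 'age': 26, 'email': 'alice@example.com'}

# .get() returns a default instead of raising KeyError for a missing key
city = person.get('city', 'unknown')
print(city)                  # unknown

# `in` checks whether a key exists
print('age' in person)       # True

# .items() yields key-value pairs in insertion order
for key, value in person.items():
    print(key, value)

print(list(person.keys()))   # ['name', 'age', 'email']
```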


## 4. Set

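A set is an unordered, mutable collection of unique items. Sets are defined using curly braces with items inside, such as `{1, 2, 3}`, or the `set()` function. Note that empty braces `{}` create an empty dictionary, so an empty set must be written as `set()`.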

- Sets are unordered, meaning there is no guaranteed order of items.
- Sets do not allow duplicate elements; each item must be unique.
- Sets are mutable, so you can add or remove items.


#### Example
`unique_numbers` is a set containing five unique integer elements.

In [50]:
unique_numbers = {1, 2, 3, 4, 5}  # Set

In [51]:
# Adding Elements:
unique_numbers.add(6)  # Adds 6 to the set

In [52]:
# Removing Elements:
unique_numbers.remove(3)  # Removes 3 from the set

In [53]:
# Set Operations (like mathematical sets):
even_numbers = {2, 4, 6}
intersection = unique_numbers & even_numbers  # {2, 4, 6}
union = unique_numbers | even_numbers         # {1, 2, 4, 5, 6} (3 was removed above)
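
Two uses mentioned earlier, eliminating duplicates and membership testing, can be sketched as follows. The list `readings` and the other names are made up for the example.

```python
readings = [3, 1, 3, 2, 1]

# Converting a list to a set drops the duplicates
unique_readings = set(readings)
print(len(unique_readings))       # 3
print(sorted(unique_readings))    # [1, 2, 3]

unique_numbers = {1, 2, 4, 5, 6}
even_numbers = {2, 4, 6}

# Difference: items in the first set that are not in the second
odd_numbers = unique_numbers - even_numbers
print(sorted(odd_numbers))        # [1, 5]

print(2 in unique_numbers)        # True

# discard() removes an item if present and does nothing otherwise,
# whereas remove() raises KeyError for a missing item
unique_numbers.discard(99)

empty_set = set()                 # {} would create an empty dictionary
print(type(empty_set))            # <class 'set'>
```

`sorted()` is used for printing because a set does not guarantee any particular order.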
