### Overview of Python and Its Role in Big Data

Python is a high-level, open-source programming language widely used in Big Data analytics due to its simplicity, flexibility, and rich ecosystem of libraries.

#### Why Python is Popular in Big Data

1. Ease of Use
     Simple syntax and readability make Python easy to learn and use.
     Reduces development time compared to more verbose, compiled languages like Java or C++.

2. Extensive Library Ecosystem
    Python offers powerful libraries for handling large-scale data:

    NumPy – numerical computing

    Pandas – data manipulation and analysis

    PySpark – Python API for Apache Spark

    Dask – parallel computing for large datasets

    SciPy – scientific and statistical analysis

3. Integration with Big Data Frameworks

    Integrates with Hadoop (including HDFS), Spark, and Hive through dedicated Python APIs and client libraries.

    PySpark allows distributed data processing across clusters.

4. Support for Data Analytics and Machine Learning

    Widely used in data mining, predictive analytics, and AI.

    Libraries such as Scikit-learn, TensorFlow, and PyTorch support large-scale model development.

5. Scalability and Performance

    Although Python itself is not the fastest language, it scales efficiently by leveraging distributed systems and optimized backend engines written in C/C++.

6. Strong Community and Industry Adoption

    Backed by a large global community.

    Used by companies like Google, Netflix, Amazon, and Facebook for big data solutions.
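
To illustrate the point about optimized backends (item 5 above), here is a minimal sketch that assumes the standard CPython interpreter. It compares a hand-written Python loop with the built-in `sum`, which is implemented in C. The function name `python_loop_sum` and the list size are illustrative choices, and the timings will differ from machine to machine.

```python
import timeit


def python_loop_sum(values):
    """Add the values one at a time in interpreted Python code."""
    total = 0
    for v in values:
        total += v
    return total


numbers = list(range(200_000))

# Both approaches compute the same result.
loop_result = python_loop_sum(numbers)
builtin_result = sum(numbers)
print(loop_result == builtin_result)

# Time each approach over a few repetitions (numbers vary by machine).
loop_time = timeit.timeit(lambda: python_loop_sum(numbers), number=5)
builtin_time = timeit.timeit(lambda: sum(numbers), number=5)
print(f"loop: {loop_time:.4f}s, built-in sum: {builtin_time:.4f}s")
```

On most systems the built-in version is noticeably faster. Libraries such as NumPy apply the same idea to whole arrays.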

### Role of Python in the Big Data Lifecycle

1. Data Ingestion: Reading data from multiple sources (databases, APIs, logs).

2. Data Processing: Cleaning, transforming, and aggregating massive datasets.

3. Data Analysis: Statistical analysis and pattern discovery.

4. Data Visualization: Tools like Matplotlib and Seaborn for insights.

5. Machine Learning: Building scalable predictive models.
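
The first three stages above can be sketched on a tiny scale with the standard library alone. This is only an illustration: the `RAW_CSV` sample data and its column names are made up, and a real pipeline would read from files, databases, or APIs and would likely use Pandas or PySpark instead.

```python
import csv
import io
import statistics
from collections import defaultdict

# Hypothetical sample data standing in for a real source (file, database, API).
RAW_CSV = """user,region,amount
alice,north,120.5
bob,south,
carol,north,80.0
dave,south,45.5
"""

# 1. Ingestion: read rows from a CSV source (an in-memory string here).
rows = list(csv.DictReader(io.StringIO(RAW_CSV)))

# 2. Processing: drop rows with a missing amount and convert text to float.
clean = [
    {"user": r["user"], "region": r["region"], "amount": float(r["amount"])}
    for r in rows
    if r["amount"]
]

# 3. Analysis: aggregate totals per region and compute the overall mean.
totals = defaultdict(float)
for r in clean:
    totals[r["region"]] += r["amount"]

mean_amount = statistics.mean(r["amount"] for r in clean)

print(dict(totals))
print(mean_amount)
```

The row for `bob` is discarded during processing because its amount is empty, so only three rows reach the analysis step.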

#### Setting Up Python Environment
🔹 Anaconda (Recommended for Data Science)

Anaconda is a Python distribution that comes with Python, Jupyter Notebook, and popular data science libraries preinstalled.

Download Anaconda:
👉 https://www.anaconda.com/download

Includes: Python, NumPy, Pandas, Matplotlib, Scikit-learn, Jupyter

Useful for Big Data, ML, and analytics

Environment management using conda

🔹 Local IDE Options

Jupyter Notebook (included in Anaconda)
👉 https://jupyter.org/


#### Cloud-Based Python Options (No Installation)
✅ Google Colab

Free cloud-based Python environment

Preinstalled data science & ML libraries

Supports GPU/TPU

👉 https://colab.research.google.com/

### Data Types

Data types define the kind of data a variable can store in Python.

Common built-in data types:

int – Integer values (e.g., 10, -5)

float – Decimal numbers (e.g., 3.14)

complex – Complex numbers (e.g., 2+3j)

str – Text or strings (e.g., "Python")

bool – Boolean values (True, False)

list – Ordered, mutable collection (e.g., [1, 2, 3])

tuple – Ordered, immutable collection (e.g., (1, 2, 3))

set – Unordered, unique elements (e.g., {1, 2, 3})

dict – Key-value pairs (e.g., {"id": 1, "name": "AI"})

In [1]:
#integer
x = 10
y = -5
print(type(x))

<class 'int'>


In [2]:
#float
pi = 3.14
print(type(pi))

<class 'float'>


In [3]:
#complex
z = 2 + 3j
print(type(z))


<class 'complex'>


In [4]:
#string
name = "Python"
print(name.upper())

PYTHON


In [5]:
#boolean
is_valid = True
print(type(is_valid))


<class 'bool'>
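
The cells above cover the scalar types. As a complement, the following short sketch exercises the four collection types from the list of built-in types. The variable names are illustrative only.

```python
# list: ordered and mutable, so elements can be added in place
scores = [1, 2, 3]
scores.append(4)
print(type(scores), scores)

# tuple: ordered and immutable, accessed by position
point = (1, 2, 3)
print(type(point), point[0])

# set: keeps only unique elements, so the duplicate 2 is dropped
tags = {1, 2, 2, 3}
print(type(tags), len(tags))

# dict: values are looked up by key
record = {"id": 1, "name": "AI"}
print(type(record), record["name"])
```

Trying to assign to `point[0]` would raise a `TypeError`, which is the practical difference between a tuple and a list.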


##### Variables

Variables are names that refer to data values stored in memory.

No need to declare data type explicitly

Type is assigned dynamically

In [6]:
x = 10
name = "Python"
is_active = True
print(name)

Python
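
Dynamic typing means the same name can be bound to values of different types over time. A minimal sketch, using the illustrative names `value`, `first_type`, and `second_type`:

```python
value = 10                 # the name is bound to an int
first_type = type(value)

value = "ten"              # the same name is rebound to a str
second_type = type(value)

print(first_type.__name__, second_type.__name__)
```

The type belongs to the value, not to the name, which is why no declaration is needed.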


##### Operators

Operators are used to perform operations on variables and values.

Types of Operators:

Arithmetic Operators
+ - * / % ** //

Relational (Comparison) Operators
== != > < >= <=

Assignment Operators
= += -= *= /=

Logical Operators
and or not

Bitwise Operators
& | ^ ~ << >>

Membership Operators
in not in

Identity Operators
is is not

In [7]:
#arithmetic
a = 10
b = 3
print(a + b, a * b, a%b)


13 30 1


In [8]:
#comparison
print(a > b)
print(a == b)

True
False


In [9]:
#assignment operators
a += 5
print(a)

15


In [10]:
# logical operators
x = True
y = False
print(x and y)


False
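
The cells above cover only some of the operator groups. The sketch below exercises the remaining ones (division variants, exponentiation, bitwise, membership, and identity). It redefines `a` and `b` so it can run on its own, and the list names are illustrative.

```python
a = 10
b = 3

# Arithmetic: true division, floor division, exponentiation
print(a / b, a // b, a ** b)

# Bitwise: AND, OR, XOR, left shift (10 is 0b1010, 3 is 0b0011)
print(a & b, a | b, a ^ b, a << 1)

# Membership: test whether a value is contained in a collection
langs = ["Python", "Scala"]
print("Python" in langs, "Java" not in langs)

# Identity: `is` compares object identity, `==` compares values
first = [1, 2]
second = [1, 2]
alias = first
print(first == second, first is second, first is alias)
```

The two lists `first` and `second` are equal in value but are distinct objects, so `==` is true while `is` is false. `alias` refers to the same object as `first`.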


### Homework - Due Next Day Before Lecture

Write a Python program that does the following:

Store the following information using appropriate data types:

Name

Age

Height

Is the student enrolled? (True/False)

Print each value along with its data type.