# Syntecxhub Internship  
## Project 2 – Pandas CSV Reader & Basic Analysis  

### Objective  
This project demonstrates how to:  
- Read CSV data using Pandas  
- Inspect dataset structure  
- Perform basic statistical analysis  
- Filter and manipulate data  
- Export processed results  

### Tools Used  
- Python  
- Pandas  


```python
import pandas as pd
```


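```python
# Absolute path on the author's machine; point this at your own copy of sales_data.csv
df = pd.read_csv("E:\\Data Science Intership at Syntecxhub\\Week 1\\sales_data.csv")
df
```

| | OrderID | Customer | Region | Sales | Profit | OrderDate |
|---|---|---|---|---|---|---|
| 0 | 1 | Amit | West | 5000 | 800 | 2024-01-10 |
| 1 | 2 | Riya | North | 7000 | 1200 | 2024-01-12 |
| 2 | 3 | John | East | 6500 | 900 | 2024-01-15 |
| 3 | 4 | Sneha | South | 4000 | 500 | 2024-01-18 |
| 4 | 5 | Rahul | West | 7200 | 1500 | 2024-01-20 |
| 5 | 6 | Pooja | North | 3000 | 300 | 2024-01-22 |
| 6 | 7 | David | East | 8000 | 2000 | 2024-01-25 |
| 7 | 8 | Neha | South | 6000 | 1100 | 2024-01-28 |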
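
The path above only exists on the author's machine. To experiment without it, here is a minimal sketch that assumes the file holds exactly the eight rows and six columns shown and rebuilds it in memory with `io.StringIO`. `CSV_TEXT` is a hypothetical stand-in for `sales_data.csv`, not the original file.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of sales_data.csv (same eight rows as the table above)
CSV_TEXT = """OrderID,Customer,Region,Sales,Profit,OrderDate
1,Amit,West,5000,800,2024-01-10
2,Riya,North,7000,1200,2024-01-12
3,John,East,6500,900,2024-01-15
4,Sneha,South,4000,500,2024-01-18
5,Rahul,West,7200,1500,2024-01-20
6,Pooja,North,3000,300,2024-01-22
7,David,East,8000,2000,2024-01-25
8,Neha,South,6000,1100,2024-01-28
"""

# read_csv accepts any file-like object, not only a path on disk
df = pd.read_csv(io.StringIO(CSV_TEXT))
print(df.shape)  # (8, 6) -> (rows, columns)
```

Reading from a buffer keeps the example portable; swapping `io.StringIO(CSV_TEXT)` for a real path gives the same DataFrame.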


```python
df.head()
```

| | OrderID | Customer | Region | Sales | Profit | OrderDate |
|---|---|---|---|---|---|---|
| 0 | 1 | Amit | West | 5000 | 800 | 2024-01-10 |
| 1 | 2 | Riya | North | 7000 | 1200 | 2024-01-12 |
| 2 | 3 | John | East | 6500 | 900 | 2024-01-15 |
| 3 | 4 | Sneha | South | 4000 | 500 | 2024-01-18 |
| 4 | 5 | Rahul | West | 7200 | 1500 | 2024-01-20 |


```python
df.tail()
```

| | OrderID | Customer | Region | Sales | Profit | OrderDate |
|---|---|---|---|---|---|---|
| 3 | 4 | Sneha | South | 4000 | 500 | 2024-01-18 |
| 4 | 5 | Rahul | West | 7200 | 1500 | 2024-01-20 |
| 5 | 6 | Pooja | North | 3000 | 300 | 2024-01-22 |
| 6 | 7 | David | East | 8000 | 2000 | 2024-01-25 |
| 7 | 8 | Neha | South | 6000 | 1100 | 2024-01-28 |
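
`head()` and `tail()` show five rows by default; both take an optional row count `n`. A small sketch on a hypothetical two-column subset of the sales data:

```python
import pandas as pd

# Hypothetical subset of the sales data (Customer and Sales only)
df = pd.DataFrame({
    "Customer": ["Amit", "Riya", "John", "Sneha", "Rahul", "Pooja", "David", "Neha"],
    "Sales": [5000, 7000, 6500, 4000, 7200, 3000, 8000, 6000],
})

first_three = df.head(3)  # first n rows (default n=5)
last_two = df.tail(2)     # last n rows (default n=5)

print(first_three["Customer"].tolist())  # ['Amit', 'Riya', 'John']
print(last_two["Customer"].tolist())     # ['David', 'Neha']
```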


```python
df.info()
```

`info` is a method and must be called with parentheses; evaluating `df.info` alone only displays the bound method object rather than the summary. `df.info()` prints the index range, the number of columns, each column's non-null count and dtype, and the memory usage.
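
`info()` prints its report and returns `None`, so the same facts have to be read from other attributes when they are needed in code. A minimal sketch on a hypothetical three-row frame with one missing `Sales` value, using `shape`, `notna().sum()` and the `buf` parameter of `info()` to capture the report as text:

```python
import io

import pandas as pd

# Hypothetical three-row frame; the None becomes NaN, so Sales is stored as float64
df = pd.DataFrame({
    "Customer": ["Amit", "Riya", "John"],
    "Sales": [5000, None, 6500],
})

n_rows, n_cols = df.shape    # (3, 2)
non_null = df.notna().sum()  # per-column non-null counts, as info() lists them

# buf= redirects the printed report into a string instead of stdout
buf = io.StringIO()
df.info(buf=buf)
report = buf.getvalue()
print(report)
```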

```python
df.dtypes
```

```
OrderID       int64
Customer     object
Region       object
Sales         int64
Profit        int64
OrderDate    object
dtype: object
```
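
`OrderDate` is listed as `object`, meaning the dates were loaded as plain strings. If sorting or arithmetic by date is needed, `read_csv` can parse the column on load through its `parse_dates` parameter. A minimal sketch on a hypothetical two-row excerpt in the same layout (the exact dtype names printed can vary between pandas versions, so the checks use `is_datetime64_any_dtype`):

```python
import io

import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Hypothetical two-row excerpt in the same layout as sales_data.csv
csv_text = "OrderID,OrderDate\n1,2024-01-10\n2,2024-01-12\n"

as_text = pd.read_csv(io.StringIO(csv_text))                             # dates stay strings
as_dates = pd.read_csv(io.StringIO(csv_text), parse_dates=["OrderDate"])  # parsed to datetimes

print(as_text["OrderDate"].dtype, as_dates["OrderDate"].dtype)
print(as_dates["OrderDate"].dt.day.tolist())  # [10, 12]
```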

## Data Summary & Statistical Analysis  

In this section, we compute:  
- Mean  
- Median  
- Minimum and Maximum values  
- Count of records  
- Overall statistical summary  
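
The cells below compute each statistic one call at a time. As a compact alternative (not the notebook's own approach), `DataFrame.agg` accepts a list of function names and returns one row per statistic. A sketch using the `Sales` and `Profit` values from the table above:

```python
import pandas as pd

# Sales and Profit values copied from the dataset above
df = pd.DataFrame({
    "Sales": [5000, 7000, 6500, 4000, 7200, 3000, 8000, 6000],
    "Profit": [800, 1200, 900, 500, 1500, 300, 2000, 1100],
})

# One row per statistic, one column per input column
summary = df.agg(["mean", "median", "min", "max", "count"])
print(summary)
```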


### Overall Statistical Summary

```python
df.describe()
```

| | OrderID | Sales | Profit |
|---|---|---|---|
| count | 8.0 | 8.0 | 8.0 |
| mean | 4.5 | 5837.5 | 1037.5 |
| std | 2.44949 | 1710.419748 | 544.944296 |
| min | 1.0 | 3000.0 | 300.0 |
| 25% | 2.75 | 4750.0 | 725.0 |
| 50% | 4.5 | 6250.0 | 1000.0 |
| 75% | 6.25 | 7050.0 | 1275.0 |
| max | 8.0 | 8000.0 | 2000.0 |
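
Two entries in this summary are less obvious than the rest: `std` is the sample standard deviation (squared deviations divided by n - 1, pandas' default `ddof=1`), and the percentiles are found by linear interpolation between sorted values. The sketch below reproduces the `Sales` column's `std` and `25%` entries by hand as a check:

```python
import math

import pandas as pd

sales = pd.Series([5000, 7000, 6500, 4000, 7200, 3000, 8000, 6000])
n = len(sales)

# Sample standard deviation: divide the squared deviations by n - 1
mean = sales.sum() / n
manual_std = math.sqrt(((sales - mean) ** 2).sum() / (n - 1))

# 25th percentile: interpolate at position (n - 1) * 0.25 in the sorted values
ordered = sorted(sales)
pos = (n - 1) * 0.25            # 1.75
lower = ordered[int(pos)]       # 4000
upper = ordered[int(pos) + 1]   # 5000
manual_q1 = lower + (pos - int(pos)) * (upper - lower)

print(round(manual_std, 6), manual_q1)  # 1710.419748 4750.0
```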


### Mean

```python
df["Sales"].mean()
```

np.float64(5837.5)

```python
df["Profit"].mean()
```

np.float64(1037.5)
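
The mean is simply the column total divided by the number of values: for `Sales`, 46700 / 8 = 5837.5. A short check of that arithmetic:

```python
import pandas as pd

sales = pd.Series([5000, 7000, 6500, 4000, 7200, 3000, 8000, 6000])

# Arithmetic mean = sum of values / number of values
total = sales.sum()               # 46700
manual_mean = total / len(sales)
print(manual_mean)                # 5837.5
```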

### Median

```python
df["Sales"].median()
```

6250.0

```python
df["Profit"].median()
```

1000.0
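
With eight rows there is no single middle value, so the median is the average of the two middle values after sorting. For `Profit` that is (900 + 1100) / 2 = 1000.0, as this sketch confirms:

```python
import pandas as pd

profit = pd.Series([800, 1200, 900, 500, 1500, 300, 2000, 1100])

# Even number of values: average the two middle ones of the sorted list
ordered = profit.sort_values().tolist()  # [300, 500, 800, 900, 1100, 1200, 1500, 2000]
mid = len(ordered) // 2
manual_median = (ordered[mid - 1] + ordered[mid]) / 2
print(manual_median)  # 1000.0
```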

### Minimum & Maximum

```python
df["Sales"].min()
```

3000

```python
df["Sales"].max()
```

8000

```python
df["Profit"].min()
```

300

```python
df["Profit"].max()
```

2000
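
`min()` and `max()` return only the values. To find which order they belong to, `idxmax()` and `idxmin()` return the index label of the extreme value, which `.loc` can then use to look up another column. A sketch on the `Customer` and `Sales` columns:

```python
import pandas as pd

df = pd.DataFrame({
    "Customer": ["Amit", "Riya", "John", "Sneha", "Rahul", "Pooja", "David", "Neha"],
    "Sales": [5000, 7000, 6500, 4000, 7200, 3000, 8000, 6000],
})

# idxmax/idxmin give the row label of the largest/smallest Sales value
top = df.loc[df["Sales"].idxmax(), "Customer"]
bottom = df.loc[df["Sales"].idxmin(), "Customer"]
print(top, bottom)  # David Pooja
```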

### Count of Records

```python
df["Sales"].count()
```

np.int64(8)

```python
len(df)
```

8
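
`count()` and `len()` agree here only because the dataset has no missing values: `count()` skips missing entries, while `len()` counts every row. A sketch on a hypothetical column with one gap:

```python
import pandas as pd

# Hypothetical Sales column with one missing value
sales = pd.Series([5000, None, 6500])

print(sales.count())  # 2 -> non-missing values only
print(len(sales))     # 3 -> every row, missing or not
```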

## Data Filtering and Selection  

In this section, we:  
- Filter rows based on conditions  
- Select specific columns  
- Create subsets of the dataset  


### Filter Rows (Sales greater than 6000)

```python
high_sales = df[df["Sales"] > 6000]
high_sales
```

| | OrderID | Customer | Region | Sales | Profit | OrderDate |
|---|---|---|---|---|---|---|
| 1 | 2 | Riya | North | 7000 | 1200 | 2024-01-12 |
| 2 | 3 | John | East | 6500 | 900 | 2024-01-15 |
| 4 | 5 | Rahul | West | 7200 | 1500 | 2024-01-20 |
| 6 | 7 | David | East | 8000 | 2000 | 2024-01-25 |


### Filter by Region (Example: West)

```python
west_region = df[df["Region"] == "West"]
west_region
```

| | OrderID | Customer | Region | Sales | Profit | OrderDate |
|---|---|---|---|---|---|---|
| 0 | 1 | Amit | West | 5000 | 800 | 2024-01-10 |
| 4 | 5 | Rahul | West | 7200 | 1500 | 2024-01-20 |
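
The two filters above each use a single condition. Conditions can be combined with `&` (and) or `|` (or), wrapping each comparison in its own parentheses, and `isin` matches against several values at once. A sketch on three of the dataset's columns:

```python
import pandas as pd

df = pd.DataFrame({
    "Customer": ["Amit", "Riya", "John", "Sneha", "Rahul", "Pooja", "David", "Neha"],
    "Region": ["West", "North", "East", "South", "West", "North", "East", "South"],
    "Sales": [5000, 7000, 6500, 4000, 7200, 3000, 8000, 6000],
})

# Both conditions must hold; parentheses are required because & binds tighter than >
east_high = df[(df["Sales"] > 6000) & (df["Region"] == "East")]

# Row kept if Region equals any value in the list
west_or_south = df[df["Region"].isin(["West", "South"])]

print(east_high["Customer"].tolist())      # ['John', 'David']
print(west_or_south["Customer"].tolist())  # ['Amit', 'Sneha', 'Rahul', 'Neha']
```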


### Select Specific Columns

```python
df[["Customer", "Sales", "Profit"]]
```

| | Customer | Sales | Profit |
|---|---|---|---|
| 0 | Amit | 5000 | 800 |
| 1 | Riya | 7000 | 1200 |
| 2 | John | 6500 | 900 |
| 3 | Sneha | 4000 | 500 |
| 4 | Rahul | 7200 | 1500 |
| 5 | Pooja | 3000 | 300 |
| 6 | David | 8000 | 2000 |
| 7 | Neha | 6000 | 1100 |


### Filter + Select Together

```python
df[df["Sales"] > 6000][["Customer", "Sales"]]
```

| | Customer | Sales |
|---|---|---|
| 1 | Riya | 7000 |
| 2 | John | 6500 |
| 4 | Rahul | 7200 |
| 6 | David | 8000 |
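
The chained form `df[mask][cols]` works for reading, but assigning through a chained expression can silently leave `df` unchanged. The idiomatic single-step form is `.loc[rows, columns]`, which also works for assignment. A sketch (the `Tier` column is a hypothetical example, not part of the dataset):

```python
import pandas as pd

df = pd.DataFrame({
    "Customer": ["Amit", "Riya", "John", "Sneha", "Rahul", "Pooja", "David", "Neha"],
    "Sales": [5000, 7000, 6500, 4000, 7200, 3000, 8000, 6000],
})

# Row selector (boolean mask) and column list in one indexing step
high = df.loc[df["Sales"] > 6000, ["Customer", "Sales"]]
print(high)

# The same form assigns to the filtered rows; other rows get NaN in the new column
df.loc[df["Sales"] > 6000, "Tier"] = "high"
print(df)
```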


### Row Slicing (First 3 Rows)

```python
df.iloc[0:3]
```

| | OrderID | Customer | Region | Sales | Profit | OrderDate |
|---|---|---|---|---|---|---|
| 0 | 1 | Amit | West | 5000 | 800 | 2024-01-10 |
| 1 | 2 | Riya | North | 7000 | 1200 | 2024-01-12 |
| 2 | 3 | John | East | 6500 | 900 | 2024-01-15 |


### Specific Row & Column using iloc

```python
df.iloc[0, 3]  # row position 0, column position 3 (Sales)
```

np.int64(5000)
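
`iloc` selects by integer position, while `loc` selects by index label and column name. With the default 0-based index both reach the same cell, but slices behave differently: `iloc` excludes the stop position, `loc` includes the stop label. A sketch on the first four rows:

```python
import pandas as pd

df = pd.DataFrame({
    "OrderID": [1, 2, 3, 4],
    "Customer": ["Amit", "Riya", "John", "Sneha"],
    "Region": ["West", "North", "East", "South"],
    "Sales": [5000, 7000, 6500, 4000],
})

by_position = df.iloc[0, 3]    # row position 0, fourth column (Sales)
by_label = df.loc[0, "Sales"]  # index label 0, column named "Sales"

# iloc[0:3] stops before position 3; loc[0:3] includes label 3
print(by_position, by_label)               # 5000 5000
print(len(df.iloc[0:3]), len(df.loc[0:3]))  # 3 4
```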

## Exporting Filtered Results  

In this section, we export processed data into:  
- CSV file  
- Excel file  

`to_csv` and `to_excel` write a DataFrame to a file; passing `index=False` leaves the row index out of the saved output.  


### Save High Sales Data to CSV

```python
high_sales.to_csv("high_sales_data.csv", index=False)
```
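
`index=False` matters here because `high_sales` keeps the non-contiguous labels 1, 2, 4 and 6 from the original frame. Without it, those labels are written as an unnamed first column, which `read_csv` later loads as a column called `Unnamed: 0`. A sketch that writes to strings instead of files (calling `to_csv()` without a path returns the CSV text):

```python
import io

import pandas as pd

# Same rows and labels as high_sales above (Customer and Sales only)
high_sales = pd.DataFrame(
    {"Customer": ["Riya", "John", "Rahul", "David"], "Sales": [7000, 6500, 7200, 8000]},
    index=[1, 2, 4, 6],
)

with_index = high_sales.to_csv()               # index written as an unnamed first column
without_index = high_sales.to_csv(index=False)

print(with_index.splitlines()[0])     # ,Customer,Sales
print(without_index.splitlines()[0])  # Customer,Sales

# Reading the indexed version back turns the blank header into "Unnamed: 0"
print(pd.read_csv(io.StringIO(with_index)).columns.tolist())
```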


### Save West Region Data to Excel

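```python
west_region.to_excel("west_region_data.xlsx", index=False)
```

Writing `.xlsx` files needs an Excel engine; by default pandas uses `openpyxl`, which must be installed separately (for example with `pip install openpyxl`).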


### Final Confirmation Print

```python
print("Filtered files exported successfully.")
```

Filtered files exported successfully.
