**What is Data Analysis?**

Data Analysis is the process of inspecting, cleaning, transforming, and modeling data to discover useful information, draw conclusions, and 
support decision-making. It is a crucial step in any data-driven approach, helping organizations and individuals make informed decisions by interpreting 
data patterns, trends, and insights.

**Steps in Data Analysis:**

**Data Collection:** Gathering raw data from various sources such as databases, APIs, surveys, or logs.

**Data Cleaning:** Removing or correcting inaccuracies, duplicates, and inconsistencies in the data.

**Exploratory Data Analysis (EDA):** Summarizing the main characteristics of the data using statistical methods and visualization tools.

**Data Transformation:** Preparing the data for analysis by normalizing, aggregating, or structuring it appropriately.

**Analysis and Modeling:** Applying techniques like statistical methods, machine learning, or predictive modeling to extract insights.

**Visualization and Reporting:** Presenting the results through dashboards, charts, graphs, or reports to communicate findings effectively.

**Tools:** Excel, Python (Pandas, NumPy, Matplotlib, Seaborn)

**Applications of Data Analysis:**

**Business:** Market trend analysis, customer segmentation, and performance evaluation.

**Healthcare:** Patient diagnosis, medical research, and drug effectiveness studies.

**Finance:** Fraud detection, risk assessment, and investment strategies.

**Education:** Analyzing student performance and improving learning outcomes.

**Sports:** Player performance evaluation and game strategy optimization.

**Simple Scenario:**

A retail company wants to analyze its sales data to understand trends and improve sales performance.

**1. Data Collection**
    
**Example:** Collect sales data for the past year from the company’s point-of-sale (POS) system.

**Data Includes:**
  1. Date of sale
  2. Product category
  3. Quantity sold
  4. Revenue
  5. Customer demographics (age, location)
     
**Purpose:** Gather raw data that answers questions like "Which products sell the most?" or "What regions are underperforming?"

**2. Data Cleaning**

**Example:** Inspect the dataset for issues.

1. Remove duplicate sales entries.
2. Correct inconsistencies in product names (e.g., "t-shirt" vs. "T-shirt").
3. Handle missing data, such as revenue values for some transactions.

**Why?** Clean data ensures accurate and reliable analysis.
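
The three cleaning steps can be sketched in pandas as follows. The `sales` frame, its column names, and its values are hypothetical, and dropping rows with missing revenue is only one option (imputing a value is another).

```python
import pandas as pd

# Hypothetical sales records containing a duplicate row, inconsistent
# product names, and one missing revenue value.
sales = pd.DataFrame({
    "order_id": [1, 1, 2, 3, 4],
    "product": ["t-shirt", "t-shirt", "T-shirt", "Laptop", "laptop"],
    "quantity": [2, 2, 1, 1, 3],
    "revenue": [40.0, 40.0, 20.0, None, 2400.0],
})

# 1. Remove exact duplicate rows.
clean = sales.drop_duplicates().copy()

# 2. Normalize product names so "t-shirt" and "T-shirt" compare equal.
clean["product"] = clean["product"].str.strip().str.lower()

# 3. Count missing revenue values, then drop those rows.
missing_revenue = int(clean["revenue"].isna().sum())
clean = clean.dropna(subset=["revenue"]).reset_index(drop=True)

print(missing_revenue)
print(clean)
```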

**3. Exploratory Data Analysis (EDA)**

**Example:** Use descriptive statistics and visualizations to explore the data.

1. Find the total sales revenue.
2. Identify which product categories generate the most revenue.
3. Plot sales trends over time (e.g., sales increase during the holiday season).

**Tool:** Use Python (Matplotlib, Pandas) or Excel to create charts and summaries.

**Outcome:**

"Electronics" is the top-selling category.

Sales peak in December and dip in February.
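
A minimal pandas sketch of these summaries, using a made-up `sales` frame. The column names and figures are illustrative assumptions, not real company data.

```python
import pandas as pd

# Hypothetical cleaned sales data; values are illustrative only.
sales = pd.DataFrame({
    "date": pd.to_datetime([
        "2024-02-10", "2024-02-20", "2024-12-05", "2024-12-18", "2024-12-24",
    ]),
    "category": ["Fashion", "Electronics", "Electronics", "Fashion", "Electronics"],
    "revenue": [50.0, 300.0, 900.0, 120.0, 1100.0],
})

# Total revenue across all transactions.
total_revenue = sales["revenue"].sum()

# Revenue per category, largest first.
revenue_by_category = (
    sales.groupby("category")["revenue"].sum().sort_values(ascending=False)
)

# Revenue per calendar month (1 = January, 12 = December).
revenue_by_month = sales.groupby(sales["date"].dt.month)["revenue"].sum()

print(total_revenue)
print(revenue_by_category)
print(revenue_by_month)
```

Calling `revenue_by_month.plot(kind="line")` would turn the monthly series into a trend chart when Matplotlib is installed.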


**4. Data Transformation**

**Example:** Prepare the data for deeper analysis.
  
1. Group data by month to analyze monthly trends.
2. Aggregate data by customer age groups to understand customer segmentation.

**Why?** Grouping and aggregating the data makes patterns and relationships easier to identify.
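
Both transformations can be sketched with `groupby` and `pd.cut`. The frame below is hypothetical, and the age cut points are an assumption chosen only to illustrate binning.

```python
import pandas as pd

# Hypothetical sales records; values are illustrative only.
sales = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-15", "2024-01-28", "2024-02-03", "2024-02-17"]),
    "customer_age": [19, 34, 23, 52],
    "revenue": [80.0, 150.0, 60.0, 200.0],
})

# 1. Group by calendar month to see monthly totals.
monthly = sales.groupby(sales["date"].dt.to_period("M"))["revenue"].sum()

# 2. Bin ages into groups, then aggregate revenue per group.
#    Bins are right-inclusive: (17, 25], (25, 40], (40, 65].
age_bins = [17, 25, 40, 65]              # assumed cut points
age_labels = ["18-25", "26-40", "41-65"]
sales["age_group"] = pd.cut(sales["customer_age"], bins=age_bins, labels=age_labels)
by_age = sales.groupby("age_group", observed=True)["revenue"].sum()

print(monthly)
print(by_age)
```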

**5. Analysis and Modeling**

**Example:** Answer key business questions:

1. Use trend analysis to predict next year's sales during peak seasons.
2. Apply clustering to group customers by purchase behavior.
3. Perform a correlation analysis to check if discounts lead to higher sales.

**Outcome:**

1. Discounts are most effective for electronics during the holiday season.
2. Younger customers (ages 18–25) prefer fashion-related products.
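
As a small illustration of the correlation check in step 3, the sketch below computes a Pearson correlation on invented weekly figures. The column names and numbers are assumptions. A high coefficient shows only that the two series move together, not that discounts cause the extra sales.

```python
import pandas as pd

# Hypothetical weekly figures: discount offered and units sold.
weekly = pd.DataFrame({
    "discount_pct": [0, 5, 10, 15, 20, 25],
    "units_sold": [100, 108, 121, 130, 142, 149],
})

# Pearson correlation coefficient between the two columns (range -1 to 1).
r = weekly["discount_pct"].corr(weekly["units_sold"])
print(round(r, 3))
```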

**6. Visualization and Reporting**

**Example:** Present findings to the management team.
1. Create a bar chart showing monthly sales revenue.
2. Use a pie chart to represent sales by product category.
3. Build a dashboard in Tableau or Power BI for interactive exploration.

**Insights Shared:**
1. Focus on stocking electronics in December for maximum sales.
2. Offer targeted discounts for fashion products to younger customers.

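In [23]:
import pandas as pd

# Load the Uber trip log (the path is specific to the author's machine)
df=pd.read_csv("C:\\Users\\CVR\\Downloads\\Uber.csv")
print(df)                   # whole frame (pandas truncates long output)
print(df.head())            # first 5 rows
print(df.tail())            # last 5 rows
print(df.iloc[0])           # first row as a Series
print(df.iloc[2:8])         # rows at positions 2 to 7
print(df.iloc[2:8,1:3])     # same rows, columns at positions 1 and 2
print(df.shape)             # (number of rows, number of columns)
print(df.info())            # info() prints its own summary and returns None
print(df.describe())        # summary statistics for the numeric columns
print(df.isnull().sum())    # count of missing values per column
print(df.iloc[-1:])         # last row, kept as a one-row DataFrame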

           START_DATE*         END_DATE* CATEGORY*            START*  \
0       1/1/2016 21:11    1/1/2016 21:17  Business       Fort Pierce   
1        1/2/2016 1:25     1/2/2016 1:37  Business       Fort Pierce   
2       1/2/2016 20:25    1/2/2016 20:38  Business       Fort Pierce   
3       1/5/2016 17:31    1/5/2016 17:45  Business       Fort Pierce   
4       1/6/2016 14:42    1/6/2016 15:49  Business       Fort Pierce   
...                ...               ...       ...               ...   
1151  12/31/2016 13:24  12/31/2016 13:42  Business           Kar?chi   
1152  12/31/2016 15:03  12/31/2016 15:38  Business  Unknown Location   
1153  12/31/2016 21:32  12/31/2016 21:50  Business        Katunayake   
1154  12/31/2016 22:08  12/31/2016 23:51  Business           Gampaha   
1155            Totals               NaN       NaN               NaN   

                 STOP*   MILES*         PURPOSE*  
0          Fort Pierce      5.1   Meal/Entertain  
1          Fort Pierce      5.0  

In [24]:
print(df['START*'].unique())

['Fort Pierce' 'West Palm Beach' 'Cary' 'Jamaica' 'New York' 'Elmhurst'
 'Midtown' 'East Harlem' 'Flatiron District' 'Midtown East'
 'Hudson Square' 'Lower Manhattan' "Hell's Kitchen" 'Downtown' 'Gulfton'
 'Houston' 'Eagan Park' 'Morrisville' 'Durham' 'Farmington Woods'
 'Whitebridge' 'Lake Wellingborough' 'Fayetteville Street' 'Raleigh'
 'Hazelwood' 'Fairmont' 'Meredith Townes' 'Apex' 'Chapel Hill'
 'Northwoods' 'Edgehill Farms' 'Tanglewood' 'Preston' 'Eastgate'
 'East Elmhurst' 'Jackson Heights' 'Long Island City' 'Katunayaka'
 'Unknown Location' 'Colombo' 'Nugegoda' 'Islamabad' 'R?walpindi'
 'Noorpur Shahan' 'Heritage Pines' 'Westpark Place' 'Waverly Place'
 'Wayne Ridge' 'Weston' 'East Austin' 'West University' 'South Congress'
 'The Drag' 'Congress Ave District' 'Red River District' 'Georgian Acres'
 'North Austin' 'Coxville' 'Convention Center District' 'Austin' 'Katy'
 'Sharpstown' 'Sugar Land' 'Galveston' 'Port Bolivar' 'Washington Avenue'
 'Briar Meadow' 'Latta' 'Jacksonville'

In [26]:
df['START*'].value_counts()

START*
Cary                201
Unknown Location    148
Morrisville          85
Whitebridge          68
Islamabad            57
                   ... 
Florence              1
Ridgeland             1
Daytona Beach         1
Sky Lake              1
Gampaha               1
Name: count, Length: 177, dtype: int64
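
A short sketch of two `value_counts` options that often help when reading a table like this one: `normalize=True` for shares instead of counts, and `dropna=False` to include missing values. The `starts` Series is a made-up stand-in for the `START*` column.

```python
import pandas as pd

# Hypothetical start locations, including one missing value.
starts = pd.Series(
    ["Cary", "Cary", "Morrisville", "Cary", "Unknown Location", None],
    name="START*",
)

counts = starts.value_counts()                 # missing values are excluded by default
shares = starts.value_counts(normalize=True)   # fraction of the non-missing rows
with_missing = starts.value_counts(dropna=False)

print(counts)
print(shares)
print(with_missing)
```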

In [32]:
a= df[df['MILES*'] > 50]
print(a)

           START_DATE*         END_DATE* CATEGORY*            START*  \
4       1/6/2016 14:42    1/6/2016 15:49  Business       Fort Pierce   
232    3/17/2016 12:52   3/17/2016 15:11  Business            Austin   
251    3/19/2016 19:33   3/19/2016 20:39  Business         Galveston   
268    3/25/2016 13:24   3/25/2016 16:22  Business              Cary   
269    3/25/2016 16:52   3/25/2016 22:22  Business             Latta   
270    3/25/2016 22:54    3/26/2016 1:39  Business      Jacksonville   
295     4/2/2016 12:21    4/2/2016 14:47  Business         Kissimmee   
296     4/2/2016 16:57    4/2/2016 18:09  Business     Daytona Beach   
297     4/2/2016 19:38    4/2/2016 22:36  Business      Jacksonville   
298     4/2/2016 23:11     4/3/2016 1:34  Business         Ridgeland   
299      4/3/2016 2:00     4/3/2016 4:16  Business          Florence   
546    7/14/2016 16:39   7/14/2016 20:05  Business       Morrisville   
559    7/17/2016 12:20   7/17/2016 15:25  Personal             B

In [31]:
a= df[(df['MILES*'] > 50)&(df['MILES*']<100)]
print(a)

          START_DATE*         END_DATE* CATEGORY*            START*  \
4      1/6/2016 14:42    1/6/2016 15:49  Business       Fort Pierce   
251   3/19/2016 19:33   3/19/2016 20:39  Business         Galveston   
295    4/2/2016 12:21    4/2/2016 14:47  Business         Kissimmee   
296    4/2/2016 16:57    4/2/2016 18:09  Business     Daytona Beach   
707   8/24/2016 13:01   8/24/2016 15:25  Business  Unknown Location   
710   8/25/2016 17:19   8/25/2016 19:20  Business  Unknown Location   
726   8/27/2016 14:01   8/27/2016 15:44  Business            Lahore   
751    9/6/2016 17:49    9/6/2016 17:49  Business  Unknown Location   
871  10/28/2016 20:13  10/28/2016 22:00  Business         Asheville   
873  10/29/2016 17:13  10/29/2016 19:19  Business        Hayesville   
880  10/30/2016 13:24  10/30/2016 14:37  Business       Bryson City   

                STOP*  MILES*        PURPOSE*  
4     West Palm Beach    63.7  Customer Visit  
251           Houston    57.0  Customer Visit  
295
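
The same range filter can also be written with `Series.between`. This is a sketch on a small made-up frame. Note that `between` includes both endpoints by default, so `inclusive="neither"` is needed to match the strict `>` and `<` comparisons used above.

```python
import pandas as pd

# Hypothetical trip distances, including the boundary values 50 and 100.
trips = pd.DataFrame({"MILES*": [5.1, 50.0, 63.7, 100.0, 136.0]})

# Chained comparisons, as in the cell above.
mask_chained = (trips["MILES*"] > 50) & (trips["MILES*"] < 100)

# Equivalent filter with between(); both endpoints are excluded.
mask_between = trips["MILES*"].between(50, 100, inclusive="neither")

print(trips[mask_between])
```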

In [36]:
# Display the START*, STOP*, and MILES* columns of a, which here holds the trips over 50 miles from In [32]
print(a.iloc[:,3:6])

                START*             STOP*   MILES*
4          Fort Pierce   West Palm Beach     63.7
232             Austin              Katy    136.0
251          Galveston           Houston     57.0
268               Cary             Latta    144.0
269              Latta      Jacksonville    310.3
270       Jacksonville         Kissimmee    201.0
295          Kissimmee     Daytona Beach     77.3
296      Daytona Beach      Jacksonville     80.5
297       Jacksonville         Ridgeland    174.2
298          Ridgeland          Florence    144.0
299           Florence              Cary    159.3
546        Morrisville        Banner Elk    195.3
559              Boone              Cary    180.2
707   Unknown Location  Unknown Location     96.2
710   Unknown Location  Unknown Location     50.4
726             Lahore  Unknown Location     86.6
727   Unknown Location  Unknown Location    156.9
751   Unknown Location  Unknown Location     69.1
776   Unknown Location  Unknown Location    195.6


In [37]:
# Select the trips that start in any of three cities
df.loc[df['START*'].isin(['New York','Cary','Topton'])]

Unnamed: 0,START_DATE*,END_DATE*,CATEGORY*,START*,STOP*,MILES*,PURPOSE*
7,1/7/2016 13:27,1/7/2016 13:33,Business,Cary,Cary,0.8,Meeting
8,1/10/2016 8:05,1/10/2016 8:25,Business,Cary,Morrisville,8.3,Meeting
10,1/10/2016 15:08,1/10/2016 15:51,Business,New York,Queens,10.8,Meeting
22,1/12/2016 16:02,1/12/2016 17:00,Business,New York,Queens County,15.1,Meeting
28,1/15/2016 11:43,1/15/2016 12:03,Business,Cary,Durham,10.4,Meal/Entertain
...,...,...,...,...,...,...,...
1049,12/13/2016 20:20,12/13/2016 20:29,Business,Cary,Cary,4.1,Meal/Entertain
1050,12/14/2016 16:52,12/14/2016 17:10,Business,Cary,Cary,3.4,
1051,12/14/2016 17:22,12/14/2016 17:34,Business,Cary,Cary,3.3,
1052,12/14/2016 17:50,12/14/2016 18:00,Business,Cary,Morrisville,3.0,Meal/Entertain


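In [73]:
# Trips from the selected start cities to the selected stop cities, between 10 and 20 miles long
c = df.loc[
    (df['START*'].isin(['New York', 'Cary', 'Topton'])) &
    (df['STOP*'].isin(['Queens', 'Durham', 'Houston'])) &
    (df['MILES*'] > 10) &
    (df['MILES*'] < 20)
].reset_index(drop=True)  # renumber the rows from 0
print(c)

# sort_values returns a new DataFrame and leaves df unchanged.
# A notebook cell displays only its last expression, here the descending sort.
df.sort_values(by='MILES*')
df.sort_values(by='MILES*',ascending=False)
#df.sort_values(by=['START*','MILES*'],ascending=[True,False])
#df.sort_values(by='START*',ascending=True)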

           START_DATE*         END_DATE* CATEGORY*    START*   STOP*  MILES*  \
0  2016-01-10 15:08:00   1/10/2016 15:51  Business  New York  Queens    10.8   
1  2016-01-15 11:43:00   1/15/2016 12:03  Business      Cary  Durham    10.4   
2  2016-01-29 11:43:00   1/29/2016 12:03  Business      Cary  Durham    10.4   
3  2016-02-05 11:47:00    2/5/2016 12:07  Business      Cary  Durham    10.4   
4  2016-02-26 11:35:00   2/26/2016 11:59  Business      Cary  Durham    10.6   
5  2016-03-04 11:46:00    3/4/2016 12:06  Business      Cary  Durham    10.4   
6  2016-04-08 12:30:00    4/8/2016 12:48  Business      Cary  Durham    10.5   
7  2016-04-22 12:08:00   4/22/2016 12:28  Business      Cary  Durham    10.4   
8  2016-04-29 18:46:00   4/29/2016 19:18  Business      Cary  Durham    14.2   
9  2016-06-03 11:29:00    6/3/2016 11:49  Business      Cary  Durham    10.4   
10 2016-06-06 21:41:00    6/6/2016 22:00  Business      Cary  Durham    10.4   
11 2016-06-07 21:42:00    6/7/2016 22:00

Unnamed: 0,START_DATE*,END_DATE*,CATEGORY*,START*,STOP*,MILES*,PURPOSE*
1155,NaT,,,,,12204.7,
269,2016-03-25 16:52:00,3/25/2016 22:22,Business,Latta,Jacksonville,310.3,Customer Visit
270,2016-03-25 22:54:00,3/26/2016 1:39,Business,Jacksonville,Kissimmee,201.0,Meeting
881,2016-10-30 15:22:00,10/30/2016 18:23,Business,Asheville,Mebane,195.9,
776,2016-09-27 21:01:00,9/28/2016 2:37,Business,Unknown Location,Unknown Location,195.6,
...,...,...,...,...,...,...,...
1121,2016-12-27 12:53:00,12/27/2016 12:57,Business,Kar?chi,Kar?chi,0.6,Meal/Entertain
1110,2016-12-24 22:04:00,12/24/2016 22:09,Business,Lahore,Lahore,0.6,Errand/Supplies
44,2016-01-26 17:27:00,1/26/2016 17:29,Business,Cary,Cary,0.5,Errand/Supplies
420,2016-06-08 17:16:00,6/8/2016 17:18,Business,Soho,Tribeca,0.5,Errand/Supplies
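
The sketch below illustrates two points about `sort_values` on a small hypothetical frame: the call returns a new DataFrame rather than changing the original, and a list of columns with a matching `ascending` list gives a multi-key sort.

```python
import pandas as pd

# Hypothetical trips; values are illustrative only.
trips = pd.DataFrame({
    "START*": ["Cary", "Apex", "Cary", "Apex"],
    "MILES*": [10.4, 5.3, 3.0, 8.1],
})

# Keep the result by assigning it; trips itself is not reordered.
longest_first = trips.sort_values(by="MILES*", ascending=False)

# Sort by city A to Z, then by miles from longest to shortest within each city.
by_city_then_miles = trips.sort_values(
    by=["START*", "MILES*"], ascending=[True, False]
)

print(longest_first)
print(by_city_then_miles)
```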


In [79]:
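import numpy as np
# Label each trip by distance: over 100 miles is a long trip, everything else a short trip
df["MILES_CAT"]=np.where(df['MILES*']>100,"Long trip","short trip")

df.head()
# Assigning a scalar broadcasts the value to every row
df['nc']=10
df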

Unnamed: 0,START_DATE*,END_DATE*,CATEGORY*,START*,STOP*,MILES*,PURPOSE*,MILES_CAT,nc
0,2016-01-01 21:11:00,1/1/2016 21:17,Business,Fort Pierce,Fort Pierce,5.1,Meal/Entertain,short trip,10
1,2016-01-02 01:25:00,1/2/2016 1:37,Business,Fort Pierce,Fort Pierce,5.0,,short trip,10
2,2016-01-02 20:25:00,1/2/2016 20:38,Business,Fort Pierce,Fort Pierce,4.8,Errand/Supplies,short trip,10
3,2016-01-05 17:31:00,1/5/2016 17:45,Business,Fort Pierce,Fort Pierce,4.7,Meeting,short trip,10
4,2016-01-06 14:42:00,1/6/2016 15:49,Business,Fort Pierce,West Palm Beach,63.7,Customer Visit,short trip,10
...,...,...,...,...,...,...,...,...,...
1151,2016-12-31 13:24:00,12/31/2016 13:42,Business,Kar?chi,Unknown Location,3.9,Temporary Site,short trip,10
1152,2016-12-31 15:03:00,12/31/2016 15:38,Business,Unknown Location,Unknown Location,16.2,Meeting,short trip,10
1153,2016-12-31 21:32:00,12/31/2016 21:50,Business,Katunayake,Gampaha,6.4,Temporary Site,short trip,10
1154,2016-12-31 22:08:00,12/31/2016 23:51,Business,Gampaha,Ilukwatta,48.2,Temporary Site,short trip,10


In [84]:
import numpy as np
df['MILES_CAT'] = np.where(df['MILES*'] <= 100, "Short trip", 
                  np.where((df['MILES*'] > 100) & (df['MILES*'] <= 200), "Medium trip", 
                  np.where(df['MILES*'] > 200, "Long trip", "Unknown")))
df


Unnamed: 0,START_DATE*,END_DATE*,CATEGORY*,START*,STOP*,MILES*,PURPOSE*,MILES_CAT,nc,MILES_CAT1
0,2016-01-01 21:11:00,1/1/2016 21:17,Business,Fort Pierce,Fort Pierce,5.1,Meal/Entertain,Short trip,10,Short trip
1,2016-01-02 01:25:00,1/2/2016 1:37,Business,Fort Pierce,Fort Pierce,5.0,,Short trip,10,Short trip
2,2016-01-02 20:25:00,1/2/2016 20:38,Business,Fort Pierce,Fort Pierce,4.8,Errand/Supplies,Short trip,10,Short trip
3,2016-01-05 17:31:00,1/5/2016 17:45,Business,Fort Pierce,Fort Pierce,4.7,Meeting,Short trip,10,Short trip
4,2016-01-06 14:42:00,1/6/2016 15:49,Business,Fort Pierce,West Palm Beach,63.7,Customer Visit,Short trip,10,Medium
...,...,...,...,...,...,...,...,...,...,...
1151,2016-12-31 13:24:00,12/31/2016 13:42,Business,Kar?chi,Unknown Location,3.9,Temporary Site,Short trip,10,Short trip
1152,2016-12-31 15:03:00,12/31/2016 15:38,Business,Unknown Location,Unknown Location,16.2,Meeting,Short trip,10,Short trip
1153,2016-12-31 21:32:00,12/31/2016 21:50,Business,Katunayake,Gampaha,6.4,Temporary Site,Short trip,10,Short trip
1154,2016-12-31 22:08:00,12/31/2016 23:51,Business,Gampaha,Ilukwatta,48.2,Temporary Site,Short trip,10,Short trip


In [83]:
import numpy as np
conditions=[(df['MILES*']>100),(df['MILES*']>50)&(df['MILES*']<=100),(df['MILES*']<=50)]
categories=["Long trip","Medium","Short trip"]
df["MILES_CAT1"]=np.select(conditions,categories,default="Unknown")
df

Unnamed: 0,START_DATE*,END_DATE*,CATEGORY*,START*,STOP*,MILES*,PURPOSE*,MILES_CAT,nc,MILES_CAT1
0,2016-01-01 21:11:00,1/1/2016 21:17,Business,Fort Pierce,Fort Pierce,5.1,Meal/Entertain,Short trip,10,Short trip
1,2016-01-02 01:25:00,1/2/2016 1:37,Business,Fort Pierce,Fort Pierce,5.0,,Short trip,10,Short trip
2,2016-01-02 20:25:00,1/2/2016 20:38,Business,Fort Pierce,Fort Pierce,4.8,Errand/Supplies,Short trip,10,Short trip
3,2016-01-05 17:31:00,1/5/2016 17:45,Business,Fort Pierce,Fort Pierce,4.7,Meeting,Short trip,10,Short trip
4,2016-01-06 14:42:00,1/6/2016 15:49,Business,Fort Pierce,West Palm Beach,63.7,Customer Visit,Short trip,10,Medium
...,...,...,...,...,...,...,...,...,...,...
1151,2016-12-31 13:24:00,12/31/2016 13:42,Business,Kar?chi,Unknown Location,3.9,Temporary Site,Short trip,10,Short trip
1152,2016-12-31 15:03:00,12/31/2016 15:38,Business,Unknown Location,Unknown Location,16.2,Meeting,Short trip,10,Short trip
1153,2016-12-31 21:32:00,12/31/2016 21:50,Business,Katunayake,Gampaha,6.4,Temporary Site,Short trip,10,Short trip
1154,2016-12-31 22:08:00,12/31/2016 23:51,Business,Gampaha,Ilukwatta,48.2,Temporary Site,Short trip,10,Short trip


In [89]:
df['MILES_CAT1'].value_counts()

MILES_CAT1
Short trip    1128
Long trip       17
Medium          11
Name: count, dtype: int64
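
As an alternative to `np.select`, the same three distance bands can be sketched with `pd.cut`. The frame below is hypothetical. The bin edges mirror the conditions above (at most 50, over 50 up to 100, over 100). One difference is that a missing distance would come out as NaN rather than "Unknown".

```python
import numpy as np
import pandas as pd

# Hypothetical distances, including the boundary values 50 and 100.
trips = pd.DataFrame({"MILES*": [4.7, 50.0, 63.7, 100.0, 310.3]})

# Bins are right-inclusive: (-inf, 50], (50, 100], (100, inf).
trips["MILES_CAT1"] = pd.cut(
    trips["MILES*"],
    bins=[-np.inf, 50, 100, np.inf],
    labels=["Short trip", "Medium", "Long trip"],
)

print(trips)
```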

In [90]:
df.groupby('START*')['MILES*'].agg('mean')

START*
Agnew                2.775000
Almond              15.200000
Apex                 5.341176
Arabi               17.000000
Arlington            4.900000
                      ...    
West University      2.200000
Weston               4.000000
Westpark Place       2.182353
Whitebridge          4.020588
Winston Salem      133.600000
Name: MILES*, Length: 177, dtype: float64

In [91]:
df.groupby('PURPOSE*')['MILES*'].sum()

PURPOSE*
Airport/Travel       16.5
Between Offices     197.0
Charity ($)          15.1
Commute             180.2
Customer Visit     2089.5
Errand/Supplies     508.0
Meal/Entertain      911.7
Meeting            2851.3
Moving               18.2
Temporary Site      523.7
Name: MILES*, dtype: float64

In [97]:
grouped=df.groupby('PURPOSE*')['MILES*'].agg(['sum','mean','max'])
print(grouped)

                    sum        mean    max
PURPOSE*                                  
Airport/Travel     16.5    5.500000    7.6
Between Offices   197.0   10.944444   39.2
Charity ($)        15.1   15.100000   15.1
Commute           180.2  180.200000  180.2
Customer Visit   2089.5   20.688119  310.3
Errand/Supplies   508.0    3.968750   22.3
Meal/Entertain    911.7    5.698125   36.5
Meeting          2851.3   15.247594  201.0
Moving             18.2    4.550000    6.1
Temporary Site    523.7   10.474000   48.2
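
Named aggregation is another way to write the same summary, giving each output column a descriptive name. This is a sketch on a made-up frame, and the names `total`, `average`, and `longest` are arbitrary choices.

```python
import pandas as pd

# Hypothetical trips; values are illustrative only.
trips = pd.DataFrame({
    "PURPOSE*": ["Meeting", "Meeting", "Commute", "Moving", "Moving"],
    "MILES*": [10.0, 20.0, 180.2, 3.0, 6.1],
})

# Each keyword becomes a column name; each value is the aggregation to apply.
summary = trips.groupby("PURPOSE*")["MILES*"].agg(
    total="sum", average="mean", longest="max"
)

print(summary)
```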


In [54]:
import pandas as pd
df = pd.read_csv("C:\\Users\\CVR\\Downloads\\Uber.csv")
df['START_DATE*'] = pd.to_datetime(df['START_DATE*'], errors='coerce')
s=df[(df['START_DATE*'].dt.year == 2016) & (df['START_DATE*'].dt.month == 1) &(df['START*']=='Cary')]
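# The filter matches fewer than 50 rows (compare the January trips listed by In [53]),
# so positions 50 to 60 select nothing and an empty DataFrame is printed
print(s.iloc[50:61])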

Empty DataFrame
Columns: [START_DATE*, END_DATE*, CATEGORY*, START*, STOP*, MILES*, PURPOSE*]
Index: []


In [53]:
import pandas as pd
df = pd.read_csv("C:\\Users\\CVR\\Downloads\\Uber.csv")
df['START_DATE*'] = pd.to_datetime(df['START_DATE*'], errors='coerce')
df['END_DATE*'] = pd.to_datetime(df['END_DATE*'], errors='coerce')
df[(df['START_DATE*'].dt.year == 2016) & (df['START_DATE*'].dt.month == 1) &(df['END_DATE*'].dt.year == 2016) & (df['END_DATE*'].dt.month == 1)&(df['START*']=='Cary')]

Unnamed: 0,START_DATE*,END_DATE*,CATEGORY*,START*,STOP*,MILES*,PURPOSE*
7,2016-01-07 13:27:00,2016-01-07 13:33:00,Business,Cary,Cary,0.8,Meeting
8,2016-01-10 08:05:00,2016-01-10 08:25:00,Business,Cary,Morrisville,8.3,Meeting
28,2016-01-15 11:43:00,2016-01-15 12:03:00,Business,Cary,Durham,10.4,Meal/Entertain
30,2016-01-18 14:55:00,2016-01-18 15:06:00,Business,Cary,Cary,4.8,Meal/Entertain
34,2016-01-20 10:36:00,2016-01-20 11:11:00,Business,Cary,Raleigh,17.1,Meeting
37,2016-01-21 14:25:00,2016-01-21 14:29:00,Business,Cary,Cary,1.6,Errand/Supplies
38,2016-01-21 14:43:00,2016-01-21 14:51:00,Business,Cary,Cary,2.4,Meal/Entertain
39,2016-01-21 16:01:00,2016-01-21 16:06:00,Business,Cary,Cary,1.0,Meal/Entertain
43,2016-01-26 17:17:00,2016-01-26 17:22:00,Business,Cary,Cary,1.4,Errand/Supplies
44,2016-01-26 17:27:00,2016-01-26 17:29:00,Business,Cary,Cary,0.5,Errand/Supplies
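
A compact variant of the month filter, sketched on a small hypothetical frame. It assumes the dates follow the `month/day/year hour:minute` layout seen in the raw file. With `errors="coerce"`, a non-date entry such as the `Totals` row becomes `NaT` and never matches the filter.

```python
import pandas as pd

# Hypothetical rows in the same layout as the raw file, plus a summary row.
trips = pd.DataFrame({
    "START_DATE*": ["1/7/2016 13:27", "1/26/2016 17:27", "2/5/2016 11:47", "Totals"],
    "START*": ["Cary", "Cary", "Cary", None],
})

# Parse with an explicit format; unparseable values become NaT.
trips["START_DATE*"] = pd.to_datetime(
    trips["START_DATE*"], format="%m/%d/%Y %H:%M", errors="coerce"
)

# Compare year and month in one step through a monthly period.
january = trips["START_DATE*"].dt.to_period("M") == pd.Period("2016-01", freq="M")
cary_january = trips[january & (trips["START*"] == "Cary")]

print(cary_january)
```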


In [33]:
b=df[df['START*']=='New York']
print(b)

         START_DATE*        END_DATE* CATEGORY*    START*             STOP*  \
10   1/10/2016 15:08  1/10/2016 15:51  Business  New York            Queens   
22   1/12/2016 16:02  1/12/2016 17:00  Business  New York     Queens County   
106  2/14/2016 16:35  2/14/2016 17:02  Business  New York  Long Island City   
423  6/10/2016 15:19  6/10/2016 16:28  Business  New York           Jamaica   

     MILES* PURPOSE*  
10     10.8  Meeting  
22     15.1  Meeting  
106    13.0  Meeting  
423    16.3  Meeting  


In [35]:
print(a['MILES*'])

4          63.7
232       136.0
251        57.0
268       144.0
269       310.3
270       201.0
295        77.3
296        80.5
297       174.2
298       144.0
299       159.3
546       195.3
559       180.2
707        96.2
710        50.4
726        86.6
727       156.9
751        69.1
776       195.6
788       112.6
869       107.0
870       133.6
871        91.8
873        75.7
880        68.4
881       195.9
1088      103.0
1155    12204.7
Name: MILES*, dtype: float64
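
The last entry above (index 1155, 12204.7 miles) is the file's `Totals` row rather than a trip, so it passes any "miles greater than" filter. One way to exclude it first is sketched below on a made-up frame. Identifying the row by the literal text `Totals` in `START_DATE*` is an assumption based on the output of the first cell.

```python
import pandas as pd

# Hypothetical trips followed by a summary row, mimicking the raw file.
trips = pd.DataFrame({
    "START_DATE*": ["1/6/2016 14:42", "3/25/2016 16:52", "Totals"],
    "START*": ["Fort Pierce", "Latta", None],
    "MILES*": [63.7, 310.3, 374.0],
})

# Drop the summary row before filtering or aggregating.
is_summary = trips["START_DATE*"] == "Totals"
rides = trips[~is_summary]

long_rides = rides[rides["MILES*"] > 50]
print(long_rides)
```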


In [7]:
import pandas as pd
temp=pd.DataFrame(
{
    'A':[1,2,3,4],
    'B':[10,20,30,40],
    'C':['2025-1-19','2025-1-21','2025-1-11','2025-1-22']
})
print(temp.info())
temp['C']=pd.to_datetime(temp['C'])
temp.dtypes

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   A       4 non-null      int64 
 1   B       4 non-null      int64 
 2   C       4 non-null      object
dtypes: int64(2), object(1)
memory usage: 228.0+ bytes
None


A             int64
B             int64
C    datetime64[ns]
dtype: object

In [14]:
import pandas as pd
temp = pd.DataFrame(
{
    'A': [1, 2, 3, 4],
    'B': [10, 20, 30, 40],
    'C': ['2025-1-19', '2025-1-21', '2025-1-11', '2025-1-22']
})
print(temp.info())
temp['C'] = pd.to_datetime(temp['C'])
print(temp.dtypes)
temp['C'] = temp['C'].dt.strftime('%d-%m-%Y')
print(temp.head())


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   A       4 non-null      int64 
 1   B       4 non-null      int64 
 2   C       4 non-null      object
dtypes: int64(2), object(1)
memory usage: 228.0+ bytes
None
A             int64
B             int64
C    datetime64[ns]
dtype: object
   A   B           C
0  1  10  19-01-2025
1  2  20  21-01-2025
2  3  30  11-01-2025
3  4  40  22-01-2025



In [15]:
temp['A']=(temp['A']).astype('string')
temp.dtypes


A    string[python]
B             int64
C            object
dtype: object

In [16]:
temp['B']=(temp['B']).astype('float')
temp.dtypes

A    string[python]
B           float64
C            object
dtype: object
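
Several columns can also be converted in one call by passing a dictionary to `astype`. This sketch rebuilds a small frame like `temp` for illustration. It returns a new DataFrame and leaves the original dtypes untouched.

```python
import pandas as pd

# Small frame resembling temp from the cells above.
temp = pd.DataFrame({"A": [1, 2, 3, 4], "B": [10, 20, 30, 40]})

# Map each column name to its target dtype.
converted = temp.astype({"A": "string", "B": "float"})

print(converted.dtypes)
```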

In [18]:
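# df must already exist in the session; run the read_csv cell first, otherwise this raises the NameError shown below
print(df['START*'].unique())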

NameError: name 'df' is not defined