Q1: Load the Flight Price Dataset and Examine its Dimensions
To examine the dimensions of the flight price dataset, you would first load the dataset using pandas and check the number of rows and columns.
Code Example:
import pandas as pd

# Load the dataset
flight_data = pd.read_csv("flight_price_dataset.csv")

# Check the dimensions
print(flight_data.shape)
This will output something like (row_count, column_count) where row_count is the number of rows and column_count is the number of columns.
________________________________________
Q2: Distribution of Flight Prices
To visualize the distribution of flight prices, you can use a histogram.
import matplotlib.pyplot as plt
import seaborn as sns

# Create a histogram of flight prices
plt.figure(figsize=(10, 6))
sns.histplot(flight_data['Price'], kde=True)
plt.title('Distribution of Flight Prices')
plt.xlabel('Price')
plt.ylabel('Frequency')
plt.show()
This shows the spread of prices and the overall shape of the distribution (for example, whether it is right-skewed or roughly normal).
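A histogram gives a visual impression of skew; a quick numeric check can back it up. The sketch below uses a handful of made-up prices (not taken from the dataset) and compares the mean with the median. In practice you would pass flight_data['Price'] instead of the toy series.

```python
import pandas as pd

# Toy prices, invented for illustration; replace with flight_data['Price']
prices = pd.Series([3000, 3500, 4000, 4200, 4500, 5000, 12000, 25000], name="Price")

skewness = prices.skew()        # a value above 0 suggests a long right tail
mean_price = prices.mean()
median_price = prices.median()

print(f"Skewness: {skewness:.2f}")
print(f"Mean: {mean_price:.0f}, Median: {median_price:.0f}")
```

A mean well above the median, together with positive skewness, points to a few expensive fares pulling the average up.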
________________________________________
Q3: Range of Prices
You can calculate the minimum and maximum price in the dataset to understand the price range.
# Get the range of prices
min_price = flight_data['Price'].min()
max_price = flight_data['Price'].max()

print(f"Minimum Price: {min_price}")
print(f"Maximum Price: {max_price}")
The difference between these two values is the price range, which shows how far apart the cheapest and most expensive flights are.
________________________________________
Q4: Price Variation by Airline
To compare how flight prices vary by airline, you can create a boxplot.
Code Example:
plt.figure(figsize=(12, 6))
sns.boxplot(x='Airline', y='Price', data=flight_data)
plt.xticks(rotation=90)
plt.title('Flight Prices by Airline')
plt.xlabel('Airline')
plt.ylabel('Price')
plt.show()
The boxplot will show the spread, median, and possible outliers in prices for each airline, helping to visualize how prices differ across airlines.
________________________________________
Q5: Identifying Outliers in the Dataset
You can use a boxplot to identify potential outliers in the price distribution.
Code Example:

plt.figure(figsize=(8, 6))
sns.boxplot(y=flight_data['Price'])
plt.title('Boxplot of Flight Prices')
plt.ylabel('Price')
plt.show()
Outliers in a boxplot are the points drawn beyond the whiskers. By default, the whiskers extend to the most extreme data points that lie within 1.5 times the interquartile range (IQR) of the first and third quartiles. These outliers could be unusually high or low prices, which may affect the overall analysis.
•	Impact of Outliers: Outliers may skew statistical measures like the mean and can impact the results of machine learning models. Depending on the situation, you may need to handle these outliers by removing them or using transformations.
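The 1.5 × IQR rule that the boxplot applies visually can also be applied directly to list the outlying rows. This is a minimal sketch on invented prices; with the real data you would use flight_data['Price'] in place of the toy series.

```python
import pandas as pd

# Toy prices, invented for illustration; replace with flight_data['Price']
prices = pd.Series([3000, 3500, 4000, 4200, 4500, 5000, 5200, 25000], name="Price")

q1 = prices.quantile(0.25)
q3 = prices.quantile(0.75)
iqr = q3 - q1

# Fences used by the 1.5 * IQR rule
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = prices[(prices < lower) | (prices > upper)]
print(f"Fences: [{lower}, {upper}]")
print(outliers)
```

Whether to drop, cap, or keep the flagged values depends on whether they are data errors or genuine expensive fares.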
________________________________________

Q6: Identifying the Peak Travel Season
To identify the peak travel season, you would likely analyze features such as:
•	Date of Travel: Use the date to identify the season (e.g., summer, holiday periods).
•	Month of Travel: Group prices by month to identify price fluctuations by season.
You can analyze the number of flights and the price trends per month or season to identify when flights are more expensive or in high demand.
# Extract month and create a count plot
flight_data['Month'] = pd.to_datetime(flight_data['Date_of_Travel']).dt.month

plt.figure(figsize=(10, 6))
sns.countplot(x='Month', data=flight_data)
plt.title('Number of Flights by Month')
plt.xlabel('Month')
plt.ylabel('Number of Flights')
plt.show()
Presenting to Boss: You can create visualizations such as bar charts and line graphs to present peak months and how prices vary during those periods.
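The count plot above covers demand only. To look at price trends per month as well, one option is to aggregate both the number of flights and the average price in a single table. The sketch below uses a small invented frame with the same Date_of_Travel and Price column names; the values are not from the dataset.

```python
import pandas as pd

# Toy data, invented for illustration
toy = pd.DataFrame({
    "Date_of_Travel": ["2019-03-01", "2019-03-15", "2019-06-01", "2019-06-20", "2019-12-24"],
    "Price": [4000, 4400, 6000, 6400, 9000],
})

toy["Month"] = pd.to_datetime(toy["Date_of_Travel"]).dt.month

# One row per month: how many flights, and what they cost on average
monthly = toy.groupby("Month")["Price"].agg(flights="count", avg_price="mean")
print(monthly)

# Month with the highest average price
peak_month = monthly["avg_price"].idxmax()
print(f"Most expensive month: {peak_month}")
```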
________________________________________
Q7: Identifying Trends in Flight Prices
To identify trends in flight prices, you would analyze features such as:
•	Date of Booking: See if earlier bookings lead to lower prices.
•	Date of Travel: Prices may increase during weekends or holidays.
•	Airline: See if certain airlines tend to have consistently lower or higher prices.
Visualizations:
•	Line Plot: To show how prices vary over time.
•	Box Plot: To compare the distribution of prices by airline, class, or booking time.
Code Example:
# Line plot of average prices over time
flight_data['Booking_Date'] = pd.to_datetime(flight_data['Booking_Date'])

avg_prices = flight_data.groupby(flight_data['Booking_Date'].dt.to_period("M"))['Price'].mean()

plt.figure(figsize=(12, 6))
avg_prices.plot(kind='line')
plt.title('Average Flight Prices Over Time')
plt.xlabel('Date')
plt.ylabel('Average Price')
plt.show()
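To check whether earlier bookings lead to lower prices, a possible approach is to derive the number of days between booking and travel and correlate it with price. The sketch below assumes the Booking_Date and Date_of_Travel columns used above; Days_in_Advance is a hypothetical helper column, and the rows are invented.

```python
import pandas as pd

# Toy data, invented for illustration
toy = pd.DataFrame({
    "Booking_Date": ["2019-01-01", "2019-02-20", "2019-02-27", "2019-01-10"],
    "Date_of_Travel": ["2019-03-01", "2019-03-01", "2019-03-01", "2019-03-05"],
    "Price": [3500, 5000, 8000, 3800],
})

# Days between booking and travel (hypothetical helper column)
toy["Days_in_Advance"] = (
    pd.to_datetime(toy["Date_of_Travel"]) - pd.to_datetime(toy["Booking_Date"])
).dt.days

# A negative correlation would suggest that earlier bookings tend to be cheaper
corr = toy["Days_in_Advance"].corr(toy["Price"])
print(toy[["Days_in_Advance", "Price"]])
print(f"Correlation: {corr:.2f}")
```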
________________________________________
Q8: Factors Affecting Flight Prices
To identify factors affecting flight prices, you would focus on features such as:
1.	Airline: Some airlines may have higher or lower pricing strategies.
2.	Class of Travel: Economy vs. Business Class.
3.	Date of Travel: Prices tend to vary by season, holiday periods, etc.
4.	Flight Duration: Longer flights may have higher prices.
5.	Number of Stops: Direct flights vs. connecting flights.
6.	Advance Booking: Whether booking early reduces prices.
You can use correlation analysis or regression models to determine which factors are the strongest predictors of flight prices.
# Correlation matrix to identify factors affecting price
# (only numeric columns can be correlated, so select them first)
corr_matrix = flight_data.select_dtypes(include='number').corr()

plt.figure(figsize=(10, 6))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Matrix of Flight Features')
plt.show()
Presenting to Management: You can use regression analysis results, correlation matrices, and visualizations (such as heatmaps) to demonstrate which factors most significantly influence flight prices.
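A correlation matrix only covers numeric columns, so categorical factors such as airline need to be encoded before they can be compared. The sketch below one-hot encodes a toy Airline column and ranks every feature by the strength of its linear correlation with Price. The Stops and Duration_Min column names and all values are assumptions made for illustration.

```python
import pandas as pd

# Toy data, invented for illustration; column names are assumptions
toy = pd.DataFrame({
    "Airline": ["A", "A", "B", "B", "B", "A"],
    "Stops": [0, 1, 0, 1, 2, 2],
    "Duration_Min": [120, 240, 130, 260, 400, 390],
    "Price": [3000, 4500, 3500, 5200, 7000, 6500],
})

# One-hot encode the categorical column so it can enter the correlation
encoded = pd.get_dummies(toy, columns=["Airline"], dtype=float)

# Correlation of each feature with Price, strongest first (by absolute value)
price_corr = (
    encoded.corr()["Price"]
    .drop("Price")
    .sort_values(key=abs, ascending=False)
)
print(price_corr)
```

Correlation captures linear association only, so a regression model remains the better tool when several factors interact.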

Q9: Load the Google Playstore Dataset and Examine Its Dimensions
To load and examine the dimensions of the dataset:
Code Example:
import pandas as pd

# Load the dataset
playstore_data = pd.read_csv("google_playstore.csv")

# Check the dimensions
print(playstore_data.shape)
This will output the number of rows and columns in the dataset.
Q10: Rating Variation by Category
To visualize how ratings vary by app category, create a boxplot:
import matplotlib.pyplot as plt
import seaborn as sns

# Create a boxplot of ratings by category
plt.figure(figsize=(14, 8))
sns.boxplot(x='Category', y='Rating', data=playstore_data)
plt.xticks(rotation=90)
plt.title('App Ratings by Category')
plt.xlabel('Category')
plt.ylabel('Rating')
plt.show()
This boxplot will help visualize the distribution of app ratings across different categories.
Q11: Missing Values in the Dataset
To check for missing values:
Code Example:
# Check for missing values
missing_values = playstore_data.isnull().sum()
print(missing_values)

# Identify missing values
missing_data = missing_values[missing_values > 0]
print(missing_data)
Impact: Missing values can skew the analysis and impact the accuracy of your models. Depending on the extent, you might need to handle them by imputation or removal.
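As a rough illustration of the two handling options, the sketch below fills missing numeric ratings with the median and drops rows where a categorical field is missing. The frame is invented; the Rating and Type column names follow the ones used elsewhere in this notebook.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration
toy = pd.DataFrame({
    "App": ["a", "b", "c", "d"],
    "Rating": [4.0, np.nan, 5.0, 3.0],
    "Type": ["Free", "Paid", None, "Free"],
})

# Share of missing values per column, useful for judging the extent
missing_share = toy.isnull().mean()
print(missing_share)

# Imputation: fill missing ratings with the median rating
toy["Rating"] = toy["Rating"].fillna(toy["Rating"].median())

# Removal: drop rows where the app type is unknown
toy = toy.dropna(subset=["Type"])
print(toy)
```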
Q12: Relationship Between App Size and Rating
To visualize the relationship between app size and rating, create a scatter plot:
Code Example:
# Create a scatter plot of size vs. rating
plt.figure(figsize=(10, 6))
sns.scatterplot(x='Size', y='Rating', data=playstore_data)
plt.title('Relationship Between App Size and Rating')
plt.xlabel('Size')
plt.ylabel('Rating')
plt.show()
Note: Ensure Size is properly cleaned and converted to a numerical format if it's not already.
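One way to do that cleaning is sketched below. It assumes Size is stored as text such as '19M', '512k', or 'Varies with device'; that format is an assumption, so check it against your copy of the data. The size_to_mb helper is hypothetical.

```python
import pandas as pd

def size_to_mb(value):
    """Convert a size string such as '19M' or '512k' to megabytes; NaN otherwise."""
    text = str(value).strip()
    if text.endswith("M"):
        return float(text[:-1])
    if text.endswith("k"):
        return float(text[:-1]) / 1024
    return float("nan")

# Toy values, invented for illustration; replace with playstore_data['Size']
sizes = pd.Series(["19M", "512k", "Varies with device", "8.7M"])
size_mb = sizes.map(size_to_mb)
print(size_mb)
```

Rows that come back as NaN are left out of the scatter plot, so it is worth checking how many there are before drawing conclusions.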
Q13: Effect of App Type on Price
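To compare average prices by app type, create a bar chart. If Price is stored as text (for example, with a currency symbol), convert it to a number first, otherwise the mean cannot be computed: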
Code Example:
# Calculate average price by app type
avg_price_by_type = playstore_data.groupby('Type')['Price'].mean().reset_index()

# Create a bar chart
plt.figure(figsize=(10, 6))
sns.barplot(x='Type', y='Price', data=avg_price_by_type)
plt.title('Average Price by App Type')
plt.xlabel('App Type')
plt.ylabel('Average Price')
plt.show()
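If Price turns out to be text, a conversion along the following lines may be needed before grouping. The sketch assumes values such as '$4.99' and '0'; the rows are invented.

```python
import pandas as pd

# Toy data, invented for illustration
toy = pd.DataFrame({
    "Type": ["Free", "Paid", "Paid", "Free"],
    "Price": ["0", "$4.99", "$1.01", "0"],
})

# Strip the currency symbol literally (regex=False) and convert to numbers;
# anything that still fails to parse becomes NaN
toy["Price"] = pd.to_numeric(
    toy["Price"].str.replace("$", "", regex=False), errors="coerce"
)

avg = toy.groupby("Type")["Price"].mean()
print(avg)
```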
Q14: Top 10 Most Popular Apps
To identify the top 10 most popular apps by number of installs:
Code Example:
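# Convert 'Installs' to numeric after removing commas and plus signs.
# regex=False treats '+' as a literal character; entries that are still not numbers become NaN.
playstore_data['Installs'] = pd.to_numeric(
    playstore_data['Installs'].astype(str).str.replace(',', '', regex=False).str.replace('+', '', regex=False),
    errors='coerce'
)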

# Get top 10 most popular apps
top_10_apps = playstore_data[['App', 'Installs']].sort_values(by='Installs', ascending=False).head(10)
print(top_10_apps)
Q15: Analyzing Popular App Categories
To identify the most popular app categories:
Steps:
1.	Analyze Install Numbers: Group by category and aggregate total installs.
2.	Analyze Ratings: Check average ratings by category to identify high-quality apps.
Code Example:
# Total installs by category
install_by_category = playstore_data.groupby('Category')['Installs'].sum().reset_index()

# Average rating by category
rating_by_category = playstore_data.groupby('Category')['Rating'].mean().reset_index()

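# Merge both to analyze popularity and quality side by side
category_analysis = pd.merge(install_by_category, rating_by_category, on='Category')
print(category_analysis.sort_values(by='Installs', ascending=False).head(10))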

# Visualize total installs
plt.figure(figsize=(14, 8))
sns.barplot(x='Category', y='Installs', data=install_by_category.sort_values(by='Installs', ascending=False))
plt.xticks(rotation=90)
plt.title('Total Installs by Category')
plt.xlabel('Category')
plt.ylabel('Total Installs')
plt.show()

# Visualize average rating
plt.figure(figsize=(14, 8))
sns.barplot(x='Category', y='Rating', data=rating_by_category.sort_values(by='Rating', ascending=False))
plt.xticks(rotation=90)
plt.title('Average Rating by Category')
plt.xlabel('Category')
plt.ylabel('Average Rating')
plt.show()
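The two steps can also be combined into a single summary table, which makes it easier to spot categories that are both widely installed and well rated. This is a minimal sketch on invented rows; it assumes Installs has already been converted to a number.

```python
import pandas as pd

# Toy data, invented for illustration
toy = pd.DataFrame({
    "Category": ["GAME", "GAME", "TOOLS", "TOOLS", "ART"],
    "Installs": [1_000_000, 500_000, 100_000, 50_000, 10_000],
    "Rating": [4.5, 4.1, 4.0, 3.8, 4.7],
})

# One row per category: total installs, average rating, number of apps
summary = (
    toy.groupby("Category")
    .agg(
        total_installs=("Installs", "sum"),
        avg_rating=("Rating", "mean"),
        app_count=("Installs", "count"),
    )
    .sort_values("total_installs", ascending=False)
)
print(summary)
```

Including the app count helps separate categories that are popular because of a few large apps from those that are simply crowded.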
Q16: Analyzing the Most Successful App Developers
To identify the most successful app developers using the Google Playstore dataset, consider the following features and data visualizations:
Features to Analyze:
1.	Number of Downloads/Installs: A higher number of installs often indicates a more successful app.
2.	Average Rating: Higher ratings usually correlate with more successful apps.
3.	Number of Apps Developed: A developer with multiple successful apps may be considered more successful.
4.	Price: Successful developers may price their apps strategically.
5.	Category: Analyze the categories the developer is involved in to determine their niche or specialty.
Data Visualizations:
1.	Bar Chart of Total Installs by Developer:
o	Show which developers have the highest total installs across all their apps.
Code Example:
# Group by developer and sum installs
installs_by_developer = playstore_data.groupby('Developer')['Installs'].sum().reset_index()

# Sort and plot
plt.figure(figsize=(14, 8))
sns.barplot(x='Installs', y='Developer', data=installs_by_developer.sort_values(by='Installs', ascending=False).head(10))
plt.title('Top 10 Developers by Total Installs')
plt.xlabel('Total Installs')
plt.ylabel('Developer')
plt.show()
2.	Boxplot of Ratings by Developer:
o	Compare the distribution of ratings for the top developers.
Code Example:
# Filter top developers
top_developers = installs_by_developer.sort_values(by='Installs', ascending=False).head(10)['Developer']
top_developer_data = playstore_data[playstore_data['Developer'].isin(top_developers)]

# Boxplot of ratings by developer
plt.figure(figsize=(14, 8))
sns.boxplot(x='Developer', y='Rating', data=top_developer_data)
plt.xticks(rotation=90)
plt.title('Ratings by Developer')
plt.xlabel('Developer')
plt.ylabel('Rating')
plt.show()
3.	Pie Chart of App Categories by Developer:
o	Show the distribution of app categories for each developer.
Code Example:

# Count number of apps per category for each developer
category_by_developer = playstore_data.groupby(['Developer', 'Category']).size().unstack(fill_value=0)

# Plot pie charts for top developers
for developer in top_developers:
    # Keep only the categories this developer actually publishes in,
    # so the pie is not cluttered with 0% slices
    category_distribution = category_by_developer.loc[developer]
    category_distribution = category_distribution[category_distribution > 0]
    plt.figure(figsize=(8, 8))
    plt.pie(category_distribution, labels=category_distribution.index, autopct='%1.1f%%')
    plt.title(f'App Categories for {developer}')
    plt.show()
Q17: Identifying the Best Time to Launch a New App
To identify the best time to launch a new app using the Google Playstore dataset, analyze features related to the timing and performance of apps:
Features to Analyze:
1.	Release Date: Analyze if certain times of the year are associated with higher app success.
2.	Seasonality: Determine if there are seasonal trends in app installs and ratings.
3.	Category Trends: Identify if certain categories perform better during specific times.
Data Visualizations:
1.	Line Plot of Install Trends Over Time:
o	Show how the number of installs changes over time to identify peak periods.
Code Example:
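# Assumes a 'Last Updated' column is available; it is used here as a stand-in for the release date
playstore_data['Last Updated'] = pd.to_datetime(playstore_data['Last Updated'])
# Use the first day of each month as a timestamp so the values can be plotted on a date axis
playstore_data['Month'] = playstore_data['Last Updated'].dt.to_period('M').dt.to_timestamp()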

# Group by month and sum installs
monthly_installs = playstore_data.groupby('Month')['Installs'].sum().reset_index()

# Line plot of installs over time
plt.figure(figsize=(14, 8))
sns.lineplot(x='Month', y='Installs', data=monthly_installs)
plt.title('Monthly Installs Trend')
plt.xlabel('Month')
plt.ylabel('Total Installs')
plt.xticks(rotation=45)
plt.show()
2.	Seasonal Heatmap:
o	Visualize seasonal variations in app performance.
Code Example:
# Extract month and year for heatmap
# (this overwrites the earlier 'Month' column with the calendar month, 1-12)
playstore_data['Month'] = playstore_data['Last Updated'].dt.month
playstore_data['Year'] = playstore_data['Last Updated'].dt.year

# Pivot table for heatmap
heatmap_data = playstore_data.pivot_table(index='Year', columns='Month', values='Installs', aggfunc='sum').fillna(0)

# Heatmap of installs by month and year
plt.figure(figsize=(12, 8))
sns.heatmap(heatmap_data, cmap='YlGnBu', annot=True)
plt.title('Seasonal Install Trends')
plt.xlabel('Month')
plt.ylabel('Year')
plt.show()
3.	Bar Chart of Average Ratings by Month:
o	Identify if there are better periods for app ratings.
Code Example:
# Group by month and calculate average rating
monthly_ratings = playstore_data.groupby('Month')['Rating'].mean().reset_index()

# Bar chart of average ratings by month
plt.figure(figsize=(10, 6))
sns.barplot(x='Month', y='Rating', data=monthly_ratings)
plt.title('Average Rating by Month')
plt.xlabel('Month')
plt.ylabel('Average Rating')
plt.show()



