# __Data Visualization__

## __Agenda__

In this lesson, we will cover the following concepts with the help of examples:
- Introduction
- Introduction to Matplotlib
- Line Plot
- Scatter Plot
- Bar Chart
- Box Plot
- Radar Chart (Spider chart)
- Area Plot
- Polar Plot
- Treemap
- Pie Chart
- Matplotlib for 3D Visualization

## __1. Introduction__
Data visualization is the graphical representation of data to reveal patterns, trends, and insights that might not be easily apparent from raw data alone. 
- It involves creating visual elements such as charts, graphs, and maps to communicate complex information in an understandable and interpretable form.

![image.png](attachment:6a417041-2a79-4f1d-9e16-eb4981248f2e.png)

Data visualization tools and libraries, such as Matplotlib, Seaborn, and Plotly, empower analysts, scientists, and business professionals to create compelling visualizations that enhance the understanding of data and support evidence-based decision-making.

## __2. Introduction to Matplotlib__

Matplotlib is a comprehensive, widely used data visualization library for Python. It provides a flexible platform for creating static, animated, and interactive visualizations.

- Matplotlib can draw a wide variety of plot types, including:


![image.png](attachment:30e4939b-22b0-4aa8-8ea9-504bc2c4cb91.png)
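
Most cells in this lesson use the `pyplot` functions, which act on an implicit "current" figure. Matplotlib also offers an explicit interface built on `Figure` and `Axes` objects. The following is a minimal sketch of that style; the sample values are made up for illustration and are not taken from the lesson's datasets.

```python
import matplotlib.pyplot as plt

# Illustrative sample data (made up for this sketch)
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

# Create one Figure that contains a single Axes
fig, ax = plt.subplots(figsize=(6, 4))

# Draw on the Axes object directly instead of relying on pyplot's current axes
line, = ax.plot(x, y, marker='o', label='y = x squared')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('Explicit Figure/Axes interface')
ax.legend()

plt.show()
```

Holding on to `fig` and `ax` tends to be convenient once a figure has several subplots, because each call states which axes it modifies.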

## __3. Line Plot__
A line plot is a basic type of chart that displays information as a series of data points connected by straight line segments. It is particularly useful for showing trends over a continuous interval or time. 

In [None]:
import matplotlib.pyplot as plt
import numpy as np

x = [1,2,3,5,8]
y = [5,8,2,6,1]

plt.plot(x,y)
plt.show()

In [None]:
# Generate data for a line plot
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.plot(x,y)
plt.show()

In [None]:
# Generate data for a line plot
x = np.linspace(0, 10, 100)
y = np.sin(x)
z = np.cos(x)
plt.plot(x, y, linestyle='-', color='red', label='This is sine func')
# Single-letter color codes are also accepted: r, g, b, c, m, y, k
plt.plot(x, z, linestyle='-.', color='g', label='This is cosine func')
plt.xlim(-2,15)
plt.ylim(-2,1.5)
plt.xlabel("X axis data")
plt.ylabel("Y axis data")
plt.title("Welcome to the Matplotlib")
plt.legend()
plt.show()

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Generate data for a line plot
x = np.linspace(0, 10, 100)
y = np.sin(x)

# Create a line plot
plt.plot(x, y, label='Sine wave', color='blue', linestyle='-', linewidth=2)

# Add labels and title
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Line plot: Sine wave')

# Add a legend
plt.legend()

# Show the plot
plt.show()


## __4. Scatter Plot__
A scatter plot is used to display the relationship between two continuous variables. 
- Each point on the plot represents an observation, and the position of the point is determined by the values of the two variables. 
- It is helpful for identifying patterns, clusters, or trends in the data.

In [None]:
x = np.arange(1,10)
y = np.sin(x)
# The format string 'v' draws triangle-down markers with no connecting line
plt.plot(x, y, 'v')
plt.show()

In [None]:
x = np.arange(1,10)
y = np.sin(x)
plt.plot(x,y,'o', color='r')
plt.show()

In [None]:
plt.scatter(x,y,s=50, c = 'r', alpha=0.9)
plt.show()

In [None]:
import pandas as pd
np.random.seed(0)
x = np.random.randint(0,10,(5,2))
df  = pd.DataFrame(x, columns=['A','B'])
df

In [None]:
df['C'] = list("abcca")
df

In [None]:
plt.scatter(df['A'],y=df['B'])

In [None]:
plt.plot(df['A'],df['B'],'o')

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Generate data for a scatter plot
x = np.random.rand(100)
y = 2 * x + np.random.randn(100)

# Create a scatter plot
plt.scatter(x, y, color='green', marker='o', label='Random data')

# Add labels and title
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Scatter plot: Random data')

# Add a legend
plt.legend()

# Show the plot
plt.show()


## __5. Bar Chart__
A bar chart is a common way to represent categorical data. 
- It uses rectangular bars of lengths proportional to the values they represent. 
- This chart is useful for comparing the sizes or frequencies of different categories.

In [None]:
df['C'].value_counts()

In [None]:
plt.bar(df['C'].value_counts().index, df['C'].value_counts().values)

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Generate data for a bar chart
categories = ['Category A', 'Category B', 'Category C']
values = [25, 40, 15]

# Create a bar chart
plt.bar(categories, values, color='orange', edgecolor='black', label='Bar chart')

# Add labels and title
plt.xlabel('Categories')
plt.ylabel('Values')
plt.title('Bar chart: Category comparison')

# Add a legend
plt.legend()

# Show the plot
plt.show()


In [None]:
# Bar plot --> displays how often each category occurs in categorical data

## __6. Box Plot__
A box plot (box-and-whisker plot) provides a graphical summary of the distribution of a dataset. 
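- It displays the minimum, first quartile, median, third quartile, and maximum values. In Matplotlib, the whiskers by default stop at the most extreme points within 1.5 times the interquartile range, and anything beyond them is drawn as an individual outlier.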
- This plot is useful for identifying outliers and understanding the spread and central tendency of the data.
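
The quantities a box plot draws can be computed directly with NumPy. The sketch below uses a small made-up sample with one obvious outlier and assumes Matplotlib's default whisker setting (`whis=1.5`); the variable names are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative sample with one obvious outlier (made-up values)
values = np.array([2, 3, 4, 5, 5, 6, 7, 8, 9, 30])

# Quartiles that define the box and the median line
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Default whisker rule: points further than 1.5 * IQR from the box are outliers
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = values[(values < lower_limit) | (values > upper_limit)]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"Outliers: {outliers}")

# The same data as a box plot; the outlier appears as a separate marker
plt.boxplot(values)
plt.title('Box plot with one outlier')
plt.show()
```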

In [None]:
df

In [None]:
plt.boxplot(df['B'])
plt.show()

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Generate data for a box plot
data = [np.random.normal(0, 1, 100), np.random.normal(0, 1.5, 100), np.random.normal(0, 2, 100)]

# Create a box plot
# Note: Matplotlib 3.9 and later prefer `tick_labels`; `labels` is deprecated there
plt.boxplot(data, labels=['Group 1', 'Group 2', 'Group 3'], patch_artist=True, notch=True)

# Add labels and title
plt.xlabel('Groups')
plt.ylabel('Values')
plt.title('Box plot: Group comparison')

# Show the plot
plt.show()


## __7. Radar Chart (Spider Chart)__
A radar chart is a graphical method of displaying multivariate data in the form of a two-dimensional chart with three or more quantitative variables represented on axes starting from the same point.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Generate random data for a radar chart
categories = ['Category A', 'Category B', 'Category C', 'Category D', 'Category E']
data = np.random.randint(1, 10, len(categories))

# Create a radar chart
angles = np.linspace(0, 2 * np.pi, len(categories), endpoint=False)
data = np.concatenate((data, [data[0]]))
angles = np.concatenate((angles, [angles[0]]))
plt.polar(angles, data, marker='o')

# Add labels
plt.thetagrids(np.degrees(angles[:-1]), labels=categories)

# Show the plot
plt.title('Radar chart: Random data')
plt.show()

## __8. Area Plot__
An area plot is used to represent the cumulative sum of quantities over a continuous interval. 
- It is effective for illustrating the contribution of each variable to the total.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Generate data for an area plot
x = np.linspace(0, 5, 100)
y1 = x
y2 = x**2

# Create an area plot
plt.fill_between(x, y1, y2, alpha=0.5, label='Area between curves')

# Add labels and title
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Area between two curves')

# Add a legend
plt.legend()

# Show the plot
plt.show()
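
The cell above shades the region between two curves. For the cumulative view described at the start of this section, where each variable's contribution is stacked on the previous ones, `plt.stackplot` is one option. This is a sketch with three made-up series; the names `series_a`, `series_b`, and `series_c` are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative data: three made-up series measured at the same x positions
x = np.arange(1, 6)
series_a = np.array([1, 2, 3, 4, 5])
series_b = np.array([2, 2, 2, 2, 2])
series_c = np.array([3, 1, 2, 1, 3])

# stackplot draws each series on top of the running total of the previous ones
plt.stackplot(x, series_a, series_b, series_c, labels=['A', 'B', 'C'], alpha=0.7)

# The top edge of the stack equals the element-wise total of all series
total = series_a + series_b + series_c
print(total)

plt.xlabel('X-axis')
plt.ylabel('Cumulative value')
plt.title('Stacked area plot')
plt.legend(loc='upper left')
plt.show()
```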


## __9. Polar Plot__
A polar plot represents data in polar coordinates. 
- It is useful for visualizing cyclic patterns and relationships.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Generate data for a polar plot
theta = np.linspace(0, 2 * np.pi, 100)
r = theta * 2

# Create a polar plot
plt.polar(theta, r, label='Polar plot')

# Add a legend
plt.legend()

# Show the plot
plt.show()


## __10. Treemap__
A treemap is an effective way to visualize hierarchical data structures. 
- It provides an overview of the entire hierarchy while allowing users to easily compare the sizes of different branches and identify patterns within the data.

In [None]:
import matplotlib.pyplot as plt
import squarify  # third-party package; install it first with: pip install squarify

# Sample hierarchical data
data = {
    'Root': {
        'Branch1': {'Leaf1': 20, 'Leaf2': 30},
        'Branch2': {'Leaf3': 25, 'Leaf4': 15}
    }
}

# Function to flatten hierarchical data
def flatten_hierarchy(node, parent=''):
    items = []
    for key, value in node.items():
        if isinstance(value, dict):
            items.extend(flatten_hierarchy(value, parent=f'{parent}\n{key}' if parent else key))
        else:
            items.append((parent, key, value))
    return items

# Flatten the hierarchical data for treemap plotting
flat_data = flatten_hierarchy(data)

# Plotting the treemap
fig, ax = plt.subplots(figsize=(6, 6))
squarify.plot(
    sizes=[item[2] for item in flat_data],
    label=[f'{item[0]}\n{item[1]}' for item in flat_data],
    alpha=0.7,
    ax=ax,
)

# Remove axis labels and ticks for better aesthetics
plt.axis('off')

# Add a title
plt.title('Treemap: Sample hierarchy')

# Show the plot
plt.show()


## __11. Pie Chart__
A pie chart is a circular statistical graphic that is divided into slices to illustrate numerical proportions. 
- Each slice represents a proportionate part of the whole, and the entire circle represents 100%.
- These charts are commonly used to visualize the distribution of a categorical variable or the relative sizes of different components within a whole.

In [None]:
import matplotlib.pyplot as plt

# Sample data
values = [30, 40, 30]
labels = ['Category A', 'Category B', 'Category C']

# Create a pie chart
plt.pie(values, labels=labels, autopct='%1.1f%%', startangle=90)

# Add a title
plt.title('Pie chart example')

# Display the chart
plt.show()


## __12. Matplotlib for 3D Visualization__
Matplotlib supports three-dimensional plots through its `mplot3d` toolkit, which adds a 3D projection to the standard axes.

### __Key Features and Capabilities:__
- __Creation of 3D axes:__ Matplotlib enables the creation of a 3D axes object for visualizing data in three dimensions.
- __Scatter plots, line plots, and surfaces:__ Users can generate 3D scatter plots, line plots, and surfaces to depict relationships in three-dimensional space.
- __Customization and labeling:__ Matplotlib offers options for customizing the appearance and labeling of 3D plots, enhancing visual interpretability.
- __Integration with Matplotlib's ecosystem:__ The 3D visualization features seamlessly integrate with Matplotlib's broader ecosystem, facilitating combined visualizations with other plot types.
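
As a sketch of the surface capability listed above, the following draws a 3D surface from synthetic data. The function plotted is an arbitrary choice for illustration and does not come from the lesson's datasets.

```python
import numpy as np
import matplotlib.pyplot as plt

# Build a grid of (x, y) points; the surface function is an arbitrary illustration
x = np.linspace(-3, 3, 50)
y = np.linspace(-3, 3, 50)
X, Y = np.meshgrid(x, y)
Z = np.sin(np.sqrt(X**2 + Y**2))

# Request a 3D Axes by passing projection='3d'
fig = plt.figure(figsize=(6, 5))
ax = fig.add_subplot(projection='3d')

# Draw the surface and attach a colorbar that maps colors to Z values
surface = ax.plot_surface(X, Y, Z, cmap='viridis')
fig.colorbar(surface, ax=ax, shrink=0.6, label='Z value')

ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
ax.set_title('3D surface')
plt.show()
```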


In [None]:
import pandas as pd
import matplotlib.pyplot as plt

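# Read the CSV file into a Pandas DataFrame
# Note: parse_dates=True only attempts to parse the index; without index_col,
# the default integer index is left unchanged
df = pd.read_csv('ADANIPORTS.csv', parse_dates=True)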

# Calculate the difference between High and Low prices
df['H-L'] = df.High - df.Low

# Calculate the 100-day Moving Average (100MA) of the Close prices
df['100MA'] = df['Close'].rolling(100).mean()

# Create a 3D scatter plot
ax = plt.axes(projection='3d')
ax.scatter(df.index, df['H-L'], df['100MA'])

# Set labels for each axis
ax.set_xlabel('Index')
ax.set_ylabel('H-L')
ax.set_zlabel('100MA')


# Display the 3D scatter plot
plt.show()
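
A rolling mean has no value until a full window of rows is available, so `rolling(100).mean()` leaves the first 99 rows as `NaN`, and those rows have no z-coordinate to plot. The sketch below shows the same effect with a window of 3 on a made-up series; `prices` and `ma3` are illustrative names.

```python
import pandas as pd

# Made-up closing prices for illustration
prices = pd.Series([10, 12, 14, 16, 18])

# 3-row moving average: the first two entries lack a full window
ma3 = prices.rolling(3).mean()
print(ma3)

# Number of leading rows without a value is always window - 1
print("Rows without a value:", ma3.isna().sum())
```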


In [None]:
# Create a 3D axes object for plotting
ax = plt.axes(projection='3d')

# Scatter plot in 3D with index, 'H-L', and 'Volume'
ax.scatter(df.index, df['H-L'], df['Volume'])

# Set labels for the x, y, and z axes
ax.set_xlabel('Index')
ax.set_ylabel('H-L')
ax.set_zlabel('Volume')

# Display the 3D scatter plot
plt.show()


# __Assisted Practice__

## __Problem Statement:__
Analyze the housing dataset using various types of plots to gain insights into the data.

## __Steps to Perform:__
- Create a line plot to visualize the trend of house prices over the years
- Use a scatter plot to visualize the relationship between two numerical variables, such as __LotArea__ and __SalePrice__
- Create a bar chart to show the count of houses in each __Neighborhood__
- Use a box plot to visualize the distribution of __SalePrice__ in each __Neighborhood__
- Create a pie chart to visualize the proportion of houses that fall into each __MSZoning__ category
- Use a 3D scatter plot to visualize __LotArea__, __OverallQual__, and __SalePrice__ together

In [None]:
# Solution
df = pd.read_csv('HousePrices.csv')
df.head()

In [None]:
plt.plot(df['price'])

In [None]:
len(df)

In [None]:
plt.scatter(df['sqft_lot'],df['price'])
plt.xlabel("Lot area")
plt.ylabel("Price")
plt.show()

In [None]:
data_bar = df['city'].value_counts()
data_bar

In [None]:
plt.figure()

In [None]:
plt.figure(figsize=(60,40))
plt.bar(data_bar.index, data_bar.values, edgecolor='black')
plt.xticks(rotation=90)
plt.savefig('export.jpeg')
plt.show()

In [None]:
# Draw one box plot of price per city
for city in df['city'].unique():
    plt.boxplot(df[df['city'] == city]['price'])
    plt.title(f"Box plot for {city}")
    plt.show()

In [None]:
# Create a 3D axes object for plotting
ax = plt.axes(projection='3d')
ax.scatter(df['price'], df['sqft_living'], df['sqft_above'])

# Set labels for the x, y, and z axes
ax.set_xlabel('price')
ax.set_ylabel('sqft_living')
ax.set_zlabel('sqft_above')

# Display the 3D scatter plot
plt.show()
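
The steps above also ask for a pie chart of category proportions, which the solution cells do not cover. The following is a sketch under stated assumptions: `houses` is a hypothetical stand-in DataFrame and `zone` is an assumed column name with made-up values. With the real dataset, substitute whichever categorical column it provides.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the housing data; 'zone' is an assumed column name
houses = pd.DataFrame({'zone': ['RL', 'RL', 'RM', 'RL', 'FV', 'RM', 'RL', 'RL']})

# Count how many houses fall into each category
zone_counts = houses['zone'].value_counts()
print(zone_counts)

# Each slice is sized by its share of the total count
plt.pie(zone_counts.values, labels=zone_counts.index.tolist(), autopct='%1.1f%%', startangle=90)
plt.title('Share of houses per zone (made-up data)')
plt.show()
```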


https://jakevdp.github.io/PythonDataScienceHandbook/04.08-multiple-subplots.html

In [None]:
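# Draw the same curve in each cell of a 2 x 3 grid; subplot indices start at 1
for i in range(1,7):
    plt.subplot(2,3,i)
    plt.title(f"2,3,{i}")
    plt.plot(x,y)
plt.tight_layout()  # keep titles and tick labels from overlapping
plt.show()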

In [None]:
fig, ax = plt.subplots(2, 3, sharex='col', sharey='row')

https://jakevdp.github.io/PythonDataScienceHandbook/04.01-simple-line-plots.html

In [None]:
ax[0,0].plot(x,y)
fig

In [None]:
ax[1,1].plot(x,y)
fig
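
The subplot cells above rely on `x` and `y` left over from earlier cells. A self-contained sketch of the same 2 x 3 grid with shared axes might look like this; the sine curves are an arbitrary choice for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)

# 2 rows x 3 columns; each column shares an x-axis and each row shares a y-axis
fig, axes = plt.subplots(2, 3, sharex='col', sharey='row', figsize=(9, 5))

# axes is a 2D NumPy array of Axes; .flat iterates over it row by row
for k, ax in enumerate(axes.flat, start=1):
    ax.plot(x, np.sin(k * x))
    ax.set_title(f'sin({k}x)')

fig.tight_layout()
plt.show()
```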

In [None]:
%pip install squarify

In [None]:
import squarify

size = [500,200,300,100]
labels = ['A','B','C','D']

squarify.plot(sizes=size, label=labels, alpha = 0.7)
plt.show()

In [None]:
# Bar plot  --> categorical data
# Histogram --> numerical data

In [None]:
plt.hist(df['price'])
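
A histogram groups numerical values into bins and counts how many fall into each one. The sketch below uses randomly generated values rather than the housing data; `plt.hist` returns the per-bin counts and the bin edges, which makes the binning easy to inspect.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up numerical data drawn from a normal distribution
rng = np.random.default_rng(0)
values = rng.normal(loc=50, scale=10, size=500)

# 10 bins produce 11 bin edges; counts holds the number of values per bin
counts, bin_edges, _ = plt.hist(values, bins=10, edgecolor='black')
print("Counts per bin:", counts)
print("Number of bin edges:", len(bin_edges))

plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Histogram of made-up numerical data')
plt.show()
```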

In [None]:
df.hist()

In [None]:
df.plot(kind='line')

In [None]:
df.plot(kind='box')