<img src="images\Logo_UCLL_ENG_RGB.png" style="background-color:white;" />

# Data Analytics & Machine learning

Lecturers: Aimée Lynn Backiel, Kenric Borgelioen, Sofie Torfs, Lies Bollens

Academic year 2025-2026

## Lab 4: Data visualization

### Lecture outline

1. Recap last week
2. Data visualization with Matplotlib, Seaborn and Plotly 

### Part 1: recap of last lecture(s)

#### Lab 1

1. We ensured we had a valid Python installation.
2. We learnt what a virtual environment is:
   * Isolated Python executable and packages.
   * We created a virtual environment.
3. Absolute path vs relative path recap.
4. Recap of data structures in Python

#### Lab 2
1. Installed Pandas
2. Learnt how to read data
3. Learnt how to calculate mean, mode, median etc.
4. Basic exploration of the 4 variables

#### Lab 3
1. Wrapped up computing summary statistics (mean, median, mode, ...)
2. Learnt how to deal with outliers 
3. Focused on exploration of data



### The case

Ada Turing Travelogue, or Ada as everyone calls her, just started working part time at her parents' travel agency. She has a keen understanding of and interest in everything related to applied computer science, ranging from server and system management to full-stack software development. Through Database Foundations she already understands how to query data, and Programming 1 and 2 covered the essentials of the Python programming language. Recently she decided to start learning about data analytics and machine learning as well.

She uses her skills to connect to the travel agency's database, where she finds many normalized tables. Ada recalls what she learnt in Database Foundations and performs all the correct joins. Afterwards she saves the data in the `data/` folder.


She finds the following dataset:

| Column Name          | Description                                                                                       |
| -------------------- | ------------------------------------------------------------------------------------------------- |
| SalesID              | Unique identifier for each sale.                                                                  |
| Age                  | Age of the traveler.                                                                              |
| Country              | Country of origin of the traveler.                                                                |
| Membership_Status    | Membership level of the traveler in the booking system; could be 'standard', 'silver', or 'gold'. |
| Previous_Purchases   | Number of previous bookings made by the traveler.                                                 |
| Destination          | Travel destination chosen by the traveler.                                                        |
| Stay_length          | Duration of stay at the destination.                                                              |
| Guests               | Number of guests traveling (including the primary traveler).                                             |
| Travel_month         | Month in which the travel is scheduled.                                                           |
| Months_before_travel | Number of months prior to travel that the booking was made.                                       |
| Earlybird_discount   | Boolean flag indicating whether the traveler received an early bird discount.                     |
| Package_Type         | Type of travel package chosen by the traveler.                                                    |
| Cost                 | Calculated cost of the travel package.                                                            |
| Margin               | Cost for the traveler minus what the travel agency pays.                                          |
| Additional_Services_Cost | Amount spent on additional services (towels, car rentals, room service, ...) during the trip. |


### Helping Ada explore the dataset

The main goal for the remainder of this lab is to explore the data. We will specifically take five columns:

* Cost
* Age
* Stay length
* Destination
* Country

Our goal is to find interesting relationships between them. In the last lab, we did some analyses using pandas by calculating numerical values. In this lab, we will visualize these relationships and compare the clarity of numbers versus plots.

As was covered in the book and the lecture, there are two main data types in analytics: categorical and continuous data. Identifying the type of each variable is a crucial first step in your analysis because it determines which methods make sense for your data.


**The goal is primarily to find out what influences the cost of the stay.**

### Part 2: Introduction to plotting with Matplotlib, Seaborn and Plotly

We have helped Ada so far to gain insights into her data by wrangling it into shape and making tables to summarize data. Now, to further enhance our understanding and visualize the patterns, trends, and potential anomalies, we will be plotting the data. 

Making data visual simplifies complex datasets and also makes it more intuitive for stakeholders to grasp key takeaways. By transitioning from tabular summaries to graphical plots, we can also communicate more effectively.

#### How and why would you visualize data? From Exploration to Presentation

When dealing with data, there's a journey we often embark on:

1. **Exploratory Phase (Exploratory Data Analysis)**: 
   - **What it is**: At this stage, we're diving into the data, much like a researcher in a lab. We're experimenting, testing hypotheses, and analyzing patterns. It's a phase of discovery, and sometimes it gets technical and intricate.
   - **Who it's for**: Primarily for those who are closely working with the data, like data scientists and analysts. It's about understanding, not necessarily communicating.
   - **Visuals**: Charts and graphs here can be detailed and complex because they serve as tools for understanding.

2. **Presentation Phase (Explanatory Data Visualization)**: 
   - **What it is**: Once we've gained insights from our exploration, it's time to share them. Think of this as distilling our findings into a clear message or story.
   - **Who it's for**: A wider audience, which could include stakeholders, team members, or anyone who needs to understand the main takeaways.
   - **Visuals**: Here, simplicity and clarity take the front seat. The visuals are designed to be understood quickly and should align with the principles of clarity, simplicity, accuracy, and relevance.

In short, the process starts with a deep dive into the data for insights and ends with a clear, concise presentation of those insights for everyone to grasp.

For explanatory data visualization, the following four principles hold:

1. **Clarity**: The main goal is to convey insights in a way that's immediately understandable. If you have to spend too much time figuring out what a chart is saying, then it's not doing its job.

2. **Simplicity**: The most effective visuals are often the simplest. It's about getting the message across, not showing off fancy graphics.

3. **Accuracy**: It's vital to represent data truthfully. Misleading visuals not only harm credibility but can also lead to wrong conclusions.

4. **Relevance**: Every piece of information on your visual should serve a purpose. If it doesn't help convey the main insight, consider removing it.

With these principles in mind, our aim in this lab is to create visuals that stand on their own and communicate effectively. We want every chart or graph to be so intuitive that it needs little to no explanation.

#### Reading and exploring data

As always, we start by loading our dataset. We start with the dataset that we ended with last lab: all the faulty costs are already removed. 

In [None]:
import pandas as pd # by convention
pd.options.display.float_format = '{:.2f}'.format

In [None]:
travel_dataset = pd.read_csv("data/lab_4_dataset.csv")

### Introduction to plotting with Matplotlib, Seaborn and Plotly

### 1) Matplotlib

<center>
<img src="https://matplotlib.org/stable/_images/sphx_glr_logos2_003.png" style="background-color:white">
</center>

The name Matplotlib comes from matrix plotting library. Its interface was originally modelled on the plotting commands of MATLAB. It is by now an older library (2003) that has some quirks, but it is still important to know the basics of Matplotlib, since several other Python plotting tools (such as Seaborn and the Pandas `.plot()` method) build on top of it. We will start by showing some basic plotting skills in Matplotlib and then move on to more sophisticated libraries, which you will probably prefer to use.

In [None]:
# uncomment to install
# %pip install matplotlib
# %pip install seaborn 
# %pip install plotly 

In [None]:
import matplotlib.pyplot as plt # convention
import numpy as np

#### Plotting univariate data


The table below is a summary of the different types of plots for **numeric data**.

| Plot Type          | Description                                           | When to Use                                                      |
|--------------------|-------------------------------------------------------|------------------------------------------------------------------|
| **Histogram**      | Displays the distribution of a single continuous variable by dividing the data into bins and showing the frequency of observations in each bin. | To visualize the distribution of a variable, especially to identify its central tendency (mean), spread (standard deviation), and skewness (are low or high values more common).  |
| **Box Plot (or Whisker Plot)** | Shows the distribution of a variable using quartiles and displays potential outliers. | To get a summary of a variable's distribution in terms of its median, quartiles, and possible outliers. Useful when comparing the distribution across categories. |
| **Density Plot (or Kernel Density Plot)** | Provides a smoothed version of a histogram. | To visualize the distribution of a variable in a continuous manner. Particularly useful when comparing the distributions of multiple variables on the same plot. |
| **Violin Plot**    | Combines aspects of box plots and density plots.       | To visualize both the distribution and summary statistics of a variable. Especially useful when comparing across different categories. |


#### 1.1 ) Basic plotting in matplotlib: some quick examples. 


The syntax for plotting is generally `plt.<plotType>(x, y)`. 


In [None]:
plt.boxplot(travel_dataset["age"]); # In a notebook, Matplotlib prints the objects it returns; the trailing semicolon suppresses that output.

A boxplot provides a comprehensive view of a dataset's distribution, offering more detailed insights than typical tables. The central line within the box represents the median, splitting the data into its lower and upper halves. The box itself is framed by two lines: the lower boundary represents the 25th percentile (or Q1), meaning 25% of the data lies below this value, and the upper boundary denotes the 75th percentile (or Q3), indicating that 75% of the data is below this point.

The range between Q1 and Q3 is known as the Interquartile Range (IQR). Beyond the box, the plot extends 'whiskers'. By default, each whisker reaches to the most extreme data point that lies within `1.5 * IQR` of the box (below Q1 or above Q3), providing a range for typical data points. Any data outside the whiskers is drawn as an individual point and can be considered a potential outlier.
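To make the whisker rule concrete, here is a minimal sketch that computes the quartiles, the IQR and the outliers by hand with NumPy. The ages are invented for illustration (they are not taken from the lab dataset), and the rule shown assumes Matplotlib's default whisker factor of 1.5.

```python
import numpy as np

# Hypothetical ages, invented purely for illustration (not the lab dataset).
ages = np.array([18, 21, 23, 25, 27, 30, 34, 38, 41, 45, 95])

q1, median, q3 = np.percentile(ages, [25, 50, 75])
iqr = q3 - q1

# Fences: the furthest a whisker is allowed to reach.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# The whiskers stop at the most extreme observations inside the fences.
inside = ages[(ages >= lower_fence) & (ages <= upper_fence)]
lower_whisker, upper_whisker = inside.min(), inside.max()

# Everything outside the fences is drawn as an individual outlier point.
outliers = ages[(ages < lower_fence) | (ages > upper_fence)]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"whiskers: {lower_whisker} to {upper_whisker}, outliers: {outliers}")
```

In this toy example only the value 95 falls outside the fences, so it would be the single point drawn beyond the upper whisker.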




Depending on the data, it might be more intuitive and explanatory to use a histogram instead of a boxplot. A histogram splits the data up into different bins and then counts how many data points belong to each bin. When it is not immediately obvious which of the two plot types works best, you can always plot both. As usual, you do not have to limit yourself to one method.

Below, we plot a histogram for the variables `age` and `cost`.

In [None]:
plt.hist(travel_dataset["age"], bins=50);

In [None]:
plt.hist(travel_dataset["cost"], bins=100);
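
The binning step behind a histogram can be reproduced without drawing anything. The sketch below uses `np.histogram` on a handful of invented costs (not the lab dataset) to show which counts end up in which bin; the bin range of 0 to 4000 is an assumption chosen to give round bin edges.

```python
import numpy as np

# Hypothetical costs, invented for illustration only.
costs = np.array([500, 700, 900, 1200, 1500, 1600, 2400, 3900])

# Split the range 0..4000 into 4 equal-width bins and count the values per bin.
counts, edges = np.histogram(costs, bins=4, range=(0, 4000))

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:>6.0f} to {right:>6.0f}: {count}")
```

Changing `bins` changes only how the same values are grouped, which is why it is worth trying several values before settling on one.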

For categorical data, categories can often serve as a basis for comparison in other plots, like boxplots. This means you can use a single category to differentiate data within such plots. You can also produce the same type of plot multiple times, once for each category, to analyze patterns within individual categories.

If you just want to look at a single categorical variable, you can use a bar plot, as shown below. This gives you very similar information to the `value_counts()` function in pandas that we saw in the last lab.


| Plot Type     | Description                                          | When to Use                                         |
|---------------|------------------------------------------------------|-----------------------------------------------------|
| **Count Plot**| Represents the frequency or count of each category.  | To see how often each category appears in the data. |

In [None]:
country_counts = travel_dataset["country"].value_counts()
plt.bar(x=country_counts.index, height=country_counts);


##### ❓ How do the plots so far score on our 4 criteria? Clarity, Simplicity, Accuracy and Relevance:

1. **Clarity**: The main goal is to convey insights in a way that's immediately understandable. If you have to spend too much time figuring out what a chart is saying, then it's not doing its job.

2. **Simplicity**: The most effective visuals are often the simplest. It's about getting the message across, not showing off fancy graphics.

3. **Accuracy**: It's vital to represent data truthfully. Misleading visuals not only harm credibility but can also lead to wrong conclusions.

4. **Relevance**: Every piece of information on your visual should serve a purpose. If it doesn't help convey the main insight, consider removing it.


YOUR ANSWER HERE 

#### 1.2) Customizing our graphs 

To use Matplotlib properly, it's important to understand how the library expects you to use it. Many of the complaints people have about Matplotlib come from fighting against its API. To understand how the library wants to be used, it is always a good idea to consult <a href="https://matplotlib.org/stable/users/explain/quick_start.html">the documentation</a>.

<center>
<img src="https://matplotlib.org/stable/_images/anatomy.png" style="background-color:white;width:50%">
</center>


The image above explains the key idea of Matplotlib. Plots are made on a `Figure` object. A `Figure` object can contain multiple `Axes`. `Axes` are the things you are plotting on. There are more elements than this, but for the scope of this course this is enough to get started.

Instead of using `plt.<plotType>` as we have done so far, we will explicitly make a `Figure` and plot on its `Axes`. This allows us to configure the `Figure`.

The figure above shows many of the customization options you have.

In [None]:
fig, ax = plt.subplots() # Notice the S, subplots not subplot. 
ax.boxplot(travel_dataset["cost"])
ax.set_title("Distribution of the cost of travel")
ax.set_ylabel("Cost (in €)")
ax.set_xlabel("")
ax.set_xticklabels("");

We can also have multiple axes in a figure. We can do that by specifying the rows and columns in `plt.subplots()`.

Notice the difference between the left and the right plot. Which one of the two would you prefer to interpret? 

In [None]:
fig, ax = plt.subplots(nrows=1, ncols=2)
fig.suptitle("Before and after of our plot")
ax[0].boxplot(travel_dataset["cost"])
ax[0].set_title("Distribution of the cost of travel")
ax[0].set_ylabel("Cost (in €)")
ax[0].set_xlabel("")
ax[0].set_xticklabels("");

ax[1].boxplot(travel_dataset["cost"]);

Notice how adding `nrows` and `ncols` turns our `Axes` into a (NumPy) array of `Axes`. To plot, we simply index the right one. The next step is to make our plot larger, which is done with the `figsize` argument of `plt.subplots`.
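As a small sketch of what `plt.subplots` hands back, the snippet below inspects the returned objects for three layouts. The `matplotlib.use("Agg")` line is only there so the sketch also runs without a display; in a notebook you can drop it.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt

fig_single, ax_single = plt.subplots()    # a single Axes object
fig_row, ax_row = plt.subplots(1, 2)      # 1-D array holding 2 Axes
fig_grid, ax_grid = plt.subplots(2, 2)    # 2-D array, indexed as ax_grid[row, col]

print(type(ax_single).__name__)
print(ax_row.shape)
print(ax_grid.shape)

# With a grid, pick one Axes by row and column before plotting or labelling.
ax_grid[1, 0].set_title("bottom left")
```

With one row or one column you index with a single number (`ax_row[0]`); with a full grid you need both a row and a column index.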

In [None]:
fig, ax = plt.subplots(1,2, figsize=(12, 6))
fig.suptitle("Before and after of our plot")
ax[0].boxplot(travel_dataset["cost"])
ax[0].set_title("Distribution of the cost of travel")
ax[0].set_ylabel("Cost (in €)")
ax[0].set_xlabel("")
ax[0].set_xticklabels("");
ax[1].boxplot(travel_dataset["cost"]);

##### ❓ Make a figure that consists of two axes with a histogram and a boxplot for our Cost. Make sure the plot is clear. Play around with the bins argument of `plt.hist` to find a good value for the number of bins

In [None]:
#YOUR CODE HERE


##### ❓ Make a figure that will help you to directly compare the cost per destination. Are histograms or boxplots better for this purpose? 


In [None]:
#YOUR CODE HERE

YOUR ANSWER HERE

#### 1.3) Plotting directly from Pandas

There is an additional way to make plots that you can use if you do not want to interact directly with Matplotlib. Selecting data with Pandas and calling `.plot()` will make a similar plot with less effort. It's still important to know the fundamentals of Matplotlib, because Pandas uses it under the hood and customization may still require the subplots API. We will show you two styles of interacting with it.

In [None]:
# here, we just directly plot from our pandas dataframe. Note that it is more difficult to customize the plot this way, 
# but it is faster to do simple plots. However, the clarity of the plot is not always optimal (for example, there is no title here )
travel_dataset["country"].value_counts().plot(kind='bar')

In [None]:
# Here, we first create the figure and axes with plt.subplots and then we pass the axes to pandas to plot on it. 
# This way, we can customize the plot more easily. The clarity of the plot is also better this way.
fig, ax = plt.subplots(1, 2, figsize=(12, 6))
fig.suptitle("Departure locations and destinations")
ax[0].set_title("Country of origin")
ax[0].set_ylabel("Frequency")
travel_dataset["country"].value_counts().plot(kind="bar", ax=ax[0])
travel_dataset["destination"].value_counts().plot(kind="bar", title="Destination", ax=ax[1], ylabel="Frequency");
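
A third option sits between the two styles above: the Pandas `.plot()` call returns the Matplotlib `Axes` it drew on, so you can customize the plot afterwards. The sketch below uses a small invented series rather than the lab dataset, and the `Agg` backend line is only needed when running outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical countries, invented for illustration only.
countries = pd.Series(["Belgium", "France", "Belgium", "Spain", "Belgium", "France"])

# .plot() returns the Matplotlib Axes that Pandas drew on.
ax = countries.value_counts().plot(kind="bar")

# From here on, the usual Axes methods apply.
ax.set_title("Country of origin")
ax.set_ylabel("Frequency")
ax.tick_params(axis="x", rotation=45)

bar_heights = [bar.get_height() for bar in ax.patches]
print(bar_heights)
```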


### 2) Bivariate and multivariate plotting with Seaborn

<center>
<img src="https://seaborn.pydata.org/_images/logo-wide-lightbg.svg" style="background-color:white;width:50%">
</center>




Seaborn is the next plotting library we will use. It should be your go-to in Python, especially if you're using Pandas. It was specifically built with the Pandas library and data analysis in mind. It makes certain plots a lot easier to create and has a number of sensible defaults (e.g., colors) that are better than Matplotlib's. At the end of the day, it is fully built on top of Matplotlib, so everything we have learned about customization still holds. We will continue our analysis using Seaborn and see how it can help us with bivariate and multivariate analysis in particular.

In [None]:
import seaborn as sns

Last lecture we looked at a number of relationships we found interesting:

* Cost and Destination
* Cost and Country
* Age and Destination
* Country and Destination

#### 2.1) Plots for visualizing bivariate numeric data

| Plot Type          | Description                                           | When to Use                                                      |
|--------------------|-------------------------------------------------------|------------------------------------------------------------------|
| **Scatter Plot**   | Displays values for two variables for a set of data using dots. | To identify relationships or correlations between two numeric variables. |
| **Hexbin Plot**    | Groups points into hexagonal bins and colors them based on the count of points in each bin. | When there's a large amount of data that may overlap in a scatter plot. Useful for visualizing density and relationships between two numeric variables. |
| **Line Plot**      | Connects data points with lines. Typically used for **time series data**. | To visualize trends over time or the relationship between two numeric variables when there's an ordering to the data points. Do not use this if no observations are possible between the plotted points. |
| **Joint Plot**     | Combines scatter plots with histograms for each variable. | To view the relationship between two numeric variables and their individual distributions simultaneously. |



In [None]:
# Scatterplot 
fig, ax = plt.subplots(figsize=(12, 6))
ax.set_title("Age versus cost of travel")
sns.scatterplot(travel_dataset, x="age", y="cost", ax=ax);



At a glance we can see that Seaborn comes with a number of sensible defaults:

1. The x and y-axis are labeled
2. Generally the colors look a bit easier on the eyes than matplotlib.

It also has plot types that do not exist natively in Matplotlib such as `sns.countplot` and `sns.jointplot`.

In [None]:
import matplotlib.ticker as ticker

Jointplot: combined scatterplot and histograms

Notice how Seaborn sometimes returns objects that wrap Matplotlib `Figure`s and `Axes`. 

It's a higher level interface to Matplotlib so we still need to dip into the lower level interface to do certain things.

❓ You can play with the `kind` argument (try out `hex`, `hist` or `scatter` and observe the differences). As always, if you want to know more about the different options, the official documentation page provides very good examples. In this case: https://seaborn.pydata.org/generated/seaborn.jointplot.html

❓ Another fun parameter is `hue`, which automatically colors the data points according to the column you give it. For example, try out `hue="membership_status"`. At this point it will probably make your plot too cluttered, as we are plotting the whole dataset and already have a lot of information on our plot, but it is a very useful parameter. We will come back to this later in the lab.

In [None]:


g = sns.jointplot(travel_dataset, x="age", y="cost", kind='hex'); 
g.figure.subplots_adjust(top=0.9) # Add some spacing 
g.figure.suptitle("Relationship between age and cost")
g.ax_joint.yaxis.set_major_locator(ticker.MultipleLocator(5000)) # This is how to increase the tick frequency 

##### ❓ The last plot is a combination of a hexgrid and a histogram. How well does it score in terms of our 4 criteria:


1. **Clarity**: The main goal is to convey insights in a way that's immediately understandable. If you have to spend too much time figuring out what a chart is saying, then it's not doing its job.

2. **Simplicity**: The most effective visuals are often the simplest. It's about getting the message across, not showing off fancy graphics.

3. **Accuracy**: It's vital to represent data truthfully. Misleading visuals not only harm credibility but can also lead to wrong conclusions.

4. **Relevance**: Every piece of information on your visual should serve a purpose. If it doesn't help convey the main insight, consider removing it.

YOUR ANSWER HERE 

##### ❓ Make a figure that will help you to directly compare the cost per destination. Are histograms or boxplots better for this purpose? Normally, creating this plot using seaborn should be easier and less work than using matplotlib. 


In [None]:
#YOUR CODE HERE


YOUR OBSERVATIONS HERE

##### ❓The scatterplot of age vs cost had a lot of distinct values on the x-axis. One way to make the plot clearer could be to bin the ages into age groups and then draw a boxplot of the cost for each age group. Create such a plot and compare it with the scatterplot. Which one would you choose and why? (Would you choose the same plot for data exploration as for showing it in a presentation?)

In [None]:
#YOUR CODE HERE

#### 2.2) Categorical - Numeric Data: small multiples and colour coding


**Small multiples**

Imagine you have a lot of similar data points from different categories, and you want to compare them side by side. This is where the magic of small multiples comes into play.

**What Are Small Multiples?**
Small multiples are a series of similar graphs or charts using the same scale and axes, allowing them to be easily compared. They divide data by categories and present each category in its own panel within a larger visualization.

**Why Use Them?**

1. **Consistency**: Because each graph uses the same scale and axes, it's easy to compare data across categories directly.
2. **Clarity**: By separating data into individual panels, the viewer can clearly see patterns or trends within each category without them being obscured by other data.
3. **Efficiency**: Instead of toggling between different views or using interactive tools to sift through data, viewers get a simultaneous snapshot of all categories at once.

**Remember**: 
The general idea behind small multiples is consistency in presentation but separation in data. By creating one graph per category, you're allowing your audience to quickly and efficiently draw insights from a collective set of data points, making your presentation both comprehensive and comprehensible.
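To show what "same scale and axes" means in practice, here is a minimal sketch of small multiples built by hand in Matplotlib. The package names and costs are randomly generated stand-ins, not the lab dataset; `sharex`, `sharey` and a common set of bin edges are what keep the panels comparable.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical costs per (made-up) package type, generated for illustration only.
costs_by_package = {
    "Basic": rng.normal(1000, 200, size=200),
    "Comfort": rng.normal(2000, 400, size=200),
    "Luxury": rng.normal(4000, 800, size=200),
}

# One panel per category; sharex/sharey force identical axis limits everywhere.
fig, axes = plt.subplots(1, len(costs_by_package), figsize=(12, 4),
                         sharex=True, sharey=True)
shared_bins = np.linspace(0, 8000, 41)  # same bin edges in every panel

for ax, (package, costs) in zip(axes, costs_by_package.items()):
    ax.hist(costs, bins=shared_bins)
    ax.set_title(package)
    ax.set_xlabel("Cost (in €)")

axes[0].set_ylabel("Number of bookings")
fig.suptitle("Distribution of the cost per package type (toy data)")
```

Without the shared axes, each panel would pick its own limits and the panels could no longer be compared by eye.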

In [None]:
g = sns.displot(travel_dataset, x="cost", col="package_Type")
g.figure.suptitle("Distribution of the cost per package type")
g.figure.subplots_adjust(top=0.85)

**Color-Coding**

Sometimes, when we have multiple categories of data, it can be effective to display them all on one graph, using color as a key differentiator. 

**Why Use Color-Coding?**

1. **Unified View**: One of the main advantages of using color is that it presents all the categories in a single, unified plot. This can provide a holistic view and show interactions or overlaps between categories.
 
2. **Space Efficiency**: Instead of dividing your canvas into multiple sections, color-coding allows you to utilize the entire space for a singular, impactful visualization.

3. **Quick Comparisons**: With the right color choices, the eye can quickly discern between categories and compare their relative positions or values.

**When is Color-Coding Useful?**

1. **Limited Categories**: Color-coding works best when the number of categories is limited. If there are too many categories, the plot can become cluttered and colors hard to distinguish.

2. **Overlapping Data Points**: If you're interested in seeing where data points from different categories intersect or overlap, using color on a single plot can be very effective.

3. **Emphasis on Relationships**: When the relationship or interaction between categories is more important than individual category trends, a color-coded unified plot can be invaluable.


**Comparison with Small Multiples**: 
While small multiples separate data by categories into individual panels for clarity, color-coded plots display all data on one graph for a unified perspective. The choice between them often depends on the specific goals of the visualization and the nature of the data.

In [None]:
fig, ax = plt.subplots(figsize=(12,6))
ax.set_title("Distribution of cost per package type")
sns.histplot(travel_dataset, x="cost", hue="package_Type",ax=ax)

##### ❓Look back at the other plots we already made with seaborn in this course and try adding color-coding. You add color-coding by passing the parameter `hue="column_name"` to a plot. Is color-coding on `membership_status` useful, or on `country`, `destination` or any other categorical variable? In which of the plots we already made does it add value, and in which does it just clutter the plot? Try out different values on different plots and observe how they change.

In [None]:
#YOUR CODE HERE



#### 2.3) Categorical - Categorical Data

| Plot Type          | Description                                           | When to Use                                                      |
|--------------------|-------------------------------------------------------|------------------------------------------------------------------|
| **Contingency Table (or Cross Tabulation)** | Shows the frequency of combinations of categories. | To summarize the relationship between two categorical variables in tabular form. |


In [None]:
fig, ax = plt.subplots(figsize=(8,8))
ax.set_title("Counts for countries and destinations")
sns.histplot(data=travel_dataset, x="country", y="destination", legend=True, ax=ax)
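
The contingency table described above can also be computed as plain numbers with `pd.crosstab`. The sketch below uses a few invented bookings instead of the lab dataset; the `normalize="index"` variant is an optional extra that turns counts into row-wise proportions.

```python
import pandas as pd

# Hypothetical bookings, invented for illustration only.
bookings = pd.DataFrame({
    "country": ["Belgium", "Belgium", "France", "France", "France", "Spain"],
    "destination": ["Italy", "Greece", "Italy", "Italy", "Greece", "Italy"],
})

# Rows are countries, columns are destinations, cells are counts.
contingency = pd.crosstab(bookings["country"], bookings["destination"])
print(contingency)

# Row-wise proportions: which share of each country's travelers goes where.
row_shares = pd.crosstab(bookings["country"], bookings["destination"],
                         normalize="index")
print(row_shares.round(2))
```

The counts are the same information the heatmap encodes as color, which makes the table a useful cross-check when the shading is hard to read.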

#### 2.4) Multivariate analysis 

**Multivariate Plots: Merging Small Multiples with Color-Coding**

**The combination's Strength**:
- **Depth & Breadth**: By pairing small multiples with color-coding, you can showcase multiple data dimensions at once.
- **Rich Insights**: Great for spotting intricate patterns and trends.
  
**Challenges**:
- **Information Overload**: Can be overwhelming due to the sheer amount of data presented.
- **Clarity & Legibility**: Risk of visual clutter. Keep designs clean and distinct.

**Best Use Cases**:
- **In-depth Exploration**: Ideal for deep data dives.
- **Specialized Audiences**: Suited for experts familiar with the data or domain.
  
**Conclusion**: 
Combining small multiples with color-coding offers a detailed data view, but it's essential to consider the audience. While powerful for **exploration**, it may be too dense for general presentations (explanation).


For example, we could explore the relationship between cost and age and see whether there is a difference depending on the country and the package type. The plot below shows many variables at once. In our exploration phase as data scientists we can use this plot, but we would not show it in a final presentation, because it would simply overwhelm outsiders.

In [None]:

plot = sns.relplot(travel_dataset, x="age", y="cost", col="country", col_wrap=4, hue="package_Type");
plot.figure.suptitle("Relationship between cost and age by country and package type");
plot.figure.subplots_adjust(top=0.9) # Ensure 10 % of the space is left for the title

The true power of seaborn lies in arguments such as `hue` and `col`. We can use them to make powerful graphs, as we already saw above.


##### ❓Create a plot that shows the cost of the additional services versus the cost of the trip, colored by the type of package. What do you observe?


In [None]:
# YOUR ANSWER HERE

### 3) Plotly

💻📊💡 We'll be using a third data visualization library called Plotly. We will not explicitly cover the syntax but we encourage you to explore it: https://plotly.com/python/plotly-express/ and use it as the graphs are interactive.

Below, we have added some examples of how you can create plots using plotly and create similar graphs to the ones we have already made using seaborn. 

 ❓ For all of the graphs: figure out 
 1) What is being plotted and how it is being plotted
 2) What are the arguments you can give to plotly and can you play around with them to create different plots?

 CREATE YOUR OWN MARKDOWN CELL(S) WITH YOUR ANSWERS. You can choose whether to explain each graph separately or discuss several together.

 We encourage you to play around with both seaborn and plotly. Which one of these libraries you use, is up to you. Depending on what you want to plot, you will probably prefer one over the other. Or, you will find out after a while that you are mostly working with one of the two libraries. That is all fine, the idea is that you use whatever library and plot that will enable you to visualize the data in the way you want to. 

In [None]:
import plotly.express as px

px.imshow(travel_dataset[["cost","age", "stay_length", "months_before_travel", "guests"]].corr().round(3), title="correlation heatmap", width=750)


In [None]:
px.histogram(travel_dataset, x="country", y="cost", histfunc="avg", title="Average cost per country")

In [None]:
px.histogram(travel_dataset, x="destination", y="cost", histfunc="avg", title="Average cost per destination")

In [None]:
px.histogram(travel_dataset, x="country", y="cost", histfunc="avg", title="Average cost per country, by destination (1)", color="destination", barmode="group")

In [None]:
px.histogram(travel_dataset, x="country", y="cost", histfunc="avg", title="Average cost per country, by destination (2)", facet_col="destination")

In [None]:
px.histogram(travel_dataset, x="package_Type", title="Distribution of the package")

In [None]:
px.histogram(travel_dataset, x="package_Type", y = "cost", histfunc="avg", color="destination", barmode="group", title="Average cost per package type, by destination")

In [None]:
px.scatter(travel_dataset, x="age", y="cost")