# Analysis of Trends in International Tourism 

Let's begin by importing the necessary libraries.

In [1]:
import pandas as pd
import sqlite3
import plotly.express as px
import seaborn as sns
from matplotlib import pyplot as plt
import numpy
import os
import re

## Setup

We need to connect to our SQLite database, `tourism`, to access our data. We also create a cursor so that we can run SQL statements directly.

In [2]:
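# Open the SQLite database file; sqlite3.connect creates an empty file if the path does not exist
connection = sqlite3.connect('./data/db/tourism.db')
# A cursor for executing SQL directly (pd.read_sql below only needs the connection)
cursor = connection.cursor()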
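Both queries below round their averages with `FLOOR()`. SQLite only provides that function when it is compiled with its built-in math functions (available since version 3.35.0), so an older or differently built `sqlite3` module raises an `OperationalError` reporting no such function. The sketch below is a quick probe using only the standard library; `supports_floor` is a hypothetical helper name, not part of `sqlite3`.

```python
import sqlite3


def supports_floor(conn: sqlite3.Connection) -> bool:
    """Return True if this SQLite build provides the FLOOR() math function."""
    try:
        conn.execute("SELECT FLOOR(2.7)").fetchone()
    except sqlite3.OperationalError:
        # Builds without math functions reject the unknown function name
        return False
    return True


# An in-memory database is enough to probe the library build itself
probe = sqlite3.connect(":memory:")
print(sqlite3.sqlite_version, supports_floor(probe))

# Fallback for non-negative values such as tourist counts:
# CAST(... AS INTEGER) truncates toward zero, which matches FLOOR when x >= 0
print(probe.execute("SELECT CAST(AVG(x) AS INTEGER) FROM (SELECT 2.7 AS x)").fetchone())
```

If the probe returns `False`, swapping `FLOOR(AVG(...))` for `CAST(AVG(...) AS INTEGER)` should give the same result here, since arrival and departure counts are never negative.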

With the connection open, we can start looking for trends.

Let's start by finding which regions contain the most popular destination countries and which regions' residents travel abroad the most.

In [None]:
query = '''
    SELECT
        c.region AS 'Region',
        FLOOR(AVG(a.arrivals)) AS 'Average_Yearly_Arrivals',
        FLOOR(AVG(d.departures)) AS 'Average_Yearly_Departures'
    FROM arrivals a
    JOIN country c ON c.country_code = a.country_code
    JOIN departures d ON d.country_code = a.country_code
    GROUP BY c.region; 
    '''

df_regions = pd.read_sql(query, connection)

df_regions = df_regions.astype({'Average_Yearly_Arrivals': 'int', 'Average_Yearly_Departures': 'int'})

df_regions
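The region query joins `arrivals` and `departures` on `country_code` alone. Assuming each table holds one row per country and year, that pairs every arrival row with every departure row of the same country, and the inner join silently drops countries with no rows in `departures`. Each country's own averages come out unchanged, but at the region level countries with more rows carry more weight. One alternative, sketched below on a made-up in-memory database, is to average each table per country in a CTE first and then average those per-country figures by region, so every country counts once. Table and column names follow the notebook's queries; the `year` column and all values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (country_code TEXT, country TEXT, region TEXT);
    CREATE TABLE arrivals (country_code TEXT, year INTEGER, arrivals REAL);
    CREATE TABLE departures (country_code TEXT, year INTEGER, departures REAL);
    INSERT INTO country VALUES ('A', 'Alpha', 'R1'), ('B', 'Beta', 'R1'), ('C', 'Gamma', 'R1');
    INSERT INTO arrivals VALUES ('A', 2019, 10), ('A', 2020, 20),
                                ('B', 2019, 100), ('B', 2020, 200),
                                ('C', 2019, 30);
    INSERT INTO departures VALUES ('A', 2019, 1),
                                  ('B', 2019, 5), ('B', 2020, 6);
""")

# Same pattern as the notebook: joining on country_code alone pairs every
# arrivals row with every departures row of that country; C has no
# departures rows, so the inner join drops it entirely.
naive = conn.execute("""
    SELECT c.region, AVG(a.arrivals), AVG(d.departures)
    FROM arrivals a
    JOIN country c ON c.country_code = a.country_code
    JOIN departures d ON d.country_code = a.country_code
    GROUP BY c.region
""").fetchall()

# Alternative: average each table per country first, then average those
# per-country figures within each region (every country weighted equally).
per_country = conn.execute("""
    WITH arr AS (SELECT country_code, AVG(arrivals) AS avg_arr
                 FROM arrivals GROUP BY country_code),
         dep AS (SELECT country_code, AVG(departures) AS avg_dep
                 FROM departures GROUP BY country_code)
    SELECT c.region, AVG(arr.avg_arr), AVG(dep.avg_dep)
    FROM country c
    LEFT JOIN arr ON arr.country_code = c.country_code
    LEFT JOIN dep ON dep.country_code = c.country_code
    GROUP BY c.region
""").fetchall()

print(naive)        # [('R1', 105.0, 4.0)]
print(per_country)  # [('R1', 65.0, 3.25)]
```

The two approaches answer slightly different questions (an average over joined rows versus an average of country averages), so which one fits depends on what "average yearly arrivals per region" is meant to describe.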

In [None]:
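# Convert to millions so the bar lengths match the axis label
df_regions['Arrivals_Millions'] = df_regions['Average_Yearly_Arrivals'] / 1e6
region_arrivals = sns.barplot(data=df_regions, y='Region', x='Arrivals_Millions', hue='Region')
region_arrivals.set(title='Average Annual International Tourist Arrivals by Region', xlabel='Average Arrivals (in Millions)', ylabel='')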

plt.show()

The graph above shows that North America has by far the highest average annual arrivals. Europe & Central Asia comes second, but North America leads it by a factor of about three.

The lowest averages belong to Sub-Saharan Africa, followed by South Asia.
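The region query uses `AVG` rather than `SUM`, so each bar is an average per country (and year), not the region's total arrivals; a region with a few very large destinations can therefore outrank one made up of many mid-sized destinations. The toy example below, with made-up numbers, shows how the two measures can rank regions in opposite orders.

```python
import pandas as pd

# Made-up per-country averages: region 'X' has one very large destination,
# region 'Y' has ten mid-sized ones.
countries = pd.DataFrame({
    'Region': ['X'] + ['Y'] * 10,
    'avg_arrivals': [90e6] + [10e6] * 10,
})

# mean = average country (what AVG measures); sum = whole-region total
by_region = countries.groupby('Region')['avg_arrivals'].agg(['mean', 'sum'])
print(by_region / 1e6)  # in millions: X leads on the mean, Y leads on the sum
```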

In [None]:
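# Convert to millions so the bar lengths match the axis label
df_regions['Departures_Millions'] = df_regions['Average_Yearly_Departures'] / 1e6
region_departures = sns.barplot(data=df_regions, x='Departures_Millions', y='Region', hue='Region')
region_departures.set(title='Average Annual International Tourist Departures by Region', xlabel='Average Departures (in Millions)', ylabel='')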

plt.show()

Based on the graph above, the pattern we saw in average annual arrivals by region largely holds for departures as well. One notable difference is that Sub-Saharan Africa records very few departures, meaning relatively few of its residents travel abroad.
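One way to put a number on whether the arrivals pattern carries over to departures is to compare how the regions rank on both measures (a Spearman rank correlation, computed here as the Pearson correlation of the ranks) and to look at departures per arrival. The sketch uses made-up figures under the same column names as `df_regions`, so the same expressions should apply to the real frame; `Departures_per_Arrival` is a hypothetical column name.

```python
import pandas as pd

# Made-up figures standing in for df_regions (same column names)
df = pd.DataFrame({
    'Region': ['R1', 'R2', 'R3', 'R4'],
    'Average_Yearly_Arrivals': [30e6, 10e6, 4e6, 1e6],
    'Average_Yearly_Departures': [25e6, 12e6, 2e6, 0.3e6],
})

# Spearman rank correlation = Pearson correlation of the ranks;
# 1.0 means both measures order the regions identically
rho = df['Average_Yearly_Arrivals'].rank().corr(df['Average_Yearly_Departures'].rank())

# Outbound trips per inbound arrival: values well below 1 mean residents
# travel abroad relatively little compared with the visitors received
df['Departures_per_Arrival'] = df['Average_Yearly_Departures'] / df['Average_Yearly_Arrivals']

print(round(rho, 3))
print(df[['Region', 'Departures_per_Arrival']])
```

With only a handful of regions, the rank correlation is a coarse summary; the per-region ratio is often the more informative of the two.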

Let's increase the resolution to the country level.

In [9]:
query = '''
    SELECT
        a.country_code AS country_code,
        c.country AS country,
        FLOOR(AVG(a.arrivals)) AS average_arrivals,
        FLOOR(AVG(d.departures)) AS average_departures
    FROM arrivals a
    JOIN country c ON c.country_code = a.country_code
    JOIN departures d ON d.country_code = a.country_code
    -- No NULL filter: countries without reported figures keep NULL averages (NaN in pandas)
    GROUP BY c.country, a.country_code
    ORDER BY c.country;
    '''

df_countries = pd.read_sql(query, connection)

#df_countries = df_countries.astype({'average_arrivals': 'int', 'average_departures': 'int'})

df_countries

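|     | country_code | country | average_arrivals | average_departures |
|-----|--------------|---------|------------------|--------------------|
| 0 | AFG | Afghanistan | NaN | NaN |
| 1 | ALB | Albania | 2094769.0 | 3424800.0 |
| 2 | DZA | Algeria | 1569307.0 | 2099461.0 |
| 3 | ASM | American Samoa | 25439.0 | NaN |
| 4 | AND | Andorra | 9276318.0 | NaN |
| ... | ... | ... | ... | ... |
| 212 | VIR | Virgin Islands (U.S.) | 2600269.0 | NaN |
| 213 | PSE | West Bank and Gaza | 318040.0 | NaN |
| 214 | YEM | Yemen, Rep. | 1071833.0 | NaN |
| 215 | ZMB | Zambia | 692846.0 | NaN |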

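In SQLite, a single-quoted name such as `'average_arrivals'` inside an expression is a string literal, not a column reference, so a condition like `WHERE 'average_arrivals' IS NOT NULL` is always true and filters nothing. A `WHERE` clause also runs before grouping, so it cannot test an aggregate; conditions on `AVG()` belong in `HAVING`. The sketch below demonstrates both points on a tiny in-memory table with made-up values. If we do want to drop countries without data, a clause such as `HAVING AVG(a.arrivals) IS NOT NULL` would do it, at the cost of removing those countries from the map.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE arrivals (country_code TEXT, arrivals REAL);
    INSERT INTO arrivals VALUES ('AFG', NULL), ('AFG', NULL),
                                ('ALB', 2000000), ('ALB', 2200000);
""")

# 'average_arrivals' is a string literal here, so the test is always true
literal_filter = conn.execute("""
    SELECT country_code, AVG(arrivals) AS average_arrivals
    FROM arrivals
    WHERE 'average_arrivals' IS NOT NULL
    GROUP BY country_code
    ORDER BY country_code
""").fetchall()

# HAVING is evaluated after GROUP BY, so it can test the aggregate itself
having_filter = conn.execute("""
    SELECT country_code, AVG(arrivals) AS average_arrivals
    FROM arrivals
    GROUP BY country_code
    HAVING AVG(arrivals) IS NOT NULL
    ORDER BY country_code
""").fetchall()

print(literal_filter)  # [('AFG', None), ('ALB', 2100000.0)]
print(having_filter)   # [('ALB', 2100000.0)]
```

Repeating `AVG(arrivals)` in `HAVING` rather than referring to the alias keeps the query portable; SQLite happens to accept the alias there too, but not every database does.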

In [10]:
fig = px.choropleth(df_countries, locations='country_code', color='average_arrivals', hover_name='country', title='Average Annual Arrivals by Country')
fig.show()
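The arrivals figures in the table above span several orders of magnitude (about 25 thousand for American Samoa versus about 9.3 million for Andorra), so on a linear color scale a handful of destinations can take up most of the range. One common option, sketched here with a few rows copied from that output, is to color by the base-10 logarithm instead; `log10_arrivals` is a hypothetical column name.

```python
import numpy as np
import pandas as pd

# A few rows copied from the df_countries output above
df = pd.DataFrame({
    'country_code': ['ALB', 'ASM', 'AND', 'AFG'],
    'average_arrivals': [2094769.0, 25439.0, 9276318.0, np.nan],
})

# log10 maps 1 thousand -> 3, 1 million -> 6; missing values stay NaN
df['log10_arrivals'] = np.log10(df['average_arrivals'])
print(df)
```

Adding the same column to `df_countries` and passing `color='log10_arrivals'` to `px.choropleth` would then shade countries by order of magnitude, with the color bar reading in powers of ten (6 is roughly one million arrivals).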