# Exploratory analysis of reliance on tourism

Data source: https://databank.worldbank.org/source/world-development-indicators

Indicators used:

- International tourism, number of arrivals (ST.INT.ARVL): International inbound tourists (overnight visitors) are the number of tourists who travel to a country other than that in which they have their usual residence, but outside their usual environment, for a period not exceeding 12 months and whose main purpose in visiting is other than an activity remunerated from within the country visited. When data on number of tourists are not available, the number of visitors, which includes tourists, same-day visitors, cruise passengers, and crew members, is shown instead. Sources and collection methods for arrivals differ across countries. In some cases data are from border statistics (police, immigration, and the like) and supplemented by border surveys. In other cases data are from tourism accommodation establishments. For some countries number of arrivals is limited to arrivals by air and for others to arrivals staying in hotels. Some countries include arrivals of nationals residing abroad while others do not. Caution should thus be used in comparing arrivals across countries. The data on inbound tourists refer to the number of arrivals, not to the number of people traveling. Thus a person who makes several trips to a country during a given period is counted each time as a new arrival.
- Population, total (SP.POP.TOTL): Total population is based on the de facto definition of population, which counts all residents regardless of legal status or citizenship. The values shown are midyear estimates.

No data is available for 2020, but data for 2019 is. 2019 is also the last full year before the COVID-19 pandemic.

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from io import BytesIO
from zipfile import ZipFile
import requests
import xml.etree.ElementTree as et

In [6]:
def get_WB_indicator(indicator, year):
    """Download one World Bank indicator and return the rows for a single year."""
    url_base = 'https://api.worldbank.org/v2/en/indicator/'
    # The download is a zip archive; the first file in it is parsed as XML.
    response = requests.get(url_base + indicator + '?downloadformat=xml')
    response.raise_for_status()
    zf = ZipFile(BytesIO(response.content), 'r')
    data = zf.read(zf.namelist()[0])
    dataroot = et.fromstring(data)
    records = []
    for m in dataroot[0]:
        if m.tag == 'record':
            d = {}
            for x in m:
                d[x.attrib['name']] = x.text
                if x.attrib['name'] == 'Country or Area':
                    # The three-letter code is stored in the field's 'key' attribute.
                    d['Code'] = x.attrib['key']
            if int(d['Year']) == year:
                records.append(d)
    # DataFrame.append was removed in pandas 2.0, so build the frame in one step.
    df = pd.DataFrame(records, columns=['Country or Area', 'Item', 'Year', 'Value', 'Code'])
    df = df.rename(columns={'Value': indicator}).drop(['Item'], axis=1)
    return df

df1=get_WB_indicator('ST.INT.ARVL', 2019).set_index('Country or Area')
df2=get_WB_indicator('SP.POP.TOTL', 2019).set_index('Country or Area')
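
The parsing step can be tried without a network connection. The following is a minimal sketch, assuming the downloaded XML has the `Root > data > record > field` layout that the function above implies. The sample document is hand-written for illustration: the 2019 values are copied from the population column shown later in this notebook, and the empty 2018 value is only a placeholder to exercise the year filter. The helper name `parse_records` is hypothetical.

```python
import xml.etree.ElementTree as et
import pandas as pd

# Hand-written sample in the layout the parser expects (an assumption).
SAMPLE_XML = """<Root>
  <data>
    <record>
      <field name="Country or Area" key="ABW">Aruba</field>
      <field name="Item" key="SP.POP.TOTL">Population, total</field>
      <field name="Year">2019</field>
      <field name="Value">106310</field>
    </record>
    <record>
      <field name="Country or Area" key="ABW">Aruba</field>
      <field name="Item" key="SP.POP.TOTL">Population, total</field>
      <field name="Year">2018</field>
      <field name="Value" />
    </record>
    <record>
      <field name="Country or Area" key="AND">Andorra</field>
      <field name="Item" key="SP.POP.TOTL">Population, total</field>
      <field name="Year">2019</field>
      <field name="Value">77146</field>
    </record>
  </data>
</Root>"""


def parse_records(xml_text, year):
    """Return the records of one year as a DataFrame (hypothetical helper)."""
    root = et.fromstring(xml_text)
    rows = []
    for record in root.iter('record'):
        row = {}
        for field in record:
            row[field.attrib['name']] = field.text
            if field.attrib['name'] == 'Country or Area':
                # Keep the code from the 'key' attribute next to the name.
                row['Code'] = field.attrib['key']
        if int(row['Year']) == year:
            rows.append(row)
    return pd.DataFrame(rows, columns=['Country or Area', 'Code', 'Year', 'Value'])


sample_df = parse_records(SAMPLE_XML, 2019)
print(sample_df)
```

Collecting plain dictionaries in a list and building the DataFrame once avoids growing a DataFrame row by row.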

In [7]:
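# Align the two indicators on country name; the shared Year and Code columns appear twice.
df = pd.concat([df1, df2], axis=1)
# The values are parsed as strings, so cast them to float before dividing.
df['ST.INT.ARVL per population'] = df['ST.INT.ARVL'].astype(float) / df['SP.POP.TOTL'].astype(float)
df.sort_values('ST.INT.ARVL per population', ascending=False).head(20)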

Country or Area,Year,ST.INT.ARVL,Code,Year,SP.POP.TOTL,Code,ST.INT.ARVL per population
Andorra,2019,8235000.0,AND,2019,77146,AND,106.745651
"Macao SAR, China",2019,39406000.0,MAC,2019,640446,MAC,61.528997
San Marino,2019,1904000.0,SMR,2019,33864,SMR,56.2249
Sint Maarten (Dutch part),2019,1952000.0,SXM,2019,40733,SXM,47.921832
Turks and Caicos Islands,2019,1599000.0,TCA,2019,38194,TCA,41.865214
Cayman Islands,2019,2334000.0,CYM,2019,64948,CYM,35.936441
St. Kitts and Nevis,2019,1107000.0,KNA,2019,52834,KNA,20.952417
Virgin Islands (U.S.),2019,2074000.0,VIR,2019,106669,VIR,19.443325
"Bahamas, The",2019,7250000.0,BHS,2019,389486,BHS,18.614276
Aruba,2019,1951000.0,ABW,2019,106310,ABW,18.351989
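
The table above carries `Year` and `Code` twice because the two frames are concatenated side by side. One possible alternative, shown here only as a sketch on a few rows copied from this notebook's output, is to merge on `Code` and convert the string values with `pd.to_numeric`. The frame names `arrivals`, `population` and `merged`, and the column name `arrivals_per_capita`, are illustrative assumptions, not part of the original analysis.

```python
import pandas as pd

# A few rows copied from the notebook output; values are strings, as parsed from XML.
arrivals = pd.DataFrame({
    'Country or Area': ['Andorra', 'San Marino', 'Central African Republic'],
    'Code': ['AND', 'SMR', 'CAF'],
    'ST.INT.ARVL': ['8235000.0', '1904000.0', None],
})
population = pd.DataFrame({
    'Code': ['AND', 'SMR', 'CAF'],
    'SP.POP.TOTL': ['77146', '33864', '4745179'],
})

# Merge on the code so that each key column appears only once.
merged = arrivals.merge(population, on='Code', how='outer', validate='one_to_one')

# Convert to numbers; missing or unparseable values become NaN.
for col in ['ST.INT.ARVL', 'SP.POP.TOTL']:
    merged[col] = pd.to_numeric(merged[col], errors='coerce')

merged['arrivals_per_capita'] = merged['ST.INT.ARVL'] / merged['SP.POP.TOTL']
ranked = merged.sort_values('arrivals_per_capita', ascending=False, na_position='last')
print(ranked)
```

The `validate='one_to_one'` argument makes the merge fail if a code occurs more than once on either side, which guards against silently duplicated rows.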


In [8]:
df.describe()

Statistic,ST.INT.ARVL per population
count,213.0
mean,3.383991
std,10.716871
min,0.001981
25%,0.139905
50%,0.544401
75%,1.799374
max,106.745651


In [9]:
print('Missing values:',df['ST.INT.ARVL per population'].isna().sum())

Missing values: 53
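
A missing ratio can come from a missing numerator, a missing denominator, or both. The sketch below counts missing values per input column on three rows copied from this notebook's output, so it illustrates the check rather than reproducing the 53 missing values reported above. The names `sample` and `ratio` are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Three rows copied from the notebook output; NaN marks a missing arrivals value.
sample = pd.DataFrame(
    {
        'ST.INT.ARVL': [14797000.0, np.nan, np.nan],
        'SP.POP.TOTL': [58558267, 4745179, 446911598],
    },
    index=['South Africa', 'Central African Republic', 'Africa Western and Central'],
)
sample['ratio'] = sample['ST.INT.ARVL'] / sample['SP.POP.TOTL']

# Missing values per column show which input causes the gaps in the ratio.
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Names of the rows for which the ratio could not be computed.
missing_rows = sample[sample['ratio'].isna()].index.tolist()
print(missing_rows)

# count() skips NaN, so count plus missing equals the number of rows.
assert sample['ratio'].count() + sample['ratio'].isna().sum() == len(sample)
```

In this sample the population column is complete, so every missing ratio traces back to missing arrivals data.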


In [10]:
df[df.index.str.contains('Africa')]

Country or Area,Year,ST.INT.ARVL,Code,Year,SP.POP.TOTL,Code,ST.INT.ARVL per population
Africa Eastern and Southern,2019,39826701.4025488,AFE,2019,660046272,AFE,0.060339
Africa Western and Central,2019,,AFW,2019,446911598,AFW,
Central African Republic,2019,,CAF,2019,4745179,CAF,
Middle East & North Africa,2019,127987047.716571,MEA,2019,456709496,MEA,0.280237
Middle East & North Africa (excluding high income),2019,67608256.1787089,MNA,2019,389457075,MNA,0.173596
Sub-Saharan Africa (excluding high income),2019,54779421.168165,SSA,2019,1106860245,SSA,0.049491
Sub-Saharan Africa,2019,55251253.9050801,SSF,2019,1106957870,SSF,0.049913
Middle East & North Africa (IDA & IBRD countries),2019,66910597.2929825,TMN,2019,384771769,TMN,0.173897
Sub-Saharan Africa (IDA & IBRD countries),2019,55251253.9050801,TSS,2019,1106957870,TSS,0.049913
South Africa,2019,14797000.0,ZAF,2019,58558267,ZAF,0.252688


In [11]:
df[df.index.str.contains('Europe')]

Country or Area,Year,ST.INT.ARVL,Code,Year,SP.POP.TOTL,Code,ST.INT.ARVL per population
Central Europe and the Baltics,2019,314638365.070248,CEB,2019,102398537,CEB,3.072684
Europe & Central Asia (excluding high income),2019,180724280.37354,ECA,2019,418760880,ECA,0.431569
Europe & Central Asia,2019,1183445651.59356,ECS,2019,920809471,ECS,1.285223
European Union,2019,966435420.872104,EUU,2019,447196538,EUU,2.161098
Europe & Central Asia (IDA & IBRD countries),2019,331270663.506501,TEC,2019,460791608,TEC,0.718916


In [12]:
df[df.index.str.contains('America')]

Country or Area,Year,ST.INT.ARVL,Code,Year,SP.POP.TOTL,Code,ST.INT.ARVL per population
American Samoa,2019,52700.0007629395,ASM,2019,55312,ASM,0.952777
Latin America & Caribbean (excluding high income),2019,165231233.608795,LAC,2019,589503742,LAC,0.280289
Latin America & Caribbean,2019,201852567.05797,LCN,2019,646430786,LCN,0.312257
North America,2019,199244000.0,NAC,2019,365987250,NAC,0.544401
Latin America & the Caribbean (IDA & IBRD countries),2019,172906908.81459,TLA,2019,630644771,TLA,0.274175


In [13]:
df[df.index.str.contains('Asia')]

Country or Area,Year,ST.INT.ARVL,Code,Year,SP.POP.TOTL,Code,ST.INT.ARVL per population
East Asia & Pacific (excluding high income),2019,292914980.541575,EAP,2019,2093675075,EAP,0.139905
East Asia & Pacific,2019,487076537.178296,EAS,2019,2340673749,EAS,0.208092
Europe & Central Asia (excluding high income),2019,180724280.37354,ECA,2019,418760880,ECA,0.431569
Europe & Central Asia,2019,1183445651.59356,ECS,2019,920809471,ECS,1.285223
South Asia,2019,26260132.181425,SAS,2019,1835776769,SAS,0.014305
East Asia & Pacific (IDA & IBRD countries),2019,289303382.62936,TEA,2019,2067982370,TEA,0.139896
Europe & Central Asia (IDA & IBRD countries),2019,331270663.506501,TEC,2019,460791608,TEC,0.718916
South Asia (IDA & IBRD),2019,26260132.181425,TSA,2019,1835776769,TSA,0.014305
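
The substring filters above return regional aggregates together with individual countries whose names happen to match, such as South Africa or American Samoa. A minimal sketch of separating the two groups by code follows. The set `AGGREGATE_CODES` is a hand-picked, incomplete assumption based only on rows visible in this notebook, and the column name `ratio` is illustrative.

```python
import pandas as pd

# Hand-picked aggregate codes seen in the tables above (assumed, incomplete).
AGGREGATE_CODES = {'AFE', 'AFW', 'SSA', 'SSF', 'TSS', 'NAC', 'LCN', 'SAS', 'TSA'}

# Rows copied from the notebook output.
sample = pd.DataFrame({
    'Country or Area': [
        'Africa Eastern and Southern',
        'Sub-Saharan Africa',
        'South Africa',
        'American Samoa',
        'North America',
    ],
    'Code': ['AFE', 'SSF', 'ZAF', 'ASM', 'NAC'],
    'ratio': [0.060339, 0.049913, 0.252688, 0.952777, 0.544401],
})

# Split the frame into regional aggregates and individual countries.
is_aggregate = sample['Code'].isin(AGGREGATE_CODES)
regions = sample[is_aggregate]
countries = sample[~is_aggregate]

print(regions['Country or Area'].tolist())
print(countries['Country or Area'].tolist())
```

Keeping aggregates out of the country-level frame also prevents them from influencing summary statistics such as the median reported by `describe()`.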
