### Introduction

We have a MySQL database named `cities`, which contains a table called `cities` with data on the world's most popular tourist destinations. This table includes:
- City names
- Country
- Continent
- Geographical locations (latitude and longitude)
- Human Development Index (HDI)
- Primary religion
- Language spoken


These attributes allow us to analyze and compare cities based on cultural, economic, and geographical similarities.


In [None]:
import pandas as pd

# URL to the CSV file on GitHub containing the data
url = "https://raw.githubusercontent.com/tomerud/EstiMate/main/most_touristic_cities.csv"

# Load the data into a DataFrame
cities_df = pd.read_csv(url)

# Display the first few rows to understand the structure
cities_df.head()


Unnamed: 0,city,country,continent,language,religion,human_development_index,latitude,longitude
0,Abu Dhabi,UAE,Asia,Arabic,Islam,0.89,24.4539,54.3773
1,Accra,Ghana,Africa,English,Christianity,0.611,5.6037,-0.187
2,Almaty,Kazakhstan,Asia,Kazakh/Russian,Islam,0.8,43.222,76.8512
3,Amman,Jordan,Asia,Arabic,Islam,0.72,31.9454,35.9284
4,Amsterdam,Netherlands,Europe,Dutch,Christianity,0.944,52.3676,4.9041


Let's take a look at our locations on a map:

In [8]:
from google.colab import files
uploaded = files.upload()

Saving touristic_locations.csv to touristic_locations.csv


In [9]:
import pandas as pd

# Load the uploaded CSV file into a DataFrame
locations_df = pd.read_csv("touristic_locations.csv")

# Optional: Display the first few rows to verify
locations_df.head()


Unnamed: 0,city,country,human_development_index,religion,language,latitude,longitude
0,Abu Dhabi,UAE,0.89,Islam,Arabic,24.4539,54.3773
1,Accra,Ghana,0.611,Christianity,English,5.6037,-0.187
2,Almaty,Kazakhstan,0.8,Islam,Kazakh/Russian,43.222,76.8512
3,Amman,Jordan,0.72,Islam,Arabic,31.9454,35.9284
4,Amsterdam,Netherlands,0.944,Christianity,Dutch,52.3676,4.9041


In [10]:
import plotly.express as px

# Create an interactive scatter plot map with Plotly
fig = px.scatter_geo(
    locations_df,
    lat='latitude',
    lon='longitude',
    hover_name='city',
    hover_data={
        "country": True,  # Show country in hover data
        "human_development_index": True,
        "religion": True,
        "language": True,
        "latitude": False,  # Hide in hover data
        "longitude": False  # Hide in hover data
    },
    title="Interactive Map of All Touristic Locations by City",
)

# Customize marker size, color, and opacity
fig.update_traces(marker=dict(size=5, color="blue", opacity=0.7))

# Use a minimalistic map style
fig.update_geos(
    projection_type="natural earth",
    showland=True,
    landcolor="whitesmoke",
    oceancolor="lightblue",
    showocean=True,
    showlakes=False,
    showcountries=True,
    countrycolor="lightgray",
    coastlinecolor="lightgray"
)

# Update layout for a minimalistic look
fig.update_layout(
    title="Interactive Map of All Touristic Locations by City",
    font=dict(size=12),
    margin={"r": 0, "t": 30, "l": 0, "b": 0},
    showlegend=False
)

# Display the map
fig.show()


## Step 2: Interactive Visualization of Cities by PCA Components

To better understand the similarities between cities, we'll apply Principal Component Analysis (PCA) to reduce the data's dimensionality. PCA combines several attributes into two main components, allowing us to visualize cities in a 2D space.

Each point in the scatter plot represents a city, and its color reflects the city's Human Development Index (HDI). Cities that land close together have similar values across the combined attributes, so the interactive plot lets us explore relationships between cities visually.
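
As a minimal sketch of what this reduction does, the snippet below runs PCA on only the numeric columns of the five sample rows shown in the table output earlier and reports how much of the variance each component retains. The names `sample_df`, `numeric_cols`, and `pca_check` are illustrative and not part of the notebook. With only five rows the numbers carry no real meaning, so this demonstrates the mechanics only.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# The five sample rows from the table output above (numeric columns only)
sample_df = pd.DataFrame({
    "city": ["Abu Dhabi", "Accra", "Almaty", "Amman", "Amsterdam"],
    "human_development_index": [0.89, 0.611, 0.8, 0.72, 0.944],
    "latitude": [24.4539, 5.6037, 43.222, 31.9454, 52.3676],
    "longitude": [54.3773, -0.187, 76.8512, 35.9284, 4.9041],
})
numeric_cols = ["human_development_index", "latitude", "longitude"]

# Standardize so that no column dominates just because of its units
scaled = StandardScaler().fit_transform(sample_df[numeric_cols])

# Fit a two-component PCA and inspect what it kept
pca_check = PCA(n_components=2).fit(scaled)
explained = pca_check.explained_variance_ratio_
total_explained = explained.sum()

# Loadings: how strongly each original column contributes to each component
loadings = pd.DataFrame(
    pca_check.components_, columns=numeric_cols, index=["PCA1", "PCA2"]
)

print("Variance explained per component:", explained)
print("Total variance explained:", total_explained)
print(loadings)
```

A low total here would mean the 2D plot hides much of the structure in the data, so the same check is worth running on the full dataset before reading too much into the scatter plot.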


In [None]:
import numpy as np
import plotly.express as px
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.decomposition import PCA

# Encode categorical variables (religion and language) as integer codes.
# Column names are lowercase, matching the CSV header shown above.
encoder = LabelEncoder()
cities_df['religion_encoded'] = encoder.fit_transform(cities_df['religion'])
cities_df['language_encoded'] = encoder.fit_transform(cities_df['language'])

# Standardize data for PCA
features = ['human_development_index', 'latitude', 'longitude', 'religion_encoded', 'language_encoded']
scaler = StandardScaler()
scaled_features = scaler.fit_transform(cities_df[features])

# Dimensionality Reduction with PCA
pca = PCA(n_components=2)
cities_df[['PCA1', 'PCA2']] = pca.fit_transform(scaled_features)

# Interactive Plotting with Plotly, displaying city names with adjustments
fig = px.scatter(
    cities_df,
    x='PCA1',
    y='PCA2',
    color='human_development_index',
    text='city',  # Display city names
    title="Interactive Visualization of Cities by PCA Components",
    labels={'PCA1': 'PCA Component 1', 'PCA2': 'PCA Component 2'},
    color_continuous_scale=px.colors.sequential.Viridis
)

# Update the layout with a larger figure size and adjusted text position
fig.update_traces(
    textposition='top right',  # Position labels to the top right of each point to reduce overlap
    marker=dict(size=8, opacity=0.8)  # Smaller markers with slightly reduced opacity
)
fig.update_layout(
    coloraxis_colorbar=dict(title="Development Ranking (HDI)"),
    width=1400,  # Significantly wider plot
    height=1000  # Taller plot
)

# Display the interactive plot
fig.show()
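
`LabelEncoder` maps each category to an arbitrary integer, so after scaling, PCA treats religions and languages as if they were ordered numbers. One common alternative, shown here as a sketch and not as this notebook's method, is one-hot encoding with `pd.get_dummies`, which gives each category its own 0/1 column. The snippet uses the five sample rows from the table output; `sample_df` and `encoded` are illustrative names.

```python
import pandas as pd

# The five sample rows from the table output (categorical columns only)
sample_df = pd.DataFrame({
    "city": ["Abu Dhabi", "Accra", "Almaty", "Amman", "Amsterdam"],
    "religion": ["Islam", "Christianity", "Islam", "Islam", "Christianity"],
    "language": ["Arabic", "English", "Kazakh/Russian", "Arabic", "Dutch"],
})

# One 0/1 column per category, so no artificial ordering is introduced
encoded = pd.get_dummies(sample_df, columns=["religion", "language"], dtype=float)

print(encoded.columns.tolist())
print(encoded[["city", "religion_Islam", "language_Arabic"]])
```

The trade-off is width: a column with many distinct languages produces many indicator columns, which changes how much weight the categorical attributes carry in the PCA.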

