# **Exploratory Data Analysis of World Development Indicators**


<img src="https://images.pexels.com/photos/41949/earth-earth-at-night-night-lights-41949.jpeg" alt="Alternative text" />

**What is Exploratory Data Analysis?**

Exploratory data analysis (EDA) is a crucial step in data science. It involves examining a dataset to understand its underlying relationships, patterns, and characteristics. By exploring the data, analysts learn how the variables are distributed and correlated, and they spot outliers, trends, and missing values. EDA frequently relies on visualisations such as histograms, scatter plots, and box plots, and on statistical calculations such as summary statistics and correlation coefficients that quantify the relationships between variables. The results help analysts find anomalies, check hypotheses, choose suitable data transformations, and decide what further analysis or modelling is worthwhile.

## **Outline of Project**

* **Select and download real-world dataset**
* **Import and Install all the libraries**
* **Perform data preparation & cleaning**
* **Ask & answer questions about the data**
* **Perform exploratory analysis & visualization**
* **Summarize your inferences & write a conclusion**

## **Select and download real-world dataset**

This dataset is available on Kaggle. The World Development Indicators (WDI) dataset is a comprehensive collection of development indicators compiled by the World Bank. It provides access to a wide range of global development data across sectors and countries, covering social, economic, and environmental aspects of development.

We will examine this data and draw some conclusions.

Dataset link - https://www.kaggle.com/datasets/yallabalaji/wdi-data



## Step 1 - Importing necessary Libraries and loading the data

In [None]:
import pandas as pd

In [None]:
%%capture
!pip install opendatasets
!pip install plotly
!pip install pycountry
!pip install country_converter

In [None]:
import opendatasets as od
import numpy as np
import pandas as pd
import pycountry
import country_converter as coco
import plotly.express as px
import plotly.graph_objs as go

In [None]:
od.download("https://www.kaggle.com/datasets/yallabalaji/wdi-data")

Please provide your Kaggle credentials to download this dataset. Learn more: http://bit.ly/kaggle-creds
Your Kaggle username: yallabalaji
Your Kaggle Key: ··········
Downloading wdi-data.zip to ./wdi-data


100%|██████████| 61.5M/61.5M [00:01<00:00, 44.5MB/s]





In [None]:
df = pd.read_csv('wdi-data/WDIData.csv')

In [None]:
df.shape

(383572, 67)

Let's check what is inside the dataset.

In [None]:
df['Indicator Name'].nunique()

1442

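The dataset contains 1,442 distinct indicators.

Each indicator is reported for 266 entities (countries plus regional and income-group aggregates), so the table has 266 × 1,442 = 383,572 rows, which matches the shape shown above. In other words, the data can be split into 1,442 separate data frames, one per indicator, each of which can be analysed on its own.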
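
The row count can be sanity-checked by multiplying the number of distinct entities by the number of distinct indicators. The sketch below does this on a small made-up frame (the codes `AAA`, `X.1` and all values are placeholders, not WDI data), assuming the same layout of one row per entity and indicator.

```python
import pandas as pd

# Toy stand-in for WDIData.csv: 3 entities x 2 indicators (hypothetical values).
toy = pd.DataFrame({
    "Country Code": ["AAA", "AAA", "BBB", "BBB", "CCC", "CCC"],
    "Indicator Code": ["X.1", "X.2"] * 3,
    "2021": [1.0, 2.0, 3.0, None, 5.0, 6.0],
})

n_entities = toy["Country Code"].nunique()
n_indicators = toy["Indicator Code"].nunique()

# True when every entity has exactly one row per indicator.
is_full_grid = len(toy) == n_entities * n_indicators
print(n_entities, n_indicators, is_full_grid)
```

On the real `df`, the same two `nunique()` calls should return 266 and 1442 if the file is a complete grid.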

##### Note:

Even though the dataset has roughly 380,000 rows, this analysis targets 8 meaningful insights. Grouping the data by indicator and dropping the irrelevant rows therefore leaves a much smaller set of rows to work with.

## Step 2 - Data Cleaning and PreProcessing

In [None]:
pd.set_option('display.max_colwidth', None)
df[["Indicator Name", "Indicator Code"]].head(10)

Unnamed: 0,Indicator Name,Indicator Code
0,Access to clean fuels and technologies for cooking (% of population),EG.CFT.ACCS.ZS
1,"Access to clean fuels and technologies for cooking, rural (% of rural population)",EG.CFT.ACCS.RU.ZS
2,"Access to clean fuels and technologies for cooking, urban (% of urban population)",EG.CFT.ACCS.UR.ZS
3,Access to electricity (% of population),EG.ELC.ACCS.ZS
4,"Access to electricity, rural (% of rural population)",EG.ELC.ACCS.RU.ZS
5,"Access to electricity, urban (% of urban population)",EG.ELC.ACCS.UR.ZS
6,Account ownership at a financial institution or with a mobile-money-service provider (% of population ages 15+),FX.OWN.TOTL.ZS
7,"Account ownership at a financial institution or with a mobile-money-service provider, female (% of population ages 15+)",FX.OWN.TOTL.FE.ZS
8,"Account ownership at a financial institution or with a mobile-money-service provider, male (% of population ages 15+)",FX.OWN.TOTL.MA.ZS
9,"Account ownership at a financial institution or with a mobile-money-service provider, older adults (% of population ages 25+)",FX.OWN.TOTL.OL.ZS


### Categorizing Military DF

In [None]:
grouped_df = df.groupby('Indicator Code')

In [None]:
military_df = grouped_df.get_group('MS.MIL.XPND.GD.ZS')

In [None]:
military_df = military_df[['Country Name','Country Code','2021']]

In [None]:
military_df=military_df.rename(columns={'2021': 'Military Expenditure'})
military_df.head(10)

Unnamed: 0,Country Name,Country Code,Military Expenditure
771,Africa Eastern and Southern,AFE,1.008492
2213,Africa Western and Central,AFW,1.187561
3655,Arab World,ARB,4.590772
5097,Caribbean small states,CSS,
6539,Central Europe and the Baltics,CEB,1.892527
7981,Early-demographic dividend,EAR,2.236317
9423,East Asia & Pacific,EAS,1.653361
10865,East Asia & Pacific (excluding high income),EAP,1.642449
12307,East Asia & Pacific (IDA & IBRD countries),TEA,1.642449
13749,Euro area,EMU,1.502301


### Grouping Electric Power Indicators

In [None]:
distribution_losses_df = grouped_df.get_group('EG.ELC.LOSS.ZS')
distribution_losses_df = distribution_losses_df[['Country Name','Country Code','2014']]
distribution_losses_df = distribution_losses_df.rename(columns={'2014': 'Distribution_Losses'})
distribution_losses_df = distribution_losses_df.convert_dtypes()

In [None]:
power_gen_fossil_fuel_df = grouped_df.get_group('EG.ELC.FOSL.ZS')
power_gen_fossil_fuel_df = power_gen_fossil_fuel_df[['Country Code','2014']]
power_gen_fossil_fuel_df = power_gen_fossil_fuel_df.rename(columns={'2014': 'Fossil_Fuel_Power'})
power_gen_fossil_fuel_df = power_gen_fossil_fuel_df.convert_dtypes()


In [None]:
power_transmitted_fossil_gen_df = pd.merge(distribution_losses_df, power_gen_fossil_fuel_df, on='Country Code')

Adding a continent column to the DataFrame

In [None]:
%%capture
continents = coco.convert(names=power_transmitted_fossil_gen_df['Country Name'].tolist(), to='continent')
power_transmitted_fossil_gen_df['continent'] = continents



In [None]:
power_transmitted_fossil_gen_df['continent'] = power_transmitted_fossil_gen_df['continent'].replace('not found', 'Others')

In [None]:
power_transmitted_fossil_gen_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 266 entries, 0 to 265
Data columns (total 5 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Country Name         266 non-null    string 
 1   Country Code         266 non-null    string 
 2   Distribution_Losses  187 non-null    Float64
 3   Fossil_Fuel_Power    186 non-null    Float64
 4   continent            266 non-null    object 
dtypes: Float64(2), object(1), string(2)
memory usage: 13.0+ KB


In [None]:
Final_DF = pd.merge(military_df,power_transmitted_fossil_gen_df,on=['Country Code','Country Name'],how='outer')
Final_DF

Unnamed: 0,Country Name,Country Code,Military Expenditure,Distribution_Losses,Fossil_Fuel_Power,continent
0,Africa Eastern and Southern,AFE,1.008492,10.594766,66.606766,not found
1,Africa Western and Central,AFW,1.187561,17.903147,56.053836,not found
2,Arab World,ARB,4.590772,14.359833,85.941711,not found
3,Caribbean small states,CSS,,9.389468,,not found
4,Central Europe and the Baltics,CEB,1.892527,7.370623,56.494107,not found
...,...,...,...,...,...,...
261,Virgin Islands (U.S.),VIR,,,,America
262,West Bank and Gaza,PSE,,,,Asia
263,"Yemen, Rep.",YEM,,25.765106,100.0,Asia
264,Zambia,ZMB,1.304765,14.959867,2.836978,Africa
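
When several indicator frames are merged, it helps to check how many rows matched on both sides. The sketch below uses two made-up frames and the `indicator` and `validate` options of `pd.merge`; the column values are placeholders.

```python
import pandas as pd

left = pd.DataFrame({"Country Code": ["AAA", "BBB"],
                     "Military Expenditure": [1.0, 2.0]})
right = pd.DataFrame({"Country Code": ["BBB", "CCC"],
                      "Distribution_Losses": [10.0, 20.0]})

# indicator=True adds a `_merge` column; validate raises if keys are duplicated.
merged = pd.merge(left, right, on="Country Code", how="outer",
                  indicator=True, validate="one_to_one")

match_counts = merged["_merge"].value_counts().to_dict()
print(match_counts)
```

A large `left_only` or `right_only` count on the real data would suggest that the key columns disagree between the two frames.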


### Grouping Largest-City Population



In [None]:
urban_pop_df = grouped_df.get_group('EN.URB.LCTY')
urban_pop_df = urban_pop_df[['Country Name','Country Code','2021']]
urban_pop_df = urban_pop_df.rename(columns={'2021': 'Urban_City_Pop'})
urban_pop_df = urban_pop_df.convert_dtypes()
urban_pop_df.info()


<class 'pandas.core.frame.DataFrame'>
Int64Index: 266 entries, 1052 to 383182
Data columns (total 3 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Country Name    266 non-null    string
 1   Country Code    266 non-null    string
 2   Urban_City_Pop  153 non-null    Int64 
dtypes: Int64(1), string(2)
memory usage: 8.6 KB


In [None]:
Final_DF = pd.merge(Final_DF,urban_pop_df,on=['Country Code','Country Name'])
Final_DF.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 266 entries, 0 to 265
Data columns (total 7 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Country Name          266 non-null    object 
 1   Country Code          266 non-null    object 
 2   Military Expenditure  194 non-null    float64
 3   Distribution_Losses   187 non-null    Float64
 4   Fossil_Fuel_Power     186 non-null    Float64
 5   continent             266 non-null    object 
 6   Urban_City_Pop        153 non-null    Int64  
dtypes: Float64(2), Int64(1), float64(1), object(3)
memory usage: 17.4+ KB


### Grouping Up Trade and Trade service percentage

In [None]:
trade_df = grouped_df.get_group('NE.TRD.GNFS.ZS')
trade_df = trade_df[['Country Name','Country Code','2021']]
trade_df = trade_df.rename(columns={'2021': 'Trade percent'})
trade_df = trade_df.convert_dtypes()
trade_df.head(10)

Unnamed: 0,Country Name,Country Code,Trade percent
1357,Africa Eastern and Southern,AFE,52.31218
2799,Africa Western and Central,AFW,36.711066
4241,Arab World,ARB,60.177156
5683,Caribbean small states,CSS,
7125,Central Europe and the Baltics,CEB,127.530822
8567,Early-demographic dividend,EAR,54.403065
10009,East Asia & Pacific,EAS,56.291077
11451,East Asia & Pacific (excluding high income),EAP,45.386985
12893,East Asia & Pacific (IDA & IBRD countries),TEA,45.386442
14335,Euro area,EMU,90.262054


In [None]:
trade_service_df = grouped_df.get_group('BG.GSR.NFSV.GD.ZS')
trade_service_df = trade_service_df[['Country Code','2021']]
trade_service_df = trade_service_df.rename(columns={'2021': 'Trade Service percent'})
trade_service_df = trade_service_df.convert_dtypes()
trade_service_df.head(10)

Unnamed: 0,Country Code,Trade Service percent
1358,AFE,8.344958
2800,AFW,8.156476
4242,ARB,12.740209
5684,CSS,30.386154
7126,CEB,21.927576
8568,EAR,9.464864
10010,EAS,8.096468
11452,EAP,5.096401
12894,TEA,5.096401
14336,EMU,26.219071


In [None]:
trade_and_service_df = pd.merge(trade_df,trade_service_df,on="Country Code")
trade_and_service_df.head(10)

Unnamed: 0,Country Name,Country Code,Trade percent,Trade Service percent
0,Africa Eastern and Southern,AFE,52.31218,8.344958
1,Africa Western and Central,AFW,36.711066,8.156476
2,Arab World,ARB,60.177156,12.740209
3,Caribbean small states,CSS,,30.386154
4,Central Europe and the Baltics,CEB,127.530822,21.927576
5,Early-demographic dividend,EAR,54.403065,9.464864
6,East Asia & Pacific,EAS,56.291077,8.096468
7,East Asia & Pacific (excluding high income),EAP,45.386985,5.096401
8,East Asia & Pacific (IDA & IBRD countries),TEA,45.386442,5.096401
9,Euro area,EMU,90.262054,26.219071


In [None]:
Final_DF = pd.merge(Final_DF,trade_and_service_df,on=['Country Name','Country Code'],how = 'inner')
Final_DF.head(10)

Unnamed: 0,Country Name,Country Code,Military Expenditure,Distribution_Losses,Fossil_Fuel_Power,continent,Urban_City_Pop,Trade percent,Trade Service percent
0,Africa Eastern and Southern,AFE,1.008492,10.594766,66.606766,not found,,52.31218,8.344958
1,Africa Western and Central,AFW,1.187561,17.903147,56.053836,not found,,36.711066,8.156476
2,Arab World,ARB,4.590772,14.359833,85.941711,not found,,60.177156,12.740209
3,Caribbean small states,CSS,,9.389468,,not found,,,30.386154
4,Central Europe and the Baltics,CEB,1.892527,7.370623,56.494107,not found,,127.530822,21.927576
5,Early-demographic dividend,EAR,2.236317,15.255739,77.612396,not found,,54.403065,9.464864
6,East Asia & Pacific,EAS,1.653361,5.412341,76.009424,not found,,56.291077,8.096468
7,East Asia & Pacific (excluding high income),EAP,1.642449,5.841038,75.224003,not found,,45.386985,5.096401
8,East Asia & Pacific (IDA & IBRD countries),TEA,1.642449,5.813362,75.357006,not found,,45.386442,5.096401
9,Euro area,EMU,1.502301,6.057205,38.963659,not found,,90.262054,26.219071


### Categorizing Ease Of Doing Business

In [None]:
ease_of_doing_business_index_df = grouped_df.get_group('IC.BUS.DFRN.XQ')
# Take an explicit copy so that adding a column does not raise SettingWithCopyWarning
ease_of_doing_business_index_2019_df = ease_of_doing_business_index_df[['Country Name','2019']].copy()
ease_of_doing_business_index_2019_df['Category'] = pd.cut(ease_of_doing_business_index_2019_df['2019'], bins=[0, 20, 40, 60, 80,100], labels=['Bad','Poor', 'Average', 'Good', 'Excellent'])
ease_of_doing_business_index_2019_df = ease_of_doing_business_index_2019_df.rename(columns={'2019': 'Ease_of_Doing_Business'})
ease_of_doing_business_index_2019_df.head(10)



Unnamed: 0,Country Name,Ease_of_Doing_Business,Category
322,Africa Eastern and Southern,53.540691,Average
1764,Africa Western and Central,49.843345,Average
3206,Arab World,56.50469,Average
4648,Caribbean small states,58.217698,Average
6090,Central Europe and the Baltics,76.335055,Good
7532,Early-demographic dividend,58.128581,Average
8974,East Asia & Pacific,65.958631,Good
10416,East Asia & Pacific (excluding high income),60.01535,Good
11858,East Asia & Pacific (IDA & IBRD countries),60.01535,Good
13300,Euro area,76.078806,Good


In [None]:
Final_DF=pd.merge(Final_DF,ease_of_doing_business_index_2019_df,on=['Country Name'],how ='inner')

In [None]:
Final_DF['Distribution_Losses'] = Final_DF['Distribution_Losses'].replace(0.00, 0.01)
Final_DF['Fossil_Fuel_Power'] = Final_DF['Fossil_Fuel_Power'].replace(0.00, 0.01)
Final_DF['Trade percent'] = Final_DF['Trade percent'].replace(0.00, 0.01)
Final_DF['Trade Service percent'] = Final_DF['Trade Service percent'].replace(0.00, 0.01)

### Categorizing Mobile Subscriptions for India and Four Other Major Economies

In [None]:
mobile_df = grouped_df.get_group('IT.CEL.SETS')
mask = mobile_df['Country Name'].isin(['India', 'United States','China','Japan','United Kingdom'])
refined_mobile_df = mobile_df[mask]
refined_mobile_df = refined_mobile_df.drop(['Indicator Name','Indicator Code','Country Code'],axis=1)


In [None]:
melted_mobile_df = pd.melt(refined_mobile_df,id_vars='Country Name',var_name='Years', value_name='Subscribers')
melted_mobile_df = melted_mobile_df.dropna()
melted_mobile_df

Unnamed: 0,Country Name,Years,Subscribers
0,China,1960,0.000000e+00
1,India,1960,0.000000e+00
2,Japan,1960,0.000000e+00
3,United Kingdom,1960,0.000000e+00
4,United States,1960,0.000000e+00
...,...,...,...
305,China,2021,1.732661e+09
306,India,2021,1.154047e+09
307,Japan,2021,2.004788e+08
308,United Kingdom,2021,7.977300e+07
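
The reshaping step above turns one column per year into one row per country and year. A minimal sketch on made-up data shows the effect; the country names and values are placeholders, and the conversion of `Years` to integers is an optional extra that the notebook does not perform.

```python
import pandas as pd

# Wide layout: one column per year (made-up values).
wide = pd.DataFrame({
    "Country Name": ["A", "B"],
    "1960": [0.0, None],
    "2021": [10.0, 20.0],
})

# Long layout: one row per (country, year); rows without a value are dropped.
long_df = (
    pd.melt(wide, id_vars="Country Name", var_name="Years", value_name="Subscribers")
    .dropna()
    .assign(Years=lambda d: d["Years"].astype(int))  # year labels start out as strings
)
print(long_df)
```

Converting the year labels to integers lets plotting libraries treat the x-axis as numeric rather than categorical.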


### Categorizing Credit Information Index

In [None]:
credit_information_index_df = grouped_df.get_group('IC.CRD.INFO.XQ')
credit_information_index_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 266 entries, 302 to 382432
Data columns (total 67 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Country Name    266 non-null    object 
 1   Country Code    266 non-null    object 
 2   Indicator Name  266 non-null    object 
 3   Indicator Code  266 non-null    object 
 4   1960            0 non-null      float64
 5   1961            0 non-null      float64
 6   1962            0 non-null      float64
 7   1963            0 non-null      float64
 8   1964            0 non-null      float64
 9   1965            0 non-null      float64
 10  1966            0 non-null      float64
 11  1967            0 non-null      float64
 12  1968            0 non-null      float64
 13  1969            0 non-null      float64
 14  1970            0 non-null      float64
 15  1971            0 non-null      float64
 16  1972            0 non-null      float64
 17  1973            0 non-null    

The summary shows that this indicator has no values for the early years; the useful data falls between 2013 and 2019.
Let's keep the years 2013 to 2019 by melting the table and dropping the missing values.

In [None]:
credit_index_df = grouped_df.get_group('IC.CRD.INFO.XQ')
refined_credit_index_df = credit_index_df.drop(['Indicator Name','Indicator Code','Country Code'],axis=1)
melted_credit_index_df = pd.melt(refined_credit_index_df ,id_vars='Country Name',var_name='Years', value_name='Score')
melted_credit_index_df = melted_credit_index_df.dropna()
melted_credit_index_df

Unnamed: 0,Country Name,Years,Score
14098,Africa Eastern and Southern,2013,2.240000
14099,Africa Western and Central,2013,0.772727
14100,Arab World,2013,3.380952
14101,Caribbean small states,2013,0.461538
14102,Central Europe and the Baltics,2013,6.272727
...,...,...,...
15954,Vietnam,2019,8.000000
15956,West Bank and Gaza,2019,8.000000
15957,"Yemen, Rep.",2019,0.000000
15958,Zambia,2019,8.000000
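
The range of years that actually contain data can also be found programmatically instead of by reading the `info()` listing. The following sketch counts non-null values per year column on a made-up frame; the column names mirror the WDI layout, and the values are placeholders.

```python
import pandas as pd

nan = float("nan")
wide = pd.DataFrame({
    "Country Name": ["A", "B"],
    "2012": [nan, nan],   # no data at all
    "2013": [2.0, nan],
    "2019": [8.0, 6.0],
})

# Year columns are the ones whose label is made of digits only.
year_cols = [c for c in wide.columns if c.isdigit()]
non_null_counts = wide[year_cols].notna().sum()
years_with_data = non_null_counts[non_null_counts > 0].index.tolist()
print(years_with_data)
```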


### Categorizing GDP per capita :

In [None]:
test_df = grouped_df.get_group('NY.GDP.PCAP.CD')
test_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 266 entries, 474 to 382604
Data columns (total 67 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Country Name    266 non-null    object 
 1   Country Code    266 non-null    object 
 2   Indicator Name  266 non-null    object 
 3   Indicator Code  266 non-null    object 
 4   1960            134 non-null    float64
 5   1961            136 non-null    float64
 6   1962            138 non-null    float64
 7   1963            138 non-null    float64
 8   1964            138 non-null    float64
 9   1965            149 non-null    float64
 10  1966            152 non-null    float64
 11  1967            155 non-null    float64
 12  1968            160 non-null    float64
 13  1969            160 non-null    float64
 14  1970            169 non-null    float64
 15  1971            172 non-null    float64
 16  1972            172 non-null    float64
 17  1973            172 non-null  

In [None]:
# Rank by 2021 GDP per capita (descending)
gdp_2021_df = test_df[['Country Name','2021']]
gdp_2021_df = gdp_2021_df.sort_values('2021', ascending=False)
gdp_2021_df.head(10)

Unnamed: 0,Country Name,2021
258592,Monaco,234315.460504
238404,Luxembourg,133590.146976
101414,Bermuda,114090.328339
205238,Ireland,100172.079253
342228,Switzerland,91991.600458
283106,Norway,89154.276093
123044,Cayman Islands,86568.769637
317714,Singapore,72794.003023
368184,United States,70248.629
164862,Faroe Islands,69010.309801


Selecting Top 5 GDP per capita Countries

In [None]:
gdp_per_captia_df = grouped_df.get_group('NY.GDP.PCAP.CD')
mask = gdp_per_captia_df['Country Name'].isin(['Monaco','Luxembourg','Bermuda','Ireland','Switzerland'])
refined_gdp_per_captia_df = gdp_per_captia_df[mask]
refined_gdp_per_captia_df = refined_gdp_per_captia_df.drop(['Indicator Name','Indicator Code','Country Code'],axis=1)
melted_gdp_per_captia_df = pd.melt(refined_gdp_per_captia_df,id_vars='Country Name',var_name='Years', value_name='GDP Per Capita')
melted_gdp_per_captia_df = melted_gdp_per_captia_df.dropna()
melted_gdp_per_captia_df

Unnamed: 0,Country Name,Years,GDP Per Capita
0,Bermuda,1960,1902.402119
1,Ireland,1960,685.614712
2,Luxembourg,1960,2242.015817
4,Switzerland,1960,1787.360348
5,Bermuda,1961,1961.538169
...,...,...,...
305,Bermuda,2021,114090.328339
306,Ireland,2021,100172.079253
307,Luxembourg,2021,133590.146976
308,Monaco,2021,234315.460504
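
The five country names above are typed in by hand from the ranking. As an alternative, the list could be derived from the data with `nlargest`, as in this sketch on made-up values (the single-letter names are placeholders).

```python
import pandas as pd

toy = pd.DataFrame({
    "Country Name": list("ABCDEFG"),
    "2021": [10.0, 70.0, 30.0, float("nan"), 50.0, 20.0, 60.0],
})

# Names of the five rows with the highest 2021 value.
top5 = toy.nlargest(5, "2021")["Country Name"].tolist()

# The same mask pattern as in the notebook, but driven by the computed list.
mask = toy["Country Name"].isin(top5)
print(top5, int(mask.sum()))
```

On the real data the frame also contains regional aggregates, so those rows may need to be filtered out before ranking.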


Final_DF now holds the following information, from which we can plot multiple visualizations.

In [None]:
Final_DF.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 266 entries, 0 to 265
Data columns (total 11 columns):
 #   Column                  Non-Null Count  Dtype   
---  ------                  --------------  -----   
 0   Country Name            266 non-null    object  
 1   Country Code            266 non-null    object  
 2   Military Expenditure    194 non-null    float64 
 3   Distribution_Losses     187 non-null    Float64 
 4   Fossil_Fuel_Power       186 non-null    Float64 
 5   continent               266 non-null    object  
 6   Urban_City_Pop          153 non-null    Int64   
 7   Trade percent           214 non-null    Float64 
 8   Trade Service percent   204 non-null    Float64 
 9   Ease_of_Doing_Business  238 non-null    float64 
 10  Category                238 non-null    category
dtypes: Float64(4), Int64(1), category(1), float64(2), object(3)
memory usage: 24.6+ KB


In [None]:
Final_DF.head(10)

Unnamed: 0,Country Name,Country Code,Military Expenditure,Distribution_Losses,Fossil_Fuel_Power,continent,Urban_City_Pop,Trade percent,Trade Service percent,Ease_of_Doing_Business,Category
0,Africa Eastern and Southern,AFE,1.008492,10.594766,66.606766,not found,,52.31218,8.344958,53.540691,Average
1,Africa Western and Central,AFW,1.187561,17.903147,56.053836,not found,,36.711066,8.156476,49.843345,Average
2,Arab World,ARB,4.590772,14.359833,85.941711,not found,,60.177156,12.740209,56.50469,Average
3,Caribbean small states,CSS,,9.389468,,not found,,,30.386154,58.217698,Average
4,Central Europe and the Baltics,CEB,1.892527,7.370623,56.494107,not found,,127.530822,21.927576,76.335055,Good
5,Early-demographic dividend,EAR,2.236317,15.255739,77.612396,not found,,54.403065,9.464864,58.128581,Average
6,East Asia & Pacific,EAS,1.653361,5.412341,76.009424,not found,,56.291077,8.096468,65.958631,Good
7,East Asia & Pacific (excluding high income),EAP,1.642449,5.841038,75.224003,not found,,45.386985,5.096401,60.01535,Good
8,East Asia & Pacific (IDA & IBRD countries),TEA,1.642449,5.813362,75.357006,not found,,45.386442,5.096401,60.01535,Good
9,Euro area,EMU,1.502301,6.057205,38.963659,not found,,90.262054,26.219071,76.078806,Good


## Step 3 - Exploring and Uncovering Insights: Posing Questions and Finding Answers (Q/A)

We've already learned a lot about the WDI data by looking at individual columns in the dataset. Let's put a few specific questions to the test and see how data frame operations and visualisations can assist us in responding.

1. How does a country's military expenditure as a percentage of GDP compare with that of its neighbors and other countries in its region?
2. Which countries have high distribution losses relative to the power they generate?
3. Which 10 countries have the most populous largest cities?
4. How important is trade as a contributor to GDP around the world?
5. How rapidly have cellular subscriptions grown over the past 60 years in selected countries?
6. How did the Credit Information Index change worldwide from 2013 to 2019, and what was the median score in 2019?
7. What share of countries perform poorly on ease of doing business?
8. How has GDP per capita grown over the past 60 years for the top countries, and which country has the highest GDP per capita in the world?

### 1. How does a country's military expenditure as a percentage of GDP compare with that of its neighbors and other countries in its region?

In [None]:
fig = px.choropleth(Final_DF,
                    locations='Country Code',
                    locationmode='ISO-3',
                    color='Military Expenditure',
                    hover_name='Country Code',
                    projection='natural earth',
                    color_continuous_scale='Greens'
                    )
fig.update_layout(title_text='The Cost of Security: A Choropleth Map of Military Expenditure as a Percentage of GDP')
fig.update_traces(colorscale='RdYlGn', reversescale=True)

# Show the map
fig.show()

Military spending (% of GDP) is the amount a country spends on its military as a percentage of its Gross Domestic Product (GDP). This indicator shows the size of a country's military budget relative to its overall economic output. High military spending as a percentage of GDP could indicate that the country is prioritizing the military, perhaps at the expense of other government spending priorities such as health care and education. However, it may also reflect a country's need to protect itself from security threats or to participate in peacekeeping operations. Conversely, low military spending as a percentage of GDP may indicate that the country is focusing on other priorities such as economic development, but it may also indicate a lack of investment in military capabilities and readiness. As such, military spending (% of GDP) provides useful insight into a country's priorities and its potential strengths and weaknesses.

From the chart we can summarize the following points:

1. Among all countries, Oman spends the largest share of its GDP on military expenditure.

2. Its neighbouring country Saudi Arabia also spends heavily, followed by Algeria.
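
To go beyond reading the map, each country can be compared numerically with the rest of its region. The sketch below computes the gap between a country's value and its continent's median on made-up data. It assumes that aggregate rows carry the continent label `Others`, as set in the cleaning step, and the column names `Regional median` and `Gap vs region` are introduced here for illustration only.

```python
import pandas as pd

toy = pd.DataFrame({
    "Country Name": ["Region X", "Country A", "Country B", "Country C", "Country D", "Country E"],
    "continent": ["Others", "Asia", "Asia", "Asia", "Africa", "Africa"],
    "Military Expenditure": [4.6, 7.0, 2.0, 1.0, 5.5, 1.5],
})

# Drop aggregate rows, then compare each country with its continent's median.
countries_only = toy[toy["continent"] != "Others"].copy()
countries_only["Regional median"] = (
    countries_only.groupby("continent")["Military Expenditure"].transform("median")
)
countries_only["Gap vs region"] = (
    countries_only["Military Expenditure"] - countries_only["Regional median"]
)

biggest_gap = countries_only.nlargest(1, "Gap vs region")["Country Name"].iloc[0]
print(countries_only)
```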

### 2. Which countries have high distribution losses relative to the power they generate?

In [None]:
Power_DF = Final_DF[['Country Name','Country Code','continent','Fossil_Fuel_Power','Distribution_Losses']]
Power_DF = Power_DF.dropna()

In [None]:
fig = px.treemap(Power_DF, path=[px.Constant("world"), 'continent', 'Country Name'], values='Fossil_Fuel_Power',
                  color='Distribution_Losses', hover_data=['Country Code'],
                  color_continuous_scale='RdBu',
                  color_continuous_midpoint=np.average(Power_DF['Distribution_Losses'], weights=Power_DF['Fossil_Fuel_Power']))
fig.update_layout(margin = dict(t=50, l=25, r=25, b=25))
fig.update_layout(title_text='Global Energy Mix: Proportions of Fossil Fuel Energy Generation and Distribution Losses')
fig.show()

Fossil fuel energy generation and distribution losses refer to the energy wastage that occurs during the production and transmission of electricity from fossil fuels. When fossil fuels like coal, oil, or natural gas are burned to generate power, a significant amount of energy is lost as heat due to various inefficiencies in the process. Additionally, during the transmission and distribution of electricity through power grids, energy is lost due to resistance in the wires and other equipment. These losses not only lead to a decrease in overall energy efficiency but also contribute to higher fuel consumption and greenhouse gas emissions. To address this issue, efforts are being made to improve the efficiency of fossil fuel power plants, upgrade transmission infrastructure, and promote the integration of renewable energy sources. By reducing these losses, we can enhance energy sustainability and mitigate environmental impacts associated with fossil fuel energy generation.

From the treemap, Togo and Libya stand out with very high distribution losses of about 70%, which means that more than half of the power they generate is lost in distribution.
In contrast, other countries on the same continent, such as South Sudan and Mauritius, keep their losses close to 5%.
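
The colour midpoint in the treemap is a weighted average: each country's loss percentage is weighted by its fossil-fuel share, so that

$$\bar{x}_w = \frac{\sum_i w_i x_i}{\sum_i w_i}.$$

The sketch below checks `np.average` against this formula on three made-up values and contrasts it with the plain mean.

```python
import numpy as np

losses = np.array([5.0, 10.0, 70.0])    # made-up loss percentages
weights = np.array([90.0, 60.0, 10.0])  # made-up fossil-fuel shares used as weights

weighted_mean = np.average(losses, weights=weights)
manual_mean = (losses * weights).sum() / weights.sum()
plain_mean = losses.mean()

print(weighted_mean, manual_mean, plain_mean)
```

Because the high-loss entry has a small weight, the weighted midpoint sits well below the plain mean, which affects which tiles appear red or blue.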

### 3. Which 10 countries have the most populous largest cities?

In [None]:
Top_10_Populated_Cities=Final_DF.sort_values('Urban_City_Pop',ascending=False).head(10)

In [None]:
fig = px.bar(Top_10_Populated_Cities, x='Country Name', y='Urban_City_Pop')
fig.update_xaxes(title_text='Country')
fig.update_yaxes(title_text='Max Pop in city')
fig.update_layout(title_text='Top 10 Countries with the Largest City Populations 2021')
fig.show()

The chart shows that Japan has the most populous largest city, followed by India in second place, China in third, and Brazil next.

### 4. How important is trade as a contributor to GDP around the world?

In [None]:
trade_df = Final_DF[['Country Name','Country Code','continent','Trade percent','Trade Service percent',]]
trade_df = trade_df.dropna()

In [None]:
fig = px.sunburst(trade_df, path=['continent', 'Country Name'], values='Trade percent',
                  color='Trade Service percent', hover_data=['Country Code'],
                  color_continuous_scale='earth',
                  color_continuous_midpoint=np.average(trade_df.dropna()['Trade Service percent'], weights=trade_df.dropna()['Trade percent']))
fig.update_layout(title_text='International Trade and Services Contribution by Country')
fig.show()

Trade plays a vital role in the GDP (Gross Domestic Product) of every country. It contributes to economic growth by boosting exports, generating revenue, and creating jobs. Export-oriented industries contribute to GDP by tapping into foreign markets. Imports fulfill domestic demand for goods and services, enhancing the standard of living. Trade also encourages specialization, allowing countries to focus on areas of comparative advantage, increasing efficiency and productivity. However, the impact of trade on GDP can be influenced by factors like trade policies, tariffs, and global economic conditions. Nonetheless, trade remains a crucial driver of economic development, shaping the GDP of each country through increased exchange, market expansion, and economic interdependence.

The sunburst chart shows that Luxembourg has the highest trade percentage, followed by Singapore and China.

### 5. How rapidly have cellular subscriptions grown over the past 60 years in selected countries?

In [None]:
fig = px.line(melted_mobile_df, x="Years", y="Subscribers", color='Country Name')
fig.update_layout(title_text='Decades of Mobile Growth: Cellular Subscriptions in Key Countries from 1960 to 2021')
fig.show()

Over the past 60 years, cellular subscription rates have surged in countries like India, China, the USA, Japan, and the UK. This rapid growth can be attributed to various factors. In India and China, the increasing affordability of smartphones and competitive pricing have driven widespread adoption. Developed countries like the USA, Japan, and the UK have experienced significant growth due to advancements in mobile technology and the transition to faster networks like 4G and 5G. The rise in cellular subscriptions signifies the growing importance of mobile communication in our interconnected world. It has facilitated connectivity, information access, and participation in the digital economy. Mobile devices have become an integral part of daily life, enabling individuals to stay connected, conduct business, and access a wide range of services. This trend highlights the transformative impact of cellular technology on societies and economies, opening up new opportunities for social and economic development.

### 6. How did the Credit Information Index change worldwide from 2013 to 2019, and what was the median score in 2019?

In [None]:
fig = px.box(melted_credit_index_df, x="Years", y="Score")
fig.update_layout(title_text='Box Plot of Credit Information Index for World  (2013-2019)')
fig.show()


---



The charts show that the interquartile range narrows from 2013 to 2019: as the years progress, the first quartile (Q1) rises. The median is 5 in 2013, 2014, and 2015, and rises to 6 for 2016 through 2019.


---


The first quartile improves substantially from 2013 to 2019, which indicates that most
countries improved their Credit Information Index score.


---

The median Credit Information Index value in 2019 is 6, unchanged over the four years from 2016 to 2019.


---
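
The quartiles read off the box plot can also be checked numerically. The sketch below assumes a long-format frame with the `Years` and `Score` columns used in the plot; `credit_df` and its scores are made-up placeholders, not WDI values. Plotly may compute quartiles with a slightly different interpolation rule than pandas, so the numbers need not match the hover labels exactly.

```python
import pandas as pd

# Illustrative placeholder scores for two years, not WDI values.
credit_df = pd.DataFrame({
    "Years": [2013] * 5 + [2019] * 5,
    "Score": [0, 2, 5, 6, 8, 4, 5, 6, 7, 8],
})

# First quartile, median, and third quartile of the score for each year.
summary = (
    credit_df.groupby("Years")["Score"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
summary.columns = ["Q1", "Median", "Q3"]

# Interquartile range: the height of the box in the box plot.
summary["IQR"] = summary["Q3"] - summary["Q1"]

print(summary)
```

In this toy sample, Q1 rises and the IQR shrinks between the two years, which is the same pattern described in the observations above.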

### 7. What share of the world's countries perform poorly on ease of doing business?

In [None]:
fig = go.Figure(data=[go.Pie(labels=Final_DF['Category'], values=Final_DF['Ease_of_Doing_Business'])])
fig.update_layout(title='Ease of Doing Business Index for Different Categories of Countries in 2019')
fig.show()

The ease of doing business is a critical factor worldwide, with significant implications for economic growth and development. A favorable business environment attracts investment, encourages entrepreneurship, and stimulates job creation. Streamlined regulations, efficient government processes, and transparent governance systems are key elements that contribute to the ease of doing business. Countries that prioritize these factors experience increased productivity, competitiveness, and innovation. Moreover, a conducive business climate fosters investor confidence, leading to more foreign direct investment and the transfer of advanced technologies. By reducing bureaucratic hurdles and corruption, countries can establish trust between the government and businesses, promoting long-term economic stability. Ultimately, a high ease of doing business index drives economic growth, enhances global competitiveness, and improves the overall well-being of a nation by creating opportunities for business expansion, fostering entrepreneurship, and attracting both domestic and international investments.



---

The pie chart shows that only 2.41% of the world's countries perform poorly on ease of doing business.


---
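
If the goal is the share of countries in each category, one way to compute it directly is to count rows per category. The sketch below assumes one row per country and a `Category` column as in the plotting call; `sample_df`, the category labels, and the assignments are hypothetical placeholders, not the notebook's actual data.

```python
import pandas as pd

# Hypothetical sample: one row per country with an assigned category.
sample_df = pd.DataFrame({
    "Country Name": ["Country A", "Country B", "Country C", "Country D", "Country E"],
    "Category": ["Good", "Good", "Average", "Good", "Poor"],
})

# Percentage of countries that fall into each category.
share = sample_df["Category"].value_counts(normalize=True).mul(100).round(2)

print(share)
```

Passing `share.index` as labels and `share.values` as values to `go.Pie` would then size each slice by the number of countries in that category.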



### 8. How has GDP per capita grown over the last 60 years for the top countries? Which country has the highest GDP per capita in the world?

In [None]:
fig = px.line(melted_gdp_per_captia_df, x="Years", y="GDP Per Capita", color='Country Name')
fig.update_layout(title_text='GDP per capita trends of top five countries in the world')
fig.show()


Monaco, Luxembourg, Bermuda, Ireland, and Switzerland are countries known for their impressive GDP per capita figures. Monaco, a small city-state, has one of the highest GDP per capita in the world, fueled by its thriving luxury services and tourism sectors. Luxembourg, a landlocked country, boasts a strong financial industry and serves as a major banking and investment hub. Bermuda, a British Overseas Territory, stands out for its offshore financial services and insurance sector. Ireland has experienced remarkable economic growth, driven by industries like technology, pharmaceuticals, and finance. Switzerland, renowned for its precision manufacturing and banking sector, consistently ranks among the countries with the highest GDP per capita. These nations' strong economic performance, favorable business environments, and specialization in key sectors contribute to their elevated GDP per capita, signifying high living standards and economic prosperity.


---

Monaco stands out with the highest GDP per capita in the world.


---
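
The top country can also be confirmed from the data rather than read off the chart. This is a minimal sketch that assumes a long-format frame with the `Country Name`, `Years`, and `GDP Per Capita` columns used in the plot; `gdp_df` and its values are illustrative placeholders, not WDI figures.

```python
import pandas as pd

# Illustrative placeholder values, not WDI figures.
gdp_df = pd.DataFrame({
    "Country Name": ["Country A", "Country B", "Country C"] * 2,
    "Years": [2020] * 3 + [2021] * 3,
    "GDP Per Capita": [50.0, 80.0, 65.0, 55.0, 90.0, 60.0],
})

# Keep only the most recent year available in the data.
latest_year = gdp_df["Years"].max()
latest = gdp_df[gdp_df["Years"] == latest_year]

# Rank countries by GDP per capita, highest first.
ranked = latest.sort_values("GDP Per Capita", ascending=False)
top_country = ranked.iloc[0]["Country Name"]

print(latest_year, top_country)
```

Taking `ranked.head(5)` instead of the first row would give the five countries to keep for the line chart.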



## Summary


The exploratory data analysis of the WDI dataset is complete. The steps completed and the observations noted are listed below.

1. Downloaded the dataset from the Kaggle website (https://www.kaggle.com/datasets/yallabalaji/wdi-data).

2. Data preparation, cleaning, preprocessing, and analysis were done with pandas.

3. Exploratory analysis and visualization were done by asking and answering interesting questions about the data.

4. The following are the important observations from the analysis:

---
* Of 194 countries in 2021, Oman is spending aggressively on its military. Neighbouring countries spend a similar share of their revenue on military expenditure.

---
* Togo and Libya perform very poorly on distribution losses. Countries on the same continent, such as South Sudan and Mauritius, are doing well and have brought their losses close to 5%.

---
* Japan has the largest city population, followed by India in second place, China in third, and Brazil after that.

---
* Luxembourg has the highest trade percentage, followed by Singapore and China.

---
* The median Credit Information Index value is 6, unchanged over the four years up to 2019.

---
* Monaco stands out with the highest GDP per capita in the world.

## Future Work Ideas

* Experiment with different categories of data and try to establish relationships between them.
* Implement maps and graphs using JavaScript libraries such as D3, three.js, and Mapbox GL.
* Perform an analysis of alcohol, tobacco, and unemployment data.

## References

Kaggle dataset: https://www.kaggle.com/datasets/yallabalaji/wdi-data

World Bank: https://databank.worldbank.org/source/world-development-indicators

Data Science and Machine Learning Bootcamp by Jovian: https://jovian.ai/learn/zero-to-data-analyst-bootcamp

QuillBot: https://quillbot.com/

Stack Overflow: https://stackoverflow.com/

Google: https://www.google.com/

Medium: https://medium.com/

Pandas user guide: https://pandas.pydata.org/docs/user_guide/index.html

Plotly user guide: https://plotly.com/python/getting-started/