### Objective
---
Is there any relationship between the GDP (in terms of purchasing power parity) of a country and the percentage of its Internet users? And is this trend similar for low-income/middle-income/high-income countries?

**Links to reffer**
1. https://towardsdatascience.com/data-analytics-with-python-by-web-scraping-illustration-with-cia-world-factbook-abbdaa687a84

2. https://nbviewer.jupyter.org/github/tirthajyoti/Web-Database-Analytics/blob/master/CIA-Factbook-Analytics2.ipynb

3. https://github.com/tirthajyoti/Web-Database-Analytics

In [1]:
import requests
from bs4 import BeautifulSoup
import time
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import re

In [2]:
url="https://www.cia.gov/library/publications/the-world-factbook/"
res=requests.get(url)
res.status_code

200

In [3]:
soup = BeautifulSoup(res.text, "html.parser")
print(soup.prettify())

<!DOCTYPE html>
<!-- THIS TEMPLATE IS USED TO GENERATE THE AGENCY VERSION OF THE WFB SITE -->
<!--[if lt IE 7]> <html class="no-js lt-ie9 lt-ie8 lt-ie7" lang="en">
<![endif]-->
<!--[if IE 7]>
<html class="no-js lt-ie9 lt-ie8" lang="en"> <![endif]-->
<!--[if IE 8]>
<html class="no-js lt-ie9" lang="en"> <![endif]-->
<!--[if gt IE 8]>
<!-->
<html class="no-js" lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
 <!--<![endif]-->
 <head>
  <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
  <meta charset="utf-8"/>
  <meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible"/>
  <title>
   The World Factbook - Central Intelligence Agency
  </title>
  <meta content="" name="description"/>
  <meta content="width=device-width" name="viewport"/>
  <meta content="FEB 1, 2018" name="LastModified"/>
  <link href="stylesheets/smallscreen.css" rel="stylesheet" type="text/css"/>
  <!--[if lt IE 9]>
  <link href="stylesheets/fullscreen.css" rel="stylesheet" type="text/c

---
### Extract the country names and codes from the parsed HTML doc

Here is how in BeautifulSoup we use the **"find_all"** method to find all the country names and codes embedded in the HTML.<br />
**Basically, the idea is to find the HTML tags named ‘option’. The "text" in that tag is the country name and the "char 5 and 6" of the tag value represent the 2-character country code.**

In [4]:
country_codes=[]
country_names=[]
for tag in soup.find_all('option'):
    country_codes.append(tag.get('value')[5:7])
    country_names.append(tag.text)

temp=country_codes.pop(0) # To remove the first entry 'World'
temp=country_names.pop(0) # To remove the first entry 'World'

In [5]:
 print('COUNTRY NAMES\n'+'-'*30) #This line will print "country name" and "30 ---"
for country in country_names[1:]:
    print(country,end=',')
print('\n\nCOUNTRY CODES\n'+'-'*30)
for country in country_codes[1:]:
    print(country,end=',')

COUNTRY NAMES
------------------------------

            Afghanistan
          ,
            Akrotiri
          ,
            Albania
          ,
            Algeria
          ,
            American Samoa
          ,
            Andorra
          ,
            Angola
          ,
            Anguilla
          ,
            Antarctica
          ,
            Antigua and Barbuda
          ,
            Arctic Ocean
          ,
            Argentina
          ,
            Armenia
          ,
            Aruba
          ,
            Ashmore and Cartier Islands
          ,
            Atlantic Ocean
          ,
            Australia
          ,
            Austria
          ,
            Azerbaijan
          ,
            Bahamas, The
          ,
            Bahrain
          ,
            Baker Island
          ,
            Bangladesh
          ,
            Barbados
          ,
            Belarus
          ,
            Belgium
          ,
            Belize
          ,
            B

### Extract the demographics

In [29]:
# Base URL
"""
In the below mentioned link at each country code we get different information.
for example: https://www.cia.gov/library/publications/resources/the-world-factbook/geos/af.html
here, af = AFGHANISTAN

We get the whole information about the country on this page like

Introduction :: AFGHANISTANPanel

Geography :: AFGHANISTANPanel 

People and Society :: AFGHANISTANPanel

Government :: AFGHANISTANPanel 

Economy :: AFGHANISTANPanel 

Energy :: AFGHANISTANPanel 

Communications :: AFGHANISTANPanel 

Military and Security :: AFGHANISTANPanel 

Transportation :: AFGHANISTANPanel 

Terrorism :: AFGHANISTANPanel 

Transnational Issues :: AFGHANISTAN

"""

urlbase = 'https://www.cia.gov/library/publications/the-world-factbook/geos/'
demographics1=[]
demographics2=[]
demographics3=[]
demographics4=[]
demographics5=[]

offset = len('65 years and over: ')

# Iterate over every country
for i in range(1,len(country_names)-1):
    country_html=country_codes[i]+'.html'
    url_to_get=urlbase+country_html
    # Read the HTML from the URL and pass on to BeautifulSoup
    html = requests.get(url_to_get).text
    soup = BeautifulSoup(html, 'html.parser')
    
    txt=soup.get_text()
    #result=soup.find(id="field-age-structure")
    pos1=soup.find_all('category_data subfield numeric',string= '0-14 years: ')
    #print(result)
    print(pos1)
    
    #pos2=soup.find('15-24 years: ').get_text()
    #pos3=soup.find('25-54 years: ').get_text()
    #pos4=soup.find('55-64 years: ').get_text()
    #pos5=soup.find('65 years and over: ').get_text()
    
    
    if pos1==-1:
        print(f"**0-14 years % data not found for {country_names[i]}!**")
        demographics1.append(np.nan)
    else:
        text=txt[pos1+12:pos1+18]
        end=re.search('%',text).start()
        a=float((txt[pos1+12:pos1+12+end]))
        demographics1.append(a)
        print(f"0-14 years % data extraction complete for {country_names[i]}!")
        
    

[]


TypeError: can only concatenate list (not "int") to list