## <font color='blue'>Challenge Scraping</font>

****** This notebook uses Python 3.7 ******

*** 
2.1 - Write a small script in your programming language of choice that will extract and print the
<b> * operating income, * revenue, * net income, * total assets, and * total equity </b> of Tesla to the
terminal. Please provide the final amounts in Danish kroner (DKK), assuming the
conversion rate is <b>US$100.00</b> = <b>647.20 DKK</b>. The information on Wikipedia is considered
up to date.
***
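
The conversion rate in the task statement fixes the factor used throughout this notebook. As a minimal sketch (the names `RATE` and `usd_to_dkk` are illustrative and not part of the cells below), the factor can be derived as follows:

```python
# Rate implied by the task statement: US$100.00 = 647.20 DKK
USD_AMOUNT = 100.00
DKK_AMOUNT = 647.20
RATE = DKK_AMOUNT / USD_AMOUNT  # DKK per US dollar


def usd_to_dkk(amount_usd):
    """Convert an amount in US dollars to Danish kroner."""
    return amount_usd * RATE


print("{:.3f}".format(RATE))
print("{:.3f}".format(usd_to_dkk(21.461)))
```

Dividing the two quoted amounts gives 6.472 DKK per dollar, which is the constant applied in `convertDolarToDanish`.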

In [2]:
import requests
from urllib.error import HTTPError
from bs4 import BeautifulSoup
import re
from IPython.display import display, Image

In [3]:
# TARGET PAGE TO SCRAPE
target = "https://en.wikipedia.org/wiki/Tesla,_Inc."

In [4]:
# LOAD AND PARSE THE HTML PAGE

def loadPage(url):
    """
    Download a page and parse it with BeautifulSoup.

    Parameters:
    url (string): URL of the page to download

    Returns:
    object: BeautifulSoup tree of the page, or None if the request fails
    """
    try:
        req = requests.get(url, timeout=10)
        # Treat HTTP error statuses (4xx/5xx) as failures too
        req.raise_for_status()
    except requests.exceptions.RequestException as e:
        print(e)
        return None
    return BeautifulSoup(req.text, 'html.parser')

In [5]:
data_tesla = loadPage(target)
if data_tesla is not None:
    # TABLE WHERE ALL THE DATA IS AVAILABLE
    details = data_tesla.find_all("table", {"class": "infobox vcard"})

In [6]:
# ITEMS DESIRED
desired = ["operating income", "revenue", "net income", "total assets", "total equity"]

In [7]:
# CONVERT US$21.461 > 138.896 DKK

def convertDolarToDanish(line):
    """
    Replace the first US-dollar amount in a line with its value in Danish kroner.

    Parameters:
    line (string): Line whose currency amount needs to be converted and replaced

    Returns:
    string: The line with the dollar amount replaced by the amount in Danish kroner
    """
    # Regex for a currency string, e.g. US$21.461 or US$−0.976
    regex = re.compile(r"[a-z]+\$[− \-]?[0-9]+(?:\.[0-9]+)?", re.IGNORECASE)
    match = regex.search(line)
    if match is None:
        # Nothing to convert: return the line unchanged
        return line

    # Cleaning data: the page uses both "−" (U+2212) and "-" as minus signs
    value = match.group(0).replace("−", "-")

    # Keep only the numeric part, then convert (US$100.00 = 647.20 DKK)
    value_dolar = re.split(r"[a-z]+\$", value, flags=re.IGNORECASE)
    value_kroner = "{:.3f} {}".format(float(value_dolar[1]) * 6.472, "DKK")

    return regex.sub(value_kroner, line, count=1)

In [8]:
for sibling in data_tesla.find('table', {"class": "infobox vcard"}).tr.next_siblings:
    text = sibling.get_text()

    # Check whether the row starts with one of the desired items
    if text.lower().startswith(tuple(desired)):
        print("ORIGINAL: {}".format(text))
        print("CONVERTED: {}".format(convertDolarToDanish(text)))

ORIGINAL: Revenue US$21.461 billion (2018)
CONVERTED: Revenue 138.896 DKK billion (2018)
ORIGINAL: Operating income US$-0.388 billion (2018)
CONVERTED: Operating income -2.511 DKK billion (2018)
ORIGINAL: Net income US$−0.976 billion (2018)
CONVERTED: Net income -6.317 DKK billion (2018)
ORIGINAL: Total assets US$29.740 billion (2018)
CONVERTED: Total assets 192.477 DKK billion (2018)
ORIGINAL: Total equity US$4.923 billion (2018)
CONVERTED: Total equity 31.862 DKK billion (2018)
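
The converted lines above still carry the scale word ("billion") from the original text. If plain numeric amounts are needed, one possible approach is to parse the number and the scale word separately. The sketch below is only an illustration: the helper `to_absolute_dkk`, the `SCALES` table and the set of scale words it recognises are assumptions, not part of the notebook.

```python
import re

DKK_PER_USD = 6.472  # from the task statement: US$100.00 = 647.20 DKK
SCALES = {"million": 1e6, "billion": 1e9, "trillion": 1e12}  # assumed scale words

# US$ followed by an optional minus sign ("−" or "-"), a number, and an optional scale word
AMOUNT = re.compile(
    r"US\$\s*([−-]?[0-9]+(?:\.[0-9]+)?)\s*(million|billion|trillion)?",
    re.IGNORECASE,
)


def to_absolute_dkk(line):
    """Return the first US-dollar amount in `line` as a plain DKK number, or None."""
    match = AMOUNT.search(line)
    if match is None:
        return None
    number = float(match.group(1).replace("−", "-"))
    scale = SCALES.get((match.group(2) or "").lower(), 1)
    return number * scale * DKK_PER_USD


for line in ["Revenue US$21.461 billion (2018)", "Net income US$−0.976 billion (2018)"]:
    print("{:,.0f} DKK".format(to_absolute_dkk(line)))
```

Returning a float instead of a rewritten string keeps the values usable for further arithmetic; formatting is left to the caller.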


*** 
2.2 - Please provide a brief description on how we could adapt the program, such that given a
keyword, the program would extract any relevant information from the table with class
“infobox vcard” of a Wikipedia page if available. For example given the keyword “CEO”
and the url to the Tesla Wikipedia page, this program would return “Elon Musk”. You
may provide (pseudo) code, but this is optional. Please try to be as unambiguous as
possible.
***

In [40]:
# Work in progress: print the entries of every "plainlist" block in the infobox

for sibling in data_tesla.find('table', {"class": "infobox vcard"}).tr.next_siblings:
    for tags in sibling.find_all('div', {"class": "plainlist"}):
        print(tags.name, tags.ul.get_text())

    

div NASDAQ: TSLA
div Automotive
div Martin Eberhard
div Robyn Denholm (Chairman)
div Electric vehicles
div SolarCity
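
One way the keyword lookup asked for in 2.2 could work is to treat each infobox row as a header cell (`th`) plus a value cell (`td`). The keyword is first compared with the header; if it does not match, the individual entries of the value cell are searched, so that a role such as "CEO" can find "Elon Musk (CEO)". The sketch below is a hedged illustration, not a finished answer: `lookup_infobox` is a hypothetical helper, and the sample markup is a simplified stand-in for a real infobox, whose layout may differ.

```python
import re
from bs4 import BeautifulSoup


def lookup_infobox(soup, keyword):
    """Return infobox entries related to `keyword` (case-insensitive)."""
    table = soup.find("table", class_="infobox")
    if table is None:
        return []
    needle = keyword.lower()
    results = []
    for row in table.find_all("tr"):
        header, value = row.find("th"), row.find("td")
        if header is None or value is None:
            continue
        if needle in header.get_text(" ", strip=True).lower():
            # The keyword names the row itself, e.g. "Revenue"
            results.append(value.get_text(" ", strip=True))
            continue
        # Otherwise search the individual entries, e.g. "Elon Musk (CEO)"
        for item in value.find_all("li") or [value]:
            text = item.get_text(" ", strip=True)
            if needle in text.lower():
                # Drop the parenthetical role so only the name remains
                results.append(re.sub(r"\s*\([^)]*\)", "", text).strip())
    return results


# Hypothetical, simplified markup standing in for a real infobox
sample = """
<table class="infobox vcard">
  <tr><th>Key people</th><td><div class="plainlist"><ul>
    <li>Robyn Denholm (Chairman)</li>
    <li>Elon Musk (CEO)</li>
  </ul></div></td></tr>
  <tr><th>Revenue</th><td>US$21.461 billion (2018)</td></tr>
</table>
"""
soup = BeautifulSoup(sample, "html.parser")
print(lookup_infobox(soup, "CEO"))
print(lookup_infobox(soup, "revenue"))
```

With the real page, the same function could be called as `lookup_infobox(loadPage(target), "CEO")`. Matching is by substring, so short keywords may also hit unrelated entries; a stricter version could compare whole words only.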
