<h1>Scraping fidelity.com</h1>
In this assignment, you will scrape data from fidelity.com. The goal of the exercise is to get the latest sector performance data from the US markets, and to get the total market capitalization for each sector. 

The end result is to write a function: <i>get_us_sector_performance()</i> that will return a list of tuples. Each tuple should correspond to a sector and should contain the following data:
<li>the sector name
<li>the amount the sector has moved 
<li>the market capitalization of the sector
<li>the market weight of the sector
<li>a link to the fidelity page for that sector

<p>
The data should be sorted by decreasing order of market weight. I.e., the sector with the highest weight should be in the first tuple, etc.

<h3>Process</h3>
<li>Get a list of sectors and the links to the sector detail pages from the url (see function)
<li>Loop through the list and call the function <i>get_sector_change_and_market_cap(sector_page_link)</i> for each sector
<li>Accumulate the name, the change, the capitalization, the weight and the link for each sector in output_list (see function)
<li>Sort the list by market weight

<b>Notes:</b>
<li>The market weight is a string that ends with a % sign. Remove the % and convert the string to a float before sorting, otherwise the weights are compared as text
<li>Your starting data is the url listed below. You need to extract all data, including links to the sector pages, from the page at this url
<li>To sort a list of tuples by an arbitrary element, use the example at the bottom of this notebook
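
The weight conversion described in the notes can be sketched as follows. The helper name percent_to_float and the sample strings are hypothetical and only serve as an illustration; they are not part of the assignment.

```python
def percent_to_float(text):
    """Convert a string such as '27.91%' to the float 27.91."""
    # Remove surrounding whitespace first, then the trailing % sign
    return float(text.strip().rstrip("%"))

weights = ["8.41%", "27.91%", "14.32%"]

# Compared as strings, '8.41%' is the largest because '8' > '2' > '1'
print(max(weights))

# Compared as floats, the order is the numeric one
numeric = [percent_to_float(w) for w in weights]
print(sorted(numeric, reverse=True))
```

The first print shows why the conversion is needed: string comparison goes character by character, so it does not give the numeric order.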

In [1]:
def get_us_sector_performance():
    output_list = []
    url = "https://eresearch.fidelity.com/eresearch/goto/markets_sectors/landing.jhtml"
    
    #**** Your code goes here ****
    import requests
    from bs4 import BeautifulSoup
    response = requests.get(url)

    if response.status_code == 200:
        #Name the parser explicitly so the result does not depend on which parsers are installed
        soup = BeautifulSoup(response.text, "html.parser")
        #Collect the <a class="heading1"> tags; the code takes sector names and links from them
        sector_links = soup.find_all("a", {"class":"heading1"})
        
        for sec in sector_links:
            sector_name = sec.get_text().strip()
            sector_page_link = ("https://eresearch.fidelity.com" + sec.get("href")).strip()
            #Fetch the sector page once and unpack all three values (instead of three separate requests)
            sector_change, sector_market_cap, sector_market_weight = get_sector_change_and_market_cap(sector_page_link)
            output = (sector_name, sector_change, sector_market_cap, sector_market_weight, sector_page_link)
            output_list.append(output)
        
        #Sort by market weight (index 3 of each tuple), largest first
        output_list.sort(key = lambda k: k[3], reverse = True)
        
    return output_list

In [2]:
def get_sector_change_and_market_cap(sector_page_link):
    
    #**** Your code goes here ****
    import requests
    from bs4 import BeautifulSoup
    response = requests.get(sector_page_link)
    #Raise an HTTP error on a failed request instead of failing later on undefined variables
    response.raise_for_status()
    
    soup = BeautifulSoup(response.text, "html.parser")
    data_lines = soup.find_all("table", class_ = "snapshot-data-tbl")
    
    for line in data_lines:
        #The first three cells are read as change, market cap and market weight
        data = line.find_all("td")
        #Removing "-" discards the sign, so the change is returned as an absolute value
        sector_change = float(data[0].find("span").get_text().replace("-","").replace("%","").strip())
        sector_market_cap = data[1].find("span").get_text()
        sector_market_weight = float(data[2].find("span").get_text().replace("%","").strip())
            
    return sector_change,sector_market_cap,sector_market_weight

In [3]:
#Test get_sector_change_and_market_cap()
link = "https://eresearch.fidelity.com/eresearch/markets_sectors/sectors/sectors_in_market.jhtml?tab=learn&sector=25"
get_sector_change_and_market_cap(link)
#Should return something like (2.87, '$7.03T', 11.49) (close of day 8/10/2022)
#Note the conversion to float of the change and the weight

(0.86, '$7.24T', 10.64)

In [4]:
#Test get_us_sector_performance()
get_us_sector_performance()


[('Information Technology',
  0.57,
  '$13.02T',
  27.23,
  'https://eresearch.fidelity.com/eresearch/markets_sectors/sectors/sectors_in_market.jhtml?tab=learn&sector=45'),
 ('Health Care',
  0.49,
  '$7.67T',
  14.38,
  'https://eresearch.fidelity.com/eresearch/markets_sectors/sectors/sectors_in_market.jhtml?tab=learn&sector=35'),
 ('Financials',
  0.24,
  '$8.33T',
  11.56,
  'https://eresearch.fidelity.com/eresearch/markets_sectors/sectors/sectors_in_market.jhtml?tab=learn&sector=40'),
 ('Consumer Discretionary',
  0.86,
  '$7.24T',
  10.64,
  'https://eresearch.fidelity.com/eresearch/markets_sectors/sectors/sectors_in_market.jhtml?tab=learn&sector=25'),
 ('Industrials',
  0.48,
  '$5.49T',
  8.34,
  'https://eresearch.fidelity.com/eresearch/markets_sectors/sectors/sectors_in_market.jhtml?tab=learn&sector=20'),
 ('Communication Services',
  0.0,
  '$4.51T',
  8.16,
  'https://eresearch.fidelity.com/eresearch/markets_sectors/sectors/sectors_in_market.jhtml?tab=learn&sector=50'),
 ('C

<h2>Example of the return value</h2>
Note that your result will be different (this is as of close of day 8/10/2022)
<pre>
[('Information Technology',
  2.77,
  '$13.42T',
  27.91,
  'https://eresearch.fidelity.com/eresearch/markets_sectors/sectors/sectors_in_market.jhtml?tab=learn&sector=45'),
 ('Health Care',
  1.12,
  '$7.63T',
  14.32,
  'https://eresearch.fidelity.com/eresearch/markets_sectors/sectors/sectors_in_market.jhtml?tab=learn&sector=35'),
 ('Consumer Discretionary',
  2.87,
  '$7.03T',
  11.49,
  'https://eresearch.fidelity.com/eresearch/markets_sectors/sectors/sectors_in_market.jhtml?tab=learn&sector=25'),
 ('Financials',
  2.32,
  '$7.72T',
  10.63,
  'https://eresearch.fidelity.com/eresearch/markets_sectors/sectors/sectors_in_market.jhtml?tab=learn&sector=40'),
 ('Communication Services',
  2.77,
  '$4.70T',
  8.41,
  'https://eresearch.fidelity.com/eresearch/markets_sectors/sectors/sectors_in_market.jhtml?tab=learn&sector=50'),
 ('Industrials',
  2.22,
  '$5.09T',
  7.83,
  'https://eresearch.fidelity.com/eresearch/markets_sectors/sectors/sectors_in_market.jhtml?tab=learn&sector=20'),
 ('Consumer Staples',
  0.74,
  '$4.05T',
  6.6,
  'https://eresearch.fidelity.com/eresearch/markets_sectors/sectors/sectors_in_market.jhtml?tab=learn&sector=30'),
 ('Energy',
  0.71,
  '$3.48T',
  4.37,
  'https://eresearch.fidelity.com/eresearch/markets_sectors/sectors/sectors_in_market.jhtml?tab=learn&sector=10'),
 ('Utilities',
  0.45,
  '$1.74T',
  3.0,
  'https://eresearch.fidelity.com/eresearch/markets_sectors/sectors/sectors_in_market.jhtml?tab=learn&sector=55'),
 ('Real Estate',
  1.44,
  '$1.62T',
  2.9,
  'https://eresearch.fidelity.com/eresearch/markets_sectors/sectors/sectors_in_market.jhtml?tab=learn&sector=60'),
 ('Materials',
  2.88,
  '$2.30T',
  2.52,
  'https://eresearch.fidelity.com/eresearch/markets_sectors/sectors/sectors_in_market.jhtml?tab=learn&sector=15')]
 </pre>

<h3>Sorting</h3>
<li>the <span style="color:red">sort</span> function sorts a list "in-place". I.e., the list itself changes so that the contents are in sorted order</li>
<li>the <span style="color:red">sorted</span> function returns a new sorted list</li>
<li>both functions take arguments that determine the key (<span style="color:red">key=</span>) and the order. Ascending is the default order, to flip it use <span style="color:red">reverse=True</span></li>
<li>sort and sorted only work if the elements can be compared with each other. For example, sorted([1,9,2,8,11,'a']) raises a TypeError because an integer and a string cannot be ordered. Python determines the order with the < (less than) operator</li>
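
The unsortable case from the last bullet can be tried out directly. The sketch below also shows one possible workaround, comparing every element as a string; that choice is only an illustration and gives text order rather than numeric order.

```python
mixed = [1, 9, 2, 8, 11, 'a']

try:
    sorted(mixed)
except TypeError as err:
    # int and str do not support < against each other
    print("Cannot sort:", err)

# key=str compares the string form of each element,
# so 11 comes before 2 (because '11' < '2' as text)
print(sorted(mixed, key=str))
```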

In [5]:
x = [1,9,2,8,11]
x.sort()
print(x) #x is now a sorted list
x.sort(reverse=True)
print(x) #x is now sorted in descending order

y = [1,9,2,8,11]
sorted(y)
print(y) #y is unchanged because sorted(y) returned a new list

z = sorted(y)
print(z) #z contains the sorted contents of y

[1, 2, 8, 9, 11]
[11, 9, 8, 2, 1]
[1, 9, 2, 8, 11]
[1, 2, 8, 9, 11]


<h3>comparing tuples</h3>

In [6]:
(1,2,3,5) < (1,2,4,1) 
#Python does an element-wise comparison from left to right, much like when strings are compared
#When sorting tuples (or lists), that element-wise comparison is used to determine ordering

True
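
A few more comparisons may make the left-to-right rule concrete. This is a small sketch with made-up values: the first pair of elements that differ decides the result, and a tuple that is a prefix of another counts as smaller.

```python
# Decided by the third element, the first position where the tuples differ
print((1, 2, 3) < (1, 2, 4))

# All shared positions are equal, so the shorter tuple is the smaller one
print((1, 2) < (1, 2, 0))

# Decided by the first element; 1 and 99 are never compared
print(('b', 1) < ('a', 99))

# Strings follow the same left-to-right rule, character by character
print("abd" < "abz")
```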

In [7]:
x = [('c',17.4,'f'),('e',1.74,'bb'),('d',29.2,'z'),('a',23.2,'b'),('d',29.2,'a')]
x.sort() #Sorts using tuple comparison, going left to right. Note the order of the 'd' tuples
x

[('a', 23.2, 'b'),
 ('c', 17.4, 'f'),
 ('d', 29.2, 'a'),
 ('d', 29.2, 'z'),
 ('e', 1.74, 'bb')]

<li>When a key is specified, sort (and sorted) will only use that key
<li>In the example below, note that the two 'd' tuples are not re-ordered (because 'z' and 'a' are not compared)
<li>sort and sorted are stable sort functions. If two elements are "equal" (per the sorting rule), they will be returned in the same order as they were in the original list
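
Stability can be checked with a short sketch. The key function by_number is a hypothetical helper written with def, so the example does not rely on anything beyond plain functions.

```python
def by_number(item):
    # Use only the second element of each tuple as the sort key
    return item[1]

x = [('c',17.4,'f'),('e',1.74,'bb'),('d',29.2,'z'),('a',23.2,'b'),('d',29.2,'a')]

ascending = sorted(x, key=by_number)
# The two 'd' tuples tie on 29.2 and keep their original order: 'z' before 'a'
print(ascending[-2:])

descending = sorted(x, key=by_number, reverse=True)
# reverse=True flips the order of the keys but still keeps ties in original order
print(descending[:2])
```

In both results the ('d', 29.2, 'z') tuple comes before ('d', 29.2, 'a'), because the third element is never looked at.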

<h1>lambda functions</h1>
<li>lambda functions are anonymous functions, created on the fly, and typically meant to be used once</li>
<li>since they are unnamed, they cannot be referred to by name later; they are meant to be used in place, where they are defined</li>
<li>but, since Python functions are first-class objects, you can assign a lambda to a name and then call it like any other function</li>

<li>lambda functions can have only one expression and they return whatever the expression returns</li>
<li>the if .. else .. structure in a lambda function takes the form of a conditional expression: value_if_true if condition else value_if_false</li>
<li>multiple arguments are separated by a comma</li>
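
The points in this list can be illustrated with a few one-liners. The names double, sign and add are made up for this sketch.

```python
# Used once, in context, without ever getting a name
squares = list(map(lambda n: n * n, [1, 2, 3]))
print(squares)

# Assigned to a name, a lambda is called like any other function
double = lambda n: 2 * n
print(double(21))

# A single expression; the condition is written as a conditional expression
sign = lambda n: "negative" if n < 0 else "non-negative"
print(sign(-3))

# Multiple arguments are separated by commas
add = lambda a, b: a + b
print(add(2, 5))
```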





In [8]:
#Example
#Three arguments, a,b, c separated by commas
#The function returns a if (condition 1) else it returns b if (condition 2); else it returns c
#Note the expression if structure
#No return statement. Something is always returned!

func = lambda a,b,c: a if a>b and a>c else b if b>a and b>c else c
func(10,3,8)

10

In [9]:
x = [('c',17.4,'f'),('e',1.74,'bb'),('d',29.2,'z'),('a',23.2,'b'),('d',29.2,'a')]
x.sort(key=lambda k: k[1]) 
x

[('e', 1.74, 'bb'),
 ('c', 17.4, 'f'),
 ('a', 23.2, 'b'),
 ('d', 29.2, 'z'),
 ('d', 29.2, 'a')]

In [10]:
#Alternatively, for the sort (and max, min) function
#itemgetter returns the item at the specified index in a collection
#the key= below tells sort to use the element at location 1 when comparing elements in x



from operator import itemgetter
x = [('c',17.4,'f'),('e',1.74,'bb'),('d',29.2,'z'),('a',23.2,'b'),('d',29.2,'a')]
x.sort(key=itemgetter(1)) 
x

[('e', 1.74, 'bb'),
 ('c', 17.4, 'f'),
 ('a', 23.2, 'b'),
 ('d', 29.2, 'z'),
 ('d', 29.2, 'a')]
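
Applied to the assignment, the same idea sorts sector tuples by market weight. The sketch below uses three rows taken from the example return value above, with the link element left out for brevity, so the weight sits at index 3.

```python
from operator import itemgetter

# (name, change, market cap, weight); sample rows, links omitted
sectors = [
    ('Financials', 2.32, '$7.72T', 10.63),
    ('Information Technology', 2.77, '$13.42T', 27.91),
    ('Health Care', 1.12, '$7.63T', 14.32),
]

# Sort by weight (index 3), largest first
by_weight = sorted(sectors, key=itemgetter(3), reverse=True)
print([row[0] for row in by_weight])

# max and min accept the same key= argument
biggest_mover = max(sectors, key=itemgetter(1))
smallest_weight = min(sectors, key=itemgetter(3))
print(biggest_mover[0], smallest_weight[0])
```

Replacing itemgetter(3) with lambda k: k[3] gives the same ordering; which one to use is a matter of taste.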