# Exercise 1: Cumulative Revenue and Cash Flow Growth
As of this writing, the five largest public companies in the United States by market capitalization are **Microsoft** (0000789019), **Apple** (0000320193), **Alphabet** (0001652044), **Amazon** (0001018724), and **Nvidia** (0001045810). Let’s suppose that we are interested in investing in these companies and want to examine their cumulative revenue and operating cash flow growth over the past decade. We will use the XBRL US API to retrieve the data and then graph the results. I’ll start by walking through the procedure for cumulative revenue growth; the operating cash flow analysis is left as an exercise. *Please note that Google was restructured into Alphabet in 2015. To obtain financial data for Alphabet before 2015, use Google’s CIK, 0001288776.*

## Step 1: Import Required Python Modules

In [None]:
import os, re, sys, json, requests, getpass, urllib
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from IPython.display import display, HTML
from datetime import datetime
from urllib.parse import urlencode

## Step 2: Obtain XBRL API Access

In [None]:
# Prompt for the password so it is never stored in the notebook.
password = getpass.getpass(prompt = 'Enter Your XBRL US Password: ')

# Replace the placeholders with your own XBRL US account details.
body_auth = {'username' : 'Enter Your XBRL US Username (email)', 
            'client_id': 'Obtain Client ID from XBRL Website', 
            'client_secret' : 'Obtain Client Secret from XBRL Website', 
            'password' : password, 
            'grant_type' : 'password', 
            'platform' : 'ipynb' }

payload = urlencode(body_auth)
url = 'https://api.xbrl.us/oauth2/token'
headers = {"Content-Type": "application/x-www-form-urlencoded"}

res = requests.request("POST", url, data = payload, headers = headers)
auth_json = res.json()

if 'error' in auth_json:
    # Stop here: without a token, every later query would fail.
    raise RuntimeError('Access Denied: {}'.format(auth_json.get('error_description', auth_json['error'])))

print("Access Granted.")

# The access token authorizes queries; the refresh token is used to renew it.
access_token = auth_json['access_token']
refresh_token = auth_json['refresh_token']
newaccess = ''
newrefresh = ''

## Step 3: Query the XBRL API for Revenue Data

##### Substep 3.1: Identify Relevant XBRL Elements
In this step, you identify the XBRL elements (tags) associated with your request. For this example, we are interested in annual revenue. An easy way to find the right tag is to open an actual XBRL filing from a relevant company and look up the tag used for total revenues.

Here is a link for Apple's 2020 Interactive 10-K filing - https://www.sec.gov/cgi-bin/viewer?action=view&cik=320193&accession_number=0000320193-20-000096&xbrl_type=v. 

On this page, click 'Financial Statements', then 'Consolidated Statements of Operations', and then 'Net Sales'. This opens a box that defines 'Net Sales' and provides references and further details. Click the '+' icon in front of 'Details' to see the tag. As you can see, the 'Net Sales' value is tagged as 'us-gaap_RevenueFromContractWithCustomerExcludingAssessedTax'. The prefix 'us-gaap' identifies the taxonomy; because the query below filters on the element's local name (`concept.local-name`), we can safely drop it.

*Please note that tags can and will change over time. For example, 'Revenues' and 'SalesRevenueNet' were also used to capture total revenues at some point in time.*

In [None]:
XBRL_Elements = ['RevenueFromContractWithCustomerExcludingAssessedTax', 
                 'Revenues', 
                 'SalesRevenueNet'
                 ]
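Because several elements can carry total revenue, a company-year may come back under more than one of them. The notebook resolves this later by keeping one row per CIK and fiscal year based on filing date. An alternative, sketched below under the assumption that you also keep the `concept.local-name` column (renamed here to a hypothetical `concept`), is to prefer elements in a fixed order. The priority order, the helper `pick_preferred`, and the toy values are all illustrative, not real filings.

```python
import pandas as pd

# Hypothetical preference order: earlier in the list = preferred.
PRIORITY = ['RevenueFromContractWithCustomerExcludingAssessedTax',
            'Revenues',
            'SalesRevenueNet']

def pick_preferred(facts: pd.DataFrame, priority: list) -> pd.DataFrame:
    """Keep one row per (cik, fyear): the one whose element ranks highest."""
    rank = {name: i for i, name in enumerate(priority)}
    # Elements missing from `priority` get NaN and sort last.
    out = facts.assign(rank=facts['concept'].map(rank))
    out = out.sort_values(['cik', 'fyear', 'rank'])
    out = out.drop_duplicates(subset=['cik', 'fyear'], keep='first')
    return out.drop(columns='rank').reset_index(drop=True)

# Made-up values: fiscal 2018 is reported under two elements.
facts = pd.DataFrame({
    'cik':     ['A', 'A', 'A'],
    'fyear':   [2017, 2018, 2018],
    'concept': ['SalesRevenueNet', 'SalesRevenueNet',
                'RevenueFromContractWithCustomerExcludingAssessedTax'],
    'value':   [100.0, 110.0, 111.0],
})
best = pick_preferred(facts, PRIORITY)
print(best[['fyear', 'concept', 'value']])
```

For 2018, the row tagged `RevenueFromContractWithCustomerExcludingAssessedTax` (value 111.0) is kept because it ranks first in the list.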

##### Substep 3.2: Identify Relevant Companies
In this step, you supply the companies whose data you want. The XBRL US API identifies companies by their SEC Central Index Key (CIK).

In [None]:
Companies = ['0000789019', # Microsoft Corp
             '0000320193', # Apple Inc.
             '0001652044', # Alphabet Inc.
             '0001288776', # Google (Now Alphabet Inc.)
             '0001018724', # Amazon Com Inc
             '0001045810'  # Nvidia Corp
                ]
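The list above writes every CIK as a 10-digit string with leading zeros, and later cells compare CIKs as strings (for example, `'0001288776'`). If your CIKs come from a spreadsheet or a numeric column, the leading zeros may have been lost. A minimal sketch of a hypothetical helper that restores that format:

```python
def normalize_cik(cik) -> str:
    """Return a CIK as a 10-digit, zero-padded string (the format used in this notebook)."""
    digits = str(cik).strip()
    if not digits.isdigit() or len(digits) > 10:
        raise ValueError(f'not a valid CIK: {cik!r}')
    return digits.zfill(10)

print(normalize_cik(320193))      # 0000320193
print(normalize_cik('1652044'))   # 0001652044
```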

##### Substep 3.3: Identify Relevant Years
In this step, you need to supply the years that you are requesting the data for.

In [None]:
Years = ['2012', '2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021', '2022'] 

##### Substep 3.4: Identify Relevant Filings
In this step, you supply the filing types from which to draw the data. Because we are interested in annual data, we use 10-K filings.

In [None]:
Filings = ['10-K']

##### Substep 3.5: Run the Query

The query stores the results in a DataFrame named *df*.

In [None]:
# Fields returned for each fact; '.sort(ASC)' orders the results by company name.
Fields = ['entity.cik',
          'entity.name.sort(ASC)',
          'report.filing-date',
          'period.fiscal-year',
          'report.document-type',
          'concept.local-name',
          'fact.value',
          'unit']

# Filters: annual ('Y') facts in USD for the chosen elements, years, companies, and forms.
Parameters = {'concept.local-name': ','.join(XBRL_Elements),
              'period.fiscal-period': 'Y',
              'period.fiscal-year': ','.join(Years),
              'unit': 'USD',
              'entity.cik': ','.join(Companies),
              'report.document-type': ','.join(Filings)}  

# 'FALSE' returns only facts without dimensions (typically the consolidated totals),
# 'TRUE' only dimensional facts (e.g., segment breakdowns), and 'ALL' runs both.
has_dimensions = 'FALSE'

if has_dimensions == 'ALL':
    dimension_options = ['TRUE', 'FALSE']
else:
    dimension_options = [has_dimensions]

search_endpoint = 'https://api.xbrl.us/api/v1/fact/search'
    
all_res_list = []
for dimensions_param in dimension_options:

    print('Getting the data for: "fact.has-dimensions" = {}'.format(dimensions_param))
    
    done_retrieving_all_results = False
    offset = 0

    while not done_retrieving_all_results:

        Parameters['fact.has-dimensions'] = dimensions_param
        # Request the next page of results, starting at the current offset.
        Parameters['fields'] = ','.join(Fields) + ',fact.offset({})'.format(offset) 

        res = requests.get(search_endpoint, params = Parameters, headers={'Authorization' : 'Bearer {}'.format(access_token)})
        
        res_json = res.json()
        if 'error' in res_json:
            # For example, an expired access token: run the Token Refresher cell and retry.
            raise RuntimeError('Query failed: {}'.format(res_json))
        res_list = res_json['data']
        all_res_list += res_list
        
        paging_dict = res_json['paging']

        print('Number of Observations Obtained: ', paging_dict['count'])

        # A full page (2,000 facts) means more results may remain, so advance the offset.
        if paging_dict['count'] >= 2000:
            offset += paging_dict['count']
        else:
            done_retrieving_all_results = True
    
df = pd.DataFrame(all_res_list)
print('Number of Observations: {}'.format(len(df)))
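The `while` loop above pages through the results by advancing `fact.offset` until a page comes back with fewer than 2,000 facts. The same logic is sketched below with a fake data source standing in for the API, so it can be run offline. The 2,000-row page size is an assumption carried over from the loop's `>= 2000` check, and `fetch_all` and `fake_fetch` are hypothetical names.

```python
def fetch_all(fetch_page, page_size):
    """Collect rows page by page until a page comes back short.

    fetch_page(offset) must return a list of rows starting at `offset`.
    """
    rows, offset = [], 0
    while True:
        page = fetch_page(offset)
        rows.extend(page)
        if len(page) < page_size:   # a short (or empty) page means we are done
            return rows
        offset += len(page)

# Fake source: 4,500 rows served at most 2,000 at a time.
DATA = list(range(4500))

def fake_fetch(offset, limit=2000):
    return DATA[offset:offset + limit]

rows = fetch_all(fake_fetch, page_size=2000)
print(len(rows))  # 4500
```

If the total is an exact multiple of the page size, the loop makes one extra request that returns an empty page and then stops.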

## Step 4: Clean the Data

##### Substep 4.1: Keep Relevant Variables

The API returns many fields. In this step, we keep only the ones we need by creating a new DataFrame named *rev*, and then rename its columns to shorter names.

In [None]:
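# .copy() makes rev an independent DataFrame, avoiding pandas' SettingWithCopyWarning later.
rev = df[['entity.cik', 'entity.name', 'report.filing-date', 'period.fiscal-year', 'fact.value']].copy()
rev.columns = ['cik', 'company', 'filing', 'fyear', 'rev']
# Make sure the year and the reported value are numeric before doing arithmetic on them.
rev['fyear'] = pd.to_numeric(rev['fyear'])
rev['rev'] = pd.to_numeric(rev['rev'])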

##### Substep 4.2: Merge Alphabet & Google

As previously mentioned, Google was restructured into Alphabet in 2015. As a result, we need to link Google's financial data to Alphabet's. To do so, we replace Google's CIK with Alphabet's CIK.

In [None]:
rev.loc[rev['cik'] == '0001288776', 'cik'] = '0001652044'
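The `.loc` assignment above handles a single predecessor. If you track more reorganized companies, a mapping from old CIK to new CIK with `Series.replace` scales better. In the sketch below, only the Google to Alphabet pair comes from this notebook; `CIK_SUCCESSOR` is a hypothetical name and any further pairs are yours to add.

```python
import pandas as pd

# Predecessor CIK -> successor CIK (only the Google -> Alphabet pair is from the text).
CIK_SUCCESSOR = {'0001288776': '0001652044'}

ciks = pd.Series(['0001288776', '0001652044', '0000320193'])
mapped = ciks.replace(CIK_SUCCESSOR).tolist()
print(mapped)  # ['0001652044', '0001652044', '0000320193']
```

After the remapping, a fiscal year reported under both CIKs becomes a duplicate CIK-year, which the next substep removes.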

##### Substep 4.3: Remove Duplicate Observations

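The query returns many duplicate facts: the same company and fiscal year can appear more than once, for example when a value is reported in more than one filing, under more than one of the revenue elements from Substep 3.1, or (after Substep 4.2) under both Google's and Alphabet's CIK. We want to keep one observation per CIK and fiscal year. First, we sort the observations by CIK, fiscal year, and filing date. We then choose between two options: as-filed data or restated data. To study how the market responded to a filing, keep the earliest observation with `keep = 'first'`. To use the most up-to-date figure, keep the latest observation with `keep = 'last'`, which returns the restated value if the figure was restated. The code below uses `keep = 'last'`.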

In [None]:
rev = rev.sort_values(by = ['cik', 'fyear', 'filing'])
rev = rev.drop_duplicates(subset = ['cik', 'fyear'], keep = 'last')
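The difference between the two `keep` options is easiest to see on a tiny, made-up example in which fiscal 2019 revenue is first reported as 100 and a later filing restates it to 98. Filing dates are ISO-formatted strings, so sorting them as text also sorts them chronologically.

```python
import pandas as pd

# Made-up data: fiscal 2019 first reported as 100, restated to 98 a year later.
toy = pd.DataFrame({
    'cik':    ['X', 'X', 'X'],
    'fyear':  [2019, 2019, 2020],
    'filing': ['2019-10-30', '2020-10-30', '2020-10-30'],
    'rev':    [100.0, 98.0, 105.0],
})
toy = toy.sort_values(by=['cik', 'fyear', 'filing'])

as_filed = toy.drop_duplicates(subset=['cik', 'fyear'], keep='first')
latest = toy.drop_duplicates(subset=['cik', 'fyear'], keep='last')
print(as_filed['rev'].tolist())  # [100.0, 105.0]
print(latest['rev'].tolist())    # [98.0, 105.0]
```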

##### Substep 4.4: Create Revenue Growth Variable

To create the revenue growth variable, we first sort the data by CIK and fiscal year. Next, we create each company's prior-year revenue, which we name *lag_rev*. We then compute year-over-year revenue growth, *rev_growth*, as (rev - lag_rev) / lag_rev. Finally, we drop observations with missing *rev_growth*, which removes each company's first year.

In [None]:
rev = rev.sort_values(by = ['cik', 'fyear'])
rev['lag_rev'] = rev.groupby('cik')['rev'].shift(1)
rev['rev_growth'] = (rev['rev'] - rev['lag_rev'] ) / rev['lag_rev']
rev.dropna(subset = ['rev_growth'], inplace = True)
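`shift(1)` takes the previous *row*, not the previous *year*. If a company is missing a fiscal year, the growth computed for the following year silently spans two years. The sketch below, on made-up data with 2015 missing, also lags the year and blanks out any growth whose gap is not exactly one year. It assumes `fyear` is numeric; the `lag_year` column is a hypothetical addition.

```python
import pandas as pd

# Made-up series for company 'X' with fiscal 2015 missing.
toy = pd.DataFrame({'cik': ['X'] * 4,
                    'fyear': [2013, 2014, 2016, 2017],
                    'rev': [100.0, 110.0, 121.0, 133.1]})
toy = toy.sort_values(by=['cik', 'fyear'])
g = toy.groupby('cik')
toy['lag_rev'] = g['rev'].shift(1)
toy['lag_year'] = g['fyear'].shift(1)
toy['rev_growth'] = (toy['rev'] - toy['lag_rev']) / toy['lag_rev']
# Blank out growth that would span more than one fiscal year.
toy.loc[toy['fyear'] - toy['lag_year'] != 1, 'rev_growth'] = float('nan')
print(toy[['fyear', 'rev_growth']])
```

Here 2016 ends up missing instead of showing a two-year change labeled as one year.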

##### Substep 4.5: Create Cumulative Revenue Growth Variable

To create the cumulative revenue growth variable, we first take the natural log of one plus the revenue growth rate, because growth rates compound multiplicatively and cannot simply be added up. Next, we sort the data to make sure the years are in order. We then sum the log terms within each company using the cumsum() function. Finally, we take the exponential and subtract one to arrive at cumulative revenue growth relative to each company's first year of data (the year dropped in Substep 4.4).
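Written out, with $R_t$ the revenue in year $t$, $g_t$ the year-over-year growth, and year $0$ the base year, the steps above compute:

$$
g_t = \frac{R_t - R_{t-1}}{R_{t-1}}, \qquad
G_T = \exp\!\Big(\sum_{t=1}^{T} \ln(1 + g_t)\Big) - 1 = \prod_{t=1}^{T} (1 + g_t) - 1 = \frac{R_T}{R_0} - 1 .
$$

For example, growth of 10% followed by 20% gives $G_2 = 1.1 \times 1.2 - 1 = 0.32$, or 32%, not the 30% that simply adding the rates would suggest.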

In [None]:
rev['cum_rev_growth'] = np.log(1 + rev['rev_growth'])
rev = rev.sort_values(by = ['cik', 'fyear'])
rev['cum_rev_growth'] = rev.groupby('cik')['cum_rev_growth'].cumsum()
rev['cum_rev_growth'] = (np.exp(rev['cum_rev_growth']) - 1)
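As a sanity check, the log-sum route used above should agree with compounding via `cumprod()` and with dividing each year's revenue by the base-year revenue. The sketch below verifies this on a made-up revenue path of 100, 110, 132 (growth of 10%, then 20%); the data and variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Made-up revenue path for one company: 100 -> 110 -> 132.
toy = pd.DataFrame({'cik': ['X', 'X', 'X'], 'fyear': [2012, 2013, 2014],
                    'rev': [100.0, 110.0, 132.0]})
lag = toy.groupby('cik')['rev'].shift(1)
toy['rev_growth'] = (toy['rev'] - lag) / lag
toy = toy.dropna(subset=['rev_growth'])

# Route 1: exponentiate the running sum of log(1 + growth), as in the cell above.
log_route = np.exp(np.log1p(toy['rev_growth']).groupby(toy['cik']).cumsum()) - 1
# Route 2: multiply the (1 + growth) factors directly.
prod_route = (1 + toy['rev_growth']).groupby(toy['cik']).cumprod() - 1
# Route 3: compare with the base-year revenue (100).
ratio_route = toy['rev'] / 100.0 - 1

print(log_route.round(4).tolist())  # [0.1, 0.32]
```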

## Step 5: Graph The Results

Python offers many graphing options. Here we use seaborn's `lineplot` (built on matplotlib) to draw one line per company and save the figure as a PNG file.

In [None]:
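import matplotlib.ticker as mtick

# Apply the seaborn theme (and font) before plotting so it takes effect on this figure.
sns.set(font = "Times New Roman")
plt.figure(figsize=(10, 10))

# Map CIKs to display names so the legend does not depend on the plotting order.
names = {'0000320193': 'Apple Inc.',
         '0000789019': 'Microsoft Corporation',
         '0001018724': 'Amazon.com, Inc.',
         '0001045810': 'Nvidia Corporation',
         '0001652044': 'Alphabet Inc.'}
plot_df = rev.assign(company_name = rev['cik'].map(names))

ax = sns.lineplot(data = plot_df, x = 'fyear', y = 'cum_rev_growth', hue = 'company_name', marker = 'o')
plt.title('Cumulative Revenue Growth by Year', fontname = 'Times New Roman', fontsize = 28)
plt.xlabel('Year', fontname = 'Times New Roman', fontsize = 16)
plt.ylabel('Cumulative Revenue Growth', fontname = 'Times New Roman', fontsize = 16)
legend_handles, legend_labels = ax.get_legend_handles_labels()
ax.legend(legend_handles, legend_labels, fontsize = 12, bbox_to_anchor = (1.04, 0.5), loc = "center left")
plt.ylim(0, 8)
# Show y-axis values as percentages (1.0 -> 100%) and keep x-axis ticks on whole years.
ax.yaxis.set_major_formatter(mtick.PercentFormatter(xmax = 1.0, decimals = 0))
ax.xaxis.set_major_locator(mtick.MaxNLocator(integer = True))
ax.yaxis.set_label_coords(-0.075, 0.5)
ax.xaxis.set_label_coords(0.5, -0.075)

# 300 dpi is usually enough for print; 1000 dpi at this size produces a very large file.
plt.savefig('Cumulative Revenue Growth.png', dpi = 300, bbox_inches = "tight")
plt.show()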

## Exercise 1: Repeat the Analysis for Operating Cash Flows
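To reuse Substeps 4.4 and 4.5 for another variable, the growth calculations can be wrapped in a function. The sketch below is a hypothetical helper, `cumulative_growth`, that assumes the data has already been deduplicated as in Substep 4.3 and that the value column is numeric. For operating cash flow, `NetCashProvidedByUsedInOperatingActivities` is a likely starting element, but confirm it in a filing as in Substep 3.1. Also note that growth rates are ill-defined when the prior-year value is zero or negative, which can happen for operating cash flow.

```python
import numpy as np
import pandas as pd

def cumulative_growth(data: pd.DataFrame, value_col: str,
                      id_col: str = 'cik', year_col: str = 'fyear') -> pd.DataFrame:
    """Year-over-year and cumulative growth of `value_col` within each `id_col`.

    growth = (x - lag x) / lag x; cumulative = exp(cumsum(log(1 + growth))) - 1.
    """
    out = data.sort_values(by=[id_col, year_col]).copy()
    lag = out.groupby(id_col)[value_col].shift(1)
    out[f'{value_col}_growth'] = (out[value_col] - lag) / lag
    # Drop each company's first year, which has no prior-year value.
    out = out.dropna(subset=[f'{value_col}_growth']).copy()
    logs = np.log1p(out[f'{value_col}_growth'])
    out[f'cum_{value_col}_growth'] = np.exp(logs.groupby(out[id_col]).cumsum()) - 1
    return out

# Made-up operating cash flows for one company: 50 -> 60 -> 75.
toy = pd.DataFrame({'cik': ['X', 'X', 'X'], 'fyear': [2020, 2021, 2022],
                    'ocf': [50.0, 60.0, 75.0]})
res = cumulative_growth(toy, 'ocf')
print(res['cum_ocf_growth'].round(2).tolist())  # [0.2, 0.5]
```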

## Token Refresher

In [None]:
# Use the most recent refresh token to request a new access token.
token = newrefresh if newrefresh != '' else refresh_token

# Use the same client credentials as in Step 2.
refresh_auth = {'client_id': 'Obtain Client ID from XBRL Website', 
                'client_secret' : 'Obtain Client Secret from XBRL Website', 
                'grant_type' : 'refresh_token',
                'platform' : 'ipynb', 
                'refresh_token' : token }
# url is the OAuth token endpoint defined in Step 2.
refreshres = requests.post(url, data=refresh_auth)
refresh_json = refreshres.json()
if 'error' in refresh_json:
    raise RuntimeError('Token refresh failed: {}'.format(refresh_json))
access_token = refresh_json['access_token']
refresh_token = refresh_json['refresh_token']
print('Token Refreshed')