# Exercise 2: Who is Holding the Most Cash?
Let's suppose that we believe artificial intelligence (AI) technology is going to radically transform the American economy. We also believe that investing in this technology will require very large capital outlays from firms, and that the future lending environment will make it prohibitively expensive to fund upcoming projects by raising new capital (i.e., issuing debt or equity). As of the end of 2022, which firms do we believe are in the best position to take advantage of this opportunity?

To answer this question, we can use the XBRL US API to gather cash holdings for the universe of public filers as of their fiscal year ending in 2022, and then identify the 10 firms with the most cash on hand. I'll walk through the procedure and then ask you to use a similar procedure to identify the 15 firms that lost the most money (i.e., had the lowest net income) in 2022.

## Step 1: Import Required Python Modules

In [None]:
import os, re, sys, json, requests, getpass, urllib
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from IPython.display import display, HTML
from datetime import datetime
from urllib.parse import urlencode

## Step 2: Obtain XBRL API Access

In [None]:
password = getpass.getpass(prompt = 'Enter Your XBRL US Password: ')

body_auth = {'username' : 'vac35@psu.edu', 
            'client_id': 'Obtain Client ID from XBRL Website', 
            'client_secret' : 'Obtain Client Secret from XBRL Website', 
            'password' : ''.join(password), 
            'grant_type' : 'password', 
            'platform' : 'ipynb' }

payload = urlencode(body_auth)
url = 'https://api.xbrl.us/oauth2/token'
headers = {"Content-Type": "application/x-www-form-urlencoded"}

res = requests.request("POST", url, data = payload, headers = headers)
auth_json = res.json()

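if 'error' in auth_json:
    # Stop here: without a successful login there is no token to use below.
    raise RuntimeError('Access Denied: {}'.format(auth_json.get('error_description', auth_json['error'])))
else:
    print('Access Granted.')

access_token = auth_json['access_token']
refresh_token = auth_json['refresh_token']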
newaccess = ''
newrefresh = ''
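Step 2 form-encodes the login body by hand with `urlencode` and sets the `Content-Type` header explicitly, while the token refresher at the end of this notebook passes a plain dict to `requests.post`. This minimal offline sketch (placeholder credentials only; `prepare()` builds the request without sending it) suggests the two approaches produce the same body and header:

```python
from urllib.parse import urlencode

import requests

# Placeholder values only; these are NOT real credentials.
body = {'username': 'user@example.com',
        'password': 'not-a-real-password',
        'grant_type': 'password'}

token_url = 'https://api.xbrl.us/oauth2/token'

# Approach 1: encode the body by hand, as Step 2 does.
manual_body = urlencode(body)

# Approach 2: let requests encode the dict; prepare() does not touch the network.
prepared = requests.Request('POST', token_url, data=body).prepare()

print(manual_body)
print(prepared.body == manual_body)
print(prepared.headers['Content-Type'])
```

When `data` is a dict, `requests` fills in the form-encoded `Content-Type` header itself; when it is an already-encoded string, as in Step 2, the header has to be supplied explicitly.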

## Step 3: Query the XBRL API

##### Substep 3.1: Identify Relevant XBRL Elements
In this step, you need to identify the XBRL tags associated with your request. For this example, we are interested in cash holdings. An easy way to find the right tag is to open an actual XBRL filing from a relevant company and see which tag is attached to cash.

Here is a link to Apple's 2022 Interactive 10-K filing: https://www.sec.gov/cgi-bin/viewer?action=view&cik=320193&accession_number=0000320193-22-000108&xbrl_type=v#

On this webpage, click 'Financial Statements', then 'Consolidated Balance Sheets', then 'Cash and cash equivalents'. This opens a box that defines 'Cash and cash equivalents' and provides references and other details. Click the '+' icon in front of 'Details' to see the tag. As you can see, the 'Cash and cash equivalents' value is tagged as 'us-gaap_CashAndCashEquivalentsAtCarryingValue'. The prefix 'us-gaap' identifies the taxonomy the tag comes from; we can drop it here because the API's *concept.local-name* parameter expects only the name that follows the prefix.

*Please note that tags can and will change over time.*

In [None]:
XBRL_Elements = ['CashAndCashEquivalentsAtCarryingValue']
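If you collect several tags from the viewer, a small helper can strip the taxonomy prefix for you. The `to_local_name` function below is hypothetical (not part of any library) and assumes the prefix ends at the first underscore, as in the Apple example:

```python
def to_local_name(viewer_tag: str) -> str:
    """Hypothetical helper: drop the taxonomy prefix (e.g. 'us-gaap_')
    from a tag copied out of the SEC interactive viewer.

    Assumes the prefix ends at the first underscore; a tag without an
    underscore is returned unchanged.
    """
    prefix, sep, local = viewer_tag.partition('_')
    return local if sep else viewer_tag


viewer_tags = ['us-gaap_CashAndCashEquivalentsAtCarryingValue']
local_names = [to_local_name(tag) for tag in viewer_tags]
print(local_names)
```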

##### Substep 3.2: Identify Other Relevant Parameters
In this step, you need to supply the years you are requesting data for and the types of filings the data should come from. For our example, we are only interested in annual data for fiscal year 2022, as reported in 10-K filings.

*Please note that, as compared to Exercise 1, we are not entering any parameters for companies. This is because we are interested in all companies for this query.*

In [None]:
Filings = ['10-K']
Years = ['2022'] 

##### Substep 3.3: Run the Query

The query will store the data into a dataframe titled *df*.

Please note that, relative to Exercise 1, the line *'entity.cik': ','.join(Companies),* has been removed from the parameters list.
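Because the company filter has been dropped, the query constrains only the concept, period, unit, and filing type, so it returns facts for every filer. If you want to check what will be sent before using up API calls, `requests` can build (prepare) a GET request without sending it. This sketch mirrors the filters used in the query cell; the *fields* value is shortened for illustration, and nothing is sent to the API:

```python
import requests

search_endpoint = 'https://api.xbrl.us/api/v1/fact/search'

# Same filters as the query cell, with no company (CIK) filter.
preview_params = {
    'concept.local-name': 'CashAndCashEquivalentsAtCarryingValue',
    'period.fiscal-period': 'Y',
    'period.fiscal-year': '2022',
    'unit': 'USD',
    'report.document-type': '10-K',
    'fields': 'entity.cik,fact.value,fact.offset(0)',  # shortened field list
}

# prepare() encodes the parameters into the URL; no network call is made.
prepared = requests.Request('GET', search_endpoint, params=preview_params).prepare()
print(prepared.url)
```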

In [None]:
Fields = ['entity.cik',
          'entity.name.sort(ASC)',
          'report.filing-date',
          'period.fiscal-year',
          'report.document-type',
          'concept.local-name',
          'fact.value',
          'unit']

Parameters = {'concept.local-name': ','.join(XBRL_Elements),
              'period.fiscal-period': 'Y',
              'period.fiscal-year': ','.join(Years),
              'unit': 'USD',
              'report.document-type': ','.join(Filings)}  

has_dimensions = 'FALSE'

if has_dimensions == 'ALL':
    dimension_options = ['TRUE', 'FALSE']
else:
    dimension_options = [has_dimensions]

search_endpoint = 'https://api.xbrl.us/api/v1/fact/search'
    
all_res_list = []
for dimensions_param in dimension_options:

    print('Getting the data for: "fact.has-dimensions" = {}'.format(dimensions_param))
    
    done_retrieving_all_results = False
    offset = 0

    while not done_retrieving_all_results:

        Parameters['fact.has-dimensions'] = dimensions_param
        Parameters['fields'] = ','.join(Fields) + ',fact.offset({})'.format(offset) 

        res = requests.get(search_endpoint, params = Parameters, headers={'Authorization' : 'Bearer {}'.format(access_token)})
        res.raise_for_status()  # stop on HTTP errors rather than failing later when 'data' is missing
        
        res_json = res.json()
        res_list = res_json['data']
        all_res_list += res_list
        
        paging_dict = res_json['paging']

        print('Number of Observations Obtained: ', paging_dict['count'])

        if paging_dict['count'] >= 2000:
            offset += paging_dict['count']
        else:
            done_retrieving_all_results = True
    
df = pd.DataFrame(all_res_list)
print('Number of Observations: {}'.format(len(df)))
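The query cell pages through results with *fact.offset*: it keeps requesting until a page comes back with fewer than 2,000 facts, which it treats as the last page. The offline sketch below reproduces that stopping rule, with a hypothetical `fake_fetch` function standing in for the API call, so you can see how the offset advances. The 2,000-fact page size is taken from the cell above and is an assumption about the API's page limit; if the API ever returned smaller pages, this rule would stop after the first page.

```python
def fetch_all(fetch_page, page_size=2000):
    """Collect every record from a paged source.

    fetch_page(offset) returns a list of at most page_size records.
    Same rule as the notebook: a full page means "ask again at the next
    offset"; a short page means "done".
    """
    records = []
    offset = 0
    while True:
        page = fetch_page(offset)
        records += page
        if len(page) >= page_size:
            offset += len(page)
        else:
            return records


# Hypothetical stand-in for the API: 4,500 facts served 2,000 at a time.
ALL_FACTS = list(range(4500))
calls = []

def fake_fetch(offset, page_size=2000):
    calls.append(offset)  # record which offsets were requested
    return ALL_FACTS[offset:offset + page_size]


facts = fetch_all(fake_fetch)
print(len(facts), calls)
```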

## Step 4: Clean the Data

##### Substep 4.1: Keep Relevant Variables

In this step, we create a new dataframe titled *cash* that keeps only the relevant columns, and then rename those columns to more manageable names.

In [None]:
cash = df[['entity.cik', 'entity.name', 'report.filing-date', 'fact.value']]
cash.columns = ['cik', 'company', 'filing', 'cash']

##### Substep 4.2: Remove Duplicate Observations

We want to keep one observation per CIK and fiscal year; because this query covers only fiscal year 2022, that means one observation per CIK. First, we sort the observations by CIK and filing date. There are two options available here: as-filed data or restated data. If we want to see the figure the market responded to when the filing was released, we keep the first (earliest) observation using *keep = 'first'*. If we want the most accurate figure, we keep the last (most recent) observation using *keep = 'last'*, which returns restated data if the figure was later restated. Please note that this second option may cause the results to change over time.

In [None]:
cash = cash.sort_values(by = ['cik', 'filing'])
cash = cash.drop_duplicates(subset = ['cik'], keep = 'last')
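To see what *keep = 'first'* versus *keep = 'last'* does, here is a toy example with made-up CIKs and values (not real filings): one company reports its 2022 figure in an original 10-K and again, with a different value, in a later filing.

```python
import pandas as pd

# Made-up data: CIK '0000000001' appears in an original and a later filing.
toy = pd.DataFrame({
    'cik':    ['0000000001', '0000000002', '0000000001'],
    'filing': ['2023-02-15', '2023-03-01', '2024-02-14'],
    'cash':   [100.0, 50.0, 90.0],
})

# ISO-formatted date strings sort chronologically.
toy = toy.sort_values(by=['cik', 'filing'])

as_filed = toy.drop_duplicates(subset=['cik'], keep='first')  # earliest filing per CIK
latest = toy.drop_duplicates(subset=['cik'], keep='last')     # most recent filing per CIK

print(as_filed)
print(latest)
```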

##### Substep 4.3: Further Clean Data

First, keep only the relevant variables (i.e., *cik*, *company*, and *cash*) and give them display-ready names. Next, make sure *cash* is stored as a numeric value so that it sorts correctly. Finally, express cash in millions of dollars by dividing by 1,000,000.

In [None]:
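cash = cash[['cik', 'company', 'cash']].copy()  # .copy() avoids pandas' SettingWithCopyWarning below
cash.columns = ['CIK', 'Company', 'Cash']
cash['Cash'] = pd.to_numeric(cash['Cash'], errors='coerce')  # unparseable values become NaN
cash['Cash'] = cash['Cash'] / 1000000  # express in millions of dollars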
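The numeric conversion matters if *fact.value* arrives as text: strings sort character by character, not by magnitude. This sketch uses made-up values to show the problem and what *pd.to_numeric(..., errors='coerce')* and the division do:

```python
import pandas as pd

# Made-up values stored as text, as they might arrive from a JSON response.
values = pd.Series(['9000000', '12500000000', 'n/a'])

# As strings, '9000000' sorts above '12500000000' because '9' > '1'.
print(values.sort_values(ascending=False).tolist())

# to_numeric converts parseable text to numbers; errors='coerce'
# turns anything unparseable into NaN instead of raising an error.
numeric = pd.to_numeric(values, errors='coerce')

# Dividing by 1,000,000 expresses the amounts in millions of dollars.
in_millions = numeric / 1_000_000
print(in_millions.tolist())
```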

##### Substep 4.4: Keep Top 10 Cash Holding Firms

First, sort observations by cash in descending order (i.e., *ascending = False*). Next, format the values as dollar amounts and keep the top 10 observations.

In [None]:
cash = cash.sort_values(by = 'Cash', ascending = False)
cash['Cash'] = cash['Cash'].apply(lambda x: f'${x:,.2f}')
cash = cash.head(10)
cash 
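The cell above formats *Cash* as dollar strings before keeping the top 10, which works because the rows are already sorted numerically; after formatting, though, the column is text, so any later sort on it would be alphabetical. An alternative sketch, on made-up data, keeps the numbers numeric with *DataFrame.nlargest* and formats only a display copy:

```python
import pandas as pd

# Made-up firms and cash balances (in millions of dollars).
firms = pd.DataFrame({
    'Company': ['A Corp', 'B Inc', 'C Co', 'D Ltd'],
    'Cash': [1500.0, 250.5, 98000.0, 4200.0],
})

# nlargest(n, column) returns the n rows with the largest values,
# ordered from largest to smallest, without a separate sort step.
top = firms.nlargest(3, 'Cash')

# Format a copy for display so the numeric column in `top` stays sortable.
display_top = top.assign(Cash=top['Cash'].map(lambda x: f'${x:,.2f}'))
print(display_top.to_string(index=False))
```

*DataFrame.nsmallest* works the same way in the other direction.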

## Which 15 Firms Had The Lowest Net Income in 2022?

## Token Refresher

If your access token expires during your session, run the cell below to obtain a new one.

In [None]:
# Reuse the client credentials entered in Step 2 instead of hard-coding them here.
refresh_auth = {'client_id': body_auth['client_id'],
                'client_secret': body_auth['client_secret'],
                'grant_type': 'refresh_token',
                'platform': 'ipynb',
                'refresh_token': refresh_token}

# url is the OAuth token endpoint defined in Step 2.
refreshres = requests.post(url, data=refresh_auth)
refresh_json = refreshres.json()
access_token = refresh_json['access_token']
refresh_token = refresh_json['refresh_token']
print('Token Refreshed')