### Discounting Amazon Cash Flows with ML

In this notebook I will try to predict Amazon's cash flow using machine learning. EDGAR only provides data since 2009, meaning a single company will have at most 56 quarterly reports. Since this is not enough data to train a model, I will also train on data from the 23 other major retail companies in the ticker list below. Amazon has operated at <10% net margin over the last 3 years, so I've selected retail companies that were also at <10% net margin in the last reported period, with the exception of eBay.
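The net-margin screen described above can be sketched as follows. This is only an illustration: the `latest` frame, its placeholder tickers, and the `low_margin_tickers` helper are hypothetical, and the figures are made up rather than taken from any filing. Net margin is taken here as `Income / Revenue`.

```python
import pandas as pd

# Hypothetical latest-period figures (in millions); NOT real filing data.
latest = pd.DataFrame(
    {"Revenue": [1000.0, 500.0, 200.0], "Income": [30.0, 20.0, 50.0]},
    index=["aaa", "bbb", "ccc"],  # placeholder tickers
)


def low_margin_tickers(df, threshold=0.10):
    """Return tickers whose net margin (Income / Revenue) is below threshold."""
    margin = df["Income"] / df["Revenue"]
    return margin[margin < threshold].index.tolist()


selected = low_margin_tickers(latest)
print(selected)
```

A manual exception such as eBay would simply be appended to the screened list afterwards.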

Most of the companies in the list are "big box" retailers that rely on high volume at low net margins, the best examples being Walmart (WMT) and Costco (COST). The data for these companies should be a good approximation for Amazon, since most of Amazon's revenue comes from online retail at the lowest possible prices.

In [2]:
RETAIL_TICKERS = ['amzn', 'wmt', 'cost', 'cvs', 'tgt', 'kr', 'dltr', 'dg', 'cpng', 'bby', 'aci', 'bj',
                  'musa', 'gps', 'w', 'ebay', 'aeo', 'm', 'anf', 'kss', 'go', 'psmt', 'cwh', 'woof']

In [3]:
import edgar
import pandas as pd

pd.set_option('display.float_format', lambda x: '%.2f' % x)

In [28]:
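frames = []
for tkr in RETAIL_TICKERS:
    # Keep only the periods (columns) that report revenue, COGS and income.
    df = edgar.get_company_facts(tkr).dropna(subset=['Revenue', 'COGS', 'Income'], axis=1)

    # Annual columns are the ones without a 'Q' in their label
    # (quarterly ones look like 'CY2013Q1').
    for y in [c for c in df.columns if 'Q' not in c]:
        for q in range(1, 5):
            qtr_str = f"{y}Q{q}"
            if qtr_str in df.columns and q < 4:
                # Subtract each reported Q1-Q3 from the annual figure.
                df[y] -= df[qtr_str]
            elif q == 4:
                # What is left of the annual figure becomes Q4.
                df[qtr_str] = df[y]
                df.drop(y, axis=1, inplace=True)
    frames.append(df)

# Concatenate once instead of growing a DataFrame inside the loop.
retail_df = pd.concat(frames, axis=1)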

retail_df.loc['Gross Profit'] = retail_df.loc['Revenue']-retail_df.loc['COGS']

scale = {k: 1000000 for k in retail_df.index}
scale['EPS'] = 1
retail_df = retail_df.div(scale, axis=0)
retail_df.drop(['SG&A', 'Other Income Loss', 'Operating Expenses'], inplace=True)
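The annual-to-Q4 step in the cell above is easier to follow on a toy frame. The sketch below is a minimal, vectorized version of the same idea (Q4 = annual figure minus the reported Q1 to Q3). The `annual_to_q4` helper, the `CY2020` annual label, and all values are assumptions made for illustration.

```python
import pandas as pd

# Toy data: one annual column plus three quarterly columns (made-up values).
toy = pd.DataFrame(
    {
        "CY2020": [100.0, 60.0],
        "CY2020Q1": [20.0, 12.0],
        "CY2020Q2": [25.0, 15.0],
        "CY2020Q3": [25.0, 15.0],
    },
    index=["Revenue", "COGS"],
)


def annual_to_q4(df, year):
    """Replace the annual column `year` with a derived `<year>Q4` column."""
    out = df.copy()
    # Only subtract the quarters that are actually present.
    quarters = [f"{year}Q{q}" for q in range(1, 4) if f"{year}Q{q}" in out.columns]
    out[f"{year}Q4"] = out[year] - out[quarters].sum(axis=1)
    return out.drop(columns=year)


result = annual_to_q4(toy, "CY2020")
print(result)
```

Subtraction like this only makes sense for flow items such as revenue, COGS and income. Rows such as share counts are not additive across quarters, so they may need separate handling.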

In [27]:
pd.options.plotting.backend = "plotly"
from sklearn.model_selection import train_test_split
from tensorflow import keras
from tensorflow.keras import layers  # use the same Keras that ships with TensorFlow

train, test = train_test_split(retail_df.transpose(), test_size=0.25)

train_Y = train['Income']
train_X = train.drop('Income', axis=1)
test_Y = test['Income']
test_X = test.drop('Income', axis=1)

model = keras.Sequential([
    layers.Dense(128, activation='relu', input_shape=[len(train_X.columns)]),
    layers.Dense(128, activation='relu'),    
    layers.Dense(64, activation='relu'),
    layers.Dense(1),
])
model.compile(
    optimizer='adam',
    loss='mae'
)
history = model.fit(
    train_X, train_Y,
    batch_size=128,
    epochs=200
)
pd.DataFrame(history.history)

          Revenue   COGS  Gross Profit  Operating Expenses  Taxes  Shares  \
CY2013Q1    False  False         False                True  False   False   
CY2009Q3    False  False         False                True  False   False   
CY2022Q3    False  False         False                True  False   False   
CY2023Q3    False  False         False                True  False   False   
CY2020Q2    False  False         False                True  False   False   
...           ...    ...           ...                 ...    ...     ...   
CY2007Q4    False  False         False                True  False    True   
CY2013Q4    False  False         False                True  False   False   
CY2022Q2    False  False         False                True  False   False   
CY2016Q3    False  False         False               False  False   False   
CY2020Q3    False  False         False                True  False   False   

            EPS  Operating Income  Pretax Income  
CY2013Q1  False         

Unnamed: 0,loss
0,
1,
2,
3,
4,
...,...
195,
196,
197,
198,
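The mask printed above contains `True` entries (missing values) in at least the Operating Expenses and Shares columns, and the loss column shows no values. One plausible reading is that NaNs reached the network. The sketch below shows one way to clean the features before `model.fit`, assuming median imputation and standardization are acceptable here. The `prepare_features` helper and the example frames are hypothetical, not part of the original notebook.

```python
import numpy as np
import pandas as pd


def prepare_features(train_X, test_X):
    """Drop all-NaN columns, impute with training medians, then standardize."""
    # Columns with no observed value in training cannot be imputed.
    keep = train_X.columns[train_X.notna().any()]
    train_X, test_X = train_X[keep], test_X[keep]

    # Use statistics from the training split only, to avoid leaking test data.
    medians = train_X.median()
    train_f = train_X.fillna(medians)
    test_f = test_X.fillna(medians)

    mean = train_f.mean()
    std = train_f.std().replace(0, 1.0)  # constant columns: avoid dividing by zero
    return (train_f - mean) / std, (test_f - mean) / std


# Made-up example frames standing in for train_X / test_X.
example_train = pd.DataFrame(
    {
        "Revenue": [100.0, 200.0, np.nan, 400.0],
        "COGS": [70.0, 150.0, 210.0, np.nan],
        "Operating Expenses": [np.nan, np.nan, np.nan, np.nan],
    }
)
example_test = pd.DataFrame(
    {"Revenue": [np.nan], "COGS": [100.0], "Operating Expenses": [np.nan]}
)

train_s, test_s = prepare_features(example_train, example_test)
print(train_s)
```

Standardizing also keeps features measured in millions on a scale that gradient-based optimizers such as Adam tend to handle better. The target column would need the same NaN check before training.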
