##### Data processing:
- We need the data in exactly the format FinRL uses
- We can either aggregate the hourly sentiment data and combine it with Yahoo Finance (daily) prices
- ...or use hourly price data from FMP (only available for a very short period)

Unfortunately, in both cases we have too little data :(
The sentiment data does not reach far enough into the past, and FMP does not offer hourly stock data that goes back far enough...
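
To put a number on "too little data", one option is to compute the time window that both sources cover. The following is only a minimal sketch: `overlap_window` is a hypothetical helper and the timestamps are illustrative placeholders, not real API output.

```python
import pandas as pd


def overlap_window(a: pd.Series, b: pd.Series):
    """Return (start, end) of the period covered by both timestamp series, or None."""
    start = max(a.min(), b.min())
    end = min(a.max(), b.max())
    return (start, end) if start <= end else None


# Illustrative first/last timestamps of each source (assumed values)
sentiment_dates = pd.Series(pd.to_datetime(["2023-08-23 15:00", "2023-08-27 17:00"]))
price_dates = pd.Series(pd.to_datetime(["2023-05-30 10:00", "2023-08-25 15:00"]))

window = overlap_window(sentiment_dates, price_dates)
if window is not None:
    start, end = window
    print(f"Usable window: {start} to {end} ({end - start})")
```

Only rows inside this window can be joined, so the shorter of the two histories bounds the training set.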

##### To Dos
- Logic to manipulate strings in URL (Iterate over symbols and pages)
- Append to df based on FinRL logic (timesteps as index)
- Reformat to obtain desired DataFrame 
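
The first to-do (iterating over symbols and pages in the URL) could be sketched roughly as follows. `build_url`, `iter_pages` and `fake_fetch` are hypothetical names introduced here for illustration, and stopping at the first empty page is an assumption about how the API signals the end of the data. In the notebook, `fetch` would be the JSON helper defined further down.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://financialmodelingprep.com/api/v4/historical/social-sentiment"


def build_url(symbol, page, api_key):
    """Assemble the request URL from its query parameters."""
    query = urlencode({"symbol": symbol, "page": page, "apikey": api_key})
    return f"{BASE_URL}?{query}"


def iter_pages(symbols, fetch, api_key, max_pages=155):
    """Yield (symbol, page, records) and move on once a page comes back empty (assumption)."""
    for symbol in symbols:
        for page in range(max_pages):
            records = fetch(build_url(symbol, page, api_key))
            if not records:
                break
            yield symbol, page, records


def fake_fetch(url):
    """Stand-in for a real HTTP call: two pages with data, then an empty one."""
    page = int(parse_qs(urlsplit(url).query)["page"][0])
    return [{"page": page}] if page < 2 else []


pages = [(s, p) for s, p, _ in iter_pages(["AAPL", "MSFT"], fake_fetch, api_key="DEMO")]
print(pages)
```

Passing `fetch` in as an argument keeps the loop testable without network access or an API key.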

In [33]:
import json
import ssl
from urllib.request import urlopen

import certifi
import pandas as pd


def get_jsonparsed_data(url):
    """
    Receive the content of ``url``, parse it as JSON and return the object.

    Parameters
    ----------
    url : str

    Returns
    -------
    dict or list
        The decoded JSON payload.
    """
    # urlopen's ``cafile`` argument is deprecated; pass an SSL context
    # built from certifi's CA bundle instead.
    context = ssl.create_default_context(cafile=certifi.where())
    with urlopen(url, context=context) as response:
        data = response.read().decode("utf-8")
    return json.loads(data)


In [87]:
frames = []

ticker = ['AAPL']  # will be extended analogously to Datageneration
for tick in ticker:
    for page in range(155):  # there are up to 155 pages for the most researched stocks
        url = f'https://financialmodelingprep.com/api/v4/historical/social-sentiment?symbol={tick}&page={page}&apikey=e708ff02e4e41eb0af3b96ca26e2bca2'
        data = get_jsonparsed_data(url)
        frames.append(pd.DataFrame(data))

# Concatenate once at the end instead of calling the private DataFrame._append
df = pd.concat(frames)




In [88]:
df

| | date | symbol | stocktwitsPosts | twitterPosts | stocktwitsComments | twitterComments | stocktwitsLikes | twitterLikes | stocktwitsImpressions | twitterImpressions | stocktwitsSentiment | twitterSentiment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2023-08-27 17:00:00 | AAPL | 10 | 0 | 1 | 0 | 4 | 0 | 31962 | 0 | 0.0000 | 0 |
| 1 | 2023-08-27 16:00:00 | AAPL | 10 | 0 | 2 | 0 | 3 | 0 | 175810 | 0 | 0.5520 | 0 |
| 2 | 2023-08-27 15:00:00 | AAPL | 2 | 0 | 0 | 0 | 0 | 0 | 25683 | 0 | 0.5143 | 0 |
| 3 | 2023-08-27 14:00:00 | AAPL | 4 | 0 | 1 | 0 | 1 | 0 | 36413 | 0 | 0.5113 | 0 |
| 4 | 2023-08-27 13:00:00 | AAPL | 5 | 0 | 1 | 0 | 2 | 0 | 36469 | 0 | 0.4933 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 2023-08-23 18:00:00 | AAPL | 31 | 0 | 6 | 0 | 29 | 0 | 89241 | 0 | 0.5335 | 0 |
| 96 | 2023-08-23 17:00:00 | AAPL | 52 | 0 | 17 | 0 | 50 | 0 | 110603 | 0 | 0.5547 | 0 |
| 97 | 2023-08-23 16:00:00 | AAPL | 53 | 0 | 21 | 0 | 56 | 0 | 106485 | 0 | 0.5292 | 0 |
| 98 | 2023-08-23 15:00:00 | AAPL | 87 | 0 | 26 | 0 | 78 | 0 | 130491 | 0 | 0.5485 | 0 |


In [90]:
df['date'] = df['date'].astype('datetime64[ns]')

To aggregate by day:

In [98]:
agg = df.groupby([df.date.dt.date,'symbol']).mean().reset_index(drop=True)

In [100]:
agg["date"] = agg.date.apply(lambda x: x.strftime("%Y-%m-%d"))

In [101]:
agg

| | date | stocktwitsPosts | twitterPosts | stocktwitsComments | twitterComments | stocktwitsLikes | twitterLikes | stocktwitsImpressions | twitterImpressions | stocktwitsSentiment | twitterSentiment |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2023-08-23 | 50.7 | 0.0 | 24.0 | 0.0 | 47.6 | 0.0 | 137905.8 | 0.0 | 0.5438 | 0.0 |
| 1 | 2023-08-24 | 25.458333 | 0.0 | 8.375 | 0.0 | 19.208333 | 0.0 | 121427.5 | 0.0 | 0.531663 | 0.0 |
| 2 | 2023-08-25 | 18.666667 | 0.0 | 9.416667 | 0.0 | 20.791667 | 0.0 | 151625.791667 | 0.0 | 0.537096 | 0.0 |
| 3 | 2023-08-26 | 5.375 | 0.0 | 4.25 | 0.0 | 8.333333 | 0.0 | 46406.333333 | 0.0 | 0.539217 | 0.0 |
| 4 | 2023-08-27 | 4.166667 | 0.0 | 0.611111 | 0.0 | 2.888889 | 0.0 | 50694.722222 | 0.0 | 0.537417 | 0.0 |
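
The daily aggregate above no longer carries the ticker, which becomes a problem once more than one symbol is loaded. A possible variant, sketched here on synthetic data, keeps the symbol as a `tic` column and joins the daily means onto a daily price frame. All values below are made up for illustration, and the `date`/`tic` join keys are an assumption about the target layout.

```python
import pandas as pd

# Synthetic hourly sentiment rows (illustrative values only)
hourly = pd.DataFrame({
    "date": pd.to_datetime(["2023-08-24 10:00", "2023-08-24 11:00", "2023-08-25 10:00"]),
    "symbol": ["AAPL", "AAPL", "AAPL"],
    "stocktwitsPosts": [10, 20, 30],
    "stocktwitsSentiment": [0.4, 0.6, 0.5],
})

# Daily mean per (date, symbol); the symbol survives as the "tic" column
daily_sentiment = (
    hourly.assign(date=hourly["date"].dt.strftime("%Y-%m-%d"))
    .groupby(["date", "symbol"], as_index=False)
    .mean(numeric_only=True)
    .rename(columns={"symbol": "tic"})
)

# Synthetic daily prices keyed by date and tic
prices = pd.DataFrame({
    "date": ["2023-08-24", "2023-08-25"],
    "tic": ["AAPL", "AAPL"],
    "close": [176.38, 178.61],
})

# Left join: every price row is kept, sentiment is attached where available
merged = prices.merge(daily_sentiment, on=["date", "tic"], how="left")
print(merged)
```

A left join keeps trading days without sentiment, so the missing values stay visible and can be handled explicitly later.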


Converting hourly data from FMP:

In [85]:
frames = []
ticker = ['AAPL']
for tick in ticker:
    url = f'https://financialmodelingprep.com/api/v3/historical-chart/1hour/{tick}?apikey=e708ff02e4e41eb0af3b96ca26e2bca2'
    data = pd.DataFrame(get_jsonparsed_data(url))
    data["tic"] = tick
    frames.append(data)

# Concatenate once instead of calling the private DataFrame._append
df = pd.concat(frames)

# Keep only the columns needed downstream; copy to avoid chained-assignment warnings
df = df[["date", "open", "high", "low", "close", "volume", "tic"]].copy()
df["date"] = pd.to_datetime(df["date"])
df["hour"] = df["date"].dt.hour
df["date"] = df["date"].dt.strftime("%Y-%m-%d %H:%M:%S")
df = df.dropna()
df = df.sort_values(by=["date", "tic"]).reset_index(drop=True)
df



| | date | open | high | low | close | volume | tic | day |
|---|---|---|---|---|---|---|---|---|
| 0 | 2023-05-30 10:00:00 | 177.720 | 178.0982 | 176.9200 | 177.4400 | 7623827 | AAPL | 1 |
| 1 | 2023-05-30 11:00:00 | 177.435 | 177.7900 | 176.7504 | 177.3215 | 5106817 | AAPL | 1 |
| 2 | 2023-05-30 12:00:00 | 177.330 | 177.9750 | 177.0900 | 177.8300 | 5286563 | AAPL | 1 |
| 3 | 2023-05-30 13:00:00 | 177.850 | 177.8600 | 176.8200 | 176.9700 | 4190956 | AAPL | 1 |
| 4 | 2023-05-30 14:00:00 | 176.965 | 177.8500 | 176.5900 | 177.4700 | 4602883 | AAPL | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 427 | 2023-08-25 12:00:00 | 176.660 | 177.8400 | 176.6400 | 177.4900 | 3520731 | AAPL | 4 |
| 428 | 2023-08-25 13:00:00 | 177.490 | 177.6500 | 176.9400 | 177.4200 | 2708989 | AAPL | 4 |
| 429 | 2023-08-25 14:00:00 | 177.415 | 178.7391 | 177.3401 | 178.6100 | 4531273 | AAPL | 4 |
| 430 | 2023-08-25 15:00:00 | 178.610 | 179.1500 | 177.2156 | 178.6000 | 9031274 | AAPL | 4 |
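
For the hourly route, the sentiment could be attached to the hourly bars and the rows indexed by timestep. The sketch below uses synthetic data and rests on two assumptions: that "timesteps as index" means one integer per unique timestamp shared by all tickers, and that missing sentiment is filled with 0.0, which is an arbitrary placeholder choice.

```python
import pandas as pd

# Synthetic hourly bars for two tickers (illustrative values only)
prices = pd.DataFrame({
    "date": ["2023-08-25 14:00:00", "2023-08-25 15:00:00"] * 2,
    "tic": ["AAPL", "AAPL", "MSFT", "MSFT"],
    "close": [178.61, 178.60, 322.0, 323.0],
})

# Synthetic hourly sentiment, available for one ticker only
sentiment = pd.DataFrame({
    "date": ["2023-08-25 14:00:00", "2023-08-25 15:00:00"],
    "tic": ["AAPL", "AAPL"],
    "stocktwitsSentiment": [0.53, 0.55],
})

# Attach sentiment to each bar; bars without sentiment get the placeholder 0.0
merged = prices.merge(sentiment, on=["date", "tic"], how="left")
merged["stocktwitsSentiment"] = merged["stocktwitsSentiment"].fillna(0.0)

# Sort by time, then ticker, and give every unique timestamp one integer index
merged = merged.sort_values(["date", "tic"]).reset_index(drop=True)
merged.index = merged["date"].factorize()[0]
print(merged)
```

With this index, `merged.loc[t]` returns all tickers at timestep `t` as long as several tickers share that timestamp.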
