# Rolling Logs with Streaming Data
[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/whylabs/whylogs/blob/mainline/python/examples/advanced/Streaming_Data_with_Log_Rotation.ipynb)

Now that you've become family with the ["Getting Started"](https://github.com/whylabs/whylogs/blob/9618e5dd6570bc484579ec1325f2f512ff56977f/python/examples/basic/Getting_Started.ipynb) and the basic examples, let's see what else whylogs can be used for! So far, you've seen it ingest rows and dataframes during the logging process, but now let's look at ways to handle large amounts of changing data such as streaming with ... rolling logs! (sometimes also called log rotation)

Instead of needing to plan out how you log in intervals with batching we handle all of that for you. The Logger will create your session, log information at the requested intervals of seconds, minutes, hours, or days and at that interval write out your profile to a .bin file and flush the log getting ready to receive more data.

#### Why would you want this?
Well, logging data throughout a given time period allows a higher grain of precision to your statistical profiles, and having these logs written regularly not only ensures their safety but also allows more options for merging profiles when it comes time for analysis. We'll go into that in depth in the ["Merging Profiles"](https://github.com/whylabs/whylogs/blob/9618e5dd6570bc484579ec1325f2f512ff56977f/python/examples/basic/Merging_Profiles.ipynb) notebook, but you can also see a simple example of it at the end of this notebook.

We recommend that you have multiple intervals per timeline of your analysis. For example, if you want to look at the changes daily taking it at least hourly will help get a good profile estimation. Doing it too frequently where a profile may only have a couple lines is not preferred so play around with the balance that is right for your needs.

## Simple Example using Bitcoin Ticker
To start off, let's see how logging works; this will be an extremely basic example to show the syntax. We'll get data from BlockChain's ticker as this Jupyter notebook runs. To make you not wait for too long I'll have it run while constantly gathering data and rolling over the file every 20 seconds. This will give enough data for an example for the notebook without making you wait too long.

The data picked is just a pull of the json API from the given website being used over time. This allows for easy streaming into a Jupyter that is quick and consistently changing, but in reality this is where you'd want to hook up your predictive models, larger data, CSV, etc.

#### Imports
First let's make sure we have everything installed and ready for input. We will be using the file structure to record the .bin files, and "psutil" to get the CPU information.

In [1]:
import pandas as pd
!pip install psutil
!pip install whylogs

import time
import whylogs as why

import os
from os import listdir
from os.path import isfile

tmp_path = os.path.join(os.getcwd(), "example_output")

if not os.path.isdir(tmp_path):
    os.makedirs(tmp_path)

You should consider upgrading via the '/Users/melanie/Dev/whylogs-v1/python/.venv/bin/python -m pip install --upgrade pip' command.[0m[33m
You should consider upgrading via the '/Users/melanie/Dev/whylogs-v1/python/.venv/bin/python -m pip install --upgrade pip' command.[0m[33m
[0m

FileExistsError: [Errno 17] File exists: '/Users/melanie/Dev/whylogs/python/examples/advanced/example_output'

Here is a super simple function to see the amount of files that are here before and after the logging.

In [2]:
def count_files(tmp_path):
    only_files = [f for f in listdir(tmp_path) if isfile(os.path.join(tmp_path, f))]
    return len(only_files)

print(count_files(tmp_path))

10


Now it's on to the actual logging! We will first create the logger, mark it as "rolling", and set the interval in terms of Seconds, Minutes, Hours, or Days. Lastly we want to make sure we give it the base file name, and create a writer. For this example we will be using the local writer to put files on the local system. We will have this running in this case continuously, but feel free to play with the time.sleep to see how different intervals and number of entries work.

In [3]:
url = "https://blockchain.info/ticker"

with why.logger(mode="rolling", interval=30, when="S", base_name="cpu_streaming_data") as logger:
    logger.append_writer("local", base_dir=tmp_path)
    log_number=0

    #  You may prefer to do a "while true" or until a certain time.
    #  log_number < 1000 is limited for this example
    while log_number < 1000:
        log_number += 1
        bitcoin_ticker_df = pd.read_json(url)
        logger.log(bitcoin_ticker_df)
        if log_number % 100 == 0:
            print("Log number:" + str(log_number) +
              "     Logged Files: " + str(count_files(tmp_path)))
        time.sleep(0)

Log number:100     Logged Files: 1
Log number:200     Logged Files: 1
Log number:300     Logged Files: 2
Log number:400     Logged Files: 3
Log number:500     Logged Files: 4
Log number:600     Logged Files: 5
Log number:700     Logged Files: 6
Log number:800     Logged Files: 7
Log number:900     Logged Files: 8
Log number:1000     Logged Files: 9


KeyboardInterrupt: 

## Merging Profiles from .bin
Ok, so we have saved .bin!! Huzzah! .... and what do we do with them?

Let us read them in from our local file system and merge them in a couple of ways. Please check out the ["Merging Profile"](https://github.com/whylabs/whylogs/blob/9618e5dd6570bc484579ec1325f2f512ff56977f/python/examples/basic/Merging_Profiles.ipynb) notebook for an indepth.

In [5]:
merged_profiles_view = None
for f in listdir(tmp_path):
    path = os.path.join(tmp_path, f)
    if isfile(path) and f[0] != ".":
        reading_result = why.read(path)
        result_view =  reading_result.view()

        if merged_profiles_view:
            merged_profiles_view.merge(result_view)
        else:
            merged_profiles_view = result_view

merged_profiles_view.to_pandas()

Unnamed: 0_level_0,types/integral,types/fractional,types/boolean,types/string,types/object,frequent_items/frequent_strings,counts/n,counts/null,cardinality/est,cardinality/upper_1,cardinality/lower_1,type
column,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
ARS,0,348,0,87,0,"[FrequentItem(value='6182689.120000', est=348,...",435,0,2.0,2.0001,2.0,SummaryType.COLUMN
AUD,0,348,0,87,0,"[FrequentItem(value='41940.220000', est=348, u...",435,0,2.0,2.0001,2.0,SummaryType.COLUMN
BRL,0,348,0,87,0,"[FrequentItem(value='146974.050000', est=348, ...",435,0,2.0,2.0001,2.0,SummaryType.COLUMN
CAD,0,348,0,87,0,"[FrequentItem(value='38361.840000', est=348, u...",435,0,2.0,2.0001,2.0,SummaryType.COLUMN
CHF,0,348,0,87,0,"[FrequentItem(value='29233.740000', est=348, u...",435,0,2.0,2.0001,2.0,SummaryType.COLUMN
CLP,0,348,0,87,0,"[FrequentItem(value='24975449.530000', est=348...",435,0,2.0,2.0001,2.0,SummaryType.COLUMN
CNY,0,348,0,87,0,"[FrequentItem(value='191407.130000', est=348, ...",435,0,2.0,2.0001,2.0,SummaryType.COLUMN
CZK,0,348,0,87,0,"[FrequentItem(value='702678.710000', est=348, ...",435,0,2.0,2.0001,2.0,SummaryType.COLUMN
DKK,0,348,0,87,0,"[FrequentItem(value='212351.240000', est=348, ...",435,0,2.0,2.0001,2.0,SummaryType.COLUMN
EUR,0,348,0,87,0,"[FrequentItem(value='28397.200000', est=348, u...",435,0,2.0,2.0001,2.0,SummaryType.COLUMN


# What's next?
- Check out a deeper dive into rolling logs with a flask app here. # In progress
- Get to know ["Merging Profiles"](https://github.com/whylabs/whylogs/blob/9618e5dd6570bc484579ec1325f2f512ff56977f/python/examples/basic/Merging_Profiles.ipynb) and how to use them.
- See how all this can be visualized in ["Notebook Profile Visualizer"](https://github.com/whylabs/whylogs/blob/9618e5dd6570bc484579ec1325f2f512ff56977f/python/examples/basic/Notebook_Profile_Visualizer.ipynb)