adding a pandas wrapper #28

jjjchens235 · 2020-10-19T21:39:49Z

Hey, thanks for your great work!
This pull request is for extending the OuraClient class with a OuraClientDataFrame class that returns each of the summary objects as a pandas DataFrame.

turing-complet · 2020-10-21T06:44:12Z

Thanks, this is a great idea! I have some other changes in progress as well, which will probably involve re-targeting the branch among other things, but I hope to get these integrated (and published to pypi) sometime soon.

jjjchens235 · 2020-10-21T11:31:56Z

Yes!!! Very exciting for me, thank you!
By chance, will bedtime summary be included in the new changes?

https://cloud.ouraring.com/docs/bedtime

aapris · 2020-10-21T11:38:21Z

After you've finished this, I can add option to save the data into a InfluxDB database. :) Actually it is pretty simple task, here is the code I'm using:

def dataframe_into_influxdb(args: dict, df: pd.DataFrame, tag_columns=None):
    if tag_columns is None:
        tag_columns = ['dev-id']
    if args.get('influxdb_database') is None or args.get('influxdb_measurement') is None:
        logging.debug('Not saving into InfluxDB (no database or measurement name given')
        return False
    protocol = 'line'
    client = DataFrameClient(host=args.get('influxdb_host'), port=args.get('influxdb_port'),
                             username=args.get('influxdb_username'), password=args.get('influxdb_password'),
                             database=args.get('influxdb_database'))
    logging.info('Create database: {}'.format(args.get('influxdb_database')))
    client.create_database(args.get('influxdb_database'))
    len1 = len(df)
    df = df.dropna(thresh=2)  # Drop all lines where all but dev-id columns are NaN
    len2 = len(df)
    if len1 > len2:
        logging.warning('Dropped {} NaN rows'.format(len1 - len2))
    try:
        client.write_points(df, args.get('influxdb_measurement'),
                            tag_columns=tag_columns, protocol=protocol, batch_size=5000)
    except InfluxDBClientError as err:
        df.to_csv('error_data.csv')
        logging.error('Erroneous DF written to file error_data.csv')
        raise

    return True

turing-complet · 2020-10-22T01:18:03Z

will bedtime summary be included in the new changes?

I didn't know about that, but will plan on it :) Also looking into adding the personal access token option, since I believe they released that since I last worked on this.

turing-complet · 2020-10-22T01:38:21Z

After you've finished this, I can add option to save the data into a InfluxDB database. :)

I like this idea too, but want the package to remain basically agnostic to other technologies. I think some kind of extension model would work here, where users could take advantage of something like this if desired, but without requiring all users to install unnecessary packages. I have some ideas for this, just need to make time to do it - stay tuned :)

turing-complet · 2020-10-22T03:41:38Z

@jjjchens235 I'm going to merge this into the dev branch, where I can put other features as well before creating another release. Thanks for this!

jjjchens235 · 2020-10-22T12:10:40Z

Sweet, thanks!!

I was thinking about my implementation a little bit, and I think it doesn't give the user enough flexibility in the **_to_df_edited functions (ie. sleep_to_df_edited).

If they decide to call that function, certain cols will be edited no matter what. It might be better to give the user access to each conversion function directly, so they have greater control over what cols are updated.

The two conversion functions are currently convert_to_hrs(), and convert_to_dt(). I want to also add a convert_hypogram() since the current format is not very human read-able.

turing-complet · 2020-10-26T09:05:01Z

If they decide to call that function, certain cols will be edited no matter what. It might be better to give the user access to each conversion function directly, so they have greater control over what cols are updated.

Makes sense - maybe we just tweak it so the user can pass in which cols to convert? This could still have a default for the cols and the conversion function.

I want to also add a convert_hypogram() since the current format is not very human read-able.

For sure, one of the first things I noticed on the api, but I haven't looked into it much. Definitely open to ideas/PR's here.

One question I had while updating some tests - which structure were you using to create the data frames? In the docs, they show the response body as either {"activity": [{...}, ]} and {"activity": {...}}. Depending on the structure, the data frame, when constructed, could expand to have rows for each list property (e.g. class_5min) or just one row where the list is in a single cell.

Curious if you had thoughts on that. There is a potential trade off in the resulting size vs usability, so I'm considering having an option to specify which representation to create.

jjjchens235 · 2020-10-30T14:58:59Z

I will take a look at this next week, I have been incredibly busy, thank you for your detailed input.

jjjchens235 · 2020-11-07T21:32:40Z

I have finished updating the code for

giving user option to choose which cols to update
making hypnogram more readable

This is the open loop for me:

One question I had while updating some tests - which structure were you using to create the data frames? In the docs, they show the response body as either {"activity": [{...}, ]} and {"activity": {...}}. Depending on the structure, the data frame, when constructed, could expand to have rows for each list property (e.g. class_5min) or just one row where the list is in a single cell.
Curious if you had thoughts on that. There is a potential trade off in the resulting size vs usability, so I'm considering having an option to specify which representation to create.

From what I can tell, there is no response body with this structure: {"activity": [{...}, ]}. Rather, the response bodies are always {"activity": {...}}, but inside the inner dictionary, the values could be either a list, i.e. {"met_1min": [ 1.2,1.1,0.9]} or they could be a long string, i.e. {"class_5min":"1112211"}.

I'm basing off this documentation: https://cloud.ouraring.com/docs/activity

turing-complet · 2020-11-07T22:46:11Z

Check out this page https://cloud.ouraring.com/docs/daily-summaries for an example of the structure. I think either one could be returned, depending on if the request is for one day or multiple days.

jjjchens235 · 2020-11-08T01:44:27Z

I see, I think ultimately it is a moot point because I don't think expanding the data-frame for multiple days would be a good idea. Take this scenario where the user selects met_1min which returns a list, and cal_total which returns a single integer.
The data would look something like this:

summary_date	met_1min	cal_total
1/1/20	[2,3,4,5,6,7]	500

If it was expanded, one possible expansion method would be like this, don't think it's clean solution since it duplicates data from cal_total and summary_date.

summary_date	met_1min	cal_total
1/1/20	2	500
1/1/20	3	500
1/1/20	4	500
1/1/20	5	500
1/1/20	6	500
1/1/20	7	500

turing-complet · 2020-11-12T07:14:00Z

Sorry for the delay, I agree about the structure though, and users can always expand the rows using pandas if needed. Turns out the only thing needed was to wrap the single day response in a list so pandas creates it correctly (in this case a single row data frame). I do that in #32. Some other changes there as well, hoping it doesn't conflict with your current work - looking forward to that btw :) After that I think it will be a good time to publish the recent work.

jjjchens235 · 2020-11-12T18:31:48Z

I took a look at #32 and it looks great.

The changes I made follows the pattern below, there will need to be some adjustments because of #32, but I wanted to get your thoughts. The default function arguments are the columns that will be converted by default, but this gives the user the ability to specify specific cols to convert.

Within the converter class (not shown below), it will check to make sure the passed in cols are valid (they must appear in the dataframe, and they must be a col where it would make sense to convert).

    def sleep_df_edited(self, start=None, end=None, metrics=None, to_dt_metrics=['bedtime_end', 'bedtime_start'], to_hr_metrics=['awake', 'deep', 'duration', 'light', 'onset_latency', 'rem', 'total']):
        """
        Create a dataframe from sleep summary dict object.
        Some cols are unit converted for easier use or readability.

        :param start: Beginning of date range
        :type start: string representation of a date i.e. '2020-10-31'

        :param end: End of date range, or None if you want the current day.
        :type end: string representation of a date i.e. '2020-10-31'

        :param metrics: Metrics to include in the df.
        :type metrics: A list of strings, or a string

        :param to_dt_metrics: metrics to convert from string to datetime
        :type to_dt_metrics: list

        :param to_hr_metrics: metrics to convert from seconds to hours
        :type to_hr_metrics: list
        """
        sleep_df = self.sleep_df_raw(start, end, metrics)
        sleep_df = SleepConverter().convert_to_dt(sleep_df, to_dt_metrics)
        sleep_df = SleepConverter().convert_to_hrs(sleep_df, to_hr_metrics)
        sleep_df = SleepConverter().convert_hypnogram(sleep_df)
        return sleep_df

turing-complet · 2020-11-12T20:37:56Z

The idea looks good. My initial questions are mostly to do with granularity

why have separate arguments for dt and hr metrics? the converter knows which ones are which
similarly, why split the calls instead of just calling convert_metrics
will hypnogram be optional? (supposing there was a single list argument, should it have hypnogram_5min)

One possibility would be taking a subclass of UnitConverter as the argument and leaving it up to the user to define that, then if convert=True just calling my_converter().convert_metrics(df). I'm not necessarily advocating for this, but could be worth exploring.

jjjchens235 · 2020-11-12T21:33:00Z

why have separate arguments for dt and hr metrics? the converter knows which ones are which
similarly, why split the calls instead of just calling convert_metric

I think splitting it makes it more clear what is being converted.
The user will immediately know for sleep_df, there is a datetime conversion, and an hour conversion. For activity_df, they will immediately know that the conversion is just a datetime conversion.
With the current set-up, I think it would be hard for the user to know what columns can be converted in what way unless they dig into converters.py source code which they shoudn't have to do.

will hypnogram be optional? (supposing there was a single list argument, should it have hypnogram_5min)>

I was planning for it not be optional since this would be desired behavior for the majority of use cases. However, if it was optional I think it would be simple to add an argument 'is_convert_hypogram=True' to the sleep_df() argument list.

Edit:
Alternatively, we can keep the current structure, and have each of the child converter classes (sleep, activity) override the convert_metrics() function. For example inside sleepconverter's convert_metrics(), it can convert to datetime, hours, and hypnogram if the cols are specified by the user. And for activityconverter, only datetime will be converted inside its own convert_metrics() function.

It would be easier to implement and the user will have to worry about less arguments. On the negative side, it will be harder for the user to understand what is being converted and how it is being converted.

turing-complet · 2020-11-13T02:27:28Z

I see the trade off about having knowledge of which columns map to which conversions embedded in converters.py. However I'm leaning toward keeping it to one argument for the following reasons

users won't have to determine which bucket a column goes in, just whether to convert it
there is only one supported conversion per column - putting an hour column in the to_dt_metrics argument (for example) will either fail or won't be meaningful, so it's not really an option to begin with

To your point, I agree it should be clear what transformation is happening without having to dig into the code. I think this might be better solved using the docstrings though (just something explaining that each column will be converted depending on what kind of value it has, and that they can be selected explicitly). Thoughts?

jjjchens235 · 2020-11-14T18:28:57Z

Makes sense!
I think next steps should be that convert_metrics() inside child classes (SleepConverter and ActivityConverter) should be overridden, i.e. ActivityConverter does not need to convert to hours but SleepConverter does.

Then inside SleepConverter class, convert_to_hypogram() code can be added, this was what i had come up with:

def convert_hypnogram_helper(self, hypnogram):
    d = {'1': 'D', '2': 'L', '3': 'R', '4': 'A'}
    return ''.join(list(map(lambda h: d[h], hypnogram)))

def convert_hypnogram(self, sleep_df):
    if 'hypnogram_5min' in sleep_df:
        sleep_df['hypnogram_5min'] = sleep_df['hypnogram_5min'].apply(self.convert_hypnogram_helper)
    return sleep_df

Finally, the tests I had written make sure that if user passes in a col argument that does not exist in the df, or if the col argument is not convert able, that these two cases are handled.

If you would like, I can try implementing, but I don't have much time right now, and you're more familiar with the code at this point, so it might make more sense for you to finish everything. Let me know what you want, and thank you so much for letting me contribute.

turing-complet · 2020-11-14T23:27:50Z

Thanks for the code sample - I'm happy to finish integrating that (either way is fine, but sounds like I will probably get to it first). And thank you for the contributions/discussions, as well as helping motivate me to finally update the code, all much appreciated.

adding a pandas wrapper

5d82976

turing-complet changed the base branch from master to dev October 22, 2020 03:41

turing-complet approved these changes Oct 22, 2020

View reviewed changes

turing-complet merged commit 7f0c2ba into turing-complet:dev Oct 22, 2020

adding a pandas wrapper #28

adding a pandas wrapper #28

Uh oh!

Conversation

jjjchens235 commented Oct 19, 2020

Uh oh!

turing-complet commented Oct 21, 2020

Uh oh!

jjjchens235 commented Oct 21, 2020

Uh oh!

aapris commented Oct 21, 2020

Uh oh!

turing-complet commented Oct 22, 2020

Uh oh!

turing-complet commented Oct 22, 2020

Uh oh!

turing-complet commented Oct 22, 2020

Uh oh!

jjjchens235 commented Oct 22, 2020

Uh oh!

turing-complet commented Oct 26, 2020

Uh oh!

jjjchens235 commented Oct 30, 2020

Uh oh!

jjjchens235 commented Nov 7, 2020

Uh oh!

turing-complet commented Nov 7, 2020

Uh oh!

jjjchens235 commented Nov 8, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

turing-complet commented Nov 12, 2020

Uh oh!

jjjchens235 commented Nov 12, 2020

Uh oh!

turing-complet commented Nov 12, 2020

Uh oh!

jjjchens235 commented Nov 12, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

turing-complet commented Nov 13, 2020

Uh oh!

jjjchens235 commented Nov 14, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

turing-complet commented Nov 14, 2020

Uh oh!

Uh oh!

jjjchens235 commented Nov 8, 2020 •

edited

Loading

jjjchens235 commented Nov 12, 2020 •

edited

Loading

jjjchens235 commented Nov 14, 2020 •

edited

Loading