adding a pandas wrapper #28
Conversation
Thanks, this is a great idea! I have some other changes in progress as well, which will probably involve re-targeting the branch among other things, but I hope to get these integrated (and published to PyPI) sometime soon.
Yes!!! Very exciting for me, thank you!
After you've finished this, I can add an option to save the data into an InfluxDB database. :) Actually it is a pretty simple task; here is the code I'm using:

```python
import logging

import pandas as pd
from influxdb import DataFrameClient
from influxdb.exceptions import InfluxDBClientError


def dataframe_into_influxdb(args: dict, df: pd.DataFrame, tag_columns=None):
    if tag_columns is None:
        tag_columns = ['dev-id']
    if args.get('influxdb_database') is None or args.get('influxdb_measurement') is None:
        logging.debug('Not saving into InfluxDB (no database or measurement name given)')
        return False
    protocol = 'line'
    client = DataFrameClient(host=args.get('influxdb_host'), port=args.get('influxdb_port'),
                             username=args.get('influxdb_username'), password=args.get('influxdb_password'),
                             database=args.get('influxdb_database'))
    logging.info('Create database: {}'.format(args.get('influxdb_database')))
    client.create_database(args.get('influxdb_database'))
    len1 = len(df)
    df = df.dropna(thresh=2)  # Drop rows where all columns but dev-id are NaN
    len2 = len(df)
    if len1 > len2:
        logging.warning('Dropped {} NaN rows'.format(len1 - len2))
    try:
        client.write_points(df, args.get('influxdb_measurement'),
                            tag_columns=tag_columns, protocol=protocol, batch_size=5000)
    except InfluxDBClientError:
        df.to_csv('error_data.csv')
        logging.error('Erroneous DF written to file error_data.csv')
        raise
    return True
```
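For reference, a hypothetical call might look like this (the connection settings and sample data below are placeholders, not taken from the project):

```python
import pandas as pd

# Placeholder connection settings, purely for illustration
args = {
    'influxdb_host': 'localhost',
    'influxdb_port': 8086,
    'influxdb_username': 'user',
    'influxdb_password': 'secret',
    'influxdb_database': 'oura',
    'influxdb_measurement': 'activity',
}
# DataFrameClient expects the frame to be indexed by timestamp
df = pd.DataFrame({'dev-id': ['a1'], 'cal_total': [2500]},
                  index=pd.to_datetime(['2020-01-01']))
dataframe_into_influxdb(args, df)
```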
I didn't know about that, but will plan on it :) Also looking into adding the personal access token option, which I believe they released after I last worked on this.
I like this idea too, but want the package to remain basically agnostic to other technologies. I think some kind of extension model would work here, where users could take advantage of something like this if desired, but without requiring all users to install unnecessary packages. I have some ideas for this, just need to make time to do it - stay tuned :)
@jjjchens235 I'm going to merge this into the
Sweet, thanks!! I was thinking about my implementation a little bit, and I think it doesn't give the user enough flexibility in the `*_to_df_edited` functions (i.e. `sleep_to_df_edited`). If they decide to call that function, certain cols will be edited no matter what. It might be better to give the user access to each conversion function directly, so they have greater control over which cols are updated. The two conversion functions are currently `convert_to_hrs()` and `convert_to_dt()`. I also want to add a `convert_hypogram()`, since the current format is not very human-readable.
Makes sense - maybe we just tweak it so the user can pass in which cols to convert? This could still have a default for the cols and the conversion function.
For sure, one of the first things I noticed on the API, but I haven't looked into it much. Definitely open to ideas/PRs here. One question I had while updating some tests - which structure were you using to create the data frames? In the docs, they show the response body as either `{"activity": [{...}]}` or `{"activity": {...}}`. Curious if you had thoughts on that. There is a potential trade-off in the resulting size vs usability, so I'm considering having an option to specify which representation to create.
I will take a look at this next week; I have been incredibly busy. Thank you for your detailed input.
I have finished updating the code for
This is the open loop for me:
From what I can tell, there is no response body with this structure: `{"activity": [{...}]}`. Rather, the response bodies are always `{"activity": {...}}`, but inside the inner dictionary the values can be either a list, e.g. `{"met_1min": [1.2, 1.1, 0.9]}`, or a long string, e.g. `{"class_5min": "1112211"}`. I'm basing this off this documentation: https://cloud.ouraring.com/docs/activity
Check out this page https://cloud.ouraring.com/docs/daily-summaries for an example of the structure. I think either one could be returned, depending on whether the request is for one day or multiple days.
I see; I think ultimately it is a moot point, because I don't think expanding the data frame for multiple days would be a good idea. Take the scenario where the user selects met_1min, which returns a list, and cal_total, which returns a single integer.
If it was expanded, one possible expansion method would look like the sketch below; I don't think it's a clean solution, since it duplicates data from cal_total and summary_date.
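A minimal sketch of that expansion, with invented values, might look like this:

```python
import pandas as pd

# Invented single-day values, purely for illustration
data = {"summary_date": "2020-01-01", "cal_total": 2500, "met_1min": [1.2, 1.1, 0.9]}

# Expanding met_1min to one row per minute broadcasts the scalar columns,
# duplicating summary_date and cal_total on every row:
df = pd.DataFrame(data)
print(df)
#   summary_date  cal_total  met_1min
# 0   2020-01-01       2500       1.2
# 1   2020-01-01       2500       1.1
# 2   2020-01-01       2500       0.9
```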
Sorry for the delay - I agree about the structure though, and users can always expand the rows using pandas if needed. Turns out the only thing needed was to wrap the single-day response in a list so pandas creates it correctly (in this case a single-row data frame). I do that in #32. Some other changes there as well; hoping it doesn't conflict with your current work - looking forward to that btw :) After that I think it will be a good time to publish the recent work.
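A minimal sketch of that list-wrapping trick, with invented response values:

```python
import pandas as pd

# Invented single-day response body
single_day = {"summary_date": "2020-01-01", "cal_total": 2500}

# pd.DataFrame(single_day) raises "If using all scalar values, you must pass
# an index"; wrapping the dict in a list yields a one-row frame instead.
df = pd.DataFrame([single_day])
```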
I took a look at #32 and it looks great. The changes I made follow the pattern below; there will need to be some adjustments because of #32, but I wanted to get your thoughts. The default function arguments are the columns that will be converted by default, but this gives the user the ability to specify which cols to convert. Within the converter class (not shown below), it checks to make sure the passed-in cols are valid (they must appear in the dataframe, and they must be a col where it would make sense to convert).
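A hypothetical sketch of that pattern (the column names and conversion logic here are assumed, not taken from the actual PR):

```python
import pandas as pd

class SleepConverter:
    # Hypothetical defaults: columns converted unless the caller overrides them
    DEFAULT_HR_COLS = ('total', 'awake', 'rem', 'light', 'deep')

    def convert_to_hrs(self, df: pd.DataFrame, cols=DEFAULT_HR_COLS) -> pd.DataFrame:
        # Validate the requested columns before converting
        invalid = [c for c in cols if c not in df.columns]
        if invalid:
            raise ValueError('Cannot convert columns: {}'.format(invalid))
        for col in cols:
            df[col] = df[col] / 3600  # these durations are reported in seconds
        return df
```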
The idea looks good. My initial questions are mostly to do with granularity
One possibility would be taking a subclass of
I think splitting it makes it clearer what is being converted.
I was planning for it not to be optional, since this would be the desired behavior for the majority of use cases. However, if it were optional, I think it would be simple to add an argument `is_convert_hypogram=True` to the `sleep_df()` argument list. Edit: It would be easier to implement, and the user would have fewer arguments to worry about. On the negative side, it will be harder for the user to understand what is being converted and how it is being converted.
I see the trade-off about having knowledge of which columns map to which conversions embedded in `converters.py`. However, I'm leaning toward keeping it to one argument, for the following reasons
To your point, I agree it should be clear what transformation is happening without having to dig into the code. I think this might be better solved using the docstrings though (just something explaining that each column will be converted depending on what kind of value it has, and that they can be selected explicitly). Thoughts?
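For instance, a docstring along these lines (the name and wording here are invented for illustration) could make the behavior discoverable without reading the source:

```python
def convert_cols(self, df, cols=None):
    """Convert summary columns to friendlier representations.

    By default, second-based durations are converted to hours and UTC
    timestamps to datetimes; pass `cols` to convert only those columns.
    Columns that are missing or not convertible raise a ValueError.
    """
```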
Makes sense! Then inside the SleepConverter class, the convert_to_hypogram() code can be added; this is what I had come up with:
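A minimal sketch of such a conversion, assuming the v1 API's hypnogram encoding ('1' = deep, '2' = light, '3' = REM, '4' = awake); the column name and output letters are assumptions:

```python
import pandas as pd

# Assumed mapping from the hypnogram_5min encoding to readable letters
HYPNOGRAM_MAP = {'1': 'D', '2': 'L', '3': 'R', '4': 'A'}

class SleepConverter:
    def convert_to_hypogram(self, df: pd.DataFrame, col: str = 'hypnogram_5min') -> pd.DataFrame:
        # Translate each five-minute character into a human-readable letter
        df[col] = df[col].map(lambda s: ''.join(HYPNOGRAM_MAP.get(ch, '?') for ch in s))
        return df
```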
Finally, the tests I had written make sure that if the user passes in a col argument that does not exist in the df, or a col argument that is not convertible, both cases are handled. If you would like, I can try implementing this, but I don't have much time right now, and you're more familiar with the code at this point, so it might make more sense for you to finish everything. Let me know what you want, and thank you so much for letting me contribute.
Thanks for the code sample - I'm happy to finish integrating that (either way is fine, but it sounds like I will probably get to it first). And thank you for the contributions/discussions, as well as for helping motivate me to finally update the code - all much appreciated.
Hey, thanks for your great work!
This pull request extends the `OuraClient` class with an `OuraClientDataFrame` class that returns each of the summary objects as a pandas DataFrame.
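A hypothetical usage sketch of the new class (the method and parameter names are assumed from the discussion above, not from the final API):

```python
from oura import OuraClientDataFrame

# Personal access token auth, as discussed above (token value is a placeholder)
client = OuraClientDataFrame(personal_access_token='MY_TOKEN')

# Each summary endpoint gets a *_df counterpart returning a pandas DataFrame
sleep = client.sleep_df(start='2020-01-01', end='2020-01-07')
print(sleep.head())
```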