Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BACK-2218][BACK-2187] Add hourlystats and multiple summary periods #74

Closed
wants to merge 32 commits into from

Conversation

Roukoswarf
Copy link
Contributor

No description provided.

@Roukoswarf Roukoswarf changed the title [BACK-2218] Add hourlystats and multiple summary periods [BACK-2218][BACK-2187] Add hourlystats and multiple summary periods Jul 28, 2022
@Roukoswarf Roukoswarf requested a review from tjotala July 28, 2022 15:13
@clintonium-119
Copy link
Member

/query qa2

@tidebot
Copy link
Collaborator

tidebot commented Aug 10, 2022

image: tidepool/clinic:hourlystats-660a198a6d4518ff3af4c93cc5a47ff6fb6a334f

clinics/creator/creator.go Outdated Show resolved Hide resolved
api/mappers.go Outdated Show resolved Hide resolved
api/mappers.go Outdated Show resolved Hide resolved
@@ -105,10 +105,154 @@ func (r *repository) Initialize(ctx context.Context) error {
SetBackground(true).
SetName("PatientSummaryLastUploadDate"),
},

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having more than 15 indexes per collection is usually considered a bad practice and likely means that the model is not optimal. It's probably best to store all periods in an array, instead of an object/map so you can than have a multikey index on summary.periods.period.timeCGMUsePercent (where periods is an array, and period is an attribute in an object) instead of having to create an index per period.

Copy link
Contributor

@toddkazakov toddkazakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left inline comments. My main concern is that currently, we need to have a separate index per period, which is most likely an anti-pattern. I left a suggestion how this can be refactored so we can use a mutlikey indexes.

Copy link
Contributor

@toddkazakov toddkazakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left inline comments

LastUploadDateFrom *time.Time
LastUploadDateTo *time.Time

TimeCGMUsePercentCmp1d *string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should refactor this and combine the comparator and value in a single struct.

func MaybeApplyNumericFilter(selector bson.M, field string, cmp *string, value float64) {
if f, ok := cmpToMongoFilter(cmp); ok {
selector[field] = bson.M{f: value}
func MaybeApplyNumericFilter(selector bson.M, period string, field string, cmp *string, value float64) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function was used to apply generic numeric filters to a query, but now it only works for summary periods - the name of the function should reflect that.

{
Keys: bson.D{
{Key: "clinicId", Value: 1},
{Key: "summary.periods.14d.percentTimeInHigh", Value: 1},

{Key: "summary.periods.14d.timeCGMUsePercent", Value: 1},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure our use of those indices (with the $ne trick) achieves much, as the all levels of the b-tree index which use the $ne operator must be interated over, instead of doing a binary search. Have you ran any benchmarks against a non-indexed filter/sort? Is it possible that the performance is actually worse as we'd have to iterate over all levels, even though the fields are not really used? Also, I guess the order here matters a lot - most frequently used fields should come first.

patients/repo.go Outdated Show resolved Hide resolved
func MaybeApplyNumericFilter(selector bson.M, period string, field string, cmp *string, value float64) {
if operator, ok := cmpToMongoFilter(cmp); ok {
// ugly, but needed to ensure index prefix
for _, filterable := range filterablePeriodFields {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't add $ne for every filterable summary field, as it could result in multiple iterations over the index which are not required. You should add it only to those fields that come before field you're filtering by in the compound index order.

If you have a compound index on a.b.c, and you want to filter the results by b, you should add $ne: -1 only to a, but not c.

if operator, ok := cmpToMongoFilter(cmp); ok {
// ugly, but needed to ensure index prefix
for _, filterable := range filterablePeriodFields {
if _, exists := selector["summary.periods."+period+"."+filterable]; !exists {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably better to use fmt.Sprintf instead of concatenation

{Key: "summary.periods.1d.hasTimeCGMUsePercent", Value: -1},

{Key: "summary.periods.1d.glucoseManagementIndicator", Value: 1},
{Key: "summary.periods.1d.hasGlucoseManagementIndicator", Value: -1},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should boolean fields be part of this index? The fields are not filterable, and I'm not sure the index can be utilized for sorting given that the index is compound.

description: Rotating list containing the stats for each currently tracked hour in order
items:
$ref: '#/components/schemas/PatientSummaryStat'
PatientSummaryStat:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, if the API is not backward compatible, the clinic client must be updated in all services that use it (e.g. clinic worker, shoreline, hydrophone). You should probably delete all summaries, update clinic client in all services that use it, and only then push this PR to production.

@Roukoswarf
Copy link
Contributor Author

Superseded by #83

@Roukoswarf Roukoswarf closed this Apr 14, 2023
@Roukoswarf Roukoswarf deleted the hourlystats branch December 4, 2023 18:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants