Splitting out the processing as a separate task from the telemetry collection covered in #3056.
Processing
Because some of the records contain redundant data, structured queries aren't suitable for generating the retention chart directly. Instead we'll run a daily scheduled web job to convert the records into a form that's easier to query.
As an example, using a rolling 4-day period (where `_` marks data outside the 4-day period), the following table shows the "real" user (not included in the actual data) and the corresponding logged activity. It also shows which records would be ignored because they are redundant with data submitted later.
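| Day | User | Activity Record | Redundant |
| --- | --- | --- | --- |
| 1 | A | [1, _, _, _] | X |
| 1 | C | [1, _, _, _] | X |
| 2 | A | [1, 1, _, _] | |
| 2 | C | [1, 1, _, _] | X |
| 2 | D | [1, 0, _, _] | |
| 3 | B | [1, 0, 0, _] | X |
| 4 | A | [1, 0, 1, 1] | |
| 4 | B | [1, 1, 0, 0] | |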
Or alternatively to show how the data aligns across days:
Note that marking a record as redundant only means it matches the same usage pattern - it doesn't actually have to originate from the same user. Since record 4A ends in `1, 1`, it needs to cancel out a record from day 2 starting with `1, 1` and a record from day 1 starting with `1`. In this example the cancelled record from day 2 actually came from user C, but that's okay: what matters is balancing the numbers so that each day's activity only gets counted once.
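A minimal sketch of how the cancelling could work, not the actual web job. It assumes each record is reduced to its entries inside the window (so a day 1 record is just `(1,)`), that entry 0 is the submission day and entry `i` is `i` days earlier, and that records are processed from the latest day to the earliest. The names `RECORDS` and `mark_redundant` are made up for illustration, and the user labels are only there to compare against the table.

```python
from collections import Counter

# Records from the example table: (day, user label, known activity entries).
# Entries outside the window ("_") are omitted. User labels are display-only.
RECORDS = [
    (1, "A", (1,)),
    (1, "C", (1,)),
    (2, "A", (1, 1)),
    (2, "C", (1, 1)),
    (2, "D", (1, 0)),
    (3, "B", (1, 0, 0)),
    (4, "A", (1, 0, 1, 1)),
    (4, "B", (1, 1, 0, 0)),
]


def mark_redundant(records):
    """Return the set of (day, user) labels whose records are redundant."""
    # (day, pattern) -> number of earlier records that still need cancelling
    pending = Counter()
    redundant = set()
    # Walk from the latest submission to the earliest.
    for day, user, pattern in reversed(sorted(records, key=lambda r: r[0])):
        key = (day, pattern)
        if pending[key] > 0:
            # A later record already covers this usage pattern.
            pending[key] -= 1
            redundant.add((day, user))
            continue
        # Non-redundant: every earlier active day implies one earlier record
        # with the matching tail of this pattern, which must be cancelled.
        for offset in range(1, len(pattern)):
            if pattern[offset] == 1:
                pending[(day - offset, pattern[offset:])] += 1
    return redundant


print(sorted(mark_redundant(RECORDS)))
# [(1, 'A'), (1, 'C'), (2, 'C'), (3, 'B')]
```

Only non-redundant records request cancellations here, which keeps a day 1 record from being cancelled twice (once by 4A and again by the day 2 record that 4A already cancelled). Which of two identical records gets marked depends on processing order; the counts are the same either way.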
So to determine how many unique users were active for at least two days in this time period, we simply count how many non-redundant records have at least two `1`s within the four-day range. That's 2A, 4A, and 4B for a total of three unique users. The actual users who met this criterion were A, B, and C, but we don't need to know that - only how many of them there were.
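As a rough illustration of that count, the sketch below hard-codes the example table (with the redundant flags as given) and counts non-redundant records with at least two active days. `TABLE` and `count_retained` are hypothetical names, and `_` entries are omitted from each record.

```python
# (label, known activity entries, redundant?) taken from the example table.
TABLE = [
    ("1A", (1,), True),
    ("1C", (1,), True),
    ("2A", (1, 1), False),
    ("2C", (1, 1), True),
    ("2D", (1, 0), False),
    ("3B", (1, 0, 0), True),
    ("4A", (1, 0, 1, 1), False),
    ("4B", (1, 1, 0, 0), False),
]


def count_retained(rows, min_active_days=2):
    """Count non-redundant records with at least `min_active_days` ones."""
    return sum(
        1
        for _label, pattern, redundant in rows
        if not redundant and sum(pattern) >= min_active_days
    )


print(count_retained(TABLE))  # 3 (records 2A, 4A and 4B)
```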