Activity stream concept to replace feeds on dashboard #1679
a few thoughts on the technical side. I think the important thing for speed is that all items are in one table and filtered down from there rather than a bunch of extra joins. I'd see something similar to what we did for the update atom feed where we have 1 or 2 objects / actions / verbs setup but just extending it
Then in their settings page they would have a tick-box grid with each type of action down the left, and across the top 'Me | Friends' + 'Fav crags | My country | World' (I'll do a mockup if / when we need to) to specify what's in or out. I've just done a trawl through stats to see what feeds people are actually watching: 624 users have set their feed: Ticks:
Updates: (114)
The question is whether the extra summary data is generated at display time or ahead of time. Mostly a size / performance trade off so would need testing |
Just done a bit of apache log trawling and found something which utterly convinces me this is worth doing - already just TODAY there have been 91 api/climber/updates so far, the vast majority of them from the dashboard. The vast majority of those updates are in quick succession - in other words people are opening up their dashboard, looking at their feed, then toggling to another feed, then toggling back. There are other types of update to the same api endpoint but mostly from other pages so it's hard to split them out. |
I agree this is worthwhile doing. I don't think we need a new table, the ActivityLog has Account, Node, Item, Star Schema Nx and a serialised Data field. At the moment Node is mandatory, so we cannot use it for the likes of 'Simon started following Campbell'. This constraint can be relaxed. The table contains every index related activity, which includes favoriting and ascents. This is good. I think we need to work on two aspects of the business logic.
At the moment the sql query is a list ordered by time. However the new feed concept will aggregate activity for a particular person, which may mean showing things out of order. In other words once we have decided to aggregate, say Brendan's ascents, then we need to make sure we go far enough back to get the relevant ascents. It is likely that I would be more interested in Brendan's activity from a week ago, because I am following him, than in somebody else's activity from 5 minutes ago who I am not following. If I am not following anybody, who do we show activity for? If people who I am following have not been active for 3 months who do we show? (eg friends of friends, same country?) If we occasionally throw in friends of friends then this may encourage more account following and better community networking. Thinking out loud, once we know who to show in the feed then the rest is pretty much already there and straightforward. IMO choosing who to show is the big innovative step here. This feed also needs to be sympathetic to notable ascents - #246, which I am still keen on. |
If we don't use a new table we'll be doing a lot of group processing either at query time or display time and throwing away a lot of intermediate data. ie I could do 100+ edits in a crag to 50 unique routes, but I'd want that to show as just one record: 'Brendan updated 50 routes in Arapiles'. It would be possible without the new table but I don't think anywhere near as simple nor as fast as a new higher level table, with most of the processing done at activity time instead of query/display time. |
see also #1232 |
I am now looking at implementing a new table. I want to get agreement on some key dimensions. If somebody logs three ascents on saturday and four on sunday in arapiles then will this be two stream items? I am assuming yes. If somebody logs 2 ascents at arapiles in the morning then 4 ascents in grampians in the afternoon then will this be two stream items? I am assuming yes. If somebody does the ascent on saturday but logs it on monday then will this create a stream from the monday date? I am assuming yes, otherwise there are a lot of complexities. If somebody creates a route and logs an ascent of that route on the same day then will this create two streams? I think the 'what' needs a lot more discussion of exactly what goes into it (for example maybe rather than 'upload photo' it is 'contributed resource' which includes photos, topos and embeds). Whatever the outcome of the specifics of this discussion I think we need a list of possible 'what' items and an association of activity items to 'what'. Proposed fields for the 'Stream' table
And for the 'Stream Collection'
This means we will probably have to add some more Activity Items (eg Add a discussion, following, etc). |
Yes to all of above, except:
I see this as creating the event retrospectively on saturday, so the publish date will be saturday, but the update date will be monday and so it appears in the feed now rather than down the list. I don't see why this would be an issue. If I later on tuesday go and add yet another tick for saturday it would just append it to the same event and not create another event, but would update the updated date so it floats back to the top of the feed. I think to make this clearer rename the fields: 'publish date' -> 'event date' and 'update date' -> 'publish time' or 'stream time'. The former is just a date while the latter is a full time stamp.
Maybe rename to 'verbPhrase'? What is the stream collection for? The one complexity I've been grappling with is how this will work when looking at a route or sub area. Let's say I tick 20 routes on saturday, 10 in area A, 10 in area B. These have the same TLC so get appended into the same event. Now if another person looks at area A or B then it should show this item, but if they look at area C then it should not. So how do we make this work? Doing this correctly would be a join back to the tick or update table, which I wanted to avoid. Need to hash this out some more. The verb phrases and the aggregation to form a summary also need a lot more hashing out before we do anything. I'm gonna be a bit tight for time now, my family is visiting for the next week or so, so I'd prefer to take our time and hash it out and get it right before we do too much code |
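A minimal sketch of the 'event date' / 'stream time' split described above, in case it helps pin down the behaviour. Everything here (the `StreamEvent` class, the `log_ascent` helper, the in-memory dict standing in for a table) is a made-up illustration, not existing code:

```python
from dataclasses import dataclass, field
from datetime import date, datetime


@dataclass
class StreamEvent:
    account: str
    tlc: str                 # top level crag
    event_date: date         # day the climbing happened (date only)
    stream_time: datetime    # last change to the event; drives feed order
    ascent_ids: list = field(default_factory=list)


def log_ascent(events, account, tlc, climbed_on, ascent_id, now):
    """Append to the matching event if one exists, otherwise create it."""
    key = (account, tlc, climbed_on)
    event = events.get(key)
    if event is None:
        event = StreamEvent(account, tlc, climbed_on, now)
        events[key] = event
    event.ascent_ids.append(ascent_id)
    event.stream_time = now  # floats the event back to the top of the feed
    return event


events = {}
saturday = date(2014, 10, 11)
# Climbed on Saturday, logged on Monday morning.
log_ascent(events, "brendan", "arapiles", saturday, 101, datetime(2014, 10, 13, 9, 0))
# Another Saturday tick added on Tuesday: same event, newer stream time.
log_ascent(events, "brendan", "arapiles", saturday, 102, datetime(2014, 10, 14, 18, 30))
```

Both ticks end up in one event dated Saturday, while the stream time moves to Tuesday so the event resurfaces in the feed.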
There are a couple of problems I see with using activity date versus publish date (and it should be noted that it is only an issue for logging ascents).
I don't see a problem creating a stream on a publish date, but reporting on activity date. So if somebody comes back from a weekend in Arapiles and logs ascents on monday, then all the ascents will be in the same stream, but we could identify the separate dates within the stream. This sort of makes sense anyway. There will still be edge cases, but ultimately it does not matter as long as most of the time the aggregation looks good. The collection is essentially for the dashboard. It is all about the social networking of people. I would just continue to use our ascent logs and activity logs as is at the crag level. I sort of came to this conclusion and suddenly everything is a lot easier. If we wanted to aggregate and display streams at a crag level then we could work out which stream an activity is associated with using the 'Stream Collection' table above. I am in the process of creating this table now. Notes to self:
|
I have been thinking about the 'with' concept you have discussed above. I think this is best implemented in the 'Stream Collection' table rather than the stream itself. Some people may climb in the morning with one person, and in the afternoon with somebody else. I think I would rather summarise these together (eg Simon climbed 5 routes yesterday at Arapiles with Brendan and Campbell) Notes to self:
|
I have been playing with the verb phrases and have come up with this list from our activity list :
Note that discussions and following need to be added to the activity item list, because they are not currently being tracked as activities. Also what about the scenario where somebody creates a route then 10 minutes later edits it. In the model above it would be two streams, however I think we should be smart enough to add the update route to the 'create route' stream if it is on the same day. |
The point of this stream idea is to aggregate as much as makes sense. I am thinking that things like tagging, updating descriptions, history and location we can just call 'updated' and things like reparent, resequence, deleting, renaming we can call 'reorganised'. This will result in the following stream phrase grouping for area contributions:
Similar for routes. I think the verb 'documented' is the right word for creating an area/route on our site. The verb 'setting' is different from 'documented' because the route setter created the physical route in the gym and documented it. |
Sorry about this discussion brain dump, but issues tease themselves out as I start to put specifics together. If, in one sitting, you create an area then routes under the area, then you start adding history and locations then this should all be under one stream. So I think we need the concept of a parent stream for logging purposes. In other words if I create a route the system checks parent streams before it creates a new stream. Putting this all together I am now getting a configuration that looks like this
|
ok my first thing would be to not try and automatically generate a sentence, the collection of data and the display of that should be quite separate. I see it working with a very small number of verb phrases, probably just [tick, update, follow, favorite, upload, discuss, climber (see very end)] and those are simply a key for grouping, they won't even be directly displayed. This avoids really clunky language like 'documented' as a one-size-fits-none approach. The aggregation would work something like this, in rough pseudo code with dates omitted: I sit down and create 5 new routes in a batch on Monday in Araps, so this creates one new stream event item:
Then I tick 3 of those routes which results in another event
An hour later I then edit 1 of the same routes I just added, and 6 other routes, and then also add a topo. Because it's the same day and same verb it appends data to the same event.
This is all that probably needs to be done at collection / aggregation time. Then later when it is displayed that chunk just gets passed to a template with each id inflated into its atom data. The template can then make some nuanced decisions about what language is displayed, ie if there is only newRoutes then say 'Brendan added 5 new routes including X and Y', or if there are both updated routes and new routes then 'Brendan updated 10 routes including 2 new ones: X and Y'. This will give a massively superior sentence structure than trying to just craft one from pieces of data. We can get away with this a bit better in the old feeds because each item was very simple. Also note that if I did 10 edits to the same route this will be exactly equivalent to a single edit as the data is only showing what was touched and not a list of transactions. So over time an 'update' event can change dramatically depending on what else happens to it. Most of the other verbs are quite simple and the data will be just a list. The only other one with more complex behaviour is the 'discuss' event which could start out as:
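As a rough illustration of the append step in the three actions above (not the original pseudo code): the `record` helper, the dict standing in for the stream table, and the data keys other than `newRoutes` are assumptions made up for this sketch.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the stream table:
# (account, tlc, day, verb) -> {data key -> set of node ids}
stream = {}


def record(account, tlc, day, verb, key, node_ids):
    """Add node ids under `key` in the event for this account/tlc/day/verb."""
    event = stream.setdefault((account, tlc, day, verb), defaultdict(set))
    event[key].update(node_ids)  # a set ignores repeat edits to the same node
    return event


# 5 new routes in a batch: one 'update' event.
record("brendan", "araps", "monday", "update", "newRoutes", [1, 2, 3, 4, 5])
# Ticking 3 of them: a different verb, so a second event.
record("brendan", "araps", "monday", "tick", "ticks", [1, 2, 3])
# An hour later: edits and a topo append to the existing 'update' event.
record("brendan", "araps", "monday", "update", "updatedRoutes", [1, 6, 7, 8, 9, 10, 11])
record("brendan", "araps", "monday", "update", "newTopos", [900])
# Editing route 1 again changes nothing: only what was touched is stored.
record("brendan", "araps", "monday", "update", "updatedRoutes", [1])
```

After all five calls there are still only two events, and the 'update' event holds three keys with 7 distinct updated routes.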
after template => 'Simon posted in Araps: Lost shoes near bard' but then an hour later evolves into:
after template => 'Simon and 5 others are discussing 'Lost shoes near bard' in Araps'. The template I'm hoping is pure runtime, the data behind it could be aggressively cached if needed, but leaving it until runtime to render means we can personalize it like: after template => 'You and 5 others are discussing 'Lost shoes near bard' in Araps' or 'Brendan started following You'. Some of the other simple ones like favorites and follows could potentially go from a single to multiple, but I think are such relatively rare events that they are worthy items in their own right. The only one that should definitely be grouped is 'upload' when someone uploads a batch of 10 pics. Most of the sentences I think should be about a paragraph at most, the only exception is ticking. I think there is heaps of value in showing all ticks in full unless the number of ticks is really massive, maybe more than 15 ticks. 15 routes in a day, even boulder problems, is a big day, if it is more then it is very likely retrospective route ticking. We would still have the 'me too' and 'comment' stuff pretty well exactly as we do now next to the routes which are shown. When it comes to gyms, I think these are all just 'updates' but it is up to the template to turn that into the various updates: 'Chris set 10 new routes' vs 'Chris updated the gym opening hours' or even 'Chris updated the gym opening hours and set 10 new routes'. So the table that you listed above I think maps reasonably well to the names of the keys of the data hash in my examples. Whenever there is a large amount of data that is condensed or summarized, the template should link off to whichever facet page lets you see all the gory details. I just thought of another event verb type too: 'climber' which is triggered when they either sign up, change their avatar, update their profile or add a website link etc.
One last thought: edits and updates to a particular crag are generally fairly spread apart and clustered, they typically come through in big batches or very small incremental changes. If two people both make edits to the same crag on the same day, it is very likely that they are communicating and probably were at the crag that day. With that in mind, what do you think about the grouping of edits being purely by TLC and not by TLC+climber? So then an update might look like: 'Brendan and Chris updated 40 routes in Araps'? Even if they didn't talk to each other it still makes sense so I don't think it could hurt, and you can still click and drill down into the details. |
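To make the 'template decides the language' idea above a bit more concrete, here is a small sketch of a render function for the 'update' verb. The function name, its arguments and the exact wording are assumptions for illustration only; a real template would also handle pluralisation, gyms and the other verbs.

```python
def render_update(actor, viewer, crag, new_routes, updated_routes):
    """Turn the raw data of an 'update' event into one sentence at display time."""
    who = "You" if actor == viewer else actor     # personalised at runtime
    touched = set(new_routes) | set(updated_routes)
    sample = " and ".join(sorted(new_routes)[:2])  # name a couple of the new routes
    if new_routes and len(touched) == len(set(new_routes)):
        # Only new routes in this event.
        return f"{who} added {len(new_routes)} new routes in {crag} including {sample}"
    if new_routes:
        # A mix of new and updated routes.
        return (f"{who} updated {len(touched)} routes in {crag} "
                f"including {len(new_routes)} new ones: {sample}")
    return f"{who} updated {len(touched)} routes in {crag}"


only_new = render_update("Brendan", "Simon", "Araps", ["X", "Y"], [])
mixed = render_update("Brendan", "Simon", "Araps", ["X", "Y"], ["X", "Z", "W"])
own = render_update("Brendan", "Brendan", "Araps", [], ["Z"])
```

Because the sentence is built at render time from the same stored data, the same event can read differently for different viewers and can change wording as more data is appended to it.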
Few more thoughts:
Going with my model above we'd need something like: ActivityEventGroup * primary key
And to store the links between this and each tick / update / photo etc: ActivityEventItems * = foreign key
Now this second table is suspiciously close to the existing ActivityLog table, so I reckon we can just use it, but we need to close the gap, in particular I don't think it includes ticks at the moment. And we'd need to add the type, subtype, and grouping-date. This in turn makes me think of edge cases like what happens to old events after a reparent which changes the tlc, do we care? This in turn makes me question whether we now actually need the ActivityEventGroup table, if we add the extra grouping columns into the ActivityLog and group by on them then perhaps we don't need it after all. The only extra field at the moment is the updated date, which would just be a max(lastmod). One less join. I know you mentioned this at the start but I couldn't see how it could be done as-is, but with the extra columns I think this would work. The other difference between a big group-by and the table above is that multiple links are represented, ie 10 edits to 1 route would be 10 rows not 1, but this too could also be grouped-by and condensed back to one row. The initial raw sql results are gonna be quite small, and then afterwards we go off and fetch the atom details as needed. For the couple of types of updates which don't belong to a node, like following and updating profile, I guess they could just be attached to the world node? or just null? |
Oh also, if we do reuse the ActivityLog table this will impact the existing atom feed, in particular polluting the update feeds with tick data. We can easily work around this and filter it back out, but if this is all working well it would be better to just redo the atom feeds so they display the exact same stuff as well. This may mean that not all of the atom feeds make sense anymore, and it would be good to audit what actually gets used. It may be we can just scrap a few of them and consolidate all into just a single public feed about a node, and a single public feed about a person. Also a quick look at the log to see things that look like real atom clients: grep feed access.log | grep -v Mozill | grep -v java | cut --delim=' ' -f 12-14| cut -c1-30 | sort | uniq -c | sort -nr
|
Ticks are in the activity log already and that filter you mentioned is already in place. The Node is mandatory in ActivityLog, so we would have to relax that to allow for non-index updates, my gut feeling is not to use World node for this. Given that we are still in experimental mode we should just modify the ActivityLog table and see if we can push this with SQL. Can we call 'type' something like 'streamGroup'? Actually I think we can infer both type and subtype from the existing activity item field - I will have to audit this. I prefer a smaller list of types (streamGroups) so I am happy with your shortlist. If you roll up all the index update types to just 'update' then some issues I came across just disappear. Discussions are not in the activity log, so we would have to put them in. I think we should only put public discussions in this log, private discussions should never go in. I think we need to add grouping-date as a discrete mysql date. Currently the activity log uses a continuous time variable, which is no good for grouping performance. Activity changes information. We should make sure we present the original information in the historical logs. The activity items store a pre and post update state where applicable. All tables have a lastmod field. The activity log currently does not have its lastmod date updated, by design. (I think it is set to null when created, but the record is never updated so this field is never updated). However there are some activities that we would want to bring back to the top of the list after certain user actions (eg comment on an ascent should bring that stream grouping back to the top). |
the only difference between doing this without the extra table and join is that if I am in say Area B looking at the feed, I'll get a pseudo event which only shows the updates inside B, but if I go up a level I'll get a different pseudo event which would include A and B's updates. I'm fine with this either way. The crucial difference between this situation and what we have now, if we simply grouped it at say the JS or template level, is that two batches of interleaved ticks would get correctly grouped. |
Thinking about how this is displayed in the dashboard. Are there two different types of groupings that we want to present?
|
Both these groupings are the same, because both groupings are by account+tlc+day. The only edge case where these would differ is where I tick in multiple crags in a day and I'm happy for that to be shown twice. One edge case with the 'updates' is how would it work if we move 10 routes from TLC1 to TLC2, would we want this to create 2 pseudo stream events? Also I've been trying to figure out how we can do this without querying ActivityLog, grouping, ordering, and then joining back to ActivityLog again to get the details again |
I think we can do the initial raw query without re-join back onto ActivityLog by using the GROUP_CONCAT function to return the list of affected node ids. Needs some testing, anyway I got to get to work |
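A quick way to try the GROUP_CONCAT idea above without touching the real database. This sketch uses SQLite's `group_concat` through Python's `sqlite3` module as a stand-in for MySQL's `GROUP_CONCAT`; the table and column names (`activity_log`, `stream_group`, `group_date` and so on) are invented for the example and are not the real ActivityLog schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activity_log ("
    "account TEXT, tlc TEXT, stream_group TEXT, "
    "group_date TEXT, node_id INTEGER, lastmod TEXT)"
)
rows = [
    ("brendan", "arapiles", "update", "2014-10-13", 1, "2014-10-13 09:00:00"),
    ("brendan", "arapiles", "update", "2014-10-13", 1, "2014-10-13 09:05:00"),  # same route again
    ("brendan", "arapiles", "update", "2014-10-13", 2, "2014-10-13 10:00:00"),
    ("simon", "grampians", "tick", "2014-10-13", 7, "2014-10-13 11:00:00"),
]
conn.executemany("INSERT INTO activity_log VALUES (?, ?, ?, ?, ?, ?)", rows)

# One row per pseudo event, with the affected node ids folded into a
# comma-separated string, so no join back onto the log is needed.
query = """
SELECT account, tlc, stream_group, group_date,
       GROUP_CONCAT(DISTINCT node_id) AS node_ids,
       COUNT(DISTINCT node_id)        AS node_count,
       MAX(lastmod)                   AS updated
FROM activity_log
GROUP BY account, tlc, stream_group, group_date
ORDER BY updated DESC
"""
events = conn.execute(query).fetchall()
```

The two edits to route 1 collapse into a single id, and `MAX(lastmod)` gives the 'updated' date for ordering. One thing worth checking in MySQL is the `group_concat_max_len` setting, which truncates long id lists.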
I have been starting to play with a query to make sure I understand the issues here. GROUP_CONCAT works exactly like we want it to (it also allows DISTINCT). I think that we talked about this not being a standard facet search query. I concur with that after a little bit of investigation. We cannot use a limit based on number of activity records, but rather have to get whole days at a time (we may use a limit for the aggregated group by result). Note that a partial day may lead to inconsistent data (in other words the query could report 'Campbell updated 5 routes in Arapiles' when he actually updated 10). This means we have to use a date as the way of limiting the query. But how do we pick a date? The next part of the discussion is focusing on the dashboard functionality of this list, not the crag summary, as I think the crag summary is fairly simple. I think we always want to show a populated activity feed in a user's dashboard. This means that we cannot just base this on a user's friends. I think we also want to automate this as much as possible. Thinking about 4 use cases:
I think we should be able to automate a feed for all these use cases, without the user having to configure anything. Furthermore we should be able to balance these so that if somebody has only one friend with one update yesterday, we should also show them info from lower level feeds. I am thinking of defining a UNION of queries, selecting as many queries as we need to make the feed full of information. I am following a lot of accounts. I am more interested in the fact that Campbell did 5 ascents last week than some random person yesterday. But we don't know how many days to go back to get interesting information from friends. What about having a controller, which selects the query to use for each person's dashboard? Do we want the user to be able to page back into the history of this feed? |
Yup, lesson learnt from the atom feeds and facets. I think we just pick sensible defaults based on rough heuristics. Most people climb (and hence log in) around once a week or fortnight, so a person's feed should be say 2 weeks or 1 month. Where we expect large amounts of data, say at a region or country level, and where we have a prebaked stat around ticks last month which is known to be high, we could scale this down to less. Note this is only at the original raw query but we'd also limit this between the raw query and inflating the atom data. If it goes over a limit then we'd pass a 'nextDateFrom' to the template which would render a link where people can load more.
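A small sketch of the 'whole days plus a limit' idea above. Only `nextDateFrom` comes from the discussion; the `feed_page` function, its parameters and the dict of events per day are hypothetical, and the real thing would sit between the raw query and the atom inflation.

```python
from datetime import date, timedelta


def feed_page(events_by_day, date_from, days=14, max_events=50):
    """Walk back whole days from date_from; return (events, next_date_from).

    next_date_from is the day to continue from when the limit cut the page
    short, or None when the whole window fitted.
    """
    page = []
    day = date_from
    oldest = date_from - timedelta(days=days)
    while day > oldest:
        todays = events_by_day.get(day, [])
        # Never split a day: stop before it if it would go over the limit.
        # A single oversized day is still shown whole.
        if page and len(page) + len(todays) > max_events:
            return page, day
        page.extend(todays)
        day -= timedelta(days=1)
    return page, None


events_by_day = {
    date(2014, 10, 18): ["a", "b"],
    date(2014, 10, 17): ["c", "d"],
    date(2014, 10, 16): ["e"],
}
first, next_from = feed_page(events_by_day, date(2014, 10, 18), max_events=3)
second, after = feed_page(events_by_day, next_from, max_events=3)
```

The 'load more' link would then just call the same function again with `next_from` as the starting date, so paging back uses exactly the same processing as the first page.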
There are two things to balance: 1) showing people what they want by mind reading, and 2) encouraging them to configure it explicitly, via following people and faving crags, and where needed via extra settings, so that we can do 1). I think where people don't have friends, or don't have favorites, we focus on leading them through that process and not just showing them something meaningless. ie if they have no favs then prompt them for their top 3 favorite crags on the spot inline. If they have no friends, and they do have favs, then ask them if they know some of the people who also climb at those crags (but this stuff can be round 3 features).
The risk of automatically adding more stuff is that it will potentially hide what they actually want. Eg I follow Campbell, who ticked a week ago, but then we also pull in extra data which is more recent, so the only thing I actually want to see is now lost down the stream. We could go down the route of no longer sorting by date but by relevance, but this is a slippery slope: not only is it hard, but I don't really like it personally when facebook suppresses stuff.
Yes, I thought we'd already talked about this. My second comment on this thread:
So a mockup is needed.... I can argue the case for these being checkboxes, but I think it's simpler to represent it as a scale from less on the left to more on the right, maybe something even like a volume control with an angled triangle. Like this but way slimmer and simpler
Yes. But can be round 2, and the basic mechanics of going backwards should be exactly the same as displaying the last week, except instead of asking for (now..now - 1 week) you'd ask for a different range. The processing should be identical. My thoughts generally around all this is to KISS and then build it up only as needed. ie start purely as a union of exactly what they follow and fav, and not worry about default feeds yet. One thing that really shits me about facebook is that the stream has become progressively less and less deterministic over time, there is more stuff my friends post that I never see, and more shit I don't want to see ending up in the feed. I don't want to go down that route. Another edge case to thrash out: I think forums will be a special case because out of all of the events these are the only ones that really get updated and so get moved up the feed across a day boundary. Other events like ticks and updates would shuffle up if they get more ticks or updates, but only up until a day boundary, but a discussion could keep on bubbling up every time it gets a new post, so I think the query for this will be somewhat different from the others. We are never grouping on replies (like we would with ticks / updates), only ever querying the discussion row itself so I think this is easy. Also when we do the query and union, one thing I feel is quite important is telling the user why it is in their feed, and giving them the option to remove it (even if initially that is simply a link to their feed settings). By this I only mean stuff like 'This is in your feed because you follow X' -> 'Unfollow X', or 'This is in your feed because you commented on this discussion' -> 'Leave this discussion' and not the more vague facebook style of 'Hide this post' which creates a complex signal to FB to show less of that kind of thing generally.
An event could be in your feed for multiple reasons so when we union them we'd tack on an extra static string so we know which query it came from to display. |
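A sketch of how the 'tack on a static string per query' idea above could look once the separate query results come back. The `build_feed` function and the shape of the event dicts are assumptions for illustration; in SQL this would be a constant column added to each branch of the UNION.

```python
def build_feed(sources):
    """Merge several query results, remembering why each event is included.

    sources: list of (reason, events) pairs, where each event is a dict
    with at least an 'id' and an 'updated' timestamp string.
    """
    merged = {}
    for reason, events in sources:
        for event in events:
            # First sighting copies the event; later sightings only add a reason.
            entry = merged.setdefault(event["id"], {**event, "reasons": []})
            entry["reasons"].append(reason)
    return sorted(merged.values(), key=lambda e: e["updated"], reverse=True)


from_following = [{"id": 1, "updated": "2014-10-18 09:00"}]
from_fav_crags = [
    {"id": 1, "updated": "2014-10-18 09:00"},
    {"id": 2, "updated": "2014-10-18 10:00"},
]
feed = build_feed([
    ("you follow Brendan", from_following),
    ("Arapiles is one of your favorite crags", from_fav_crags),
])
```

Event 1 shows up once in the feed with both reasons attached, so the template can print 'This is in your feed because ...' along with the matching unfollow or unfavorite link.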
Also I think not every event verb type should be represented in the settings, less settings the better:
So this is just 'ticks | updates | forums' |
In respect for your KISS, let's do the checkboxes for version 1 and move on to your slider idea later. Firstly it is the simplest to implement, and secondly I think the checkboxes explain what is going on better. I am on board with your suggestions. I think the system should select good defaults (ie worldwide for somebody just signed up with a low activity country). After somebody follows someone for the first time the system ups the default to the follow checkbox. I think we are pretty close to end-to-end agreement with something that should be implementable without too much effort (maybe a day or two or three or four). |
Just did the wife test: yes let's go with checkboxes :) (Email reply to Simon Dale's comment of 18/10/2014, 8:04 PM.) |
I want to start doing some more design refinements but want to document them so we don't step on each other.
|
Go for it. I have stopped work on this for now and hoping you would do your final magic before we broaden the audience for further comment. |
|
and more followup on the comments on ticks stuff, I find this a bit weird: to me the event is always the tick, and people are commenting on it. The reason I see it is because a friend of mine commented on it, but what it is hasn't changed. This all comes back to the concept of comments making an event float back up to the top, and also around whether comments are on the action of the event. |
We need to leave the action versus event discussion issue until we have an event table. The underlying mechanisms will be the same. The bigger issue is whether or not you can achieve what you want by just manipulating the template display. My issue with discussions floating events back up to the top is that the date is an underlying part of the timeline basis of the display. If the event occurs on Sunday and the discussion on Monday then what date is the event displayed on in the timeline? If it is no longer displayed on Sunday then the timeline loses transparency, which is something I am against. If it is displayed on both days then it is just a template display issue and our current model is fine. There are also potentially confusing timezone issues with discussions. Alternatively we can abandon the concept of a timeline altogether and just have most recently updated, like facebook. |
Facebook still does have a full timeline, but having the two dates makes it much simpler. ie if you look at a person's page you can go back to any event and it is ordered by time created. It is only your facebook dashboard feed which is ordered by last updated. I've only just now realised this distinction but it makes a lot more sense than forcing one or the other order in both places. This probably makes sense for us too, ie on a user's profile we see their updates in strict created time order, ditto for the stream of updates to an area. It is only in your dashboard that events would be sorted by most recently updated. It's all kinda expected bumps as we've got a lot of concepts to hash out and it's not till you see them in action that they really sink in. So are we committed to adding the event table? Would I be better off just hanging back and letting you sort that before I do too much more work on the front end for this? The question of discussions on ticks vs events isn't fully dependent on this but there is some overlap. I can only get half of what I want just by tweaking the existing template, the rest requires the underlying data model change. |
I think we are committed to the event table as there are a lot of blocking issues based on its implementation. However I want to take what we have got to the users as beta before implementing the event table. This is for two reasons: 1) it will be a while before I get some time to put into it again as the next major issue which is overdue is getting our app sorted and 2) I want to implement with lots of feedback so we get it right first go if possible. I don't want to put it into the dashboard until we have the event table implemented, but I do want it exposed to the users next release. If we have two different sort date options then I think we can work with this. Are there any timezone implications with this? Last updated should probably work on GMT time, while created time should probably be on crag time zone time. |
Isn't it pretty well out the door as a beta already? A few people have seen it, the only difference between dev and prod is cosmetic tweaks from the last week. Agree with getting the table sorted before we swap it out into the dashboard, but I'm also hesitant to show it to a large number of users > 20. Getting a dozen power users' feedback will uncover pretty well everything, but putting it out to general users I think will cause more confusion than it's worth. Also we need to make it clear to the test users that this will replace the dashboard and that all the side bar stuff is irrelevant for testing purposes. I'm also a bit wary of starting on the app while still having this half done. We've got a whole lot of stuff that is pretty big in progress, dashboard, markdown changes, profile pages?. Would be better to not have so many balls in the air at once, and the app is going to be a long haul. |
OK, maybe we should invite a couple more people to have a look after we release the cosmetic changes next release. It will be interesting to see if there is a difference in feedback after some cosmetic changes. |
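The two orderings described above (created time on a profile or area page, last updated on the dashboard) come down to sorting the same events on a different field. A tiny sketch, with invented field names and sample data:

```python
from datetime import datetime

events = [
    {"what": "ticks at Arapiles",
     "created": datetime(2014, 10, 12, 17, 0),
     "updated": datetime(2014, 10, 14, 8, 0)},   # a comment bumped it on Tuesday
    {"what": "route updates at Grampians",
     "created": datetime(2014, 10, 13, 12, 0),
     "updated": datetime(2014, 10, 13, 12, 0)},
]


def profile_order(events):
    """Strict timeline: newest created first (profile and area pages)."""
    return sorted(events, key=lambda e: e["created"], reverse=True)


def dashboard_order(events):
    """Most recently updated first (dashboard feed only)."""
    return sorted(events, key=lambda e: e["updated"], reverse=True)


profile = [e["what"] for e in profile_order(events)]
dashboard = [e["what"] for e in dashboard_order(events)]
```

The Arapiles ticks stay in their Sunday slot on the profile timeline but float to the top of the dashboard after the later comment.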
Re-imagining the feeds completely, to show all types of activity, ticking, edits, favs etc all in one place. Succinctly, customizably, and pretty darn neat (we think!)
A very high level idea to rework the whole feeds concept in the dashboard. The general principles are:
So some examples are:
The way I would see this working internally is that each type of action would have a date field and a natural grouping key, ie ticks would be grouped by person, date, and TLC. Each time a new tick is added then an action summary item is made, and any more new ticks with the same key are added to it. Potentially as more items are added the summary text is updated, and potentially its sort date updated. So like facebook old items can get moved back up the activity feed, like a discussion that gets more comments, or a photo that has comments on it, or if Chris later that day added another 10 extra routes.
See also #1676 for including 'my' ticks