Correlate sum with count of distinct users #32

Open
hungerburg opened this issue Jan 22, 2021 · 3 comments

Comments

@hungerburg

This is a wishlist item; perhaps you can see value in it. (I cannot judge whether it is in scope for taghistory at all…)

If you think of taghistory as a popularity contest that shows the count of votes (i.e. objects carrying a specific tag), where every voter has unlimited votes, it would be nice to also have a count of voters (the people who applied the tag to an object).

For rarely or moderately used tags, one could see immediately whether a tag was applied (voted on, in contest speak) by a single user, a few, or many. For tags with many occurrences, the distribution will be quite flat, but it might still be reasonable to have the number when comparing tags that are close in meaning, e.g.

Hope it is clear :)

@matkoniecz

matkoniecz commented Feb 1, 2021

So this is a request to show the total number of people who ever added this tag, right?

@hungerburg
Author

Yes, the focus is on adding. The result would be a time series of the size of a tag's active user base.

If I understand correctly, the history has to be crawled from the start on every run in order to know the previous state, and I think that is what taghistory already does. I have no idea, though, how much this would increase execution time or memory requirements.
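
To make the idea concrete, here is a minimal sketch of such a time series. It assumes the history crawl already yields one `(date, user)` pair per object on which the tag was newly added; that input format and the function name `adders_time_series` are made up for illustration and are not taken from taghistory.

```python
def adders_time_series(additions):
    """Build a cumulative monthly series for one tag.

    `additions` is an iterable of (iso_date, user) pairs, one per object
    on which the tag was newly set (hypothetical input format).
    Returns a list of (month, (total_additions, distinct_users)).
    """
    seen_users = set()   # everyone who has added the tag so far
    total = 0            # number of additions so far (the "votes")
    series = {}
    # ISO dates sort chronologically as plain strings.
    for day, user in sorted(additions):
        seen_users.add(user)
        total += 1
        # Later entries in the same month overwrite earlier ones, so each
        # month ends up holding its end-of-month cumulative values.
        series[day[:7]] = (total, len(seen_users))
    return sorted(series.items())


if __name__ == "__main__":
    sample = [
        ("2020-01-05", "alice"),
        ("2020-01-20", "alice"),
        ("2020-02-11", "bob"),
        ("2020-03-02", "alice"),
        ("2020-03-09", "carol"),
    ]
    for month, (votes, voters) in adders_time_series(sample):
        print(month, votes, voters)
```

The extra memory is one set of user names per tag, which grows with the number of distinct contributors rather than with the number of objects.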

PS: Meanwhile I learned that the ohsome API has a /users endpoint, where I can retrieve the number of users who interacted with objects that have, or once had, a certain tag. I have yet to find out what kind of interaction that covers: creation, modification, deletion, some of these, or all… Documentation there is sparse.
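
A rough sketch of how such a query could be put together. The `/users/count` path and the `bboxes`, `filter` and `time` parameter names are written from memory of the ohsome API documentation and should be checked there before use; the tag, bounding box and time range are arbitrary example values. The snippet only builds the URL and does not send a request, and it does not settle which kinds of interaction are counted.

```python
from urllib.parse import urlencode

# Assumed base URL of the ohsome users/count resource; verify against the docs.
OHSOME_USERS_COUNT = "https://api.ohsome.org/v1/users/count"


def users_count_url(tag_filter, bboxes, time):
    """Return a GET URL asking for the number of users per time interval.

    All three parameter names are assumptions about the ohsome API:
    tag_filter -- filter expression, e.g. "sac_scale=*"
    bboxes     -- "west,south,east,north" in WGS84
    time       -- ISO 8601 "start/end/period", e.g. yearly intervals
    """
    query = urlencode({"bboxes": bboxes, "filter": tag_filter, "time": time})
    return f"{OHSOME_USERS_COUNT}?{query}"


if __name__ == "__main__":
    # Example values only: a box around Innsbruck, yearly steps.
    print(users_count_url("sac_scale=*", "11.2,47.2,11.5,47.3",
                          "2012-01-01/2021-01-01/P1Y"))
```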

@hungerburg
Author

Of course this is not perfect: "active" should not be read as active within a short timeframe, but as an accumulated value, i.e. having actively set the tag at some time in the past. Splits will also overstate the user base, especially when they are motivated only by relation building. But it will be better than the number currently shown on taginfo. Also, if deletions are not handled, the total will differ from the number of items currently carrying the tag. This could possibly be handled by software that treats the user not as a boolean but as a counter, which would in turn provide a basis for further statistical tools that work not on time series but on the distribution of activity within a window of time, e.g. calculating not only the mean but also the median.

I would have liked something like that last year when researching how to recognize hiking trails that are not part of a relation, as there are several tags that are commonly found only on those. The only way to get a sense of the user base that added them was to take random items returned by an Overpass query and examine each one separately.
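
A small sketch of the counter idea, again assuming a hypothetical list of `(date, user)` additions for one tag within the window of interest. Keeping a count per user instead of a yes/no flag is enough to get at the distribution; the function name `activity_summary` is made up for this example.

```python
from collections import Counter
from statistics import mean, median


def activity_summary(additions):
    """Summarise how additions of one tag are spread over users.

    `additions` is an iterable of (iso_date, user) pairs (hypothetical
    input format). Each user is a counter, not a boolean.
    """
    per_user = Counter(user for _day, user in additions)
    counts = list(per_user.values())
    if not counts:
        return {"users": 0, "additions": 0, "mean": 0, "median": 0}
    return {
        "users": len(per_user),      # distinct people who added the tag
        "additions": sum(counts),    # total number of additions
        "mean": mean(counts),        # pulled up by a few heavy users
        "median": median(counts),    # what a typical user did
    }


if __name__ == "__main__":
    window = [("2020-01-05", "alice")] * 4 + [
        ("2020-02-11", "bob"),
        ("2020-03-09", "carol"),
    ]
    print(activity_summary(window))
```

A mean well above the median is the signature of a tag that is mostly applied by one or two very active mappers. Handling deletions would mean decrementing a counter somewhere, which this sketch does not attempt.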

2 participants