old.reddit.com sending analytics data over normal URLs #10768

Closed
6 tasks done
pfgithub opened this issue Dec 5, 2021 · 13 comments

@pfgithub

pfgithub commented Dec 5, 2021

Prerequisites

I tried to reproduce the issue when...

  • uBO is the only extension
  • uBO with default lists/settings
  • using a new, unmodified browser profile

URL(s) where the issue occurs

https://old.reddit.com

Describe the issue

old.reddit.com is sending analytics data over normal-looking URLs.

I've seen it use many different URLs, for example:

  • /api/login.json
  • /api/submit
  • /api/comment
  • /api/register
  • /api/vote.json
  • /api/friend.json

These URLs can't be blocked directly because they're also required for normal functioning of the site. The analytics requests use those URLs with no query parameters, and the request body is an odd JSON object full of numbers.

It seems to pick a random one on each page load. A script at the top of the page has an "events_collector_v2_url" entry that says which URL the analytics data will be sent to.
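
To check which endpoint a given page load will use, the value can be pulled out of the page source. A minimal sketch, assuming the config shows up in the HTML as a JSON-style `"events_collector_v2_url": "..."` pair; the sample fragment and the helper name `find_collector_url` are hypothetical and not taken from reddit's actual markup.

```python
import json
import re
from typing import Optional

# Matches a JSON-style key/value pair such as
#   "events_collector_v2_url": "https://www.reddit.com/api/vote.json"
# The value is captured with its quotes so json.loads can unescape it.
_PATTERN = re.compile(r'"events_collector_v2_url"\s*:\s*("(?:[^"\\]|\\.)*")')


def find_collector_url(html: str) -> Optional[str]:
    """Return the events_collector_v2_url value found in html, or None."""
    match = _PATTERN.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))


# Hypothetical page fragment, for illustration only.
sample = (
    '<script>r.setup({"events_collector_v2_url": '
    '"https:\\/\\/www.reddit.com\\/api\\/vote.json"})</script>'
)
print(find_collector_url(sample))
```

Since the endpoint seems to change per page load, a check like this would have to be repeated on every load rather than run once.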

Screenshot(s)

[6 screenshots not included]

uBlock Origin version

1.39.0

Browser name and version

Firefox: 95

Settings

Should be default

Notes

I'm not sure if this should be reported here or to easylist.

This seems to have existed since back when reddit was open source, since there are references to "events_collector_url" in open source code: https://github.com/reddit-archive/reddit/blob/master/r2/r2/lib/template_helpers.py

I also found a 2018 blog post about it: https://chefkochblog.wordpress.com/2018/04/02/facebook-is-on-the-sinking-ship-oh-dont-worry-reddit-will-replace-it/

@uBlock-user
Contributor

Analytics is already blocked by these two filters in the uBO privacy list:

reddit.com##+js(no-xhr-if, method:POST url:/^https:\/\/www\.reddit\.com$/)
reddit.com##+js(no-fetch-if, url:/^https:\/\/www\.reddit\.com$/ method:post)

The URLs you listed are not data collection endpoints, so no analytics data is being sent to them.
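
For reference, the regex in both filters is anchored at both ends, so on its own it matches only the bare origin string. A quick check of that pattern against a few URLs, assuming the scriptlet tests the regex against the full request URL (how uBO resolves the URL before matching is not shown in this thread):

```python
import re

# The URL pattern from the two filters above, without the /.../ delimiters.
pattern = re.compile(r"^https:\/\/www\.reddit\.com$")

urls = [
    "https://www.reddit.com",
    "https://www.reddit.com/",
    "https://www.reddit.com/api/comment.json",
    "https://old.reddit.com/api/vote.json",
]

# Map each URL to whether the anchored pattern matches it.
results = {url: bool(pattern.search(url)) for url in urls}
for url, matched in results.items():
    print(f"{matched!s:5} {url}")
```

Under that assumption, a request to any `/api/...` path would not be matched by these two filters.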

@pfgithub
Author

pfgithub commented Dec 6, 2021

This looks quite a bit like analytics data to me, and it's sent by the code that handles posting analytics data. It does not get blocked with the latest filter lists, or if I manually add those rules.

new.reddit.com analytics seems to get blocked fine; it's just old.reddit.com analytics that is getting through.

Example request (made at page load):

POST | https://www.reddit.com/api/comment.json

{
  "1": {
    "lst": [
      "rec",
      1,
      {
        "1": { "str": "cookie_monitor" },
        "2": { "str": "observe" },
        "3": { "str": "cookie" },
        "5": { "i64": 16…2 },
        "6": { "str": "885…624" },
        "106": {
          "rec": {
            "1": { "i32": 1493 },
            "2": { "i32": 933 },
            "3": { "i32": 909 },
            "4": { "i32": 272 },
            "7": { "tf": 0 }
          }
        },
        "107": { "rec": { "2": { "str": "web" } } },
        "108": {
          "rec": {
            "8": { "str": "84060…30ee1" },
            "9": { "i64": 1626…4 }
          }
        },
        "109": {
          "rec": {
            "1": {
              "str": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:100.0) Gecko/20100101 Firefox/100.0"
            },
            "2": { "str": "www.reddit.com" },
            "3": { "str": "https://www.reddit.com/" }
          }
        },
        "112": {
          "rec": {
            "1": { "str": "t2_o…9" },
            "2": { "i64": 1436…0 },
            "3": { "tf": 1 },
            "13": { "tf": 0 },
            "17": { "tf": 0 }
          }
        },
        "113": { "rec": { "1": { "tf": 1 }, "2": { "str": "en-us" } } },
        "115": {
          "rec": {
            "1": { "str": "rc…bi" },
            "6": { "i64": 163…1 }
          }
        },
        "265": {
          "rec": { "1": { "lst": ["rec", 2, "datadome", "G_ENABLED_IDPS"] } }
        },
        "500": { "rec": { "1": { "str": "CA" } } }
      }
    ]
  }
}
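
The body has a regular shape: numeric field ids map to single-key objects whose key names a type (`str`, `i32`, `i64`, `tf`, `rec`, `lst`), which resembles a Thrift-style JSON encoding. A small sketch that flattens such a structure into dotted field paths for easier reading. The function name `flatten` is made up for illustration, the sample is a trimmed copy of the request above, and the meaning of each field id is not known from this thread.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten {field_id: {type_tag: value}} into {"1.0.106.1": value, ...}."""
    flat: Dict[str, Any] = {}
    for field_id, tagged in record.items():
        path = f"{prefix}{field_id}"
        # Each field is assumed to hold exactly one type tag.
        tag, value = next(iter(tagged.items()))
        if tag == "rec":
            flat.update(flatten(value, path + "."))
        elif tag == "lst":
            # Layout seen above: [element_type, count, item, item, ...]
            _elem_type, _count, *items = value
            for index, item in enumerate(items):
                if isinstance(item, dict):
                    flat.update(flatten(item, f"{path}.{index}."))
                else:
                    flat[f"{path}.{index}"] = item
        else:
            flat[path] = value
    return flat


# Trimmed copy of the request body above.
sample = {
    "1": {
        "lst": [
            "rec",
            1,
            {
                "1": {"str": "cookie_monitor"},
                "106": {"rec": {"1": {"i32": 1493}, "7": {"tf": 0}}},
                "265": {"rec": {"1": {"lst": ["rec", 2, "datadome", "G_ENABLED_IDPS"]}}},
            },
        ]
    }
}
for path, value in flatten(sample).items():
    print(path, "=", value)
```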

@uBlock-user
Contributor

uBlock-user commented Dec 6, 2021

You can try using the json-prune scriptlet and let us know if it works.

All I see here is cookies and user-agent being collected, which doesn't fall under analytics as you claim. Every website you visit knows this about you as the browser itself sends this data.

@pfgithub
Author

pfgithub commented Dec 6, 2021

If all this data is sent elsewhere as part of the required functioning of the site, never mind then. This just felt like a really sketchy way to send event info to the server.

[3 screenshots not included]

I'm not sure how to use a scriptlet, but I'm guessing that something which prunes the output of JSON.parse on response data won't have any effect here, because this is request data that gets sent to the server.

This is pretty easy to reproduce:

  • go to old.reddit.com
  • sign in
  • open the web inspector network tab and filter by Fetch/XHR
  • refresh the page
  • one of the requests will be to a random API endpoint, and its form body will contain analytics data. Other events are sent when clicking the report button on a post, and probably for all the buttons in the new account onboarding screens
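
The manual check above can also be run over a HAR export of the network tab. A rough sketch, assuming the standard HAR layout (`log.entries[].request` with `method`, `url`, and `postData.text`). The test for what counts as an event payload, a JSON object keyed only by numeric strings, is a guess based on the example request, and the function names and the second sample body are hypothetical.

```python
import json
from typing import Any, Dict, List


def looks_like_event_payload(body: str) -> bool:
    """Heuristic: body is a non-empty JSON object keyed only by numeric strings."""
    try:
        data = json.loads(body)
    except (TypeError, ValueError):
        return False
    return isinstance(data, dict) and bool(data) and all(k.isdigit() for k in data)


def suspicious_requests(har: Dict[str, Any]) -> List[str]:
    """Return URLs of POST requests in a HAR dict whose body matches the heuristic."""
    hits = []
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        body = request.get("postData", {}).get("text", "")
        if request.get("method") == "POST" and looks_like_event_payload(body):
            hits.append(request.get("url", ""))
    return hits


# Tiny in-memory HAR with one event-like body and one ordinary form body.
har = {"log": {"entries": [
    {"request": {"method": "POST",
                 "url": "https://www.reddit.com/api/comment.json",
                 "postData": {"text": '{"1": {"lst": ["rec", 0]}}'}}},
    {"request": {"method": "POST",
                 "url": "https://www.reddit.com/api/vote.json",
                 "postData": {"text": "id=t3_x&dir=1"}}},
]}}
print(suspicious_requests(har))
```

A real export would be loaded with `json.load` from the saved `.har` file before being passed to `suspicious_requests`.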

@uBlock-user
Contributor

uBlock-user commented Dec 6, 2021

If the data doesn't matter

Where did I say that? I said "All I see here is cookies and user-agent being collected, which doesn't fall under analytics as you claim." Cookies and user-agent alone are not analytics.

@uBlock-user
Contributor

Did you try using json-prune scriptlet, as I asked in #10768 (comment)?

@pfgithub
Author

pfgithub commented Dec 6, 2021

Where did I say that?

Sorry, I updated the wording in my reply to clarify.

Did you try using json-prune scriptlet

No. I said that in my reply, and also said why I don't expect it to be related, although I haven't used it so I'm not sure.

@pfgithub pfgithub closed this as completed Dec 6, 2021
@uBlock-user uBlock-user reopened this Dec 6, 2021
@uBlock-user
Contributor

No need to close the issue. From the last screenshot you posted, it looks like the wrapper code for the analytics script. I don't see any endpoint URLs mentioned there other than the /api/* URLs you listed in the first post.

@gorhill what do you think?

@krystian3w
Contributor

I would recommend assuming that the submitter does not know how to write scriptlets (skip the easy ones like set / aeld / nostif / nosiif and very easy like ra / rc).

@uBlock-user
Contributor

@gorhill

@gorhill
Member

gorhill commented Dec 12, 2021

If OP thinks there is something that should be blocked, please provide the appropriate filter and, most importantly, ensure nothing is broken as a result. I just don't have the time to look into every single case of first-party analytics that people think should be blocked.

@uBlock-user
Contributor

Blocking anything under the www.reddit.com/api/ path is prone to breakage.

@ghost

ghost commented Jan 28, 2022

old.reddit.com##+js(set, r.config.events_collector_secret, '')
