Address 1st-party tracker blocking #780

Open · aeris opened this issue Nov 10, 2019 · 74 comments

aeris commented Nov 10, 2019

Hello!

Since Friday, we have hit a case of first-party tracking that seems to be unblockable.

This occurs on https://www.liberation.fr/, which embeds a first-party tracker, f7ds.liberation.fr, that points to an ugly tracking provider, Eulerian, via the CNAME liberation.eulerian.net.

This provider clearly states that it provides unblockable tracking.

It seems Criteo is starting to ask the same of its customers, with first-party tracking subdomains pointing to *.dnsdelegation.io.

In this case, it seems really difficult for tools like uBlock to block such trackers:

  • the subdomain is mostly random (f7ds.example.org), even though we have found some ea.* patterns
  • detection can sometimes be done through CNAME resolution (to *.eulerian.net or *.dnsdelegation.io), but this is difficult to integrate into a browser (those steps are internal to the DNS client resolver)
  • IP filtering is not effective: the tracking provider can easily change IPs without notifying its customers. Changing the CNAME is more complex, but the provider can generate a bunch of random subdomains in advance and ask its customers to switch subdomains if blocking gets too high (or proactively trigger a rotation every X days).

Do you have any way to detect and then block such content from within the browser?
The only (not so) effective way I have at the moment is to use DNS tools like Pi-hole to blacklist IP ranges and CNAME resolution patterns. Even then, it doesn't cover every possible case… Even tools like µMatrix seem totally ineffective against such trackers…

uBlock-user (Member) commented Nov 10, 2019

Do not post filter list issues or issues where a website's functionality is broken here. We have the uAssets issue tracker for that; post there instead.

https://github.com/uBlockOrigin/uBlock-issues#ublock-issues

uBlockOrigin locked and limited conversation to collaborators Nov 10, 2019
gorhill reopened this Nov 10, 2019
uBlockOrigin unlocked this conversation Nov 10, 2019

gorhill (Member) commented Nov 10, 2019

It's a technique used to bypass filters/rules; it's something which needs to be investigated.

llacb47 commented Nov 10, 2019

Dupe/related discussion: uBlockOrigin/uAssets#6538

uBlock-user (Member) commented Nov 10, 2019

Aren't they lying to the PSL with these first-party domain entries?

Edit: it's an inline script, so it should be possible to defuse it via a scriptlet.

liberation.fr##+js(aopw, EA_data) works.
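The aopw (abort-on-property-write) idea is to trap writes to a global property, so the inline script throws as soon as it tries to set it and the rest of that script is aborted. A minimal sketch of that idea only, not uBO's actual scriptlet code; the function name abortOnPropertyWrite is hypothetical:

```javascript
// Sketch of the idea behind abort-on-property-write (aopw).
// Hypothetical helper, not uBO's real scriptlet implementation.
function abortOnPropertyWrite(owner, prop) {
  // Random marker so the thrown error is recognizable.
  const marker = "aopw-" + Math.random().toString(36).slice(2);
  Object.defineProperty(owner, prop, {
    configurable: false, // page code cannot simply redefine the trap
    get() {
      return undefined;
    },
    set() {
      // Any assignment (e.g. `EA_data = [...]`) throws, which stops
      // the inline script that performed it.
      throw new ReferenceError(marker);
    },
  });
  return marker;
}

// In a page this would be: abortOnPropertyWrite(window, "EA_data");
```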

uBlock-user (Member) commented Nov 10, 2019

Here's a crude dump of sites using the Eulerian Analytics inline script: https://publicwww.com/websites/EA_data/

llacb47 commented Nov 10, 2019 (marked as off-topic)

@uBlock-user that scriptlet will only work for sites inserting the script using that variable. For other sites like oui.sncf, use this: uBlockOrigin/uAssets#6538 (comment)

uBlock-user (Member) commented Nov 10, 2019 (marked as off-topic)

The websites I have tested so far use that variable, except for the one you mentioned. oui.sncf redirects me to https://en.oui.sncf/fr/?redirect=yes, where parseInt.+?3600000 is not found in the inline script.

According to view-source:https://en.oui.sncf/fr/?redirect=yes, this is the JS:

<script>
(function(d, s, id) {
  if (d.getElementById(id)) return;
  var js = d.createElement(s),
      fjs = d.getElementsByTagName(s)[0],
      vscaUrl = "//wblt.oui.sncf";

  js.id = id;
  js.async = true;

  js.src = vscaUrl + "/prod/" +
      (vsca_pageTag.config.vsca_version ? vsca_pageTag.config.vsca_version + "/" : "") +
      vsca_pageTag.config.siteId +
      "/vsca.js?M2lU3mD1O47ZAzgnp0wX";
    
  fjs.parentNode.insertBefore(js, fjs);
}(document, 'script', 'vscascript'));
</script> 

Filter: oui.sncf##+js(acis, document.getElementById, vscaUrl)
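acis (abort-current-inline-script) works differently from aopw: it traps access to a property and throws only when the inline script currently executing contains a given needle, so other scripts using the same property are unaffected. A minimal sketch of that idea, not uBO's real scriptlet; abortCurrentInlineScript is a hypothetical name, and the current script's text is injected through getCurrentScriptText so the logic can run outside a page:

```javascript
// Sketch of the idea behind abort-current-inline-script (acis).
// In a page, getCurrentScriptText could be:
//   () => (document.currentScript || {}).textContent || ""
function abortCurrentInlineScript(owner, prop, needle, getCurrentScriptText) {
  let value = owner[prop]; // may come from the prototype chain
  const check = () => {
    if (getCurrentScriptText().includes(needle)) {
      // The offending inline script touched `prop`: abort it.
      throw new ReferenceError(`acis: ${prop}`);
    }
  };
  // Shadow the property with an accessor that runs the check first.
  Object.defineProperty(owner, prop, {
    configurable: true,
    get() {
      check();
      return value;
    },
    set(v) {
      check();
      value = v;
    },
  });
}

// In a page: abortCurrentInlineScript(document, "getElementById", "vscaUrl", ...)
```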

llacb47 commented Nov 10, 2019 (marked as off-topic)

I'm in the US, and it did not redirect me. The inline script on oui.sncf is:

  <!--begineulerian-->
  <script type="text/javascript">
    (function(){
      var d=document,l=d.location;
      if(!l.protocol.indexOf('http')){ // http: or https: pages only
        var o=d.createElement('script'),a=d.getElementsByTagName('script')[0],
            cn=parseInt((new Date()).getTime()/3600000); // whole hours since epoch: URL changes hourly
        o.type='text/javascript';o.async='async';o.defer='defer';
        o.src='//v.oui.sncf/content/vsc-fr/8lL.QlYVeQ7BL6AqQORYg_FeHeIQMaObMRxsXxGG0g--/'+cn+'.js';
        a.parentNode.insertBefore(o,a);
      }
    })();
  </script>
  <!--endeulerian-->
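In the snippet above, cn is the current time in milliseconds divided by 3600000 and truncated, i.e. the number of whole hours since the Unix epoch, so the requested filename changes every hour. That is one reason a static URL filter would go stale while a filter keyed on the inline script's text (such as the parseInt needle) keeps working. A small sketch reproducing the URL construction; the helper names are hypothetical and BASE is copied from the script above:

```javascript
// Reproduces how the Eulerian snippet builds its script URL.
const BASE = "//v.oui.sncf/content/vsc-fr/8lL.QlYVeQ7BL6AqQORYg_FeHeIQMaObMRxsXxGG0g--/";

// Whole hours since the Unix epoch; equivalent to the snippet's
// parseInt(ms / 3600000) for current timestamps.
function hourCounter(ms) {
  return Math.floor(ms / 3600000);
}

function eulerianScriptURL(ms) {
  return BASE + hourCounter(ms) + ".js";
}

console.log(eulerianScriptURL(Date.now()));
```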

And the inline script in #780 (comment) is not Eulerian; it is another tracker, not the one @aeris is talking about. Another site: officedepot.fr. Add officedepot.fr##+js(acis, document.createElement, parseInt)

uBlock-user (Member) commented Nov 10, 2019 (marked as off-topic)

Probably because of our different geolocations, we're not being served the same script. It may not be Eulerian, but it's in the same vein.

Another site: officedepot.fr

That one is definitely EA: https://myip.ms/info/whois/109.232.195.156/k/3227454398/website/ea.officedepot.fr

aeris (Author) commented Nov 10, 2019

New detections:
keyade.com, on rueducommerce.fr
omtrdc.net, on sfr.fr

llacb47 commented Nov 11, 2019 (marked as off-topic)

Off-topic:

Weird thing: the scripts seem to follow a pattern, with filenames ending in 7825. So here's a regex you can add to your filters (note: I'm obviously not a regex expert):
/(\.\w+)[.]?\/[A-Za-z]{7}(7825)\.js$/

Example scripts:

https://f7ds.liberation.fr/aaAAaaA7825.js
https://v.oui.sncf/SNCFVOU7825.js
https://ea.officedepot.fr/potfrWW7825.js
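A quick check of the "7825" regex against the three example URLs, written with [A-Za-z] rather than [A-z] (the ASCII range A-z also spans [ \ ] ^ _ and the backtick). The example.org URL is a made-up negative case:

```javascript
// Checks the "7825" pattern against the example script URLs.
const pattern = /(\.\w+)[.]?\/[A-Za-z]{7}(7825)\.js$/;

const examples = [
  "https://f7ds.liberation.fr/aaAAaaA7825.js",
  "https://v.oui.sncf/SNCFVOU7825.js",
  "https://ea.officedepot.fr/potfrWW7825.js",
];

for (const url of examples) {
  console.log(pattern.test(url), url);
}

// Hypothetical filename with an underscore: rejected by [A-Za-z],
// but it would be accepted by the [A-z] range.
console.log(pattern.test("https://ea.example.org/abc_def7825.js"));
```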

Test sites: https://www.maeva.com and https://www.brandalley.fr/

Also, another PublicWWW search: https://publicwww.com/websites/%22parseInt%28%28new+Date%28%29%29.getTime%28%29%2F3600000%29%22/

gwarser (Member) commented Nov 11, 2019

Wondering whether #44 would apply here, if implemented.

gorhill (Member) commented Nov 11, 2019

It can't apply: the example case makes use of legitimate subdomains, such as statics.liberation.fr and medias.liberation.fr.

I am looking at https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/dns/resolve, which can be used to expose the CNAME:

browser.dns.resolve('f7ds.liberation.fr', [ "canonical_name" ]).then(r => { console.log(r); });
Promise { <state>: "pending" }
Object { addresses: (1) […], canonicalName: "atc.eulerian.net", isTRR: false }
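A minimal sketch of how an extension might wrap that call: resolve each hostname once, cache the result, and flag the hostname when its canonical name differs and falls under a known tracker domain. The resolver is injected so the logic runs outside Firefox; in an extension with the "dns" permission it could be (host) => browser.dns.resolve(host, ["canonical_name"]). makeUncloaker and the suffix list are illustrative assumptions, not uBO's implementation:

```javascript
// Sketch: CNAME "uncloaking" with a per-hostname cache.
// `resolve` returns a Promise of an object with a `canonicalName` field,
// like the record from Firefox's browser.dns.resolve().
// Hypothetical suffix list, taken from the domains named in this issue.
const TRACKER_SUFFIXES = ["eulerian.net", "dnsdelegation.io"];

function isUnderSuffix(hostname, suffix) {
  return hostname === suffix || hostname.endsWith("." + suffix);
}

function makeUncloaker(resolve, suffixes = TRACKER_SUFFIXES) {
  const cache = new Map(); // hostname -> Promise<result>

  return function uncloak(hostname) {
    if (!cache.has(hostname)) {
      const pending = resolve(hostname)
        .then((record) => {
          const cname = record.canonicalName || hostname;
          // Only flag cloaked hosts, i.e. when the CNAME differs.
          const tracker = cname !== hostname &&
            suffixes.some((s) => isUnderSuffix(cname, s));
          return { hostname, cname, tracker };
        })
        // On DNS failure, fail open: treat the host as not cloaked.
        .catch(() => ({ hostname, cname: hostname, tracker: false }));
      cache.set(hostname, pending);
    }
    return cache.get(hostname);
  };
}
```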

I will prototype and evaluate how to optimally use this in uBO with the utmost care.

uBlock-user (Member) commented Nov 11, 2019

Will this be applied in uMatrix too?

gorhill (Member) commented Nov 11, 2019

Yes.

uBlock-user (Member) commented Nov 11, 2019

You will need to add a new permission named 'dns' to the manifest to use this API (https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/dns), and since this is a Firefox-only API, how will you address this in Chromium?

aeris (Author) commented Nov 11, 2019 (marked as off-topic)

I am looking at https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/dns/resolve, it can be used to expose the CNAME:

Time to think about the future, too. This detection can easily be bypassed by removing the CNAME and using a direct A/AAAA record. Perhaps it's time to include IP-range blacklisting or AS-number detection? 🤔
For Eulerian, the IP range (109.232.197.0/24) and ASN (AS50234) are dedicated, so there are no false positives or negatives, but it may be more complicated with shared ones…
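Checking an address against a dedicated range such as 109.232.197.0/24 comes down to comparing the leading prefix-length bits. A minimal IPv4-only sketch with hypothetical function names; an ASN check would additionally need an IP-to-ASN database, which is not shown:

```javascript
// IPv4-only sketch: is `ip` inside the CIDR block `cidr`?
function ipv4ToInt(ip) {
  const parts = ip.split(".").map(Number);
  if (parts.length !== 4 || parts.some((p) => !Number.isInteger(p) || p < 0 || p > 255)) {
    throw new Error(`not an IPv4 address: ${ip}`);
  }
  // >>> 0 keeps the result as an unsigned 32-bit integer.
  return ((parts[0] << 24) | (parts[1] << 16) | (parts[2] << 8) | parts[3]) >>> 0;
}

function inCidr(ip, cidr) {
  const [base, bitsText] = cidr.split("/");
  const bits = Number(bitsText);
  // JS shift counts are taken mod 32, so a /0 mask is special-cased.
  const mask = bits === 0 ? 0 : (~0 << (32 - bits)) >>> 0;
  return ((ipv4ToInt(ip) & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
}

console.log(inCidr("109.232.197.42", "109.232.197.0/24"));
```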

gorhill (Member) commented Nov 11, 2019 (marked as off-topic)

how will you address this in Chromium ?

uBO already makes use of Firefox-specific APIs, for example filterResponseData().

uBlock-user (Member) commented Nov 11, 2019

I meant: how will you fix this in Chromium?

gorhill (Member) commented Nov 11, 2019

Best to assume it can't be fixed on Chromium if it does not support the proper API.
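One way to act on that is to feature-detect the API and simply skip CNAME uncloaking where it is missing. A minimal sketch; getCnameLookup is a hypothetical name, and the global object is passed in so the check can run anywhere:

```javascript
// Returns a CNAME lookup function if browser.dns.resolve exists
// (Firefox, with the "dns" permission), or null so the caller can skip
// uncloaking entirely where no equivalent API is available.
function getCnameLookup(globalObj) {
  const api = globalObj.browser;
  if (api && api.dns && typeof api.dns.resolve === "function") {
    return (hostname) =>
      api.dns.resolve(hostname, ["canonical_name"]).then((r) => r.canonicalName);
  }
  return null;
}

// In an extension background script: const lookup = getCnameLookup(globalThis);
```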

gwarser (Member) commented Nov 11, 2019 (marked as off-topic)

the case given as example make use of legitimate subdomains

On a case-by-case basis, a regex with a whitelist-style (negative lookahead) assertion can be used:

/^https:\/\/(?!www|images|medias|statics)/$script,1p,domain=liberation.fr
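The URL part of that filter can be sanity-checked on its own; the $script,1p,domain= options are uBO filter options and are not evaluated here. The f7ds URL comes from this issue; the other two are illustrative:

```javascript
// The negative lookahead (?!...) rejects URLs whose hostname starts with
// one of the known-good subdomains; any other https:// URL matches.
const filterRegex = /^https:\/\/(?!www|images|medias|statics)/;

const urls = [
  "https://f7ds.liberation.fr/aaAAaaA7825.js", // cloaked tracker: matches
  "https://statics.liberation.fr/app.js",      // whitelisted: no match
  "https://www.liberation.fr/",                // whitelisted: no match
];

for (const u of urls) console.log(filterRegex.test(u), u);
```

Because the lookahead only checks a prefix, a host such as wwwx.liberation.fr would also be spared; (?!(?:www|images|medias|statics)\.) is stricter.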

gwarser (Member) commented Nov 20, 2019

Will the next step be to add syntax for this?

Something like:

||liberation.fr^$resolve-cname

And then maybe a type option:

*$third-party,cname,domain=liberation.fr

???

marekr commented Nov 20, 2019

Dumbest of ideas: throw in a JS-based DoH client to query a DNS provider of choice in Chrome (Google, Cloudflare?). DoH is just an HTTPS API (Google and Cloudflare also offer a JSON flavour of it), and it is designed to be fast (as a replacement for normal DNS)...

Alternatively, there was once an undocumented chrome.dns.resolve function, along the lines of:

chrome.dns.resolve("google.com", function(res){ console.log(res)});

They even still show the "dns" permission in the manifest documentation, which suggests the dns functions may still exist as long as the permission is declared.
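For reference, the JSON flavour of DoH (media type application/dns-json) returns an Answer array in which type 5 records are CNAMEs and type 1 records are A records. A minimal parser sketch that extracts the CNAME chain; parseDohAnswer is a hypothetical name, and the sample response is hand-written in that shape from the brillen.de nslookup output later in this thread (TTLs made up), not captured from a real server. As noted later in the thread, querying an external resolver was declined for uBO, so this only illustrates the idea:

```javascript
// Extracts the CNAME chain and addresses from a dns-json answer.
// Fetching would look like (not executed here):
//   fetch("https://cloudflare-dns.com/dns-query?name=" + encodeURIComponent(host) + "&type=A",
//     { headers: { accept: "application/dns-json" } }).then((r) => r.json())
const RR_A = 1;
const RR_CNAME = 5;

const stripDot = (name) => name.replace(/\.$/, "");

function parseDohAnswer(json) {
  const answers = Array.isArray(json.Answer) ? json.Answer : [];
  return {
    // CNAME targets, in the order the server returned them.
    cnames: answers.filter((rr) => rr.type === RR_CNAME).map((rr) => stripDot(rr.data)),
    addresses: answers.filter((rr) => rr.type === RR_A).map((rr) => rr.data),
  };
}

// Hand-written sample in the dns-json shape.
const sample = {
  Status: 0,
  Answer: [
    { name: "marketing.net.brillen.de.", type: 5, TTL: 300, data: "tr-brillen-de.affex.org." },
    { name: "tr-brillen-de.affex.org.", type: 5, TTL: 300, data: "lb1.affex.org." },
    { name: "lb1.affex.org.", type: 1, TTL: 300, data: "35.187.117.15" },
  ],
};

console.log(parseDohAnswer(sample));
```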

bonswouar commented Nov 20, 2019

Totally agree with @marekr.

Although this cleaner solution would be risky: since chrome.dns.resolve is undocumented, it could disappear at any time, couldn't it?

(By the way, I was wondering why you were talking about Cloudflare's DoH when Google has one too.)

devonbleak commented Nov 20, 2019

Don't count too much on CNAME detection solving this: DNS providers offer A-record aliases that do the CNAME resolution on the backend and return the IP in a direct A record. It's trivial to create NS records that delegate subdomains to such providers to hide this kind of thing.

x0wllaar commented Nov 20, 2019

@bonswouar Because using Google's DNS for privacy is kind of funny.

Plus, Cloudflare is the default DoH provider in Firefox, for example.

@marekr I suggested using DoH some time earlier, and this idea is basically incompatible with uBO.

marekr commented Nov 20, 2019

uBlock-user (Member) commented Nov 20, 2019

It doesn't matter which DNS resolver you argue for in terms of privacy and whatnot; using an external DNS resolver to pull CNAME data has already been declined by gorhill.

uBlockOrigin locked and limited conversation to collaborators Nov 20, 2019
uBlockOrigin unlocked this conversation Nov 23, 2019
pull bot pushed a commit to tiger17168/uBlock that referenced this issue Nov 23, 2019
Related issue:
- uBlockOrigin/uBlock-issues#780

Related commit:
- gorhill@3a564c1

This adds two new advanced settings:

- cnameIgnoreRootDocument
  - Defaults to `true`
  - Tells uBO to skip CNAME-lookup for root document.

- cnameReplayFullURL
  - Defaults to `false`
  - Tells uBO whether to replay the whole URL or just
    the origin part of it.
    Replaying only the origin part is meant to lower
    undue breakage and improve performance by avoiding
    repeating the pattern-matching of the whole URL --
    which pattern-matching was most likely already
    accomplished with the original request.

This commit is meant to explore enabling CNAME-lookup
by default for the next stable release while:

- Eliminating a development burden by removing the
  need to create a new filtering syntax to deal with
  undesirable CNAME-cloaked hostnames

- Eliminating a filter list maintainer burden by
  removing the need to find/deal with all base
  domains which engage in undesirable CNAME-cloaked
  hostnames

The hope is that the approach implemented in this
commit should require at most a few unbreak rules
with no further need for special filtering syntax
or filter list maintenance efforts.
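To make the two settings concrete, here is a sketch of what replaying a request against its uncloaked hostname could look like, for either the full URL or only its origin, plus the root-document skip. The function names are hypothetical and this is not uBO's code; "main_frame" is the webRequest resource type for a top-level document:

```javascript
// Hypothetical sketch of the two cnameReplayFullURL modes.
function buildReplayURL(requestURL, canonicalName, replayFullURL) {
  const u = new URL(requestURL);
  u.hostname = canonicalName; // swap the cloaked hostname for the CNAME target
  // Origin-only replay keeps scheme, host and port; path and query are dropped.
  return replayFullURL ? u.href : u.origin + "/";
}

// cnameIgnoreRootDocument: skip the lookup for top-level page loads.
function shouldLookupCname(requestType, ignoreRootDocument) {
  return !(ignoreRootDocument && requestType === "main_frame");
}

const url = "https://f7ds.liberation.fr/aaAAaaA7825.js";
console.log(buildReplayURL(url, "atc.eulerian.net", false)); // origin only
console.log(buildReplayURL(url, "atc.eulerian.net", true));  // full URL
```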

ad-m commented Nov 23, 2019 (marked as off-topic)

I would like to point out that this issue should not focus solely on CNAME records. NS records can be used for the same purpose (the provider only needs to run a DNS nameserver that answers all queries with the correct IP address, or pre-register the webmaster's domain in its own nameserver and ask the webmaster to delegate the subdomain to it). There is also the ANAME (ALIAS) record, supported by leading cloud providers. Importantly, from the DNS client's perspective, an ANAME record is only visible as an A record pointing to the selected IP address. This IP address can be shared among many clients or change frequently in a cloud environment.

RPiList commented Nov 25, 2019 (marked as off-topic)

Strangely, uMatrix with default settings does not catch f7ds.liberation.fr, because by default it trusts the main domain, liberation.fr.

But on my main PC, I configured uMatrix not to trust even the main domain. There, I don't see f7ds.liberation.fr.

But of course, I know that this kind of configuration is only for people who know what they are doing.

TuningYourCode commented Nov 26, 2019

https://www.ingenioustechnologies.com/tracking/ is doing the same thing, as a first-party tracking system.

It looks like they recommend that their customers use "marketing.net.*".

Example:
Non-authoritative answer:
marketing.net.brillen.de canonical name = tr-brillen-de.affex.org.
tr-brillen-de.affex.org canonical name = lb1.affex.org.
Name: lb1.affex.org
Address: 35.187.117.15

Non-authoritative answer:
marketing.net.home24.de canonical name = tr-home24-de.affex.org.
tr-home24-de.affex.org canonical name = lb1.affex.org.
Name: lb1.affex.org
Address: 35.187.117.15
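Both lookups end at the same lb1.affex.org host, so grouping first-party hostnames by the last name in their CNAME chain exposes the shared tracker. A small sketch that parses nslookup-style output such as the above; parseCnameChain is a hypothetical name and only the "canonical name =" lines are used:

```javascript
// Hypothetical helper: follows "X canonical name = Y." lines in nslookup
// output and returns the queried host plus its CNAME chain.
function parseCnameChain(output) {
  const links = new Map(); // alias -> target, trailing dot removed
  for (const line of output.split("\n")) {
    const m = line.match(/^\s*(\S+)\s+canonical name = (\S+?)\.?\s*$/);
    if (m) links.set(m[1], m[2]);
  }
  // The queried host is the only alias that is not itself a target.
  const targets = new Set(links.values());
  const host = [...links.keys()].find((alias) => !targets.has(alias));
  const chain = [];
  // The length guard stops the loop on a malformed, cyclic chain.
  for (let cur = host; links.has(cur) && chain.length < links.size; cur = links.get(cur)) {
    chain.push(links.get(cur));
  }
  return { host, chain };
}

const sampleOutput = `Non-authoritative answer:
marketing.net.home24.de canonical name = tr-home24-de.affex.org.
tr-home24-de.affex.org canonical name = lb1.affex.org.
Name: lb1.affex.org
Address: 35.187.117.15`;

console.log(parseCnameChain(sampleOutput));
```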

Sispheor commented Nov 27, 2019

I agree with @devonbleak. Detecting the CNAME will only solve part of the issue.

What happens if they just hand out an IP from an AWS pool of VMs without any DNS entry? Then the only way to detect it would be to query the API to see whether it corresponds to a tracker. But that would be too heavy.

And what happens if they collect all the required info through their own API and then send everything from their backend to the tracker's API? I even wonder whether that's already the case.

yory8 commented Nov 27, 2019

@Sispheor your last scenario seems unlikely to me: ad trackers need to do the harvesting themselves; otherwise, site owners could just send them fake data to inflate their site's value (by claiming far more users than they actually have).

Sispheor commented Nov 27, 2019

You are right, @yory8.
So we are safe on this part, thank God ;)

ad-m commented Nov 27, 2019

Mixpanel already recommends setting up a proxy to improve tracking of users who use ad blockers: https://help.mixpanel.com/hc/en-us/articles/115004499463-Ad-Blockers-Affect-Mixpanel

afontenot commented Nov 28, 2019

Does anyone know of a subscribable list dedicated to blocking these tracking subdomains? https://trackingthetrackers.com/site/espn.com says that sw88.espn.com is a CNAME for a tracking server, but I don't see it on any lists. (I would report it directly to list maintainers, but I don't know which ones consider these tracking servers to be in scope.) Apple has one as well that was only blocked by Peter Lowe's server list.

immanuelfodor commented Nov 28, 2019

Here is one project that tries to collect them, even from your local Pi-hole database if you have one: https://git.frogeye.fr/geoffrey/eulaurarien

Here is the list compiled by the project's author: https://hostfiles.frogeye.fr/firstparty-trackers-hosts.txt (19,446 domains as of this writing, including the sw88.espn.com mentioned above)
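Hosts-format lists are typically lines of the form "address hostname" plus # comments. Assuming that format (not verified against this specific file), a sketch of loading such a list and checking hostnames against it; the helper names are hypothetical, and matching is exact because the list names specific subdomains such as sw88.espn.com:

```javascript
// Parses hosts-format text ("0.0.0.0 example.com", "# comment") into a Set.
// Bare "hostname" lines are accepted too, as a convenience.
function parseHostsList(text) {
  const hosts = new Set();
  for (const raw of text.split("\n")) {
    const line = raw.replace(/#.*/, "").trim(); // drop comments
    if (line === "") continue;
    const fields = line.split(/\s+/);
    // With two or more fields, the first one is the address.
    for (const h of fields.length > 1 ? fields.slice(1) : fields) {
      hosts.add(h.toLowerCase());
    }
  }
  return hosts;
}

// Exact-hostname match, case-insensitive.
function isListed(hosts, hostname) {
  return hosts.has(hostname.toLowerCase());
}
```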

mapx- added a commit to uBlockOrigin/uAssets that referenced this issue Nov 28, 2019