[DNR] Add support for wildcards in initiatorDomains and excludedInitiatorDomains fields #394

Open
maximtop opened this issue May 18, 2023 · 21 comments
Labels
follow-up: chrome Needs a response from a Chrome representative follow-up: safari Needs a response from a Safari representative neutral: firefox Not opposed or supportive from Firefox topic: dnr Related to declarativeNetRequest

Comments

@maximtop

maximtop commented May 18, 2023

A common scenario is a site that operates under several variations of the same domain name in different domain zones (TLDs).

I've counted over 700 rules in AdGuard filters that follow this pattern. Here's an example of such a rule:

/1.js|$domain=kat.*|kickass.*|kickass2.*|kickasstorrents.*|kat2.*|kattracker.*|thekat.*|thekickass.*|kickassz.*|kickasstorrents2.*|topkickass.*|kickassgo.*|kkickass.*|kkat.*|kickasst.*|kick4ss.*|katbay.*|kickasshydra.*|kickasskat.*|kickassbay.*|torrentkat.*|kickassuk.*|torrentskickass.*|kickasspk.*|kickasstrusty.*|katkickass.*|kickassindia.*|kickass-usa.*|kickassaustralia.*|kickassdb.*|kathydra.*|kickassminds.*|kickassunlocked.*|kickassmovies.*|kickassfull.*|bigkickass.*|katfreak.*|kickasstracker.*

Introducing wildcards would reduce the size of DNR rulesets and make life easier for filter developers.

The rule in the ruleset could look like this:

[
    {
        "id": 1,
        "action": {
            "type": "block"
        },
        "condition": {
            "urlFilter": "/adscript",
            "initiatorDomains": ["example.*"]
        }
    }
]
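
For comparison, here is a minimal sketch of what a ruleset generator has to do today, without wildcard support: expand the entity into an explicit list of domains. The helper name `expand_entity_rule` and the three TLDs are assumptions chosen for illustration; they are not part of any DNR API.

```python
import json


def expand_entity_rule(rule_id, url_filter, entity, tlds):
    """Build a DNR-style block rule that lists every TLD variant explicitly."""
    if not entity.endswith(".*"):
        raise ValueError("expected an entity such as 'example.*'")
    base = entity[:-2]  # strip the trailing ".*"
    return {
        "id": rule_id,
        "action": {"type": "block"},
        "condition": {
            "urlFilter": url_filter,
            "initiatorDomains": [f"{base}.{tld}" for tld in tlds],
        },
    }


# Hypothetical TLD list, for illustration only.
rule = expand_entity_rule(1, "/adscript", "example.*", ["com", "net", "org"])
print(json.dumps([rule], indent=4))
```

Every TLD the site might move to has to be known in advance and listed, which is the maintenance burden the proposal aims to remove.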
@bershanskiy
Member

To avoid syntax ambiguity, could you please describe the desired wildcard syntax?

Specifically:

  • Is the wildcard * used exclusively in place of an entire label (like *.com matching all com domains), or can labels contain wildcards (like ex*.com matching example.com and extra.com)? In practice, I implemented this before; the partial-label case did not come up very often and it was much easier not to support it.
  • Are there right-sided wildcards, which would differentiate this from CSP wildcards? I assume yes, since your example contains them.
  • Are left-sided wildcards limited in depth (like *.com matching a.com but not b.a.com) or, matching CSP behavior, not limited in depth (like *.com matching both a.com and b.a.com)? Or is it Chromium extension host permission syntax (like [*.]example.com matching both a.example.com and b.a.example.com, but *.example.com matching a.example.com but not b.a.example.com)?
  • Can there be multiple wildcards in a single pattern? (I would prefer this to be supported, since I have encountered this case in practice.)
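
To make the options in these questions concrete, here is a rough sketch of one possible interpretation: whole-label wildcards only, with an optional switch for unlimited left-side depth. The function name `wildcard_match` and the `deep_left` flag are hypothetical and only illustrate the ambiguity; they do not describe any browser's behavior.

```python
def wildcard_match(pattern: str, host: str, deep_left: bool = False) -> bool:
    """Match host against pattern, where '*' stands for one whole label.

    With deep_left=True, a leading '*' absorbs one or more labels.
    Partial-label wildcards such as 'ex*' are deliberately not supported.
    """
    p = pattern.split(".")
    h = host.split(".")
    if deep_left and p[0] == "*":
        tail = p[1:]
        if len(h) <= len(tail):
            return False  # nothing left for the leading '*' to absorb
        h = h[len(h) - len(tail):]
        p = tail
    if len(p) != len(h):
        return False
    return all(pl == "*" or pl == hl for pl, hl in zip(p, h))


print(wildcard_match("*.com", "b.a.com"))                  # depth-limited
print(wildcard_match("*.com", "b.a.com", deep_left=True))  # unlimited depth
print(wildcard_match("*.example.*", "a.example.org"))      # multiple wildcards
```

Note that under this label-based reading `example.*` does not match `example.co.uk`, which is one reason the thread later moves toward public-suffix matching.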

@sammacbeth

sammacbeth commented May 23, 2023

Note there is some existing wildcard support for domain matching in DNR implementations:

  • In Chrome, domains specified in initiatorDomains and excludedInitiatorDomains implicitly match all subdomains. The docs state "Sub-domains of the listed domains are also matched." For example, example.com matches example.com, as well as foo.example.com.
  • In Safari, only the exact domain is matched, unless a * prefix is used. For example, example.com matches example.com but not foo.example.com. To get behavior matching Chrome we need to specify *example.com. This comes from Safari's Content Blocker rule spec: "Add * in front to match domain and subdomains".
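
The difference between the two behaviors described above can be modeled roughly as follows. This is a sketch based only on the descriptions quoted in this comment; the function names are made up, and real implementations may differ in edge cases.

```python
def chrome_style_match(rule_domain: str, host: str) -> bool:
    """Listed domain matches itself and, implicitly, all of its subdomains."""
    return host == rule_domain or host.endswith("." + rule_domain)


def safari_style_match(rule_domain: str, host: str) -> bool:
    """Exact match only, unless the entry carries a leading '*'."""
    if rule_domain.startswith("*"):
        base = rule_domain[1:]
        return host == base or host.endswith("." + base)
    return host == rule_domain


for host in ("example.com", "foo.example.com"):
    print(host,
          chrome_style_match("example.com", host),
          safari_style_match("example.com", host),
          safari_style_match("*example.com", host))
```

Neither form covers a different TLD such as example.org, which is the gap this issue is about.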

@maximtop
Author

@bershanskiy,

To answer your queries:

1, 2, 3, 4 - From what I have observed, there isn't a significant need for wildcard support outside of TLDs.

However, work is under way to support regex, which might prove helpful in more complex scenarios. For instance, uBlock Origin recently implemented support for regex, as shown in this commit: gorhill/uBlock@b1de8d3.

@dotproto dotproto added topic: dnr Related to declarativeNetRequest and removed needs-triage labels May 25, 2023
@Rob--W Rob--W added opposed: firefox Opposed by Firefox neutral: chrome Not opposed or supportive from Chrome neutral: safari Not opposed or supportive from Safari labels May 25, 2023
@Rob--W
Member

Rob--W commented May 25, 2023

I've added labels to reflect the positions from the meeting notes (pending to be merged in #397).

The "opposed: firefox" position on this issue reflects the decision based on the given context and background, but if there are compelling use cases we're willing to reconsider.

@maximtop
Author

During the recent W3C call, I was asked to provide more examples. Let's go back to the rule I first presented. I selected a domain (kickass.*) from this rule and ran a search for related issues here: https://github.com/AdguardTeam/AdguardFilters. The search returned 42 issues.

issues
https://github.com/AdguardTeam/AdguardFilters/issues/115172
https://github.com/AdguardTeam/AdguardFilters/issues/72978
https://github.com/AdguardTeam/AdguardFilters/issues/72980
https://github.com/AdguardTeam/AdguardFilters/issues/72977
https://github.com/AdguardTeam/AdguardFilters/issues/72974
https://github.com/AdguardTeam/AdguardFilters/issues/72975
https://github.com/AdguardTeam/AdguardFilters/issues/72972
https://github.com/AdguardTeam/AdguardFilters/issues/72979
https://github.com/AdguardTeam/AdguardFilters/issues/72873
https://github.com/AdguardTeam/AdguardFilters/issues/72976
https://github.com/AdguardTeam/AdguardFilters/issues/74912
https://github.com/AdguardTeam/AdguardFilters/issues/68940
https://github.com/AdguardTeam/AdguardFilters/issues/72992
https://github.com/AdguardTeam/AdguardFilters/issues/72993
https://github.com/AdguardTeam/AdguardFilters/issues/72991
https://github.com/AdguardTeam/AdguardFilters/issues/72988
https://github.com/AdguardTeam/AdguardFilters/issues/72989
https://github.com/AdguardTeam/AdguardFilters/issues/72984
https://github.com/AdguardTeam/AdguardFilters/issues/72985
https://github.com/AdguardTeam/AdguardFilters/issues/72986
https://github.com/AdguardTeam/AdguardFilters/issues/72987
https://github.com/AdguardTeam/AdguardFilters/issues/72981
https://github.com/AdguardTeam/AdguardFilters/issues/72982
https://github.com/AdguardTeam/AdguardFilters/issues/72983
https://github.com/AdguardTeam/AdguardFilters/issues/64396
https://github.com/AdguardTeam/AdguardFilters/issues/72961
https://github.com/AdguardTeam/AdguardFilters/issues/50342
https://github.com/AdguardTeam/AdguardFilters/issues/34760
https://github.com/AdguardTeam/AdguardFilters/issues/13090

I have selected unique domain names from these issues. There's potential for more to be found.

domains
kickass.ws
kickass.one
kickass.red
kickass.earth
kickass.kim
kickass.name
kickass.id
kickass.pink
kickass.onl
kickass.me.uk
kickass2.app
kickasstorrents.mobi
kat2.app
kickasstorrents.fun
kat2.xyz
kat2.space
kickass2.website
kickass2.fun
kickass2.top
kickass2.space
kickass2.online
kickass2.xyz
kickass2.mobi
kickass.love
kat2.website
kickass.cd

Next, I searched for rules that use these domains and identified 5.

rules
/help/?yd=?$popup,domain=kat.*|kickass.*|kickass2.*|kickasstorrents.*|kat2.*|kattracker.*|thekat.*|thekickass.*|kickassz.*|kickasstorrents2.*|topkickass.*|kickassgo.*|kkickass.*|kkat.*|kickasst.*|kick4ss.*|katbay.*|kickasshydra.*|kickasskat.*|kickassbay.*|torrentkat.*|kickassuk.*|torrentskickass.*|kickasspk.*|kickasstrusty.*|katkickass.*|kickassindia.*|kickass-usa.*|kickassaustralia.*|kickassdb.*|kathydra.*|kickassminds.*|katkickass.*|kickassunlocked.*|kickassmovies.*|kickassfull.*|bigkickass.*|kickasstracker.*|katfreak.*|kickasstracker.*|katfreak.*|kickasshydra.*|katbay.*|kickasst.*|kkickass.*|kattracker.*|topkickass.*|thekat.*|kat.*|kat2.*|kick4ss.*|kickass.*|kickass2.*|kickasstorrents.*|kat.fun|kat2.app|kat2.space|kat2.website|kat2.xyz|kick4ss.net|kickass.cd|kickass.earth|kickass.id|kickass.kim|kickass.love|kickass.me.uk|kickass.name|kickass.one|kickass.red|kickass.vc|kickass.ws|kickass2.app|kickass2.fun|kickass2.mobi|kickass2.online|kickass2.space|kickass2.top|kickass2.website|kickass2.xyz|kickassgo.com|kickasstorrent.cr|kickasstorrents.fun|kickasstorrents.icu|kickasstorrents.mobi|kickasstorrents.to|kickasstorrents2.net|kickassz.com|kkat.net|thekickass.org|kickasstorrents.space|thekat.cc|topkickass.org|kattracker.com

/r.js|$domain=kat.*|kickass.*|kickass2.*|kickasstorrents.*|kat2.*|kattracker.*|thekat.*|thekickass.*|kickassz.*|kickasstorrents2.*|topkickass.*|kickassgo.*|kkickass.*|kkat.*|kickasst.*|kick4ss.*|katbay.*|kickasshydra.*|kickasskat.*|kickassbay.*|torrentkat.*|kickassuk.*|torrentskickass.*|kickasspk.*|kickasstrusty.*|katkickass.*|kickassindia.*|kickass-usa.*|kickassaustralia.*|kickassdb.*|kathydra.*|kickassminds.*|katkickass.*|kickassunlocked.*|kickassmovies.*|kickassfull.*|bigkickass.*|kickasstracker.*|katfreak.*|kickasstracker.*|katfreak.*|kickasshydra.*|katbay.*|kickasst.*|kkickass.*|kattracker.*|topkickass.*|thekat.*|kat.*|kat2.*|kick4ss.*|kickass.*|kickass2.*|kickasstorrents.*|kat.fun|kat2.app|kat2.space|kat2.website|kat2.xyz|kick4ss.net|kickass.cd|kickass.earth|kickass.id|kickass.kim|kickass.love|kickass.me.uk|kickass.name|kickass.one|kickass.red|kickass.vc|kickass.ws|kickass2.app|kickass2.fun|kickass2.mobi|kickass2.online|kickass2.space|kickass2.top|kickass2.website|kickass2.xyz|kickassgo.com|kickasstorrent.cr|kickasstorrents.fun|kickasstorrents.icu|kickasstorrents.mobi|kickasstorrents.to|kickasstorrents2.net|kickassz.com|kkat.net|thekickass.org|kickasstorrents.space|thekat.cc|topkickass.org|kattracker.com

/k.js|$domain=kat.*|kickass.*|kickass2.*|kickasstorrents.*|kat2.*|kattracker.*|thekat.*|thekickass.*|kickassz.*|kickasstorrents2.*|topkickass.*|kickassgo.*|kkickass.*|kkat.*|kickasst.*|kick4ss.*|katbay.*|kickasshydra.*|kickasskat.*|kickassbay.*|torrentkat.*|kickassuk.*|torrentskickass.*|kickasspk.*|kickasstrusty.*|katkickass.*|kickassindia.*|kickass-usa.*|kickassaustralia.*|kickassdb.*|kathydra.*|kickassminds.*|katkickass.*|kickassunlocked.*|kickassmovies.*|kickassfull.*|bigkickass.*|kickasstracker.*|katfreak.*|kickasstracker.*|katfreak.*|kickasshydra.*|katbay.*|kickasst.*|kkickass.*|kattracker.*|topkickass.*|thekat.*|kat.*|kat2.*|kick4ss.*|kickass.*|kickass2.*|kickasstorrents.*|kat.fun|kat2.app|kat2.space|kat2.website|kat2.xyz|kick4ss.net|kickass.cd|kickass.earth|kickass.id|kickass.kim|kickass.love|kickass.me.uk|kickass.name|kickass.one|kickass.red|kickass.vc|kickass.ws|kickass2.app|kickass2.fun|kickass2.mobi|kickass2.online|kickass2.space|kickass2.top|kickass2.website|kickass2.xyz|kickassgo.com|kickasstorrent.cr|kickasstorrents.fun|kickasstorrents.icu|kickasstorrents.mobi|kickasstorrents.to|kickasstorrents2.net|kickassz.com|kkat.net|thekickass.org|kickasstorrents.space|thekat.cc|topkickass.org|kattracker.com

/1.js|$domain=kat.*|kickass.*|kickass2.*|kickasstorrents.*|kat2.*|kattracker.*|thekat.*|thekickass.*|kickassz.*|kickasstorrents2.*|topkickass.*|kickassgo.*|kkickass.*|kkat.*|kickasst.*|kick4ss.*|katbay.*|kickasshydra.*|kickasskat.*|kickassbay.*|torrentkat.*|kickassuk.*|torrentskickass.*|kickasspk.*|kickasstrusty.*|katkickass.*|kickassindia.*|kickass-usa.*|kickassaustralia.*|kickassdb.*|kathydra.*|kickassminds.*|katkickass.*|kickassunlocked.*|kickassmovies.*|kickassfull.*|bigkickass.*|kickasstracker.*|katfreak.*|kickasstracker.*|katfreak.*|kickasshydra.*|katbay.*|kickasst.*|kkickass.*|kattracker.*|topkickass.*|thekat.*|kat.*|kat2.*|kick4ss.*|kickass.*|kickass2.*|kickasstorrents.*|kat.fun|kat2.app|kat2.space|kat2.website|kat2.xyz|kick4ss.net|kickass.cd|kickass.earth|kickass.id|kickass.kim|kickass.love|kickass.me.uk|kickass.name|kickass.one|kickass.red|kickass.vc|kickass.ws|kickass2.app|kickass2.fun|kickass2.mobi|kickass2.online|kickass2.space|kickass2.top|kickass2.website|kickass2.xyz|kickassgo.com|kickasstorrent.cr|kickasstorrents.fun|kickasstorrents.icu|kickasstorrents.mobi|kickasstorrents.to|kickasstorrents2.net|kickassz.com|kkat.net|thekickass.org|kickasstorrents.space|thekat.cc|topkickass.org|kattracker.com

/\/....\//$script,domain=kat.*|kickass.*|kickass2.*|kickasstorrents.*|kat2.*|kattracker.*|thekat.*|thekickass.*|kickassz.*|kickasstorrents2.*|topkickass.*|kickassgo.*|kkickass.*|kkat.*|kickasst.*|kick4ss.*|katbay.*|kickasshydra.*|kickasskat.*|kickassbay.*|torrentkat.*|kickassuk.*|torrentskickass.*|kickasspk.*|kickasstrusty.*|katkickass.*|kickassindia.*|kickass-usa.*|kickassaustralia.*|kickassdb.*|kathydra.*|kickassminds.*|katkickass.*|kickassunlocked.*|kickassmovies.*|kickassfull.*|bigkickass.*|kickasstracker.*|katfreak.*|kickasstracker.*|katfreak.*|kickasshydra.*|katbay.*|kickasst.*|kkickass.*|kattracker.*|topkickass.*|thekat.*|kat.*|kat2.*|kick4ss.*|kickass.*|kickass2.*|kickasstorrents.*|kat.fun|kat2.app|kat2.space|kat2.website|kat2.xyz|kick4ss.net|kickass.cd|kickass.earth|kickass.id|kickass.kim|kickass.love|kickass.me.uk|kickass.name|kickass.one|kickass.red|kickass.vc|kickass.ws|kickass2.app|kickass2.fun|kickass2.mobi|kickass2.online|kickass2.space|kickass2.top|kickass2.website|kickass2.xyz|kickassgo.com|kickasstorrent.cr|kickasstorrents.fun|kickasstorrents.icu|kickasstorrents.mobi|kickasstorrents.to|kickasstorrents2.net|kickassz.com|kkat.net|thekickass.org|kickasstorrents.space|thekat.cc|topkickass.org|kattracker.com

There are 20 TLDs extracted from the rule.

tlds
.fun
.app
.space
.website
.xyz
.com
.net
.id
.kim
.love
.me.uk
.name
.one
.red
.vc
.ws
.mobi
.cr
.icu
.to
.org

All unique domain names mentioned in the rule are currently down. There are 38 such domains in total.

domain names
bigkickass
kat
kat2
katbay
katfreak
kathydra
katkickass
kattracker
kick4ss
kickass-usa
kickass
kickass2
kickassaustralia
kickassbay
kickassdb
kickassfull
kickassgo
kickasshydra
kickassindia
kickasskat
kickassminds
kickassmovies
kickasspk
kickasst
kickasstorrent
kickasstorrents
kickasstorrents2
kickasstracker
kickasstrusty
kickassuk
kickassunlocked
kickassz
kkat
kkickass
thekat
thekickass
topkickass
torrentkat
torrentskickass

The calculation 38 * 20 equals 760, which illustrates how quickly the number of domains in the initiatorDomains field can grow. And note that this doesn't even include all the popular TLDs.
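
The growth can be sketched as a simple cross product of names and TLDs. The helper `expand_domains` is hypothetical, and the `name…`/`tld…` placeholders exist only to reproduce the 38 * 20 count from the paragraph above.

```python
from itertools import product


def expand_domains(names, tlds):
    """Return every name.tld combination as a full domain."""
    return [f"{name}.{tld}" for name, tld in product(names, tlds)]


# A small sample taken from the lists above.
sample = expand_domains(["kat", "kickass", "kickass2"], ["app", "fun", "xyz"])
print(len(sample), sample[:3])

# Placeholder names and TLDs, used only to reproduce the count.
names = [f"name{i}" for i in range(38)]
tlds = [f"tld{i}" for i in range(20)]
total = len(expand_domains(names, tlds))
print(total)
```

Each additional TLD adds one entry per domain name, so the list grows multiplicatively rather than additively.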

You'll find more examples below:

more examples

rules

  • @@||cdn.cookielaw.org/scripttemplates/otSDKStub.js$domain=blaklader.*

domains

  • blaklader.at
  • blaklader.be
  • blaklader.ca
  • blaklader.com
  • blaklader.cz
  • blaklader.de
  • blaklader.dk
  • blaklader.ee
  • blaklader.es
  • blaklader.fi
  • blaklader.fr
  • blaklader.ie
  • blaklader.it
  • blaklader.nl
  • blaklader.no
  • blaklader.pl
  • blaklader.se
  • blaklader.uk

issues

rules
@@/popunder.$domain=streameast.*

domains

  • streameast.live
  • streameast.io
  • streameast.xyz
  • streameast.to
  • streameast.watch

issues

rules

  • ||prod-adops-proxy.dnitv.net^$redirect=nooptext,domain=discoveryplus.*
  • ||akamaihd.net^$media,domain=discoveryplus.*
  • ||dnitv.com^$media,domain=discoveryplus.*
  • @@||mparticle.com^/login$domain=discoveryplus.*

domains

  • discoveryplus.in
  • discoveryplus.se
  • discoveryplus.dk
  • discoveryplus.it
  • discoveryplus.com

issues

rules

  • ||snigelweb-com.videoplayerhub.com^$domain=tellows.*

domains

  • tellows.com
  • tellows.ch
  • tellows.jp
  • tellows.ru
  • tellows.in
  • tellows.mx
  • tellows.co.nz
  • tellows.com.br
  • ve.tellows.net
  • ae.tellows.net
  • cn.tellows.net
  • eg.tellows.net
  • tellows.tw
  • dz.tellows.net
  • sa.tellows.net
  • ir.tellows.net
  • ar.tellows.net
  • tellows.co.za
  • id.tellows.net
  • cl.tellows.net
  • tellows-au.com
  • tellows-tr.com
  • tellows.co
  • no.tellows.net
  • il.tellows.org

issues

rules

  • ||googletagmanager.com/gtag/js?id=$xmlhttprequest,redirect=nooptext,domain=streamingcommunity.*
  • @@||scws.xyz^$domain=streamingcommunity.*

domains

  • streamingcommunity.to
  • streamingcommunity.net
  • streamingcommunity.co
  • streamingcommunity.one
  • streamingcommunity.xyz
  • streamingcommunity.vip
  • streamingcommunity.website
  • streamingcommunity.fun
  • streamingcommunity.site
  • streamingcommunity.icu
  • streamingcommunity.bar
  • streamingcommunity.cc
  • streamingcommunity.press
  • streamingcommunity.tech
  • streamingcommunity.actor
  • streamingcommunity.love

issues

rules

  • ||pagead2.googlesyndication.com/pagead/js/adsbygoogle.js$script,redirect=googlesyndication-adsbygoogle,important,domain=bospedia.com|rockmods.net|aiimsneetshortnotes.com|arabseed.*|poppamr.com|adnit.xyz|seg-ashort1a-ma.tk|miuiku.com|tatangga.com|flash-firmware.blogspot.com|ghostsnet.com|rezkozpatch.xyz|rumahit.id|sekilastekno.com|romfirmware.com|orirom.com|gsmfirmware.net|animekuro.net|ashort1a.xyz|akwam.*|health-and.me

domains

  • on.akwam.cx
  • akwam.us
  • akwam.to
  • on.akwam.cz
  • re.akwam.news
  • akwam.io
  • akwam.in
  • akwam.im
  • akwam.cc
  • old.akwam.co
  • eg4.akwam.net
  • akwam.org

issues

@gorhill

gorhill commented May 26, 2023

Concerning uBlock Origin, at the moment, I count 269 entity-based[1] filters which must be thrown away when converting current filter lists to DNR rules. Of course, those 269 filters represent at least twice the number of MV3 DNR rules that would be created otherwise, since the purpose of an entity is to match more than one domain name.


[1] "Entity-based" is the name uBO uses to refer to domain name entries for which the TLD is replaced by *.

@Yuki2718

Yuki2718 commented May 26, 2023

but if there are compelling use cases we're willing to reconsider.

Filter lists are maintained by small numbers of people (mostly volunteers), but the sites most serious about circumventing blockers have long been changing their domains, often just the TLD, rapidly. Although domain wildcards are far more often used in cosmetic or scriptlet filters, there is still a need for wildcards in network filters; for example:
uBlockOrigin/uAssets@ec340c5
AdguardTeam/AdguardFilters@3b2798e
If you say ABP does not support this: yes, and thus they're quite behind AG/uBO in addressing those sites.

@gorhill

gorhill commented May 26, 2023

ABP does not support this

It still is an open issue in ABP: https://gitlab.com/eyeo/adblockplus/abc/adblockpluscore/-/issues/123.

@Alex-302

More examples:

The main purpose of using * as a TLD is to make the action of the rule somewhat broader, but not unnecessarily generic (e.g. ##.banner). This is important when using scriptlets (JS rules) and element hiding rules, which can cause problems on other sites.
It also eliminates the need to keep track of domain changes (when it's predictable), and users don't have to report a problem every time (unless some new problem has been discovered).

@erosman

erosman commented May 30, 2023

Considering an example e.g. example.*

  • Would right-handed wildcard be limited to TLD?
    example.com vs example.google.com

  • Would right-handed wildcard be limited to one or multi-part TLD?
    example.com, example.co.uk, example.git-pages.rit.edu

  • Would right-handed wildcard be implemented by The Public Suffix List?

See also: tld service for webextensions
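
One way to read these questions is as a matcher in which the trailing `.*` may only stand for a public suffix. The sketch below uses a tiny hard-coded set as a stand-in for the Public Suffix List; the set contents, the function name `entity_matches`, and the choice to also match subdomains are all assumptions for illustration.

```python
# Toy stand-in for the Public Suffix List. A real implementation would load
# the full list instead of this hand-picked subset.
PUBLIC_SUFFIXES = {"com", "org", "uk", "co.uk", "me.uk"}


def entity_matches(entity: str, host: str) -> bool:
    """Return True if host is (a subdomain of) entity with any public suffix."""
    if not entity.endswith(".*"):
        return False
    base_labels = entity[:-2].split(".")
    labels = host.split(".")
    n = len(base_labels)
    # Try every position where the base could sit: [sub.]base.suffix
    for i in range(len(labels) - n):
        if labels[i:i + n] == base_labels:
            suffix = ".".join(labels[i + n:])
            if suffix in PUBLIC_SUFFIXES:
                return True
    return False


for host in ("example.com", "example.co.uk", "example.google.com"):
    print(host, entity_matches("example.*", host))
```

Under this reading, example.google.com is rejected because google.com is not a public suffix, while a host such as example.git-pages.rit.edu would depend entirely on whether the suffix list in use contains git-pages.rit.edu.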

@Yuki2718

This https://github.com/gorhill/uBlock/wiki/Static-filter-syntax#entity

@oliverdunk
Member

@maximtop and others, thanks for the examples. Those are all really helpful.

https://gitlab.com/eyeo/adblockplus/abc/adblockpluscore/-/issues/123 (linked above) has some really interesting discussion on the pros/cons of this.

the * is supposed to match only the public suffix.

I missed this the first time (and it's an important distinction I think). The ask is not actually for a true wildcard - just the ability to match arbitrary TLDs. Which could open the possibility of exploring other syntax for this where you just list the pre-TLD part of the domain and don't actually use an asterisk character.

One thing I'm still not clear about is why creating an entirely new domain is any more friction than changing the TLD. If we implement this, what's to stop websites evading blocking tools by just changing the domain entirely?

I definitely still have some concerns around the number of sites intended to be matched vs. the number that would actually be matched with a wildcard. I think this could very easily encourage lazy rule creation which ends up blocking genuine sites that happen to share the domain (but not TLD) of something else.

@gorhill

gorhill commented May 30, 2023

what's to stop websites evading blocking tools by just changing the domain entirely

Everything content blockers do can be evaded, and yet we are still here. In the end the way we design our content blockers is to make it as easy as possible for filter list maintainers to do their task, to avoid hardship. The wildcard for public suffix has proven useful in reducing hardship after years of usage.


concerns around the number of sites intended to be matched vs. the number that would actually be matched with a wildcard

Both AdGuard and uBO have public filter list issue trackers, and I can't remember a case of a false positive caused by filters with a wildcarded public suffix in their domain option. Maybe it has occurred (and maybe a filter list maintainer can provide links to these), but it is certainly so rare that it shouldn't be a concern. The majority of false positives come from broad filters which are meant to apply everywhere (no domain option); filters with a wildcarded public suffix do not qualify as broad filters.

@Yuki2718

Yuki2718 commented May 30, 2023

As said, the feature was added because many sites do, in fact, change only their TLDs. For cosmetic/scriptlet filters we have countless examples:
uBlockOrigin/uAssets@7d05efc
AdguardTeam/AdguardFilters@237ab8f (note there already are rules for 1337x.is)
Without wildcards, list authors are forced to chase them or use generic rules, which are much more risky, as @Alex-302 pointed out.

@Alex-302

@oliverdunk

I think this could very easily encourage lazy rule creation which ends up blocking genuine sites that happen to share the domain (but not TLD) of something else.

In any case, we (the authors of the filters) try to make rules that carry a minimum risk of breakage. Not all of these sites change TLDs to bypass ad blocking. Many are blocked for copyright violations or censored in the countries of their target audience.
For example, Tripadvisor and Pinterest also have many regional domains. We use our own implementations of *, and also have not registered any problems because of this either. It is not a problem if a site has 3-5 regional domains.
At the same time, there are some filter authors who add dangerous rules (for example like ##.ad or ##.banner which will break sites) even without a wildcard:)

@Yuki2718

Yuki2718 commented May 30, 2023

Maybe it has occurred (and maybe a filter list maintainer can provide links to these)

For scriptlet and denyallow filters we had uBlockOrigin/uAssets@47ad111 and uBlockOrigin/uAssets@8f66870
There are certainly cases where the entire domain changes, so AG and uBO have implemented regex-based domain specification:
AdguardTeam/CoreLibs#1550
uBlockOrigin/uBlock-issues#2234
This is a welcome addition from a filter author's point of view, but definitely less often used.

@oliverdunk
Member

Thanks all. Definitely not indicating a decision here, just wanted to mention something that we should decide if we're concerned about as we discuss this more :)

@Rob--W Rob--W added follow-up: chrome Needs a response from a Chrome representative and removed opposed: firefox Opposed by Firefox neutral: chrome Not opposed or supportive from Chrome neutral: safari Not opposed or supportive from Safari labels Jun 22, 2023
@Rob--W Rob--W added neutral: firefox Not opposed or supportive from Firefox follow-up: safari Needs a response from a Safari representative labels Jun 22, 2023
@Rob--W
Member

Rob--W commented Jun 22, 2023

As stated in the meeting, Chrome and Safari will follow up with the engineering teams to determine the feasibility of implementing this.

@dotproto
Member

the * is supposed to match only the public suffix.

I missed this the first time (and it's an important distinction I think). The ask is not actually for a true wildcard - just the ability to match arbitrary TLDs. Which could open the possibility of exploring other syntax for this where you just list the pre-TLD part of the domain and don't actually use an asterisk character.

I also missed this originally and I agree that this is an important distinction. The use of .* to denote TLD matching confused me. I expected the asterisk to behave like a * wildcard in DOS, a .* pattern in a regular expression, or a ** in a glob pattern. Limiting the terminal .* in a domain to TLDs or public suffixes is a significant reduction in scope from what I was originally envisioning.

I don't currently have any concerns with this request. In abstract I'm a bit concerned about the theoretical possibility of false positives, but this comment by @gorhill addresses those worries.

Both AdGuard and uBO have public filter list issue trackers, and I can't remember a case of a false positive caused by filters with a wildcarded public suffix in their domain option. Maybe it has occurred (and maybe a filter list maintainer can provide links to these), but it is certainly so rare that it shouldn't be a concern. The majority of false positives come from broad filters which are meant to apply everywhere (no domain option); filters with a wildcarded public suffix do not qualify as broad filters.

@ameshkov

ameshkov commented Aug 2, 2023

From what I am reading in the last meeting minutes, this proposal in its current form causes a lot of confusion.

May I suggest an alternative approach? Instead of extending initiatorDomains, we could add a new condition initiatorRegexFilter (and possibly excludedInitiatorRegexFilter), which is basically the same as regexFilter, but for the initiator URLs.

UPD: initiatorUrlFilter would be even better.
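
As a rough illustration of the regex-based alternative, the sketch below tests initiator URLs against a pattern that a hypothetical initiatorRegexFilter value might contain. The condition does not exist in any browser today, and the pattern itself is an assumption written for the kickass.*/kickass2.* case from earlier in the thread.

```python
import re

# Hypothetical value for the proposed (not yet existing) initiatorRegexFilter.
INITIATOR_REGEX = re.compile(
    r"^https?://([a-z0-9-]+\.)*kickass2?\.[a-z]{2,}(\.[a-z]{2,})?(:\d+)?(/|$)"
)


def initiator_matches(url: str) -> bool:
    """Return True if the initiator URL matches the illustrative pattern."""
    return INITIATOR_REGEX.search(url) is not None


for url in (
    "https://kickass.ws/",
    "https://kickass2.app/page",
    "https://kickass.me.uk/",
    "https://notkickass.com/",
    "https://kickass.example.com/",
):
    print(url, initiator_matches(url))
```

A regex cannot consult the Public Suffix List, so this pattern also matches kickass.example.com; that trade-off is part of what would need to be weighed against TLD-only matching.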

@ameshkov

The issue was discussed during the WECG in-person meeting.

In order to choose the more appropriate way forward it is necessary to figure out what is the current situation with filtering rules.

  • How many rules are there that use .TLD matching?
  • How many filtering rules are there that would require initiatorUrlFilter or initiatorRegexFilter?
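
The first question could be answered with a small script along these lines. This is a simplified sketch: it assumes Adblock-style filter lines, treats everything after the last `$` as the option list, and ignores edge cases such as regex filters containing `$` or regex values in the domain option. All function names are hypothetical.

```python
def domain_option_entries(filter_line: str) -> list:
    """Return the entries of the domain= option, or an empty list."""
    _, sep, options = filter_line.rpartition("$")
    if not sep:
        return []  # no options at all
    for option in options.split(","):
        if option.startswith("domain="):
            return option[len("domain="):].split("|")
    return []


def uses_tld_wildcard(filter_line: str) -> bool:
    """True if any domain entry (negated or not) ends with '.*'."""
    return any(entry.lstrip("~").endswith(".*")
               for entry in domain_option_entries(filter_line))


def count_tld_wildcard_filters(lines) -> int:
    return sum(1 for line in lines if uses_tld_wildcard(line.strip()))


sample_lines = [
    "@@/popunder.$domain=streameast.*",
    "||dnitv.com^$media,domain=discoveryplus.*",
    "||example.org^$script,domain=example.com",  # made-up filter, no wildcard
    "||example.org^",                             # made-up filter, no options
]
print(count_tld_wildcard_filters(sample_lines))
```

Running something like this over the published AdGuard and uBO lists would give the counts requested at the meeting; the second question needs a separate pass for filters that already use regex domain values.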
