Description
Feature or enhancement
Proposal:
At line 1023 of Lib/http/cookiejar.py:
    if cookie.domain_specified:
        req_host, erhn = eff_request_host(request)
        domain = cookie.domain
        if self.strict_domain and (domain.count(".") >= 2):
            # XXX This should probably be compared with the Konqueror
            # (kcookiejar.cpp) and Mozilla implementations, but it's a
            # losing battle.
            i = domain.rfind(".")
            j = domain.rfind(".", 0, i)
            if j == 0:  # domain like .foo.bar
                tld = domain[i+1:]
                sld = domain[j+1:i]
                if sld.lower() in ("co", "ac", "com", "edu", "org", "net",
                   "gov", "mil", "int", "aero", "biz", "cat", "coop",
                   "info", "jobs", "mobi", "museum", "name", "pro",
                   "travel", "eu") and len(tld) == 2:
                    # domain like .co.uk
                    _debug(" country-code second level domain %s", domain)
                    return False
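To make the behaviour of this check easier to see, here is a minimal standalone sketch of the same logic. The names `CURRENT_SLDS` and `is_country_code_sld` are hypothetical and exist only for this illustration; they are not part of http.cookiejar.

```python
# Second-level labels currently hard-coded in the check quoted above.
CURRENT_SLDS = ("co", "ac", "com", "edu", "org", "net",
                "gov", "mil", "int", "aero", "biz", "cat", "coop",
                "info", "jobs", "mobi", "museum", "name", "pro",
                "travel", "eu")


def is_country_code_sld(domain, slds=CURRENT_SLDS):
    """Hypothetical helper: True if `domain` looks like ".co.uk"."""
    if domain.count(".") < 2:
        return False
    i = domain.rfind(".")        # dot before the top-level label
    j = domain.rfind(".", 0, i)  # dot before the second-level label
    if j != 0:                   # more than two labels, e.g. ".example.co.uk"
        return False
    tld = domain[i + 1:]
    sld = domain[j + 1:i]
    return sld.lower() in slds and len(tld) == 2


print(is_country_code_sld(".co.uk"))          # True
print(is_country_code_sld(".example.co.uk"))  # False
print(is_country_code_sld(".or.jp"))          # False: "or" is not in the tuple
```

With strict_domain enabled, a cookie whose domain makes this helper return True would be rejected; a domain such as .or.jp currently passes because "or" is missing from the tuple.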
The second-level domain tuple was written in 2006. The current Wikipedia article lists several second-level domains that are missing from it.
Source: https://en.wikipedia.org/wiki/Second-level_domain
I've dumped all the countries mentioned in the Wikipedia article. Some of the data comes from https://github.com/derangeddk/cc2lds (based on Wikipedia, written in 2022); I checked each of those entries to make sure it is correct. The newer ones I copied manually.
The dumped data is attached:
second-level-country-domains.zip
Then I wrote a script to count how many countries use each second-level domain:
    import os
    from collections import defaultdict

    import yaml  # third-party: PyYAML

    # Second-level domains currently hard-coded in http.cookiejar.
    KNOWN_SLDS = ("co", "ac", "com", "edu", "org", "net",
                  "gov", "mil", "int", "aero", "biz", "cat", "coop",
                  "info", "jobs", "mobi", "museum", "name", "pro",
                  "travel", "eu")


    def count_domains_in_folder(folder_path):
        """Count how many country entries list each second-level domain."""
        domain_counts = defaultdict(int)
        total_countries = 0
        for filename in os.listdir(folder_path):
            if filename.endswith(('.yml', '.yaml')):
                filepath = os.path.join(folder_path, filename)
                try:
                    with open(filepath, 'r', encoding='utf-8') as file:
                        data = yaml.safe_load(file)
                    # Each file is expected to map a country code to a list of domains.
                    if data and isinstance(data, dict):
                        for country_code, domains in data.items():
                            if isinstance(domains, list):
                                total_countries += 1
                                for domain in domains:
                                    domain_counts[domain] += 1
                except Exception as e:
                    print(f"Error processing file {filename}: {e}")
        # Most frequently used domains first.
        sorted_domains = sorted(domain_counts.items(), key=lambda x: x[1], reverse=True)
        return sorted_domains, total_countries


    folder_path = './second-level-country-domains'
    sorted_domains, total_countries = count_domains_in_folder(folder_path)

    print("\nin list domains: ")
    for domain, count in sorted_domains:
        if domain in KNOWN_SLDS:
            print(f"in-list {domain}: {count} times ({count/total_countries:.1%})")

    # Note: this filters the 20 most frequent domains overall, so fewer
    # than 20 lines are printed.
    print('\nfirst 20 domains not in list:')
    for domain, count in sorted_domains[:20]:
        if domain not in KNOWN_SLDS:
            print(f"not-in-list {domain}: {count} times ({count/total_countries:.1%})")
The results:
in list domains:
in-list org: 21 times (77.8%)
in-list net: 21 times (77.8%)
in-list gov: 19 times (70.4%)
in-list edu: 18 times (66.7%)
in-list com: 17 times (63.0%)
in-list ac: 14 times (51.9%)
in-list co: 13 times (48.1%)
in-list mil: 13 times (48.1%)
in-list info: 5 times (18.5%)
in-list biz: 4 times (14.8%)
in-list int: 3 times (11.1%)
in-list name: 3 times (11.1%)
in-list coop: 2 times (7.4%)
in-list pro: 2 times (7.4%)
in-list mobi: 2 times (7.4%)
in-list travel: 2 times (7.4%)
in-list museum: 1 times (3.7%)
in-list aero: 1 times (3.7%)
first 20 domains not in list:
not-in-list tv: 5 times (18.5%)
not-in-list or: 4 times (14.8%)
not-in-list nom: 4 times (14.8%)
not-in-list sch: 4 times (14.8%)
not-in-list web: 4 times (14.8%)
not-in-list tm: 3 times (11.1%)
not-in-list gen: 3 times (11.1%)
not-in-list go: 3 times (11.1%)
not-in-list ltd: 3 times (11.1%)
As the results show, I think we can add .or, .tv, .nom, .sch, and .web to the list.
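A rough sketch of what the proposed change would mean in practice. This is a simplified restatement of the check, not the actual patch: `PROPOSED_ADDITIONS`, `EXTENDED_SLDS`, and `blocked` are hypothetical names used only here, and .or.jp is just an illustrative domain.

```python
# Tuple as it stands in http.cookiejar today.
CURRENT_SLDS = ("co", "ac", "com", "edu", "org", "net",
                "gov", "mil", "int", "aero", "biz", "cat", "coop",
                "info", "jobs", "mobi", "museum", "name", "pro",
                "travel", "eu")

# The five labels suggested in this issue (assumption: appended as-is).
PROPOSED_ADDITIONS = ("or", "tv", "nom", "sch", "web")
EXTENDED_SLDS = CURRENT_SLDS + PROPOSED_ADDITIONS


def blocked(domain, slds):
    """Simplified form of the strict_domain test for domains like ".co.uk"."""
    parts = domain.split(".")
    # Exactly ".<sld>.<tld>": an empty leading part, then two labels.
    return (len(parts) == 3 and parts[0] == ""
            and parts[1].lower() in slds and len(parts[2]) == 2)


for d in (".or.jp", ".sch.uk", ".co.uk"):
    print(d, blocked(d, CURRENT_SLDS), blocked(d, EXTENDED_SLDS))
```

Under these assumptions, .or.jp and .sch.uk are only rejected with the extended tuple, while .co.uk is rejected in both cases.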
Has this already been discussed elsewhere?
This is a minor feature, which does not need previous discussion elsewhere
Links to previous discussion of this feature:
No response