A powerful Python utility for downloading, filtering, processing, and analyzing domain data from the Tranco list.
- Overview
- Features
- Requirements
- Installation
- Usage
- Advanced Features
- Sample Outputs
- Troubleshooting
- License
The Tranco Domain List Processor is a comprehensive tool that helps you download, filter, and analyze domain data from the Tranco list - a research-oriented top sites ranking that aims to be more reliable than commercial rankings. This script provides multiple filtering options, automated scheduling, and different output formats for your domain data needs.
- Automatic Downloads: Fetches the latest Tranco list directly from the official website
- Custom Domain Filtering:
- Filter by Top-Level Domain (TLD): .com, .net, .org, etc.
- Filter by domain length (excluding TLD)
- Filter by Tranco rank range
- Filter by custom regex patterns
- Multiple Output Formats:
- Plain text files (one domain per line)
- CSV format (with rank information)
- JSON format (with detailed domain metadata)
- HTML format (with clickable links)
- TLD Analysis:
- Comprehensive distribution statistics
- Top TLD occurrence rankings
- Exportable analysis reports
- Advanced Processing:
- Multithreaded processing for large datasets
- Batch processing of multiple TLDs
- Progress bars for long operations
- Automation:
- Automated scheduling via Windows Task Scheduler
- Set daily, weekly, or monthly update schedules
- Python 3.6+
- Required Python packages:
- requests
- beautifulsoup4
- tqdm
- Clone or download this repository to your local machine
- Install the required dependencies:
pip install requests beautifulsoup4 tqdm
Run the script with Python:
python tranco.py
When you run the script, you'll be presented with a main menu:
1. Filter domains from Tranco list - Download and filter domains based on your criteria
2. Analyze TLD distribution - Get statistics about TLD distribution in the Tranco list
3. Delete all text files from directory - Clean up output files
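The clean-up option could work roughly as follows. This is a minimal sketch, assuming the script removes only .txt files directly inside the chosen directory and does not recurse into subfolders; delete_text_files is a hypothetical name, not the script's actual function.

```python
from pathlib import Path


def delete_text_files(directory="."):
    """Delete every *.txt file directly inside `directory` (non-recursive).

    Hypothetical helper. Returns the sorted names of the deleted files so the
    caller can report what was removed.
    """
    deleted = []
    # sorted() materializes the glob results before anything is unlinked
    for path in sorted(Path(directory).glob("*.txt")):
        if path.is_file():  # skip directories whose names happen to end in .txt
            path.unlink()
            deleted.append(path.name)
    return deleted
```

Because this deletes without asking, a real menu option would typically confirm with the user before calling it.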
To filter and save domains:
- Select option 1 from the main menu
- Follow the prompts to specify your filtering criteria
- Choose your preferred output format
- The script will download the latest Tranco list and apply your filters
- Filtered domain lists will be saved to your current directory
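After the download, the list has to be turned into (rank, domain) pairs before any filter runs. The sketch below assumes the download is a ZIP archive holding a single CSV whose rows look like 1,google.com (rank first, no header), which is the format Tranco publishes; parse_tranco_zip is a hypothetical helper, not the script's own code.

```python
import csv
import io
import zipfile


def parse_tranco_zip(zip_bytes):
    """Return a list of (rank, domain) tuples from a zipped Tranco CSV.

    Assumption: the archive's first member is the list, with rows "rank,domain".
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        csv_name = archive.namelist()[0]
        text = archive.read(csv_name).decode("utf-8")
    rows = []
    for row in csv.reader(io.StringIO(text)):
        # skip blank or malformed lines instead of failing on them
        if len(row) >= 2 and row[0].strip().isdigit():
            rows.append((int(row[0]), row[1].strip()))
    return rows
```

With requests, usage could look like resp = requests.get(download_url, timeout=60), then resp.raise_for_status() and parse_tranco_zip(resp.content), where download_url stands for whatever link the script finds on the Tranco website.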
The script offers multiple filtering options that can be combined:
Enter TLD name(s) separated by commas or spaces (e.g., com net io), or 'all' for all TLDs: com org net
Use 'all' to process all available TLDs in the Tranco list.
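Input such as com org net or com, net can be handled by splitting on commas and whitespace. A minimal sketch; parse_tld_input is hypothetical, and accepting a leading dot (.com) is an assumption about how forgiving the prompt is.

```python
import re


def parse_tld_input(raw, available_tlds):
    """Turn input like "com, net io" into a list of TLDs.

    'all' expands to every TLD in `available_tlds`. Duplicates are dropped
    while keeping the order the user typed.
    """
    text = raw.strip().lower()
    if text == "all":
        return sorted(available_tlds)
    tlds = []
    for token in re.split(r"[,\s]+", text):
        token = token.lstrip(".")  # accept ".com" as well as "com"
        if token and token not in tlds:
            tlds.append(token)
    return tlds
```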
Do you want to filter domains by length? (y/n): y
Enter minimum domain length (characters in the domain name, excluding TLD): 5
Enter maximum domain length (characters in the domain name, excluding TLD): 10
Domain length counts only the characters of the domain name before the TLD; for example, google.com has a length of 6.
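A length filter can strip the TLD and compare what remains against the bounds. This sketch assumes everything after the last dot is the TLD, so multi-label suffixes such as co.uk are not special-cased; name_length and filter_by_length are hypothetical names.

```python
def name_length(domain):
    """Length of the domain without its final label (treated as the TLD).

    Assumption: "bbc.co.uk" yields "bbc.co" (6 characters), not "bbc".
    """
    name, _, _tld = domain.rpartition(".")
    return len(name) if name else len(domain)


def filter_by_length(domains, min_len, max_len):
    """Keep domains whose name length is in [min_len, max_len], inclusive."""
    return [d for d in domains if min_len <= name_length(d) <= max_len]
```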
Do you want to filter domains by ranking? (y/n): y
Enter minimum rank (1 = highest ranked): 1
Enter maximum rank: 1000
This allows you to get only the top N domains (e.g., top 1000).
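Because the Tranco list is ordered by rank, a rank filter can stop reading as soon as it passes the maximum. A minimal sketch, assuming the bounds refer to the overall Tranco rank and the input is sorted by ascending rank; filter_by_rank is hypothetical.

```python
from itertools import dropwhile, takewhile


def filter_by_rank(ranked_domains, min_rank, max_rank):
    """Return (rank, domain) pairs with min_rank <= rank <= max_rank.

    Assumes ascending rank order, so iteration ends once max_rank is passed.
    """
    after_min = dropwhile(lambda item: item[0] < min_rank, ranked_domains)
    return list(takewhile(lambda item: item[0] <= max_rank, after_min))
```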
Do you want to filter domains by a pattern? (y/n): y
Enter a regex pattern to match domains (e.g. 'tech|ai' for domains containing 'tech' or 'ai'): tech|ai
Use standard regex patterns to match specific domain characteristics.
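A pattern filter could compile the expression once and test each domain with Python's re module. This sketch assumes "match" means the pattern may occur anywhere in the domain (re.search semantics); filter_by_pattern is a hypothetical name.

```python
import re


def filter_by_pattern(domains, pattern):
    """Keep domains in which `pattern` matches anywhere.

    An invalid pattern raises re.error, which a prompt loop can catch to ask again.
    """
    regex = re.compile(pattern)
    return [d for d in domains if regex.search(d)]
```

Under these semantics 'tech|ai' also keeps mail.ru, because "ai" appears inside "mail"; anchors such as ^tech narrow the match to the start of the domain.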
The TLD distribution analysis feature provides insights into the composition of the Tranco list:
- Shows the count and percentage of each TLD
- Identifies the most common TLDs
- Allows saving the complete distribution to a file
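The distribution statistics can be computed with collections.Counter. A minimal sketch, assuming the TLD is the last dot-separated label; tld_distribution and format_report are hypothetical helpers, and the column layout is illustrative only.

```python
from collections import Counter


def tld_distribution(domains, top_n=10):
    """Return (tld, count, percent) rows, most common first.

    Pass top_n=None to get every TLD, e.g. for the full saved report.
    """
    counts = Counter(d.rsplit(".", 1)[-1] for d in domains)
    total = sum(counts.values())
    return [(tld, n, 100.0 * n / total) for tld, n in counts.most_common(top_n)]


def format_report(rows):
    """Render rows as aligned text lines suitable for a .txt report."""
    return "\n".join(f"{tld:<10}{n:>10}{pct:>8.2f}%" for tld, n, pct in rows)
```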
The script supports four output formats:
- TXT: Simple text file with one domain per line
- CSV: Comma-separated values with rank and domain
- JSON: Structured JSON format with detailed domain information
- HTML: Interactive HTML page with clickable domain links
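The TXT and CSV formats can be produced with the standard library alone. This sketch returns strings rather than writing files, and the rank,domain header row is an assumption about the script's CSV layout; to_txt and to_csv are hypothetical names.

```python
import csv
import io


def to_txt(domains):
    """One domain per line, with a trailing newline."""
    return "".join(f"{d}\n" for d in domains)


def to_csv(ranked_domains):
    """A header row followed by rank,domain rows."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["rank", "domain"])
    writer.writerows(ranked_domains)
    return buffer.getvalue()
```

To save the result, open the target with open(path, "w", newline="", encoding="utf-8") and write the string.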
Example of JSON output:
[
  {
    "rank": 1,
    "domain": "google.com",
    "name": "google",
    "tld": "com"
  },
  ...
]
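Records in the JSON layout of the example can be built by splitting each domain at its last dot. A minimal sketch; to_json is hypothetical, and the last-dot split is an assumption (the script may handle suffixes like co.uk differently).

```python
import json


def to_json(ranked_domains):
    """Build a list of {rank, domain, name, tld} records as indented JSON."""
    records = []
    for rank, domain in ranked_domains:
        name, _, tld = domain.rpartition(".")
        records.append({"rank": rank, "domain": domain, "name": name, "tld": tld})
    return json.dumps(records, indent=2)
```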
Example of HTML output:
<!DOCTYPE html>
<html>
  <head>
    <title>Filtered Domains (.com)</title>
    <!-- CSS styling -->
  </head>
  <body>
    <h1>Top .com Domains</h1>
    <table>
      <tr><th>Rank</th><th>Domain</th></tr>
      <tr><td>1</td><td><a href="http://google.com" target="_blank">google.com</a></td></tr>
      <!-- More domains -->
    </table>
  </body>
</html>
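A page shaped like the HTML example can be generated with plain string formatting, escaping every value with html.escape. A minimal sketch without the CSS styling; to_html is a hypothetical name.

```python
from html import escape


def to_html(ranked_domains, tld_label):
    """Render a minimal HTML table of ranked domains, escaping all values."""
    rows = "\n".join(
        f'<tr><td>{rank}</td><td><a href="http://{escape(d)}" target="_blank">'
        f"{escape(d)}</a></td></tr>"
        for rank, d in ranked_domains
    )
    label = escape(tld_label)
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        f"<title>Filtered Domains (.{label})</title>\n"
        "</head>\n<body>\n"
        f"<h1>Top .{label} Domains</h1>\n<table>\n"
        "<tr><th>Rank</th><th>Domain</th></tr>\n"
        f"{rows}\n</table>\n</body>\n</html>\n"
    )
```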
The script can set up automatic recurring downloads via Windows Task Scheduler:
Do you want to set up automatic scheduling for this script? (y/n): y
Schedule frequency options:
1. Daily
2. Weekly
3. Monthly
Enter your choice (1-3): 1
Enter the time to run (HH:MM, 24-hour format): 03:00
This feature requires administrative privileges in Windows.
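On Windows, a recurring task is commonly registered with the built-in schtasks /Create command. The sketch below only builds the argument list and does not run it; it assumes the script uses schtasks, and both the task name TrancoDomainUpdate and build_schtasks_command are hypothetical. How a scheduled run obtains its filter settings is not covered here.

```python
import re
import sys
from pathlib import Path

# Menu choice -> schtasks schedule type
FREQUENCIES = {"1": "DAILY", "2": "WEEKLY", "3": "MONTHLY"}


def build_schtasks_command(choice, time_hhmm, script_path,
                           task_name="TrancoDomainUpdate"):
    """Return the argument list for a schtasks /Create call (not executed)."""
    if choice not in FREQUENCIES:
        raise ValueError("choice must be 1, 2 or 3")
    if not re.fullmatch(r"([01]\d|2[0-3]):[0-5]\d", time_hhmm):
        raise ValueError("time must be HH:MM in 24-hour format")
    task_run = f'"{sys.executable}" "{Path(script_path).resolve()}"'
    return [
        "schtasks", "/Create",
        "/SC", FREQUENCIES[choice],
        "/TN", task_name,
        "/TR", task_run,
        "/ST", time_hhmm,
        "/F",  # overwrite an existing task with the same name
    ]
```

On Windows the list can be passed to subprocess.run(cmd, check=True) from an elevated prompt.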
When processing multiple TLDs (especially with the 'all' option), the script uses multithreading to speed up processing. The number of threads is chosen automatically based on your system's CPU core count.
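Per-TLD work can be spread across a thread pool sized from os.cpu_count(). A minimal sketch; split_by_tld is hypothetical and only groups domains, while the real per-TLD work presumably also filters and writes files.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_by_tld(domains, tlds, max_workers=None):
    """Return {tld: [domains ending in .tld]}, one pool task per TLD.

    The worker count defaults to the CPU count, capped at the number of TLDs.
    """
    if max_workers is None:
        max_workers = min(len(tlds), os.cpu_count() or 1) or 1

    def collect(tld):
        suffix = "." + tld
        return tld, [d for d in domains if d.endswith(suffix)]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(collect, tlds))
```

In CPython, pure-Python filtering is limited by the global interpreter lock, so threads help most when each task also does I/O such as writing its output file; ProcessPoolExecutor is an alternative for purely CPU-bound work.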
After processing, you'll get domain list files named according to your filters. Examples:
- tranco_1M5JD_com_domains.txt - All .com domains
- tranco_1M5JD_com_domains_5-10chars.txt - .com domains with 5-10 characters
- tranco_1M5JD_com_domains_rank1-1000.txt - Top 1000 .com domains
- tld_distribution_analysis_2023-04-15.txt - TLD distribution analysis report
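The example file names can be reproduced by appending one suffix per active filter. This sketch assumes that 1M5JD identifies the downloaded list and that, when both filters apply, the length suffix precedes the rank suffix; build_filename and analysis_filename are hypothetical.

```python
from datetime import date


def build_filename(list_id, tld, length_range=None, rank_range=None):
    """Compose tranco_<id>_<tld>_domains[_<min>-<max>chars][_rank<min>-<max>].txt."""
    parts = [f"tranco_{list_id}_{tld}_domains"]
    if length_range:
        parts.append(f"{length_range[0]}-{length_range[1]}chars")
    if rank_range:
        parts.append(f"rank{rank_range[0]}-{rank_range[1]}")
    return "_".join(parts) + ".txt"


def analysis_filename(day=None):
    """Dated name for the TLD distribution report (defaults to today)."""
    day = day or date.today()
    return f"tld_distribution_analysis_{day.isoformat()}.txt"
```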
Issue: Script fails to download the Tranco list
Solution: Check your internet connection and make sure you have permission to write files to the current directory.
Issue: Task Scheduler setup fails
Solution: Run the script as administrator to set up scheduled tasks.
Issue: Script seems slow when processing all TLDs
Solution: This is expected for large datasets. The script uses multithreading to speed up processing, but it still takes time. Consider filtering by specific TLDs instead of using 'all'.
Common error messages:
- "Error fetching the website" - Check your internet connection
- "Error downloading the file" - Check disk space and permissions
- "Error setting up scheduled task" - Try running as administrator
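Permission and disk-space problems can be detected before any download starts. A minimal sketch; preflight_check is hypothetical, and the 100 MB default threshold is an arbitrary assumption, not a figure from the script.

```python
import os
import shutil


def preflight_check(directory=".", min_free_bytes=100 * 1024 * 1024):
    """Return a list of problems that would prevent saving output files."""
    problems = []
    path = os.path.abspath(directory)
    if not os.access(directory, os.W_OK):
        problems.append(f"No write permission for {path}")
    free = shutil.disk_usage(directory).free
    if free < min_free_bytes:
        problems.append(f"Only {free // (1024 * 1024)} MB free in {path}")
    return problems
```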
This project is released under the MIT License. See the LICENSE file for details.
Created with ❤️ by Khan Sunny