Remove duplicate URLs by retaining only the unique combinations of hostname, path, and parameter names.
go install github.com/j3ssie/durl@latest
# basic usage
cat wayback_urls.txt | durl | tee differ-urls.txt
# with extra regex
cat wayback_urls.txt | durl -e 'your-regex-here' | tee differ-urls.txt
# only keep URLs in the scope domain
cat spider-urls.txt | durl -t 'target.com' | tee in-scope-url.txt
# parse JSONL data
cat large-jsonl-data.txt | durl -t 'target.com' -f url | tee in-scope-jsonl-data.txt
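For the JSONL mode, each input line is expected to be a JSON object, and -f names the field that holds the URL. A hypothetical input line (the status field here is illustrative only, not something durl requires):
{"url":"https://target.com/product.aspx?productID=123","status":200}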
The following examples illustrate the criteria used to decide which URLs are duplicates, so that each is listed only once (a sketch of the idea follows this list):
- URLs with the same hostname, path, and parameter names (parameter values are ignored)
http://sample.example.com/product.aspx?productID=123&type=customer
http://sample.example.com/product.aspx?productID=456&type=admin
- Paths indicating static content, such as blog, news, or calendar pages
https://www.example.com/cn/news/all-news/public-1.html
https://www.sample.com/de/about/business/countrysites.htm
https://www.sample.com/de/about/business/very-long-string-here-that-exceed-100-char.htm
https://www.sample.com/de/blog/2022/01/02/blog-title.htm
- URLs that differ only in numeric path segments
https://www.example.com/data/0001.html
https://www.example.com/data/0002.html
- Static files are ignored, for example
http://example.com/cdn-cgi/style.css
- Select a URL field from JSONL input (via the -f flag), then apply all of the filtering cases above
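To make these criteria concrete, here is a minimal Go sketch of how such a dedup key and static-content filter could work. This is an illustration of the idea only, not durl's actual implementation; the {num} placeholder, the section regex, and the 100-character slug threshold are all assumptions based on the examples above:

package main

import (
	"fmt"
	"net/url"
	"path"
	"regexp"
	"sort"
	"strings"
)

var (
	numericSeg = regexp.MustCompile(`^\d+$`)
	// Heuristics for static-content paths: blog/news/calendar sections.
	staticSection = regexp.MustCompile(`/(blog|news|calendar)(/|$)`)
	// Extensions treated as static assets (assumed list).
	staticExts = map[string]bool{".css": true, ".js": true, ".png": true, ".jpg": true}
)

// looksStatic reports whether a URL points at a static asset or
// static-content page that can be dropped outright.
func looksStatic(u *url.URL) bool {
	if staticExts[strings.ToLower(path.Ext(u.Path))] {
		return true // static assets like /cdn-cgi/style.css
	}
	if staticSection.MatchString(u.Path) {
		return true // blog/news/calendar sections
	}
	for _, seg := range strings.Split(u.Path, "/") {
		if len(seg) > 100 {
			return true // very long slugs are almost always articles
		}
	}
	return false
}

// dedupKey reduces a URL to hostname + path + sorted parameter names,
// collapsing numeric path segments so /data/0001.html and
// /data/0002.html share a key.
func dedupKey(u *url.URL) string {
	segs := strings.Split(u.Path, "/")
	for i, s := range segs {
		if numericSeg.MatchString(strings.TrimSuffix(s, path.Ext(s))) {
			segs[i] = "{num}"
		}
	}
	var names []string
	for name := range u.Query() {
		names = append(names, name)
	}
	sort.Strings(names)
	return u.Hostname() + strings.Join(segs, "/") + "?" + strings.Join(names, "&")
}

func main() {
	seen := map[string]bool{}
	for _, raw := range []string{
		"http://sample.example.com/product.aspx?productID=123&type=customer",
		"http://sample.example.com/product.aspx?productID=456&type=admin",
		"https://www.example.com/data/0001.html",
		"https://www.example.com/data/0002.html",
		"http://example.com/cdn-cgi/style.css",
	} {
		u, err := url.Parse(raw)
		if err != nil || looksStatic(u) {
			continue
		}
		if k := dedupKey(u); !seen[k] {
			seen[k] = true
			fmt.Println(raw) // only the first URL per key is printed
		}
	}
}

Running the sketch prints only the first product URL and the first /data/ URL: the parameter-value variant, the numeric variant, and the stylesheet are all dropped, matching the behavior the criteria above describe.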