Add minimal perfect hash domain matcher #743

Merged
merged 9 commits into from
Mar 15, 2021


darsvador (Contributor) commented Mar 7, 2021

[Benchmark screenshots: mphbench, hybridbench, plainbench]

As the benchmarks show, the mph domain matcher is about 30% faster than the original domain matcher, which puts its speed on par with the pure AC automaton matcher.

related issue: XTLS/Xray-core#192

  • convert domain names to lowercase when matching domain (app/router/condition.go)
  • rename domain matcher config hybrid to mph
  • add a minimal perfect hash domain matcher
  • remove the hybrid domain matcher

Note: the domain-matching code in app/dns/dns.go has not been modified.

dyhkwong (Contributor) commented Mar 7, 2021

strings.ToLower() is needed here as well:

func (s *DNS) LookupIP(domain string, option dns.IPOption) ([]net.IP, error) {


@darsvador darsvador marked this pull request as draft March 8, 2021 15:40
kslr (Contributor) commented Mar 10, 2021

Are you ready?

@darsvador darsvador marked this pull request as ready for review March 11, 2021 02:33
@darsvador darsvador marked this pull request as draft March 12, 2021 16:00
darsvador (Contributor, Author) commented

I will provide a new domain matcher implementation in this PR. It should match the speed of the pure AC automaton while using less memory than the hybrid matcher. Stay tuned.

@darsvador darsvador changed the title Convert domain names to lowercase when matching Add minimal perfect hash domain matcher Mar 13, 2021
@darsvador darsvador marked this pull request as ready for review March 13, 2021 05:01
darsvador (Contributor, Author) commented Mar 13, 2021

Implementation detail

The MphDomainMatcher consists of three parts:

  1. full and domain patterns are matched with a Rabin-Karp style rolling hash and a minimal perfect hash table;
  2. substr patterns are matched with an Aho-Corasick (AC) automaton;
  3. regex patterns are matched with the regex library.
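
The three-part split above can be sketched as follows. This is a simplified illustration, not the code in this PR: an ordinary Go map stands in for the minimal perfect hash table, a linear `strings.Contains` scan stands in for the AC automaton, and the type and method names (`matcher`, `addFull`, `addDomain`, `addSubstr`, `addRegex`) are hypothetical. Rules are assumed to be lowercase already.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// matcher groups rules by how they are matched.
type matcher struct {
	exact   map[string]bool  // full rules, plus domain rules in plain and dot-prefixed form
	substrs []string         // substr rules; a linear scan stands in for the AC automaton
	regexes []*regexp.Regexp // regex rules
}

func newMatcher() *matcher { return &matcher{exact: map[string]bool{}} }

func (m *matcher) addFull(r string)   { m.exact[r] = true }
func (m *matcher) addDomain(r string) { m.exact[r] = true; m.exact["."+r] = true }
func (m *matcher) addSubstr(r string) { m.substrs = append(m.substrs, r) }

func (m *matcher) addRegex(expr string) error {
	re, err := regexp.Compile(expr)
	if err != nil {
		return err
	}
	m.regexes = append(m.regexes, re)
	return nil
}

// match tries the three parts in turn and stops at the first hit.
func (m *matcher) match(name string) bool {
	name = strings.ToLower(name)
	// Part 1: full and domain rules (the whole name, or a suffix starting at a dot).
	if m.exact[name] {
		return true
	}
	for i := 0; i < len(name); i++ {
		if name[i] == '.' && m.exact[name[i:]] {
			return true
		}
	}
	// Part 2: substr rules.
	for _, s := range m.substrs {
		if strings.Contains(name, s) {
			return true
		}
	}
	// Part 3: regex rules.
	for _, re := range m.regexes {
		if re.MatchString(name) {
			return true
		}
	}
	return false
}

func main() {
	m := newMatcher()
	m.addDomain("baidu.com")
	m.addSubstr("google")
	if err := m.addRegex(`^ads\d+\.`); err != nil {
		panic(err)
	}
	fmt.Println(m.match("www.baidu.com"), m.match("notbaidu.com"))
}
```

The sketch also lowercases the name before matching, in line with the first item of the change list.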

Matching problem definition:

  • a domain rule baidu.com matches when the domain name, traversed in reverse order, either equals moc.udiab or begins with moc.udiab. (with the trailing dot). The match must start at the beginning of the reversed name; an occurrence of moc.udiab or moc.udiab. in the middle of the string does not count.
  • a full rule baidu.com matches when the domain name, traversed in reverse order, equals moc.udiab. An occurrence of moc.udiab in the middle of the string does not count.
  • a substr rule baidu.com matches when baidu.com is a substring of the given domain name. substr rules can be matched by the ACAutomaton.
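
The definitions above can be written down literally as a naive reference, useful for checking a faster matcher against. This is only a sketch of the semantics using the standard library; the function names are hypothetical, and domain names are assumed to be lowercase ASCII.

```go
package main

import (
	"fmt"
	"strings"
)

// reverse returns s with its bytes in reverse order (ASCII input assumed).
func reverse(s string) string {
	b := []byte(s)
	for i, j := 0, len(b)-1; i < j; i, j = i+1, j-1 {
		b[i], b[j] = b[j], b[i]
	}
	return string(b)
}

// matchFull: the reversed name must equal the reversed rule.
func matchFull(rule, name string) bool {
	return reverse(name) == reverse(rule)
}

// matchDomain: the reversed name equals the reversed rule,
// or starts with the reversed rule followed by a dot.
func matchDomain(rule, name string) bool {
	r, n := reverse(rule), reverse(name)
	return n == r || strings.HasPrefix(n, r+".")
}

// matchSubstr: the rule may occur anywhere in the name.
func matchSubstr(rule, name string) bool {
	return strings.Contains(name, rule)
}

func main() {
	fmt.Println(reverse("baidu.com"))                      // moc.udiab
	fmt.Println(matchDomain("baidu.com", "www.baidu.com")) // true
	fmt.Println(matchDomain("baidu.com", "notbaidu.com"))  // false
	fmt.Println(matchFull("baidu.com", "www.baidu.com"))   // false
	fmt.Println(matchSubstr("baidu.com", "baidu.com.cn"))  // true
}
```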

Given these definitions, the full and domain rules can be merged and matched together. The simplest approach is to store the rules in a hash map. A query, however, then has to hash the whole domain name and each of its candidate suffixes separately. A rolling hash removes this extra cost, because the hash of each longer suffix is derived from the previous one in constant time.
We use the 32-bit FNV prime 16777619 to compute the rolling hash.
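
A minimal sketch of the rolling-hash idea follows, assuming the FNV prime is used as the multiplier of a polynomial hash taken from the last byte to the first. An ordinary Go map keyed by the hash stands in for the perfect hash table, and the names `ruleSet`, `addFull`, `addDomain` and `match` are hypothetical. A domain rule is stored twice, once as is and once with a leading dot, so that both the name itself and its subdomains hit.

```go
package main

import "fmt"

const primeRK = 16777619 // 32-bit FNV prime, as named in the text

// rollingHash hashes s from its last byte to its first.
func rollingHash(s string) uint32 {
	var h uint32
	for i := len(s) - 1; i >= 0; i-- {
		h = h*primeRK + uint32(s[i])
	}
	return h
}

// ruleSet maps a rolling hash to the rule strings that produce it.
type ruleSet map[uint32][]string

func (r ruleSet) add(s string) {
	h := rollingHash(s)
	r[h] = append(r[h], s)
}

// addFull registers an exact-match rule.
func (r ruleSet) addFull(rule string) { r.add(rule) }

// addDomain registers the rule itself plus a dot-prefixed form for subdomains.
func (r ruleSet) addDomain(rule string) {
	r.add(rule)
	r.add("." + rule)
}

// contains confirms a hash hit with a string comparison.
func (r ruleSet) contains(h uint32, s string) bool {
	for _, rule := range r[h] {
		if rule == s {
			return true
		}
	}
	return false
}

// match walks the name right to left and extends one hash value,
// instead of rehashing every suffix from scratch.
func (r ruleSet) match(name string) bool {
	var h uint32
	for i := len(name) - 1; i >= 0; i-- {
		h = h*primeRK + uint32(name[i])
		if name[i] == '.' && r.contains(h, name[i:]) {
			return true
		}
	}
	return r.contains(h, name)
}

func main() {
	rs := ruleSet{}
	rs.addDomain("baidu.com")
	rs.addFull("example.org")
	fmt.Println(rs.match("www.baidu.com"))   // true
	fmt.Println(rs.match("notbaidu.com"))    // false
	fmt.Println(rs.match("www.example.org")) // false
}
```

Each byte of the name is hashed once, so a name of length n costs O(n) hashing work regardless of how many labels it has.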

Inspired by the "hash, displace, and compress" algorithm, we build a minimal perfect hash table with two rounds of hashing. The first round is the rolling hash, which we get directly while traversing the string. The second round is memhash from the Go runtime source code.

As a result, checking whether a rule is hit takes only one hash computation and one string comparison.

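// Lookup reports whether s is one of the stored rules.
// rollingHashValue is the rolling hash of s, computed by the caller during traversal.
func (g *MphMatcherGroup) Lookup(rollingHashValue uint32, s string) bool {
	i0 := int(rollingHashValue) & g.level0Mask // first level: pick the bucket
	seed := g.level0[i0]                       // seed stored for that bucket
	i1 := int(memhash(s, seed)) & g.level1Mask // second level: seeded hash picks the slot
	n := g.level1[i1]                          // index of the candidate rule
	return s == g.rules[int(n)]                // one comparison confirms the hit
}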
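
Lookup relies on the two tables having been built beforehand. The following is a rough sketch of how such a construction could work in the hash-and-displace style; it is not the code merged in this PR. `seededHash` is a hypothetical stand-in for memhash (seeded FNV-1a plus a final bit mix), `mphTable`, `build` and `lookup` are illustrative names, and the second level is rounded up to a power of two, so the table in this sketch is not strictly minimal.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

const primeRK = 16777619 // 32-bit FNV prime named in the text

// rollingHash is the first-round hash, taken from the last byte to the first.
func rollingHash(s string) (h uint32) {
	for i := len(s) - 1; i >= 0; i-- {
		h = h*primeRK + uint32(s[i])
	}
	return h
}

// seededHash is a hypothetical stand-in for memhash: seeded FNV-1a plus a bit mix.
func seededHash(s string, seed uint32) uint32 {
	h := uint32(2166136261) ^ seed
	for i := 0; i < len(s); i++ {
		h = (h ^ uint32(s[i])) * primeRK
	}
	h = (h ^ h>>16) * 0x85ebca6b
	h = (h ^ h>>13) * 0xc2b2ae35
	return h ^ h>>16
}

type mphTable struct {
	level0 []uint32 // one seed per first-level bucket
	level1 []uint32 // slot -> index into rules
	rules  []string
}

func nextPow2(v int) int {
	n := 1
	for n < v {
		n <<= 1
	}
	return n
}

// build finds, for every first-level bucket, a seed that puts its (distinct) rules in free slots.
func build(rules []string) (*mphTable, error) {
	t := &mphTable{rules: rules, level0: make([]uint32, nextPow2(len(rules)/4)), level1: make([]uint32, nextPow2(len(rules)))}
	buckets := make([][]int, len(t.level0))
	order := make([]int, len(t.level0))
	for i := range order {
		order[i] = i
	}
	for i, r := range rules {
		b := int(rollingHash(r)) & (len(t.level0) - 1)
		buckets[b] = append(buckets[b], i)
	}
	// Largest buckets first, while the second level is still mostly empty.
	sort.Slice(order, func(a, b int) bool { return len(buckets[order[a]]) > len(buckets[order[b]]) })
	used := make([]bool, len(t.level1))
	for _, b := range order {
		placed := false
		for seed := uint32(0); seed < 1<<20; seed++ {
			var slots []int
			placed = true
			for _, i := range buckets[b] {
				n := int(seededHash(rules[i], seed)) & (len(t.level1) - 1)
				if used[n] {
					placed = false
					break
				}
				used[n], t.level1[n] = true, uint32(i)
				slots = append(slots, n)
			}
			if placed {
				t.level0[b] = seed
				break
			}
			for _, n := range slots {
				used[n] = false // undo the failed attempt
			}
		}
		if !placed {
			return nil, errors.New("no seed found; are the rules distinct?")
		}
	}
	return t, nil
}

// lookup hashes twice and confirms the hit with a single string comparison.
func (t *mphTable) lookup(s string) bool {
	seed := t.level0[int(rollingHash(s))&(len(t.level0)-1)]
	n := t.level1[int(seededHash(s, seed))&(len(t.level1)-1)]
	return len(t.rules) > 0 && t.rules[n] == s
}

func main() {
	if t, err := build([]string{"baidu.com", ".baidu.com", "example.org"}); err == nil {
		fmt.Println(t.lookup(".baidu.com"), t.lookup("udiab.com"))
	}
}
```

The seed search is capped so that duplicate rules produce an error instead of an endless loop. The final string comparison in `lookup` is what rules out false positives, since any input hashes to some slot.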

@kslr kslr merged commit ac1e5cd into v2fly:master Mar 15, 2021
kslr (Contributor) commented Mar 15, 2021

Thanks for your work. This is great!

Loyalsoldier added a commit that referenced this pull request Mar 16, 2021
* update geoip, geosite

* Chore: bump google.golang.org/grpc from 1.35.0 to 1.36.0 (#711)

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Chore: bump github.com/miekg/dns from 1.1.39 to 1.1.40 (#712)

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add /opt to assets location (#715)

* Add definition for transport layer chained proxy

* Regenerate protobuf for transport layer chain proxy

* Added Transport Layer Chained Proxy Support

* Fix dependency cycle caused by core import in internet package

* Fix forced outbound tag not set correctly

* Disable routing for platform initialized detour

* Added Auto generated file

* don't build tagged outbound dial on configure setting

* Fix for context with empty content

* Fix ALPN being set to h2 by default when using TCP (#716)

* Deprecate legacy VMess header with a planned decommission (#717)

* Zero Security imaginary security level

* Regenerate protobuf for Zero Security imaginary security level

* Imaginary Security Lever: zero: turn off all security on payload data

* Test for Imaginary Security Level: zero

* Fix panic: index out of range (#727)

* Chore: update dependencies & protobuf (#728)

* A memory-efficient and fast hybrid matcher (#639)

* a faster DomainMatcher implementation

* rename benchmark name

* fix linting errors

* add hybrid matcher

* add rabin-karp algorithm

* rename test & fix linting errors

* add more comment

* format code

* revert `MatcherGroup` match func

* fix linting errors

* Allow the selection of domain matcher

* Apply domain selector choice

* json parsing rule for domain matcher

* output debug message when ACAutomatonDomainMatcher is enabled.

* update version

* update geoip, geosite

* Chore: bump github.com/google/go-cmp from 0.5.4 to 0.5.5 (#732)

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* workaround crash when V is not in context

* rename config for NewACAutomatonDomainMatcher to hybrid

* Allow bulk definition of domain matcher at parent level

* fix misbehaving code crash and create bug on transport level front proxy

* fixing misbehaving code in mux that do not propagate context

* create session content in the context if do not exist yet

* Create a name for linear domain matcher

* update version to 4.35.1

* Chore: update protobuf & dependencies (#748)

* Chore: bump actions/stale from v3.0.17 to v3.0.18 (#752)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* DNS: refine Android bootstrap DNS logic (#767)

* Chore: bump github.com/pires/go-proxyproto from 0.4.2 to 0.5.0 (#751)

Bumps [github.com/pires/go-proxyproto](https://github.com/pires/go-proxyproto) from 0.4.2 to 0.5.0.
- [Release notes](https://github.com/pires/go-proxyproto/releases)
- [Commits](pires/go-proxyproto@v0.4.2...v0.5.0)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Grpc Gun Transport (#757)

* introduce grpc transport structure

* fix package name inconsistency

* grpc gun transport dialer and listener

* add selective build tag

* add grpc:gun listener

* add grpc:gun config

* add generated files

* various bug fix for gun:grpc transport

* Cache dialed connections

* grpc:gun Use V2Ray Managed Dial function

* Update destination.pb.go

* Update gun.go

* GunSettings -> GunConfig

* gu -> gs

* add grpc alias

Co-authored-by: RPRX <63339210+rprx@users.noreply.github.com>
Co-authored-by: kslr <kslrwang@gmail.com>

* fix applied wrong name, and wrong varible name

* Add grpcSettings (alias of gunSettings)

* update geoip, geosite

* loopback outbound, allow you to redirect connection to the dispatcher again (#770)

* Added Loop back proxy

* Added json processing for lo proxy

* Fix bug for lo proxy

* Fix bug for lo proxy

* rename the outbound name

* Loopback: update naming and fix lint issues

* Chore: change lo to loopback

Co-authored-by: kslr <kslrwang@gmail.com>
Co-authored-by: loyalsoldier <10487845+Loyalsoldier@users.noreply.github.com>

* update version

* Chore: format import using goimports (#780)

* Chore: fix lint according to golangci-lint errors (#781)

* Chore: fix lint according to golangci-lint errors
* Chore: regenerate pb.go files

* Add minimal perfect hash domain matcher (#743)

* rename to HybridDomainMatcher & convert domain to lowercase

* refactor code & add open hashing for rolling hash map

* fix lint errors

* update app/dns/dns.go

* convert domain to lowercase in `strmatcher.go`

* keep the original matcher behavior

* add mph domain matcher & conver domain names to loweercase when matching

* fix lint errors

* fix lint errors

* Route: mph add alias hybrid

* FakeDNS: use 198.18.0.0/15 as default IP pool (#779)

* Add remote address to grpc transport layer conn (#783)

* Add remote address to grpc transport layer conn

* go fmt

* Revert "Test: fix http2 dial timeout (#570)" (#778)

* Revert "Test: fix http2 dial timeout (#570)"

This reverts commit 405a051.

* Feat: lower the payload size

* Remove state.NegotiatedProtocolIsMutual

It has been deprecated since Go 1.16 because it shouldn't be used: this value is always true.

* Chore: format code

Co-authored-by: GitHub Action <action@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Kid <44045911+kidonng@users.noreply.github.com>
Co-authored-by: Shelikhoo <xiaokangwang@outlook.com>
Co-authored-by: 秋のかえで <autmaple@protonmail.com>
Co-authored-by: DarthVader <61409963+darsvador@users.noreply.github.com>
Co-authored-by: CalmLong <37164399+CalmLong@users.noreply.github.com>
Co-authored-by: RPRX <63339210+rprx@users.noreply.github.com>
Co-authored-by: kslr <kslrwang@gmail.com>
Co-authored-by: maskedeken <52683904+maskedeken@users.noreply.github.com>