HLOC: Hints-Based Geolocation Leveraging Multiple Measurement Frameworks
Our architecture:
The Codes Parsing process collects location codes from the selected sources, merges them if desired, and stores them in the database.
The pre-processing step parses IP/DNS files and classifies and filters the domains into multiple groups. The domains are then stored in the database together with their group information. We cannot provide the IP/DNS files because of their size.
The find step does:
- Create a trie out of all location information
- Match the domain labels stored in the pre-processing step against this trie
- Store the resulting location hints in the database
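The find step can be illustrated with a minimal sketch. It assumes that a location code counts as a hint whenever it occurs as a substring of a domain label; the actual matching rules may be stricter. The names `TrieNode`, `build_trie` and `find_hints`, as well as the example codes and location ids, are hypothetical and only serve the illustration.

```python
class TrieNode:
    """One node of a character trie over location codes."""

    def __init__(self):
        self.children = {}   # next character -> TrieNode
        self.locations = []  # ids of locations whose code ends at this node


def build_trie(codes):
    """Build a trie from (code, location_id) pairs."""
    root = TrieNode()
    for code, location_id in codes:
        node = root
        for ch in code.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.locations.append(location_id)
    return root


def find_hints(label, root):
    """Return (code, location_id) for every code found as a substring of label.

    Assumption: substring matching; this is a simplification for illustration.
    """
    label = label.lower()
    hints = []
    for start in range(len(label)):
        node = root
        for end in range(start, len(label)):
            node = node.children.get(label[end])
            if node is None:
                break
            for location_id in node.locations:
                hints.append((label[start:end + 1], location_id))
    return hints


# Hypothetical codes and ids, not taken from the real database.
trie = build_trie([("fra", 1), ("muc", 2), ("fr", 3)])
hints = find_hints("ae1-fra2", trie)
```

Walking the trie once per start position keeps the cost per label independent of the number of location codes, which is presumably why a trie is used instead of testing every code separately.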
The measure step does:
- Read all domains in random order
- Conduct measurements with various frameworks for all location hints of a domain
- Store all measurement results in the database
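The control flow of the measure step could look roughly like the sketch below. It is an illustration under assumptions, not the actual implementation: `measure_all` and the `measure` callback are hypothetical names, the callback stands in for a real measurement framework, and the returned list stands in for the database.

```python
import random


def measure_all(domain_hints, measure, rng=None):
    """Measure every location hint of every domain, visiting domains in random order.

    domain_hints: dict mapping a domain name to a list of location hint ids
    measure:      hypothetical callback (domain, hint_id) -> RTT in ms, or None on failure
    """
    rng = rng or random.Random()
    order = list(domain_hints)
    rng.shuffle(order)  # random order over all domains
    results = []
    for domain in order:
        for hint_id in domain_hints[domain]:
            rtt_ms = measure(domain, hint_id)
            if rtt_ms is not None:
                results.append((domain, hint_id, rtt_ms))
    return results


# Stand-in for a real measurement framework: returns a fixed RTT per hint.
def fake_measure(domain, hint_id):
    return {1: 12.5, 2: 48.0}.get(hint_id)


results = measure_all(
    {"ae1-fra2.example.net": [1, 2], "core1.example.net": [3]},
    fake_measure,
    rng=random.Random(0),
)
```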
We produce a daily export with all location hints for all domains, together with the minimal RTT measurement to the corresponding IP address. The file contains the following columns (the CSV column title is in brackets):
- Domain id (domain_id): The id of the domain entry in the domains table of the database.
- Domain name (domain_name): The full domain name.
- IP address (ip_address): The IP address of the domain name at the time of the rDNS export.
- Location hint id (location_hints_id): The id of the location hint in the location_hints table of the database.
- Location code (hint_location_code): The location code which has been found in the domain name and which is checked.
- Location code type (location_hint_type): The type of the location code (e.g. iata, geonames, ...).
- Hint Location id (hint_location_id): The id of the location, corresponding to the location code, in the locations table of the database.
- Hint location latitude (hint_location_lat): The latitude of the hint location.
- Hint location longitude (hint_location_lon): The longitude of the hint location.
- Probe id (probe_id): The id of the measurement probe in the probes table of the database. The probe information is for the measurement with the global minimum RTT to the IP address.
- Probe location latitude (probe_location_lat): The latitude of the probe location.
- Probe location longitude (probe_location_lon): The longitude of the probe location.
- Measurement result id (measurement_results_id): The id of the measurement result in the measurement_results table of the database. This measurement contains the current global minimum RTT to the destination IP address.
- RIPE Atlas measurement id (ripe_measurement_id): If the fastest measurement was from RIPE Atlas, this column contains the RIPE Atlas measurement id; otherwise it is empty.
- Measurement timestamp (measurement_timestamp): The timestamp of the measurement as a UNIX timestamp in UTC.
- Measurement type (measurement_type): The source of the measurement, e.g. RIPE Atlas, Caida, ...
- Is from traceroute (from_traceroute): A boolean value indicating if this measurement result was extracted from a traceroute measurement.
- Minimal RTT (min_rtt_ms): The RTT of the measurement in milliseconds.
- Distance (distance_km): The distance between the probe location and the hint location (the suspected location). This is relevant to determine the maximal error and whether a hint can be considered valid.
- Is the hint possible (possible): A boolean value indicating if the location hint is theoretically still possible considering this global minimal RTT.
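The relation between the min_rtt_ms, distance_km and possible columns can be illustrated with a short sketch. It assumes a great-circle (haversine) distance and a signal speed of about 200 km per millisecond, roughly two thirds of the speed of light, as is typical for optical fibre. The exact distance formula and threshold used for the export are not stated here, so both are assumptions, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0
# Assumption: signals travel at roughly 2/3 of the speed of light in fibre.
SPEED_KM_PER_MS = 200.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_hint_possible(distance_km, min_rtt_ms, speed_km_per_ms=SPEED_KM_PER_MS):
    """A hint is ruled out if the signal could not cover the distance in time.

    The RTT covers the path twice, so only half of it is available one way.
    """
    max_distance_km = (min_rtt_ms / 2) * speed_km_per_ms
    return distance_km <= max_distance_km


# Example: a probe near Frankfurt and a hint location near Munich.
distance = haversine_km(50.11, 8.68, 48.14, 11.58)
possible_fast = is_hint_possible(distance, min_rtt_ms=5.0)  # up to 500 km one way
possible_slow = is_hint_possible(distance, min_rtt_ms=1.0)  # up to 100 km one way
```

A single low RTT from one probe is enough to bound the distance to the target, which is why the export reports the global minimum RTT rather than an average.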