
webis-de/aitools4-aq-geolocation


AItools 4 - Acquisition - Geolocation

Library and program for the historical geolocalization of IP addresses.

The program geolocates IP addresses at specific points in time in the past. It combines data from the regional Internet registries (RIRs) with IPlocation database files (you can get a current IPlocation file from https://lite.ip2location.com/database-ip-country-region-city-latitude-longitude-zipcode-timezone).

The program uses RIR data to determine when IP addresses are reassigned. Unless it has contradicting information, it assumes that an IP address keeps its geolocation (i.e., its time zone) for as long as it is not reassigned.
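The assumption above can be illustrated with a minimal sketch. It is not the library's implementation: the class `AssignmentPeriods`, the method `sameAssignmentPeriod`, and the dates in `main` are hypothetical and exist only for illustration. The idea shown is that an IPlocation observation made at one time may be reused for a query at another time only if no reassignment lies between the two.

```java
import java.time.Instant;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical helper for illustration; not part of this library. */
final class AssignmentPeriods {

    /**
     * Returns true if no reassignment happened after the earlier and up to
     * (and including) the later of the two instants, i.e., if both instants
     * fall into the same assignment period of the address.
     */
    static boolean sameAssignmentPeriod(
            NavigableSet<Instant> reassignments, Instant a, Instant b) {
        Instant earlier = a.isBefore(b) ? a : b;
        Instant later = a.isBefore(b) ? b : a;
        // Reassignments in the half-open interval (earlier, later]
        return reassignments.subSet(earlier, false, later, true).isEmpty();
    }

    public static void main(String[] args) {
        // Made-up reassignment date for some address block
        NavigableSet<Instant> reassignments = new TreeSet<>();
        reassignments.add(Instant.parse("2015-06-01T00:00:00Z"));

        Instant observed = Instant.parse("2015-07-01T00:00:00Z"); // IPlocation snapshot
        Instant queried = Instant.parse("2016-01-04T07:42:27Z");  // time to geolocate
        System.out.println(sameAssignmentPeriod(reassignments, observed, queried));
    }
}
```

Under this reading, an IPlocation database that is older than the last reassignment of an address says nothing about the address at the queried time.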

Since RIR data contains no geolocation information other than the country, IPlocation databases are used to determine the time zone of an IP address. Additional IPlocation databases improve the reliability of the geolocalization, but in any case the program only outputs a geolocation when it is reasonably confident in it. To geolocate addresses at older times, you will need correspondingly older IPlocation databases. This site or product includes IP2Location LITE data available from http://lite.ip2location.com.
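The decision tree that the program prints in the Quickstart test run suggests how this confidence check is structured. The sketch below only mirrors the branches and outcomes of that printed tree. The class name, method name, and parameter names are hypothetical, and how the program actually computes each test (for example, what "locally time zone consistent" means in detail) is not described in this document.

```java
/** Hypothetical sketch that mirrors the printed decision tree; not the library's code. */
final class DecisionSketch {

    /**
     * Returns true if a geolocation would be output according to the tree
     * printed by the program, given the outcome of each test in that tree.
     */
    static boolean decide(
            boolean hasRir,
            boolean hasIplocation,
            boolean inconsistent,
            boolean timeZoneConsistent,
            boolean locallyTimeZoneConsistent,
            boolean oneTimeZone) {
        if (!hasRir) {
            return false;               // RIR = false -> false
        }
        if (!hasIplocation) {
            return oneTimeZone;         // IPlocation = false -> "1 time zone" decides
        }
        if (inconsistent) {
            return false;               // inconsistent = true -> false
        }
        if (timeZoneConsistent) {
            return true;                // time zone consistent = true -> true
        }
        return locallyTimeZoneConsistent; // last test in the tree decides
    }

    public static void main(String[] args) {
        // The path taken by all three addresses in the Quickstart test run
        System.out.println(decide(true, true, false, true, false, false));
    }
}
```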

Provided time data version: April 2016

Requirements

Setup

  • Update the following files if you want to geolocate IP addresses after the time data version listed at the top of this document (not necessary otherwise):

  • You might also have to update the time zone database of your Java VM (if you get errors saying that some time zone is unknown).

  • Process IPlocation databases.

    • GitHub: You should have already received one parsed IP2Location Lite DB11 along with this code, in iplocation-parsed. You can use it to test this software: just unzip it.

    • Put all your IPlocation database CSV files in one directory (here we use "data/iplocation").

    • The files may have to be renamed, since the file format is detected from the file name. The expected format is displayed when you run the following with your classpath:

      java -Xmx8G -cp <classpath> de.aitools.aq.geolocating.iplocations.IplocationIpBlocks
      
    • After renaming, run with your classpath:

      java -Xmx8G -cp <classpath> de.aitools.aq.geolocating.iplocations.IplocationIpBlocks data/iplocation data/iplocation-parsed
      
  • Update RIR database if you want to geolocate IP addresses after the time data version listed at the top of this document (not necessary otherwise):

    • Put all RIR registry files in a directory structure starting at "data/rir" (they are called something like delegated-.*-)

    • Yes, you need every such registry file ever published, as each file only contains the latest assignment of each IP address. You might also want to ask johannes.kiesel@uni-weimar.de for a more up-to-date version.

    • Run with your classpath:

      java -Xmx8G -cp <classpath> de.aitools.aq.geolocating.rir.RirIpBlocks data/rir data/rir-parsed
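
Before running the RIR step, it can help to check that the registry files are actually found under the directory you pass in. The following is a minimal sketch for such a check, not part of this library: the class `RirFileCheck` and its method are hypothetical, and it assumes only that registry file names start with `delegated-`, as noted above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sanity check for illustration; not part of this library. */
final class RirFileCheck {

    /** Recursively collects regular files whose name starts with "delegated-". */
    static List<Path> findDelegatedFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().startsWith("delegated-"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the directory used in this document
        Path root = Paths.get(args.length > 0 ? args[0] : "data/rir");
        List<Path> files = findDelegatedFiles(root);
        System.out.println(files.size() + " registry files found under " + root);
    }
}
```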
      

Quickstart

  • Run with your classpath:

    java -Xmx8G -cp <classpath> de.aitools.aq.geolocating.Geolocator data/iplocation-parsed data/rir-parsed <input> <time-format> <output>
    

    Where:

    • input is a file containing the IPv4 addresses and times for the historical geolocation, with one address and time per line:

      <address>[TAB]<time>
      
    • time-format is the format of the time field in the input file. It must be specified as a pattern for Java SimpleDateFormat: http://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html

    • output is the file to which the output is written. The output format is either (on successful geolocalization)

      <address>[TAB]<time>[TAB]<time-zone>[TAB]<country-code>
      

      or (on failed geolocalization)

      <address>[TAB]<time>
      

      and can be deserialized again using de.aitools.aq.geolocating.Geolocalization#parse(InputStream)

  • You can test if everything works using

        java -Xmx8G -cp <classpath> de.aitools.aq.geolocating.Geolocator data/iplocation-parsed data/rir-parsed example.txt "yyyy-MM-dd'T'HH:mm:ss" example-geolocated.txt
    

    The output should look something like this (written to standard output). In this case, it shows that the program decided "true" (i.e., a valid geolocalization) three times: in all three cases there was information from both RIR and IPlocation, and this information was not inconsistent but time zone consistent:

       Decisions:
       3
       RIR = true	3
         IPlocation = true	3
           inconsistent = true	0	false
           inconsistent = false	3
             time zone consistent = true	3	true
             time zone consistent = false	0
               locally time zone consistent = true	0	true
               locally time zone consistent = false	0	false
         IPlocation = false	0
           1 time zone = true	0	true
           1 time zone = false	0	false
       RIR = false	0	false
    

    And this (written to example-geolocated.txt):

       70.19.29.244	2016-01-04T07:42:27Z	America/New_York	US
       31.121.85.30	2016-01-04T07:29:15Z	Europe/London	GB
       86.23.18.214	2014-12-29T14:36:33Z	Europe/London	GB
    

    Here the third column gives the Olson time zone and the fourth column gives the country code. The third and fourth columns are missing if there is not enough geolocation information or if the available information conflicts.
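
If you want to read the output file in your own code without the library's parser, the line format described above can be handled with a few lines of Java. This is a minimal sketch under stated assumptions, not the library's `Geolocalization` class: `GeolocatedLine` and its fields are hypothetical names, and the times are assumed to be in UTC (suggested by the trailing `Z` in the example, but not stated in this document). Note that in SimpleDateFormat patterns the lowercase `y` is the calendar year, whereas the uppercase `Y` is the week year, so the sketch uses `yyyy`.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

/** Hypothetical value class for one output line; not part of this library. */
final class GeolocatedLine {
    final String address;
    final Date time;
    final String timeZone;    // Olson time zone, or null if geolocation failed
    final String countryCode; // country code, or null if geolocation failed

    private GeolocatedLine(String address, Date time, String timeZone, String countryCode) {
        this.address = address;
        this.time = time;
        this.timeZone = timeZone;
        this.countryCode = countryCode;
    }

    boolean isGeolocated() {
        return timeZone != null;
    }

    /** Parses a line with either two fields (failed) or four fields (successful). */
    static GeolocatedLine parse(String line, String timeFormat) throws ParseException {
        String[] fields = line.split("\t");
        if (fields.length != 2 && fields.length != 4) {
            throw new IllegalArgumentException("Expected 2 or 4 tab-separated fields: " + line);
        }
        SimpleDateFormat format = new SimpleDateFormat(timeFormat);
        format.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: times are UTC
        // parse(String) reads from the start of the text and may ignore a trailing "Z"
        Date time = format.parse(fields[1]);
        boolean geolocated = fields.length == 4;
        return new GeolocatedLine(
                fields[0], time,
                geolocated ? fields[2] : null,
                geolocated ? fields[3] : null);
    }

    public static void main(String[] args) throws ParseException {
        GeolocatedLine parsed = parse(
                "70.19.29.244\t2016-01-04T07:42:27Z\tAmerica/New_York\tUS",
                "yyyy-MM-dd'T'HH:mm:ss");
        System.out.println(parsed.address + " -> " + parsed.timeZone + " (" + parsed.countryCode + ")");
    }
}
```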
