
AItools 4 - Acquisition - Geolocation

Library and program for the historical geolocalization of IP addresses.

The program geolocates IP addresses at specific points in time in the past. It combines data from the regional Internet registries (RIRs) with data from IPlocation database files (you can get a current IPlocation file from https://lite.ip2location.com/database-ip-country-region-city-latitude-longitude-zipcode-timezone ).

The program uses RIR data to determine when IP addresses are reassigned. Unless it has contradicting information, it assumes that an IP address keeps its geolocation (i.e., its time zone) for as long as it is not reassigned.
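
The assumption above can be illustrated with a minimal sketch. It is not the library's implementation: the class `CarryForwardSketch`, the type `Observation`, and the method `timeZoneAt` are hypothetical names introduced here, and the rule "all observations within one assignment period must agree" is only one possible reading of "contradicting information".

```java
import java.time.Instant;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

final class CarryForwardSketch {

    /** Hypothetical type: a time zone seen for one address in a dated database snapshot. */
    static final class Observation {
        final Instant seenAt;
        final String timeZone;

        Observation(Instant seenAt, String timeZone) {
            this.seenAt = seenAt;
            this.timeZone = timeZone;
        }
    }

    /**
     * Returns the time zone at {@code query} if every observation from the same
     * assignment period names the same zone, and empty otherwise.
     *
     * @param reassignments times at which the address was reassigned, in ascending order
     */
    static Optional<String> timeZoneAt(
            Instant query, List<Instant> reassignments, List<Observation> observations) {
        // Find the assignment period [start, end) that contains the query time.
        Instant start = Instant.MIN;
        Instant end = Instant.MAX;
        for (Instant reassignment : reassignments) {
            if (reassignment.isAfter(query)) {
                end = reassignment;
                break;
            }
            start = reassignment;
        }

        // Collect only the zones observed within that period.
        Set<String> zones = new HashSet<>();
        for (Observation o : observations) {
            if (!o.seenAt.isBefore(start) && o.seenAt.isBefore(end)) {
                zones.add(o.timeZone);
            }
        }

        // No zone: no information. Several zones: contradicting information.
        return zones.size() == 1 ? Optional.of(zones.iterator().next()) : Optional.empty();
    }

    public static void main(String[] args) {
        List<Instant> reassignments = Arrays.asList(Instant.parse("2015-06-01T00:00:00Z"));
        List<Observation> observations = Arrays.asList(
                new Observation(Instant.parse("2016-04-01T00:00:00Z"), "Europe/London"));
        // Query in the same assignment period as the observation: the zone is reused.
        System.out.println(timeZoneAt(Instant.parse("2016-01-04T07:29:15Z"), reassignments, observations));
        // Query before the reassignment: the observation says nothing about that period.
        System.out.println(timeZoneAt(Instant.parse("2014-12-29T14:36:33Z"), reassignments, observations));
    }
}
```

In this sketch, a snapshot from April 2016 can answer a query for January 2016 only because no reassignment lies between the two dates. This is also why older databases are needed for older query times.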

Since RIR data contains no geolocation information other than the country, IPlocation databases are used to determine the time zone of an IP address. Additional IPlocation databases improve the reliability of the geolocalization, but in any case the program only outputs a geolocation when the available information is sufficient and not conflicting. To geolocate addresses at earlier times, you will need correspondingly older IPlocation databases. This site or product includes IP2Location LITE data available from http://lite.ip2location.com.

Provided time data version: April 2016

Requirements

Setup

  • Update the following files if you want to geolocate IP addresses after the time data version listed at the top of this document (not necessary otherwise):

  • You might also have to update the time zone database of your Java VM (if you get errors that some time zone is unknown).

  • Process IPlocation databases.

    • GitHub: One parsed IP2Location LITE DB11 database should have come with this code in iplocation-parsed so that you can test the software: just unzip it.

    • Put all your IPlocation database CSV files in one directory (this guide uses "data/iplocation").

    • The files may have to be renamed, since the file format is detected from the file name. The expected format is displayed when you run the following with your classpath:

      java -Xmx8G -cp <classpath> de.aitools.aq.geolocating.iplocations.IplocationIpBlocks
      
    • After renaming, run with your classpath:

      java -Xmx8G -cp <classpath> de.aitools.aq.geolocating.iplocations.IplocationIpBlocks data/iplocation data/iplocation-parsed
      
  • Update RIR database if you want to geolocate IP addresses after the time data version listed at the top of this document (not necessary otherwise):

    • Put all RIR registry files in a directory structure starting at "data/rir" (they are called something like delegated-.*-)

    • You need every such registry file ever published, as each file contains only the latest assignment of an IP address. You might also want to ask johannes.kiesel@uni-weimar.de for a more up-to-date version.

    • Run with your classpath:

      java -Xmx8G -cp <classpath> de.aitools.aq.geolocating.rir.RirIpBlocks data/rir data/rir-parsed
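
Regarding the time zone database of your Java VM mentioned above, the following minimal sketch reports which time zone IDs a JVM does not know. It uses the standard `java.time.ZoneId` API. Whether the program itself resolves zones through `java.time` or through `java.util.TimeZone` is not stated in this document, so treat the result as an indication only. The class name `TimeZoneCheck` is hypothetical.

```java
import java.time.ZoneId;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

final class TimeZoneCheck {

    /** Returns those IDs from {@code ids} that this JVM's time zone data does not contain. */
    static List<String> unknownZones(List<String> ids) {
        Set<String> known = ZoneId.getAvailableZoneIds();
        List<String> unknown = new ArrayList<>();
        for (String id : ids) {
            if (!known.contains(id)) {
                unknown.add(id);
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        // Pass the zone IDs to check as arguments, or fall back to two example IDs.
        List<String> ids = args.length > 0
                ? Arrays.asList(args)
                : Arrays.asList("Europe/London", "America/New_York");
        System.out.println("Unknown time zones: " + unknownZones(ids));
    }
}
```

An empty list means that all of the given IDs are known to the JVM.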
      

Quickstart

  • Run with your classpath:

    java -Xmx8G -cp <classpath> de.aitools.aq.geolocating.Geolocator data/iplocation-parsed data/rir-parsed <input> <time-format> <output>
    

    Where:

    • input is a file containing the IPv4 addresses and times for the historical geolocation. One address per line:

      <address>[TAB]<time>
      
    • time-format is the format of the time field in the input file, specified as a Java SimpleDateFormat pattern: http://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html

    • output is the file to which the results are written. Each output line is either (on successful geolocalization)

      <address>[TAB]<time>[TAB]<time-zone>[TAB]<country-code>
      

      or (on failed geolocalization)

      <address>[TAB]<time>
      

      and can be deserialized again using de.aitools.aq.geolocating.Geolocalization#parse(InputStream)

  • You can test if everything works using

        java -Xmx8G -cp <classpath> de.aitools.aq.geolocating.Geolocator data/iplocation-parsed data/rir-parsed example.txt "yyyy-MM-dd'T'HH:mm:ss" example-geolocated.txt
    

    The output should look something like the following. The decision summary is written to standard output. In this case it shows that the program decided for "true" (i.e., a valid geolocalization) three times, since in all three cases information from both RIR and IPlocation was available, and this information was not inconsistent but time zone consistent:

       Decisions:
       3
       RIR = true	3
         IPlocation = true	3
           inconsistent = true	0	false
           inconsistent = false	3
             time zone consistent = true	3	true
             time zone consistent = false	0
               locally time zone consistent = true	0	true
               locally time zone consistent = false	0	false
         IPlocation = false	0
           1 time zone = true	0	true
           1 time zone = false	0	false
       RIR = false	0	false
    

    And this (written to example-geolocated.txt):

       70.19.29.244	2016-01-04T07:42:27Z	America/New_York	US
       31.121.85.30	2016-01-04T07:29:15Z	Europe/London	GB
       86.23.18.214	2014-12-29T14:36:33Z	Europe/London	GB
    

    Here the third column gives the Olson time zone and the fourth column gives the country code. The third and fourth columns are missing if there is not enough geolocation information or if the information is conflicting.
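
As an illustration of the output format, the following minimal sketch reads one output line and, for a successful geolocalization, converts the time to the local time at the address. It is independent of `de.aitools.aq.geolocating.Geolocalization#parse(InputStream)`, whose behavior is not described here. The class `GeolocatedLine` and its members are hypothetical, and the sketch assumes that the time field is an ISO-8601 UTC timestamp as in the example above; with a different time-format, the time has to be parsed accordingly.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.util.Optional;

final class GeolocatedLine {
    final String address;
    final String time;
    final Optional<String> timeZone;    // empty on failed geolocalization
    final Optional<String> countryCode; // empty on failed geolocalization

    private GeolocatedLine(String address, String time,
            Optional<String> timeZone, Optional<String> countryCode) {
        this.address = address;
        this.time = time;
        this.timeZone = timeZone;
        this.countryCode = countryCode;
    }

    /** Parses one tab-separated output line with either two or four fields. */
    static GeolocatedLine parse(String line) {
        String[] fields = line.split("\t", -1);
        if (fields.length == 4) {
            return new GeolocatedLine(fields[0], fields[1],
                    Optional.of(fields[2]), Optional.of(fields[3]));
        } else if (fields.length == 2) {
            return new GeolocatedLine(fields[0], fields[1],
                    Optional.empty(), Optional.empty());
        }
        throw new IllegalArgumentException("Expected 2 or 4 tab-separated fields: " + line);
    }

    /** Local time at the address. Assumption: the time field is ISO-8601 in UTC. */
    Optional<ZonedDateTime> localTime() {
        return timeZone.map(zone -> Instant.parse(time).atZone(ZoneId.of(zone)));
    }

    public static void main(String[] args) {
        GeolocatedLine ok = parse("70.19.29.244\t2016-01-04T07:42:27Z\tAmerica/New_York\tUS");
        System.out.println(ok.countryCode.get() + " " + ok.localTime().get());

        GeolocatedLine failed = parse("70.19.29.244\t2016-01-04T07:42:27Z");
        System.out.println("geolocated: " + failed.timeZone.isPresent());
    }
}
```

Representing the two optional columns as `Optional` values keeps the successful and the failed case in one type, so callers have to handle a missing geolocation explicitly.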