Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Date/time value is out of range in the Uber 2014 data #3

Closed
marklit opened this issue Feb 9, 2016 · 6 comments
Closed

Date/time value is out of range in the Uber 2014 data #3

marklit opened this issue Feb 9, 2016 · 6 comments

Comments

@marklit
Copy link

@marklit marklit commented Feb 9, 2016

When I import the Uber data for 2014 I get an error message that the date/time field value is out of range for the value 4/13/2014 0:01:00.

$ ./import_uber_trip_data.sh
Tue Feb  9 14:32:16 EET 2016: beginning load for data/uber-raw-data-apr14.csv
ERROR:  date/time field value out of range: "4/13/2014 0:01:00"
HINT:  Perhaps you need a different "datestyle" setting.
CONTEXT:  COPY uber_trips_staging, line 15076, column pickup_datetime: "4/13/2014 0:01:00"
Tue Feb  9 14:32:16 EET 2016: finished raw load for data/uber-raw-data-apr14.csv

It looks as though this value exists in a number of records:

$ grep -n "4/13/2014 0:01:00" data/uber-raw-data-apr14.csv
15076:"4/13/2014 0:01:00",40.7075,-73.9483,"B02512"
98906:"4/13/2014 0:01:00",40.7342,-73.999,"B02598"
98907:"4/13/2014 0:01:00",40.6316,-73.8876,"B02598"
98908:"4/13/2014 0:01:00",40.7668,-73.9676,"B02598"
98909:"4/13/2014 0:01:00",40.7345,-74.0014,"B02598"
98910:"4/13/2014 0:01:00",40.7463,-73.983,"B02598"
98911:"4/13/2014 0:01:00",40.7059,-73.92,"B02598"
263998:"4/13/2014 0:01:00",40.7396,-74.0276,"B02617"
263999:"4/13/2014 0:01:00",40.7386,-74.0055,"B02617"
264000:"4/13/2014 0:01:00",40.737,-73.9882,"B02617"
264001:"4/13/2014 0:01:00",40.7139,-73.9598,"B02617"
422703:"4/13/2014 0:01:00",40.7409,-74.0075,"B02682"
422704:"4/13/2014 0:01:00",40.714,-73.9579,"B02682"
422705:"4/13/2014 0:01:00",40.7336,-74.0048,"B02682"
422706:"4/13/2014 0:01:00",40.7573,-73.9847,"B02682"
422707:"4/13/2014 0:01:00",40.7162,-73.9549,"B02682"
422708:"4/13/2014 0:01:00",40.7358,-73.998,"B02682"
422709:"4/13/2014 0:01:00",40.6923,-73.9878,"B02682"
422710:"4/13/2014 0:01:00",40.7325,-74.0035,"B02682"
422711:"4/13/2014 0:01:00",40.7335,-74.0043,"B02682"
422712:"4/13/2014 0:01:00",40.6776,-73.912,"B02682"
422713:"4/13/2014 0:01:00",40.6352,-73.9503,"B02682"

Just about every other month in 2014 for the Uber data is raising the same issue.

@marklit marklit changed the title Uber 2014 date/time value is out of range Date/time value is out of range in the Uber 2014 data Feb 9, 2016
@toddwschneider
Copy link
Owner

@toddwschneider toddwschneider commented Feb 10, 2016

Hmm, I can't reproduce this and just confirmed it works okay on my setup. What OS and version of PostgreSQL are you using? I'm on Postgres 9.4.4 / OS X 10.10.5

@marklit
Copy link
Author

@marklit marklit commented Feb 10, 2016

It's a fresh install of Ubuntu 14.04.3 LTS. PostgreSQL and Postgis were installed with the following command:

$ sudo apt-get update
$ sudo apt-get install postgresql-9.3-postgis-2.1

I downloaded the data on Saturday (2016-02-06).

@MatthewWilkes
Copy link

@MatthewWilkes MatthewWilkes commented Feb 14, 2016

Apologies for jumping in to this without sufficient context, I was reading @marklit's blog post but haven't tried reproducing this myself.

@marklit Is it just the 13th that you get errors for, no other days of the month? It looks suspiciously like it's trying to import m/d/y dates as d/m/y. If it's using the machine's locale that may explain why it fails in London and works in NYC. Apologies if it's too basic a question, it's just that you haven't explicitly addressed it yet.

@toddwschneider
Copy link
Owner

@toddwschneider toddwschneider commented Feb 14, 2016

@MatthewWilkes good idea, thanks!

@marklit can you try running with this change to set the datestyle to MDY: 0f08601

If that works, I'll merge into master

@marklit
Copy link
Author

@marklit marklit commented Feb 15, 2016

It's importing without any complaints now. Thanks @toddwschneider.

Here's my steps in testing out 0f08601:

This was all done on a fresh Ubuntu 15 installation.

$ sudo apt-get install postgresql-9.4-postgis-2.1 postgis git unzip
$ git clone https://github.com/toddwschneider/nyc-taxi-data.git
$ cd nyc-taxi-data
$ git checkout 0f08601
$ vi minimal_downloads.txt

Contents of minimal_downloads.txt:

https://raw.githubusercontent.com/fivethirtyeight/uber-tlc-foil-response/master/uber-trip-data/uber-raw-data-apr14.csv
https://raw.githubusercontent.com/fivethirtyeight/uber-tlc-foil-response/master/uber-trip-data/uber-raw-data-janjune-15.csv.zip
$ cat minimal_downloads.txt | \
          xargs -n 1 -P 6 wget -P data/ &
$ sudo su - postgres -c "psql -c 'CREATE USER mark; ALTER USER mark WITH SUPERUSER;'"
$ ./initialize_database.sh
$ ./import_uber_trip_data.sh

Relevant output from import_uber_trip_data.sh:

Mon Feb 15 09:40:33 PST 2016: beginning load for data/uber-raw-data-apr14.csv
COPY 564516
Mon Feb 15 09:40:35 PST 2016: finished raw load for data/uber-raw-data-apr14.csv
SELECT 564516
CREATE INDEX
SELECT 554565
INSERT 0 564516
TRUNCATE TABLE
DROP TABLE
DROP TABLE
Mon Feb 15 09:40:47 PST 2016: loaded trips for data/uber-raw-data-apr14.csv
@toddwschneider
Copy link
Owner

@toddwschneider toddwschneider commented Feb 15, 2016

Thanks, I merged the change

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

3 participants
You can’t perform that action at this time.