Date/time value is out of range in the Uber 2014 data #3

Closed
marklit opened this Issue Feb 9, 2016 · 6 comments

Comments

Projects
None yet
3 participants
@marklit

marklit commented Feb 9, 2016

When I import the Uber data for 2014 I get an error message that the date/time field value is out of range for the value 4/13/2014 0:01:00.

$ ./import_uber_trip_data.sh
Tue Feb  9 14:32:16 EET 2016: beginning load for data/uber-raw-data-apr14.csv
ERROR:  date/time field value out of range: "4/13/2014 0:01:00"
HINT:  Perhaps you need a different "datestyle" setting.
CONTEXT:  COPY uber_trips_staging, line 15076, column pickup_datetime: "4/13/2014 0:01:00"
Tue Feb  9 14:32:16 EET 2016: finished raw load for data/uber-raw-data-apr14.csv

It looks as though this value exists in a number of records:

$ grep -n "4/13/2014 0:01:00" data/uber-raw-data-apr14.csv
15076:"4/13/2014 0:01:00",40.7075,-73.9483,"B02512"
98906:"4/13/2014 0:01:00",40.7342,-73.999,"B02598"
98907:"4/13/2014 0:01:00",40.6316,-73.8876,"B02598"
98908:"4/13/2014 0:01:00",40.7668,-73.9676,"B02598"
98909:"4/13/2014 0:01:00",40.7345,-74.0014,"B02598"
98910:"4/13/2014 0:01:00",40.7463,-73.983,"B02598"
98911:"4/13/2014 0:01:00",40.7059,-73.92,"B02598"
263998:"4/13/2014 0:01:00",40.7396,-74.0276,"B02617"
263999:"4/13/2014 0:01:00",40.7386,-74.0055,"B02617"
264000:"4/13/2014 0:01:00",40.737,-73.9882,"B02617"
264001:"4/13/2014 0:01:00",40.7139,-73.9598,"B02617"
422703:"4/13/2014 0:01:00",40.7409,-74.0075,"B02682"
422704:"4/13/2014 0:01:00",40.714,-73.9579,"B02682"
422705:"4/13/2014 0:01:00",40.7336,-74.0048,"B02682"
422706:"4/13/2014 0:01:00",40.7573,-73.9847,"B02682"
422707:"4/13/2014 0:01:00",40.7162,-73.9549,"B02682"
422708:"4/13/2014 0:01:00",40.7358,-73.998,"B02682"
422709:"4/13/2014 0:01:00",40.6923,-73.9878,"B02682"
422710:"4/13/2014 0:01:00",40.7325,-74.0035,"B02682"
422711:"4/13/2014 0:01:00",40.7335,-74.0043,"B02682"
422712:"4/13/2014 0:01:00",40.6776,-73.912,"B02682"
422713:"4/13/2014 0:01:00",40.6352,-73.9503,"B02682"

Just about every other month in 2014 for the Uber data is raising the same issue.

@marklit marklit changed the title from Uber 2014 date/time value is out of range to Date/time value is out of range in the Uber 2014 data Feb 9, 2016

@toddwschneider

This comment has been minimized.

Show comment
Hide comment
@toddwschneider

toddwschneider Feb 10, 2016

Owner

Hmm, I can't reproduce this and just confirmed it works okay on my setup. What OS and version of PostgreSQL are you using? I'm on Postgres 9.4.4 / OS X 10.10.5

Owner

toddwschneider commented Feb 10, 2016

Hmm, I can't reproduce this and just confirmed it works okay on my setup. What OS and version of PostgreSQL are you using? I'm on Postgres 9.4.4 / OS X 10.10.5

@marklit

This comment has been minimized.

Show comment
Hide comment
@marklit

marklit Feb 10, 2016

It's a fresh install of Ubuntu 14.04.3 LTS. PostgreSQL and Postgis were installed with the following command:

$ sudo apt-get update
$ sudo apt-get install postgresql-9.3-postgis-2.1

I downloaded the data on Saturday (2016-02-06).

marklit commented Feb 10, 2016

It's a fresh install of Ubuntu 14.04.3 LTS. PostgreSQL and Postgis were installed with the following command:

$ sudo apt-get update
$ sudo apt-get install postgresql-9.3-postgis-2.1

I downloaded the data on Saturday (2016-02-06).

@MatthewWilkes

This comment has been minimized.

Show comment
Hide comment
@MatthewWilkes

MatthewWilkes Feb 14, 2016

Apologies for jumping in to this without sufficient context, I was reading @marklit's blog post but haven't tried reproducing this myself.

@marklit Is it just the 13th that you get errors for, no other days of the month? It looks suspiciously like it's trying to import m/d/y dates as d/m/y. If it's using the machine's locale that may explain why it fails in London and works in NYC. Apologies if it's too basic a question, it's just that you haven't explicitly addressed it yet.

Apologies for jumping in to this without sufficient context, I was reading @marklit's blog post but haven't tried reproducing this myself.

@marklit Is it just the 13th that you get errors for, no other days of the month? It looks suspiciously like it's trying to import m/d/y dates as d/m/y. If it's using the machine's locale that may explain why it fails in London and works in NYC. Apologies if it's too basic a question, it's just that you haven't explicitly addressed it yet.

@toddwschneider

This comment has been minimized.

Show comment
Hide comment
@toddwschneider

toddwschneider Feb 14, 2016

Owner

@MatthewWilkes good idea, thanks!

@marklit can you try running with this change to set the datestyle to MDY: 0f08601

If that works, I'll merge into master

Owner

toddwschneider commented Feb 14, 2016

@MatthewWilkes good idea, thanks!

@marklit can you try running with this change to set the datestyle to MDY: 0f08601

If that works, I'll merge into master

@marklit

This comment has been minimized.

Show comment
Hide comment
@marklit

marklit Feb 15, 2016

It's importing without any complaints now. Thanks @toddwschneider.

Here's my steps in testing out 0f08601:

This was all done on a fresh Ubuntu 15 installation.

$ sudo apt-get install postgresql-9.4-postgis-2.1 postgis git unzip
$ git clone https://github.com/toddwschneider/nyc-taxi-data.git
$ cd nyc-taxi-data
$ git checkout 0f08601
$ vi minimal_downloads.txt

Contents of minimal_downloads.txt:

https://raw.githubusercontent.com/fivethirtyeight/uber-tlc-foil-response/master/uber-trip-data/uber-raw-data-apr14.csv
https://raw.githubusercontent.com/fivethirtyeight/uber-tlc-foil-response/master/uber-trip-data/uber-raw-data-janjune-15.csv.zip
$ cat minimal_downloads.txt | \
          xargs -n 1 -P 6 wget -P data/ &
$ sudo su - postgres -c "psql -c 'CREATE USER mark; ALTER USER mark WITH SUPERUSER;'"
$ ./initialize_database.sh
$ ./import_uber_trip_data.sh

Relevant output from import_uber_trip_data.sh:

Mon Feb 15 09:40:33 PST 2016: beginning load for data/uber-raw-data-apr14.csv
COPY 564516
Mon Feb 15 09:40:35 PST 2016: finished raw load for data/uber-raw-data-apr14.csv
SELECT 564516
CREATE INDEX
SELECT 554565
INSERT 0 564516
TRUNCATE TABLE
DROP TABLE
DROP TABLE
Mon Feb 15 09:40:47 PST 2016: loaded trips for data/uber-raw-data-apr14.csv

marklit commented Feb 15, 2016

It's importing without any complaints now. Thanks @toddwschneider.

Here's my steps in testing out 0f08601:

This was all done on a fresh Ubuntu 15 installation.

$ sudo apt-get install postgresql-9.4-postgis-2.1 postgis git unzip
$ git clone https://github.com/toddwschneider/nyc-taxi-data.git
$ cd nyc-taxi-data
$ git checkout 0f08601
$ vi minimal_downloads.txt

Contents of minimal_downloads.txt:

https://raw.githubusercontent.com/fivethirtyeight/uber-tlc-foil-response/master/uber-trip-data/uber-raw-data-apr14.csv
https://raw.githubusercontent.com/fivethirtyeight/uber-tlc-foil-response/master/uber-trip-data/uber-raw-data-janjune-15.csv.zip
$ cat minimal_downloads.txt | \
          xargs -n 1 -P 6 wget -P data/ &
$ sudo su - postgres -c "psql -c 'CREATE USER mark; ALTER USER mark WITH SUPERUSER;'"
$ ./initialize_database.sh
$ ./import_uber_trip_data.sh

Relevant output from import_uber_trip_data.sh:

Mon Feb 15 09:40:33 PST 2016: beginning load for data/uber-raw-data-apr14.csv
COPY 564516
Mon Feb 15 09:40:35 PST 2016: finished raw load for data/uber-raw-data-apr14.csv
SELECT 564516
CREATE INDEX
SELECT 554565
INSERT 0 564516
TRUNCATE TABLE
DROP TABLE
DROP TABLE
Mon Feb 15 09:40:47 PST 2016: loaded trips for data/uber-raw-data-apr14.csv

@toddwschneider toddwschneider closed this in #4 Feb 15, 2016

@toddwschneider

This comment has been minimized.

Show comment
Hide comment
@toddwschneider

toddwschneider Feb 15, 2016

Owner

Thanks, I merged the change

Owner

toddwschneider commented Feb 15, 2016

Thanks, I merged the change

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment