
Out of bound errors on dates from import job #1242

Open
rpfernandezjr opened this issue Feb 17, 2021 · 3 comments

Comments

@rpfernandezjr

Cronjob Import Error

When the import cron job runs and fetches data from Unizin Data Warehouse (UDW), it sometimes fails because of out-of-bounds dates.

Description

In PostgreSQL, 0021-03-01 04:55:00 is a valid timestamp:

      due_date       |     local_date      |           name            |     course_id     |        id        | points_possible | assignment_group_id |       ad_id
---------------------+---------------------+---------------------------+-------------------+------------------+-----------------+---------------------+-------------------
 0021-03-01 04:55:00 | 0021-02-28 23:22:49 | Lab and Medical Math Quiz | 10160000000444444 | 1016000000444444 |               5 |   10160000000666666 | 10160000007777777

When the cron job pulls this row and tries to insert it into the MySQL database, the entire process fails with:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/pandas/core/arrays/datetimes.py", line 1858, in objects_to_datetime64ns
    values, tz_parsed = conversion.datetime_to_datetime64(data)
  File "pandas/_libs/tslibs/conversion.pyx", line 198, in pandas._libs.tslibs.conversion.datetime_to_datetime64
  File "pandas/_libs/tslibs/np_datetime.pyx", line 117, in pandas._libs.tslibs.np_datetime.check_dts_bounds
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 21-03-01 04:55:00
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/code/dashboard/cron.py", line 61, in util_function
    df.to_sql(con=engine, name=mysql_table, if_exists='append', index=False)
  File "/usr/local/lib/python3.8/site-packages/pandas/core/generic.py", line 2653, in to_sql
    sql.to_sql(
  File "/usr/local/lib/python3.8/site-packages/pandas/io/sql.py", line 512, in to_sql
    pandas_sql.to_sql(
  File "/usr/local/lib/python3.8/site-packages/pandas/io/sql.py", line 1306, in to_sql
    table = SQLTable(
  File "/usr/local/lib/python3.8/site-packages/pandas/io/sql.py", line 622, in __init__
    self.table = self._create_table_setup()
  File "/usr/local/lib/python3.8/site-packages/pandas/io/sql.py", line 868, in _create_table_setup
    column_names_and_types = self._get_column_names_and_types(self._sqlalchemy_type)
  File "/usr/local/lib/python3.8/site-packages/pandas/io/sql.py", line 858, in _get_column_names_and_types
    column_names_and_types += [
  File "/usr/local/lib/python3.8/site-packages/pandas/io/sql.py", line 859, in <listcomp>
    (str(self.frame.columns[i]), dtype_mapper(self.frame.iloc[:, i]), False)
  File "/usr/local/lib/python3.8/site-packages/pandas/io/sql.py", line 970, in _sqlalchemy_type
    if col.dt.tz is not None:
  File "/usr/local/lib/python3.8/site-packages/pandas/core/accessor.py", line 85, in _getter
    return self._delegate_property_get(name)
  File "/usr/local/lib/python3.8/site-packages/pandas/core/indexes/accessors.py", line 62, in _delegate_property_get
    values = self._get_values()
  File "/usr/local/lib/python3.8/site-packages/pandas/core/indexes/accessors.py", line 53, in _get_values
    return DatetimeIndex(data, copy=False, name=self.name)
  File "/usr/local/lib/python3.8/site-packages/pandas/core/indexes/datetimes.py", line 245, in __new__
    dtarr = DatetimeArray._from_sequence(
  File "/usr/local/lib/python3.8/site-packages/pandas/core/arrays/datetimes.py", line 313, in _from_sequence
    subarr, tz, inferred_freq = sequence_to_dt64ns(
  File "/usr/local/lib/python3.8/site-packages/pandas/core/arrays/datetimes.py", line 1754, in sequence_to_dt64ns
    data, inferred_tz = objects_to_datetime64ns(
  File "/usr/local/lib/python3.8/site-packages/pandas/core/arrays/datetimes.py", line 1863, in objects_to_datetime64ns
    raise e
  File "/usr/local/lib/python3.8/site-packages/pandas/core/arrays/datetimes.py", line 1848, in objects_to_datetime64ns
    result, tz_parsed = tslib.array_to_datetime(
  File "pandas/_libs/tslib.pyx", line 481, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslib.pyx", line 698, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslib.pyx", line 694, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslib.pyx", line 566, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslibs/np_datetime.pyx", line 117, in pandas._libs.tslibs.np_datetime.check_dts_bounds
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 21-03-01 04:55:00

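From what I can gather, these dates are simply too far back to be imported. Note that the traceback shows the exception is raised by pandas (OutOfBoundsDatetime inside df.to_sql) before anything reaches MySQL: pandas stores timestamps as 64-bit nanosecond counts, which only cover roughly 1677-09-21 to 2262-04-11. The same thing happens when a date is too far in the future, e.g. 4712-12-11 00:00:00.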
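The limit in the traceback comes from pandas' datetime64[ns] type: each value is a signed 64-bit count of nanoseconds since 1970-01-01, so only about 292 years either side of the epoch are representable (pandas exposes the exact ends as pd.Timestamp.min and pd.Timestamp.max). As a minimal standard-library sketch, the bounds can be derived from that int64 arithmetic and checked against the two dates from this report. NS_MIN, NS_MAX and fits_datetime64ns are illustrative names, and the bounds are truncated to microseconds because Python datetimes carry no nanoseconds.

```python
from datetime import datetime, timedelta

# datetime64[ns] stores signed 64-bit nanoseconds since 1970-01-01.
# The most negative int64 is reserved for NaT, so the usable span is
# +/-(2**63 - 1) ns. Python datetimes only hold microseconds, so the
# span is truncated toward zero to stay inside the representable range.
EPOCH = datetime(1970, 1, 1)
_MAX_US = (2**63 - 1) // 1000  # 9_223_372_036_854_775 microseconds

NS_MIN = EPOCH - timedelta(microseconds=_MAX_US)
NS_MAX = EPOCH + timedelta(microseconds=_MAX_US)


def fits_datetime64ns(value: datetime) -> bool:
    """Return True if a naive datetime can be held as datetime64[ns]."""
    return NS_MIN <= value <= NS_MAX


if __name__ == "__main__":
    print(NS_MIN)  # 1677-09-21 00:12:43.145225
    print(NS_MAX)  # 2262-04-11 23:47:16.854775
    for raw in ("0021-03-01 04:55:00", "4712-12-11 00:00:00", "2021-03-01 04:55:00"):
        parsed = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")
        print(raw, fits_datetime64ns(parsed))
```

Both dates from this issue fall outside that window, while the same timestamp in year 2021 fits. Python's own datetime accepts years 1 through 9999, which is why the row is read from PostgreSQL without complaint and only fails once pandas converts the column.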

Steps to Reproduce:

  1. Set due_at to 4712-12-11 00:00:00 for a row in the assignment_dim table, or set date_start in the enrollment_term_dim table for a course_id.
  2. Run the cron job.

Expected Behavior

I don't think an incompatible date should make the cron job give up entirely. IMO it should log an error or warning and continue importing the valid records.
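One way to get this log-and-continue behaviour is to screen rows before they are handed to df.to_sql. The sketch below assumes rows arrive as dicts holding naive datetime values (as in the psql output above, which has no time zone); split_importable_rows, the "dashboard.cron" logger name and the row layout are hypothetical and not taken from MyLA's dashboard/cron.py.

```python
import logging
from datetime import datetime

# Hypothetical logger name; MyLA's real cron module may log differently.
logger = logging.getLogger("dashboard.cron")

# Ends of pandas' datetime64[ns] range, truncated to microseconds.
NS_MIN = datetime(1677, 9, 21, 0, 12, 43, 145225)
NS_MAX = datetime(2262, 4, 11, 23, 47, 16, 854775)


def split_importable_rows(rows, date_columns):
    """Split rows into (importable, rejected) by checking their date columns.

    Assumes naive datetimes; None and non-datetime values are left alone.
    Each rejected row is logged as a warning instead of aborting the import.
    """
    importable, rejected = [], []
    for row in rows:
        bad_columns = [
            col
            for col in date_columns
            if isinstance(row.get(col), datetime)
            and not NS_MIN <= row[col] <= NS_MAX
        ]
        if bad_columns:
            logger.warning(
                "Skipping row id=%s: out-of-range date in %s",
                row.get("id"),
                ", ".join(bad_columns),
            )
            rejected.append(row)
        else:
            importable.append(row)
    return importable, rejected


if __name__ == "__main__":
    rows = [
        {"id": 1, "due_date": datetime(21, 3, 1, 4, 55)},
        {"id": 2, "due_date": datetime(2021, 3, 1, 4, 55)},
        {"id": 3, "due_date": None},
    ]
    good, bad = split_importable_rows(rows, ["due_date"])
    print([r["id"] for r in good], [r["id"] for r in bad])  # [2, 3] [1]
```

Only the importable rows would then be turned into a DataFrame and written. An alternative that stays inside pandas is pd.to_datetime(..., errors="coerce"), which the pandas documentation says forces out-of-bounds dates to NaT; that keeps the row but silently blanks the date, so skipping versus nulling such rows is a judgment call for the importer.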

rpfernandezjr changed the title from "Out of bound dates on" to "Out of bound errors on dates from import job" on Feb 17, 2021
@zqian
Member

zqian commented Feb 17, 2021

@rpfernandezjr are you using Postgres for MyLA there?

@pelaprat

No, we're using a MySQL database for MyLA. We also keep the Canvas Data in a PostgreSQL database (not for MyLA, but for other purposes).

jonespm added this to To do in MyLA-2021.02.02 via automation Feb 18, 2021
jennlove-um added this to To do in MyLA-2021.03.01 via automation Mar 11, 2021
jennlove-um removed this from To do in MyLA-2021.02.02 Mar 11, 2021
jennlove-um removed this from To do in MyLA-2021.03.01 Jul 8, 2021
jennlove-um added this to To do in MyLA-Default-Project via automation Jul 8, 2021
jennlove-um self-assigned this Oct 14, 2022
@jennlove-um
Contributor

Assigned to myself for testing.
