"Unknown variables or column: 'abs((trip_duration_min - pred_final))'" #4

Closed
triper1022 opened this issue Mar 7, 2022 · 3 comments

triper1022 commented Mar 7, 2022

Hi,

I am trying to reproduce the notebook vaex-taxi-ml-article.ipynb.

I followed your code almost exactly, but I ran into the error below and cannot find the cause.
I have shared my Colab notebook here.


KeyError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/vaex/scopes.py in evaluate(self, expression, out)
112 # logger.debug("try avoid evaluating: %s", expression)
--> 113 result = self[expression]
114 except KeyError:

34 frames
KeyError: "Unknown variables or column: 'abs((trip_duration_min - pred_final))'"

During handling of the above exception, another exception occurred:

KeyError Traceback (most recent call last)
KeyError: "Unknown variables or column: 'clip(predicted_duration_min, 3, 25)'"

During handling of the above exception, another exception occurred:

KeyError Traceback (most recent call last)
KeyError: "Unknown variables or column: 'incremental_prediction_function(PCA_pickup_0, PCA_pickup_1, PCA_dropoff_0, PCA_dropoff_1, standard_scaled_arc_distance, pickup_time_x, pickup_day_x, pickup_month_x, direction_angle_x, pickup_time_y, pickup_day_y, pickup_month_y, direction_angle_y, pickup_is_weekend)'"

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last)
in ()

in ()

in ()

/usr/local/lib/python3.7/dist-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
806 "Found array with %d sample(s) (shape=%s) while a"
807 " minimum of %d is required%s."
--> 808 % (n_samples, array.shape, ensure_min_samples, context)
809 )
810

ValueError: Found array with 0 sample(s) (shape=(0, 14)) while a minimum of 1 is required.

Author

triper1022 commented Mar 7, 2022

Sorry, I found the cause!
After I split the DataFrame into train and test sets with an index slice, it works again.

But I would still like to ask why we cannot split the DataFrame with a condition, as below. I hope this question isn't a bother.

df_train = df[df.pickup_datetime < np.datetime64('2015-01-01')]
df_test = df[df.pickup_datetime >= np.datetime64('2015-01-01')]

@JovanVeljanoski
Member

Hey,

Yeah, you can't really use filtering for the split, because a filter becomes part of the automatic pipeline and propagates to everything downstream, just like any other filtering done during the "data cleaning" phase.
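
To make the point above concrete, here is a minimal toy sketch in plain NumPy. It is not vaex itself: the sample timestamps and the `train_filter` helper are made up for illustration. It assumes the rows are sorted by pickup time, and it models a filter as a condition that gets replayed on new data. Under that reading, replaying the training condition on the test rows leaves zero rows, which is consistent with the `shape=(0, 14)` error in the traceback, while a positional split carries no condition along.

```python
import numpy as np

# Hypothetical toy data: pickup timestamps, assumed sorted in time order.
pickup_datetime = np.array(
    ["2014-12-30", "2014-12-31", "2015-01-01", "2015-01-02"],
    dtype="datetime64[D]",
)
boundary = np.datetime64("2015-01-01")


def train_filter(timestamps):
    """Stand-in for a filter that travels with the pipeline (illustrative only)."""
    return timestamps < boundary


# Condition-based split: the test rows are everything on or after the boundary.
test_rows = pickup_datetime[pickup_datetime >= boundary]

# If the training condition is replayed on the test rows, nothing survives.
rows_after_replay = test_rows[train_filter(test_rows)]

# Position-based split: find the first row at or after the boundary and slice.
split = int(np.searchsorted(pickup_datetime, boundary, side="left"))
train_rows = pickup_datetime[:split]
test_rows_sliced = pickup_datetime[split:]

print(len(rows_after_replay), split, len(train_rows), len(test_rows_sliced))
```

With a DataFrame sorted the same way, the corresponding split would be a plain slice such as `df[:split]` and `df[split:]`, which is the index-slice approach that worked earlier in this thread.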

@triper1022
Author

Thanks for the quick reply!
