Getting error at end of training: AbstractFeatureResourceE does not exist. [Op:SimpleMLModelTrainer] #4
Sorry you bumped into this error. Let me see if we can reproduce it here. |
Can you print the first few rows of PS: I was able to get the same error with a dataset that has no features, i.e. a dataset that only contains the label. If this is the issue for you, this is a "terrible error message" type of bug :) |
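The "no features" case described above can be ruled out before training. The following is a minimal sketch, not part of TensorFlow Decision Forests: `feature_columns` is a hypothetical helper, and the column names are made up for illustration. With a pandas DataFrame you would pass `list(train_df.columns)` as the first argument.

```python
def feature_columns(columns, label):
    """Return the columns usable as input features (everything except the label)."""
    features = [c for c in columns if c != label]
    if not features:
        # This is the situation that produced the confusing error in this thread.
        raise ValueError(
            f"No feature columns found: the dataset only contains the label {label!r}."
        )
    return features


# Hypothetical column names, for illustration only.
print(feature_columns(["x", "label"], label="label"))
```

Failing early with an explicit message is easier to diagnose than the `AbstractFeatureResourceE does not exist` error raised at the end of training.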
I am also getting this issue on both Pop!_OS (with a GTX 1080 Ti) and Ubuntu 20.04.2 LTS on a VPS without a video card. I was attempting to follow the example code posted here with my own dataset with a single column. |
Is it the case that you have no features (except the label), as @achoum mentioned? Can you print the first few rows of |
@achoum Sure, please find the result of train_df.head(3) below:
3 rows × 107 columns |
Second hypothesis: does one of the column names contain a comma?
# List the column names containing a comma.
print([column for column in train_df.columns if "," in column])
If this is the case, and until we fix it (in the next release; ETA: next week), you can temporarily remove the commas from the feature names with the following code: ...
train_df = pd.read_csv(timeseries_file_path,usecols=csv_feature_columns,nrows=10000)
train_df = train_df.rename(mapper=lambda x: x.replace(",", "<comma>"), axis="columns")
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_df, label="total_site_electricity_kwh")
model = tfdf.keras.RandomForestModel()
model.fit(train_ds)
This is already the second bug you helped us identify :) Thanks, both! |
Yes, it turns out 13 of the column names did have a comma. Thanks for the temporary solution. I will look out for your fix in the code. |
@achoum I am getting this new error on training the model after fixing the commas in the column names.
|
Thanks for the follow up :)! This next error is less obvious. I suspect a problem with IO, but I need some more details to figure it out. Do you mind telling me:
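import pandas as pd
import tensorflow_decision_forests as tfdf

# Tiny toy dataset: one numerical feature and a binary label.
ds = pd.DataFrame({"x": [1, 2, 3, 4], "label": [0, 1, 0, 1]})
model = tfdf.keras.RandomForestModel(num_trees=10)
model.fit(tfdf.keras.pd_dataframe_to_tf_dataset(ds, label="label"))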
If you use IPython, the logs will appear by default. If you use Colab, you can run:
!pip install wurlitzer -U
from wurlitzer import sys_pipes
with sys_pipes():
    model.fit(train_ds)
Can you share the result of the following command (with your directory path instead of !ls -l -R /tmp/tmpxsmqm10p Cheers, |
If I run 'model.fit' by itself, I get the following error:
Yes, this is the first model.
Yes, I was able to train it. I found that the problem only occurs when I use a large number of training samples. When I tried a few hundred samples from my dataset, it trained without error. But when I train with 10000 training samples, I get the error. Also, I am using JupyterLab, but I don't see the logs in the tmp folder. Do I need to use the code you mentioned above to get the logs? |
Thanks for the extra details.
Interesting.
Yes, running the two pieces of code I mentioned above would be very informative for us.
!pip install wurlitzer -U
from wurlitzer import sys_pipes
# Assuming `train_ds` is the training dataset.
with sys_pipes():
    model.fit(train_ds)
The logs will be informative. |
I ran the following and got logs. From the logs, I can understand the model is training. But after training with 10000 samples I am getting the error as before.
Please find the result of: ls -l -R /tmp/tmp18kaly80 after I train with 10000 samples. Also, I just noticed that you need to specify the task as an argument when instantiating the model. My task was a regression. However, from the logs, it seems to use a classification loss by default. Could that have been the problem? I will run it again after specifying it as a regression task.
|
@achoum It seems the problem was due to the wrong task specification: my task is a regression, but the model was using the default, classification. After I made the following modifications, the model trains and is returned without error, even with more than 10000 samples.
|
That's great. Thanks for the details. The classification model trained on the regression target was treating each distinct target value as a separate class. This is much harder to generalize from, and it explains the extremely large model size for a small (10k examples) dataset. I'll add a warning (and possibly an error for the extreme case) when a situation like this is detected. Thanks again. |
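A check of the kind described above could look roughly like the sketch below. This is an illustrative heuristic only, not the warning that was actually added to the library: the function name and the `max_classes` threshold are assumptions. With a pandas DataFrame you would pass the label column, e.g. `train_df["total_site_electricity_kwh"]`.

```python
def looks_like_regression_label(values, max_classes=100):
    """Heuristic: True if a numeric label has too many distinct values to be classes.

    `max_classes` is an arbitrary illustrative threshold, not a library default.
    """
    values = list(values)
    # Only numeric labels are candidates; booleans and strings are treated as classes.
    if not values or not all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    ):
        return False
    return len(set(values)) > max_classes


# Hypothetical labels: 1000 distinct floats vs. a binary label.
continuous = [i * 0.5 for i in range(1000)]
binary = [0, 1] * 500

if looks_like_regression_label(continuous):
    # In TF-DF the task is chosen at construction time, e.g.
    # tfdf.keras.RandomForestModel(task=tfdf.keras.Task.REGRESSION)
    print("Label looks continuous; consider a regression task.")
print(looks_like_regression_label(binary))
```

A threshold on the number of distinct values is crude, since a legitimate classification problem can have many classes, which is why a warning rather than a hard error is the safer default.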
The bugs and the non-intuitive configuration problems have been solved in the 0.1.5 release. Thanks all :) |
@achoum Thank you for fixing the issues with the new release! Closing this issue. |
I am getting the following error when I try a simple model.