"requests.exceptions.HTTPError: 404 Client Error" while trying tapioca train-classifier #11
Comments
If you created the Solr collection yourself, then it probably lacks the You should run There might be a way to add the endpoint after the fact, having already ingested the dump in a collection - but I am not sure how! I will make it clearer in the docs that you should not create the Solr collection yourself.
Dear authors, Thanks for the quick explanation!
Actually, if I create the Solr collection first, then run Moreover, I also tried
I have also checked the status of Solr. It is the same as I described in my previous post, so I am not sure why there is a "Connection aborted" error. Could you please give some hints? Thanks a lot!
For the HTTP 400 error you get, there should be some logs available in the Solr web interface. Can you check there and report what exactly causes this Bad Request error?
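When the web interface is hard to reach, the reason for a Bad Request can often be read from the response body itself, because Solr normally returns a JSON document with an "error" entry alongside the status code. The following is a minimal sketch of that idea using only the Python standard library. The helper names `extract_solr_error` and `probe` are hypothetical, and the host, port and query in the example URL are assumptions to be adjusted to the actual setup.

```python
import json
import urllib.error
import urllib.request


def extract_solr_error(body: str) -> str:
    """Return the "msg" field of a Solr JSON error body, or the raw body."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Not JSON (for example an HTML error page): show it unchanged.
        return body.strip()
    error = payload.get("error") if isinstance(payload, dict) else None
    if isinstance(error, dict) and "msg" in error:
        return str(error["msg"])
    return body.strip()


def probe(url: str) -> str:
    """Request `url` and describe the outcome, including any error text."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return f"{response.status} OK"
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses still carry a body that explains the failure.
        body = exc.read().decode("utf-8", errors="replace")
        return f"{exc.code} {extract_solr_error(body)}"
    except OSError as exc:
        # Covers refused, reset or aborted connections.
        return f"connection failed: {exc}"


if __name__ == "__main__":
    # Assumed host, port and collection name; adjust to your installation.
    print(probe("http://localhost:8983/solr/collection_5/select?q=*:*&rows=0&wt=json"))
```

Running the script from the same terminal that runs Solr prints either the status of a successful request or the message Solr attached to the failure.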
Hi, here are the logs for the program (I use a Linux terminal to run Solr and OpenTapioca):
(I name the collection as
Thanks! The Solr logs themselves should be accessible on the Solr web interface. By default it runs at http://hostname:8983/solr/.
Thanks a lot for the reply!
Therefore, I guess that the error is caused by the data. To skip the problematic data, is it fine to add a try/except around line 121? P.S. I am running the experiments on a Linux server without a graphical interface, so I cannot reach the web interface. I will try that method later if the above solution does not help.
Add exception handling for line 121
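For readers who want to see the general shape of such a guard, here is a minimal sketch. It is not the change that was made at line 121, whose contents are not shown in this thread. It only illustrates the pattern of processing items one at a time and skipping those that raise. The function name `process_skipping_failures` and its parameters are hypothetical.

```python
import logging
from typing import Callable, Iterable, Iterator, Tuple, Type, TypeVar

T = TypeVar("T")
R = TypeVar("R")

logger = logging.getLogger(__name__)


def process_skipping_failures(
    items: Iterable[T],
    process: Callable[[T], R],
    skip_on: Tuple[Type[Exception], ...] = (Exception,),
) -> Iterator[R]:
    """Yield process(item) for each item, skipping items that raise."""
    for position, item in enumerate(items):
        try:
            result = process(item)
        except skip_on as exc:
            # Log the failure so skipped items remain visible afterwards.
            logger.warning("Skipping item %d: %s", position, exc)
            continue
        yield result


if __name__ == "__main__":
    # Toy input: the second entry cannot be parsed and is skipped.
    print(list(process_skipping_failures(["1", "oops", "3"], int, skip_on=(ValueError,))))
```

Passing a narrow `skip_on` tuple, rather than relying on the broad default, keeps unrelated bugs from being silently swallowed.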
@heathersherry wonderful! Do you think you could create a pull request for that change? I think it would make a lot of sense!
Sure! Thanks again for creating such a great project. :)
Yes, you should be able to create a pull request by first creating a fork of this repository in your own account, pushing your change there and then creating the pull request. Alternatively, if you only want to propose a change to a single file (as is the case here), you should be able to view that file on GitHub and use the edit link there.
Fixed Issue opentapioca#11.
I also get the same error when I want to create a collection.
I am getting the same error as just above, but in the log there are a few NoSuchFileExceptions about solr-9.0.0/lib and /dist and then
Dear authors,
Thanks for sharing the great project.
I followed the documentation of this project to run it. Everything went smoothly until I tried to train a classifier on the dataset.
I created a Solr collection named collection_5 and ran:
bunzip2 < latest-all.json.bz2 | tapioca index-dump collection_5 - --profile profiles/human_organization_place.json
Everything worked well: the Wikidata dump was indexed in the Solr collection successfully.
Then I tried this command to get the classifier:
tapioca train-classifier -c collection_5 -b data/wd_2019-02-24.bow.pkl -p data/wd_2019-02-24.pgrank.npy -d data/merged_RSS-500_and_istex_train.ttl -o data/rss_istex_classifier.pkl
It fails with this error information:
(I put the OpenTapioca project in the folder /data2/xxx/related_work.)
Could you please give some hints for solving this problem? Is it a problem caused by Solr? I have checked the status of Solr, and everything seems to be working well.
Thanks a lot!