Save Recipes from rezeptwelt.de #562

Closed
aschilling opened this issue Mar 15, 2014 · 5 comments
@aschilling

Hi everybody,

I am really amazed at how well data extraction works in wallabag. It is so good that I have started archiving my cooking recipes with it. However, one thing that does not work yet on rezeptwelt.de is extracting the ingredients section. Would it be possible, for articles from this site, to also extract this div:

<div class="global-active ingredients-box">

Thanks

Andy

@tcitworld
Member

We use special rule files for each website where extraction does not work out of the box. Have a look here for examples and here for a tutorial.

If you can't get it working, just tell us. ;)

@aschilling
Author

I agree that it is a special case. But don't you think we could collect all special extraction cases in a dedicated GitHub repo, which could someday be included in an official release?

@tcitworld
Member

You misunderstood me. We already include every special extraction case inside wallabag (that's what I was linking to above). Just let me find the time to study the website (or do it yourself and post the file here), and I will get it into the next release.

It may take me only seconds, but not tonight. ;)

@nicosomb nicosomb added the Bug label May 19, 2014
@nicosomb nicosomb added this to the 1.7.0 milestone May 19, 2014
@nicosomb nicosomb modified the milestones: 1.8.0, 1.7.0 May 29, 2014
@tcitworld tcitworld self-assigned this Jun 5, 2014
@nicosomb
Member

With this file, named rezeptwelt.de.txt:

body: //div[@class='step-content'] | //div[@class='global-active ingredients-box']
title: //div[@class='step-1-container']

tidy: no
test_url: http://www.rezeptwelt.de/backen-herzhaft-rezepte/w%C3%BCrstchen-schlangen/530372

I get the ingredients.
However, the recipe content appears twice, because the page source contains it twice ... Strange website.
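
To illustrate why the `body:` rule above picks up the ingredients and also yields duplicated steps, here is a minimal sketch in Python. It is not how wallabag evaluates site config files. The sample markup, `SAMPLE_PAGE`, `extract_body`, and `drop_duplicates` are all hypothetical, and because the standard library's ElementTree does not support the XPath `|` union operator, the two alternatives are evaluated one after the other (so results are grouped per expression rather than in document order).

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for a recipe page.
# The step text is repeated on purpose to mimic content that
# appears twice in the page source.
SAMPLE_PAGE = """
<html><body>
  <div class="step-1-container">Recipe title</div>
  <div class="global-active ingredients-box">500 g flour</div>
  <div class="step-content">Knead the dough.</div>
  <div class="step-content">Knead the dough.</div>
</body></html>
"""

# The two alternatives from the `body:` rule, split at the `|`.
BODY_XPATHS = [
    ".//div[@class='step-content']",
    ".//div[@class='global-active ingredients-box']",
]


def extract_body(page):
    """Return the text of every node matched by any body expression."""
    root = ET.fromstring(page)
    texts = []
    for xpath in BODY_XPATHS:
        for node in root.findall(xpath):
            texts.append("".join(node.itertext()).strip())
    return texts


def drop_duplicates(texts):
    """Remove repeated entries while keeping first-seen order."""
    return list(dict.fromkeys(texts))


matched = extract_body(SAMPLE_PAGE)
unique = drop_duplicates(matched)
print(matched)
print(unique)
```

Every node whose `class` attribute equals one of the two strings exactly is matched, so a block that occurs twice in the source is returned twice. Filtering repeated text afterwards, as `drop_duplicates` does here, is one possible workaround and only an assumption about how the duplication could be handled.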

@tcitworld tcitworld modified the milestones: 1.9.1, 1.9.0 Feb 10, 2015
@tcitworld tcitworld modified the milestones: 1.9.1, 1.9.2 May 22, 2015
@nicosomb nicosomb removed this from the 1.9.2 milestone Jan 24, 2016
@nicosomb nicosomb removed the Bug label Feb 19, 2016
@j0k3r
Member

j0k3r commented Apr 10, 2016

Works fine on v2.

@j0k3r j0k3r closed this as completed Apr 10, 2016