Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add python scrapers #9984

Merged
merged 3 commits into from Jan 7, 2017
Merged

Add python scrapers #9984

merged 3 commits into from Jan 7, 2017

Conversation

@notspiff
Copy link
Contributor

notspiff commented Jun 15, 2016

I saw this mentioned in some other PR so i figured i would re-offer my implementation.

This adds python based scrapers. I have rebased it and slightly modernized it and it appears to still work fine on the surface at least.

Sure, this is far from optimal. But it has one very obvious strength: it works within the limits set by the current system. There are of course a bunch of things that can be done more sane if XML support is removed (batch episode processing comes to mind, and converting to XML so some classes can parse is kinda stupid) . The idea was that this allows a seamless transition.

@stefansaraev

This comment has been minimized.

Copy link
Contributor

stefansaraev commented Jun 15, 2016

nice

libxml2 stays, as long libplist (requirment for airplay) stays, libxslt is another story

passing pure json data around (as json seems to be the future of scrapers) is next step, eventualy, I guess.

@ScudLee @Razzeee

@notspiff

This comment has been minimized.

Copy link
Contributor Author

notspiff commented Jun 15, 2016

uhm, why would you do that. the whole point of the scraper is to translate to a common format.

@stefansaraev

This comment has been minimized.

Copy link
Contributor

stefansaraev commented Jun 15, 2016

right. slap me for talking too soon before reading

@Razzeee

This comment has been minimized.

Copy link
Member

Razzeee commented Jun 16, 2016

Without looking at the code, oh how much I would love that feature! Thank you.

liz.setProperty('video.tags', 'Very / Bad')
liz.setProperty('video.studio', 'Studio1 / Studio2')
liz.setProperty('video.fanarts', '2')
liz.setProperty('video.fanart1.url', 'DefaultBackFanart.png')

This comment has been minimized.

Copy link
@phil65

phil65 Jun 16, 2016

Member

Wouldnt we need a 2nd prefix in order to indicate that something should get written into the art table? (so sth like "video.art.fanart" in this case)
EDIT: ignore, that's probably what action = "getartwork" is for

This comment has been minimized.

Copy link
@notspiff

notspiff Jun 16, 2016

Author Contributor

Dont follow. All fanarts are stored based on the cfanart member afaik.
16. jun. 2016 09:37 skrev "Philipp Temminghoff" notifications@github.com:

In addons/metadata.demo.movies/demo.py
#9984 (comment):

  •    liz.setProperty('video.actor1.name', 'spiff')
    
  •    liz.setProperty('video.actor1.role', 'himself')
    
  •    liz.setProperty('video.actor1.sort_order', '2')
    
  •    liz.setProperty('video.actor1.thumb', '/home/akva/Pictures/fish.jpg')
    
  •    liz.setProperty('video.actor1.thumb_aspect', 'banner')
    
  •    liz.setProperty('video.actor2.name', 'monkey')
    
  •    liz.setProperty('video.actor2.role', 'orange')
    
  •    liz.setProperty('video.actor2.sort_order', '1')
    
  •    liz.setProperty('video.actor1.thumb_aspect', 'poster')
    
  •    liz.setProperty('video.actor2.thumb', '/home/akva/Pictures/coffee.jpg')
    
  •    liz.setProperty('video.set_name', 'Spiffy creations')
    
  •    liz.setProperty('video.set_overview', 'Horrors created by spiff')
    
  •    liz.setProperty('video.tags', 'Very / Bad')
    
  •    liz.setProperty('video.studio', 'Studio1 / Studio2')
    
  •    liz.setProperty('video.fanarts', '2')
    
  •    liz.setProperty('video.fanart1.url', 'DefaultBackFanart.png')
    

Wouldnt we need a 2nd prefix in order to indicate that something should
get written into the art table? (so sth like "video.art.fanart" in this
case)


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
https://github.com/xbmc/xbmc/pull/9984/files/94362ac88617f4c7cd9157aff6bbbd22cd3b30ea#r67299282,
or mute the thread
https://github.com/notifications/unsubscribe/AFktYDUUQu_PIuw5QSzAEspjhAZUXYDfks5qMP1GgaJpZM4I2zXQ
.

@garbear

This comment has been minimized.

Copy link
Member

garbear commented Jun 19, 2016

If this needs testing, I can cherry pick into my RetroPlayer branch. This excites me, because Heimdall is now a possibility. It'd be awesome if, when RetroPlayer drops in v18, we could include a full blown game library populated using Heimdall and this PR.

@notspiff

This comment has been minimized.

Copy link
Contributor Author

notspiff commented Jun 20, 2016

please give it a good run. there surely are issues expected to pop up when the scraping isn't instant.

@akva2

This comment has been minimized.

Copy link
Contributor

akva2 commented Jan 6, 2017

i need a volunteer pythonista to do the scrapers. even though i could probably spit em out i do not want to maintain those on top of all the other addons in my ballpark...

@ronie

This comment has been minimized.

Copy link
Member

ronie commented Jan 6, 2017

i wouldn't mind to create a music scraper, sounds like fun :-)

perhaps others can chime in too, iirc @phil65 showed some interest as well?

@phate89

This comment has been minimized.

Copy link
Contributor

phate89 commented Jan 6, 2017

I can help with tvshow/movie scraper addons for sure.. I think @phil65 already wrote some code for themoviedb for his extendedinfo addon.. I whink we need this mainly because noone knows in the team anymore how old scrapers even work..

notspiff added 3 commits Jun 15, 2016
no need to lock if this item is not on screen
these are implemented like plugin sources.
scraper operation is orchestrated as a virtual file system.
for debugging / documentation purposes only.

REMOVE THIS COMMIT
@notspiff notspiff force-pushed the notspiff:add_python_scrapers branch from 94362ac to 415e128 Jan 6, 2017
@akva2

This comment has been minimized.

Copy link
Contributor

akva2 commented Jan 6, 2017

great, thanks guys! branch is rebased, i tested the movie bit and it appeared to still worked fine. shout if anything's unclear from the examples, you see things that could be done better or whatever. i know the (new) unique_id stuff for video tags needs more work, just did the quickie for now.

@akva2

This comment has been minimized.

Copy link
Contributor

akva2 commented Jan 6, 2017

some easy whoopsies:

  1. the demo scrapers are not put into the build dir. copy them manually
  2. remember to enable the scrapers using the browser (my favorite new functionality, it's so fun)
@jjd-uk

This comment has been minimized.

Copy link
Member

jjd-uk commented Jan 6, 2017

Maybe @olympia or @ScudLee might have an interest in this as well.

@Paxxi

This comment has been minimized.

Copy link
Member

Paxxi commented Jan 6, 2017

Looks good to me. Want to hear about the tests and then we can likely merge this and start cleaning up the scraper system internally.

@Paxxi Paxxi merged commit 7b4c559 into xbmc:master Jan 7, 2017
2 checks passed
2 checks passed
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@MartijnKaijser MartijnKaijser added this to the L 18.0-alpha1 milestone Jan 8, 2017
@phate89

This comment has been minimized.

Copy link
Contributor

phate89 commented Jan 9, 2017

@notspiff I was looking into implementing a first scraper when I got a question.
With setinfo https://github.com/xbmc/xbmc/blob/master/xbmc/interfaces/legacy/ListItem.cpp#L333 we can already set all the infotag variables directly from python (at least with video files). you added new fileitem properties that we need to import one by one. There's a reason to do this instead of using directly the infotag we can set directly from Python?

@akva2

This comment has been minimized.

Copy link
Contributor

akva2 commented Jan 9, 2017

i opted for info labels to keep one common interface. you can use the tags directly, but then you need to mix it with info labels for the stuffs that doesn't directly apply to the tag (thumbs et al). no strong opinion on the approach as such, but that was the rationale.

@phate89

This comment has been minimized.

Copy link
Contributor

phate89 commented Jan 9, 2017

The main problem I found is that it becomes hard with complex fields and we have already specific functions. For example you can add more than one rating but you can give them a name and set it to default too.
a thing like this is easier with a "setrating" function than with separate strings..
Some other opinion on this?

@akva2

This comment has been minimized.

Copy link
Contributor

akva2 commented Jan 9, 2017

well, it's very likely that i'd make the other choice if i were to develop the code today. none of those existed back when i took the decision, but i certainly see why you want to use them now. it shouldn't be too hard to move it over, current code does info labels -> info tag -> xml serialization (for stupid reasons),
you'd just skip the first step there.

@phate89

This comment has been minimized.

Copy link
Contributor

phate89 commented Jan 9, 2017

I understand that and it's not for sure a complaint :) it's something needed from a long time.. but since we just started I think we should start from the best place possible.
If everyone agree I can do the changes needed. But I just want to see if everyone think it's a positive change or not and worth the trouble.. @ronie @phil65 any opinion on this?

@ronie

This comment has been minimized.

Copy link
Member

ronie commented Jan 9, 2017

sounds good. python offers multiple methods we could use

  • setInfo()
  • setRating()
  • setCast()
  • setArt()

setProperty() would still be needed for specific artist/album infolabels though...

@phate89

This comment has been minimized.

Copy link
Contributor

phate89 commented Jan 9, 2017

Set property is needed also for video thumbs/fanarts right now since SetArt does a different thing (if I'm not wrong). Anyway could be a good moment to fill the Python API gaps..

@MartijnKaijser

This comment has been minimized.

Copy link
Member

MartijnKaijser commented Jan 9, 2017

setArt should exactly do that and is supposed to replace setProperty
https://codedocs.xyz/xbmc/xbmc/group__python__xbmcgui__listitem.html#gad3f9b9befa5f3d2f4683f9957264dbbe

@phate89

This comment has been minimized.

Copy link
Contributor

phate89 commented Jan 9, 2017

I was referring to https://github.com/xbmc/xbmc/blob/master/xbmc/video/VideoInfoTag.h#L266
It's a xml that's separate than usual artworks. Not sure for what it's used though..

@phil65

This comment has been minimized.

Copy link
Member

phil65 commented Jan 9, 2017

I would also be in favour of properly using VideoInfoTag etc instead of just using listitem properties. Ideally listitem creation would look the same as for plug-ins.
For most parts that shouldnt be hard to change, not sure what to do with fanartX.dim, fanartX.preview properties (and some more) though.. That stuff isnt part of our python API. Perhaps @notspiff could give us some pointers here.

@notspiff notspiff deleted the notspiff:add_python_scrapers branch Jan 10, 2017
@notspiff

This comment has been minimized.

Copy link
Contributor Author

notspiff commented Jan 10, 2017

well at its core the problem is that setArt() is meant to set the active artwork for an item, while we want to set the available artwork for an item. so either you have to extend setArt to allow the latter, or you have to add an additional binding.

@Montellese

This comment has been minimized.

Copy link
Member

Montellese commented Jan 28, 2017

Shouldn't the last commit have been removed before merging? ;-)

@ronie

This comment has been minimized.

Copy link
Member

ronie commented Feb 4, 2017

@notspiff

i'm working on python artist and album scrapers...
if the album scraper returns a musicbrainz artist id, kodi crashes

i think kodi tries to parse the artist scraper .py file as an xml file,
trying to find out if the artist scraper can search for an artist by musicbrainz id.

http://paste.ubuntu.com/23921807/

edit: this only happens if 'prefer online information' is enabled.

@notspiff

This comment has been minimized.

Copy link
Contributor Author

notspiff commented Feb 4, 2017

@DaveTBlake

This comment has been minimized.

Copy link
Member

DaveTBlake commented Feb 4, 2017

trying to find out if the artist scraper can search for an artist by musicbrainz id.

Artist names are very non-unique e.g. MB has 23 bands called "Eclipse". AFAIK Universal scraper attempts to get an artist MBID (if it doesn't already have one in the lib from song file tagging) and uses it to access other online sources for artist info.

I also wonder if getting the artist scraper to return artist disambiguation data is of possible relevence here too? Again, the desc that MB has, so I assume must also return in their API, that allows you to judge which of the 23 bands called "Eclipse" is the right one.

But that may not have been what you were asking, my scraper knowledge is very thin so sorry if this is clutter.

@ronie

This comment has been minimized.

Copy link
Member

ronie commented Feb 4, 2017

@DaveTBlake i'll open a thread on the forum where we can discuss to future of music scraping.
i only started looking into it a day ago and have a few questions myself.

would certainly love to hear your ideas/opinions as well.

@faush01

This comment has been minimized.

Copy link
Contributor

faush01 commented on 7c994ce Feb 20, 2017

Is it safe to use this for creating items off screen and then sending then to screen, like in an addon that displays a bunch of movies from a media server?

Could you comment on this in the thread please, I just don't want to start using this if it is not intended to be used like that.

http://forum.kodi.tv/showthread.php?tid=307394

@@ -68,6 +69,9 @@ namespace XBMCAddonUtils
};

#define LOCKGUI XBMCAddonUtils::GuiLock __gl
#define LOCKGUIIF(cond) std::unique_ptr<XBMCAddonUtils::GuiLock> __gl; \

This comment has been minimized.

Copy link
@FernetMenta

FernetMenta Jul 15, 2017

Member

I really hate those crappy macros. sorry, I couldn't resist :)
The problem here is that this crappy marco does not only was its name suggests. It also exits the GIL.

This comment has been minimized.

Copy link
@akva2

akva2 Jul 15, 2017

Contributor

how is that a problem ? the code is right there to read, one time instead of many. the code would be just as crappy, only pasted all over, it if was not put in a macro.

This comment has been minimized.

Copy link
@FernetMenta

FernetMenta Jul 15, 2017

Member

LOCKGUIIF(cond)
vs
XBMCAddonUtils::GuiLock _gl(cond)

I really don't see an advantage of a macro here.

But this is not the actual problem. The ctor of GuiLock also exits the GIL. If you passed cond as an argument to the ctor, you could leave the gui lock but still exit gil.

This comment has been minimized.

Copy link
@akva2

akva2 Jul 15, 2017

Contributor

it was 2 lines of code that had to be pasted all over. macro use was a no brainer. the whole point of this macro IS to lock the GIL. we have an on-screen listitem and it needs to not be accessed by any thread in the the python interpreter while we update it on the c++ side.

This comment has been minimized.

Copy link
@FernetMenta

FernetMenta Jul 15, 2017

Member

LOCKGUI unlocks GIL as we do in every other python interface function. Now with offscreen use, GIL is not unlocked. Calling into Kodi without unlocking GIL can lead to deadlocks.

This comment has been minimized.

Copy link
@FernetMenta

FernetMenta Jul 15, 2017

Member

another problem is that LOCKGUI on list items seems to be broken in general because there is no language hook in thread local storage.

This comment has been minimized.

Copy link
@Paxxi

Paxxi Jul 15, 2017

Member

Afaik all our python hooks call unblock_threads before calling into Kodi itself, trying to lock the Gil again here might be problematic. Testing will show i guess

This comment has been minimized.

Copy link
@FernetMenta

FernetMenta Jul 15, 2017

Member

GIL is held here and should be unlocked, not locked. I will submit a PR soon.

@jeryuni

This comment has been minimized.

Copy link

jeryuni commented Feb 13, 2018

image

"liz.setProperty('video.date_added', '2016-01-01')" does not yet work in 'metadata.demo.movies / demo.py'.
Is this a bug?
Does 'date_added' mean 'premiered'?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
You can’t perform that action at this time.