Replace Eyed3 with Mutagen #47

Closed
wants to merge 13 commits into
from

Projects

None yet

3 participants

@dbrgn
Collaborator

OK, this should solve #1, #27 and #38.

I rewrote ID3Manager and replaced EyeD3 with Mutagen.

ID3Manager is now called MetadataManager. Getter and setter methods with immediate persistence have been replaced with properties and a separate save() method.

The indexer uses docopt instead of its own argument parsing. I cherry-picked the commit by @felixhummel.

This pull request is probably not 100% ready for merge yet, as I didn't really test details of the indexer. And the models are not tested at all yet. Storing/updating metadata probably fails at the moment, as I've introduced a new save() method on the MetadataManager. I'll try to find the time to do that tomorrow.

If someone else wants to test this PR, feel free. Feedback is always welcome.

felixhummel and others added some commits Mar 14, 2013
@felixhummel felixhummel shiva-indexer with docopt
Conflicts:
	indexer.py
	setup.py
bbf9ec6
@dbrgn dbrgn Rewrote ID3Manager, replaced EyeD3 with Mutagen.
ID3Manager is now called MetadataManager. Getter and setter methods with
immediate persistence have been replaced with properties and a separate
save() method.

Integration into indexer and models is not yet done and will follow with
the next commit.
e8bec9d
@dbrgn dbrgn Added docopt to requirements.pip e937443
@dbrgn dbrgn Rewrote indexer to use new MetadataManager 965fdd1
@dbrgn dbrgn referenced this pull request Mar 30, 2013
Closed

SyntaxError at indexing #38

@dbrgn
Collaborator

Supported formats via Mutagen project page, common file extensions via Wikipedia.

@tooxie tooxie and 1 other commented on an outdated diff Mar 31, 2013
- def get_id3_reader(self):
- if not self.id3r or not self.id3r.same_path(self.file_path):
- self.id3r = ID3Manager(self.file_path)
+ self.count += 1
+ if self.count % 10 == 0:
@tooxie
tooxie Mar 31, 2013

Let's make this magic number a setting, just for clarity. Something like PERSIST_AFTER_TRACKS, or something that makes more sense.

@dbrgn
dbrgn Mar 31, 2013

True. I'm not even sure if we should have this at all. It speeds up re-indexing after an error occurs, but on the other side I'm not sure whether we should presist any data into the database if an error occured. What do you think?

@tooxie tooxie and 1 other commented on an outdated diff Mar 31, 2013
self.config = config
self.use_lastfm = use_lastfm
self.no_metadata = no_metadata
+ self.verbose = verbose
+ self.quiet = quiet
self.count = 0
@tooxie
tooxie Mar 31, 2013

Could you please rename this to track_count?

@dbrgn
dbrgn Mar 31, 2013

Yes, good idea.

@tooxie tooxie commented on the diff Mar 31, 2013
indexer.py
- if os.path.isdir(dir_name):
- for name in os.listdir(dir_name):
- self.file_path = os.path.join(dir_name, name)
- if os.path.isdir(self.file_path):
- self.walk(self.file_path)
- else:
+ # Otherwise, recursively walk the directory looking for files
+ else:
+ for root, dirs, files in os.walk(target):
@tooxie
tooxie Mar 31, 2013

How does os.walk perform with lots of files and directories? Is it better that calling self.walk recursively? I honestly don't know, many of Shiva users have large collections, this may make a big difference.

@dbrgn
dbrgn Mar 31, 2013

I doubt that os.walk will be slower than os.listdir, it uses the same top-down recursive approach. And readability is better. IMO using os.listdir would be premature optimization if you don't already know that it will be faster.

I tried to measure it but it's pretty hard because there seems to be some caching going on. The first time I walked my music folder it took 5.5 seconds, the following times it was down to 1.8 to 2 seconds.

A possibility to speed up things is mentioned here: http://stackoverflow.com/a/4739544/284318 But it's filesystem specific.

@felixhummel
felixhummel Apr 2, 2013

I let os.walk run against my 200GB music collection which is usually at least three directory levels deep (genre, artist, album).
Also I find os.walk much more readable.

@tooxie
tooxie Apr 2, 2013

I was thinking about the fact that os.walk returns a list, if it had to read all the filesystem first and load it into memory it would be quite slow. But it has to be a generator, which makes it definitely faster than calling os.listdir recursively. 👍

@tooxie tooxie commented on an outdated diff Mar 31, 2013
if __name__ == '__main__':
- use_lastfm = '--lastfm' in sys.argv
- no_metadata = '--nometadata' in sys.argv
+ from docopt import docopt
@tooxie
tooxie Mar 31, 2013

A big 👍 to the introduction of docopt.

@tooxie tooxie commented on the diff Mar 31, 2013
requirements.pip
@@ -1,8 +1,9 @@
Flask==0.9
Flask-Restful==0.1.2
Flask-SQLAlchemy==0.16
-eyed3==0.7.1
@tooxie
Owner

The oggsupport branch was just merged. This may result in some conflicts when rebasing. Sorry for that 😅

@dbrgn
Collaborator

Where can I find that branch? What metadata backend does it use?

@tooxie
Owner

Was merged into master. It doesn't touch the metadata backend at all, but it adds a resource and modifies some other files, so it may lead to some conflicts when updating your working copy.

@tooxie
Owner

If you give me access to your fork I can rebase it myself. No problem. Besides, I'm about to merge #42 which will leave this PR unmergeable again, and I don't want to bother you every 2 days to rebase. What do you say?

@dbrgn
Collaborator

Oh, sorry. I wanted to do finish this stuff last week, but didn't find the time to do so.

I'll try to rebase and finish the branch today so that it should be merge-ready. If it doesn't work I'll add you to the collaborators.

@tooxie
Owner

Is ok, I just want to get rid of eyeD3 already 😄 If you don't have time I can do it, no problem.

@dbrgn
Collaborator

OK, now everything concerning the indexer should be fine. I also added a --reindex clause. I didn't really look at the model/API part yet, you should probably double-check that part. What about cloning the branch to your repository, finishing it and then merging it? Easier than handling collaborators on a forked repo. (Or merging directly into a dev branch as an alternative.)

One thing I noticed - shouldn't a release year field belong to the track model instead of the album model? Because different tracks on an album could have different year tags. You could then deal with "finding out the album release year" in the controller instead of doing it in the database, where it might be wrong.

@tooxie
Owner

Yes, I can do that. I'll clone it and close manually this PR when I merge it 👍

@dbrgn
Collaborator

Great :) Thanks! Now that it supports ogg files too, I can start using it for my real music collection and report or fix any issues I'll encounter.

@tooxie
Owner

This was finally merged :feelsgood:

Still needs some doc updates, but I wanted to get it merged already. Thanks!! Great PR 👍

@tooxie tooxie closed this Apr 18, 2013
@felixhummel
Collaborator

Good job everybody! 👍

@tooxie
Owner

Please sync your masters, test, and report bugs, if any :)

BTW, @dbrgn, @felixhummel, thanks for your contributions to the project. I've been a little distracted lately (holidays, work, life in general...) so to not delay this much future PRs I added you as contributors, which means now you have push access to master. Please use responsibly :bowtie:

We should define some guidelines on how/when to merge, I will issue a PR so we can discuss it and document it properly. Thanks!!

@dbrgn
Collaborator

@tooxie Thanks! I'm also pretty busy, but i might do a few more contributions in the future.

In my opinion the nr 1 thing we should do next is testing... Otherwise things will get pretty hairy and break in the future...

@tooxie
Owner

@dbrgn Completely agree, it's quite delicate already. Let's focus on that.

@felixhummel
Collaborator

Agreed. Summer is coming, but there are always those rainy days on the weekend.

@dbrgn
Collaborator

For inspiration, we set up testing using pytest, tox, travis and coveralls for Jedi a while ago: https://github.com/davidhalter/jedi/tree/dev

@tooxie
Owner

Implemented ALLOWED_FILE_EXTENSIONS config: 5959b58

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment