This repository has been archived by the owner on Nov 6, 2021. It is now read-only.

ychalier/ina


INA Ripper

This module extracts information from the Inathèque database and provides tools to locate the corresponding media on YouTube, download them, and set the proper ID3 tags. It is designed for podcasts.

Getting Started

Prerequisites

The implementation uses Python 3 (v3.5.2, to be precise). It requires the following executables:

  - ffmpeg
  - youtube-dl

The required Python modules are listed in requirements.txt.

Installing

Clone the repository and make sure that ffmpeg and youtube-dl are on your PATH.
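A minimal sketch for checking this from Python relies on the standard library's shutil.which. The helper name missing_executables is illustrative and is not part of the repository.

```python
import shutil

# Executables the README asks for; both must be reachable through PATH.
REQUIRED_EXECUTABLES = ("ffmpeg", "youtube-dl")


def missing_executables(names=REQUIRED_EXECUTABLES):
    """Return the names that shutil.which cannot find on the PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_executables()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All required executables found.")
```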

Then start the main script with:

python ina.py [action] [options]
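The documented invocation can be mirrored with argparse. This is a hypothetical reconstruction built only from the actions and short options used in the demo; the destination names (query, collection, pages, and so on) are assumptions, and the actual ina.py may parse its arguments differently.

```python
import argparse

# Actions used in the README demo.
ACTIONS = ("scrap", "clean", "enrich", "select_media", "download")


def build_parser():
    """Hypothetical parser for `python ina.py [action] [options]`.

    The short flags come from the README; the dest names are illustrative.
    """
    parser = argparse.ArgumentParser(prog="ina.py")
    parser.add_argument("action", choices=ACTIONS)
    parser.add_argument("-q", dest="query", help="search query (scrap)")
    parser.add_argument("-c", dest="collection", help="collection filter")
    parser.add_argument("-p", dest="pages", type=int,
                        help="maximum number of result pages (scrap)")
    parser.add_argument("-t", dest="title_threshold", type=float,
                        help="title error threshold in [0, 1] (select_media)")
    parser.add_argument("-u", dest="duration_threshold", type=float,
                        help="relative duration error threshold in [0, 1] (select_media)")
    parser.add_argument("-m", dest="max_candidates", type=int,
                        help="maximum number of candidates shown (select_media)")
    return parser
```

For instance, parsing ["select_media", "-c", "les-maitres-du-mystere", "-u", ".05", "-t", ".2", "-m", "3"] gives duration_threshold 0.05, title_threshold 0.2 and max_candidates 3.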

Demo

Here is a step-by-step demo for downloading a series called Les Maîtres du mystère.

  1. Scrape the database. Use the scrap action, set the query with -q and, since we already know we are looking for only one collection, apply the appropriate filter with -c to lighten database operations. The -p option ensures that at most 3 pages are requested, i.e. at most 3 × 500 = 1500 results can be looked up.

    python ina.py scrap -q "Les Maîtres du mystère" -c les-maitres-du-mystere -p 3
    
  2. Clean the database. Remove duplicates with the clean action.

    python ina.py clean
    
  3. Enrich the database. The enrich action fetches the author and director of each entry, along with candidate YouTube video IDs, and adds them to the database.

    python ina.py enrich -c les-maitres-du-mystere
    
  4. Manually select the correct video IDs with the select_media action. Warning thresholds can be set with -t (title error threshold in the [0, 1] interval, measured with the Jaccard index) and -u (relative duration error threshold in the [0, 1] interval). The maximum number of candidates shown can be changed with -m. Note that the first result is almost always the best you would find by browsing YouTube; however, you can look for the video yourself and give its ID to the script when asked.

    python ina.py select_media -c les-maitres-du-mystere -u .05 -t .2 -m 3
    
  5. Download with the download action. All matching collections are downloaded with youtube-dl into properly named MP3 files, with ID3 tags containing the information gathered so far.

    python ina.py download -c les-maitres-du-mystere
    
  6. Clean up the files. Some files will be missing, some artist names will be absent, and album artists may be misspelled. To fix this, use the additional unify script, which takes a default album artist, an album cover (.jpg only) and a folder as arguments, and cleans all the audio files in that folder. Cleaning also shifts track IDs so that no gaps remain.

    python unify.py "Pierre Billard" ~/images/cover.jpg .
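The page cap of step 1 (-p) can be sketched as a loop that stops after max_pages requests, yielding at most max_pages × 500 results. This is an illustration, not the script's code: fetch_page is a hypothetical callable standing in for the actual Inathèque request, and both page numbering from 0 and stopping early on a page shorter than 500 results are assumptions.

```python
from typing import Callable, Iterator, List

PAGE_SIZE = 500  # results per page, as stated in step 1


def scrape(query: str,
           fetch_page: Callable[[str, int], List[dict]],
           max_pages: int = 3) -> Iterator[dict]:
    """Yield results page by page, requesting at most max_pages pages.

    fetch_page(query, page) is a hypothetical stand-in for the real request.
    """
    for page in range(max_pages):
        results = fetch_page(query, page)
        yield from results
        # Assumption: a short page means there is nothing left to request.
        if len(results) < PAGE_SIZE:
            break
```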
    
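The warning logic of step 4 can be sketched as follows, under explicit assumptions: titles are compared as sets of lower-cased, whitespace-separated words; the title error is taken as 1 minus their Jaccard index (so identical titles give 0); and the duration error is |candidate − expected| / expected, with a positive expected duration. The function names are hypothetical.

```python
def jaccard_index(a: str, b: str) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| of the lower-cased word sets of two titles."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0  # assumption: two empty titles count as identical
    return len(words_a & words_b) / len(words_a | words_b)


def needs_warning(ina_title: str, yt_title: str,
                  ina_seconds: float, yt_seconds: float,
                  title_threshold: float, duration_threshold: float) -> bool:
    """Hypothetical check: True when a candidate exceeds the -t or -u threshold."""
    title_error = 1.0 - jaccard_index(ina_title, yt_title)
    # Relative duration error; ina_seconds is assumed to be positive.
    duration_error = abs(yt_seconds - ina_seconds) / ina_seconds
    return title_error > title_threshold or duration_error > duration_threshold
```

With the demo values -t .2 and -u .05, a candidate with an identical title that lasts 1040 s instead of 1000 s (4% off) raises no warning, while one lasting 1100 s (10% off) does.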

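The gap-closing done by unify in step 6 amounts to renumbering the surviving tracks 1..n while keeping their original order. A minimal sketch, assuming the track numbers have already been read from the files; the function name is hypothetical, and reading and writing the actual ID3 tags is left out.

```python
from typing import Dict, Iterable


def close_track_gaps(track_numbers: Iterable[int]) -> Dict[int, int]:
    """Map each existing track number to a new, gap-free number starting at 1."""
    return {old: new
            for new, old in enumerate(sorted(set(track_numbers)), start=1)}
```

For example, tracks 1, 2, 4 and 7 are mapped to 1, 2, 3 and 4.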
Contributing

Contributions are welcome. Push your branch and open a pull request detailing your changes.

License

This project is licensed under the MIT License - see the LICENSE.txt file for details.
