Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ie/arte.tv] Extract accessible subtitles #8231

Merged
merged 3 commits into from
Jan 18, 2024

Conversation

Nicals
Copy link
Contributor

@Nicals Nicals commented Sep 29, 2023

IMPORTANT: PRs without the template will be CLOSED

Description of your pull request and other information

The arte.tv website can propose multiple subtitles for the same language. Some normal subtitles and some accessible subtitles for people suffering from hearing loss.

For example, on this video, we have two sets of french subtitles: subforced and accessible ones.
When listing the subtitles, we got the following list:

$ python -m yt_dlp 'https://www.arte.tv/fr/videos/104913-001-A/sous-controle-1-6/' --list-subs
[info] Available subtitles for 104913-001-A:
Language Formats
fr       vtt, vtt

This PR identifies those accessible subtitles and adds and -acc suffix to the language code.

$ python -m yt_dlp 'https://www.arte.tv/fr/videos/104913-001-A/sous-controle-1-6/' --list-subs
[info] Available subtitles for 104913-001-A:
Language Formats
fr       vtt
fr-acc   vtt

Implementation details:

I'm not familiar with this code base.
I tried to add a test in the ArteTVIE._TESTS, but it seems that the test runner doesn't allow to access subtitles info (ignored here).
So I've added a new test case for this extractor. Since its the first test done this way, I'm not sure this is something we want for yt_dlp.

If you have any other idea to implement this test, I would be happy to implement it.

Template

Before submitting a pull request make sure you have:

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

Copilot Summary

🤖 Generated by Copilot at 47d2800

Summary

🎥🧪🔄

Add support for accessible subtitles in Arte videos and a test module for it. Modify yt_dlp/extractor/arte.py to identify and convert the accessible subtitle formats and return them with modified language codes. Add test/extractor/test_arte.py to test the new feature with pytest.

Sing, O Muse, of the skillful coder who devised
A way to extract the subtitles for the blind and deaf
From the videos of Arte, the splendid channel of the arts
And tested their work with pytest, the framework of the wise

Walkthrough

  • Add a new feature to extract accessible subtitles from Arte videos (link, link, link)
  • Modify the ArteTVIE extractor class in yt_dlp/extractor/arte.py to check for subtitle formats with the suffix -MAL.m3u8 and append a -acc suffix to the language code (link, link)
  • Call the _contvert_accessible_subs_locale method in the _real_extract method to convert the subtitles before returning them (link)
  • Add a test module test/extractor/test_arte.py to test the new feature (link)
    • Define a test function test_extract_accessible_subtitles that uses the pytest framework and the parametrize decorator to test two examples of accessible subtitles (link)
    • Create an instance of the ArteTVIE extractor class and call its _contvert_accessible_subs_locale method on the original_subs parameter (link)
    • Assert that the returned dictionary of subtitles has only one key, which is the expected_locale parameter, and that the value of that key is the same as the original subtitles for the French language (link)

@bashonly bashonly added the site-enhancement Feature request for some website label Sep 29, 2023
@bashonly bashonly self-requested a review September 29, 2023 19:43
@seproDev
Copy link
Member

seproDev commented Nov 1, 2023

You should be able to use this as a unit test:

{
    'url': 'https://www.arte.tv/fr/videos/104913-001-A/sous-controle-1-6/',
    'info_dict': {
        'id': '104913-001-A',
        'ext': 'mp4',
        'description': 'md5:ea65e21c4b9881b3ef1c333a914779da',
        'thumbnail': 'https://api-cdn.arte.tv/img/v2/image/BL5WhDp2pnXcYhQJz9A8be/940x530',
        'upload_date': '20230927',
        'timestamp': 1695783600,
        'duration': 1907,
        'title': 'Sous contrôle (1/6)',
        'subtitles': {
            'fr': 'mincount:1',
            'fr-acc': 'mincount:1',
        },
    },
}

@Nicals Nicals force-pushed the handle_arte_tv_accessible_subtitles branch from 47d2800 to b47f4a8 Compare November 2, 2023 17:43
@seproDev seproDev added the pending-fixes PR has had changes requested label Nov 2, 2023
Copy link
Member

@seproDev seproDev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned before, you can directly test for available subtitle languages in the _TESTS list.
Also, please don't force push. All commits will be squashed upon merge.

yt_dlp/extractor/arte.py Outdated Show resolved Hide resolved
yt_dlp/extractor/arte.py Outdated Show resolved Hide resolved
@seproDev seproDev added the stale-pr PR that has been pending fixes for a long time label Jan 9, 2024
@seproDev seproDev removed pending-fixes PR has had changes requested stale-pr PR that has been pending fixes for a long time labels Jan 14, 2024
@seproDev seproDev added the pending-review PR needs a review label Jan 14, 2024
@bashonly bashonly removed the pending-review PR needs a review label Jan 18, 2024
@seproDev seproDev merged commit 393b487 into yt-dlp:master Jan 18, 2024
6 checks passed
aalsuwaidi pushed a commit to aalsuwaidi/yt-dlp that referenced this pull request Apr 21, 2024
Authored by: Nicals, seproDev

Co-authored-by: sepro <4618135+seproDev@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
site-enhancement Feature request for some website
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants