Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[polskieradio] Add thumbnails. #10028

Merged
merged 1 commit into from
Jul 8, 2016
Merged

[polskieradio] Add thumbnails. #10028

merged 1 commit into from
Jul 8, 2016

Conversation

jakubadamw
Copy link
Contributor

What is the purpose of your pull request?

  • Bug fix
  • New extractor
  • New feature

Description of your pull request and other information

This is a simple change to extract thumbnails from Polskie Radio auditions.

@yan12125
Copy link
Collaborator

yan12125 commented Jul 7, 2016

Usually the thumbnail field in _TESTS uses regular expressions instead of full URLs. See https://github.com/rg3/youtube-dl#adding-support-for-a-new-site for an example.

@jakubadamw
Copy link
Contributor Author

jakubadamw commented Jul 7, 2016

@yan12125 Well, it's important that we check it finds the right image. Especially on Polskie Radio the page often contains, aside from the main image, multiple thumbnails for related materials.

@yan12125
Copy link
Collaborator

yan12125 commented Jul 7, 2016

To avoid any possible false-positive, the extraction method itself should be precise. In this case, it's unlikely that _og_search_thumbnail finds a wrong image as in normal cases there's only one og:image tag in a web page.

@jakubadamw
Copy link
Contributor Author

Agree, yet the test should not be concerned with the implementation details.

@yan12125
Copy link
Collaborator

yan12125 commented Jul 7, 2016

You want to make sure that the extractor always gives the correct thumbnail, but using a full URL does not increase the detection rate of website changes. Thumbnail URLs are constantly changing. If there's a failed test, we can't know, without further investigation, whether it's due to wrong files or just moved/renamed files. Though I don't have a concrete statistical result, moved files are more common than wrong files, and in this case, a moved/renamed file is almost the only possible reason. As a result, if there's a failed test in polskieradio in the future, we will just update the thumbnail URL. It wastes time of developers, so I recommend you to use a regular expression.

If you really want a strict check, the correct way is to check the MD5 checksum of thumbnails, which is much less unlikely to change.

@jakubadamw
Copy link
Contributor Author

That's a fair point on the MD5, thank you. I made the suggested change. :)

@yan12125
Copy link
Collaborator

yan12125 commented Jul 7, 2016

Well md5: checks for the field value but not the corresponding file contents. You'll have to extend test/test_download.py

@jakubadamw
Copy link
Contributor Author

@yan12125 That seems like a good idea, sadly I'm rather time constrained now. I've gone with the regular expression approach for now.

Is this good to go now?

@yan12125 yan12125 merged commit e2d616d into ytdl-org:master Jul 8, 2016
@yan12125
Copy link
Collaborator

yan12125 commented Jul 8, 2016

Thanks. Merged with minor fixes.

@jakubadamw
Copy link
Contributor Author

Ah, of course. Ought to have escaped the dots. Thanks!

@jakubadamw jakubadamw deleted the polskie-radio branch August 25, 2016 10:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants