
[radiocloud.jp] Add new extractor #12725

Closed · wants to merge 1 commit

Conversation

@EliteTK commented Apr 12, 2017

I have

One of the following applies

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

The purpose of this pull request

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description

This is a new extractor for https://radiocloud.jp/.

This is an audio-only website; if that is not within the scope of this project, my apologies, please close the pull request.

Additionally, I can confirm this is not a website based around copyright infringement; from the little Japanese that I know, it appears to be the official website for a Japanese radio station.

The website restricts access to older recordings for users without an account; however, this restriction is enforced by nothing more than a semi-transparent div blocking clicks, so authentication was not implemented in the extractor, as it is unnecessary.

The website doesn't offer any obvious way of getting the URL for a specific recording; to get one, you need to view the RSS feed for a radio program and then use one of the URLs provided there.
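The RSS-feed workflow described above can be sketched roughly as follows. The feed layout and URLs here are hypothetical placeholders for illustration, not the real radiocloud.jp feed structure:

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal RSS payload; a real program feed would carry
# one <item> per recording, each with a per-recording link.
FEED = """<rss><channel>
  <item><link>https://radiocloud.jp/archive/prog?content_id=1</link></item>
  <item><link>https://radiocloud.jp/archive/prog?content_id=2</link></item>
</channel></rss>"""

def recording_urls(feed_xml):
    # Each <item><link> in the program's RSS feed is a URL the
    # extractor can then be pointed at directly.
    root = ET.fromstring(feed_xml)
    return [item.findtext('link') for item in root.iter('item')]
```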

The extractor currently makes no attempt to build a playlist of all recordings found on a program's archive page: there could be quite a few recordings, and I personally don't see it as a useful feature, but it could be done.

The extractor also makes no attempt to guess which recording you wish to download if the URL provided does not refer to a specific content_id.

A note about some regex:

        file_url = self._search_regex(r'var\s*source\s*=\s*"(.+?)"', webpage, 'url')

I think this regex is a bit iffy, but I couldn't think of a better way.

Thank you for your time.

Edit: removed note about .strip() - I was mistaken.

            break

        if not element:
            raise ExtractorError('Could not find details of id {}'.format(video_id))
Collaborator

{} won't work on Python 2.6.
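The issue is that Python 2.6 only supports explicitly numbered replacement fields in str.format; auto-numbered '{}' raises ValueError there. A short sketch of the two portable alternatives (the video_id value is a hypothetical example):

```python
video_id = 'example_id'  # hypothetical id for illustration

# Either give the field an explicit index, which Python 2.6 accepts...
msg_indexed = 'Could not find details of id {0}'.format(video_id)

# ...or use %-formatting, the style the youtube-dl codebase generally uses.
msg_percent = 'Could not find details of id %s' % video_id
```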

return None

element = None
for e in get_elements_by_class("contents_box", webpage):
Collaborator

Single quotes.

note='Downloading player',
errnote='Unable to download player')

file_url = self._search_regex(r'var\s*source\s*=\s*"(.+?)"', webpage, 'url')
Collaborator

Should be var\s+.
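The suggested change matters because \s* also matches zero whitespace characters, so the original pattern would match an unrelated identifier such as varsource. A small sketch (the webpage snippet is a hypothetical example, not actual radiocloud.jp markup):

```python
import re

# Hypothetical player page snippet for illustration.
webpage = 'var source = "https://example.com/audio.m4a";'

# 'var\s+source' requires at least one whitespace character between
# the keyword and the identifier, so 'varsource' no longer matches.
m = re.search(r'var\s+source\s*=\s*"(.+?)"', webpage)
```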

'ext': 'm4a',
'title': '「オープニング」 ',
'description': '「“フィギュアスケートは人生”引退会見で浅田真央さんは何を語ったのか?」スポーツライター・青嶋ひろのさんが解説!',
}
Collaborator

Test fails.

Author

Interesting. It seems that for people without an account, the page loads content from a week or two into the past; for people who are logged in, it loads content from a month into the past.

I'll try to work out if I can find a way to load user restricted content without authentication. If that fails, I'll implement authentication (but I'll make it into its own commit instead of amending the current commit like for the other fixes).

I'm not sure if there is any access to content older than a month. I'll fetch the extracted URL for a recording about to expire and see if I can still access it after it expires.

If I can't find a way to download content which the website does not give direct access to even for logged in users, what should I do for tests which have an "expiration date" so to speak?

Author

@dstftw any ideas about this?

@EliteTK (Author) commented Mar 10, 2018

@dstftw Sorry for the long wait. I've been a bit busy and, for a while, was happy with my own solution.

I've made the requested changes.

Upon further inspection, there is an "end_date" (expiry date) on each upload. I did a search across all uploads on the website (with a separate script) and found an upload which expires in 2038. When 2038 is approaching, feel free to ask for a new test :P

So far I've made do with my own scripts but it would be really neat to get this in youtube_dl so I can use this feature directly with mpv.

Labels: defunct (PR source branch is not accessible), pending-fixes