Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Frontend master extractor #16328

Closed
wants to merge 19 commits into from

Conversation

Projects
None yet
3 participants
@Kerruba
Copy link
Contributor

commented Apr 29, 2018

Please follow the guide below

  • You will be asked some questions, please read them carefully and answer honestly
  • Put an x into all the boxes [ ] relevant to your pull request (like that [x])
  • Use Preview tab to see how your pull request will actually look like

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

Adds the ability to download courses from the FrontendMasters site, only if with a regular account. Should solve issue #3661
Credentials sent by email

@dstftw
Copy link
Collaborator

left a comment

  1. Does not work on python 3.
  2. Read coding conventions and fix code.
  3. You must provide account credentials for review.
@@ -1421,3 +1425,5 @@
from .zaq1 import Zaq1IE
from .zdf import ZDFIE, ZDFChannelIE
from .zingmp3 import ZingMp3IE


This comment has been minimized.

Copy link
@dstftw

dstftw Apr 29, 2018

Collaborator

Remove.


logout_link = self._search_regex('(Logout .*)',
response, 'logout-link')
if not logout_link:

This comment has been minimized.

Copy link
@dstftw

dstftw Apr 29, 2018

Collaborator

This branch is never reached.

headers={'Content-Type': 'application/x-www-form-urlencoded'}
)

logout_link = self._search_regex('(Logout .*)',

This comment has been minimized.

Copy link
@dstftw

dstftw Apr 29, 2018

Collaborator

Too relaxed.

"Chrome/66.0.3359.117 Safari/537.36"
}

if self._downloader.params.get('listformats', False):

This comment has been minimized.

Copy link
@dstftw

dstftw Apr 29, 2018

Collaborator

Remove.

This comment has been minimized.

Copy link
@Kerruba

Kerruba Apr 29, 2018

Author Contributor

Why? I took inspiration from the Pluralsight Extractor for the Qualities bit, and for what concern the cookies etc. it's to avoid "400 - Bad request" response from the server

This comment has been minimized.

Copy link
@dstftw

dstftw Apr 30, 2018

Collaborator
  1. This does not work properly.
  2. This is ad hoc micro optimization exclusively for pluralsight for reducing number of requests and overall sleeping time while downloading. Remove.
lesson_slug = lesson_data.get('slug')
lesson_thumbnail_url = lesson_data.get('thumbnail')
lesson_section = course_sections_pairing.get(lesson_index)[0]
lesson_section_number = course_sections_pairing.get(lesson_index)[1]

This comment has been minimized.

Copy link
@dstftw

dstftw Apr 29, 2018

Collaborator

Read coding convention on optional meta data.

@Kerruba

This comment has been minimized.

Copy link
Contributor Author

commented Apr 30, 2018

@dstftw As requested I have made the changes. Please review for any more possible issues (if any).
I've provided the credentials via email as requested

@dstftw
Copy link
Collaborator

left a comment

  1. There must be two separate extractors.
  2. All debug must be removed.
response, 'error message', default=None)

if error:
raise ExtractorError('Unable to login: check username and password',

This comment has been minimized.

Copy link
@dstftw

dstftw Apr 30, 2018

Collaborator

This does not necessarily mean that. There is a clear captured error message that should be output.

login_form = self._hidden_inputs(login_page)

login_form.update({
"username": username,

This comment has been minimized.

Copy link
@dstftw

dstftw Apr 30, 2018

Collaborator

Single quotes.

self._VALID_URL_RE = re.compile(self._VALID_URL)
m = self._VALID_URL_RE.match(url)
assert m
return compat_str(m.group('courseid'))

This comment has been minimized.

Copy link
@dstftw

dstftw Apr 30, 2018

Collaborator

Use re.match.

IE_NAME = 'frontend-masters'
_VALID_URL = r'https?://(?:www\.)?frontendmasters\.com/courses/(?P<courseid>[a-z\-]+)/?(?P<id>[a-z\-]+)?/?'

_NETRC_MACHINE = 'frontend-masters'

This comment has been minimized.

Copy link
@dstftw

dstftw Apr 30, 2018

Collaborator

No dashes.


class FrontEndMasterIE(FrontEndMasterBaseIE):
IE_NAME = 'frontend-masters'
_VALID_URL = r'https?://(?:www\.)?frontendmasters\.com/courses/(?P<courseid>[a-z\-]+)/?(?P<id>[a-z\-]+)?/?'

This comment has been minimized.

Copy link
@dstftw

dstftw Apr 30, 2018

Collaborator

/? is senseless at the end.

return self._download_entire_course(url, course_id)



This comment has been minimized.

Copy link
@dstftw

dstftw Apr 30, 2018

Collaborator

Check code with flake8.

'chapter_number': lesson_section_number,
'thumbnail': lesson_thumbnail_url,
'formats': formats
}

This comment has been minimized.

Copy link
@dstftw

dstftw Apr 30, 2018

Collaborator

There are also vtt subtitles available.

@Kerruba

This comment has been minimized.

Copy link
Contributor Author

commented May 1, 2018

@dstftw I've updated the code again if you want to have a look

}

@staticmethod
def _convert_subtitles(captions):

This comment has been minimized.

Copy link
@dstftw

dstftw May 25, 2018

Collaborator

No, provide subtitles as URL.

This comment has been minimized.

Copy link
@Kerruba

Kerruba May 26, 2018

Author Contributor

Do you mean I need to provide the subtitles URL and the base class is going to handle that?

This comment has been minimized.

Copy link
@dstftw

dstftw May 26, 2018

Collaborator

I mean you must return URL of subtitles if you know it. Conversion is done by --convert-subs upon request.

class FrontEndMasterCourseIE(FrontEndMasterBaseIE):
IE_NAME = 'frontend-masters:course'
_VALID_URL = r'https?://(?:www\.)?frontendmasters\.com/courses/' \
r'(?P<courseid>[a-z\-]+)/?$'

This comment has been minimized.

Copy link
@dstftw

dstftw May 25, 2018

Collaborator

No, override suitable instead. Also do not split strings.

This comment has been minimized.

Copy link
@Kerruba

Kerruba May 27, 2018

Author Contributor

Why do I need to override suitable if the _VALID_URL is already used in the suitable method from the parent class?

This comment has been minimized.

Copy link
@dstftw

dstftw May 27, 2018

Collaborator

Because this regex won't match URLs like https://www.frontendmasters.com/courses/blah?foo=bar.

'cookie': cookies_str
}

if self._downloader.params.get('listformats', False):

This comment has been minimized.

Copy link
@dstftw

dstftw May 25, 2018

Collaborator

I've already pointed out: this must be removed.

This comment has been minimized.

Copy link
@Kerruba

Kerruba May 27, 2018

Author Contributor

Do you mean the part with the cookies or the guessing of the formats?

This comment has been minimized.

Copy link
@dstftw

dstftw May 27, 2018

Collaborator

Are you trolling or what?

This comment has been minimized.

Copy link
@Kerruba

Kerruba May 27, 2018

Author Contributor

I found this pattern in other extractors, which are already part of the package - see the pluralsight one line 235.
I'm sorry if I don't have clear what you want me to do here, no need to be sarcastic...

This comment has been minimized.

Copy link
@dstftw

dstftw May 27, 2018

Collaborator

Bother to read previous comments.

This comment has been minimized.

Copy link
@Kerruba

Kerruba May 27, 2018

Author Contributor

👍

raise ExtractorError('Unable to login: %s' % unescapeHTML(error),
expected=True)

def _match_course_id(self, url):

This comment has been minimized.

Copy link
@dstftw

dstftw May 25, 2018

Collaborator

Unused.

]

cookies = self._get_cookies(self._COOKIES_BASE)
cookies_str = ';'.join(['%s=%s' % (cookie.key, cookie.value)

This comment has been minimized.

Copy link
@dstftw

dstftw May 25, 2018

Collaborator

What are you doing? These cookies are already in session, no need to bother with Cookie header at all.

@dstftw dstftw added the do-not-merge label May 25, 2018

@pavelbinar

This comment has been minimized.

Copy link

commented Jun 28, 2018

Hi @dstftw, is it necessary to do-not-merge this PR?
I can see that it is not the smoothest integration :)
You have a lot of good points here, and it is clear that @Kerruba wants this PR to happen and do your changes. If he asks, he probably does not understand.

I would like to ask you for patience here.

Thank you for consideration!

Disclaimer: I know none of you. I just see PR which would be great to have (after review).

@dstftw dstftw closed this in 49128b5 Jul 7, 2018

dstftw added a commit that referenced this pull request Jul 7, 2018

dstftw added a commit that referenced this pull request Jul 22, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.