YouTube chat replay support #25874

Open
Xalaxis opened this issue Jul 2, 2020 · 8 comments · May be fixed by #26240

@Xalaxis Xalaxis commented Jul 2, 2020

Checklist

  • I'm reporting a site feature request
  • I've verified that I'm running youtube-dl version 2020.06.16.1
  • I've searched the bugtracker for similar site feature requests including closed ones

Description

YouTube now has "chat replay" for recorded livestreams, in the same style as Twitch, whose chat youtube-dl can already extract as a "subtitle". It would be beneficial for youtube-dl to support the same kind of extraction for YouTube because, as on Twitch, chat can form a very important part of the livestream in question. As far as I can see, youtube-dl has no existing support for this and no similar option.

There is a Python library at https://github.com/taizan-hokuto/pytchat which may be useful for the implementation of this.
Amongst other formats, it supports output as JSON, which could simply be passed back as the output for a new "subtitle" - the same style as the Twitch chat replay.

Use case example: The archiving of a YouTube channel, including all metadata. At the moment the chat replay would not be saved, meaning there is no context for content in the video which may refer to it.

Author

@Xalaxis Xalaxis commented Jul 2, 2020

@dstftw I read https://github.com/ytdl-org/youtube-dl#is-the-description-of-the-issue-itself-sufficient before writing the issue, and I believe the description meets those requirements. Please can you let me know what you would like me to amend?

EDIT: I have now made a few amendments, which might be what you are looking for.

Collaborator

@dstftw dstftw commented Jul 2, 2020

Provide concrete examples with concrete URLs. There are no telepathists here.

Author

@Xalaxis Xalaxis commented Jul 2, 2020

Example: The YouTube video https://www.youtube.com/watch?v=h4M5iFLKWqU has a chat replay associated with it.

With the command youtube-dl --write-sub https://www.youtube.com/watch?v=h4M5iFLKWqU I would like the subtitles to be written to <name of output>.chatreplay.json, with a structure that includes all of the available attributes. A list of the attributes that pytchat already extracts, and that youtube-dl should therefore also be able to obtain, is available here, including:

  • Message type (is it a superChat? Is it a new sponsor announcement?)
  • The elapsed time of the stream at the time of the message
  • The message text itself
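
To make the request more concrete, here is a minimal sketch of what one entry in such a .chatreplay.json file could look like. The class name ChatReplayEntry, its field names, and the message type labels are all hypothetical and chosen only for illustration; they are not an existing youtube-dl or pytchat format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ChatReplayEntry:
    # Hypothetical schema covering the three attributes listed above.
    message_type: str  # illustrative labels such as "textMessage" or "superChat"
    elapsed_ms: int    # elapsed stream time when the message was sent
    text: str          # the message text itself


def dump_chat_replay(entries):
    """Serialize a list of entries to a JSON string, keeping non-ASCII text readable."""
    return json.dumps([asdict(entry) for entry in entries],
                      ensure_ascii=False, indent=2)


entries = [
    ChatReplayEntry('textMessage', 28420, 'hello'),
    ChatReplayEntry('superChat', 1580, 'thanks for the stream'),
]
print(dump_chat_replay(entries))
```

A real implementation would probably want to keep every attribute the site returns rather than this reduced set.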

@dstftw dstftw reopened this Jul 3, 2020

Author

@Xalaxis Xalaxis commented Jul 3, 2020

Data appears to be provided in JSON format from the https://www.youtube.com/live_chat_replay/get_live_chat_replay endpoint.
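
As a rough sketch of how a request to that endpoint might be built: the query parameters below (continuation, playerOffsetMs, hidden, pbj) are the ones used by the proof-of-concept script posted further down in this thread. The endpoint is undocumented, so treat all of this as an assumption that may stop working. EXAMPLE_TOKEN is a placeholder, not a real continuation token, and the helper name build_replay_url is made up for this example.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

REPLAY_ENDPOINT = 'https://www.youtube.com/live_chat_replay/get_live_chat_replay'


def build_replay_url(continuation, player_offset_ms):
    """Build a replay URL for one continuation token and player offset.

    Assumes the token is not already percent-encoded; urlencode would
    otherwise encode it a second time.
    """
    query = urlencode({
        'continuation': continuation,
        'playerOffsetMs': int(player_offset_ms),
        'hidden': 'false',
        'pbj': 1,
    })
    return '{}?{}'.format(REPLAY_ENDPOINT, query)


url = build_replay_url('EXAMPLE_TOKEN', 0)
print(url)
```

Using urlencode instead of string concatenation keeps the URL valid if a token contains characters such as "=" or "&".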


@JomSpoons JomSpoons commented Jul 26, 2020

I would also really like the ability to download chat replays. Whenever I do YouTube streams I tend not to put any sort of chat on-screen because it takes up too much room, so I'd like to be able to download the replays in some form. Whether it is a simple text file or some sort of subtitle track like Xalaxis mentioned, I just want a way to preserve the chat replays of my streams.


@siikamiika siikamiika commented Jul 29, 2020

I'm also interested in this, so I made a simple proof-of-concept script in Python that iterates over all regular messages in a video's chat replay and prints them to stdout. You can test it with

./script.py "<video_id>" > output # many rows of JSON objects

#!/usr/bin/env python3

import requests
import re
import json
import sys

session = requests.session()

def requests_get(url):
    return session.get(
        url,
        headers={
            'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0',
            'Accept-Encoding': 'gzip, deflate',
        },
    )

def debug(message, details=None):
    print(message, details, file=sys.stderr)

def parse_yt_initial_data(data):
    # Raw bytes literal so that \[ and \s reach the regex engine unchanged
    raw_json = re.search(rb'window\["ytInitialData"\]\s*=\s*(.*);', data).group(1)
    return json.loads(raw_json)

def get_continuation_id_initial(video_id):
    response = requests_get('https://www.youtube.com/watch?v={}'.format(video_id))
    data = parse_yt_initial_data(response.content)
    return data['contents']['twoColumnWatchNextResults']['conversationBar']['liveChatRenderer']['continuations'][0]['reloadContinuationData']['continuation']

def get_continuation_data_initial(continuation_id):
    response = requests_get('https://www.youtube.com/live_chat_replay?continuation={}'.format(continuation_id))
    return parse_yt_initial_data(response.content)

def get_continuation_data_next(continuation_id, offset):
    response = requests_get(
        'https://www.youtube.com/live_chat_replay/get_live_chat_replay'
        + '?continuation={}'.format(continuation_id)
        + '&playerOffsetMs={}'.format(offset)
        + '&hidden=false'
        + '&pbj=1'
    )
    return response.json()['response']

def iter_actions(video_id):
    continuation_id = get_continuation_id_initial(video_id)
    first = True
    offset = None
    while continuation_id is not None:
        data = get_continuation_data_initial(continuation_id) if first else get_continuation_data_next(continuation_id, int(offset) - 5000)
        first = False
        continuation_id = None

        live_chat_continuation = data['continuationContents']['liveChatContinuation']
        offset = None
        if 'actions' not in live_chat_continuation:
            # TODO either out of comments or no comments right now
            debug('Actions not found, exiting', live_chat_continuation)
            # continuation_id is already None here, so stop the loop explicitly
            break
        for action in live_chat_continuation['actions']:
            if 'replayChatItemAction' in action:
                replay_chat_item_action = action['replayChatItemAction']
                offset = replay_chat_item_action['videoOffsetTimeMsec']
                for sub_action in replay_chat_item_action['actions']:
                    if 'addChatItemAction' in sub_action:
                        add_chat = sub_action['addChatItemAction']['item']
                        if 'liveChatTextMessageRenderer' in add_chat:
                            # {
                            #     'message': {'runs': [
                            #         {'text': '???'},
                            #         {'emoji': {'emojiId': '???', 'shortcuts': [':???:'], 'searchTerms': ['???'], 'image': {'thumbnails': [{'url': 'https://???.ggpht.com/???', 'width': 24, 'height': 24}, {'url': 'https://???.ggpht.com/???', 'width': 48, 'height': 48}], 'accessibility': {'accessibilityData': {'label': ':???:'}}}, 'isCustomEmoji': True}},
                            #         {'text': '???'}
                            #     ]},
                            #     'authorName': {'simpleText': '????'},
                            #     'authorPhoto': {'thumbnails': [{'url': 'https://???.ggpht.com/???/photo.jpg', 'width': 32, 'height': 32}, {'url': 'https://???.ggpht.com/???/photo.jpg', 'width': 64, 'height': 64}]},
                            #     'contextMenuEndpoint': {???},
                            #     'id': '???',
                            #     'timestampUsec': '1595943102558354',
                            #     'authorBadges': [{'liveChatAuthorBadgeRenderer': {'customThumbnail': {'thumbnails': [{'url': 'https://???.ggpht.com/???'}, {'url': 'https://???.ggpht.com/???'}]}, 'tooltip': '???', 'accessibility': {'accessibilityData': {'label': '???'}}}}],
                            #     'authorExternalChannelId': '???',
                            #     'contextMenuAccessibility': {???},
                            #     'timestampText': {'simpleText': '28.42'}
                            # }
                            yield {'liveChatTextMessageRenderer': add_chat['liveChatTextMessageRenderer']}
                        elif 'liveChatPaidMessageRenderer' in add_chat:
                            # {
                            #     'id': '???',
                            #     'timestampUsec': '1595941482934178',
                            #     'authorName': {'simpleText': '???'},
                            #     'authorPhoto': {'thumbnails': [{'url': 'https://???.ggpht.com/???/photo.jpg', 'width': 32, 'height': 32}, {'url': 'https://???.ggpht.com/???/photo.jpg', 'width': 64, 'height': 64}]},
                            #     'purchaseAmountText': {'simpleText': '200\xa0¥'},
                            #     'message': {'runs': [
                            #         {'text': '???'},
                            #         {'emoji': {'emojiId': '???', 'shortcuts': [':???:'], 'searchTerms': ['???'], 'image': {'thumbnails': [{'url': 'https://???.ggpht.com/???', 'width': 24, 'height': 24}, {'url': 'https://???.ggpht.com/???', 'width': 48, 'height': 48}], 'accessibility': {'accessibilityData': {'label': ':???:'}}}, 'isCustomEmoji': True}},
                            #         {'text': '???'}
                            #     ]},
                            #     'headerBackgroundColor': 4278237396,
                            #     'headerTextColor': 4278190080,
                            #     'bodyBackgroundColor': 4278248959,
                            #     'bodyTextColor': 4278190080,
                            #     'authorExternalChannelId': '???',
                            #     'authorNameTextColor': 3003121664,
                            #     'contextMenuEndpoint': {???},
                            #     'timestampColor': 2147483648,
                            #     'contextMenuAccessibility': {???},
                            #     'timestampText': {'simpleText': '1.58'}
                            # }
                            yield {'liveChatPaidMessageRenderer': add_chat['liveChatPaidMessageRenderer']}
                        elif 'liveChatMembershipItemRenderer' in add_chat:
                            # {
                            #     'id': '???',
                            #     'timestampUsec': '1595941068503043',
                            #     'timestampText': {'simpleText': '-4:50'},
                            #     'authorExternalChannelId': '???',
                            #     'headerSubtext': {'runs': [{'text': '???'}]},
                            #     'authorName': {'simpleText': '????'},
                            #     'authorPhoto': {'thumbnails': [{'url': 'https://???.ggpht.com/???/photo.jpg', 'width': 32, 'height': 32}, {'url': 'https://???.ggpht.com/???/photo.jpg', 'width': 64, 'height': 64}]},
                            #     'authorBadges': [{'liveChatAuthorBadgeRenderer': {'customThumbnail': {'thumbnails': [{'url': 'https://???.ggpht.com/???'}, {'url': 'https://???.ggpht.com/???'}]}, 'tooltip': '???', 'accessibility': {'accessibilityData': {'label': '???'}}}}],
                            #     'contextMenuEndpoint': {???},
                            #     'contextMenuAccessibility': {???}
                            # }
                            yield {'liveChatMembershipItemRenderer': add_chat['liveChatMembershipItemRenderer']}
                        # irrelevant
                        elif 'liveChatViewerEngagementMessageRenderer' in add_chat:
                            pass
                        elif 'liveChatPlaceholderItemRenderer' in add_chat:
                            pass
                        else:
                            debug('Unrecognized action item', add_chat)
                    # tickers out of scope for now
                    elif 'addLiveChatTickerItemAction' in sub_action:
                        pass
                    else:
                        debug('Unrecognized sub_action', sub_action)
            else:
                debug('Unrecognized action', action)

        continuation_id = live_chat_continuation['continuations'][0]['liveChatReplayContinuationData']['continuation']

for action in iter_actions(sys.argv[1]):
    print(json.dumps(action, ensure_ascii=False))

edit: updated code to handle superchat and membership messages
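
For anyone who only wants a plain text log, here is a small sketch that turns one row of the script's output into a readable line. It assumes the row structure shown in the comments of the script above (message.runs, authorName.simpleText, timestampText.simpleText); the function names runs_to_text and format_line are made up for this example, and the sample row is invented.

```python
import json


def runs_to_text(message):
    """Join the 'runs' of a message, using the first shortcut for emoji."""
    parts = []
    for run in message.get('runs', []):
        if 'text' in run:
            parts.append(run['text'])
        elif 'emoji' in run:
            shortcuts = run['emoji'].get('shortcuts') or []
            parts.append(shortcuts[0] if shortcuts else run['emoji'].get('emojiId', ''))
    return ''.join(parts)


def format_line(json_line):
    """Format one output row as '[timestamp] author: text'."""
    obj = json.loads(json_line)
    # Each row holds exactly one renderer, keyed by its type.
    renderer = next(iter(obj.values()))
    author = renderer.get('authorName', {}).get('simpleText', '?')
    timestamp = renderer.get('timestampText', {}).get('simpleText', '?')
    # Membership items carry no 'message', so their text is empty here.
    text = runs_to_text(renderer.get('message', {}))
    return '[{}] {}: {}'.format(timestamp, author, text)


sample = json.dumps({'liveChatTextMessageRenderer': {
    'message': {'runs': [{'text': 'hi '}, {'emoji': {'shortcuts': [':wave:']}}]},
    'authorName': {'simpleText': 'viewer'},
    'timestampText': {'simpleText': '28:42'},
}})
print(format_line(sample))
```

To process a whole file, call format_line on each line of the output file produced by the script.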


@JomSpoons JomSpoons commented Aug 2, 2020

> Also interested in this, made a simple POC script in python that iterates all regular messages in a video and prints them to stdout. You can test it with

Thank you so much for this, it works and it's a huge help to me. I really hope we can have something similar to this implemented in youtube-dl soon.

@siikamiika siikamiika linked a pull request that will close this issue Aug 5, 2020
5 of 9 tasks complete

@siikamiika siikamiika commented Aug 10, 2020

There's a PR open if anyone wants to test, and I made a converter that generates niconico-style rolling chat in the ASS/SSA subtitle format to be used offline: https://github.com/siikamiika/scripts/tree/master/danmaku
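
For readers unfamiliar with the ASS/SSA format, here is a minimal sketch of how a chat message with a millisecond offset could become a subtitle event. This is not how the linked converter works; it only shows the H:MM:SS.cc timestamp format and the field order of a Dialogue line. The function names, the "Default" style, and the 5 second display duration are assumptions, and there is no rolling effect.

```python
def ass_timestamp(offset_ms):
    """Convert milliseconds to the ASS time format H:MM:SS.cc (centiseconds)."""
    centis = max(0, int(offset_ms)) // 10
    hours, rem = divmod(centis, 360000)
    minutes, rem = divmod(rem, 6000)
    seconds, centis = divmod(rem, 100)
    return '{}:{:02d}:{:02d}.{:02d}'.format(hours, minutes, seconds, centis)


def ass_dialogue(offset_ms, text, duration_ms=5000):
    """Build one Dialogue event: Layer, Start, End, Style, Name, margins, Effect, Text."""
    start = int(offset_ms)
    return 'Dialogue: 0,{},{},Default,,0,0,0,,{}'.format(
        ass_timestamp(start),
        ass_timestamp(start + duration_ms),
        text.replace('\n', '\\N'),  # ASS uses \N for a hard line break
    )


print(ass_dialogue(28420, 'hello chat'))
```

A complete file would also need the [Script Info], [V4+ Styles] and [Events] section headers before any Dialogue lines.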
