YouTube chat replay support #25874
@dstftw I read https://github.com/ytdl-org/youtube-dl#is-the-description-of-the-issue-itself-sufficient before writing the issue, and I believe the description meets those requirements. Could you let me know what you would like me to amend? EDIT: I have now made a few amendments, which might be what you are looking for.
Provide concrete examples with concrete URLs. There are no telepathists here.
Example: The YouTube video https://www.youtube.com/watch?v=h4M5iFLKWqU has a chat replay associated with it. Using
Data appears to be provided in JSON format from the https://www.youtube.com/live_chat_replay/get_live_chat_replay endpoint.
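As a rough illustration of how that endpoint is paged, here is a minimal sketch that builds one request URL and pulls the next continuation token out of a parsed JSON page. The query parameters (`continuation`, `playerOffsetMs`, `hidden`, `pbj`) and the JSON path are the ones used in the proof-of-concept script later in this thread; they are undocumented, so treat them as assumptions. `build_replay_url` and `next_continuation` are hypothetical helper names, and no network request is made here.

```python
from urllib.parse import urlencode

# Endpoint quoted in this thread; the query parameters mirror the POC script
# and are undocumented, so they may change without notice.
REPLAY_ENDPOINT = 'https://www.youtube.com/live_chat_replay/get_live_chat_replay'


def build_replay_url(continuation, player_offset_ms):
    """Build the URL for one page of chat replay (hypothetical helper)."""
    query = urlencode(
        {
            'continuation': continuation,
            'playerOffsetMs': int(player_offset_ms),
            'hidden': 'false',
            'pbj': '1',
        },
        # Leave '%' alone in case the token is already percent-encoded (assumption)
        safe='%',
    )
    return '{}?{}'.format(REPLAY_ENDPOINT, query)


def next_continuation(response_json):
    """Return the next continuation token from a parsed JSON page, or None.

    Assumes the layout used by the POC script:
    response -> continuationContents -> liveChatContinuation -> continuations.
    """
    chat = response_json['response']['continuationContents']['liveChatContinuation']
    for entry in chat.get('continuations', []):
        data = entry.get('liveChatReplayContinuationData')
        if data is not None:
            return data['continuation']
    return None
```

A client would alternate the two: request `build_replay_url(token, offset)`, parse the body as JSON, read the messages, then call `next_continuation` until it returns `None`.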
I would also really like the ability to download chat replays. When I stream on YouTube I tend not to put any chat on-screen because it takes up too much room, so I'd like to be able to download the replays in some form. Whether it's a simple text file or a subtitle track like Xalaxis mentioned, I just want a way to preserve the chat replays of my streams.
Also interested in this. I made a simple proof-of-concept script in Python that iterates over all regular messages in a video and prints them to stdout. You can test it with `./script.py "<video_id>" > output`, which writes many rows of JSON objects:

```python
#!/usr/bin/env python3
# Proof of concept: print every chat replay item of a YouTube video as JSON,
# one object per line. Usage: ./script.py "<video_id>" > output
import json
import re
import sys

import requests

session = requests.Session()


def requests_get(url):
    return session.get(
        url,
        headers={
            'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0',
            'Accept-Encoding': 'gzip, deflate',
        },
    )


def debug(message, details=None):
    # Diagnostics go to stderr so stdout stays pure JSON lines
    print(message, details, file=sys.stderr)


def parse_yt_initial_data(data):
    # Raw bytes pattern (rb'') so the backslash escapes reach the regex engine intact
    raw_json = re.search(rb'window\["ytInitialData"\]\s*=\s*(.*);', data).group(1)
    return json.loads(raw_json)


def get_continuation_id_initial(video_id):
    response = requests_get('https://www.youtube.com/watch?v={}'.format(video_id))
    data = parse_yt_initial_data(response.content)
    return data['contents']['twoColumnWatchNextResults']['conversationBar']['liveChatRenderer']['continuations'][0]['reloadContinuationData']['continuation']


def get_continuation_data_initial(continuation_id):
    response = requests_get('https://www.youtube.com/live_chat_replay?continuation={}'.format(continuation_id))
    return parse_yt_initial_data(response.content)


def get_continuation_data_next(continuation_id, offset):
    response = requests_get(
        'https://www.youtube.com/live_chat_replay/get_live_chat_replay'
        + '?continuation={}'.format(continuation_id)
        + '&playerOffsetMs={}'.format(offset)
        + '&hidden=false'
        + '&pbj=1'
    )
    return response.json()['response']


def iter_actions(video_id):
    continuation_id = get_continuation_id_initial(video_id)
    first = True
    offset = None
    while continuation_id is not None:
        # The first page is embedded in HTML; later pages come from the JSON
        # endpoint, requested from 5 s before the last offset seen
        data = get_continuation_data_initial(continuation_id) if first else get_continuation_data_next(continuation_id, int(offset) - 5000)
        first = False
        continuation_id = None
        live_chat_continuation = data['continuationContents']['liveChatContinuation']
        offset = None
        if 'actions' not in live_chat_continuation:
            # TODO either out of comments or no comments right now
            debug('Actions not found, exiting', live_chat_continuation)
            continue
        for action in live_chat_continuation['actions']:
            if 'replayChatItemAction' in action:
                replay_chat_item_action = action['replayChatItemAction']
                offset = replay_chat_item_action['videoOffsetTimeMsec']
                for sub_action in replay_chat_item_action['actions']:
                    if 'addChatItemAction' in sub_action:
                        add_chat = sub_action['addChatItemAction']['item']
                        if 'liveChatTextMessageRenderer' in add_chat:
                            # {
                            #     'message': {'runs': [
                            #         {'text': '???'},
                            #         {'emoji': {'emojiId': '???', 'shortcuts': [':???:'], 'searchTerms': ['???'], 'image': {'thumbnails': [{'url': 'https://???.ggpht.com/???', 'width': 24, 'height': 24}, {'url': 'https://???.ggpht.com/???', 'width': 48, 'height': 48}], 'accessibility': {'accessibilityData': {'label': ':???:'}}}, 'isCustomEmoji': True}},
                            #         {'text': '???'}
                            #     ]},
                            #     'authorName': {'simpleText': '????'},
                            #     'authorPhoto': {'thumbnails': [{'url': 'https://???.ggpht.com/???/photo.jpg', 'width': 32, 'height': 32}, {'url': 'https://???.ggpht.com/???/photo.jpg', 'width': 64, 'height': 64}]},
                            #     'contextMenuEndpoint': {???},
                            #     'id': '???',
                            #     'timestampUsec': '1595943102558354',
                            #     'authorBadges': [{'liveChatAuthorBadgeRenderer': {'customThumbnail': {'thumbnails': [{'url': 'https://???.ggpht.com/???'}, {'url': 'https://???.ggpht.com/???'}]}, 'tooltip': '???', 'accessibility': {'accessibilityData': {'label': '???'}}}}],
                            #     'authorExternalChannelId': '???',
                            #     'contextMenuAccessibility': {???},
                            #     'timestampText': {'simpleText': '28.42'}
                            # }
                            yield {'liveChatTextMessageRenderer': add_chat['liveChatTextMessageRenderer']}
                        elif 'liveChatPaidMessageRenderer' in add_chat:
                            # {
                            #     'id': '???',
                            #     'timestampUsec': '1595941482934178',
                            #     'authorName': {'simpleText': '???'},
                            #     'authorPhoto': {'thumbnails': [{'url': 'https://???.ggpht.com/???/photo.jpg', 'width': 32, 'height': 32}, {'url': 'https://???.ggpht.com/???/photo.jpg', 'width': 64, 'height': 64}]},
                            #     'purchaseAmountText': {'simpleText': '200\xa0¥'},
                            #     'message': {'runs': [
                            #         {'text': '???'},
                            #         {'emoji': {'emojiId': '???', 'shortcuts': [':???:'], 'searchTerms': ['???'], 'image': {'thumbnails': [{'url': 'https://???.ggpht.com/???', 'width': 24, 'height': 24}, {'url': 'https://???.ggpht.com/???', 'width': 48, 'height': 48}], 'accessibility': {'accessibilityData': {'label': ':???:'}}}, 'isCustomEmoji': True}},
                            #         {'text': '???'}
                            #     ]},
                            #     'headerBackgroundColor': 4278237396,
                            #     'headerTextColor': 4278190080,
                            #     'bodyBackgroundColor': 4278248959,
                            #     'bodyTextColor': 4278190080,
                            #     'authorExternalChannelId': '???',
                            #     'authorNameTextColor': 3003121664,
                            #     'contextMenuEndpoint': {???},
                            #     'timestampColor': 2147483648,
                            #     'contextMenuAccessibility': {???},
                            #     'timestampText': {'simpleText': '1.58'}
                            # }
                            yield {'liveChatPaidMessageRenderer': add_chat['liveChatPaidMessageRenderer']}
                        elif 'liveChatMembershipItemRenderer' in add_chat:
                            # {
                            #     'id': '???',
                            #     'timestampUsec': '1595941068503043',
                            #     'timestampText': {'simpleText': '-4:50'},
                            #     'authorExternalChannelId': '???',
                            #     'headerSubtext': {'runs': [{'text': '???'}]},
                            #     'authorName': {'simpleText': '????'},
                            #     'authorPhoto': {'thumbnails': [{'url': 'https://???.ggpht.com/???/photo.jpg', 'width': 32, 'height': 32}, {'url': 'https://???.ggpht.com/???/photo.jpg', 'width': 64, 'height': 64}]},
                            #     'authorBadges': [{'liveChatAuthorBadgeRenderer': {'customThumbnail': {'thumbnails': [{'url': 'https://???.ggpht.com/???'}, {'url': 'https://???.ggpht.com/???'}]}, 'tooltip': '???', 'accessibility': {'accessibilityData': {'label': '???'}}}}],
                            #     'contextMenuEndpoint': {???},
                            #     'contextMenuAccessibility': {???}
                            # }
                            yield {'liveChatMembershipItemRenderer': add_chat['liveChatMembershipItemRenderer']}
                        # irrelevant
                        elif 'liveChatViewerEngagementMessageRenderer' in add_chat:
                            pass
                        elif 'liveChatPlaceholderItemRenderer' in add_chat:
                            pass
                        else:
                            debug('Unrecognized action item', add_chat)
                    # tickers out of scope for now
                    elif 'addLiveChatTickerItemAction' in sub_action:
                        pass
                    else:
                        debug('Unrecognized sub_action', sub_action)
            else:
                debug('Unrecognized action', action)
        continuation_id = live_chat_continuation['continuations'][0]['liveChatReplayContinuationData']['continuation']


if __name__ == '__main__':
    for action in iter_actions(sys.argv[1]):
        print(json.dumps(action, ensure_ascii=False))
```

Edit: updated the code to handle superchat and membership messages.
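For the "simple text file" use case mentioned earlier, the JSON lines the script prints can be turned into a readable log. The sketch below relies only on the structures shown in the script's comments (`message.runs` with `text` or `emoji` entries, `authorName.simpleText`, `timestampUsec` in microseconds, `purchaseAmountText`, and `headerSubtext` for membership items). Rendering emoji as their first shortcut and printing timestamps in UTC are choices made for this sketch; `runs_to_text` and `format_chat_line` are hypothetical names.

```python
from datetime import datetime, timezone


def runs_to_text(message):
    """Flatten a {'runs': [...]} object into plain text.

    Text runs are kept as-is; emoji runs are replaced by their first
    shortcut (e.g. ':wave:'), falling back to the emoji id.
    """
    parts = []
    for run in message.get('runs', []):
        if 'text' in run:
            parts.append(run['text'])
        elif 'emoji' in run:
            emoji = run['emoji']
            shortcuts = emoji.get('shortcuts') or []
            parts.append(shortcuts[0] if shortcuts else emoji.get('emojiId', ''))
    return ''.join(parts)


def format_chat_line(item):
    """Render one JSON object printed by the script as a single log line."""
    # Each object has exactly one key: the renderer type
    kind, renderer = next(iter(item.items()))
    usec = int(renderer['timestampUsec'])
    when = datetime.fromtimestamp(usec // 1_000_000, tz=timezone.utc)
    author = renderer.get('authorName', {}).get('simpleText', '')
    if kind == 'liveChatMembershipItemRenderer':
        # Membership items carry their text in headerSubtext, not message
        text = runs_to_text(renderer.get('headerSubtext', {}))
    else:
        text = runs_to_text(renderer.get('message', {}))
    if kind == 'liveChatPaidMessageRenderer':
        amount = renderer.get('purchaseAmountText', {}).get('simpleText', '')
        text = '[{}] {}'.format(amount, text)
    return '{} {}: {}'.format(when.strftime('%Y-%m-%d %H:%M:%S'), author, text)
```

To convert a saved `output` file, read it line by line, `json.loads` each line, and print `format_chat_line` of the result.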
Thank you so much for this; it works and it's a huge help to me. I really hope something similar gets implemented in youtube-dl soon.
There's a PR open if anyone wants to test, and I made a converter that generates niconico-style rolling chat in the ASS/SSA subtitle format to be used offline: https://github.com/siikamiika/scripts/tree/master/danmaku
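The core of any chat-to-ASS conversion is turning a video offset in milliseconds into an ASS `Dialogue` event. This is a minimal sketch of that step, not the linked converter: it emits static lines with a fixed duration instead of niconico-style scrolling. It assumes each message's `videoOffsetTimeMsec` is carried alongside it; the POC script reads that value into `offset` but does not currently output it. `ass_timestamp`, `dialogue_line`, and the 5-second default duration are hypothetical choices.

```python
# Characters that ASS treats specially in dialogue text, mapped to
# full-width look-alikes so chat messages render literally (a design choice).
_ASS_SPECIAL = str.maketrans({'{': '\uff5b', '}': '\uff5d', '\\': '\uff3c', '\n': ' '})


def ass_timestamp(ms):
    """Format milliseconds as an ASS/SSA time value: H:MM:SS.cc."""
    centis = max(0, int(ms)) // 10
    hours, centis = divmod(centis, 360000)
    minutes, centis = divmod(centis, 6000)
    seconds, centis = divmod(centis, 100)
    return '{}:{:02d}:{:02d}.{:02d}'.format(hours, minutes, seconds, centis)


def dialogue_line(offset_ms, text, duration_ms=5000, style='Default'):
    """Build one [Events] line showing text for duration_ms from offset_ms."""
    start = ass_timestamp(offset_ms)
    end = ass_timestamp(offset_ms + duration_ms)
    # Field order: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
    return 'Dialogue: 0,{},{},{},,0,0,0,,{}'.format(start, end, style, text.translate(_ASS_SPECIAL))
```

A complete `.ass` file also needs `[Script Info]`, a `[V4+ Styles]` section defining the `Default` style, and an `[Events]` section with a matching `Format:` line before these `Dialogue:` lines.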
Description
YouTube now offers "chat replay" for recorded livestreams, in the same style as Twitch, whose chat replay youtube-dl can already extract as a "subtitle". It would be useful for youtube-dl to support the same extraction for YouTube since, as on Twitch, chat can form a very important part of a livestream. As far as I can see, youtube-dl has no existing support or similar option for this.
The Python library https://github.com/taizan-hokuto/pytchat may be useful for implementing this.
Among other formats, it supports JSON output, which could be returned as a new "subtitle" track, in the same style as the Twitch chat replay.
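youtube-dl's info dict describes `subtitles` as a mapping from a tag (usually a language code) to a list of subformats, each with an `ext` plus either `url` or `data`. A minimal sketch of exposing chat replay that way, with the collected items serialized as a single JSON array, could look like this. The tag `live_chat` and the helper name `build_chat_subtitles` are assumptions for illustration, not an existing youtube-dl convention.

```python
import json


def build_chat_subtitles(chat_items, key='live_chat'):
    """Wrap chat items as a youtube-dl style 'subtitles' entry.

    Each subformat holds 'ext' plus the file contents in 'data'; the tag
    'live_chat' is a hypothetical choice for this sketch.
    """
    # ensure_ascii=False keeps non-ASCII chat text (e.g. '¥') readable
    data = json.dumps(list(chat_items), ensure_ascii=False)
    return {key: [{'ext': 'json', 'data': data}]}
```

An extractor could then set `info['subtitles'] = build_chat_subtitles(items)`, so that the existing subtitle download options would write the chat alongside the video.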
Use case example: archiving a YouTube channel, including all metadata. Currently the chat replay is not saved, so there is no context for parts of the video that refer to the chat.