[YouTube] Add "most replayed" aka heatmap data #3888

Closed
5 of 6 tasks
niklas-englert opened this issue May 27, 2022 · 23 comments
Labels
PR-needed Features that maintainers will not work on; but PRs are welcome site-enhancement Feature request for some website

Comments

@niklas-englert

niklas-englert commented May 27, 2022

Checklist

  • I'm reporting a site feature request
  • I've verified that I'm running yt-dlp version 2022.05.18 (update instructions) or later (specify commit)
  • I've checked that all provided URLs are playable in a browser with the same IP and same login details
  • I've searched the bugtracker for similar issues including closed ones. DO NOT post duplicates
  • I've read the guidelines for opening an issue
  • I've read about sharing account credentials and I'm willing to share it if required

Region

Germany

Example URLs

https://www.youtube.com/watch?v=Z8Z51no1TD0

Description

Since May 18, YouTube has been rolling out a feature that adds a "most replayed" graph (internal name seems to be "heatmap") to the progress bar, after experimenting with it for at least two years. (see this tweet)

The data for this new feature doesn't seem to be extracted right now. For the moment I'm laboriously working around that with a self-written web extension. I hope what I've found out so far is helpful:
YouTube's implementation on the web page is relatively straightforward and easy to extract (using an extension inject). An SVG tag on the page (svg.ytp-heat-map-svg, viewBox 1000x100) contains a path defined with cubic Bézier curves (a C followed by three x,y pairs).
Every third x,y pair after a C, where x ends with 5.0, is a usable data point:
x is the timestamp as a fraction of the video. Just compute (x-5)/1000 for a value from 0 to 1.
y is the heat value for this time period. Just compute (100-y)/100 for a value from 0 to 1.
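
The rule described above can be sketched in Python. This is only a minimal sketch, assuming the path has the same layout as in the example video (one Bézier segment per data point, no chapters); `parse_heatmap_path` and `PAIR` are hypothetical names chosen for illustration, not part of yt-dlp.

```python
import re

# Matches one "x,y" coordinate pair, allowing negative and decimal values.
PAIR = re.compile(r'(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?)')


def parse_heatmap_path(d):
    """Turn the `d` attribute of a heat-map path into (time, heat) pairs in 0..1."""
    points = []
    # Everything before the first "C" is the initial "M" move command.
    for segment in d.split('C')[1:]:
        pairs = PAIR.findall(segment)
        if len(pairs) < 3:
            continue  # not a complete cubic Bézier segment
        x, y = pairs[2]  # third pair = end point of the curve
        if not x.endswith('5.0'):
            continue  # skip the closing segments at x = 1000.0
        time = round((float(x) - 5) / 1000, 3)
        heat = round((100 - float(y)) / 100, 3)
        points.append((time, heat))
    return points


sample = ('M 0.0,100.0 C 1.0,80.0 2.0,5.6 5.0,0.0 '
          'C 8.0,-5.6 11.0,57.4 15.0,71.8 '
          'C 1001.0,92.0 1000.0,98.0 1000.0,100.0')
print(parse_heatmap_path(sample))
```

The string check on `x` mirrors the "x ends with 5.0" rule literally, so it would need rethinking for paths whose data points sit at other x positions.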

Example:

Here's the SVG tag for this video:

<svg class="ytp-heat-map-svg" height="100%" preserveAspectRatio="none" version="1.1" viewBox="0 0 1000 100" width="100%" style="height: 40px;">
  <defs>
    <clipPath id="hm_1_0">
      <path class="ytp-heat-map-path" d="M 0.0,100.0 C 1.0,80.0 2.0,5.6 5.0,0.0 C 8.0,-5.6 11.0,57.4 15.0,71.8 C 19.0,86.1 21.0,71.3 25.0,71.8 C 29.0,72.2 31.0,73.0 35.0,74.0 C 39.0,75.1 41.0,76.3 45.0,77.0 C 49.0,77.7 51.0,77.0 55.0,77.5 C 59.0,78.0 61.0,79.0 65.0,79.7 C 69.0,80.4 71.0,80.9 75.0,81.0 C 79.0,81.1 81.0,80.3 85.0,80.2 C 89.0,80.1 91.0,80.3 95.0,80.7 C 99.0,81.0 101.0,81.5 105.0,81.8 C 109.0,82.2 111.0,82.1 115.0,82.4 C 119.0,82.6 121.0,82.7 125.0,82.9 C 129.0,83.1 131.0,83.5 135.0,83.4 C 139.0,83.4 141.0,82.7 145.0,82.4 C 149.0,82.2 151.0,82.3 155.0,82.3 C 159.0,82.2 161.0,82.7 165.0,82.1 C 169.0,81.4 171.0,79.3 175.0,78.9 C 179.0,78.6 181.0,80.2 185.0,80.4 C 189.0,80.5 191.0,79.7 195.0,79.7 C 199.0,79.6 201.0,80.4 205.0,80.1 C 209.0,79.8 211.0,78.5 215.0,78.2 C 219.0,77.9 221.0,78.5 225.0,78.8 C 229.0,79.0 231.0,79.6 235.0,79.6 C 239.0,79.6 241.0,78.6 245.0,78.7 C 249.0,78.8 251.0,79.8 255.0,80.0 C 259.0,80.2 261.0,79.8 265.0,79.8 C 269.0,79.8 271.0,79.9 275.0,80.1 C 279.0,80.2 281.0,80.7 285.0,80.6 C 289.0,80.5 291.0,79.7 295.0,79.5 C 299.0,79.3 301.0,79.9 305.0,79.8 C 309.0,79.7 311.0,78.9 315.0,78.8 C 319.0,78.8 321.0,79.2 325.0,79.5 C 329.0,79.8 331.0,80.2 335.0,80.2 C 339.0,80.3 341.0,80.0 345.0,79.8 C 349.0,79.6 351.0,79.4 355.0,79.3 C 359.0,79.2 361.0,79.3 365.0,79.2 C 369.0,79.2 371.0,79.1 375.0,79.0 C 379.0,79.0 381.0,79.1 385.0,79.1 C 389.0,79.1 391.0,79.3 395.0,79.0 C 399.0,78.7 401.0,78.0 405.0,77.7 C 409.0,77.5 411.0,77.7 415.0,77.8 C 419.0,77.9 421.0,78.3 425.0,78.3 C 429.0,78.3 431.0,78.1 435.0,77.9 C 439.0,77.6 441.0,77.3 445.0,77.1 C 449.0,76.8 451.0,77.0 455.0,76.7 C 459.0,76.4 461.0,75.7 465.0,75.5 C 469.0,75.4 471.0,76.3 475.0,76.1 C 479.0,75.9 481.0,74.9 485.0,74.4 C 489.0,73.8 491.0,73.6 495.0,73.3 C 499.0,73.0 501.0,72.8 505.0,72.6 C 509.0,72.5 511.0,72.8 515.0,72.5 C 519.0,72.1 521.0,71.2 525.0,70.9 C 529.0,70.6 531.0,70.7 535.0,70.8 C 539.0,70.9 541.0,71.3 545.0,71.2 C 549.0,71.1 551.0,70.2 555.0,70.3 C 559.0,70.3 561.0,71.3 
565.0,71.6 C 569.0,71.8 571.0,71.5 575.0,71.5 C 579.0,71.6 581.0,71.7 585.0,71.8 C 589.0,71.9 591.0,72.0 595.0,71.9 C 599.0,71.8 601.0,71.2 605.0,71.1 C 609.0,70.9 611.0,71.0 615.0,71.1 C 619.0,71.1 621.0,71.3 625.0,71.4 C 629.0,71.5 631.0,71.7 635.0,71.7 C 639.0,71.7 641.0,71.6 645.0,71.4 C 649.0,71.3 651.0,71.1 655.0,71.1 C 659.0,71.1 661.0,71.2 665.0,71.4 C 669.0,71.6 671.0,72.1 675.0,72.0 C 679.0,71.9 681.0,71.1 685.0,70.9 C 689.0,70.7 691.0,71.2 695.0,71.0 C 699.0,70.9 701.0,70.4 705.0,70.2 C 709.0,70.0 711.0,70.1 715.0,70.0 C 719.0,70.0 721.0,70.1 725.0,70.0 C 729.0,69.9 731.0,69.6 735.0,69.7 C 739.0,69.7 741.0,70.2 745.0,70.5 C 749.0,70.8 751.0,71.0 755.0,71.0 C 759.0,71.0 761.0,70.5 765.0,70.6 C 769.0,70.7 771.0,71.4 775.0,71.6 C 779.0,71.7 781.0,71.3 785.0,71.4 C 789.0,71.4 791.0,71.6 795.0,71.9 C 799.0,72.1 801.0,72.7 805.0,72.8 C 809.0,73.0 811.0,72.7 815.0,72.8 C 819.0,72.8 821.0,72.7 825.0,73.1 C 829.0,73.5 831.0,74.0 835.0,74.8 C 839.0,75.7 841.0,76.5 845.0,77.1 C 849.0,77.8 851.0,77.7 855.0,78.1 C 859.0,78.5 861.0,78.7 865.0,79.0 C 869.0,79.4 871.0,79.6 875.0,79.8 C 879.0,80.0 881.0,80.0 885.0,80.2 C 889.0,80.3 891.0,80.2 895.0,80.5 C 899.0,80.9 901.0,81.1 905.0,81.8 C 909.0,82.5 911.0,83.1 915.0,84.1 C 919.0,85.1 921.0,85.9 925.0,87.1 C 929.0,88.2 931.0,89.4 935.0,90.0 C 939.0,90.6 941.0,90.0 945.0,90.0 C 949.0,90.0 951.0,90.0 955.0,90.0 C 959.0,90.0 961.0,90.0 965.0,90.0 C 969.0,90.0 971.0,90.0 975.0,90.0 C 979.0,90.0 981.0,90.0 985.0,90.0 C 989.0,90.0 992.0,90.0 995.0,90.0 C 998.0,90.0 999.0,88.0 1000.0,90.0 C 1001.0,92.0 1000.0,98.0 1000.0,100.0" fill="white" fill-opacity="0.6"></path>
    </clipPath>
  </defs>
  <rect class="ytp-heat-map-graph" clip-path="url(#hm_1_0)" fill="white" fill-opacity="0.2" height="100%" width="100%" x="0" y="0"></rect>
  <rect class="ytp-heat-map-hover" clip-path="url(#hm_1_0)" height="100%" x="0" y="0"></rect>
  <rect class="ytp-heat-map-play" clip-path="url(#hm_1_0)" height="100%" x="0" y="0"></rect>
</svg>

...and boiled down data:

"heatmap": [[0,1],[0.01,0.282],[0.02,0.282],[0.03,0.26],[0.04,0.23],[0.05,0.225],[0.06,0.203],[0.07,0.19],[0.08,0.198],[0.09,0.193],[0.1,0.182],[0.11,0.176],[0.12,0.171],[0.13,0.166],[0.14,0.176],[0.15,0.177],[0.16,0.179],[0.17,0.211],[0.18,0.196],[0.19,0.203],[0.2,0.199],[0.21,0.218],[0.22,0.212],[0.23,0.204],[0.24,0.213],[0.25,0.2],[0.26,0.202],[0.27,0.199],[0.28,0.194],[0.29,0.205],[0.3,0.202],[0.31,0.212],[0.32,0.205],[0.33,0.198],[0.34,0.202],[0.35,0.207],[0.36,0.208],[0.37,0.21],[0.38,0.209],[0.39,0.21],[0.4,0.223],[0.41,0.222],[0.42,0.217],[0.43,0.221],[0.44,0.229],[0.45,0.233],[0.46,0.245],[0.47,0.239],[0.48,0.256],[0.49,0.267],[0.5,0.274],[0.51,0.275],[0.52,0.291],[0.53,0.292],[0.54,0.288],[0.55,0.297],[0.56,0.284],[0.57,0.285],[0.58,0.282],[0.59,0.281],[0.6,0.289],[0.61,0.289],[0.62,0.286],[0.63,0.283],[0.64,0.286],[0.65,0.289],[0.66,0.286],[0.67,0.28],[0.68,0.291],[0.69,0.29],[0.7,0.298],[0.71,0.3],[0.72,0.3],[0.73,0.303],[0.74,0.295],[0.75,0.29],[0.76,0.294],[0.77,0.284],[0.78,0.286],[0.79,0.281],[0.8,0.272],[0.81,0.272],[0.82,0.269],[0.83,0.252],[0.84,0.229],[0.85,0.219],[0.86,0.21],[0.87,0.202],[0.88,0.198],[0.89,0.195],[0.9,0.182],[0.91,0.159],[0.92,0.129],[0.93,0.1],[0.94,0.1],[0.95,0.1],[0.96,0.1],[0.97,0.1],[0.98,0.1],[0.99,0.1]]
// or just
"heatmap": [1,0.282,0.282,0.26,0.23,0.225,0.203,0.19,0.198,0.193,0.182,0.176,0.171,0.166,0.176,0.177,0.179,0.211,0.196,0.203,0.199,0.218,0.212,0.204,0.213,0.2,0.202,0.199,0.194,0.205,0.202,0.212,0.205,0.198,0.202,0.207,0.208,0.21,0.209,0.21,0.223,0.222,0.217,0.221,0.229,0.233,0.245,0.239,0.256,0.267,0.274,0.275,0.291,0.292,0.288,0.297,0.284,0.285,0.282,0.281,0.289,0.289,0.286,0.283,0.286,0.289,0.286,0.28,0.291,0.29,0.298,0.3,0.3,0.303,0.295,0.29,0.294,0.284,0.286,0.281,0.272,0.272,0.269,0.252,0.229,0.219,0.21,0.202,0.198,0.195,0.182,0.159,0.129,0.1,0.1,0.1,0.1,0.1,0.1,0.1]
@niklas-englert niklas-englert added site-enhancement Feature request for some website triage Untriaged issue labels May 27, 2022
@niklas-englert niklas-englert changed the title Please extract YouTube's heatmap data [YouTube] Please extract YouTube's heatmap data May 27, 2022
@niklas-englert niklas-englert changed the title [YouTube] Please extract YouTube's heatmap data [YouTube] Add "most replayed" aka heatmap data May 27, 2022
@coletdjnz
Member

For us the data is available in the initial data under playerOverlays

You also have the decorations for given timestamps (e.g. most replayed)

Not sure how we would extract this. Might need a new field?

@Lesmiscore
Contributor

I think some porn website (I don't remember exactly which one) has a feature like the heatmap, so it may be worth defining a new field

@niklas-englert
Author

niklas-englert commented May 28, 2022

@Lesmiscore The implementation on the web app is a one-to-one copy. It's just uncanny.

The way it uses an <svg> element that has rectangles using a <path> over a <clipPath> as a mask is just wayyy too similar.

  1. There's no need for <clipPath> to be in a <defs>. <clipPath> can totally be on its own.
  2. Hell, they could just use the <path> on its own. Why the rectangles? On YouTube the entire use of a <clipPath> is unnecessary, they could just use <path> directly and change its styling dynamically... but not if you're PH and have a more static approach to things, for them it makes sense to use clipped rectangles.
  3. Talking about the entire approach of using SVG... They could have used plain CSS clip-path: path('M 0 200 ... z'), sooo much easier. But CSS clip-path wasn't supported when PH first implemented it back then.
  4. OMG, although YouTube exclusively uses kebab-case for IDs and classes in their DOM, they use IDs in snake_case (id="hm_1_0") for this one.

I bet money that YouTube's front-end Dev went like *clicks on search bar* P *autocompletes* *clicks on a video* F12 "ok, so that's what we're going with..." Ctrl+C

@Lesmiscore
Contributor

YT renders that SVG from the initial data as explained in #3888 (comment)

@coletdjnz coletdjnz removed the triage Untriaged issue label May 28, 2022
@pukkandan
Member

We could extract the raw data and put it in a new field, but what would users do with it?

@niklas-englert
Author

niklas-englert commented May 28, 2022

but what would users do with it?

@pukkandan It's soft data, but there's still a lot you can do with it:

  1. On one hand, it massively helps with finding the relevant or iconic parts of a video. This could be anything: bass drop, plot twist, announcement of a date, a joke, a certain clip.
  2. On the other hand, it helps with isolating the irrelevant parts of a video, like an ad or a filler scene.
  3. In combination with the view count, it helps with estimating the watch time.
  4. In combination with chapters, you can check whether the given chapters line up with the user watch time and assign a certain relevance to them.
  5. You can also check whether chapters were specified correctly or if e.g. the ad overlaps into the next section, or does not appear in the chapters at all.
  6. You can analyze the user interaction. Some examples with music videos:

@pukkandan
Member

pukkandan commented May 28, 2022

I understand how this data is useful when shown overlaid on the video. But obv, yt-dlp can't do that. We can only extract the data and put it in a field in the .info.json. To be clear, I am not against doing this. My point is just that the data does not appear to be useful in this form (#3888 (comment)). Before implementing this, we should come up with some format (1) from which the useful information can be extracted easily and (2) that can be generalized to other sites as needed.

We shouldn't repeat the same mistake on this that yt-dlc made with live chat downloads. Implementing the live chat as a subtitle has made it quite difficult to expand this to other sites. Furthermore, by making it download the raw content, it made it difficult for any third parties to extract useful info out of the live chat.

When implementing this, I just want to avoid similar mistakes. Once the feature is implemented and released, it is assumed that third parties may depend on it. Any changes after that point will be bound by backward compatibility requirements.

@pukkandan
Member

On a related note, has anyone requested this feature to any video player (mpv/mpc/vlc)? I would like to see how they want to handle this

@niklas-englert
Author

niklas-englert commented May 28, 2022

@pukkandan, I agree with you, the layout should be well thought out. But I found no standards, drafts or suggestions on how to format this. Not for any file format or player.

There also seems to be no obvious standard across proprietary software. P***Hub calls this feature "hotspots". It seems they just yeet the data to the front-end by embedding it into the HTML in a JavaScript variable called flashvars. It's just a plain list of watch-time counters for every 5 seconds of the video (incremented by the pings that their player sends every 5 seconds):

"hotspots":["2126473","1509162","1415385","1263371","1130396","1063220","1032573","1012590","1049435","1042380","1047685","1097360","1098579","1046719","1120822","1071058","1024836","1004157","1010109","1007019","1009978","1014976","1088071","1141994","1142713","1115550","1004629","909680","921539","865001","830299","827111","831371","803551","788714","755455","728531","699063","665349","642931","619397","716136","663634","661081","651885","598485","556105","549820","580571","677124","697647","763739","828103","779439","780161","821711","782140","745356","702643","676545","675660","718809","755582","766731","695390","665713","637099","633781","641765","630462","618966","610812","590461","581882","556716","549597","536563","578955","554676","593583","607611","721988","627043","619569","621223","609902","594560","588217","569446","638016","552857","549362","517154","508932","492234","495604","469092","467487","468587","516710","556376","659476","727488","774049","702127","653983","588627","548224","505261","471961","445253","397921"]

Naming it heatmap in the .info.json would make the most sense, as that's the only scientific term for this. But standardizing the rest seems to be hard:
How to handle normalized (0-100%) as well as absolute values for the markers? Include both? If yes, should the normalized value be calculated if missing or just not provided?
If we include "start" and "end"(/"duration"), what if a site doesn't provide it? Should that be calculated or just not be provided as well?

One of the possible formats could look like this:

"heatmap": [{
   "start": 0,          // Start of the marker in seconds. Always provided.
                        // Might be calculated with: video duration ÷ number of all markers · index of this marker
   "end": 4.321,        // End of the marker in seconds. Always provided.
                        // Might be calculated with: video duration ÷ number of all markers · (index of this marker + 1)
   "normalized": 0.81,  // Normalized heat value from 0 to 1. Always provided.
                        // Might be calculated with: absolute value of this marker ÷ biggest absolute value of all markers
   "absolute": 2126473, // Absolute heat value. Might not be provided at all.
  // ... add more attributes for markers in the future here
}, ...],
"heatmapMeta": { // ... add more data about the heat map in the future here
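
The "might be calculated with" comments in the proposal can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the site only supplies a flat list of absolute counters (like the hotspots list above) plus the video duration, and `build_heatmap` is a hypothetical helper name, not an existing yt-dlp function.

```python
def build_heatmap(absolute_values, duration):
    """Build marker dicts in the proposed layout from raw per-interval counters."""
    counts = [int(value) for value in absolute_values]  # counters arrive as strings
    if not counts:
        return []
    step = duration / len(counts)  # video duration ÷ number of all markers
    peak = max(counts) or 1        # avoid dividing by zero if every counter is 0
    return [
        {
            'start': index * step,        # step · index of this marker
            'end': (index + 1) * step,    # step · (index of this marker + 1)
            'normalized': count / peak,   # absolute value ÷ biggest absolute value
            'absolute': count,
        }
        for index, count in enumerate(counts)
    ]


markers = build_heatmap(['2126473', '1509162', '1415385'], duration=15)
print(markers[0])
```

For a site that only sends normalized values, the same function shape would apply with the 'absolute' key left out.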


@Benjamin-Loison

Benjamin-Loison commented Sep 8, 2022

I don't want to make an inappropriate remark but I have been contacted about this thread so I thought I may help some people in immediate need: my open-source YouTube operational API is able to retrieve the most replayed data from a YouTube video from its id by fetching https://A_USUAL_INSTANCE/videos?part=mostReplayed&id=VIDEO_ID (note that the official instance yt.lemnoslife.com is currently detected by Google as unusual).

@WillianAgostini

To extract data youtube-heatmap


@guifeliper

guifeliper commented Dec 5, 2022

Are we going anywhere with this data?
I'm interested in getting this data, and I'm available to code and to help decide what the data should look like.
In my view, we should define a point for each second, similar to the answer here, but with the normalized data for each second.

@pukkandan, I agree with you, the layout should be well thought out. But I found no standards, drafts or suggestions on how to format this. Not for any file format or player.

There also seems to be no obvious standard throughout proprietary software. P***Hub calls this feature "hotspots". It seems they just yeet the data to the front-end by embedding it into HTML using a JavaScript variable called flashvars. Just a plain list of watch time counters for every 5 seconds of the video (incremented by the pings that are send every 5 seconds by their player):

"hotspots":["2126473","1509162","1415385","1263371","1130396","1063220","1032573","1012590","1049435","1042380","1047685","1097360","1098579","1046719","1120822","1071058","1024836","1004157","1010109","1007019","1009978","1014976","1088071","1141994","1142713","1115550","1004629","909680","921539","865001","830299","827111","831371","803551","788714","755455","728531","699063","665349","642931","619397","716136","663634","661081","651885","598485","556105","549820","580571","677124","697647","763739","828103","779439","780161","821711","782140","745356","702643","676545","675660","718809","755582","766731","695390","665713","637099","633781","641765","630462","618966","610812","590461","581882","556716","549597","536563","578955","554676","593583","607611","721988","627043","619569","621223","609902","594560","588217","569446","638016","552857","549362","517154","508932","492234","495604","469092","467487","468587","516710","556376","659476","727488","774049","702127","653983","588627","548224","505261","471961","445253","397921"]

Naming it heatmap in the .info.json would make the most sense, as that's the only scientific term for this. But standardizing the rest seems to be hard: How to handle normalized (0-100%) as well as absolute values for the markers? Include both? If yes, should the normalized value be calculated if missing or just not provided? If we include "start" and "end"(/"duration"), what if a site doesn't provide it? Should that be calculated or just not be provided as well?

One of the possible formats could look like this:

"heatmap": [{
   "start": 0,          // Start of the marker in seconds. Always provided.
                        // Might be calculated with: video duration ÷ number of all markers · index of this marker
   "end": 4.321,        // End of the marker in seconds. Always provided.
                        // Might be calculated with: video duration ÷ number of all markers · (index of this marker + 1)
   "normalized": 0.81,  // Normalized heat value from 0 to 1. Always provided.
                        // Might be calculated with: absolute value of this marker ÷ biggest absolute value of all markers
   "absolute": 2126473, // Absolute heat value. Might be not provided at all.
  // ... add more attributes for markers in the future here
}, ...],
"heatmapMeta": { // ... add more data about the heat map in the future here

I didn't understand the calculation. Can someone help a dumb guy? If I understand it, I can implement it.

EDIT: ok I got the instructions.

@pukkandan
Member

I think niklas-englert's proposal is mostly good.

  • end/normalized can be generated by core code automatically
  • Do we need normalized? It can easily be calculated by the program consuming the data (imo not needed)
    • Alternately, do we need absolute? What does it actually mean? Number of times the point has been watched? Some magic number Youtube came up with that means nothing to us un-normalized?
  • Should we have a duration similar to chapters? (imo not needed)
  • What's heatmapMeta supposed to be?

Note that once we add a field, it can never be removed/changed due to compatibility requirements. But any field that we skip can always be added back in future if needed

@niklas-englert
Author

niklas-englert commented Dec 5, 2022

My suggestion was designed to provide a standard able to represent heatmap data in a uniform way as well as to include all additional information that other sites provide as well as information that may come in the future.

  • Do we need normalized? It can easily be calculated by the program consuming the data (imo not needed)
    • Alternately, do we need absolute? What does it actually mean? Number of times the point has been watched? Some magic number Youtube came up with that means nothing to us un-normalized?

For YT (and for now) we can only provide normalized attributes as YT only sends 0%-100% (=normalized) heatmap values. absolute should only be added if YT (or another yt-dlp supported video service) binds an actual view counter (=absolute) to the heatmap points. (That would be conceivable for the future: CornHub and the heatmap from YouTube Analytics provide such absolute values.)

  • Should we have a duration similar to chapters? (imo not needed)

Well, you could drop the start and end attributes... but the start and duration in milliseconds are provided in YT's own data... so if YT itself thinks it's necessary, I personally would include them as well.
At least it helps to distinguish whether the measurements were taken at a certain point in time (e.g. at exactly 12:40 only 12% of the viewers were left) or whether a time span was averaged (e.g. only 12% were watching between 12:00 and 13:20).

  • What's heatmapMeta supposed to be?

Can be completely omitted at the moment. I just wanted to demonstrate what could be done if more data than just the points of the heatmap are provided.

PS: I'm still looking forward to any implementation in yt-dlp. I have been receiving non-stop private messages asking for a solution ever since I opened the original issue on youtube-dl (50+ and still counting). Since I took my public email off GitHub, it's gotten a little better. But this is a compromise that I would like to reverse as soon as possible.

@guifeliper

guifeliper commented Dec 6, 2022

About the duration and normalized: IMO the duration is necessary. I believe we don't need the start and end, because the heatmap will probably cover most of the video, even if we sometimes have some peaks (mountains) in the heatmap.
The normalization is a good idea, IMHO, because it turns the heatmap into coordinates of time and value.

This is what I propose:

"heatmap": [{
   "time": 0,          // the second related to the heatmap 
   "normalized": 0.81,  // Normalized heat value from 0 to 1.
}, ...],

Edit: I'm having a look at the code at the moment, and the theory about

Every third x,y parameter after a C, where x ends with 5.0, is a usable data point:

is not usable when the video has chapters. What is the rationale for the 5.0? Can you explain? Maybe I will find out about the chapters.
Here is the example for this video: https://www.youtube.com/watch?v=_lEzN8C5c7k

[
  [
    'M 0.0,100.0',
    '25.0,92.1 50.0,66.3 125.0,60.3',
    '200.0,54.4 275.0,66.4 375.0,70.4',
    '475.0,74.4 525.0,79.4 625.0,80.2',
    '725.0,81.1 800.0,75.7 875.0,74.6',
    '950.0,73.4 975.0,69.5 1000.0,74.6',
    '1025.0,79.6 1000.0,94.9 1000.0,100.0'
  ],
  [
    'M 0.0,100.0',
    '0.0,94.9 -10.0,79.6 0.0,74.6',
    '10.0,69.5 20.0,74.7 50.0,74.6',
    '80.0,74.4 110.0,75.0 150.0,73.9',
    '190.0,72.8 210.0,69.6 250.0,68.9',
    '290.0,68.2 310.0,70.0 350.0,70.4',
    '390.0,70.8 410.0,67.9 450.0,70.9',
    '490.0,73.9 510.0,81.4 550.0,85.2',
    '590.0,88.9 610.0,89.7 650.0,89.6',
    '690.0,89.5 710.0,88.3 750.0,84.7',
    '790.0,81.2 810.0,76.1 850.0,72.0',
    '890.0,67.9 920.0,65.8 950.0,64.2',
    '980.0,62.7 990.0,57.1 1000.0,64.2',
    '1010.0,71.4 1000.0,92.8 1000.0,100.0'
  ],
  [
    'M 0.0,100.0',
    '0.0,92.8 -14.3,71.4 0.0,64.2',
    '14.3,57.1 28.6,62.1 71.4,64.2',
    '114.3,66.4 157.1,71.9 214.3,74.9',
    '271.4,77.8 300.0,80.2 357.1,79.0',
    '414.3,77.9 442.9,70.1 500.0,69.1',
    '557.1,68.0 585.7,72.6 642.9,73.7',
    '700.0,74.9 728.6,81.4 785.7,74.7',
    '842.9,68.1 885.7,47.2 928.6,40.3',
    '971.4,33.5 985.7,28.4 1000.0,40.3',
    '1014.3,52.3 1000.0,88.1 1000.0,100.0'
  ],
  ...
]

@guifeliper

Following the approach from @Benjamin-Loison and @WillianAgostini, I made the following gist in Python.
https://gist.github.com/guifeliper/c8c79d312ea258c1ad776d2cbd919620

This gist does not use the methodology explained here, as I was not able to completely translate the logic to videos with chapters. Instead, the gist fetches the HTML from a link and searches for the specific script that contains ytInitialData; inside this data we can retrieve the original data from YouTube.
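
The "search the HTML for the script that contains ytInitialData" step could look roughly like this. It is a sketch, not the gist's actual code: it assumes the page assigns a JSON object literal to `ytInitialData` inside a script tag, and `extract_initial_data` is a hypothetical helper name. Fetching the page itself is left out.

```python
import json
import re


def extract_initial_data(html):
    """Return the object assigned to ytInitialData in the page HTML, or None."""
    match = re.search(r'ytInitialData\s*=\s*', html)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (e.g. ";</script>").
    data, _end = json.JSONDecoder().raw_decode(html, match.end())
    return data


page = '<script>var ytInitialData = {"playerOverlays": {"a": [1, 2]}};</script>'
print(extract_initial_data(page))
```

Using `raw_decode` avoids having to guess where the object ends with a regular expression, which tends to break on nested braces.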

You can also see the same code in JavaScript, which is my original version. I have made a Python version that runs, but with a few errors in puppeteer. As Python is not my "native" language and I only do it as a hobby, I will leave it for the experts to fix.

https://github.com/guifeliper/yt-heatmap#readme
output example from JS:

[
  {
    timeRangeStartSeconds: 0,
    markerDurationSeconds: 11.25,
    heatMarkerIntensityScoreNormalized: 1
  },
  {
    timeRangeStartSeconds: 11.25,
    markerDurationSeconds: 11.25,
    heatMarkerIntensityScoreNormalized: 0.07961592756046977
  },
  ....
]
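
As a rough illustration of how a consumer might use markers shaped like the output above, the following Python sketch converts them to (start, end, value) spans and picks the hottest one. The field names are taken from the output example; `to_spans` and `hottest_span` are hypothetical helper names.

```python
def to_spans(markers):
    """Convert markers shaped like the output above to (start, end, value) tuples in seconds."""
    return [
        (
            marker['timeRangeStartSeconds'],
            marker['timeRangeStartSeconds'] + marker['markerDurationSeconds'],
            marker['heatMarkerIntensityScoreNormalized'],
        )
        for marker in markers
    ]


def hottest_span(markers):
    """Return the (start, end, value) tuple with the highest value, or None if empty."""
    spans = to_spans(markers)
    return max(spans, key=lambda span: span[2]) if spans else None


markers = [
    {'timeRangeStartSeconds': 0, 'markerDurationSeconds': 11.25,
     'heatMarkerIntensityScoreNormalized': 1},
    {'timeRangeStartSeconds': 11.25, 'markerDurationSeconds': 11.25,
     'heatMarkerIntensityScoreNormalized': 0.07961592756046977},
]
print(hottest_span(markers))
```

In both examples in this thread the first marker has the value 1, so a real consumer may want to ignore it before looking for the peak.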

@aleksejrs

aleksejrs commented Feb 13, 2023

There is an abandoned project that creates a timebar of thumbnails and other data, with a script that rendered it in MPV: https://github.com/nordlicht/nordlicht

Related to that project, YouTube also has a way to show several seekbar thumbnails at the same time, and there was in the past a userscript to show them all at the same time as a gallery. It would be nice to be able to get something like nordlicht/nordlicht#67 (comment) from them (click the image if it is slow to load).

@pukkandan
Member

@aleksejrs How's that related to this issue? yt-dlp can already download storyboards (thumbnails in timebar) [1]. This post is about heatmap (see image in OP), not storyboards

Footnotes

  1. And further processing of the images is out of scope for us

@aleksejrs

aleksejrs commented Feb 13, 2023

@pukkandan The program is related by having a script for MPV (I don't know if the script still works though).

How do I learn about the existence of storyboards (they seem to be mentioned in the format list, so the word appears a lot in issues, with no details) and how to use them?

@pukkandan
Member

How do I learn about the existence of storyboards and how to use them?

They are listed in -F and can be selected with -f for downloading. If you have further questions, pls open a new issue

@Benjamin-Loison

Benjamin-Loison commented Sep 14, 2023

I don't know if it's a YouTube-side change or if I just found a video with this behavior, but sometimes (after multiple webpage refreshes) MX5GkDRIdno ends up with two Most replayed segments (see attachments), which leads to null being printed by the following Python code snippet.

import json

from yt_dlp import YoutubeDL

with YoutubeDL() as ydl:
    info_dict = ydl.extract_info('https://www.youtube.com/watch?v=MX5GkDRIdno', download=False)
    # Prints "null" when the info dict has no heatmap data
    print(json.dumps(info_dict.get('heatmap'), indent=4))

Instead of considering:

ytInitialData['playerOverlays']['playerOverlayRenderer']['decoratedPlayerBarRenderer']['decoratedPlayerBarRenderer']['playerBar']['multiMarkersPlayerBarRenderer']['markersMap'][-1]['value']['heatmap']['heatmapRenderer']

we have to consider:

ytInitialData['frameworkUpdates']['entityBatchUpdate']['mutations'][0]['payload']['macroMarkersListEntity']['markersList']

Note that the data structure doesn't just go from an element to an array, as for instance intensityScoreNormalized is renamed to heatMarkerIntensityScoreNormalized.
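
One way to cope with both locations is to try each path in turn and report which one resolved. The following is a minimal Python sketch: the two paths are copied from the lines above, while `dig`, `find_heatmap_container` and the 'overlay'/'mutation' labels are hypothetical names for illustration. It makes no assumption about what lies below either path, so any field renaming would still have to be handled by the caller.

```python
OVERLAY_PATH = (
    'playerOverlays', 'playerOverlayRenderer', 'decoratedPlayerBarRenderer',
    'decoratedPlayerBarRenderer', 'playerBar', 'multiMarkersPlayerBarRenderer',
    'markersMap', -1, 'value', 'heatmap', 'heatmapRenderer',
)
MUTATION_PATH = (
    'frameworkUpdates', 'entityBatchUpdate', 'mutations', 0,
    'payload', 'macroMarkersListEntity', 'markersList',
)


def dig(data, path):
    """Follow dict keys / list indexes in order; return None if any step is missing."""
    for step in path:
        try:
            data = data[step]
        except (KeyError, IndexError, TypeError):
            return None
    return data


def find_heatmap_container(initial_data):
    """Return (label, container) for the first path that resolves, else (None, None)."""
    for label, path in (('overlay', OVERLAY_PATH), ('mutation', MUTATION_PATH)):
        container = dig(initial_data, path)
        if container is not None:
            return label, container
    return None, None


# Synthetic stand-in for ytInitialData that only has the second location.
sample = {'frameworkUpdates': {'entityBatchUpdate': {'mutations': [
    {'payload': {'macroMarkersListEntity': {'markersList': {'placeholder': True}}}},
]}}}
print(find_heatmap_container(sample))
```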

I am also currently managing this issue in my YouTube operational API.
