Commit 803a609

Updated get_video_info() to work with new YouTube HTML formatting
1 parent 542cc36, commit 803a609

2 files changed, +10 -15 lines changed

web-scraping/youtube-extractor/README.md

+6 -11
@@ -8,22 +8,17 @@ To run this:
 **Output:**
 ```
 Title: Me at the zoo
-Views: 106602383
-Published at: 23/04/2005
+Views: 172639597
+Published at: 2005-04-23
 Video Duration: 0:18
 Video tags: me at the zoo, jawed karim, first youtube video
-Likes: 3825489
-Dislikes: 111818
+Likes: 8188077
+Dislikes: 191986
 
-Description: The first video on YouTube. Maybe it's time to go back to the zoo?
-
-NEW VIDEO LIVE! https://www.youtube.com/watch?v=dQw4w...
-
-
-== Ok, new video as soon as 10M subscriberz! ==
+Description: The first video on YouTube. While you wait for Part 2, listen to this great song: https://www.youtube.com/watch?v=zj82_v2R6ts
 
 
 Channel Name: jawed
 Channel URL: https://www.youtube.com/channel/UC4QobU6STFB0P71PMvOGN5A
-Channel Subscribers: 1.03M
+Channel Subscribers: 1.98M subscribers
 ```
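
The README output in this diff changes `Published at` from `23/04/2005` to `2005-04-23`, which matches the ISO-style value read from the `datePublished` meta tag. If the earlier day/month/year presentation is still wanted, a conversion could look roughly like the sketch below. The helper name `to_day_month_year` is hypothetical and not part of this commit, and the slice assumes the first ten characters of the value hold the calendar date.

```python
from datetime import datetime


def to_day_month_year(date_published):
    """Convert an ISO-style 'YYYY-MM-DD...' string to 'DD/MM/YYYY'."""
    # Assumption: the first 10 characters hold the calendar date; any
    # trailing time or timezone part is ignored.
    parsed = datetime.strptime(date_published[:10], "%Y-%m-%d")
    return parsed.strftime("%d/%m/%Y")


print(to_day_month_year("2005-04-23"))  # 23/04/2005
```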

web-scraping/youtube-extractor/extract_video_info.py

+4 -4
@@ -16,13 +16,13 @@ def get_video_info(url):
     # initialize the result
     result = {}
     # video title
-    result["title"] = soup.find("h1").text.strip()
+    result["title"] = soup.find("meta", itemprop="name")['content']
     # video views (converted to integer)
-    result["views"] = int(''.join([ c for c in soup.find("span", attrs={"class": "view-count"}).text if c.isdigit() ]))
+    result["views"] = soup.find("meta", itemprop="interactionCount")['content']
     # video description
-    result["description"] = soup.find("yt-formatted-string", {"class": "content"}).text
+    result["description"] = soup.find("meta", itemprop="description")['content']
     # date published
-    result["date_published"] = soup.find("div", {"id": "date"}).text[1:]
+    result["date_published"] = soup.find("meta", itemprop="datePublished")['content']
     # get the duration of the video
     result["duration"] = soup.find("span", {"class": "ytp-time-duration"}).text
     # get the video tags
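
The changed lines read the title, view count, description and publish date from `<meta itemprop="...">` tags instead of visible page elements. Two details may be worth a follow-up: the new `views` line stores the tag's `content` string although the adjacent comment says "converted to integer", and indexing `['content']` on a failed `soup.find(...)` would raise. The sketch below illustrates the same lookup with both points handled. It deliberately uses the standard library's `html.parser` as a stand-in for BeautifulSoup so it runs without extra packages, and `MetaItempropParser`, `parse_video_meta` and `SAMPLE_HTML` are hypothetical names with hand-written markup, not the real YouTube page.

```python
from html.parser import HTMLParser


class MetaItempropParser(HTMLParser):
    """Collect <meta itemprop="..." content="..."> pairs into a dict."""

    def __init__(self):
        super().__init__()
        self.metas = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("itemprop")
        content = attr_map.get("content")
        # Keep the first tag per itemprop that actually carries content.
        if name is not None and content is not None and name not in self.metas:
            self.metas[name] = content


def parse_video_meta(html):
    """Return the four fields touched by this commit; missing ones are None."""
    parser = MetaItempropParser()
    parser.feed(html)
    metas = parser.metas
    views = metas.get("interactionCount")
    return {
        "title": metas.get("name"),
        # Convert to int only when the value is purely numeric.
        "views": int(views) if views is not None and views.isdigit() else None,
        "description": metas.get("description"),
        "date_published": metas.get("datePublished"),
    }


# Hypothetical, hand-written snippet; the real page markup may differ.
SAMPLE_HTML = """
<html><body>
<meta itemprop="name" content="Me at the zoo">
<meta itemprop="interactionCount" content="172639597">
<meta itemprop="description" content="The first video on YouTube.">
<meta itemprop="datePublished" content="2005-04-23">
</body></html>
"""

info = parse_video_meta(SAMPLE_HTML)
print(info)
```

Returning `None` for an absent tag is one possible design choice; raising a descriptive error instead may suit a scraper that should fail loudly when the page layout changes again.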
