How To Add "date", "title", "description", & "tags" to a TEXT file? #26831

Open
syberknight opened this issue Oct 7, 2020 · 11 comments

@syberknight commented Oct 7, 2020

Checklist

  • I'm asking a question
  • I've looked through the README and FAQ for similar questions
  • I've searched the bugtracker for similar questions including closed ones

Question


i'm using youtube-dl in the Terminal on a Mac (Catalina) to download all my videos from YouTube.

i would really like to have the...
Date:
Title:
Description:
Tags:
...in a .txt file for each download, plain'n'simple :-)

i have figured out & installed homebrew to get ffmpeg & atomicparsley, and have read elsewhere that those 'could' be used to do such a thing, but cannot find nor figure out how to do it.

i know there's the "--write-info-json" option, but that's just WAY tooooo much.
i also am aware of the "--write-description" but that's obviously just the description.

alternatively, if there's a way to make the json file to ONLY include those 4 items, that would be acceptable too.

any help would be greatly appreciated!
thanks!

@syberknight added the question label Oct 7, 2020
@syberknight changed the title from "How To Add "upload_date", "title", "description", & "tags" to a TEXT file?" to "How To Add "date", "title", "description", & "tags" to a TEXT file?" Oct 7, 2020
@syberknight (author) commented Oct 9, 2020

is this possible? any other similar solutions? thx.

@Fetchinator7 commented Oct 11, 2020

Well, there are the --get-title and --get-description options, but if you want the date and tags I believe you'll need to use the json dump. Don't worry, the json isn't too crazy.
To find out what keys are available I used --dump-json and searched the result for the keywords you want.
What you're looking to do is a fairly simple operation so I elected to write a shell script.
Note: since you're on macOS you'll need to run brew install jq (as opposed to choco install jq -y on Windows 10).
Disclaimer: I've never used jq before so there's probably a more efficient way of doing this.
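If you just want to poke around and see which top-level keys the dump exposes (assuming jq is already installed), something like this works:

# Print the sorted list of top-level keys in the json dump (the URL is just an example).
youtube-dl --dump-json "https://youtu.be/QH2-TGUlwu4" | jq 'keys'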

Copy this code into a file and run it from the command line like this: path/to/script/file.sh "https://youtu.be/QH2-TGUlwu4"
(You specifically asked about getting a text file and not downloading the actual video, but if you want to do both you can just paste your download command in here on a new line.)

#!/bin/bash
# Get the video link from the command line input.
LINK_STR="$1"
TITLE_TEMPLATE="%(title)s.%(ext)s"
# Path to the folder you want to download to.
# This could be substituted with "$2" then include the output folder as the second argument when executing this file.
OUTPUT_PATH="/example/output/path"
cd "$OUTPUT_PATH" || exit

# Get the title of the video that will be downloaded to set the output filename.
# NOTE: --restrict-filenames removes any invalid filename characters from the title.
OUTPUT_TITLE="$(youtube-dl -o "$TITLE_TEMPLATE" --get-title --restrict-filenames "$LINK_STR")"

# Get the json info and put select info in a text file.
# "|" pipes the outputs of each command to the next one.
# Get a string of all the json info for the input video:
# "$(youtube-dl --dump-json $LINK_STR)"
# Make our own output object with custom keys. (See https://jqplay.org for a breakdown.)
# NOTE: tags are an array so it needs to be run through | join(", ") or else the tag writing fails.
# NOTE: the date comes through as yyyymmdd so format it by selecting substrings and inserting dashes between them.
# "jq -r '{ Date: .upload_date | (.[0:4] + "-" + .[4:6] + "-" + .[6:8]), Title: .fulltitle, Description: .description, Tags: .tags | join(", ") }'"
# Format our custom object by setting a leading "key" and "value" so we can wrap the value in quotes.
# " | jq -r 'to_entries | .[] | .key + ": \"" + .value + "\""'
# Write the output to a new file.
# > "$OUTPUT_TITLE.txt"
echo "$(youtube-dl --dump-json $LINK_STR)" | jq -r '{ Date: .upload_date | (.[0:4] + "-" + .[4:6] + "-" + .[6:8]), Title: .fulltitle, Description: .description, Tags: .tags | join(", ") }' | jq -r 'to_entries | .[] | .key + ": \"" + .value + "\""' > "$OUTPUT_TITLE.txt"

Which produces a file like this:

Date: "2011-04-05"
Title: "Nyan Cat [original]"
Description: "For PJ.

Check out Nyan Cat at http://nyan.cat/
Official Nyan Cat Facebook: http://www.facebook.com/NyanCatWorld
Nyan Cat on Twitter: https://twitter.com/nyannyancat

Nyan Cat Store: http://nyancat.cat/store.html

GIF by PRguitarman http://www.prguitarman.com/index.php?id=348
Song by Daniwell-P/Momone Momo UTAU http://momolabo.lolipop.jp/nyancatsong/Nyan/
***used with permission; I own neither***"
Tags: "defuse, banana, everything is gonna happy, pop tart, computer-graphics, flying through space, footage, mw2, sniper, montage, Modern, apple, porn, mp5k, acog, info, h3cz, nyan, leaked, hot, cat, hatsune miku, gif, modern, gameplay, sexy, 桃音モモ, Sniper, lolcomics, optic, amazing, dance music, M16, snd, ninja, animated, Warfare, meow, momone momo, rainbows, love, 50cal, annoying, UTAU, warfare"

Woah, didn't expect to see porn, sexy, and warfare on there 😆 .

@syberknight (author) commented Oct 12, 2020

Wow, Thank you @Fetchinator7

feels a little (honestly, a lot) over my head but i will give it a try.

i think you're right, given the lack of comments here & over on stack... it seems the json dump (--write-info-json) is my only option. i did that & it contains SO much other stuff that i don't know what it all is. if your script doesn't work, perhaps i can find some mac app or script that can bulk/batch extract just those parts.

what is "jq"?

i won't have time to try this for a couple days, but i'll be sure to post back when i do.

thanks for your generosity.

@syberknight (author) commented Oct 16, 2020

hi @Fetchinator7

i really do appreciate the time you took to respond to me. unfortunately, it seems to be a bit over my head :-(

i'm trying to "batch" download all the videos from my playlists & am needing to make the process as easy as possible.

i've managed to finally piece together this line that works great to do the actual downloading...
youtube-dl --ignore-errors --write-thumbnail --write-info-json --restrict-filenames -f '(bestvideo[width>=1920][ext=mp4])+bestaudio[ext=m4a]/best[ext=mp4]/best' --add-metadata --embed-thumbnail -o '/Volumes/LaCie/DUMPSTER/%(upload_date)s %(title)s.%(ext)s' https://www.youtube.com/playlist?list=XXXXXXXXXXXXXXX

but the JSON file that this produces is huge with tons of UNneeded stuff.

i've dug thru it and to my surprise discovered that the Youtube API doesn't seem to include the "Tags" in this. i haven't been able to find any way to get the tags with these videos. oh well, not the end of the world.

so, i guess since there doesn't seem to be a way to pre-process what i want into a text file, THEN...

is it possible to "batch" all the JSON files that this downloads and delete everything in them EXCEPT the Date, Title, & Description?
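(For reference, a minimal sketch of that batch idea, reusing the jq filter from the earlier comment — the path comes from the command above, and the filenames are whatever --write-info-json produced:)

#!/bin/bash
# Sketch: trim each already-written .info.json down to a Date/Title/Description text file.
cd "/Volumes/LaCie/DUMPSTER" || exit
for JSON_FILE in *.info.json; do
    # Skip the loop entirely if the glob matched nothing.
    [ -e "$JSON_FILE" ] || continue
    # Same idea as the earlier jq filter: reformat the date and keep only the wanted keys.
    jq -r '{ Date: .upload_date | (.[0:4] + "-" + .[4:6] + "-" + .[6:8]), Title: .fulltitle, Description: .description } | to_entries | .[] | .key + ": \"" + .value + "\""' "$JSON_FILE" > "${JSON_FILE%.info.json}.txt"
done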

@Fetchinator7 commented Oct 17, 2020

I added your download command so now this will generate the text files and download the videos.

I'm really sorry, I forgot I made that file executable. You need to either run
bash path/to/script/file.sh "https://www.youtube.com/playlist?list=XXXXXXXXXXXXXXX" or run
chmod +x path/to/script/file.sh before you can just run
path/to/script/file.sh "https://www.youtube.com/playlist?list=XXXXXXXXXXXXXXX"

Take out the --write-info-json since that produces an actual json file, which you said you don't want. --dump-json obtains the json info without writing it, which is how that line works.
I personally prefer to cd into the output directory (open Terminal, type cd, then drag and drop the folder onto the Terminal window and press enter), but in this case the script takes care of it, so don't include /Volumes/LaCie/DUMPSTER/ in the -o.

Remember, "tags" is all lowercase as seen in Tags: .tags.

Keep in mind that best[ext=mp4] will get the best file that already has a .mp4 extension, but if you want the best quality there's probably a .webm that's a little better. You'd need to re-encode the output, which can be done automatically, so I'll include that in the code:
youtube-dl -f bestvideo+bestaudio --recode-video mp4
I recommend putting this in your .sh file:

#!/bin/bash
# Path to the folder you want to download to.
# This could be substituted with "$2" then include the output folder as the second argument when executing this file.
OUTPUT_PATH="/Volumes/LaCie/DUMPSTER"
cd "$OUTPUT_PATH" || exit

# Temp file with the video ids on new lines.
FILENAME="playlist_video_ids.txt"
# Playlist url from command line.
PLAYLIST_URL="$1"

# Get all the video ids in the playlist and put them on new lines in a text file.
echo "$(youtube-dl --get-id "$PLAYLIST_URL")" > "$FILENAME"

# Read the text file of video ids.
ALL_LINES="$(cat "$FILENAME")"

# Download each video in the playlist individually.
for VIDEO_ID in $ALL_LINES ; 
do
    # Get the link to the video by adding the video id.
    LINK_STR="https://youtu.be/$VIDEO_ID"
    TITLE_TEMPLATE="%(upload_date)s %(title)s.%(ext)s"
    TITLE_TEMPLATE_NO_EXT="%(upload_date)s %(title)s"

    # Get the restricted filename of the video that will be downloaded to set the output filename.
    # NOTE: --restrict-filenames removes any invalid filename characters from the title.
    OUTPUT_TITLE="$(youtube-dl -o "$TITLE_TEMPLATE_NO_EXT" --restrict-filenames --get-filename "$LINK_STR")"
    # Wait for 10 seconds to avoid youtube blocking our requests.
    sleep 10

    # Run your command to download the videos in the playlist.
    YOUTUBE_DL_OUTPUT="$(youtube-dl -i -o "$TITLE_TEMPLATE" -f "(bestvideo[width>=1920])+bestaudio" --restrict-filenames --recode-video mp4 --add-metadata --embed-thumbnail --all-subs --embed-subs "$LINK_STR")"
    echo "$YOUTUBE_DL_OUTPUT"

    # Get the json info and put select info in a text file.
    # "|" pipes the outputs of each command to the next one.
    # Get a string of all the json info for the input video:
    # "$(youtube-dl --dump-json $LINK_STR)"
    # Make our own output object with custom keys. (See https://jqplay.org for a breakdown.)
    # NOTE: tags are an array so it needs to be run through | join(", ") or else the tag writing fails.
    # NOTE: the date comes through as yyyymmdd so format it by selecting substrings and inserting dashes between them.
    # "jq -r '{ Date: .upload_date | (.[0:4] + "-" + .[4:6] + "-" + .[6:8]), Title: .fulltitle, Description: .description, Tags: .tags | join(", ") }'"
    # Format our custom object by setting a leading "key" and "value" so we can wrap the value in quotes.
    # " | jq -r 'to_entries | .[] | .key + ": \"" + .value + "\""'
    # Write the output to a new file.
    # > "$OUTPUT_TITLE.txt"
    echo "$(youtube-dl --dump-json "$LINK_STR")" | jq -r '{ Date: .upload_date | (.[0:4] + "-" + .[4:6] + "-" + .[6:8]), Title: .fulltitle, Description: .description, Tags: .tags | join(", ") }' | jq -r 'to_entries | .[] | .key + ": \"" + .value + "\""' > "$OUTPUT_TITLE.txt"
done

# Delete the temporary file.
rm "$FILENAME"

Then run it from the command line:
bash path/to/script/file.sh "https://www.youtube.com/playlist?list=XXXXXXXXXXXXXXX"

NOTE: I don't think you want to keep the thumbnail, but in case you do, add this back in: --write-thumbnail. I also included the video subtitles; if you don't want those just delete this: --all-subs --embed-subs

@syberknight (author) commented Oct 18, 2020

@Fetchinator7
okay, i got "JQ" installed, put your code in a .sh file & followed your instructions. it worked... mostly!

out of 205 videos in the first playlist...

  • 2 gave an "error: did not get any data blocks" in the terminal readout & failed to finish the process for those two. they left a ".mp4.part" for the video & ".webp" for the thumbnail (which usually turn into an mp4 & jpg). i have verified that the videos are on youtube & i can download them manually.

  • 13 gave an "error: YouTube said: Unable to extract video data" in the terminal readout:

    • of which 7 were missing the text file altogether;
    • of which 6 had the text file just fine.
  • 7 gave a "jq: error (at :1): Cannot iterate over null (null)" in the terminal readout,

    • of which 2 created blank text files;
    • of which 5 seemed just fine (couldn't find any problems with these).
  • then after looking at each text file, i found 12 of them were blank (that includes the 2 mentioned above);

  • and then after comparing the numbers, i found there was 1 missing video & missing thumbnail, but that missing one did have a corresponding text file;

  • and lastly, there were 7 missing text files which did have corresponding mp4's & jpg's.

...so any idea how i can make the process more foolproof?

re: the bestvideo+bestaudio bit... i specifically landed on what i had for that because, after much trial'n'error'n'reading, i discovered those selectors look at the bitrate over pixel size. i was having a LOT of 720p versions download instead of the original 1080p version, so the way i had it written seemed to be the only way i found to obtain the original 1080p versions (then go down if that size didn't exist). would my removing your -f "(bestvideo[width>=1920])+bestaudio" --restrict-filenames --recode-video mp4 and re-adding my -f '(bestvideo[width>=1920][ext=mp4])+bestaudio[ext=m4a]/best[ext=mp4]/best' in its place cause some of the problems mentioned above?
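(As a quick sanity check before tweaking the -f selector, youtube-dl can list every format a video actually offers — the URL below is just an example:)

# List all available formats (resolution, extension, bitrate) for one video.
youtube-dl -F "https://youtu.be/QH2-TGUlwu4"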

question: in most of the titles on youtube, and within the descriptions, there are quotes. should i somehow strip those? i'm wondering if that would cause any problems IF i wanted to import all these text files into a spreadsheet, or if someday in the future there becomes a way to import these into a database of some sort. i noticed in the JSONs those inner quotes were escaped with a backslash. i get the purpose of having the whole value encased in quotes, so i probably should 'not' remove those, unless there's a better character to use besides quotes (perhaps curly brackets or backticks?) to keep the inner ones as-is for readability. your thoughts on that?

and... SUPER HAPPY that i was wrong about the Tags not being available. i must've just missed it in the json amidst the plethora of other junk :-)

again, THANK YOU SO MUCH for your help on this. i recognize it's above'n'beyond. is there some way i can compensate you for your time & expertise? ✌🏼

@syberknight (author) commented Oct 20, 2020

followup...

i got adventurous with regards to my question about replacing the surrounding quotes.
i figured out how to take this... jq -r 'to_entries | .[] | .key + ": \"" + .value + "\""'
and remove the \" on either side of the "value" part.
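(presumably the trimmed filter ends up looking something like this:)

jq -r 'to_entries | .[] | .key + ": " + .value'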

also of note, it takes a good 30-60'ish seconds to get started & more than the 10 seconds between each record. is that normal?

i'm still pretty concerned about all the errors & missing stuff i described above. am seeking to make that more foolproof before i get started on my actual project; but i may need to just get going on it & go thru each batch like i did the above to find any problems. would obviously like to avoid that hassle ongoing if possible.

any other help with all this would be greatly appreciated :-)

@Fetchinator7 commented Oct 20, 2020

@syberknight If you want to use this in a spreadsheet you should probably redo it for that format. I imagine there’s some json to spreadsheet converter that would be easier.
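If the spreadsheet route ever becomes necessary, one option worth sketching is jq's @csv filter, which handles quoting and escaping automatically — this is only a sketch, LINK_STR is the same variable as in the script above, and videos.csv is an assumed filename:

# Append one CSV row (date, title, description, tags) per video to a spreadsheet-friendly file.
youtube-dl --dump-json "$LINK_STR" | jq -r '[ (.upload_date | .[0:4] + "-" + .[4:6] + "-" + .[6:8]), .fulltitle, .description, (.tags // [] | join(", ")) ] | @csv' >> videos.csv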
I had problems with inconsistent downloads too so try increasing the 10 second delay and that should hopefully fix it.
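One way to make the text-file step more forgiving — just a sketch, reusing the LINK_STR/OUTPUT_TITLE variables and jq filter from the script above — is to retry the json dump and skip the file when it still fails, so failed videos don't leave blank .txt files behind:

# Retry the json dump up to 3 times before giving up on this video's text file.
JSON=""
for ATTEMPT in 1 2 3; do
    JSON="$(youtube-dl --dump-json "$LINK_STR")" && break
    echo "Attempt $ATTEMPT failed for $LINK_STR, waiting 30 seconds..." >&2
    sleep 30
done

# Only write the text file if the dump actually returned something.
if [ -n "$JSON" ]; then
    echo "$JSON" | jq -r '{ Date: .upload_date | (.[0:4] + "-" + .[4:6] + "-" + .[6:8]), Title: .fulltitle, Description: .description, Tags: .tags | join(", ") }' | jq -r 'to_entries | .[] | .key + ": \"" + .value + "\""' > "$OUTPUT_TITLE.txt"
else
    echo "Skipping text file for $LINK_STR after 3 failed attempts." >&2
fi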
I’ll look into this more in a few hours.

@syberknight (author) commented Oct 20, 2020

@Fetchinator7
understood. i don't necessarily need it in a spreadsheet - was just thinking of that possibility in the future, if i need a single file listing them all - but yeah, i can cross that bridge then, IF need be. the important stuff is getting it all downloaded & organized.
so THANK YOU again for your help on this.

i will try it again by increasing the 10 seconds part & see what happens & report back.

@Fetchinator7 commented Oct 21, 2020

@syberknight I feel like this is going beyond the scope of this GitHub issue so do you want to join my new discord server or message Fetchinator7#9036 on discord? Or some other platform?

@syberknight (author) commented Oct 21, 2020

@Fetchinator7
FYI, i changed the 10 seconds to 30 seconds.
this time, out of 205 records i got...
27 ERROR: YouTube said: Unable to extract video data
and 2 ERROR: Did not get any data blocks
and 18 of the text files were blank.

aside from those issues, it's working perfect ;-)

i'm so sorry, and am so appreciative of your help. i'm not sure what a discord server is but i want to do what's appropriate. so i'm game... signing up now...
