Properly escape & character in ttml (xml) subtitles #21894
Is there any information I missed? Or is this request outside the scope of the project? Thanks!
Description
I recently downloaded some subtitles from Discovery and they come with a ttml extension, internally being just XML files.
After manually inspecting one that failed to convert to srt using different tools, the culprit was an & symbol that was not escaped properly (in XML it should be written as &amp;):
<p region="pop317" begin="00:07:40.133" end="00:07:45.800"><span tts:backgroundColor="black" tts:color="white">BACK AT MASELLI & SONS.</span></p>
I have no way to verify whether the file was malformed on the server or whether this was an error when youtube-dl wrote the output file, but checking that downloaded XML subtitles are properly escaped may be something that can be added to youtube-dl.
The complete XML file is here: https://pastebin.com/pVw1pb7p
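To illustrate the kind of check I have in mind, here is a minimal sketch in Python. It is not youtube-dl's actual code: the helper names `escape_bare_ampersands` and `is_well_formed` are hypothetical, and the regex assumes that any `&` not followed by a name or numeric reference ending in `;` is a stray character that should become `&amp;`. The sample wraps a shortened version of the line above in a `tt` element that declares the `tts` prefix so the standard library parser accepts it.

```python
import re
import xml.etree.ElementTree as ET

# Matches an "&" that does NOT start an entity or character reference
# such as &amp;, &#38; or &#x26;. (Assumption: anything else is a stray "&".)
_BARE_AMP = re.compile(r'&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)')


def escape_bare_ampersands(xml_text):
    """Hypothetical helper: replace stray '&' characters with '&amp;'."""
    return _BARE_AMP.sub('&amp;', xml_text)


def is_well_formed(xml_text):
    """Hypothetical helper: True if the text parses as XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Shortened sample based on the failing subtitle line.
sample = (
    '<tt xmlns:tts="http://www.w3.org/ns/ttml#styling">'
    '<p begin="00:07:40.133" end="00:07:45.800">'
    '<span tts:color="white">BACK AT MASELLI & SONS.</span></p></tt>'
)

fixed = escape_bare_ampersands(sample)
```

With this sketch, `is_well_formed(sample)` is false and `is_well_formed(fixed)` is true. An ampersand that is already escaped is left untouched, so running the fix twice does not produce `&amp;amp;`. A text such as `AT&T;` would still be mistaken for an entity reference, so this is only a best-effort repair.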