OpenAI Sora Technical Analysis #4

Closed

zhendi opened this issue Feb 22, 2024 · 1 comment

Owner

zhendi commented Feb 22, 2024

No description provided.

github-actions bot commented Feb 22, 2024

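Sora rests on two principles: scale, and generative modeling trained at scale.
Sora first compresses video into a lower-dimensional latent space, then splits the latent representation into spacetime patches. These patches serve as the model's tokens, playing the role that text tokens play in a language model.
Sora uses an architecture called the Diffusion Transformer (DiT), which combines the strengths of Transformers with those of diffusion models.
Sora can generate high-resolution video and supports a range of resolutions and aspect ratios.
Training on video at its native aspect ratio improves the framing and composition of generated video.
Training Sora requires substantial compute; the more training time and compute invested, the better the quality of the generated video.
Sora shows several simulation capabilities: 3D scene consistency, long-range coherence and object permanence, interaction with the world, and simulation of digital worlds.
Sora's applications include enhancing visual quality, extending video in space or time, turning an image into video from a text description, and stitching or blending multiple videos.
The technical report does not provide Sora's implementation details.
For a deeper understanding of the principles behind Sora, see the 32 papers cited in the technical report, particularly [21]-[25].
For more on the Diffusion Transformer, see the paper "Scalable Diffusion Models with Transformers" by William Peebles and Saining Xie.
For a broader reflection on how humanity, after the classical era, the Middle Ages, the Renaissance, and the modern world, will understand the nature of reality and the human role in it in the age of AI, Kissinger's "The Age of AI: And Our Human Future" is recommended reading.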
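
The compress-then-patchify step described above can be made concrete with a small sketch. It is an illustration only, not Sora's implementation: the technical report does not disclose the encoder or the patch sizes, so the latent here is a dummy placeholder and the names `make_dummy_latent` and `patchify`, along with the patch sizes `pt`, `ph`, `pw`, are hypothetical. It shows one plausible way a latent video of any resolution can be cut into non-overlapping spacetime patches, each flattened into one token.

```python
from typing import List

# A latent video as nested lists indexed [time][height][width][channel].
Latent = List[List[List[List[float]]]]


def make_dummy_latent(frames: int, height: int, width: int, channels: int) -> Latent:
    """Build a placeholder latent. A real system would obtain this from a
    learned video encoder, which is not modeled here."""
    return [
        [
            [[float(t * 10000 + h * 100 + w)] * channels for w in range(width)]
            for h in range(height)
        ]
        for t in range(frames)
    ]


def patchify(latent: Latent, pt: int, ph: int, pw: int) -> List[List[float]]:
    """Cut a latent video into non-overlapping spacetime patches.

    Each patch spans pt frames, ph rows, and pw columns, and is flattened
    into a single token (a flat list of floats).
    """
    frames, height, width = len(latent), len(latent[0]), len(latent[0][0])
    if frames % pt or height % ph or width % pw:
        raise ValueError("latent dimensions must be divisible by the patch size")
    tokens = []
    for t0 in range(0, frames, pt):
        for h0 in range(0, height, ph):
            for w0 in range(0, width, pw):
                tokens.append([
                    value
                    for t in range(t0, t0 + pt)
                    for h in range(h0, h0 + ph)
                    for w in range(w0, w0 + pw)
                    for value in latent[t][h][w]
                ])
    return tokens


# Hypothetical sizes: a wider, longer clip and a smaller, shorter one.
wide = patchify(make_dummy_latent(4, 4, 8, 2), pt=2, ph=2, pw=2)
small = patchify(make_dummy_latent(2, 4, 4, 2), pt=2, ph=2, pw=2)

print(len(wide), len(wide[0]))    # 16 tokens, 16 values each
print(len(small), len(small[0]))  # 4 tokens, 16 values each
```

The token length stays fixed while the number of tokens changes with the clip's size and duration, which is one way a single sequence model could accept video at different resolutions and aspect ratios.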

zhendi closed this as completed
