blog gallery-site-with-ai-generated-captions
Showing 2 changed files with 64 additions and 0 deletions.
content/posts/gallery-site-with-ai-generated-captions-en.org
@@ -0,0 +1,32 @@
#+TITLE: Gallery site with AI-generated image captions
#+DATE: <2024-05-15 Wed 16:20>
#+TAGS[]: 技术 Cloudflare img.tianheg.org English

As someone who occasionally takes photos, I've accumulated quite a few on my phone over the years. I've wanted an online photo album for a long time but was never sure which tools to build it with. Recently I came across the repository [[https://github.com/petrovicz/astro-photoswipe][petrovicz/astro-photoswipe]] in a newsletter and liked it, so I decided to try it and, along the way, [[https://astro.build/][Astro]], which was new to me. [[https://photoswipe.com/][PhotoSwipe]], on the other hand, is an old friend: I used it for a while before my attention shifted elsewhere.

* Image to Text

After deploying the photo album site, I noticed that the images had no captions. My first thought was to write them by hand, but with so many photos the workload looked daunting. Then I wondered whether an AI model could convert images to text. Since the site is hosted on Cloudflare Pages, I naturally looked for models in Workers AI and found two: =@cf/llava-hf/llava-1.5-7b-hf= and =@cf/unum/uform-gen2-qwen-500m=.

I considered generating captions in real time, so that clicking an image would create its description on the spot, but I couldn't find a way to do that. Instead I generated them in bulk with a Node.js script: captioning around 230 images took about 2 minutes, a figure that includes requests that failed because an image was too large to process or because of network connection problems.

I wrote the script with the help of AI and ran into several issues along the way:

1. My local network couldn't reliably connect to Cloudflare's API, and requests frequently timed out. Solution: move the script to GitHub Actions and add a 15-second timeout.
2. Cloudflare Workers AI limits Image to Text models to 720 requests per minute. Solution: cap the number of concurrent requests.
3. Once the captions were generated, they all had to be uploaded to Workers KV. Solution: use =Promise.all()=, which combines an iterable of promises into a single promise that resolves with all of their results.

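The three fixes above could fit together roughly as follows. This is a minimal sketch, not the script from the repository: =withTimeout=, =mapWithConcurrency= and =uploadAll= are hypothetical helper names, and the =put= callback stands in for whatever function writes one caption to Workers KV. Note that capping in-flight requests only indirectly bounds the request rate; 720 requests per minute works out to 12 per second.

```javascript
// Hypothetical helpers (assumptions for illustration, not the author's code).

// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Run `worker` over `items` with at most `limit` calls in flight at once.
// Failures are recorded per item instead of aborting the whole batch.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  async function runner() {
    while (next < items.length) {
      const index = next++;
      try {
        results[index] = { ok: true, value: await worker(items[index], index) };
      } catch (error) {
        results[index] = { ok: false, error };
      }
    }
  }
  const runners = Array.from({ length: Math.min(limit, items.length) }, () => runner());
  await Promise.all(runners);
  return results;
}

// Upload every [key, caption] pair and wait for all uploads to finish.
// `put` is an assumed async function such as (key, text) => fetch(...).
function uploadAll(captions, put) {
  return Promise.all(captions.map(([key, text]) => put(key, text)));
}
```

A call might look like =mapWithConcurrency(files, 5, (file) => withTimeout(describe(file), 15000))=, where =describe= is an assumed function that requests one caption and the limit of 5 is an arbitrary example value.
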
Some reflections:

This script may not look like much now, but before it was written I was quite stuck on how to solve the problems above. With the help of AI, I finally reached my goal. While writing it, I followed [[https://github.com/dgurns/magic-ai-box][dgurns/magic-ai-box]] and called the REST API directly instead of using Cloudflare's [[https://github.com/cloudflare/cloudflare-typescript][official SDK]].

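For illustration, calling the REST API directly might look something like the sketch below. It is based on my understanding of Cloudflare's public REST endpoints and should be checked against the current API documentation. The function names, the =max_tokens= value, and the assumption that the caption comes back in =result.description= are not taken from the repository.

```javascript
// A sketch under stated assumptions; verify endpoints and payloads against Cloudflare's docs.
const API_BASE = "https://api.cloudflare.com/client/v4";

// Build the request for one image-to-text call (imageBytes: Uint8Array or Buffer).
function buildCaptionRequest({ accountId, apiToken, model, imageBytes, prompt }) {
  return {
    url: `${API_BASE}/accounts/${accountId}/ai/run/${model}`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      // The image is sent as a plain array of byte values; max_tokens is an arbitrary example.
      body: JSON.stringify({ image: Array.from(imageBytes), prompt, max_tokens: 256 }),
    },
  };
}

// Build the request that stores one caption in a Workers KV namespace.
function buildKvPutRequest({ accountId, apiToken, namespaceId, key, value }) {
  const encodedKey = encodeURIComponent(key);
  return {
    url: `${API_BASE}/accounts/${accountId}/storage/kv/namespaces/${namespaceId}/values/${encodedKey}`,
    init: {
      method: "PUT",
      headers: { Authorization: `Bearer ${apiToken}`, "Content-Type": "text/plain" },
      body: value,
    },
  };
}

// Send the caption request; `fetchImpl` is injectable so the call can be tested offline.
async function describeImage(options, fetchImpl = fetch) {
  const { url, init } = buildCaptionRequest(options);
  const response = await fetchImpl(url, init);
  if (!response.ok) {
    throw new Error(`Workers AI request failed: HTTP ${response.status}`);
  }
  const payload = await response.json();
  return payload.result.description; // assumed response shape
}
```

Keeping the request builders separate from the network call makes it easy to inspect the URL and body without touching the network.
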
* Remove EXIF info

Two days later I noticed a problem: all the photos were taken with a phone, and leaving the phone model and other embedded metadata in the images would be a significant security risk. So I wrote another script to strip the EXIF information from every image. The general process:

1. Traverse all the images in the target folder and check whether each one still contains EXIF information.
2. If it does not, nothing needs to be done; if it does, remove the EXIF information.

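The two steps above can be sketched without any dependencies. The code below is an illustration, not the script from the repository (which may well use a library). =stripExif= and =stripExifInFolder= are hypothetical names, and the sketch only handles JPEG files, dropping APP1 segments that begin with the =Exif= header. Other metadata such as XMP is left alone.

```javascript
const fs = require("node:fs");
const path = require("node:path");

const EXIF_HEADER = Buffer.from("Exif\0\0", "latin1");

// Return the JPEG without its Exif APP1 segments, plus how many were removed.
// Non-JPEG or malformed input is returned unchanged.
function stripExif(jpeg) {
  if (jpeg.length < 4 || jpeg[0] !== 0xff || jpeg[1] !== 0xd8) {
    return { data: jpeg, removed: 0 };
  }
  const kept = [jpeg.subarray(0, 2)]; // start-of-image marker
  let removed = 0;
  let offset = 2;
  while (offset + 4 <= jpeg.length && jpeg[offset] === 0xff) {
    const marker = jpeg[offset + 1];
    if (marker === 0xda) break; // start of scan: compressed image data follows
    const length = jpeg.readUInt16BE(offset + 2); // includes the 2 length bytes
    const end = offset + 2 + length;
    if (length < 2 || end > jpeg.length) break; // malformed: keep the rest as-is
    const isExif =
      marker === 0xe1 &&
      jpeg.subarray(offset + 4, offset + 4 + EXIF_HEADER.length).equals(EXIF_HEADER);
    if (isExif) removed++;
    else kept.push(jpeg.subarray(offset, end));
    offset = end;
  }
  kept.push(jpeg.subarray(offset)); // everything from the scan onwards
  return { data: removed > 0 ? Buffer.concat(kept) : jpeg, removed };
}

// Walk `folder` recursively and rewrite only the JPEGs that still carry EXIF data.
function stripExifInFolder(folder) {
  const changed = [];
  for (const entry of fs.readdirSync(folder, { withFileTypes: true })) {
    const fullPath = path.join(folder, entry.name);
    if (entry.isDirectory()) {
      changed.push(...stripExifInFolder(fullPath));
    } else if (/\.jpe?g$/i.test(entry.name)) {
      const { data, removed } = stripExif(fs.readFileSync(fullPath));
      if (removed > 0) {
        fs.writeFileSync(fullPath, data);
        changed.push(fullPath);
      }
    }
  }
  return changed;
}
```

Calling =stripExifInFolder("./photos")= (the folder name is only an example) returns the paths of the files that were rewritten, so a second run on the same folder should return an empty list.
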
-----

The code is on [[https://github.com/tianheg/img][GitHub]]; the website address is listed there as well.
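@@ -0,0 +1,32 @@
#+TITLE: Adding AI-generated captions to an image site
#+DATE: <2024-05-15 Wed 16:20>
#+TAGS[]: 技术 Cloudflare img.tianheg.org

As someone who takes photos now and then, I have built up a large pile of pictures over my years of owning a phone. I had wanted an online photo album for a long time but was never sure which tools to build it with. Recently I ran into the repository [[https://github.com/petrovicz/astro-photoswipe][petrovicz/astro-photoswipe]] in a newsletter and rather liked it, which also gave me a chance to try [[https://astro.build/][Astro]], something I had never touched before. As for [[https://photoswipe.com/][PhotoSwipe]], it is an old friend: I used it for a while and then stopped once my energy went elsewhere.
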
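* Image to text

Once the album site was deployed, I noticed that the images had no captions. At first I thought I could write them by hand, but with this many images that is a lot of work. Thinking it over, I asked myself whether there was an AI model that could turn images into text. The site is deployed on Cloudflare Pages, so I naturally went looking in Workers AI, which offers two models: =@cf/llava-hf/llava-1.5-7b-hf= and =@cf/unum/uform-gen2-qwen-500m=.

My original idea was to generate captions in real time, so that clicking an image would instantly produce its description, but I found no way to do that. Batch testing with a Node.js script showed that captioning roughly 230 images takes about 2 minutes, including the cases where a caption could not be generated because the image was too large or the network connection failed.

I finished the script with the help of AI. Some of the problems I ran into:

1. My local network could not connect to Cloudflare's API and timed out constantly. Solution: move the runtime environment to GitHub Actions and add a 15-second timeout.
2. Cloudflare Workers AI limits Image to Text models to 720 requests per minute. Solution: cap the maximum number of concurrent requests.
3. After obtaining the captions, the caption text for every image had to be uploaded to Workers KV. Solution: use =Promise.all()=, which merges an iterable of promises into a single promise and returns their results.

Some thoughts:

The script does not look like anything special now, but before it was written I had quite a headache working out how to solve the problems above. With the help of AI, I got where I wanted to go in the end. It was after consulting [[https://github.com/dgurns/magic-ai-box][dgurns/magic-ai-box]] that I chose to call the REST API directly rather than use Cloudflare's [[https://github.com/cloudflare/cloudflare-typescript][official SDK]].
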
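* Remove EXIF info

Two days later I noticed a problem: the pictures all come from a phone, so leaving the phone model and other information attached to them would be a serious security hazard. I therefore wrote another script to remove the EXIF information from every image. The rough flow:

1. Traverse all the images in the target folder and check whether any EXIF information remains.
2. If none remains, nothing needs to be done; if some does, remove it.

-----

The code repository is on [[https://github.com/tianheg/img][GitHub]]. The site address is there too.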