From 2381df53f2dfe4f1549f0c6de511c6c830ffeed1 Mon Sep 17 00:00:00 2001
From: tianheg
Date: Thu, 16 May 2024 09:47:05 +0800
Subject: [PATCH] blog gallery-site-with-ai-generated-captions

---
 ...ery-site-with-ai-generated-captions-en.org | 32 +++++++++++++++++++
 ...allery-site-with-ai-generated-captions.org | 32 +++++++++++++++++++
 2 files changed, 64 insertions(+)
 create mode 100644 content/posts/gallery-site-with-ai-generated-captions-en.org
 create mode 100644 content/posts/gallery-site-with-ai-generated-captions.org

diff --git a/content/posts/gallery-site-with-ai-generated-captions-en.org b/content/posts/gallery-site-with-ai-generated-captions-en.org
new file mode 100644
index 0000000000..d32219e502
--- /dev/null
+++ b/content/posts/gallery-site-with-ai-generated-captions-en.org
@@ -0,0 +1,32 @@
+#+TITLE: Gallery site with AI-generated image captions
+#+DATE: <2024-05-15 Wed 16:20>
+#+TAGS[]: 技术 Cloudflare img.tianheg.org English
+
+As someone who occasionally takes photos, I've accumulated quite a few on my phone over the years. I've wanted an online photo album for a long time but was never sure which tools to build it with. Recently I came across the repository [[https://github.com/petrovicz/astro-photoswipe][petrovicz/astro-photoswipe]] in a newsletter and liked it, so I decided to give it a try along with [[https://astro.build/][Astro]], which was new to me. As for [[https://photoswipe.com/][PhotoSwipe]], it's an old friend: I used it for a while before my attention shifted elsewhere.
+
+* Image to Text
+
+After deploying the photo album website, I noticed that the images had no captions. At first I thought about writing them myself, but with so many photos the workload seemed daunting. Then I wondered whether an AI model could convert images to text. Since the website is hosted on Cloudflare Pages, I naturally looked for models on Workers AI and found two (=@cf/llava-hf/llava-1.5-7b-hf= and =@cf/unum/uform-gen2-qwen-500m=).
+
+I considered generating captions in real time, so that clicking an image would instantly produce its description, but I couldn't find a way to do this. Testing in bulk with a Node.js script, I found that captioning around 230 images took about 2 minutes, a figure that includes requests that failed because an image was too large or because of network connection problems.
+
+With the help of AI, I managed to write this script, running into several issues along the way:
+
+1. My local network couldn't reliably connect to Cloudflare's API and frequently timed out. Solution: move the runtime environment to GitHub Actions and add a 15-second timeout.
+2. Cloudflare Workers AI has a limit of 720 requests per minute for Image to Text models. Solution: limit the maximum number of concurrent requests.
+3. How to upload all the image captions to Workers KV once they were generated. Solution: use =Promise.all()= (which combines an iterable of promises into a single promise that resolves with all of their results).
+
+Some reflections:
+
+This script may not seem like much now, but before writing it I was quite troubled, pondering how to solve the problems above. With the help of AI, I finally achieved my goal. When writing it, I followed [[https://github.com/dgurns/magic-ai-box][dgurns/magic-ai-box]] and called the REST API directly instead of using Cloudflare's [[https://github.com/cloudflare/cloudflare-typescript][official SDK]].
+
+* Remove EXIF info
+
+Two days later, I noticed an issue: the photos were all taken with a phone, and leaving the phone model and other information embedded in the images would be a significant security risk. So I wrote another script to remove all the EXIF information from the images. The general process was:
+
+1. Traverse all the images in the target folder and check whether each one still contains EXIF information.
+2. If it does not, nothing more needs to be done; if it does, remove the EXIF information.
+
+-----
+
+The code repository is on [[https://github.com/tianheg/img][GitHub]]. The website address is also there.
diff --git a/content/posts/gallery-site-with-ai-generated-captions.org b/content/posts/gallery-site-with-ai-generated-captions.org
new file mode 100644
index 0000000000..6b1fc338c6
--- /dev/null
+++ b/content/posts/gallery-site-with-ai-generated-captions.org
@@ -0,0 +1,32 @@
+#+TITLE: Adding AI-generated captions to an image site
+#+DATE: <2024-05-15 Wed 16:20>
+#+TAGS[]: 技术 Cloudflare img.tianheg.org
+
+As someone who takes photos now and then, I have built up a large collection after so many years of owning a phone. I have wanted an online album for a long time but was never sure which tools to build it with. Recently I ran into the repository [[https://github.com/petrovicz/astro-photoswipe][petrovicz/astro-photoswipe]] in a newsletter. I quite liked it, and it was also a chance to try [[https://astro.build/][Astro]], which I had not touched before. As for [[https://photoswipe.com/][PhotoSwipe]], it is an old friend: I used it for a while, then stopped when my energy went elsewhere.
+
+* Image to text
+
+Once the whole album site was deployed, I noticed that the images had no captions. My first thought was to write them by hand, but with this many images that is a lot of work, and eventually I wondered: is there an AI model that can turn images into text? The site is deployed on Cloudflare Pages, so I naturally went looking on Workers AI, which has two such models (=@cf/llava-hf/llava-1.5-7b-hf= and =@cf/unum/uform-gen2-qwen-500m=).
+
+At first I wondered whether captions could be generated in real time: I click an image and its caption appears instantly. I did not find a way to do that. Batch testing with a Node.js script showed that captioning about 230 images takes 2 minutes, including the cases where an image was too large to process or the network connection failed.
+
+I finished the script with the help of AI. Some problems I ran into:
+
+1. My local network could not connect to Cloudflare's API and kept timing out. Solution: move the runtime environment into GitHub Actions and add a 15-second timeout.
+2. Cloudflare Workers AI limits Image to Text models to 720 requests per minute. Solution: limit the maximum rate of concurrent requests.
+3. How to upload the caption text for every image to Workers KV after generating it. Solution: use =Promise.all()= (it merges the promises from an iterable into one and returns their results).
+
+Some thoughts:
+
+The script does not look like anything special now, but before it was written I had quite a headache thinking about how to solve the problems above. With the help of AI, I got what I wanted in the end. While writing it I referred to [[https://github.com/dgurns/magic-ai-box][dgurns/magic-ai-box]], which is why I used the REST API directly rather than Cloudflare's [[https://github.com/cloudflare/cloudflare-typescript][official SDK]].
+
+* Remove EXIF info
+
+Two days later I noticed a problem: the photos all come from a phone, so leaving the phone model and other embedded information in the images is a serious security risk. So I wrote another script to remove the EXIF information from every image. The rough flow:
+
+1. Traverse all the images in the target folder and check whether any EXIF information remains.
+2. If there is none, nothing needs to be done; if there is, remove the EXIF information.
+
+-----
+
+The code repository is on [[https://github.com/tianheg/img][GitHub]]. The site's URL is there too.
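
The three fixes listed under Image to Text (a 15-second timeout, a cap on concurrent requests, and =Promise.all()= for the uploads) could fit together roughly as in the sketch below. This is not the script from the repository: =withTimeout=, =mapWithLimit=, =captionAll=, and the =captionImage= / =putCaption= callbacks are hypothetical names, the concurrency of 5 is an arbitrary example value, and the actual Workers AI and Workers KV requests are deliberately left out.

```javascript
// Sketch only: all function names here are hypothetical, not from the post's repo.

// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Run `worker` over `items` with at most `limit` calls in flight at once.
// Resolves to an array of { ok, value } or { ok, error }, in input order.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  // Each "lane" pulls the next unprocessed index until none are left.
  async function lane() {
    while (next < items.length) {
      const i = next++;
      try {
        results[i] = { ok: true, value: await worker(items[i], i) };
      } catch (error) {
        results[i] = { ok: false, error };
      }
    }
  }
  const lanes = Array.from({ length: Math.min(limit, items.length) }, lane);
  await Promise.all(lanes);
  return results;
}

// Caption every file, then upload the successful captions in one batch.
// `captionImage(file)` and `putCaption(file, text)` are assumed to return
// promises; they stand in for the real Workers AI and KV requests.
async function captionAll(files, captionImage, putCaption) {
  const captions = await mapWithLimit(files, 5, (file) =>
    withTimeout(captionImage(file), 15_000)
  );
  const uploads = [];
  captions.forEach((result, i) => {
    if (result.ok) uploads.push(putCaption(files[i], result.value));
  });
  // Promise.all resolves once every upload has finished.
  await Promise.all(uploads);
  return captions;
}
```

Capping concurrency bounds how many requests are in flight at once. It does not by itself guarantee staying under a per-minute quota, so the cap would need to be chosen with typical response times in mind.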