Description
Before Asking
- I have pulled the latest code from the main branch and run it again, and the problem still exists.
Search before asking
Question
I am using the image deduplication operator and would like it to take the text into account as well:
- image_deduplicator:  # deduplicator to deduplicate samples at document-level using exact matching of images between documents.
    method: phash  # hash method for image. One of [phash, dhash, whash, ahash]
    consider_text: true
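To make "exact matching" concrete, here is a minimal Python sketch of exact-match deduplication, assuming each sample has already been reduced to an image hash string plus its text. The function name exact_dedup and the (hash, text) sample representation are hypothetical, and this illustrates the general idea rather than Data-Juicer's actual implementation. Samples are duplicates only when their keys are identical, so two hashes that differ by a single bit are both kept.

```python
from typing import List, Set, Tuple

# Hypothetical sample representation: (image_hash_hex, text).
Sample = Tuple[str, str]


def exact_dedup(samples: List[Sample], consider_text: bool = False) -> List[int]:
    """Return the indices of samples kept after exact-match deduplication."""
    seen: Set[Tuple[str, ...]] = set()
    kept: List[int] = []
    for i, (image_hash, text) in enumerate(samples):
        # The key is the image hash alone, or the (image hash, text) pair
        # when text is considered; only identical keys count as duplicates.
        key = (image_hash, text) if consider_text else (image_hash,)
        if key not in seen:
            seen.add(key)
            kept.append(i)
    return kept


samples = [("a1f0", "a cat"), ("a1f0", "a cat"), ("a1f0", "a dog"), ("a1f1", "a cat")]
print(exact_dedup(samples))                      # [0, 3]
print(exact_dedup(samples, consider_text=True))  # [0, 2, 3]
```

Note that "a1f0" and "a1f1" differ by one bit yet are treated as distinct; there is no notion of "close enough" in this scheme.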
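I have two questions:
1. Can a similarity threshold be set for the image pHash?
2. When consider_text is set to true, I would also like to set a similarity threshold to test deduplication across the image and text modalities. How can this be done?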
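If the operator only supports exact matching, as its config comment says, one general way to get a tunable threshold is to compare pHash values by Hamming distance and, when text is considered, also require a minimum text similarity. This sketch assumes 64-bit pHash values already converted to Python ints (with the imagehash library, for example, str(imagehash.phash(img)) yields a hex string, and subtracting two ImageHash objects gives their Hamming distance). The names near_dedup, max_hamming and min_text_jaccard are hypothetical and are not Data-Juicer options; word-level Jaccard is just one simple choice of text similarity.

```python
from typing import List, Tuple

# Hypothetical sample representation: (phash_as_int, text).
Sample = Tuple[int, str]


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two perceptual hashes."""
    return bin(a ^ b).count("1")


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two texts (1.0 if both are empty)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def near_dedup(
    samples: List[Sample],
    max_hamming: int,
    consider_text: bool = False,
    min_text_jaccard: float = 1.0,
) -> List[int]:
    """Greedy near-duplicate removal; returns indices of kept samples.

    A sample is dropped if some already-kept sample has an image hash within
    max_hamming bits and, when consider_text is True, a text similarity of at
    least min_text_jaccard.
    """
    kept: List[int] = []
    for i, (img_hash, text) in enumerate(samples):
        is_dup = False
        for j in kept:
            other_hash, other_text = samples[j]
            if hamming(img_hash, other_hash) > max_hamming:
                continue  # images too different
            if consider_text and jaccard(text, other_text) < min_text_jaccard:
                continue  # images similar, but texts too different
            is_dup = True
            break
        if not is_dup:
            kept.append(i)
    return kept


samples = [
    (0x0F, "a cat on a mat"),
    (0x0E, "a cat on a mat"),
    (0x0E, "a dog in the park"),
    (0xFF00, "a cat on a mat"),
]
print(near_dedup(samples, max_hamming=2))  # [0, 3]
print(near_dedup(samples, max_hamming=2, consider_text=True, min_text_jaccard=0.5))  # [0, 2, 3]
```

Raising max_hamming merges more visually similar images, while lowering min_text_jaccard merges more image-text pairs whose captions only partly overlap. The pairwise loop is quadratic, so for large datasets an index over the hash bits (for example, locality-sensitive hashing) would be needed.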
Additional
No response