Move to a more forgiving distance at safety checker #2
Moving to Jaccard distance in order to make the safety checker more forgiving with SFW prompts/images
Compared to cosine distance, Jaccard distance is considered more forgiving because it only considers the presence or absence of features. This is still not perfect for what we want, as the safety model itself is built on a lot of shaky structure, mainly a comparison of a vector against a 17-element vector, which basically is -
which in itself is not a very extensive list and does not include terms like "killing" or "blood", and is basically a CLIPVisionModel underneath.
Here [experimental: might work, might not] the Jaccard distance is calculated with some rough estimations which just worked for the simple set of data I had.
I also think that cosine distance would be more influenced by differences in vector lengths and term frequencies, so this patch makes a small change to the similarity measure. It is not meant for the long term, though!
For the wanderers
This is what Jaccard Distance logic looks like -
J(A, B) = 1 - (|A ∩ B|) / (|A ∪ B|)
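
For anyone who wants to poke at the formula above, here is a minimal Python sketch of it. This is not the code from this patch: the helper `active_features` and its `threshold` parameter are hypothetical, and they only illustrate one possible way to turn a continuous vector into a "presence or absence" set before applying the formula.

```python
from typing import Hashable, Iterable, Sequence, Set


def jaccard_distance(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """Return 1 - |A ∩ B| / |A ∪ B| for two collections treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Both sets are empty: treat them as identical.
        # This is a convention chosen for the sketch, not something from the patch.
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


def active_features(vector: Sequence[float], threshold: float = 0.0) -> Set[int]:
    """Hypothetical binarization step: indices whose value exceeds `threshold`."""
    return {i for i, value in enumerate(vector) if value > threshold}


# Plain sets: intersection {b, c} has 2 elements, union {a, b, c, d} has 4.
print(jaccard_distance({"a", "b", "c"}, {"b", "c", "d"}))  # 0.5

# Continuous vectors, binarized first (illustrative values only).
u = [0.9, 0.0, 0.3, 0.0]
v = [0.1, 0.0, 0.8, 0.5]
print(jaccard_distance(active_features(u), active_features(v)))
```

Because only membership matters after binarization, the magnitudes of `u` and `v` do not affect the result, which is the "presence or absence" behaviour described earlier.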