
Add a spam check #11217

Merged
Gargron merged 12 commits into master from feature-spam-check on Jul 13, 2019

Conversation

@Gargron (Member) commented Jun 30, 2019

When we receive a remote status that mentions local users who are not following that account (and it's not a response to something involving the sender), we want to check whether a similar message has already been sent to different recipients. For this, we store hashes of the sender's 10 most recent messages, where each hash is based on the normalized body of the status.

To normalize the body, we remove all mention links, then remove all HTML tags. We normalize the Unicode, convert everything to lowercase, and finally strip out all whitespace and line breaks.
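
For illustration only, here is a rough sketch of that normalization in plain Ruby (not the exact implementation; the regexes are simplified stand-ins for the real HTML handling):

```ruby
# Simplified sketch: drop mention links, drop remaining HTML tags,
# normalize Unicode, lowercase, and remove all whitespace.
def normalize_body(html)
  text = html.gsub(/<a[^>]+class="[^"]*mention[^"]*"[^>]*>.*?<\/a>/m, '') # remove mention links
  text = text.gsub(/<[^>]+>/, '')                                         # remove remaining HTML tags
  text.unicode_normalize(:nfkc).downcase.gsub(/\s+/, '')                  # normalize, lowercase, strip whitespace
end

normalize_body('<p>Hey <a href="..." class="u-url mention">@alice</a>, Buy   CHEAP stuff!</p>')
# => "hey,buycheapstuff!"
```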

The hash is generated using the locality-sensitive Nilsimsa algorithm, so slightly different inputs produce very similar outputs. Two hashes can then be compared using the Nilsimsa Compare Value, a number between -128 and 128, where 128 indicates identical strings. This paper indicates that a threshold of 54 is mostly free of false positives.

Of course, the shorter the messages are, the more inaccurate this algorithm becomes. For this reason, for messages shorter than 10 characters, we fall back on the MD5 algorithm.
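
A minimal sketch of that fallback, assuming the `nilsimsa` gem's `Nilsimsa.new(text).hexdigest` interface (the constant and method names here are illustrative, not necessarily those used in the PR):

```ruby
require 'digest'
require 'nilsimsa' # locality-sensitive hashing gem

NILSIMSA_MIN_SIZE = 10 # illustrative: below this length the digest is too noisy

# Locality-sensitive Nilsimsa digest for normal-length messages,
# exact MD5 digest for very short ones.
def compute_hash(normalized_text)
  if normalized_text.size < NILSIMSA_MIN_SIZE
    Digest::MD5.hexdigest(normalized_text)
  else
    Nilsimsa.new(normalized_text).hexdigest
  end
end
```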


A limitation of this approach is that the first people to receive a given spam message will still see it.


A positive spam check auto-silences the offender and generates an automated report to keep a human in the loop. Unsilencing the account raises its trust level to 1, which prevents further spam checks on it.

@Gargron force-pushed the feature-spam-check branch 4 times, most recently from c4b5340 to 320eb3a on June 30, 2019 at 19:36
@Gargron marked this pull request as ready for review on June 30, 2019 at 19:36
@ClearlyClaire (Contributor) commented Jun 30, 2019

Interesting. I have a hard time figuring out how well it would perform, though, so I'm very uneasy with automated silencing based on a single “near-duplicate” post.
The flagging part could report the actual status if that one is public/unlisted. I'm not sure the flagging part should flag private toots or DMs…

EDIT: Also, this is for toots coming from remote instances. We might want something similar for local users?

@ClearlyClaire (Contributor) commented

Hm, this seems very likely to silence+report people who reply to non-followers with short messages.
This also sounds like it would be a problem for customer service-like accounts who would provide similar answers to similar questions.
And I'm very worried about accidentally leaking private/direct toots.

@ClearlyClaire (Contributor) commented

I would also add a comment to the report, making it very clear that it is an automated report and what the criteria were. Otherwise this is going to be really confusing.

@ClearlyClaire (Contributor) commented Jun 30, 2019

Also, automatically silencing the offending account can be unexpected, especially if you skip creating a report (e.g. when there is already an unresolved report).

@Gargron (Member, Author) commented Jun 30, 2019

I would also add a comment to the report, making it very clear that it is an automated report and what the criteria were.

But is this going to be a localized string? Based on DEFAULT_LOCALE?

@ClearlyClaire (Contributor) commented

I would also add a comment to the report, making it very clear that it is an automated report and what the criteria were.

But is this going to be a localized string? Based on DEFAULT_LOCALE?

Yes, probably

@Gargron (Member, Author) commented Jun 30, 2019

And I'm very worried about accidentally leaking private/direct toots.

Please do be mindful of the wording. By "leaking" you mean exposing them to the mods/admins of the instance that receives them within the admin UI. If we don't auto-report such toots while auto-silencing, it will lead to false positives never being corrected. If we exempt private/direct toots from the spam check, the spammers will switch to them, making this whole exercise pointless.

@ClearlyClaire (Contributor) commented

Yes, leaking to the admins. Since it's automated, we really risk disclosing private toots to them. We can still open an automated report without including the status if it is private (and make a note of that in the report comments).

@trinity-1686a commented Jun 30, 2019

I don't do Ruby so I don't understand most of the code, but you seem to use the Levenshtein distance. Quoting a paper about Nilsimsa:

To determine if two messages present the same textual content, their Nilsimsa digests are compared, checking the number of bits in the same position that have the same value.

The Levenshtein distance fails to account for the same-position condition. For instance, a hash of "01010101" and one of "10101010" have nothing in common according to the required metric, but they are at a Levenshtein distance of only 2 (one insertion, one deletion).

@Gargron (Member, Author) commented Jun 30, 2019

Thank you for pointing it out! The Ruby example I was following seems to be really wrong on that point, then. So the appropriate comparison is to simply iterate over one digest, check each character for a match in the other digest at the same position, and count up when there is a match?

@trinity-1686a commented

The proper way would be to do that at the bit level; doing it at the character level means the actual threshold is somewhere between 10 and 40 (assuming a hex string), or between 10 and 80 (assuming any character in an 8-bit encoding), instead of the fixed 10 you chose. In a lower-level language I would XOR the raw hashes and count the 1s, but that might not be very idiomatic in Ruby.
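
For reference, the bit-level comparison described here can be done with Ruby's standard library by XORing the raw digest bytes and counting set bits; the Nilsimsa Compare Value is then 128 minus the number of differing bits. A sketch under those assumptions, not the code in this PR:

```ruby
# Nilsimsa Compare Value from two 64-character hex digests:
# unpack to raw bytes, XOR position by position, count differing bits,
# and subtract from 128 (range -128..128, where 128 means identical digests).
def nilsimsa_compare_value(digest_a, digest_b)
  bytes_a = [digest_a].pack('H*').bytes
  bytes_b = [digest_b].pack('H*').bytes
  differing_bits = bytes_a.zip(bytes_b).sum { |a, b| (a ^ b).to_s(2).count('1') }
  128 - differing_bits
end

# Two hashes would then count as a match when the value reaches the chosen threshold:
# nilsimsa_compare_value(hash_one, hash_two) >= threshold
```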

@kaniini (Contributor) commented Jul 1, 2019

FWIW, in my own testing I found the Nilsimsa strategy to yield too many false positives.

@Gargron (Member, Author) commented Jul 1, 2019

EDIT: Also, this is for toots coming from remote instances. We might want something similar for local users?

The biggest spam vector in the fediverse is "other instances", not the local server. Local moderation is already quite sufficient with human moderators and approval-mode registrations being available.

fwiw in my own testing, I found Nilsimsa strategy to yield too many false positives.

I have found TLSH to be unusable for our purposes due to its 256-character minimum requirement; other algorithms either do not have Ruby bindings or require new system-wide dependencies to be installed.

I intend to mitigate false positives by limiting the circumstances under which the spam check is carried out, i.e. it only runs when local users are mentioned, they are not following the author, and the author is not simply responding to something that mentions them.

Furthermore, human moderators are in the loop thanks to an automatic report, so false positives can be noticed and addressed. I am introducing account trust levels so that unsilencing the account once will ensure the same spam check will not hit it again.
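
For illustration, the gating described two paragraphs above could look roughly like the following sketch; the predicate and helper names are hypothetical, not the actual methods in this PR:

```ruby
# Only run the spam check under the narrow circumstances described above.
def spam_check_applicable?(status, sender)
  mentioned = status.mentioned_local_accounts                               # hypothetical helper
  return false if mentioned.empty?                                          # must mention local users
  return false if mentioned.all? { |account| account.following?(sender) }   # recipients already follow the sender
  return false if status.reply? && status.thread_mentions?(sender)          # it's a response to something mentioning the sender

  true
end
```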

@ClearlyClaire (Contributor) commented

The biggest spam vector in the fediverse is "other instances", not the local server. Local moderation is already quite sufficient with human moderators and approval-mode registrations being available.

In nearly all cases of the recent spam waves, the offending accounts were hosted on Mastodon instances. Having the spam check run for local users would catch them with the same criteria, but faster.

@@ -88,13 +88,17 @@ def auto_silence_account!
   end

   def auto_report_status!
-    ReportService.new.call(Account.representative, @account, status_ids: [@status.id]) unless @account.targeted_reports.unresolved.exists?
+    ReportService.new.call(Account.representative, @account, status_ids: @status.distributable? ? [@status.id] : nil, comment: I18n.t('spam_check.spam_detected_and_silenced')) unless @account.targeted_reports.unresolved.exists?
Contributor commented on the diff:

This will still silence without sending a report if a report is already open for that account.

Member Author replied:

Let's say someone else has reported that account for spamming already. Not silencing just because of that doesn't make sense to me. If you ask me, the presence of any report would be enough to highlight the fact that the account is silenced, since that is displayed on the reports screen.

@Mstrodl commented Jul 1, 2019

@Gargron What happens if spam instances just use a new account to become a different "sender" each time? Won't that completely avoid your checks? Or, by "10 most recent messages by the sender", do you mean the whole instance when you say "sender"?

@ClearlyClaire (Contributor) commented

@Mstrodl this is based on accounts and not instances. If a spammer keeps switching accounts, this will be ineffective. From what I understand, though, this is mainly aimed at stopping spam like the ongoing wave where accounts are manually(?) created across legitimate instances, so this approach can be effective.

@ClearlyClaire (Contributor) commented

A couple of remarks:

  • Whenever the spam checker kicks in, it should probably add something to the admin log
  • The expiration duration of 3 months seems pretty long

@Gargron (Member, Author) commented Jul 1, 2019

Whenever the spam checker kicks in, it should probably add something in the admin log

The admin log is based on actor-action-target, and in this case we don't have a good way of representing the system itself as the actor (this comes back to having a "system" account for more general purposes).

the expiration duration of 3 months seem pretty long

What do you think would be more reasonable? I'm trying to avoid the case where accounts send spam with a super low frequency to avoid being detected.

@nolanlawson (Contributor) commented

If I understand correctly, can't this spam filter be defeated by changing a single word in the spam message? Since that would change the checksum?

It seems like we could quickly wind up with the same kind of email spam TH4T L00KS L1K3 TH1S. Or randomly switches some words for synonyms.

Most of the time the spammers seem to include a link: go here to learn about our "movement," go here to buy our stuff, etc. So for that reason maybe

we remove all mention links

^ this should be reconsidered? Checking to see if a message keeps containing the same link to the same website may be a simpler and more effective filter. (Although since the spammer could just create multiple URL-shortened links, you'd have to follow the redirect chain to see the final URL. And filter out query params, etc.)

@Gargron (Member, Author) commented Jul 1, 2019

If I understand correctly, can't this spam filter be defeated by changing a single word in the spam message? Since that would change the checksum?

Not quite, that's what Nilsimsa is for. The checksum would be quite similar.

we remove all mention links

^ this should be reconsidered?

Mention links are @nolan, @Gargron, etc., not just any links. Mostly, the spammer has already shown that he's willing to send messages without any links, and those are just as annoying, so focusing on links is not the right path.

@bclindner (Contributor) commented

I think this is a good start, and with the Nilsimsa hashing it will probably catch copy-paste spam waves. However, I think it goes without saying that this will only do its job for a short period. I'm certain spammers will quickly pick up on this and move to harder-to-detect methods like image spam.

I think this PR could be improved by making it reactive to text in content warnings as well. That's probably the easiest escape hatch for current spam patterns.

@Mstrodl commented Jul 1, 2019

Keep in mind that many spammers will also just swap around a few words or mess with capitalization at random, which might give the message just enough difference, according to the algorithm, to let it through (e.g. hey vs hello, ur vs your, and so on). This already happens with DM bots on messaging services like Skype.

@bclindner (Contributor) commented Jul 1, 2019

As an addendum, I am concerned about the inclusion of anti-spam measures as part of Mastodon's core development scope. Spam prevention measures are a rat race, and the public development of anti-spam to be directly integrated into the next version makes it very easy to see the next move of anti-spam measures and adjust accordingly, effectively making simpler measures like these a waste of development time. Additionally, I think more complex measures would be unreasonable to feasibly maintain as part of the core application stack. Even now, I'm certain spammers have already found a way around this and have switched up their tactics.

While I am woefully unequipped to speak on Mastodon architecturally, I think the best solution here might be a way to implement reactive 3rd-party anti-spam filtering "plug-ins" within Mastodon, if possible. This would spread out and open up development of anti-spam systems that integrate directly with an instance without having to straight-up fork the repo and host a custom version, and this would also allow a level of integration the REST API's stateless design can't allow.

@Mstrodl commented Jul 1, 2019

@bclindner I like the idea of being able to point mastodon at "integrations" sort of like Github's PR integrations (CircleCI and the like). This way they can be written in any language (not just Ruby) and could potentially mean a shared database could be used across instances if someone wanted

@Gargron (Member, Author) commented Jul 1, 2019

I like the idea of being able to point mastodon at "integrations" sort of like Github's PR integrations (CircleCI and the like). This way they can be written in any language (not just Ruby) and could potentially mean a shared database could be used across instances if someone wanted

  1. If it's not part of Mastodon's own code, it will not benefit admins with little/no technical background
  2. Sharing moderation data with 3rd parties is a big privacy no-no

Spam prevention measures are a rat race, and the public development of anti-spam to be directly integrated into the next version makes it very easy to see the next move of anti-spam measures and adjust accordingly, effectively making simpler measures like these a waste of development time.

It is a rat race but that doesn't mean there's no point in raising the barrier of entry in the default installation.

@Gargron (Member, Author) commented Jul 12, 2019

This PR has now been tested in production, and the spam check adds all matched statuses to the automatic report for diagnosis (as long as those are not private). The spam check does not run for local accounts because there doesn't seem to be an agreement over whether that is desired, but that can be changed in future PRs.

@ClearlyClaire (Contributor) left a review comment

LGTM. I'm still not 100% convinced about the reliability of such a filter (both in terms of false positives and false negatives), so I'd be more comfortable if there was an option to disable it. But if it ran without issues on m.s., it can be enabled by default.

Also, this seems pretty useless for small or single-user instances; checking for spam at the source instance seems to make more sense to me (the recent spam waves were done by creating a bunch of accounts on various “trustworthy” instances that most probably would run the spam detection code, and that would catch those accounts much faster and silence them for everyone). But as you said, that can be another PR.

@Gargron merged commit 6ff67be into master on Jul 13, 2019
@Gargron deleted the feature-spam-check branch on July 21, 2019 at 01:48
hiyuki2578 pushed a commit to ProjectMyosotis/mastodon that referenced this pull request Oct 2, 2019
* Add a spam check

* Use Nilsimsa to generate locality-sensitive hashes and compare using Levenshtein distance

* Add more tests

* Add exemption when the message is a reply to something that mentions the sender

* Use Nilsimsa Compare Value instead of Levenshtein distance

* Use MD5 for messages shorter than 10 characters

* Add message to automated report, do not add non-public statuses to automated report, add trust level to accounts and make unsilencing raise the trust level to prevent repeated spam checks on that account

* Expire spam check data after 3 months

* Add support for local statuses, reduce expiration to 1 week, always create a report

* Add content warnings to the spam check and exempt empty statuses

* Change Nilsimsa threshold to 95 and make sure removed statuses are removed from the spam check

* Add all matched statuses into automatic report