A Python script that generates statistics on new account creations and checks whether the users/IPs appear in stopforumspam.com's spam lists.
Developed to:
- Analyze how effective stopforumspam.com's spam lists are at flagging suspicious account creations on various Wikimedia projects.
- Serve as a basic proof-of-concept for security-monitoring tooling for certain suspicious events as they occur on various Wikimedia projects.
n.b. As it currently exists, this should NOT be considered production code.
python 3.7.3
argparse
csv
datetime
dotenv
hashlib
json
lxml
os
re
requests
sys
time
urllib.parse
zlib
git clone "https://gerrit.wikimedia.org/r/wikimedia/security/spamaccountstats"
- Configure .env to your liking - example values provided.
- chmod +x SpamAccountStats.py && ./SpamAccountStats.py {args...}
- SpamAccountStats.py has a few arguments:
  - -h = displays help/arguments and exits.
  - {project} = in the form of {lang code}.{project type}, e.g. en.wikipedia.
  - -d, --date = a date range in one of several supported formats:
    - -d {int}h = e.g. 1h, range from 1 hour ago to current UTC.
    - -d {int}d = e.g. 30d, range from 30 days ago to current UTC.
    - -d YYYY-MM-DD = range from YYYY-MM-DD to current UTC.
    - -d YYYY-MM-DD-yyyy-mm-dd = date range (UTC) from YYYY-MM-DD to yyyy-mm-dd.
    - -d YYYY-MM-DDTHH:MM:SSZ = range from YYYY-MM-DDTHH:MM:SSZ to current UTC.
    - -d YYYY-MM-DDTHH:MM:SSZ-yyyy-mm-ddThh:mm:ssZ = date range (UTC) from YYYY-MM-DDTHH:MM:SSZ to yyyy-mm-ddThh:mm:ssZ (the format the MediaWiki API tends to use).
  - -r, --raw = raw CSV report output, no informational header.
  - --sfsapi = also check the StopForumSpam API via the URL defined in the SFS_API_URL environment variable.
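The -d/--date formats listed above could be parsed roughly as follows. This is only a minimal sketch, not the parsing code in SpamAccountStats.py: the names parse_date_range and _parse_point are hypothetical, and it assumes both ends of an explicit range use the same format.

```python
import re
from datetime import datetime, timedelta, timezone

# Building blocks for the two absolute formats described above.
_DATE = r"\d{4}-\d{2}-\d{2}"
_STAMP = _DATE + r"T\d{2}:\d{2}:\d{2}Z"


def _parse_point(text):
    """Parse YYYY-MM-DD or YYYY-MM-DDTHH:MM:SSZ as an aware UTC datetime."""
    fmt = "%Y-%m-%dT%H:%M:%SZ" if "T" in text else "%Y-%m-%d"
    return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)


def parse_date_range(value, now=None):
    """Return (start, end) UTC datetimes for a -d/--date style value.

    Hypothetical helper; `now` can be injected to make results repeatable.
    """
    now = now or datetime.now(timezone.utc)

    # Relative forms: {int}h or {int}d, counted back from the current time.
    relative = re.fullmatch(r"(\d+)([hd])", value)
    if relative:
        amount, unit = int(relative.group(1)), relative.group(2)
        delta = timedelta(hours=amount) if unit == "h" else timedelta(days=amount)
        return now - delta, now

    # Absolute forms: try full timestamps first, then plain dates.
    for point in (_STAMP, _DATE):
        both = re.fullmatch("(%s)-(%s)" % (point, point), value)
        if both:
            return _parse_point(both.group(1)), _parse_point(both.group(2))
        if re.fullmatch(point, value):
            return _parse_point(value), now

    raise ValueError("unsupported date range: %r" % value)
```

For example, parse_date_range("30d") returns a pair covering the last 30 days, while parse_date_range("2019-05-01-2019-05-31") returns midnight UTC on each of the two dates.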
- Actually use logstash structured data to search for IPs instead of gross regexps of json string representations within search_user_within_logstash()
- Refactor to more proper python app.
- Support beta.wmflabs.org sites (might not be possible via logstash...)
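To illustrate the first item above, here is a rough sketch of what a structured lookup might look like, compared with running a regexp over the serialized JSON. It is not the existing search_user_within_logstash() code. It assumes the Logstash data arrives as an Elasticsearch-style search response (hits.hits[]._source), and the field names "user" and "ip", along with the function name find_ips_for_user, are hypothetical placeholders.

```python
import json


def find_ips_for_user(response, username, user_field="user", ip_field="ip"):
    """Collect IPs from records whose user field equals `username` exactly.

    `response` is assumed to be a decoded Elasticsearch-style search result;
    `user_field` and `ip_field` are placeholder names for illustration.
    """
    ips = set()
    for hit in response.get("hits", {}).get("hits", []):
        source = hit.get("_source", {})
        # Exact field comparison: "Example" does not match "Example2",
        # which a loose regexp over the raw JSON string could do.
        if source.get(user_field) == username and source.get(ip_field):
            ips.add(source[ip_field])
    return ips


# Made-up sample data using documentation-reserved IP addresses.
sample = json.loads("""
{"hits": {"hits": [
  {"_source": {"user": "Example",  "ip": "192.0.2.10"}},
  {"_source": {"user": "Example2", "ip": "192.0.2.99"}},
  {"_source": {"user": "Example",  "ip": "192.0.2.10"}},
  {"_source": {"user": "Example"}}
]}}
""")

found = find_ips_for_user(sample, "Example")
```

Comparing decoded fields avoids false matches on substrings and on values that happen to appear in unrelated keys, at the cost of needing to know which fields hold the user name and IP.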
- Scott Bassett [sbassett@wikimedia.org]
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.