Commit
spam: adds spam prediction when a record is published
Showing 10 changed files with 357 additions and 3 deletions.
@@ -0,0 +1,76 @@
# -*- coding: utf-8 -*-
#
# This file is part of Zenodo.
# Copyright (C) 2020 CERN.
#
# Zenodo is free software; you can redistribute it
# and/or modify it under the terms of the GNU General Public License as
# published by the Free Software Foundation; either version 2 of the
# License, or (at your option) any later version.
#
# Zenodo is distributed in the hope that it will be
# useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Zenodo; if not, write to the
# Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
# MA 02111-1307, USA.
#
# In applying this license, CERN does not
# waive the privileges and immunities granted to it by virtue of its status
# as an Intergovernmental Organization or submit itself to any jurisdiction.

from flask import abort, current_app, flash
from flask_login import logout_user
from invenio_accounts.models import User
from invenio_accounts.sessions import delete_user_sessions
from invenio_db import db

from zenodo.modules.spam.utils import send_spam_admin_email, \
    send_spam_user_email


def default_spam_handling(deposit):
    """Default actions taken when a deposit is detected as spam."""
    user = User.query.get(deposit['_deposit']['owners'][0])
    user.active = False
    delete_user_sessions(user)
    logout_user()
    db.session.add(user)
    db.session.commit()
    send_spam_user_email(user.email)
    if current_app.config['ZENODO_SPAM_EMAIL_ADMINS']:
        send_spam_admin_email(deposit, user)
    flash(
        ('Our spam protection system has classified your upload as a '
         'potential spam attempt. As a preventive measure and due to a '
         'significant increase in spam, we have deactivated your '
         'user account and logged you out of Zenodo. Your upload has not been '
         'published. If you think this is a mistake, please contact our '
         'support.'),
        category='warning'
    )
    abort(
        400,
        ('Our spam protection system has classified your upload as a '
         'potential spam attempt. As a preventive measure and due to a '
         'significant increase in spam, we have deactivated your '
         'user account and logged you out of Zenodo. Your upload has not been '
         'published. If you think this is a mistake, please contact our '
         'support.'),
    )


# Function called to handle a deposit whose metadata is detected as spam
# when publishing
ZENODO_SPAM_HANDLING_ACTIONS = default_spam_handling

# Location of the joblib-serialized spam model; if unset, no prediction is made
ZENODO_SPAM_MODEL_LOCATION = None

# Probability (float between 0 and 1) above which a record is considered spam
ZENODO_SPAM_THRESHOLD = 0.5

# Whether to send an email to admins about automatically blocked users
ZENODO_SPAM_EMAIL_ADMINS = True
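The publish-time hook that consumes these settings is not among the file contents shown here. As a rough sketch, assuming the predicted probability is compared with `ZENODO_SPAM_THRESHOLD` using a strict `>` and that the configured callable receives the deposit, the decision could look like the following. `apply_spam_policy` is a hypothetical name, and a plain dict stands in for `current_app.config`.

```python
def apply_spam_policy(spam_proba, deposit, config):
    """Hypothetical helper: invoke the spam handler if the score is too high.

    ``config`` stands in for ``current_app.config`` (an assumption of this
    sketch). Returns True when the configured handler was called.
    """
    threshold = config.get('ZENODO_SPAM_THRESHOLD', 0.5)
    # Assumption: "above the threshold" means a strict comparison.
    if spam_proba > threshold:
        # In Zenodo the default handler deactivates the user and calls
        # abort(400), so it would not return normally.
        config['ZENODO_SPAM_HANDLING_ACTIONS'](deposit)
        return True
    return False


# Usage with a recording stand-in instead of default_spam_handling.
handled = []
cfg = {
    'ZENODO_SPAM_THRESHOLD': 0.5,
    'ZENODO_SPAM_HANDLING_ACTIONS': handled.append,
}
print(apply_spam_policy(0.92, {'title': 'Free prizes'}, cfg))  # True
print(apply_spam_policy(0.10, {'title': 'Survey data'}, cfg))  # False
print(len(handled))  # 1
```

Keeping the handler as a configurable callable means a deployment can swap `default_spam_handling` for something milder (for example, only flagging the deposit) without touching the prediction code.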
@@ -0,0 +1,82 @@
# -*- coding: utf-8 -*-
#
# This file is part of Zenodo.
# Copyright (C) 2020 CERN.
#
# Zenodo is free software; you can redistribute it
# and/or modify it under the terms of the GNU General Public License as
# published by the Free Software Foundation; either version 2 of the
# License, or (at your option) any later version.
#
# Zenodo is distributed in the hope that it will be
# useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Zenodo; if not, write to the
# Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
# MA 02111-1307, USA.
#
# In applying this license, CERN does not
# waive the privileges and immunities granted to it by virtue of its status
# as an Intergovernmental Organization or submit itself to any jurisdiction.

"""Spam detection module for Zenodo."""

from __future__ import absolute_import, print_function

from threading import Lock

import joblib
from celery.signals import celeryd_init
from flask import current_app
from werkzeug.utils import cached_property

from . import config, current_spam

lock = Lock()


class ZenodoSpam(object):
    """Zenodo spam detection extension."""

    @cached_property
    def model(self):
        """Spam detection model."""
        with lock:
            self._is_cache_loading = True
            if not current_app.config.get('ZENODO_SPAM_MODEL_LOCATION'):
                return None
            return joblib.load(
                current_app.config['ZENODO_SPAM_MODEL_LOCATION'])

    @property
    def is_cache_loading(self):
        """Flag set once a thread has started loading the model."""
        return getattr(self, '_is_cache_loading', False)

    def __init__(self, app=None):
        """Extension initialization."""
        if app:
            self.init_app(app)

    def init_app(self, app):
        """Flask application initialization."""
        self.app = app
        self.init_config(app)
        app.extensions['zenodo-spam'] = self

    @staticmethod
    def init_config(app):
        """Initialize configuration."""
        for k in dir(config):
            if k.startswith('ZENODO_SPAM_'):
                app.config.setdefault(k, getattr(config, k))


@celeryd_init.connect
def warm_up_cache(instance, **kwargs):
    """Preload the spam model in the celery application."""
    with instance.app.flask_app.app_context():
        current_spam.model  # accessing the cached property loads the model
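In the extension above, `_is_cache_loading` is set inside the lock but never cleared, so after the first load it stays `True`. Also, as far as we can tell, werkzeug's `cached_property` does not serialize the first access itself, so a thread that was waiting on `lock` may run the loader a second time. A minimal sketch of a load-once pattern that re-checks under the lock and resets its flag follows; `LazyModel` and its `loader` argument are hypothetical stand-ins for `ZenodoSpam.model` and `joblib.load`, not Zenodo code.

```python
import threading


class LazyModel(object):
    """Hypothetical stand-in for ZenodoSpam.model: load once, thread-safely."""

    def __init__(self, loader):
        self._loader = loader          # e.g. lambda: joblib.load(path)
        self._lock = threading.Lock()
        self._model = None
        self._loaded = False
        self.is_loading = False

    @property
    def model(self):
        # Fast path: already loaded, no locking needed.
        if self._loaded:
            return self._model
        with self._lock:
            # Re-check: another thread may have finished while we waited.
            if not self._loaded:
                self.is_loading = True
                try:
                    self._model = self._loader()
                    self._loaded = True
                finally:
                    # Clear the flag even if the loader raises.
                    self.is_loading = False
        return self._model


# Usage: many threads race for the model, the loader runs only once.
calls = []


def fake_loader():
    calls.append(1)
    return 'model'


lazy = LazyModel(fake_loader)
threads = [threading.Thread(target=lambda: lazy.model) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(lazy.model, len(calls))  # model 1
```

The second check inside the lock is what prevents duplicate loads; the `try`/`finally` keeps `is_loading` meaningful as "a load is in progress" rather than "a load once started".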
@@ -0,0 +1,44 @@
# -*- coding: utf-8 -*-
#
# This file is part of Zenodo.
# Copyright (C) 2020 CERN.
#
# Zenodo is free software; you can redistribute it
# and/or modify it under the terms of the GNU General Public License as
# published by the Free Software Foundation; either version 2 of the
# License, or (at your option) any later version.
#
# Zenodo is distributed in the hope that it will be
# useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Zenodo; if not, write to the
# Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
# MA 02111-1307, USA.
#
# In applying this license, CERN does not
# waive the privileges and immunities granted to it by virtue of its status
# as an Intergovernmental Organization or submit itself to any jurisdiction.

"""Celery tasks for the spam module."""

from __future__ import absolute_import, print_function

from celery import shared_task
from flask import current_app
from invenio_records.models import RecordMetadata

from zenodo.modules.spam import current_spam


@shared_task(ignore_result=False)
def check_metadata_for_spam(depid_value, dep_id):
    """Check the metadata of the given deposit for spam content."""
    if not current_app.config.get('ZENODO_SPAM_MODEL_LOCATION'):
        return 0
    deposit = RecordMetadata.query.get(dep_id)
    spam_proba = current_spam.model.predict_proba(
        [deposit.json['title'] + ' ' + deposit.json['description']])[0][1]
    return spam_proba
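The task passes a one-element batch (title and description joined by a space) to `predict_proba` and reads `[0][1]`: the first row, second column. For a scikit-learn binary classifier, columns follow the order of `classes_`, so column 1 is the spam probability only if spam is the second class label (an assumption here, e.g. labels `[0, 1]`). The sketch below isolates that step with a hypothetical `FakeModel` in place of the joblib-loaded model; `spam_probability` is likewise a hypothetical name, and unlike the task it uses `.get` so a missing description does not raise.

```python
class FakeModel(object):
    """Hypothetical stand-in for a fitted binary scikit-learn classifier."""

    def predict_proba(self, texts):
        # One row per input: [P(not spam), P(spam)], mirroring
        # predict_proba for classes_ == [0, 1].
        rows = []
        for text in texts:
            p = 0.9 if 'casino' in text.lower() else 0.1
            rows.append([1.0 - p, p])
        return rows


def spam_probability(model, record_json):
    """Return P(spam) for a record's title and description."""
    text = (record_json.get('title', '') + ' ' +
            record_json.get('description', ''))
    # predict_proba expects a batch; take row 0, column 1 (the spam class).
    return float(model.predict_proba([text])[0][1])


print(spam_probability(FakeModel(), {'title': 'Online Casino',
                                     'description': 'Bonus codes'}))  # 0.9
print(spam_probability(FakeModel(), {'title': 'Ocean temperatures'}))  # 0.1
```

Wrapping the result in `float()` turns a NumPy scalar into a plain Python float, which keeps the Celery task result straightforward to serialize.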
zenodo/modules/spam/templates/zenodo_spam/email/spam_admin_email.tpl
25 changes: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
{#
# This file is part of Zenodo.
# Copyright (C) 2020 CERN.
#
# Zenodo is free software; you can redistribute it
# and/or modify it under the terms of the GNU General Public License as
# published by the Free Software Foundation; either version 2 of the
# License, or (at your option) any later version.
#
# Zenodo is distributed in the hope that it will be
# useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Zenodo; if not, write to the
# Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
# MA 02111-1307, USA.
#
# In applying this license, CERN does not
# waive the privileges and immunities granted to it by virtue of its status
# as an Intergovernmental Organization or submit itself to any jurisdiction.
-#}

The deposit https://zenodo.org/deposit/{{ deposit['recid'] }} from the user https://zenodo.org/spam/{{ user.id }}/delete/ has been marked as spam.
zenodo/modules/spam/templates/zenodo_spam/email/spam_user_email.tpl
27 changes: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
{#
# This file is part of Zenodo.
# Copyright (C) 2020 CERN.
#
# Zenodo is free software; you can redistribute it
# and/or modify it under the terms of the GNU General Public License as
# published by the Free Software Foundation; either version 2 of the
# License, or (at your option) any later version.
#
# Zenodo is distributed in the hope that it will be
# useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Zenodo; if not, write to the
# Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
# MA 02111-1307, USA.
#
# In applying this license, CERN does not
# waive the privileges and immunities granted to it by virtue of its status
# as an Intergovernmental Organization or submit itself to any jurisdiction.
-#}

Our spam protection system has classified your upload as a potential spam attempt.
As a preventive measure, we have deactivated your user account.
If you think this is a mistake, please contact our support team.