Evaluating the quality of Fake News Detection

The spread of disinformation on the web has become a critical problem for democratic societies. Disseminating fake news has become a profitable business and a common practice among politicians and content producers. A recent study, 'Regulating disinformation with artificial intelligence', examines the trade-offs involved in using automated technology to limit the spread of disinformation online. Although AI and Natural Language Generation have advanced considerably over the last decade, a few shortcomings still need to be better understood before stronger solutions can be built. Students will dive deeper into Natural Language Processing; therefore, a strong knowledge of Python and AI is necessary.

Goal

Understand and explore the vulnerabilities and limitations of automatic fact-checking and fake news detection in order to support the development of regulations and better technology.

Contact

https://www.qu.tu-berlin.de/menue/team/senior_researchers/vinicius_woloszyn/

General Description

https://github.com/untruenews/ss2021/blob/main/slides/course.pdf

GROUP B starts at 14:00 and GROUP A at 15:00.

Roadmap

1# 27/04/2021 - Introduction and definition of groups
2# 04/05/2021 - Flair
3# 11/05/2021 - Flair
4# 18/05/2021 - Flair
5# 25/05/2021 - Experiments with textattack: (GROUP A) ATTACKS, (GROUP B) DATA Augmentation

GROUP A & B: present the progress of the work.

6# 01/06/2021 - Experiments with textattack: (GROUP A) ATTACKS, (GROUP B) DATA Augmentation

GROUP A: perform attacks on the Flair models using TextAttack (https://github.com/QData/TextAttack). Compare the resilience of "RoBERTa", "BERTweet", and "FlairEmbeddings" to synonym substitution, character substitution, word insertion or removal, and general paraphrase attacks.
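To get a feel for what a character-substitution or word-removal attack does, and how "resilience" might be scored, here is a minimal pure-Python sketch. It is not TextAttack's API and uses plain random search rather than TextAttack's search methods: `char_swap`, `drop_word`, `attack_success_rate` and the toy `keyword_classifier` are hypothetical stand-ins. In the actual experiments the classifier would be a trained Flair model wrapped as a `text -> label` function, and the perturbations would come from TextAttack's transformations and recipes.

```python
import random
from typing import Callable, List, Tuple

# A classifier maps a text to a label; in the course this would wrap a trained Flair model.
Classifier = Callable[[str], str]
# A perturbation rewrites a text using a seeded random generator.
Perturbation = Callable[[str, random.Random], str]


def char_swap(text: str, rng: random.Random) -> str:
    """Character substitution: swap two adjacent inner characters of one word."""
    words = text.split()
    candidates = [i for i, w in enumerate(words) if len(w) >= 4]
    if not candidates:
        return text
    i = rng.choice(candidates)
    w = words[i]
    j = rng.randrange(1, len(w) - 2)  # first and last characters stay in place
    words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)


def drop_word(text: str, rng: random.Random) -> str:
    """Word removal: delete one randomly chosen word."""
    words = text.split()
    if len(words) < 2:
        return text
    del words[rng.randrange(len(words))]
    return " ".join(words)


def attack_success_rate(
    classify: Classifier,
    data: List[Tuple[str, str]],
    perturb: Perturbation,
    tries: int = 10,
    seed: int = 0,
) -> float:
    """Share of correctly classified examples whose label flips under some perturbation."""
    rng = random.Random(seed)
    correct = [(text, label) for text, label in data if classify(text) == label]
    if not correct:
        return 0.0
    flipped = sum(
        any(classify(perturb(text, rng)) != label for _ in range(tries))
        for text, label in correct
    )
    return flipped / len(correct)


def keyword_classifier(text: str) -> str:
    """Hypothetical toy detector: labels a text FAKE if it contains the word 'hoax'."""
    return "FAKE" if "hoax" in text.lower().split() else "REAL"


if __name__ == "__main__":
    sample = [
        ("the vaccine hoax spreads online", "FAKE"),
        ("parliament passes the budget", "REAL"),
    ]
    for name, perturb in [("char_swap", char_swap), ("drop_word", drop_word)]:
        print(name, attack_success_rate(keyword_classifier, sample, perturb))
```

A lower rate suggests a more resilient model; running the same loop for each model under each attack type yields a small comparison table of the kind the task asks for.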

GROUP B: train a German fake news detector: 1) extract data in different languages (e.g., Portuguese, Italian) from the raw dataset; 2) experiment with Google Translate (https://pypi.org/project/googletrans/) to translate the data into German; 3) use the translated data to fine-tune a pre-trained German language model; 4) train a multilingual model (e.g., tf-xlm-roberta-base); 5) compare and present the results.
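Steps 1 to 3 of GROUP B's task can be organized as a small pipeline. The sketch below rests on assumptions not stated in the task: the raw dataset is taken to be a list of records with hypothetical `text`, `label` and `language` fields, and the translator is passed in as a function so the pipeline can run offline. With googletrans, that function could wrap something like `Translator().translate(text, src=src, dest=dest).text`, but the package's API has changed between releases and should be checked against the installed version. The output uses the FastText-style `__label__<label> <text>` line format, which, as far as we know, is what Flair's `ClassificationCorpus` reads by default.

```python
import random
import tempfile
from pathlib import Path
from typing import Callable, Dict, Iterable, List

# Hypothetical record layout: {"text": ..., "label": ..., "language": ...}
Record = Dict[str, str]
# translate(text, src, dest) -> translated text; e.g. a wrapper around googletrans.
Translate = Callable[[str, str, str], str]


def extract_by_language(records: Iterable[Record], languages: Iterable[str]) -> List[Record]:
    """Step 1: keep only records written in one of the requested languages."""
    wanted = set(languages)
    return [r for r in records if r.get("language") in wanted]


def translate_records(records: Iterable[Record], translate: Translate, dest: str = "de") -> List[Record]:
    """Step 2: translate every text into `dest`, keeping the original label."""
    return [
        {**r, "text": translate(r["text"], r["language"], dest), "language": dest}
        for r in records
    ]


def to_fasttext_line(record: Record) -> str:
    """Format one example as '__label__<label> <text>' on a single line."""
    text = " ".join(record["text"].split())
    label = record["label"].replace(" ", "_")
    return f"__label__{label} {text}"


def write_splits(records: Iterable[Record], out_dir: str, train_frac: float = 0.8,
                 dev_frac: float = 0.1, seed: int = 42) -> Dict[str, int]:
    """Step 3 prep: shuffle, split and write train/dev/test files; returns split sizes."""
    rows = list(records)
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train_frac)
    n_dev = int(len(rows) * dev_frac)
    splits = {
        "train.txt": rows[:n_train],
        "dev.txt": rows[n_train:n_train + n_dev],
        "test.txt": rows[n_train + n_dev:],  # remainder goes to the test split
    }
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, part in splits.items():
        lines = "".join(to_fasttext_line(r) + "\n" for r in part)
        (out / name).write_text(lines, encoding="utf-8")
    return {name: len(part) for name, part in splits.items()}


if __name__ == "__main__":
    raw = [
        {"text": "notícia falsa", "label": "fake", "language": "pt"},
        {"text": "notizia vera", "label": "real", "language": "it"},
        {"text": "true news", "label": "real", "language": "en"},
    ]

    # Offline stand-in for a real translator, so the sketch runs without network access.
    def stub_translate(text: str, src: str, dest: str) -> str:
        return f"[{src}->{dest}] {text}"

    german = translate_records(extract_by_language(raw, ["pt", "it"]), stub_translate)
    with tempfile.TemporaryDirectory() as tmp:
        print(write_splits(german, tmp))
```

Keeping the translator injectable makes it easy to cache translations or swap googletrans for another service without touching the rest of the pipeline; the resulting files can then feed both the German model (step 3) and, untranslated, the multilingual model (step 4).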

7# 08/06/2021 - Presentation:

GROUP A will present: https://www.europarl.europa.eu/RegData/etudes/STUD/2019/624279/EPRS_STU(2019)624279_EN.pdf

GROUP B will present: https://digital-strategy.ec.europa.eu/en/library/proposal-regulation-laying-down-harmonised-rules-artificial-intelligence-artificial-intelligence

8# 15/06/2021 - experiments / writing the paper
9# 22/06/2021 - experiments / writing the paper
10# 29/06/2021 - experiments / writing the paper
11# 06/07/2021 - experiments / writing the paper
12# 13/07/2021 - experiments / writing the paper
13# 17/07/2021 - submitting the paper

Main References

Dive into Deep Learning, https://d2l.ai/d2l-en.pdf
Speech and Language Processing, https://web.stanford.edu/~jurafsky/slp3/ed3book_dec302020.pdf

Other References

https://github.com/QData/TextAttack
https://github.com/flairNLP/flair
https://huggingface.co/
Devlin, Jacob, et al. "BERT: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).
Woloszyn, Vinicius, et al. "Untrue.News: A New Search Engine For Fake Stories." arXiv preprint arXiv:2002.06585 (2020).
Zhou, Zhixuan, et al. "Fake news detection via NLP is vulnerable to adversarial attacks." arXiv preprint arXiv:1901.09657 (2019).
Sinha, Abhishek, et al. "Negative Data Augmentation." arXiv preprint arXiv:2102.05113 (2021).
Morris, John, et al. "TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP." Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. 2020.
https://github.com/afshinea/stanford-cs-229-machine-learning/
http://cs229.stanford.edu/syllabus-spring2021.html
