This is the repository for the course "Evaluating the Quality of Fake News Detection"

untruenews/ss2021

Evaluating the Quality of Fake News Detection

The spread of disinformation across the web has become a critical problem for democratic societies. Disseminating fake news has become a profitable business and a common practice among politicians and content producers. A recent study, 'Regulating disinformation with artificial intelligence', examines the trade-offs involved in using automated technology to limit the spread of disinformation online. Although AI and Natural Language Generation have advanced considerably over the last decade, several shortcomings remain that must be better understood before stronger solutions can be built. Students will dive deeper into Natural Language Processing, so a strong knowledge of Python and AI is required.

Goal

Understand and explore the vulnerabilities and limitations of automatic fact-checking in order to support the development of regulations and better technology.

Contact

https://www.qu.tu-berlin.de/menue/team/senior_researchers/vinicius_woloszyn/

General Description

https://github.com/untruenews/ss2021/blob/main/slides/course.pdf

GROUP B starts at 14:00 and GROUP A at 15:00.

Roadmap

  • 1# 27/04/2021 - Introduction and definition of groups
  • 2# 04/05/2021 - Flair
  • 3# 11/05/2021 - Flair
  • 4# 18/05/2021 - Flair
  • 5# 25/05/2021 - Experiments with TextAttack: (GROUP A) attacks, (GROUP B) data augmentation

GROUP A & B: present the progress of the work

  • 6# 01/06/2021 - Experiments with TextAttack: (GROUP A) attacks, (GROUP B) data augmentation

GROUP A: Perform attacks on the Flair models using https://github.com/QData/TextAttack. Compare the resilience of "RoBERTa", "BERTweet", and "FlairEmbeddings" to Synonym Substitution, Character Substitution, Word Insertion or Removal, and General Paraphrase attacks.
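The word- and character-level perturbations named above can be illustrated without any library. The following is a minimal pure-Python sketch, not TextAttack's API: the helper names, the tiny `SYNONYMS` and `LOOKALIKES` tables, and the keyword-based `toy_classifier` are all hypothetical stand-ins chosen for this example. In the actual experiments the classifier would be a trained Flair model and the perturbations would come from TextAttack. General paraphrase attacks are not covered here.

```python
import random

# Hypothetical tables; a real attack would draw on WordNet, embeddings, etc.
SYNONYMS = {"shocking": "surprising", "secret": "hidden", "exposed": "revealed"}
LOOKALIKES = {"a": "@", "e": "3", "i": "1", "o": "0"}


def substitute_synonyms(text, synonyms=SYNONYMS):
    """Synonym substitution: replace every word found in the synonym table."""
    return " ".join(synonyms.get(w, w) for w in text.split())


def substitute_characters(text, rng):
    """Character substitution: replace one random character with a look-alike."""
    positions = [i for i, ch in enumerate(text) if ch in LOOKALIKES]
    if not positions:
        return text
    i = rng.choice(positions)
    return text[:i] + LOOKALIKES[text[i]] + text[i + 1:]


def insert_word(text, rng, filler="really"):
    """Word insertion: add a filler word at a random position."""
    words = text.split()
    words.insert(rng.randrange(len(words) + 1), filler)
    return " ".join(words)


def remove_word(text, rng):
    """Word removal: drop one random word (no-op on single-word input)."""
    words = text.split()
    if len(words) < 2:
        return text
    del words[rng.randrange(len(words))]
    return " ".join(words)


def toy_classifier(text):
    """Hypothetical stand-in for a trained model: flags trigger words as fake."""
    triggers = {"shocking", "secret", "exposed"}
    return "fake" if any(w in triggers for w in text.lower().split()) else "real"


def attack_success_rate(classify, perturb, samples):
    """Share of correctly classified samples whose label flips after perturbation."""
    correct = [(t, y) for t, y in samples if classify(t) == y]
    if not correct:
        return 0.0
    flipped = sum(1 for t, y in correct if classify(perturb(t)) != y)
    return flipped / len(correct)


# Toy samples, invented for illustration only.
samples = [
    ("shocking secret exposed by insiders", "fake"),
    ("council approves new budget", "real"),
]
rate = attack_success_rate(toy_classifier, substitute_synonyms, samples)
print(f"synonym substitution success rate: {rate:.2f}")  # prints 0.50

# Randomized perturbations take a seeded generator for reproducibility.
rng = random.Random(0)
print(substitute_characters("shocking secret exposed by insiders", rng))
```

A lower success rate for the same perturbation indicates a more resilient model, which gives one simple basis for comparing the three embedding types.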

GROUP B: Train a German fake news detector: 1) Extract data in different languages from the raw dataset (e.g., Portuguese, Italian). 2) Experiment with Google Translate via https://pypi.org/project/googletrans/ to translate the data into German. 3) Use the translated data to fine-tune a pre-trained German language model. 4) Train a multilingual model (e.g., tf-xlm-roberta-base). 5) Compare and present the results.
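The extraction, translation, and comparison steps above can be sketched roughly as follows. This is a hedged illustration, not the course's reference pipeline: the record layout (`text`, `lang`, `label` keys), the function names, and the `fake_translate` stub are assumptions made for the example. In practice the `translate` argument would wrap a real translation client such as googletrans, whose exact call signature should be checked against the installed version. Macro-averaged F1 is used here as one possible comparison metric; the source does not prescribe one.

```python
def select_languages(records, languages):
    """Keep only records whose language code is in `languages`."""
    wanted = set(languages)
    return [r for r in records if r["lang"] in wanted]


def translate_records(records, translate, target="de"):
    """Translate each text with the injected `translate(text, target)` callable.

    Identical texts are translated once and cached to avoid repeated calls.
    """
    cache = {}
    out = []
    for r in records:
        text = r["text"]
        if text not in cache:
            cache[text] = translate(text, target)
        out.append(
            {**r, "text": cache[text], "lang": target, "source_lang": r["lang"]}
        )
    return out


def macro_f1(gold, predicted):
    """Macro-averaged F1 over all labels seen in gold or predicted."""
    labels = sorted(set(gold) | set(predicted))
    scores = []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if p == label and g != label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def fake_translate(text, target):
    """Hypothetical stub standing in for a real translation service."""
    return f"[{target}] {text}"


# Placeholder records, invented for illustration only.
raw = [
    {"text": "exemplo de texto", "lang": "pt", "label": "fake"},
    {"text": "testo di esempio", "lang": "it", "label": "real"},
    {"text": "example text", "lang": "en", "label": "real"},
]
german = translate_records(select_languages(raw, ["pt", "it"]), fake_translate)
print(len(german), german[0]["text"])  # prints: 2 [de] exemplo de texto

# Compare two hypothetical models on the same gold labels.
gold = ["fake", "real", "real", "fake"]
model_a = ["fake", "real", "fake", "fake"]
model_b = ["fake", "real", "real", "fake"]
print(macro_f1(gold, model_a) < macro_f1(gold, model_b))  # prints: True
```

Injecting the translator as a callable keeps the pipeline testable offline and makes it easy to swap translation back ends if one proves unreliable.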

  • 7# 08/06/2021 - Presentation:

GROUP A will present: https://www.europarl.europa.eu/RegData/etudes/STUD/2019/624279/EPRS_STU(2019)624279_EN.pdf

GROUP B will present: https://digital-strategy.ec.europa.eu/en/library/proposal-regulation-laying-down-harmonised-rules-artificial-intelligence-artificial-intelligence

  • 8# 15/06/2021 - experiments / writing the paper
  • 9# 22/06/2021 - experiments / writing the paper
  • 10# 29/06/2021 - experiments / writing the paper
  • 11# 06/07/2021 - experiments / writing the paper
  • 12# 13/07/2021 - experiments / writing the paper
  • 13# 17/07/2021 - submitting the paper

Main References

  1. Dive into Deep Learning, https://d2l.ai/d2l-en.pdf
  2. Speech and Language Processing, https://web.stanford.edu/~jurafsky/slp3/ed3book_dec302020.pdf

Other References

  1. https://github.com/QData/TextAttack
  2. https://github.com/flairNLP/flair
  3. https://huggingface.co/
  4. Devlin, Jacob, et al. "BERT: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).
  5. Woloszyn, Vinicius, et al. "Untrue. News: A New Search Engine For Fake Stories." arXiv preprint arXiv:2002.06585 (2020).
  6. Zhou, Zhixuan, et al. "Fake news detection via NLP is vulnerable to adversarial attacks." arXiv preprint arXiv:1901.09657 (2019).
  7. Sinha, Abhishek, et al. "Negative Data Augmentation." arXiv preprint arXiv:2102.05113 (2021).
  8. Morris, John, et al. "TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP." Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations (2020).
  9. https://github.com/afshinea/stanford-cs-229-machine-learning/
  10. http://cs229.stanford.edu/syllabus-spring2021.html
