"GermanFakeNC" - German Fake News Corpus including manually fact-checked false statements

Published: 23 August 2019 | Version 3 | DOI: 10.17632/p4c49m3pvr.3
Contributor:
Inna Vogel

Description

"GermanFakeNC" is a German Fake News Corpus of 490 texts retrieved from German alternative online media sources. Every false statement in the texts was verified claim by claim against authoritative sources (e.g. local police authorities, scientific studies, the police press office). Most of the news articles date from the period between December 2015 and March 2018.

Description of the .json file:

- Date: publication date of the article
- URL: URL of the website
- False statements: up to three verified false statements are provided per article:
  - False_Statement_[1-3]_Location: location of the verified false statement (Title, Teaser or Text)
  - False_Statement_[1-3]_Index: the index numbers refer to token (!) positions. We tokenized the text with "spaCy" (the free open-source NLP library for Python).
    Example: Title of text: "The quick brown fox jumped over the lazy dog. The fox broke both his legs while jumping."
    False statement: "The fox broke both his legs while jumping."
    Annotation: "False_Statement_1_Location": "Title", "False_Statement_1_Index": "11-19"
- Ratio_of_Fake_Statements: share of false information found in the article
  1 = Text is based on true information; up to 25% of the information in the text is false
  2 = Up to 50% of the information in the text is false; the other statements in the article are factually accurate
  3 = Up to 75% of the content is non-factual and incorrect
  4 = Pure fabrication with up to 100% false information in the text
  9 = Unclear, unverifiable
- Overall_Rating: overall rating of the disinformation in the text, on a scale from 0.1 to 1.0 in steps of 0.1
  0.1 = no disinformation in the text
  0.5 = neutral / ambivalent
  1.0 = strongly disinformative text
  (0.2 to 0.4 and 0.6 to 0.9 are intermediate values)

======================================================================
The original sources retain the copyright of the data. You may use this dataset for research purposes only. For further questions about the dataset, please contact: Inna Vogel, inna.vogel@sit.fraunhofer.de
v1.01/09/2019
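
The token indices can be mapped back to text roughly as follows. This is a minimal sketch, not the authors' code. Judging from the example (a 19-token title with the false statement at "11-19"), the index range appears to be 1-based and inclusive; that is our reading of the example, not a documented guarantee. To keep the sketch dependency-free, `simple_tokenize` is a hypothetical regex stand-in for spaCy, and `parse_index` / `extract_statement` are illustrative helper names. Treating a single number such as "7" as a one-token span is also an assumption.

```python
from __future__ import annotations

import re


def parse_index(index: str) -> tuple[int, int]:
    """Parse an index string such as "11-19" into (start, end).

    Assumption: a single number such as "7" denotes a one-token span.
    """
    parts = index.strip().split("-")
    return int(parts[0]), int(parts[-1])


def extract_statement(tokens: list[str], index: str) -> list[str]:
    """Return the tokens covered by an index range.

    Assumption (inferred from the corpus example): positions are 1-based
    and inclusive, so "11-19" means the 11th through the 19th token.
    """
    start, end = parse_index(index)
    # Shift the 1-based inclusive range to a 0-based, end-exclusive slice.
    return tokens[start - 1:end]


def simple_tokenize(text: str) -> list[str]:
    """Hypothetical stand-in for spaCy: words and punctuation become separate tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


title = "The quick brown fox jumped over the lazy dog. The fox broke both his legs while jumping."
tokens = simple_tokenize(title)
print(" ".join(extract_statement(tokens, "11-19")))
# prints: The fox broke both his legs while jumping .
```

For the real corpus, the tokens should come from spaCy, as the authors state, e.g. `[t.text for t in nlp(text)]` with `nlp = spacy.load("de_core_news_sm")`. The pipeline name is our assumption, since the description does not say which spaCy model or version was used, and the regex stand-in will diverge from spaCy on abbreviations, hyphenated words and numbers. The description also does not name the JSON fields that hold the title, teaser or body text, so the sketch takes the text as a plain string.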

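To make the field list concrete, the sketch below turns a single record into a readable summary. It rests on assumptions: each article is taken to be a JSON object with the keys listed in the description, numeric codes may be stored as strings (the example shows the index as a string), and unused false-statement slots are assumed to be missing or empty. The top-level layout of the .json file is not described, so the sketch handles one record at a time. `SAMPLE` is a hypothetical record invented for illustration, not taken from the corpus, and the wording that `describe_rating` uses for intermediate values is our own.

```python
from __future__ import annotations

import json

# Ratio_of_Fake_Statements codes, paraphrased from the description.
RATIO_LABELS = {
    1: "based on true information, up to 25% false",
    2: "up to 50% false",
    3: "up to 75% false",
    4: "pure fabrication, up to 100% false",
    9: "unclear, unverifiable",
}


def false_statements(record: dict) -> list[tuple[str, str]]:
    """Return (location, index) pairs for the up to three annotated false statements."""
    pairs = []
    for n in range(1, 4):
        location = record.get(f"False_Statement_{n}_Location")
        index = record.get(f"False_Statement_{n}_Index")
        # Assumption: unused slots are missing or empty.
        if location and index:
            pairs.append((location, index))
    return pairs


def describe_rating(rating: float | str) -> str:
    """Place an Overall_Rating value (0.1 to 1.0) relative to the scale's anchor points."""
    value = round(float(rating), 1)
    if value <= 0.1:
        return "no disinformation"
    if value < 0.5:
        return "leaning towards no disinformation"
    if value == 0.5:
        return "neutral / ambivalent"
    if value < 1.0:
        return "leaning towards disinformation"
    return "strongly disinformative"


def summarize(record: dict) -> dict:
    """Collect the annotated fields of one article into a readable dict."""
    return {
        "date": record.get("Date"),
        "url": record.get("URL"),
        "false_statements": false_statements(record),
        "ratio": RATIO_LABELS.get(int(record["Ratio_of_Fake_Statements"]), "unknown code"),
        "rating": describe_rating(record["Overall_Rating"]),
    }


# Hypothetical record for illustration only (not taken from the corpus).
SAMPLE = """{
  "Date": "2017-03-01",
  "URL": "https://example.org/article",
  "False_Statement_1_Location": "Title",
  "False_Statement_1_Index": "11-19",
  "Ratio_of_Fake_Statements": "2",
  "Overall_Rating": "0.8"
}"""

print(summarize(json.loads(SAMPLE)))
```

Once the file's top-level layout is confirmed, the same `summarize` function could be applied to every record returned by `json.load` on the opened file.
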
Categories

German Language, Natural Language Processing, Language

Licence