DataCite Commons: Prop-HiT

Prop-HiT Dataset Version 1.0
Version 1.0: November 18, 2023

About

Prop-HiT is a Propaganda Dataset for Hindi Text. The Prop-HiT dataset includes 790 articles from 32 Hindi news websites. The dataset is manually annotated using the LightTag annotation tool, considering the following 18 propaganda techniques:

1. Appeal to authority
2. Appeal to fear/prejudice
3. Bandwagon
4. Black-and-white fallacy
5. Causal oversimplification
6. Doubt
7. Exaggeration/minimization
8. Flag-waving
9. Loaded Language
10. Name Calling or Labelling
11. Obfuscation, intentional vagueness, confusion
12. Red herring
13. Reductio ad Hitlerum
14. Repetition
15. Slogans
16. Straw man
17. Thought-terminating cliche
18. Whataboutism

Data format

The dataset consists of one plain-text file and one tab-separated (TSV) file per article. The text file contains the contents of the article. The TSV file contains one propaganda technique annotation per line, with the following fields: article_id, technique, begin_offset, and end_offset.

The naming convention for the files is as follows:
- article[unique_id].txt for the plain-text file
- article[unique_id].labels.tsv for the annotation file

There are two subfolders: train, with 550 articles, and test, with 240 articles.

Credit

Please cite the dataset as:
[Prop-HiT] Deptii Chaudhari, Dr. Ambika Pawar. 2023. Prop-HiT: Propaganda Dataset for Hindi Text. https://doi.org/10.5281/zenodo.10155424

Authors

Deptii Chaudhari; Dr. Ambika Pawar
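The data format described above (one .txt file and one .labels.tsv file per article, with the fields article_id, technique, begin_offset, and end_offset) could be read with a short Python script along the following lines. This is only a sketch under stated assumptions, not the authors' tooling: it assumes the files are UTF-8 encoded, that offsets are character positions into the article text with an exclusive end, and that any header row (if present) can be skipped. The names Annotation, parse_labels, extract_spans, and load_article are hypothetical, and the sample text is invented for illustration and is not taken from the dataset.

```python
import csv
import io
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Annotation:
    article_id: str
    technique: str
    begin_offset: int
    end_offset: int


def parse_labels(tsv_text):
    """Parse the contents of a .labels.tsv file into Annotation records."""
    annotations = []
    reader = csv.reader(io.StringIO(tsv_text, newline=""),
                        delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 4:
            continue  # skip blank or malformed lines
        article_id, technique, begin, end = row[:4]
        if not (begin.isdigit() and end.isdigit()):
            continue  # assumption: a non-numeric offset means a header row
        annotations.append(Annotation(article_id, technique, int(begin), int(end)))
    return annotations


def extract_spans(article_text, annotations):
    """Return (technique, annotated text) pairs.

    Assumption: offsets are character indices and end_offset is exclusive.
    """
    return [(a.technique, article_text[a.begin_offset:a.end_offset])
            for a in annotations]


def load_article(folder, unique_id):
    """Read one article and its annotations using the documented file names."""
    folder = Path(folder)
    text = (folder / f"article{unique_id}.txt").read_text(encoding="utf-8")
    labels = (folder / f"article{unique_id}.labels.tsv").read_text(encoding="utf-8")
    return text, parse_labels(labels)


# Invented sample, for illustration only (not from the dataset).
sample_text = "They always lie. Join us now!"
sample_labels = "1\tExaggeration/minimization\t0\t16\n1\tSlogans\t17\t29\n"
sample_spans = extract_spans(sample_text, parse_labels(sample_labels))
```

With the files on disk, a call such as load_article("train", some_id) would return the article text and its parsed annotations. If the offsets turn out to be byte-based or end-inclusive, only the slicing in extract_spans would need to change.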

Version 1.0, published 2023 in Zenodo

Dataset, Hindi

https://doi.org/10.5281/zenodo.10155423
