Computer Speech & Language

4 results

Data for: Predicting emotion reactions to news articles in social networks
Omar J Gambino, Hiram Calvo
Mendeley Data | Published 24 May 2019
Corpus of Spanish news articles annotated with the distribution of emotional reactions found in tweet responses. 288 news articles, published from 01-01-2015 to 01-01-2017, were collected from three Mexican newspapers (El Universal, La Jornada and Excelsior). Four annotators carried out the annotation over a three-month period, tagging the emotions expressed in the tweet responses to each news article. The counts of emotions expressed in the tweet responses were then used to determine the distribution of these emotions for each news article.
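As an illustration of the counting step described above, the following is a minimal sketch and not the authors' code. It assumes each tweet response carries a single emotion label; the label names and the function `emotion_distribution` are hypothetical.

```python
from collections import Counter


def emotion_distribution(tweet_labels):
    """Convert per-tweet emotion labels into relative frequencies."""
    counts = Counter(tweet_labels)
    total = sum(counts.values())
    if total == 0:
        # An article with no annotated tweet responses has no distribution.
        return {}
    return {emotion: n / total for emotion, n in counts.items()}


# Hypothetical labels for the tweet responses to one news article.
labels = ["anger", "sadness", "anger", "joy"]
distribution = emotion_distribution(labels)
print(distribution)
```

The result maps each emotion to its share of the tweet responses, so the values for one article sum to 1.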
Data for: Learning English-Chinese Bilingual Word Representations from Sentence-Aligned Parallel Corpus
Hsin-Hsi Chen, Hen-Hsen Huang, An-Zi Yen
Mendeley Data | Published 12 February 2019
This file includes three datasets, one for each of our tasks: bilingual dictionary induction, cross-lingual analogy reasoning, and cross-lingual word semantic relatedness. We release them so that the NLP community can explore the related issues.
Data for: Exploiting social and local contexts propagation for inducing Chinese microblog-specific sentiment lexicons
Suge Wang, Deyu Li, Chuanjun Zhao
Mendeley Data | Published 30 November 2018
This data set includes a UCI data set (microblogPCU), a Weibo data set (my_weibo_data), and three general sentiment lexicons. The results of our framework include UCI and Weibo sentiment nouns, UCI sentiment features, and Weibo sentiment features.
Data for: Language Models, Surprisal and Fantasy in Slavic Intercomprehension
Klara Jagrova, Andrea Fischer, Tania Avgustinova, Irina Stenger
Mendeley Data | Published 29 August 2018
The file webresults_cloze_publication.xlsx contains two types of data for the same stimuli: a) transcripts of think-aloud protocols and b) responses collected in a web-based intercomprehension experiment. Part a) Three Polish stimulus sentences were presented to pairs of Czech native speakers in an experimental setting where both participants saw the stimulus sentence on their computer screens. Placed in different rooms, they were asked to communicate over Skype and work together to come up with a good Czech translation of the sentence. The experiment output therefore consists of audio recordings of the two participants trying to decode the stimuli, along with the written translations they entered during the experiment. The transcripts are in sheets 1, 3, and 5 of the .xlsx file. Part b) Czech readers (n=23) were asked to translate certain words or phrases within Polish sentences (those that turned out to be problematic in part a) into Czech in a web-based translation experiment with a cloze-task design, run on the website http://intercomprehension.coli.uni-saarland.de/en/. The responses of part b) and the corresponding sociodemographic data are in sheets 2, 4, and 6 of the .xlsx file. The responses were checked manually for correctness. Responses with typos were counted as correct, because the main interest was whether respondents had understood the stimuli. The column "Total Time Spent (ms)" gives the time respondents spent entering their response into the gaps of the cloze test before pressing Enter. The file surprisal_scores_CS_LM.txt contains surprisal scores obtained from a statistical trigram language model with Kneser-Ney smoothing trained on a Czech corpus (Czech part of InterCorp merged with the Czech part of the Russian National Corpus, size: 175,190 words).
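For readers unfamiliar with surprisal, the following minimal sketch shows how a surprisal score is conventionally derived from a language model probability, as the negative logarithm of the probability of a word given its context. It is not the authors' code: the probabilities are hypothetical, no Kneser-Ney smoothing is implemented, and base 2 (bits) is an assumption, since the description does not state which logarithm base the file uses.

```python
import math


def surprisal_bits(probability):
    """Surprisal in bits: -log2 P(word | two preceding words)."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in the interval (0, 1]")
    return -math.log2(probability)


# Hypothetical trigram probabilities for three consecutive words.
# A real model would supply smoothed estimates here.
trigram_probs = [0.5, 0.25, 0.125]
scores = [surprisal_bits(p) for p in trigram_probs]
print(scores)
```

Less probable words receive higher surprisal, which is why the measure is used as a proxy for how unexpected a word is in its context.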