Raw data (Thomson Reuters news articles and comments 2012)

Published: 26 January 2022 | Version 1 | DOI: 10.17632/d2rppff696.1
Contributor:
Valentina Spiridonova

Description

Our research focuses on Internet news recipients’ comments as a continuum with a communicative potential of its own. We consider these comments to be dynamic interactive integrals of news discourse. We apply a complex three-stage method to study the commenting continuum. At the first stage we use corpus-based technologies to analyze discourse structures and obtain a raw scheme of the commenting discourse. At the second stage we perform qualitative analysis by means of functional integral analysis, which assists in a deeper understanding of the discourse structure and its pragmatic functioning. At the third stage we carry out comparative analysis, aiming to identify the dynamic changes that commenting discourse continua undergo over time and under the influence of technological development. This dataset includes the raw corpus material. It consists of 116 Thomson Reuters world news stories about the Iranian nuclear program and 1018 comments on them. The minimum number of comments on a news article is one; the maximum is 144. We worked with each news story and its set of comments separately, using the AntConc software first to compile the list of keywords and then to see the keyword mapping (trace their place in the comments). We then analysed which part of the article has the strongest impetus. After that we analysed each comment continuum (the set of comments on a news article) at a deeper pragmatic and cognitive level and determined the elements that serve as the cognitive-pragmatic foci of the comments. The analysis of comments allows us to follow the main lines of the communicative interaction scheme and to retrace the transformation of data presentation in a constant process of re-contextualization.
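The keyword-mapping step described above (tracing where each keyword resurfaces in the comments) can be sketched in a few lines; the function name and the toy comments below are illustrative only and are not part of the dataset:

```python
import re

def keyword_map(keywords, comments):
    """For each keyword, list the indices of the comments in which it occurs,
    mimicking a concordance-plot view of keyword placement."""
    return {
        kw: [i for i, c in enumerate(comments)
             if re.search(rf'\b{re.escape(kw)}\b', c, re.IGNORECASE)]
        for kw in keywords
    }

comments = ["Iran rejects the accusations.",
            "Sanctions never work.",
            "Another round of talks with Iran?"]
print(keyword_map(["Iran", "sanctions"], comments))
# {'Iran': [0, 2], 'sanctions': [1]}
```

A mapping like this shows at a glance which comments pick up a keyword from the article and where in the continuum the keyword recurs.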
The materials also include a program developed by Stas Shilov (2012) at my request, with the technical task to count the number of tokens in each comment and to compute the maximum, minimum and average values automatically, both per sentence and per comment. A token in this research is a group of letters (normally, a word) and symbols separated by a left and a right space from the neighbouring tokens. Mostly tokens here are equal to words, though they tend to include punctuation marks. The program can be translated into English on request, though its interface is intuitive. The third file contains a graph (Figure 1) which shows the distribution of comment length within the given corpus, though with little respect to the number of comments on each news article taken separately.
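The original program is distributed with the dataset; a minimal re-implementation of its statistics in Python, under the whitespace-based token definition given above, might look like this (the function names and sample comments are ours, and the sentence splitter is a simplifying assumption):

```python
import re
from statistics import mean

def tokenize(text):
    # A token is any run of non-space characters, matching the dataset's
    # definition (a word, possibly with attached punctuation).
    return text.split()

def comment_stats(comments):
    """Return (min, max, mean) token counts per comment and per sentence."""
    per_comment = [len(tokenize(c)) for c in comments]
    # naive sentence split on terminal punctuation followed by whitespace
    sentences = [s for c in comments
                 for s in re.split(r'(?<=[.!?])\s+', c) if s.strip()]
    per_sentence = [len(tokenize(s)) for s in sentences]
    return ((min(per_comment), max(per_comment), mean(per_comment)),
            (min(per_sentence), max(per_sentence), mean(per_sentence)))

comments = ["Iran denies the claims.", "Sanctions again? This will not work."]
print(comment_stats(comments))
```

The per-sentence figures depend on how sentence boundaries are detected, so exact values may differ slightly from the original program's output.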

Steps to reproduce

We have been studying various news commenting platforms, including the Thomson Reuters news agency website, Facebook and Instagram, since 2012. The main part of our theoretical conclusions is based on a corpus built from news articles and readers’ comments on the Thomson Reuters news agency website. We chose news articles, not opinion articles, to minimize the effect of the article (the starting factual point of mass-commenting discourse) on the comments.

The analysis combines three stages: first, quantitative analysis, i.e. automated corpus analysis and calculation of formal parameters such as comment length; second, functional integrative qualitative analysis; and third, comparative analysis of diachronic changes evolving on SNS (Facebook and Instagram in particular). We believe that functional integrative analysis, combined with elements of corpus technologies and applied to news-related commenting discourse, provides a better understanding of news opinionating communication as a complex dynamic system [Author 2, Author 1].

At the first stage of our work we applied computer programs designed to work with corpora from an unusual angle: we used the AntConc concordance and text analysis toolkit and Sketch Engine not to analyze language structures but to reveal commenting discourse schemes. We obtained a rough discourse scheme for each news article. These schemes consist of discourse lines based predominantly on the relevance parameter of keyness and the syntactic parameter of repetition. The integrative component of the second stage implies systemic functional text analysis, pragmatic meaning and cognitive interpretation. The research is carried out in three respects: first, a continuum of comments is identified as a dynamic information block; second, framing analysis is applied to show how pragmatic elements give impetus to context promotion; third, the shifts in intentional discursive contexts are explained.
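The keyness parameter mentioned above is commonly scored with the log-likelihood (G²) statistic, one of the keyness measures AntConc offers; the sketch below compares a comment set against a reference word list (the function name and toy data are ours):

```python
import math
from collections import Counter

def keyness(target_tokens, reference_tokens):
    """Score each word in the target corpus with the log-likelihood (G2)
    keyness statistic against a reference corpus."""
    t, r = Counter(target_tokens), Counter(reference_tokens)
    nt, nr = len(target_tokens), len(reference_tokens)
    scores = {}
    for word, a in t.items():
        b = r.get(word, 0)
        # expected counts under the null hypothesis of equal relative frequency
        e1 = nt * (a + b) / (nt + nr)
        e2 = nr * (a + b) / (nt + nr)
        g2 = 2 * a * math.log(a / e1)
        if b:
            g2 += 2 * b * math.log(b / e2)
        scores[word] = g2
    return scores

target = "iran iran nuclear deal sanctions".split()
reference = "markets oil prices deal report".split()
print(sorted(keyness(target, reference).items(), key=lambda kv: -kv[1]))
```

Words that are markedly more frequent in the comments than in the reference material receive high scores, which is what makes keyness useful for extracting the discourse lines of a comment continuum.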

Institutions

Sankt-Peterburgskij Gornyj Universitet, Sankt-Peterburgskij gosudarstvennyj universitet

Categories

Data Analysis
