Ratings of the emotional valence and arousal of collocations and their constituent words: How can they be useful in L2 vocabulary research?
Description of this data
Supplementary Materials: Excel sheet showing focal vocabulary items and data
These materials consist of conventional multiword expressions (MWEs) and various associated measures. Every MWE consists of two words. The most common structures are NN (noun + noun), AdjN (adjective + noun), and V_ (verb + another word).
There are essentially two sets of MWEs, although the two sets largely contain the same MWEs. The first set was used in the study of emotional valence; the second, in the study of arousal. The valence and arousal ratings of the whole MWEs were crowdsourced using Amazon Mechanical Turk (AMT) (https://www.mturk.com/). The corresponding ratings of the constituent words come from a list compiled by Warriner, Kuperman, and Brysbaert (2013), available at http://crr.ugent.be/archives/1003. See the main article for further details.
Associated with each set is a smaller subset of MWEs that have both AMT ratings and ratings from Warriner et al. These matched, ‘overlapping’ ratings were used to assess the reliability of the new AMT ratings. In the main MWE lists, these overlapping items are shown in red italics. There are also lists of regression residuals: the closer a residual is to zero, the more accurately an MWE’s valence (or arousal) rating was predicted by the mean of its constituent words’ valence (or arousal) ratings.
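The following minimal Python sketch illustrates how such residuals arise. It fits an ordinary least-squares line that predicts MWE ratings from mean Cword ratings and returns each MWE's residual. The least-squares fit is an assumption made for the example. The main article specifies the model actually used, which may be a different fit, such as a median-based one. The function name ols_residuals and the ratings are hypothetical and are not taken from the spreadsheet.

```python
from statistics import fmean
from typing import List, Optional


def ols_residuals(
    cword_mean: List[Optional[float]], mwe_rating: List[Optional[float]]
) -> List[Optional[float]]:
    """Residuals of MWE ratings regressed on mean Cword ratings (OLS fit).

    Pairs with a missing value (None, standing in for NA) are left out
    of the fit and receive None as their residual.
    """
    complete = [
        (x, y) for x, y in zip(cword_mean, mwe_rating)
        if x is not None and y is not None
    ]
    mx = fmean(x for x, _ in complete)
    my = fmean(y for _, y in complete)
    sxy = sum((x - mx) * (y - my) for x, y in complete)
    sxx = sum((x - mx) ** 2 for x, _ in complete)
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual = observed MWE rating minus the rating predicted from the Cword mean.
    return [
        None if x is None or y is None else y - (intercept + slope * x)
        for x, y in zip(cword_mean, mwe_rating)
    ]


# Hypothetical ratings for five MWEs (not from the dataset);
# the last MWE has a missing Cword mean.
cword_mean = [2.0, 4.0, 6.0, 8.0, None]
mwe_rating = [2.5, 4.0, 6.5, 7.0, 5.0]
residuals = ols_residuals(cword_mean, mwe_rating)
print([None if r is None else round(r, 2) for r in residuals])
# → [-0.1, -0.2, 0.7, -0.4, None]
```

In this toy example the third MWE has the largest residual (0.7). Its rating is therefore the one the mean of its constituent words predicts least accurately.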
Important abbreviations used in the spreadsheet are: AMT = Amazon Mechanical Turk; Cword = constituent word; Geo.mean = geometric mean; Harm.mean = harmonic mean; Most valenced = the Cword rating furthest from 5 (i.e., neutral), toward either 1 or 9; SD = the standard deviation of the individual AMT ratings obtained for a given MWE; WKB = Warriner et al. (2013); NA = not available.
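The Cword summaries defined above can be sketched in Python for a single two-word MWE. This minimal illustration assumes ratings on the 1–9 scale, with 5 as neutral and None standing in for NA. Two choices here are hypothetical rather than taken from the source: the function name cword_summaries, and the tie rule for "Most valenced" (the first word wins).

```python
from statistics import fmean, geometric_mean, harmonic_mean
from typing import Dict, Optional

NEUTRAL = 5.0  # midpoint of the 1-9 rating scale


def cword_summaries(
    r1: Optional[float], r2: Optional[float]
) -> Optional[Dict[str, float]]:
    """Summaries of the two Cword ratings of a two-word MWE.

    Returns None when either rating is missing (NA in the spreadsheet).
    """
    if r1 is None or r2 is None:
        return None
    ratings = [r1, r2]
    return {
        "mean": fmean(ratings),
        # Geometric and harmonic means need positive values; the 1-9 scale qualifies.
        "geo_mean": geometric_mean(ratings),
        "harm_mean": harmonic_mean(ratings),
        # Rating furthest from neutral; on a tie, max() keeps the first word
        # (the tie rule is an assumption, not stated in the source).
        "most_valenced": max(ratings, key=lambda r: abs(r - NEUTRAL)),
    }


print(cword_summaries(2.0, 8.0))
print(cword_summaries(None, 6.0))  # a missing Cword rating gives None
```

Take ratings of 2.0 and 8.0 as an example. The arithmetic mean is exactly neutral (5.0), while the geometric mean (4.0) and harmonic mean (3.2) lie closer to the lower rating. Both words are equally far from 5, so the tie rule decides "most_valenced".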
NA stands in for values (e.g., Cword ratings) that could not be found in the WKB list. This abbreviation was chosen because the R functions used in the studies accept datasets that contain NA in place of a missing value. For instance, the base R calls for Spearman’s and Pearson’s correlations between variables x and y when NAs are present are cor(x, y, method = "spearman", use = "pairwise.complete.obs") and cor(x, y, method = "pearson", use = "pairwise.complete.obs"). The R functions of Wilcox (2012) that were used handle missing values automatically, with no extra argument needed. For example, once Wilcox’s R functions are installed (https://dornsife.usc.edu/labs/rwilcox/software/), the call corb(x, y, corfun = spear, nboot = 20000) gives a bootstrap 95% confidence interval for Spearman’s correlation. For the median-based (Theil-Sen) linear regression method, the call would be tsreg(x, y), where x is the independent variable and y the dependent variable.
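For readers working outside R, the following minimal Python sketch (standard library only) mirrors two of the computations above. The first is Spearman’s correlation on pairwise-complete observations, with tied values given average ranks as R’s cor() does. The second is a Theil-Sen fit whose slope is the median of all pairwise slopes. The sketch is not a port of Wilcox’s tsreg, whose intercept rule and options may differ; here the intercept is taken as the median of y - slope*x, one common convention. It also omits the bootstrap interval from corb, and all function names are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import fmean, median
from typing import List, Optional, Sequence, Tuple

Pair = Tuple[float, float]


def complete_pairs(x: Sequence[Optional[float]], y: Sequence[Optional[float]]) -> List[Pair]:
    """Keep only (x, y) pairs where neither value is missing (None ~ NA)."""
    return [(a, b) for a, b in zip(x, y) if a is not None and b is not None]


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_pairwise(x, y) -> float:
    """Spearman's rho = Pearson's r computed on the ranks of complete pairs."""
    xs, ys = zip(*complete_pairs(x, y))
    return pearson(average_ranks(xs), average_ranks(ys))


def theil_sen(x, y) -> Pair:
    """Slope = median of pairwise slopes; intercept = median of y - slope*x."""
    pairs = complete_pairs(x, y)
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(pairs, 2)
        if x2 != x1  # skip pairs with equal x (slope undefined)
    ]
    slope = median(slopes)
    intercept = median(b - slope * a for a, b in pairs)
    return slope, intercept


# Hypothetical data with one missing value (None plays the role of NA).
x = [1.0, 2.0, 3.0, 4.0, None]
y = [2.0, 4.0, 6.0, 8.0, 5.0]
print(spearman_pairwise(x, y))  # → 1.0
print(theil_sen(x, y))          # → (2.0, 0.0)
```

The pair containing None is dropped before ranking and fitting. For a single pair of variables, this matches what use = "pairwise.complete.obs" does in R.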
Reference
Warriner, A., Kuperman, V., & Brysbaert, M. (2013). Norms of valence, arousal, and dominance for 13,915 English lemmas. Behavior Research Methods, 45, 1191–1207. https://doi.org/10.3758/s13428-012-0314-x
This data is associated with the following publication:
Version 1
Published: 2019-09-18
DOI: 10.17632/vtkvn93kts.1
Cite this dataset
Lindstromberg, Seth (2019), “Ratings of the emotional valence and arousal of collocations and their constituent words: How can they be useful in L2 vocabulary research?”, Mendeley Data, v1 http://dx.doi.org/10.17632/vtkvn93kts.1
Licence
The files associated with this dataset are licensed under a Creative Commons Attribution 4.0 International licence.
You can share, copy and modify this dataset so long as you give appropriate credit, provide a link to the CC BY licence, and indicate if changes were made, but you may not do so in a way that suggests the rights holder has endorsed you or your use of the dataset. Note that further permission may be required for any content within the dataset that is identified as belonging to a third party.