HanDeSeT: Hansard Debates with Sentiment Tags
Description of this data
A corpus of Hansard UK Parliament Debates for use in the evaluation of sentiment analysis systems.
The corpus consists of 1251 motion-speech units taken from 129 separate debates from the UK House of Commons 1997-2017.
Each unit comprises a parliamentary speech of up to five utterances and an associated debate motion. Debates comprise between one and 30 speeches, and speeches range in length from 31 to 1049 words, with a mean of 167.8 words. The debates cover a two-decade period from 1997 to 2017 and a wide range of topics, from domestic and foreign affairs to procedural matters concerning the running of the House.
Each motion has two sentiment polarity labels:
- A manually applied sentiment polarity label; and
- A label derived from the relationship of the MP who proposes the motion to the Government.
Each speech has two sentiment polarity labels:
- A speaker-vote label extracted from the division associated with the corresponding debate; and
- A manually assigned label.
In addition, the following metadata is included with each unit: debate id, speaker party affiliation, motion party affiliation, speaker name, and speaker rebellion rate.
Manually applied motion labels are approximately evenly balanced; the other labels are slightly skewed towards the positive class.
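The balance of any of the four label sets can be checked by counting label values. The following is a minimal sketch in Python; the function name label_balance and the toy label list are illustrative assumptions and are not part of the corpus.

```python
from collections import Counter


def label_balance(labels):
    """Return the share of each label value in a sequence of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    # An empty input yields an empty dict, so no division by zero occurs.
    return {label: count / total for label, count in counts.items()}


# Invented toy labels for illustration only, NOT taken from the corpus.
toy_labels = [1, 1, 1, 0, 0]
balance = label_balance(toy_labels)
print(balance)
```

Applied to one label column of the corpus, a share close to 0.5 for each class would indicate a balanced label set, and a larger share for '1' would indicate a skew towards the positive class.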
Hansard transcript data is used under the Open Parliament Licence V3.0.
Data regarding speaker rebellion rates is taken from the Public Whip, and used under the Open Data Commons Open Database License (ODbL).
Experiment data files
Instructions for annotators used in the creation of the HanDeSeT corpus.
CSV file of 1252 rows with the following comma separated values: id, title, motion, manual motion, govt/opp motion, motion party affiliation, utt1, utt2, utt3, utt4, utt5, manual speech, vote speech, party affiliation, name, rebellion %
Steps to reproduce
The corpus is published as a CSV file with the following comma separated values:
id, title, motion, manual motion, govt/opp motion, motion party affiliation, utt1, utt2, utt3, utt4, utt5, manual speech, vote speech, party affiliation, name, rebellion %
id: a unique ID number given to each debate.
title: the title of the debate.
motion: the debate motion as proposed by a Member of Parliament.
manual motion: gold standard manually annotated sentiment polarity label applied to the motion. '1' = positive, '0' = negative.
govt/opp motion: a sentiment polarity label applied to the motion derived from the relationship of the MP who proposes it to the current Government: '1' if they are affiliated with the governing party or coalition, '0' otherwise.
motion party affiliation: the political party to which the MP who proposes the motion belongs.
utt1 - utt5: utterances 1 to 5 of the speaker's speech.
manual speech: gold standard manually annotated sentiment polarity label applied to the speech. '1' = positive, '0' = negative.
vote speech: a sentiment polarity label applied to the speech derived from the speaker's division vote. 'Aye' = '1' = positive, 'No' = '0' = negative.
party affiliation: the political party to which the speaker belongs.
name: name of the speaker.
rebellion %: rate at which the speaker rebels against the majority of members of their own party as a percentage of their total votes during that parliament.
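The column layout above can be read with Python's standard csv module. The following is a minimal sketch. It assumes that the first row of the file is a header carrying the column names listed above. The helper name load_units, the added "speech" key and the sample row are illustrative assumptions; the sample row is invented and is not taken from the corpus.

```python
import csv
import io

UTTERANCE_COLUMNS = ["utt1", "utt2", "utt3", "utt4", "utt5"]


def load_units(fileobj):
    """Read motion-speech units from an open CSV file object."""
    # skipinitialspace drops the blank that may follow each comma.
    reader = csv.DictReader(fileobj, skipinitialspace=True)
    units = []
    for row in reader:
        # Strip stray whitespace from header names and values;
        # ignore surplus fields that have no header (key is None).
        unit = {
            key.strip(): (value or "").strip()
            for key, value in row.items()
            if key is not None
        }
        # Join the non-empty utterances into a single speech text.
        unit["speech"] = " ".join(
            unit[col] for col in UTTERANCE_COLUMNS if unit.get(col)
        )
        units.append(unit)
    return units


# Invented sample data for illustration only, NOT taken from the corpus.
sample = io.StringIO(
    "id, title, motion, manual motion, govt/opp motion ,motion party affiliation, "
    "utt1, utt2, utt3, utt4, utt5, manual speech, vote speech, "
    "party affiliation, name, rebellion %\n"
    '1, Example debate, "That this House approves the example.", 1, 1, Party A, '
    "I support this motion., It is overdue., , , , 1, 1, "
    "Party B, Example Speaker, 0.5\n"
)

units = load_units(sample)
print(units[0]["speech"])
```

To read the published file instead of the in-memory sample, the same function could be given a file opened with open(path, newline="", encoding="utf-8"), where path is a placeholder for the local location of the CSV and the encoding is an assumption. All values, including the labels, are returned as strings.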
Cite this dataset
Abercrombie, Gavin; Batista-Navarro, Riza (2018), “HanDeSeT: Hansard Debates with Sentiment Tags”, Mendeley Data, v2 http://dx.doi.org/10.17632/xsvp45cbt4.2
The files associated with this dataset are licensed under a Creative Commons Attribution 4.0 International licence.