HanDeSeT: Hansard Debates with Sentiment Tags

Published: 22 Feb 2018 | Version 2 | DOI: 10.17632/xsvp45cbt4.2

Description of this data

A corpus of Hansard UK Parliament Debates for use in the evaluation of sentiment analysis systems.
The corpus consists of 1251 motion-speech units taken from 129 separate debates from the UK House of Commons 1997-2017.

Each unit comprises a parliamentary speech of up to five utterances and an associated debate motion. Debates comprise between one and 30 speeches, and speeches range in length from 31 to 1049 words, with a mean of 167.8 words. The debates cover a two-decade period from 1997 to 2017 and a wide range of topics, from domestic and foreign affairs to procedural matters concerning the running of the House.
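The debate and speech-length figures above can be re-derived from the CSV described under "Steps to reproduce". The sketch below assumes each row has already been read into a dict keyed by the documented column names (`id`, `utt1` to `utt5`). It counts words by splitting on whitespace; the authors' word-counting method is not stated, so the numbers it reports may differ slightly from the published ones. The function name `corpus_stats` is hypothetical.

```python
from collections import Counter
from statistics import mean

UTTERANCE_COLUMNS = ["utt1", "utt2", "utt3", "utt4", "utt5"]


def corpus_stats(rows: list[dict[str, str]]) -> dict[str, float]:
    """Summarise debate sizes and speech lengths for rows keyed by column name."""
    # Each row is one motion-speech unit; `id` identifies the debate it belongs to.
    speeches_per_debate = Counter(row["id"] for row in rows)
    # Whitespace tokenisation (an assumption); empty utterance cells add 0 words.
    lengths = [
        sum(len(row.get(c, "").split()) for c in UTTERANCE_COLUMNS)
        for row in rows
    ]
    return {
        "units": len(rows),
        "debates": len(speeches_per_debate),
        "min_speeches": min(speeches_per_debate.values()),
        "max_speeches": max(speeches_per_debate.values()),
        "min_words": min(lengths),
        "max_words": max(lengths),
        "mean_words": mean(lengths),
    }
```

On the full corpus one would expect roughly 1251 units and 129 debates, if the file matches the description.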

Each motion has two sentiment polarity labels:

  1. A manually applied sentiment polarity label; and
  2. A label derived from the relationship of the MP who proposes the motion to the Government.

Each speech has two sentiment polarity labels:

  1. A speaker-vote label extracted from the division associated with the corresponding debate; and
  2. A manually assigned label.
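Because each motion and each speech carries both a manual label and a derived label, a natural check is how often the two agree. The following is a minimal sketch, assuming both columns hold '1'/'0' strings as in the field descriptions; it reports raw agreement and Cohen's kappa (chance-corrected agreement for two binary labellings). The function name `label_agreement` is hypothetical, and note that motion labels repeat on every unit of a debate, so comparing motions fairly would mean deduplicating rows by debate `id` first.

```python
import math


def label_agreement(
    rows: list[dict[str, str]], col_a: str, col_b: str
) -> tuple[float, float]:
    """Return (observed agreement, Cohen's kappa) for two binary '1'/'0' columns."""
    pairs = [(int(r[col_a]), int(r[col_b])) for r in rows]
    n = len(pairs)
    observed = sum(a == b for a, b in pairs) / n
    # Chance agreement from each column's marginal rate of positive labels.
    p_a = sum(a for a, _ in pairs) / n
    p_b = sum(b for _, b in pairs) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    # Kappa is undefined when both columns are constant and identical.
    kappa = (observed - expected) / (1 - expected) if expected < 1 else math.nan
    return observed, kappa
```

For example, `label_agreement(rows, "manual speech", "vote speech")` compares the two speech labels.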

In addition, the following metadata is included with each unit: debate id, speaker party affiliation, motion party affiliation, speaker name, and speaker rebellion rate.

Manually applied motion labels are approximately evenly balanced; the other labels are slightly skewed towards the positive class.
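The class-balance statement can be checked by computing the share of positive labels per column. In this sketch, motion labels are counted once per debate rather than once per unit, on the assumption (not stated explicitly above) that each debate `id` has a single motion; speech labels are counted per unit. The function name `positive_rates` is hypothetical.

```python
def positive_rates(rows: list[dict[str, str]]) -> dict[str, float]:
    """Share of positive ('1') labels for each of the four label columns."""
    # Motion labels repeat on every unit of a debate, so count each motion once.
    # Assumption: one motion per debate id (later rows for an id overwrite earlier ones).
    motions = {r["id"]: r for r in rows}
    rates: dict[str, float] = {}
    for col in ("manual motion", "govt/opp motion"):
        rates[col] = sum(r[col].strip() == "1" for r in motions.values()) / len(motions)
    # Speech labels are per unit.
    for col in ("manual speech", "vote speech"):
        rates[col] = sum(r[col].strip() == "1" for r in rows) / len(rows)
    return rates
```

A rate near 0.5 indicates balance; rates somewhat above 0.5 correspond to a skew towards the positive class.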

Hansard transcript data is used under the Open Parliament Licence V3.0.
Data regarding speaker rebellion rates is taken from the Public Whip, and used under the Open Data Commons Open Database License (ODbL).


Steps to reproduce

The corpus is published as a CSV file with the following comma-separated columns:

id, title, motion, manual motion, govt/opp motion, motion party affiliation, utt1, utt2, utt3, utt4, utt5, manual speech, vote speech, party affiliation, name, rebellion %
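A minimal loading sketch using Python's standard `csv` module is shown below. It assigns the sixteen columns above by position, so it works whether or not the file has a header row; treating a first record whose first cell is "id" as a header is an assumption, as is the function name `read_units`. Using `csv` rather than splitting on commas matters because motions and utterances can themselves contain commas inside quoted fields.

```python
import csv
from typing import Iterable

# Column order as listed in this description; whether the distributed file
# repeats these exact names as a header row is an assumption.
COLUMNS = [
    "id", "title", "motion", "manual motion", "govt/opp motion",
    "motion party affiliation", "utt1", "utt2", "utt3", "utt4", "utt5",
    "manual speech", "vote speech", "party affiliation", "name", "rebellion %",
]


def read_units(lines: Iterable[str]) -> list[dict[str, str]]:
    """Parse HanDeSeT CSV lines into dicts keyed by the documented column names."""
    rows: list[dict[str, str]] = []
    seen_first = False
    for record_no, record in enumerate(csv.reader(lines), start=1):
        if not record:
            continue  # skip blank lines
        if not seen_first:
            seen_first = True
            # Assumed header detection: the first record's first cell is "id".
            if record[0].strip().lower() == "id":
                continue
        if len(record) != len(COLUMNS):
            raise ValueError(
                f"record {record_no}: expected {len(COLUMNS)} fields, got {len(record)}"
            )
        rows.append({name: value.strip() for name, value in zip(COLUMNS, record)})
    return rows
```

Typical use would be `with open("handeset.csv", newline="", encoding="utf-8-sig") as f: units = read_units(f)`, where the filename and encoding are placeholders to adjust to the downloaded file.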

id: a unique ID number given to each debate.
title: the title of the debate.
motion: the debate motion as proposed by a Member of Parliament.
manual motion: gold standard manually annotated sentiment polarity label applied to the motion. '1' = positive, '0' = negative.
govt/opp motion: a sentiment polarity label applied to the motion derived from the relationship of the MP who proposes it to the current Government: '1' if they are affiliated with the governing party or coalition, '0' otherwise.
motion party affiliation: the political party to which the MP who proposes the motion belongs.
utt1 - utt5: utterances 1 to 5 of the speaker's speech.
manual speech: gold standard manually annotated sentiment polarity label applied to the speech. '1' = positive, '0' = negative.
vote speech: a sentiment polarity label applied to the speech derived from the speaker's vote in the division. 'Aye' = '1' = positive, 'No' = '0' = negative.
party affiliation: the political party to which the speaker belongs.
name: name of the speaker.
rebellion %: rate at which the speaker rebels against the majority of members of their own party as a percentage of their total votes during that parliament.
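The field definitions above suggest a typed record per unit: the binary labels become integers, the utterances are joined into one speech, and the rebellion rate becomes a number. The following sketch assumes rows are dicts keyed by the documented column names; the names `Unit`, `parse_label` and `to_unit` are hypothetical, and accepting 'Aye'/'No' alongside '1'/'0' is a defensive assumption in case the vote column stores the raw division vote.

```python
from dataclasses import dataclass
from typing import Optional

UTTERANCE_COLUMNS = ["utt1", "utt2", "utt3", "utt4", "utt5"]


def parse_label(value: str) -> int:
    """Map a polarity label to 1 (positive) or 0 (negative).

    '1'/'0' follow the field descriptions; 'Aye'/'No' are accepted as an assumption.
    """
    v = value.strip()
    if v in ("1", "Aye"):
        return 1
    if v in ("0", "No"):
        return 0
    raise ValueError(f"unexpected polarity label: {value!r}")


@dataclass
class Unit:
    debate_id: str
    title: str
    motion: str
    speech: str                     # non-empty utterances joined with single spaces
    manual_motion: int              # gold label: 1 = positive, 0 = negative
    govt_opp_motion: int            # 1 = proposer affiliated with the Government
    motion_party: str
    manual_speech: int              # gold label: 1 = positive, 0 = negative
    vote_speech: int                # 1 = 'Aye', 0 = 'No'
    party: str
    name: str
    rebellion_pct: Optional[float]  # None if the cell is empty


def to_unit(row: dict[str, str]) -> Unit:
    """Convert one CSV row (dict keyed by the documented column names) to a Unit."""
    # Speeches have up to five utterances, so some utt columns may be empty.
    utterances = [row[c].strip() for c in UTTERANCE_COLUMNS if row.get(c, "").strip()]
    rebellion = row["rebellion %"].strip().rstrip("%")
    return Unit(
        debate_id=row["id"].strip(),
        title=row["title"],
        motion=row["motion"],
        speech=" ".join(utterances),
        manual_motion=parse_label(row["manual motion"]),
        govt_opp_motion=parse_label(row["govt/opp motion"]),
        motion_party=row["motion party affiliation"],
        manual_speech=parse_label(row["manual speech"]),
        vote_speech=parse_label(row["vote speech"]),
        party=row["party affiliation"],
        name=row["name"],
        rebellion_pct=float(rebellion) if rebellion else None,
    )
```

Keeping an empty rebellion cell as `None` rather than 0 avoids silently treating missing data as perfect party loyalty.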

Cite this dataset

Abercrombie, Gavin; Batista-Navarro, Riza (2018), “HanDeSeT: Hansard Debates with Sentiment Tags”, Mendeley Data, v2 http://dx.doi.org/10.17632/xsvp45cbt4.2


The University of Manchester


Artificial Intelligence, Political Science, Computational Linguistics, Data Science, Natural Language Processing


CC BY 4.0

The files associated with this dataset are licensed under a Creative Commons Attribution 4.0 International licence.

What does this mean?
You can share, copy and modify this dataset so long as you give appropriate credit, provide a link to the CC BY license, and indicate if changes were made, but you may not do so in a way that suggests the rights holder has endorsed you or your use of the dataset. Note that further permission may be required for any content within the dataset that is identified as belonging to a third party.