pt-br2libras-gloss

Published: 23 April 2025 | Version 1 | DOI: 10.17632/ryj88ckjww.1
Contributors:

Description

This dataset is a UTF-8 encoded Comma-Separated Values (CSV) file containing a parallel corpus of 127,349 aligned sentence pairs in Brazilian Portuguese and LIBRAS gloss. The file includes three columns:

  • pt-br: original sentences in Brazilian Portuguese
  • libras-gloss: corresponding translations in LIBRAS gloss notation
  • is_government_source: a boolean field indicating whether the source sentence was extracted from an official Brazilian Federal Government website (True) or from a non-governmental source (False)

A total of 55,047 sentence pairs in the dataset originate from government sources. The dataset is intended to support research in bilingual corpora, machine translation, and sign language processing, particularly for applications involving Brazilian Portuguese and Brazilian Sign Language (LIBRAS).
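
As a rough illustration of the three-column layout described above, the following Python sketch reads rows with the standard csv module and separates government-sourced pairs. The sample rows are invented placeholders, not sentences from the dataset, and the function name read_pairs is hypothetical. It also assumes the boolean column is stored as the text "True" or "False", which the description suggests but does not confirm.

```python
import csv
import io

# Hypothetical two-row sample in the documented three-column layout.
# The sentences are invented placeholders, not taken from the dataset.
SAMPLE_CSV = (
    "pt-br,libras-gloss,is_government_source\n"
    "frase de exemplo um,GLOSA EXEMPLO UM,True\n"
    "frase de exemplo dois,GLOSA EXEMPLO DOIS,False\n"
)


def read_pairs(handle):
    """Yield (portuguese, gloss, is_government) tuples from an open CSV handle."""
    for row in csv.DictReader(handle):
        # Assumption: the boolean is serialized as the text "True" or "False".
        is_gov = row["is_government_source"].strip().lower() == "true"
        yield row["pt-br"], row["libras-gloss"], is_gov


pairs = list(read_pairs(io.StringIO(SAMPLE_CSV)))
government_pairs = [p for p in pairs if p[2]]
print(len(pairs), len(government_pairs))
```

To apply the same function to the downloaded file, one would pass a handle opened with encoding="utf-8" and newline="" in place of the in-memory sample. The file name depends on where the download was saved and is not stated in this record.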

Files

Institutions

  • Universidade Federal da Paraiba

Categories

Natural Language Processing, Machine Translation, Sign Language, Portuguese Language

Funders

  • Secretaria Nacional dos Direitos das Pessoas com Deficiência (SNDPD), Ministério dos Direitos Humanos e da Cidadania, Brasil
  • Secretaria de Governo Digital (SGD), Ministério da Gestão e da Inovação em Serviços Públicos, Brasil

Licence