pt-br2libras-gloss

Name: pt-br2libras-gloss
Creator: Manuella Lima
Published: 2025-04-23T10:55:32.377Z
Keywords: Natural Language Processing, Machine Translation, Sign Language, Portuguese Language

Lima, Manuella; França, Daniel; Silva, Diego; Albuquerque, Dilainne; Lacerda, Daniel; Costa, Rostand; Souza Filho, Guido; Araújo, Tiago

doi:10.17632/ryj88ckjww.1

Published: 23 April 2025 | Version 1 | DOI: 10.17632/ryj88ckjww.1

Description

This dataset is a UTF-8 encoded Comma-Separated Values (CSV) file containing a parallel corpus of 127,349 aligned sentence pairs in Brazilian Portuguese and LIBRAS gloss. The file includes three columns:

pt-br: original sentences in Brazilian Portuguese
libras-gloss: corresponding translations in LIBRAS gloss notation
is_government_source: a boolean field indicating whether the source sentence was extracted from an official Brazilian Federal Government website (True) or from a non-governmental source (False)

A total of 55,047 sentence pairs in the dataset originate from government sources. The dataset is intended to support research in bilingual corpora, machine translation, and sign language processing, particularly for applications involving Brazilian Portuguese and Brazilian Sign Language (LIBRAS).
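The column layout described above could be read with Python's standard csv module along the following lines. This is a minimal sketch, not an official loader: it assumes the file starts with a header row naming the three columns exactly as listed and that the boolean field is stored as the text True or False. The helper names load_pairs and count_government and the two sample rows are invented placeholders for illustration, not rows taken from the dataset.

```python
import csv
import io


def load_pairs(fileobj):
    """Read sentence pairs from a CSV file object into a list of dicts.

    Assumes a header row with the columns pt-br, libras-gloss and
    is_government_source (an assumption based on the description).
    """
    pairs = []
    for row in csv.DictReader(fileobj):
        pairs.append({
            "pt-br": row["pt-br"],
            "libras-gloss": row["libras-gloss"],
            # Convert the textual flag ("True"/"False") into a real bool.
            "is_government_source":
                row["is_government_source"].strip().lower() == "true",
        })
    return pairs


def count_government(pairs):
    """Count the pairs flagged as coming from a government source."""
    return sum(1 for pair in pairs if pair["is_government_source"])


# Placeholder rows for demonstration only; NOT real dataset content.
SAMPLE = (
    "pt-br,libras-gloss,is_government_source\n"
    "frase de exemplo um,GLOSA EXEMPLO UM,True\n"
    "frase de exemplo dois,GLOSA EXEMPLO DOIS,False\n"
)

sample_pairs = load_pairs(io.StringIO(SAMPLE))
print(len(sample_pairs), count_government(sample_pairs))
```

To apply the same functions to the downloaded file, open it with open(path, encoding="utf-8", newline="") and pass the handle to load_pairs, where path is wherever the CSV was saved locally. If the assumptions hold, the totals should match the 127,349 pairs and 55,047 government-source pairs stated in the description.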

Institutions

Universidade Federal da Paraíba

Funders

Secretaria Nacional dos Direitos das Pessoas com Deficiência (SNDPD), Ministério dos Direitos Humanos e da Cidadania, Brasil
Secretaria de Governo Digital (SGD), Ministério da Gestão e da Inovação em Serviços Públicos, Brasil
