Embeddings and topic vectors for MOOC lectures dataset

Name: Embeddings and topic vectors for MOOC lectures dataset
Creator: Zenun Kastrati
Published: 2019-12-06T13:27:43.170Z
Keywords: Natural Language Processing, Machine Learning, e-Learning

Kastrati, Zenun; Kurti, Arianit; Imran, Ali Shariq

doi:10.17632/xknjp8pxbj.1

Published: 6 December 2019 | Version 1 | DOI: 10.17632/xknjp8pxbj.1

Description

This dataset comprises word embeddings and document topic distribution vectors generated from the transcripts of 12,032 video lectures from 200 courses collected from the Coursera learning platform. Two well-known natural language processing techniques were used, both as implemented in the Gensim package for Python: Word2Vec to generate the word embeddings and Latent Dirichlet Allocation (LDA) to generate the topic vectors.
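
As a rough illustration of how vectors of these two kinds are typically compared, the sketch below uses only the Python standard library. It assumes that a word embedding is a dense real-valued vector and that a document topic vector is a probability distribution over topics. The words, lecture names, and numbers are hypothetical toy values, not taken from the dataset, and the file layout of the dataset is not assumed here. Cosine similarity and Hellinger distance are common choices for these comparisons, not necessarily the ones the authors used.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def hellinger_distance(p, q):
    """Hellinger distance between two discrete probability distributions.

    Returns 0.0 for identical distributions and 1.0 for distributions
    with no overlapping probability mass.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared) / math.sqrt(2)


# Hypothetical 3-dimensional word embeddings (real embeddings are much longer).
word_vectors = {
    "neural": [0.9, 0.1, 0.3],
    "network": [0.8, 0.2, 0.4],
    "poetry": [0.1, 0.9, 0.2],
}

# Hypothetical topic distributions over 3 topics; each vector sums to 1.
lecture_topics = {
    "lecture_a": [0.7, 0.2, 0.1],
    "lecture_b": [0.6, 0.3, 0.1],
    "lecture_c": [0.05, 0.05, 0.9],
}

# Words used in similar contexts should have a higher cosine similarity.
sim_related = cosine_similarity(word_vectors["neural"], word_vectors["network"])
sim_unrelated = cosine_similarity(word_vectors["neural"], word_vectors["poetry"])

# Lectures with similar topic mixtures should have a smaller distance.
dist_close = hellinger_distance(lecture_topics["lecture_a"], lecture_topics["lecture_b"])
dist_far = hellinger_distance(lecture_topics["lecture_a"], lecture_topics["lecture_c"])
```

With these toy values, `sim_related` is larger than `sim_unrelated`, and `dist_close` is smaller than `dist_far`. The same functions would apply to real vectors once they are loaded into Python lists.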

Institutions

Linneuniversitet
