Indonesian Stand-Up Comedy Data Kompas TV

Published: 2 April 2024 | Version 1 | DOI: 10.17632/zjdncn6tkv.1
Supriyono Supriyono


The "Indonesian Stand-Up Comedy Data" dataset, collected from Kompas TV's YouTube channel, is an archive of user interactions and comments on stand-up comedy videos published on the channel. Researchers and fans who want to study Indonesian stand-up comedy and trends in audience interaction may find this dataset useful.


Steps to reproduce

Identifying Data Source: The first step was to select Kompas TV's YouTube channel as the main source of data for Indonesian stand-up comedy videos. Kompas TV is a well-known media platform in Indonesia that features a wide variety of entertainment, including stand-up comedy.
Web Crawling: Web crawling techniques were used to systematically access and retrieve data from Kompas TV's YouTube channel. Python-based web scraping packages such as Scrapy and BeautifulSoup were used for this. These libraries simplified the extraction of the relevant metadata: video URLs, titles, likes, views, posting dates, hashtags, descriptions, comments, and replies.
Data Extraction: The scraping procedure consisted of browsing the channel's video library, collecting metadata from the individual video pages, and storing the information in structured file formats such as CSV or JSON. Extra care was taken to keep the extracted data accurate and complete.
Encoding Commenter Identities: Commenter identities in the dataset were encoded to protect commenters' privacy. Each commenter was assigned a unique ID or pseudonym, which preserves anonymity while still allowing analysis of interactions between comments.
Quality Assurance: Strict quality assurance procedures were followed during data collection to confirm the accuracy and consistency of the extracted data. These included cross-referencing data points, identifying and correcting errors or discrepancies, and ensuring adherence to ethical standards.
Documentation: The dataset is accompanied by documentation that describes the methods, tools, and procedures used in the data collection process. This material serves as a guide for researchers who want to replicate the study or use the dataset for their own analysis.
Open Access: To ensure transparency and allow for the replication of research findings, the dataset was made publicly available to the scientific community via platforms like Mendeley Data. In order to promote wider accessibility, further efforts were undertaken to offer translations or explanations when needed.
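The identity encoding and structured export steps described above can be sketched as follows. This is a minimal illustration, not the procedure actually used for this dataset: the keyed SHA-256 hashing scheme, the SECRET_KEY value, the field names (video_id, author, text, author_id), and the sample records are all assumptions made for the example. It uses only the Python standard library.

```python
import csv
import hashlib
import hmac
import io

# Hypothetical secret key (an assumption for this sketch); in practice it
# would be kept private and never published alongside the dataset.
SECRET_KEY = b"replace-with-a-private-key"


def pseudonymize(author_name: str, key: bytes = SECRET_KEY) -> str:
    """Map a commenter name to a stable pseudonymous ID using keyed SHA-256."""
    digest = hmac.new(key, author_name.encode("utf-8"), hashlib.sha256).hexdigest()
    return "user_" + digest[:12]


def encode_comments(comments: list[dict]) -> list[dict]:
    """Return copies of the records with 'author' replaced by 'author_id'."""
    encoded = []
    for record in comments:
        row = dict(record)  # copy, so the input records stay unchanged
        row["author_id"] = pseudonymize(row.pop("author"))
        encoded.append(row)
    return encoded


def to_csv(rows: list[dict]) -> str:
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample records, for illustration only.
sample = [
    {"video_id": "vid001", "author": "Budi", "text": "Lucu banget!"},
    {"video_id": "vid001", "author": "Sari", "text": "Setuju"},
    {"video_id": "vid002", "author": "Budi", "text": "Mantap"},
]

encoded = encode_comments(sample)
csv_text = to_csv(encoded)
print(csv_text)
```

Because the same name always maps to the same ID, comments and replies by one person can still be linked across videos, while the original name does not appear in the exported file. A keyed hash is used here instead of a plain hash so that IDs cannot be reproduced by someone who only guesses usernames.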


Universitas Islam Negeri Maulana Malik Ibrahim, Universitas Negeri Malang


Natural Language Processing, Applied Computer Science