Stand-Up Comedy Data

Published: 18 April 2024 | Version 1 | DOI: 10.17632/rvfxvxy94b.1
, Fachrul Kurniawan


This dataset collects user interactions and comments on Indonesian stand-up comedy videos from Kompas TV's YouTube channel, giving researchers and comedy enthusiasts a record of how audiences engage with and perceive individual performances. Analyzing this user-generated content can reveal prevailing trends, identify standout comedians, and surface the cultural nuances that shape audience preferences. It can also show how tastes are evolving, which comedic styles are emerging, and how specific comedy specials are received. The dataset is therefore useful both for academic inquiry and for comedians and content creators who want to better connect with their audience.


Steps to reproduce

Kompas TV's YouTube channel was identified as the primary data source because it is a prominent platform for Indonesian stand-up comedy videos. Kompas TV offers a range of entertainment content, including stand-up comedy, and was the focal point for data collection. Web crawling gave systematic access to the channel's data. Python-based web scraping tools such as Scrapy and BeautifulSoup were used to extract metadata: video URLs, titles, likes, views, posting dates, hashtags, descriptions, comments, and replies.

The web scraping process involved:
1. Navigating through the channel's video library
2. Extracting metadata from individual video pages
3. Structuring the information into formats like CSV or JSON

Care was taken to ensure the accuracy and comprehensiveness of the extracted data. To protect privacy, commenter identities were encoded as unique IDs or pseudonyms, which still allows comment interactions to be analyzed. Quality assurance checks were applied throughout data collection to validate the precision and coherence of the extracted data, including cross-referencing data points, rectifying errors or discrepancies, and upholding ethical standards.

Documentation accompanies the dataset and describes the methodologies, tools, and procedures used in data collection. It is intended for academics who want to replicate the study or use the dataset in their own analyses. To promote transparency and make the findings replicable, the dataset was made openly accessible to the scientific community via Mendeley Data. Translations or explanations were provided where necessary to improve accessibility.
This commitment to open access supports the dissemination of knowledge and encourages collaborative research on Indonesian stand-up comedy.
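
The pseudonymization and structuring steps described above can be illustrated with a minimal Python sketch. It is not the authors' actual pipeline: the record fields (`comment_id`, `video_id`, `author`, `text`, `parent_id`), the `SECRET_KEY` value, the helper names, and the choice of a keyed HMAC-SHA256 hash are all assumptions made for illustration. The sketch replaces each commenter name with a stable pseudonymous ID, so that reply threads can still be linked, and then serializes the records to CSV and JSON.

```python
import csv
import hashlib
import hmac
import io
import json

# Hypothetical secret key (an assumption); it would be kept out of the published data.
SECRET_KEY = "replace-with-a-private-value"


def pseudonymize(author: str, key: str = SECRET_KEY) -> str:
    """Map a commenter name to a stable ID that does not expose the name."""
    digest = hmac.new(key.encode("utf-8"), author.encode("utf-8"), hashlib.sha256)
    return "user_" + digest.hexdigest()[:12]


def anonymize_comments(comments, key: str = SECRET_KEY):
    """Return copies of the records with 'author' replaced by 'author_id'."""
    cleaned = []
    for comment in comments:
        row = {k: v for k, v in comment.items() if k != "author"}
        row["author_id"] = pseudonymize(comment["author"], key)
        cleaned.append(row)
    return cleaned


def to_csv(rows) -> str:
    """Serialize a non-empty list of flat dicts to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Illustrative records only; these are not taken from the dataset.
sample = [
    {"comment_id": "c1", "video_id": "vid001", "author": "Budi",
     "text": "Lucu banget!", "parent_id": ""},
    {"comment_id": "c2", "video_id": "vid001", "author": "Sari",
     "text": "Setuju", "parent_id": "c1"},
    {"comment_id": "c3", "video_id": "vid002", "author": "Budi",
     "text": "Mantap", "parent_id": ""},
]

rows = anonymize_comments(sample)
csv_text = to_csv(rows)
json_text = json.dumps(rows, ensure_ascii=False, indent=2)
```

Because the same name always maps to the same ID under a given key, interactions between commenters remain analyzable after the names are removed. A keyed hash is used here instead of a plain hash so that IDs cannot be recomputed from a list of candidate usernames without the key.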


Universitas Islam Negeri Maulana Malik Ibrahim, Universitas Negeri Malang


Natural Language Processing, Applied Computer Science