Data for: Dual Effects of TL in We-media Videos (308+2670)
Description
To obtain accurate data on the time length of videos, investigators would ordinarily need to measure customer satisfaction and participation by presenting videos of different types and time lengths as stimuli, which would require a rigorous and complex experiment. Fortunately, Bilibili already hosts a large number of videos of various types and time lengths, each accompanied by relevant and reliable data (given a large enough number of views), so the platform can be treated as a natural experiment for this research.
Steps to reproduce
The data used for multiple linear regression analysis come from three sources. One part comes from the video pages: the time length of each video, the number of plays, likes, and bullet-screen comments, and the proportion of viewers with paid membership. Another part comes from the personal homepages of the video creators: their level on Bilibili, their number of followers, and the total number of plays and likes they have received. The final part was derived from the content of the videos, such as the types of experiences. These data were collected by hand using stratified random sampling in three steps. First, the sample was stratified by four time-length ranges and 21 keywords, giving 84 strata; four videos were planned per stratum, for 336 records in total. The time-length ranges are "less than 10 minutes", "10-30 minutes", "30-60 minutes", and "more than 60 minutes". The keywords were selected by simplifying and optimizing the video classification labels on the Bilibili homepage. Second, sample videos were extracted one at a time from the search results for each stratum's keyword and time-length range. Each video was located by a random three-component index (page number, row number, and piece number of the sample video), and the relevant data were recorded after the video had been watched in full. Finally, after duplicate and incomplete records had been removed, 308 valid records (92%) were kept.
The data used for structural equation modelling were obtained by crawling Bilibili with a software tool named "Octopus", using simple random sampling in three steps. First, the page addresses (URLs) of 140 videos were randomly crawled for each keyword. Second, for each address, data on the corresponding video were collected: the time length, the number of plays, likes, coins, favourites, shares, bullet-screen comments, and comments, plus the title, introduction, and tags. Finally, duplicate and incomplete records were removed by comparing video titles, introductions, and tags, leaving 2670 valid records (91%) out of the 2940 collected.
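The stratified design for the manually collected sample can be sketched in Python as follows. This is only an illustration, not the procedure the authors actually ran: the keyword names are placeholders, and the search-result layout (`MAX_PAGE`, `ROWS_PER_PAGE`, `ITEMS_PER_ROW`) is a hypothetical assumption, since the description does not state how many pages, rows, or items per row were considered.

```python
import itertools
import random

DURATION_RANGES = [
    "less than 10 minutes",
    "10-30 minutes",
    "30-60 minutes",
    "more than 60 minutes",
]
N_KEYWORDS = 21
SAMPLES_PER_STRATUM = 4

# Hypothetical search-result layout; these bounds are NOT given in the description.
MAX_PAGE, ROWS_PER_PAGE, ITEMS_PER_ROW = 10, 5, 4


def build_strata(keywords, duration_ranges=DURATION_RANGES):
    """Return every (keyword, time-length range) combination, one per stratum."""
    return list(itertools.product(keywords, duration_ranges))


def draw_locators(rng, k=SAMPLES_PER_STRATUM):
    """Draw k distinct 1-based (page, row, piece) triples within the assumed layout."""
    grid = list(itertools.product(
        range(1, MAX_PAGE + 1),
        range(1, ROWS_PER_PAGE + 1),
        range(1, ITEMS_PER_ROW + 1),
    ))
    return rng.sample(grid, k)


def sampling_plan(keywords, seed=0):
    """Map each stratum to the random locators of the videos to collect."""
    rng = random.Random(seed)
    return {stratum: draw_locators(rng) for stratum in build_strata(keywords)}


# Placeholder keyword names; the real 21 keywords are not listed in the description.
keywords = [f"keyword_{i}" for i in range(1, N_KEYWORDS + 1)]
plan = sampling_plan(keywords)
planned = sum(len(locators) for locators in plan.values())

print(len(plan), planned)          # 84 strata, 336 planned records
print(round(100 * 308 / planned))  # 92, the reported share of valid records
```

Fixing the seed makes the plan reproducible, which matters when the locators are later followed by hand.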
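The cleaning step for the crawled sample, removing incomplete records and then duplicates identified by title, introduction, and tags, might look like the sketch below. The field names and the exact matching rule (whitespace-trimmed text, order-insensitive tags) are assumptions made for illustration; the description does not specify how the comparison was carried out.

```python
# Hypothetical field names; the actual column names in the data files may differ.
REQUIRED_FIELDS = ("title", "introduction", "tags", "duration", "plays")


def is_complete(record):
    """A record is complete when no required field is missing or empty."""
    return all(record.get(field) not in (None, "", []) for field in REQUIRED_FIELDS)


def clean_records(records):
    """Drop incomplete records, then keep the first record for each
    (title, introduction, tags) combination."""
    seen = set()
    kept = []
    for record in records:
        if not is_complete(record):
            continue
        key = (
            record["title"].strip(),
            record["introduction"].strip(),
            tuple(sorted(record["tags"])),
        )
        if key in seen:
            continue
        seen.add(key)
        kept.append(record)
    return kept


# Made-up records, for illustration only.
sample = [
    {"title": "A", "introduction": "intro", "tags": ["x", "y"], "duration": 300, "plays": 10},
    {"title": "A ", "introduction": "intro", "tags": ["y", "x"], "duration": 300, "plays": 12},
    {"title": "B", "introduction": "", "tags": ["z"], "duration": 90, "plays": 5},
    {"title": "C", "introduction": "other", "tags": ["z"], "duration": 4000, "plays": 0},
]
cleaned = clean_records(sample)
print([record["title"] for record in cleaned])  # ['A', 'C']

# Counts reported in the description: 140 videos for each of 21 keywords.
collected = 140 * 21
print(collected, round(100 * 2670 / collected))  # 2940 91
```

A play count of 0 is treated as a valid value here rather than as a missing field, which is why the completeness check compares against empty values explicitly instead of testing truthiness.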