HEVC-SVS: Low-level HEVC features and CNN features for TVSum, SumMe, OVP and VSUMM datasets

Published: 30 May 2023 | Version 8 | DOI: 10.17632/88rpmmnmkm.8
Contributors: Obada Issa, Tamer Shanableh

Description

Proposed HEVC feature sets, along with CNN features from GoogleNet, AlexNet, Inception-ResNet-V2, and VGG16, for the TVSum, SumMe, OVP and VSUMM datasets. The modified datasets are named "HEVC-SVS-TVSum", "HEVC-SVS-SumMe", "HEVC-SVS-OVP" and "HEVC-SVS-VSUMM", respectively. Each dataset retains the original ground-truth data it came with, unmodified.

When using any of these datasets, please cite our publications in which the HEVC feature set was first proposed.

If you are using the HEVC-SVS-OVP and/or HEVC-SVS-VSUMM datasets: https://ieeexplore.ieee.org/document/9815254/

@article{issa_cnn_2022,
  title = {{CNN} and {HEVC} {Video} {Coding} {Features} for {Static} {Video} {Summarization}},
  volume = {10},
  issn = {2169-3536},
  url = {https://ieeexplore.ieee.org/document/9815254/},
  doi = {10.1109/ACCESS.2022.3188638},
  urldate = {2022-09-29},
  journal = {IEEE Access},
  author = {Issa, Obada and Shanableh, Tamer},
  year = {2022},
  pages = {72080--72091},
}

If you are using the HEVC-SVS-TVSum and/or HEVC-SVS-SumMe datasets: https://www.mdpi.com/2076-3417/13/10/6065

@article{issa_static_2023,
  title = {Static {Video} {Summarization} {Using} {Video} {Coding} {Features} with {Frame}-{Level} {Temporal} {Subsampling} and {Deep} {Learning}},
  volume = {13},
  issn = {2076-3417},
  url = {https://www.mdpi.com/2076-3417/13/10/6065},
  doi = {10.3390/app13106065},
  number = {10},
  journal = {Applied Sciences},
  author = {Issa, Obada and Shanableh, Tamer},
  month = may,
  year = {2023},
  pages = {6065},
}

Please also cite the original authors of each dataset:
TVSum: https://people.csail.mit.edu/yalesong/tvsum/
SumMe: https://gyglim.github.io/me/vsum/index.html
OVP and VSUMM: https://www.sites.google.com/site/vsummsite/download

Acknowledgement: The work in this research project is supported by the American University of Sharjah under research grant number FRG22-E-E44. This research work represents the opinions of the author(s) and is not meant to represent the position or opinions of the American University of Sharjah.
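As an illustration of how the combined feature sets might be consumed, the following Python sketch concatenates per-frame HEVC coding features with CNN features for a single video. The file names, the .npy format, and the array shapes are assumptions made for illustration only; consult the dataset files for the actual layout.

```python
import numpy as np

# Hypothetical file names and format; the actual dataset layout may differ.
hevc = np.load("hevc_features_video01.npy")      # assumed shape: (num_frames, 64) HEVC coding features
cnn = np.load("googlenet_features_video01.npy")  # assumed shape: (num_frames, D) CNN features

# Per-frame feature matrices must be aligned before combining them.
assert hevc.shape[0] == cnn.shape[0], "frame counts must match"

# Concatenate along the feature axis to obtain one descriptor per frame.
combined = np.concatenate([hevc, cnn], axis=1)
print(combined.shape)
```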

Files

Steps to reproduce

The extracted features are the following (feature ID: feature variable):

Averaged per frame
1: Number of CU parts
2: MVD bits per CU
3: CU bits excluding MVD bits
4: Percentage of intra CU parts
5: Percentage of skipped CU parts
6: Number of CUs with depth 0 (i.e. 64×64)
7: Number of parts with depth 1 (i.e. 32×32)
8: Number of CUs with depth 2 (i.e. 16×16)
9: Number of parts with depth 3 (i.e. 8×8)

Not averaged per frame
10-18: Standard deviation of feature IDs 1-9 per frame
19: Max CU depth per frame
20: For CUs with depth > 0, log2(sum of MVD)
21: For CUs with depth = 0, log2(sum of MVD)

Averaged per frame
22: Row-wise SAD of the CU prediction error
23: Column-wise SAD of the CU prediction error
24: Ratio of gradients (i.e. feature 22 divided by feature 23) per CU
25: Total distortion per CU as computed by the HEVC encoder

Not averaged per frame
26-29: Standard deviation of feature IDs 22-25 per frame
30: Per frame: summation of the variances of the x and y components of all MVs
31-47: Histogram of the x-component of all MVs per frame (using 16 bins)
48-64: Histogram of the y-component of all MVs per frame (using 16 bins)

A sketch of how such a per-frame feature vector might be assembled is shown below.
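For illustration only, the following Python sketch assembles a per-frame feature vector in the spirit of the list above. It assumes per-CU statistics and motion vectors have already been exported from an HEVC encoder; the histogram range, the log transform, and all variable names are assumptions, not the exact procedure used to build the dataset.

```python
import numpy as np

def frame_feature_vector(cu_values, mvs, num_bins=16, mv_range=(-64, 64)):
    """Illustrative per-frame feature assembly (not the dataset's exact pipeline).

    cu_values : (num_CUs, k) array of per-CU statistics (e.g. CU bits, MVD bits).
    mvs       : (num_MVs, 2) array of motion-vector (x, y) components for the frame.
    The bin count follows the list above (16 bins); the MV range is an assumption.
    """
    feats = []
    feats.extend(cu_values.mean(axis=0))          # averaged-per-frame features (cf. IDs 1-9, 22-25)
    feats.extend(cu_values.std(axis=0))           # their per-frame standard deviations (cf. IDs 10-18, 26-29)
    feats.append(np.log2(1 + np.abs(mvs).sum()))  # log2 of summed MV magnitudes (cf. IDs 20-21)
    feats.append(mvs.var(axis=0).sum())           # summed variance of x and y MV components (cf. ID 30)

    # 16-bin histograms of the x and y MV components (cf. IDs 31-47 and 48-64).
    hx, _ = np.histogram(mvs[:, 0], bins=num_bins, range=mv_range)
    hy, _ = np.histogram(mvs[:, 1], bins=num_bins, range=mv_range)
    feats.extend(hx)
    feats.extend(hy)
    return np.asarray(feats, dtype=float)
```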

Institutions

American University of Sharjah

Categories

Video Processing, Feature Extraction, Convolutional Neural Network, Video Summarization

Licence