HEVC-SVS: Low-level HEVC features and CNN features for OVP and SUMME datasets

Published: 3 May 2022 | Version 2 | DOI: 10.17632/88rpmmnmkm.2


Proposed HEVC feature sets along with CNN features from GoogleNet, AlexNet, Inception-ResNet-V2, and VGG16 for the OVP and SUMME datasets. The new modified datasets are named "HEVC-SVS-OVP" and "HEVC-SVS-SUMME". The datasets retain the original ground-truth data they came with, unmodified.

Feature ID  Feature variable

Averaged per frame:
1      Number of CU parts
2      MVD bits per CU
3      CU bits excluding MVD bits
4      Percentage of intra CU parts
5      Percentage of skipped CU parts
6      Number of CUs with depth 0 (i.e., 64x64)
7      Number of parts with depth 1 (i.e., 32x32)
8      Number of CUs with depth 2 (i.e., 16x16)
9      Number of parts with depth 3 (i.e., 8x8)
10-18  Standard deviation of feature IDs 1-9 per frame
19     Max CU depth per frame
20     For CUs with depth > 0, log2(|sum of MVD|)
21     For CUs with depth = 0, log2(|sum of MVD|)

Averaged per frame:
22     Row-wise SAD of the CU prediction error
23     Column-wise SAD of the CU prediction error
24     Ratio of gradients (i.e., feature 22 divided by feature 23) per CU
25     Total distortion per CU as computed by the HEVC encoder
26-29  Standard deviation of feature IDs 22-25 per frame
30     Per frame: summation of the variance of the x and y components of all MVs
31-47  Histogram of the x-component of all MVs per frame (using 16 bins)
48-64  Histogram of the y-component of all MVs per frame (using 16 bins)

If you use any of these datasets, please cite our publication, in which the 64-feature HEVC set was first proposed: [[Place holder for our publication reference]].
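As a rough illustration of the motion-vector features in the table above, the following sketch (not the authors' extraction code) computes feature 30 (sum of the variances of the MV x- and y-components) and the per-component histograms (features 31-47 and 48-64) for one frame. The bin edges and MV range are assumptions for illustration; the dataset description only states that 16 bins are used.

```python
import numpy as np

def mv_features(mvs_x, mvs_y, n_bins=16, mv_range=(-64, 64)):
    """Hypothetical per-frame MV features: variance sum plus two histograms.

    mvs_x, mvs_y: x- and y-components of all motion vectors in the frame.
    mv_range is an assumed clipping range, not taken from the dataset docs.
    """
    mvs_x = np.asarray(mvs_x, dtype=float)
    mvs_y = np.asarray(mvs_y, dtype=float)
    # Feature 30: variance of the x-components plus variance of the y-components.
    var_sum = mvs_x.var() + mvs_y.var()
    # Features 31-47 / 48-64: histograms of each MV component over fixed bins.
    hist_x, _ = np.histogram(mvs_x, bins=n_bins, range=mv_range)
    hist_y, _ = np.histogram(mvs_y, bins=n_bins, range=mv_range)
    return var_sum, hist_x, hist_y

# Toy frame with four motion vectors.
var_sum, hx, hy = mv_features([0, 2, -3, 5], [1, 1, -2, 0])
```

Fixing the bin edges across all frames (rather than letting them adapt per frame) keeps histogram slots comparable between frames, which is what a per-frame feature vector requires.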
Please also cite the original authors of each dataset:

OVP:
@article{Avila,
  title   = "VSUMM: A mechanism designed to produce static video summaries and a novel evaluation method",
  journal = "Pattern Recognition Letters",
  volume  = "32",
  number  = "1",
  pages   = "56--68",
  year    = "2011",
  note    = "Image Processing, Computer Vision and Pattern Recognition in Latin America",
  issn    = "0167-8655",
  doi     = "10.1016/j.patrec.2010.08.004",
  author  = "Sandra Eliza Fontes de Avila and Ana Paula Brandão Lopes and Antonio da Luz Jr. and Arnaldo de Albuquerque Araújo"
}

SUMME:
@inproceedings{GygliECCV14,
  author    = {Gygli, Michael and Grabner, Helmut and Riemenschneider, Hayko and Van Gool, Luc},
  title     = {Creating Summaries from User Videos},
  booktitle = {ECCV},
  year      = {2014}
}


Steps to reproduce

The videos in the original OVP and SUMME datasets were fed into a custom HEVC encoder. The table in the dataset description above lists the HEVC features that were extracted for every 15th frame, starting with frame 1. The datasets also include CNN features from the following networks: GoogleNet, AlexNet, Inception-ResNet-V2, and VGG16.
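The sampling rule above can be sketched as follows; this is a minimal illustration assuming 1-based frame indices (so a 60-frame video yields frames 1, 16, 31, 46), not the authors' actual pipeline code.

```python
def sampled_frames(total_frames, step=15, start=1):
    """Return the 1-based indices of every `step`-th frame, starting at `start`."""
    return list(range(start, total_frames + 1, step))

# Example: a 60-frame video is sampled at frames 1, 16, 31, and 46.
indices = sampled_frames(60)
```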


American University of Sharjah


Video Processing, Video Summarization