Unsupervised Temporal Segmentation of Video in the Wild

Temporal segmentation is an essential preprocessing step for many video analysis tasks, such as video summarization, video surveillance, video captioning/description, and sports video analysis. Because this enormous volume of multimedia content is recorded in a wide variety of scene contexts, we cannot rely on predefined object saliency or scenes, so the temporal segmentation framework must be purely unsupervised. In addition, some videos, such as lifelogging, surveillance, and movie footage, span hours to days and therefore require scalable solutions.

Why do deep learning frameworks fail?
Deep learning-based solutions use a sequential model (RNN/LSTM) to harness temporal information. Most of these techniques are supervised or semi-supervised and require a large amount of training data [1]. Moreover, these methods do not scale to hours-long egocentric video segments, as the gradients during backpropagation vanish beyond a few hundred time steps [2].

Concept Drift Detection for streaming data: We formulate the problem of temporal segmentation as concept drift detection in multivariate time series data. In a concept drift detection task, one maintains two adjacent temporal windows of fixed size and estimates a statistical summary (e.g., the average) of each window separately. If the two summaries differ significantly, the algorithm declares concept drift. The key challenges in using this formulation for temporal segmentation are:

(1) Choosing the window length for the statistical summary, as different activity/event lengths may require different temporal windows, and

(2) Choosing the threshold for declaring a boundary, as real boundaries may show smooth visual changes, whereas sharp head motion may cause significant visual changes in non-boundary regions.
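The fixed-window scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `fixed_window_drift`, the per-frame feature array, and the `window` and `threshold` values are hypothetical and do not come from the paper.

```python
import numpy as np

def fixed_window_drift(features, window=30, threshold=1.0):
    """Flag index t when the means of the two adjacent fixed-size windows
    features[t-window:t] and features[t:t+window] differ by more than threshold.

    features: array of shape (T, d), one d-dimensional feature vector per frame
    (an assumed input format).
    """
    features = np.asarray(features, dtype=float)
    boundaries = []
    for t in range(window, len(features) - window + 1):
        left_mean = features[t - window:t].mean(axis=0)
        right_mean = features[t:t + window].mean(axis=0)
        # The statistical summary here is the mean; the score is its L2 distance.
        if np.linalg.norm(left_mean - right_mean) > threshold:
            boundaries.append(t)
    return boundaries

# Synthetic example: 50 frames of one "activity", then 50 frames of another.
video = np.vstack([np.zeros((50, 2)), np.ones((50, 2))])
detected = fixed_window_drift(video, window=10, threshold=1.0)
```

On this toy input the detector flags a small cluster of indices around frame 50 rather than a single boundary, and the size of that cluster depends on both `window` and `threshold`. That sensitivity is exactly what challenges (1) and (2) refer to.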

We emphasize that the proposed formulation can incorporate various other cues suggested for temporal segmentation of videos, such as optical flow and the objects present in the scene. Our primary contribution is a way to deal with the smooth feature changes at real boundaries, as opposed to the sharper changes that can occur in non-boundary regions. A boundary is declared if the difference between the statistical summaries of the two sub-windows is larger than a threshold. The threshold is based on Hoeffding's inequality and does not depend on the form of the underlying probability distribution.
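For reference, the standard form of Hoeffding's inequality is given below. It applies to independent samples X_1, ..., X_n that each lie in a bounded interval [a, b]; how the video features are scaled to satisfy this is not stated in this post, so the bounds a and b should be read as assumptions.

```latex
P\left(\left|\bar{X} - \mathbb{E}[\bar{X}]\right| \ge \epsilon\right)
  \le 2\exp\left(-\frac{2 n \epsilon^{2}}{(b-a)^{2}}\right),
\qquad \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i
```

The bound involves only the sample count n and the range b - a, not the shape of the distribution, which is why a threshold built on it can be used without modeling the features.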

Concept Drift Detection in videos: For concept drift detection in videos, one maintains a sliding window, w, of dynamic length n over the video sequence. Consider the hypothesis that there is a segment boundary at index t within the window, i.e., there is one segment, w1, of length n1 covering [0, t) and another segment, w2, of length n2 covering [t, n). We assume that the data in the two segments come from two unknown distributions with observed mean values μ1 and μ2, respectively. If, for a particular partition, the score ‖μ1 − μ2‖₂ exceeds a threshold ϵcut, we declare a detected boundary at t and drop the segment w1 from w. Otherwise, a new sample is added to the current window w, and the process is repeated for the new window of size n + 1. For each window w, the boundary hypothesis is tested for all indices t ∈ w. We compute the threshold ϵcut in a principled manner using multiple hypothesis testing (please refer to the paper for the detailed derivation).
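The growing-window procedure can be sketched as follows. Since the paper's derivation of ϵcut is not reproduced here, the sketch substitutes the Hoeffding-style cut threshold used by the ADWIN drift detector; this is an assumption for illustration, not the authors' formula. The names `adwin_style_threshold` and `segment`, the parameter `delta`, and the `min_len` restriction on split positions are likewise hypothetical, and the features are assumed to be scaled to [0, 1].

```python
import math
import numpy as np

def adwin_style_threshold(n1, n2, delta=0.05):
    """Hoeffding-style cut threshold borrowed from ADWIN (an assumption,
    not the paper's derivation). delta is divided by the window length
    to account for testing many candidate splits."""
    n = n1 + n2
    m = 1.0 / (1.0 / n1 + 1.0 / n2)  # harmonic-mean-style effective sample size
    delta_prime = delta / n
    return math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))

def segment(features, delta=0.05, min_len=5):
    """Grow a window frame by frame; cut it when some split t separates two
    sub-windows whose means differ by more than the threshold."""
    features = np.asarray(features, dtype=float)
    boundaries = []
    start = 0  # global index of the first frame in the current window w
    for end in range(1, len(features) + 1):
        window = features[start:end]
        n = len(window)
        if n < 2 * min_len:
            continue
        csum = np.cumsum(window, axis=0)  # prefix sums give every split mean cheaply
        total = csum[-1]
        for t in range(min_len, n - min_len + 1):
            mu1 = csum[t - 1] / t
            mu2 = (total - csum[t - 1]) / (n - t)
            if np.linalg.norm(mu1 - mu2) > adwin_style_threshold(t, n - t, delta):
                boundaries.append(start + t)  # report the boundary in global indices
                start += t                    # drop w1 from the window
                break
    return boundaries

# Synthetic example: one feature that jumps from 0 to 1 at frame 100.
video = np.vstack([np.zeros((100, 1)), np.ones((100, 1))])
found = segment(video)
```

Because the threshold shrinks as both sub-windows grow, a long stable segment needs only a few frames of the new segment before the cut is declared, while short noisy bursts stay below it. Restricting splits to at least `min_len` frames on each side is a design choice of this sketch; the text above tests every index in the window.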

References:
[1] Sathyanarayanan N. Aakur and Sudeep Sarkar. 2019. A Perceptual Prediction Framework for Self Supervised Event Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[2] Shuai Li, Wanqing Li, Chris Cook, Ce Zhu, and Yanbo Gao. 2018. Independently Recurrent Neural Network (IndRNN): Building a Longer and Deeper RNN. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
