Video Inpainting Benchmark

Abstract
When evaluating video inpainting methods, researchers often draw test videos from different sources. Commonly used datasets are video segmentation datasets such as DAVIS 2017 and YouTube-VOS, because they provide foreground masks. However, these datasets are not designed specifically for video inpainting: they carry no labels indicating whether a video is hard to inpaint, or why.
For this reason, we propose attributes that are likely to affect video inpainting performance (camera motion, foreground motion, background scene motion, and foreground displacement) and annotate them on DAVIS 2017. These annotations help us gauge each video's difficulty for the video inpainting task. However, labeling a video objectively is hard for humans, since every annotator applies their own criteria; we therefore propose a set of automatic classifiers. We hope the newly annotated dataset will guide researchers in improving their algorithms.
Classifier Performance
In the new dataset, we assign each video a binary label per motion attribute: a video is either a high-motion or a low-motion video. We then evaluate the classifiers' performance on the DAVIS dataset.
Camera motion
In a low-camera-motion video, most of the background textures are observed in both sampled frames; in a high-camera-motion video, the backgrounds of the two frames are almost completely different.
[Figure: Camera Motion Performance]
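As a minimal sketch of how such a binary camera-motion classifier could work, assume camera motion is approximated by the mean magnitude of background optical-flow vectors between consecutive frames; both the input representation and the threshold value are illustrative assumptions, not details from this work.

```python
# Hypothetical sketch: label camera motion as "high" or "low" from
# background flow vectors (dx, dy) between two consecutive frames.
# The threshold of 2.0 pixels is an assumed value for illustration.

from math import hypot

def camera_motion_label(bg_flow, threshold=2.0):
    """Return 'high' if the mean background flow magnitude exceeds threshold."""
    if not bg_flow:
        return "low"
    mean_mag = sum(hypot(dx, dy) for dx, dy in bg_flow) / len(bg_flow)
    return "high" if mean_mag > threshold else "low"

# Nearly static camera: small flow vectors everywhere.
print(camera_motion_label([(0.1, 0.2), (0.3, -0.1)]))  # low
# Panning camera: large, consistent flow.
print(camera_motion_label([(5.0, 0.5), (4.8, 0.4)]))   # high
```

In practice the background flow would come from a dense optical-flow estimator restricted to pixels outside the foreground mask; the sketch only shows the final thresholding step.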
Foreground motion
[Figure: Foreground Motion Performance]
Background scene motion
[Figure: Background Scene Motion Performance]
Foreground displacement
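Once each of the four attributes carries a binary label, they can be combined into a rough per-video difficulty indicator. The following sketch simply counts "high" labels; the attribute names follow this document, but the counting rule is an assumption for illustration only.

```python
# Hypothetical aggregation: count how many of the four binary attribute
# labels are "high" to get a coarse difficulty score for a video.
# The scoring rule is an illustrative assumption, not from this work.

ATTRIBUTES = ("camera_motion", "foreground_motion",
              "background_scene_motion", "foreground_displacement")

def difficulty_score(labels):
    """labels: dict mapping attribute name -> 'high' or 'low'."""
    return sum(labels.get(attr) == "high" for attr in ATTRIBUTES)

video = {"camera_motion": "high", "foreground_motion": "low",
         "background_scene_motion": "low", "foreground_displacement": "high"}
print(difficulty_score(video))  # 2
```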
