This talk addresses the automatic recognition of human actions in videos. Human action recognition is the task of determining which human actions occur in a video. The problem is particularly hard because of large variations in the visual and motion appearance of people and actions, camera viewpoint changes, moving backgrounds, occlusions, noise, and the sheer volume of video data.
First, I will present two local spatio-temporal descriptors for action recognition in videos. The first descriptor is based on a covariance matrix representation and models linear relations between low-level features. The second descriptor is based on Brownian covariance and models arbitrary relations, both linear and nonlinear, between low-level features.
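To make the difference between the two descriptors concrete, here is a minimal NumPy sketch. It assumes each local region is given as a matrix with one row of low-level features per point; the function names are hypothetical, and the `brownian_descriptor` layout (a d x d matrix of pairwise distance covariances) is an illustrative assumption, not necessarily the exact descriptor presented in the talk. It relies on the known result that Brownian covariance equals distance covariance.

```python
import numpy as np


def covariance_descriptor(features):
    """Covariance descriptor of a region.

    features: (n_points, d) array, one row of d low-level features per point
    (hypothetically e.g. position, gradients, optical flow).
    Returns the d x d sample covariance matrix, which captures linear relations only.
    """
    # rowvar=False: rows are observations, columns are feature variables
    return np.cov(np.asarray(features, dtype=float), rowvar=False)


def _double_centered_distances(x):
    """Pairwise Euclidean distance matrix with row and column means removed."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_covariance(x, y):
    """Sample distance covariance (equal to Brownian covariance) of two variables.

    Its population value is zero only when x and y are independent,
    so it also detects nonlinear dependence.
    """
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    # (a * b).mean() is the squared V-statistic; clip tiny negative round-off
    return float(np.sqrt(max((a * b).mean(), 0.0)))


def brownian_descriptor(features):
    """Hypothetical d x d matrix of pairwise distance covariances between features."""
    features = np.asarray(features, dtype=float)
    d = features.shape[1]
    return np.array([[distance_covariance(features[:, i], features[:, j])
                      for j in range(d)] for i in range(d)])
```

For example, with `x = np.linspace(-1, 1, 101)` and features `[x, x**2]`, the off-diagonal entry of `covariance_descriptor` is (numerically) zero because the relation is purely nonlinear, whereas the corresponding entry of `brownian_descriptor` is clearly positive.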
Then, I will talk about two higher-level feature representations that go beyond the limitations of local feature encoding techniques, which typically discard spatial and contextual information.
The first representation is based on the idea of relative dense trajectories. I will present an object-centric local feature representation of motion trajectories, which allows a local feature encoding technique to exploit spatial information.
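As a rough illustration of what "object-centric" can mean for a trajectory, the sketch below expresses each tracked point relative to a per-frame reference point, so that motion shared with the reference (for instance global translation of the person or a camera pan) cancels out. The choice of reference (hypothetically, a tracked head position) and the function names are assumptions; the normalization in `shape_descriptor` follows the common dense-trajectory convention of dividing displacements by their summed magnitude.

```python
import numpy as np


def relative_trajectory(points, reference):
    """Express one trajectory relative to an object-centric reference point.

    points:    (T, 2) x, y positions of a tracked point over T frames
    reference: (T, 2) x, y positions of the reference point in the same frames
               (hypothetical: e.g. the output of a head tracker)
    Returns (T, 2) relative positions and (T-1, 2) relative displacements.
    """
    rel = np.asarray(points, dtype=float) - np.asarray(reference, dtype=float)
    disp = np.diff(rel, axis=0)  # frame-to-frame motion relative to the reference
    return rel, disp


def shape_descriptor(displacements):
    """Flattened displacements normalized by their summed magnitude."""
    displacements = np.asarray(displacements, dtype=float)
    total = np.linalg.norm(displacements, axis=1).sum()
    flat = displacements.ravel()
    return flat / total if total > 0 else flat
```

The relative positions keep where a feature lies with respect to the object, which is the kind of spatial information a plain bag-of-features encoding would otherwise lose.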
The second representation captures statistics of pairwise co-occurring visual words within multi-scale, feature-centric neighborhoods. This contextual-feature representation encodes information about the local density of features, local pairwise relations among features, and the spatio-temporal order of features.
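The co-occurrence statistics can be sketched as follows, assuming each local feature has an (x, y, t) location and has already been assigned a visual-word index from a codebook. Modeling each neighborhood as a Euclidean ball of a given radius in (x, y, t) is an assumption made for illustration, and this sketch covers only the density and pairwise co-occurrence parts, not the spatio-temporal order component.

```python
import numpy as np


def cooccurrence_histograms(positions, words, n_words, radii):
    """Count co-occurring visual-word pairs in feature-centric neighborhoods.

    positions: (N, 3) spatio-temporal (x, y, t) locations of local features
    words:     (N,) visual-word index of each feature
    radii:     neighborhood radii, one histogram per scale (assumed spherical)
    Returns an array of shape (len(radii), n_words, n_words).
    """
    positions = np.asarray(positions, dtype=float)
    words = np.asarray(words)
    # Pairwise distances between all features
    dist = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a feature is not its own neighbor
    hists = np.zeros((len(radii), n_words, n_words))
    for s, r in enumerate(radii):
        centers, neighbors = np.nonzero(dist <= r)
        # Accumulate one count per (center word, neighbor word) pair at this scale
        np.add.at(hists[s], (words[centers], words[neighbors]), 1)
    return hists
```

Larger radii include more neighbors, so comparing histograms across scales reflects how densely features of each word pair cluster around one another.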
Finally, I will show that the proposed techniques achieve performance better than or comparable to the state of the art on several real-world, challenging human action recognition datasets (Weizmann, KTH, URADL, MSR Daily Activity 3D, UCF50, HMDB51, and CHU Nice Hospital).
Dr. Piotr Tadeusz Biliński is a Post-Doctoral Research Fellow in the STARS team at INRIA Sophia Antipolis Research Center, France. He received his Bachelor's degree in 2008 and his Master's degree in 2009 from Poznan University of Technology in Poland. He has been working on human action recognition in videos since 2010, under the supervision of Francois Bremond. In 2013, he was a Research Intern at Microsoft Research in Redmond, United States, where he worked in the audio and signal processing domain, applying computer vision and machine learning techniques to head-related transfer function (HRTF) personalization using anthropometric features. In 2014, he received his Ph.D. degree from the University of Nice in France. His Ph.D. thesis was reviewed by Ram Nevatia, Frederic Jurie, and Ivan Laptev.