A Dataset for Egocentric Recognition of Handled Objects

This is a dataset for the recognition of handled objects using a wearable camera, collected by Matthai Philipose and Xiaofeng Ren at Intel Labs Seattle. It includes ten video sequences of two human subjects manipulating 42 everyday object instances.

The purpose of this dataset is to study object recognition in everyday life settings from an egocentric view, i.e. using a wearable camera. In particular, we are interested in recognizing the object held in the user's hands and/or being manipulated. This object is of special relevance because it carries rich information about the user's activities. A smart camera that understands what the user is doing can offer many types of assistance and make everyday life considerably easier.

Our egocentric object recognition dataset has many interesting and unique characteristics. It is in video form, where users actively and continuously manipulate the objects, with rapid object pose changes and frequent (and often severe) hand occlusions. It is captured from a wearable camera with constant camera movement and poor image quality, including motion blur, low resolution, and sensor noise. It covers a variety of realistic everyday environments with varying illumination and backgrounds.

To quickly check out this dataset, please see this preview video.

Here are the links to the actual dataset (6 GB total, in two parts: part1, part2).

The current release includes about 100K frames stored at 512x384 in JPEG format, along with several annotations: object labels (one per frame, only for the object being manipulated), exemplar photos for the objects (clean background, about 10 per object), and object-background segmentations (selected frames from the actual videos, about 10 per object).
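As a starting point for working with the per-frame labels, here is a minimal sketch in Python. The actual annotation file format is not documented on this page, so the sketch assumes a hypothetical CSV layout with one `frame_filename,object_label` row per frame; adapt the parsing to whatever the downloaded release actually contains.

```python
# Hypothetical loader for per-frame object labels.
# ASSUMPTION: labels are stored as CSV rows "frame_filename,object_label";
# the real release may use a different layout -- adjust accordingly.
import csv
import io

def load_frame_labels(label_file):
    """Parse per-frame object labels into a {frame_filename: label} dict."""
    labels = {}
    for row in csv.reader(label_file):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        frame, label = row
        labels[frame.strip()] = label.strip()
    return labels

# Example with an in-memory stand-in for an annotation file.
sample = io.StringIO("frame_000001.jpg,coffee_mug\nframe_000002.jpg,coffee_mug\n")
labels = load_frame_labels(sample)
print(labels["frame_000001.jpg"])  # coffee_mug
```

With a dict keyed by frame filename, looking up the manipulated object for any of the ~100K frames is a constant-time operation.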

For more details on this dataset and its analysis, please refer to our paper

     Egocentric Recognition of Handled Objects: Benchmark and Analysis

presented at the First Workshop of Egocentric Vision in conjunction with CVPR 2009.

In our more recent CVPR 2010 paper Figure-Ground Segmentation Improves Handled Object Recognition in Egocentric Video, we report extensive experiments on this dataset and demonstrate substantial progress, both in object recognition itself and in the use of figure-ground segmentation for recognition. Check out the videos here.

Let me know if you have any questions or comments about this dataset.