Amazon’s Latest Artificial Intelligence (AI) Research Proposes New Human-in-the-Loop Framework for Generating Semantic Segmentation Annotations for Full Video

Compared to unsupervised learning, supervised learning generally produces more accurate results in computer vision (CV). Supervised learning relies on annotated datasets to train classification or prediction models, but the annotation process is labor-intensive, time-consuming, and demands considerable human effort. The cost grows considerably for semantic segmentation, which requires annotating every pixel in an image, yet accurate per-pixel annotations are exactly what is needed to train and evaluate semantic segmentation algorithms. For video, annotation becomes even more prohibitive than for still images, which is why annotations are often limited to a small fraction of the video content.

To address this issue, a team of Amazon researchers developed Human-in-the-loop Video Semantic Segmentation Auto-Annotation (HVSA), a framework that produces semantic segmentation annotations for an entire video quickly and cost-effectively. HVSA alternates between active sample selection and test-time fine-tuning until the annotation quality is assured: active sample selection picks the most informative samples for manual annotation, while test-time fine-tuning propagates those manual annotations to the rest of the video. The work will be presented at the Winter Conference on Applications of Computer Vision (WACV).
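At a high level, that iterative loop can be sketched as follows in Python. This is a minimal illustration based on the description above; the helper functions (select_samples, request_manual_annotation, fine_tune, estimate_quality) and the quality threshold are hypothetical placeholders, not the authors' actual code.

```python
# Minimal sketch of an HVSA-style annotation loop (hypothetical helpers,
# not the authors' implementation).

def annotate_video(video_frames, model, quality_threshold=0.95):
    annotations = {}  # frame index -> manual segmentation mask
    while True:
        # 1. Actively pick the samples the current model is least sure about.
        selected = select_samples(model, video_frames, annotations)
        # 2. A human annotator labels the selected samples.
        annotations.update(request_manual_annotation(selected))
        # 3. Fine-tune the model on this specific video using all annotations so far.
        model = fine_tune(model, video_frames, annotations)
        # 4. Stop once the estimated annotation quality is high enough.
        if estimate_quality(model, video_frames, annotations) >= quality_threshold:
            break
    # Propagate the adapted model's predictions to every frame of the video.
    return {i: model.predict(frame) for i, frame in enumerate(video_frames)}
```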

The team starts from a pre-trained semantic segmentation network and tailors it to the specific input video so that it can annotate that video with very high accuracy. The approach is inspired by how human annotators handle video annotation: they examine adjacent frames to identify the correct object categories, and they also consult existing annotations of the same video. This is what the test-time fine-tuning step does. The researchers designed a new loss function that draws on these two sources of information to adapt the pre-trained network to the incoming video: the first term penalizes semantic predictions that are inconsistent between successive frames, while the second term penalizes predictions that are inconsistent with the existing manual annotations.
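A rough sketch of such a two-part objective, written in PyTorch, is shown below. The specific choices here (KL divergence for temporal consistency, cross-entropy against the manual masks, warping the previous frame's prediction, and the weighting factor alpha) are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def adaptation_loss(pred_t, pred_t_prev_warped, pred_labeled, manual_mask, alpha=1.0):
    """Combine temporal consistency with agreement on manually annotated frames.

    pred_t:             logits for frame t, shape (B, C, H, W)
    pred_t_prev_warped: logits for frame t-1 warped (e.g., via optical flow) into frame t
    pred_labeled:       logits for frames that already have manual annotations
    manual_mask:        ground-truth class indices for those frames, shape (B, H, W)
    """
    # Term 1: penalize semantic predictions that disagree between successive frames.
    temporal_loss = F.kl_div(
        F.log_softmax(pred_t, dim=1),
        F.softmax(pred_t_prev_warped, dim=1),
        reduction="batchmean",
    )
    # Term 2: penalize predictions that contradict the existing manual annotations.
    supervised_loss = F.cross_entropy(pred_labeled, manual_mask)
    return supervised_loss + alpha * temporal_loss
```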


HVSA uses active learning: in each iteration, the model is tuned on samples actively chosen by the algorithm and labeled by annotators. The core idea is uncertainty sampling; in simple terms, a sample should be chosen for manual annotation if the network cannot predict its label with sufficient confidence. Uncertainty sampling alone is not enough, however, so the researchers also use diversity sampling, implemented via clustering, to ensure that the selected samples are distinct from one another. The overall strategy is to actively select, in each iteration, the samples whose annotations provide the most information. Once these samples receive manual annotations, the method uses semantic and temporal constraints to refine the video-specific semantic segmentation model, which is then used to annotate the entire video.
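The selection step might look roughly like the sketch below, which scores frames by the entropy of their predicted class distributions and then picks the most uncertain frame within each feature cluster. The entropy-plus-k-means combination is an illustrative stand-in for the paper's exact uncertainty and diversity criteria.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_samples(probs, features, n_samples=8):
    """Pick frames that are both uncertain and diverse.

    probs:    per-frame class probabilities, shape (N, C, H, W)
    features: per-frame feature vectors, shape (N, D)
    """
    # Uncertainty: mean per-pixel entropy of the predicted class distribution.
    eps = 1e-8
    uncertainty = -(probs * np.log(probs + eps)).sum(axis=1).mean(axis=(1, 2))  # (N,)

    # Diversity: cluster the frame features, then take the most uncertain
    # frame from each cluster so the selected samples cover distinct content.
    clusters = KMeans(n_clusters=n_samples, n_init=10).fit_predict(features)
    selected = []
    for c in range(n_samples):
        members = np.where(clusters == c)[0]
        if len(members) > 0:
            selected.append(int(members[np.argmax(uncertainty[members])]))
    return selected
```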

Experimental evaluations on two datasets showed that Amazon's HVSA achieves high accuracy (over 95%) and nearly flawless semantic segmentation annotations. What sets it apart is that it achieves this with minimal annotation time and cost: each iteration takes only a few tens of minutes. The researchers are further exploring optimization through multi-task parallelization.


Check out the Paper and Blog article. All credit for this research goes to the researchers of this project. Also, don't forget to join our Reddit page and Discord channel, where we share the latest news on AI research, cool AI projects, and more.


Khushboo Gupta is a Consulting Intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Goa. She is passionate about Machine Learning, Natural Language Processing, and Web Development. She likes to learn more about the technical field by participating in different challenges.

