Projects

Text Conditioned Image Retrieval (TCIR)

with Media and Data Science Research Labs, Adobe

I worked on Text Conditioned Image Retrieval, which takes a reference image and a modifying caption as input and retrieves the gallery images that satisfy the constraints imposed by both. We proposed a novel two-stage approach for TCIR that first highlights the salient image regions mentioned in the caption and then hierarchically modifies them while preserving the remaining aspects of the original image.

Spatio-Temporal Video Grounding (STVG)

with Prof. Ram Nevatia, University of Southern California

In STVG, the goal is to localize the spatio-temporal tube of a referred object in an untrimmed video. This involves determining the start and end times of the referred activity in the video and predicting the bounding boxes of the referred object within the predicted temporal segment. We proposed a novel single-pass architecture for STVG that eliminates the sliding windows and tube proposals used by many prior works while outperforming the previous state-of-the-art model on this task.

Multimodal Contrastive Learning for Control 

with Prof. Florian Shkurti, University of Toronto

In this project, I worked on learning useful representations for imitation learning of lane following, using a self-supervised contrastive learning approach. We formulated a multimodal deep learning model that learns to fuse RGB and LiDAR views efficiently in a self-supervised manner. We then used the fused embeddings to train a linear control model, allowing an autonomous agent to follow a path in the CARLA simulator.
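To illustrate the kind of objective a contrastive approach over paired views typically optimises, here is a minimal sketch of an InfoNCE-style loss in plain Python. It is not the project's actual code: the choice of InfoNCE, the cosine similarity, the temperature value, and the names `info_nce`, `rgb_embs`, and `lidar_embs` are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(rgb_embs, lidar_embs, temperature=0.1):
    """InfoNCE-style loss over paired embeddings (illustrative assumption).

    rgb_embs[i] and lidar_embs[i] are treated as two views of the same
    scene (a positive pair); every other LiDAR embedding in the batch
    acts as a negative for rgb_embs[i].
    """
    n = len(rgb_embs)
    total = 0.0
    for i in range(n):
        logits = [cosine(rgb_embs[i], lidar_embs[j]) / temperature
                  for j in range(n)]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the matching LiDAR view.
        total += log_denom - logits[i]
    return total / n
```

The loss is small when each RGB embedding is most similar to its own LiDAR counterpart and grows when the pairs are misaligned, which is the signal that pulls matching views together in the embedding space.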

Simulation of Fairness in Multi-Agent Credit Setup

with IBM Research 

I built a simulation environment with multiple agents (banks) and multiple population groups (one of which is under-represented), in which two or more banks compete to offer loans to a population while trying to maximise their profit. Alongside the profit earned, I analysed the agents' bias toward a particular group of the population and quantified it using fairness metrics such as Equality of Opportunity.
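For reference, Equality of Opportunity compares true positive rates across groups: among applicants who would actually repay, each group should be approved at the same rate. The sketch below is a minimal illustration of that metric, not the simulation's code; the function names and the binary encoding (1 = would repay / approved, 0 = otherwise) are assumptions.

```python
def true_positive_rate(y_true, y_pred):
    """Share of truly qualified applicants (y_true == 1) that were approved."""
    approvals = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not approvals:
        raise ValueError("group has no qualified applicants")
    return sum(approvals) / len(approvals)


def equal_opportunity_gap(y_true, y_pred, groups):
    """Return (gap, per-group TPR); a gap of 0 means equal opportunity holds."""
    rates = {}
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        rates[g] = true_positive_rate([y_true[i] for i in idx],
                                      [y_pred[i] for i in idx])
    gap = max(rates.values()) - min(rates.values())
    return gap, rates
```

A larger gap indicates that qualified applicants from one group are approved less often than equally qualified applicants from another.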

Detection and Tracking of Foot Movement in Indian Classical Dances

with Prof. Partha Pratim Das, IIT Kharagpur

As part of my Bachelor's thesis, I used classical computer vision techniques such as Otsu thresholding, skin-based segmentation, and contour fitting to accurately detect and track foot movement in untrimmed Indian Classical Dance videos. Our pipeline accurately tracked foot movement across long videos without manual thresholding, labelled data, or deep learning methods.
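Otsu thresholding is what removes the need for a hand-tuned threshold: it picks the grey level that maximises the between-class variance of the intensity histogram. The following is a minimal plain-Python sketch of that idea on a flat list of 8-bit intensities, given as an illustration rather than the thesis pipeline; the function name and the input format are assumptions.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t maximising between-class variance.

    Pixels with intensity <= t form one class, the rest form the other.
    `pixels` is a flat iterable of integers in [0, levels).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_low, sum_low = 0, 0
    for t in range(levels):
        w_low += hist[t]
        sum_low += t * hist[t]
        if w_low == 0:
            continue          # no pixels in the lower class yet
        w_high = total - w_low
        if w_high == 0:
            break             # all pixels already in the lower class
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w_low * w_high * (mean_low - mean_high) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

A binary mask then follows from comparing each pixel with the returned threshold, for example `[p > t for p in pixels]`.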