Hardik Meisheri

ML Scientist

Amazon

Biography

I am currently working on developing foundational models for Amazon Advertising. Previously, I focused on building multilingual models to moderate ads within Amazon Sponsored Products. My experience spans a wide range of Reinforcement Learning (RL) applications, including optimizing supply chains, cost-to-serve models, and dialogue generation. I have also contributed to improving the robustness and sample efficiency of RL algorithms. Beyond RL, I have worked at the intersection of Natural Language Processing (NLP) and Deep Learning, with a particular focus on sentiment analysis.

I am also interested in collaborations in the areas listed below. If you have an interesting project and would like to collaborate, or just want to talk about research in general, please reach out to me.

Interests

  • Representation learning
  • Sample Efficient Reinforcement Learning
  • Generalization in RL
  • Alignment in LLMs

Education

  • M.Tech in Machine Intelligence, 2016

    Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT)

  • B.E. in Electronics and Communication Engineering, 2013

    Gujarat Technological University

Experience

Applied Scientist II

Amazon

Dec 2022 – Present Bangalore, India
  • Spearheaded development of the foundational multi-modal model for Amazon Ads, integrating image and text, and achieved a 23% sample-efficiency gain with GRPO in preliminary experiments.
  • Designed active learning strategies to make optimal use of human annotator bandwidth, reducing the annotated data needed for training by 80%. (1 publication at the internal ML conference, AMLC)
  • Developed a novel training paradigm to bootstrap the current production model's knowledge into new policies, improving the F1 score by 20% and helping reach the required production business metrics.
  • Fine-tuned LLMs (7B and 13B) for advertisement policy understanding using reinforcement learning with a human in the loop and parameter-efficient fine-tuning, achieving an exact-match precision of 80% with a 30% reduction in traffic to human moderators. (ACVC publication)

Researcher

Tata Consultancy Services

May 2018 – Dec 2022 Mumbai, India
  • Technical Lead - Team size 3
  • Supply Chain Optimization using Deep Reinforcement Learning (6 Publications, 3 Patents filed, 1 Granted)
  • Pommerman - Multiagent simulation game (3 Publications)
  • Sample Efficient Reinforcement Learning (1 Publication)
  • General Value functions (2 Publications)
  • Policy Optimization in large action spaces (1 Publication)
  • Reinforcement Learning in last mile delivery (1 Publication)
  • Preliminary Phase: Reinforcement Learning driven open domain dialogue systems

Researcher

Tata Consultancy Services

Jul 2016 – May 2018 Gurgaon, India
  • Sentiment analysis of user-generated noisy texts. (4 Publications)
  • Predicting stock market movement based on different categories of news articles. (1 Publication, 1 Tutorial)
  • Trend prediction and behavioural analysis on social media platforms such as Twitter and Reddit. (2 Publications)

Teaching Assistant

DA-IICT

Aug 2014 – May 2016 Gandhinagar, India
Designed lab course structures, evaluated students through exams and quizzes, and supervised and conducted labs.

Recent Publications

A Learning Based Framework for Handling Uncertain Lead Times in Multi-Product Inventory Management

Most existing literature on supply chain and inventory management considers stochastic demand processes with zero or constant lead …

Follow your Nose: Using General Value Functions for Directed Exploration in Reinforcement Learning

Exploration versus exploitation dilemma is a significant problem in reinforcement learning (RL), particularly in complex environments …

FoLaR: Foggy Latent Representations for Reinforcement Learning with Partial Observability

We propose a novel methodology for improving the rate and consistency of reinforcement learning in partially observable (foggy) …

Sample Efficient Training in Multi-Agent Adversarial Games with Limited Teammate Communication

We describe our solution approach for Pommerman TeamRadio, a competition environment associated with NeurIPS 2019. The defining feature …

Using Reinforcement Learning for a Large Variable-Dimensional Inventory Management Problem

This paper evaluates the applicability of reinforcement learning (RL) to multi-product inventory management in supply chains. The …

Characterizing Behavioral Trends in a Community Driven Discussion Platform

This article presents a systematic analysis of the patterns of behavior of individuals as well as groups observed in community-driven …

Accelerating Training in Pommerman with Imitation and Reinforcement Learning

The Pommerman simulation was recently developed to mimic the classic Japanese game Bomberman, and focuses on competitive gameplay in a …

Actor Based Simulation for Closed Loop Control of Supply Chain using Reinforcement Learning

Reinforcement Learning (RL) has achieved a degree of success in control applications such as online gameplay and robotics, but has …

Multi-Document Summarization using Distributed Bag-of-Words Model

As the number of documents on the web is growing exponentially, multi-document summarization is becoming more and more important since …

Air Pollutant Severity Prediction Using Bi-Directional LSTM Network

Air pollution has emerged as a universal concern across the globe affecting human health. This increasing danger motivates the study of …

Learning representations for sentiment classification using Multi-task framework

Most of the existing state of the art sentiment classification techniques involve the use of pre-trained embeddings. This paper …

Analyzing Behavioral Trends in Community Driven Discussion Platforms Like Reddit

The aim of this paper is to present methods to systematically analyze individual and group behavioral patterns observed in community …

TCS Research at SemEval-2018 Task 1: Learning Robust Representations using Multi-Attention Architecture

This paper presents system description of our submission to the SemEval-2018 task-1, Affect in tweets for the English language. We …

Multiclass Common Spatial Pattern for EEG based Brain Computer Interface with Adaptive Learning Classifier

In Brain Computer Interface (BCI), data generated from Electroencephalogram (EEG) is non-stationary with low signal to noise ratio and …

Sentiment extraction from Consumer-generated noisy short texts

Sentiment analysis or recognizing emotions from short and noisy text from social networks such as twitter has been a challenging task. …

Textmining at EmoInt-2017: A Deep Learning Approach to Sentiment Intensity Scoring of English Tweets

This paper describes our approach to the Emotion Intensity shared task. A parallel architecture of Convolutional Neural Network (CNN) …

Detecting, quantifying and accessing impact of news events on Indian stock indices

The impact of different types of events reported in News articles on stock market is a widely accepted phenomenon. Market analysts rely …

Multiclass common spatial pattern with artifacts removal methodology for EEG signals

Common Spatial Pattern (CSP) algorithm has been proved to be effective in Brain Computer Interface (BCI) for extracting features from …

Skills

  • Python: 100%
  • R: 100%
  • TensorFlow: 100%
  • Keras: 100%
  • Git: 80%

Recent Posts

Musings While Building with LLMs

When you’re working with large language models (LLMs) to bring an idea to life, it’s easy to be impressed by how quickly they can write code, generate explanations, or stitch together components.

Useful courses/blogs related to Machine Learning

A list of blogs that are informative for RL/ML: Machine Learning Blog at CMU, Berkeley Artificial Intelligence Research Blog, Stanford AI Blog, Lilian Weng's blog on RL, and DeepMind Blog. Lectures/courses for Machine Learning

NeurIPS-2019 Summary

I presented my work on Pommerman, titled “Accelerating training in Pommerman with Imitation and Reinforcement Learning”, at the Deep Reinforcement Learning Workshop at NeurIPS this year. My coauthors are Omkar Shelke, Richa Verma and Harshad Khadilkar.

EMNLP-2018 Summary

For a long time, I wanted to write a summary of the conference that I attended. Not just to share the prevalent new ideas and trends, but also to keep track of the thoughts that I had during the conference.

Projects

Sample Efficient RL

Making Model-free RL sample efficient

Reinforcement Learning in Supply Chain Optimization

Applicability of reinforcement learning (RL) algorithms to a class of problems rarely addressed in machine learning literature, involving the control of a dynamic system with high-dimensional control inputs (actions).

Pommerman

The Pommerman environment is based on the classic Nintendo console game, Bomberman.

Sentiment Analysis of Short text

Sentiment analysis, or recognizing emotions, from short and noisy text, typically from social networks such as Twitter.

Behavioral Analysis of Social media posts

To analyze the patterns of individual and group behaviour observed in community-driven discussion platforms like Reddit.

Undergrad Thesis - Smart Wheel Chair

In this project, we aim to develop a fully functional prototype of a motorized wheelchair that can be controlled by speech, joystick, and neck movement.

News aggregation and its effect on the stock market

The impact of different types of events reported in News articles on stock market is a widely accepted phenomenon. Market analysts rely heavily on technology to combine data from different sources and generate appropriate insights for predicting stock movements.

Brain Computer Interface

Brain Computer Interface (BCI) provides a pathway for communication from humans to computers without any muscular activity.