Hardik Meisheri

ML Scientist

Microsoft AI

Biography

I am currently building large-scale foundational models for advertising at Microsoft AI. Prior to this, I developed foundational models for Amazon Advertising and built multilingual models for moderating ads within Amazon Sponsored Products. My background spans a wide range of Reinforcement Learning (RL) applications, including optimizing supply chains, cost-to-serve models, and dialogue generation. I have contributed to improving the robustness and sample efficiency of RL algorithms, and have worked at the intersection of Natural Language Processing (NLP) and Deep Learning with a particular focus on sentiment analysis. Additionally, I have explored social media text analysis from both statistical and behavioral perspectives as part of a side project.

My experience across diverse domains within machine learning has given me a unique perspective on the ML/DL landscape. As Alan Kay, a pioneer in computer science, once said, “A change in perspective is worth 80 IQ points.” Embracing new perspectives has often been key to solving complex problems.

Interests

Representation learning
Sample Efficient Reinforcement Learning
Generalization in RL
Alignment in LLMs

Education

M.Tech in Machine Intelligence, 2016

Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT)

B.E. in Electronics and Communication Engineering, 2013

Gujarat Technological University

Experience

Senior Applied Scientist

Microsoft AI

Apr 2025 – Present Remote

Building foundational models for advertising.

Applied Scientist II

Amazon

Dec 2022 – Mar 2025 Bangalore, India

Designed active learning strategies to make optimal use of human annotator bandwidth, reducing the annotated data needed for training by 80%.
Developed a novel training paradigm to bootstrap knowledge from the current production model into new policies, improving F1 score by 20% and helping reach the required production business metrics.
Deployed models across multiple locales, improving the overall business metric by 12%.
Fine-tuned an LLM for ad policy understanding using RLHF, improving automation by 16%.
Built foundational models for Amazon Advertising.

Researcher

Tata Consultancy Services

May 2018 – Dec 2022 Mumbai, India

Technical Lead - Team size 3
Supply Chain Optimization using Deep Reinforcement Learning (6 Publications, 3 Patents filed, 1 Granted)
Pommerman - Multi-agent simulation game (3 Publications)
Sample Efficient Reinforcement Learning (1 Publication)
General Value Functions (2 Publications)
Policy Optimization in large action spaces (1 Publication)
Reinforcement Learning in last mile delivery (1 Publication)
Preliminary Phase: Reinforcement Learning driven open domain dialogue systems

Researcher

Tata Consultancy Services

Jul 2016 – May 2018 Gurgaon, India

Sentiment analysis of user-generated noisy texts. (4 Publications)
Predicting Stock Market movement based on different categories of news articles. (1 Publication, 1 Tutorial)
Predicting trends and behavioural analysis in social media space such as Twitter and Reddit (2 Publications)

Teaching Assistant

DA-IICT

Aug 2014 – May 2016 Gandhinagar, India

Designed lab course structures, evaluated students through exams and quizzes, and supervised and conducted labs.

Recent Publications

A Learning Based Framework for Handling Uncertain Lead Times in Multi-Product Inventory Management

Most existing literature on supply chain and inventory management consider stochastic demand processes with zero or constant lead …

Hardik Meisheri, Somjit Nath, Mayank Baranwal, Harshad Khadilkar

Follow your Nose: Using General Value Functions for Directed Exploration in Reinforcement Learning

Exploration versus exploitation dilemma is a significant problem in reinforcement learning (RL), particularly in complex environments …

Somjit Nath, Omkar Shelke, Durgesh Kalwar, Hardik Meisheri, Harshad Khadilkar

FoLaR: Foggy Latent Representations for Reinforcement Learning with Partial Observability

We propose a novel methodology for improving the rate and consistency of reinforcement learning in partially observable (foggy) …

Hardik Meisheri, Harshad Khadilkar

Scalable multi-product inventory control with lead time constraints using reinforcement learning

Determining optimum inventory replenishment decisions is critical for retail businesses with uncertain demand. The problem becomes …

Hardik Meisheri, Nazneen N Sultana, Mayank Baranwal, Vinita Baniwal, Somjit Nath, Satyam Verma, Balaraman Ravindran, Harshad Khadilkar

Sample Efficient Training in Multi-Agent Adversarial Games with Limited Teammate Communication

We describe our solution approach for Pommerman TeamRadio, a competition environment associated with NeurIPS 2019. The defining feature …

Hardik Meisheri, Harshad Khadilkar

Using Reinforcement Learning for a Large Variable-Dimensional Inventory Management Problem

This paper evaluates the applicability of reinforcement learning (RL) to multi-product inventory management in supply chains. The …

Hardik Meisheri, Vinita Baniwal, Nazneen N Sultana, Harshad Khadilkar, Balaraman Ravindran

Characterizing Behavioral Trends in a Community Driven Discussion Platform

This article presents a systematic analysis of the patterns of behavior of individuals as well as groups observed in community-driven …

Sachin Thukral, Arnab Chatterjee, Hardik Meisheri, Tushar Kataria, Aman Agarwal, Ishan Verma, Lipika Dey

Book Chapter

Accelerating Training in Pommerman with Imitation and Reinforcement Learning

The Pommerman simulation was recently developed to mimic the classic Japanese game Bomberman, and focuses on competitive gameplay in a …

Hardik Meisheri, Omkar Shelke, Richa Verma, Harshad Khadilkar

Actor Based Simulation for Closed Loop Control of Supply Chain using Reinforcement Learning

Reinforcement Learning (RL) has achieved a degree of success in control applications such as online gameplay and robotics, but has …

Souvik Barat, Harshad Khadilkar, Hardik Meisheri, Vinay Kulkarni, Vinita Baniwal, Prashant Kumar, Monika Gajrani

Multi-Document Summarization using Distributed Bag-of-Words Model

As the number of documents on the web is growing exponentially, multi-document summarization is becoming more and more important since …

Kaustubh Mani, Ishan Verma, Hardik Meisheri, Lipika Dey

Air Pollutant Severity Prediction Using Bi-Directional LSTM Network

Air pollution has emerged as a universal concern across the globe affecting human health. This increasing danger motivates the study of …

Ishan Verma, Rahul Ahuja, Hardik Meisheri, Lipika Dey

Learning representations for sentiment classification using Multi-task framework

Most of the existing state of the art sentiment classification techniques involve the use of pre-trained embeddings. This paper …

Hardik Meisheri, Harshad Khadilkar

Analyzing Behavioral Trends in Community Driven Discussion Platforms Like Reddit

The aim of this paper is to present methods to systematically analyze individual and group behavioral patterns observed in community …

Sachin Thukral, Hardik Meisheri, Tushar Kataria, Aman Agarwal, Ishan Verma, Arnab Chatterjee, Lipika Dey

TCS Research at SemEval-2018 Task 1: Learning Robust Representations using Multi-Attention Architecture

This paper presents system description of our submission to the SemEval-2018 task-1, Affect in tweets for the English language. We …

Hardik Meisheri, Lipika Dey

Multiclass Common Spatial Pattern for EEG based Brain Computer Interface with Adaptive Learning Classifier

In Brain Computer Interface (BCI), data generated from Electroencephalogram (EEG) is non-stationary with low signal to noise ratio and …

Hardik Meisheri, Nagaraj Ramrao, Suman K Mitra

Sentiment extraction from Consumer-generated noisy short texts

Sentiment analysis or recognizing emotions from short and noisy text from social networks such as twitter has been a challenging task. …

Hardik Meisheri, Kunal Ranjan, Lipika Dey

Textmining at EmoInt-2017: A Deep Learning Approach to Sentiment Intensity Scoring of English Tweets

This paper describes our approach to the Emotion Intensity shared task. A parallel architecture of Convolutional Neural Network (CNN) …

Hardik Meisheri, Rupsa Saha, Priyanka Sinha, Lipika Dey

Detecting, quantifying and accessing impact of news events on Indian stock indices

The impact of different types of events reported in News articles on stock market is a widely accepted phenomenon. Market analysts rely …

Ishan Verma, Lipika Dey, Hardik Meisheri

Multiclass common spatial pattern with artifacts removal methodology for EEG signals

Common Spatial Pattern (CSP) algorithm has been proved to be effective in Brain Computer Interface (BCI) for extracting features from …

Hardik Meisheri, Nagaraj Ramrao, Suman K Mitra

Skills

Python: 100%
R: 100%
TensorFlow: 100%
Keras: 100%
Git: 80%

Projects

Sample Efficient RL

Making model-free RL sample efficient.

Reinforcement Learning in Supply Chain Optimization

Studying the applicability of reinforcement learning (RL) algorithms to a class of problems rarely addressed in machine learning literature: the control of a dynamic system with high-dimensional control inputs (actions).

Pommerman

The Pommerman environment is based on the classic Nintendo console game, Bomberman.

Sentiment Analysis of Short Text

Sentiment analysis, or recognizing emotions, in short and noisy text, typically from social networks such as Twitter.

Behavioral Analysis of Social Media Posts

Analyzing the patterns of individual and group behaviour observed in community-driven discussion platforms like Reddit.

Undergrad Thesis - Smart Wheelchair

In this project, we aimed to develop a fully functional prototype of a motorized wheelchair that can be controlled by speech, joystick, and neck movement.

News aggregation and its effect on the stock market

The impact of different types of events reported in News articles on stock market is a widely accepted phenomenon. Market analysts rely heavily on technology to combine data from different sources and generate appropriate insights for predicting stock movements.

Brain Computer Interface

Brain Computer Interface (BCI) provides a pathway for communication from humans to computers without any muscular activity.
