Hardik Meisheri

Researcher

Tata Consultancy Services

Biography

I am an Applied Scientist at Amazon Ads, working on advertisement moderation. I build models at scale to moderate ads and improve the alignment of LLMs with Amazon policies. Previously, I was a Researcher at the Tata Consultancy Services (TCS) Research Division, where I worked with Dr. Harshad Khadilkar in the Data and Decision Sciences Group from 2018 to 2022. Before this, I worked with Dr. Lipika Dey for two years in the Text and Data Mining Group at TCS Research.

Before joining TCS, I was a student at Dhirubhai Ambani Institute of Information and Communication Technology, Gandhinagar, India, pursuing a Master of Technology (M.Tech) with a specialization in Machine Intelligence. I was supervised by Prof. Nagraj Ramrao and co-supervised by Prof. Suman Mitra. My external supervisors, Prof. Suresh Sundaram and Prof. N. Sundararajan, were from Nanyang Technological University.

Recent News: The paper titled “DCT: Dual Channel Training of Action Embeddings for Reinforcement Learning with Large Discrete Action Spaces” has been accepted at AAMAS-2024.

Note: An updated list of publications can be found on Google Scholar.

Interests

  • Representation learning
  • Sample Efficient Reinforcement Learning
  • Generalization in RL
  • Alignment in LLMs

Education

  • M.Tech in Machine Intelligence, 2016

    Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT)

  • B.E. in Electronics and Communication Engineering, 2013

    Gujarat Technological University

Experience

Applied Scientist II

Amazon

Dec 2022 – Present Bangalore, India
  • Active learning strategies to make optimal use of human annotator bandwidth, reducing the annotated data needed for training by 80%.
  • Developed a novel training paradigm to bootstrap current production model knowledge into new policies, improving F1 score by 20% and helping reach the required production business metrics.
  • Deployed models across multiple locales, improving the overall business metric by 12%.
  • Fine-tuned an LLM for ad policy understanding using RLHF, improving automation by 16%.

Researcher

Tata Consultancy Services

May 2018 – Dec 2022 Mumbai, India
  • Technical Lead - Team size 3
  • Supply Chain Optimization using Deep Reinforcement Learning (6 Publications, 3 Patents filed, 1 Granted)
  • Pommerman - Multiagent simulation game (3 Publications)
  • Sample Efficient Reinforcement Learning (1 Publication)
  • General Value Functions (2 Publications)
  • Policy Optimization in large action spaces (1 Publication)
  • Reinforcement Learning in last mile delivery (1 Publication)
  • Preliminary Phase: Reinforcement Learning driven open domain dialogue systems

Researcher

Tata Consultancy Services

Jul 2016 – May 2018 Gurgaon, India
  • Sentiment analysis of user-generated noisy texts (4 Publications)
  • Predicting stock market movement based on different categories of news articles (1 Publication, 1 Tutorial)
  • Predicting trends and behavioural analysis on social media platforms such as Twitter and Reddit (2 Publications)

Teaching Assistant

DA-IICT

Aug 2014 – May 2016 Gandhinagar, India
Designing lab course structures; evaluating students through exams and quizzes; supervising and conducting labs.

Recent Publications

A Learning Based Framework for Handling Uncertain Lead Times in Multi-Product Inventory Management

Most existing literature on supply chain and inventory management considers stochastic demand processes with zero or constant lead …

Follow your Nose: Using General Value Functions for Directed Exploration in Reinforcement Learning

Exploration versus exploitation dilemma is a significant problem in reinforcement learning (RL), particularly in complex environments …

FoLaR: Foggy Latent Representations for Reinforcement Learning with Partial Observability

We propose a novel methodology for improving the rate and consistency of reinforcement learning in partially observable (foggy) …

Sample Efficient Training in Multi-Agent Adversarial Games with Limited Teammate Communication

We describe our solution approach for Pommerman TeamRadio, a competition environment associated with NeurIPS 2019. The defining feature …

Using Reinforcement Learning for a Large Variable-Dimensional Inventory Management Problem

This paper evaluates the applicability of reinforcement learning (RL) to multi-product inventory management in supply chains. The …

Characterizing Behavioral Trends in a Community Driven Discussion Platform

This article presents a systematic analysis of the patterns of behavior of individuals as well as groups observed in community-driven …

Accelerating Training in Pommerman with Imitation and Reinforcement Learning

The Pommerman simulation was recently developed to mimic the classic Japanese game Bomberman, and focuses on competitive gameplay in a …

Actor Based Simulation for Closed Loop Control of Supply Chain using Reinforcement Learning

Reinforcement Learning (RL) has achieved a degree of success in control applications such as online gameplay and robotics, but has …

Multi-Document Summarization using Distributed Bag-of-Words Model

As the number of documents on the web is growing exponentially, multi-document summarization is becoming more and more important since …

Air Pollutant Severity Prediction Using Bi-Directional LSTM Network

Air pollution has emerged as a universal concern across the globe affecting human health. This increasing danger motivates the study of …

Learning representations for sentiment classification using Multi-task framework

Most of the existing state of the art sentiment classification techniques involve the use of pre-trained embeddings. This paper …

Analyzing Behavioral Trends in Community Driven Discussion Platforms Like Reddit

The aim of this paper is to present methods to systematically analyze individual and group behavioral patterns observed in community …

TCS Research at SemEval-2018 Task 1: Learning Robust Representations using Multi-Attention Architecture

This paper presents system description of our submission to the SemEval-2018 task-1, Affect in tweets for the English language. We …

Multiclass Common Spatial Pattern for EEG based Brain Computer Interface with Adaptive Learning Classifier

In Brain Computer Interface (BCI), data generated from Electroencephalogram (EEG) is non-stationary with low signal to noise ratio and …

Sentiment extraction from Consumer-generated noisy short texts

Sentiment analysis or recognizing emotions from short and noisy text from social networks such as Twitter has been a challenging task. …

Textmining at EmoInt-2017: A Deep Learning Approach to Sentiment Intensity Scoring of English Tweets

This paper describes our approach to the Emotion Intensity shared task. A parallel architecture of Convolutional Neural Network (CNN) …

Detecting, quantifying and accessing impact of news events on Indian stock indices

The impact of different types of events reported in News articles on stock market is a widely accepted phenomenon. Market analysts rely …

Multiclass common spatial pattern with artifacts removal methodology for EEG signals

Common Spatial Pattern (CSP) algorithm has been proved to be effective in Brain Computer Interface (BCI) for extracting features from …

Skills

  • Python: 100%
  • R: 100%
  • Tensorflow: 100%
  • Keras: 100%
  • Git: 80%

Recent Posts

Useful courses/blogs related to Machine Learning

These blogs are informative for RL/ML:

  • Machine Learning Blog at CMU
  • Berkeley Artificial Intelligence Research Blog
  • Stanford AI Blog
  • Lilian Weng's blog on RL
  • DeepMind Blog

Lectures/courses for Machine Learning

NeurIPS-2019 Summary

I presented my work on Pommerman at the Deep Reinforcement Learning Workshop at NeurIPS this year, titled “Accelerating training in Pommerman with Imitation and Reinforcement Learning”. The coauthors are Omkar Shelke, Richa Verma and Harshad Khadilkar.

EMNLP-2018 Summary

For a long time, I wanted to write a summary of the conference that I attended, not just to share the new ideas and prevalent trends, but also to keep track of the thoughts that I had during the conference.

Projects

Sample Efficient RL

Making Model-free RL sample efficient

Reinforcement Learning in Supply Chain Optimization

Applicability of reinforcement learning (RL) algorithms to a class of problems rarely addressed in machine learning literature, involving the control of a dynamic system with high-dimensional control inputs (actions).

Pommerman

The Pommerman environment is based on the classic Nintendo console game, Bomberman.

Sentiment Analysis of Short text

Sentiment analysis, or recognizing emotions, in short and noisy text, typically from social networks such as Twitter.

Behavioral Analysis of Social media posts

To analyze the patterns of individual and group behaviour observed in community-driven discussion platforms like Reddit.

Undergrad Thesis - Smart Wheel Chair

In this project, we aim to develop a fully functional prototype of a motorized wheelchair that can be controlled by speech, joystick, and neck movement.

News aggregation and its effect on the stock market

The impact of different types of events reported in News articles on stock market is a widely accepted phenomenon. Market analysts rely heavily on technology to combine data from different sources and generate appropriate insights for predicting stock movements.

Brain Computer Interface

Brain Computer Interface (BCI) provides a pathway for communication from humans to computers without any muscular activity.