Most existing literature on supply chain and inventory management considers stochastic demand processes with zero or constant lead times. While uncertainty in lead times can be ignored in certain niche scenarios, most real-world …
The exploration versus exploitation dilemma is a significant problem in reinforcement learning (RL), particularly in complex environments with large state spaces and sparse rewards. When optimizing for a particular goal, running simple smaller tasks can …
We propose a novel methodology for improving the rate and consistency of reinforcement learning in partially observable (foggy) environments, under the broader umbrella of robust latent representations. The present work addresses partially observable …
Determining optimum inventory replenishment decisions is critical for retail businesses with uncertain demand. The problem becomes particularly challenging when multiple products with different lead times and cross-product constraints are considered. …
We describe our solution approach for Pommerman TeamRadio, a competition environment associated with NeurIPS 2019. The defining feature of our algorithm is achieving sample efficiency within a restrictive computational budget while beating the …
This paper evaluates the applicability of reinforcement learning (RL) to multi-product inventory management in supply chains. The novelty of this problem with respect to the supply chain literature is that (i) we consider concurrent inventory management of a …
The Pommerman simulation was recently developed to mimic the classic Japanese game Bomberman and focuses on competitive gameplay in a multi-agent setting. We focus on the 2×2 team version of Pommerman, developed for a competition at NeurIPS 2018. Our …
Reinforcement Learning (RL) has achieved a degree of success in control applications such as online gameplay and robotics, but has rarely been used to manage operations of business-critical systems such as supply chains. A key aspect of using RL in …
This work examines the applicability of reinforcement learning (RL) algorithms to a class of problems rarely addressed in the machine learning literature: the control of a dynamic system with high-dimensional control inputs (actions).