Practical Reinforcement Learning: Experiences in Lot Scheduling Application

Hannu Rummukainen, Jukka K. Nurminen

Research output: Contribution to journal › Article in a proceedings journal › Scientific › peer-review

Abstract

With recent advances in deep reinforcement learning, it is time to take another look at reinforcement learning as an approach for discrete production control. We applied proximal policy optimization (PPO), a recently developed algorithm for deep reinforcement learning, to the stochastic economic lot scheduling problem. The problem involves scheduling manufacturing decisions on a single machine under stochastic demand, and despite its simplicity remains computationally challenging. We implemented two parameterized models for the control policy and value approximation, a linear model and a neural network, and used a modified PPO algorithm to seek the optimal parameter values. Benchmarking against the best known control policy for the test case, in which Paternina-Arboleda and Das (2005) combined a base-stock policy and an older reinforcement learning algorithm, we improved the average cost rate by 2 %. Our approach is more general, as we do not require a priori policy parameters such as base-stock levels, and the entire policy is learned.
Original language: English
Pages (from-to): 1415-1420
Number of pages: 6
Journal: IFAC-PapersOnLine
Volume: 52
Issue number: 13
Early online date: 25 Dec 2019
DOIs: 10.1016/j.ifacol.2019.11.397
Publication status: Published - 6 Jan 2020
MoE publication type: A4 Article in a conference publication
Event: 9th IFAC Conference on Manufacturing Modelling, Management and Control - Berlin, Germany
Duration: 28 Aug 2019 – 30 Aug 2019

Fingerprint

  • Reinforcement learning
  • Scheduling
  • Production control
  • Benchmarking
  • Learning algorithms
  • Neural networks
  • Economics
  • Costs

Keywords

  • Reinforcement learning
  • Stochastic economic lot scheduling
  • Learning control
  • Stochastic control
  • Monte Carlo simulation
  • Neural networks
  • Machine learning

Cite this

@article{abe3cc7cf6bf4ee29a29db7df3f18cf8,
title = "Practical Reinforcement Learning: Experiences in Lot Scheduling Application",
abstract = "With recent advances in deep reinforcement learning, it is time to take another look at reinforcement learning as an approach for discrete production control. We applied proximal policy optimization (PPO), a recently developed algorithm for deep reinforcement learning, to the stochastic economic lot scheduling problem. The problem involves scheduling manufacturing decisions on a single machine under stochastic demand, and despite its simplicity remains computationally challenging. We implemented two parameterized models for the control policy and value approximation, a linear model and a neural network, and used a modified PPO algorithm to seek the optimal parameter values. Benchmarking against the best known control policy for the test case, in which Paternina-Arboleda and Das (2005) combined a base-stock policy and an older reinforcement learning algorithm, we improved the average cost rate by 2 {\%}. Our approach is more general, as we do not require a priori policy parameters such as base-stock levels, and the entire policy is learned.",
keywords = "Reinforcement learning, Stochastic economic lot scheduling, Learning control, Stochastic control, Monte Carlo simulation, Neural networks, Machine learning",
author = "Hannu Rummukainen and Nurminen, {Jukka K.}",
year = "2020",
month = "1",
day = "6",
doi = "10.1016/j.ifacol.2019.11.397",
language = "English",
volume = "52",
pages = "1415--1420",
journal = "IFAC-PapersOnLine",
issn = "2405-8971",
publisher = "IFAC Secretariat",
number = "13",
}

Practical Reinforcement Learning: Experiences in Lot Scheduling Application. / Rummukainen, Hannu; Nurminen, Jukka K.

In: IFAC-PapersOnLine, Vol. 52, No. 13, 06.01.2020, p. 1415-1420.

Research output: Contribution to journal › Article in a proceedings journal › Scientific › peer-review

TY - JOUR

T1 - Practical Reinforcement Learning

T2 - Experiences in Lot Scheduling Application

AU - Rummukainen, Hannu

AU - Nurminen, Jukka K.

PY - 2020/1/6

Y1 - 2020/1/6

N2 - With recent advances in deep reinforcement learning, it is time to take another look at reinforcement learning as an approach for discrete production control. We applied proximal policy optimization (PPO), a recently developed algorithm for deep reinforcement learning, to the stochastic economic lot scheduling problem. The problem involves scheduling manufacturing decisions on a single machine under stochastic demand, and despite its simplicity remains computationally challenging. We implemented two parameterized models for the control policy and value approximation, a linear model and a neural network, and used a modified PPO algorithm to seek the optimal parameter values. Benchmarking against the best known control policy for the test case, in which Paternina-Arboleda and Das (2005) combined a base-stock policy and an older reinforcement learning algorithm, we improved the average cost rate by 2 %. Our approach is more general, as we do not require a priori policy parameters such as base-stock levels, and the entire policy is learned.

AB - With recent advances in deep reinforcement learning, it is time to take another look at reinforcement learning as an approach for discrete production control. We applied proximal policy optimization (PPO), a recently developed algorithm for deep reinforcement learning, to the stochastic economic lot scheduling problem. The problem involves scheduling manufacturing decisions on a single machine under stochastic demand, and despite its simplicity remains computationally challenging. We implemented two parameterized models for the control policy and value approximation, a linear model and a neural network, and used a modified PPO algorithm to seek the optimal parameter values. Benchmarking against the best known control policy for the test case, in which Paternina-Arboleda and Das (2005) combined a base-stock policy and an older reinforcement learning algorithm, we improved the average cost rate by 2 %. Our approach is more general, as we do not require a priori policy parameters such as base-stock levels, and the entire policy is learned.

KW - Reinforcement learning

KW - Stochastic economic lot scheduling

KW - Learning control

KW - Stochastic control

KW - Monte Carlo simulation

KW - Neural networks

KW - Machine learning

U2 - 10.1016/j.ifacol.2019.11.397

DO - 10.1016/j.ifacol.2019.11.397

M3 - Article in a proceedings journal

VL - 52

SP - 1415

EP - 1420

JO - IFAC-PapersOnLine

JF - IFAC-PapersOnLine

SN - 2405-8971

IS - 13

ER -