Practical Reinforcement Learning: Experiences in Lot Scheduling Application

Hannu Rummukainen, Jukka K. Nurminen

    Research output: Contribution to journal › Article in a proceedings journal › Scientific › peer-review

    27 Citations (Scopus)

    Abstract

    With recent advances in deep reinforcement learning, it is time to take another look at reinforcement learning as an approach to discrete production control. We applied proximal policy optimization (PPO), a recently developed deep reinforcement learning algorithm, to the stochastic economic lot scheduling problem. The problem involves scheduling manufacturing decisions on a single machine under stochastic demand and, despite its simplicity, remains computationally challenging. We implemented two parameterized models for the control policy and value approximation, a linear model and a neural network, and used a modified PPO algorithm to search for the optimal parameter values. We benchmarked against the best known control policy for the test case, in which Paternina-Arboleda and Das (2005) combined a base-stock policy with an older reinforcement learning algorithm, and reduced the average cost rate by 2%. Our approach is also more general: it requires no a priori policy parameters such as base-stock levels, because the entire policy is learned.
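    For readers unfamiliar with PPO, the sketch below illustrates the standard clipped surrogate objective together with a linear softmax policy over discrete actions. It is a generic illustration, not the authors' modified PPO algorithm, whose modifications are not described here. The function names, the action set (for example "set up and produce product i" or "idle"), the state features and the clipping parameter epsilon = 0.2 are all hypothetical choices made for this example.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of action scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear_policy_probs(weights: Sequence[Sequence[float]],
                        features: Sequence[float]) -> List[float]:
    """Hypothetical linear policy: one weight vector per discrete action
    (e.g. 'set up and produce product i' or 'idle'); returns action probabilities."""
    logits = [sum(w * x for w, x in zip(w_a, features)) for w_a in weights]
    return softmax(logits)


def ppo_clip_objective(new_logp: Sequence[float],
                       old_logp: Sequence[float],
                       advantages: Sequence[float],
                       epsilon: float = 0.2) -> float:
    """Standard PPO clipped surrogate objective (to be maximized), averaged over samples.

    new_logp / old_logp: log-probabilities of the taken actions under the
    current policy and the policy that collected the data.
    advantages: advantage estimates for the same samples.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)                      # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        total += min(ratio * adv, clipped * adv)               # pessimistic (lower) bound
    return total / len(advantages)


if __name__ == "__main__":
    # Toy example with made-up numbers: 3 actions, 2 state features.
    w = [[0.1, -0.2], [0.0, 0.3], [-0.1, 0.1]]
    s = [1.0, 2.0]
    print(linear_policy_probs(w, s))
    # Probability ratio 1.5 with positive advantage is clipped to 1.2,
    # so this prints approximately 1.2.
    print(ppo_clip_objective([math.log(1.5)], [0.0], [1.0]))
```

    The clipping removes the incentive to move the probability ratio outside [1 - epsilon, 1 + epsilon], which keeps each policy update close to the policy that generated the simulated data. In a full training loop, the weights would be adjusted by gradient ascent on this objective, typically alongside a separate value-function loss.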
    Original language: English
    Pages (from-to): 1415-1420
    Number of pages: 6
    Journal: IFAC-PapersOnLine
    Volume: 52
    Issue number: 13
    Early online date: 25 Dec 2019
    Publication status: Published, 2019
    MoE publication type: A4 Article in a conference publication
    Event: 9th IFAC Conference on Manufacturing Modelling, Management and Control, Berlin, Germany
    Duration: 28 Aug 2019 to 30 Aug 2019

    Keywords

    • Reinforcement learning
    • Stochastic economic lot scheduling
    • Learning control
    • Stochastic control
    • Monte Carlo simulation
    • Neural networks
    • Machine learning
