TY - JOUR
T1 - Offline Reinforcement Learning for Adaptive Control in Manufacturing Processes: A Press Hardening Case Study
AU - Nievas, Nuria
AU - Espinosa-Leal, Leonardo
AU - Pagès-Bernaus, Adela
AU - Abio, Albert
AU - Echeverria, Lluís
AU - Bonada, Francesc
PY - 2025
Y1 - 2025
AB - This paper explores the application of offline reinforcement learning in batch manufacturing, with a specific focus on press hardening processes. Offline reinforcement learning presents a viable alternative to traditional control and reinforcement learning methods, which often rely on impractical real-world interactions or complex simulations and iterative adjustments to bridge the gap between simulated and real-world environments. We demonstrate how offline reinforcement learning can improve control policies by leveraging existing data, thereby streamlining the training pipeline and reducing reliance on high-fidelity simulators. Our study evaluates the impact of varying data exploration rates by creating five datasets with exploration rates ranging from ε = 0 to ε = 0.8. Using the Conservative Q-Learning algorithm, we train and assess policies against both a dynamic baseline and a static industry-standard policy. The results indicate that while offline reinforcement learning effectively refines behavior policies and enhances supervised learning methods, its effectiveness is heavily dependent on the quality and exploratory nature of the initial behavior policy.
KW - artificial intelligence
KW - machine learning for engineering applications
KW - manufacturing automation
UR - http://www.scopus.com/inward/record.url?scp=105001239274&partnerID=8YFLogxK
U2 - 10.1115/1.4066999
DO - 10.1115/1.4066999
M3 - Article
SN - 1530-9827
VL - 25
JO - Journal of Computing and Information Science in Engineering
JF - Journal of Computing and Information Science in Engineering
IS - 1
M1 - 011004
ER -