Synthetic data for developing surrogate models–A case study of multistep forecasting in mineral processing

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceedings › Scientific › peer-review


Abstract

Machine learning (ML) has emerged as a powerful tool for predictive modelling in industrial processes. However, its effectiveness relies heavily on data quality and availability. Real-world industrial datasets often suffer from limitations such as scarcity, noise, missing values, and regulatory constraints, making it challenging to train reliable models. Synthetic data offers a promising alternative by providing a scalable and controlled approach to data generation while preserving the statistical characteristics of real-world systems. It can be generated through several methods, such as domain-specific simulations, statistical methods, and data augmentation techniques, enabling the development of robust ML models even in data-limited environments. This study investigates the application of synthetic data for training ML-based surrogate models in mineral processing, with a specific focus on multistep forecasting of the gold concentrate grade in a flotation process. Based on statistical properties of historical real process data, synthetic datasets are generated using a commercial mineral processing simulator and Python-based techniques, replicating the operating conditions of a real plant. Surrogate models are trained exclusively on synthetic data and then evaluated against future real-world process measurements to assess their predictive accuracy. The results demonstrate that synthetic data can effectively supplement traditional datasets, particularly in industrial scenarios where historical data is limited, such as early commissioning phases or after major process modifications. Our findings underscore the potential of synthetic data to enhance ML applications in industrial settings by enabling more flexible, cost-effective, and data-efficient model development.
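
The workflow outlined in the abstract (derive statistics from historical data, generate synthetic data, train a surrogate only on the synthetic data, then score multistep forecasts against later measurements) can be illustrated with a minimal sketch. This is not the authors' method: the paper relies on a commercial mineral processing simulator and Python-based techniques that this record does not detail. Here an AR(1) process stands in for the generator, a linear least-squares model stands in for the surrogate, and every function name, parameter, and numeric value is a hypothetical chosen for illustration. The "historical" and "future" series are random placeholders, not plant data.

```python
import numpy as np

def fit_ar1_stats(series):
    """Estimate mean, standard deviation and lag-1 autocorrelation."""
    x = np.asarray(series, dtype=float)
    mu, sigma = x.mean(), x.std()
    centered = x - mu
    phi = float(np.dot(centered[:-1], centered[1:]) / np.dot(centered, centered))
    return mu, sigma, phi

def generate_synthetic(mu, sigma, phi, n, rng):
    """Sample an AR(1) series whose stationary mean and std match the inputs."""
    noise_std = sigma * np.sqrt(max(1.0 - phi**2, 0.0))
    x = np.empty(n)
    x[0] = mu + sigma * rng.standard_normal()
    for t in range(1, n):
        x[t] = mu + phi * (x[t - 1] - mu) + noise_std * rng.standard_normal()
    return x

def make_windows(series, n_lags, horizon):
    """Build (past n_lags values) -> (next horizon values) training pairs."""
    series = np.asarray(series, dtype=float)
    n = len(series) - n_lags - horizon + 1
    X = np.stack([series[i:i + n_lags] for i in range(n)])
    Y = np.stack([series[i + n_lags:i + n_lags + horizon] for i in range(n)])
    return X, Y

def fit_direct_multistep(X, Y):
    """Least-squares linear map from a lag window to all horizon steps at once."""
    A = np.hstack([X, np.ones((len(X), 1))])  # append intercept column
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return coef

def predict(coef, X):
    """Apply the fitted linear surrogate to a batch of lag windows."""
    A = np.hstack([X, np.ones((len(X), 1))])
    return A @ coef

rng = np.random.default_rng(0)

# ASSUMPTION: placeholder series standing in for plant measurements.
# The values 50.0, 5.0 and 0.8 are arbitrary and carry no physical meaning.
historical = generate_synthetic(50.0, 5.0, 0.8, 500, rng)
future = generate_synthetic(50.0, 5.0, 0.8, 200, rng)

# 1) Derive statistics from the historical series.
mu, sigma, phi = fit_ar1_stats(historical)

# 2) Generate a larger synthetic dataset from those statistics.
synthetic = generate_synthetic(mu, sigma, phi, 5000, rng)

# 3) Train the surrogate exclusively on synthetic data.
X_syn, Y_syn = make_windows(synthetic, n_lags=6, horizon=3)
coef = fit_direct_multistep(X_syn, Y_syn)

# 4) Evaluate against the later (placeholder) measurements, per forecast step.
X_real, Y_real = make_windows(future, n_lags=6, horizon=3)
rmse_per_step = np.sqrt(np.mean((predict(coef, X_real) - Y_real) ** 2, axis=0))
print("RMSE per forecast step:", rmse_per_step)
```

The sketch uses a direct multi-output strategy, predicting all horizon steps from one lag window. A recursive strategy that feeds one-step predictions back as inputs would be an equally valid choice; the abstract does not state which the study uses.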
Original language: English
Title of host publication: 2025 International Conference on Artificial Intelligence, Computer, Data Sciences and Applications (ACDSA)
Publisher: IEEE Institute of Electrical and Electronic Engineers
Number of pages: 10
ISBN (Electronic): 979-8-3315-3562-9
ISBN (Print): 979-8-3315-3563-6
Publication status: Published - 2025
MoE publication type: A4 Article in a conference publication
Event: International Conference on Artificial Intelligence, Computer, Data Sciences and Applications, ACDSA 2025 - Antalya, Turkey
Duration: 7 Aug 2025 to 9 Aug 2025
https://acdsa.org/2025/

Conference

Conference: International Conference on Artificial Intelligence, Computer, Data Sciences and Applications, ACDSA 2025
Country/Territory: Turkey
City: Antalya
Period: 7/08/25 to 9/08/25

Funding

The authors sincerely acknowledge the support of Business Finland and VTT for funding this research.

Keywords

  • gold flotation
  • mineral processing
  • multistep time series prediction
  • surrogate model
  • synthetic data generation
