Abstract
Machine learning (ML) has emerged as a powerful tool for predictive modelling in industrial processes, but its effectiveness depends heavily on data quality and availability. Real-world industrial datasets often suffer from scarcity, noise, missing values, and regulatory constraints, which makes it difficult to train reliable models. Synthetic data offers a promising alternative: a scalable and controlled approach to data generation that preserves the statistical characteristics of real-world systems. It can be generated with domain-specific simulations, statistical methods, or data augmentation techniques, enabling the development of robust ML models even in data-limited environments. This study investigates the application of synthetic data for training ML-based surrogate models in mineral processing, with a specific focus on multistep forecasting of the gold concentrate grade in a flotation process. Synthetic datasets are generated from the statistical properties of historical process data using a commercial mineral processing simulator and Python-based techniques, replicating the operating conditions of a real plant. Surrogate models are trained exclusively on synthetic data and then evaluated against future real-world process measurements to assess their predictive accuracy. The results demonstrate that synthetic data can effectively supplement traditional datasets, particularly in industrial scenarios where historical data are limited, such as early commissioning phases or after major process modifications. Our findings underscore the potential of synthetic data to enhance ML applications in industrial settings by enabling more flexible, cost-effective, and data-efficient model development.
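The workflow described in the abstract (derive statistics from historical data, generate synthetic data, train a surrogate only on the synthetic data, evaluate on later real measurements) can be illustrated with a minimal sketch. It is not the authors' method: the commercial simulator is replaced here by a simple AR(1) process, the surrogate by a linear least-squares model, and the "real" series is itself simulated with made-up values. All function names and parameters are hypothetical.

```python
import numpy as np

def fit_ar1_stats(series):
    """Estimate mean, standard deviation and lag-1 autocorrelation."""
    x = np.asarray(series, dtype=float)
    mu, sigma = x.mean(), x.std()
    centered = x - mu
    rho = float(np.dot(centered[:-1], centered[1:]) / np.dot(centered, centered))
    return mu, sigma, rho

def generate_synthetic(mu, sigma, rho, n, rng):
    """Simulate an AR(1) series with the given stationary mean, std and autocorrelation."""
    noise_std = sigma * np.sqrt(1.0 - rho**2)
    x = np.empty(n)
    x[0] = rng.normal(mu, sigma)
    for t in range(1, n):
        x[t] = mu + rho * (x[t - 1] - mu) + rng.normal(0.0, noise_std)
    return x

def make_windows(series, n_lags, horizon):
    """Slice a series into (past window, future window) pairs."""
    n = len(series) - n_lags - horizon + 1
    X = np.stack([series[i:i + n_lags] for i in range(n)])
    Y = np.stack([series[i + n_lags:i + n_lags + horizon] for i in range(n)])
    return X, Y

def fit_direct_multistep(X, Y):
    """Least-squares linear map from the past window (plus bias) to all future steps at once."""
    A = np.hstack([X, np.ones((len(X), 1))])
    W, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return W

def predict(W, X):
    """Apply the fitted linear surrogate to a batch of past windows."""
    return np.hstack([X, np.ones((len(X), 1))]) @ W

rng = np.random.default_rng(0)
# Stand-in for plant measurements (assumed values, not data from the study).
real = generate_synthetic(mu=30.0, sigma=2.0, rho=0.9, n=600, rng=rng)
history, future = real[:300], real[300:]

# 1) Statistics from historical data, 2) synthetic data, 3) surrogate trained on synthetic data only.
mu, sigma, rho = fit_ar1_stats(history)
synthetic = generate_synthetic(mu, sigma, rho, n=5000, rng=rng)
n_lags, horizon = 8, 4
W = fit_direct_multistep(*make_windows(synthetic, n_lags, horizon))

# 4) Evaluation against later "real" measurements, per forecast step.
X_real, Y_real = make_windows(future, n_lags, horizon)
rmse_model = np.sqrt(np.mean((predict(W, X_real) - Y_real) ** 2, axis=0))
rmse_mean = np.sqrt(np.mean((mu - Y_real) ** 2, axis=0))  # baseline: historical mean
print("RMSE per step (surrogate):", rmse_model.round(3))
print("RMSE per step (mean baseline):", rmse_mean.round(3))
```

Comparing the per-step error with a naive baseline gives a rough indication of whether a model trained only on synthetic data transfers to real measurements. A realistic study would use a process simulator and richer surrogate models in place of these stand-ins.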
| Original language | English |
|---|---|
| Title of host publication | 2025 International Conference on Artificial Intelligence, Computer, Data Sciences and Applications (ACDSA) |
| Publisher | IEEE (Institute of Electrical and Electronics Engineers) |
| Number of pages | 10 |
| ISBN (Electronic) | 979-8-3315-3562-9 |
| ISBN (Print) | 979-8-3315-3563-6 |
| DOIs | |
| Publication status | Published - 2025 |
| MoE publication type | A4 Article in a conference publication |
| Event | International Conference on Artificial Intelligence, Computer, Data Sciences and Applications, ACDSA 2025, Antalya, Turkey. Duration: 7 Aug 2025 → 9 Aug 2025. https://acdsa.org/2025/ |
Conference
| Conference | International Conference on Artificial Intelligence, Computer, Data Sciences and Applications, ACDSA 2025 |
|---|---|
| Country/Territory | Turkey |
| City | Antalya |
| Period | 7/08/25 → 9/08/25 |
| Internet address | https://acdsa.org/2025/ |
Funding
The authors sincerely acknowledge the support of Business Finland and VTT for funding this research.
Keywords
- gold flotation
- mineral processing
- multistep time series prediction
- surrogate model
- synthetic data generation
Fingerprint
Dive into the research topics of 'Synthetic data for developing surrogate models–A case study of multistep forecasting in mineral processing'. Together they form a unique fingerprint.
Projects
- 1 Finished
- AIMODE: Development of Artificial Intelligence and Machine Learning for Online Perception and Operating Mode Optimization in Process Industry
  Linnosmaa, J. (Manager), Seppi, M. (Participant), Zeb, A. (Participant), Saarela, O. (Participant), Verma, N. (Participant), Freimane, L. (Participant), Aho, J. (Participant) & Tahkola, M. (Participant)
  1/09/22 → 31/08/25
  Project: Business Finland project