Offline LLM for assisted assembly process and safety observation

Research output: Contribution to conference › Conference Poster › Scientific

Abstract

The presented work is part of a Business Finland funded project studying future industrial workplaces where humans, machines and AI collaborate. At its centre is the question of how technological advancements reshape the roles, tasks and environments of industrial workers while maintaining the meaningfulness of their work.
Large language models (LLMs) and vision language models (VLMs) have become popular tools in everyday life. As their use spreads across industries, offline models are increasingly considered, along with connection capabilities for multi-user and multi-device environments. For an LLM/VLM to function as a meaningful assistant that provides value to the operator, the user interface (UI) needs to be straightforward. The field of LLMs is advancing rapidly, and industrial adoption has to progress at a comparable pace to realize their potential.
The project's scope stems from the possibility of giving a novice operator specific instructions about previously unknown tasks, in a way that lets the operator effortlessly discuss progress with an assistant. The assistant also provides safety notifications regarding the use of safety equipment. The target is LLM/VLM-supported human-robot work allocation at a concept level. The project therefore provides a data pipeline from operator to robot via an LLM acting as a conversational agent. The approach scales to multiple robots and devices as well as multiple operators, as sketched below.
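
To illustrate the concept, here is a minimal sketch of how an LLM-produced command could be routed to one of several cobots over sockets. The JSON command shape, robot registry, and addresses are hypothetical; the poster describes the pipeline only at a concept level.

```python
import json
import socket

# Hypothetical registry of cobots reachable over TCP sockets; the poster
# mentions socket communication but not the addressing scheme.
ROBOTS = {
    "cobot_1": ("192.168.0.10", 5000),
    "cobot_2": ("192.168.0.11", 5000),
}

def dispatch_command(robot_id: str, command: dict) -> str:
    """Send an LLM-produced command to a robot and return its progress reply."""
    host, port = ROBOTS[robot_id]
    with socket.create_connection((host, port), timeout=5.0) as sock:
        sock.sendall(json.dumps(command).encode("utf-8"))
        # The robot is assumed to answer with a short status string, which
        # the LLM can then articulate back to the operator.
        return sock.recv(1024).decode("utf-8")

if __name__ == "__main__":
    # Hypothetical command shape: ask cobot_1 to deliver a component.
    reply = dispatch_command("cobot_1", {"action": "deliver", "component": "bracket_A"})
    print("Robot progress:", reply)
```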

The project uses an offline Qwen2.5-32B (LLM) as the conversational agent and Qwen2.5-VL-3B (VLM) as the image interpreter for safety. Both models run on a local PC with 24 GB of VRAM, each with its own use-case-specific context defining its scope. The VLM analyzes image feedback from a camera, which is triggered by a YOLO11 detection model whenever the number of people around the defined work surface changes. The VLM detects the required safety equipment (a safety vest and an optional helmet) in the image and informs workers accordingly about its use.
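
A minimal sketch of the detection-triggered safety check is shown below. It assumes the ultralytics package for YOLO11 and an OpenAI-compatible local endpoint serving Qwen2.5-VL-3B; the endpoint, model name, prompt, and trigger logic are assumptions, since the poster does not specify the serving stack.

```python
import base64

import cv2
from openai import OpenAI
from ultralytics import YOLO

detector = YOLO("yolo11n.pt")  # lightweight YOLO11 variant; exact model unspecified in the poster
vlm = OpenAI(base_url="http://localhost:8000/v1", api_key="none")  # assumed local server

def count_people(frame) -> int:
    """Count persons (COCO class 0) in a frame with YOLO11."""
    result = detector(frame, classes=[0], verbose=False)[0]
    return len(result.boxes)

def check_safety_equipment(frame) -> str:
    """Ask the local VLM whether required safety equipment is visible."""
    ok, buf = cv2.imencode(".jpg", frame)
    image_b64 = base64.b64encode(buf.tobytes()).decode("ascii")
    response = vlm.chat.completions.create(
        model="Qwen2.5-VL-3B",  # model name as registered on the local server (assumed)
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Do all visible workers wear a safety vest? Is a helmet worn? Answer briefly."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    cap = cv2.VideoCapture(0)
    previous = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        current = count_people(frame)
        if current != previous:  # query the VLM only when the head count changes
            print(check_safety_equipment(frame))
        previous = current
```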

The LLM, in turn, is presented as a virtual assistant on a VTT power wall screen, which also visualizes the assembled components and an interaction interface with settings. Interaction between the operator and the assistant is speech-based, using faster_whisper (a reimplementation of OpenAI's Whisper) for speech-to-text and Kokoro for text-to-speech. Additionally, the LLM communicates with a cobot over a socket connection to provide the components needed for the assembly. The cobot informs the LLM of its progress, which the LLM then articulates to the operator, providing interim information about overall progress.
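
The speech interface could be wired roughly as follows, using the faster_whisper and Kokoro Python packages; the model size, voice, and audio playback details are assumptions, as the poster names only the libraries.

```python
import numpy as np
import sounddevice as sd
from faster_whisper import WhisperModel
from kokoro import KPipeline

stt = WhisperModel("small", device="cuda", compute_type="float16")  # model size is an assumption
tts = KPipeline(lang_code="a")  # American English; the voice below is an assumption

def transcribe(wav_path: str) -> str:
    """Speech-to-text with faster_whisper."""
    segments, _info = stt.transcribe(wav_path)
    return " ".join(segment.text.strip() for segment in segments)

def speak(text: str) -> None:
    """Text-to-speech with Kokoro, played back at its 24 kHz sample rate."""
    for _graphemes, _phonemes, audio in tts(text, voice="af_heart"):
        sd.play(np.asarray(audio), samplerate=24000)
        sd.wait()

if __name__ == "__main__":
    operator_text = transcribe("operator_utterance.wav")
    # In the full system the transcribed text would go to the Qwen2.5-32B
    # conversational agent; here we simply echo it back for illustration.
    speak(f"You said: {operator_text}")
```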
Original language: English
Publication status: Published - 13 Nov 2025
MoE publication type: Not Eligible
Event: FCAI AI Day 2025 - Dipoli, Aalto University, Espoo, Finland
Duration: 13 Nov 2025 → 13 Nov 2025
https://fcai.fi/ai-day-2025

Conference

Conference: FCAI AI Day 2025
Country/Territory: Finland
City: Espoo
Period: 13/11/25 → 13/11/25
Internet address: https://fcai.fi/ai-day-2025
