Abstract
Deploying multimodal AI on severely resource-constrained hardware remains challenging due to tight latency, memory, and privacy requirements. We present LLMYOLOEdge, a fully on-device framework that integrates YOLO object detection (YOLOv8/11/12 variants) with two quantized instruction-tuned LLMs: Qwen2.5:0.5b-instruct for reliable image-reference extraction and Granite3-MoE:1b-instruct for concise textual summarization. Both models are served locally via Ollama and orchestrated by a lightweight Flask API on a Raspberry Pi 4B, which serves as a resource-constrained edge Internet of Things (IoT) setup. The system employs grammar-constrained, multi-shot prompting to guarantee structured JSON outputs and instruments every stage with fine-grained metrics for rigorous statistical analysis. On a Raspberry Pi 4B, LLMYOLOEdge sustains real-time operation while preserving data locality. Across extensive trials, we observe significant performance differences among YOLO backbones: yolo11n.pt yields the shortest inference latency (1013.241 ms), whereas yolo12s.pt minimizes extractor prompt-evaluation time (19.242 × 10^9 ns). The multi-shot extractor achieves perfect URL/path extraction accuracy (100%), outperforming a baseline Granite3-MoE approach (88.89%). One-way ANOVA with Tukey’s HSD and pairwise t-tests confirm these effects at p < 0.001, establishing both efficiency and accuracy gains under strict resource budgets. Our contributions are threefold: 1) an integrated, privacy-preserving multimodal pipeline that runs entirely on commodity edge hardware; 2) a principled prompting strategy that removes brittle parsing failure modes; and 3) a reproducible evaluation suite reporting per-stage latencies, throughput, and correctness.
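The extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the few-shot examples, the `{"image": ...}` schema, and the helper names `build_prompt`, `build_payload`, and `parse_reference` are all hypothetical. The only external detail assumed is that Ollama's `/api/generate` endpoint accepts a `format` field, used here as a stand-in for the paper's grammar constraint. The code only builds the request body and validates a reply; it does not contact a server.

```python
import json

# Hypothetical few-shot examples; the paper's actual prompts are not given in the abstract.
FEW_SHOT = [
    ("Describe the objects in /home/pi/images/cat.jpg",
     {"image": "/home/pi/images/cat.jpg"}),
    ("What is in http://example.com/a.png?",
     {"image": "http://example.com/a.png"}),
]


def build_prompt(user_text):
    """Assemble a multi-shot prompt that ends with the new request."""
    lines = [
        "Extract the image URL or file path from the request.",
        'Reply with JSON of the form {"image": "<url-or-path>"} only.',
        "",
    ]
    for request, answer in FEW_SHOT:
        lines.append(f"Request: {request}")
        lines.append(f"Answer: {json.dumps(answer)}")
    lines.append(f"Request: {user_text}")
    lines.append("Answer:")
    return "\n".join(lines)


def build_payload(user_text, model="qwen2.5:0.5b-instruct"):
    """Request body for POST /api/generate (model tag assumed from the abstract)."""
    return {
        "model": model,
        "prompt": build_prompt(user_text),
        "format": "json",   # ask the server to constrain the reply to JSON
        "stream": False,
    }


def parse_reference(raw):
    """Return the extracted reference, or None if the reply is not the expected JSON."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    value = data.get("image") if isinstance(data, dict) else None
    return value if isinstance(value, str) and value else None
```

Validating the reply against a fixed schema, rather than searching free text for a path, is one way to avoid the brittle parsing failures the abstract mentions; a reply that fails validation can simply be retried or rejected.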
| Original language | English |
|---|---|
| Pages (from-to) | 167250-167279 |
| Number of pages | 30 |
| Journal | IEEE Access |
| Volume | 13 |
| Publication status | Published - 2025 |
| MoE publication type | A1 Journal article-refereed |
Keywords
- Edge AI
- image analysis
- Internet of Things
- large language models
- multi-shot prompting
- Ollama
- Raspberry Pi
- YOLO
Title
LLMYOLOEdge: An Edge-IoT Aware Novel Framework for Integration of YOLO With Localized Quantized Large Language Models