Control policy training for a Simulation-to-Real transfer: Simulation-to-real case study

Research output: Thesis › Master's thesis

Abstract

Robots have been deployed in various fields of industry, with the expectation that they will take over more tasks from humans. Simulation-to-real (sim-to-real) is a relatively new discipline within robotics that offers an alternative to traditional programming methods: a model of the robot is trained in a simulator, and the acquired knowledge is then transferred to control the physical counterpart. The knowledge resides in a deep reinforcement learning policy that is carefully selected and tuned for the intended task. This thesis studied the tools and steps required to implement a physical system trained with sim-to-real transfer learning. The chosen use case is a Universal Robots UR10e manipulator that must locate and reach a stationary target in the physical world. Because the scope is to provide a proof-of-concept pipeline for the sim-to-real process, the only part of the use case requiring adaptation is the changing location of the target. Target reaching is, however, a fundamental robotic task upon which more complex tasks are built.

The simulation environment was constructed from a CAD model of the physical robot cell and later updated within the chosen simulator, CoppeliaSim. For the convenience of a premade kinematic chain, the manipulator was replaced in the simulator with the older UR10 model, and the gripper was likewise replaced with an older version. Control in the simulation environment followed a Markov decision process, with the manipulator as an agent interacting with the environment. As the agent performed actions in the available states, it sought to maximize the total cumulative reward and learned accordingly. The goal was to reach a simulated target whose position was randomized along a specified line segment. In practice, the algorithms learned trajectories in joint space under the given environment constraints, while the agent controlled the manipulator with velocity-based forward kinematics. The overall process was scripted as Python modules with an interface to the simulator. The deep reinforcement learning algorithms considered were Deep Deterministic Policy Gradient (DDPG) and Soft Actor-Critic (SAC). Both were validated in simulation, and DDPG was chosen for the sim-to-real transfer owing to its better performance.

The transfer was based on a zero-shot method in which the policy controlled the physical manipulator from the simulation. The joint positions of the simulated manipulator were forwarded to the physical counterpart over a Robot Operating System (ROS) network; the knowledge transfer therefore concerns kinematics only. The network connected the simulator, the manipulator, and the machine vision system, which was responsible for tracking the target, an ArUco marker. The marker position replaced the random position of the simulated target. The sim-to-real transfer process demonstrates a working step-by-step pipeline which, at the time of writing, was not publicly available. The resulting policy learned a redundant-kinematics trajectory under geometric limitations, and the manipulator reaches the target within the given precision threshold along a collision-free path. At the same time, the reality gap between simulation and reality is explained and managed. Although the task-specific results are not generalizable, the concept of sim-to-real transfer is applicable to more complex tasks.
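To make the Markov-decision-process formulation concrete, the following is a minimal Python sketch of the episodic agent-environment loop described above: a target randomized along a line segment, velocity-based joint control, and a reward that grows as the tool point approaches the target. All class and variable names, the toy forward kinematics, and the reward shaping are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

class ReachEnv:
    """Toy stand-in for the CoppeliaSim environment: the agent commands
    joint velocities and is rewarded for bringing the tool point close
    to a target randomized along a line segment (illustrative only)."""

    def __init__(self, segment_start, segment_end, threshold=0.02):
        self.segment_start = np.asarray(segment_start, dtype=float)
        self.segment_end = np.asarray(segment_end, dtype=float)
        self.threshold = threshold  # precision threshold in metres (assumption)

    def reset(self):
        # Randomize the target along the specified line segment.
        t = np.random.uniform(0.0, 1.0)
        self.target = self.segment_start + t * (self.segment_end - self.segment_start)
        self.joints = np.zeros(6)  # 6-DOF manipulator, home pose
        return self._state()

    def _tool_position(self, joints):
        # Placeholder forward kinematics; the simulator supplies the real one.
        return np.array([np.cos(joints[0]), np.sin(joints[0]), joints[1]])

    def _state(self):
        return np.concatenate([self.joints, self.target])

    def step(self, joint_velocities, dt=0.05):
        # Velocity-based control: integrate the commanded joint velocities.
        self.joints = self.joints + dt * np.asarray(joint_velocities)
        dist = np.linalg.norm(self._tool_position(self.joints) - self.target)
        done = dist < self.threshold
        reward = -dist + (10.0 if done else 0.0)  # dense shaping + success bonus
        return self._state(), reward, done

# One episode with a random policy standing in for the trained DDPG actor.
env = ReachEnv(segment_start=[0.4, -0.2, 0.3], segment_end=[0.4, 0.2, 0.3])
state = env.reset()
for _ in range(200):
    action = np.random.uniform(-0.1, 0.1, size=6)
    state, reward, done = env.step(action)
    if done:
        break
```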
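The joint-position forwarding over the ROS network could look like the sketch below, using rospy and the standard trajectory_msgs message types. The controller topic name and the bridging function are assumptions that depend on the UR ROS driver configuration; only the UR joint names and message types are taken as given.

```python
import rospy
from trajectory_msgs.msg import JointTrajectory, JointTrajectoryPoint

# Standard UR joint names; the controller topic below is an assumption.
UR_JOINTS = ["shoulder_pan_joint", "shoulder_lift_joint", "elbow_joint",
             "wrist_1_joint", "wrist_2_joint", "wrist_3_joint"]

def forward_joint_positions(positions, duration=0.5):
    """Publish one joint-space setpoint read from the simulation."""
    msg = JointTrajectory()
    msg.joint_names = UR_JOINTS
    point = JointTrajectoryPoint()
    point.positions = list(positions)
    point.time_from_start = rospy.Duration(duration)
    msg.points = [point]
    pub.publish(msg)

rospy.init_node("sim_to_real_bridge")
pub = rospy.Publisher("/scaled_pos_joint_traj_controller/command",  # assumed topic
                      JointTrajectory, queue_size=1)
rospy.sleep(1.0)  # give the publisher time to connect
forward_joint_positions([0.0, -1.57, 1.57, -1.57, -1.57, 0.0])
```

Forwarding joint positions rather than velocities or torques matches the abstract's note that the transfer concerns kinematics only.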
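The machine vision side, tracking the ArUco marker, can be sketched with OpenCV's aruco module. This is a minimal illustration assuming the older cv2.aruco API (opencv-contrib-python 4.6 and earlier) and placeholder camera intrinsics; the thesis does not specify its exact detection code.

```python
import cv2
import numpy as np

# Camera intrinsics and distortion from calibration (placeholder values).
camera_matrix = np.array([[900.0, 0.0, 640.0],
                          [0.0, 900.0, 360.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.zeros(5)
MARKER_SIZE = 0.05  # marker side length in metres (assumption)

dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
parameters = cv2.aruco.DetectorParameters_create()

frame = cv2.imread("frame.png")  # in practice, a frame grabbed from the camera
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
corners, ids, _ = cv2.aruco.detectMarkers(gray, dictionary, parameters=parameters)

if ids is not None:
    rvecs, tvecs, _ = cv2.aruco.estimatePoseSingleMarkers(
        corners, MARKER_SIZE, camera_matrix, dist_coeffs)
    # tvecs[0] is the marker position in the camera frame; mapping it into
    # the robot base frame additionally requires a hand-eye calibration.
    print("Marker position (camera frame):", tvecs[0].ravel())
```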
Original language: English
Qualification: Master's Degree
Awarding Institution
  • Tampere University
Supervisors/Advisors
  • Heikkilä, Tapio, Supervisor
  • Kämäräinen, Joni, Supervisor, External person
Award date: 11 May 2022
Publication status: Published - 11 May 2022
MoE publication type: G2 Master's thesis, polytechnic Master's thesis
