A learning automata approach to multi-agent policy gradient learning

Maarten Peeters, Ville Könönen, Katja Verbeeck, Ann Nowé

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceedings › Scientific › peer-review


The policy gradient method is a popular technique for implementing reinforcement learning in an agent system. One reason is that a policy gradient learner has a simple design and strong theoretical properties in single-agent domains. Previously, Williams showed that the REINFORCE algorithm is a special case of policy gradient learning. He also showed that a learning automaton can be seen as a special case of the REINFORCE algorithm. Learning automata theory guarantees that a group of automata will converge to a stable equilibrium in team games. In this paper we show a theoretical connection between learning automata and policy gradient methods in order to transfer this theoretical result to multi-agent policy gradient learning. An appropriate exploration technique is crucial for the convergence of a multi-agent system. Since learning automata are guaranteed to converge, they possess such an exploration technique. We identify an exact mapping of a learning automaton onto the Boltzmann exploration strategy with a suitable temperature setting. The novel idea is that the temperature of the Boltzmann function depends not on time but on the action probabilities of the agents.
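
As a rough illustration of the team-game setting mentioned in the abstract, the following minimal sketch lets two learning automata play a repeated common-payoff game. It assumes the standard linear reward-inaction update with rewards in [0, 1]; the function names, the learning rate, and the payoff matrix are hypothetical choices made for this example and are not taken from the paper. The paper's mapping onto Boltzmann exploration is not reproduced here.

```python
import random


def lri_update(probs, action, reward, rate=0.05):
    """Linear reward-inaction update (assumed form, rewards in [0, 1]).

    Probability mass moves toward the chosen action in proportion to the
    reward; a reward of zero leaves the probabilities unchanged.
    """
    return [
        p + rate * reward * ((1.0 if i == action else 0.0) - p)
        for i, p in enumerate(probs)
    ]


def play_team_game(payoff, steps=5000, rate=0.05, seed=0):
    """Two automata repeatedly play a 2x2 game and receive the same reward."""
    rng = random.Random(seed)
    p1 = [0.5, 0.5]  # action probabilities of automaton 1
    p2 = [0.5, 0.5]  # action probabilities of automaton 2
    for _ in range(steps):
        a1 = rng.choices(range(2), weights=p1)[0]
        a2 = rng.choices(range(2), weights=p2)[0]
        reward = payoff[a1][a2]  # common payoff: both agents see the same value
        p1 = lri_update(p1, a1, reward, rate)
        p2 = lri_update(p2, a2, reward, rate)
    return p1, p2


if __name__ == "__main__":
    # Hypothetical payoff matrix: joint actions (0, 0) and (1, 1) are rewarded.
    payoff = [[1.0, 0.0], [0.0, 0.5]]
    print(play_team_game(payoff))
```

Each automaton updates only from its own action and the shared reward, so no agent needs to observe the other's choice. Which joint action the pair settles on can depend on the payoff matrix, the learning rate, and the random seed.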
Original language: English
Title of host publication: Knowledge-Based Intelligent Information and Engineering Systems
Subtitle of host publication: 12th International Conference, KES 2008
Editors: Ignac Lovrek, Robert J. Howlett, Lakhmi C. Jain
Place of Publication: Heidelberg
ISBN (Print): 978-3-540-85564-4
Publication status: Published - 2008
MoE publication type: A4 Article in a conference publication
Event: 12th International Conference on Knowledge-Based Intelligent Information and Engineering Systems, KES-2008 - Zagreb, Croatia
Duration: 3 Sept 2008 → 5 Sept 2008
Conference number: 12

Publication series

Series: Lecture Notes in Computer Science


Conference: 12th International Conference on Knowledge-Based Intelligent Information and Engineering Systems, KES-2008
Abbreviated title: KES-2008


  • Learning automata, reinforcement learning, policy gradient
  • multi-agent systems

