Abstract
The policy gradient method is a popular technique for implementing
reinforcement learning in an agent system. One reason is that a policy
gradient learner has a simple design and strong theoretical properties in
single-agent domains. Previously, Williams showed that the REINFORCE algorithm
is a special case of policy gradient learning. He also showed that a learning
automaton could be seen as a special case of the REINFORCE algorithm.
Learning automata theory guarantees that a group of automata will converge to
a stable equilibrium in team games. In this paper we will show a theoretical
connection between learning automata and policy gradient methods to transfer
this theoretical result to multi-agent policy gradient learning. An
appropriate exploration technique is crucial for the convergence of a
multi-agent system. Since learning automata are guaranteed to converge, they
possess such an exploration technique. We identify an exact mapping of a
learning automaton onto the Boltzmann exploration strategy with a suitable
temperature setting. The novel idea is that the temperature of the Boltzmann
function depends not on time but on the action probabilities of the agents.
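The two building blocks named in the abstract can be sketched as follows. This is a minimal illustration, not the paper's construction: it assumes the linear reward-inaction scheme as the automaton update (the abstract does not say which scheme is used) and treats the Boltzmann temperature as a plain parameter, because the probability-dependent temperature setting is not given here. The names `boltzmann`, `reward_inaction_update`, `preferences`, and `step_size` are hypothetical.

```python
import math


def boltzmann(preferences, temperature):
    """Return Boltzmann (softmax) action probabilities for the given preferences."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(preferences)
    weights = [math.exp((q - top) / temperature) for q in preferences]
    total = sum(weights)
    return [w / total for w in weights]


def reward_inaction_update(probs, action, reward, step_size):
    """Linear reward-inaction update (assumed scheme) for a reward in [0, 1]."""
    updated = []
    for i, p in enumerate(probs):
        if i == action:
            # The chosen action gains probability in proportion to the reward.
            updated.append(p + step_size * reward * (1.0 - p))
        else:
            # Every other action loses probability, so the total stays at 1.
            updated.append(p - step_size * reward * p)
    return updated


if __name__ == "__main__":
    probs = boltzmann([0.0, 0.0], temperature=1.0)
    print(probs)
    print(reward_inaction_update(probs, action=0, reward=1.0, step_size=0.1))
```

With a reward of zero the automaton leaves its probabilities unchanged, which is the "inaction" half of the scheme. A high temperature pushes the Boltzmann distribution toward uniform and a low one toward the greedy action.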
Original language | English |
---|---|
Title of host publication | Knowledge-Based Intelligent Information and Engineering Systems |
Subtitle of host publication | 12th International Conference, KES 2008 |
Editors | Ignac Lovrek, Robert J. Howlett, Lakhmi C. Jain |
Place of Publication | Heidelberg |
Publisher | Springer |
Pages | 379-390 |
Volume | II |
ISBN (Print) | 978-3-540-85564-4 |
Publication status | Published - 2008 |
MoE publication type | A4 Article in a conference publication |
Event | 12th International Conference on Knowledge-Based Intelligent Information and Engineering Systems, KES-2008, Zagreb, Croatia. Duration: 3 Sept 2008 → 5 Sept 2008. Conference number: 12 |
Publication series
Series | Lecture Notes in Computer Science |
---|---|
Volume | 5178 |
ISSN | 0302-9743 |
Conference
Conference | 12th International Conference on Knowledge-Based Intelligent Information and Engineering Systems, KES-2008 |
---|---|
Abbreviated title | KES-2008 |
Country/Territory | Croatia |
City | Zagreb |
Period | 3/09/08 → 5/09/08 |
Keywords
- Learning automata
- Reinforcement learning
- Policy gradient
- Multi-agent systems