Computer Science > Computer Science and Game Theory
Title: Near-Optimal Last-iterate Convergence of Policy Optimization in Zero-sum Polymatrix Markov games
(Submitted on 15 Aug 2023 (v1), last revised 16 Aug 2023 (this version, v2))
Abstract: Computing approximate Nash equilibria in multi-player general-sum Markov games is a computationally intractable task. However, multi-player Markov games with certain cooperative or competitive structures may circumvent this intractability. In this paper, we focus on multi-player zero-sum polymatrix Markov games, where players interact in a pairwise fashion while remaining overall competitive. We propose, to the best of our knowledge, the first policy optimization algorithm, Entropy-Regularized Optimistic-Multiplicative-Weights-Update (ER-OMWU), for finding approximate Nash equilibria in finite-horizon zero-sum polymatrix Markov games with full-information feedback. We provide last-iterate convergence guarantees for finding an $\epsilon$-approximate Nash equilibrium within $\tilde{O}(1/\epsilon)$ iterations, which is near-optimal compared to the optimal $O(1/\epsilon)$ iteration complexity in two-player zero-sum Markov games, the degenerate case of zero-sum polymatrix games with only two players. Our algorithm combines regularized and optimistic learning dynamics with a separate smooth value update within a single loop, where players update their strategies in a symmetric and almost uncoupled manner. It provides a natural dynamics for finding equilibria and is more amenable to future sample-efficient and fully decentralized implementations in which only partial-information feedback is available.
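To make the flavor of the learning dynamics concrete, here is a minimal sketch of entropy-regularized optimistic multiplicative-weights updates on the degenerate two-player zero-sum matrix-game case mentioned in the abstract. This is an illustration of the general ER-OMWU idea, not the paper's actual algorithm (which operates on finite-horizon Markov games with a value update); the step size `eta`, regularization weight `tau`, and iteration count are hypothetical choices.

```python
import numpy as np

def er_omwu_matrix_game(A, x0=None, y0=None, eta=0.1, tau=0.05, iters=2000):
    """Entropy-regularized OMWU on a two-player zero-sum matrix game.

    Player x minimizes x^T A y, player y maximizes it. Illustrative
    sketch only; hyperparameters are hypothetical, not from the paper.
    """
    n, m = A.shape
    x = np.ones(n) / n if x0 is None else x0.astype(float).copy()
    y = np.ones(m) / m if y0 is None else y0.astype(float).copy()
    gx_prev = A @ y       # previous loss gradient for x
    gy_prev = A.T @ x     # previous gain gradient for y
    for _ in range(iters):
        gx, gy = A @ y, A.T @ x
        # Optimistic step: extrapolate the gradient via 2*g_t - g_{t-1}.
        # The x**(1 - eta*tau) factor is the entropy regularization,
        # which pulls iterates toward the uniform distribution and
        # yields last-iterate convergence.
        x = x ** (1 - eta * tau) * np.exp(-eta * (2 * gx - gx_prev))
        y = y ** (1 - eta * tau) * np.exp(eta * (2 * gy - gy_prev))
        x /= x.sum()
        y /= y.sum()
        gx_prev, gy_prev = gx, gy
    return x, y

# Matching pennies: the unique Nash equilibrium is uniform play.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x, y = er_omwu_matrix_game(A, x0=np.array([0.8, 0.2]),
                              y0=np.array([0.3, 0.7]))
```

Note how both players run the same symmetric, almost uncoupled rule: each update reads only that player's own gradient, mirroring the decentralized flavor of the dynamics the abstract describes.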
Submission history
From: Zailin Ma [view email][v1] Tue, 15 Aug 2023 16:40:10 GMT (34kb)
[v2] Wed, 16 Aug 2023 17:17:48 GMT (0kb,I)