Title: One Policy is Enough: Parallel Exploration with a Single Policy is Minimax Optimal for Reward-Free Reinforcement Learning

Abstract: While parallelism has been extensively used in Reinforcement Learning (RL), the quantitative effects of parallel exploration are not well understood theoretically. We study the benefits of simple parallel exploration for reward-free RL in linear Markov decision processes (MDPs) and two-player zero-sum Markov games (MGs). In contrast to the existing literature, which focuses on approaches that encourage agents to explore over a diverse set of policies, we show that using a single policy to guide exploration across all agents is sufficient to obtain an almost-linear speedup in all cases over the fully sequential counterpart. Further, we show that this simple procedure is minimax optimal up to logarithmic factors in the reward-free setting for both linear MDPs and two-player zero-sum MGs. From a practical perspective, our paper shows that a single policy is sufficient and provably optimal for incorporating parallelism during the exploration phase.
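To make the single-policy parallel exploration idea concrete, here is a minimal toy sketch in Python: K parallel agents all roll out the same shared exploration policy in a small tabular MDP and pool their reward-free transitions into one dataset. This is an illustrative assumption-laden sketch, not the paper's actual procedure (which targets linear MDPs and Markov games); the names single_exploration_policy and rollout, the uniform policy, and the random transition kernel are all hypothetical.

import numpy as np

# Toy sketch (hypothetical, not the paper's algorithm): K parallel agents
# all follow the SAME exploration policy and pool their transitions into
# one shared reward-free dataset.

rng = np.random.default_rng(0)
S, A, H, K = 5, 3, 4, 8                       # states, actions, horizon, parallel agents
P = rng.dirichlet(np.ones(S), size=(S, A))    # P[s, a] is a distribution over next states

def single_exploration_policy(s, h):
    """One shared policy used by every agent (here simply uniform over actions)."""
    return int(rng.integers(A))

def rollout(policy):
    """Collect one reward-free trajectory as (h, s, a, s') tuples; no rewards observed."""
    s, traj = int(rng.integers(S)), []
    for h in range(H):
        a = policy(s, h)
        s_next = int(rng.choice(S, p=P[s, a]))
        traj.append((h, s, a, s_next))
        s = s_next
    return traj

# Parallel exploration: each of the K agents runs the same policy, and the
# per-agent datasets are simply concatenated. The abstract's claim is that
# this single-policy scheme already yields an almost-linear speedup.
dataset = [step for _ in range(K) for step in rollout(single_exploration_policy)]
print(f"Pooled {len(dataset)} transitions from {K} agents sharing one exploration policy.")

In a real implementation the K rollouts would run on separate workers; the point the sketch illustrates is that no per-agent policy diversification is needed during the exploration phase.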
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as: arXiv:2205.15891 [cs.LG]
  (or arXiv:2205.15891v1 [cs.LG] for this version)

Submission history

From: Boxiang Lyu
[v1] Tue, 31 May 2022 15:41:55 GMT (70kb)
[v2] Thu, 13 Oct 2022 01:15:25 GMT (71kb)
[v3] Wed, 1 Mar 2023 21:39:50 GMT (53kb)
