One Policy is Enough: Parallel Exploration with a Single Policy is Near-Optimal for Reward-Free Reinforcement Learning

Cisneros-Velarde, Pedro; Lyu, Boxiang; Koyejo, Sanmi; Kolar, Mladen

Full-text links:

Download:

Current browse context:

cs.LG

< prev | next >

new | recent | 2205

Computer Science > Machine Learning

Title: One Policy is Enough: Parallel Exploration with a Single Policy is Near-Optimal for Reward-Free Reinforcement Learning

Authors: Pedro Cisneros-Velarde, Boxiang Lyu, Sanmi Koyejo, Mladen Kolar

(Submitted on 31 May 2022 (v1), last revised 1 Mar 2023 (this version, v3))

Abstract: Although parallelism has been extensively used in reinforcement learning (RL), the quantitative effects of parallel exploration are not well understood theoretically. We study the benefits of simple parallel exploration for reward-free RL in linear Markov decision processes (MDPs) and two-player zero-sum Markov games (MGs). In contrast to the existing literature, which focuses on approaches that encourage agents to explore a diverse set of policies, we show that using a single policy to guide exploration across all agents is sufficient to obtain an almost-linear speedup in all cases compared to their fully sequential counterpart. Furthermore, we demonstrate that this simple procedure is near-minimax optimal in the reward-free setting for linear MDPs. From a practical perspective, our paper shows that a single policy is sufficient and provably near-optimal for incorporating parallelism during the exploration phase.

Comments:	50 pages
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2205.15891 [cs.LG]
	(or arXiv:2205.15891v3 [cs.LG] for this version)

Submission history

From: Boxiang Lyu [view email]
[v1] Tue, 31 May 2022 15:41:55 GMT (70kb)
[v2] Thu, 13 Oct 2022 01:15:25 GMT (71kb)
[v3] Wed, 1 Mar 2023 21:39:50 GMT (53kb)

Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)

Link back to: arXiv, form interface, contact.

> cs > arXiv:2205.15891

Download:

Current browse context:

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

Computer Science > Machine Learning

Title: One Policy is Enough: Parallel Exploration with a Single Policy is Near-Optimal for Reward-Free Reinforcement Learning

Submission history