Offline Reinforcement Learning with Behavioral Supervisor Tuning

Srinivasan, Padmanaba; Knottenbelt, William

(Submitted on 25 Apr 2024)

Abstract: Offline reinforcement learning (RL) algorithms are applied to learn performant, well-generalizing policies when provided with a static dataset of interactions. Many recent approaches to offline RL have seen substantial success, but with one key caveat: they demand substantial per-dataset hyperparameter tuning to achieve reported performance, which requires policy rollouts in the environment to evaluate; this can rapidly become cumbersome. Furthermore, substantial tuning requirements can hamper the adoption of these algorithms in practical domains. In this paper, we present TD3 with Behavioral Supervisor Tuning (TD3-BST), an algorithm that trains an uncertainty model and uses it to guide the policy to select actions within the dataset support. TD3-BST can learn more effective policies from offline datasets compared to previous methods and achieves the best performance across challenging benchmarks without requiring per-dataset tuning.

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as: arXiv:2404.16399 [cs.LG] (or arXiv:2404.16399v1 [cs.LG] for this version)

Submission history

From: Padmanaba Srinivasan
[v1] Thu, 25 Apr 2024 08:22:47 GMT (1423kb,D)
