Computer Science > Machine Learning

Title: Return-Aligned Decision Transformer

Abstract: Traditional approaches in offline reinforcement learning aim to learn a policy that maximizes the cumulative reward, also known as the return. However, as applications broaden, it becomes increasingly important to train agents that not only maximize the return but also align the actual return with a specified target return, giving users control over the agent's performance. Decision Transformer (DT) uses supervised learning to optimize a policy that generates actions conditioned on the target return, and is thereby equipped with a mechanism for controlling the agent via the target return. Although DT is designed to align the actual return with the target return, we empirically identify a discrepancy between the two. In this paper, we propose Return-Aligned Decision Transformer (RADT), designed to effectively align the actual return with the target return. Our model decouples returns from the conventional input sequence, which typically consists of returns, states, and actions, in order to strengthen the relationships between returns and states, and between returns and actions. Extensive experiments show that RADT reduces the discrepancy between the actual return and the target return relative to DT-based methods.
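To make the architectural idea concrete, here is a minimal sketch (not the authors' implementation) contrasting DT's interleaved token sequence (R_1, s_1, a_1, R_2, s_2, a_2, ...) with one plausible reading of RADT's decoupling: returns kept in a separate stream that state/action tokens condition on via cross-attention. The cross-attention mechanism, module names, and dimensions are assumptions for illustration; the abstract only states that returns are decoupled from the input sequence.

```python
# Hypothetical sketch: DT-style interleaving vs. a decoupled-return variant.
# The cross-attention design below is an assumption, not the paper's method.
import torch
import torch.nn as nn


class DTStyleEncoder(nn.Module):
    """DT-style: one interleaved sequence (R_1, s_1, a_1, ..., R_T, s_T, a_T)."""

    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, returns, states, actions):
        # returns/states/actions: (B, T, d_model) after modality embeddings.
        B, T, D = states.shape
        # Interleave the three modalities into a single (B, 3T, D) sequence.
        tokens = torch.stack([returns, states, actions], dim=2).reshape(B, 3 * T, D)
        return self.encoder(tokens)


class DecoupledReturnEncoder(nn.Module):
    """Hypothetical RADT-like variant: returns form their own stream, and
    state/action tokens attend to it directly, strengthening the
    return-state and return-action pathways."""

    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, returns, states, actions):
        B, T, D = states.shape
        # Only states and actions are interleaved in the main sequence.
        tokens = torch.stack([states, actions], dim=2).reshape(B, 2 * T, D)
        h = self.encoder(tokens)
        # Every state/action token queries the separate return stream.
        out, _ = self.cross_attn(query=h, key=returns, value=returns)
        return h + out


if __name__ == "__main__":
    B, T, D = 2, 5, 64
    r, s, a = (torch.randn(B, T, D) for _ in range(3))
    print(DTStyleEncoder()(r, s, a).shape)          # torch.Size([2, 15, 64])
    print(DecoupledReturnEncoder()(r, s, a).shape)  # torch.Size([2, 10, 64])
```

In this sketch, the decoupled variant lets return information influence every state and action token at every layer boundary, rather than competing for attention within one long interleaved sequence; whether RADT realizes this with cross-attention or another conditioning mechanism is not specified in the abstract.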
Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2402.03923 [cs.LG]
  (or arXiv:2402.03923v2 [cs.LG] for this version)

Submission history

From: Tsunehiko Tanaka [view email]
[v1] Tue, 6 Feb 2024 11:46:47 GMT (212kb,D)
[v2] Tue, 23 Apr 2024 06:10:59 GMT (232kb,D)
