Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning

Qi, Carl; Abbeel, Pieter; Grover, Aditya

(Submitted on 7 Apr 2022 (v1), last revised 18 Apr 2022 (this version, v2))

Abstract: The goal of imitation learning is to mimic expert behavior from demonstrations, without access to an explicit reward signal. A popular class of approaches infers the (unknown) reward function via inverse reinforcement learning (IRL) and then maximizes it via reinforcement learning (RL). The policies learned this way are, however, brittle in practice and deteriorate quickly even under small test-time perturbations because of compounding errors. We propose Imitation with Planning at Test-time (IMPLANT), a new meta-algorithm for imitation learning that uses decision-time planning to correct for the compounding errors of any base imitation policy. In contrast to existing approaches, we retain both the imitation policy and the reward model at decision time, thereby benefiting from the learning signal of both components. Empirically, we demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments and excels at zero-shot generalization when subject to challenging perturbations in test-time dynamics.
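
The idea of keeping both the imitation policy and the reward model at decision time can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps in a simple random-shooting planner, and every name in it (`plan_action`, `base_policy`, `reward_model`, `perturbed_dynamics`, the 1-D toy task, the drift value) is a hypothetical chosen for illustration. It also assumes the planner can query the test-time dynamics, which the abstract does not specify.

```python
import numpy as np

def plan_action(state, policy, reward_model, dynamics,
                horizon=5, num_candidates=64, noise_std=0.3, rng=None):
    """Random-shooting planner (an illustrative stand-in, not IMPLANT itself).

    Candidate action sequences are sampled around the base policy, scored
    with the reward model, and the first action of the best one is returned.
    """
    rng = np.random.default_rng() if rng is None else rng
    best_return, best_first_action = -np.inf, None
    for _ in range(num_candidates):
        s, total, first_action = state, 0.0, None
        for t in range(horizon):
            # The base policy acts as a proposal; noise explores around it.
            a = policy(s) + noise_std * rng.normal()
            if t == 0:
                first_action = a
            s = dynamics(s, a)
            total += reward_model(s)
        if total > best_return:
            best_return, best_first_action = total, first_action
    return best_first_action

# Hypothetical 1-D toy task: drive the state toward zero.
def base_policy(s):
    return -0.5 * s            # stands in for a learned imitation policy

def reward_model(s):
    return -s ** 2             # stands in for a learned reward

def perturbed_dynamics(s, a):
    return s + a + 0.5         # constant drift the base policy never saw

def rollout(select_action, steps=30, start=2.0):
    """Run an action selector in the perturbed task and sum the rewards."""
    s, total = start, 0.0
    for _ in range(steps):
        s = perturbed_dynamics(s, select_action(s))
        total += reward_model(s)
    return total

rng = np.random.default_rng(0)
baseline_return = rollout(base_policy)
planned_return = rollout(
    lambda s: plan_action(s, base_policy, reward_model,
                          perturbed_dynamics, rng=rng))
```

In this toy setting the base policy alone settles at a nonzero state because of the drift, while the planner uses the reward model to pick policy-like actions that counteract it. The sketch only shows how the two components can be combined at decision time and makes no claim about the paper's actual planner or results.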

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2204.03597 [cs.LG]
	(or arXiv:2204.03597v2 [cs.LG] for this version)

Submission history

From: Carl Qi
[v1] Thu, 7 Apr 2022 17:16:52 GMT (3312kb,D)
[v2] Mon, 18 Apr 2022 14:43:11 GMT (3314kb,D)
