We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.LG

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Computer Science > Machine Learning

Title: A Dual Perspective of Reinforcement Learning for Imposing Policy Constraints

Abstract: Model-free reinforcement learning methods lack an inherent mechanism to impose behavioural constraints on the trained policies. While certain extensions exist, they remain limited to specific types of constraints, such as value constraints with additional reward signals or visitation density constraints. In this work we try to unify these existing techniques and bridge the gap with classical optimization and control theory, using a generic primal-dual framework for value-based and actor-critic reinforcement learning methods. The obtained dual formulations turn out to be especially useful for imposing additional constraints on the learned policy, as an intrinsic relationship between such dual constraints (or regularization terms) and reward modifications in the primal is reveiled. Furthermore, using this framework, we are able to introduce some novel types of constraints, allowing to impose bounds on the policy's action density or on costs associated with transitions between consecutive states and actions. From the adjusted primal-dual optimization problems, a practical algorithm is derived that supports various combinations of policy constraints that are automatically handled throughout training using trainable reward modifications. The resulting $\texttt{DualCRL}$ method is examined in more detail and evaluated under different (combinations of) constraints on two interpretable environments. The results highlight the efficacy of the method, which ultimately provides the designer of such systems with a versatile toolbox of possible policy constraints.
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
ACM classes: I.2.8
Cite as: arXiv:2404.16468 [cs.LG]
  (or arXiv:2404.16468v1 [cs.LG] for this version)

Submission history

From: Bram De Cooman [view email]
[v1] Thu, 25 Apr 2024 09:50:57 GMT (522kb,D)

Link back to: arXiv, form interface, contact.