Mastering the Limits: A New Way to Generalize Offline RL
Sunday, December 29, 2024
DOGE works by training a state-conditioned distance function that estimates how far a candidate action lies from the actions in the dataset at a given state. This function can be added to standard actor-critic methods as a policy constraint that penalizes actions far from the data. The method is simple but powerful, showing better generalization than current top methods on the D4RL benchmarks.
Theoretical analysis supports DOGE's advantage over methods that rely only on distribution or support constraints, offering a fresh perspective on how offline RL should treat out-of-distribution (OOD) regions.
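To make the mechanism concrete, here is a minimal toy sketch of the idea: a distance-to-data measure at a state is used as a penalty in an actor objective. This is not the paper's implementation; DOGE learns the distance function with a neural network, while this sketch substitutes a nearest-neighbor stand-in, and the critic `q_value` is a made-up placeholder for illustration.

```python
import numpy as np

# Toy dataset of (state, action) pairs standing in for an offline RL dataset.
rng = np.random.default_rng(0)
states = rng.uniform(-1, 1, size=(100, 2))
actions = rng.uniform(-1, 1, size=(100, 1))

def distance_to_data(s, a, k=5):
    """Stand-in for DOGE's learned state-conditioned distance function:
    distance from action a to dataset actions at the k states nearest to s."""
    idx = np.argsort(np.linalg.norm(states - s, axis=1))[:k]
    return float(np.min(np.abs(actions[idx] - a)))

def q_value(s, a):
    """Placeholder critic (an assumption, not the paper's): prefers actions
    near the sum of the state coordinates."""
    return -(a - s.sum()) ** 2

def constrained_actor_objective(s, a, lam=1.0):
    # Standard actor objective (maximize Q) plus the distance penalty
    # acting as a policy constraint against far-from-data actions.
    return q_value(s, a) - lam * distance_to_data(s, a)

# Pick the best action from a candidate grid under the constrained objective.
s = np.array([0.1, -0.2])
candidates = np.linspace(-1, 1, 201)
best = max(candidates, key=lambda a: constrained_actor_objective(s, a))
```

The key design point the sketch illustrates: the penalty is graded by distance rather than a hard in-support/out-of-support cutoff, so the policy can still select actions slightly outside the data when the critic's value estimate justifies it.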