SIRL Article Swipe
· 2023
· Open Access
· DOI: https://doi.org/10.1145/3568162.3576989
· OA: W4313529721
When robots learn reward functions using high-capacity models that take raw state directly as input, they need to both learn a representation for what matters in the task (the task "features") and learn how to combine these features into a single objective. If they try to do both at once from input designed to teach the full reward function, it is easy to end up with a representation that contains spurious correlations in the data, which fails to generalize to new settings. Instead, our ultimate goal is to enable robots to identify and isolate the causal features that people actually care about and use when they represent states and behavior. Our idea is that we can tune into this representation by asking users what behaviors they consider similar: behaviors will be similar if the features that matter are similar, even if low-level behavior is different; conversely, behaviors will be different if even one of the features that matter differs. This, in turn, is what enables the robot to disambiguate between what needs to go into the representation versus what is spurious, as well as what aspects of behavior can be compressed together versus not. The notion of learning representations based on similarity has a nice parallel in contrastive learning, a self-supervised representation learning technique that maps visually similar data points to similar embeddings, where similarity is defined by a designer through data-augmentation heuristics. By contrast, in order to learn the representations that people use, so we can learn their preferences and objectives, we use their definition of similarity. In simulation as well as in a user study, we show that learning through such similarity queries leads to representations that, while far from perfect, are indeed more generalizable than self-supervised and task-input alternatives.
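To make the core idea concrete, here is a minimal sketch of learning a trajectory feature encoder from human similarity queries, in the triplet/contrastive spirit the abstract describes. This is not the authors' implementation: the names (TrajectoryEncoder, similarity_loss), the network sizes, and the use of a standard triplet margin loss over flattened state trajectories are all assumptions made for illustration.

```python
# Hypothetical sketch: train an encoder so that the behavior a user judged
# "more similar" to an anchor embeds closer than the one judged "less similar".
# All module names, dimensions, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TrajectoryEncoder(nn.Module):
    """Maps a flattened trajectory of raw states to a low-dimensional feature vector."""
    def __init__(self, traj_dim: int, feature_dim: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(traj_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, feature_dim),
        )

    def forward(self, traj: torch.Tensor) -> torch.Tensor:
        return self.net(traj)

def similarity_loss(encoder, anchor, similar, dissimilar, margin: float = 1.0):
    """Triplet loss on user-provided similarity judgments: pull the 'similar'
    behavior toward the anchor in feature space, push the 'dissimilar' one away."""
    z_a, z_s, z_d = encoder(anchor), encoder(similar), encoder(dissimilar)
    d_pos = F.pairwise_distance(z_a, z_s)
    d_neg = F.pairwise_distance(z_a, z_d)
    return F.relu(d_pos - d_neg + margin).mean()

# Toy usage with random stand-in trajectories; in practice each triplet comes
# from a user answering "which of these two behaviors is more like the first?"
traj_dim = 20 * 4                                   # e.g. 20 timesteps of a 4-d state
encoder = TrajectoryEncoder(traj_dim)
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)

anchor = torch.randn(32, traj_dim)
similar = anchor + 0.1 * torch.randn(32, traj_dim)  # placeholder for the user's "similar" pick
dissimilar = torch.randn(32, traj_dim)

for _ in range(100):
    optimizer.zero_grad()
    loss = similarity_loss(encoder, anchor, similar, dissimilar)
    loss.backward()
    optimizer.step()
```

The key design point the sketch tries to mirror is that the positive/negative pairing comes from a person's notion of which behaviors are alike, not from designer-chosen data augmentations as in standard self-supervised contrastive learning; a reward or objective model could then be learned on top of the frozen feature output.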