arXiv (Cornell University)
MixMax: Distributional Robustness in Function Space via Optimal Data Mixtures
June 2024 • Anvith Thudi, Chris J. Maddison
Machine learning models are often required to perform well across several pre-defined settings, such as a set of user groups. Worst-case performance is a common metric to capture this requirement, and is the objective of group distributionally robust optimization (group DRO). Unfortunately, group DRO methods struggle when the loss is non-convex in the parameters, or the model class is non-parametric. Here, we make a classical move to address this: we reparameterize group DRO from parameter space to function space, whi…