Thorsten Joachims
Prompt Curriculum Learning for Efficient LLM Post-Training
We introduce Prompt Curriculum Learning (PCL), a lightweight reinforcement learning (RL) algorithm that selects intermediate-difficulty prompts using a learned value model to post-train language models. Since post-training LLMs via RL rema…
CONSEQUENCES 2025 - The 4th Workshop on Causality, Counterfactuals and Sequential Decision-Making for Recommender Systems
Contains fulltext: 323049.pdf (Publisher’s version) (Closed access)
Prompt Optimization with Logged Bandit Data
We study how to use naturally available user feedback, such as clicks, to optimize large language model (LLM) pipelines for generating personalized sentences using prompts. Naive approaches, which estimate the policy gradient in the prompt…
Poor Alignment and Steerability of Large Language Models: Evidence from College Admission Essays
People are increasingly using technologies equipped with large language models (LLMs) to write texts for formal communication, which raises two important questions at the intersection of technology and society: Who do LLMs write like (model…
End-to-end Training for Recommendation with Language-based User Profiles
There is growing interest in natural language-based user profiles for recommender systems, which aim to enhance transparency and scrutability compared with embedding-based methods. Existing studies primarily generate these profiles usin…
Large language models, social demography, and hegemony: comparing authorship in human and synthetic text
Large language models have become popular over a short period of time because they can generate text that resembles human writing across various domains and tasks. The popularity and breadth of use also put this technology in the position …
Algorithms for College Admissions Decision Support: Impacts of Policy Change and Inherent Variability
Each year, selective American colleges sort through tens of thousands of applications to identify a first-year class that displays both academic merit and diversity. In the 2023-2024 admissions cycle, these colleges faced unprecedented cha…
REBEL: Reinforcement Learning via Regressing Relative Rewards
While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the work-horse of a variety of reinforcement learning (RL) applications, including the fine-tuning of generative models. Unfortun…
Large Language Models, Social Demography, and Hegemony: Comparing Authorship in Human and Synthetic Text
** Final version published open-access in the Journal of Big Data: https://link.springer.com/article/10.1186/s40537-024-00986-7?utm_source=rct_congratemailt&utm_medium=email&utm_campaign=oa_20240927&utm_content=10.1186/s40537-0…
Ranking with Long-Term Constraints
The feedback that users provide through their choices (e.g., clicks, purchases) is one of the most common types of data readily available for training search and recommendation algorithms. However, myopically training systems based on choi…
Language-Based User Profiles for Recommendation
Most conventional recommendation methods (e.g., matrix factorization) represent user profiles as high-dimensional vectors. Unfortunately, these vectors lack interpretability and steerability, and often perform poorly in cold-start settings…
Reviewer2: Optimizing Review Generation Through Prompt Generation
Recent developments in LLMs offer new opportunities for assisting authors in improving their work. In this paper, we envision a use case where authors can receive LLM-generated reviews that uncover weak points in the current draft. While i…
POTEC: Off-Policy Learning for Large Action Spaces via Two-Stage Policy Decomposition
We study off-policy learning (OPL) of contextual bandit policies in large discrete action spaces where existing methods -- most of which rely crucially on reward-regression models or importance-weighted policy gradients -- fail due to exce…
Unbiased Offline Evaluation for Learning to Rank with Business Rules
For industrial learning-to-rank (LTR) systems, it is common that the output of a ranking model is modified, either as a result of post-processing logic that enforces business requirements, or as a result of unforeseen design flaws or bugs…
Ranking with Slot Constraints
We introduce the problem of ranking with slot constraints, which can be used to model a wide range of application problems -- from college admission with limited slots for different majors, to composing a stratified cohort of eligible part…
GPT as a Baseline for Recommendation Explanation Texts
In this work, we establish a baseline potential for how modern model-generated text explanations of movie recommendations may help users, and explore which components of these text explanations users like or dislike, especial…
Fairness in Ranking under Disparate Uncertainty
Ranking is a ubiquitous method for focusing the attention of human evaluators on a manageable subset of options. Its use as part of human decision-making processes ranges from surfacing potentially relevant products on an e-commerce site t…
Augmenting Holistic Review in University Admission using Natural Language Processing for Essays and Recommendation Letters
University admission at many highly selective institutions uses a holistic review process, where all aspects of the application, including protected attributes (e.g., race, gender), grades, essays, and recommendation letters are considered…
Off-Policy Evaluation for Large Action Spaces via Conjunct Effect Modeling
We study off-policy evaluation (OPE) of contextual bandit policies for large discrete action spaces where conventional importance-weighting approaches suffer from excessive variance. To circumvent this variance issue, we propose a new esti…
Evaluating a Learned Admission-Prediction Model as a Replacement for Standardized Tests in College Admissions
A growing number of college applications has presented an annual challenge for college admissions in the United States. Admission offices have historically relied on standardized test scores to organize large applicant pools into viable su…
Off-policy evaluation for learning-to-rank via interpolating the item-position model and the position-based model
A critical need for industrial recommender systems is the ability to evaluate recommendation policies offline, before deploying them to production. Unfortunately, widely used off-policy evaluation methods either make strong assumptions abo…
Fair Ranking as Fair Division: Impact-Based Individual Fairness in Ranking
Rankings have become the primary interface in two-sided online markets. Many have noted that the rankings not only affect the satisfaction of the users (e.g., customers, listeners, employers, travelers), but that the position in the ran…
Boosted Off-Policy Learning
We propose the first boosting algorithm for off-policy learning from logged bandit feedback. Unlike existing boosting methods for supervised learning, our algorithm directly optimizes an estimate of the policy's expected reward. We analyze…
Counterfactual Learning To Rank for Utility-Maximizing Query Autocompletion
Conventional methods for query autocompletion aim to predict which completed query a user will select from a list. A shortcoming of this approach is that users often do not know which query will provide the best retrieval performance on th…
Uncertainty Quantification for Fairness in Two-Stage Recommender Systems
Many large-scale recommender systems consist of two stages. The first stage efficiently screens the complete pool of items for a small subset of promising candidates, from which the second-stage model curates the final recommendations. In …
Off-Policy Evaluation for Large Action Spaces via Embeddings
Off-policy evaluation (OPE) in contextual bandits has seen rapid adoption in real-world systems, since it enables offline evaluation of new policies using only historic log data. Unfortunately, when the number of actions is large, existing…