Nikos Karampatziakis
Cost-Effective Proxy Reward Model Construction with On-Policy and Active Learning
Reinforcement learning with human feedback (RLHF), a widely adopted approach in current large language model pipelines, is bottlenecked by the size of human preference data. While traditional methods rely on offline preference …
Active, anytime-valid risk controlling prediction sets
Rigorously establishing the safety of black-box machine learning models concerning critical risk measures is important for providing guarantees about model behavior. Recently, Bates et al. (JACM '24) introduced the notion of a risk contro…
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5…
Anytime-valid off-policy inference for contextual bandits
Contextual bandit algorithms are ubiquitous tools for active sequential experimentation in healthcare and the tech industry. They involve online learning algorithms that adaptively learn policies over time to map observed contexts $X_t$ to a…
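The contexts-to-actions loop the abstract describes can be illustrated with a toy epsilon-greedy contextual bandit (a generic sketch, not this paper's algorithm; the function names and the tabular context/action statistics are invented for illustration). Logging each action's propensity, as below, is what makes the off-policy evaluation studied in these papers possible.

```python
import random

def run_epsilon_greedy(contexts, reward_fn, n_actions, epsilon=0.1, seed=0):
    """Toy epsilon-greedy contextual bandit: keep a running mean reward per
    (context, action) pair, exploit the best-known action, explore with
    probability epsilon, and log propensities for later off-policy analysis."""
    rng = random.Random(seed)
    totals = {}  # (context, action) -> (reward sum, count)
    log = []     # (context, action, propensity, reward) tuples

    for x in contexts:
        def mean(a):
            s, c = totals.get((x, a), (0.0, 0))
            return s / c if c else 0.0

        greedy = max(range(n_actions), key=mean)
        a = rng.randrange(n_actions) if rng.random() < epsilon else greedy
        # propensity of the chosen action under the epsilon-greedy policy
        p = epsilon / n_actions + (1 - epsilon) * (a == greedy)
        r = reward_fn(x, a)
        s, c = totals.get((x, a), (0.0, 0))
        totals[(x, a)] = (s + r, c + 1)
        log.append((x, a, p, r))
    return log
```

With two contexts whose best action equals the context id, the policy quickly locks onto the right action for each context while the stored propensities stay valid for reweighting.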
LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models
Quantization is an indispensable technique for serving Large Language Models (LLMs) and has recently found its way into LoRA fine-tuning. In this work we focus on the scenario where quantization and LoRA fine-tuning are applied together on…
Meet in the Middle: A New Pre-training Paradigm
Most language models (LMs) are trained and applied in an autoregressive left-to-right fashion, assuming that the next token only depends on the preceding ones. However, this assumption ignores the potential benefits of using the full seque…
Contextual Bandit Applications in a Customer Support Bot
Virtual support agents have grown in popularity as a way for businesses to provide better and more accessible customer service. Some challenges in this domain include ambiguous user queries as well as changing support topics and user be…
Off-policy Confidence Sequences
We develop confidence bounds that hold uniformly over time for off-policy evaluation in the contextual bandit setting. These confidence sequences are based on recent ideas from martingale analysis and are non-asymptotic, non-parametric, an…
Empirical Likelihood for Contextual Bandits
We propose an estimator and confidence interval for computing the value of a policy from off-policy data in the contextual bandit setting. To this end we apply empirical likelihood techniques to formulate our estimator and confidence inter…
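The empirical-likelihood machinery itself is involved, but the baseline it builds on, inverse propensity scoring (IPS), is easy to state: reweight each logged reward by the ratio of the target policy's action probability to the logging propensity. The sketch below uses an invented log format of (context, action, propensity, reward) tuples; it is the standard IPS estimator, not the paper's estimator.

```python
def ips_value(logged, target_policy):
    """Inverse propensity scoring estimate of a target policy's value from
    off-policy logs. `logged` holds (context, action, propensity, reward)
    tuples; `target_policy(x, a)` returns the target policy's probability
    of choosing action a in context x."""
    n = len(logged)
    return sum(target_policy(x, a) / p * r for x, a, p, r in logged) / n
```

For example, with a uniform logging policy over two actions and a reward of 1 only for action 1, IPS recovers value 1.0 for the deterministic play-action-1 policy and 0.5 for the uniform policy, matching their true values.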
Lessons from Real-World Reinforcement Learning in a Customer Support Bot
In this work, we describe practical lessons we have learned from successfully using contextual bandits (CBs) to improve key business metrics of the Microsoft Virtual Agent for customer support. While our current use cases focus on single s…
Lessons from Contextual Bandit Learning in a Customer Support Bot
In this work, we describe practical lessons we have learned from successfully using contextual bandits (CBs) to improve key business metrics of the Microsoft Virtual Agent for customer support. While our current use cases focus on single s…
Extreme classification under limited space and time budget
Schedae Informaticae, Volume 25, 2016.
Gradient Coding
We propose a novel coding theoretic framework for mitigating stragglers in distributed learning. We show how carefully replicating data blocks and coding across gradients can provide tolerance to failures and stragglers for Synchronous Gra…
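The idea can be made concrete with the smallest nontrivial instance: three workers, one straggler tolerated. Each worker stores two of the three data blocks and sends a single coded combination of their gradients, chosen so that any two workers' messages linearly combine to the full gradient sum. The coefficients below are one valid choice assumed for illustration (in the spirit of the paper's small example), not the paper's general construction.

```python
# Gradient coding sketch: 3 workers, each stores 2 of 3 data blocks and sends
# one coded combination of the block gradients; the full gradient sum is
# recoverable from ANY 2 workers, so one straggler can simply be ignored.

def worker_messages(g):
    """g = [g1, g2, g3], the per-block gradients (scalars for simplicity)."""
    g1, g2, g3 = g
    return [0.5 * g1 + g2,   # worker 0 holds blocks 1, 2
            g2 - g3,         # worker 1 holds blocks 2, 3
            0.5 * g1 + g3]   # worker 2 holds blocks 1, 3

# For each pair of surviving workers, the linear combination of their
# messages that equals g1 + g2 + g3.
DECODE = {(0, 1): (2.0, -1.0), (0, 2): (1.0, 1.0), (1, 2): (1.0, 2.0)}

def recover_sum(messages, alive):
    """Recover the total gradient from any two non-straggling workers."""
    i, j = sorted(alive)
    a, b = DECODE[(i, j)]
    return a * messages[i] + b * messages[j]
```

One can check by hand, e.g. for the pair (0, 1): 2*(g1/2 + g2) - (g2 - g3) = g1 + g2 + g3, and similarly for the other two pairs, so whichever worker straggles, the aggregator still obtains the exact gradient sum.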
Log-time and Log-space Extreme Classification
We present LTLS, a technique for multiclass and multilabel prediction that can perform training and inference in logarithmic time and space. LTLS embeds large classification problems into simple structured prediction problems and relies on…
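LTLS itself embeds classes as paths in a trellis graph; the simplest flavor of the log-time, log-space idea is to predict each bit of the class index with one binary scorer, so K classes need only ceil(log2 K) scores at inference time. The sketch below shows that reduction with stand-in scorers (in practice these would be learned binary classifiers); it illustrates the principle, not the LTLS construction.

```python
import math

def class_to_bits(y, k):
    """Binary code for label y among k classes: the ceil(log2 k) bits of y
    become the targets for ceil(log2 k) binary learning problems."""
    n_bits = max(1, math.ceil(math.log2(k)))
    return [(y >> b) & 1 for b in range(n_bits)]

def predict(scorers, x):
    """O(log k) inference: one binary decision per bit of the class index.
    `scorers[b](x) > 0` means 'bit b of the label is 1'."""
    y = 0
    for b, score in enumerate(scorers):
        if score(x) > 0:
            y |= 1 << b
    return y
```

Storage is also logarithmic: instead of k one-vs-all models, only ceil(log2 k) scorers are kept, which is what makes extreme label spaces tractable.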
Logarithmic Time One-Against-Some
We create a new online reduction of multiclass classification to binary classification for which training and prediction time scale logarithmically with the number of classes. Compared to previous approaches, we obtain substantially better…
Active Information Acquisition
We propose a general framework for sequential and dynamic acquisition of useful information in order to solve a particular task. While our goal could in principle be tackled by general reinforcement learning, our particular setting is cons…