Kshitiz Malik
Effective Long-Context Scaling of Foundation Models
We present a series of long-context LLMs that support effective context windows of up to 32,768 tokens. Our model series are built through continual pretraining from Llama 2 with longer training sequences and on a dataset where long texts …
Where to Begin? On the Impact of Pre-Training and Initialization in Federated Learning
An oft-cited challenge of federated learning is the presence of heterogeneity. Data heterogeneity refers to the fact that data from different clients may follow very different distributions. System heterogeneity refers to the…
Federated Learning with Partial Model Personalization
We consider two federated learning algorithms for training partially personalized models, where the shared and personal parameters are updated either simultaneously or alternately on the devices. Both algorithms have been proposed in the l…
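A minimal sketch of the two update schemes mentioned above, assuming a PyTorch model whose parameters have already been split into a shared group and a personal group; the helper name local_round, the SGD optimizers, and the learning rate are illustrative assumptions, not the paper's implementation.

import torch

def local_round(model, shared_params, personal_params, loader, loss_fn,
                lr=0.01, alternating=True):
    # Separate optimizers for the shared and personal parameter groups.
    opt_shared = torch.optim.SGD(shared_params, lr=lr)
    opt_personal = torch.optim.SGD(personal_params, lr=lr)
    for x, y in loader:
        if alternating:
            # Alternating scheme: update the personal parameters first,
            # then the shared ones, each with its own backward pass.
            for opt in (opt_personal, opt_shared):
                opt_personal.zero_grad()
                opt_shared.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()
        else:
            # Simultaneous scheme: one backward pass updates both groups.
            opt_personal.zero_grad()
            opt_shared.zero_grad()
            loss_fn(model(x), y).backward()
            opt_personal.step()
            opt_shared.step()
    # Only the shared parameters would be sent back to the server;
    # the personal parameters stay on the device.
    return [p.detach().clone() for p in shared_params]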
FedSynth: Gradient Compression via Synthetic Data in Federated Learning
Model compression is important in federated learning (FL) with large models to reduce communication cost. Prior works have focused on sparsification-based compression, which can drastically affect the global model accuracy. In this w…
Papaya: Practical, Private, and Scalable Federated Learning
Cross-device Federated Learning (FL) is a distributed learning paradigm with several challenges that differentiate it from traditional distributed learning: variability in the system characteristics on each device, and millions of clients …
Federated Learning with Buffered Asynchronous Aggregation
Scalability and privacy are two critical concerns for cross-device federated learning (FL) systems. In this work, we identify that synchronous FL (synchronized aggregation of client updates) cannot scale efficiently beyond a few hu…
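A minimal sketch of the buffered-aggregation idea referenced in the title, assuming the global model is a plain NumPy parameter vector; the class name BufferedAsyncServer, the buffer size, and the simple averaging rule are illustrative choices rather than the paper's algorithm.

import numpy as np

class BufferedAsyncServer:
    def __init__(self, global_model, buffer_size=10, server_lr=1.0):
        self.global_model = np.asarray(global_model, dtype=float)
        self.buffer_size = buffer_size
        self.server_lr = server_lr
        self._buffer = []

    def receive_update(self, client_delta):
        """Called whenever any client finishes local training, in any order."""
        self._buffer.append(np.asarray(client_delta, dtype=float))
        if len(self._buffer) >= self.buffer_size:
            self._apply_buffer()

    def _apply_buffer(self):
        # Average the buffered client deltas and take one server step,
        # then clear the buffer for the next group of asynchronous arrivals.
        mean_delta = np.mean(self._buffer, axis=0)
        self.global_model = self.global_model + self.server_lr * mean_delta
        self._buffer.clear()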
Active Federated Learning
Federated Learning allows population-level models to be trained without centralizing client data by transmitting the global model to clients, calculating gradients locally, then averaging the gradients. Downloading models and uploading…
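The basic loop described in this snippet (send the global model to clients, compute gradients locally, average them) can be sketched as below; the helper local_gradient, the uniform client sampling, and the learning rate are assumptions for illustration, and this is not Active Federated Learning's client-selection strategy itself.

import random
import numpy as np

def federated_round(global_weights, clients, local_gradient, num_sampled=10, lr=0.1):
    # Sample a subset of clients for this round (uniformly, for illustration).
    sampled = random.sample(clients, k=min(num_sampled, len(clients)))
    # Each sampled client receives the current global model and returns a
    # gradient computed on its own local data, which never leaves the device.
    grads = [local_gradient(global_weights, client) for client in sampled]
    # The server averages the local gradients and takes one update step.
    avg_grad = np.mean(grads, axis=0)
    return global_weights - lr * avg_grad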
Federated User Representation Learning
Collaborative personalization, such as through learned user representations (embeddings), can improve the prediction accuracy of neural-network-based models significantly. We propose Federated User Representation Learning (FURL), a simple,…
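A rough sketch of personalization through a learned per-user embedding, assuming a small PyTorch classifier; the architecture below and the note about keeping the embedding table on-device are illustrative assumptions, not necessarily FURL's exact formulation.

import torch
import torch.nn as nn

class PersonalizedClassifier(nn.Module):
    def __init__(self, num_users, feature_dim, user_dim=16, num_classes=2):
        super().__init__()
        # Per-user representation; in a federated setting this table could be
        # kept on-device while the rest of the model is shared and aggregated.
        self.user_embedding = nn.Embedding(num_users, user_dim)
        self.head = nn.Linear(feature_dim + user_dim, num_classes)

    def forward(self, features, user_ids):
        # Concatenate the user's learned representation with the input features.
        u = self.user_embedding(user_ids)
        return self.head(torch.cat([features, u], dim=-1))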