Junze Yin
Support Basis: Fast Attention Beyond Bounded Entries
The quadratic complexity of softmax attention remains a central bottleneck in scaling large language models (LLMs). [Alman and Song, NeurIPS 2023] proposed a sub-quadratic attention approximation algorithm, but it works only under the rest…
CoVE: Compressed Vocabulary Expansion Makes Better LLM-based Recommender Systems
Recommender systems play a pivotal role in providing relevant content to users. With the rapid development of large language models (LLMs), researchers have begun utilizing LLMs to build more powerful recommender systems. However, existing…
Breaking the Frozen Subspace: Importance Sampling for Low-Rank Optimization in LLM Pretraining
Low-rank optimization has emerged as a promising approach to enabling memory-efficient training of large language models (LLMs). Existing low-rank optimization methods typically project gradients onto a low-rank subspace, reducing the memo…
Inverting the Leverage Score Gradient: An Efficient Approximate Newton Method
Leverage scores have become essential in statistics and machine learning, aiding regression analysis, randomized matrix computations, and various other tasks. This paper delves into the inverse problem, aiming to recover the intrinsic mode…
Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers
The self-attention mechanism is the key to the success of transformers in recent Large Language Models (LLMs). However, the quadratic computational cost $O(n^2)$ in the input sequence length $n$ is a notorious obstacle for further improvem…
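To make the bottleneck concrete, here is a minimal NumPy sketch of standard softmax attention, not the paper's algorithm: it materializes the full $n \times n$ score matrix, which is exactly the $O(n^2)$ cost in question. The function name and shapes are illustrative.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Naive softmax attention; builds an n x n score matrix (O(n^2))."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                 # n x n: the quadratic bottleneck
    scores -= scores.max(axis=1, keepdims=True)   # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # rows sum to 1
    return weights @ V                             # n x d output

rng = np.random.default_rng(0)
n, d = 8, 4
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = softmax_attention(Q, K, V)
print(out.shape)  # (8, 4)
```

Any sub-quadratic approximation must avoid forming the `scores` matrix explicitly; the abstract above concerns when that is possible without boundedness assumptions on the entries.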
How to Inverting the Leverage Score Distribution?
Leverage scores are fundamental in machine learning and theoretical computer science, with extensive applications in regression analysis, randomized algorithms, and neural network inversion. Although leverage scores are widely use…
Local Convergence of Approximate Newton Method for Two Layer Nonlinear Regression
There have been significant advancements made by large language models (LLMs) in various aspects of our daily lives. LLMs serve as a transformative force in natural language processing, finding applications in text generation, translation,…
The Expressibility of Polynomial based Attention Scheme
Large language models (LLMs) have significantly improved various aspects of our daily lives. These models have impacted numerous domains, from healthcare to education, enhancing productivity, decision-making processes, and accessibility. A…
A Unified Scheme of ResNet and Softmax
Large language models (LLMs) have brought significant changes to human society. Softmax regression and residual neural networks (ResNet) are two important techniques in deep learning: they not only serve as significant theoretical componen…
A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time
Large language models (LLMs) have played a pivotal role in revolutionizing various facets of our daily existence. Solving attention regression is a fundamental task in optimizing LLMs. In this work, we focus on giving a provable guarantee …
Solving Attention Kernel Regression Problem via Pre-conditioner
The attention mechanism is the key to large language models, and the attention matrix serves as an algorithmic and computational bottleneck for such a scheme. In this paper, we define two problems, motivated by designing fast algorithms fo…
Efficient Alternating Minimization with Applications to Weighted Low Rank Approximation
Weighted low rank approximation is a fundamental problem in numerical linear algebra, and it has many applications in machine learning. Given a matrix $M \in \mathbb{R}^{n \times n}$, a non-negative weight matrix $W \in \mathbb{R}_{\geq 0}…
Query Complexity of Active Learning for Function Family With Nearly Orthogonal Basis
Many machine learning algorithms require large amounts of labeled data to deliver state-of-the-art results. In applications such as medical diagnosis and fraud detection, though there is an abundance of unlabeled data, it is costly to labe…
Faster Robust Tensor Power Method for Arbitrary Order
Tensor decomposition is a fundamental method used in various areas to deal with high-dimensional data. The \emph{tensor power method} (TPM) is one of the most widely used techniques for decomposing tensors. This paper presents a novel tenso…
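For context, a minimal sketch of the classical order-3 tensor power iteration, i.e. the textbook TPM rather than the paper's faster robust variant; all names here are illustrative. Each step contracts the tensor with the current vector twice and renormalizes, the tensor analogue of matrix power iteration.

```python
import numpy as np

def tensor_power_iteration(T, num_iters=100, seed=0):
    """Classical TPM for a symmetric order-3 tensor: v <- T(I, v, v), normalized."""
    n = T.shape[0]
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        w = np.einsum('ijk,j,k->i', T, v, v)   # contract T with v in two modes
        v = w / np.linalg.norm(w)
    lam = np.einsum('ijk,i,j,k->', T, v, v, v)  # eigenvalue estimate
    return lam, v

# Rank-1 symmetric test tensor T = 2 * u (x) u (x) u with ||u|| = 1.
u = np.array([0.6, 0.8, 0.0])
T = 2.0 * np.einsum('i,j,k->ijk', u, u, u)
lam, v = tensor_power_iteration(T)
print(round(lam, 6))  # 2.0
```

On a rank-1 tensor the iteration locks onto the component after a single step; robust variants such as the one studied above must also handle noise and higher-rank structure.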
Federated Empirical Risk Minimization via Second-Order Method
Many convex optimization problems with important applications in machine learning are formulated as empirical risk minimization (ERM). There are several examples: linear and logistic regression, LASSO, kernel regression, quantile regressio…
An Iterative Algorithm for Rescaled Hyperbolic Functions Regression
Large language models (LLMs) have numerous real-life applications across various domains, such as natural language translation, sentiment analysis, language modeling, chatbots and conversational agents, creative writing, text classificatio…
Low Rank Matrix Completion via Robust Alternating Minimization in Nearly Linear Time
Given a matrix $M\in \mathbb{R}^{m\times n}$, the low rank matrix completion problem asks us to find a rank-$k$ approximation of $M$ as $UV^\top$ for $U\in \mathbb{R}^{m\times k}$ and $V\in \mathbb{R}^{n\times k}$ by only observing a few e…
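For illustration, a plain alternating-least-squares sketch of this completion problem. The paper's algorithm is a robust, nearly-linear-time variant; this naive version only shows the $UV^\top$ factorization being fit to the observed entries, with each alternating step reducing to small per-row least-squares solves.

```python
import numpy as np

def alt_min_completion(M, mask, k, num_iters=100, seed=0):
    """Naive alternating minimization: fit U V^T to the entries where mask is True."""
    m, n = M.shape
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((m, k))
    V = rng.standard_normal((n, k))
    for _ in range(num_iters):
        # Fix V; each row of U solves a small least-squares problem.
        for i in range(m):
            obs = mask[i]
            if obs.any():
                U[i], *_ = np.linalg.lstsq(V[obs], M[i, obs], rcond=None)
        # Fix U; each row of V solves a small least-squares problem.
        for j in range(n):
            obs = mask[:, j]
            if obs.any():
                V[j], *_ = np.linalg.lstsq(U[obs], M[obs, j], rcond=None)
    return U, V

# Recover a random rank-2 matrix from ~70% of its entries.
rng = np.random.default_rng(1)
M = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 15))
mask = rng.random(M.shape) < 0.7
U, V = alt_min_completion(M, mask, k=2)
err = np.linalg.norm(U @ V.T - M) / np.linalg.norm(M)
print(f"relative error: {err:.2e}")
```

The robustness question addressed above is, roughly, how such updates behave when each least-squares solve is only computed approximately.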
A Nearly-Optimal Bound for Fast Regression with $\ell_\infty$ Guarantee
Given a matrix $A\in \mathbb{R}^{n\times d}$ and a vector $b\in \mathbb{R}^n$, we consider the regression problem with $\ell_\infty$ guarantees: finding a vector $x'\in \mathbb{R}^d$ such that $\|x'-x^*\|_\infty \leq \frac{\epsilon}{\sqrt{d}}\cdot…