Junze Yin
Support Basis: Fast Attention Beyond Bounded Entries
The quadratic complexity of softmax attention remains a central bottleneck in scaling large language models (LLMs). [Alman and Song, NeurIPS 2023] proposed a sub-quadratic attention approximation algorithm, but it works only under the rest…
CoVE: Compressed Vocabulary Expansion Makes Better LLM-based Recommender Systems
Recommender systems play a pivotal role in providing relevant content to users. With the rapid development of large language models (LLMs), researchers have begun utilizing LLMs to build more powerful recommender systems. However, existing…
Breaking the Frozen Subspace: Importance Sampling for Low-Rank Optimization in LLM Pretraining
Low-rank optimization has emerged as a promising approach to enabling memory-efficient training of large language models (LLMs). Existing low-rank optimization methods typically project gradients onto a low-rank subspace, reducing the memo…
Inverting the Leverage Score Gradient: An Efficient Approximate Newton Method
Leverage scores have become essential in statistics and machine learning, aiding regression analysis, randomized matrix computations, and various other tasks. This paper delves into the inverse problem, aiming to recover the intrinsic mode…
Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers
The self-attention mechanism is the key to the success of transformers in recent Large Language Models (LLMs). However, the quadratic computational cost $O(n^2)$ in the input sequence length $n$ is a notorious obstacle for further improvem…
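To make the bottleneck concrete, here is a minimal NumPy sketch of standard softmax attention, not the paper's algorithm: it materializes the full $n \times n$ score matrix, which is exactly the $O(n^2)$ cost in question. The function name and shapes are illustrative.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Naive softmax attention; builds an n x n score matrix (O(n^2))."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                 # n x n: the quadratic bottleneck
    scores -= scores.max(axis=1, keepdims=True)   # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # rows sum to 1
    return weights @ V                             # n x d output

rng = np.random.default_rng(0)
n, d = 8, 4
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = softmax_attention(Q, K, V)
print(out.shape)  # (8, 4)
```

Any sub-quadratic approximation must avoid forming the `scores` matrix explicitly; the abstract above concerns when that is possible without boundedness assumptions on the entries.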
How to Inverting the Leverage Score Distribution?
Leverage scores are fundamental in machine learning and theoretical computer science, with extensive applications in regression analysis, randomized algorithms, and neural network inversion. Although leverage scores are widely use…
Local Convergence of Approximate Newton Method for Two Layer Nonlinear Regression
There have been significant advancements made by large language models (LLMs) in various aspects of our daily lives. LLMs serve as a transformative force in natural language processing, finding applications in text generation, translation,…
The Expressibility of Polynomial based Attention Scheme
Large language models (LLMs) have significantly improved various aspects of our daily lives. These models have impacted numerous domains, from healthcare to education, enhancing productivity, decision-making processes, and accessibility. A…
A Unified Scheme of ResNet and Softmax
Large language models (LLMs) have brought significant changes to human society. Softmax regression and residual neural networks (ResNet) are two important techniques in deep learning: they not only serve as significant theoretical componen…
A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time
Large language models (LLMs) have played a pivotal role in revolutionizing various facets of our daily existence. Solving attention regression is a fundamental task in optimizing LLMs. In this work, we focus on giving a provable guarantee …
Solving Attention Kernel Regression Problem via Pre-conditioner
The attention mechanism is the key to large language models, and the attention matrix serves as an algorithmic and computational bottleneck for such a scheme. In this paper, we define two problems, motivated by designing fast algorithms fo…
Efficient Alternating Minimization with Applications to Weighted Low Rank Approximation
Weighted low rank approximation is a fundamental problem in numerical linear algebra, and it has many applications in machine learning. Given a matrix $M \in \mathbb{R}^{n \times n}$, a non-negative weight matrix $W \in \mathbb{R}_{\geq 0}…
Query Complexity of Active Learning for Function Family With Nearly Orthogonal Basis
Many machine learning algorithms require large amounts of labeled data to deliver state-of-the-art results. In applications such as medical diagnosis and fraud detection, though there is an abundance of unlabeled data, it is costly to labe…
Faster Robust Tensor Power Method for Arbitrary Order
Tensor decomposition is a fundamental method used in various areas to deal with high-dimensional data. The \emph{tensor power method} (TPM) is one of the most widely used techniques for decomposing tensors. This paper presents a novel tenso…
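For context, a minimal sketch of the classical order-3 tensor power iteration, i.e. the textbook TPM rather than the paper's faster robust variant; all names here are illustrative. Each step contracts the tensor with the current vector twice and renormalizes, the tensor analogue of matrix power iteration.

```python
import numpy as np

def tensor_power_iteration(T, num_iters=100, seed=0):
    """Classical TPM for a symmetric order-3 tensor: v <- T(I, v, v), normalized."""
    n = T.shape[0]
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        w = np.einsum('ijk,j,k->i', T, v, v)   # contract T with v in two modes
        v = w / np.linalg.norm(w)
    lam = np.einsum('ijk,i,j,k->', T, v, v, v)  # eigenvalue estimate
    return lam, v

# Rank-1 symmetric test tensor T = 2 * u (x) u (x) u with ||u|| = 1.
u = np.array([0.6, 0.8, 0.0])
T = 2.0 * np.einsum('i,j,k->ijk', u, u, u)
lam, v = tensor_power_iteration(T)
print(round(lam, 6))  # 2.0
```

On a rank-1 tensor the iteration locks onto the component after a single step; robust variants such as the one studied above must also handle noise and higher-rank structure.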
Federated Empirical Risk Minimization via Second-Order Method
Many convex optimization problems with important applications in machine learning are formulated as empirical risk minimization (ERM). There are several examples: linear and logistic regression, LASSO, kernel regression, quantile regressio…
An Iterative Algorithm for Rescaled Hyperbolic Functions Regression
Large language models (LLMs) have numerous real-life applications across various domains, such as natural language translation, sentiment analysis, language modeling, chatbots and conversational agents, creative writing, text classificatio…
Low Rank Matrix Completion via Robust Alternating Minimization in Nearly Linear Time
Given a matrix $M\in \mathbb{R}^{m\times n}$, the low rank matrix completion problem asks us to find a rank-$k$ approximation of $M$ as $UV^\top$ for $U\in \mathbb{R}^{m\times k}$ and $V\in \mathbb{R}^{n\times k}$ by only observing a few e…
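For illustration, a plain alternating-least-squares sketch of this completion problem. The paper's algorithm is a robust, nearly-linear-time variant; this naive version only shows the $UV^\top$ factorization being fit to the observed entries, with each alternating step reducing to small per-row least-squares solves.

```python
import numpy as np

def alt_min_completion(M, mask, k, num_iters=100, seed=0):
    """Naive alternating minimization: fit U V^T to the entries where mask is True."""
    m, n = M.shape
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((m, k))
    V = rng.standard_normal((n, k))
    for _ in range(num_iters):
        # Fix V; each row of U solves a small least-squares problem.
        for i in range(m):
            obs = mask[i]
            if obs.any():
                U[i], *_ = np.linalg.lstsq(V[obs], M[i, obs], rcond=None)
        # Fix U; each row of V solves a small least-squares problem.
        for j in range(n):
            obs = mask[:, j]
            if obs.any():
                V[j], *_ = np.linalg.lstsq(U[obs], M[obs, j], rcond=None)
    return U, V

# Recover a random rank-2 matrix from ~70% of its entries.
rng = np.random.default_rng(1)
M = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 15))
mask = rng.random(M.shape) < 0.7
U, V = alt_min_completion(M, mask, k=2)
err = np.linalg.norm(U @ V.T - M) / np.linalg.norm(M)
print(f"relative error: {err:.2e}")
```

The robustness question addressed above is, roughly, how such updates behave when each least-squares solve is only computed approximately.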
A Nearly-Optimal Bound for Fast Regression with $\ell_\infty$ Guarantee
Given a matrix $A\in \mathbb{R}^{n\times d}$ and a vector $b\in \mathbb{R}^n$, we consider the regression problem with $\ell_\infty$ guarantees: finding a vector $x'\in \mathbb{R}^d$ such that $\|x'-x^*\|_\infty \leq \frac{\epsilon}{\sqrt{d}}\cdot…