Eugene Kwek
COMPACT: Common-token Optimized Model Pruning Across Channels and Tokens
Making large language models (LLMs) more efficient in memory, latency, and serving cost is crucial for edge deployment, interactive applications, and sustainable inference at scale. Pruning is a promising technique, but existing pruning me…