Xuansheng Wu
Soundness-Aware Level: A Microscopic Signature that Predicts LLM Reasoning Potential
Reinforcement learning with verifiable rewards (RLVR) can elicit strong reasoning in large language models (LLMs), while their performance after RLVR varies dramatically across different base models. This raises a fundamental question: wha…
Enhancing LLM Steering through Sparse Autoencoder-Based Vector Refinement
Steering has emerged as a promising approach in controlling large language models (LLMs) without modifying model parameters. However, most existing steering methods rely on large-scale datasets to learn clear behavioral information, which …
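For intuition, the simplest form of activation steering adds a scaled concept direction to a hidden state at inference time, leaving model weights untouched. The sketch below is a generic illustration only; the hook point, the scale alpha, and the source of the concept vector are assumptions, not this paper's refinement method:

```python
import numpy as np

def steer(hidden_state: np.ndarray, concept_vector: np.ndarray, alpha: float = 4.0) -> np.ndarray:
    """Shift a hidden state along a concept direction without modifying model parameters."""
    # Normalize the direction so alpha directly controls the magnitude of the shift.
    direction = concept_vector / np.linalg.norm(concept_vector)
    return hidden_state + alpha * direction

# Toy usage: an 8-dim "hidden state" nudged toward a random concept direction.
h = np.random.randn(8)
v = np.random.randn(8)
print(steer(h, v))
```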
Concept-Centric Token Interpretation for Vector-Quantized Generative Models
Vector-Quantized Generative Models (VQGMs) have emerged as powerful tools for image generation. However, the key component of VQGMs -- the codebook of discrete tokens -- is still not well understood, e.g., which tokens are critical to gene…
Denoising Concept Vectors with Sparse Autoencoders for Improved Language Model Steering
Linear concept vectors effectively steer LLMs, but existing methods suffer from noisy features in diverse datasets that undermine steering robustness. We propose Sparse Autoencoder-Denoised Concept Vectors (SDCV), which selectively keep th…
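The abstract truncates before naming the selection rule, so the following is only a hypothetical sketch of SAE-based vector denoising under one common assumption: project the concept vector into the SAE latent space, keep the top-k latents by activation magnitude, and decode back. Random matrices stand in for a trained SAE:

```python
import numpy as np

def denoise_concept_vector(v, W_enc, W_dec, k=32):
    """Encode a concept vector with an SAE, keep the top-k latents
    by magnitude, and decode the sparsified code back to model space."""
    z = np.maximum(W_enc @ v, 0.0)      # ReLU SAE encoding
    keep = np.argsort(np.abs(z))[-k:]   # indices of the top-k latents
    mask = np.zeros_like(z)
    mask[keep] = 1.0
    return W_dec @ (z * mask)           # decode only the kept latents

# Toy usage with random weights standing in for a trained SAE.
d_model, d_sae = 64, 512
rng = np.random.default_rng(0)
W_enc = rng.normal(size=(d_sae, d_model))
W_dec = rng.normal(size=(d_model, d_sae))
v = rng.normal(size=d_model)
print(denoise_concept_vector(v, W_enc, W_dec).shape)  # (64,)
```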
Self-Regularization with Sparse Autoencoders for Controllable LLM-based Classification
Modern text classification methods heavily rely on contextual embeddings from large language models (LLMs). Compared to human-engineered features, these embeddings provide automatic and effective representations for classification model tr…
Beyond Input Activations: Identifying Influential Latents by Gradient Sparse Autoencoders
A Survey on Sparse Autoencoders: Interpreting the Internal Mechanisms of Large Language Models
LMOD: A Large Multimodal Ophthalmology Dataset and Benchmark for Large Vision-Language Models
The prevalence of vision-threatening eye diseases is a significant global burden, with many cases remaining undiagnosed or diagnosed too late for effective treatment. Large vision-language models (LVLMs) have the potential to assist in und…
Unveiling Scoring Processes: Dissecting the Differences between LLMs and Human Graders in Automatic Scoring
Large language models (LLMs) have demonstrated strong potential in performing automatic scoring for constructed response assessments. While constructed responses graded by humans are usually based on given grading rubrics, the methods by w…
DIRECT: Dual Interpretable Recommendation with Multi-aspect Word Attribution
Recommending products to users with intuitive explanations helps improve the system in transparency, persuasiveness, and satisfaction. Existing interpretation techniques include post hoc methods and interpretable modeling. The former categ…
Retrieval-enhanced Knowledge Editing in Language Models for Multi-Hop Question Answering
Large Language Models (LLMs) have shown proficiency in question-answering tasks but often struggle to integrate real-time knowledge, leading to potentially outdated or inaccurate responses. This problem becomes even more challenging when d…
InFoBench: Evaluating Instruction Following Ability in Large Language Models
This paper introduces the Decomposed Requirements Following Ratio (DRFR), a new metric for evaluating Large Language Models' (LLMs) ability to follow instructions. Addressing a gap in current methodologies, DRFR breaks down complex instruc…
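As a ratio over decomposed requirements, the metric itself is simple to compute. A minimal sketch, assuming a per-response ratio over boolean requirement judgments (the paper's exact aggregation across instructions is not visible in the truncated abstract):

```python
def drfr(requirement_judgments: list[bool]) -> float:
    """Decomposed Requirements Following Ratio: the fraction of an
    instruction's decomposed requirements that a response satisfies."""
    if not requirement_judgments:
        raise ValueError("need at least one requirement judgment")
    return sum(requirement_judgments) / len(requirement_judgments)

# Toy usage: 3 of 4 decomposed requirements satisfied -> DRFR of 0.75.
print(drfr([True, True, False, True]))
```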
Applying Large Language Models and Chain-of-Thought for Automatic Scoring
This study investigates the application of large language models (LLMs), specifically GPT-3.5 and GPT-4, with Chain-of-Thought (CoT) in the automatic scoring of student-written responses to science assessments. We focused on overcoming the …
Could Small Language Models Serve as Recommenders? Towards Data-centric Cold-start Recommendations
Recommendation systems help users find matched items based on their previous behaviors. Personalized recommendation becomes challenging in the absence of historical user-item interactions, a practical problem for startups known as the syst…
AGI: Artificial General Intelligence for Education
Artificial general intelligence (AGI) has gained global recognition as a future technology due to the emergence of breakthrough large language models and chatbots such as GPT-4 and ChatGPT, respectively. Compared to conventional AI models,…
Black-box Backdoor Defense via Zero-shot Image Purification
Backdoor attacks inject poisoned samples into the training data, resulting in the misclassification of the poisoned input during a model's deployment. Defending against such attacks is challenging, especially for real-world black-box model…
A Survey of Graph Prompting Methods: Techniques, Applications, and Challenges
The recent "pre-train, prompt, predict training" paradigm has gained popularity as a way to learn generalizable models with limited labeled data. The approach involves using a pre-trained model and a prompting function that applies a templ…
NoPPA: Non-Parametric Pairwise Attention Random Walk Model for Sentence Representation
We propose a novel non-parametric, untrainable language model, named the Non-Parametric Pairwise Attention Random Walk Model (NoPPA), to generate sentence embeddings using only pre-trained word embeddings and pre-counted word frequencies. To the be…
Matching Exemplar as Next Sentence Prediction (MeNSP): Zero-shot Prompt Learning for Automatic Scoring in Science Education
Developing models to automatically score students' written responses to science problems is critical for science education. However, collecting and labeling sufficient student responses for training models is time-consuming and costly. Recen…