Siliang Zeng
Research hotspots and current status of iliotibial band studies: A bibliometric analysis (1934–2023) Open
Background: The iliotibial band (ITB), a unique musculoskeletal structure, is gaining interest in various fields such as biomechanics, orthopedics, sports medicine, and rehabilitation medicine. The pathogenesis, prevention, and treatment o…
Bridging the Training-Inference Gap in LLMs by Leveraging Self-Generated Tokens Open
Language models are often trained to maximize the likelihood of the next token given past tokens in the training dataset. However, during inference time, they are utilized differently, generating text sequentially and auto-regressively by …
A real-world, cross-sectional, and longitudinal study on high-risk human papillomavirus genotype distribution in 31,942 women in Dongguan, China Open
Background Persistent human papillomavirus (HPV) infection remains a key risk factor for cervical cancer. HPV-based primary screening is widely recommended in clinical guidelines, and further longitudinal studies are needed to optimize str…
Learning Reward and Policy Jointly from Demonstration and Preference Improves Alignment Open
Aligning human preference and value is an important requirement for building contemporary foundation models and embodied AI. However, popular approaches such as reinforcement learning with human feedback (RLHF) break down the task into suc…
Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment Open
Aligning human preference and value is an important requirement for contemporary foundation models. State-of-the-art techniques such as Reinforcement Learning from Human Feedback (RLHF) often consist of two stages: 1) supervised fine-tunin…
A Bayesian Approach to Robust Inverse Reinforcement Learning Open
We consider a Bayesian approach to offline model-based inverse reinforcement learning (IRL). The proposed framework differs from existing offline model-based IRL approaches by performing simultaneous estimation of the expert's reward funct…
When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning Open
Offline inverse reinforcement learning (Offline IRL) aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent. Accurate models of experti…
Structural Estimation of Markov Decision Processes in High-Dimensional State Space with Finite-Time Guarantees Open
We consider the task of estimating a structural model of dynamic decisions by a human agent based upon the observable history of implemented actions and visited states. This problem has an inherent nested structure: in the inner problem, a…
Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees Open
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy that best fits observed sequences of states and actions implemented by an expert. Many algorithms for IRL have an inherently nested …
Prevalence of and risk factors for diabetic retinopathy in residents with different types of abnormal glucose metabolism with or without hypertension: A suburban community-based cross-sectional study Open
Aims The present study examined the prevalence and risk factors for diabetic retinopathy (DR) in residents with abnormal glucose metabolism in a community. Methods 6029 subjects were included and underwent standardized interviews and compr…
Manual therapy's marginal utility Open
A brief idea
Learning to Coordinate in Multi-Agent Systems: A Coordinated Actor-Critic Algorithm and Finite-Time Guarantees Open
Multi-agent reinforcement learning (MARL) has attracted much research attention recently. However, unlike its single-agent counterpart, many theoretical and algorithmic aspects of MARL have not been well-understood. In this paper, we study…
A Near-Optimal Algorithm for Stochastic Bilevel Optimization via Double-Momentum Open
This paper proposes a new algorithm -- the Single-timescale Double-momentum Stochastic Approximation (SUSTAIN) -- for tackling stochastic unconstrained bilevel optimiz…
A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm for Bilevel Optimization Open
This paper proposes a new algorithm -- the Momentum-assisted Single-timescale Stochastic Approximation (MSTSA) -- for tackling unconstrained bilevel optimization problems. We focus on bilevel problems where the lower level subproblem is st…
On the Divergence of Decentralized Non-Convex Optimization Open
We study a generic class of decentralized algorithms in which $N$ agents jointly optimize the non-convex objective $f(u):=1/N\sum_{i=1}^{N}f_i(u)$, while only communicating with their neighbors. This class of problems has become popular in…
Laboratory astrophysics with laser-driven strong magnetic fields in China Open
In this paper, the recent studies of laboratory astrophysics with strong magnetic fields in China have been reviewed. On the Shenguang-II laser facility of the National Laboratory on High-Power Lasers and Physics, a laser-driven strong mag…
Chirp-free isolated attosecond pulse generation from an atom irradiated by a fundamental terahertz pulse synchronizing an infrared laser pulse Open
We theoretically study high-order harmonic generation (HHG) and attosecond pulses from an atom irradiated synchronically by a terahertz (THz) pulse and an infrared laser pulse. For the HHG spectrum from the THz pulse alone and the combined…