Hessian matrix
New parton distribution functions from a global analysis of quantum chromodynamics
Here, we present new parton distribution functions (PDFs) up to next-to-next-to-leading order (NNLO) from the CTEQ-TEA global analysis of quantum chromodynamics. These differ from previous CT PDFs in several respects, including the use of …
PDF4LHC recommendations for LHC Run II
We provide an updated recommendation for the usage of sets of parton distribution functions (PDFs) and the assessment of PDF and PDF+$\alpha_s$ uncertainties suitable for applications at the LHC Run II. We review developments since th…
Understanding Black-box Predictions via Influence Functions
How can we explain the predictions of a black-box model? In this paper, we use influence functions -- a classic technique from robust statistics -- to trace a model's prediction through the learning algorithm and back to its training data,…
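The influence-function recipe summarized in this abstract is easy to check on ordinary least squares, where the estimated effect of deleting one training point can be compared against exact retraining. A minimal sketch on toy data (the variable names and data are illustrative, not from the paper):

```python
import numpy as np

# Toy regression data
rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

# Fit: minimize the mean squared loss R(theta) = (1/n) * sum_i (x_i . theta - y_i)^2
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)
H = 2.0 / n * X.T @ X                       # Hessian of the mean loss

# Influence approximation: theta_{-k} ~= theta_hat + (1/n) H^{-1} grad L_k(theta_hat)
k = 0
grad_k = 2.0 * (X[k] @ theta_hat - y[k]) * X[k]
theta_loo_approx = theta_hat + np.linalg.solve(H, grad_k) / n

# Exact leave-one-out refit for comparison
X_k, y_k = np.delete(X, k, axis=0), np.delete(y, k)
theta_loo = np.linalg.solve(X_k.T @ X_k, X_k.T @ y_k)
```

The first-order estimate tracks the exact refit closely because the squared loss makes the curvature nearly constant; for deep models the same formula is applied with stochastic Hessian-vector products.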
Interpretation of Neural Networks Is Fragile
In order for machine learning to be trusted in many applications, it is critical to be able to reliably explain why the machine learning algorithm makes certain predictions. For this reason, a variety of methods have been developed recentl…
New CTEQ global analysis of quantum chromodynamics with high-precision data from the LHC
We present the new parton distribution functions (PDFs) from the CTEQ-TEA collaboration, obtained using a wide variety of high-precision Large Hadron Collider (LHC) data, in addition to the combined HERA I + II deep-inelastic scattering da…
nCTEQ15: Global analysis of nuclear parton distributions with uncertainties in the CTEQ framework
We present the new nCTEQ15 set of nuclear parton distribution functions with uncertainties. This fit extends the CTEQ proton PDFs to include the nuclear dependence using data on nuclei all the way up to 208Pb. The uncertainties are determi…
Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
Transformer based architectures have become de-facto models used for a range of Natural Language Processing tasks. In particular, the BERT based models achieved significant accuracy gain for GLUE tasks, CoNLL-03 and SQuAD. However, BERT ba…
Optimizing Neural Networks with Kronecker-factored Approximate Curvature
We propose an efficient method for approximating natural gradient descent in neural networks which we call Kronecker-Factored Approximate Curvature (K-FAC). K-FAC is based on an efficiently invertible approximation of a neural network's Fi…
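The "efficiently invertible" part of K-FAC rests on a standard Kronecker identity, $(A \otimes G)^{-1} = A^{-1} \otimes G^{-1}$: inverting the curvature block of one weight matrix reduces to two small inverses. A minimal numerical check of that identity with toy factors (not K-FAC's actual Fisher statistics):

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy Kronecker factors: A plays the role of the input-activation second
# moment, G the back-propagated-gradient second moment, for a 4x3 weight.
a = rng.normal(size=(500, 4)); A = a.T @ a / 500 + 0.1 * np.eye(4)
g = rng.normal(size=(500, 3)); G = g.T @ g / 500 + 0.1 * np.eye(3)

F = np.kron(A, G)                   # full 12x12 curvature block
V = rng.normal(size=(4, 3))         # a gradient for the 4x3 weight matrix

# Naive step: invert the full Kronecker-structured block directly
step_full = np.linalg.solve(F, V.reshape(-1)).reshape(4, 3)

# K-FAC-style step: (A ⊗ G)^{-1} vec(V) = vec(A^{-1} V G^{-1})
# (row-major vec, symmetric A and G)
step_kfac = np.linalg.solve(A, V) @ np.linalg.inv(G)
```

The two steps agree to machine precision, but the factored version only ever inverts matrices the size of a layer's input and output dimensions.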
Global rates of convergence for nonconvex optimization on manifolds
We consider the minimization of a cost function $f$ on a manifold $\mathcal{M}$ using Riemannian gradient descent and Riemannian trust regions (RTR). We focus on satisfying necessary optimality conditions within a tolerance ε. Specifically, …
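Riemannian gradient descent is easiest to see on the simplest manifold: minimizing $x^\top A x$ over the unit sphere, where projecting the Euclidean gradient onto the tangent space and renormalizing (a retraction) drives $x$ toward the eigenvector of the smallest eigenvalue. An illustrative sketch of that scheme (not the paper's RTR algorithm):

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.normal(size=(5, 5))
A = (M + M.T) / 2                        # symmetric cost matrix

x = rng.normal(size=5)
x /= np.linalg.norm(x)                   # start on the unit sphere
eta = 0.05
for _ in range(20000):
    egrad = 2.0 * A @ x                  # Euclidean gradient of x^T A x
    rgrad = egrad - (x @ egrad) * x      # project onto the tangent space at x
    x -= eta * rgrad                     # gradient step along the tangent
    x /= np.linalg.norm(x)               # retract back onto the sphere

lam_min = np.linalg.eigvalsh(A)[0]       # optimal value of x^T A x on the sphere
```

Despite the nonconvexity, the iterates reach the global minimizer here; the paper's contribution is quantifying how fast such methods reach approximate stationarity in general.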
Faster Independent Component Analysis by Preconditioning With Hessian Approximations
Independent Component Analysis (ICA) is a technique for unsupervised exploration of multi-channel data that is widely used in observational sciences. In its classic form, ICA relies on modeling the data as linear mixtures of non-Gaussian i…
The landscape of empirical risk for nonconvex losses
Most high-dimensional estimation methods propose to minimize a cost function (empirical risk) that is a sum of losses associated to each data point (each example). In this paper, we focus on the case of nonconvex losses. Classical empirica…
Simulated Annealing Algorithm for Deep Learning
Deep learning (DL) is a new area of research in machine learning, in which the objective is to move closer to the goal of artificial intelligence. This method can learn many levels of abstraction and representation to create a common sens…
RMSProp and equilibrated adaptive learning rates for non-convex optimization
Parameter-specific adaptive learning rate methods are computationally efficient ways to reduce the ill-conditioning problems encountered when training large deep networks. Following recent work that strongly suggests that most of the criti…
Stable Neo-Hookean Flesh Simulation
Nonlinear hyperelastic energies play a key role in capturing the fleshy appearance of virtual characters. Real-world, volume-preserving biological tissues have Poisson’s ratios near 1/2, but numerical simulation within this regime is notor…
Empirical Analysis of the Hessian of Over-Parametrized Neural Networks
We study the properties of common loss surfaces through their Hessian matrix. In particular, in the context of deep learning, we empirically show that the spectrum of the Hessian is composed of two parts: (1) the bulk centered near zero, (…
QuickFF: A program for a quick and easy derivation of force fields for metal-organic frameworks from ab initio input
QuickFF is a software package to derive accurate force fields for isolated and complex molecular systems in a quick and easy manner. Apart from its general applicability, the program has been designed to generate force fields for metal-org…
ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning
Incorporating second-order curvature information into machine learning optimization algorithms can be subtle, and doing so naïvely can lead to high per-iteration costs associated with forming the Hessian and performing the associated linea…
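A key ingredient in this line of work is estimating the Hessian diagonal from Hessian-vector products alone, via Hutchinson's estimator: for Rademacher vectors $v$, $\mathbb{E}[v \odot Hv] = \mathrm{diag}(H)$. A self-contained sketch with an explicit toy Hessian (in a real optimizer, $Hv$ would come from a second backward pass and no Hessian would ever be formed):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 6
M = rng.normal(size=(d, d))
H = M @ M.T                               # known SPD toy "Hessian"

def hvp(v):
    # Stand-in for a Hessian-vector product oracle.
    return H @ v

samples = 20000
est = np.zeros(d)
for _ in range(samples):
    v = rng.choice([-1.0, 1.0], size=d)   # Rademacher probe vector
    est += v * hvp(v)                     # E[v ⊙ Hv] = diag(H)
est /= samples
```

Averaging the elementwise products converges to the true diagonal, which is then usable as a per-parameter curvature signal.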
Super-convergence: very fast training of neural networks using large learning rates
In this paper, we describe a phenomenon, which we named "super-convergence", where neural networks can be trained an order of magnitude faster than with standard training methods. The existence of super-convergence is relevant to understan…
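Super-convergence is typically realized with a single triangular learning-rate cycle. A minimal schedule sketch, under the assumption of a symmetric linear ramp (one common reading of the recipe; the paper's exact schedule and hyperparameters differ):

```python
def one_cycle_lr(step, total_steps, max_lr, min_lr):
    """Triangular one-cycle schedule: ramp min_lr -> max_lr over the first
    half of training, then back down over the second half."""
    half = total_steps / 2.0
    if step <= half:
        return min_lr + (max_lr - min_lr) * step / half
    return max_lr - (max_lr - min_lr) * (step - half) / half
```

Using it just means setting the optimizer's learning rate to `one_cycle_lr(t, T, max_lr, min_lr)` at every training step `t`.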
Asynchronous Stochastic Gradient Descent with Delay Compensation
With the fast development of deep learning, it has become common to learn big neural networks using massive training data. Asynchronous Stochastic Gradient Descent (ASGD) is widely adopted to fulfill this task for its efficiency, which is,…
Global Optimality of Local Search for Low Rank Matrix Recovery
We show that there are no spurious local minima in the non-convex factorized parametrization of low-rank matrix recovery from incoherent linear measurements. With noisy measurements we show all local minima are very close to a global optim…
Representation-Free Model Predictive Control for Dynamic Motions in Quadrupeds
This paper presents a novel Representation-Free Model Predictive Control (RF-MPC) framework for controlling various dynamic motions of a quadrupedal robot in three dimensional (3D) space. Our formulation directly represents the rotational …
Equilibrated adaptive learning rates for non-convex optimization
Parameter-specific adaptive learning rate methods are computationally efficient ways to reduce the ill-conditioning problems encountered when training large deep networks. Following recent work that strongly suggests that most of the criti…
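The equilibration preconditioner studied in this line of work is $D_i = \sqrt{\mathbb{E}[(Hv)_i^2]}$ for Gaussian probes $v$, which equals the row norms of $H$ and needs only Hessian-vector products. A toy Monte-Carlo check with an explicit symmetric matrix (illustrative, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(4)
d = 5
M = rng.normal(size=(d, d))
H = (M + M.T) / 2.0                      # symmetric toy "Hessian"

samples = 50000
acc = np.zeros(d)
for _ in range(samples):
    v = rng.normal(size=d)               # Gaussian probe vector
    acc += (H @ v) ** 2                  # only Hessian-vector products needed
D = np.sqrt(acc / samples)               # equilibration preconditioner estimate

row_norms = np.linalg.norm(H, axis=1)    # exact value of sqrt(E[(Hv)_i^2])
```

Unlike a diagonal-Hessian preconditioner, $D$ stays positive even at saddle points with negative curvature, which is the motivation for equilibration in non-convex settings.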
Estimating the gradient and higher-order derivatives on quantum hardware
For a large class of variational quantum circuits, we show how arbitrary-order derivatives can be analytically evaluated in terms of simple parameter-shift rules, i.e., by running the same circuit with different shifts of the parameters…
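For gates whose generator has eigenvalues ±1/2, the circuit expectation has the form $f(\theta)=A\cos\theta+B\sin\theta+C$, and parameter-shift rules recover exact derivatives from shifted evaluations of $f$ itself. A hardware-free sketch on a function of that form (the coefficients are arbitrary toy values, not from the paper):

```python
import numpy as np

# Any expectation of this trigonometric form obeys the shift rules exactly.
A, B, C = 0.7, -0.2, 0.1
def f(theta):
    return A * np.cos(theta) + B * np.sin(theta) + C

theta = 0.37

# First derivative: two evaluations shifted by ±π/2
d1 = (f(theta + np.pi / 2) - f(theta - np.pi / 2)) / 2.0

# Second derivative: shifts of ±π plus the unshifted value
d2 = (f(theta + np.pi) - 2.0 * f(theta) + f(theta - np.pi)) / 4.0
```

Both formulas are exact (not finite-difference approximations), which is what makes them attractive on noisy hardware where only expectation values are accessible.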
Essentially No Barriers in Neural Network Energy Landscape
Training neural networks involves finding minima of a high-dimensional non-convex loss function. Knowledge of the structure of this energy landscape is sparse. Relaxing from linear interpolations, we construct continuous paths between mini…
Second-order MCSCF optimization revisited. I. Improved algorithms for fast and robust second-order CASSCF convergence
A new improved implementation of the second-order multiconfiguration self-consistent field optimization method of Werner and Knowles [J. Chem. Phys. 82, 5053 (1985)] is presented. It differs from the original method by more stable and effi…
Methodology for replacing indirect measurements with direct measurements
In quantum computing, the indirect measurement of unitary operators such as the Hadamard test plays a significant role in many algorithms. However, in certain cases, the indirect measurement can be reduced to the direct measurement, whe…
EStokTP: Electronic Structure to Temperature- and Pressure-Dependent Rate Constants—A Code for Automatically Predicting the Thermal Kinetics of Reactions
A priori rate predictions for gas phase reactions have undergone a gradual but dramatic transformation, with current predictions often rivaling the accuracy of the best available experimental data. The utility of such kinetic predictions w…
Optimization in Quaternion Dynamic Systems: Gradient, Hessian, and Learning Algorithms
The optimization of real scalar functions of quaternion variables, such as the mean square error or array output power, underpins many practical applications. Solutions typically require the calculation of the gradient and Hessian. However…