A survey on Image Data Augmentation for Deep Learning Open
Deep convolutional neural networks have performed remarkably well on many Computer Vision tasks. However, these networks are heavily reliant on big data to avoid overfitting. Overfitting refers to the phenomenon when a network learns a fun…
Reading digits in natural images with unsupervised feature learning Open
Detecting and reading text from natural images is a hard computer vision task that is central to a variety of emerging applications. Related problems like document character recognition have been widely studied by computer vision and machi…
GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium Open
Generative Adversarial Networks (GANs) excel at creating realistic images with complex models for which maximum likelihood is infeasible. However, the convergence of GAN training has still not been proved. We propose a two time-scale updat…
An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling Open
For most deep learning practitioners, sequence modeling is synonymous with recurrent networks. Yet recent results indicate that convolutional architectures can outperform recurrent networks on tasks such as audio synthesis and machine tran…
BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis Open
Elaboration of Bayesian phylogenetic inference methods has continued at pace in recent years with major new advances in nearly all aspects of the joint modelling of evolutionary data. It is increasingly appreciated that some evolutionary q…
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models Open
We explore how generating a chain of thought -- a series of intermediate reasoning steps -- significantly improves the ability of large language models to perform complex reasoning. In particular, we show how such reasoning abilities emerg…
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding Open
Human ability to understand language is general, flexible, and robust. In contrast, most NLU models above the word level are designed for a specific task and struggle with out-of-domain data. If we aspire to develop models with understandi…
CP2K: An electronic structure and molecular dynamics software package - Quickstep: Efficient and accurate electronic structure calculations Open
CP2K is an open source electronic structure and molecular dynamics software package to perform atomistic simulations of solid-state, liquid, molecular, and biological systems. It is especially aimed at massively parallel and linear-scaling…
Extended Reconstructed Sea Surface Temperature, Version 5 (ERSSTv5): Upgrades, Validations, and Intercomparisons Open
The monthly global 2° × 2° Extended Reconstructed Sea Surface Temperature (ERSST) has been revised and updated from version 4 to version 5. This update incorporates a new release of ICOADS release 3.0 (R3.0), a decade of near-surface data …
Neural Message Passing for Quantum Chemistry Open
Supervised learning on molecules has incredible potential to be useful in chemistry, drug discovery, and materials science. Luckily, several promising and closely related neural network models invariant to molecular symmetries have already…
MoleculeNet: a benchmark for molecular machine learning Open
A large scale benchmark for molecular machine learning consisting of multiple public datasets, metrics, featurizations and learning algorithms.
The Arithmetic Optimization Algorithm Open
This work proposes a new meta-heuristic method called Arithmetic Optimization Algorithm (AOA) that utilizes the distribution behavior of the main arithmetic operators in mathematics including (Multiplication (M), Division (D), Subtraction …
Antibiotic resistance: a rundown of a global crisis Open
The advent of multidrug resistance among pathogenic bacteria is imperiling the worth of antibiotics, which have previously transformed medical sciences. The crisis of antimicrobial resistance has been ascribed to the misuse of these agents…
DeepFM: A Factorization-Machine based Neural Network for CTR Prediction Open
Learning sophisticated feature interactions behind user behaviors is critical in maximizing CTR for recommender systems. Despite great progress, existing methods seem to have a strong bias towards low- or high-order interactions, or requir…
VoxCeleb2: Deep Speaker Recognition Open
The objective of this paper is speaker recognition under noisy and unconstrained conditions. We make two key contributions. First, we introduce a very large-scale audio-visual speaker recognition dataset collected from open-source medi…
PaLM: Scaling Language Modeling with Pathways Open
Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model t…
ShapeNet: An Information-Rich 3D Model Repository Open
We present ShapeNet: a richly-annotated, large-scale repository of shapes represented by 3D CAD models of objects. ShapeNet contains 3D models from a multitude of semantic categories and organizes them under the WordNet taxonomy. It is a c…
SMOTE for Learning from Imbalanced Data: Progress and Challenges, Marking the 15-year Anniversary Open
The Synthetic Minority Oversampling Technique (SMOTE) preprocessing algorithm is considered "de facto" standard in the framework of learning from imbalanced data. This is due to its simplicity in the design of the procedure, as well as its…
Gaia Data Release 2 Open
Context. Gaia Data Release 2 (Gaia DR2) contains results for 1693 million sources in the magnitude range 3 to 21 based on observations collected by the European Space Agency Gaia satellite during the first 22 months of its operational pha…
Densely Connected Convolutional Networks Open
Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper, we emb…
Natural Questions: A Benchmark for Question Answering Research Open
We present the Natural Questions corpus, a question answering data set. Questions consist of real anonymized, aggregated queries issued to the Google search engine. An annotator is presented with a question along with a Wikipedia page from…
Return of Frustratingly Easy Domain Adaptation Open
Unlike human learning, machine learning often fails to handle changes between training (source) and test (target) input distributions. Such domain shifts, common in practical scenarios, severely damage the performance of conventional machi…
A Deep Learning Approach for Intrusion Detection Using Recurrent Neural Networks Open
Intrusion detection plays an important role in ensuring information security, and the key technology is to accurately identify various attacks in the network. In this paper, we explore how to model an intrusion detection system based on de…
A look at the density functional theory zoo with the advanced GMTKN55 database for general main group thermochemistry, kinetics and noncovalent interactions Open
We present the updated and extended GMTKN55 benchmark database for more accurate and extensive energetic evaluation of density functionals and other electronic structure methods with detailed guidelines for method users.
A Review of Global Precipitation Data Sets: Data Sources, Estimation, and Intercomparisons Open
In this paper, we present a comprehensive review of the data sources and estimation methods of 30 currently available global precipitation data sets, including gauge‐based, satellite‐related, and reanalysis data sets. We analyzed the discr…
Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing Open
Pretraining large neural language models, such as BERT, has led to impressive gains on many natural language processing (NLP) tasks. However, most pretraining efforts focus on general domain corpora, such as newswire and Web. A prevailing …
An Ensemble Version of the E‐OBS Temperature and Precipitation Data Sets Open
We describe the construction of a new version of the Europe‐wide E‐OBS temperature (daily minimum, mean, and maximum values) and precipitation data set. This version provides an improved estimation of interpolation uncertainty through the …
Therapeutic peptides: Historical perspectives, current development trends, and future directions Open
Peptide therapeutics have played a notable role in medical practice since the advent of insulin therapy in the 1920s. Over 60 peptide drugs are approved in the United States and other major markets, and peptides continue to enter clinical …
SoilGrids 2.0: producing soil information for the globe with quantified spatial uncertainty Open
SoilGrids produces maps of soil properties for the entire globe at medium spatial resolution (250 m cell size) using state-of-the-art machine learning methods to generate the necessary models. It takes as inputs soil observations from abou…
Integrating psychological and neurobiological considerations regarding the development and maintenance of specific Internet-use disorders: An Interaction of Person-Affect-Cognition-Execution (I-PACE) model Open
Within the last two decades, many studies have addressed the clinical phenomenon of Internet-use disorders, with a particular focus on Internet-gaming disorder. Based on previous theoretical considerations and empirical findings, we sugges…