Source lines of code
Horovod: fast and easy distributed deep learning in TensorFlow
Training modern deep learning models requires large amounts of computation, often provided by GPUs. Scaling computation from one GPU to many can enable much faster training and research progress but entails two complications. First, the tr…
TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP
While there has been substantial research using adversarial attacks to analyze NLP models, each attack is implemented in its own code repository. It remains challenging to develop NLP attacks and utilize them to improve model performance. …
On the "naturalness" of buggy code
Real software, the kind working programmers produce by the kLOC to solve real-world problems, tends to be "natural", like speech or natural language; it tends to be highly repetitive and predictable. Researchers have captured this naturaln…
PIT: a practical mutation testing tool for Java (demo)
International Symposium on Software Testing and Analysis (ISSTA), Saarbrücken, Germany, 18-20 July 2016
Scaling static analyses at Facebook
Key lessons for designing static analyses tools deployed to find bugs in hundreds of millions of lines of code.
End-to-End Deep Learning of Optimization Heuristics
Accurate automatic optimization heuristics are necessary for dealing with the complexity and diversity of modern hardware and software. Machine learning is a proven technique for learning such heuristics, but its success is bound by the quali…
Practical program repair via bytecode mutation
Automated Program Repair (APR) is one of the most recent advances in automated debugging, and can directly fix buggy programs with minimal human intervention. Although various advanced APR techniques (including search-based or semantic-bas…
Why Google stores billions of lines of code in a single repository
Google's monolithic repository provides a common source of truth for tens of thousands of developers around the world.
GPT-Neo: Large Scale Autoregressive Language Modeling with Mesh-Tensorflow
GPT-Neo is an implementation of model & data-parallel GPT-2 and GPT-3-like models, utilizing Mesh Tensorflow for distributed support. This codebase is designed for TPUs. It should also work on GPUs, though we do not recommend this hardware…
RefDiff: Detecting Refactorings in Version Histories
Refactoring is a well-known technique that is widely adopted by software engineers to improve the design and enable the evolution of a system. Knowing which refactoring operations were applied in a code change is a valuable information …
Predicting Defective Lines Using a Model-Agnostic Technique
Defect prediction models are proposed to help a team prioritize the source code files that need Software Quality Assurance (SQA) based on the likelihood of having defects. However, developers may waste unnecessary effort on the who…
A deep tree-based model for software defect prediction
Defects are common in software systems and can potentially cause various problems to software users. Different methods have been developed to quickly predict the most likely locations of defects in large code bases. Most of them focus on d…
Democratizing artificial intelligence: How no-code AI can leverage machine learning operations
Organizations are increasingly seeking to generate value and insights from their data by integrating advances in artificial intelligence (AI) such as machine learning (ML) systems into their operations. However, there are several manageria…
Exploring the Potential of ChatGPT in Automated Code Refinement: An Empirical Study
Code review is an essential activity for ensuring the quality and maintainability of software projects. However, it is a time-consuming and often error-prone task that can significantly impact the development process. Recently, ChatGPT, a …
Reassessing automatic evaluation metrics for code summarization tasks
In recent years, research in the domain of source code summarization has adopted data-driven techniques pioneered in machine translation (MT). Automatic evaluation metrics such as BLEU, METEOR, and ROUGE, are fundamental to the evaluation …
Is GitHub copilot a substitute for human pair-programming?
This empirical study investigates the effectiveness of pair programming with GitHub Copilot in comparison to human pair-programming. Through an experiment with 21 participants we focus on code productivity and code quality. For experimenta…
An empirical study on the effectiveness of static C code analyzers for vulnerability detection
Presentation slides of my talk about SAST tool effectiveness at the BINSEC Team, Université Paris-Saclay, and the Research Training Group ConVeY, Technical University of Munich (TUM) and Ludwig Maximilian University of Munich (LMU).
Is "Better Data" Better than "Better Data Miners"? (On the Benefits of Tuning SMOTE for Defect Prediction)
We report and fix an important systematic error in prior studies that ranked classifiers for software analytics. Those studies did not (a) assess classifiers on multiple criteria and they did not (b) study how variations in the data affect…
EverCrypt: A Fast, Verified, Cross-Platform Cryptographic Provider
Scaling symbolic evaluation for automated verification of systems code with Serval
This paper presents Serval, a framework for developing automated verifiers for systems software. Serval provides an extensible infrastructure for creating verifiers by lifting interpreters under symbolic evaluation, and a systematic approa…
How bugs are born: a model to identify how bugs are introduced in software components
When identifying the origin of software bugs, many studies assume that “a bug was introduced by the lines of code that were modified to fix it”. However, this assumption does not always hold and at least in some cases, these modified lines…
esATAC: an easy-to-use systematic pipeline for ATAC-seq data analysis
Summary: ATAC-seq is rapidly emerging as one of the major experimental approaches to probe chromatin accessibility genome-wide. Here, we present ‘esATAC’, a highly integrated easy-to-use R/Bioconductor package, for systematic ATAC-seq data …
Scalable Approaches for Test Suite Reduction
Test suite reduction approaches aim at decreasing software regression testing costs by selecting a representative subset from large-size test suites. Most existing techniques are too expensive for handling modern massive systems and moreov…
Practical Mutation Testing at Scale: A view from Google
Mutation analysis assesses a test suite’s adequacy by measuring its ability to detect small artificial faults, systematically seeded into the tested program. Mutation analysis is considered one of the strongest test-adequacy criteria. Muta…
A Framework for Creating Deployable Smart Contracts for Non-fungible Tokens on the Ethereum Blockchain
Non-fungible tokens are an up and coming application domain for smart contracts. Ethereum is the first blockchain-based decentralized computing platform that has standardized this type of tokens into a well-defined interface, namely ERC721…
An Empirical Study of Iterative Improvement in Programming Assignments
As automated tools for grading programming assignments become more widely used, it is imperative that we better understand how students are utilizing them. Other researchers have provided helpful data on the role automated assessment tools…
Evaluating the Performance of Code Generation Models for Solving Parsons Problems With Small Prompt Variations
The MalSource Dataset: Quantifying Complexity and Code Reuse in Malware Development
During the last decades, the problem of malicious and unwanted software (malware) has surged in numbers and sophistication. Malware plays a key role in most of today's cyberattacks and has consolidated as a commodity in the underground eco…
Not So Fast: Analyzing the Performance of WebAssembly vs. Native Code
All major web browsers now support WebAssembly, a low-level bytecode intended to serve as a compilation target for code written in languages like C and C++. A key goal of WebAssembly is performance parity with native code; previous work re…