Explanipedia

Horovod: fast and easy distributed deep learning in TensorFlow

Alexander Sergeev, Mike Del Balso · 2018

Training modern deep learning models requires large amounts of computation, often provided by GPUs. Scaling computation from one GPU to many can enable much faster training and research progress but entails two complications. First, the tr…

TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP

John X. Morris, Eli Lifland, Jin Yong Yoo, Jake Grigsby, Di Jin, et al. · 2020

Computer science Chemistry Physics

While there has been substantial research using adversarial attacks to analyze NLP models, each attack is implemented in its own code repository. It remains challenging to develop NLP attacks and utilize them to improve model performance. …

On the "naturalness" of buggy code

Baishakhi Ray, Vincent J. Hellendoorn, Saheel Godhane, Zhaopeng Tu, Alberto Bacchelli, et al. · 2016

Computer science Physics

Real software, the kind working programmers produce by the kLOC to solve real-world problems, tends to be "natural", like speech or natural language; it tends to be highly repetitive and predictable. Researchers have captured this naturaln…

PIT: a practical mutation testing tool for Java (demo)

Henry Coles, Thomas Laurent, Christopher Henard, Mike Papadakis, Anthony Ventresque · 2016

Computer science Chemistry

International Symposium on Software Testing and Analysis (ISSTA), Saarbrücken, Germany, 18-20 July 2016

Scaling static analyses at Facebook

Dino Distefano, Manuel Fähndrich, Francesco Logozzo, Peter W. O’Hearn · 2019

Computer science Mathematics

Key lessons for designing static analyses tools deployed to find bugs in hundreds of millions of lines of code.

End-to-End Deep Learning of Optimization Heuristics

Chris Cummins, Pavlos Petoumenos, Zheng Wang, Hugh Leather · 2017

Computer science Philosophy

Accurate automatic optimization heuristics are necessary for dealing with the complexity and diversity of modern hardware and software. Machine learning is a proven technique for learning such heuristics, but its success is bound by the quali…

Practical program repair via bytecode mutation

Ali Ghanbari, Samuel Benton, Lingming Zhang · 2019

Computer science

Automated Program Repair (APR) is one of the most recent advances in automated debugging, and can directly fix buggy programs with minimal human intervention. Although various advanced APR techniques (including search-based or semantic-bas…

Why Google stores billions of lines of code in a single repository

Rachel Potvin, Josh Levenberg · 2016

Computer science

Google's monolithic repository provides a common source of truth for tens of thousands of developers around the world.

GPT-Neo: Large Scale Autoregressive Language Modeling with Mesh-Tensorflow

Sid Black, Leo Gao, Phil Wang, Connor Leahy, Stella Biderman · 2021

Computer science Geography Mathematics

GPT-Neo is an implementation of model & data-parallel GPT-2 and GPT-3-like models, utilizing Mesh Tensorflow for distributed support. This codebase is designed for TPUs. It should also work on GPUs, though we do not recommend this hardware…

RefDiff: Detecting Refactorings in Version Histories

Danilo Silva, Marco Túlio Valente · 2017

Computer science

Refactoring is a well-known technique that is widely adopted by software engineers to improve the design and enable the evolution of a system. Knowing which refactoring operations were applied in a code change is a valuable information …

Predicting Defective Lines Using a Model-Agnostic Technique

Supatsara Wattanakriengkrai, Patanamon Thongtanunam, Chakkrit Tantithamthavorn, Hideaki Hata, Kenichi Matsumoto · 2020

Computer science Geology Mathematics

Defect prediction models are proposed to help a team prioritize source code areas files that need Software Quality Assurance (SQA) based on the likelihood of having defects. However, developers may waste their unnecessary effort on the who…

A deep tree-based model for software defect prediction

Hoa Khanh Dam, Trang Pham, Shien Wee Ng, Truyen Tran, John Grundy, et al. · 2018

Computer science Mathematics Physics

Defects are common in software systems and can potentially cause various problems to software users. Different methods have been developed to quickly predict the most likely locations of defects in large code bases. Most of them focus on d…

Democratizing artificial intelligence: How no-code AI can leverage machine learning operations

Leif Sundberg, Jonny Holmström · 2023

Computer science Business

Organizations are increasingly seeking to generate value and insights from their data by integrating advances in artificial intelligence (AI) such as machine learning (ML) systems into their operations. However, there are several manageria…

Exploring the Potential of ChatGPT in Automated Code Refinement: An Empirical Study

Qi Guo, Junming Cao, Xiaofei Xie, Shangqing Liu, Xiaohong Li, et al. · 2024

Computer science Engineering Philosophy

Code review is an essential activity for ensuring the quality and maintainability of software projects. However, it is a time-consuming and often error-prone task that can significantly impact the development process. Recently, ChatGPT, a …

Reassessing automatic evaluation metrics for code summarization tasks

Devjeet Roy, Sarah Fakhoury, Venera Arnaoudova · 2021

Computer science Economics Mathematics

In recent years, research in the domain of source code summarization has adopted data-driven techniques pioneered in machine translation (MT). Automatic evaluation metrics such as BLEU, METEOR, and ROUGE, are fundamental to the evaluation …

Is GitHub copilot a substitute for human pair-programming?

Saki Imai · 2022

Computer science Economics Philosophy

This empirical study investigates the effectiveness of pair programming with GitHub Copilot in comparison to human pair-programming. Through an experiment with 21 participants we focus on code productivity and code quality. For experimenta…

An empirical study on the effectiveness of static C code analyzers for vulnerability detection

Stephan Lipp, Sebastian Bănescu, Alexander Pretschner · 2022

Computer science Geography

Presentation slides of my talk about SAST tool effectiveness at the BINSEC Team, Université Paris-Saclay, and the Research Training Group ConVeY, Technical University of Munich (TUM) and Ludwig Maximilian University of Munich (LMU).

Is "Better Data" Better than "Better Data Miners"? (On the Benefits of Tuning SMOTE for Defect Prediction)

Amritanshu Agrawal, Tim Menzies · 2017

Computer science

We report and fix an important systematic error in prior studies that ranked classifiers for software analytics. Those studies did not (a) assess classifiers on multiple criteria and they did not (b) study how variations in the data affect…

EverCrypt: A Fast, Verified, Cross-Platform Cryptographic Provider

Jonathan Protzenko, Bryan Parno, Aymeric Fromherz, Chris Hawblitzel, Marina Polubelova, et al. · 2020

Computer science Mathematics Medicine

Scaling symbolic evaluation for automated verification of systems code with Serval

Luke Nelson, James Bornholt, Ronghui Gu, Andrew Baumann, Emina Torlak, et al. · 2019

Computer science

This paper presents Serval, a framework for developing automated verifiers for systems software. Serval provides an extensible infrastructure for creating verifiers by lifting interpreters under symbolic evaluation, and a systematic approa…

How bugs are born: a model to identify how bugs are introduced in software components

Gema Rodríguez-Pérez, Gregório Robles, Alexander Serebrenik, Andy Zaidman, Daniel M. Germán, et al. · 2020

Computer science

When identifying the origin of software bugs, many studies assume that “a bug was introduced by the lines of code that were modified to fix it”. However, this assumption does not always hold and at least in some cases, these modified lines…

esATAC: an easy-to-use systematic pipeline for ATAC-seq data analysis

Wei Zheng, Wei Zhang, Huan Fang, Yanda Li, Xiaowo Wang · 2018

Computer science Biology Engineering

Summary: ATAC-seq is rapidly emerging as one of the major experimental approaches to probe chromatin accessibility genome-wide. Here, we present ‘esATAC’, a highly integrated easy-to-use R/Bioconductor package, for systematic ATAC-seq data …

Scalable Approaches for Test Suite Reduction

Emilio Cruciani, Breno Miranda, Roberto Verdecchia, Antonia Bertolino · 2019

Computer science Mathematics History

Test suite reduction approaches aim at decreasing software regression testing costs by selecting a representative subset from large-size test suites. Most existing techniques are too expensive for handling modern massive systems and moreov…

Practical Mutation Testing at Scale: A view from Google

Goran Petrović, Marko Ivanković, Gordon Fraser, René Just · 2021

Computer science Biology

Mutation analysis assesses a test suite’s adequacy by measuring its ability to detect small artificial faults, systematically seeded into the tested program. Mutation analysis is considered one of the strongest test-adequacy criteria. Muta…

A Framework for Creating Deployable Smart Contracts for Non-fungible Tokens on the Ethereum Blockchain

Dan Chirtoaca, Joshua Ellul, George Azzopardi · 2020

Computer science History Biology

Non-fungible tokens are an up and coming application domain for smart contracts. Ethereum is the first blockchain-based decentralized computing platform that has standardized this type of tokens into a well-defined interface, namely ERC721…

An Empirical Study of Iterative Improvement in Programming Assignments

Raymond Pettit, John Homer, Roger Gee, Susan Mengel, Adam Starbuck · 2015

Computer science Engineering Philosophy

As automated tools for grading programming assignments become more widely used, it is imperative that we better understand how students are utilizing them. Other researchers have provided helpful data on the role automated assessment tools…

Evaluating the Performance of Code Generation Models for Solving Parsons Problems With Small Prompt Variations

Brent N. Reeves, Sami Sarsa, James Prather, Paul Denny, Brett A. Becker, et al. · 2023

Computer science Physics Philosophy

The MalSource Dataset: Quantifying Complexity and Code Reuse in Malware Development

Alejandro Calleja, Juan Tapiador, Juan Caballero · 2018

Computer science Sociology Biology

During the last decades, the problem of malicious and unwanted software (malware) has surged in numbers and sophistication. Malware plays a key role in most of today's cyberattacks and has consolidated as a commodity in the underground eco…

Not So Fast: Analyzing the Performance of WebAssembly vs. Native Code

Abhinav Jangda, Bobby Powers, Emery D. Berger, Arjun Guha · 2019

Computer science History

All major web browsers now support WebAssembly, a low-level bytecode intended to serve as a compilation target for code written in languages like C and C++. A key goal of WebAssembly is performance parity with native code; previous work re…
