Todd Mytkowicz
Resolving Build Conflicts via Example-Based and Rule-Based Program Transformations
Merge conflicts often arise when developers integrate changes from different software branches. The conflicts can result from overlapping edits in programs (i.e., textual conflicts) or cause build and test errors (i.e., build and test conf…
CodeExp: Explanatory Code Document Generation
Developing models that can automatically generate detailed code explanations can greatly benefit software maintenance and programming education. However, existing code-to-text generation models often produce only high-level summaries of cod…
Program merge conflict resolution via neural transformers
Collaborative software development is an integral part of the modern software development life cycle, essential to the success of large-scale software projects. When multiple developers make concurrent changes around the same lines of c…
TOGA
Testing is widely recognized as an important stage of the software development lifecycle. Effective software testing can provide benefits such as bug finding, preventing regressions, and documentation. In terms of documentation, unit te…
Can Pre-trained Language Models be Used to Resolve Textual and Semantic Merge Conflicts?
Program merging is standard practice when developers integrate their individual changes to a common code base. When the merge algorithm fails, this is called a merge conflict. The conflict either manifests in textual merge conflicts where …
Synthesizing Collective Communication Algorithms for Heterogeneous Networks with TACCL
Large ML models and datasets have necessitated the use of multi-GPU systems for distributed model training. To harness the power offered by multi-GPU systems, it is critical to eliminate bottlenecks in inter-GPU communication - a problem m…
TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches
Machine learning models are increasingly being trained across multiple GPUs and servers. In this setting, data is transferred between GPUs using communication collectives such as AlltoAll and AllReduce, which can become a significant bottl…
Understanding the Efficiency of Social Tagging Systems Using Information Theory
Given the rise in popularity of social tagging systems, it seems only natural to ask how efficient is the organically evolved tagging vocabulary in describing any underlying document objects? Does this distributed process really provide a …
Neural Unit Test Suggestions.
Testing is widely recognized as an important stage of the software development lifecycle. Effective software testing can provide benefits such as documentation, bug finding, and preventing regressions. In particular, unit tests document a …
DeepMerge: Learning to Merge Programs
In collaborative software development, program merging is the mechanism to integrate changes from multiple programmers. Merge algorithms in modern version control systems report a conflict when changes interfere textually. Merge conflicts …
Breaking the Computation and Communication Abstraction Barrier in Distributed Machine Learning Workloads
The recent trend towards increasingly large machine learning models requires both training and inference tasks to be distributed. Considering the huge cost of training these models, it is imperative to unlock optimizations in computation and comm…
Distributed Training of Embeddings using Graph Analytics
Many applications today, such as NLP, network analysis, and code analysis, rely on semantically embedding objects into low-dimensional fixed-length vectors. Such embeddings naturally provide a way to perform useful downstream tasks, such a…
Scaling Distributed Training with Adaptive Summation
Stochastic gradient descent (SGD) is an inherently sequential training algorithm: computing the gradient at batch $i$ depends on the model parameters learned from batch $i-1$. Prior approaches that break this dependence do not honor them (…
Niijima
Multilingual data-parallel pipelines, such as Microsoft's Scope and Apache Spark, are widely used in real-world analytical tasks. While the involvement of multiple languages (often including both managed and native languages) provides much…
Distributed Word2Vec using Graph Analytics Frameworks.
Word embeddings capture semantic and syntactic similarities of words, represented as vectors. Word2Vec is a popular implementation of word embeddings; it takes as input a large corpus of text and learns a model that maps unique words in th…
CHET: Compiler and Runtime for Homomorphic Evaluation of Tensor Programs
Fully Homomorphic Encryption (FHE) refers to a set of encryption schemes that allow computations to be applied directly on encrypted data without requiring a secret key. This enables novel application scenarios where a client can safely of…
High Five: Improving Gesture Recognition by Embracing Uncertainty
Sensors on mobile devices (accelerometers, gyroscopes, pressure meters, and GPS) invite new applications in gesture recognition, gaming, and fitness tracking. However, programming them remains challenging because human gestures captured …
Debugging probabilistic programs
Many applications compute with estimated and uncertain data. While advances in probabilistic programming help developers build such applications, debugging them remains extremely challenging. New types of errors in probabilistic programs i…
Parallel Stochastic Gradient Descent with Sound Combiners
Stochastic gradient descent (SGD) is a well-known method for regression and classification tasks. However, it is an inherently sequential algorithm: at each step, the processing of the current example depends on the parameters learned from …
Jumping the ORDER BY Barrier in Large-Scale Pattern Matching
Event-series pattern matching is a major component of large-scale data analytics pipelines enabling a wide range of system diagnostics tasks. A precursor to pattern matching is an expensive "shuffle the world" stage wherein data are orde…
Efficient parallelization using rank convergence in dynamic programming algorithms
This paper proposes an efficient parallel algorithm for an important class of dynamic programming problems that includes Viterbi, Needleman-Wunsch, Smith-Waterman, and Longest Common Subsequence. In dynamic programming, the subproblems t…
Low-Rank Methods for Parallelizing Dynamic Programming Algorithms
This article proposes efficient parallel methods for an important class of dynamic programming problems that includes Viterbi, Needleman-Wunsch, Smith-Waterman, and Longest Common Subsequence. In dynamic programming, the subproblems that d…
Guest Editors' Introduction: Approximate Computing
Classical technology scaling, also known as Dennard scaling, has tremendously improved computer performance over the past decades, which in turn has enabled countless innovative applications benefiting our daily lives today. However…
Approximate and Probabilistic Computing: Design, Coding, Verification (Dagstuhl Seminar 15491)
Computing has entered the era of approximation, in which hardware and software generate and reason about estimates. Navigation applications turn maps and location estimates from hardware GPS sensors into driving directions; speech recognit…
Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup
This paper presents Yinyang K-means, a new algorithm for K-means clustering. By clustering the centers in the initial stage, and leveraging efficiently maintained lower and upper bounds between a point and centers, it more effectively av…
InterPoll: Crowd-Sourced Internet Polls
Crowd-sourcing is increasingly being used to provide answers to online polls and surveys. However, existing systems, while taking care of the mechanics of attracting crowd workers, poll building, and payment, provide little to help the sur…