Sarah Fakhoury
Good Vibrations? A Qualitative Study of Co-Creation, Communication, Flow, and Trust in Vibe Coding
Vibe coding, a term coined by Andrej Karpathy in February 2025, has quickly become a compelling and controversial natural language programming paradigm in AI-assisted software development. Centered on iterative co-design with an AI assista…
DiffSpec: Differential Testing with LLMs using Natural Language Specifications and Code Artifacts
Differential testing can be an effective way to find bugs in software systems with multiple implementations that conform to the same specification, like compilers, network protocol parsers, or language runtimes. Specifications for such sys…
Can Large Language Models Transform Natural Language Intent into Formal Method Postconditions?
Informal natural language that describes code functionality, such as code comments or function documentation, may contain substantial information about a program’s intent. However, there is typically no guarantee that a program’s implement…
Towards Neural Synthesis for SMT-Assisted Proof-Oriented Programming
Proof-oriented programs mix computational content with proofs of program correctness. However, the human effort involved in programming and proving is still substantial, despite the use of Satisfiability Modulo Theories (SMT) solvers to au…
3DGen: AI-Assisted Generation of Provably Correct Binary Format Parsers
Improper parsing of attacker-controlled input is a leading source of software security vulnerabilities, especially when programmers transcribe informal format descriptions in RFCs into efficient parsing logic in low-level, memory unsafe la…
LLM-Based Test-Driven Interactive Code Generation: User Study and Empirical Evaluation
Large language models (LLMs) have shown great potential in automating significant aspects of coding by producing natural code from informal natural language (NL) intent. However, given NL is informal, it does not lend itself easily to checking th…
Exploring the Effectiveness of LLM based Test-driven Interactive Code Generation: User Study and Empirical Evaluation
We introduce a novel workflow, TiCoder, designed to enhance the trust and accuracy of LLM-based code generation through interactive and guided intent formalization. TiCoder partially formalizes ambiguous intent in natural language prompts …
NL2Fix: Generating Functionally Correct Code Edits from Bug Descriptions
Despite the notable advancement of Large Language Models for Code Generation, there is a distinct gap in benchmark datasets and evaluation of LLMs' proficiency in generating functionally correct code edits based on natural language descrip…
Ranking LLM-Generated Loop Invariants for Program Verification
Synthesizing inductive loop invariants is fundamental to automating program verification. In this work, we observe that Large Language Models (such as gpt-3.5 or gpt-4) are capable of synthesizing loop invariants for a class of programs in…
Can Large Language Models Transform Natural Language Intent into Formal Method Postconditions?
Informal natural language that describes code functionality, such as code comments or function documentation, may contain substantial information about a program's intent. However, there is typically no guarantee that a program's implementat…
Towards Generating Functionally Correct Code Edits from Natural Language Issue Descriptions
Large language models (LLMs), such as OpenAI's Codex, have demonstrated their potential to generate code from natural language descriptions across a wide range of programming tasks. Several benchmarks have recently emerged to evaluate the …
Ranking LLM-Generated Loop Invariants for Program Verification
Saikat Chakraborty, Shuvendu Lahiri, Sarah Fakhoury, Akash Lal, Madanlal Musuvathi, Aseem Rastogi, Aditya Senthilnathan, Rahul Sharma, Nikhil Swamy. Findings of the Association for Computational Linguistics: EMNLP 2023. 2023.
Program merge conflict resolution via neural transformers
Collaborative software development is an integral part of the modern software development life cycle, essential to the success of large-scale software projects. When multiple developers make concurrent changes around the same lines of c…
MergeBERT: Program Merge Conflict Resolution via Neural Transformers.
Collaborative software development is an integral part of the modern software development life cycle, essential to the success of large-scale software projects. When multiple developers make concurrent changes around the same lines of code…
Reassessing automatic evaluation metrics for code summarization tasks
In recent years, research in the domain of source code summarization has adopted data-driven techniques pioneered in machine translation (MT). Automatic evaluation metrics such as BLEU, METEOR, and ROUGE are fundamental to the evaluation …
A Model to Detect Readability Improvements in Incremental Changes
Identifying source code that has poor readability allows developers to focus maintenance efforts on problematic code. Therefore, the effort to develop models that can quantify the readability of a piece of source code has been an area of i…
Moving towards objective measures of program comprehension
Traditionally, program comprehension research relies heavily on indirect measures of comprehension, where subjects report on their own comprehension levels or summarize part of an artifact so that researchers can instead deduce the level o…