Explanipedia

MARAGE: Transferable Multi-Model Adversarial Attack for Retrieval-Augmented Generation Data Extraction

Xiaoling Hu, Eric Minwei Liu, Weizhou Wang, Xiangyu Guo, David Lie · 2025

Retrieval-Augmented Generation (RAG) offers a solution to mitigate hallucinations in Large Language Models (LLMs) by grounding their outputs to knowledge retrieved from external sources. The use of private resources and data in constructin…

Time Will Tell: Timing Side Channels via Output Token Count in Large Language Models

Tianchen Zhang, Gururaj Saileshwar, David Lie · 2024

This paper demonstrates a new side-channel that enables an adversary to extract sensitive information about inference inputs in large language models (LLMs) based on the number of output tokens in the LLM response. We construct attacks usi…

ANVIL: Anomaly-based Vulnerability Identification without Labelled Training Data

Weizhou Wang, Eric Liu, Xiangyu Guo, David Lie · 2024

Supervised-learning-based vulnerability detectors often fall short due to limited labelled training data. In contrast, Large Language Models (LLMs) like GPT-4 are trained on vast unlabelled code corpora, yet perform only marginally better …

A Survey of Hardware Improvements to Secure Program Execution

Lianying Zhao, He Shuang, Shengjie Xu, Wei Huang, Rongzhen Cui, et al. · 2024

Hardware has been constantly augmented for security considerations since the advent of computers. There is also a common perception among computer users that hardware does a relatively better job on security assurance compared with softwar…

LDPKiT: Superimposing Remote Queries for Privacy-Preserving Local Model Training

Kexin Li, Xi Yang, Aastha Mehta, David Lie · 2024

Users of modern Machine Learning (ML) cloud services face a privacy conundrum -- on one hand, they may have concerns about sending private data to the service for inference, but on the other hand, for specialized models, there may be no al…

Maximizing Information Gain in Privacy-Aware Active Learning of Email Anomalies

Mu-Huan Chung, Sharon Li, Jaturong Kongmanee, Lu Wang, Yuhong Yang, et al. · 2024

Redacted emails satisfy most privacy requirements but they make it more difficult to detect anomalous emails that may be indicative of data exfiltration. In this paper we develop an enhanced method of Active Learning using an information g…

Dumviri: Detecting Trackers and Mixed Trackers with a Breakage Detector

Shuang He, Lianying Zhao, David Lie · 2024

Web tracking harms user privacy. As a result, the use of tracker detection and blocking tools is a common practice among Internet users. However, no such tool can be perfect, and thus there is a trade-off between avoiding breakage (caused …

Calpric: Inclusive and Fine-grain Labeling of Privacy Policies with Crowdsourcing and Active Learning

Wenjun Qiu, David Lie, Lisa Austin · 2024

A significant challenge to training accurate deep learning models on privacy policies is the cost and difficulty of obtaining a large and comprehensive set of training data. To address these challenges, we present Calpric, which combines …

Maximizing Information Gain in Privacy-Aware Active Learning of Email Anomalies

Mu-Huan Chung, Jaturong Kongmanee, Lu Wang, Yuhong Yang, Calvin Giang, et al. · 2024

MIFP: Selective Fat-Pointer Bounds Compression for Accurate Bounds Checking

Shengjie Xu, Eric X. Liu, Wei Huang, David Lie · 2023

Bounds compression for fat pointers can reduce the memory and performance overhead of maintaining pointer bounds and is necessary for efficient hardware implementation. However, compression can introduce inaccuracy to the bounds, making ce…

Implementing Active Learning in Cybersecurity: Detecting Anomalies in Redacted Emails

Mu-Huan Chung, Lu Wang, Sharon Li, et al. · 2023

Research on email anomaly detection has typically relied on specially prepared datasets that may not adequately reflect the type of data that occurs in industry settings. In our research, at a major financial services company, privacy conc…

In Differential Privacy, There is Truth: On Vote Leakage in Ensemble Private Learning

Jiaqi Wang, Roei Schuster, Ilia Shumailov, David Lie, Nicolas Papernot · 2022

When learning from sensitive data, care must be taken to ensure that training algorithms address privacy concerns. The canonical Private Aggregation of Teacher Ensembles, or PATE, computes output labels by aggregating the predictions of a …

On the Exploitability of Audio Machine Learning Pipelines to Surreptitious Adversarial Examples

Adelin Travers, Lorna Licollari, Guanghan Wang, Varun Chandrasekaran, Adam Dziedzic, et al. · 2021

Machine learning (ML) models are known to be vulnerable to adversarial examples. Applications of ML to voice biometrics authentication are no exception. Yet, the implications of audio adversarial examples on these real-world systems remain…

Data Trusts and the Governance of Smart Environments: Lessons from the Failure of Sidewalk Labs’ Urban Data Trust

Lisa M. Austin, David Lie · 2021

Data trusts are an increasingly popular proposal for managing complex data governance questions, although what they are remains contested. Sidewalk Labs proposed creating an “Urban Data Trust” as part of the Sidewalk Toronto “smart” redeve…

Program Committee

Alina Oprea, Adam J. Aviv, Davide Balzarotti, Gilles Barthe, Karthikeyan Bhargavan, et al. · 2021

In-fat pointer: hardware-assisted tagged-pointer spatial memory safety defense with subobject granularity protection

Shengjie Xu, Wei Huang, David Lie · 2021

Programming languages like C and C++ are not memory-safe because they provide programmers with low-level pointer manipulation primitives. The incorrect use of these primitives can result in bugs and security vulnerabilities: for example, s…

Online Harms and Lawful Access: A Submission to the Government of Canada

Lisa M. Austin, Andrea Slane, David Lie, Ian Goldberg · 2021

Emilia: Catching Iago in Legacy Code

Rongzhen Cui, Lianying Zhao, David Lie · 2021

There has been interest in mechanisms that enable the secure use of legacy code to implement trusted code in a Trusted Execution Environment (TEE), such as Intel SGX. However, because legacy code generally assumes the presence of an operati…

Deep Active Learning with Crowdsourcing Data for Privacy Policy Classification

Wenjun Qiu, David Lie · 2020

Privacy policies are statements that notify users of the services' data practices. However, few users are willing to read through policy texts due to the length and complexity. While automated tools based on machine learning exist for priv…

vWitness: Certifying Web Page Interactions with Computer Vision

Shuang He, Lianying Zhao, David Lie · 2020

Web servers service client requests, some of which might cause the web server to perform security-sensitive operations (e.g. money transfer, voting). An attacker may thus forge or maliciously manipulate such requests by compromising a web …

Using Context and Interactions to Verify User-Intended Network Requests

Shuang He, Michelle Y. Wong, David Lie · 2020

Client-side malware can attack users by tampering with applications or user interfaces to generate requests that users did not intend. We propose Verified Intention (VInt), which ensures a network request, as received by a service, is user…

Ex-vivo dynamic analysis framework for Android device drivers

Ivan Pustogarov, Qian Wu, David Lie · 2020

The ability to execute and analyze code makes many security tasks such as exploit development, reverse engineering, and vulnerability detection much easier. However, on embedded devices such as Android smartphones, executing code in-vivo, …

Test, Trace, and Isolate: COVID-19 and the Canadian Constitution

Lisa M. Austin, Vincent Chiao, Beth Coleman, David Lie, Martha Shaffer, et al. · 2020

Machine Unlearning

Lucas Bourtoule, Varun Chandrasekaran, Christopher A. Choquette-Choo, Hengrui Jia, Adelin Travers, et al. · 2019

Once users have shared their data online, it is generally difficult for them to revoke access and ask for the data to be deleted. Machine learning (ML) exacerbates this problem because any model trained with said data may have memorized it…

SoK: Hardware Security Support for Trustworthy Execution

Lianying Zhao, Shuang He, Shengjie Xu, Wei Huang, Rongzhen Cui, et al. · 2019

In recent years, there have emerged many new hardware mechanisms for improving the security of our computer systems. Hardware offers many advantages over pure software approaches: immutability of mechanisms to software attacks, better exec…

Critical Index Determination Method on Visual Assessment of Concrete Damage for Buildings

Henny Wiyanto, David Lie, James Kurniawan · 2019

Visual Assessment is an initial assessment of the concrete condition of a building (non-destructive test). There are multiple types of concrete damage, so it is necessary to identify the type of damage that could be assessed visually. To f…

Using Safety Properties to Generate Vulnerability Patches

Zhen Huang, David Lie, Gang Tan, Trent Jaeger · 2019

Security vulnerabilities are among the most critical software defects in existence. When identified, programmers aim to produce patches that prevent the vulnerability as quickly as possible, motivating the need for automatic program repair…

MultiK: A Framework for Orchestrating Multiple Specialized Kernels

Hsuan-Chi Kuo, Akshith Gunasekaran, Yeongjin Jang, Sibin Mohan, Rakesh B. Bobba, et al. · 2019

We present MultiK, a Linux-based framework that reduces the attack surface for operating system kernels by reducing code bloat. MultiK "orchestrates" multiple kernels that are specialized for individual applications in a transparent man…

Safe Sharing Sites

Lisa M. Austin, David Lie · 2019

In this paper we argue that data-sharing is an activity that sits at the crossroads of privacy concerns and the broader challenges of data governance surrounding access and use. Using the Sidewalk Toronto “smart city” proposal as a startin…
