David Lie
MARAGE: Transferable Multi-Model Adversarial Attack for Retrieval-Augmented Generation Data Extraction
Retrieval-Augmented Generation (RAG) offers a solution to mitigate hallucinations in Large Language Models (LLMs) by grounding their outputs to knowledge retrieved from external sources. The use of private resources and data in constructin…
Time Will Tell: Timing Side Channels via Output Token Count in Large Language Models
This paper demonstrates a new side-channel that enables an adversary to extract sensitive information about inference inputs in large language models (LLMs) based on the number of output tokens in the LLM response. We construct attacks usi…
ANVIL: Anomaly-based Vulnerability Identification without Labelled Training Data
Supervised-learning-based vulnerability detectors often fall short due to limited labelled training data. In contrast, Large Language Models (LLMs) like GPT-4 are trained on vast unlabelled code corpora, yet perform only marginally better …
A Survey of Hardware Improvements to Secure Program Execution
Hardware has been constantly augmented for security considerations since the advent of computers. There is also a common perception among computer users that hardware does a relatively better job on security assurance compared with softwar…
LDPKiT: Superimposing Remote Queries for Privacy-Preserving Local Model Training
Users of modern Machine Learning (ML) cloud services face a privacy conundrum -- on one hand, they may have concerns about sending private data to the service for inference, but on the other hand, for specialized models, there may be no al…
Maximizing Information Gain in Privacy-Aware Active Learning of Email Anomalies
Redacted emails satisfy most privacy requirements but they make it more difficult to detect anomalous emails that may be indicative of data exfiltration. In this paper we develop an enhanced method of Active Learning using an information g…
Dumviri: Detecting Trackers and Mixed Trackers with a Breakage Detector
Web tracking harms user privacy. As a result, the use of tracker detection and blocking tools is a common practice among Internet users. However, no such tool can be perfect, and thus there is a trade-off between avoiding breakage (caused …
Calpric: Inclusive and Fine-grain Labeling of Privacy Policies with Crowdsourcing and Active Learning
A significant challenge to training accurate deep learning models on privacy policies is the cost and difficulty of obtaining a large and comprehensive set of training data. To address these challenges, we present Calpric, which combines …
MIFP: Selective Fat-Pointer Bounds Compression for Accurate Bounds Checking
Bounds compression for fat pointers can reduce the memory and performance overhead of maintaining pointer bounds and is necessary for efficient hardware implementation. However, compression can introduce inaccuracy to the bounds, making ce…
Implementing Active Learning in Cybersecurity: Detecting Anomalies in Redacted Emails
Research on email anomaly detection has typically relied on specially prepared datasets that may not adequately reflect the type of data that occurs in industry settings. In our research, at a major financial services company, privacy conc…
In Differential Privacy, There is Truth: On Vote Leakage in Ensemble Private Learning
When learning from sensitive data, care must be taken to ensure that training algorithms address privacy concerns. The canonical Private Aggregation of Teacher Ensembles, or PATE, computes output labels by aggregating the predictions of a …
On the Exploitability of Audio Machine Learning Pipelines to Surreptitious Adversarial Examples
Machine learning (ML) models are known to be vulnerable to adversarial examples. Applications of ML to voice biometrics authentication are no exception. Yet, the implications of audio adversarial examples on these real-world systems remain…
Data Trusts and the Governance of Smart Environments: Lessons from the Failure of Sidewalk Labs’ Urban Data Trust
Data trusts are an increasingly popular proposal for managing complex data governance questions, although what they are remains contested. Sidewalk Labs proposed creating an “Urban Data Trust” as part of the Sidewalk Toronto “smart” redeve…
Program Committee
In-fat pointer: hardware-assisted tagged-pointer spatial memory safety defense with subobject granularity protection
Programming languages like C and C++ are not memory-safe because they provide programmers with low-level pointer manipulation primitives. The incorrect use of these primitives can result in bugs and security vulnerabilities: for example, s…
Online Harms and Lawful Access: A Submission to the Government of Canada
Emilia: Catching Iago in Legacy Code
There has been interest in mechanisms that enable the secure use of legacy code to implement trusted code in a Trusted Execution Environment (TEE), such as Intel SGX. However, because legacy code generally assumes the presence of an operati…
Deep Active Learning with Crowdsourcing Data for Privacy Policy Classification
Privacy policies are statements that notify users of the services' data practices. However, few users are willing to read through policy texts due to the length and complexity. While automated tools based on machine learning exist for priv…
vWitness: Certifying Web Page Interactions with Computer Vision
Web servers service client requests, some of which might cause the web server to perform security-sensitive operations (e.g. money transfer, voting). An attacker may thus forge or maliciously manipulate such requests by compromising a web …
Using Context and Interactions to Verify User-Intended Network Requests.
Client-side malware can attack users by tampering with applications or user interfaces to generate requests that users did not intend. We propose Verified Intention (VInt), which ensures a network request, as received by a service, is user…
Ex-vivo dynamic analysis framework for Android device drivers
The ability to execute and analyze code makes many security tasks such as exploit development, reverse engineering, and vulnerability detection much easier. However, on embedded devices such as Android smartphones, executing code in-vivo, …
Test, Trace, and Isolate: COVID-19 and the Canadian Constitution
Machine Unlearning
Once users have shared their data online, it is generally difficult for them to revoke access and ask for the data to be deleted. Machine learning (ML) exacerbates this problem because any model trained with said data may have memorized it…
SoK: Hardware Security Support for Trustworthy Execution
In recent years, there have emerged many new hardware mechanisms for improving the security of our computer systems. Hardware offers many advantages over pure software approaches: immutability of mechanisms to software attacks, better exec…
Critical Index Determination Method on Visual Assessment of Concrete Damage for Buildings
Visual Assessment is an initial assessment of the concrete condition of a building (non-destructive test). There are multiple types of concrete damage, so it is necessary to identify the type of damage that could be assessed visually. To f…
Using Safety Properties to Generate Vulnerability Patches
Security vulnerabilities are among the most critical software defects in existence. When identified, programmers aim to produce patches that prevent the vulnerability as quickly as possible, motivating the need for automatic program repair…
MultiK: A Framework for Orchestrating Multiple Specialized Kernels
We present MultiK, a Linux-based framework that reduces the attack surface for operating system kernels by reducing code bloat. MultiK "orchestrates" multiple kernels that are specialized for individual applications in a transparent man…
Safe Sharing Sites
In this paper we argue that data-sharing is an activity that sits at the crossroads of privacy concerns and the broader challenges of data governance surrounding access and use. Using the Sidewalk Toronto “smart city” proposal as a startin…