Antti Honkela
A supervised Bayesian method for time (re)annotation of transcriptomics data
Transcriptomics experiments are often conducted to capture changes in gene expression over time. However, time annotations may be missing or imprecise, or may not reflect the same physiological state of the bacterial culture between different ex…
Empirical Comparison of Membership Inference Attacks in Deep Transfer Learning
With the emergence of powerful large-scale foundation models, the training paradigm is increasingly shifting from training from scratch to transfer learning. This enables high-utility training with small, domain-specific datasets typical i…
Mitigating Disparate Impact of Differentially Private Learning through Bounded Adaptive Clipping
Differential privacy (DP) has become an essential framework for privacy-preserving machine learning. Existing DP learning methods, however, often have disparate impacts on model predictions, e.g., for minority groups. Gradient clipping, wh…
Gaussian DP for Reporting Differential Privacy Guarantees in Machine Learning
Current practices for reporting the level of differential privacy (DP) protection for machine learning (ML) algorithms such as DP-SGD provide an incomplete and potentially misleading picture of the privacy guarantees. For instance, if only…
Hyperparameters in Score-Based Membership Inference Attacks
Membership Inference Attacks (MIAs) have emerged as a valuable framework for evaluating privacy leakage by machine learning models. Score-based MIAs are distinguished, in particular, by their ability to exploit the confidence scores that t…
Differential Privacy in Continual Learning: Which Labels to Update?
The goal of continual learning (CL) is to retain knowledge across tasks, but this conflicts with the strict privacy requirements of sensitive training data, which prevent storing or memorising individual samples. To address this, we combine CL and…
NeurIPS 2023 Competition: Privacy Preserving Federated Learning Document VQA
The Privacy Preserving Federated Learning Document VQA (PFL-DocVQA) competition challenged the community to develop provably private and communication-efficient solutions in a federated setting for a real-life use case: invoice processing.…
Noise-Aware Differentially Private Variational Inference
Differential privacy (DP) provides robust privacy guarantees for statistical inference, but the noise it introduces can lead to unreliable results and biases in downstream applications. While several noise-aware approaches have been proposed which integrate …
Pan-pathogen deep sequencing of nosocomial bacterial pathogens in Italy in spring 2020: a prospective cohort study
Funding: Wellcome Trust, European Research Council, Academy of Finland Flagship program, Trond Mohn Foundation, and Research Council of Norway.
Towards Efficient and Scalable Training of Differentially Private Deep Learning
Differentially private stochastic gradient descent (DP-SGD) is the standard algorithm for training machine learning models under differential privacy (DP). The most common DP-SGD privacy accountants rely on Poisson subsampling for ensuring…
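As general background for the DP-SGD entries in this list (a minimal sketch of the standard mechanism, not the specific contribution of any listed paper), one DP-SGD aggregation step clips each per-example gradient to a fixed L2 norm and adds Gaussian noise calibrated to that norm; the function name and array layout below are illustrative assumptions:

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm, noise_multiplier, rng):
    """One DP-SGD gradient aggregation step (sketch).

    per_example_grads: array of shape (batch_size, dim), one gradient per example.
    """
    # Scale each example's gradient so its L2 norm is at most clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # Sum the clipped gradients and add Gaussian noise whose scale is
    # proportional to the clipping norm (the per-example sensitivity).
    noisy_sum = clipped.sum(axis=0) + rng.normal(
        0.0, noise_multiplier * clip_norm, size=clipped.shape[1])
    # Average over the batch to obtain the noisy gradient estimate.
    return noisy_sum / len(per_example_grads)
```

The privacy guarantee then follows from accounting for the subsampling and noise over all training steps, which is where the Poisson-subsampling accountants mentioned above come in.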
Collaborative learning from distributed data with differentially private synthetic data
Background: Consider a setting where multiple parties holding sensitive data aim to collaboratively learn population-level statistics, but pooling the sensitive data sets is not possible due to privacy concerns and parties are unable to eng…
Noise-Aware Differentially Private Regression via Meta-Learning
Many high-stakes applications require machine learning models that protect user privacy and provide well-calibrated, accurate predictions. While Differential Privacy (DP) is the gold standard for protecting user privacy, standard DP mechan…
Bayesian model-based method for clustering gene expression time series with multiple replicates
In this study, we introduce a Bayesian model-based method for clustering transcriptomics time series data with multiple replicates. This technique is based on sampling Gaussian processes (GPs) within an infinite mixture model from a Dirich…
Impact of Dataset Properties on Membership Inference Vulnerability of Deep Transfer Learning
Membership inference attacks (MIAs) are used to test practical privacy of machine learning models. MIAs complement formal guarantees from differential privacy (DP) under a more realistic adversary model. We analyse MIA vulnerability of fin…
A Bias-Variance Decomposition for Ensembles over Multiple Synthetic Datasets
Recent studies have highlighted the benefits of generating multiple synthetic datasets for supervised learning, from increased accuracy to more effective model selection and uncertainty estimation. These benefits have clear empirical suppo…
Subsampling is not Magic: Why Large Batch Sizes Work for Differentially Private Stochastic Optimisation
We study how the batch size affects the total gradient variance in differentially private stochastic gradient descent (DP-SGD), seeking a theoretical explanation for the usefulness of large batch sizes. As DP-SGD is the basis of modern DP …
Privacy-Aware Document Visual Question Answering
Document Visual Question Answering (DocVQA) has quickly grown into a central task of document understanding. Yet although documents contain sensitive or copyrighted information, none of the current DocVQA methods offers strong…
Collaborative Learning From Distributed Data With Differentially Private Synthetic Twin Data
Consider a setting where multiple parties holding sensitive data aim to collaboratively learn population level statistics, but pooling the sensitive data sets is not possible. We propose a framework in which each party shares a differentia…
On Consistent Bayesian Inference from Synthetic Data
Generating synthetic data, with or without differential privacy, has attracted significant attention as a potential solution to the dilemma between making data easily available, and the privacy of data subjects. Several works have shown th…
On the Efficacy of Differentially Private Few-shot Image Classification
There has been significant recent progress in training differentially private (DP) models which achieve accuracy that approaches the best non-private models. These DP models are typically pretrained on large public datasets and then fine-t…
Digital public health leadership in the global fight for health security
The COVID-19 pandemic highlighted the need to prioritise mature digital health and data governance at both national and supranational levels to guarantee future health security. The Riyadh Declaration on Digital Health was a call to action…
DPVIm: Differentially Private Variational Inference Improved
Differentially private (DP) release of multidimensional statistics typically considers an aggregate sensitivity, e.g. the vector norm of a high-dimensional vector. However, different dimensions of that vector might have widely different ma…
Individual Privacy Accounting with Gaussian Differential Privacy
Individual privacy accounting enables bounding differential privacy (DP) loss individually for each participant involved in the analysis. This can be informative as often the individual privacy losses are considerably smaller than those in…
Differentially private partitioned variational inference
Learning a privacy-preserving model from sensitive data which are distributed across multiple devices is an increasingly important problem. The problem is often formulated in the federated learning context, with the aim of learning a singl…
Strong pathogen competition in neonatal gut colonisation
Bacterial pathogen species and their strains that colonise the human gut are generally understood to compete both with each other and with the commensal species colonising this ecosystem. However, we currently lack a population-wide q…
Noise-Aware Statistical Inference with Differentially Private Synthetic Data
While generation of synthetic data under differential privacy (DP) has received a lot of attention in the data privacy community, analysis of synthetic data has received much less. Existing work has shown that simply analysing DP synthetic…
d3p - A Python Package for Differentially-Private Probabilistic Programming
We present d3p, a software package designed to help field runtime-efficient, widely applicable Bayesian inference under differential privacy guarantees. d3p achieves general applicability to a wide range of probabilistic modelling probl…
Bacterial genomic epidemiology with mixed samples
Genomic epidemiology is a tool for tracing transmission of pathogens based on whole-genome sequencing. We introduce the mGEMS pipeline for genomic epidemiology with plate sweeps representing mixed samples of a target pathogen, opening the …
Locally Differentially Private Bayesian Inference
In recent years, local differential privacy (LDP) has emerged as a technique of choice for privacy-preserving data collection in several scenarios when the aggregator is not trustworthy. LDP provides client-side privacy by adding noise at …