Tom Bäckström
Privacy Disclosure of Similarity Rank in Speech and Language Processing Open
Speaker, author, and other biometric identification applications often compare a sample's similarity to a database of templates to determine the identity. Given that data may be noisy and similarity measures can be inaccurate, such a compa…
Privacy in Speech Technology Open
Speech technology for communication, accessing information, and services has rapidly improved in quality. It is convenient and appealing because speech is the primary mode of communication for humans. Such technology, however, also present…
Privacy Preservation in Audio and Video Open
Audio and video sensors are useful in Active Assisted Living (AAL) systems as they provide a rich source of information in well-known formats and since such sensors are readily available. This, however, also means that audio and video sens…
Good practices for evaluation of machine learning systems Open
Many development decisions affect the results obtained from ML experiments: training data, features, model architecture, hyperparameters, test data, etc. Among these aspects, arguably the most important design decisions are those that invo…
Unsupervised Panoptic Interpretation of Latent Spaces in GANs Using Space-Filling Vector Quantization Open
Generative adversarial networks (GANs) learn a latent space whose samples can be mapped to real-world images. Such latent spaces are difficult to interpret. Some earlier supervised methods aim to create an interpretable latent space or dis…
Real-Time Joint Noise Suppression and Bandwidth Extension of Noisy Reverberant Wideband Speech Open
Artificially extending the bandwidth of speech in real-time applications that are band-limited to 16 kHz (known as wideband) or lower sample rates such as VoIP or communication over Bluetooth, can significantly improve its perceptual quali…
Privacy PORCUPINE: Anonymization of Speaker Attributes Using Occurrence Normalization for Space-Filling Vector Quantization Open
Speech signals contain a vast range of private information such as its text, speaker identity, emotions, and state of health. Privacy-preserving speech processing seeks to filter out any private information that is not needed for downstrea…
Evaluating privacy, security, and trust perceptions in conversational AI: A systematic review Open
Conversational AI (CAI) systems which encompass voice- and text-based assistants are on the rise and have been largely integrated into people’s everyday lives. Despite their widespread adoption, users voice concerns regarding privacy, secu…
Evaluating Privacy, Security, and Trust Perceptions in Conversational AI: A Systematic Review Open
Conversational AI (CAI) systems which encompass voice- and text-based assistants are on the rise and have been largely integrated into people's everyday lives. Despite their widespread adoption, users voice concerns regarding privacy, secu…
Low-Complexity Real-Time Neural Network for Blind Bandwidth Extension of Wideband Speech Open
Speech is streamed at 16 kHz or lower sample rates in many applications (e.g. VoIP, Bluetooth headsets). Extending its bandwidth can produce significant quality improvements. We introduce BBWEXNet, a lightweight neural network that perform…
Privacy and Quality Improvements in Open Offices Using Multi-Device Speech Enhancement Open
Teleconferencing has increased in popularity and often takes place around other people, such as in open offices. A particular problem of such environments is that multiple users can have independent conversations simultaneously, which leak int…
Interpretable Latent Space Using Space-Filling Curves for Phonetic Analysis in Voice Conversion Open
Vector quantized variational autoencoders (VQ-VAE) are well-known deep generative models, which map input data to a latent space that is used for data generation. Such latent spaces are unstructured and can thus be difficult to interpret. …
Stochastic Optimization of Vector Quantization Methods in Application to Speech and Image Processing Open
Vector quantization (VQ) methods have been used in a wide range of applications for speech, image, and video data. While classic VQ methods often use expectation maximization, in this paper, we investigate the use of stochastic optimizatio…
The Internet of Sounds: Convergent Trends, Insights, and Future Directions Open
Current sound-based practices and systems developed in both academia and industry point to convergent research trends that bring together the field of Sound and Music Computing with that of the Internet of Things. This paper proposes a vis…
Voice Quality Features for Replay Attack Detection Open
Replay attacks are attempts to get fraudulent access to an automatic speaker verification system. In this paper, we investigate the usefulness of voice quality features to detect replay attacks. The voice quality features are used together…
Introduction to Speech Processing: 2nd Edition Open
This release is primarily about migrating all content to Jupyter Book and Git. The published version is now hosted at https://speechprocessingbook.aalto.fi. In addition to GitHub, the release has a long-term storage location at Zenodo, whic…
Speech Localization at Low Bitrates in Wireless Acoustics Sensor Networks Open
The use of speech source localization (SSL) and its applications offer great possibilities for the design of speaker local positioning systems with wireless acoustic sensor networks (WASNs). Recent works have shown that data-driven front-e…
NSVQ: Noise Substitution in Vector Quantization for Machine Learning Open
Machine learning algorithms have been shown to be highly effective in solving optimization problems in a wide range of applications. Such algorithms typically use gradient descent with backpropagation and the chain rule. Hence, the backp…
Intuitive Privacy from Acoustic Reach: A Case for Networked Voice User-Interfaces Open
The effect that advances in voice interface technologies have on privacy has not yet received the attention it deserves. Systems in which multiple devices collaborate to provide a unified user-interface amplify those worries about privacy.…
The Use of Audio Fingerprints for Authentication of Speakers on Speech Operated Interfaces Open
In a multi-speaker and multi-device environment, we need acoustic fingerprint information for authentication between devices. Thus, in these kinds of environments, it is crucial to continuously check the authenticity of speakers and device…
Federated Learning for Privacy Preserving On-Device Speaker Recognition Open
State-of-the-art speaker recognition systems are usually trained on a single computer using speech data collected from multiple users. However, these speech samples may contain private information which users are not willing to share. To o…
End-to-End Optimized Multi-Stage Vector Quantization of Spectral Envelopes for Speech and Audio Coding Open
Spectral envelope modeling is an instrumental part of speech and audio codecs, which can be used to enable efficient entropy coding of spectral components. Overall optimization of codecs, including envelope models, has however been difficu…
Voice-quality Features for Deep Neural Network Based Speaker Verification Systems Open
Jitter and shimmer are voice-quality features which have been successfully used to detect voice pathologies and classify different speaking styles. In this paper, we investigate the usefulness of such voice-quality features in neural-netwo…
Cancellation of Local Competing Speaker with Near-Field Localization for Distributed ad-hoc Sensor Network Open
In scenarios such as remote work, open offices and call centers, multiple people may simultaneously have independent spoken interactions with their devices in the same room. The speech of competing speakers will however be picked up by all…
PyAWNeS-Codec: Speech and audio codec for ad-hoc acoustic wireless sensor networks Open
Existing hardware with microphones can potentially be used as sensor networks to capture speech and audio signals for the benefit of better signal quality than possible with a single microphone. A central pre-requisite for such ad-hoc acou…
Enhancement by postfiltering for speech and audio coding in ad hoc sensor networks Open
Enhancement algorithms for wireless acoustic sensor networks (WASNs) are indispensable with the increasing availability and usage of connected devices with microphones. Conventional spatial filtering approaches for enhancement in WASNs app…
Federated Learning for Privacy-Preserving Speaker Recognition Open
The state-of-the-art speaker recognition systems are usually trained on a single computer using speech data collected from multiple users. However, these speech samples may contain private information which users may not be willing to shar…
Introduction to Speech Processing: snapshot on 31.12.2020 Open
This is a snapshot of the wiki-format book on 31.12.2020.