Ho-Hsiang Wu
Mind the Prompt: Prompting Strategies in Audio Generations for Improving Sound Classification
This paper investigates the design of effective prompt strategies for generating realistic datasets using Text-To-Audio (TTA) models. We also analyze different techniques for efficiently combining these datasets to enhance their utility in…
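As a rough illustration of the setup this abstract describes, the sketch below generates clips for a few sound classes under different prompt templates, using AudioLDM from Hugging Face diffusers as a stand-in TTA model; the templates, class names, and model choice are illustrative assumptions, not the paper's actual configuration.

```python
# Sketch: class-conditioned prompt templates for a Text-To-Audio model.
# AudioLDM stands in for the paper's TTA models; templates are illustrative.
import soundfile as sf
import torch
from diffusers import AudioLDMPipeline

TEMPLATES = [
    "{label}",                                  # bare class name
    "the sound of {label}",                     # simple caption
    "a field recording of {label} in a city",   # contextualized prompt
]

pipe = AudioLDMPipeline.from_pretrained("cvssp/audioldm-s-full-v2")
pipe = pipe.to("cuda" if torch.cuda.is_available() else "cpu")

for label in ["dog bark", "jackhammer", "car horn"]:
    for i, template in enumerate(TEMPLATES):
        prompt = template.format(label=label)
        audio = pipe(prompt, num_inference_steps=10, audio_length_in_s=5.0).audios[0]
        sf.write(f"{label.replace(' ', '_')}_t{i}.wav", audio, samplerate=16000)
```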
Learning Audio Concepts from Counterfactual Natural Language
Conventional audio classification has relied on predefined classes, lacking the ability to learn from free-form text. Recent methods unlock learning joint audio-text embeddings from raw audio-text pairs describing audio in natural language. De…
MOSAIC: Learning Unified Multi-Sensory Object Property Representations for Robot Learning via Interactive Perception
A holistic understanding of object properties across diverse sensory modalities (e.g., visual, audio, and haptic) is essential for tasks ranging from object categorization to complex manipulation. Drawing inspiration from cognitive science…
Audio-Text Models Do Not Yet Leverage Natural Language
Multi-modal contrastive learning techniques in the audio-text domain have quickly become a highly active area of research. Most works are evaluated with standard audio retrieval and classification benchmarks assuming that (i) these models …
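The standard zero-shot evaluation protocol referenced here fits in a few lines: wrap each class name in a prompt template, embed it, embed the audio, and classify by cosine similarity. In this sketch, `text_encoder` and the precomputed audio embedding are placeholders for any jointly trained audio-text pair (e.g., a CLAP-style model).

```python
# Sketch of zero-shot audio classification with a joint audio-text model.
import torch
import torch.nn.functional as F

def zero_shot_classify(audio_emb: torch.Tensor, class_names: list[str], text_encoder) -> str:
    # Wrap bare labels in a prompt template, as most benchmarks do.
    prompts = [f"this is a sound of {name}" for name in class_names]
    text_emb = F.normalize(text_encoder(prompts), dim=-1)   # (C, D)
    audio_emb = F.normalize(audio_emb, dim=-1)               # (D,)
    scores = text_emb @ audio_emb                            # one score per class
    return class_names[scores.argmax().item()]
```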
Wav2CLIP: Learning Robust Audio Representations From CLIP
We propose Wav2CLIP, a robust audio representation learning method by distilling from Contrastive Language-Image Pre-training (CLIP). We systematically evaluate Wav2CLIP on a variety of audio tasks including classification, retrieval, and …
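A minimal sketch of the distillation idea, assuming a frozen CLIP image encoder and a trainable audio encoder aligned with a CLIP-style symmetric contrastive loss over matched audio/frame pairs; the encoders and data loading are placeholders rather than the paper's exact training recipe.

```python
# Sketch of Wav2CLIP-style distillation: push a trainable audio encoder toward
# frozen CLIP image embeddings of frames from the same video.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(audio_emb, image_emb, temperature=0.07):
    a = F.normalize(audio_emb, dim=-1)            # (B, D), trainable encoder
    v = F.normalize(image_emb, dim=-1)            # (B, D), frozen CLIP outputs
    logits = a @ v.t() / temperature              # (B, B) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    # Matched audio/frame pairs sit on the diagonal; contrast both directions.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Training step (sketch): image_encoder is frozen CLIP, audio_encoder is learned.
# for audio, frames in loader:
#     with torch.no_grad():
#         image_emb = image_encoder(frames)
#     loss = clip_contrastive_loss(audio_encoder(audio), image_emb)
```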
How to Listen? Rethinking Visual Sound Localization
Localizing visual sounds consists of locating the position of objects that emit sound within an image. It is a growing research area with potential applications in monitoring natural and urban environments, such as wildlife migration and u…
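A common recipe for this task, sketched below with placeholder encoders: project the audio clip and each spatial position of a visual feature map into a shared space, then read the localization map off their cosine similarities.

```python
# Sketch: audio-visual similarity map for visual sound localization.
import torch
import torch.nn.functional as F

def localization_map(audio_emb: torch.Tensor, visual_feats: torch.Tensor) -> torch.Tensor:
    """audio_emb: (D,); visual_feats: (D, H, W) -> similarity map (H, W)."""
    D, H, W = visual_feats.shape
    v = F.normalize(visual_feats.reshape(D, H * W), dim=0)  # per-location features
    a = F.normalize(audio_emb, dim=0)
    return (a @ v).reshape(H, W)   # cosine similarity at each spatial location
```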
A Study on Robustness to Perturbations for Representations of Environmental Sound
Audio applications involving environmental sound analysis increasingly use general-purpose audio representations, also known as embeddings, for transfer learning. Recently, Holistic Evaluation of Audio Representations (HEAR) evaluated twen…
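One way to probe the kind of robustness studied here, sketched with additive Gaussian noise at a target SNR as the perturbation and any HEAR-style `embed` function as a placeholder: embed the clean and perturbed clip and measure how far the representation moves.

```python
# Sketch of an embedding-robustness probe under additive noise.
import torch
import torch.nn.functional as F

def add_noise_at_snr(audio: torch.Tensor, snr_db: float) -> torch.Tensor:
    noise = torch.randn_like(audio)
    signal_power = audio.pow(2).mean()
    noise_power = noise.pow(2).mean()
    scale = torch.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return audio + scale * noise

def embedding_drift(embed, audio: torch.Tensor, snr_db: float = 20.0) -> float:
    clean = embed(audio)
    noisy = embed(add_noise_at_snr(audio, snr_db))
    # 1 - cosine similarity: 0 means the representation ignored the perturbation.
    return 1.0 - F.cosine_similarity(clean, noisy, dim=-1).item()
```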
Exploring modality-agnostic representations for music classification
Music information is often conveyed or recorded across multiple data modalities including but not limited to audio, images, text and scores. However, music information retrieval research has almost exclusively focused on single modality re…
Multi-Task Self-Supervised Pre-Training for Music Classification
Deep learning is very data hungry, and supervised learning in particular requires massive labeled data to work well. Machine listening research often suffers from the problem of limited labeled data, as human annotations are costly to acquire, and a…
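A minimal sketch of the multi-task pre-training pattern: one shared encoder feeds several self-supervised pretext heads whose losses are combined. The task names, head sizes, and equal weighting below are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch: shared encoder with multiple self-supervised pretext heads.
import torch
import torch.nn as nn

class MultiTaskSSL(nn.Module):
    def __init__(self, encoder: nn.Module, emb_dim: int):
        super().__init__()
        self.encoder = encoder
        self.heads = nn.ModuleDict({
            "pitch_shift": nn.Linear(emb_dim, 12),   # classify applied shift
            "time_stretch": nn.Linear(emb_dim, 5),   # classify stretch factor
            "noise_level": nn.Linear(emb_dim, 4),    # classify added-noise SNR
        })
        self.loss = nn.CrossEntropyLoss()

    def forward(self, audio: torch.Tensor, labels: dict[str, torch.Tensor]):
        z = self.encoder(audio)                      # shared representation
        losses = {t: self.loss(h(z), labels[t]) for t, h in self.heads.items()}
        return sum(losses.values()), losses          # equal weighting here
```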
SONYC Urban Sound Tagging (SONYC-UST): a multilabel dataset from an urban acoustic sensor network
Version 2.3, September 2020. Created by Mark Cartwright (1,2,3), Jason Cramer (1), Ana Elisa Mendez Mendez (…
SONYC-UST-V2: An Urban Sound Tagging Dataset with Spatiotemporal Context
We present SONYC-UST-V2, a dataset for urban sound tagging with spatiotemporal information. This dataset is aimed at the development and evaluation of machine listening systems for real-world urban noise monitoring. While datasets of urba…
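A rough sketch of preparing such a dataset for multilabel training, assuming an annotation CSV with per-annotator presence columns and spatiotemporal fields; the column names below only loosely follow the released files and should be checked against them.

```python
# Sketch: aggregate crowdsourced presence votes per clip into multilabel
# targets and keep spatiotemporal context as side information.
import pandas as pd

COARSE = ["engine", "machinery-impact", "non-machinery-impact", "powered-saw",
          "alert-signal", "music", "human-voice", "dog"]

df = pd.read_csv("annotations.csv")  # path is illustrative

# Majority vote over annotators: a tag is positive if most marked it present.
presence_cols = [f"{tag}_presence" for tag in COARSE]   # assumed column pattern
labels = (df.groupby("audio_filename")[presence_cols].mean() > 0.5).astype(int)

# Spatiotemporal context (one row per clip) for context-aware models.
context = df.groupby("audio_filename")[["latitude", "longitude", "week",
                                        "day", "hour"]].first()
```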
SONYC Urban Sound Tagging (SONYC-UST): a multilabel dataset from an urban acoustic sensor network
Version 2.2, June 2020. Created by Mark Cartwright (1,2,3), Jason Cramer (1), Ana Elisa Mendez Mendez (1), Y…
SONYC Urban Sound Tagging (SONYC-UST): a multilabel dataset from an urban acoustic sensor network
Version 2.1, March 2020. Created by Mark Cartwright (1,2,3), Jason Cramer (1), Ana Elisa Mendez Mendez (1), …
SONYC Urban Sound Tagging (SONYC-UST): a multilabel dataset from an urban acoustic sensor network
Version 1.0, February 2020. Created by Mark Cartwright (1,2,3), Ana Elisa Mendez Mendez (1), Graham Dove (2)…
CodeSearchNet Challenge: Evaluating the State of Semantic Code Search
The dataset for CodeSearchNet. See https://github.com/github/CodeSearchNet and https://arxiv.org/abs/1909.09436 for more.
CodeSearchNet Challenge: Evaluating the State of Semantic Code Search
Semantic code search is the task of retrieving relevant code given a natural language query. While related to other information retrieval tasks, it requires bridging the gap between the language used in code (often abbreviated and highly t…
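The retrieval setup is straightforward to sketch: embed the natural-language query and every code snippet into a shared space, then rank snippets by cosine similarity. `encode_query` and `encode_code` below are placeholders for any trained encoder pair, such as the baselines trained on (docstring, function) pairs in the challenge.

```python
# Sketch: embedding-based semantic code search.
import numpy as np

def search(query: str, snippets: list[str], encode_query, encode_code, k: int = 5):
    q = encode_query(query)                              # (D,)
    C = np.stack([encode_code(s) for s in snippets])     # (N, D)
    q = q / np.linalg.norm(q)
    C = C / np.linalg.norm(C, axis=1, keepdims=True)
    scores = C @ q                                       # cosine similarities
    top = np.argsort(-scores)[:k]
    return [(snippets[i], float(scores[i])) for i in top]
```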
SONYC Urban Sound Tagging (SONYC-UST): a multilabel dataset from an urban acoustic sensor network
Version 0.4, July 2019. Created by Mark Cartwright (1,2,3), Ana Elisa Mendez Mendez (1), Graham Dove (2), Ja…
SONYC Urban Sound Tagging (SONYC-UST): a multilabel dataset from an urban acoustic sensor network
Version 0.3, July 2019. Created by Mark Cartwright (1,2,3), Ana Elisa Mendez Mendez (1), Graham Dove (2), Ja…
SONYC Urban Sound Tagging (SONYC-UST): a multilabel dataset from an urban acoustic sensor network
Version 0.2, May 2019. Created by Mark Cartwright (1,2,3), Ana Elisa Mendez Mendez (1), Graham Dove (2), Jas…
SONYC Urban Sound Tagging (SONYC-UST): a multilabel dataset from an urban acoustic sensor network
Version 0.1, March 2019. Created by Mark Cartwright (1,2,3), Ana Elisa Mendez Mendez (1), Graham Dove (2), J…
SONYC Urban Sound Tagging (SONYC-UST): A Multilabel Dataset from an Urban Acoustic Sensor Network
SONYC Urban Sound Tagging (SONYC-UST) is a dataset for the development and evaluation of machine listening systems for real-world urban noise monitoring. It consists of 3068 audio recordings from the "Sounds of New York City" (SONYC) acous…
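For evaluation on a multilabel set like this one, the DCASE urban sound tagging task built on SONYC-UST reports area under the precision-recall curve; below is a minimal sketch with toy arrays using sklearn's `average_precision_score` (real evaluation uses the released ground truth).

```python
# Sketch: macro/micro AUPRC for multilabel urban sound tagging.
import numpy as np
from sklearn.metrics import average_precision_score

y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])                 # clips x tags
y_score = np.array([[0.9, 0.2, 0.6], [0.1, 0.8, 0.3], [0.7, 0.6, 0.2]])

macro_auprc = average_precision_score(y_true, y_score, average="macro")
micro_auprc = average_precision_score(y_true, y_score, average="micro")
print(f"macro AUPRC: {macro_auprc:.3f}  micro AUPRC: {micro_auprc:.3f}")
```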