Miriam Cha
Measuring and Mitigating Hallucinations in Vision-Language Dataset Generation for Remote Sensing
Vision language models have achieved impressive results across various fields. However, adoption in remote sensing remains limited, largely due to the scarcity of paired image-text data. To bridge this gap, synthetic caption generation has…
Improving Medical Visual Representations via Radiology Report Generation
Vision-language pretraining has been shown to produce high-quality visual encoders which transfer efficiently to downstream computer vision tasks. Contrastive learning approaches have increasingly been adopted for medical vision language p…
MultiEarth 2023 -- Multimodal Learning for Earth and Environment Workshop and Challenge
The Multimodal Learning for Earth and Environment Workshop (MultiEarth 2023) is the second annual CVPR workshop aimed at the monitoring and analysis of the health of Earth ecosystems by leveraging the vast amount of remote sensing data tha…
RadTex: Learning Efficient Radiograph Representations from Text Reports
Automated analysis of chest radiography using deep learning has tremendous potential to enhance the clinical diagnosis of diseases in patients. However, deep learning models typically require large amounts of annotated data to achieve high…
SAR-to-EO Image Translation with Multi-Conditional Adversarial Networks
This paper explores the use of multi-conditional adversarial networks for SAR-to-EO image translation. Previous methods condition adversarial networks only on the input SAR. We show that incorporating multiple complementary modalities such…
Developing a Series of AI Challenges for the United States Department of the Air Force
Through a series of federal initiatives and orders, the U.S. Government has been making a concerted effort to ensure American leadership in AI. These broad strategy documents have influenced organizations such as the United States Departme…
MultiEarth 2022 -- Multimodal Learning for Earth and Environment Workshop and Challenge
The Multimodal Learning for Earth and Environment Challenge (MultiEarth 2022) will be the first competition aimed at the monitoring and analysis of deforestation in the Amazon rainforest at any time and in any weather conditions. The goal …
Twitter Geolocation and Regional Classification via Sparse Coding
We present a data-driven approach for Twitter geolocation and regional classification. Our method is based on sparse coding and dictionary learning, an unsupervised method popular in computer vision and pattern recognition. Through a serie…
Multimodal Representation Learning via Maximization of Local Mutual Information
We propose and demonstrate a representation learning approach by maximizing the mutual information between local features of images and text. The goal of this approach is to learn useful image representations by taking advantage of the …
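The snippet above stops short of the training objective, but mutual information between local features is commonly lower-bounded with an InfoNCE-style contrastive loss. The sketch below is illustrative only, not the paper's implementation: feature dimensions, the batch, and the plain dot-product critic are all assumptions for the example.

```python
# Hedged sketch: an InfoNCE-style lower bound on mutual information between
# local image-region features and text features. All values are toy data.
import math
import random

random.seed(0)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def info_nce(pairs):
    """pairs: list of (region_feature, text_feature) positive pairs.
    For each positive pair, every other text feature in the batch acts as a
    negative; the loss is the mean -log softmax score of the true pair."""
    loss = 0.0
    for i, (region, _) in enumerate(pairs):
        scores = [math.exp(dot(region, text)) for _, text in pairs]
        loss += -math.log(scores[i] / sum(scores))
    return loss / len(pairs)

# Toy batch: 4 paired local features in an 8-dimensional space.
batch = [([random.gauss(0, 1) for _ in range(8)],
          [random.gauss(0, 1) for _ in range(8)]) for _ in range(4)]
print(round(info_nce(batch), 4))
```

Minimizing this loss pushes each region's representation to score its paired text above the in-batch negatives, which is how the contrastive bound ties representation quality to mutual information.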
Adversarial Learning of Semantic Relevance in Text to Image Synthesis
We describe a new approach that improves the training of generative adversarial nets (GANs) for synthesizing diverse images from a text input. Our approach is based on the conditional version of GANs and expands on previous work leveraging…
Multimodal Sparse Representation Learning and Cross-Modal Synthesis
Humans have a natural ability to process and relate concurrent sensations in different sensory modalities such as vision, hearing, smell, and taste. In order for artificial intelligence to be more human-like in their capabilities, it needs…
Language Modeling by Clustering with Word Embeddings for Text Readability Assessment
We present a clustering-based language model using word embeddings for text readability prediction. Presumably, a Euclidean semantic space hypothesis holds true for word embeddings whose training is done by observing word co-occurrences. …
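The abstract's premise is that embeddings trained on co-occurrence place related words near each other in Euclidean space, so clustering them yields semantic groupings a readability model can use. Here is a minimal sketch of that idea with toy 2-D "embeddings" and a naive k-means; the real embeddings, cluster count, and downstream readability model are not specified in the snippet and are assumptions of this example.

```python
# Hedged sketch: cluster word embeddings, then represent a document as a
# histogram over clusters. Embeddings and k are illustrative toy values.
import random

random.seed(1)

def kmeans(points, k, iters=20):
    """Naive Lloyd's k-means on lists of coordinates."""
    centers = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        for j, cl in enumerate(clusters):
            if cl:  # keep the old center if a cluster empties out
                centers[j] = [sum(xs) / len(cl) for xs in zip(*cl)]
    return centers

def assign(p, centers):
    return min(range(len(centers)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))

# Toy vocabulary: two loose groups of 2-D embeddings.
emb = {w: [random.gauss(mu, 0.3), random.gauss(mu, 0.3)]
       for mu, ws in [(0.0, "cat dog bird".split()),
                      (3.0, "theorem lemma proof".split())]
       for w in ws}
centers = kmeans(list(emb.values()), k=2)

# A document becomes a histogram of cluster assignments.
doc = "cat dog theorem".split()
hist = [0, 0]
for w in doc:
    hist[assign(emb[w], centers)] += 1
print(hist)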
Adversarial nets with perceptual losses for text-to-image synthesis
Recent approaches in generative adversarial networks (GANs) can automatically synthesize realistic images from descriptive text. Despite the overall fair quality, the generated images often expose visible flaws that lack structural definit…
Deep Sparse-coded Network (DSN)
We introduce Deep Sparse-coded Network (DSN), a deep architecture based on sparse coding and dictionary learning. The key advantage of our approach is twofold. By interlacing max pooling with sparse coding layers, we achieve nonlinear activati…
Multimodal Sparse Coding for Event Detection
Unsupervised feature learning methods have proven effective for classification tasks based on a single modality. We present multimodal sparse coding for learning feature representations shared across multiple modalities. The shared represe…
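A shared sparse code across modalities is typically obtained by stacking the modality features into one observation and solving a joint L1-regularized reconstruction. The sketch below uses ISTA with a fixed random dictionary purely for illustration; the dictionary, feature dimensions, step size, and sparsity weight are assumptions, not the paper's learned model.

```python
# Hedged sketch: a joint sparse code z shared by two stacked modalities,
# found by ISTA for  min_z 0.5*||x - D z||^2 + lam*||z||_1.
# The dictionary D here is random; in practice it would be learned.
import random

random.seed(2)

def soft(v, t):
    """Elementwise soft-thresholding, the proximal operator of the L1 norm."""
    return [max(abs(a) - t, 0.0) * (1.0 if a > 0 else -1.0) for a in v]

def ista(x, D, lam=0.1, step=0.05, iters=200):
    m, n = len(D), len(D[0])
    z = [0.0] * n
    for _ in range(iters):
        # residual r = D z - x
        r = [sum(D[i][j] * z[j] for j in range(n)) - x[i] for i in range(m)]
        # gradient of the quadratic term: D^T r
        g = [sum(D[i][j] * r[i] for i in range(m)) for j in range(n)]
        z = soft([z[j] - step * g[j] for j in range(n)], step * lam)
    return z

# Stack two toy modality feature vectors (e.g. audio + video, 3 dims each)
# into one observation that shares a single sparse code.
x = [1.0, 0.5, 0.0, 0.8, 0.2, 0.1]
D = [[random.gauss(0, 0.5) for _ in range(8)] for _ in range(6)]
z = ista(x, D)
print(sum(1 for a in z if abs(a) > 1e-6), "active atoms out of", len(z))
```

Because both modalities reconstruct from the same code `z`, atoms that activate tend to capture structure correlated across modalities, which is what makes the shared representation useful for cross-modal tasks like event detection.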
Multimodal sparse representation learning and applications
Unsupervised methods have proven effective for discriminative tasks in a single-modality scenario. In this paper, we present a multimodal framework for learning sparse representations that can capture semantic correlation between modalitie…