Explanipedia

Imagine yourself: Tuning-Free Personalized Image Generation Open

Zhisong He, Bo Sun, Felix Juefei-Xu, Haoyu Ma, Ankit Ramchandani, et al. · 2024

Computer science

Diffusion models have demonstrated remarkable efficacy across various image-to-image tasks. In this research, we introduce Imagine yourself, a state-of-the-art model designed for personalized image generation. Unlike conventional tuning-ba…

Layout Agnostic Scene Text Image Synthesis with Diffusion Models Open

Qilong Zhangli, Jindong Jiang, Di Liu, Licheng Yu, Xiaoliang Dai, et al. · 2024

Computer science Physics

While diffusion models have significantly advanced the quality of image generation, their capability to accurately and coherently render text within these images remains a substantial challenge. Conventional diffusion-based methods for scen…

Lumos : Empowering Multimodal LLMs with Scene Text Recognition Open

Ashish Shenoy, Yichao Lu, Srihari Jayakumar, Debojeet Chatterjee, Mohsen Moslehpour, et al. · 2024

Computer science Psychology

We introduce Lumos, the first end-to-end multimodal question-answering system with text understanding capabilities. At the core of Lumos is a Scene Text Recognition (STR) component that extracts text from first person point-of-view images,…

Animated Stickers: Bringing Stickers to Life with Video Diffusion Open

David Yan, Winnie Zhang, Luxin Zhang, Anmol Kalia, Dingkang Wang, et al. · 2024

Computer science Art Physics

We introduce animated stickers, a video diffusion model which generates an animation conditioned on a text prompt and static sticker image. Our model is built on top of the state-of-the-art Emu text-to-image model, with the addition of tem…

Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression Open

Animesh A. Sinha, Bo Sun, Anmol Kalia, Arantxa Casanova, Elliot Blanchard, et al. · 2023

Computer science Mathematics History

We introduce Style Tailoring, a recipe to finetune Latent Diffusion Models (LDMs) in a distinct domain with high visual quality, prompt alignment and scene diversity. We choose sticker image generation as the target domain, as the images s…

DISGO: Automatic End-to-End Evaluation for Scene Text OCR Open

Mei-Yuh Hwang, Yangyang Shi, Ankit Ramchandani, Guan Pang, Praveen Krishnan, et al. · 2023

Computer science Engineering Mathematics

This paper discusses the challenges of optical character recognition (OCR) on natural scenes, which is harder than OCR on documents due to the wild content and various image backgrounds. We propose to uniformly use word error rates (WER) a…

DeepCOVIDNet: An Interpretable Deep Learning Model for Predictive Surveillance of COVID-19 Using Heterogeneous Features and Their Interactions Open

Ankit Ramchandani, Chao Fan, Ali Mostafavi · 2020

Computer science Philosophy Materials science

In this paper, we propose a deep learning model to forecast the range of increase in COVID-19 infected cases in future days and we present a novel method to compute equidimensional representations of multivariate time series and multivaria…

Ankit Ramchandani