Manuel Brack
Measuring and Guiding Monosemanticity
There is growing interest in leveraging mechanistic interpretability and controllability to better understand and influence the internal dynamics of large language models (LLMs). However, current methods face fundamental challenges in reli…
LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Inconsistencies
Building safe Large Language Models (LLMs) across multiple languages is essential to ensuring both safe access and linguistic diversity. To this end, we conduct a large-scale, comprehensive safety evaluation of the current LLM landscape. F…
SCAR: Sparse Conditioned Autoencoders for Concept Detection and Steering in LLMs
Large Language Models (LLMs) have demonstrated remarkable capabilities in generating human-like text, but their output may not be aligned with the user or even produce harmful content. This paper presents a novel approach to detect and ste…
Core Tokensets for Data-efficient Sequential Training of Transformers
Deep networks are frequently tuned to novel tasks and continue learning from ongoing data streams. Such sequential training requires consolidation of new and past information, a challenge predominantly addressed by retaining the most impor…
Does CLIP Know My Face?
With the rise of deep learning in various applications, privacy concerns around the protection of training data have become a critical area of research. Whereas prior studies have focused on privacy risks in single-modal models, we introdu…
T-FREE: Subword Tokenizer-Free Generative LLMs via Sparse Representations for Memory-Efficient Embeddings
Tokenizers are crucial for encoding information in Large Language Models, but their development has recently stagnated, and they contain inherent weaknesses. Major limitations include computational overhead, ineffective vocabulary use, and…
LlavaGuard: An Open VLM-based Framework for Safeguarding Vision Datasets and Models
This paper introduces LlavaGuard, a suite of VLM-based vision safeguards that address the critical need for reliable guardrails in the era of large-scale data and models. To this end, we establish a novel open framework, describing a custo…
DeiSAM: Segment Anything with Deictic Prompting
Large-scale, pre-trained neural networks have demonstrated strong capabilities in various tasks, including zero-shot image segmentation. To identify concrete objects in complex scenes, humans instinctively rely on deictic descriptions in n…
LEDITS++: Limitless Image Editing using Text-to-Image Models
Text-to-image diffusion models have recently received increasing interest for their astonishing ability to produce high-fidelity images from solely text inputs. Subsequent research efforts aim to exploit and apply their capabilities to rea…
Distilling Adversarial Prompts from Safety Benchmarks: Report for the Adversarial Nibbler Challenge
Text-conditioned image generation models have recently achieved astonishing image quality and alignment results. Consequently, they are employed in a fast-growing number of applications. Since they are highly data-driven, relying on billio…
MultiFusion: Fusing Pre-Trained Models for Multi-Lingual, Multi-Modal Image Generation
The recent popularity of text-to-image diffusion models (DM) can largely be attributed to the intuitive interface they provide to users. The intended generation can be expressed in natural language, with the model producing faithful interp…
Class Attribute Inference Attacks: Inferring Sensitive Class Information by Diffusion-Based Attribute Manipulations
Neural network-based image classifiers are powerful tools for computer vision tasks, but they inadvertently reveal sensitive attribute information about their classes, raising concerns about their privacy. To investigate this privacy leaka…
Fair Diffusion: Instructing Text-to-Image Generation Models on Fairness
Generative AI models have recently achieved astonishing results in quality and are consequently employed in a fast-growing number of applications. However, since they are highly data-driven, relying on billion-sized datasets randomly scrap…
SEGA: Instructing Text-to-Image Models using Semantic Guidance
Text-to-image diffusion models have recently received a lot of interest for their astonishing ability to produce high-fidelity images from text only. However, achieving one-shot generation that aligns with the user's intent is nearly impos…
The Stable Artist: Steering Semantics in Diffusion Latent Space
Large, text-conditioned generative diffusion models have recently gained a lot of attention for their impressive performance in generating high-fidelity images from text alone. However, achieving high-quality results is almost unfeasible i…
Exploiting Cultural Biases via Homoglyphs in Text-to-Image Synthesis
Models for text-to-image synthesis, such as DALL-E~2 and Stable Diffusion, have recently drawn a lot of interest from academia and the general public. These models are capable of producing high-quality images that depict a variety of conce…
I2G Benchmark
Inappropriate Image Prompts (I2G): a benchmark for text-to-image diffusion models.