Philipp Harzig
Global Average Feature Augmentation for Robust Semantic Segmentation with Transformers
Robustness to out-of-distribution data is crucial for deploying modern neural networks. Recently, Vision Transformers, such as SegFormer for semantic segmentation, have shown impressive robustness to visual corruptions like blur or noise a…
Extended Self-Critical Pipeline for Transforming Videos to Text (TRECVID-VTT Task 2021) -- Team: MMCUniAugsburg
The Multimedia and Computer Vision Lab of the University of Augsburg participated in the VTT task only. We use the VATEX and TRECVID-VTT datasets for training our VTT models. We base our model on the Transformer approach for both of our su…
Synchronized Audio-Visual Frames with Fractional Positional Encoding for Transformers in Video-to-Text Translation
Video-to-Text (VTT) is the task of automatically generating descriptions for short audio-visual video clips, which can help visually impaired people understand the scenes of a YouTube video, for instance. Transformer architectures have sh…
A comprehensive analysis of classification methods in gastrointestinal endoscopy imaging
Gastrointestinal (GI) endoscopy has been an active field of research motivated by the large number of highly lethal GI cancers. Early GI cancer precursors are often missed during the endoscopic surveillance. The high missed rate of such ab…
Addressing Data Bias Problems for Chest X-ray Image Report Generation
Automatic medical report generation from chest X-ray images is one possibility for assisting doctors to reduce their workload. However, the different patterns and data distribution of normal and abnormal cases can bias machine learning mod…
Image Captioning with Clause-Focused Metrics in a Multi-modal Setting for Marketing
Automatically generating descriptive captions for images is a well-researched area in computer vision. However, existing evaluation approaches focus on measuring the similarity between two sentences disregarding fine-grained semantics o…
Visual Question Answering With a Hybrid Convolution Recurrent Model
Visual Question Answering (VQA) is a relatively new task, which tries to infer answer sentences for an input image coupled with a corresponding question. Instead of dynamically generating answers, they are usually inferred by finding the m…
Multimodal Image Captioning for Marketing Analysis
Automatically captioning images with natural language sentences is an important research topic. State-of-the-art models are able to produce human-like sentences. These models typically describe the depicted scene as a whole and do not t…