NeSy4VRD: A Multifaceted Resource for Neurosymbolic AI Research using Knowledge Graphs in Visual Relationship Detection
2023 · Open Access · DOI: https://doi.org/10.5281/zenodo.7931113
<strong>NeSy4VRD</strong>

NeSy4VRD is a multifaceted, multipurpose resource designed to foster neurosymbolic AI (NeSy) research, particularly NeSy research using Semantic Web technologies such as OWL ontologies, OWL-based knowledge graphs and OWL-based reasoning as symbolic components. The NeSy4VRD research resource pertains to the <em>computer vision</em> field of AI and, within that field, to the application tasks of <em>visual relationship detection (VRD) and scene graph generation</em>.

Whilst the core motivation of the NeSy4VRD research resource is to foster computer vision-based NeSy research using Semantic Web technologies such as OWL ontologies and OWL-based knowledge graphs, AI researchers can readily use NeSy4VRD to either: 1) pursue computer vision-based NeSy research without involving Semantic Web technologies as symbolic components, or 2) pursue computer vision research without NeSy (i.e. research that relies on deep learning alone, without symbolic components of any kind). This is the sense in which we describe NeSy4VRD as being <em>multipurpose</em>: it can readily be used by diverse groups of computer vision-based AI researchers with diverse interests and objectives.

The NeSy4VRD research resource in its entirety is distributed across two locations: Zenodo and GitHub.

<strong>NeSy4VRD on Zenodo: the NeSy4VRD dataset package</strong>

This entry on Zenodo hosts the <em>NeSy4VRD dataset package</em>, which includes the <em>NeSy4VRD dataset</em> and its companion <em>NeSy4VRD ontology</em>, an OWL ontology called VRD-World.

The <em>NeSy4VRD dataset</em> consists of an image dataset with associated visual relationship annotations. The images of the <em>NeSy4VRD dataset</em> are the same as those that were once publicly available as part of the VRD dataset. The NeSy4VRD visual relationship annotations are a highly customised and quality-improved version of the original VRD visual relationship annotations.

The <em>NeSy4VRD dataset</em> is designed for computer vision-based research that involves detecting objects in images and predicting relationships between ordered pairs of those objects. A visual relationship for an image of the <em>NeSy4VRD dataset</em> has the form <'subject', 'predicate', 'object'>, where the 'subject' and 'object' are two objects in the image, and the 'predicate' describes some relation between them. Both the 'subject' and the 'object' are specified in terms of bounding boxes and object classes. Representative annotated visual relationships are <'person', 'ride', 'horse'>, <'hat', 'on', 'teddy bear'> and <'cat', 'under', 'pillow'>.

Visual relationship detection is pursued as a computer vision application task in its own right, and as a building-block capability for the broader application task of scene graph generation. Scene graph generation, in turn, is commonly used as a precursor to a variety of enriched, downstream visual understanding and reasoning application tasks, such as image captioning, visual question answering, image retrieval, image generation and multimedia event processing.

The <em>NeSy4VRD ontology</em>, VRD-World, is a rich, well-aligned, companion OWL ontology engineered specifically for use with the <em>NeSy4VRD dataset</em>. It directly describes the domain of the <em>NeSy4VRD dataset</em>, as reflected in the NeSy4VRD visual relationship annotations.
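To make the annotation format concrete, the following is a minimal Python sketch of how the raw annotations might be inspected. It assumes the NeSy4VRD annotations retain the JSON layout of the original VRD annotations (a dictionary keyed by image filename, with integer indices for object classes and predicates); the filenames used here are illustrative, not necessarily the actual distribution filenames.

```python
import json

# Illustrative filenames: the actual names in the NeSy4VRD dataset
# package may differ.
with open('nesy4vrd_annotations_train.json') as f:
    annotations = json.load(f)          # image filename -> list of VRs
with open('nesy4vrd_objects.json') as f:
    object_classes = json.load(f)       # index -> object class name
with open('nesy4vrd_predicates.json') as f:
    predicates = json.load(f)           # index -> predicate name

# Print each visual relationship of one image in readable
# <'subject', 'predicate', 'object'> form, plus the bounding boxes.
image_name = next(iter(annotations))
for vr in annotations[image_name]:
    subj, obj = vr['subject'], vr['object']
    print(f"<{object_classes[subj['category']]}, "
          f"{predicates[vr['predicate']]}, "
          f"{object_classes[obj['category']]}>  "
          f"subject bbox={subj['bbox']}, object bbox={obj['bbox']}")
```

It is exactly these object class names and predicate names that the VRD-World ontology mirrors, as described next.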
More specifically, all of the object classes that feature in the NeSy4VRD visual relationship annotations have corresponding classes within the VRD-World OWL class hierarchy, and all of the predicates that feature in the NeSy4VRD visual relationship annotations have corresponding properties within the VRD-World OWL object property hierarchy. The rich structure of the VRD-World class hierarchy and the rich characteristics and relationships of the VRD-World object properties together give the VRD-World OWL ontology rich inference semantics. These provide ample opportunity for OWL reasoning to be meaningfully exercised and exploited in NeSy research that uses OWL ontologies and OWL-based knowledge graphs as symbolic components. There is also ample potential for NeSy researchers to explore supplementing the OWL reasoning capabilities afforded by the VRD-World ontology with Datalog rules and reasoning.

Use of the <em>NeSy4VRD ontology</em>, VRD-World, in conjunction with the <em>NeSy4VRD dataset</em> is, of course, purely optional. Computer vision AI researchers who have no interest in NeSy, or NeSy researchers who have no interest in OWL ontologies and OWL-based knowledge graphs, can ignore the <em>NeSy4VRD ontology</em> and use the <em>NeSy4VRD dataset</em> by itself. All computer vision-based AI research user groups can, if they wish, also avail themselves of the other components of the NeSy4VRD research resource available on GitHub.

<strong>NeSy4VRD on GitHub: open source infrastructure supporting extensibility, and sample code</strong>

The NeSy4VRD research resource incorporates additional components that are companions to the <em>NeSy4VRD dataset package</em> here on Zenodo. These companion components are available at NeSy4VRD on GitHub and consist of:

- comprehensive open source Python-based infrastructure supporting the extensibility of the NeSy4VRD visual relationship annotations (and, thereby, the extensibility of the <em>NeSy4VRD ontology</em>, VRD-World, as well)
- open source Python sample code showing how one can work with the NeSy4VRD visual relationship annotations in conjunction with the <em>NeSy4VRD ontology</em>, VRD-World, and RDF knowledge graphs (a minimal sketch of this idea appears at the end of this subsection).

The NeSy4VRD infrastructure supporting extensibility consists of:

- open source Python code for conducting deep and comprehensive analyses of the <em>NeSy4VRD dataset</em> (the VRD images and their associated NeSy4VRD visual relationship annotations)
- an open source, custom-designed <em>NeSy4VRD protocol</em> for specifying visual relationship annotation customisation instructions declaratively, in text files
- an open source, custom-designed <em>NeSy4VRD workflow</em>, implemented using Python scripts and modules, for applying small or large volumes of customisations or extensions to the NeSy4VRD visual relationship annotations in a configurable, managed, automated and repeatable process.

The purpose behind providing comprehensive infrastructure to support extensibility of the NeSy4VRD visual relationship annotations is to make it easy for researchers to take the <em>NeSy4VRD dataset</em> in new directions, by further enriching the annotations, or by tailoring them to introduce new or additional data conditions that better suit their particular research needs and interests. The option to use the NeSy4VRD extensibility infrastructure in this way applies equally well to each of the diverse potential NeSy4VRD user groups already mentioned.
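To sketch the kind of thing the GitHub sample code is described as demonstrating (combining the annotations with the VRD-World ontology in an RDF knowledge graph), here is a minimal, hypothetical rdflib example. The ontology filename, namespace IRI, individual names and the exact class and property names are assumptions for illustration; the real ones are defined by the VRD-World ontology itself.

```python
from rdflib import Graph, Namespace, RDF

# Hypothetical namespace IRI; the real one is declared in VRD-World.
VRD = Namespace('http://example.org/vrd-world#')

g = Graph()
g.parse('vrd_world.owl', format='xml')  # filename/format are assumptions

# Assert the annotated visual relationship <'person', 'ride', 'horse'>
# as two typed individuals linked by an object property.
person, horse = VRD['person_01'], VRD['horse_01']
g.add((person, RDF.type, VRD.Person))
g.add((horse, RDF.type, VRD.Horse))
g.add((person, VRD.ride, horse))

# OWL reasoning (e.g. with the owlrl package) could now materialise
# the extra triples licensed by VRD-World's class hierarchy and
# object property characteristics:
#   import owlrl
#   owlrl.DeductiveClosure(owlrl.OWLRL_Semantics).expand(g)
```

After reasoning, a query for the members of a superclass of Person, or for the inverse of the ride property, would surface triples that were never asserted explicitly; this is the sense in which the ontology's inference semantics can be exercised and exploited.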
The NeSy4VRD extensibility infrastructure may be of particular interest to NeSy researchers who wish to use the <em>NeSy4VRD ontology</em>, VRD-World, in conjunction with the <em>NeSy4VRD dataset</em>. These researchers can of course tailor the VRD-World ontology without needing to modify or extend the NeSy4VRD visual relationship annotations in any way. But their degrees of freedom for doing so will be limited by the need to maintain alignment with the NeSy4VRD visual relationship annotations and the particular set of object classes and predicates to which they refer. If NeSy researchers want full freedom to tailor the VRD-World ontology, they may well need to tailor the NeSy4VRD visual relationship annotations first, so that alignment is maintained.

To illustrate this point, and our vision of how the NeSy4VRD extensibility infrastructure can be used, consider a simple example. It is common in computer vision to distinguish between <em>thing</em> objects (which have well-defined shapes) and <em>stuff</em> objects (which are amorphous). Suppose a researcher wishes to have a greater number of <em>stuff</em> object classes with which to work. Water is such a <em>stuff</em> object. Many VRD images contain water, but 'water' is not currently one of the annotated object classes and hence is never referenced in any visual relationship annotations. So adding a <em>Water</em> class to the class hierarchy of the VRD-World ontology would be pointless, because it would never acquire any instances (an object detector would never detect any). However, our hypothetical researcher could choose to do the following:

- use the analysis functionality of the NeSy4VRD extensibility infrastructure to find images containing water (by, say, searching for images whose visual relationships refer to object classes such as 'boat', 'surfboard', 'sand', 'umbrella', etc.; a minimal sketch of this step appears at the end of this description);
- use free image analysis software (such as GIMP, at gimp.org) to get bounding boxes for instances of water in these images;
- use the <em>NeSy4VRD protocol</em> to specify new visual relationships for these images that refer to the new 'water' objects (e.g. <'boat', 'on', 'water'>);
- use the <em>NeSy4VRD workflow</em> to introduce the new object class 'water' and to apply the specified new visual relationships to the sets of annotations for the affected images;
- introduce class Water to the class hierarchy of the VRD-World ontology (using, say, the free Protégé ontology editor);
- continue experimenting, now with the added benefit of the additional <em>stuff</em> object class 'water';
- contribute the enriched set of NeSy4VRD visual relationship annotations, and the enriched companion VRD-World ontology, to the research community.

<strong>Information pertaining to the VRD dataset</strong>

Information about the original VRD dataset is available here. Public availability of the VRD images (via information accessible from that location) ceased sometime in the latter part of 2021. We thank Dr. Ranjay Krishna, one of the principals associated with the VRD dataset, for supporting the renewed public availability of the VRD images as part of the NeSy4VRD dataset package.
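Finally, here is the minimal sketch promised in the water example above: a hypothetical first pass at finding candidate water images by scanning the annotations for water-associated object classes. It is not the NeSy4VRD analysis code itself; the filenames and JSON layout are the same assumptions as in the earlier annotation-format sketch.

```python
import json

with open('nesy4vrd_annotations_train.json') as f:
    annotations = json.load(f)
with open('nesy4vrd_objects.json') as f:
    object_classes = json.load(f)

# Object classes whose presence suggests an image may contain water.
water_hints = {'boat', 'surfboard', 'sand', 'umbrella'}
hint_idxs = {i for i, name in enumerate(object_classes)
             if name in water_hints}

# Keep images with at least one visual relationship whose subject or
# object belongs to a water-associated class.
candidates = sorted(
    img for img, vrs in annotations.items()
    if any(vr['subject']['category'] in hint_idxs or
           vr['object']['category'] in hint_idxs
           for vr in vrs)
)
print(f'{len(candidates)} candidate images to inspect for water')
```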