Surfacing Semantic Orthogonality Across Model Safety Benchmarks: A Multidimensional Analysis
2025 · Open Access · DOI: https://doi.org/10.5121/csit.2025.150903
Various AI safety datasets have been developed to measure LLMs against evolving interpretations of harm. Our evaluation of five recently published open-source safety benchmarks reveals distinct semantic clusters using UMAP dimensionality reduction and k-means clustering (silhouette score: 0.470). We identify six primary harm categories with varying benchmark representation. GretelAI, for example, focuses heavily on privacy concerns, while WildGuardMix emphasizes self-harm scenarios. Significant differences in prompt length distributions suggest confounds in data collection and in interpretations of harm, and may also offer possible context. Our analysis quantifies orthogonality among AI safety benchmarks, making coverage gaps transparent despite topical similarities. This quantitative framework for analyzing semantic orthogonality across safety benchmarks enables more targeted development of datasets that comprehensively address the evolving landscape of harms in AI use, however that is defined in the future.
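The abstract reports clustering quality as a silhouette score. As a rough illustration of what that number measures, here is a minimal pure-Python sketch of the mean silhouette coefficient. It is not the authors' pipeline: the embedding, UMAP, and k-means steps are omitted, Euclidean distance is an assumption, and the toy points and labels are hypothetical stand-ins for already-reduced prompt embeddings and their cluster assignments.

```python
from math import dist
from statistics import mean


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance assumed)."""
    # Group points by their cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself adds nothing to the sum).
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            mean(dist(point, q) for q in members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


# Hypothetical 2-D points standing in for reduced prompt embeddings.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
well_separated = silhouette_score(toy_points, [0, 0, 1, 1])
badly_assigned = silhouette_score(toy_points, [0, 1, 0, 1])
print(f"well separated: {well_separated:.3f}")
print(f"badly assigned: {badly_assigned:.3f}")
```

Scores near 1 indicate compact, well-separated clusters, scores near 0 indicate overlapping clusters, and negative scores suggest points assigned to the wrong cluster. On that scale, the reported 0.470 reads as moderate separation.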
Related Topics
- Type
- article
- Language
- en
- Landing Page
- https://doi.org/10.5121/csit.2025.150903
- OA Status
- gold
- Related Works
- 10
- OpenAlex ID
- https://openalex.org/W4410509893
Raw OpenAlex JSON
- OpenAlex ID
- https://openalex.org/W4410509893 (canonical identifier for this work in OpenAlex)
- DOI
- https://doi.org/10.5121/csit.2025.150903 (Digital Object Identifier)
- Title
- Surfacing Semantic Orthogonality Across Model Safety Benchmarks: A Multidimensional Analysis (work title)
- Type
- article (OpenAlex work type)
- Language
- en (primary language)
- Publication year
- 2025 (year of publication)
- Publication date
- 2025-05-20 (full publication date if available)
- Authors
- Jonathan Bennion, Shaona Ghosh, Mantek Singh, Nouha Dziri (list of authors in order)
- Landing page
- https://doi.org/10.5121/csit.2025.150903 (publisher landing page)
- PDF URL
- https://doi.org/10.5121/csit.2025.150903 (direct link to full text PDF)
- Open access
- Yes (whether a free full text is available)
- OA status
- gold (open access status per OpenAlex)
- OA URL
- https://doi.org/10.5121/csit.2025.150903 (direct OA link when available)
- Concepts
- Orthogonality, Computer science, Mathematics, Geometry (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by
- 0 (total citation count in OpenAlex)
- Related works (count)
- 10 (other works algorithmically related by OpenAlex)
Full payload
| field | value |
|---|---|
| id | https://openalex.org/W4410509893 |
| doi | https://doi.org/10.5121/csit.2025.150903 |
| ids.doi | https://doi.org/10.5121/csit.2025.150903 |
| ids.openalex | https://openalex.org/W4410509893 |
| fwci | 0.0 |
| type | article |
| title | Surfacing Semantic Orthogonality Across Model Safety Benchmarks: A Multidimensional Analysis |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | 38 |
| biblio.first_page | 27 |
| topics[0].id | https://openalex.org/T12423 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.878600001335144 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1712 |
| topics[0].subfield.display_name | Software |
| topics[0].display_name | Software Reliability and Analysis Research |
| topics[1].id | https://openalex.org/T13295 |
| topics[1].field.id | https://openalex.org/fields/22 |
| topics[1].field.display_name | Engineering |
| topics[1].score | 0.8690999746322632 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/2213 |
| topics[1].subfield.display_name | Safety, Risk, Reliability and Quality |
| topics[1].display_name | Safety Systems Engineering in Autonomy |
| topics[2].id | https://openalex.org/T11357 |
| topics[2].field.id | https://openalex.org/fields/18 |
| topics[2].field.display_name | Decision Sciences |
| topics[2].score | 0.8657000064849854 |
| topics[2].domain.id | https://openalex.org/domains/2 |
| topics[2].domain.display_name | Social Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1804 |
| topics[2].subfield.display_name | Statistics, Probability and Uncertainty |
| topics[2].display_name | Risk and Safety Analysis |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C17137986 |
| concepts[0].level | 2 |
| concepts[0].score | 0.8110727071762085 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q215067 |
| concepts[0].display_name | Orthogonality |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.7091143131256104 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C33923547 |
| concepts[2].level | 0 |
| concepts[2].score | 0.11457091569900513 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[2].display_name | Mathematics |
| concepts[3].id | https://openalex.org/C2524010 |
| concepts[3].level | 1 |
| concepts[3].score | 0.0 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q8087 |
| concepts[3].display_name | Geometry |
| keywords[0].id | https://openalex.org/keywords/orthogonality |
| keywords[0].score | 0.8110727071762085 |
| keywords[0].display_name | Orthogonality |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.7091143131256104 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/mathematics |
| keywords[2].score | 0.11457091569900513 |
| keywords[2].display_name | Mathematics |
| language | en |
| locations[0].id | doi:10.5121/csit.2025.150903 |
| locations[0].is_oa | True |
| locations[0].source | |
| locations[0].license | |
| locations[0].pdf_url | https://doi.org/10.5121/csit.2025.150903 |
| locations[0].version | publishedVersion |
| locations[0].raw_type | proceedings-article |
| locations[0].license_id | |
| locations[0].is_accepted | True |
| locations[0].is_published | True |
| locations[0].raw_source_name | Advanced Natural Language Processing 2025 |
| locations[0].landing_page_url | https://doi.org/10.5121/csit.2025.150903 |
| locations[1].id | pmh:oai:arXiv.org:2505.17636 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by-sa |
| locations[1].pdf_url | https://arxiv.org/pdf/2505.17636 |
| locations[1].version | submittedVersion |
| locations[1].raw_type | text |
| locations[1].license_id | https://openalex.org/licenses/cc-by-sa |
| locations[1].is_accepted | False |
| locations[1].is_published | False |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | http://arxiv.org/abs/2505.17636 |
| indexed_in | arxiv, crossref |
| authorships[0].author.id | https://openalex.org/A5117602716 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Jonathan Bennion |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Jonathan Bennion |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5021733871 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-8624-1494 |
| authorships[1].author.display_name | Sanjay Ghosh |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Shaona Ghosh |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5103749697 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | M. P. Singh |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Mantek Singh |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5049618494 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Nouha Dziri |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Nouha Dziri |
| authorships[3].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://doi.org/10.5121/csit.2025.150903 |
| open_access.oa_status | gold |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Surfacing Semantic Orthogonality Across Model Safety Benchmarks: A Multidimensional Analysis |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T03:46:38.306776 |
| primary_topic.id | https://openalex.org/T12423 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.878600001335144 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1712 |
| primary_topic.subfield.display_name | Software |
| primary_topic.display_name | Software Reliability and Analysis Research |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2899084033, https://openalex.org/W2748952813, https://openalex.org/W2095582735, https://openalex.org/W2059318893, https://openalex.org/W1965698851, https://openalex.org/W834942123, https://openalex.org/W4232542516, https://openalex.org/W1967331680, https://openalex.org/W3176637561 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | doi:10.5121/csit.2025.150903 |
| best_oa_location.is_oa | True |
| best_oa_location.source | |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://doi.org/10.5121/csit.2025.150903 |
| best_oa_location.version | publishedVersion |
| best_oa_location.raw_type | proceedings-article |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | True |
| best_oa_location.is_published | True |
| best_oa_location.raw_source_name | Advanced Natural Language Processing 2025 |
| best_oa_location.landing_page_url | https://doi.org/10.5121/csit.2025.150903 |
| primary_location.id | doi:10.5121/csit.2025.150903 |
| primary_location.is_oa | True |
| primary_location.source | |
| primary_location.license | |
| primary_location.pdf_url | https://doi.org/10.5121/csit.2025.150903 |
| primary_location.version | publishedVersion |
| primary_location.raw_type | proceedings-article |
| primary_location.license_id | |
| primary_location.is_accepted | True |
| primary_location.is_published | True |
| primary_location.raw_source_name | Advanced Natural Language Processing 2025 |
| primary_location.landing_page_url | https://doi.org/10.5121/csit.2025.150903 |
| publication_date | 2025-05-20 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.AI | 1, 88, 124 |
| abstract_inverted_index.We | 38 |
| abstract_inverted_index.as | 76, 78 |
| abstract_inverted_index.in | 63, 93, 123, 130 |
| abstract_inverted_index.is | 128 |
| abstract_inverted_index.of | 13, 17, 74, 113, 121 |
| abstract_inverted_index.on | 53 |
| abstract_inverted_index.to | 7, 69 |
| abstract_inverted_index.Our | 15, 82, 99 |
| abstract_inverted_index.and | 32, 72 |
| abstract_inverted_index.for | 49, 91, 102 |
| abstract_inverted_index.six | 40 |
| abstract_inverted_index.the | 118, 131 |
| abstract_inverted_index.LLMs | 9 |
| abstract_inverted_index.UMAP | 29 |
| abstract_inverted_index.been | 5 |
| abstract_inverted_index.data | 70 |
| abstract_inverted_index.five | 18 |
| abstract_inverted_index.gaps | 95 |
| abstract_inverted_index.harm | 42, 75 |
| abstract_inverted_index.have | 4 |
| abstract_inverted_index.more | 110 |
| abstract_inverted_index.that | 115, 127 |
| abstract_inverted_index.use, | 125 |
| abstract_inverted_index.well | 77 |
| abstract_inverted_index.with | 44 |
| abstract_inverted_index.among | 87 |
| abstract_inverted_index.harm. | 14 |
| abstract_inverted_index.harms | 122 |
| abstract_inverted_index.offer | 79 |
| abstract_inverted_index.using | 28 |
| abstract_inverted_index.while | 56 |
| abstract_inverted_index.across | 106 |
| abstract_inverted_index.kmeans | 33 |
| abstract_inverted_index.length | 65 |
| abstract_inverted_index.prompt | 64 |
| abstract_inverted_index.safety | 2, 22, 107 |
| abstract_inverted_index.score: | 36 |
| abstract_inverted_index.0.470). | 37 |
| abstract_inverted_index.Various | 0 |
| abstract_inverted_index.address | 117 |
| abstract_inverted_index.against | 10 |
| abstract_inverted_index.defined | 129 |
| abstract_inverted_index.despite | 96 |
| abstract_inverted_index.enables | 109 |
| abstract_inverted_index.focuses | 51 |
| abstract_inverted_index.future. | 132 |
| abstract_inverted_index.heavily | 52 |
| abstract_inverted_index.however | 126 |
| abstract_inverted_index.measure | 8 |
| abstract_inverted_index.primary | 41 |
| abstract_inverted_index.privacy | 54 |
| abstract_inverted_index.reveals | 24 |
| abstract_inverted_index.topical | 97 |
| abstract_inverted_index.varying | 45 |
| abstract_inverted_index.allowing | 90 |
| abstract_inverted_index.analysis | 83 |
| abstract_inverted_index.clusters | 27 |
| abstract_inverted_index.context. | 81 |
| abstract_inverted_index.coverage | 94 |
| abstract_inverted_index.datasets | 3, 114 |
| abstract_inverted_index.distinct | 25 |
| abstract_inverted_index.evolving | 11, 119 |
| abstract_inverted_index.example, | 50 |
| abstract_inverted_index.identify | 39 |
| abstract_inverted_index.possible | 80 |
| abstract_inverted_index.recently | 19 |
| abstract_inverted_index.semantic | 26, 104 |
| abstract_inverted_index.suggests | 67 |
| abstract_inverted_index.targeted | 111 |
| abstract_inverted_index.GretelAI, | 48 |
| abstract_inverted_index.analyzing | 103 |
| abstract_inverted_index.benchmark | 46, 85 |
| abstract_inverted_index.concerns, | 55 |
| abstract_inverted_index.confounds | 68 |
| abstract_inverted_index.developed | 6 |
| abstract_inverted_index.framework | 101 |
| abstract_inverted_index.landscape | 120 |
| abstract_inverted_index.published | 20 |
| abstract_inverted_index.reduction | 31 |
| abstract_inverted_index.self-harm | 59 |
| abstract_inverted_index.benchmarks | 23, 108 |
| abstract_inverted_index.categories | 43 |
| abstract_inverted_index.clustering | 34 |
| abstract_inverted_index.collection | 71 |
| abstract_inverted_index.emphasizes | 58 |
| abstract_inverted_index.evaluation | 16 |
| abstract_inverted_index.quantifies | 84 |
| abstract_inverted_index.scenarios. | 60 |
| abstract_inverted_index.(silhouette | 35 |
| abstract_inverted_index.Significant | 61 |
| abstract_inverted_index.benchmarks, | 89 |
| abstract_inverted_index.development | 112 |
| abstract_inverted_index.differences | 62 |
| abstract_inverted_index.open-source | 21 |
| abstract_inverted_index.WildGuardMix | 57 |
| abstract_inverted_index.distribution | 66 |
| abstract_inverted_index.quantitative | 100 |
| abstract_inverted_index.transparency | 92 |
| abstract_inverted_index.orthogonality | 86, 105 |
| abstract_inverted_index.similarities. | 98 |
| abstract_inverted_index.dimensionality | 30 |
| abstract_inverted_index.comprehensively | 116 |
| abstract_inverted_index.interpretations | 12, 73 |
| abstract_inverted_index.representation. | 47 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 4 |
| citation_normalized_percentile.value | 0.18118949 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | True |
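The `abstract_inverted_index.*` rows in the payload above store the abstract as a mapping from each token to the 0-based positions where it occurs. A minimal sketch of turning such a mapping back into readable text follows. The function name `rebuild_abstract` is hypothetical, and the sample dictionary is deliberately truncated to the tokens at positions 0 through 9, so later occurrences of "AI", "safety", "datasets", and "to" are left out.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a mapping of token -> list of 0-based word positions."""
    # Invert the mapping: position -> token.
    by_position = {}
    for token, positions in inverted_index.items():
        for position in positions:
            by_position[position] = token
    # Emit tokens in positional order, separated by single spaces.
    return " ".join(by_position[i] for i in sorted(by_position))


# Truncated sample: only positions 0-9 from the payload rows are included.
sample_index = {
    "Various": [0],
    "AI": [1],
    "safety": [2],
    "datasets": [3],
    "have": [4],
    "been": [5],
    "developed": [6],
    "to": [7],
    "measure": [8],
    "LLMs": [9],
}

opening = rebuild_abstract(sample_index)
print(opening)
```

Passing the full mapping, with every position list intact, should reproduce the abstract word for word, since punctuation is stored as part of the tokens (for example `harm.` and `0.470).`).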