Exploring the Plausibility of Hate and Counter Speech Detectors with Explainable AI
Adrian Jaques Böck, Djordje Slijepčević, Matthias Zeppelzauer
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2407.20274
In this paper, we investigate the explainability of transformer models and their plausibility for hate speech and counter speech detection. We compare representatives of four different explainability approaches, i.e., gradient-based, perturbation-based, attention-based, and prototype-based approaches, and analyze them quantitatively with an ablation study and qualitatively in a user study. Results show that perturbation-based explainability performs best, followed by gradient-based and attention-based explainability. Prototype-based experiments did not yield useful results. Overall, we observe that explainability strongly supports users in better understanding the model predictions.
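To make the perturbation-based approach and an ablation-style evaluation concrete, here is a minimal sketch under loudly stated assumptions: it uses a toy bag-of-words logistic classifier with invented weights instead of a transformer, leave-one-token-out occlusion as the perturbation method, and token deletion as the ablation check. All names (`WEIGHTS`, `BIAS`, `predict_proba`, `occlusion_attributions`, `deletion_ablation`) are hypothetical, and the paper's actual models, explainability methods, and ablation protocol may differ.

```python
import math

# HYPOTHETICAL toy lexicon weights for a bag-of-words logistic "hate" classifier.
# Invented for illustration only; not taken from the paper.
WEIGHTS = {"hate": 2.0, "stupid": 1.5, "you": 0.3, "people": 0.2, "love": -1.8, "not": -0.5}
BIAS = -1.0


def predict_proba(tokens):
    """Probability of the positive (hate) class for a list of tokens."""
    logit = BIAS + sum(WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-logit))


def occlusion_attributions(tokens):
    """Perturbation-based attribution: drop each token in turn and record how much
    the predicted probability falls (positive score = token supports the class)."""
    base = predict_proba(tokens)
    scores = []
    for i in range(len(tokens)):
        perturbed = tokens[:i] + tokens[i + 1:]
        scores.append(base - predict_proba(perturbed))
    return scores


def deletion_ablation(tokens, scores, k):
    """Ablation check: remove the k highest-attributed tokens and return the
    resulting drop in predicted probability. A faithful explanation should cause
    a larger drop than removing k random or low-ranked tokens."""
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    removed = set(ranked[:k])
    kept = [t for i, t in enumerate(tokens) if i not in removed]
    return predict_proba(tokens) - predict_proba(kept)


if __name__ == "__main__":
    text = "i hate you stupid people".split()
    attributions = occlusion_attributions(text)
    for token, score in zip(text, attributions):
        print(f"{token:>8s} {score:+.3f}")
    print("drop after removing top-2:", round(deletion_ablation(text, attributions, 2), 3))
```

With these invented weights, "hate" and "stupid" receive the largest attributions, and deleting both lowers the hate probability from about 0.95 to about 0.38. In a faithfulness check one would compare this drop against removing the same number of random tokens; a gradient-based attribution function could be swapped in for `occlusion_attributions` and scored the same way.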
Metadata
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2407.20274, https://arxiv.org/pdf/2407.20274
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4401202655
All OpenAlex metadata
- OpenAlex ID: https://openalex.org/W4401202655 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2407.20274 (Digital Object Identifier)
- Title: Exploring the Plausibility of Hate and Counter Speech Detectors with Explainable AI (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-07-25 (full publication date if available)
- Authors: Adrian Jaques Böck, Djordje Slijepčević, Matthias Zeppelzauer (list of authors in order)
- Landing page: https://arxiv.org/abs/2407.20274 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2407.20274 (direct link to full-text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2407.20274 (direct OA link when available)
- Concepts: Detector, Linguistics, Psychology, Computer science, Speech recognition, Natural language processing, Philosophy, Telecommunications (top fields/topics attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4401202655 |
| doi | https://doi.org/10.48550/arxiv.2407.20274 |
| ids.doi | https://doi.org/10.48550/arxiv.2407.20274 |
| ids.openalex | https://openalex.org/W4401202655 |
| fwci | |
| type | preprint |
| title | Exploring the Plausibility of Hate and Counter Speech Detectors with Explainable AI |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11689 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9825999736785889 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Adversarial Robustness in Machine Learning |
| topics[1].id | https://openalex.org/T12262 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9800000190734863 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Hate Speech and Cyberbullying Detection |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C94915269 |
| concepts[0].level | 2 |
| concepts[0].score | 0.6116741895675659 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q1834857 |
| concepts[0].display_name | Detector |
| concepts[1].id | https://openalex.org/C41895202 |
| concepts[1].level | 1 |
| concepts[1].score | 0.401358962059021 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q8162 |
| concepts[1].display_name | Linguistics |
| concepts[2].id | https://openalex.org/C15744967 |
| concepts[2].level | 0 |
| concepts[2].score | 0.39310216903686523 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q9418 |
| concepts[2].display_name | Psychology |
| concepts[3].id | https://openalex.org/C41008148 |
| concepts[3].level | 0 |
| concepts[3].score | 0.37139618396759033 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[3].display_name | Computer science |
| concepts[4].id | https://openalex.org/C28490314 |
| concepts[4].level | 1 |
| concepts[4].score | 0.3694204092025757 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q189436 |
| concepts[4].display_name | Speech recognition |
| concepts[5].id | https://openalex.org/C204321447 |
| concepts[5].level | 1 |
| concepts[5].score | 0.3207957446575165 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[5].display_name | Natural language processing |
| concepts[6].id | https://openalex.org/C138885662 |
| concepts[6].level | 0 |
| concepts[6].score | 0.21885812282562256 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q5891 |
| concepts[6].display_name | Philosophy |
| concepts[7].id | https://openalex.org/C76155785 |
| concepts[7].level | 1 |
| concepts[7].score | 0.10776135325431824 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q418 |
| concepts[7].display_name | Telecommunications |
| keywords[0].id | https://openalex.org/keywords/detector |
| keywords[0].score | 0.6116741895675659 |
| keywords[0].display_name | Detector |
| keywords[1].id | https://openalex.org/keywords/linguistics |
| keywords[1].score | 0.401358962059021 |
| keywords[1].display_name | Linguistics |
| keywords[2].id | https://openalex.org/keywords/psychology |
| keywords[2].score | 0.39310216903686523 |
| keywords[2].display_name | Psychology |
| keywords[3].id | https://openalex.org/keywords/computer-science |
| keywords[3].score | 0.37139618396759033 |
| keywords[3].display_name | Computer science |
| keywords[4].id | https://openalex.org/keywords/speech-recognition |
| keywords[4].score | 0.3694204092025757 |
| keywords[4].display_name | Speech recognition |
| keywords[5].id | https://openalex.org/keywords/natural-language-processing |
| keywords[5].score | 0.3207957446575165 |
| keywords[5].display_name | Natural language processing |
| keywords[6].id | https://openalex.org/keywords/philosophy |
| keywords[6].score | 0.21885812282562256 |
| keywords[6].display_name | Philosophy |
| keywords[7].id | https://openalex.org/keywords/telecommunications |
| keywords[7].score | 0.10776135325431824 |
| keywords[7].display_name | Telecommunications |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2407.20274 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | cc-by |
| locations[0].pdf_url | https://arxiv.org/pdf/2407.20274 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | |
| locations[0].license_id | https://openalex.org/licenses/cc-by |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2407.20274 |
| locations[1].id | doi:10.48550/arxiv.2407.20274 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2407.20274 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5109570079 |
| authorships[0].author.orcid | https://orcid.org/0000-0003-1972-0473 |
| authorships[0].author.display_name | Adrian Jaques Böck |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Böck, Adrian Jaques |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5055814440 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-2295-7466 |
| authorships[1].author.display_name | Djordje Slijepčević |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Slijepčević, Djordje |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5060926433 |
| authorships[2].author.orcid | https://orcid.org/0000-0003-0413-4746 |
| authorships[2].author.display_name | Matthias Zeppelzauer |
| authorships[2].author_position | last |
| authorships[2].raw_author_name | Zeppelzauer, Matthias |
| authorships[2].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2407.20274 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2024-08-01T00:00:00 |
| display_name | Exploring the Plausibility of Hate and Counter Speech Detectors with Explainable AI |
| has_fulltext | True |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11689 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9825999736785889 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Adversarial Robustness in Machine Learning |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2748952813, https://openalex.org/W2366906938, https://openalex.org/W2349391998, https://openalex.org/W4205655149, https://openalex.org/W2000775715, https://openalex.org/W2931662336, https://openalex.org/W2795393339, https://openalex.org/W2626393719, https://openalex.org/W4390618967 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2407.20274 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | cc-by |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2407.20274 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2407.20274 |
| primary_location.id | pmh:oai:arXiv.org:2407.20274 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | cc-by |
| primary_location.pdf_url | https://arxiv.org/pdf/2407.20274 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | |
| primary_location.license_id | https://openalex.org/licenses/cc-by |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2407.20274 |
| publication_date | 2024-07-25 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 46 |
| abstract_inverted_index.In | 0 |
| abstract_inverted_index.We | 20 |
| abstract_inverted_index.an | 40 |
| abstract_inverted_index.by | 57 |
| abstract_inverted_index.in | 45, 78 |
| abstract_inverted_index.of | 7, 23 |
| abstract_inverted_index.we | 3, 70 |
| abstract_inverted_index.and | 10, 16, 32, 35, 43, 59 |
| abstract_inverted_index.did | 64 |
| abstract_inverted_index.for | 13 |
| abstract_inverted_index.not | 65 |
| abstract_inverted_index.the | 5, 76, 81 |
| abstract_inverted_index.four | 24 |
| abstract_inverted_index.hate | 14 |
| abstract_inverted_index.show | 50 |
| abstract_inverted_index.that | 51, 72 |
| abstract_inverted_index.them | 37 |
| abstract_inverted_index.this | 1 |
| abstract_inverted_index.user | 47 |
| abstract_inverted_index.with | 39 |
| abstract_inverted_index.best, | 55 |
| abstract_inverted_index.i.e., | 28 |
| abstract_inverted_index.model | 82 |
| abstract_inverted_index.paper | 2 |
| abstract_inverted_index.study | 42 |
| abstract_inverted_index.their | 11 |
| abstract_inverted_index.users | 77 |
| abstract_inverted_index.yield | 66 |
| abstract_inverted_index.better | 79 |
| abstract_inverted_index.models | 9 |
| abstract_inverted_index.speech | 15, 18 |
| abstract_inverted_index.study. | 48 |
| abstract_inverted_index.useful | 67 |
| abstract_inverted_index.Results | 49 |
| abstract_inverted_index.analyze | 36 |
| abstract_inverted_index.compare | 21 |
| abstract_inverted_index.counter | 17 |
| abstract_inverted_index.observe | 71 |
| abstract_inverted_index.Overall, | 69 |
| abstract_inverted_index.ablation | 41 |
| abstract_inverted_index.followed | 56 |
| abstract_inverted_index.performs | 54 |
| abstract_inverted_index.results. | 68 |
| abstract_inverted_index.strongly | 74 |
| abstract_inverted_index.supports | 75 |
| abstract_inverted_index.different | 25 |
| abstract_inverted_index.detection. | 19 |
| abstract_inverted_index.approaches, | 27, 34 |
| abstract_inverted_index.experiments | 63 |
| abstract_inverted_index.investigate | 4 |
| abstract_inverted_index.transformer | 8 |
| abstract_inverted_index.plausibility | 12 |
| abstract_inverted_index.predictions. | 83 |
| abstract_inverted_index.qualitatively | 44 |
| abstract_inverted_index.understanding | 80 |
| abstract_inverted_index.Prototypebased | 62 |
| abstract_inverted_index.explainability | 6, 26, 53, 73 |
| abstract_inverted_index.gradient-based | 58 |
| abstract_inverted_index.quantitatively | 38 |
| abstract_inverted_index.attention-based | 60 |
| abstract_inverted_index.explainability. | 61 |
| abstract_inverted_index.gradient-based, | 29 |
| abstract_inverted_index.prototype-based | 33 |
| abstract_inverted_index.representatives | 22 |
| abstract_inverted_index.attention-based, | 31 |
| abstract_inverted_index.perturbation-based | 52 |
| abstract_inverted_index.perturbation-based, | 30 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 3 |
| citation_normalized_percentile | |
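The `abstract_inverted_index.*` rows store the abstract as an inverted index: each word maps to the 0-based positions where it occurs (for example, `speech` at 15 and 18). Below is a minimal sketch of turning such an index back into text. It assumes the index has been loaded as a Python dict from word to a list of positions (the shape used in the raw OpenAlex JSON). `reconstruct_abstract` and `FIRST_SENTENCE_INDEX` are hypothetical names, and the excerpt covers only positions 0 to 19 from the table above.

```python
def reconstruct_abstract(inverted_index):
    """Rebuild plain text from an inverted index mapping each word
    to the list of 0-based positions where it occurs."""
    positions = {}
    for word, idxs in inverted_index.items():
        for i in idxs:
            positions[i] = word
    # Emit words in position order, separated by single spaces.
    return " ".join(positions[i] for i in sorted(positions))


# Excerpt of the index above: positions 0-19 (the first sentence only).
FIRST_SENTENCE_INDEX = {
    "In": [0], "this": [1], "paper": [2], "we": [3], "investigate": [4],
    "the": [5], "explainability": [6], "of": [7], "transformer": [8],
    "models": [9], "and": [10, 16], "their": [11], "plausibility": [12],
    "for": [13], "hate": [14], "speech": [15, 18], "counter": [17],
    "detection.": [19],
}

if __name__ == "__main__":
    print(reconstruct_abstract(FIRST_SENTENCE_INDEX))
```

Because the index is derived from the stored abstract, applying the same function to the full index would reproduce it word for word, including the source spelling "Prototypebased" at position 62.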