Evaluating Large Language Models for Phishing Detection, Self-Consistency, Faithfulness, and Explainability
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2506.13746
Phishing attacks remain one of the most prevalent and persistent cybersecurity threats, with attackers continuously evolving and intensifying their tactics to evade general detection systems. Despite significant advances in artificial intelligence and machine learning, faithfully reproducing the interpretable reasoning, along with the classification and explainability, that underpin phishing judgments remains challenging. With recent advances in Natural Language Processing, Large Language Models (LLMs) offer a promising direction for improving domain-specific phishing classification tasks. However, enhancing the reliability and robustness of classification models requires not only accurate predictions from LLMs but also consistent and trustworthy explanations that align with those predictions. A key question therefore remains: can LLMs not only classify phishing emails accurately but also generate explanations that are reliably aligned with their predictions and internally self-consistent? To answer this question, we fine-tuned transformer-based models, including BERT, Llama models, and Wizard, using Binary Sequence Classification, Contrastive Learning (CL), and Direct Preference Optimization (DPO) to improve their domain relevance and tailor them to phishing-specific distinctions. We then examined their performance in phishing classification and explainability by applying the ConsistenCy measure based on SHAPley values (CC-SHAP), which measures prediction-explanation token alignment, to test each model's internal faithfulness and consistency and to uncover the rationale behind its predictions and reasoning. Overall, our findings show that Llama models exhibit stronger prediction-explanation token alignment, with higher CC-SHAP scores, despite lacking reliable decision-making accuracy, whereas Wizard achieves better prediction accuracy but lower CC-SHAP scores.
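The abstract describes CC-SHAP only as a measure of prediction-explanation token alignment. As a rough illustration of that idea, and not the authors' implementation or the exact CC-SHAP definition, the sketch below compares two per-token attribution vectors with cosine similarity. The function names and the attribution values are hypothetical; in practice the attributions would come from a Shapley-value explainer run once for the predicted label and once for the generated explanation.

```python
import math


def normalize(contribs):
    """Scale attributions so their absolute values sum to 1 (assumed preprocessing)."""
    total = sum(abs(c) for c in contribs)
    if total == 0:
        return [0.0] * len(contribs)
    return [c / total for c in contribs]


def alignment_score(pred_contribs, expl_contribs):
    """Cosine similarity between two per-token attribution vectors.

    pred_contribs: hypothetical attributions of each input token to the label.
    expl_contribs: hypothetical attributions of each input token to the explanation.
    Returns a value in [-1, 1]; higher means the same tokens drive both outputs.
    """
    if len(pred_contribs) != len(expl_contribs):
        raise ValueError("attribution vectors must cover the same input tokens")
    p = normalize(pred_contribs)
    e = normalize(expl_contribs)
    dot = sum(a * b for a, b in zip(p, e))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_e = math.sqrt(sum(b * b for b in e))
    if norm_p == 0 or norm_e == 0:
        return 0.0  # no signal in at least one vector
    return dot / (norm_p * norm_e)


# Made-up attributions for a four-token email snippet.
prediction_attr = [0.6, 0.1, -0.2, 0.1]
explanation_attr = [0.5, 0.2, -0.1, 0.2]
score = alignment_score(prediction_attr, explanation_attr)
```

Under this toy reading, a model whose label and explanation rely on the same tokens scores close to 1, while a model that explains itself using tokens unrelated to its decision scores near 0 or below.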
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2506.13746
- PDF: https://arxiv.org/pdf/2506.13746
- OA Status: green
- OpenAlex ID: https://openalex.org/W4415109347
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4415109347
- DOI: https://doi.org/10.48550/arxiv.2506.13746
- Title: Evaluating Large Language Models for Phishing Detection, Self-Consistency, Faithfulness, and Explainability
- Type: preprint
- Language: en
- Publication year: 2025
- Publication date: 2025-06-16
- Authors: Shova Kuikel, Aritran Piplai, Palvi Aggarwal
- Landing page: https://arxiv.org/abs/2506.13746
- PDF URL: https://arxiv.org/pdf/2506.13746
- Open access: Yes
- OA status: green
- OA URL: https://arxiv.org/pdf/2506.13746
- Cited by: 0
Full payload
| id | https://openalex.org/W4415109347 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2506.13746 |
| ids.doi | https://doi.org/10.48550/arxiv.2506.13746 |
| ids.openalex | https://openalex.org/W4415109347 |
| fwci | |
| type | preprint |
| title | Evaluating Large Language Models for Phishing Detection, Self-Consistency, Faithfulness, and Explainability |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11147 |
| topics[0].field.id | https://openalex.org/fields/33 |
| topics[0].field.display_name | Social Sciences |
| topics[0].score | 0.9733999967575073 |
| topics[0].domain.id | https://openalex.org/domains/2 |
| topics[0].domain.display_name | Social Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/3312 |
| topics[0].subfield.display_name | Sociology and Political Science |
| topics[0].display_name | Misinformation and Its Impacts |
| topics[1].id | https://openalex.org/T10028 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9528999924659729 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Topic Modeling |
| topics[2].id | https://openalex.org/T10664 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.916100025177002 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Sentiment Analysis and Opinion Mining |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2506.13746 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2506.13746 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2506.13746 |
| locations[1].id | doi:10.48550/arxiv.2506.13746 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2506.13746 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5113371669 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Shova Kuikel |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Kuikel, Shova |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5014855298 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-6437-1324 |
| authorships[1].author.display_name | Aritran Piplai |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Piplai, Aritran |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5090959301 |
| authorships[2].author.orcid | https://orcid.org/0000-0003-2488-8959 |
| authorships[2].author.display_name | Palvi Aggarwal |
| authorships[2].author_position | last |
| authorships[2].raw_author_name | Aggarwal, Palvi |
| authorships[2].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2506.13746 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-13T00:00:00 |
| display_name | Evaluating Large Language Models for Phishing Detection, Self-Consistency, Faithfulness, and Explainability |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11147 |
| primary_topic.field.id | https://openalex.org/fields/33 |
| primary_topic.field.display_name | Social Sciences |
| primary_topic.score | 0.9733999967575073 |
| primary_topic.domain.id | https://openalex.org/domains/2 |
| primary_topic.domain.display_name | Social Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/3312 |
| primary_topic.subfield.display_name | Sociology and Political Science |
| primary_topic.display_name | Misinformation and Its Impacts |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2506.13746 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2506.13746 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2506.13746 |
| primary_location.id | pmh:oai:arXiv.org:2506.13746 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2506.13746 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2506.13746 |
| publication_date | 2025-06-16 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 62, 101 |
| abstract_inverted_index.CC | 229, 246 |
| abstract_inverted_index.To | 127, 168 |
| abstract_inverted_index.by | 180 |
| abstract_inverted_index.in | 28, 53, 175 |
| abstract_inverted_index.of | 4, 80 |
| abstract_inverted_index.on | 186 |
| abstract_inverted_index.to | 19, 50, 143, 152, 197 |
| abstract_inverted_index.we | 131, 171 |
| abstract_inverted_index.(CC | 189 |
| abstract_inverted_index.Due | 49 |
| abstract_inverted_index.and | 8, 16, 31, 41, 65, 78, 93, 124, 141, 147, 163, 178, 203, 205, 212 |
| abstract_inverted_index.are | 118 |
| abstract_inverted_index.but | 90, 113, 244 |
| abstract_inverted_index.can | 105 |
| abstract_inverted_index.for | 67 |
| abstract_inverted_index.its | 210 |
| abstract_inverted_index.key | 102 |
| abstract_inverted_index.not | 84, 107 |
| abstract_inverted_index.one | 3 |
| abstract_inverted_index.our | 215 |
| abstract_inverted_index.the | 5, 21, 36, 76, 182, 199, 207 |
| abstract_inverted_index.(CL) | 162 |
| abstract_inverted_index.LLMs | 89, 106 |
| abstract_inverted_index.SHAP | 230, 247 |
| abstract_inverted_index.also | 91, 114 |
| abstract_inverted_index.end, | 170 |
| abstract_inverted_index.from | 88 |
| abstract_inverted_index.have | 132 |
| abstract_inverted_index.make | 148 |
| abstract_inverted_index.more | 150 |
| abstract_inverted_index.most | 6 |
| abstract_inverted_index.only | 85, 108 |
| abstract_inverted_index.show | 61, 217 |
| abstract_inverted_index.test | 198 |
| abstract_inverted_index.that | 43, 117, 169, 218 |
| abstract_inverted_index.them | 149 |
| abstract_inverted_index.with | 12, 39, 97, 121, 227 |
| abstract_inverted_index.BERT, | 138 |
| abstract_inverted_index.Large | 57 |
| abstract_inverted_index.Llama | 139, 219 |
| abstract_inverted_index.based | 135, 185 |
| abstract_inverted_index.evade | 20 |
| abstract_inverted_index.lower | 245 |
| abstract_inverted_index.their | 122, 173 |
| abstract_inverted_index.these | 129 |
| abstract_inverted_index.those | 98 |
| abstract_inverted_index.token | 195, 225 |
| abstract_inverted_index.using | 156 |
| abstract_inverted_index.which | 191 |
| abstract_inverted_index.(DPO). | 167 |
| abstract_inverted_index.(LLMs) | 60 |
| abstract_inverted_index.Binary | 157 |
| abstract_inverted_index.Direct | 164 |
| abstract_inverted_index.Models | 59 |
| abstract_inverted_index.SHAP), | 190 |
| abstract_inverted_index.Wizard | 239 |
| abstract_inverted_index.answer | 128 |
| abstract_inverted_index.behind | 209 |
| abstract_inverted_index.better | 241 |
| abstract_inverted_index.domain | 69, 145 |
| abstract_inverted_index.emails | 111 |
| abstract_inverted_index.higher | 228 |
| abstract_inverted_index.making | 236 |
| abstract_inverted_index.models | 82, 220 |
| abstract_inverted_index.recent | 51 |
| abstract_inverted_index.remain | 2 |
| abstract_inverted_index.scores | 231 |
| abstract_inverted_index.tasks. | 73 |
| abstract_inverted_index.threat | 11 |
| abstract_inverted_index.values | 188 |
| abstract_inverted_index.Despite | 25 |
| abstract_inverted_index.Natural | 54 |
| abstract_inverted_index.SHAPley | 187 |
| abstract_inverted_index.Wizard, | 142 |
| abstract_inverted_index.aligned | 120 |
| abstract_inverted_index.attacks | 1 |
| abstract_inverted_index.despite | 232 |
| abstract_inverted_index.exhibit | 221 |
| abstract_inverted_index.general | 22 |
| abstract_inverted_index.improve | 144 |
| abstract_inverted_index.lacking | 233 |
| abstract_inverted_index.machine | 32 |
| abstract_inverted_index.measure | 184 |
| abstract_inverted_index.model's | 200 |
| abstract_inverted_index.models, | 136, 140 |
| abstract_inverted_index.remains | 47 |
| abstract_inverted_index.scores. | 248 |
| abstract_inverted_index.system. | 24 |
| abstract_inverted_index.tactics | 18 |
| abstract_inverted_index.uncover | 206 |
| abstract_inverted_index.whereas | 238 |
| abstract_inverted_index.However, | 74 |
| abstract_inverted_index.Language | 55, 58 |
| abstract_inverted_index.Learning | 161 |
| abstract_inverted_index.Overall, | 214 |
| abstract_inverted_index.Phishing | 0 |
| abstract_inverted_index.Sequence | 158 |
| abstract_inverted_index.accuracy | 243 |
| abstract_inverted_index.accurate | 86 |
| abstract_inverted_index.achieves | 240 |
| abstract_inverted_index.advances | 27 |
| abstract_inverted_index.aligning | 96 |
| abstract_inverted_index.applying | 181 |
| abstract_inverted_index.classify | 109 |
| abstract_inverted_index.decision | 235 |
| abstract_inverted_index.evolving | 15 |
| abstract_inverted_index.examined | 172 |
| abstract_inverted_index.findings | 216 |
| abstract_inverted_index.generate | 115 |
| abstract_inverted_index.internal | 201 |
| abstract_inverted_index.measures | 192 |
| abstract_inverted_index.phishing | 45, 71, 110, 153, 176 |
| abstract_inverted_index.question | 103 |
| abstract_inverted_index.reliable | 234 |
| abstract_inverted_index.reliably | 119 |
| abstract_inverted_index.remains: | 104 |
| abstract_inverted_index.requires | 83 |
| abstract_inverted_index.specific | 70, 154 |
| abstract_inverted_index.stronger | 222 |
| abstract_inverted_index.tailored | 151 |
| abstract_inverted_index.underpin | 44 |
| abstract_inverted_index.accuracy, | 237 |
| abstract_inverted_index.alignment | 196, 226 |
| abstract_inverted_index.attackers | 13 |
| abstract_inverted_index.detection | 23 |
| abstract_inverted_index.direction | 64 |
| abstract_inverted_index.enhancing | 75 |
| abstract_inverted_index.improving | 68 |
| abstract_inverted_index.including | 137 |
| abstract_inverted_index.judgments | 46 |
| abstract_inverted_index.learning, | 33 |
| abstract_inverted_index.potential | 66 |
| abstract_inverted_index.prevalent | 7 |
| abstract_inverted_index.promising | 63 |
| abstract_inverted_index.rationale | 208 |
| abstract_inverted_index.reasoning | 38 |
| abstract_inverted_index.relevance | 146 |
| abstract_inverted_index.Preference | 165 |
| abstract_inverted_index.Therefore, | 100 |
| abstract_inverted_index.accurately | 112 |
| abstract_inverted_index.artificial | 29 |
| abstract_inverted_index.consistent | 92 |
| abstract_inverted_index.faithfully | 34 |
| abstract_inverted_index.fine-tuned | 133 |
| abstract_inverted_index.internally | 125 |
| abstract_inverted_index.persistent | 9 |
| abstract_inverted_index.prediction | 193, 223, 242 |
| abstract_inverted_index.questions, | 130 |
| abstract_inverted_index.reasoning. | 213 |
| abstract_inverted_index.robustness | 79 |
| abstract_inverted_index.ConsistenCy | 183 |
| abstract_inverted_index.Contrastive | 160 |
| abstract_inverted_index.Processing, | 56 |
| abstract_inverted_index.advancement | 52 |
| abstract_inverted_index.consistency | 204 |
| abstract_inverted_index.explanation | 194, 224 |
| abstract_inverted_index.performance | 174 |
| abstract_inverted_index.predictions | 87, 123, 211 |
| abstract_inverted_index.reliability | 77 |
| abstract_inverted_index.reproducing | 35 |
| abstract_inverted_index.significant | 26 |
| abstract_inverted_index.transformer | 134 |
| abstract_inverted_index.trustworthy | 94 |
| abstract_inverted_index.Optimization | 166 |
| abstract_inverted_index.challenging. | 48 |
| abstract_inverted_index.continuously | 14 |
| abstract_inverted_index.explanations | 95, 116 |
| abstract_inverted_index.faithfulness | 202 |
| abstract_inverted_index.intelligence | 30 |
| abstract_inverted_index.intensifying | 17 |
| abstract_inverted_index.predictions. | 99 |
| abstract_inverted_index.cybersecurity | 10 |
| abstract_inverted_index.distinctions, | 155 |
| abstract_inverted_index.interpretable | 37 |
| abstract_inverted_index.classification | 40, 72, 81, 177 |
| abstract_inverted_index.explainability | 42, 179 |
| abstract_inverted_index.Classification, | 159 |
| abstract_inverted_index.self-consistent? | 126 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 3 |
| citation_normalized_percentile |
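
The `abstract_inverted_index.*` rows in the payload store the abstract as a map from each word to the positions where it occurs. A minimal sketch of turning such a map back into running text follows; the function name `rebuild_abstract` is hypothetical, and the sample dictionary is a hand-copied subset of the rows above (only the first position of each word), not the full index.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a {word: [positions]} inverted index."""
    by_position = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = word
    # Join words in position order; gaps in the index are simply skipped.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Subset of the payload rows, truncated to the opening words of the abstract.
sample_index = {
    "Phishing": [0],
    "attacks": [1],
    "remain": [2],
    "one": [3],
    "of": [4],
    "the": [5],
    "most": [6],
    "prevalent": [7],
}

print(rebuild_abstract(sample_index))
# prints "Phishing attacks remain one of the most prevalent"
```

Because a word can appear at several positions (for example `and` or `the`), the sketch iterates over every position rather than assuming one entry per word.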