Efficiency vs. Alignment: Investigating Safety and Fairness Risks in Parameter-Efficient Fine-Tuning of LLMs
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2511.00382
Organizations are increasingly adopting and adapting Large Language Models (LLMs) hosted on public repositories such as HuggingFace. Although these adaptations often improve performance on specialized downstream tasks, recent evidence indicates that they can also degrade a model's safety or fairness. Since different fine-tuning techniques may exert distinct effects on these critical dimensions, this study undertakes a systematic assessment of their trade-offs. Four widely used Parameter-Efficient Fine-Tuning methods, LoRA, IA3, Prompt-Tuning, and P-Tuning, are applied to four instruction-tuned model families (Meta-Llama-3-8B, Qwen2.5-7B, Mistral-7B, and Gemma-7B). In total, 235 fine-tuned variants are evaluated across eleven safety hazard categories and nine demographic fairness dimensions. The results show that adapter-based approaches (LoRA, IA3) tend to improve safety scores and are the least disruptive to fairness, retaining higher accuracy and lower bias scores. In contrast, prompt-based methods (Prompt-Tuning and P-Tuning) generally reduce safety and cause larger fairness regressions, with decreased accuracy and increased bias. Alignment shifts are strongly moderated by base model type: LLaMA remains stable, Qwen records modest gains, Gemma experiences the steepest safety decline, and Mistral, which is released without an internal moderation layer, displays the greatest variance. Improvements in safety do not necessarily translate into improvements in fairness, and no single configuration optimizes all fairness metrics simultaneously, indicating an inherent trade-off between these objectives. These findings suggest a practical guideline for safety-critical deployments: begin with a well-aligned base model, favour adapter-based PEFT, and conduct category-specific audits of both safety and fairness.
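To make the "parameter-efficient" part of the abstract concrete, the sketch below counts the trainable parameters that three of the named methods add for a single layer. It relies only on the standard definitions of those methods (LoRA learns a low-rank update, IA3 learns a per-activation scaling vector, Prompt-Tuning learns virtual-token embeddings). The function names, the width of 4096, the rank of 8, and the 20 virtual tokens are illustrative assumptions and are not taken from the paper. P-Tuning is left out because its count depends on the prompt-encoder architecture.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA learns an update B @ A with A of shape (rank, d_in) and B of shape (d_out, rank)."""
    return rank * d_in + d_out * rank


def ia3_params(d: int) -> int:
    """IA3 learns one scaling vector of length d for each rescaled activation."""
    return d


def prompt_tuning_params(n_virtual_tokens: int, d_model: int) -> int:
    """Prompt-Tuning learns n_virtual_tokens embeddings, each of width d_model."""
    return n_virtual_tokens * d_model


# Illustrative sizes only (assumptions, not the study's configuration).
d = 4096
full_matrix = d * d  # parameters in one dense d x d weight matrix

print(lora_params(d, d, rank=8))        # 65536
print(ia3_params(d))                    # 4096
print(prompt_tuning_params(20, d))      # 81920
print(lora_params(d, d, 8) / full_matrix)  # 0.00390625
```

Under these assumed sizes, each method trains well under one percent of the parameters of a single dense matrix, which is why such adapters are cheap to train and share.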
- Type: preprint
- Landing Page: http://arxiv.org/abs/2511.00382 (PDF: https://arxiv.org/pdf/2511.00382)
- OA Status: green
- OpenAlex ID: https://openalex.org/W4415937521
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4415937521 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2511.00382 (Digital Object Identifier)
- Title: Efficiency vs. Alignment: Investigating Safety and Fairness Risks in Parameter-Efficient Fine-Tuning of LLMs
- Type: preprint (OpenAlex work type)
- Publication year: 2025
- Publication date: 2025-11-01
- Authors: Mina Taraghi, Yann Pequignot, Amin Nikanjam, Mohamed Amine Merzouk, Foutse Khomh (in author order)
- Landing page: https://arxiv.org/abs/2511.00382 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2511.00382 (direct link to full-text PDF)
- Open access: Yes (a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2511.00382 (direct OA link)
- Cited by: 0 (total citation count in OpenAlex)
Full payload
| id | https://openalex.org/W4415937521 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2511.00382 |
| ids.doi | https://doi.org/10.48550/arxiv.2511.00382 |
| ids.openalex | https://openalex.org/W4415937521 |
| fwci | |
| type | preprint |
| title | Efficiency vs. Alignment: Investigating Safety and Fairness Risks in Parameter-Efficient Fine-Tuning of LLMs |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | |
| locations[0].id | pmh:oai:arXiv.org:2511.00382 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2511.00382 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2511.00382 |
| locations[1].id | doi:10.48550/arxiv.2511.00382 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2511.00382 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5023836003 |
| authorships[0].author.orcid | https://orcid.org/0009-0007-8250-4200 |
| authorships[0].author.display_name | Mina Taraghi |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Taraghi, Mina |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5059705050 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-1691-1020 |
| authorships[1].author.display_name | Yann Pequignot |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Pequignot, Yann |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5079607563 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-0440-6839 |
| authorships[2].author.display_name | Amin Nikanjam |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Nikanjam, Amin |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5042550145 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-8016-6806 |
| authorships[3].author.display_name | Mohamed Amine Merzouk |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Merzouk, Mohamed Amine |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5040665094 |
| authorships[4].author.orcid | |
| authorships[4].author.display_name | Foutse Khomh |
| authorships[4].author_position | last |
| authorships[4].raw_author_name | Khomh, Foutse |
| authorships[4].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2511.00382 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-11-05T00:00:00 |
| display_name | Efficiency vs. Alignment: Investigating Safety and Fairness Risks in Parameter-Efficient Fine-Tuning of LLMs |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic | |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2511.00382 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2511.00382 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2511.00382 |
| primary_location.id | pmh:oai:arXiv.org:2511.00382 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2511.00382 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2511.00382 |
| publication_date | 2025-11-01 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 35, 55, 215, 223 |
| abstract_inverted_index.In | 84, 128 |
| abstract_inverted_index.an | 177, 206 |
| abstract_inverted_index.as | 15 |
| abstract_inverted_index.by | 154 |
| abstract_inverted_index.do | 188 |
| abstract_inverted_index.in | 186, 194 |
| abstract_inverted_index.is | 174 |
| abstract_inverted_index.no | 197 |
| abstract_inverted_index.of | 58, 234 |
| abstract_inverted_index.on | 11, 23, 48 |
| abstract_inverted_index.or | 38 |
| abstract_inverted_index.to | 74, 110, 119 |
| abstract_inverted_index.235 | 86 |
| abstract_inverted_index.The | 101 |
| abstract_inverted_index.all | 201 |
| abstract_inverted_index.and | 4, 70, 82, 96, 114, 124, 133, 138, 146, 171, 196, 230, 237 |
| abstract_inverted_index.are | 1, 72, 89, 115, 151 |
| abstract_inverted_index.can | 32 |
| abstract_inverted_index.for | 218 |
| abstract_inverted_index.may | 44 |
| abstract_inverted_index.not | 189 |
| abstract_inverted_index.the | 116, 167, 182 |
| abstract_inverted_index.Four | 61 |
| abstract_inverted_index.IA3) | 108 |
| abstract_inverted_index.IA3, | 68 |
| abstract_inverted_index.Qwen | 161 |
| abstract_inverted_index.also | 33 |
| abstract_inverted_index.base | 155, 225 |
| abstract_inverted_index.bias | 126 |
| abstract_inverted_index.both | 235 |
| abstract_inverted_index.four | 75 |
| abstract_inverted_index.into | 192 |
| abstract_inverted_index.nine | 97 |
| abstract_inverted_index.show | 103 |
| abstract_inverted_index.such | 14 |
| abstract_inverted_index.tend | 109 |
| abstract_inverted_index.that | 30, 104 |
| abstract_inverted_index.they | 31 |
| abstract_inverted_index.this | 52 |
| abstract_inverted_index.used | 63 |
| abstract_inverted_index.with | 143, 222 |
| abstract_inverted_index.Gemma | 165 |
| abstract_inverted_index.LLaMA | 158 |
| abstract_inverted_index.Large | 6 |
| abstract_inverted_index.LoRA, | 67 |
| abstract_inverted_index.PEFT, | 229 |
| abstract_inverted_index.Since | 40 |
| abstract_inverted_index.These | 212 |
| abstract_inverted_index.begin | 221 |
| abstract_inverted_index.bias. | 148 |
| abstract_inverted_index.cause | 139 |
| abstract_inverted_index.exert | 45 |
| abstract_inverted_index.least | 117 |
| abstract_inverted_index.lower | 125 |
| abstract_inverted_index.model | 77, 156 |
| abstract_inverted_index.often | 20 |
| abstract_inverted_index.study | 53 |
| abstract_inverted_index.their | 59 |
| abstract_inverted_index.these | 18, 49, 210 |
| abstract_inverted_index.type: | 157 |
| abstract_inverted_index.which | 173 |
| abstract_inverted_index.(LLMs) | 9 |
| abstract_inverted_index.(LoRA, | 107 |
| abstract_inverted_index.Models | 8 |
| abstract_inverted_index.across | 91 |
| abstract_inverted_index.audits | 233 |
| abstract_inverted_index.eleven | 92 |
| abstract_inverted_index.favour | 227 |
| abstract_inverted_index.gains, | 164 |
| abstract_inverted_index.hazard | 94 |
| abstract_inverted_index.higher | 122 |
| abstract_inverted_index.hosted | 10 |
| abstract_inverted_index.larger | 140 |
| abstract_inverted_index.layer, | 180 |
| abstract_inverted_index.model, | 226 |
| abstract_inverted_index.modest | 163 |
| abstract_inverted_index.public | 12 |
| abstract_inverted_index.recent | 27 |
| abstract_inverted_index.reduce | 136 |
| abstract_inverted_index.safety | 37, 93, 112, 137, 169, 187, 236 |
| abstract_inverted_index.scores | 113 |
| abstract_inverted_index.shifts | 150 |
| abstract_inverted_index.single | 198 |
| abstract_inverted_index.tasks, | 26 |
| abstract_inverted_index.total, | 85 |
| abstract_inverted_index.widely | 62 |
| abstract_inverted_index.applied | 73 |
| abstract_inverted_index.between | 209 |
| abstract_inverted_index.conduct | 231 |
| abstract_inverted_index.degrade | 34 |
| abstract_inverted_index.effects | 47 |
| abstract_inverted_index.improve | 21, 111 |
| abstract_inverted_index.methods | 131 |
| abstract_inverted_index.metrics | 203 |
| abstract_inverted_index.model's | 36 |
| abstract_inverted_index.records | 162 |
| abstract_inverted_index.remains | 159 |
| abstract_inverted_index.results | 102 |
| abstract_inverted_index.scores. | 127 |
| abstract_inverted_index.stable, | 160 |
| abstract_inverted_index.suggest | 214 |
| abstract_inverted_index.without | 176 |
| abstract_inverted_index.Although | 17 |
| abstract_inverted_index.Language | 7 |
| abstract_inverted_index.Mistral, | 172 |
| abstract_inverted_index.accuracy | 123, 145 |
| abstract_inverted_index.adapting | 5 |
| abstract_inverted_index.adopting | 3 |
| abstract_inverted_index.critical | 50 |
| abstract_inverted_index.decline, | 170 |
| abstract_inverted_index.displays | 181 |
| abstract_inverted_index.distinct | 46 |
| abstract_inverted_index.evidence | 28 |
| abstract_inverted_index.fairness | 99, 141, 202 |
| abstract_inverted_index.families | 78 |
| abstract_inverted_index.findings | 213 |
| abstract_inverted_index.greatest | 183 |
| abstract_inverted_index.inherent | 207 |
| abstract_inverted_index.internal | 178 |
| abstract_inverted_index.methods, | 66 |
| abstract_inverted_index.released | 175 |
| abstract_inverted_index.steepest | 168 |
| abstract_inverted_index.strongly | 152 |
| abstract_inverted_index.variants | 88 |
| abstract_inverted_index.Alignment | 149 |
| abstract_inverted_index.P-Tuning) | 134 |
| abstract_inverted_index.P-Tuning, | 71 |
| abstract_inverted_index.contrast, | 129 |
| abstract_inverted_index.decreased | 144 |
| abstract_inverted_index.different | 41 |
| abstract_inverted_index.evaluated | 90 |
| abstract_inverted_index.fairness, | 120, 195 |
| abstract_inverted_index.fairness. | 39, 238 |
| abstract_inverted_index.generally | 135 |
| abstract_inverted_index.guideline | 217 |
| abstract_inverted_index.increased | 147 |
| abstract_inverted_index.indicates | 29 |
| abstract_inverted_index.moderated | 153 |
| abstract_inverted_index.optimizes | 200 |
| abstract_inverted_index.practical | 216 |
| abstract_inverted_index.retaining | 121 |
| abstract_inverted_index.trade-off | 208 |
| abstract_inverted_index.translate | 191 |
| abstract_inverted_index.variance. | 184 |
| abstract_inverted_index.Gemma-7B). | 83 |
| abstract_inverted_index.approaches | 106 |
| abstract_inverted_index.assessment | 57 |
| abstract_inverted_index.categories | 95 |
| abstract_inverted_index.disruptive | 118 |
| abstract_inverted_index.downstream | 25 |
| abstract_inverted_index.fine-tuned | 87 |
| abstract_inverted_index.indicating | 205 |
| abstract_inverted_index.moderation | 179 |
| abstract_inverted_index.systematic | 56 |
| abstract_inverted_index.techniques | 43 |
| abstract_inverted_index.undertakes | 54 |
| abstract_inverted_index.Fine-Tuning | 65 |
| abstract_inverted_index.Mistral-7B, | 81 |
| abstract_inverted_index.Qwen2.5-7B, | 80 |
| abstract_inverted_index.adaptations | 19 |
| abstract_inverted_index.demographic | 98 |
| abstract_inverted_index.dimensions, | 51 |
| abstract_inverted_index.dimensions. | 100 |
| abstract_inverted_index.experiences | 166 |
| abstract_inverted_index.fine-tuning | 42 |
| abstract_inverted_index.necessarily | 190 |
| abstract_inverted_index.objectives. | 211 |
| abstract_inverted_index.performance | 22 |
| abstract_inverted_index.specialized | 24 |
| abstract_inverted_index.trade-offs. | 60 |
| abstract_inverted_index.HuggingFace. | 16 |
| abstract_inverted_index.Improvements | 185 |
| abstract_inverted_index.deployments: | 220 |
| abstract_inverted_index.improvements | 193 |
| abstract_inverted_index.increasingly | 2 |
| abstract_inverted_index.prompt-based | 130 |
| abstract_inverted_index.regressions, | 142 |
| abstract_inverted_index.repositories | 13 |
| abstract_inverted_index.well-aligned | 224 |
| abstract_inverted_index.Organizations | 0 |
| abstract_inverted_index.adapter-based | 105, 228 |
| abstract_inverted_index.configuration | 199 |
| abstract_inverted_index.(Prompt-Tuning | 132 |
| abstract_inverted_index.Prompt-Tuning, | 69 |
| abstract_inverted_index.safety-critical | 219 |
| abstract_inverted_index.simultaneously, | 204 |
| abstract_inverted_index.(Meta-Llama-3-8B, | 79 |
| abstract_inverted_index.category-specific | 232 |
| abstract_inverted_index.instruction-tuned | 76 |
| abstract_inverted_index.Parameter-Efficient | 64 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 5 |
| citation_normalized_percentile |
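
The `abstract_inverted_index` rows in the payload map each token to the word positions where it occurs in the abstract. A minimal sketch of turning such an index back into readable text follows. It assumes the index is available as a dictionary from token to a list of integer positions; `rebuild_abstract` is a hypothetical helper name, and the sample uses only the first ten positions from the table.

```python
def rebuild_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild text from a token -> positions mapping by sorting on position."""
    by_position = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    # Join tokens in ascending position order; missing positions are skipped.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# First ten positions, copied from the payload rows (subset for illustration).
sample = {
    "Organizations": [0],
    "are": [1],
    "increasingly": [2],
    "adopting": [3],
    "and": [4],
    "adapting": [5],
    "Large": [6],
    "Language": [7],
    "Models": [8],
    "(LLMs)": [9],
}

print(rebuild_abstract(sample))
# Organizations are increasingly adopting and adapting Large Language Models (LLMs)
```

Because punctuation stays attached to tokens in the index (for example `HuggingFace.` or `LoRA,`), joining on single spaces is enough to recover the original sentence layout.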