MML: Maximal Multiverse Learning for Robust Fine-Tuning of Language Models

2019 · Open Access · DOI: https://doi.org/10.48550/arxiv.1911.06182
Recent state-of-the-art language models employ a two-phase training procedure comprising (i) unsupervised pre-training on unlabeled text and (ii) fine-tuning for a specific supervised task. More recently, many studies have focused on improving these models by enhancing the pre-training phase, either through a better choice of hyperparameters or by leveraging an improved formulation. However, the pre-training phase is computationally expensive and is often done on private datasets. In this work, we present a method that leverages BERT's fine-tuning phase to its fullest by applying an extensive number of parallel classifier heads, which are enforced to be orthogonal, while adaptively eliminating the weaker heads during training. Our method allows the model to converge to an optimal number of parallel classifiers, depending on the dataset at hand. We conduct extensive inter- and intra-dataset evaluations, showing that our method improves the robustness of BERT, sometimes leading to a +9% gain in accuracy. These results highlight the importance of a proper fine-tuning procedure, especially for relatively small datasets. Our code is attached as supplementary material, and our models will be made publicly available.
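The method described above centers on attaching many parallel classifier heads to a shared BERT encoder, penalizing overlap between them, and pruning weaker heads as training proceeds. The sketch below is a minimal, hypothetical illustration of that idea, assuming PyTorch and the Hugging Face `transformers` package; the head count, the cosine-overlap penalty, and any pruning rule are placeholders, not the authors' MML implementation.

```python
# Minimal, illustrative sketch (not the authors' MML code): a shared BERT
# encoder followed by several parallel classifier heads whose weight vectors
# are pushed toward orthogonality by an auxiliary penalty.
import torch
import torch.nn as nn
from transformers import AutoModel


class MultiverseClassifier(nn.Module):
    def __init__(self, model_name="bert-base-uncased", num_heads=8, num_labels=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        # One linear classifier per "universe"; all heads read the same [CLS] vector.
        self.heads = nn.ModuleList([nn.Linear(hidden, num_labels) for _ in range(num_heads)])

    def forward(self, input_ids, attention_mask):
        cls = self.encoder(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state[:, 0]
        # Per-head logits, shape (num_heads, batch, num_labels).
        return torch.stack([head(cls) for head in self.heads])

    def orthogonality_penalty(self):
        # Penalize pairwise cosine overlap between the heads' weight vectors,
        # nudging the classifiers toward mutually orthogonal directions.
        w = torch.stack([head.weight.flatten() for head in self.heads])
        w = nn.functional.normalize(w, dim=1)
        gram = w @ w.t()
        off_diagonal = gram - torch.eye(gram.size(0), device=gram.device)
        return off_diagonal.pow(2).sum()
```

In such a setup, the training loss would typically be the average per-head cross-entropy plus a small multiple of `orthogonality_penalty()`, and heads with the worst validation loss could be dropped periodically; the paper's actual elimination criterion may differ.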
- Type: preprint
- Language: en
- Landing page: http://arxiv.org/abs/1911.06182
- PDF: https://arxiv.org/pdf/1911.06182
- OA status: green
- References: 5
- Related works: 10
- OpenAlex ID: https://openalex.org/W2986365236
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W2986365236 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.1911.06182 (Digital Object Identifier)
- Title: MML: Maximal Multiverse Learning for Robust Fine-Tuning of Language Models (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2019 (year of publication)
- Publication date: 2019-11-05 (full publication date if available)
- Authors: Itzik Malkiel, Lior Wolf (list of authors in order)
- Landing page: https://arxiv.org/abs/1911.06182 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/1911.06182 (direct link to the full-text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/1911.06182 (direct OA link when available)
- Concepts: Computer science, Artificial intelligence (top concepts/fields attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- References (count): 5 (number of works referenced by this work)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
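All of the fields above are flattened from a single OpenAlex work record. As a hedged illustration, the snippet below retrieves the same record from the public OpenAlex REST API using only the Python standard library; the `mailto` parameter is optional and the address shown is a placeholder.

```python
# Illustrative only: fetch the OpenAlex record summarized on this page.
import json
import urllib.request

WORK_ID = "W2986365236"  # OpenAlex ID from the record above
# The mailto parameter is optional (polite pool); the address is a placeholder.
url = f"https://api.openalex.org/works/{WORK_ID}?mailto=you@example.com"

with urllib.request.urlopen(url) as response:
    work = json.load(response)

print(work["display_name"])           # paper title
print(work["publication_date"])       # 2019-11-05
print(work["open_access"]["oa_url"])  # direct OA link, per the record above
```

The response is the same JSON whose flattened form appears in the "Full payload" table below.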
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W2986365236 |
| doi | https://doi.org/10.48550/arxiv.1911.06182 |
| ids.doi | https://doi.org/10.48550/arxiv.1911.06182 |
| ids.mag | 2986365236 |
| ids.openalex | https://openalex.org/W2986365236 |
| fwci | |
| type | preprint |
| title | MML: Maximal Multiverse Learning for Robust Fine-Tuning of Language Models |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10028 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9998999834060669 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Topic Modeling |
| topics[1].id | https://openalex.org/T10181 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9998000264167786 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Natural Language Processing Techniques |
| topics[2].id | https://openalex.org/T10201 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9980000257492065 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Speech Recognition and Synthesis |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.47309020161628723 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C154945302 |
| concepts[1].level | 1 |
| concepts[1].score | 0.34751981496810913 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[1].display_name | Artificial intelligence |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.47309020161628723 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[1].score | 0.34751981496810913 |
| keywords[1].display_name | Artificial intelligence |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:1911.06182 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/1911.06182 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/1911.06182 |
| locations[1].id | doi:10.48550/arxiv.1911.06182 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.1911.06182 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5067773841 |
| authorships[0].author.orcid | https://orcid.org/0000-0003-4151-9119 |
| authorships[0].author.display_name | Itzik Malkiel |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Itzik Malkiel |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5078102229 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-5578-8892 |
| authorships[1].author.display_name | Lior Wolf |
| authorships[1].author_position | last |
| authorships[1].raw_author_name | Lior Wolf |
| authorships[1].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/1911.06182 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | MML: Maximal Multiverse Learning for Robust Fine-Tuning of Language Models |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10028 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9998999834060669 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Topic Modeling |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2748952813, https://openalex.org/W2390279801, https://openalex.org/W2358668433, https://openalex.org/W2376932109, https://openalex.org/W2001405890, https://openalex.org/W2382290278, https://openalex.org/W2478288626, https://openalex.org/W4391913857, https://openalex.org/W2350741829 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:1911.06182 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/1911.06182 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/1911.06182 |
| primary_location.id | pmh:oai:arXiv.org:1911.06182 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/1911.06182 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/1911.06182 |
| publication_date | 2019-11-05 |
| publication_year | 2019 |
| referenced_works | https://openalex.org/W3104033643, https://openalex.org/W2963748441, https://openalex.org/W2970597249, https://openalex.org/W2963310665, https://openalex.org/W2963846996 |
| referenced_works_count | 5 |
| abstract_inverted_index | (inverted word-position index of the abstract quoted above; individual word entries omitted) |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 2 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/4 |
| sustainable_development_goals[0].score | 0.5899999737739563 |
| sustainable_development_goals[0].display_name | Quality Education |
| citation_normalized_percentile | |
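The abstract_inverted_index entry summarized in the payload stores the abstract as a map from each word to the positions where it occurs, rather than as plain text. Below is a minimal sketch of turning such an index back into a readable string, assuming Python 3.9+; the tiny sample dictionary mirrors the first few entries of this record's index.

```python
# Illustrative helper (not part of the page): rebuild readable abstract text
# from an OpenAlex abstract_inverted_index, which maps each word to the list
# of positions where it appears.
def reconstruct_abstract(inverted_index: dict[str, list[int]]) -> str:
    positions = {}
    for word, idxs in inverted_index.items():
        for idx in idxs:
            positions[idx] = word
    # Order words by position and join them with spaces.
    return " ".join(positions[i] for i in sorted(positions))


# Tiny sample mirroring the start of this record's index:
sample = {"Recent": [0], "state-of-the-art": [1], "language": [2], "models": [3]}
print(reconstruct_abstract(sample))  # -> "Recent state-of-the-art language models"
```

Running it on the full index from the API response should reproduce the abstract quoted at the top of this page.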