Training Flow Matching Models with Reliable Labels via Self-Purification
2025 · Open Access
· DOI: https://doi.org/10.48550/arxiv.2509.19091
Training datasets are inherently imperfect, often containing mislabeled samples due to human annotation errors, limitations of tagging models, and other sources of noise. Such label contamination can significantly degrade the performance of a trained model. In this work, we introduce Self-Purifying Flow Matching (SPFM), a principled approach to filtering unreliable data within the flow-matching framework. SPFM identifies suspicious data using the model itself during the training process, bypassing the need for pretrained models or additional modules. Our experiments demonstrate that models trained with SPFM generate samples that accurately adhere to the specified conditioning, even when trained on noisy labels. Furthermore, we validate the robustness of SPFM on the TITW dataset, which consists of in-the-wild speech data, achieving performance that surpasses existing baselines.
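
The idea of letting the model flag its own suspicious labels during training can be illustrated with a small sketch. The abstract does not state which criterion SPFM uses, so the rule below (keep a label only when conditioning on it does not increase the per-sample flow-matching loss relative to the unconditional prediction) is an assumption made for illustration, not the authors' confirmed method. The toy velocity function, the class means, and all function names are likewise hypothetical stand-ins for a trained network and real data.

```python
import numpy as np

# Hypothetical toy setup: two classes in 2-D with known means.
CLASS_MEANS = np.array([[-2.0, 0.0], [2.0, 0.0]])

def toy_velocity(x_t, t, labels=None):
    """Stand-in for a velocity network v(x_t, t, c).

    With a label it points at that class mean; with labels=None
    (the unconditional branch) it points at the mean of all classes.
    """
    if labels is None:
        target = np.broadcast_to(CLASS_MEANS.mean(axis=0), x_t.shape)
    else:
        target = CLASS_MEANS[labels]
    return (target - x_t) / (1.0 - t)[:, None]

def per_sample_fm_loss(v_pred, x0, x1):
    """Squared error against the straight-path target velocity x1 - x0."""
    return np.mean((v_pred - (x1 - x0)) ** 2, axis=1)

def reliable_mask(x1, labels, rng):
    """Assumed rule: a label is kept when conditioning on it does not hurt."""
    x0 = rng.standard_normal(x1.shape)        # noise endpoint of the path
    t = rng.uniform(0.0, 0.9, size=len(x1))   # stay away from t = 1 in this toy
    x_t = (1.0 - t)[:, None] * x0 + t[:, None] * x1
    cond = per_sample_fm_loss(toy_velocity(x_t, t, labels), x0, x1)
    uncond = per_sample_fm_loss(toy_velocity(x_t, t), x0, x1)
    return cond <= uncond

rng = np.random.default_rng(0)
true_classes = np.array([0, 0, 1, 1])
x1 = CLASS_MEANS[true_classes] + 0.1 * rng.standard_normal((4, 2))
noisy_labels = np.array([0, 1, 1, 0])         # samples 1 and 3 are mislabeled
mask = reliable_mask(x1, noisy_labels, rng)
print(mask)
```

In this toy run the two deliberately mislabeled samples are the ones that fail the check. A training loop built on such a mask could drop those samples or train on them without their label; which of these SPFM does is not specified in the abstract.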
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2509.19091 (PDF: https://arxiv.org/pdf/2509.19091)
- OA Status: green
- OpenAlex ID: https://openalex.org/W4415251551
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4415251551 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2509.19091 (Digital Object Identifier)
- Title: Training Flow Matching Models with Reliable Labels via Self-Purification (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025 (year of publication)
- Publication date: 2025-09-23 (full publication date if available)
- Authors: Hyeongju Kim, Yechan Yu, J. I. Yi, Juheon Lee (list of authors in order)
- Landing page: https://arxiv.org/abs/2509.19091 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2509.19091 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2509.19091 (direct OA link when available)
- Cited by: 0 (total citation count in OpenAlex)
Full payload
| id | https://openalex.org/W4415251551 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2509.19091 |
| ids.doi | https://doi.org/10.48550/arxiv.2509.19091 |
| ids.openalex | https://openalex.org/W4415251551 |
| fwci | |
| type | preprint |
| title | Training Flow Matching Models with Reliable Labels via Self-Purification |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10791 |
| topics[0].field.id | https://openalex.org/fields/22 |
| topics[0].field.display_name | Engineering |
| topics[0].score | 0.9473999738693237 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/2207 |
| topics[0].subfield.display_name | Control and Systems Engineering |
| topics[0].display_name | Advanced Control Systems Optimization |
| topics[1].id | https://openalex.org/T12761 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9344000220298767 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Data Stream Mining Techniques |
| topics[2].id | https://openalex.org/T11801 |
| topics[2].field.id | https://openalex.org/fields/22 |
| topics[2].field.display_name | Engineering |
| topics[2].score | 0.9010000228881836 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/2212 |
| topics[2].subfield.display_name | Ocean Engineering |
| topics[2].display_name | Reservoir Engineering and Simulation Methods |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2509.19091 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2509.19091 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2509.19091 |
| locations[1].id | doi:10.48550/arxiv.2509.19091 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2509.19091 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5047164073 |
| authorships[0].author.orcid | https://orcid.org/0000-0001-8668-0323 |
| authorships[0].author.display_name | Hyeongju Kim |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Kim, Hyeongju |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5084586005 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Yechan Yu |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Yu, Yechan |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5107828217 |
| authorships[2].author.orcid | https://orcid.org/0000-0001-8380-1134 |
| authorships[2].author.display_name | J. I. Yi |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Yi, June Young |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5065510735 |
| authorships[3].author.orcid | https://orcid.org/0000-0003-4516-8766 |
| authorships[3].author.display_name | Juheon Lee |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Lee, Juheon |
| authorships[3].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2509.19091 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-16T00:00:00 |
| display_name | Training Flow Matching Models with Reliable Labels via Self-Purification |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10791 |
| primary_topic.field.id | https://openalex.org/fields/22 |
| primary_topic.field.display_name | Engineering |
| primary_topic.score | 0.9473999738693237 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/2207 |
| primary_topic.subfield.display_name | Control and Systems Engineering |
| primary_topic.display_name | Advanced Control Systems Optimization |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2509.19091 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2509.19091 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2509.19091 |
| primary_location.id | pmh:oai:arXiv.org:2509.19091 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2509.19091 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2509.19091 |
| publication_date | 2025-09-23 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 32, 44 |
| abstract_inverted_index.In | 35 |
| abstract_inverted_index.of | 15, 21, 31, 104, 112 |
| abstract_inverted_index.on | 96, 106 |
| abstract_inverted_index.or | 73 |
| abstract_inverted_index.to | 10, 47, 89 |
| abstract_inverted_index.we | 38, 100 |
| abstract_inverted_index.Our | 76 |
| abstract_inverted_index.and | 18 |
| abstract_inverted_index.are | 2 |
| abstract_inverted_index.can | 26 |
| abstract_inverted_index.due | 9 |
| abstract_inverted_index.for | 70 |
| abstract_inverted_index.the | 29, 52, 60, 64, 68, 90, 102, 107 |
| abstract_inverted_index.Flow | 41 |
| abstract_inverted_index.SPFM | 55, 83, 105 |
| abstract_inverted_index.Such | 23 |
| abstract_inverted_index.TITW | 108 |
| abstract_inverted_index.data | 50, 58 |
| abstract_inverted_index.even | 93 |
| abstract_inverted_index.need | 69 |
| abstract_inverted_index.that | 79, 86, 118 |
| abstract_inverted_index.this | 36 |
| abstract_inverted_index.when | 94 |
| abstract_inverted_index.with | 82 |
| abstract_inverted_index.data, | 115 |
| abstract_inverted_index.human | 11 |
| abstract_inverted_index.label | 24 |
| abstract_inverted_index.model | 61 |
| abstract_inverted_index.noisy | 97 |
| abstract_inverted_index.often | 5 |
| abstract_inverted_index.other | 19 |
| abstract_inverted_index.using | 59 |
| abstract_inverted_index.which | 110 |
| abstract_inverted_index.work, | 37 |
| abstract_inverted_index.adhere | 88 |
| abstract_inverted_index.during | 63 |
| abstract_inverted_index.itself | 62 |
| abstract_inverted_index.model. | 34 |
| abstract_inverted_index.models | 72, 80 |
| abstract_inverted_index.noise. | 22 |
| abstract_inverted_index.speech | 114 |
| abstract_inverted_index.within | 51 |
| abstract_inverted_index.(SPFM), | 43 |
| abstract_inverted_index.degrade | 28 |
| abstract_inverted_index.errors, | 13 |
| abstract_inverted_index.labels. | 98 |
| abstract_inverted_index.models, | 17 |
| abstract_inverted_index.samples | 8, 85 |
| abstract_inverted_index.sources | 20 |
| abstract_inverted_index.tagging | 16 |
| abstract_inverted_index.trained | 33, 81, 95 |
| abstract_inverted_index.Matching | 42 |
| abstract_inverted_index.Training | 0 |
| abstract_inverted_index.approach | 46 |
| abstract_inverted_index.consists | 111 |
| abstract_inverted_index.dataset, | 109 |
| abstract_inverted_index.datasets | 1 |
| abstract_inverted_index.existing | 120 |
| abstract_inverted_index.generate | 84 |
| abstract_inverted_index.modules. | 75 |
| abstract_inverted_index.process, | 66 |
| abstract_inverted_index.training | 65 |
| abstract_inverted_index.validate | 101 |
| abstract_inverted_index.achieving | 116 |
| abstract_inverted_index.bypassing | 67 |
| abstract_inverted_index.filtering | 48 |
| abstract_inverted_index.introduce | 39 |
| abstract_inverted_index.specified | 91 |
| abstract_inverted_index.surpasses | 119 |
| abstract_inverted_index.accurately | 87 |
| abstract_inverted_index.additional | 74 |
| abstract_inverted_index.annotation | 12 |
| abstract_inverted_index.baselines. | 121 |
| abstract_inverted_index.containing | 6 |
| abstract_inverted_index.framework. | 54 |
| abstract_inverted_index.identifies | 56 |
| abstract_inverted_index.imperfect, | 4 |
| abstract_inverted_index.inherently | 3 |
| abstract_inverted_index.mislabeled | 7 |
| abstract_inverted_index.pretrained | 71 |
| abstract_inverted_index.principled | 45 |
| abstract_inverted_index.robustness | 103 |
| abstract_inverted_index.suspicious | 57 |
| abstract_inverted_index.unreliable | 49 |
| abstract_inverted_index.demonstrate | 78 |
| abstract_inverted_index.experiments | 77 |
| abstract_inverted_index.in-the-wild | 113 |
| abstract_inverted_index.limitations | 14 |
| abstract_inverted_index.performance | 30, 117 |
| abstract_inverted_index.Furthermore, | 99 |
| abstract_inverted_index.conditioning, | 92 |
| abstract_inverted_index.contamination | 25 |
| abstract_inverted_index.flow-matching | 53 |
| abstract_inverted_index.significantly | 27 |
| abstract_inverted_index.Self-Purifying | 40 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 4 |
| citation_normalized_percentile | |
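
The `abstract_inverted_index.*` rows above store the abstract as a mapping from each token to the word positions where it occurs. A minimal sketch of turning such a mapping back into running text follows. The function name is hypothetical, and the sample dictionary is a hand-copied subset of the rows above, truncated to the first nine positions.

```python
def abstract_from_inverted_index(inverted_index):
    """Rebuild text from a {token: [positions]} mapping."""
    by_position = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    # Emit tokens in position order, separated by single spaces.
    return " ".join(by_position[pos] for pos in sorted(by_position))

# Subset of the payload rows, limited to positions 0 through 8.
sample = {
    "Training": [0],
    "datasets": [1],
    "are": [2],
    "inherently": [3],
    "imperfect,": [4],
    "often": [5],
    "containing": [6],
    "mislabeled": [7],
    "samples": [8],
}

text = abstract_from_inverted_index(sample)
print(text)
```

Applied to the full set of rows, the same function should reproduce the abstract shown at the top of the page, since punctuation is stored as part of each token.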