Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2503.16188
This paper investigates the role of the explicit thinking process in rule-based reinforcement fine-tuning (RFT) for multimodal large language models (MLLMs). We first propose CLS-RL for MLLM image classification, which uses verifiable rewards for fine-tuning. Experiments show that CLS-RL significantly outperforms supervised fine-tuning (SFT) and yields a cross-dataset generalization effect. We then question whether explicit thinking in RFT is always necessary. Challenging the convention that explicit thinking is crucial for the success of RFT, we introduce No-Thinking-RL, which performs RFT without thinking by using a simple equality accuracy reward. We evaluate No-Thinking-RL on 6 diverse tasks across different model sizes and types. Experimental results reveal three key findings: (1) Visual perception tasks do not require thinking during RFT, as No-Thinking-RL consistently outperforms or matches Thinking-based RFT across model sizes. (2) Models with limited capabilities struggle to generate high-quality chain-of-thought (CoT) reasoning for RFT, making Thinking-based RFT less effective than No-Thinking-RL. (3) Some responses of Thinking-based RFT contain inconsistencies between the answers in the thinking and answer tags, and these responses show lower accuracy than the overall accuracy. We hypothesize that explicit thinking before verifiable answers may hinder reward convergence and reduce performance. To test this hypothesis, we propose Think-After-Answer, which places thinking after the answer to mitigate this effect. Lastly, we conduct a pilot study on whether MLLMs can learn when to think during RFT, introducing an Adaptive-Thinking method. Experiments show that it converges to a specific prompt depending on model capability and task complexity, achieving comparable or better performance than both Thinking-based RFT and No-Thinking-RL. This suggests that MLLMs can adaptively decide whether to think based on their capabilities and task complexity.
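The reward rules named in the abstract can be illustrated with a few plain Python functions. This is a minimal sketch under stated assumptions, not the authors' implementation: the `<think>`/`<answer>` tag spellings, the whitespace/lowercase normalization, the 0/1 reward values, the format-plus-accuracy sum, and every function name below (`equality_reward`, `tagged_reward`, `thinking_answer_consistent`) are hypothetical. The last function is an assumed heuristic for spotting the kind of thinking/answer inconsistency the abstract reports; the abstract does not say how the paper detects it.

```python
import re
from typing import List, Optional

# Hypothetical tag spellings: the abstract mentions "thinking and answer tags"
# without specifying them, so <think>/<answer> is an assumption.
THINK = r"<think>(.*?)</think>"
ANSWER = r"<answer>(.*?)</answer>"


def normalize(text: str) -> str:
    # Assumed normalization before comparing: trim whitespace and lowercase.
    return text.strip().lower()


def equality_reward(response: str, label: str) -> float:
    """No-Thinking-RL style accuracy reward (sketch).

    The model is asked to output only the answer, so the reward is 1.0 when
    the whole response equals the ground-truth label and 0.0 otherwise.
    """
    return 1.0 if normalize(response) == normalize(label) else 0.0


def tagged_reward(response: str, label: str, answer_first: bool = False) -> float:
    """Thinking-based RFT style reward (sketch): format reward + accuracy reward.

    answer_first=False rewards the <think>...</think><answer>...</answer> layout;
    answer_first=True rewards the Think-After-Answer layout instead.
    """
    parts = (ANSWER, THINK) if answer_first else (THINK, ANSWER)
    layout = r"\s*" + r"\s*".join(parts) + r"\s*"
    format_ok = re.fullmatch(layout, response, flags=re.DOTALL) is not None
    # Accuracy only looks at the content of the answer tag.
    match = re.search(ANSWER, response, flags=re.DOTALL)
    correct = match is not None and normalize(match.group(1)) == normalize(label)
    return float(format_ok) + float(correct)


def thinking_answer_consistent(response: str, classes: List[str]) -> Optional[bool]:
    """Hypothetical heuristic: is the last class named in <think> the one in <answer>?

    Returns None when a tag is missing or no candidate class is mentioned.
    Plain substring search, so e.g. "cat" also matches inside "category".
    """
    think = re.search(THINK, response, flags=re.DOTALL)
    answer = re.search(ANSWER, response, flags=re.DOTALL)
    if think is None or answer is None:
        return None
    text = think.group(1).lower()
    # Index of the last mention of each candidate class (-1 if absent).
    last_pos = {c: text.rfind(c.lower()) for c in classes}
    mentioned = {c: pos for c, pos in last_pos.items() if pos >= 0}
    if not mentioned:
        return None
    final_guess = max(mentioned, key=mentioned.get)
    return normalize(final_guess) == normalize(answer.group(1))
```

Under this sketch, `tagged_reward("<answer>cat</answer><think>fur</think>", "cat")` earns only the accuracy point, because the default layout expects thinking first; passing `answer_first=True` makes the same response earn both points.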
- Type: preprint
- Language: en
- Publication date: 2025-03-20
- Authors: Ming Li, Jike Zhong, Shitian Zhao, Yuxiang Lai, Haoquan Zhang, Wang Bill Zhu, Kaipeng Zhang
- Landing page: http://arxiv.org/abs/2503.16188
- PDF: https://arxiv.org/pdf/2503.16188
- Open access: Yes (OA status: green)
- OpenAlex ID: https://openalex.org/W4414688615
- Cited by: 0