Hardware Software Optimizations for Fast Model Recovery on Reconfigurable Architectures
Model Recovery (MR) is a core primitive for physical AI and real-time digital twins, but GPUs often execute MR inefficiently due to iterative dependencies, kernel-launch overheads, underutilized memory bandwidth, and high data-movement latency. We present MERINDA, an FPGA-accelerated MR framework that restructures computation as a streaming dataflow pipeline. MERINDA exploits on-chip locality through BRAM tiling, fixed-point kernels, and the concurrent use of LUT fabric and carry-chain adders to expose fine-grained spatial parallelism while minimizing off-chip traffic. This hardware-aware formulation removes synchronization bottlenecks and sustains high throughput across the iterative updates in MR. On representative MR workloads, MERINDA delivers up to 6.3x fewer cycles than an FPGA-based LTC baseline, enabling real-time performance for time-critical physical systems.
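The abstract names tiling and fixed-point kernels but gives no implementation detail, so the following is only a rough software sketch of those two ideas, not MERINDA's design. It computes a matrix-vector product in signed fixed-point arithmetic, processing the input one tile at a time the way a small on-chip buffer might be reused. The 12-bit fractional width, the tile size of 4, and the names `to_fixed`, `to_float`, and `tiled_fixed_matvec` are all assumptions made for illustration.

```python
FRAC_BITS = 12  # assumed number of fractional bits (hypothetical choice)
TILE = 4        # assumed tile width, standing in for a BRAM-resident block


def to_fixed(x, frac_bits=FRAC_BITS):
    """Quantize a float to a signed fixed-point integer."""
    return int(round(x * (1 << frac_bits)))


def to_float(q, frac_bits=FRAC_BITS):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def tiled_fixed_matvec(w_q, x_q, tile=TILE, frac_bits=FRAC_BITS):
    """Multiply a fixed-point matrix by a fixed-point vector, tile by tile.

    Each tile of x is sliced out once and reused by every row, which mimics
    keeping a block of operands in on-chip memory. Products carry
    2 * frac_bits fractional bits, so the accumulators are shifted back
    to frac_bits at the end.
    """
    acc = [0] * len(w_q)
    for c0 in range(0, len(x_q), tile):
        x_tile = x_q[c0:c0 + tile]  # "load" one tile of the input vector
        for r, row in enumerate(w_q):
            for k, xv in enumerate(x_tile):
                acc[r] += row[c0 + k] * xv  # integer multiply-accumulate
    return [a >> frac_bits for a in acc]  # rescale to frac_bits


# Usage: compare against a floating-point reference.
W = [[0.5, -0.25, 0.125, 1.0, -0.75, 0.3],
     [0.1, 0.2, -0.3, 0.4, -0.5, 0.6]]
x = [1.0, -2.0, 0.5, 0.25, -1.5, 0.75]

W_q = [[to_fixed(v) for v in row] for row in W]
x_q = [to_fixed(v) for v in x]

y_fixed = [to_float(q) for q in tiled_fixed_matvec(W_q, x_q)]
y_ref = [sum(w * v for w, v in zip(row, x)) for row in W]
```

In this sketch the fixed-point result tracks the floating-point reference to within the quantization error of the chosen format. A hardware implementation would additionally have to bound accumulator width, which Python's unbounded integers hide.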
Related Topics
- Type: article
- Landing Page: http://arxiv.org/abs/2512.06113, https://arxiv.org/pdf/2512.06113
- OA Status: green
- OpenAlex ID: https://openalex.org/W7113915522
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W7113915522 (canonical identifier for this work in OpenAlex)
- Title: Hardware Software Optimizations for Fast Model Recovery on Reconfigurable Architectures (work title)
- Type: article (OpenAlex work type)
- Publication year: 2025 (year of publication)
- Publication date: 2025-12-05 (full publication date if available)
- Authors: Xu, Bin; Banerjee, Ayan; Gupta, Sandeep (list of authors in order)
- Landing page: https://arxiv.org/abs/2512.06113 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2512.06113 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2512.06113 (direct OA link when available)
- Concepts: Computer science, Dataflow, Parallel computing, Locality, Synchronization (alternating current), Exploit, Computation, Throughput, Software, Multi-core processor, Adder, Embedded system, Lookup table, Computer architecture, Computer hardware, Field-programmable gate array, Parallelism (grammar), Programming paradigm, Execution model, Concurrency, Graphics hardware, Model of computation, Digital signal processing, Logic synthesis (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
Full payload
| id | https://openalex.org/W7113915522 |
|---|---|
| doi | |
| ids.openalex | https://openalex.org/W7113915522 |
| fwci | 0.0 |
| type | article |
| title | Hardware Software Optimizations for Fast Model Recovery on Reconfigurable Architectures |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10904 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.8514424562454224 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1708 |
| topics[0].subfield.display_name | Hardware and Architecture |
| topics[0].display_name | Embedded Systems Design Techniques |
| topics[1].id | https://openalex.org/T10054 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.023244475945830345 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1708 |
| topics[1].subfield.display_name | Hardware and Architecture |
| topics[1].display_name | Parallel Computing and Optimization Techniques |
| topics[2].id | https://openalex.org/T11206 |
| topics[2].field.id | https://openalex.org/fields/31 |
| topics[2].field.display_name | Physics and Astronomy |
| topics[2].score | 0.014133771881461143 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/3109 |
| topics[2].subfield.display_name | Statistical and Nonlinear Physics |
| topics[2].display_name | Model Reduction and Neural Networks |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.8283714652061462 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C96324660 |
| concepts[1].level | 2 |
| concepts[1].score | 0.7142021059989929 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q205446 |
| concepts[1].display_name | Dataflow |
| concepts[2].id | https://openalex.org/C173608175 |
| concepts[2].level | 1 |
| concepts[2].score | 0.6688811182975769 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q232661 |
| concepts[2].display_name | Parallel computing |
| concepts[3].id | https://openalex.org/C2779808786 |
| concepts[3].level | 2 |
| concepts[3].score | 0.5946097373962402 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q6664603 |
| concepts[3].display_name | Locality |
| concepts[4].id | https://openalex.org/C2778562939 |
| concepts[4].level | 3 |
| concepts[4].score | 0.5033213496208191 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q1298791 |
| concepts[4].display_name | Synchronization (alternating current) |
| concepts[5].id | https://openalex.org/C165696696 |
| concepts[5].level | 2 |
| concepts[5].score | 0.5032302737236023 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q11287 |
| concepts[5].display_name | Exploit |
| concepts[6].id | https://openalex.org/C45374587 |
| concepts[6].level | 2 |
| concepts[6].score | 0.4788864850997925 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q12525525 |
| concepts[6].display_name | Computation |
| concepts[7].id | https://openalex.org/C157764524 |
| concepts[7].level | 3 |
| concepts[7].score | 0.47781914472579956 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q1383412 |
| concepts[7].display_name | Throughput |
| concepts[8].id | https://openalex.org/C2777904410 |
| concepts[8].level | 2 |
| concepts[8].score | 0.475863516330719 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q7397 |
| concepts[8].display_name | Software |
| concepts[9].id | https://openalex.org/C78766204 |
| concepts[9].level | 2 |
| concepts[9].score | 0.4198281764984131 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q555032 |
| concepts[9].display_name | Multi-core processor |
| concepts[10].id | https://openalex.org/C164620267 |
| concepts[10].level | 3 |
| concepts[10].score | 0.4127441346645355 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q376953 |
| concepts[10].display_name | Adder |
| concepts[11].id | https://openalex.org/C149635348 |
| concepts[11].level | 1 |
| concepts[11].score | 0.4122576415538788 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q193040 |
| concepts[11].display_name | Embedded system |
| concepts[12].id | https://openalex.org/C134835016 |
| concepts[12].level | 2 |
| concepts[12].score | 0.3996036648750305 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q690265 |
| concepts[12].display_name | Lookup table |
| concepts[13].id | https://openalex.org/C118524514 |
| concepts[13].level | 1 |
| concepts[13].score | 0.3989408612251282 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q173212 |
| concepts[13].display_name | Computer architecture |
| concepts[14].id | https://openalex.org/C9390403 |
| concepts[14].level | 1 |
| concepts[14].score | 0.37609684467315674 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q3966 |
| concepts[14].display_name | Computer hardware |
| concepts[15].id | https://openalex.org/C42935608 |
| concepts[15].level | 2 |
| concepts[15].score | 0.33475446701049805 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q190411 |
| concepts[15].display_name | Field-programmable gate array |
| concepts[16].id | https://openalex.org/C2781172179 |
| concepts[16].level | 2 |
| concepts[16].score | 0.3346168100833893 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q853109 |
| concepts[16].display_name | Parallelism (grammar) |
| concepts[17].id | https://openalex.org/C34165917 |
| concepts[17].level | 2 |
| concepts[17].score | 0.2829362750053406 |
| concepts[17].wikidata | https://www.wikidata.org/wiki/Q188267 |
| concepts[17].display_name | Programming paradigm |
| concepts[18].id | https://openalex.org/C2776834041 |
| concepts[18].level | 2 |
| concepts[18].score | 0.28153693675994873 |
| concepts[18].wikidata | https://www.wikidata.org/wiki/Q25346349 |
| concepts[18].display_name | Execution model |
| concepts[19].id | https://openalex.org/C193702766 |
| concepts[19].level | 2 |
| concepts[19].score | 0.27597054839134216 |
| concepts[19].wikidata | https://www.wikidata.org/wiki/Q1414548 |
| concepts[19].display_name | Concurrency |
| concepts[20].id | https://openalex.org/C18945957 |
| concepts[20].level | 3 |
| concepts[20].score | 0.27533286809921265 |
| concepts[20].wikidata | https://www.wikidata.org/wiki/Q5597193 |
| concepts[20].display_name | Graphics hardware |
| concepts[21].id | https://openalex.org/C184596265 |
| concepts[21].level | 3 |
| concepts[21].score | 0.2696970999240875 |
| concepts[21].wikidata | https://www.wikidata.org/wiki/Q2651576 |
| concepts[21].display_name | Model of computation |
| concepts[22].id | https://openalex.org/C84462506 |
| concepts[22].level | 2 |
| concepts[22].score | 0.2673843801021576 |
| concepts[22].wikidata | https://www.wikidata.org/wiki/Q173142 |
| concepts[22].display_name | Digital signal processing |
| concepts[23].id | https://openalex.org/C157922185 |
| concepts[23].level | 3 |
| concepts[23].score | 0.2659834623336792 |
| concepts[23].wikidata | https://www.wikidata.org/wiki/Q173198 |
| concepts[23].display_name | Logic synthesis |
| keywords[0].id | https://openalex.org/keywords/dataflow |
| keywords[0].score | 0.7142021059989929 |
| keywords[0].display_name | Dataflow |
| keywords[1].id | https://openalex.org/keywords/locality |
| keywords[1].score | 0.5946097373962402 |
| keywords[1].display_name | Locality |
| keywords[2].id | https://openalex.org/keywords/synchronization |
| keywords[2].score | 0.5033213496208191 |
| keywords[2].display_name | Synchronization (alternating current) |
| keywords[3].id | https://openalex.org/keywords/exploit |
| keywords[3].score | 0.5032302737236023 |
| keywords[3].display_name | Exploit |
| keywords[4].id | https://openalex.org/keywords/computation |
| keywords[4].score | 0.4788864850997925 |
| keywords[4].display_name | Computation |
| keywords[5].id | https://openalex.org/keywords/throughput |
| keywords[5].score | 0.47781914472579956 |
| keywords[5].display_name | Throughput |
| keywords[6].id | https://openalex.org/keywords/software |
| keywords[6].score | 0.475863516330719 |
| keywords[6].display_name | Software |
| keywords[7].id | https://openalex.org/keywords/multi-core-processor |
| keywords[7].score | 0.4198281764984131 |
| keywords[7].display_name | Multi-core processor |
| language | |
| locations[0].id | pmh:oai:arXiv.org:2512.06113 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2512.06113 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2512.06113 |
| indexed_in | arxiv |
| authorships[0].author.id | |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Xu, Bin |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Xu, Bin |
| authorships[0].is_corresponding | True |
| authorships[1].author.id | |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Banerjee, Ayan |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Banerjee, Ayan |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Gupta, Sandeep |
| authorships[2].author_position | last |
| authorships[2].raw_author_name | Gupta, Sandeep |
| authorships[2].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2512.06113 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-12-11T00:00:00 |
| display_name | Hardware Software Optimizations for Fast Model Recovery on Reconfigurable Architectures |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-12-11T00:24:52.286860 |
| primary_topic.id | https://openalex.org/T10904 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.8514424562454224 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1708 |
| primary_topic.subfield.display_name | Hardware and Architecture |
| primary_topic.display_name | Embedded Systems Design Techniques |
| cited_by_count | 0 |
| locations_count | 1 |
| best_oa_location.id | pmh:oai:arXiv.org:2512.06113 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2512.06113 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2512.06113 |
| primary_location.id | pmh:oai:arXiv.org:2512.06113 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2512.06113 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2512.06113 |
| publication_date | 2025-12-05 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 4, 44 |
| abstract_inverted_index.AI | 9 |
| abstract_inverted_index.MR | 18, 38, 94 |
| abstract_inverted_index.On | 92 |
| abstract_inverted_index.We | 33 |
| abstract_inverted_index.an | 36, 104 |
| abstract_inverted_index.as | 43 |
| abstract_inverted_index.in | 90 |
| abstract_inverted_index.is | 3 |
| abstract_inverted_index.of | 61 |
| abstract_inverted_index.to | 21, 67, 99 |
| abstract_inverted_index.up | 98 |
| abstract_inverted_index.LTC | 106 |
| abstract_inverted_index.LUT | 62 |
| abstract_inverted_index.MR. | 91 |
| abstract_inverted_index.and | 10, 29, 57, 64, 82 |
| abstract_inverted_index.but | 14 |
| abstract_inverted_index.due | 20 |
| abstract_inverted_index.for | 7, 111 |
| abstract_inverted_index.the | 58, 87 |
| abstract_inverted_index.use | 60 |
| abstract_inverted_index.(MR) | 2 |
| abstract_inverted_index.6.3x | 100 |
| abstract_inverted_index.BRAM | 53 |
| abstract_inverted_index.GPUs | 15 |
| abstract_inverted_index.This | 76 |
| abstract_inverted_index.core | 5 |
| abstract_inverted_index.high | 30, 84 |
| abstract_inverted_index.than | 103 |
| abstract_inverted_index.that | 40 |
| abstract_inverted_index.Model | 0 |
| abstract_inverted_index.fewer | 101 |
| abstract_inverted_index.often | 16 |
| abstract_inverted_index.while | 72 |
| abstract_inverted_index.across | 86 |
| abstract_inverted_index.adders | 66 |
| abstract_inverted_index.cycles | 102 |
| abstract_inverted_index.expose | 68 |
| abstract_inverted_index.fabric | 63 |
| abstract_inverted_index.memory | 27 |
| abstract_inverted_index.twins, | 13 |
| abstract_inverted_index.MERINDA | 48, 96 |
| abstract_inverted_index.digital | 12 |
| abstract_inverted_index.execute | 17 |
| abstract_inverted_index.on-chip | 50 |
| abstract_inverted_index.present | 34 |
| abstract_inverted_index.removes | 79 |
| abstract_inverted_index.spatial | 70 |
| abstract_inverted_index.through | 52 |
| abstract_inverted_index.tiling, | 54 |
| abstract_inverted_index.updates | 89 |
| abstract_inverted_index.MERINDA, | 35 |
| abstract_inverted_index.Recovery | 1 |
| abstract_inverted_index.dataflow | 46 |
| abstract_inverted_index.delivers | 97 |
| abstract_inverted_index.enabling | 108 |
| abstract_inverted_index.exploits | 49 |
| abstract_inverted_index.kernels, | 56 |
| abstract_inverted_index.latency. | 32 |
| abstract_inverted_index.locality | 51 |
| abstract_inverted_index.off-chip | 74 |
| abstract_inverted_index.physical | 8, 113 |
| abstract_inverted_index.sustains | 83 |
| abstract_inverted_index.systems. | 114 |
| abstract_inverted_index.traffic. | 75 |
| abstract_inverted_index.baseline, | 107 |
| abstract_inverted_index.framework | 39 |
| abstract_inverted_index.iterative | 22, 88 |
| abstract_inverted_index.pipeline. | 47 |
| abstract_inverted_index.primitive | 6 |
| abstract_inverted_index.real-time | 11, 109 |
| abstract_inverted_index.streaming | 45 |
| abstract_inverted_index.FPGA-based | 105 |
| abstract_inverted_index.bandwidth, | 28 |
| abstract_inverted_index.concurrent | 59 |
| abstract_inverted_index.minimizing | 73 |
| abstract_inverted_index.overheads, | 25 |
| abstract_inverted_index.throughput | 85 |
| abstract_inverted_index.workloads, | 95 |
| abstract_inverted_index.bottlenecks | 81 |
| abstract_inverted_index.carry-chain | 65 |
| abstract_inverted_index.computation | 42 |
| abstract_inverted_index.fixed-point | 55 |
| abstract_inverted_index.formulation | 78 |
| abstract_inverted_index.parallelism | 71 |
| abstract_inverted_index.performance | 110 |
| abstract_inverted_index.fine-grained | 69 |
| abstract_inverted_index.restructures | 41 |
| abstract_inverted_index.data-movement | 31 |
| abstract_inverted_index.dependencies, | 23 |
| abstract_inverted_index.inefficiently | 19 |
| abstract_inverted_index.kernel-launch | 24 |
| abstract_inverted_index.time-critical | 112 |
| abstract_inverted_index.underutilized | 26 |
| abstract_inverted_index.hardware-aware | 77 |
| abstract_inverted_index.representative | 93 |
| abstract_inverted_index.synchronization | 80 |
| abstract_inverted_index.FPGA-accelerated | 37 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 3 |
| citation_normalized_percentile.value | 0.85964662 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | True |
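
The `abstract_inverted_index.*` rows above store the abstract as a map from each token to the word positions where it occurs. As a minimal sketch, the text can be rebuilt by inverting that map and joining the tokens in position order. The function name `rebuild_abstract` and the `sample` dictionary are illustrative; the sample holds only the tokens at positions 0 to 6 from the payload, so the second occurrence of "a" at position 44 is left out.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a {token: [positions]} inverted index."""
    by_position = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    # Join tokens in ascending position order; gaps are simply skipped.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Excerpt of the payload above, restricted to positions 0 through 6.
sample = {
    "Model": [0],
    "Recovery": [1],
    "(MR)": [2],
    "is": [3],
    "a": [4],
    "core": [5],
    "primitive": [6],
}

opening = rebuild_abstract(sample)
```

Applied to the full set of rows, the same function would reproduce the abstract shown at the top of this record, since punctuation is stored as part of each token.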