An End-to-End Two-Stream Network Based on RGB Flow and Representation Flow for Human Action Recognition
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2411.18002
With the rapid advancements in deep learning, computer vision tasks have seen significant improvements, making two-stream neural networks a popular focus for video-based action recognition. Traditional models using RGB and optical flow streams achieve strong performance, but at a high computational cost. To address this, we introduce a representation flow algorithm to replace the optical flow branch in the egocentric action recognition model, enabling end-to-end training while reducing computational cost and prediction time. Our model, designed for egocentric action recognition, uses class activation maps (CAMs) to improve accuracy and ConvLSTM for spatiotemporal encoding with spatial attention. When evaluated on the GTEA61, EGTEA GAZE+, and HMDB datasets, our model matches the accuracy of the original model on GTEA61 and exceeds it by 0.65% and 0.84% on EGTEA GAZE+ and HMDB, respectively. Prediction runtimes are significantly reduced to 0.1881s, 0.1503s, and 0.1459s, compared to the original model's 101.6795s, 25.3799s, and 203.9958s. Ablation studies were also conducted to study the impact of different parameters on model performance.
Keywords: two-stream, egocentric, action recognition, CAM, representation flow, ConvLSTM
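
The runtime figures quoted in the abstract can be turned into speedup factors with simple division. The sketch below does that under one loudly stated assumption: both runtime lists follow the dataset order GTEA61, EGTEA GAZE+, HMDB, which the abstract implies but does not state. The names `DATASETS`, `PROPOSED_S`, `ORIGINAL_S`, and `speedup` are illustrative and do not come from the paper.

```python
# Prediction runtimes in seconds, copied from the abstract.
# ASSUMPTION: both lists follow the dataset order GTEA61, EGTEA GAZE+, HMDB.
DATASETS = ["GTEA61", "EGTEA GAZE+", "HMDB"]
PROPOSED_S = [0.1881, 0.1503, 0.1459]
ORIGINAL_S = [101.6795, 25.3799, 203.9958]


def speedup(original_s: float, proposed_s: float) -> float:
    """Return how many times faster the proposed runtime is."""
    if proposed_s <= 0:
        raise ValueError("proposed runtime must be positive")
    return original_s / proposed_s


# Map each dataset name to its speedup factor.
speedups = {
    name: speedup(orig, prop)
    for name, orig, prop in zip(DATASETS, ORIGINAL_S, PROPOSED_S)
}

if __name__ == "__main__":
    for name, factor in speedups.items():
        print(f"{name}: about {factor:.0f}x faster")
```

If the ordering assumption holds, the factors come out at roughly 540x, 170x, and 1,400x respectively.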
Related Topics
- Type
- preprint
- Language
- en
- Landing Page
- http://arxiv.org/abs/2411.18002
- https://arxiv.org/pdf/2411.18002
- OA Status
- green
- Related Works
- 10
- OpenAlex ID
- https://openalex.org/W4404990374
Raw OpenAlex JSON
- OpenAlex ID
- https://openalex.org/W4404990374 (Canonical identifier for this work in OpenAlex)
- DOI
- https://doi.org/10.48550/arxiv.2411.18002 (Digital Object Identifier)
- Title
- An End-to-End Two-Stream Network Based on RGB Flow and Representation Flow for Human Action Recognition (Work title)
- Type
- preprint (OpenAlex work type)
- Language
- en (Primary language)
- Publication year
- 2024 (Year of publication)
- Publication date
- 2024-11-27 (Full publication date if available)
- Authors
- Steven Y. J. Lai, Tsun-Hin Cheung, Kin Yip Fung, Tianshan Liu, Kin‐Man Lam (List of authors in order)
- Landing page
- https://arxiv.org/abs/2411.18002 (Publisher landing page)
- PDF URL
- https://arxiv.org/pdf/2411.18002 (Direct link to full text PDF)
- Open access
- Yes (Whether a free full text is available)
- OA status
- green (Open access status per OpenAlex)
- OA URL
- https://arxiv.org/pdf/2411.18002 (Direct OA link when available)
- Concepts
- End-to-end principle, Flow (mathematics), Computer science, Representation (politics), Action (physics), Artificial intelligence, Mathematics, Geometry, Physics, Law, Quantum mechanics, Politics, Political science (Top concepts (fields/topics) attached by OpenAlex)
- Cited by
- 0 (Total citation count in OpenAlex)
- Related works (count)
- 10 (Other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4404990374 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2411.18002 |
| ids.doi | https://doi.org/10.48550/arxiv.2411.18002 |
| ids.openalex | https://openalex.org/W4404990374 |
| fwci | |
| type | preprint |
| title | An End-to-End Two-Stream Network Based on RGB Flow and Representation Flow for Human Action Recognition |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10812 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9933000206947327 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Human Pose and Action Recognition |
| topics[1].id | https://openalex.org/T11512 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9850999712944031 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Anomaly Detection Techniques and Applications |
| topics[2].id | https://openalex.org/T10331 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9753000140190125 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1707 |
| topics[2].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[2].display_name | Video Surveillance and Tracking Methods |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C74296488 |
| concepts[0].level | 2 |
| concepts[0].score | 0.7189487218856812 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q2527392 |
| concepts[0].display_name | End-to-end principle |
| concepts[1].id | https://openalex.org/C38349280 |
| concepts[1].level | 2 |
| concepts[1].score | 0.614315390586853 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q1434290 |
| concepts[1].display_name | Flow (mathematics) |
| concepts[2].id | https://openalex.org/C41008148 |
| concepts[2].level | 0 |
| concepts[2].score | 0.5901092290878296 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[2].display_name | Computer science |
| concepts[3].id | https://openalex.org/C2776359362 |
| concepts[3].level | 3 |
| concepts[3].score | 0.49629050493240356 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q2145286 |
| concepts[3].display_name | Representation (politics) |
| concepts[4].id | https://openalex.org/C2780791683 |
| concepts[4].level | 2 |
| concepts[4].score | 0.4623793959617615 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q846785 |
| concepts[4].display_name | Action (physics) |
| concepts[5].id | https://openalex.org/C154945302 |
| concepts[5].level | 1 |
| concepts[5].score | 0.44296780228614807 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[5].display_name | Artificial intelligence |
| concepts[6].id | https://openalex.org/C33923547 |
| concepts[6].level | 0 |
| concepts[6].score | 0.18790331482887268 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[6].display_name | Mathematics |
| concepts[7].id | https://openalex.org/C2524010 |
| concepts[7].level | 1 |
| concepts[7].score | 0.07067370414733887 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q8087 |
| concepts[7].display_name | Geometry |
| concepts[8].id | https://openalex.org/C121332964 |
| concepts[8].level | 0 |
| concepts[8].score | 0.0560021698474884 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q413 |
| concepts[8].display_name | Physics |
| concepts[9].id | https://openalex.org/C199539241 |
| concepts[9].level | 1 |
| concepts[9].score | 0.0 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q7748 |
| concepts[9].display_name | Law |
| concepts[10].id | https://openalex.org/C62520636 |
| concepts[10].level | 1 |
| concepts[10].score | 0.0 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q944 |
| concepts[10].display_name | Quantum mechanics |
| concepts[11].id | https://openalex.org/C94625758 |
| concepts[11].level | 2 |
| concepts[11].score | 0.0 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q7163 |
| concepts[11].display_name | Politics |
| concepts[12].id | https://openalex.org/C17744445 |
| concepts[12].level | 0 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q36442 |
| concepts[12].display_name | Political science |
| keywords[0].id | https://openalex.org/keywords/end-to-end-principle |
| keywords[0].score | 0.7189487218856812 |
| keywords[0].display_name | End-to-end principle |
| keywords[1].id | https://openalex.org/keywords/flow |
| keywords[1].score | 0.614315390586853 |
| keywords[1].display_name | Flow (mathematics) |
| keywords[2].id | https://openalex.org/keywords/computer-science |
| keywords[2].score | 0.5901092290878296 |
| keywords[2].display_name | Computer science |
| keywords[3].id | https://openalex.org/keywords/representation |
| keywords[3].score | 0.49629050493240356 |
| keywords[3].display_name | Representation (politics) |
| keywords[4].id | https://openalex.org/keywords/action |
| keywords[4].score | 0.4623793959617615 |
| keywords[4].display_name | Action (physics) |
| keywords[5].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[5].score | 0.44296780228614807 |
| keywords[5].display_name | Artificial intelligence |
| keywords[6].id | https://openalex.org/keywords/mathematics |
| keywords[6].score | 0.18790331482887268 |
| keywords[6].display_name | Mathematics |
| keywords[7].id | https://openalex.org/keywords/geometry |
| keywords[7].score | 0.07067370414733887 |
| keywords[7].display_name | Geometry |
| keywords[8].id | https://openalex.org/keywords/physics |
| keywords[8].score | 0.0560021698474884 |
| keywords[8].display_name | Physics |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2411.18002 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | cc-by-nc-nd |
| locations[0].pdf_url | https://arxiv.org/pdf/2411.18002 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | |
| locations[0].license_id | https://openalex.org/licenses/cc-by-nc-nd |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2411.18002 |
| locations[1].id | doi:10.48550/arxiv.2411.18002 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2411.18002 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5014767558 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-8907-4289 |
| authorships[0].author.display_name | Steven Y. J. Lai |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Lai, Song-Jiang |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5004935515 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-5467-6872 |
| authorships[1].author.display_name | Tsun-Hin Cheung |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Cheung, Tsun-Hin |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5081709216 |
| authorships[2].author.orcid | https://orcid.org/0000-0003-2131-2970 |
| authorships[2].author.display_name | Kin Yip Fung |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Fung, Ka-Chun |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5011105024 |
| authorships[3].author.orcid | https://orcid.org/0000-0003-3831-8893 |
| authorships[3].author.display_name | Tianshan Liu |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Liu, Tian-Shan |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5019678322 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-0422-8454 |
| authorships[4].author.display_name | Kin‐Man Lam |
| authorships[4].author_position | last |
| authorships[4].raw_author_name | Lam, Kin-Man |
| authorships[4].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2411.18002 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | An End-to-End Two-Stream Network Based on RGB Flow and Representation Flow for Human Action Recognition |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10812 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9933000206947327 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Human Pose and Action Recognition |
| related_works | https://openalex.org/W2151749779, https://openalex.org/W3179968364, https://openalex.org/W1999612375, https://openalex.org/W2938107654, https://openalex.org/W3196421258, https://openalex.org/W4387301579, https://openalex.org/W2763956190, https://openalex.org/W3008587939, https://openalex.org/W2062195135, https://openalex.org/W4400488565 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2411.18002 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | cc-by-nc-nd |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2411.18002 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by-nc-nd |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2411.18002 |
| primary_location.id | pmh:oai:arXiv.org:2411.18002 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | cc-by-nc-nd |
| primary_location.pdf_url | https://arxiv.org/pdf/2411.18002 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | |
| primary_location.license_id | https://openalex.org/licenses/cc-by-nc-nd |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2411.18002 |
| publication_date | 2024-11-27 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 18, 39, 48 |
| abstract_inverted_index.To | 43 |
| abstract_inverted_index.at | 38 |
| abstract_inverted_index.by | 122 |
| abstract_inverted_index.in | 4, 58 |
| abstract_inverted_index.it | 121 |
| abstract_inverted_index.of | 113, 160 |
| abstract_inverted_index.on | 100, 117, 126, 163 |
| abstract_inverted_index.to | 52, 86, 137, 143, 156 |
| abstract_inverted_index.we | 46 |
| abstract_inverted_index.Our | 74 |
| abstract_inverted_index.RGB | 29 |
| abstract_inverted_index.and | 30, 71, 89, 105, 119, 124, 129, 140, 149 |
| abstract_inverted_index.are | 134 |
| abstract_inverted_index.but | 37 |
| abstract_inverted_index.for | 21, 77, 91 |
| abstract_inverted_index.our | 108 |
| abstract_inverted_index.the | 1, 54, 59, 101, 111, 114, 144, 158 |
| abstract_inverted_index.CAM, | 171, 174 |
| abstract_inverted_index.HMDB | 106 |
| abstract_inverted_index.When | 98 |
| abstract_inverted_index.With | 0 |
| abstract_inverted_index.also | 154 |
| abstract_inverted_index.cost | 70 |
| abstract_inverted_index.deep | 5 |
| abstract_inverted_index.flow | 32, 50, 56 |
| abstract_inverted_index.have | 10 |
| abstract_inverted_index.high | 40 |
| abstract_inverted_index.maps | 84 |
| abstract_inverted_index.seen | 11 |
| abstract_inverted_index.uses | 81 |
| abstract_inverted_index.were | 153 |
| abstract_inverted_index.with | 95 |
| abstract_inverted_index.0.65% | 123 |
| abstract_inverted_index.0.84% | 125 |
| abstract_inverted_index.EGTEA | 103, 127 |
| abstract_inverted_index.GAZE+ | 128 |
| abstract_inverted_index.HMDB, | 130 |
| abstract_inverted_index.based | 23 |
| abstract_inverted_index.class | 82 |
| abstract_inverted_index.cost. | 42 |
| abstract_inverted_index.flow, | 173 |
| abstract_inverted_index.focus | 20 |
| abstract_inverted_index.model | 109, 116, 164 |
| abstract_inverted_index.rapid | 2 |
| abstract_inverted_index.study | 157 |
| abstract_inverted_index.tasks | 9 |
| abstract_inverted_index.this, | 45 |
| abstract_inverted_index.time. | 73 |
| abstract_inverted_index.using | 28 |
| abstract_inverted_index.video | 22 |
| abstract_inverted_index.while | 67 |
| abstract_inverted_index.(CAMs) | 85 |
| abstract_inverted_index.GAZE+, | 104 |
| abstract_inverted_index.GTEA61 | 118 |
| abstract_inverted_index.action | 24, 61, 79, 169 |
| abstract_inverted_index.branch | 57 |
| abstract_inverted_index.impact | 159 |
| abstract_inverted_index.making | 14 |
| abstract_inverted_index.model, | 63, 75 |
| abstract_inverted_index.models | 27 |
| abstract_inverted_index.neural | 16 |
| abstract_inverted_index.spatio | 92 |
| abstract_inverted_index.strong | 35 |
| abstract_inverted_index.vision | 8 |
| abstract_inverted_index.GTEA61, | 102 |
| abstract_inverted_index.achieve | 34 |
| abstract_inverted_index.address | 44 |
| abstract_inverted_index.exceeds | 120 |
| abstract_inverted_index.improve | 87 |
| abstract_inverted_index.matches | 110 |
| abstract_inverted_index.model's | 146 |
| abstract_inverted_index.optical | 31, 55 |
| abstract_inverted_index.popular | 19 |
| abstract_inverted_index.reduced | 136 |
| abstract_inverted_index.replace | 53 |
| abstract_inverted_index.spatial | 96 |
| abstract_inverted_index.streams | 33 |
| abstract_inverted_index.studies | 152 |
| abstract_inverted_index.0.1459s, | 141 |
| abstract_inverted_index.0.1503s, | 139 |
| abstract_inverted_index.0.1881s, | 138 |
| abstract_inverted_index.Ablation | 151 |
| abstract_inverted_index.ConvLSTM | 90, 175 |
| abstract_inverted_index.accuracy | 88, 112 |
| abstract_inverted_index.compared | 142 |
| abstract_inverted_index.computer | 7 |
| abstract_inverted_index.designed | 76 |
| abstract_inverted_index.enabling | 64 |
| abstract_inverted_index.encoding | 94 |
| abstract_inverted_index.networks | 17 |
| abstract_inverted_index.original | 115, 145 |
| abstract_inverted_index.reducing | 68 |
| abstract_inverted_index.runtimes | 133 |
| abstract_inverted_index.temporal | 93 |
| abstract_inverted_index.training | 66 |
| abstract_inverted_index.25.3799s, | 148 |
| abstract_inverted_index.Keywords: | 166 |
| abstract_inverted_index.algorithm | 51 |
| abstract_inverted_index.conducted | 155 |
| abstract_inverted_index.datasets, | 107 |
| abstract_inverted_index.different | 161 |
| abstract_inverted_index.evaluated | 99 |
| abstract_inverted_index.introduce | 47 |
| abstract_inverted_index.learning, | 6 |
| abstract_inverted_index.101.6795s, | 147 |
| abstract_inverted_index.203.9958s. | 150 |
| abstract_inverted_index.Prediction | 132 |
| abstract_inverted_index.activation | 83 |
| abstract_inverted_index.attention. | 97 |
| abstract_inverted_index.egocentric | 60, 78 |
| abstract_inverted_index.end-to-end | 65 |
| abstract_inverted_index.parameters | 162 |
| abstract_inverted_index.prediction | 72 |
| abstract_inverted_index.two-stream | 15 |
| abstract_inverted_index.Traditional | 26 |
| abstract_inverted_index.egocentric, | 168 |
| abstract_inverted_index.performance | 36 |
| abstract_inverted_index.recognition | 62 |
| abstract_inverted_index.significant | 12 |
| abstract_inverted_index.two-stream, | 167 |
| abstract_inverted_index.advancements | 3 |
| abstract_inverted_index.performance. | 165 |
| abstract_inverted_index.recognition, | 80, 170 |
| abstract_inverted_index.recognition. | 25 |
| abstract_inverted_index.computational | 41, 69 |
| abstract_inverted_index.improvements, | 13 |
| abstract_inverted_index.respectively. | 131 |
| abstract_inverted_index.significantly | 135 |
| abstract_inverted_index.representation | 49, 172 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 5 |
| citation_normalized_percentile |
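
The `abstract_inverted_index.*` rows in the payload above store the abstract as a map from each token to the word positions where it occurs. A minimal sketch of how such an index can be turned back into running text follows. The function name `reconstruct_abstract` and the `sample` dictionary are illustrative. The sample is a small hand-copied subset of the rows above (only the first seven positions), not the full index, and it assumes the positions arrive as lists of integers.

```python
from typing import Dict, List


def reconstruct_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Rebuild text from a token -> positions mapping."""
    position_to_token: Dict[int, str] = {}
    for token, positions in inverted_index.items():
        for position in positions:
            position_to_token[position] = token
    # Emit tokens in ascending position order, separated by single spaces.
    return " ".join(position_to_token[i] for i in sorted(position_to_token))


# ASSUMPTION: hand-copied subset of the table rows, positions 0 to 6 only.
sample = {
    "With": [0],
    "the": [1],
    "rapid": [2],
    "advancements": [3],
    "in": [4],
    "deep": [5],
    "learning,": [6],
}

if __name__ == "__main__":
    print(reconstruct_abstract(sample))
```

Because punctuation is stored as part of each token (for example `learning,`), joining on spaces is enough to recover the original wording. Gaps in the position sequence are skipped rather than padded.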