The Solution for Temporal Action Localisation Task of Perception Test Challenge 2024
2024 · Open Access
· DOI: https://doi.org/10.48550/arxiv.2410.09088
This report presents our method for Temporal Action Localisation (TAL), the task of identifying and classifying actions within specific time intervals of a video sequence. We augment the training set with samples from the Something-SomethingV2 dataset that carry overlapping labels, improving the model's ability to generalize across action classes. For feature extraction, we use state-of-the-art models: UMT and VideoMAEv2 for video features, and BEATs and CAV-MAE for audio features. We train both multimodal (video and audio) and unimodal (video only) models, then combine their predictions using the Weighted Box Fusion (WBF) method to make action localisation more robust. Our overall approach achieves a score of 0.5498, securing first place in the competition.
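The fusion step described in the abstract can be illustrated with a minimal one-dimensional sketch of weighted fusion over temporal segments. This is not the authors' implementation: the function names (`temporal_iou`, `fuse_segments`), the IoU threshold of 0.5, the equal model weights, and the score normalisation by the sum of weights are all assumptions made for illustration, and the sketch handles a single action class only.

```python
from typing import List, Tuple

# A segment is (start_seconds, end_seconds, confidence). Hypothetical layout.
Segment = Tuple[float, float, float]


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def _merge(members: List[Segment], weight_sum: float) -> Segment:
    """Confidence-weighted average of boundaries; score is scaled by the
    total model weight, so segments found by fewer models score lower."""
    total = sum(m[2] for m in members)
    start = sum(m[0] * m[2] for m in members) / total
    end = sum(m[1] * m[2] for m in members) / total
    return (start, end, total / weight_sum)


def fuse_segments(
    predictions: List[List[Segment]],
    weights: List[float],
    iou_thr: float = 0.5,  # assumed threshold, not taken from the report
) -> List[Segment]:
    """Fuse per-model segment lists into one list (single class assumed)."""
    weight_sum = sum(weights)
    # Scale each confidence by its model weight; drop zero-score segments.
    pool = [
        (s, e, score * w)
        for segs, w in zip(predictions, weights)
        for (s, e, score) in segs
        if score * w > 0
    ]
    pool.sort(key=lambda seg: seg[2], reverse=True)

    clusters: List[List[Segment]] = []
    fused: List[Segment] = []
    for seg in pool:
        for i, current in enumerate(fused):
            # Match against the current fused segment of each cluster.
            if temporal_iou(seg, current) >= iou_thr:
                clusters[i].append(seg)
                fused[i] = _merge(clusters[i], weight_sum)
                break
        else:
            clusters.append([seg])
            fused.append(_merge([seg], weight_sum))
    return sorted(fused, key=lambda seg: seg[2], reverse=True)


if __name__ == "__main__":
    multimodal = [(1.0, 5.0, 0.8)]
    unimodal = [(2.0, 6.0, 0.6), (10.0, 12.0, 0.9)]
    for segment in fuse_segments([multimodal, unimodal], [1.0, 1.0]):
        print(segment)
```

In this simplified form, overlapping segments from different models are averaged into one, while a segment predicted by only one model keeps its boundaries but has its score reduced. A production version would also cluster per action class.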
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2410.09088, https://arxiv.org/pdf/2410.09088
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4403564230
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4403564230
- DOI: https://doi.org/10.48550/arxiv.2410.09088
- Title: The Solution for Temporal Action Localisation Task of Perception Test Challenge 2024
- Type: preprint
- Language: en
- Publication year: 2024
- Publication date: 2024-10-08
- Authors: Han Yang, Qing-Yuan Jiang, Huiyuan Mei, Yang Yang, Jinhui Tang
- Landing page: https://arxiv.org/abs/2410.09088
- PDF URL: https://arxiv.org/pdf/2410.09088
- Open access: Yes
- OA status: green
- OA URL: https://arxiv.org/pdf/2410.09088
- Concepts: Action (physics), Task (project management), Perception, Test (biology), Computer science, Cognitive psychology, Psychology, Neuroscience, Economics, Management, Biology, Physics, Quantum mechanics, Paleontology
- Cited by: 0
- Related works (count): 10
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4403564230 |
| doi | https://doi.org/10.48550/arxiv.2410.09088 |
| ids.doi | https://doi.org/10.48550/arxiv.2410.09088 |
| ids.openalex | https://openalex.org/W4403564230 |
| fwci | |
| type | preprint |
| title | The Solution for Temporal Action Localisation Task of Perception Test Challenge 2024 |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T12111 |
| topics[0].field.id | https://openalex.org/fields/22 |
| topics[0].field.display_name | Engineering |
| topics[0].score | 0.8118000030517578 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/2209 |
| topics[0].subfield.display_name | Industrial and Manufacturing Engineering |
| topics[0].display_name | Industrial Vision Systems and Defect Detection |
| topics[1].id | https://openalex.org/T13832 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.8044000267982483 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1710 |
| topics[1].subfield.display_name | Information Systems |
| topics[1].display_name | Advanced Decision-Making Techniques |
| topics[2].id | https://openalex.org/T11605 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.8019000291824341 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1707 |
| topics[2].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[2].display_name | Visual Attention and Saliency Detection |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C2780791683 |
| concepts[0].level | 2 |
| concepts[0].score | 0.7429205179214478 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q846785 |
| concepts[0].display_name | Action (physics) |
| concepts[1].id | https://openalex.org/C2780451532 |
| concepts[1].level | 2 |
| concepts[1].score | 0.7105722427368164 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q759676 |
| concepts[1].display_name | Task (project management) |
| concepts[2].id | https://openalex.org/C26760741 |
| concepts[2].level | 2 |
| concepts[2].score | 0.6160993576049805 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q160402 |
| concepts[2].display_name | Perception |
| concepts[3].id | https://openalex.org/C2777267654 |
| concepts[3].level | 2 |
| concepts[3].score | 0.610418975353241 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q3519023 |
| concepts[3].display_name | Test (biology) |
| concepts[4].id | https://openalex.org/C41008148 |
| concepts[4].level | 0 |
| concepts[4].score | 0.43643954396247864 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[4].display_name | Computer science |
| concepts[5].id | https://openalex.org/C180747234 |
| concepts[5].level | 1 |
| concepts[5].score | 0.3667009174823761 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q23373 |
| concepts[5].display_name | Cognitive psychology |
| concepts[6].id | https://openalex.org/C15744967 |
| concepts[6].level | 0 |
| concepts[6].score | 0.34172534942626953 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q9418 |
| concepts[6].display_name | Psychology |
| concepts[7].id | https://openalex.org/C169760540 |
| concepts[7].level | 1 |
| concepts[7].score | 0.1253366470336914 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q207011 |
| concepts[7].display_name | Neuroscience |
| concepts[8].id | https://openalex.org/C162324750 |
| concepts[8].level | 0 |
| concepts[8].score | 0.11413496732711792 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q8134 |
| concepts[8].display_name | Economics |
| concepts[9].id | https://openalex.org/C187736073 |
| concepts[9].level | 1 |
| concepts[9].score | 0.08668553829193115 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q2920921 |
| concepts[9].display_name | Management |
| concepts[10].id | https://openalex.org/C86803240 |
| concepts[10].level | 0 |
| concepts[10].score | 0.07306578755378723 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q420 |
| concepts[10].display_name | Biology |
| concepts[11].id | https://openalex.org/C121332964 |
| concepts[11].level | 0 |
| concepts[11].score | 0.06914347410202026 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q413 |
| concepts[11].display_name | Physics |
| concepts[12].id | https://openalex.org/C62520636 |
| concepts[12].level | 1 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q944 |
| concepts[12].display_name | Quantum mechanics |
| concepts[13].id | https://openalex.org/C151730666 |
| concepts[13].level | 1 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q7205 |
| concepts[13].display_name | Paleontology |
| keywords[0].id | https://openalex.org/keywords/action |
| keywords[0].score | 0.7429205179214478 |
| keywords[0].display_name | Action (physics) |
| keywords[1].id | https://openalex.org/keywords/task |
| keywords[1].score | 0.7105722427368164 |
| keywords[1].display_name | Task (project management) |
| keywords[2].id | https://openalex.org/keywords/perception |
| keywords[2].score | 0.6160993576049805 |
| keywords[2].display_name | Perception |
| keywords[3].id | https://openalex.org/keywords/test |
| keywords[3].score | 0.610418975353241 |
| keywords[3].display_name | Test (biology) |
| keywords[4].id | https://openalex.org/keywords/computer-science |
| keywords[4].score | 0.43643954396247864 |
| keywords[4].display_name | Computer science |
| keywords[5].id | https://openalex.org/keywords/cognitive-psychology |
| keywords[5].score | 0.3667009174823761 |
| keywords[5].display_name | Cognitive psychology |
| keywords[6].id | https://openalex.org/keywords/psychology |
| keywords[6].score | 0.34172534942626953 |
| keywords[6].display_name | Psychology |
| keywords[7].id | https://openalex.org/keywords/neuroscience |
| keywords[7].score | 0.1253366470336914 |
| keywords[7].display_name | Neuroscience |
| keywords[8].id | https://openalex.org/keywords/economics |
| keywords[8].score | 0.11413496732711792 |
| keywords[8].display_name | Economics |
| keywords[9].id | https://openalex.org/keywords/management |
| keywords[9].score | 0.08668553829193115 |
| keywords[9].display_name | Management |
| keywords[10].id | https://openalex.org/keywords/biology |
| keywords[10].score | 0.07306578755378723 |
| keywords[10].display_name | Biology |
| keywords[11].id | https://openalex.org/keywords/physics |
| keywords[11].score | 0.06914347410202026 |
| keywords[11].display_name | Physics |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2410.09088 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2410.09088 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2410.09088 |
| locations[1].id | doi:10.48550/arxiv.2410.09088 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2410.09088 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5100737925 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-8791-3616 |
| authorships[0].author.display_name | Han Yang |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Han, Yinan |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5064050872 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-9214-7960 |
| authorships[1].author.display_name | Qing-Yuan Jiang |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Jiang, Qingyuan |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5111128968 |
| authorships[2].author.orcid | https://orcid.org/0009-0007-4653-4867 |
| authorships[2].author.display_name | Huiyuan Mei |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Mei, Hongming |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5100397725 |
| authorships[3].author.orcid | https://orcid.org/0000-0003-0608-9408 |
| authorships[3].author.display_name | Yang Yang |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Yang, Yang |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5035112538 |
| authorships[4].author.orcid | https://orcid.org/0000-0001-9008-222X |
| authorships[4].author.display_name | Jinhui Tang |
| authorships[4].author_position | last |
| authorships[4].raw_author_name | Tang, Jinhui |
| authorships[4].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2410.09088 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2024-10-20T00:00:00 |
| display_name | The Solution for Temporal Action Localisation Task of Perception Test Challenge 2024 |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T12111 |
| primary_topic.field.id | https://openalex.org/fields/22 |
| primary_topic.field.display_name | Engineering |
| primary_topic.score | 0.8118000030517578 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/2209 |
| primary_topic.subfield.display_name | Industrial and Manufacturing Engineering |
| primary_topic.display_name | Industrial Vision Systems and Defect Detection |
| related_works | https://openalex.org/W2628861693, https://openalex.org/W3203087560, https://openalex.org/W641782856, https://openalex.org/W4361279463, https://openalex.org/W4232814730, https://openalex.org/W2975814312, https://openalex.org/W2598946408, https://openalex.org/W2042006092, https://openalex.org/W2087303720, https://openalex.org/W4211075255 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2410.09088 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2410.09088 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2410.09088 |
| primary_location.id | pmh:oai:arXiv.org:2410.09088 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2410.09088 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2410.09088 |
| publication_date | 2024-10-08 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 22, 27, 110 |
| abstract_inverted_index.We | 25 |
| abstract_inverted_index.by | 31, 88 |
| abstract_inverted_index.in | 117 |
| abstract_inverted_index.of | 112 |
| abstract_inverted_index.on | 12 |
| abstract_inverted_index.to | 47 |
| abstract_inverted_index.we | 56 |
| abstract_inverted_index.Box | 95 |
| abstract_inverted_index.For | 53 |
| abstract_inverted_index.Our | 73 |
| abstract_inverted_index.and | 14, 66, 68, 80, 82 |
| abstract_inverted_index.for | 5, 63, 70 |
| abstract_inverted_index.our | 3, 106 |
| abstract_inverted_index.the | 33, 40, 44, 93, 118 |
| abstract_inverted_index.This | 0, 99 |
| abstract_inverted_index.UMT, | 61 |
| abstract_inverted_index.both | 77 |
| abstract_inverted_index.data | 28 |
| abstract_inverted_index.from | 39 |
| abstract_inverted_index.time | 19 |
| abstract_inverted_index.(WBF) | 97 |
| abstract_inverted_index.BEATs | 67 |
| abstract_inverted_index.audio | 71 |
| abstract_inverted_index.first | 115 |
| abstract_inverted_index.only) | 85 |
| abstract_inverted_index.place | 116 |
| abstract_inverted_index.score | 111 |
| abstract_inverted_index.their | 90 |
| abstract_inverted_index.using | 36, 92 |
| abstract_inverted_index.video | 23, 64 |
| abstract_inverted_index.which | 10 |
| abstract_inverted_index.(TAL), | 9 |
| abstract_inverted_index.(video | 79, 84 |
| abstract_inverted_index.Action | 7 |
| abstract_inverted_index.Fusion | 96 |
| abstract_inverted_index.across | 49 |
| abstract_inverted_index.action | 51, 104 |
| abstract_inverted_index.audio) | 81 |
| abstract_inverted_index.employ | 26 |
| abstract_inverted_index.fusion | 100 |
| abstract_inverted_index.labels | 38 |
| abstract_inverted_index.method | 4 |
| abstract_inverted_index.report | 1 |
| abstract_inverted_index.robust | 103 |
| abstract_inverted_index.within | 17 |
| abstract_inverted_index.0.5498, | 113 |
| abstract_inverted_index.CAV-MAE | 69 |
| abstract_inverted_index.ability | 46 |
| abstract_inverted_index.actions | 16 |
| abstract_inverted_index.dataset | 35 |
| abstract_inverted_index.ensures | 102 |
| abstract_inverted_index.feature | 54 |
| abstract_inverted_index.focuses | 11 |
| abstract_inverted_index.method. | 98 |
| abstract_inverted_index.model's | 45 |
| abstract_inverted_index.models, | 59, 86 |
| abstract_inverted_index.overall | 107 |
| abstract_inverted_index.utilize | 57 |
| abstract_inverted_index.various | 50 |
| abstract_inverted_index.Temporal | 6 |
| abstract_inverted_index.Weighted | 94 |
| abstract_inverted_index.achieves | 109 |
| abstract_inverted_index.approach | 74, 108 |
| abstract_inverted_index.classes. | 52 |
| abstract_inverted_index.dataset, | 42 |
| abstract_inverted_index.followed | 87 |
| abstract_inverted_index.involves | 75 |
| abstract_inverted_index.presents | 2 |
| abstract_inverted_index.securing | 114 |
| abstract_inverted_index.specific | 18 |
| abstract_inverted_index.strategy | 101 |
| abstract_inverted_index.training | 34, 76 |
| abstract_inverted_index.unimodal | 83 |
| abstract_inverted_index.combining | 89 |
| abstract_inverted_index.enhancing | 43 |
| abstract_inverted_index.expanding | 32 |
| abstract_inverted_index.features, | 65 |
| abstract_inverted_index.features. | 72 |
| abstract_inverted_index.including | 60 |
| abstract_inverted_index.intervals | 20 |
| abstract_inverted_index.sequence. | 24 |
| abstract_inverted_index.technique | 30 |
| abstract_inverted_index.VideoMAEv2 | 62 |
| abstract_inverted_index.generalize | 48 |
| abstract_inverted_index.multimodal | 78 |
| abstract_inverted_index.throughout | 21 |
| abstract_inverted_index.classifying | 15 |
| abstract_inverted_index.extraction, | 55 |
| abstract_inverted_index.identifying | 13 |
| abstract_inverted_index.overlapping | 37 |
| abstract_inverted_index.predictions | 91 |
| abstract_inverted_index.Localisation | 8 |
| abstract_inverted_index.augmentation | 29 |
| abstract_inverted_index.competition. | 119 |
| abstract_inverted_index.localisation. | 105 |
| abstract_inverted_index.state-of-the-art | 58 |
| abstract_inverted_index.Something-SomethingV2 | 41 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 5 |
| citation_normalized_percentile | |
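
The `abstract_inverted_index.*` rows above store the abstract as a map from each word to the positions where it occurs. A minimal sketch of turning such a map back into running text follows. The function name `rebuild_abstract` is hypothetical, and the sample dictionary is a trimmed excerpt of the rows above (only the first five positions), assuming the flattened `abstract_inverted_index.` prefix has already been stripped from the keys.

```python
from typing import Dict, List


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Rebuild text from a word -> positions map by sorting on position."""
    by_position: Dict[int, str] = {}
    for word, positions in inverted_index.items():
        for position in positions:
            by_position[position] = word
    # Join words in ascending position order; gaps in positions are skipped.
    return " ".join(by_position[i] for i in sorted(by_position))


if __name__ == "__main__":
    # Trimmed excerpt of the table rows (positions 0 to 4 only).
    excerpt = {
        "This": [0],
        "report": [1],
        "presents": [2],
        "our": [3],
        "method": [4],
    }
    print(rebuild_abstract(excerpt))  # This report presents our method
```

Applied to the full set of rows, the same function would reproduce the abstract shown at the top of this record, since every word carries all of its positions.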