Topic Segmentation of Semi-Structured and Unstructured Conversational Datasets using Language Models
2023 · Open Access · DOI: https://doi.org/10.48550/arxiv.2310.17120
Breaking down a document or a conversation into multiple contiguous segments based on its semantic structure is an important and challenging problem in NLP, and solving it can assist many downstream tasks. However, current work on topic segmentation often focuses on structured texts. In this paper, we comprehensively analyze the generalization capabilities of state-of-the-art topic segmentation models on unstructured texts. We find that: (a) current strategies of pre-training on a large corpus of structured text such as Wiki-727K do not help transfer to unstructured conversational data; (b) training from scratch with only a relatively small dataset from the target unstructured domain improves the segmentation results by a significant margin. We stress-test our proposed topic segmentation approach by experimenting with multiple loss functions in order to mitigate the effects of imbalance in unstructured conversational datasets. Our empirical evaluation indicates that the Focal Loss function is a robust alternative to the Cross-Entropy and re-weighted Cross-Entropy loss functions when segmenting unstructured and semi-structured chats.
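To make the loss-function comparison in the abstract concrete, the sketch below contrasts the three losses on a single binary decision ("is this utterance a topic boundary?"). It uses the commonly cited focal loss formulation, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). The function names and the `gamma`, `alpha`, and `pos_weight` values are illustrative assumptions, not settings reported by the paper.

```python
import math

EPS = 1e-12  # guards against log(0); an implementation detail of this sketch


def _p_true(p, y):
    """Probability the model assigns to the true class (y is 0 or 1)."""
    p_t = p if y == 1 else 1.0 - p
    return min(max(p_t, EPS), 1.0)


def cross_entropy(p, y):
    """Plain binary cross-entropy for one prediction p = P(boundary)."""
    return -math.log(_p_true(p, y))


def weighted_cross_entropy(p, y, pos_weight=10.0):
    """Re-weighted cross-entropy: the rare boundary class is up-weighted.

    pos_weight is a hypothetical value chosen only for illustration.
    """
    weight = pos_weight if y == 1 else 1.0
    return weight * cross_entropy(p, y)


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss: down-weights examples the model already classifies well."""
    p_t = _p_true(p, y)
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    easy_negative = (0.05, 0)  # confident, correct "no boundary"
    hard_positive = (0.10, 1)  # missed boundary
    for name, loss in [("CE", cross_entropy),
                       ("weighted CE", weighted_cross_entropy),
                       ("focal", focal_loss)]:
        print(name, loss(*easy_negative), loss(*hard_positive))
```

Because most utterances in a chat are not boundaries, the many easy negatives can dominate plain cross-entropy. The `(1 - p_t) ** gamma` factor shrinks their contribution, so the hard, rare boundary examples account for a larger share of the total loss.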
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2310.17120, https://arxiv.org/pdf/2310.17120
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4387995129
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4387995129 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2310.17120 (Digital Object Identifier)
- Title: Topic Segmentation of Semi-Structured and Unstructured Conversational Datasets using Language Models (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2023 (year of publication)
- Publication date: 2023-10-26 (full publication date if available)
- Authors: Reshmi Ghosh, Harjeet Singh Kajal, Sharanya Kamath, Dhuri Shrivastava, Samyadeep Basu, Hansi Zeng, Soundararajan Srinivasan (list of authors in order)
- Landing page: https://arxiv.org/abs/2310.17120 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2310.17120 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2310.17120 (direct OA link when available)
- Concepts: Computer science, Segmentation, Margin (machine learning), Unstructured data, Market segmentation, Artificial intelligence, Natural language processing, Focus (optics), Conversation, Cross entropy, Principle of maximum entropy, Machine learning, Data mining, Linguistics, Big data, Optics, Business, Marketing, Philosophy, Physics (top concepts, i.e. fields and topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4387995129 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2310.17120 |
| ids.doi | https://doi.org/10.48550/arxiv.2310.17120 |
| ids.openalex | https://openalex.org/W4387995129 |
| fwci | 0.0 |
| type | preprint |
| title | Topic Segmentation of Semi-Structured and Unstructured Conversational Datasets using Language Models |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10028 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9998999834060669 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Topic Modeling |
| topics[1].id | https://openalex.org/T10181 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9991999864578247 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Natural Language Processing Techniques |
| topics[2].id | https://openalex.org/T11550 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9628999829292297 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Text and Document Classification Technologies |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.8139019012451172 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C89600930 |
| concepts[1].level | 2 |
| concepts[1].score | 0.7654542922973633 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q1423946 |
| concepts[1].display_name | Segmentation |
| concepts[2].id | https://openalex.org/C774472 |
| concepts[2].level | 2 |
| concepts[2].score | 0.6358891129493713 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q6760393 |
| concepts[2].display_name | Margin (machine learning) |
| concepts[3].id | https://openalex.org/C2781252014 |
| concepts[3].level | 3 |
| concepts[3].score | 0.6311219930648804 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q1141900 |
| concepts[3].display_name | Unstructured data |
| concepts[4].id | https://openalex.org/C125308379 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5552920699119568 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q363057 |
| concepts[4].display_name | Market segmentation |
| concepts[5].id | https://openalex.org/C154945302 |
| concepts[5].level | 1 |
| concepts[5].score | 0.5485799908638 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[5].display_name | Artificial intelligence |
| concepts[6].id | https://openalex.org/C204321447 |
| concepts[6].level | 1 |
| concepts[6].score | 0.5446348786354065 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[6].display_name | Natural language processing |
| concepts[7].id | https://openalex.org/C192209626 |
| concepts[7].level | 2 |
| concepts[7].score | 0.4950782358646393 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q190909 |
| concepts[7].display_name | Focus (optics) |
| concepts[8].id | https://openalex.org/C2777200299 |
| concepts[8].level | 2 |
| concepts[8].score | 0.43460917472839355 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q52943 |
| concepts[8].display_name | Conversation |
| concepts[9].id | https://openalex.org/C167981619 |
| concepts[9].level | 3 |
| concepts[9].score | 0.42348015308380127 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q1685498 |
| concepts[9].display_name | Cross entropy |
| concepts[10].id | https://openalex.org/C9679016 |
| concepts[10].level | 2 |
| concepts[10].score | 0.33946359157562256 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q1417473 |
| concepts[10].display_name | Principle of maximum entropy |
| concepts[11].id | https://openalex.org/C119857082 |
| concepts[11].level | 1 |
| concepts[11].score | 0.29734480381011963 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q2539 |
| concepts[11].display_name | Machine learning |
| concepts[12].id | https://openalex.org/C124101348 |
| concepts[12].level | 1 |
| concepts[12].score | 0.17994719743728638 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q172491 |
| concepts[12].display_name | Data mining |
| concepts[13].id | https://openalex.org/C41895202 |
| concepts[13].level | 1 |
| concepts[13].score | 0.1361357867717743 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q8162 |
| concepts[13].display_name | Linguistics |
| concepts[14].id | https://openalex.org/C75684735 |
| concepts[14].level | 2 |
| concepts[14].score | 0.13505852222442627 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q858810 |
| concepts[14].display_name | Big data |
| concepts[15].id | https://openalex.org/C120665830 |
| concepts[15].level | 1 |
| concepts[15].score | 0.0 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q14620 |
| concepts[15].display_name | Optics |
| concepts[16].id | https://openalex.org/C144133560 |
| concepts[16].level | 0 |
| concepts[16].score | 0.0 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q4830453 |
| concepts[16].display_name | Business |
| concepts[17].id | https://openalex.org/C162853370 |
| concepts[17].level | 1 |
| concepts[17].score | 0.0 |
| concepts[17].wikidata | https://www.wikidata.org/wiki/Q39809 |
| concepts[17].display_name | Marketing |
| concepts[18].id | https://openalex.org/C138885662 |
| concepts[18].level | 0 |
| concepts[18].score | 0.0 |
| concepts[18].wikidata | https://www.wikidata.org/wiki/Q5891 |
| concepts[18].display_name | Philosophy |
| concepts[19].id | https://openalex.org/C121332964 |
| concepts[19].level | 0 |
| concepts[19].score | 0.0 |
| concepts[19].wikidata | https://www.wikidata.org/wiki/Q413 |
| concepts[19].display_name | Physics |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.8139019012451172 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/segmentation |
| keywords[1].score | 0.7654542922973633 |
| keywords[1].display_name | Segmentation |
| keywords[2].id | https://openalex.org/keywords/margin |
| keywords[2].score | 0.6358891129493713 |
| keywords[2].display_name | Margin (machine learning) |
| keywords[3].id | https://openalex.org/keywords/unstructured-data |
| keywords[3].score | 0.6311219930648804 |
| keywords[3].display_name | Unstructured data |
| keywords[4].id | https://openalex.org/keywords/market-segmentation |
| keywords[4].score | 0.5552920699119568 |
| keywords[4].display_name | Market segmentation |
| keywords[5].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[5].score | 0.5485799908638 |
| keywords[5].display_name | Artificial intelligence |
| keywords[6].id | https://openalex.org/keywords/natural-language-processing |
| keywords[6].score | 0.5446348786354065 |
| keywords[6].display_name | Natural language processing |
| keywords[7].id | https://openalex.org/keywords/focus |
| keywords[7].score | 0.4950782358646393 |
| keywords[7].display_name | Focus (optics) |
| keywords[8].id | https://openalex.org/keywords/conversation |
| keywords[8].score | 0.43460917472839355 |
| keywords[8].display_name | Conversation |
| keywords[9].id | https://openalex.org/keywords/cross-entropy |
| keywords[9].score | 0.42348015308380127 |
| keywords[9].display_name | Cross entropy |
| keywords[10].id | https://openalex.org/keywords/principle-of-maximum-entropy |
| keywords[10].score | 0.33946359157562256 |
| keywords[10].display_name | Principle of maximum entropy |
| keywords[11].id | https://openalex.org/keywords/machine-learning |
| keywords[11].score | 0.29734480381011963 |
| keywords[11].display_name | Machine learning |
| keywords[12].id | https://openalex.org/keywords/data-mining |
| keywords[12].score | 0.17994719743728638 |
| keywords[12].display_name | Data mining |
| keywords[13].id | https://openalex.org/keywords/linguistics |
| keywords[13].score | 0.1361357867717743 |
| keywords[13].display_name | Linguistics |
| keywords[14].id | https://openalex.org/keywords/big-data |
| keywords[14].score | 0.13505852222442627 |
| keywords[14].display_name | Big data |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2310.17120 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2310.17120 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2310.17120 |
| locations[1].id | doi:10.48550/arxiv.2310.17120 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2310.17120 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5019507987 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-1814-2133 |
| authorships[0].author.display_name | Reshmi Ghosh |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Ghosh, Reshmi |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5043323726 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Harjeet Singh Kajal |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Kajal, Harjeet Singh |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5074319078 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Sharanya Kamath |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Kamath, Sharanya |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5025735958 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Dhuri Shrivastava |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Shrivastava, Dhuri |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5085795724 |
| authorships[4].author.orcid | |
| authorships[4].author.display_name | Samyadeep Basu |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Basu, Samyadeep |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5020583822 |
| authorships[5].author.orcid | https://orcid.org/0009-0000-2699-8460 |
| authorships[5].author.display_name | Hansi Zeng |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Zeng, Hansi |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5102163048 |
| authorships[6].author.orcid | |
| authorships[6].author.display_name | Soundararajan Srinivasan |
| authorships[6].author_position | last |
| authorships[6].raw_author_name | Srinivasan, Soundararajan |
| authorships[6].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2310.17120 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Topic Segmentation of Semi-Structured and Unstructured Conversational Datasets using Language Models |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10028 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9998999834060669 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Topic Modeling |
| related_works | https://openalex.org/W3125011624, https://openalex.org/W1508631387, https://openalex.org/W2370917603, https://openalex.org/W3203889067, https://openalex.org/W3184725726, https://openalex.org/W2017776670, https://openalex.org/W2952760143, https://openalex.org/W2378793138, https://openalex.org/W2759357633, https://openalex.org/W2347897961 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2310.17120 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2310.17120 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2310.17120 |
| primary_location.id | pmh:oai:arXiv.org:2310.17120 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2310.17120 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2310.17120 |
| publication_date | 2023-10-26 |
| publication_year | 2023 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 2, 5, 69, 93, 107, 143 |
| abstract_inverted_index.In | 43 |
| abstract_inverted_index.We | 60, 110 |
| abstract_inverted_index.an | 17 |
| abstract_inverted_index.as | 76 |
| abstract_inverted_index.by | 106, 117 |
| abstract_inverted_index.do | 78 |
| abstract_inverted_index.in | 22, 81, 123, 130 |
| abstract_inverted_index.is | 16, 142 |
| abstract_inverted_index.of | 40, 52, 66, 72, 97, 128 |
| abstract_inverted_index.on | 12, 33, 38, 57, 68 |
| abstract_inverted_index.or | 4 |
| abstract_inverted_index.to | 83, 125, 146 |
| abstract_inverted_index.we | 46 |
| abstract_inverted_index.(a) | 63 |
| abstract_inverted_index.(b) | 87 |
| abstract_inverted_index.Our | 134 |
| abstract_inverted_index.and | 19, 148, 156 |
| abstract_inverted_index.can | 25 |
| abstract_inverted_index.its | 13 |
| abstract_inverted_index.not | 79 |
| abstract_inverted_index.our | 112 |
| abstract_inverted_index.the | 49, 98, 103 |
| abstract_inverted_index.Loss | 140 |
| abstract_inverted_index.NLP, | 23 |
| abstract_inverted_index.down | 1 |
| abstract_inverted_index.find | 61 |
| abstract_inverted_index.from | 89 |
| abstract_inverted_index.help | 80 |
| abstract_inverted_index.into | 7 |
| abstract_inverted_index.loss | 121, 151 |
| abstract_inverted_index.many | 27 |
| abstract_inverted_index.only | 92 |
| abstract_inverted_index.such | 75 |
| abstract_inverted_index.text | 74 |
| abstract_inverted_index.that | 138 |
| abstract_inverted_index.this | 44 |
| abstract_inverted_index.when | 153 |
| abstract_inverted_index.with | 91, 119 |
| abstract_inverted_index.Focal | 139 |
| abstract_inverted_index.Topic | 114 |
| abstract_inverted_index.based | 11 |
| abstract_inverted_index.data. | 86 |
| abstract_inverted_index.focus | 37 |
| abstract_inverted_index.large | 70 |
| abstract_inverted_index.often | 36 |
| abstract_inverted_index.order | 124 |
| abstract_inverted_index.that: | 62 |
| abstract_inverted_index.topic | 34, 54 |
| abstract_inverted_index.which | 24 |
| abstract_inverted_index.works | 32 |
| abstract_inverted_index.assist | 26 |
| abstract_inverted_index.chats. | 158 |
| abstract_inverted_index.corpus | 71 |
| abstract_inverted_index.domain | 101 |
| abstract_inverted_index.models | 56 |
| abstract_inverted_index.paper, | 45 |
| abstract_inverted_index.robust | 144 |
| abstract_inverted_index.target | 99 |
| abstract_inverted_index.tasks. | 29 |
| abstract_inverted_index.texts. | 42, 59 |
| abstract_inverted_index.Current | 64 |
| abstract_inverted_index.analyze | 48 |
| abstract_inverted_index.current | 31 |
| abstract_inverted_index.dataset | 96 |
| abstract_inverted_index.effects | 127 |
| abstract_inverted_index.margin. | 109 |
| abstract_inverted_index.problem | 21 |
| abstract_inverted_index.results | 105 |
| abstract_inverted_index.scratch | 90 |
| abstract_inverted_index.Breaking | 0 |
| abstract_inverted_index.However, | 30 |
| abstract_inverted_index.Training | 88 |
| abstract_inverted_index.approach | 116 |
| abstract_inverted_index.document | 3 |
| abstract_inverted_index.function | 141, 152 |
| abstract_inverted_index.improves | 102 |
| abstract_inverted_index.mitigate | 126 |
| abstract_inverted_index.multiple | 8, 120 |
| abstract_inverted_index.proposed | 113 |
| abstract_inverted_index.segments | 10 |
| abstract_inverted_index.semantic | 14 |
| abstract_inverted_index.Wiki-727K | 77 |
| abstract_inverted_index.datasets. | 133 |
| abstract_inverted_index.empirical | 135 |
| abstract_inverted_index.imbalance | 129 |
| abstract_inverted_index.important | 18 |
| abstract_inverted_index.indicates | 137 |
| abstract_inverted_index.structure | 15 |
| abstract_inverted_index.contiguous | 9 |
| abstract_inverted_index.downstream | 28 |
| abstract_inverted_index.evaluation | 136 |
| abstract_inverted_index.functions, | 122 |
| abstract_inverted_index.relatively | 94 |
| abstract_inverted_index.segmenting | 154 |
| abstract_inverted_index.strategies | 65 |
| abstract_inverted_index.structured | 41, 73 |
| abstract_inverted_index.alternative | 145 |
| abstract_inverted_index.challenging | 20 |
| abstract_inverted_index.re-weighted | 149 |
| abstract_inverted_index.significant | 108 |
| abstract_inverted_index.small-sized | 95 |
| abstract_inverted_index.stress-test | 111 |
| abstract_inverted_index.Segmentation | 115 |
| abstract_inverted_index.capabilities | 51 |
| abstract_inverted_index.conversation | 6 |
| abstract_inverted_index.pre-training | 67 |
| abstract_inverted_index.segmentation | 35, 39, 55, 104 |
| abstract_inverted_index.unstructured | 58, 84, 100, 131, 155 |
| abstract_inverted_index.Cross-Entropy | 147, 150 |
| abstract_inverted_index.experimenting | 118 |
| abstract_inverted_index.conversational | 85, 132 |
| abstract_inverted_index.generalization | 50 |
| abstract_inverted_index.comprehensively | 47 |
| abstract_inverted_index.semi-structured | 157 |
| abstract_inverted_index.transferability | 82 |
| abstract_inverted_index.state-of-the-art | 53 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 7 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/4 |
| sustainable_development_goals[0].score | 0.7599999904632568 |
| sustainable_development_goals[0].display_name | Quality Education |
| citation_normalized_percentile |
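
The `abstract_inverted_index.*` rows in the payload above store the abstract as a mapping from each word to the positions where it occurs. A minimal sketch of turning such a mapping back into running text follows. The function name `rebuild_abstract` is an assumption for illustration, and the sample dictionary is a small subset of the rows above (only the first two positions of `a` are included).

```python
def rebuild_abstract(inverted_index):
    """Turn a {word: [positions]} mapping back into running text."""
    by_position = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = word
    # Join the words in position order; gaps in the positions are skipped.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Subset of the payload rows, covering word positions 0 through 6.
sample = {
    "Breaking": [0],
    "down": [1],
    "a": [2, 5],
    "document": [3],
    "or": [4],
    "conversation": [6],
}

print(rebuild_abstract(sample))
```

Applied to the full set of rows, the same loop would reproduce the abstract shown at the top of the page, with punctuation attached to words exactly as stored in the index keys (for example `NLP,` and `texts.`).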