TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2403.19318
We introduce TableLLM, a robust large language model (LLM) with 8 billion parameters, purpose-built for proficiently handling tabular data manipulation tasks, whether they are embedded within documents or spreadsheets, catering to real-world office scenarios. We propose a distant supervision method for training, which comprises a reasoning process extension strategy, aiding in training LLMs to understand reasoning patterns more effectively as well as a cross-way validation strategy, ensuring the quality of the automatically generated data. To evaluate the performance of TableLLM, we have crafted benchmarks tailored to address both document and spreadsheet formats as well as constructed a well-organized evaluation pipeline capable of handling both scenarios. Thorough evaluations underscore the advantages of TableLLM when compared to various existing general-purpose and tabular data-focused LLMs. We have publicly released the model checkpoint, source code, benchmarks, and a web application for user interaction. Our codes and data are publicly available at https://github.com/TableLLM/TableLLM.
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2403.19318, https://arxiv.org/pdf/2403.19318
- OA Status: green
- Cited By: 3
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4393335916
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4393335916 (Canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2403.19318 (Digital Object Identifier)
- Title: TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios (Work title)
- Type: preprint (OpenAlex work type)
- Language: en (Primary language)
- Publication year: 2024 (Year of publication)
- Publication date: 2024-03-28 (Full publication date if available)
- Authors: Xiaokang Zhang, Jing Zhang, Zeyao Ma, Li Yang, Bohan Zhang, Guanlin Li, Zijun Yao, Kang Xu, Jinchang Zhou, Daniel Zhang-Li, Jifan Yu, Shu Zhao, Juanzi Li, Jie Tang (List of authors in order)
- Landing page: https://arxiv.org/abs/2403.19318 (Publisher landing page)
- PDF URL: https://arxiv.org/pdf/2403.19318 (Direct link to full text PDF)
- Open access: Yes (Whether a free full text is available)
- OA status: green (Open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2403.19318 (Direct OA link when available)
- Concepts: Computer science, Data science, Business (Top concepts (fields/topics) attached by OpenAlex)
- Cited by: 3 (Total citation count in OpenAlex)
- Citations by year (recent): 2025: 2, 2024: 1 (Per-year citation counts (last 5 years))
- Related works (count): 10 (Other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4393335916 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2403.19318 |
| ids.doi | https://doi.org/10.48550/arxiv.2403.19318 |
| ids.openalex | https://openalex.org/W4393335916 |
| fwci | |
| type | preprint |
| title | TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10317 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9121000170707703 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1705 |
| topics[0].subfield.display_name | Computer Networks and Communications |
| topics[0].display_name | Advanced Database Systems and Queries |
| topics[1].id | https://openalex.org/T11719 |
| topics[1].field.id | https://openalex.org/fields/18 |
| topics[1].field.display_name | Decision Sciences |
| topics[1].score | 0.9120000004768372 |
| topics[1].domain.id | https://openalex.org/domains/2 |
| topics[1].domain.display_name | Social Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1803 |
| topics[1].subfield.display_name | Management Science and Operations Research |
| topics[1].display_name | Data Quality and Management |
| topics[2].id | https://openalex.org/T10703 |
| topics[2].field.id | https://openalex.org/fields/14 |
| topics[2].field.display_name | Business, Management and Accounting |
| topics[2].score | 0.902999997138977 |
| topics[2].domain.id | https://openalex.org/domains/2 |
| topics[2].domain.display_name | Social Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1404 |
| topics[2].subfield.display_name | Management Information Systems |
| topics[2].display_name | Business Process Modeling and Analysis |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.4461626410484314 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C2522767166 |
| concepts[1].level | 1 |
| concepts[1].score | 0.34299004077911377 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q2374463 |
| concepts[1].display_name | Data science |
| concepts[2].id | https://openalex.org/C144133560 |
| concepts[2].level | 0 |
| concepts[2].score | 0.3280819058418274 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q4830453 |
| concepts[2].display_name | Business |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.4461626410484314 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/data-science |
| keywords[1].score | 0.34299004077911377 |
| keywords[1].display_name | Data science |
| keywords[2].id | https://openalex.org/keywords/business |
| keywords[2].score | 0.3280819058418274 |
| keywords[2].display_name | Business |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2403.19318 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2403.19318 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2403.19318 |
| locations[1].id | doi:10.48550/arxiv.2403.19318 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2403.19318 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5100412319 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-1225-4775 |
| authorships[0].author.display_name | Xiaokang Zhang |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Zhang, Xiaokang |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5086548040 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-1042-7433 |
| authorships[1].author.display_name | Jing Zhang |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Zhang, Jing |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5022455313 |
| authorships[2].author.orcid | https://orcid.org/0009-0006-9386-9136 |
| authorships[2].author.display_name | Zeyao Ma |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Ma, Zeyao |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5100637720 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-8152-7642 |
| authorships[3].author.display_name | Li Yang |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Li, Yang |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5100653177 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-8410-3853 |
| authorships[4].author.display_name | Bohan Zhang |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Zhang, Bohan |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5100756004 |
| authorships[5].author.orcid | https://orcid.org/0000-0003-3142-5928 |
| authorships[5].author.display_name | Guanlin Li |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Li, Guanlin |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5046687207 |
| authorships[6].author.orcid | https://orcid.org/0000-0002-0288-9283 |
| authorships[6].author.display_name | Zijun Yao |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Yao, Zijun |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5100440745 |
| authorships[7].author.orcid | https://orcid.org/0000-0002-6946-8635 |
| authorships[7].author.display_name | Kang Xu |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | Xu, Kangli |
| authorships[7].is_corresponding | False |
| authorships[8].author.id | https://openalex.org/A5104266923 |
| authorships[8].author.orcid | |
| authorships[8].author.display_name | Jinchang Zhou |
| authorships[8].author_position | middle |
| authorships[8].raw_author_name | Zhou, Jinchang |
| authorships[8].is_corresponding | False |
| authorships[9].author.id | https://openalex.org/A5014232813 |
| authorships[9].author.orcid | https://orcid.org/0009-0009-3681-1896 |
| authorships[9].author.display_name | Daniel Zhang-Li |
| authorships[9].author_position | middle |
| authorships[9].raw_author_name | Zhang-Li, Daniel |
| authorships[9].is_corresponding | False |
| authorships[10].author.id | https://openalex.org/A5038246814 |
| authorships[10].author.orcid | https://orcid.org/0000-0003-3430-4048 |
| authorships[10].author.display_name | Jifan Yu |
| authorships[10].author_position | middle |
| authorships[10].raw_author_name | Yu, Jifan |
| authorships[10].is_corresponding | False |
| authorships[11].author.id | https://openalex.org/A5100531421 |
| authorships[11].author.orcid | |
| authorships[11].author.display_name | Shu Zhao |
| authorships[11].author_position | middle |
| authorships[11].raw_author_name | Zhao, Shu |
| authorships[11].is_corresponding | False |
| authorships[12].author.id | https://openalex.org/A5003324011 |
| authorships[12].author.orcid | https://orcid.org/0000-0002-6244-0664 |
| authorships[12].author.display_name | Juanzi Li |
| authorships[12].author_position | middle |
| authorships[12].raw_author_name | Li, Juanzi |
| authorships[12].is_corresponding | False |
| authorships[13].author.id | https://openalex.org/A5089631680 |
| authorships[13].author.orcid | https://orcid.org/0000-0003-0619-0338 |
| authorships[13].author.display_name | Jie Tang |
| authorships[13].author_position | last |
| authorships[13].raw_author_name | Tang, Jie |
| authorships[13].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2403.19318 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10317 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9121000170707703 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1705 |
| primary_topic.subfield.display_name | Computer Networks and Communications |
| primary_topic.display_name | Advanced Database Systems and Queries |
| related_works | https://openalex.org/W2748952813, https://openalex.org/W2390279801, https://openalex.org/W2358668433, https://openalex.org/W2376932109, https://openalex.org/W2001405890, https://openalex.org/W2382290278, https://openalex.org/W2478288626, https://openalex.org/W4391913857, https://openalex.org/W2350741829, https://openalex.org/W2530322880 |
| cited_by_count | 3 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 2 |
| counts_by_year[1].year | 2024 |
| counts_by_year[1].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2403.19318 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2403.19318 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2403.19318 |
| primary_location.id | pmh:oai:arXiv.org:2403.19318 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2403.19318 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2403.19318 |
| publication_date | 2024-03-28 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.8 | 10 |
| abstract_inverted_index.a | 3, 36, 44, 62, 96, 133 |
| abstract_inverted_index.To | 74 |
| abstract_inverted_index.We | 0, 34, 122 |
| abstract_inverted_index.as | 59, 61, 92, 94 |
| abstract_inverted_index.at | 146 |
| abstract_inverted_index.in | 50 |
| abstract_inverted_index.of | 69, 78, 101, 110 |
| abstract_inverted_index.or | 27 |
| abstract_inverted_index.to | 30, 53, 85, 114 |
| abstract_inverted_index.we | 80 |
| abstract_inverted_index.Our | 139 |
| abstract_inverted_index.and | 89, 118, 132, 141 |
| abstract_inverted_index.are | 23, 143 |
| abstract_inverted_index.for | 14, 40, 136 |
| abstract_inverted_index.the | 67, 70, 76, 108, 126 |
| abstract_inverted_index.web | 134 |
| abstract_inverted_index.LLMs | 52 |
| abstract_inverted_index.both | 87, 103 |
| abstract_inverted_index.data | 18, 142 |
| abstract_inverted_index.have | 81, 123 |
| abstract_inverted_index.more | 57 |
| abstract_inverted_index.they | 22 |
| abstract_inverted_index.user | 137 |
| abstract_inverted_index.well | 60, 93 |
| abstract_inverted_index.when | 112 |
| abstract_inverted_index.with | 9 |
| abstract_inverted_index.(LLM) | 8 |
| abstract_inverted_index.LLMs. | 121 |
| abstract_inverted_index.code, | 130 |
| abstract_inverted_index.codes | 140 |
| abstract_inverted_index.data. | 73 |
| abstract_inverted_index.large | 5 |
| abstract_inverted_index.model | 7, 127 |
| abstract_inverted_index.which | 42 |
| abstract_inverted_index.aiding | 49 |
| abstract_inverted_index.method | 39 |
| abstract_inverted_index.office | 32 |
| abstract_inverted_index.robust | 4 |
| abstract_inverted_index.source | 129 |
| abstract_inverted_index.tasks, | 20 |
| abstract_inverted_index.within | 25 |
| abstract_inverted_index.address | 86 |
| abstract_inverted_index.billion | 11 |
| abstract_inverted_index.capable | 100 |
| abstract_inverted_index.crafted | 82 |
| abstract_inverted_index.distant | 37 |
| abstract_inverted_index.formats | 91 |
| abstract_inverted_index.process | 46 |
| abstract_inverted_index.propose | 35 |
| abstract_inverted_index.quality | 68 |
| abstract_inverted_index.tabular | 17, 119 |
| abstract_inverted_index.various | 115 |
| abstract_inverted_index.whether | 21 |
| abstract_inverted_index.TableLLM | 111 |
| abstract_inverted_index.Thorough | 105 |
| abstract_inverted_index.catering | 29 |
| abstract_inverted_index.compared | 113 |
| abstract_inverted_index.document | 88 |
| abstract_inverted_index.embedded | 24 |
| abstract_inverted_index.ensuring | 66 |
| abstract_inverted_index.evaluate | 75 |
| abstract_inverted_index.existing | 116 |
| abstract_inverted_index.handling | 16, 102 |
| abstract_inverted_index.language | 6 |
| abstract_inverted_index.patterns | 56 |
| abstract_inverted_index.pipeline | 99 |
| abstract_inverted_index.publicly | 124, 144 |
| abstract_inverted_index.released | 125 |
| abstract_inverted_index.tailored | 84 |
| abstract_inverted_index.training | 51 |
| abstract_inverted_index.TableLLM, | 2, 79 |
| abstract_inverted_index.available | 145 |
| abstract_inverted_index.comprises | 43 |
| abstract_inverted_index.cross-way | 63 |
| abstract_inverted_index.documents | 26 |
| abstract_inverted_index.extension | 47 |
| abstract_inverted_index.generated | 72 |
| abstract_inverted_index.introduce | 1 |
| abstract_inverted_index.reasoning | 45, 55 |
| abstract_inverted_index.strategy, | 48, 65 |
| abstract_inverted_index.training, | 41 |
| abstract_inverted_index.advantages | 109 |
| abstract_inverted_index.benchmarks | 83 |
| abstract_inverted_index.evaluation | 98 |
| abstract_inverted_index.real-world | 31 |
| abstract_inverted_index.scenarios. | 33, 104 |
| abstract_inverted_index.underscore | 107 |
| abstract_inverted_index.understand | 54 |
| abstract_inverted_index.validation | 64 |
| abstract_inverted_index.application | 135 |
| abstract_inverted_index.benchmarks, | 131 |
| abstract_inverted_index.checkpoint, | 128 |
| abstract_inverted_index.constructed | 95 |
| abstract_inverted_index.effectively | 58 |
| abstract_inverted_index.evaluations | 106 |
| abstract_inverted_index.parameters, | 12 |
| abstract_inverted_index.performance | 77 |
| abstract_inverted_index.spreadsheet | 90 |
| abstract_inverted_index.supervision | 38 |
| abstract_inverted_index.data-focused | 120 |
| abstract_inverted_index.interaction. | 138 |
| abstract_inverted_index.manipulation | 19 |
| abstract_inverted_index.proficiently | 15 |
| abstract_inverted_index.automatically | 71 |
| abstract_inverted_index.purpose-built | 13 |
| abstract_inverted_index.spreadsheets, | 28 |
| abstract_inverted_index.well-organized | 97 |
| abstract_inverted_index.general-purpose | 117 |
| abstract_inverted_index.https://github.com/TableLLM/TableLLM. | 147 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 14 |
| citation_normalized_percentile |
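
The `abstract_inverted_index.*` rows in the payload store the abstract as a map from each token to the word positions where it occurs. As a minimal sketch, assuming the index has been loaded into a plain Python dict (the function name `rebuild_abstract` and the `sample` variable are hypothetical, and `sample` holds only the first 13 positions copied from the rows above), the readable text can be recovered by inverting the map and joining tokens in position order:

```python
def rebuild_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild plain text from a token -> positions mapping."""
    # Invert the mapping: each word position points at exactly one token.
    token_at = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            token_at[pos] = token
    # Join the tokens in ascending position order.
    return " ".join(token_at[pos] for pos in sorted(token_at))


# First 13 positions of the index, taken from the payload rows above.
sample = {
    "We": [0],
    "introduce": [1],
    "TableLLM,": [2],
    "a": [3],
    "robust": [4],
    "large": [5],
    "language": [6],
    "model": [7],
    "(LLM)": [8],
    "with": [9],
    "8": [10],
    "billion": [11],
    "parameters,": [12],
}

print(rebuild_abstract(sample))
```

Tokens keep their attached punctuation (for example `TableLLM,`), so joining on single spaces is enough to reproduce the opening of the abstract shown at the top of the page.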