MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing
2022 · Open Access · DOI: https://doi.org/10.48550/arxiv.2212.13492
Text-to-SQL semantic parsing is an important NLP task that greatly facilitates interaction between users and databases and has become a key component of many human-computer interaction systems. Much recent progress in text-to-SQL has been driven by large-scale datasets, but most of them are centered on English. In this work, we present MultiSpider, the largest multilingual text-to-SQL dataset, which covers seven languages (English, German, French, Spanish, Japanese, Chinese, and Vietnamese). Using MultiSpider, we further identify the lexical and structural challenges of text-to-SQL (caused by language-specific properties and dialectal expressions) and their intensity across different languages. Experimental results under three typical settings (zero-shot, monolingual, and multilingual) reveal a 6.1% absolute drop in accuracy in non-English languages. Qualitative and quantitative analyses are conducted to understand the reasons for the performance drop in each language. Besides the dataset, we also propose a simple schema augmentation framework, SAVe (Schema-Augmentation-with-Verification), which significantly boosts the overall performance by about 1.8% and closes the 29.5% performance gap across languages.
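To make the task concrete, here is a minimal sketch of a multilingual text-to-SQL pair: two questions with the same intent, in different languages, paired with one target SQL query. The `singer` table, its rows, the two questions, and the `run_query` helper are all hypothetical and invented for illustration. They are not taken from MultiSpider, and the sketch assumes the schema stays in English, which the abstract does not state.

```python
import sqlite3

# Hypothetical in-memory database; the schema and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE singer (singer_id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
)
conn.executemany(
    "INSERT INTO singer (name, country) VALUES (?, ?)",
    [("A", "France"), ("B", "Japan"), ("C", "France")],
)

# The same intent phrased in two languages (hypothetical questions).
questions = {
    "en": "How many singers are from France?",
    "de": "Wie viele Sänger kommen aus Frankreich?",
}

# A parser should map either question to a query equivalent to this one.
target_sql = "SELECT COUNT(*) FROM singer WHERE country = 'France'"


def run_query(sql: str) -> list:
    """Execute a SQL string against the toy database and return all rows."""
    return conn.execute(sql).fetchall()


result = run_query(target_sql)
for lang, question in questions.items():
    print(f"[{lang}] {question} -> {target_sql} -> {result}")
```

Comparing execution results rather than query strings is one common way to check a predicted query; whether the paper scores predictions this way is not stated in the abstract.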
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2212.13492, https://arxiv.org/pdf/2212.13492
- OA Status: green
- Cited By: 1
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4313304938
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4313304938 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2212.13492 (Digital Object Identifier)
- Title: MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2022 (year of publication)
- Publication date: 2022-12-27 (full publication date if available)
- Authors: Longxu Dou, Yan Gao, Mingyang Pan, Dingzirui Wang, Wanxiang Che, Dechen Zhan, Jian–Guang Lou (list of authors in order)
- Landing page: https://arxiv.org/abs/2212.13492 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2212.13492 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2212.13492 (direct OA link when available)
- Concepts: Computer science, Natural language processing, Parsing, Artificial intelligence, SQL, Chunking (psychology), Database (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 1 (total citation count in OpenAlex)
- Citations by year (recent): 2023: 1 (per-year citation counts, last 5 years)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| field | value |
|---|---|
| id | https://openalex.org/W4313304938 |
| doi | https://doi.org/10.48550/arxiv.2212.13492 |
| ids.doi | https://doi.org/10.48550/arxiv.2212.13492 |
| ids.openalex | https://openalex.org/W4313304938 |
| fwci | |
| type | preprint |
| title | MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10181 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9993000030517578 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Natural Language Processing Techniques |
| topics[1].id | https://openalex.org/T10028 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9973000288009644 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Topic Modeling |
| topics[2].id | https://openalex.org/T12016 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9803000092506409 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1710 |
| topics[2].subfield.display_name | Information Systems |
| topics[2].display_name | Web Data Mining and Analysis |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.864525318145752 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C204321447 |
| concepts[1].level | 1 |
| concepts[1].score | 0.6865297555923462 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[1].display_name | Natural language processing |
| concepts[2].id | https://openalex.org/C186644900 |
| concepts[2].level | 2 |
| concepts[2].score | 0.6831490993499756 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q194152 |
| concepts[2].display_name | Parsing |
| concepts[3].id | https://openalex.org/C154945302 |
| concepts[3].level | 1 |
| concepts[3].score | 0.5971078276634216 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[3].display_name | Artificial intelligence |
| concepts[4].id | https://openalex.org/C510870499 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5916663408279419 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q47607 |
| concepts[4].display_name | SQL |
| concepts[5].id | https://openalex.org/C203357204 |
| concepts[5].level | 2 |
| concepts[5].score | 0.44101086258888245 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q1089605 |
| concepts[5].display_name | Chunking (psychology) |
| concepts[6].id | https://openalex.org/C77088390 |
| concepts[6].level | 1 |
| concepts[6].score | 0.1855323612689972 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q8513 |
| concepts[6].display_name | Database |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.864525318145752 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/natural-language-processing |
| keywords[1].score | 0.6865297555923462 |
| keywords[1].display_name | Natural language processing |
| keywords[2].id | https://openalex.org/keywords/parsing |
| keywords[2].score | 0.6831490993499756 |
| keywords[2].display_name | Parsing |
| keywords[3].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[3].score | 0.5971078276634216 |
| keywords[3].display_name | Artificial intelligence |
| keywords[4].id | https://openalex.org/keywords/sql |
| keywords[4].score | 0.5916663408279419 |
| keywords[4].display_name | SQL |
| keywords[5].id | https://openalex.org/keywords/chunking |
| keywords[5].score | 0.44101086258888245 |
| keywords[5].display_name | Chunking (psychology) |
| keywords[6].id | https://openalex.org/keywords/database |
| keywords[6].score | 0.1855323612689972 |
| keywords[6].display_name | Database |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2212.13492 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2212.13492 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2212.13492 |
| locations[1].id | doi:10.48550/arxiv.2212.13492 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2212.13492 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5039847345 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Longxu Dou |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Dou, Longxu |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5100462446 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-5890-9717 |
| authorships[1].author.display_name | Yan Gao |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Gao, Yan |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5063170334 |
| authorships[2].author.orcid | https://orcid.org/0000-0003-1327-6105 |
| authorships[2].author.display_name | Mingyang Pan |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Pan, Mingyang |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5000300840 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Dingzirui Wang |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Wang, Dingzirui |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5019108029 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-3907-0335 |
| authorships[4].author.display_name | Wanxiang Che |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Che, Wanxiang |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5113748483 |
| authorships[5].author.orcid | |
| authorships[5].author.display_name | Dechen Zhan |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Zhan, Dechen |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5025118710 |
| authorships[6].author.orcid | |
| authorships[6].author.display_name | Jian–Guang Lou |
| authorships[6].author_position | last |
| authorships[6].raw_author_name | Lou, Jian-Guang |
| authorships[6].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2212.13492 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2023-01-06T00:00:00 |
| display_name | MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10181 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9993000030517578 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Natural Language Processing Techniques |
| related_works | https://openalex.org/W6643695, https://openalex.org/W2010807697, https://openalex.org/W4381248170, https://openalex.org/W1599450222, https://openalex.org/W2090755435, https://openalex.org/W3189621521, https://openalex.org/W2045514505, https://openalex.org/W2173794830, https://openalex.org/W2039036070, https://openalex.org/W3204019825 |
| cited_by_count | 1 |
| counts_by_year[0].year | 2023 |
| counts_by_year[0].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2212.13492 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2212.13492 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2212.13492 |
| primary_location.id | pmh:oai:arXiv.org:2212.13492 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2212.13492 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2212.13492 |
| publication_date | 2022-12-27 |
| publication_year | 2022 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 107, 139 |
| abstract_inverted_index.In | 47 |
| abstract_inverted_index.an | 4 |
| abstract_inverted_index.by | 36, 83, 152 |
| abstract_inverted_index.in | 23, 31, 111, 113 |
| abstract_inverted_index.is | 3 |
| abstract_inverted_index.of | 41, 80, 130 |
| abstract_inverted_index.on | 45 |
| abstract_inverted_index.to | 122 |
| abstract_inverted_index.we | 50, 72, 136 |
| abstract_inverted_index.NLP | 6 |
| abstract_inverted_index.and | 15, 18, 68, 77, 87, 90, 104, 117, 155 |
| abstract_inverted_index.are | 43, 120 |
| abstract_inverted_index.but | 39 |
| abstract_inverted_index.for | 126 |
| abstract_inverted_index.gap | 160 |
| abstract_inverted_index.has | 33 |
| abstract_inverted_index.key | 21 |
| abstract_inverted_index.the | 11, 16, 20, 53, 75, 124, 127, 134, 149, 157 |
| abstract_inverted_index.1.8% | 154 |
| abstract_inverted_index.6.1% | 108 |
| abstract_inverted_index.Much | 28 |
| abstract_inverted_index.SAVe | 144 |
| abstract_inverted_index.Upon | 70 |
| abstract_inverted_index.also | 137 |
| abstract_inverted_index.been | 34 |
| abstract_inverted_index.drop | 110, 129 |
| abstract_inverted_index.each | 131 |
| abstract_inverted_index.many | 24 |
| abstract_inverted_index.most | 40 |
| abstract_inverted_index.them | 42 |
| abstract_inverted_index.this | 48 |
| abstract_inverted_index.29.5% | 158 |
| abstract_inverted_index.about | 153 |
| abstract_inverted_index.seven | 60 |
| abstract_inverted_index.task, | 7 |
| abstract_inverted_index.their | 91 |
| abstract_inverted_index.three | 99 |
| abstract_inverted_index.under | 98 |
| abstract_inverted_index.users | 14 |
| abstract_inverted_index.which | 8, 58, 146 |
| abstract_inverted_index.work, | 49 |
| abstract_inverted_index.across | 93, 161 |
| abstract_inverted_index.boosts | 148 |
| abstract_inverted_index.closes | 156 |
| abstract_inverted_index.covers | 59 |
| abstract_inverted_index.driven | 35 |
| abstract_inverted_index.reason | 125 |
| abstract_inverted_index.recent | 29 |
| abstract_inverted_index.reveal | 106 |
| abstract_inverted_index.schema | 141 |
| abstract_inverted_index.simple | 140 |
| abstract_inverted_index.(caused | 82 |
| abstract_inverted_index.Besides | 133 |
| abstract_inverted_index.French, | 64 |
| abstract_inverted_index.German, | 63 |
| abstract_inverted_index.becomes | 19 |
| abstract_inverted_index.between | 13 |
| abstract_inverted_index.dataset | 57 |
| abstract_inverted_index.dialect | 88 |
| abstract_inverted_index.further | 73 |
| abstract_inverted_index.greatly | 9 |
| abstract_inverted_index.largest | 54 |
| abstract_inverted_index.lexical | 76 |
| abstract_inverted_index.overall | 150 |
| abstract_inverted_index.parsing | 2 |
| abstract_inverted_index.present | 51 |
| abstract_inverted_index.propose | 138 |
| abstract_inverted_index.results | 97 |
| abstract_inverted_index.typical | 100 |
| abstract_inverted_index.Chinese, | 67 |
| abstract_inverted_index.English. | 46 |
| abstract_inverted_index.Spanish, | 65 |
| abstract_inverted_index.absolute | 109 |
| abstract_inverted_index.accuracy | 112 |
| abstract_inverted_index.analyses | 119 |
| abstract_inverted_index.centered | 44 |
| abstract_inverted_index.database | 17 |
| abstract_inverted_index.dataset, | 135 |
| abstract_inverted_index.identify | 74 |
| abstract_inverted_index.language | 85 |
| abstract_inverted_index.progress | 30 |
| abstract_inverted_index.sayings) | 89 |
| abstract_inverted_index.semantic | 1 |
| abstract_inverted_index.settings | 101 |
| abstract_inverted_index.specific | 84 |
| abstract_inverted_index.systems. | 27 |
| abstract_inverted_index.(English, | 62 |
| abstract_inverted_index.Japanese, | 66 |
| abstract_inverted_index.component | 22 |
| abstract_inverted_index.conducted | 121 |
| abstract_inverted_index.datasets, | 38 |
| abstract_inverted_index.different | 94 |
| abstract_inverted_index.framework | 143 |
| abstract_inverted_index.important | 5 |
| abstract_inverted_index.intensity | 92 |
| abstract_inverted_index.language. | 132 |
| abstract_inverted_index.languages | 61 |
| abstract_inverted_index.challenges | 79 |
| abstract_inverted_index.languages. | 95, 115, 162 |
| abstract_inverted_index.properties | 86 |
| abstract_inverted_index.structural | 78 |
| abstract_inverted_index.understand | 123 |
| abstract_inverted_index.(zero-shot, | 102 |
| abstract_inverted_index.Qualitative | 116 |
| abstract_inverted_index.Text-to-SQL | 0 |
| abstract_inverted_index.facilitates | 10 |
| abstract_inverted_index.interaction | 12, 26 |
| abstract_inverted_index.large-scale | 37 |
| abstract_inverted_index.monolingual | 103 |
| abstract_inverted_index.non-English | 114 |
| abstract_inverted_index.performance | 128, 151, 159 |
| abstract_inverted_index.text-to-SQL | 32, 56, 81 |
| abstract_inverted_index.Experimental | 96 |
| abstract_inverted_index.MultiSpider, | 52, 71 |
| abstract_inverted_index.Vietnamese). | 69 |
| abstract_inverted_index.augmentation | 142 |
| abstract_inverted_index.multilingual | 55 |
| abstract_inverted_index.quantitative | 118 |
| abstract_inverted_index.multilingual) | 105 |
| abstract_inverted_index.significantly | 147 |
| abstract_inverted_index.human-computer | 25 |
| abstract_inverted_index.(Schema-Augmentation-with-Verification), | 145 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 7 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/4 |
| sustainable_development_goals[0].score | 0.7900000214576721 |
| sustainable_development_goals[0].display_name | Quality Education |
| citation_normalized_percentile | |
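
The `abstract_inverted_index.*` rows in the table above store the abstract as a mapping from each token to the word positions where it occurs. A minimal sketch of turning such a mapping back into running text follows. The function name `rebuild_abstract` is an assumption made for this example, and `sample` holds only the tokens at positions 0 to 17 copied from the table, not the full index.

```python
def rebuild_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild text from a token -> positions mapping."""
    position_to_token: dict[int, str] = {}
    for token, positions in inverted_index.items():
        for position in positions:
            position_to_token[position] = token
    # Emit tokens in ascending position order, separated by single spaces.
    return " ".join(position_to_token[i] for i in sorted(position_to_token))


# Subset of the index above (positions 0-17 only).
sample = {
    "Text-to-SQL": [0],
    "semantic": [1],
    "parsing": [2],
    "is": [3],
    "an": [4],
    "important": [5],
    "NLP": [6],
    "task,": [7],
    "which": [8],
    "greatly": [9],
    "facilitates": [10],
    "the": [11, 16],
    "interaction": [12],
    "between": [13],
    "users": [14],
    "and": [15],
    "database": [17],
}

print(rebuild_abstract(sample))
```

Sorting the positions, rather than assuming they run from 0 without gaps, lets the same function work on a partial index like this one.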