TURSpider: A Turkish Text-to-SQL Dataset and LLM-Based Study
2024 · Open Access
· DOI: https://doi.org/10.1109/access.2024.3498841
This paper introduces TURSpider, a novel Turkish Text-to-SQL dataset developed through human translation of the widely used Spider dataset, aimed at addressing the current lack of complex, cross-domain SQL datasets for the Turkish language. TURSpider incorporates a wide range of query difficulties, including nested queries, to create a comprehensive benchmark for Turkish Text-to-SQL tasks. The dataset enables cross-language comparison and significantly enhances the training and evaluation of large language models (LLMs) in generating SQL queries from Turkish natural language inputs. We fine-tuned several Turkish-supported LLMs on TURSpider and evaluated their performance in comparison to state-of-the-art models like GPT-3.5 Turbo and GPT-4. Our results show that fine-tuned Turkish LLMs demonstrate competitive performance, with one model even surpassing GPT-based models on execution accuracy. We also apply the Chain-of-Feedback (CoF) methodology to further improve model performance, demonstrating its effectiveness across multiple LLMs. This work provides a valuable resource for Turkish NLP and addresses specific challenges in developing accurate Text-to-SQL models for low-resource languages.
- Type: article
- Language: en
- Landing Page: http://doi.org/10.1109/access.2024.3498841
- OA Status: gold
- Cited By: 1
- References: 31
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4404411228