Building a Llama2-finetuned LLM for Odia Language Utilizing Domain Knowledge Instruction Set
2023 · Open Access · DOI: https://doi.org/10.48550/arxiv.2312.12624
Building LLMs for languages other than English is in great demand due to the unavailability and performance of multilingual LLMs, such as understanding the local context. The problem is critical for low-resource languages due to the need for instruction sets. In a multilingual country like India, there is a need for LLMs supporting Indic languages to provide generative AI and LLM-based technologies and services to its citizens. This paper presents our approach of i) generating a large Odia instruction set, including domain knowledge data suitable for LLM fine-tuning, and ii) building a Llama2-finetuned model tailored for enhanced performance in the Odia domain. The proposed work will help researchers build an instruction set and LLM, particularly for Indic languages. We will release the model and instruction set for the public for research and noncommercial purposes.
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2312.12624, https://arxiv.org/pdf/2312.12624
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4390091832
Raw OpenAlex JSON
| Field | Value | Description |
|---|---|---|
| OpenAlex ID | https://openalex.org/W4390091832 | Canonical identifier for this work in OpenAlex |
| DOI | https://doi.org/10.48550/arxiv.2312.12624 | Digital Object Identifier |
| Title | Building a Llama2-finetuned LLM for Odia Language Utilizing Domain Knowledge Instruction Set | Work title |
| Type | preprint | OpenAlex work type |
| Language | en | Primary language |
| Publication year | 2023 | Year of publication |
| Publication date | 2023-12-19 | Full publication date if available |
| Authors | Guneet Singh Kohli, Shantipriya Parida, Sambit Sekhar, Samirit Saha, Nipun B Nair, Parul Agarwal, Sonal Khosla, Kusumlata Patiyal, Debasish Dhal | List of authors in order |
| Landing page | https://arxiv.org/abs/2312.12624 | Publisher landing page |
| PDF URL | https://arxiv.org/pdf/2312.12624 | Direct link to full text PDF |
| Open access | Yes | Whether a free full text is available |
| OA status | green | Open access status per OpenAlex |
| OA URL | https://arxiv.org/pdf/2312.12624 | Direct OA link when available |
| Concepts | Computer science, Unavailability, Context (archaeology), Set (abstract data type), Domain (mathematical analysis), Resource (disambiguation), Generative grammar, World Wide Web, Artificial intelligence, Programming language, Paleontology, Mathematical analysis, Mathematics, Engineering, Reliability engineering, Computer network, Biology | Top concepts (fields/topics) attached by OpenAlex |
| Cited by | 0 | Total citation count in OpenAlex |
| Related works (count) | 10 | Other works algorithmically related by OpenAlex |
Full payload
| id | https://openalex.org/W4390091832 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2312.12624 |
| ids.doi | https://doi.org/10.48550/arxiv.2312.12624 |
| ids.openalex | https://openalex.org/W4390091832 |
| fwci | |
| type | preprint |
| title | Building a Llama2-finetuned LLM for Odia Language Utilizing Domain Knowledge Instruction Set |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10181 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9896000027656555 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Natural Language Processing Techniques |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.8081667423248291 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C2780505938 |
| concepts[1].level | 2 |
| concepts[1].score | 0.7758490443229675 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q17093282 |
| concepts[1].display_name | Unavailability |
| concepts[2].id | https://openalex.org/C2779343474 |
| concepts[2].level | 2 |
| concepts[2].score | 0.6267504096031189 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q3109175 |
| concepts[2].display_name | Context (archaeology) |
| concepts[3].id | https://openalex.org/C177264268 |
| concepts[3].level | 2 |
| concepts[3].score | 0.6236777305603027 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q1514741 |
| concepts[3].display_name | Set (abstract data type) |
| concepts[4].id | https://openalex.org/C36503486 |
| concepts[4].level | 2 |
| concepts[4].score | 0.6033401489257812 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q11235244 |
| concepts[4].display_name | Domain (mathematical analysis) |
| concepts[5].id | https://openalex.org/C206345919 |
| concepts[5].level | 2 |
| concepts[5].score | 0.5190668702125549 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q20380951 |
| concepts[5].display_name | Resource (disambiguation) |
| concepts[6].id | https://openalex.org/C39890363 |
| concepts[6].level | 2 |
| concepts[6].score | 0.48310375213623047 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q36108 |
| concepts[6].display_name | Generative grammar |
| concepts[7].id | https://openalex.org/C136764020 |
| concepts[7].level | 1 |
| concepts[7].score | 0.36196231842041016 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q466 |
| concepts[7].display_name | World Wide Web |
| concepts[8].id | https://openalex.org/C154945302 |
| concepts[8].level | 1 |
| concepts[8].score | 0.32328203320503235 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[8].display_name | Artificial intelligence |
| concepts[9].id | https://openalex.org/C199360897 |
| concepts[9].level | 1 |
| concepts[9].score | 0.18430182337760925 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q9143 |
| concepts[9].display_name | Programming language |
| concepts[10].id | https://openalex.org/C151730666 |
| concepts[10].level | 1 |
| concepts[10].score | 0.0 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q7205 |
| concepts[10].display_name | Paleontology |
| concepts[11].id | https://openalex.org/C134306372 |
| concepts[11].level | 1 |
| concepts[11].score | 0.0 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q7754 |
| concepts[11].display_name | Mathematical analysis |
| concepts[12].id | https://openalex.org/C33923547 |
| concepts[12].level | 0 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[12].display_name | Mathematics |
| concepts[13].id | https://openalex.org/C127413603 |
| concepts[13].level | 0 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q11023 |
| concepts[13].display_name | Engineering |
| concepts[14].id | https://openalex.org/C200601418 |
| concepts[14].level | 1 |
| concepts[14].score | 0.0 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q2193887 |
| concepts[14].display_name | Reliability engineering |
| concepts[15].id | https://openalex.org/C31258907 |
| concepts[15].level | 1 |
| concepts[15].score | 0.0 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q1301371 |
| concepts[15].display_name | Computer network |
| concepts[16].id | https://openalex.org/C86803240 |
| concepts[16].level | 0 |
| concepts[16].score | 0.0 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q420 |
| concepts[16].display_name | Biology |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.8081667423248291 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/unavailability |
| keywords[1].score | 0.7758490443229675 |
| keywords[1].display_name | Unavailability |
| keywords[2].id | https://openalex.org/keywords/context |
| keywords[2].score | 0.6267504096031189 |
| keywords[2].display_name | Context (archaeology) |
| keywords[3].id | https://openalex.org/keywords/set |
| keywords[3].score | 0.6236777305603027 |
| keywords[3].display_name | Set (abstract data type) |
| keywords[4].id | https://openalex.org/keywords/domain |
| keywords[4].score | 0.6033401489257812 |
| keywords[4].display_name | Domain (mathematical analysis) |
| keywords[5].id | https://openalex.org/keywords/resource |
| keywords[5].score | 0.5190668702125549 |
| keywords[5].display_name | Resource (disambiguation) |
| keywords[6].id | https://openalex.org/keywords/generative-grammar |
| keywords[6].score | 0.48310375213623047 |
| keywords[6].display_name | Generative grammar |
| keywords[7].id | https://openalex.org/keywords/world-wide-web |
| keywords[7].score | 0.36196231842041016 |
| keywords[7].display_name | World Wide Web |
| keywords[8].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[8].score | 0.32328203320503235 |
| keywords[8].display_name | Artificial intelligence |
| keywords[9].id | https://openalex.org/keywords/programming-language |
| keywords[9].score | 0.18430182337760925 |
| keywords[9].display_name | Programming language |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2312.12624 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | cc-by-nc-sa |
| locations[0].pdf_url | https://arxiv.org/pdf/2312.12624 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | https://openalex.org/licenses/cc-by-nc-sa |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2312.12624 |
| locations[1].id | doi:10.48550/arxiv.2312.12624 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2312.12624 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5083456192 |
| authorships[0].author.orcid | https://orcid.org/0000-0003-4030-7253 |
| authorships[0].author.display_name | Guneet Singh Kohli |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Kohli, Guneet Singh |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5015497545 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-3387-6300 |
| authorships[1].author.display_name | Shantipriya Parida |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Parida, Shantipriya |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5102942268 |
| authorships[2].author.orcid | https://orcid.org/0009-0005-4698-8004 |
| authorships[2].author.display_name | Sambit Sekhar |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Sekhar, Sambit |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5050079364 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Samirit Saha |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Saha, Samirit |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5087242501 |
| authorships[4].author.orcid | https://orcid.org/0009-0000-7342-7732 |
| authorships[4].author.display_name | Nipun B Nair |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Nair, Nipun B |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5062423018 |
| authorships[5].author.orcid | https://orcid.org/0000-0002-7297-335X |
| authorships[5].author.display_name | Parul Agarwal |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Agarwal, Parul |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5049576827 |
| authorships[6].author.orcid | https://orcid.org/0000-0002-9325-1639 |
| authorships[6].author.display_name | Sonal Khosla |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Khosla, Sonal |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5093557688 |
| authorships[7].author.orcid | |
| authorships[7].author.display_name | Kusumlata Patiyal |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | Patiyal, Kusumlata |
| authorships[7].is_corresponding | False |
| authorships[8].author.id | https://openalex.org/A5093557689 |
| authorships[8].author.orcid | https://orcid.org/0009-0009-2217-8831 |
| authorships[8].author.display_name | Debasish Dhal |
| authorships[8].author_position | last |
| authorships[8].raw_author_name | Dhal, Debasish |
| authorships[8].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2312.12624 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2023-12-22T00:00:00 |
| display_name | Building a Llama2-finetuned LLM for Odia Language Utilizing Domain Knowledge Instruction Set |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10181 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9896000027656555 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Natural Language Processing Techniques |
| related_works | https://openalex.org/W4237235066, https://openalex.org/W2026539069, https://openalex.org/W207884067, https://openalex.org/W3127016596, https://openalex.org/W2365973415, https://openalex.org/W3146085540, https://openalex.org/W1482423459, https://openalex.org/W2996457675, https://openalex.org/W2129918226, https://openalex.org/W843992174 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2312.12624 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | cc-by-nc-sa |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2312.12624 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by-nc-sa |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2312.12624 |
| primary_location.id | pmh:oai:arXiv.org:2312.12624 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | cc-by-nc-sa |
| primary_location.pdf_url | https://arxiv.org/pdf/2312.12624 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | https://openalex.org/licenses/cc-by-nc-sa |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2312.12624 |
| publication_date | 2023-12-19 |
| publication_year | 2023 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 41, 48, 75, 91 |
| abstract_inverted_index.AI | 58 |
| abstract_inverted_index.In | 40 |
| abstract_inverted_index.We | 118 |
| abstract_inverted_index.an | 109 |
| abstract_inverted_index.as | 21 |
| abstract_inverted_index.i) | 73 |
| abstract_inverted_index.in | 8, 98 |
| abstract_inverted_index.is | 7, 28, 47 |
| abstract_inverted_index.of | 17, 72 |
| abstract_inverted_index.to | 12, 34, 55, 64 |
| abstract_inverted_index.LLM | 86 |
| abstract_inverted_index.The | 26, 102 |
| abstract_inverted_index.and | 15, 59, 62, 88, 112, 123, 131 |
| abstract_inverted_index.due | 11, 33 |
| abstract_inverted_index.for | 2, 30, 37, 50, 85, 95, 115, 126, 129 |
| abstract_inverted_index.ii) | 89 |
| abstract_inverted_index.its | 65 |
| abstract_inverted_index.our | 70 |
| abstract_inverted_index.set | 111, 125 |
| abstract_inverted_index.the | 13, 23, 35, 99, 121, 127 |
| abstract_inverted_index.LLM, | 113 |
| abstract_inverted_index.LLMs | 1, 51 |
| abstract_inverted_index.Odia | 77, 100 |
| abstract_inverted_index.This | 67 |
| abstract_inverted_index.data | 83 |
| abstract_inverted_index.help | 106 |
| abstract_inverted_index.like | 44 |
| abstract_inverted_index.need | 36, 49 |
| abstract_inverted_index.set, | 79 |
| abstract_inverted_index.such | 20 |
| abstract_inverted_index.than | 5 |
| abstract_inverted_index.will | 105, 119 |
| abstract_inverted_index.work | 104 |
| abstract_inverted_index.Indic | 53, 116 |
| abstract_inverted_index.LLMs, | 19 |
| abstract_inverted_index.build | 108 |
| abstract_inverted_index.great | 9 |
| abstract_inverted_index.large | 76 |
| abstract_inverted_index.local | 24 |
| abstract_inverted_index.model | 93, 122 |
| abstract_inverted_index.other | 4 |
| abstract_inverted_index.paper | 68 |
| abstract_inverted_index.sets. | 39 |
| abstract_inverted_index.there | 46 |
| abstract_inverted_index.India, | 45 |
| abstract_inverted_index.demand | 10 |
| abstract_inverted_index.domain | 81 |
| abstract_inverted_index.public | 128 |
| abstract_inverted_index.English | 6 |
| abstract_inverted_index.country | 43 |
| abstract_inverted_index.domain. | 101 |
| abstract_inverted_index.problem | 27 |
| abstract_inverted_index.provide | 56 |
| abstract_inverted_index.release | 120 |
| abstract_inverted_index.Building | 0 |
| abstract_inverted_index.approach | 71 |
| abstract_inverted_index.building | 90 |
| abstract_inverted_index.context. | 25 |
| abstract_inverted_index.critical | 29 |
| abstract_inverted_index.enhanced | 96 |
| abstract_inverted_index.presents | 69 |
| abstract_inverted_index.proposed | 103 |
| abstract_inverted_index.research | 130 |
| abstract_inverted_index.services | 63 |
| abstract_inverted_index.suitable | 84 |
| abstract_inverted_index.tailored | 94 |
| abstract_inverted_index.LLM-based | 60 |
| abstract_inverted_index.citizens. | 66 |
| abstract_inverted_index.including | 80 |
| abstract_inverted_index.knowledge | 82 |
| abstract_inverted_index.languages | 3, 32, 54 |
| abstract_inverted_index.purposes. | 133 |
| abstract_inverted_index.generating | 74 |
| abstract_inverted_index.generative | 57 |
| abstract_inverted_index.languages. | 117 |
| abstract_inverted_index.supporting | 52 |
| abstract_inverted_index.instruction | 38, 78, 110, 124 |
| abstract_inverted_index.performance | 16, 97 |
| abstract_inverted_index.researchers | 107 |
| abstract_inverted_index.fine-tuning, | 87 |
| abstract_inverted_index.low-resource | 31 |
| abstract_inverted_index.multilingual | 18, 42 |
| abstract_inverted_index.particularly | 114 |
| abstract_inverted_index.technologies | 61 |
| abstract_inverted_index.noncommercial | 132 |
| abstract_inverted_index.understanding | 22 |
| abstract_inverted_index.unavailability | 14 |
| abstract_inverted_index.Llama2-finetuned | 92 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 9 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/4 |
| sustainable_development_goals[0].score | 0.5199999809265137 |
| sustainable_development_goals[0].display_name | Quality Education |
| citation_normalized_percentile | |
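
The `abstract_inverted_index.*` rows in the table above store the abstract as a map from each word to the positions where it occurs. As a minimal sketch (the function name `rebuild_abstract` and the `sample` excerpt are illustrative assumptions, not part of OpenAlex), the running text can be recovered by inverting that map and reading the words in position order:

```python
from typing import Mapping, Sequence


def rebuild_abstract(inverted_index: Mapping[str, Sequence[int]]) -> str:
    """Rebuild running text from a word -> positions mapping."""
    word_at = {}
    for word, positions in inverted_index.items():
        for position in positions:
            word_at[position] = word
    # Read the words back in ascending position order.
    return " ".join(word_at[i] for i in sorted(word_at))


# Excerpt of the table rows, restricted to positions 0-10 for brevity.
sample = {
    "Building": [0],
    "LLMs": [1],
    "for": [2],
    "languages": [3],
    "other": [4],
    "than": [5],
    "English": [6],
    "is": [7],
    "in": [8],
    "great": [9],
    "demand": [10],
}

print(rebuild_abstract(sample))
# prints: Building LLMs for languages other than English is in great demand
```

Passing the full set of rows instead of the excerpt would reproduce the whole abstract, since every position from 0 to 133 appears exactly once in the index.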