LLMs for Closed-Library Multi-Document Query, Test Generation, and Evaluation
2024 · Open Access
· DOI: https://doi.org/10.21203/rs.3.rs-5038817/v1
Learning complex, detailed, and evolving knowledge is a challenge in multiple technical professions. Relevant source knowledge is contained within many large documents and information sources, and these documents are updated frequently. Knowledge tests need to be generated on new material, and existing tests revised, to track knowledge base updates. Large Language Models (LLMs) provide a framework for artificial intelligence-assisted knowledge acquisition and continued learning. Retrieval-Augmented Generation (RAG) provides a framework to leverage available, trained LLMs combined with technical area-specific knowledge bases. Herein, two methods are introduced, which together enable effective implementation of LLM-RAG question-answering on large documents. Additionally, the AI tools for knowledge-intensive tasks (AIKIT) solution is presented for working with numerous documents for training and continuing education. AIKIT is provided as a containerized open-source solution that deploys on standalone, high-performance, and cloud systems. AIKIT includes an LLM, RAG, vector stores, and a relational database with a Ruby on Rails web interface.
- Type: preprint
- Language: en
- Landing Page: http://doi.org/10.21203/rs.3.rs-5038817/v1
- OA Status: gold
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4402324443
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4402324443 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.21203/rs.3.rs-5038817/v1 (Digital Object Identifier)
- Title: LLMs for Closed-Library Multi-Document Query, Test Generation, and Evaluation (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-09-06 (full publication date if available)
- Authors: Maj Claire Randolph, Adam Michaleas, Darrell Ricke (list of authors in order)
- Landing page: https://doi.org/10.21203/rs.3.rs-5038817/v1 (publisher landing page)
- Open access: Yes (whether a free full text is available)
- OA status: gold (open access status per OpenAlex)
- OA URL: https://doi.org/10.21203/rs.3.rs-5038817/v1 (direct OA link when available)
- Concepts: Computer science, Leverage (statistics), Knowledge base, World Wide Web, Test (biology), Information retrieval, Knowledge management, Data science, Artificial intelligence, Biology, Paleontology (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4402324443 |
| doi | https://doi.org/10.21203/rs.3.rs-5038817/v1 |
| ids.doi | https://doi.org/10.21203/rs.3.rs-5038817/v1 |
| ids.openalex | https://openalex.org/W4402324443 |
| fwci | 0.0 |
| type | preprint |
| title | LLMs for Closed-Library Multi-Document Query, Test Generation, and Evaluation |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10181 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9460999965667725 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Natural Language Processing Techniques |
| topics[1].id | https://openalex.org/T10028 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9459999799728394 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Topic Modeling |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.7429558634757996 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C153083717 |
| concepts[1].level | 2 |
| concepts[1].score | 0.6511533856391907 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q6535263 |
| concepts[1].display_name | Leverage (statistics) |
| concepts[2].id | https://openalex.org/C4554734 |
| concepts[2].level | 2 |
| concepts[2].score | 0.5698257684707642 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q593744 |
| concepts[2].display_name | Knowledge base |
| concepts[3].id | https://openalex.org/C136764020 |
| concepts[3].level | 1 |
| concepts[3].score | 0.49757102131843567 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q466 |
| concepts[3].display_name | World Wide Web |
| concepts[4].id | https://openalex.org/C2777267654 |
| concepts[4].level | 2 |
| concepts[4].score | 0.4699176847934723 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q3519023 |
| concepts[4].display_name | Test (biology) |
| concepts[5].id | https://openalex.org/C23123220 |
| concepts[5].level | 1 |
| concepts[5].score | 0.4247959852218628 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q816826 |
| concepts[5].display_name | Information retrieval |
| concepts[6].id | https://openalex.org/C56739046 |
| concepts[6].level | 1 |
| concepts[6].score | 0.37235182523727417 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q192060 |
| concepts[6].display_name | Knowledge management |
| concepts[7].id | https://openalex.org/C2522767166 |
| concepts[7].level | 1 |
| concepts[7].score | 0.328795850276947 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q2374463 |
| concepts[7].display_name | Data science |
| concepts[8].id | https://openalex.org/C154945302 |
| concepts[8].level | 1 |
| concepts[8].score | 0.20123931765556335 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[8].display_name | Artificial intelligence |
| concepts[9].id | https://openalex.org/C86803240 |
| concepts[9].level | 0 |
| concepts[9].score | 0.0 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q420 |
| concepts[9].display_name | Biology |
| concepts[10].id | https://openalex.org/C151730666 |
| concepts[10].level | 1 |
| concepts[10].score | 0.0 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q7205 |
| concepts[10].display_name | Paleontology |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.7429558634757996 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/leverage |
| keywords[1].score | 0.6511533856391907 |
| keywords[1].display_name | Leverage (statistics) |
| keywords[2].id | https://openalex.org/keywords/knowledge-base |
| keywords[2].score | 0.5698257684707642 |
| keywords[2].display_name | Knowledge base |
| keywords[3].id | https://openalex.org/keywords/world-wide-web |
| keywords[3].score | 0.49757102131843567 |
| keywords[3].display_name | World Wide Web |
| keywords[4].id | https://openalex.org/keywords/test |
| keywords[4].score | 0.4699176847934723 |
| keywords[4].display_name | Test (biology) |
| keywords[5].id | https://openalex.org/keywords/information-retrieval |
| keywords[5].score | 0.4247959852218628 |
| keywords[5].display_name | Information retrieval |
| keywords[6].id | https://openalex.org/keywords/knowledge-management |
| keywords[6].score | 0.37235182523727417 |
| keywords[6].display_name | Knowledge management |
| keywords[7].id | https://openalex.org/keywords/data-science |
| keywords[7].score | 0.328795850276947 |
| keywords[7].display_name | Data science |
| keywords[8].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[8].score | 0.20123931765556335 |
| keywords[8].display_name | Artificial intelligence |
| language | en |
| locations[0].id | doi:10.21203/rs.3.rs-5038817/v1 |
| locations[0].is_oa | True |
| locations[0].source | |
| locations[0].license | cc-by |
| locations[0].pdf_url | |
| locations[0].version | acceptedVersion |
| locations[0].raw_type | posted-content |
| locations[0].license_id | https://openalex.org/licenses/cc-by |
| locations[0].is_accepted | True |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://doi.org/10.21203/rs.3.rs-5038817/v1 |
| indexed_in | crossref |
| authorships[0].author.id | https://openalex.org/A5107046954 |
| authorships[0].author.orcid | https://orcid.org/0009-0005-7421-2705 |
| authorships[0].author.display_name | Maj Claire Randolph |
| authorships[0].countries | CA |
| authorships[0].affiliations[0].institution_ids | https://openalex.org/I4210164862 |
| authorships[0].affiliations[0].raw_affiliation_string | Department of the Air Force, Artificial Intelligence Accelerator |
| authorships[0].institutions[0].id | https://openalex.org/I4210164862 |
| authorships[0].institutions[0].ror | https://ror.org/05p590m36 |
| authorships[0].institutions[0].type | company |
| authorships[0].institutions[0].lineage | https://openalex.org/I4210164862 |
| authorships[0].institutions[0].country_code | CA |
| authorships[0].institutions[0].display_name | Artificial Intelligence in Medicine (Canada) |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Maj Claire Randolph |
| authorships[0].is_corresponding | False |
| authorships[0].raw_affiliation_strings | Department of the Air Force, Artificial Intelligence Accelerator |
| authorships[1].author.id | https://openalex.org/A5016624503 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-7402-8303 |
| authorships[1].author.display_name | Adam Michaleas |
| authorships[1].countries | US |
| authorships[1].affiliations[0].institution_ids | https://openalex.org/I4210122954 |
| authorships[1].affiliations[0].raw_affiliation_string | MIT Lincoln Laboratory |
| authorships[1].institutions[0].id | https://openalex.org/I4210122954 |
| authorships[1].institutions[0].ror | https://ror.org/022z6jk58 |
| authorships[1].institutions[0].type | facility |
| authorships[1].institutions[0].lineage | https://openalex.org/I4210122954, https://openalex.org/I63966007 |
| authorships[1].institutions[0].country_code | US |
| authorships[1].institutions[0].display_name | MIT Lincoln Laboratory |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Adam Michaleas |
| authorships[1].is_corresponding | False |
| authorships[1].raw_affiliation_strings | MIT Lincoln Laboratory |
| authorships[2].author.id | https://openalex.org/A5031850946 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-2842-2809 |
| authorships[2].author.display_name | Darrell Ricke |
| authorships[2].countries | US |
| authorships[2].affiliations[0].institution_ids | https://openalex.org/I4210122954 |
| authorships[2].affiliations[0].raw_affiliation_string | MIT Lincoln Laboratory |
| authorships[2].institutions[0].id | https://openalex.org/I4210122954 |
| authorships[2].institutions[0].ror | https://ror.org/022z6jk58 |
| authorships[2].institutions[0].type | facility |
| authorships[2].institutions[0].lineage | https://openalex.org/I4210122954, https://openalex.org/I63966007 |
| authorships[2].institutions[0].country_code | US |
| authorships[2].institutions[0].display_name | MIT Lincoln Laboratory |
| authorships[2].author_position | last |
| authorships[2].raw_author_name | Darrell O. Ricke |
| authorships[2].is_corresponding | False |
| authorships[2].raw_affiliation_strings | MIT Lincoln Laboratory |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | http://doi.org/10.21203/rs.3.rs-5038817/v1 |
| open_access.oa_status | gold |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | LLMs for Closed-Library Multi-Document Query, Test Generation, and Evaluation |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T03:46:38.306776 |
| primary_topic.id | https://openalex.org/T10181 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9460999965667725 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Natural Language Processing Techniques |
| related_works | https://openalex.org/W2787993192, https://openalex.org/W2158269427, https://openalex.org/W4381280689, https://openalex.org/W2847365777, https://openalex.org/W3128025644, https://openalex.org/W2355048207, https://openalex.org/W641782856, https://openalex.org/W2750422482, https://openalex.org/W3125827053, https://openalex.org/W2920521957 |
| cited_by_count | 0 |
| locations_count | 1 |
| best_oa_location.id | doi:10.21203/rs.3.rs-5038817/v1 |
| best_oa_location.is_oa | True |
| best_oa_location.source | |
| best_oa_location.license | cc-by |
| best_oa_location.pdf_url | |
| best_oa_location.version | acceptedVersion |
| best_oa_location.raw_type | posted-content |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by |
| best_oa_location.is_accepted | True |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://doi.org/10.21203/rs.3.rs-5038817/v1 |
| primary_location.id | doi:10.21203/rs.3.rs-5038817/v1 |
| primary_location.is_oa | True |
| primary_location.source | |
| primary_location.license | cc-by |
| primary_location.pdf_url | |
| primary_location.version | acceptedVersion |
| primary_location.raw_type | posted-content |
| primary_location.license_id | https://openalex.org/licenses/cc-by |
| primary_location.is_accepted | True |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://doi.org/10.21203/rs.3.rs-5038817/v1 |
| publication_date | 2024-09-06 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 8, 54, 68, 123, 146 |
| abstract_inverted_index.AI | 99 |
| abstract_inverted_index.as | 122 |
| abstract_inverted_index.be | 36 |
| abstract_inverted_index.in | 10 |
| abstract_inverted_index.is | 7, 17, 107, 120 |
| abstract_inverted_index.of | 91 |
| abstract_inverted_index.on | 38, 94, 130, 148 |
| abstract_inverted_index.to | 29, 35, 70 |
| abstract_inverted_index.and | 4, 23, 41, 61, 116, 134 |
| abstract_inverted_index.are | 84 |
| abstract_inverted_index.for | 56, 101, 109, 114 |
| abstract_inverted_index.new | 39 |
| abstract_inverted_index.the | 98 |
| abstract_inverted_index.two | 82 |
| abstract_inverted_index.web | 150 |
| abstract_inverted_index.LLM, | 139 |
| abstract_inverted_index.LLMs | 74 |
| abstract_inverted_index.RAG, | 140 |
| abstract_inverted_index.Ruby | 147 |
| abstract_inverted_index.base | 47 |
| abstract_inverted_index.high | 132 |
| abstract_inverted_index.many | 20 |
| abstract_inverted_index.need | 34 |
| abstract_inverted_index.open | 125 |
| abstract_inverted_index.that | 128 |
| abstract_inverted_index.with | 26, 76, 111, 145 |
| abstract_inverted_index.(RAG) | 66 |
| abstract_inverted_index.AIKIT | 119, 137 |
| abstract_inverted_index.Large | 49 |
| abstract_inverted_index.Rails | 149 |
| abstract_inverted_index.cloud | 135 |
| abstract_inverted_index.large | 21, 95 |
| abstract_inverted_index.tasks | 104 |
| abstract_inverted_index.tests | 33, 43 |
| abstract_inverted_index.these | 30 |
| abstract_inverted_index.tools | 100 |
| abstract_inverted_index.which | 86 |
| abstract_inverted_index.(LLMs) | 52 |
| abstract_inverted_index.Models | 51 |
| abstract_inverted_index.bases. | 80 |
| abstract_inverted_index.enable | 88 |
| abstract_inverted_index.source | 15, 126 |
| abstract_inverted_index.vector | 141 |
| abstract_inverted_index.within | 19 |
| abstract_inverted_index.(AIKIT) | 105 |
| abstract_inverted_index.Herein, | 81 |
| abstract_inverted_index.LLM-RAG | 92 |
| abstract_inverted_index.deploys | 129 |
| abstract_inverted_index.methods | 83 |
| abstract_inverted_index.provide | 53 |
| abstract_inverted_index.sources | 25 |
| abstract_inverted_index.stores, | 142 |
| abstract_inverted_index.trained | 73 |
| abstract_inverted_index.updates | 28 |
| abstract_inverted_index.working | 110 |
| abstract_inverted_index.Language | 50 |
| abstract_inverted_index.Learning | 1 |
| abstract_inverted_index.Relevant | 14 |
| abstract_inverted_index.combined | 75 |
| abstract_inverted_index.complex, | 2 |
| abstract_inverted_index.database | 144 |
| abstract_inverted_index.evolving | 5 |
| abstract_inverted_index.existing | 42 |
| abstract_inverted_index.frequent | 27 |
| abstract_inverted_index.includes | 138 |
| abstract_inverted_index.leverage | 71 |
| abstract_inverted_index.material | 40 |
| abstract_inverted_index.multiple | 11 |
| abstract_inverted_index.numerous | 112 |
| abstract_inverted_index.provided | 121 |
| abstract_inverted_index.provides | 67 |
| abstract_inverted_index.revised, | 44 |
| abstract_inverted_index.solution | 106, 127 |
| abstract_inverted_index.systems. | 136 |
| abstract_inverted_index.together | 87 |
| abstract_inverted_index.tracking | 45 |
| abstract_inverted_index.training | 115 |
| abstract_inverted_index.updates. | 48 |
| abstract_inverted_index.Knowledge | 32 |
| abstract_inverted_index.challenge | 9 |
| abstract_inverted_index.contained | 18 |
| abstract_inverted_index.continued | 62 |
| abstract_inverted_index.detailed, | 3 |
| abstract_inverted_index.documents | 22, 113 |
| abstract_inverted_index.effective | 89 |
| abstract_inverted_index.framework | 55, 69 |
| abstract_inverted_index.generated | 37 |
| abstract_inverted_index.intensive | 103 |
| abstract_inverted_index.knowledge | 6, 16, 46, 59, 79, 102 |
| abstract_inverted_index.learning. | 63 |
| abstract_inverted_index.presented | 108 |
| abstract_inverted_index.technical | 12, 77 |
| abstract_inverted_index.Generation | 65 |
| abstract_inverted_index.artificial | 57 |
| abstract_inverted_index.available, | 72 |
| abstract_inverted_index.continuing | 117 |
| abstract_inverted_index.documents. | 31, 96 |
| abstract_inverted_index.education. | 118 |
| abstract_inverted_index.interface. | 151 |
| abstract_inverted_index.relational | 143 |
| abstract_inverted_index.acquisition | 60 |
| abstract_inverted_index.information | 24 |
| abstract_inverted_index.introduced, | 85 |
| abstract_inverted_index.standalone, | 131 |
| abstract_inverted_index.performance, | 133 |
| abstract_inverted_index.professions. | 13 |
| abstract_inverted_index.Additionally, | 97 |
| abstract_inverted_index.area-specific | 78 |
| abstract_inverted_index.containerized | 124 |
| abstract_inverted_index.implementation | 90 |
| abstract_inverted_index.question-answering | 93 |
| abstract_inverted_index.Retrieval-Augmented | 64 |
| abstract_inverted_index.intelligence-assisted | 58 |
| abstract_inverted_index.<title>Abstract</title> | 0 |
| cited_by_percentile_year | |
| countries_distinct_count | 2 |
| institutions_distinct_count | 3 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/4 |
| sustainable_development_goals[0].score | 0.6200000047683716 |
| sustainable_development_goals[0].display_name | Quality Education |
| citation_normalized_percentile.value | 0.15636226 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | False |
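
The `abstract_inverted_index.*` rows in the payload above store the abstract as a mapping from each token to the word positions where it occurs. As a minimal sketch of how such a mapping can be turned back into running text, the Python below sorts all (position, token) pairs and joins them with spaces. The function name `rebuild_abstract` and the `excerpt` dictionary are illustrative assumptions, not part of OpenAlex; the excerpt copies only positions 0 to 13 from the rows above.

```python
def rebuild_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild running text from a token -> positions mapping."""
    # Flatten the mapping into (position, token) pairs.
    positioned = [
        (pos, word)
        for word, positions in inverted_index.items()
        for pos in positions
    ]
    # Sorting by position restores the original word order.
    return " ".join(word for _, word in sorted(positioned))


# Hypothetical excerpt: positions 0-13 only, copied from the payload rows.
excerpt = {
    "<title>Abstract</title>": [0],
    "Learning": [1],
    "complex,": [2],
    "detailed,": [3],
    "and": [4],
    "evolving": [5],
    "knowledge": [6],
    "is": [7],
    "a": [8],
    "challenge": [9],
    "in": [10],
    "multiple": [11],
    "technical": [12],
    "professions.": [13],
}

print(rebuild_abstract(excerpt))
```

Note that position 0 in this record holds the markup token `<title>Abstract</title>`, so a caller would probably want to drop that token before displaying the text.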