Machine learning and rule-based embedding techniques for classifying text documents
2024 · Open Access
· DOI: https://doi.org/10.1007/s13198-024-02555-w
Rapid expansion of electronic document archives and the proliferation of online information have made it incredibly difficult to categorize text documents. Classification helps in information retrieval from a conceptual framework. This study addresses the challenge of efficiently categorizing text documents amidst the vast electronic document landscape. Employing machine learning models and a novel document categorization method, W2vRule, we compare its performance with traditional methods. Emphasizing the importance of tuning hyperparameters for optimal performance, the research recommends the W2vRule, a word-to-vector rule-based framework, for improved association-based text classification. The study used the Reuters Newswire dataset. Findings show that W2vRule and machine learning can effectively tell apart important categories. Rule-based approaches perform better than Naive Bayes, BayesNet, Decision Tables, and others in terms of performance metrics.
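The abstract names Naive Bayes among the baselines that the rule-based approach is compared against, and it stresses hyperparameter tuning. As a rough illustration only (this is not the authors' W2vRule method or their experimental setup), the sketch below implements a multinomial Naive Bayes text classifier in plain Python. The function names `train_nb` and `predict_nb`, the whitespace tokenizer, the toy documents, and their labels are all assumptions made for this example; the smoothing constant `alpha` stands in for the kind of hyperparameter one would tune.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus for illustration; not taken from the paper's dataset.
TOY_DOCS = [
    ("wheat crop harvest rises", "grain"),
    ("corn and wheat exports fall", "grain"),
    ("oil prices rise on opec cut", "crude"),
    ("crude oil output steady", "crude"),
]


def train_nb(docs, alpha=1.0):
    """Count documents per class and tokens per class.

    alpha is the additive (Laplace) smoothing constant and must be > 0,
    otherwise unseen tokens would produce log(0) at prediction time.
    """
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()  # assumption: naive whitespace tokenizer
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {
        "class_counts": class_counts,
        "word_counts": word_counts,
        "vocab": vocab,
        "alpha": alpha,
    }


def predict_nb(model, text):
    """Return the label with the highest log posterior for the given text."""
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, n_docs in model["class_counts"].items():
        counts = model["word_counts"][label]
        total_tokens = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior
        for token in text.lower().split():
            if token not in model["vocab"]:
                continue  # ignore tokens never seen in training
            # Smoothed log likelihood of the token under this class.
            score += math.log(
                (counts[token] + alpha) / (total_tokens + alpha * vocab_size)
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_nb(TOY_DOCS, alpha=1.0)
print(predict_nb(model, "wheat harvest"))
```

With the toy data, `predict_nb(model, "wheat harvest")` selects the `grain` label, because both tokens occur only in the `grain` training documents. Re-running `train_nb` with different `alpha` values and scoring on held-out documents is one simple way to tune this baseline.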
- Type: article
- Language: en
- Landing Page: https://doi.org/10.1007/s13198-024-02555-w
- OA Status: hybrid
- Cited By: 1
- References: 67
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4403730193
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4403730193 (Canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.1007/s13198-024-02555-w (Digital Object Identifier)
- Title: Machine learning and rule-based embedding techniques for classifying text documents (Work title)
- Type: article (OpenAlex work type)
- Language: en (Primary language)
- Publication year: 2024 (Year of publication)
- Publication date: 2024-10-24 (Full publication date if available)
- Authors: Asmaa M. Aubaid, Alok Mishra, Atul Mishra (List of authors in order)
- Landing page: https://doi.org/10.1007/s13198-024-02555-w (Publisher landing page)
- Open access: Yes (Whether a free full text is available)
- OA status: hybrid (Open access status per OpenAlex)
- OA URL: https://doi.org/10.1007/s13198-024-02555-w (Direct OA link when available)
- Concepts: Computer science, Artificial intelligence, Embedding, Natural language processing, Machine learning, Information retrieval (Top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 1 (Total citation count in OpenAlex)
- Citations by year (recent): 2025: 1 (Per-year citation counts, last 5 years)
- References (count): 67 (Number of works referenced by this work)
- Related works (count): 10 (Other works algorithmically related by OpenAlex)
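The "Full payload" table that follows lists the same record with nested JSON flattened into dotted keys such as `topics[0].field.display_name`, and with lists of plain values joined by commas. A minimal sketch of that flattening convention is shown here; the function name `flatten` and the exact rules (index lists of objects, join lists of scalars) are assumptions inferred from the key names in the table, not a documented OpenAlex or site API. The sample record is a small excerpt built from values that appear on this page.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into a {dotted_key: value} mapping.

    Assumed rules, inferred from the table's key names:
    - dict keys are joined with "."
    - lists containing dicts or lists are indexed as key[0], key[1], ...
    - lists of plain values are joined into one comma-separated string
    """
    rows = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            child = f"{prefix}.{key}" if prefix else key
            rows.update(flatten(value, child))
    elif isinstance(obj, list):
        if all(not isinstance(item, (dict, list)) for item in obj):
            rows[prefix] = ", ".join(str(item) for item in obj)
        else:
            for index, item in enumerate(obj):
                rows.update(flatten(item, f"{prefix}[{index}]"))
    else:
        rows[prefix] = obj
    return rows


# Excerpt of the record, using values shown on this page.
SAMPLE_RECORD = {
    "id": "https://openalex.org/W4403730193",
    "biblio": {"volume": "15", "issue": "12"},
    "topics": [
        {
            "id": "https://openalex.org/T11550",
            "field": {"display_name": "Computer Science"},
        }
    ],
    "abstract_inverted_index": {"a": [28, 52, 79]},
}

for key, value in flatten(SAMPLE_RECORD).items():
    print(f"| {key} | {value} |")
```

Each printed line has the same `| key | value |` shape as a row of the payload table, which makes it easy to check the assumed rules against the rows listed there.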
Full payload
| id | https://openalex.org/W4403730193 |
|---|---|
| doi | https://doi.org/10.1007/s13198-024-02555-w |
| ids.doi | https://doi.org/10.1007/s13198-024-02555-w |
| ids.openalex | https://openalex.org/W4403730193 |
| fwci | 0.63877855 |
| type | article |
| title | Machine learning and rule-based embedding techniques for classifying text documents |
| biblio.issue | 12 |
| biblio.volume | 15 |
| biblio.last_page | 5652 |
| biblio.first_page | 5637 |
| topics[0].id | https://openalex.org/T11550 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9997000098228455 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Text and Document Classification Technologies |
| topics[1].id | https://openalex.org/T13083 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.998199999332428 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Advanced Text Analysis Techniques |
| topics[2].id | https://openalex.org/T10028 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9973000288009644 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Topic Modeling |
| is_xpac | False |
| apc_list.value | 2390 |
| apc_list.currency | EUR |
| apc_list.value_usd | 2990 |
| apc_paid.value | 2390 |
| apc_paid.currency | EUR |
| apc_paid.value_usd | 2990 |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.5912510752677917 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C154945302 |
| concepts[1].level | 1 |
| concepts[1].score | 0.5855090022087097 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[1].display_name | Artificial intelligence |
| concepts[2].id | https://openalex.org/C41608201 |
| concepts[2].level | 2 |
| concepts[2].score | 0.5242650508880615 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q980509 |
| concepts[2].display_name | Embedding |
| concepts[3].id | https://openalex.org/C204321447 |
| concepts[3].level | 1 |
| concepts[3].score | 0.5024051666259766 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[3].display_name | Natural language processing |
| concepts[4].id | https://openalex.org/C119857082 |
| concepts[4].level | 1 |
| concepts[4].score | 0.4326993227005005 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q2539 |
| concepts[4].display_name | Machine learning |
| concepts[5].id | https://openalex.org/C23123220 |
| concepts[5].level | 1 |
| concepts[5].score | 0.4031020998954773 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q816826 |
| concepts[5].display_name | Information retrieval |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.5912510752677917 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[1].score | 0.5855090022087097 |
| keywords[1].display_name | Artificial intelligence |
| keywords[2].id | https://openalex.org/keywords/embedding |
| keywords[2].score | 0.5242650508880615 |
| keywords[2].display_name | Embedding |
| keywords[3].id | https://openalex.org/keywords/natural-language-processing |
| keywords[3].score | 0.5024051666259766 |
| keywords[3].display_name | Natural language processing |
| keywords[4].id | https://openalex.org/keywords/machine-learning |
| keywords[4].score | 0.4326993227005005 |
| keywords[4].display_name | Machine learning |
| keywords[5].id | https://openalex.org/keywords/information-retrieval |
| keywords[5].score | 0.4031020998954773 |
| keywords[5].display_name | Information retrieval |
| language | en |
| locations[0].id | doi:10.1007/s13198-024-02555-w |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S40280859 |
| locations[0].source.issn | 0975-6809, 0976-4348 |
| locations[0].source.type | journal |
| locations[0].source.is_oa | False |
| locations[0].source.issn_l | 0975-6809 |
| locations[0].source.is_core | True |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | International Journal of Systems Assurance Engineering and Management |
| locations[0].source.host_organization | https://openalex.org/P4310319900 |
| locations[0].source.host_organization_name | Springer Science+Business Media |
| locations[0].source.host_organization_lineage | https://openalex.org/P4310319900, https://openalex.org/P4310319965 |
| locations[0].source.host_organization_lineage_names | Springer Science+Business Media, Springer Nature |
| locations[0].license | cc-by |
| locations[0].pdf_url | |
| locations[0].version | publishedVersion |
| locations[0].raw_type | journal-article |
| locations[0].license_id | https://openalex.org/licenses/cc-by |
| locations[0].is_accepted | True |
| locations[0].is_published | True |
| locations[0].raw_source_name | International Journal of System Assurance Engineering and Management |
| locations[0].landing_page_url | https://doi.org/10.1007/s13198-024-02555-w |
| indexed_in | crossref |
| authorships[0].author.id | https://openalex.org/A5016252221 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Asmaa M. Aubaid |
| authorships[0].countries | IQ |
| authorships[0].affiliations[0].institution_ids | https://openalex.org/I4210128557 |
| authorships[0].affiliations[0].raw_affiliation_string | Ministry of Higher Education and Scientific Research/Science and Technology, Baghdad/Al-Jadriya, Iraq |
| authorships[0].institutions[0].id | https://openalex.org/I4210128557 |
| authorships[0].institutions[0].ror | https://ror.org/03m5ehy13 |
| authorships[0].institutions[0].type | government |
| authorships[0].institutions[0].lineage | https://openalex.org/I4210128557 |
| authorships[0].institutions[0].country_code | IQ |
| authorships[0].institutions[0].display_name | Ministry of Higher Education and Scientific Research |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Asmaa M. Aubaid |
| authorships[0].is_corresponding | False |
| authorships[0].raw_affiliation_strings | Ministry of Higher Education and Scientific Research/Science and Technology, Baghdad/Al-Jadriya, Iraq |
| authorships[1].author.id | https://openalex.org/A5100660168 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-1275-2050 |
| authorships[1].author.display_name | Alok Mishra |
| authorships[1].countries | NO |
| authorships[1].affiliations[0].institution_ids | https://openalex.org/I204778367 |
| authorships[1].affiliations[0].raw_affiliation_string | Faculty of Engineering, Norwegian University of Science and Technology, Trondheim, Norway |
| authorships[1].institutions[0].id | https://openalex.org/I204778367 |
| authorships[1].institutions[0].ror | https://ror.org/05xg72x27 |
| authorships[1].institutions[0].type | education |
| authorships[1].institutions[0].lineage | https://openalex.org/I204778367 |
| authorships[1].institutions[0].country_code | NO |
| authorships[1].institutions[0].display_name | Norwegian University of Science and Technology |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Alok Mishra |
| authorships[1].is_corresponding | True |
| authorships[1].raw_affiliation_strings | Faculty of Engineering, Norwegian University of Science and Technology, Trondheim, Norway |
| authorships[2].author.id | https://openalex.org/A5103019262 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-8995-0429 |
| authorships[2].author.display_name | Atul Mishra |
| authorships[2].countries | IN |
| authorships[2].affiliations[0].institution_ids | https://openalex.org/I1323093577 |
| authorships[2].affiliations[0].raw_affiliation_string | BML Munjal University, Kapriwas, India |
| authorships[2].institutions[0].id | https://openalex.org/I1323093577 |
| authorships[2].institutions[0].ror | https://ror.org/058ay3j75 |
| authorships[2].institutions[0].type | education |
| authorships[2].institutions[0].lineage | https://openalex.org/I1323093577 |
| authorships[2].institutions[0].country_code | IN |
| authorships[2].institutions[0].display_name | BML Munjal University |
| authorships[2].author_position | last |
| authorships[2].raw_author_name | Atul Mishra |
| authorships[2].is_corresponding | False |
| authorships[2].raw_affiliation_strings | BML Munjal University, Kapriwas, India |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://doi.org/10.1007/s13198-024-02555-w |
| open_access.oa_status | hybrid |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Machine learning and rule-based embedding techniques for classifying text documents |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T03:46:38.306776 |
| primary_topic.id | https://openalex.org/T11550 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9997000098228455 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Text and Document Classification Technologies |
| related_works | https://openalex.org/W2961085424, https://openalex.org/W4306674287, https://openalex.org/W3046775127, https://openalex.org/W3107602296, https://openalex.org/W4394896187, https://openalex.org/W3170094116, https://openalex.org/W4386462264, https://openalex.org/W4364306694, https://openalex.org/W4312192474, https://openalex.org/W4283697347 |
| cited_by_count | 1 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 1 |
| locations_count | 1 |
| best_oa_location.id | doi:10.1007/s13198-024-02555-w |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S40280859 |
| best_oa_location.source.issn | 0975-6809, 0976-4348 |
| best_oa_location.source.type | journal |
| best_oa_location.source.is_oa | False |
| best_oa_location.source.issn_l | 0975-6809 |
| best_oa_location.source.is_core | True |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | International Journal of Systems Assurance Engineering and Management |
| best_oa_location.source.host_organization | https://openalex.org/P4310319900 |
| best_oa_location.source.host_organization_name | Springer Science+Business Media |
| best_oa_location.source.host_organization_lineage | https://openalex.org/P4310319900, https://openalex.org/P4310319965 |
| best_oa_location.source.host_organization_lineage_names | Springer Science+Business Media, Springer Nature |
| best_oa_location.license | cc-by |
| best_oa_location.pdf_url | |
| best_oa_location.version | publishedVersion |
| best_oa_location.raw_type | journal-article |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by |
| best_oa_location.is_accepted | True |
| best_oa_location.is_published | True |
| best_oa_location.raw_source_name | International Journal of System Assurance Engineering and Management |
| best_oa_location.landing_page_url | https://doi.org/10.1007/s13198-024-02555-w |
| primary_location.id | doi:10.1007/s13198-024-02555-w |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S40280859 |
| primary_location.source.issn | 0975-6809, 0976-4348 |
| primary_location.source.type | journal |
| primary_location.source.is_oa | False |
| primary_location.source.issn_l | 0975-6809 |
| primary_location.source.is_core | True |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | International Journal of Systems Assurance Engineering and Management |
| primary_location.source.host_organization | https://openalex.org/P4310319900 |
| primary_location.source.host_organization_name | Springer Science+Business Media |
| primary_location.source.host_organization_lineage | https://openalex.org/P4310319900, https://openalex.org/P4310319965 |
| primary_location.source.host_organization_lineage_names | Springer Science+Business Media, Springer Nature |
| primary_location.license | cc-by |
| primary_location.pdf_url | |
| primary_location.version | publishedVersion |
| primary_location.raw_type | journal-article |
| primary_location.license_id | https://openalex.org/licenses/cc-by |
| primary_location.is_accepted | True |
| primary_location.is_published | True |
| primary_location.raw_source_name | International Journal of System Assurance Engineering and Management |
| primary_location.landing_page_url | https://doi.org/10.1007/s13198-024-02555-w |
| publication_date | 2024-10-24 |
| publication_year | 2024 |
| referenced_works | https://openalex.org/W4289329149, https://openalex.org/W3035394311, https://openalex.org/W3210781772, https://openalex.org/W4223916699, https://openalex.org/W2901643192, https://openalex.org/W1949819223, https://openalex.org/W2117332520, https://openalex.org/W2493916176, https://openalex.org/W2802209325, https://openalex.org/W4380090443, https://openalex.org/W2800318991, https://openalex.org/W2622418746, https://openalex.org/W2517396745, https://openalex.org/W3094758902, https://openalex.org/W2041557121, https://openalex.org/W4213115401, https://openalex.org/W2133990480, https://openalex.org/W3040010257, https://openalex.org/W2096866608, https://openalex.org/W2963626623, https://openalex.org/W1966927117, https://openalex.org/W2150747245, https://openalex.org/W2250189634, https://openalex.org/W1985303292, https://openalex.org/W3081425653, https://openalex.org/W4232663847, https://openalex.org/W4292237320, https://openalex.org/W3121257501, https://openalex.org/W1723619723, https://openalex.org/W4386379388, https://openalex.org/W2888611496, https://openalex.org/W3134349173, https://openalex.org/W1998257453, https://openalex.org/W4254931243, https://openalex.org/W1965154800, https://openalex.org/W3156333129, https://openalex.org/W2755854422, https://openalex.org/W3195777109, https://openalex.org/W6608868369, https://openalex.org/W2135813353, https://openalex.org/W2560070550, https://openalex.org/W2944152053, https://openalex.org/W46219046, https://openalex.org/W2252466151, https://openalex.org/W2250539671, https://openalex.org/W3126380714, https://openalex.org/W3129990883, https://openalex.org/W2168812139, https://openalex.org/W2341234495, https://openalex.org/W2118020653, https://openalex.org/W4221057175, https://openalex.org/W2057455558, https://openalex.org/W3082362580, https://openalex.org/W3039232647, https://openalex.org/W2519564115, https://openalex.org/W3007294284, https://openalex.org/W4210827551, https://openalex.org/W2070996757, https://openalex.org/W2168540192, https://openalex.org/W2123504579, https://openalex.org/W2018605192, https://openalex.org/W4390057677, https://openalex.org/W2915395710, https://openalex.org/W3199748762, https://openalex.org/W3102363003, https://openalex.org/W4285142800, https://openalex.org/W4212962367 |
| referenced_works_count | 67 |
| abstract_inverted_index.a | 28, 52, 79 |
| abstract_inverted_index.in | 24, 120 |
| abstract_inverted_index.it | 15 |
| abstract_inverted_index.of | 3, 10, 36, 68, 122 |
| abstract_inverted_index.to | 18 |
| abstract_inverted_index.we | 58 |
| abstract_inverted_index.The | 88 |
| abstract_inverted_index.and | 7, 51, 99, 118 |
| abstract_inverted_index.can | 102 |
| abstract_inverted_index.for | 71, 83 |
| abstract_inverted_index.its | 60 |
| abstract_inverted_index.the | 8, 34, 42, 66, 74, 77, 91 |
| abstract_inverted_index.This | 31 |
| abstract_inverted_index.from | 27 |
| abstract_inverted_index.have | 13 |
| abstract_inverted_index.made | 14 |
| abstract_inverted_index.show | 96 |
| abstract_inverted_index.tell | 104 |
| abstract_inverted_index.text | 20, 39, 86 |
| abstract_inverted_index.than | 112 |
| abstract_inverted_index.that | 97 |
| abstract_inverted_index.used | 90 |
| abstract_inverted_index.vast | 43 |
| abstract_inverted_index.with | 62 |
| abstract_inverted_index.Naive | 113 |
| abstract_inverted_index.Rapid | 1 |
| abstract_inverted_index.apart | 105 |
| abstract_inverted_index.helps | 23 |
| abstract_inverted_index.novel | 53 |
| abstract_inverted_index.study | 32, 89 |
| abstract_inverted_index.terms | 121 |
| abstract_inverted_index.Bayes, | 114 |
| abstract_inverted_index.amidst | 41 |
| abstract_inverted_index.better | 111 |
| abstract_inverted_index.models | 50 |
| abstract_inverted_index.online | 11 |
| abstract_inverted_index.others | 119 |
| abstract_inverted_index.tuning | 69 |
| abstract_inverted_index.Reuters | 92 |
| abstract_inverted_index.Tables, | 117 |
| abstract_inverted_index.W2vRule | 98 |
| abstract_inverted_index.compare | 59 |
| abstract_inverted_index.machine | 48, 100 |
| abstract_inverted_index.method, | 56 |
| abstract_inverted_index.optimal | 72 |
| abstract_inverted_index.perform | 110 |
| abstract_inverted_index.Abstract | 0 |
| abstract_inverted_index.Decision | 116 |
| abstract_inverted_index.Findings | 95 |
| abstract_inverted_index.Newswire | 93 |
| abstract_inverted_index.W2vRule, | 57, 78 |
| abstract_inverted_index.archives | 6 |
| abstract_inverted_index.dataset. | 94 |
| abstract_inverted_index.document | 5, 45, 54 |
| abstract_inverted_index.improved | 84 |
| abstract_inverted_index.learning | 49, 101 |
| abstract_inverted_index.methods. | 64 |
| abstract_inverted_index.metrics. | 124 |
| abstract_inverted_index.research | 75 |
| abstract_inverted_index.BayesNet, | 115 |
| abstract_inverted_index.Employing | 47 |
| abstract_inverted_index.addresses | 33 |
| abstract_inverted_index.challenge | 35 |
| abstract_inverted_index.difficult | 17 |
| abstract_inverted_index.documents | 40 |
| abstract_inverted_index.expansion | 2 |
| abstract_inverted_index.important | 106 |
| abstract_inverted_index.retrieval | 26 |
| abstract_inverted_index.Rule-based | 108 |
| abstract_inverted_index.approaches | 109 |
| abstract_inverted_index.categorize | 19 |
| abstract_inverted_index.conceptual | 29 |
| abstract_inverted_index.documents. | 21 |
| abstract_inverted_index.electronic | 4, 44 |
| abstract_inverted_index.framework, | 82 |
| abstract_inverted_index.framework. | 30 |
| abstract_inverted_index.importance | 67 |
| abstract_inverted_index.incredibly | 16 |
| abstract_inverted_index.landscape. | 46 |
| abstract_inverted_index.recommends | 76 |
| abstract_inverted_index.rule-based | 81 |
| abstract_inverted_index.Emphasizing | 65 |
| abstract_inverted_index.categories. | 107 |
| abstract_inverted_index.effectively | 103 |
| abstract_inverted_index.efficiently | 37 |
| abstract_inverted_index.information | 12, 25 |
| abstract_inverted_index.performance | 61, 123 |
| abstract_inverted_index.traditional | 63 |
| abstract_inverted_index.categorizing | 38 |
| abstract_inverted_index.performance, | 73 |
| abstract_inverted_index.proliferation | 9 |
| abstract_inverted_index.Classification | 22 |
| abstract_inverted_index.categorization | 55 |
| abstract_inverted_index.word-to-vector | 80 |
| abstract_inverted_index.classification. | 87 |
| abstract_inverted_index.hyperparameters | 70 |
| abstract_inverted_index.association-based | 85 |
| cited_by_percentile_year.max | 95 |
| cited_by_percentile_year.min | 91 |
| corresponding_author_ids | https://openalex.org/A5100660168 |
| countries_distinct_count | 3 |
| institutions_distinct_count | 3 |
| corresponding_institution_ids | https://openalex.org/I204778367 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/16 |
| sustainable_development_goals[0].score | 0.5899999737739563 |
| sustainable_development_goals[0].display_name | Peace, Justice and strong institutions |
| citation_normalized_percentile.value | 0.72260288 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | False |
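The `abstract_inverted_index.*` rows in the table above store the abstract as a map from each token to the list of positions where it occurs. The sketch below shows one way to turn such a map back into running text; the function name `rebuild_abstract` is an assumption for this example, and the sample index is an excerpt copied from the rows above rather than the full index.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a {token: [positions]} inverted index.

    Each position is assigned its token, then tokens are emitted in
    ascending position order and joined with single spaces.
    """
    token_at = {}
    for token, positions in inverted_index.items():
        for position in positions:
            token_at[position] = token
    return " ".join(token_at[position] for position in sorted(token_at))


# Excerpt of the index, copied from the table rows above.
SAMPLE_INDEX = {
    "Abstract": [0],
    "Rapid": [1],
    "expansion": [2],
    "of": [3, 10, 36, 68, 122],
    "electronic": [4, 44],
    "document": [5, 45, 54],
    "archives": [6],
}

tokens = rebuild_abstract(SAMPLE_INDEX).split()
print(" ".join(tokens[:7]))
```

Because the excerpt omits most tokens, only the first seven positions (0 through 6) form contiguous text. Position 0 holds the literal word "Abstract", so a consumer of the full index may want to drop that leading token before displaying the result.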