Recognizing and Extracting Cybersecurtity-relevant Entities from Text
2022 · Open Access · DOI: https://doi.org/10.48550/arxiv.2208.01693
Cyber Threat Intelligence (CTI) is information describing threat vectors, vulnerabilities, and attacks and is often used as training data for AI-based cyber defense systems such as Cybersecurity Knowledge Graphs (CKG). There is a strong need to develop community-accessible datasets to train existing AI-based cybersecurity pipelines to efficiently and accurately extract meaningful insights from CTI. We have created an initial unstructured CTI corpus from a variety of open sources that we are using to train and test cybersecurity entity models using the spaCy framework and exploring self-learning methods to automatically recognize cybersecurity entities. We also describe methods to apply cybersecurity domain entity linking with existing world knowledge from Wikidata. Our future work will survey and test spaCy NLP tools and create methods for continuous integration of new information extracted from text.
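As a rough illustration of what recognizing cybersecurity entities in unstructured CTI text can look like, the sketch below tags two easily patterned entity types with regular expressions. It is not the authors' method: the abstract describes trained spaCy entity models, while this stand-in uses only the Python standard library. The label names `CVE_ID` and `IPV4`, the patterns, and the sample sentence are all assumptions made for the example.

```python
import re
from typing import List, NamedTuple


class Entity(NamedTuple):
    text: str
    label: str
    start: int
    end: int


# Hypothetical label names and patterns, chosen only for illustration.
PATTERNS = {
    "CVE_ID": re.compile(r"\bCVE-\d{4}-\d{4,}\b"),
    "IPV4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def extract_entities(text: str) -> List[Entity]:
    """Return pattern matches as labeled character spans, sorted by position."""
    found = [
        Entity(m.group(), label, m.start(), m.end())
        for label, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(found, key=lambda e: e.start)


# Made-up sample sentence; 192.0.2.10 is a documentation-range address.
sample = "The actor exploited CVE-2021-44228 from 192.0.2.10."
entities = extract_entities(sample)
for ent in entities:
    print(ent.label, ent.text)
```

Fixed patterns like these only cover entities with a rigid surface form. Names of malware families, threat actors, or tools do not follow such patterns, which is one reason a trained statistical model is the more natural fit for the task the abstract describes.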
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2208.01693, https://arxiv.org/pdf/2208.01693
- OA Status: green
- Cited By: 1
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4299406457
Raw OpenAlex JSON
| Field | Value | Description |
|---|---|---|
| OpenAlex ID | https://openalex.org/W4299406457 | Canonical identifier for this work in OpenAlex |
| DOI | https://doi.org/10.48550/arxiv.2208.01693 | Digital Object Identifier |
| Title | Recognizing and Extracting Cybersecurtity-relevant Entities from Text | Work title |
| Type | preprint | OpenAlex work type |
| Language | en | Primary language |
| Publication year | 2022 | Year of publication |
| Publication date | 2022-08-02 | Full publication date if available |
| Authors | Casey Hanks, Michael M. Maiden, Priyanka Ranade, Tim Finin, Anupam Joshi | List of authors in order |
| Landing page | https://arxiv.org/abs/2208.01693 | Publisher landing page |
| PDF URL | https://arxiv.org/pdf/2208.01693 | Direct link to full text PDF |
| Open access | Yes | Whether a free full text is available |
| OA status | green | Open access status per OpenAlex |
| OA URL | https://arxiv.org/pdf/2208.01693 | Direct OA link when available |
| Concepts | Computer science, Variety (cybernetics), Domain (mathematical analysis), Test (biology), Computer security, Data science, Cyber threats, Domain knowledge, Artificial intelligence, Biology, Mathematical analysis, Paleontology, Mathematics | Top concepts (fields/topics) attached by OpenAlex |
| Cited by | 1 | Total citation count in OpenAlex |
| Citations by year (recent) | 2022: 1 | Per-year citation counts (last 5 years) |
| Related works (count) | 10 | Other works algorithmically related by OpenAlex |
Full payload
| id | https://openalex.org/W4299406457 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2208.01693 |
| ids.doi | https://doi.org/10.48550/arxiv.2208.01693 |
| ids.openalex | https://openalex.org/W4299406457 |
| fwci | 0.4831543 |
| type | preprint |
| title | Recognizing and Extracting Cybersecurtity-relevant Entities from Text |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11147 |
| topics[0].field.id | https://openalex.org/fields/33 |
| topics[0].field.display_name | Social Sciences |
| topics[0].score | 0.98089998960495 |
| topics[0].domain.id | https://openalex.org/domains/2 |
| topics[0].domain.display_name | Social Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/3312 |
| topics[0].subfield.display_name | Sociology and Political Science |
| topics[0].display_name | Misinformation and Its Impacts |
| topics[1].id | https://openalex.org/T10028 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9629999995231628 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Topic Modeling |
| topics[2].id | https://openalex.org/T10994 |
| topics[2].field.id | https://openalex.org/fields/33 |
| topics[2].field.display_name | Social Sciences |
| topics[2].score | 0.9549999833106995 |
| topics[2].domain.id | https://openalex.org/domains/2 |
| topics[2].domain.display_name | Social Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/3312 |
| topics[2].subfield.display_name | Sociology and Political Science |
| topics[2].display_name | Terrorism, Counterterrorism, and Political Violence |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.8117119073867798 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C136197465 |
| concepts[1].level | 2 |
| concepts[1].score | 0.6357672214508057 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q1729295 |
| concepts[1].display_name | Variety (cybernetics) |
| concepts[2].id | https://openalex.org/C36503486 |
| concepts[2].level | 2 |
| concepts[2].score | 0.5992723703384399 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q11235244 |
| concepts[2].display_name | Domain (mathematical analysis) |
| concepts[3].id | https://openalex.org/C2777267654 |
| concepts[3].level | 2 |
| concepts[3].score | 0.4515399932861328 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q3519023 |
| concepts[3].display_name | Test (biology) |
| concepts[4].id | https://openalex.org/C38652104 |
| concepts[4].level | 1 |
| concepts[4].score | 0.4475364685058594 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q3510521 |
| concepts[4].display_name | Computer security |
| concepts[5].id | https://openalex.org/C2522767166 |
| concepts[5].level | 1 |
| concepts[5].score | 0.43415531516075134 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q2374463 |
| concepts[5].display_name | Data science |
| concepts[6].id | https://openalex.org/C3018725008 |
| concepts[6].level | 2 |
| concepts[6].score | 0.43091216683387756 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q4071928 |
| concepts[6].display_name | Cyber threats |
| concepts[7].id | https://openalex.org/C207685749 |
| concepts[7].level | 2 |
| concepts[7].score | 0.41912710666656494 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q2088941 |
| concepts[7].display_name | Domain knowledge |
| concepts[8].id | https://openalex.org/C154945302 |
| concepts[8].level | 1 |
| concepts[8].score | 0.3193439841270447 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[8].display_name | Artificial intelligence |
| concepts[9].id | https://openalex.org/C86803240 |
| concepts[9].level | 0 |
| concepts[9].score | 0.0 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q420 |
| concepts[9].display_name | Biology |
| concepts[10].id | https://openalex.org/C134306372 |
| concepts[10].level | 1 |
| concepts[10].score | 0.0 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q7754 |
| concepts[10].display_name | Mathematical analysis |
| concepts[11].id | https://openalex.org/C151730666 |
| concepts[11].level | 1 |
| concepts[11].score | 0.0 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q7205 |
| concepts[11].display_name | Paleontology |
| concepts[12].id | https://openalex.org/C33923547 |
| concepts[12].level | 0 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[12].display_name | Mathematics |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.8117119073867798 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/variety |
| keywords[1].score | 0.6357672214508057 |
| keywords[1].display_name | Variety (cybernetics) |
| keywords[2].id | https://openalex.org/keywords/domain |
| keywords[2].score | 0.5992723703384399 |
| keywords[2].display_name | Domain (mathematical analysis) |
| keywords[3].id | https://openalex.org/keywords/test |
| keywords[3].score | 0.4515399932861328 |
| keywords[3].display_name | Test (biology) |
| keywords[4].id | https://openalex.org/keywords/computer-security |
| keywords[4].score | 0.4475364685058594 |
| keywords[4].display_name | Computer security |
| keywords[5].id | https://openalex.org/keywords/data-science |
| keywords[5].score | 0.43415531516075134 |
| keywords[5].display_name | Data science |
| keywords[6].id | https://openalex.org/keywords/cyber-threats |
| keywords[6].score | 0.43091216683387756 |
| keywords[6].display_name | Cyber threats |
| keywords[7].id | https://openalex.org/keywords/domain-knowledge |
| keywords[7].score | 0.41912710666656494 |
| keywords[7].display_name | Domain knowledge |
| keywords[8].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[8].score | 0.3193439841270447 |
| keywords[8].display_name | Artificial intelligence |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2208.01693 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2208.01693 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2208.01693 |
| locations[1].id | doi:10.48550/arxiv.2208.01693 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article-journal |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2208.01693 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5050964474 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Casey Hanks |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Hanks, Casey |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5102493314 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Michael M. Maiden |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Maiden, Michael |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5090051188 |
| authorships[2].author.orcid | https://orcid.org/0000-0003-3859-5356 |
| authorships[2].author.display_name | Priyanka Ranade |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Ranade, Priyanka |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5009972149 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Tim Finin |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Finin, Tim |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5020975010 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-8641-3193 |
| authorships[4].author.display_name | Anupam Joshi |
| authorships[4].author_position | last |
| authorships[4].raw_author_name | Joshi, Anupam |
| authorships[4].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2208.01693 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Recognizing and Extracting Cybersecurtity-relevant Entities from Text |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11147 |
| primary_topic.field.id | https://openalex.org/fields/33 |
| primary_topic.field.display_name | Social Sciences |
| primary_topic.score | 0.98089998960495 |
| primary_topic.domain.id | https://openalex.org/domains/2 |
| primary_topic.domain.display_name | Social Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/3312 |
| primary_topic.subfield.display_name | Sociology and Political Science |
| primary_topic.display_name | Misinformation and Its Impacts |
| related_works | https://openalex.org/W2032233321, https://openalex.org/W3121970507, https://openalex.org/W2110028391, https://openalex.org/W54497855, https://openalex.org/W217960748, https://openalex.org/W3125814499, https://openalex.org/W1583422155, https://openalex.org/W1649619740, https://openalex.org/W3213252596, https://openalex.org/W1534006406 |
| cited_by_count | 1 |
| counts_by_year[0].year | 2022 |
| counts_by_year[0].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2208.01693 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2208.01693 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2208.01693 |
| primary_location.id | pmh:oai:arXiv.org:2208.01693 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2208.01693 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2208.01693 |
| publication_date | 2022-08-02 |
| publication_year | 2022 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 32, 63 |
| abstract_inverted_index.We | 54, 92 |
| abstract_inverted_index.an | 57 |
| abstract_inverted_index.as | 16, 25 |
| abstract_inverted_index.is | 4, 13, 31 |
| abstract_inverted_index.of | 65, 124 |
| abstract_inverted_index.to | 35, 39, 45, 72, 87, 96 |
| abstract_inverted_index.we | 69 |
| abstract_inverted_index.CTI | 60 |
| abstract_inverted_index.NLP | 116 |
| abstract_inverted_index.Our | 108 |
| abstract_inverted_index.and | 10, 12, 47, 74, 83, 113, 118 |
| abstract_inverted_index.are | 70 |
| abstract_inverted_index.for | 19, 121 |
| abstract_inverted_index.new | 125 |
| abstract_inverted_index.the | 80 |
| abstract_inverted_index.CTI. | 53 |
| abstract_inverted_index.also | 93 |
| abstract_inverted_index.data | 18 |
| abstract_inverted_index.from | 52, 62, 106, 128 |
| abstract_inverted_index.have | 55 |
| abstract_inverted_index.need | 34 |
| abstract_inverted_index.open | 66 |
| abstract_inverted_index.such | 24 |
| abstract_inverted_index.test | 75, 114 |
| abstract_inverted_index.that | 68 |
| abstract_inverted_index.used | 15 |
| abstract_inverted_index.will | 111 |
| abstract_inverted_index.with | 102 |
| abstract_inverted_index.work | 110 |
| abstract_inverted_index.(CTI) | 3 |
| abstract_inverted_index.Cyber | 0 |
| abstract_inverted_index.There | 30 |
| abstract_inverted_index.apply | 97 |
| abstract_inverted_index.cyber | 21 |
| abstract_inverted_index.often | 14 |
| abstract_inverted_index.spaCy | 81, 115 |
| abstract_inverted_index.text. | 129 |
| abstract_inverted_index.tools | 117 |
| abstract_inverted_index.train | 40, 73 |
| abstract_inverted_index.using | 71, 79 |
| abstract_inverted_index.world | 104 |
| abstract_inverted_index.(CKG). | 29 |
| abstract_inverted_index.Graphs | 28 |
| abstract_inverted_index.Threat | 1 |
| abstract_inverted_index.corpus | 61 |
| abstract_inverted_index.create | 119 |
| abstract_inverted_index.domain | 99 |
| abstract_inverted_index.entity | 77, 100 |
| abstract_inverted_index.future | 109 |
| abstract_inverted_index.models | 78 |
| abstract_inverted_index.strong | 33 |
| abstract_inverted_index.survey | 112 |
| abstract_inverted_index.threat | 7 |
| abstract_inverted_index.attacks | 11 |
| abstract_inverted_index.created | 56 |
| abstract_inverted_index.defense | 22 |
| abstract_inverted_index.develop | 36 |
| abstract_inverted_index.extract | 49 |
| abstract_inverted_index.initial | 58 |
| abstract_inverted_index.linking | 101 |
| abstract_inverted_index.methods | 86, 95, 120 |
| abstract_inverted_index.sources | 67 |
| abstract_inverted_index.systems | 23 |
| abstract_inverted_index.variety | 64 |
| abstract_inverted_index.AI-based | 20, 42 |
| abstract_inverted_index.datasets | 38 |
| abstract_inverted_index.describe | 94 |
| abstract_inverted_index.existing | 41, 103 |
| abstract_inverted_index.insights | 51 |
| abstract_inverted_index.training | 17 |
| abstract_inverted_index.vectors, | 8 |
| abstract_inverted_index.Knowledge | 27 |
| abstract_inverted_index.Wikidata. | 107 |
| abstract_inverted_index.entities. | 91 |
| abstract_inverted_index.exploring | 84 |
| abstract_inverted_index.extracted | 127 |
| abstract_inverted_index.framework | 82 |
| abstract_inverted_index.knowledge | 105 |
| abstract_inverted_index.pipelines | 44 |
| abstract_inverted_index.recognize | 89 |
| abstract_inverted_index.accurately | 48 |
| abstract_inverted_index.continuous | 122 |
| abstract_inverted_index.describing | 6 |
| abstract_inverted_index.meaningful | 50 |
| abstract_inverted_index.efficiently | 46 |
| abstract_inverted_index.information | 5, 126 |
| abstract_inverted_index.integration | 123 |
| abstract_inverted_index.Intelligence | 2 |
| abstract_inverted_index.unstructured | 59 |
| abstract_inverted_index.Cybersecurity | 26 |
| abstract_inverted_index.automatically | 88 |
| abstract_inverted_index.cybersecurity | 43, 76, 90, 98 |
| abstract_inverted_index.self-learning | 85 |
| abstract_inverted_index.vulnerabilities, | 9 |
| abstract_inverted_index.community-accessible | 37 |
| cited_by_percentile_year.max | 94 |
| cited_by_percentile_year.min | 89 |
| countries_distinct_count | 0 |
| institutions_distinct_count | 5 |
| citation_normalized_percentile.value | 0.70800428 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | False |
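
The `abstract_inverted_index.*` rows in the payload store the abstract as a map from each token to the word positions where it occurs. A minimal sketch of turning such a map back into running text follows. It assumes the positions arrive as lists of integers, as they would in parsed JSON, rather than the comma-separated strings shown in the table. The function name `rebuild_abstract` is hypothetical, and the excerpt is limited to positions 0 to 7, so later positions of "is" and "information" are left out on purpose.

```python
from typing import Dict, List


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Place every token at each of its recorded positions, then join in order."""
    positioned = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            positioned[pos] = token
    return " ".join(positioned[i] for i in sorted(positioned))


# Excerpt of the payload rows, truncated to positions 0-7 of the abstract.
excerpt = {
    "Cyber": [0],
    "Threat": [1],
    "Intelligence": [2],
    "(CTI)": [3],
    "is": [4],
    "information": [5],
    "describing": [6],
    "threat": [7],
}
print(rebuild_abstract(excerpt))
# prints: Cyber Threat Intelligence (CTI) is information describing threat
```

Applied to the complete index, the same function would be expected to reproduce the abstract shown at the top of this record, since every position from 0 to 129 appears exactly once across the rows.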