Dataset or Not? A Study on the Veracity of Semantic Markup for Dataset Pages
2021 · Open Access
· DOI: https://doi.org/10.1007/978-3-030-88361-4_20
Semantic markup, such as schema.org, allows providers on the Web to describe content using a shared controlled vocabulary. This markup is invaluable in enabling a broad range of applications, from vertical search engines, to rich snippets in search results, to actions on emails, to many others. In this paper, we focus on semantic markup for datasets, specifically in the context of developing a vertical search engine for datasets on the Web, Google’s Dataset Search. Dataset Search relies on schema.org to identify pages that describe datasets. While schema.org was the core enabling technology for this vertical search, we also discovered that we need to address the following problem: pages from 61% of internet hosts that provide markup do not actually describe datasets. We analyze the veracity of dataset markup for Dataset Search’s Web-scale corpus and categorize pages where this markup is not reliable. We then propose a way to drastically increase the quality of the dataset metadata corpus by developing a deep neural-network classifier that identifies whether or not a page with markup is a dataset page. Our classifier achieves 96.7% recall at the 95% precision point. This level of precision enables Dataset Search to circumvent the noise in semantic markup and to use the metadata to provide high-quality results to users.
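The phrase "recall at the 95% precision point" can be read as: among all score thresholds at which the classifier's precision is at least 95%, report the highest recall. The sketch below illustrates that reading only. The function name `recall_at_precision` and the toy labels and scores are hypothetical and are not taken from the paper or its data.

```python
from typing import Sequence


def recall_at_precision(
    labels: Sequence[int], scores: Sequence[float], min_precision: float
) -> float:
    """Highest recall over all score thresholds whose precision >= min_precision.

    labels: 1 for a true dataset page, 0 otherwise (toy convention).
    scores: classifier confidence that the page is a dataset page.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive example")

    # Rank pages from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    best_recall = 0.0
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Treat each distinct score as a threshold; consume ties together.
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        precision = tp / (tp + fp)
        recall = tp / total_pos
        if precision >= min_precision:
            best_recall = max(best_recall, recall)
    return best_recall


# Toy example (made-up numbers, for illustration only).
toy_labels = [1, 1, 1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.2]
print(recall_at_precision(toy_labels, toy_scores, 0.95))  # 0.75
```

On the toy data, precision stays at 100% for the three highest-scoring pages and drops below 95% as soon as the first non-dataset page is admitted, so the recall at the 95% precision point is 3 of 4 positives.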
Related Topics
- Type: book-chapter
- Language: en
- Landing Page: https://doi.org/10.1007/978-3-030-88361-4_20, https://link.springer.com/content/pdf/10.1007%2F978-3-030-88361-4_20.pdf
- OA Status: hybrid
- Cited By: 7
- References: 26
- Related Works: 10
- OpenAlex ID: https://openalex.org/W3197320265
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W3197320265 (Canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.1007/978-3-030-88361-4_20 (Digital Object Identifier)
- Title: Dataset or Not? A Study on the Veracity of Semantic Markup for Dataset Pages (Work title)
- Type: book-chapter (OpenAlex work type)
- Language: en (Primary language)
- Publication year: 2021 (Year of publication)
- Publication date: 2021-01-01 (Full publication date if available)
- Authors: Tarfah Alrashed, Dimitris Paparas, Omar Benjelloun, Ying Sheng, Natasha Noy (List of authors in order)
- Landing page: https://doi.org/10.1007/978-3-030-88361-4_20 (Publisher landing page)
- PDF URL: https://link.springer.com/content/pdf/10.1007%2F978-3-030-88361-4_20.pdf (Direct link to full text PDF)
- Open access: Yes (Whether a free full text is available)
- OA status: hybrid (Open access status per OpenAlex)
- OA URL: https://link.springer.com/content/pdf/10.1007%2F978-3-030-88361-4_20.pdf (Direct OA link when available)
- Concepts: Computer science, Markup language, Information retrieval, Metadata, World Wide Web, HTML, The Internet, Artificial intelligence, XML (Top concepts, fields/topics, attached by OpenAlex)
- Cited by: 7 (Total citation count in OpenAlex)
- Citations by year (recent): 2025: 2, 2024: 2, 2023: 2, 2021: 1 (Per-year citation counts, last 5 years)
- References (count): 26 (Number of works referenced by this work)
- Related works (count): 10 (Other works algorithmically related by OpenAlex)
Full payload
| field | value |
|---|---|
| id | https://openalex.org/W3197320265 |
| doi | https://doi.org/10.1007/978-3-030-88361-4_20 |
| ids.doi | https://doi.org/10.1007/978-3-030-88361-4_20 |
| ids.mag | 3197320265 |
| ids.openalex | https://openalex.org/W3197320265 |
| fwci | 1.67890174 |
| type | book-chapter |
| title | Dataset or Not? A Study on the Veracity of Semantic Markup for Dataset Pages |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | 356 |
| biblio.first_page | 338 |
| topics[0].id | https://openalex.org/T10028 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9995999932289124 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Topic Modeling |
| topics[1].id | https://openalex.org/T12016 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9993000030517578 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1710 |
| topics[1].subfield.display_name | Information Systems |
| topics[1].display_name | Web Data Mining and Analysis |
| topics[2].id | https://openalex.org/T11719 |
| topics[2].field.id | https://openalex.org/fields/18 |
| topics[2].field.display_name | Decision Sciences |
| topics[2].score | 0.9965000152587891 |
| topics[2].domain.id | https://openalex.org/domains/2 |
| topics[2].domain.display_name | Social Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1803 |
| topics[2].subfield.display_name | Management Science and Operations Research |
| topics[2].display_name | Data Quality and Management |
| is_xpac | False |
| apc_list.value | 5000 |
| apc_list.currency | EUR |
| apc_list.value_usd | 5392 |
| apc_paid.value | 5000 |
| apc_paid.currency | EUR |
| apc_paid.value_usd | 5392 |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.8811239004135132 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C45874996 |
| concepts[1].level | 3 |
| concepts[1].score | 0.8222467303276062 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q37045 |
| concepts[1].display_name | Markup language |
| concepts[2].id | https://openalex.org/C23123220 |
| concepts[2].level | 1 |
| concepts[2].score | 0.698672890663147 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q816826 |
| concepts[2].display_name | Information retrieval |
| concepts[3].id | https://openalex.org/C93518851 |
| concepts[3].level | 2 |
| concepts[3].score | 0.5882362723350525 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q180160 |
| concepts[3].display_name | Metadata |
| concepts[4].id | https://openalex.org/C136764020 |
| concepts[4].level | 1 |
| concepts[4].score | 0.48925089836120605 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q466 |
| concepts[4].display_name | World Wide Web |
| concepts[5].id | https://openalex.org/C138708601 |
| concepts[5].level | 3 |
| concepts[5].score | 0.44567206501960754 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q8811 |
| concepts[5].display_name | HTML |
| concepts[6].id | https://openalex.org/C110875604 |
| concepts[6].level | 2 |
| concepts[6].score | 0.4102972447872162 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q75 |
| concepts[6].display_name | The Internet |
| concepts[7].id | https://openalex.org/C154945302 |
| concepts[7].level | 1 |
| concepts[7].score | 0.33805856108665466 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[7].display_name | Artificial intelligence |
| concepts[8].id | https://openalex.org/C8797682 |
| concepts[8].level | 2 |
| concepts[8].score | 0.1670530140399933 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q2115 |
| concepts[8].display_name | XML |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.8811239004135132 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/markup-language |
| keywords[1].score | 0.8222467303276062 |
| keywords[1].display_name | Markup language |
| keywords[2].id | https://openalex.org/keywords/information-retrieval |
| keywords[2].score | 0.698672890663147 |
| keywords[2].display_name | Information retrieval |
| keywords[3].id | https://openalex.org/keywords/metadata |
| keywords[3].score | 0.5882362723350525 |
| keywords[3].display_name | Metadata |
| keywords[4].id | https://openalex.org/keywords/world-wide-web |
| keywords[4].score | 0.48925089836120605 |
| keywords[4].display_name | World Wide Web |
| keywords[5].id | https://openalex.org/keywords/html |
| keywords[5].score | 0.44567206501960754 |
| keywords[5].display_name | HTML |
| keywords[6].id | https://openalex.org/keywords/the-internet |
| keywords[6].score | 0.4102972447872162 |
| keywords[6].display_name | The Internet |
| keywords[7].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[7].score | 0.33805856108665466 |
| keywords[7].display_name | Artificial intelligence |
| keywords[8].id | https://openalex.org/keywords/xml |
| keywords[8].score | 0.1670530140399933 |
| keywords[8].display_name | XML |
| language | en |
| locations[0].id | doi:10.1007/978-3-030-88361-4_20 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S106296714 |
| locations[0].source.issn | 0302-9743, 1611-3349 |
| locations[0].source.type | book series |
| locations[0].source.is_oa | False |
| locations[0].source.issn_l | 0302-9743 |
| locations[0].source.is_core | True |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | Lecture notes in computer science |
| locations[0].source.host_organization | https://openalex.org/P4310319900 |
| locations[0].source.host_organization_name | Springer Science+Business Media |
| locations[0].source.host_organization_lineage | https://openalex.org/P4310319900, https://openalex.org/P4310319965 |
| locations[0].source.host_organization_lineage_names | Springer Science+Business Media, Springer Nature |
| locations[0].license | cc-by |
| locations[0].pdf_url | https://link.springer.com/content/pdf/10.1007%2F978-3-030-88361-4_20.pdf |
| locations[0].version | publishedVersion |
| locations[0].raw_type | book-chapter |
| locations[0].license_id | https://openalex.org/licenses/cc-by |
| locations[0].is_accepted | True |
| locations[0].is_published | True |
| locations[0].raw_source_name | Lecture Notes in Computer Science |
| locations[0].landing_page_url | https://doi.org/10.1007/978-3-030-88361-4_20 |
| indexed_in | crossref |
| authorships[0].author.id | https://openalex.org/A5008176418 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Tarfah Alrashed |
| authorships[0].affiliations[0].raw_affiliation_string | CSAIL, MIT, Cambridge, USA |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Tarfah Alrashed |
| authorships[0].is_corresponding | False |
| authorships[0].raw_affiliation_strings | CSAIL, MIT, Cambridge, USA |
| authorships[1].author.id | https://openalex.org/A5084443795 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Dimitris Paparas |
| authorships[1].countries | US |
| authorships[1].affiliations[0].institution_ids | https://openalex.org/I1291425158 |
| authorships[1].affiliations[0].raw_affiliation_string | Google Research, Google, New York, USA |
| authorships[1].institutions[0].id | https://openalex.org/I1291425158 |
| authorships[1].institutions[0].ror | https://ror.org/00njsd438 |
| authorships[1].institutions[0].type | company |
| authorships[1].institutions[0].lineage | https://openalex.org/I1291425158, https://openalex.org/I4210128969 |
| authorships[1].institutions[0].country_code | US |
| authorships[1].institutions[0].display_name | Google (United States) |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Dimitris Paparas |
| authorships[1].is_corresponding | False |
| authorships[1].raw_affiliation_strings | Google Research, Google, New York, USA |
| authorships[2].author.id | https://openalex.org/A5103212061 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-4173-7709 |
| authorships[2].author.display_name | Omar Benjelloun |
| authorships[2].countries | US |
| authorships[2].affiliations[0].institution_ids | https://openalex.org/I1291425158 |
| authorships[2].affiliations[0].raw_affiliation_string | Google Research, Google, New York, USA |
| authorships[2].institutions[0].id | https://openalex.org/I1291425158 |
| authorships[2].institutions[0].ror | https://ror.org/00njsd438 |
| authorships[2].institutions[0].type | company |
| authorships[2].institutions[0].lineage | https://openalex.org/I1291425158, https://openalex.org/I4210128969 |
| authorships[2].institutions[0].country_code | US |
| authorships[2].institutions[0].display_name | Google (United States) |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Omar Benjelloun |
| authorships[2].is_corresponding | False |
| authorships[2].raw_affiliation_strings | Google Research, Google, New York, USA |
| authorships[3].author.id | https://openalex.org/A5102900861 |
| authorships[3].author.orcid | https://orcid.org/0000-0003-4561-2097 |
| authorships[3].author.display_name | Ying Sheng |
| authorships[3].countries | US |
| authorships[3].affiliations[0].institution_ids | https://openalex.org/I1291425158 |
| authorships[3].affiliations[0].raw_affiliation_string | Google Research, Google, New York, USA |
| authorships[3].institutions[0].id | https://openalex.org/I1291425158 |
| authorships[3].institutions[0].ror | https://ror.org/00njsd438 |
| authorships[3].institutions[0].type | company |
| authorships[3].institutions[0].lineage | https://openalex.org/I1291425158, https://openalex.org/I4210128969 |
| authorships[3].institutions[0].country_code | US |
| authorships[3].institutions[0].display_name | Google (United States) |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Ying Sheng |
| authorships[3].is_corresponding | False |
| authorships[3].raw_affiliation_strings | Google Research, Google, New York, USA |
| authorships[4].author.id | https://openalex.org/A5041421536 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-7437-0624 |
| authorships[4].author.display_name | Natasha Noy |
| authorships[4].countries | US |
| authorships[4].affiliations[0].institution_ids | https://openalex.org/I1291425158 |
| authorships[4].affiliations[0].raw_affiliation_string | Google Research, Google, New York, USA |
| authorships[4].institutions[0].id | https://openalex.org/I1291425158 |
| authorships[4].institutions[0].ror | https://ror.org/00njsd438 |
| authorships[4].institutions[0].type | company |
| authorships[4].institutions[0].lineage | https://openalex.org/I1291425158, https://openalex.org/I4210128969 |
| authorships[4].institutions[0].country_code | US |
| authorships[4].institutions[0].display_name | Google (United States) |
| authorships[4].author_position | last |
| authorships[4].raw_author_name | Natasha Noy |
| authorships[4].is_corresponding | False |
| authorships[4].raw_affiliation_strings | Google Research, Google, New York, USA |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://link.springer.com/content/pdf/10.1007%2F978-3-030-88361-4_20.pdf |
| open_access.oa_status | hybrid |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Dataset or Not? A Study on the Veracity of Semantic Markup for Dataset Pages |
| has_fulltext | True |
| is_retracted | False |
| updated_date | 2025-11-06T03:46:38.306776 |
| primary_topic.id | https://openalex.org/T10028 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9995999932289124 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Topic Modeling |
| related_works | https://openalex.org/W2042562985, https://openalex.org/W183683573, https://openalex.org/W2561743279, https://openalex.org/W2139931245, https://openalex.org/W3021385460, https://openalex.org/W2477306097, https://openalex.org/W1566994962, https://openalex.org/W2178182010, https://openalex.org/W2362437884, https://openalex.org/W1559346900 |
| cited_by_count | 7 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 2 |
| counts_by_year[1].year | 2024 |
| counts_by_year[1].cited_by_count | 2 |
| counts_by_year[2].year | 2023 |
| counts_by_year[2].cited_by_count | 2 |
| counts_by_year[3].year | 2021 |
| counts_by_year[3].cited_by_count | 1 |
| locations_count | 1 |
| best_oa_location.id | doi:10.1007/978-3-030-88361-4_20 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S106296714 |
| best_oa_location.source.issn | 0302-9743, 1611-3349 |
| best_oa_location.source.type | book series |
| best_oa_location.source.is_oa | False |
| best_oa_location.source.issn_l | 0302-9743 |
| best_oa_location.source.is_core | True |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | Lecture notes in computer science |
| best_oa_location.source.host_organization | https://openalex.org/P4310319900 |
| best_oa_location.source.host_organization_name | Springer Science+Business Media |
| best_oa_location.source.host_organization_lineage | https://openalex.org/P4310319900, https://openalex.org/P4310319965 |
| best_oa_location.source.host_organization_lineage_names | Springer Science+Business Media, Springer Nature |
| best_oa_location.license | cc-by |
| best_oa_location.pdf_url | https://link.springer.com/content/pdf/10.1007%2F978-3-030-88361-4_20.pdf |
| best_oa_location.version | publishedVersion |
| best_oa_location.raw_type | book-chapter |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by |
| best_oa_location.is_accepted | True |
| best_oa_location.is_published | True |
| best_oa_location.raw_source_name | Lecture Notes in Computer Science |
| best_oa_location.landing_page_url | https://doi.org/10.1007/978-3-030-88361-4_20 |
| primary_location.id | doi:10.1007/978-3-030-88361-4_20 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S106296714 |
| primary_location.source.issn | 0302-9743, 1611-3349 |
| primary_location.source.type | book series |
| primary_location.source.is_oa | False |
| primary_location.source.issn_l | 0302-9743 |
| primary_location.source.is_core | True |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | Lecture notes in computer science |
| primary_location.source.host_organization | https://openalex.org/P4310319900 |
| primary_location.source.host_organization_name | Springer Science+Business Media |
| primary_location.source.host_organization_lineage | https://openalex.org/P4310319900, https://openalex.org/P4310319965 |
| primary_location.source.host_organization_lineage_names | Springer Science+Business Media, Springer Nature |
| primary_location.license | cc-by |
| primary_location.pdf_url | https://link.springer.com/content/pdf/10.1007%2F978-3-030-88361-4_20.pdf |
| primary_location.version | publishedVersion |
| primary_location.raw_type | book-chapter |
| primary_location.license_id | https://openalex.org/licenses/cc-by |
| primary_location.is_accepted | True |
| primary_location.is_published | True |
| primary_location.raw_source_name | Lecture Notes in Computer Science |
| primary_location.landing_page_url | https://doi.org/10.1007/978-3-030-88361-4_20 |
| publication_date | 2021-01-01 |
| publication_year | 2021 |
| referenced_works | https://openalex.org/W2152805927, https://openalex.org/W3108941002, https://openalex.org/W2140730091, https://openalex.org/W2969723769, https://openalex.org/W2950860947, https://openalex.org/W2166601255, https://openalex.org/W2196674927, https://openalex.org/W2055886766, https://openalex.org/W2914479823, https://openalex.org/W2610332888, https://openalex.org/W2084975492, https://openalex.org/W3081176230, https://openalex.org/W2160024037, https://openalex.org/W4255399413, https://openalex.org/W2926805670, https://openalex.org/W2166706824, https://openalex.org/W2059586463, https://openalex.org/W2157583440, https://openalex.org/W2982150889, https://openalex.org/W2799037506, https://openalex.org/W3006502835, https://openalex.org/W2964091842, https://openalex.org/W3099461227, https://openalex.org/W3102654612, https://openalex.org/W109245773, https://openalex.org/W2971262600 |
| referenced_works_count | 26 |
| abstract_inverted_index., | 5 |
| abstract_inverted_index.a | 15, 25, 63, 144, 158, 167, 172 |
| abstract_inverted_index.In | 47 |
| abstract_inverted_index.We | 120, 141 |
| abstract_inverted_index.as | 4 |
| abstract_inverted_index.at | 180 |
| abstract_inverted_index.by | 156 |
| abstract_inverted_index.do | 115 |
| abstract_inverted_index.in | 23, 37, 58, 196 |
| abstract_inverted_index.is | 21, 138, 171 |
| abstract_inverted_index.of | 28, 61, 109, 124, 151, 187 |
| abstract_inverted_index.on | 8, 42, 52, 69, 78 |
| abstract_inverted_index.or | 165 |
| abstract_inverted_index.to | 11, 34, 40, 44, 79, 101, 146, 192, 200, 204, 209 |
| abstract_inverted_index.we | 50, 95, 99 |
| abstract_inverted_index.61% | 108 |
| abstract_inverted_index.95% | 182 |
| abstract_inverted_index.Our | 175 |
| abstract_inverted_index.Web | 10 |
| abstract_inverted_index.and | 132, 199 |
| abstract_inverted_index.for | 55, 67, 91, 127 |
| abstract_inverted_index.not | 116, 139, 166 |
| abstract_inverted_index.the | 9, 59, 70, 87, 103, 122, 149, 152, 181, 194, 202 |
| abstract_inverted_index.use | 201 |
| abstract_inverted_index.was | 86 |
| abstract_inverted_index.way | 145 |
| abstract_inverted_index.This | 19, 185 |
| abstract_inverted_index.Web, | 71 |
| abstract_inverted_index.also | 96 |
| abstract_inverted_index.core | 88 |
| abstract_inverted_index.deep | 159 |
| abstract_inverted_index.from | 30, 107 |
| abstract_inverted_index.high | 206 |
| abstract_inverted_index.many | 45 |
| abstract_inverted_index.need | 100 |
| abstract_inverted_index.page | 168 |
| abstract_inverted_index.rich | 35 |
| abstract_inverted_index.such | 3 |
| abstract_inverted_index.that | 82, 98, 112, 162 |
| abstract_inverted_index.then | 142 |
| abstract_inverted_index.this | 48, 92, 136 |
| abstract_inverted_index.with | 169 |
| abstract_inverted_index.96.7% | 178 |
| abstract_inverted_index.While | 85 |
| abstract_inverted_index.broad | 26 |
| abstract_inverted_index.focus | 51 |
| abstract_inverted_index.hosts | 111 |
| abstract_inverted_index.level | 186 |
| abstract_inverted_index.noise | 195 |
| abstract_inverted_index.page. | 174 |
| abstract_inverted_index.pages | 81, 106, 134 |
| abstract_inverted_index.range | 27 |
| abstract_inverted_index.using | 14 |
| abstract_inverted_index.where | 135 |
| abstract_inverted_index.Search | 76, 191 |
| abstract_inverted_index.allows | 6 |
| abstract_inverted_index.corpus | 131, 155 |
| abstract_inverted_index.engine | 66 |
| abstract_inverted_index.markup | 20, 54, 114, 126, 137, 170, 198 |
| abstract_inverted_index.paper, | 49 |
| abstract_inverted_index.point. | 184 |
| abstract_inverted_index.recall | 179 |
| abstract_inverted_index.relies | 77 |
| abstract_inverted_index.search | 32, 38, 65 |
| abstract_inverted_index.shared | 16 |
| abstract_inverted_index.users. | 210 |
| abstract_inverted_index.Dataset | 73, 75, 128, 190 |
| abstract_inverted_index.Search. | 74 |
| abstract_inverted_index.actions | 41 |
| abstract_inverted_index.address | 102 |
| abstract_inverted_index.analyze | 121 |
| abstract_inverted_index.content | 13 |
| abstract_inverted_index.context | 60 |
| abstract_inverted_index.dataset | 125, 153, 173 |
| abstract_inverted_index.emails, | 43 |
| abstract_inverted_index.enables | 189 |
| abstract_inverted_index.markup, | 2 |
| abstract_inverted_index.others. | 46 |
| abstract_inverted_index.propose | 143 |
| abstract_inverted_index.provide | 113, 205 |
| abstract_inverted_index.quality | 150, 207 |
| abstract_inverted_index.results | 208 |
| abstract_inverted_index.search, | 94 |
| abstract_inverted_index.whether | 164 |
| abstract_inverted_index.Abstract | 0 |
| abstract_inverted_index.Semantic | 1 |
| abstract_inverted_index.achieves | 177 |
| abstract_inverted_index.actually | 117 |
| abstract_inverted_index.datasets | 68 |
| abstract_inverted_index.describe | 12, 83, 118 |
| abstract_inverted_index.enabling | 24, 89 |
| abstract_inverted_index.engines, | 33 |
| abstract_inverted_index.identify | 80 |
| abstract_inverted_index.increase | 148 |
| abstract_inverted_index.internet | 110 |
| abstract_inverted_index.metadata | 154, 203 |
| abstract_inverted_index.problem: | 105 |
| abstract_inverted_index.results, | 39 |
| abstract_inverted_index.semantic | 53, 197 |
| abstract_inverted_index.snippets | 36 |
| abstract_inverted_index.veracity | 123 |
| abstract_inverted_index.vertical | 31, 64, 93 |
| abstract_inverted_index.Web-scale | 130 |
| abstract_inverted_index.datasets, | 56 |
| abstract_inverted_index.datasets. | 84, 119 |
| abstract_inverted_index.following | 104 |
| abstract_inverted_index.precision | 183, 188 |
| abstract_inverted_index.providers | 7 |
| abstract_inverted_index.reliable. | 140 |
| abstract_inverted_index.Google’s | 72 |
| abstract_inverted_index.Search’s | 129 |
| abstract_inverted_index.categorize | 133 |
| abstract_inverted_index.circumvent | 193 |
| abstract_inverted_index.classifier | 161, 176 |
| abstract_inverted_index.controlled | 17 |
| abstract_inverted_index.developing | 62, 157 |
| abstract_inverted_index.discovered | 97 |
| abstract_inverted_index.identifies | 163 |
| abstract_inverted_index.invaluable | 22 |
| abstract_inverted_index.technology | 90 |
| abstract_inverted_index.drastically | 147 |
| abstract_inverted_index.vocabulary. | 18 |
| abstract_inverted_index.specifically | 57 |
| abstract_inverted_index.applications, | 29 |
| abstract_inverted_index.neural-network | 160 |
| cited_by_percentile_year.max | 97 |
| cited_by_percentile_year.min | 89 |
| countries_distinct_count | 1 |
| institutions_distinct_count | 5 |
| citation_normalized_percentile.value | 0.86728811 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | False |
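The `abstract_inverted_index.*` rows in the payload above store the abstract as a map from each token to the word positions where it occurs. A minimal sketch of turning such a map back into running text follows. It assumes the index has been loaded as a Python dict of token to position list; the helper name `rebuild_abstract` is hypothetical, and the sample holds only the first eight positions copied from the rows above.

```python
from typing import Dict, List


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Rebuild text from a token -> positions map by ordering tokens by position."""
    position_to_token: Dict[int, str] = {}
    for token, positions in inverted_index.items():
        for position in positions:
            position_to_token[position] = token
    # Join tokens in ascending position order, separated by single spaces.
    return " ".join(position_to_token[p] for p in sorted(position_to_token))


# First eight positions, copied from the abstract_inverted_index rows above.
sample_index = {
    "Abstract": [0],
    "Semantic": [1],
    "markup,": [2],
    "such": [3],
    "as": [4],
    ",": [5],
    "allows": [6],
    "providers": [7],
}
print(rebuild_abstract(sample_index))
# Abstract Semantic markup, such as , allows providers
```

The lone "," at position 5 shows that a token is missing from the indexed abstract itself, so a reconstruction from this index inherits the same gap.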