ekzhu/datasketch: Introduce Lean MinHash and better documentation
Eric Y. Zhu, Vadim Markovtsev, fpug
2017 · Open Access · DOI: https://doi.org/10.5281/zenodo.399063
LeanMinHash is a subclass of MinHash. It uses less memory and allows faster (de)serialization; see the documentation for details. The serialize, deserialize, and bytesize methods have been removed from MinHash and are now supported in LeanMinHash instead. MinHash objects serialized before this version will not deserialize properly. To migrate, see here. The documentation now has its own website!
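The memory and (de)serialization claim can be illustrated with a small standard-library sketch. It is not datasketch's implementation: the class `ToyLeanMinHash`, the helper `toy_signature`, and the byte layout (an 8-byte seed, a 4-byte count, then 4-byte hash values) are assumptions made for illustration only. The sketch assumes the saving comes from keeping just the seed and the minimum hash values, so the object packs into a fixed-size buffer.

```python
import hashlib
import struct


def toy_signature(tokens, num_perm=4, seed=1):
    """Hypothetical helper: one minimum 32-bit hash per simulated permutation."""
    values = []
    for i in range(num_perm):
        # Salting the hash with (seed, i) stands in for the i-th permutation.
        salt = struct.pack("<qi", seed, i)
        values.append(min(
            struct.unpack("<I", hashlib.sha1(salt + token).digest()[:4])[0]
            for token in tokens
        ))
    return values


class ToyLeanMinHash:
    """Illustrative only: stores a seed and hash values, nothing else."""

    _HEADER = "<qi"  # assumed layout: 8-byte seed, 4-byte value count

    def __init__(self, seed, hashvalues):
        self.seed = seed
        self.hashvalues = list(hashvalues)

    def bytesize(self):
        # Header plus 4 bytes for each stored hash value.
        return struct.calcsize(self._HEADER) + 4 * len(self.hashvalues)

    def serialize(self, buf):
        # Pack the header and all hash values into a caller-supplied buffer.
        n = len(self.hashvalues)
        fmt = "%s%dI" % (self._HEADER, n)
        struct.pack_into(fmt, buf, 0, self.seed, n, *self.hashvalues)

    @classmethod
    def deserialize(cls, buf):
        # Read the header first to learn how many hash values follow.
        seed, n = struct.unpack_from(cls._HEADER, buf, 0)
        offset = struct.calcsize(cls._HEADER)
        values = struct.unpack_from("<%dI" % n, buf, offset)
        return cls(seed, values)


lean = ToyLeanMinHash(seed=1, hashvalues=toy_signature([b"min", b"hash"]))
buf = bytearray(lean.bytesize())
lean.serialize(buf)
restored = ToyLeanMinHash.deserialize(buf)
```

Here `restored` carries the same seed and hash values as `lean`. Because a byte layout like this is fixed by the code that wrote it, a change of layout between versions is one plausible reason why previously serialized objects would no longer load.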
Metadata
- Type: article
- Language: en
- Landing Page: https://zenodo.org/record/399063
- OA Status: green
- Related Works: 20
- OpenAlex ID: https://openalex.org/W3208189306
All OpenAlex metadata
- OpenAlex ID: https://openalex.org/W3208189306 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.5281/zenodo.399063 (Digital Object Identifier)
- Title: ekzhu/datasketch: Introduce Lean MinHash and better documentation (work title)
- Type: article (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2017
- Publication date: 2017-03-15
- Authors: Eric Y. Zhu, Vadim Markovtsev, fpug (in order)
- Landing page: https://zenodo.org/record/399063 (publisher landing page)
- Open access: Yes (a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://zenodo.org/record/399063 (direct OA link)
- Concepts: Documentation, Computer science, Programming language (top concepts attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 20 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W3208189306 |
| doi | https://doi.org/10.5281/zenodo.399063 |
| ids.doi | https://doi.org/10.5281/zenodo.399063 |
| ids.mag | 3208189306 |
| ids.openalex | https://openalex.org/W3208189306 |
| fwci | |
| type | article |
| title | ekzhu/datasketch: Introduce Lean MinHash and better documentation |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11719 |
| topics[0].field.id | https://openalex.org/fields/18 |
| topics[0].field.display_name | Decision Sciences |
| topics[0].score | 0.06909999996423721 |
| topics[0].domain.id | https://openalex.org/domains/2 |
| topics[0].domain.display_name | Social Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1803 |
| topics[0].subfield.display_name | Management Science and Operations Research |
| topics[0].display_name | Data Quality and Management |
| topics[1].id | https://openalex.org/T10538 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.06729999929666519 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1710 |
| topics[1].subfield.display_name | Information Systems |
| topics[1].display_name | Data Mining Algorithms and Applications |
| topics[2].id | https://openalex.org/T11891 |
| topics[2].field.id | https://openalex.org/fields/14 |
| topics[2].field.display_name | Business, Management and Accounting |
| topics[2].score | 0.06700000166893005 |
| topics[2].domain.id | https://openalex.org/domains/2 |
| topics[2].domain.display_name | Social Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1404 |
| topics[2].subfield.display_name | Management Information Systems |
| topics[2].display_name | Big Data and Business Intelligence |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C56666940 |
| concepts[0].level | 2 |
| concepts[0].score | 0.7757523059844971 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q788790 |
| concepts[0].display_name | Documentation |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.37327057123184204 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C199360897 |
| concepts[2].level | 1 |
| concepts[2].score | 0.062037259340286255 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q9143 |
| concepts[2].display_name | Programming language |
| keywords[0].id | https://openalex.org/keywords/documentation |
| keywords[0].score | 0.7757523059844971 |
| keywords[0].display_name | Documentation |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.37327057123184204 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/programming-language |
| keywords[2].score | 0.062037259340286255 |
| keywords[2].display_name | Programming language |
| language | en |
| locations[0].id | pmh:oai:zenodo.org:399063 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400562 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | Zenodo (CERN European Organization for Nuclear Research) |
| locations[0].source.host_organization | https://openalex.org/I67311998 |
| locations[0].source.host_organization_name | European Organization for Nuclear Research |
| locations[0].source.host_organization_lineage | https://openalex.org/I67311998 |
| locations[0].license | other-oa |
| locations[0].pdf_url | |
| locations[0].version | submittedVersion |
| locations[0].raw_type | info:eu-repo/semantics/other |
| locations[0].license_id | https://openalex.org/licenses/other-oa |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | https://zenodo.org/record/399063 |
| locations[1].id | doi:10.5281/zenodo.399063 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400562 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | Zenodo (CERN European Organization for Nuclear Research) |
| locations[1].source.host_organization | https://openalex.org/I67311998 |
| locations[1].source.host_organization_name | European Organization for Nuclear Research |
| locations[1].source.host_organization_lineage | https://openalex.org/I67311998 |
| locations[1].license | other-oa |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/other-oa |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.5281/zenodo.399063 |
| indexed_in | datacite |
| authorships[0].author.id | https://openalex.org/A5111344861 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-6381-8809 |
| authorships[0].author.display_name | Eric Y. Zhu |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Eric Zhu |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5007074945 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Vadim Markovtsev |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Vadim Markovtsev |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5027271781 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | fpug |
| authorships[2].author_position | last |
| authorships[2].raw_author_name | fpug |
| authorships[2].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://zenodo.org/record/399063 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | ekzhu/datasketch: Introduce Lean MinHash and better documentation |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11719 |
| primary_topic.field.id | https://openalex.org/fields/18 |
| primary_topic.field.display_name | Decision Sciences |
| primary_topic.score | 0.06909999996423721 |
| primary_topic.domain.id | https://openalex.org/domains/2 |
| primary_topic.domain.display_name | Social Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1803 |
| primary_topic.subfield.display_name | Management Science and Operations Research |
| primary_topic.display_name | Data Quality and Management |
| related_works | https://openalex.org/W2259198476, https://openalex.org/W2188056638, https://openalex.org/W3015628834, https://openalex.org/W3211279986, https://openalex.org/W3022210390, https://openalex.org/W174240912, https://openalex.org/W2785620487, https://openalex.org/W3117131235, https://openalex.org/W2795358005, https://openalex.org/W2734468690, https://openalex.org/W2743740449, https://openalex.org/W2954633088, https://openalex.org/W3180501944, https://openalex.org/W1597388299, https://openalex.org/W3114110932, https://openalex.org/W2883640129, https://openalex.org/W1529592439, https://openalex.org/W2000047634, https://openalex.org/W2898724861, https://openalex.org/W2083795309 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:zenodo.org:399063 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400562 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | Zenodo (CERN European Organization for Nuclear Research) |
| best_oa_location.source.host_organization | https://openalex.org/I67311998 |
| best_oa_location.source.host_organization_name | European Organization for Nuclear Research |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I67311998 |
| best_oa_location.license | other-oa |
| best_oa_location.pdf_url | |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | info:eu-repo/semantics/other |
| best_oa_location.license_id | https://openalex.org/licenses/other-oa |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | https://zenodo.org/record/399063 |
| primary_location.id | pmh:oai:zenodo.org:399063 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400562 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | Zenodo (CERN European Organization for Nuclear Research) |
| primary_location.source.host_organization | https://openalex.org/I67311998 |
| primary_location.source.host_organization_name | European Organization for Nuclear Research |
| primary_location.source.host_organization_lineage | https://openalex.org/I67311998 |
| primary_location.license | other-oa |
| primary_location.pdf_url | |
| primary_location.version | submittedVersion |
| primary_location.raw_type | info:eu-repo/semantics/other |
| primary_location.license_id | https://openalex.org/licenses/other-oa |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | https://zenodo.org/record/399063 |
| publication_date | 2017-03-15 |
| publication_year | 2017 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 2 |
| abstract_inverted_index.It | 6 |
| abstract_inverted_index.To | 43 |
| abstract_inverted_index.be | 40 |
| abstract_inverted_index.in | 29 |
| abstract_inverted_index.is | 1 |
| abstract_inverted_index.of | 4 |
| abstract_inverted_index.See | 14 |
| abstract_inverted_index.and | 10, 21 |
| abstract_inverted_index.are | 27 |
| abstract_inverted_index.for | 16 |
| abstract_inverted_index.its | 50 |
| abstract_inverted_index.not | 39 |
| abstract_inverted_index.now | 48 |
| abstract_inverted_index.own | 51 |
| abstract_inverted_index.see | 45 |
| abstract_inverted_index.from | 24 |
| abstract_inverted_index.have | 49 |
| abstract_inverted_index.less | 8 |
| abstract_inverted_index.this | 36 |
| abstract_inverted_index.uses | 7 |
| abstract_inverted_index.will | 38 |
| abstract_inverted_index.These | 26 |
| abstract_inverted_index.here. | 46 |
| abstract_inverted_index.allows | 11 |
| abstract_inverted_index.before | 35 |
| abstract_inverted_index.faster | 12 |
| abstract_inverted_index.memory | 9 |
| abstract_inverted_index.Removed | 18 |
| abstract_inverted_index.methods | 23 |
| abstract_inverted_index.migrate | 44 |
| abstract_inverted_index.objects | 34 |
| abstract_inverted_index.version | 37 |
| abstract_inverted_index.details. | 17 |
| abstract_inverted_index.instead. | 31 |
| abstract_inverted_index.subclass | 3 |
| abstract_inverted_index.website! | 52 |
| abstract_inverted_index.properly. | 42 |
| abstract_inverted_index.supported | 28 |
| abstract_inverted_index.Serialized | 32 |
| abstract_inverted_index.deserialized | 41 |
| abstract_inverted_index.Documentation | 47 |
| abstract_inverted_index.documentation | 15 |
| abstract_inverted_index.(de)serialization. | 13 |
| abstract_inverted_index.<code>MinHash</code> | 33 |
| abstract_inverted_index.<code>MinHash</code>. | 5, 25 |
| abstract_inverted_index.<code>bytesize</code> | 22 |
| abstract_inverted_index.<code>serialize</code>, | 19 |
| abstract_inverted_index.<code>LeanMinHash</code> | 0, 30 |
| abstract_inverted_index.<code>deserialize</code>, | 20 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 3 |
| citation_normalized_percentile | |