Roderic Page
Make Data Count Kaggle Competition
I've written several times here about the Make Data Count project and its major output to date, the Data Citation Corpus, currently at version 4 (see The fourth release of the Data Citation Corpus incorporates data citations from Europe PM…
How many times are DNA barcoding datasets cited?
This note accompanies a dataset that I uploaded to Zenodo (https://doi.org/10.5281/zenodo.15824274). My goal in creating this dataset is to link data created on the Barcode of Life Data Systems to the DOIs for those datasets, and then to l…
A metabarcoding mess and the importance of just looking at the data
Here I summarise a few posts on Bluesky where I raised concerns about some metabarcoding datasets that were highlighted by GBIF. Looking at these datasets it's clear that something is wrong. Data: The datasets discussed are for CO1 Amplic…
Tracking changes in DNA barcode BINs
Following on from releasing BOLD View I've started to explore how the classification of DNA barcodes changes over time. BOLD uses the RESL algorithm described in Ratnasingham & Hebert (2013, 2016) to cluster barcodes into "BINs". As the…
Future interfaces for the Biodiversity Heritage Library
On Wednesday this week (April 9th, 2025) I gave a talk entitled "Future interface(s) for BHL" (the slides are on FigShare) at BHL Day 2025.
BOLD View: exploring DNA barcodes
For a while now I've been exploring ways to navigate through DNA barcodes. Over the years I've built various "toys" to explore barcodes, such as Displaying a million DNA barcodes on Google Maps using CouchDB, built a small scale browser us…
Internet Archive as a single point of failure
How to cite: Page, R. (2024). Internet Archive as a single point of failure https://doi.org/10.59350/1r3m1-c5e22 Just a placeholder to mark the ongoing impact of the Internet Archive being attacked (see here, here and here for details). …
Exploring BOLD's DNA barcode data releases: there's a fraction too much friction
How to cite: Page, R. (2024). Exploring BOLD's DNA barcode data releases: there's a fraction too much friction https://doi.org/10.59350/6qepn-ge510 Recently I've been exploring data downloaded from BOLD. Part of this was motivated by wor…
The Data Citation Corpus revisited
How to cite: Page, R. (2024). The Data Citation Corpus revisited https://doi.org/10.59350/wvwva-v7125 TL;DR These are some brief notes on the latest version (v. 2) of the Data Citation Corpus, released shortly before the Make Data Count S…
Who is Doing Taxonomy, Whereabouts, and Who Is Funding Them? A Practical Test of What Knowledge Graphs Can Tell Us about Taxonomic Research
What is the current state of taxonomy? Quentin Wheeler on his podcast "Species Hall of Fame" fears for taxonomy's future, whereas Lucas Joppa and colleagues have famously argued that we've never had so many taxonomists as we do now (Joppa …
Why do museum and gallery displays ignore the web?
How to cite: Page, R. (2024). Why do museum and gallery displays ignore the web? https://doi.org/10.59350/a83tn-c6t14 This post is inspired by the Pharaoh exhibition at the NGV in Melbourne, Australia. This is a beautifully displayed exh…
A future for the Biodiversity Heritage Library
How to cite: Page, R. (2024). A future for the Biodiversity Heritage Library https://doi.org/10.59350/n3dkt-6xd05 Following the 2024 BHL meeting, and the departure of Martin Kalfatovic and the uncertainty the departure of such a pivotal …
Visualising big trees: a talk at the Systematics Association 2024
How to cite: Page, R. (2024). Visualising big trees: a talk at the Systematics Association 2024 https://doi.org/10.59350/cf6n4-ch767 This blog post has some notes in support of a talk given to the Systematics Association meeting in Readi…
Nanopubs, a way to create even more silos
How to cite: Page, R. (2024). Nanopubs, a way to create even more silos https://doi.org/10.59350/6nj85-7te92 Pensoft have recently introduced "nanopubs", small structured publications that can be thought of as containing the minimum poss…
Notes on transforming BHL images
How to cite: Page, R. (2024). Notes on transforming BHL images https://doi.org/10.59350/2gpbb-98a53 I've been down this road before, e.g. BHL, DjVu, and reading the f*cking manual and Demo of full-text indexing of BHL using CouchDB hoste…
Hugging Face Autotrain
How to cite: Page, R. (2024). Hugging Face Autotrain https://doi.org/10.59350/7p1n4-wdv84 These are notes to myself on using Hugging Face AutoTrain. The first version of this had a very nice interface where you could simply upload a fold…
Problems with the DataCite Data Citation Corpus
How to cite: Page, R. (2024). Problems with the DataCite Data Citation Corpus https://doi.org/10.59350/t80g1-xys37 DataCite have released the Data Citation Corpus, together with a dashboard that summarises the corpus. This is billed as: …
It's 2023 - why are we still not sharing phylogenies?
How to cite: Page, R. (2023). It's 2023 - why are we still not sharing phylogenies? https://doi.org/10.59350/n681n-syx67 A quick note to support a recent Twitter thread https://twitter.com/rdmpage/status/1729816558866718796?s=61&t=nM…
Where are the plant type specimens? Mapping JSTOR Global Plants to GBIF
How to cite: Page, R. (2023). Where are the plant type specimens? Mapping JSTOR Global Plants to GBIF. https://doi.org/10.59350/m59qn-22v52 This blog post documents my attempts to create links between two major resources for plant taxono…
JSTOR plant type specimens linked to GBIF occurrences
A mapping between URLs for type specimens in JSTOR Global Plants and the corresponding occurrence in the Global Biodiversity Information Facility (GBIF). Guide to fields: doi: JSTOR identifier; code: Barcode; gbif: GBIF occurrence id; occurr…
Ten years and a million links: building a global taxonomic library connecting persistent identifiers for names, publications and people
A major gap in the biodiversity knowledge graph is a connection between taxonomic names and the taxonomic literature. While both names and publications often have persistent identifiers (PIDs), such as Life Science Identifiers (LSIDs) or D…
Ten Years and a Million Links: Building a global taxonomic library connecting persistent identifiers for names (LSIDs), publications (DOIs), and people (ORCIDs)
One thing the field of biodiversity informatics has been very good at is creating databases. However, this success in creation has not been matched by equivalent success in creating deep links between records in those databases. Instead, w…
Document layout analysis
How to cite: Page, R. (2023). Document layout analysis. https://doi.org/10.59350/z574z-dcw92 Some notes to self on document layout analysis. I'm revisiting the problem of taking a PDF or a scanned document and determining its structure (…
The problem with GBIF's Phylogeny Explorer
How to cite: Page, R. (2023). The problem with GBIF's Phylogeny Explorer. https://doi.org/10.59350/v0bt3-zp114 GBIF recently released the Phylogeny Explorer, using legumes as an example dataset. The goal is to enable users to "view occu…
Sub-second searching of millions of DNA barcodes using a vector database
How to cite: Page, R. (2023). Sub-second searching of millions of DNA barcodes using a vector database. https://doi.org/10.59350/qkn8x-mgz20 Recently I've been messing about with DNA barcodes.
What, if anything, is the Biodiversity Knowledge Hub?
How to cite: Page, R. (2023). What, if anything, is the Biodiversity Knowledge Hub?
Adventures in machine learning: iNaturalist, DNA barcodes, and Lepidoptera
How to cite: Page, R. (2023). Adventures in machine learning: iNaturalist, DNA barcodes, and Lepidoptera. https://doi.org/10.59350/5q854-j4s23 Recently I've been working with a masters student, Maja Nagler, on a project using machine lea…