Rogue Scholar

Published June 8, 2020

These are simply notes to myself about taxonomic classifications in Wikidata. Classifications in Wikidata can be complex and are often not trees. For example, tracing the parents of the frog family Leptodactylidae back gives a graph rather than a single path. In that graph, each oval represents a taxon in Wikidata, and each arrow connects a taxon to its parent(s) in Wikidata.
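A minimal sketch of what "tracing the parents back" might look like when a taxon can have more than one parent. The parent links below are made-up placeholders (`taxon_a`, `taxon_b`, `root`), not data taken from Wikidata, and the function names are hypothetical.

```python
from collections import deque

# Hypothetical parent links for illustration only; these are NOT real Wikidata edges.
PARENTS = {
    "Leptodactylidae": ["taxon_a", "taxon_b"],  # two parents, so not a tree
    "taxon_a": ["taxon_b"],
    "taxon_b": ["root"],
    "root": [],
}


def trace_parents(taxon, parents):
    """Collect every (child, parent) edge reachable by following parent links."""
    edges = set()
    seen = {taxon}
    queue = deque([taxon])
    while queue:
        child = queue.popleft()
        for parent in parents.get(child, []):
            edges.add((child, parent))
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return edges


def taxa_with_multiple_parents(edges):
    """Return the taxa that have more than one parent among the given edges."""
    parent_sets = {}
    for child, parent in edges:
        parent_sets.setdefault(child, set()).add(parent)
    return sorted(t for t, ps in parent_sets.items() if len(ps) > 1)


edges = trace_parents("Leptodactylidae", PARENTS)
multi = taxa_with_multiple_parents(edges)
```

If `multi` is non-empty, the traced classification is a graph with at least one taxon that has several parents, so it cannot be drawn as a simple tree.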

Classification, D3js, RDF, SPARQL, Wikidata, Computer and Information Sciences

Displaying taxonomic classifications from Wikidata using d3js and SPARQL

https://doi.org/10.59350/99cvt-94e65

Published January 14, 2017

Author Roderic Page

Following on from previous posts "The Semantic Web made fun: d3sparql" and "The Biodiversity Heritage Library meets Wikidata via Wikispecies: adding author identifiers to BioStor", I've put together an example query that can be used to extract a taxonomic classification from Wikidata.
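The query from that post is not reproduced in this listing, so the following is only a sketch of what such a query could look like, not the author's query. It assumes the Wikidata properties P225 (taxon name) and P171 (parent taxon) and the `wdt:` prefix that the Wikidata Query Service predefines; the function name `build_classification_query` is hypothetical.

```python
def build_classification_query(taxon_name):
    """Build a SPARQL query string listing child/parent pairs above a named taxon.

    Assumes Wikidata's P225 (taxon name) and P171 (parent taxon) properties.
    """
    # Escape backslashes and double quotes so the name is a valid SPARQL string literal.
    safe_name = taxon_name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?child ?childName ?parent ?parentName WHERE {\n"
        f'  ?start wdt:P225 "{safe_name}" .\n'
        "  ?start wdt:P171* ?child .\n"   # the start taxon and all of its ancestors
        "  ?child wdt:P171 ?parent .\n"   # each direct child-to-parent link
        "  ?child wdt:P225 ?childName .\n"
        "  ?parent wdt:P225 ?parentName .\n"
        "}"
    )


query = build_classification_query("Leptodactylidae")
```

The resulting string could be pasted into the Wikidata Query Service. Because a taxon may have several `P171` values, the rows describe a graph of child-parent links, not necessarily a tree.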

Classification, Error, GBIF, Liverwort, Computer and Information Sciences

GBIF liverwort taxonomy broken

https://doi.org/10.59350/cq66n-7mc69

Published March 3, 2014

Author Roderic Page

A quick note to myself to document a problem with the GBIF classification of liverworts (I've created issue POR-1879 for this). While building a new tool to browse GBIF data I ran into a problem that the taxon "Jungermanniales" popped up in two different places in the GBIF classification, which broke a graphical display widget I was using. If you search GBIF for Jungermanniales you get two results, both listed as "accepted": Based on Wikipedia

Bats, Classification, Cluster Maps, Data Cleaning, GBIF, Computer and Information Sciences

Cluster maps, papaya plots, and the trouble with GBIF taxonomy

https://doi.org/10.59350/dq1cv-szd96

Published August 14, 2013

Author Roderic Page

Continuing the theme of the failings of the GBIF classification I've been playing further with cluster maps to visualise the problem (see this earlier post for an introduction). Browsing through bats in GBIF I keep finding the same species appearing more than once, albeit in different genera.

Classification, Cluster Maps, Demansia, EOL, Visualisation, Computer and Information Sciences

Visualising differences between classifications using cluster maps

https://doi.org/10.59350/snnd5-9pa62

Published June 13, 2012

Author Roderic Page

As part of a project to build a tool to navigate through taxonomic names and classifications I've become interested in quick ways to compare classifications.

BioStor, Classification, Data Cleaning, Error, GBIF, Computer and Information Sciences

The GBIF classification is broken — how do we fix it?

https://doi.org/10.59350/5a5re-kp839

Published May 30, 2012

Author Roderic Page

This post arose from an ongoing email conversation with Tony Rees about extracting and annotating taxonomic names. In BioStor I use the GBIF classification to display the taxonomic names found in the OCR text in the form of a tree.

Classification, Gregg's Paradox, Taxonomy, Wikipedia, Computer and Information Sciences

Wikipedia and Gregg's paradox

https://doi.org/10.59350/6bbcj-xf875

Published October 6, 2009

Author Roderic Page

Continuing the theme of taxonomic classification in Wikipedia, I'm perversely delighted that Wikipedia demonstrates Gregg's paradox so nicely. The late John R. Gregg wrote several papers and a book exploring the logical structure of taxonomy.

Classification, Wikipedia, Computer and Information Sciences

Wikipedia's taxonomic classification is badly broken

https://doi.org/10.59350/vxhjg-y5c77

Published October 5, 2009

Author Roderic Page

Wikipedia is wonderful, but parts of it are horribly broken. Take, for example, taxonomic classifications. A classification is a rooted tree, which means that each node in the tree has a single parent. We can store trees in databases in a variety of ways. For example, for each node we could store a list of its children, or we could store the single unique parent of each node. Ideally we'd choose to store one or other, but not both.
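To make the two storage options concrete, here is a small sketch with hypothetical node names and helper functions. It stores a tree as parent pointers, derives the child lists from them, and shows how independently maintained child lists can give a node more than one parent, which is the kind of inconsistency that arises when both forms are stored.

```python
def children_from_parents(parent_of):
    """Derive child lists from a parent-pointer table (the root's parent is None)."""
    children = {}
    for node, parent in parent_of.items():
        children.setdefault(node, [])
        if parent is not None:
            children.setdefault(parent, []).append(node)
    return children


def nodes_with_multiple_parents(children_of):
    """Return nodes listed as a child of more than one parent."""
    parents = {}
    for parent, kids in children_of.items():
        for kid in kids:
            parents.setdefault(kid, set()).add(parent)
    return sorted(k for k, ps in parents.items() if len(ps) > 1)


# Parent pointers: each node has exactly one parent by construction.
parent_of = {"root": None, "a": "root", "b": "root", "c": "a"}
derived_children = children_from_parents(parent_of)

# Child lists edited independently can break the single-parent rule.
edited_children = {"root": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}
conflicts = nodes_with_multiple_parents(edited_children)
```

With parent pointers alone, a node cannot have two parents because there is only one slot to fill. Child lists carry no such guarantee, so `conflicts` flags node `c` in the edited example.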

Classification, Mammal Species Of The World, Mammals, MSW, Wikipedia, Computer and Information Sciences

Comparing Wikipedia and Mammal Species of the World classifications

https://doi.org/10.59350/b679a-wjz41

Published August 31, 2009

Author Roderic Page

Continuing the saga of making sense of the mammal classification in Wikipedia, I've done a quick comparison with the Mammal Species of the World (third edition) classification. MSW is the default taxonomic reference used by WikiProject Mammals.

Classification, Mammals, Visualisation, Wikipedia, Computer and Information Sciences

Mammal tree from Wikipedia

https://doi.org/10.59350/qj5rg-hmk44

Published August 29, 2009

Author Roderic Page

Following on from my previous post about visualising the mammalian classification in Wikipedia, I've extracted the largest component from the graph for all mammal taxa in Wikipedia, and it is a tree. This wasn't apparent in the previous diagram, where the component appeared as a big ball due to the layout algorithm used.
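One way to check a claim like "the largest component is a tree" is sketched below. This is an assumption about the approach, not the method used in the post: it treats the classification as an undirected graph, finds the largest connected component by breadth-first search, and then tests whether that component has exactly one fewer edge than it has nodes. The edge list is a toy example.

```python
from collections import defaultdict, deque


def largest_component(edges):
    """Return the node set of the largest connected component (undirected)."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = set()
    best = set()
    for start in adjacency:
        if start in seen:
            continue
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def is_tree(nodes, edges):
    """A connected component is a tree if it has exactly len(nodes) - 1 distinct edges."""
    inside = {frozenset(e) for e in edges if e[0] in nodes and e[1] in nodes}
    return len(inside) == len(nodes) - 1


# Toy edge list: a four-node chain plus a separate three-node cycle.
toy_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("x", "y"), ("y", "z"), ("z", "x")]
component = largest_component(toy_edges)
component_is_tree = is_tree(component, toy_edges)
```

The edge-count test only works because the component is connected by construction. A layout that draws the component as a ball can hide this structure, while the count makes it explicit.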

iPhylo

Towards visualising classifications from Wikidata
