Rants, raves (and occasionally considered opinions) on phyloinformatics, taxonomy, and biodiversity informatics. For more ranty and less considered opinions, see my Twitter feed. ISSN 2051-8188. Written content on this site is licensed under a Creative Commons Attribution 4.0 International license.

University of Glasgow

Roderic Page

Quick note to self, having stumbled on the Wikipedia page on transitive reduction. Given a graph like this: the transitive reduction is: Note that the original graph has an edge a -> d, but this is absent after the reduction because we can get from a to d via b (or c). What's the point?

Transitive reduction
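A minimal sketch of the idea in Python, assuming the graph is a directed acyclic graph stored as a dictionary of successor sets. The example graph is hypothetical (the original figures are not reproduced here); it is only chosen so that the edge a -> d is implied by the paths through b and c.

```python
def reachable(graph, start):
    """Return every node reachable from `start` by following one or more edges."""
    seen = set()
    stack = list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def transitive_reduction(graph):
    """Drop each edge u -> v when v can still be reached from u via another successor.

    Assumes `graph` is acyclic and has the form {node: set_of_successors}.
    """
    reduced = {}
    for u, successors in graph.items():
        keep = set(successors)
        for w in successors:
            # Anything reachable from w is already implied by the edge u -> w.
            keep -= reachable(graph, w)
        reduced[u] = keep
    return reduced


# Hypothetical example: a -> d is redundant because of a -> b -> d (and a -> c -> d).
graph = {"a": {"b", "c", "d"}, "b": {"d"}, "c": {"d"}, "d": set()}
reduced = transitive_reduction(graph)
```

In this sketch `reduced["a"]` keeps only b and c, while the edges b -> d and c -> d survive unchanged.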

No, not taxonomy the discipline (although I've given a talk asking this question), but taxonomy.zoology.gla.ac.uk, my long-running web server hosting such venerable software projects as TreeView, NDE, and GeneTree, along with my home page. A series of power cuts in my building while I was away finally did for my ancient Sun Sparcstation5, running the CERN web server (yes, it's that old). I can remember the thrill (mixed with mild terror)

Taxonomy is dead, long live taxonomy

The first pPod workshop happened last month at NESCent, and some of the presentations are online on the pPod Wiki. Although I'm a "consultant" I couldn't be there, which is a pity because it looks to have been an interesting meeting. When pPod was first announced I blogged some of my own thoughts on phylogenetics databases.

processing PhylOData (pPOD)

The latest version of David and Wayne Maddison's Cartographer module for their program Mesquite can export KML files for Google Earth. They graciously acknowledge my crude efforts in this direction, and Bill Piel's work -- he really started this whole thing rolling. So, those of you inspired to try your hand at Google Earth trees, and who were frustrated by the lack of tools, should grab a copy of Mesquite and take it for a spin.

Mesquite does Google Earth files

In an earlier post I described the TBMap database (doi:10.1186/1471-2105-8-158), which contains a mapping of TreeBASE taxon names onto names in other databases. While this is one step towards making it easier to query TreeBASE, what I'd really like is to link the data in TreeBASE to sources such as GenBank and specimen databases.

Matching names in phylogeny data files

I would prefer to avoid Microsoft-bashing, but today I've spent time trying to get my tree viewer to work under Internet Explorer 6 and 7, and it's hell. Here are the problems I've had to deal with:

 Empty DIV bug

On IE 6 the top of the scrollbar overlapped the transparent area when the page first loads.

Internet Explorer -- argh!!!!!

For the "to do" list, expand-ahead browsing looks like a useful approach to build upon PygmyBrowse (see my live demo). The approach is described in "Expand-Ahead: A Space-Filling Strategy for Browsing Trees" by McGuffin et al. (doi:10.1109/INFOVIS.2004.21, PDF also here). There is a video on Ravin Balakrishnan's site, which is an AVI file that I haven't been able to coerce my Mac into playing, hence I've posted it on YouTube.

Expand-Ahead: A Space-Filling Strategy for Browsing Trees

This guest post by Tony Rees explores some of the themes from his recent talk 10 years of Global Biodiversity Databases: Are We There Yet?.

 10 years of global biodiversity databases: are we there yet?

Guest post: 10 years of global biodiversity databases: are we there yet?

OK, time to put my money where my mouth is. Here's a first stab at displaying big trees in a browser. Not terribly sophisticated, but reasonably fast. Take a look at Big Trees.

 Approach

Given a tree I simply draw it in a predetermined area (in these examples 400 x 600 pixels). If there are more leaves than can be drawn without overlapping I simply cull the leaf labels.

Visualising very big trees, Part II
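The post does not say how the leaf labels are culled, so the following is only one possible reading, sketched in Python: thin the labels evenly so that the ones still drawn cannot overlap. The 600 pixel height comes from the post; the 10 pixel label height and the function name are assumptions made for illustration.

```python
import math


def visible_label_indices(n_leaves, height_px=600, label_px=10):
    """Return the indices of the leaves whose labels would be drawn.

    `label_px` is an assumed minimum vertical space per label.
    """
    if n_leaves <= 0:
        return []
    max_labels = max(1, height_px // label_px)  # labels that fit without overlap
    step = max(1, math.ceil(n_leaves / max_labels))  # draw every `step`-th label
    return list(range(0, n_leaves, step))


small = visible_label_indices(50)    # few leaves: every label is drawn
large = visible_label_indices(6000)  # many leaves: labels are thinned
```

With these assumed numbers a 50-leaf tree keeps every label, while a 6000-leaf tree keeps one label in every hundred.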

One of the striking pictures in Tamara Munzner et al.'s paper "TreeJuxtaposer: Scalable Tree Comparison using Focus+Context with Guaranteed Visibility" (doi:10.1145/882262.882291, also available here) is that of a biologist struggling to visualise a large phylogeny.

Viewing very large trees

This October 22-24 there is a phyloinformatics workshop at the e-Science Institute in Edinburgh, Scotland, hosted in conjunction with the Isaac Newton Institute for Mathematical Sciences's Phylogenetics Programme. The invited speakers are: Prof.

Phyloinformatics Workshop in Edinburgh October 22-24 2007

Based on my recent experience developing an OpenURL service (described here, here, and here), linking this to a reference parser and AJAX tool (see David Shorthouse's description of how he did this), and thoughts on XMP, maybe it's time to try and articulate how this could be put together to make taxonomic literature more accessible. Details below, but basically I think we could make major progress by: Creating an OpenURL service that knows about

Making taxonomic literature available online

Now, for something completely different. I've been playing with Google Earth as a phylogeny viewer, inspired by Bill Piel's efforts, the cool avian flu visualisation Janies et al. published in Systematic Biology (doi:10.1080/10635150701266848), and David Kidd's work. As an example, I've taken a phylogeny for Banza katydids from Shapiro et al. (doi:10.1016/j.ympev.2006.04.006), and created a KML file.

Google Earth phylogenies

Added the Biodiversity Heritage Library blog to my links on my blog, then noticed that BHL have disabled comments. So, we can view their progress, but can't leave comments. Sigh, I wonder whether BHL has quite grasped that one of the best uses of a blog is to interact with the people who leave comments, in other words, have a conversation.

Biodiversity Heritage Library blog - look but don't touch

postgenomic is a great way to keep up with science blogs. For example, searching for encyclopedia of life pulls up all sorts of interesting posts. A sampling: Island of Doubt, SciGuy, My Biotech Life, Pharyngula. These are not the only blogs, and as always the comments left by others on these blogs are also fascinating. My sense is there is a "wow" factor based on the publicity, coupled with not inconsiderable skepticism about content.

EoL in the blogsphere

My paper on mapping TreeBASE names to other databases is out as provisional PDF on the BMC Bioinformatics web site (doi:10.1186/1471-2105-8-158 -- not working yet). The abstract: The TBMap web site needs some work, it's really only intended to document the mapping. Once I've tweaked and updated the mapping, I hope to use it in my forthcoming all-singing, all-dancing, phylogeny database...

TBMap paper out

David Shorthouse has entered the blogsphere with his iSpiders blog.

David Shorthouse enters the blogsphere

The funding of pPOD mentioned earlier today motivates me to write some notes on what I think "core database technologies for enabling the integration of AToL data" could, or indeed, should be about.

A manifesto

Stumbled across New Infrastructures for Knowledge Production: Understanding E-science while writing about TAXACOM on the iSpecies blog. The book is edited by Christine Hine, who has an article entitled The politics and practice of accessibility in systematics, which I think will be part of Past, Present & Future of Research in the Information Society.

The politics and practice of accessibility in systematics

Donat Agosti has interesting commentary on E. O. Wilson's forthcoming receipt of a TED 2007 prize. In typically forthright fashion, Donat asks: Read his post for Donat's own wish list.

EO Wilson recipient of the TED award 2007

In the spirit of "release early, release often", a preliminary version of the TreeBASE name mapping project is now online at TBMap. It's a bit crude, the graphs look awful because they're generated on the fly on a Linux box using GraphViz, but you'll get the idea. I'll try and tidy it up and add a few more visuals to it next week after the EOL Informatics meeting at Woods Hole. There are also some missing mappings to add to the database.

TreeBASE name mapping

My Taxonomic Search Engine is back online (mostly). This tool to search multiple databases for scientific names was another casualty of hacking. Having been developed under PHP 4, it needed some work to play nice with PHP 5. The changes were minor, and mainly concerned changes in code involving XPath and XSLT. I've committed these changes to SourceForge.

Taxonomic Search Engine back online

Not a huge fan of IE, but this post on David Patten's blog nicely illustrates the ease of use of A9's OpenSearch with IE 7. I'd previously played with OpenSearch as a quick way to integrate biodiversity sources, and put together a couple that have been registered with A9 (search for "taxonomy" and you'll find them). It's essentially adding a few tags to RSS or Atom feeds, coupled with a simple way to describe the search engine. Perhaps it's time

OpenSearch

A quick Google found this Firefox extension for turning built-in SVG on and off, posted on or maybe something uplifting. Really useful little extension, because Firefox SVG support is actually very good. (Just discovered that Firefox couldn't handle my original SVG, but if I put in the namespaces as attributes of the svg tag, everything worked fine.)

Firefox Extension for Turning Built-in SVG on and off

Continuing the theme of visualising phylogenies, one thing which strikes me is the parallel between genome browsers that display annotation "tracks" (such as the UCSC Genome Browser) and illustrations of "chronograms" with geological periods and accompanying data, such as sea levels, isotope levels, etc. In my haste I couldn't find an example with a sea-level track, but I know they exist. The chronogram at right comes from Steppan et al.

Genome browsers and chronograms

I've updated my extension to resolve LSIDs in Firefox so that it works with version 1.5.0.1 (the most recent version of Firefox). The extension is available from Mozdev. It may take a little while for the mirrors to update with the new version, so if you get a "404" when trying to download, you may need to come back later. IBM's LSID project have their own Firefox extension LSID Launchpad for Firefox, which is a lot slicker than mine.

LSID Firefox extension update

One of the potentially powerful features of TreeBASE II is the availability of an RDF version of a study. This means that, in principle, one could take the RDF for a TreeBASE study, combine it with RDF from other sources, and generate a richer view of a particular study.

TreeBASE II RDF

I've added a feature to my Biodiversity Heritage Library viewer that should help make sense of the names found on a page. Until now I've displayed them as a list of "tags", which ignores the relations among the names.

Tag trees: displaying the taxonomy of names in BHL

The following is a guest post by Bob Mesibov. Do you know the party game "Telephone", also known as "Chinese Whispers"? The first player whispers a message in the ear of the next player, who passes the message in the same way to a third player, and so on. When the last player has heard the whispered message, the starting and finishing versions of the message are spoken out loud. The two versions are rarely the same.

Guest post: Our taxonomy is not your taxonomy

At present BioStor provides a simple display of an article extracted from BHL. You get the page images, and sometimes a map and an altmetric "donut". But we can do better than this. For example, I'm starting to experiment with displaying a list of literature cited by the article.

New feature for BioStor: extracting literature cited from OCR text

Bibliographic coupling is a term coined by Kessler (doi:10.1002/asi.5090140103) in 1963 as a measure of similarity between documents. If two documents, A and B, cite a third, C, then A and B are coupled. I'm interested in extending this to data, such as DNA sequences and specimens. In part this is because within the challenge dataset I'm finding cases where authors cite data, but not the paper publishing the data.

From bibliographic coupling to data coupling
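The definition of coupling above can be sketched in a few lines of Python. This is an illustration only, assuming each document is represented by the set of things it cites; the document labels are hypothetical, and the cited items could just as well be sequence accession numbers or specimen codes as papers.

```python
from itertools import combinations


def coupling_strength(citations):
    """Map each pair of documents to the number of cited items they share.

    `citations` has the form {document: set_of_cited_items}.
    Pairs with nothing in common are omitted.
    """
    strengths = {}
    for a, b in combinations(sorted(citations), 2):
        shared = citations[a] & citations[b]
        if shared:
            strengths[(a, b)] = len(shared)
    return strengths


# Hypothetical example: A and B both cite C, so A and B are coupled.
citations = {"A": {"C", "D"}, "B": {"C"}, "E": {"F"}}
pairs = coupling_strength(citations)
```

Here `pairs` contains a single entry for the pair (A, B), with a strength of 1 because C is the only item they share.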

In case I forget how to do this, and as an example of how easy it is to get sucked into a black hole of programming micro-details, I spent an hour or more trying to figure out how to handle Japanese characters. I'm building a database of publications linked to taxonomic names, and I'm interested in linking to electronic versions of those publications.

Turning Japanese: EUC-JP, UTF-8, and percent-encoding
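The three ingredients in the title can be illustrated with Python's standard library. This is a sketch under assumptions, not the approach taken in the post: the sample string is a placeholder, and the only point is that the same characters produce different percent-encoded forms depending on the character encoding used.

```python
from urllib.parse import quote, unquote

# Placeholder text (the word "Japan"); not taken from the original post.
text = "日本"

# Percent-encode the bytes of the string under two different encodings.
utf8_encoded = quote(text, encoding="utf-8")
eucjp_encoded = quote(text, encoding="euc_jp")

# Decoding only recovers the text if the matching encoding is used.
decoded = unquote(eucjp_encoded, encoding="euc_jp")
```

The practical consequence is that a percent-encoded URL parameter cannot be decoded reliably without knowing which encoding the site expects.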

How to cite:

 Page, R. (2023). It’s 2023 - why are we still not sharing phylogenies? https://doi.org/10.59350/n681n-syx67

A quick note to support a recent Twitter thread https://twitter.com/rdmpage/status/1729816558866718796?s=61&t=nM4XCRsGtE7RLYW3MyIpMA The article “Diversification of flowering plants in space and time” by Dimitrov et al. describes a genus-level phylogeny for 14,244 flowering plant genera.

It's 2023 - why are we still not sharing phylogenies?

As trailed on a Twitter thread last week I’ve been working on a manuscript describing the efforts to map taxonomic names to their original descriptions in the taxonomic literature. The preprint is on bioRxiv (doi:10.1101/2023.05.29.542697). Much of the work has been linking taxa to names, which still has huge gaps.

Ten years and a million links

Some quick notes on interface ideas for digital libraries and/or knowledge graphs. Recently there’s been something of an explosion in bibliographic tools to explore the literature.

Library interfaces, knowledge graphs, and Miller columns

One thing about ChatGPT is it has opened my eyes to some concepts I was dimly aware of but am only now beginning to fully appreciate. ChatGPT enables you to ask it questions, but the answers depend on what ChatGPT “knows”. As several people have noted, what would be even better is to be able to run ChatGPT on your own content. Indeed, ChatGPT itself now supports this using plugins.

ChatGPT, semantic search, and knowledge graphs

Markus Strasser (@mkstra) wrote a fascinating article entitled "The Business of Extracting Knowledge from Academic Publications". His TL;DR: After recounting the many problems of knowledge extraction - including a swipe at nanopubs which "are ... dead in my view (without admitting it)" - he concludes: Well worth a read, and much food for thought.

The Business of Extracting Knowledge from Academic Publications

How to cite:

 Page, R. (2024). Hugging Face Autotrain https://doi.org/10.59350/7p1n4-wdv84

These are notes to myself on using Hugging Face AutoTrain. The first version of this had a very nice interface where you could simply upload a folder of images and train a model. It was limited in the range of tasks and models, but made up for that in ease of use.

Hugging Face Autotrain

I've released a very crude GraphQL endpoint for WikiData. More precisely, the endpoint is for a subset of the entities that are of interest to WikiCite, such as scholarly articles, people, and journals. There is a crude demo at https://wikicite-graphql.herokuapp.com. The endpoint itself is at https://wikicite-graphql.herokuapp.com/gql.php.

GraphQL for WikiData (WikiCite)

Taxonomic treatments have come up in various discussions I'm involved in, and I'm curious as to whether they are actually being used, in particular, whether they are actually being cited. Consider the following quote:  "Traditional" academic citation is from article to article.

Does anyone cite taxonomic treatments?

uBioRSS: Tracking taxonomic literature using RSS

Aggregating, Tagging and Integrating Biodiversity Research

Over a decade ago RSS (RDF Site Summary or Really Simple Syndication) was attracting a lot of interest as a way to integrate data across various websites. Many science publishers would provide a list of their latest articles in XML in one of three flavours of RSS (RDF, RSS, Atom). This led to tools such as uBioRSS [1] and my own e-Biosphere Challenge: visualising biodiversity digitisation in real time.

Revisiting RSS to monitor the latest taxonomic research

This is a guest post by Tony Rees. It would be difficult to encounter a scientist, or anyone interested in science, who is not familiar with the microscope, a tool for making objects visible that are otherwise too small to be properly seen by the unaided eye, or to reveal otherwise invisible fine detail in larger objects.

Reflections on "The Macroscope" - a tool for the 21st Century?

I've made Species Cite live. This is a web site I've been working on with the GBIF Challenge as a notional deadline so I'll actually get something out the door. "Species Cite" takes as its inspiration the suggestion that citing original taxonomic descriptions (and subsequent revisions) would increase citation metrics for taxonomists, and give them the credit they deserve.

Species Cite: linking scientific names to publications and taxonomists

Quick note on a tool I've been working on to parse citations, that is to take a series of strings such as: Möllendorff O (1894) On a collection of land-shells from the Samui Islands, Gulf of Siam. Proceedings of the Zoological Society of London, 1894: 146–156. de Morgan J (1885) Mollusques terrestres & fluviatiles du royaume de Pérak et des pays voisins (Presqúile Malaise). Bulletin de la Société Zoologique de France, 10: 353–249.

Citation parsing tool released
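The excerpt does not describe how the released tool works, so the following is a toy regular expression rather than its method. It assumes citations follow the "Authors (Year) Title. Journal, Volume: Pages." pattern of the two example strings, and it makes no attempt to split the title from the journal name. The function and field names are hypothetical.

```python
import re

# Authors, a four-digit year in parentheses, then the rest of the citation,
# optionally ending in ", volume: start-end".
CITATION = re.compile(
    r"^(?P<authors>.+?)\s+\((?P<year>\d{4})\)\s+(?P<body>.+?)"
    r"(?:,\s*(?P<volume>\d+):\s*(?P<spage>\d+)[–-](?P<epage>\d+))?\.?$"
)


def parse_citation(text):
    """Return a dict of the recognised parts, or None if the pattern does not match."""
    match = CITATION.match(text.strip())
    return match.groupdict() if match else None


example = (
    "Möllendorff O (1894) On a collection of land-shells from the Samui Islands, "
    "Gulf of Siam. Proceedings of the Zoological Society of London, 1894: 146–156."
)
parsed = parse_citation(example)
```

A pattern like this breaks as soon as the citation style changes, which is presumably why a dedicated parsing tool is worth building.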

It's been a while since I've blogged here. The last few months have been, um, interesting for so many reasons.

It's been a while...

A challenge in working with large taxonomic classifications is how you display them to the user, especially if the user probably doesn't want all the gory details.

Maximum entropy summary trees to display higher classifications

Last week I submitted a manuscript entitled "Wikidata and the bibliography of life". I've been thinking about the "bibliography of life" (AKA a database of every taxonomic publication ever published) for a while, and this paper explores the idea that Wikidata is the place to create this database.

Preprint on Wikidata and the bibliography of life

Somewhat stunned by the fact that my DNA barcode browser I described earlier was one of the (minor) prizewinners in this year's GBIF Ebbe Nielsen Challenge. For details on the winner and other place getters see ShinyBIOMOD wins 2020 GBIF Ebbe Nielsen Challenge. Obviously I'm biased, but it's nice to see the challenge inspiring creativity in biodiversity informatics. Congratulations to everyone who took part.

GBIF Challenge success

I'm giving a short talk at the Workshop On Open Citations And Open Scholarly Metadata 2020, which will be held online on September 9th.

Workshop On Open Citations And Open Scholarly Metadata 2020 talk

Just a note that ORCID serves data using terms from schema.org, and has done for a while (since April 2018), but somehow I missed this. You can get linked data in JSON-LD using content negotiation.

ORCID serves schema.org linked data via content negotiation - who knew?
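A minimal sketch of content negotiation using Python's standard library, assuming that asking `https://orcid.org/<id>` for the `application/ld+json` media type is what returns the JSON-LD. The function names are hypothetical, and the identifier in the usage note is ORCID's well-known example record, used purely as a placeholder.

```python
import json
from urllib.request import Request, urlopen


def orcid_jsonld_request(orcid):
    """Build a request that asks ORCID for JSON-LD via the Accept header."""
    return Request(
        f"https://orcid.org/{orcid}",
        headers={"Accept": "application/ld+json"},
    )


def fetch_orcid_jsonld(orcid):
    """Send the request and parse the response body as JSON (needs network access)."""
    with urlopen(orcid_jsonld_request(orcid)) as response:
        return json.load(response)
```

Calling `fetch_orcid_jsonld("0000-0002-1825-0097")` should, if the service behaves as described, return a dictionary of schema.org terms describing that person.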

How to cite:

 Page, R. (2023). Sub-second searching of millions of DNA barcodes using a vector database. https://doi.org/10.59350/qkn8x-mgz20

Recently I’ve been messing about with DNA barcodes.

Sub-second searching of millions of DNA barcodes using a vector database

The November 2006 issue of D-Lib magazine contains an article by Elaine Peterson entitled "Beneath the Metadata: Some Philosophical Problems with Folksonomy" (doi:10.1045/november2006-peterson). She writes: This article is one of the most irritating things I've read in a while, and as much as I like philosophy, it reinforces my prejudice that invoking philosophy is almost always a bad idea.

Folksonomies - why philosophy is a bad thing

This post is inspired by the Pharaoh exhibition at the NGV in Melbourne, Australia. This is a beautifully displayed exhibition of objects from the British Museum, London. It has all the trappings of a modern exhibition, beautiful lighting, a custom sound track, and lots of social media coverage. But I found it immensely frustrating to visit.

Why do museum and gallery displays ignore the web?

D. Ross Robertson has published a paper entitled "Global biogeographical data bases on marine fishes: caveat emptor" (doi:10.1111/j.1472-4642.2008.00519.x - DOI is broken, you can get the article here). The paper concludes: As I've noted elsewhere on this blog, and as demonstrated by Yesson et al.'s paper on legume records in GBIF (doi:10.1371/journal.pone.0001124) (not cited by Robertson), there are major problems with geographical information

Global biogeographical data bases on marine fishes: caveat emptor

These are just some random notes on an “ideal” taxonomic journal, inspired in part by some recent discussions on “turbo-taxonomy” (e.g., https://doi.org/10.3897/zookeys.1087.76720 and https://doi.org/10.1186/1742-9994-10-15), and also examples such as the Australian Journal of Taxonomy https://doi.org/10.54102/ajt.qxi3r which seems well-intentioned but limited.

The ideal taxonomic journal

Say what you will about Elsevier, they are certainly exploring ways to re-imagine the scientific article. In a comment on an earlier post Fabian Schreiber pointed out that Elsevier have released an app to display phylogenies in articles they publish. The app is based on jsPhyloSVG and is described here.

Elsevier articles have interactive phylogenies

In any discussion of data gathering or data cleaning the term "crowdsourcing" inevitably comes up. An example where this approach has been successful is the Encyclopedia of Life's Flickr pool, where Flickr users upload images that are harvested by EOL. Given that many Flickr photos are taken with cameras that have built-in GPS (such as the iPhone, the most common camera on Flickr) we could potentially use the Flickr photos not only as a source of

Where is the "crowd" in crowdsourcing? Mapping EOL Flickr photos

Viewing phylogenies on the web: Javascript conversion of Newick tree to SVG

Here are some quick notes on how BHL could use Mendeley as a "CiteBank".

 As a repository of bibliographic data

If the goal is to assemble a "bibliography of life" then there are various ways this could be done.

 Taxon-specific bibliographies

Create groups that are taxon-specific (or find existing groups in Mendeley).

Mendeley as CiteBank: some ideas

Tomorrow is the Anchoring Biodiversity Information: From Sherborn to the 21st century and beyond meeting.

Linking taxonomic names to literature: beyond digitised 5×3 index cards

Following on from the last post, I've now set up a trivial NCBI RDF service at bioguid.info/taxonomy/ (based on the ISSN resolver I released yesterday and announced on the Bibliographic Ontology Specification Group). If you visit it in a web browser it's nothing special. However, if you choose to display XML you'll see some simple RDF.

NCBI RDF

This week seems to be API week. The Encyclopedia of Life API Beta Test has been out since August 12th.

Navigating the Encyclopedia of Life tree on the desktop and the iPhone

In a moment of madness brought on by trying to make sense of 10 Mb of conference schedule for Evolution 2010, I extracted the text from the schedule and created a series of crude iCal files that I can add to my iCal calendar on my Mac (and hence sync to my iPhone). This way I can set reminders of specific talks I want to see. I'm making these iCal files available here, on the understanding that you can use them entirely at your own risk (some

Evolution 2010 talk schedule for iCal

To much fanfare (e.g., Nature News, "Linnaeus meets the Internet" doi:10.1038/news.2010.221), on May 5th PLoS ONE published Sandy Knapp's "Four New Vining Species of Solanum (Dulcamaroid Clade) from Montane Habitats in Tropical America" doi:10.1371/journal.pone.0010502.

Linnaeus meets the Internet: PLoS + Botany =  #fail

Just noticed that BioStor now has just over 70,000 articles extracted from the Biodiversity Heritage Library. This number is a little "soft" as there are some duplicates in the database that I need to clean out, but it's a nice sounding number.

70,000 articles extracted from the Biodiversity Heritage Library

Quick notes on "taxon concepts". In order to navigate through taxon names I plan to have at least one taxonomic classification in BioNames. GBIF makes the most sense at this stage. The model I'm adopting is that the classification is a graph where nodes have the id used by the external database (in this case GBIF). Each node has one or more names attached, and where possible the names are linked to the original description.

BioNames update - taxon concepts

As part of the discussion on whether legacy biodiversity literature matters a graph from the following paper came up: So, why is the Sarkar et al. graph bogus? Here is their graph (Fig. 3) for animals: This is the number of new animal species described each year, estimated by parsing taxonomic names and extracting the date in the taxonomic authority. There are two prominent "spikes" which are worrying.

Rate of description of new animal species and *that* Taxatoy graph
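To make "extracting the date in the taxonomic authority" concrete, here is a rough Python sketch. It is not the method used by Sarkar et al.; it assumes the year is the last four-digit number between 1700 and 2099 in the name string, and the function names are hypothetical.

```python
import re
from collections import Counter

# Any standalone four-digit year from 1700 to 2099.
YEAR = re.compile(r"\b(1[7-9]\d{2}|20\d{2})\b")


def authority_year(name):
    """Return the last plausible year in a name string, or None if there is none."""
    years = YEAR.findall(name)
    return int(years[-1]) if years else None


def descriptions_per_year(names):
    """Count names by the year found in their authority, skipping names without one."""
    years = (authority_year(name) for name in names)
    return Counter(year for year in years if year is not None)


names = ["Homo sapiens Linnaeus, 1758", "Panthera leo (Linnaeus, 1758)", "Homo sapiens"]
counts = descriptions_per_year(names)
```

A count built this way depends entirely on the quality of the name strings, so it is worth checking unexpected peaks against the underlying data before reading anything into them.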

Following on from my earlier post Linking taxonomic names to literature: beyond digitised 5×3 index cards I've been slowly updating my latest toy: http://iphylo.org/~rpage/itaxon This site displays a database mapping over 200,000 animal names to the primary literature, using a mix of identifiers (DOIs, Handles, PubMed, URLs) as well as links to freely available PDFs where they are available.

Mapping names to literature: closing in on 250,000 names

Déjà vu is a scary thing. Four years ago I released a mapping between names in TreeBASE and other databases called TBMap (described here: doi:10.1186/1471-2105-8-158). Today I find myself releasing yet another mapping, as part of my NCBI to Wikipedia project. By embedding the mapping in a wiki, it can be edited, so the kinds of problems I encountered with TbMap, recounted here, here, and here.

TreeBASE meets NCBI, again

Finally submitted (two days late) a manuscript for the BMC Bioinformatics Special Issue on Biodiversity Informatics organised by Neil Sarkar and sponsored by EOL and CBOL. The manuscript, entitled "bioGUID: resolving, discovering, and minting identifiers for biodiversity informatics" describes my bioGUID project. If you are interested, I've made a pre-print available at Nature Precedings (hdl:10101/npre.2009.3079.1).

bioGUID manuscript

TreeView X, the open source version of TreeView, has been slowly suffering bit rot as C++ compilers and operating systems change. Every so often I'd tweak the code to build on some Linux version or other, but this isn't something I've a lot of time for. Moreover, because of the hassle of rebuilding binaries and source tar balls the updated versions weren't uploaded to the TreeView X web site.

TreeView X now on Google Code

I'm in the midst of rebuilding iSpecies (my mash-up of Wikipedia, NCBI, GBIF, Yahoo, and Google search results) with the aim of outputting the results in RDF. The goal is to convert iSpecies from a pretty crude "on-the-fly" mash-up to a triple store where results are cached and can be queried in interesting ways. Why?

Referring to a one-degree square in RDF using c-squares

Chris Freeland has written a thoughtful summary of his experiences of the two-day closed session to create a road map for biodiversity informatics, entitled #ebio09, silverbacks, & haiku.

ChrisFreeland.com: #ebio09, silverbacks, & haiku

Here's the video of my talk at the NHM (courtesy of Vince Smith). Going Digital - by Rod Page from Vince Smith on Vimeo.

London Calling Video

Random half-formed idea time. Thinking about marking up an article (e.g., from PLoS) with a phylogeny (such as the image below, see doi:10.1371/journal.pone.0001109.g001), I keep hitting the fact that existing web-based tree viewers are, in general, crap. Given that a PLoS article is an XML document, it would be great if the tree diagram was itself XML, in particular SVG.

What I want from a web phylogeny viewer - XML, SVG and Newick round tripping

I've been playing recently with the Biodiversity Heritage Library (BHL), and am starting to get a sense for the complexities (and limitations) of the metadata BHL stores about publications.

Biodiversity Heritage Library, Google books, and  metadata quality

One of the fun things about developing web sites is learning new tricks, tools, and techniques. Typically I hack away on my MacBook, and when something seems vaguely usable I stick it on a web server.

BioNames update - API documentation

As part of my on-going experiments with Wikipedia as a repository of taxonomic information, I've extracted mammal pages from Wikipedia.

Visualising the Wikipedia classification of mammals

The Science Commons has released a short video by Jesse Dylan, who made the Yes We Can video.

Yes We Can - "scientists are the ultimate remixers"

CBS News Sunday Morning segment on the EOL. All fun stuff (Paddy skewering the interviewer who fails to recognise an echidna), but still long on promises and short on actual product.

EOL on CBS

CrossRef have released a tool for bloggers to look up DOIs and insert them into blog posts: So far the tool is only available for WordPress blogs.

CrossRef blogger tool for DOI lookup

The PC hosting linnaeus.zoology.gla.ac.uk and darwin.zoology.gla.ac.uk has died, and this spells the end of my interest in (a) using generic PC hardware and (b) running Linux. The former keeps breaking down, the latter is just harder than it needs to be (much as I like the idea). From now on, it's Macs only. No more geeky knapsacks for me. Because of this crash a lot of my experimental web sites are offline.

I hate computers

Returning to the subject of personal knowledge graphs Kyle Scheer has an interesting repository of Markdown files that describe academic disciplines at https://github.com/kyletscheer/academic-disciplines (see his blog post for more background).  If you add these files to Obsidian you get a nice visualisation of a taxonomy of academic disciplines.

Obsidian, markdown, and taxonomic trees

Note to self. The challenge of finding specimen citations in papers keeps coming around. It seems that this is basically the same problem as finding citations to papers, and can be approached in much the same way. If you want to build a database of references from scratch, one way is to scrape citations from papers (e.g., from the "literature cited" section), convert those strings into structured data, and add those to your database.

Finding citations of specimens

For a short, but reasonably technical summary of what I think the issues are, please read this "Technical Report", which I presented at the Workshop on Database Issues in Biological Databases (DBiBD) in Edinburgh in January 2005. This document is itself based on a BBSRC grant proposal which was funded.

Towards a Taxonomically Intelligent Phylogenetic Database

Shameless plug. One of my former PhD students, Katie Davis, is second author on "Dinosaurs and the Cretaceous Terrestrial Revolution" (doi:10.1098/rspb.2008.0715), which came out recently in Proceedings of the Royal Society. The abstract: Now, if we could just get the bird supertree paper out the door...

Dinosaurs and the Cretaceous Terrestrial Revolution

Hasan Jamil has released PhyQL, a visual system for querying phylogenetic information. To quote from the web site: There is also a YouTube screencast: I haven't had a chance to play with it yet. PhyQL was originally described by Jamil et al. "Querying phylogenies visually", BIBE 2001 (doi:10.1109/BIBE.2001.974405).

PhyQL

The manuscript for Briefings in Bioinformatics that I alluded to earlier has been accepted for publication. I've put a preprint up at Nature Precedings (hdl:10101/npre.2008.1760.1). The final version will appear in print later this year.

Biodiversity informatics: the challenge of linking data and the role of shared identifiers

This post arose from an ongoing email conversation with Tony Rees about extracting and annotating taxonomic names. In BioStor I use the GBIF classification to display the taxonomic names found in the OCR text in the form of a tree.

The GBIF classification is broken — how do we fix it?

Not really a blog post, more a note to self. If I ever did get around to writing a book again, I think Scripting Life would be a great title.

Scripting Life

How to cite:

 Page, R. (2023). A taxonomic search engine. https://doi.org/10.59350/r3g44-d5s15

Tony Rees commented on my recent post Ten years and a million links. I’ve responded to some of his comments, but I think the bigger question deserves more space, hence this blog post. Tony’s comment. My response: I think there are several ways to approach this.

A taxonomic search engine

Given that the Twitter stream tagged #vizbi will fade away soon, I've grabbed most of the links I tweeted during VIZBI 2011 and have put them here.

Some VIZBI 2011 links

I'm trying to get my head around the data model used by ZooBank to store taxonomic names. To do this, I've built a graph for the species Belonoperca pylei described by Baldwin &

ZooBank data model

I attended the TDWG-GUID workshop on Global Unique Identifiers (GUIDs) held at NESCent, which has issued a report. Essentially, the aim of this work is to deploy globally unique identifiers for digital objects in biodiversity informatics, such as taxon names, specimen records, images, etc. The workshop settled on LSIDs (Life Science Identifiers), which is a sensible choice.

Globally Unique Identifiers

Comments by David Marjanović elsewhere on this blog (here and here) about TreeBASE, classification and Phylocode have prompted me to write a little bit about why I'm underwhelmed by the Phylocode. Suppose I have the question: How do I answer this? Well, my approach is to do the following. Firstly, I attempt to map every name in TreeBASE onto a name in an external database, such as NCBI Taxonomy, uBio, etc.

Phylocode

Some good news! pPOD, an NSF-funded project on integrating data from AToL (A Tree of Life) projects has been funded. Val Tannen (right) is the co-ordinating PI. I'm a consultant, which means more opportunities to mouth-off about phylogenetic data and databases (for earlier examples see TreeBASE rocks, TreeBASE talk at CIPRES, and Towards the ToL database - some visions). The project is called pPOD, and has a wiki.

Recently I've been thinking about the best ways to make article-level metadata from BioStor more widely available. For example, for someone visiting the BHL site there is no easy way to find articles, which are the basic unit for much of the scientific literature. How hard would it be to add articles to BHL?

Adding article-level metadata to BHL

From Nature's blog on web technology and science comes this post on Open Text Mining Interface (OTMI): and further

Nascent: Open Text Mining Interface

Was going to post this as a comment on the BHL blog but they use Blogger's native comment system, which is horrible, and it refused to accept my comment (yes, yes, I'm sure it did that on grounds of taste). I read the recent post Building a BHL Africa and couldn't believe my eyes when I read the following: CDs! Really?

iPhylo


Problems with Plazi parsing: how reliable are automated methods for extracting specimens from the literature?

Text mining for museum specimen identifiers

BHL and text-mining: some ideas