Quick note to express the frustration I experience sometimes when dealing with taxonomic literature.
Jeff Atwood, one of the co-founders of Stack Overflow, recently wrote a blog post, Trouble In the House of Google, in which he noted that several sites that scrape Stack Overflow content (which Stack Overflow's CC-BY-SA license permits) appear higher in Google's search rankings than the original Stack Overflow pages. When Stack Overflow chose the CC-BY-SA license they made the assumption that: Jeff Atwood's post goes on to argue that something
Yesterday I posted notes on Web Hooks and OpenURL. That post was written when I was already late (you know, when you say to yourself "yeah, I've got time, it'll just take 5 minutes to finish this..."). The Web Hooks + OpenURL project is still very much a work in progress, but I thought a screencast would help explain why I think this is going to make my life a lot easier.
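For readers unfamiliar with OpenURL, the core idea is that a citation is encoded as key/value pairs in a URL sent to a link resolver. The sketch below builds an OpenURL 1.0 (Z39.88-2004) key/encoded-value query for a journal article. The resolver address and the `build_openurl` helper are hypothetical, for illustration only; this is not the code behind the Web Hooks + OpenURL project.

```python
from urllib.parse import urlencode


def build_openurl(resolver, atitle, jtitle, volume, spage, date, issn=None):
    """Build an OpenURL 1.0 (Z39.88-2004) KEV query for a journal article.

    `resolver` is the base URL of a link resolver (hypothetical here).
    """
    params = [
        ("url_ver", "Z39.88-2004"),                       # OpenURL version
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),  # journal metadata format
        ("rft.genre", "article"),
        ("rft.atitle", atitle),   # article title
        ("rft.jtitle", jtitle),   # journal title
        ("rft.volume", volume),
        ("rft.spage", spage),     # start page
        ("rft.date", date),
    ]
    if issn:
        params.append(("rft.issn", issn))
    # urlencode percent-encodes each key and value (spaces become '+')
    return resolver + "?" + urlencode(params)


url = build_openurl(
    "https://resolver.example.org/openurl",  # hypothetical resolver
    "An example article", "Example Journal", "12", "34", "1999",
)
print(url)
```

Because the metadata travels in the URL itself, any resolver that understands the standard can try to locate the article, which is what makes OpenURL a convenient hook for finding publications online.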
For me one of the most frustrating things about online databases is that they often can't be edited. For example, I've recently created a version of the Australian Faunal Directory on CouchDB, which contains a list of all animals in Australia and a fairly comprehensive bibliography of taxonomic publications on those animals. What I'd like to do is locate those publications online.
One of the things I'm enjoying about the Australian Faunal Directory on CouchDB is the chance to play with ideas without worrying about breaking lots of code or, indeed, upsetting any users ('cos, let's face it, there aren't any). As a result, I can try out ideas that may one day find their way into other projects. One of these ideas is to use quantum treemaps to display an author's publications.
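The key idea behind quantum treemaps is that every rectangle's dimensions are whole multiples of a fixed cell size, so each item (here, each publication) gets an equal-sized cell. The sketch below is NOT the published quantum treemap algorithm; it is a much simpler strip layout that borrows only that quantization idea. Grouping publications by year and the fixed `rows` height are assumptions made for illustration, and `quantized_strip_layout` is a hypothetical helper.

```python
import math
from typing import List, NamedTuple, Sequence, Tuple


class GroupRect(NamedTuple):
    label: str
    x: int       # left edge, in cells
    y: int       # top edge, in cells
    width: int   # in cells
    height: int  # in cells


class ItemCell(NamedTuple):
    item: str
    x: int
    y: int


def quantized_strip_layout(
    groups: Sequence[Tuple[str, Sequence[str]]], rows: int
) -> Tuple[List[GroupRect], List[ItemCell]]:
    """Place groups left to right in a strip `rows` cells tall.

    Each group is as wide as needed (in whole cells) to give every item
    its own cell; items fill each group column by column.
    """
    if rows < 1:
        raise ValueError("rows must be at least 1")
    group_rects: List[GroupRect] = []
    item_cells: List[ItemCell] = []
    x0 = 0
    for label, items in groups:
        width = math.ceil(len(items) / rows)  # whole columns only
        group_rects.append(GroupRect(label, x0, 0, width, rows))
        for i, item in enumerate(items):
            item_cells.append(ItemCell(item, x0 + i // rows, i % rows))
        x0 += width
    return group_rects, item_cells


# Hypothetical example: three papers in one year, one in the next.
rects, cells = quantized_strip_layout(
    [("2008", ["a", "b", "c"]), ("2009", ["d"])], rows=2
)
print(rects)
print(cells)
```

A real quantum treemap also chooses aspect ratios and subdivides recursively to reduce wasted space; a fixed-height strip keeps the example short at the cost of leaving some cells empty.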
Geoffrey Bilder's comments about the unsuitability of URLs as long-term identifiers (as opposed, say, to DOIs) came to mind when I discovered that the domain phthiraptera.org is up for sale. This domain used to be home to a wealth of resources on lice (order Phthiraptera). I discovered that ownership of the domain had expired when a bunch of links to PDFs returned by an iSpecies search for Collodennyus all bounced to the holding page.
In my last post I discussed why I thought The Plant List's decision to use a restrictive license (CC-BY-NC-ND) was such a poor one. CC-BY-NC-ND states that To make this point more concrete, I've created this site, Experiments with The Plant List, to show the kinds of things that The Plant List's choice of license prevents the taxonomic community from doing.
The Plant List (http://www.theplantlist.org/) has been released today, complete with glowing press releases. The list includes some 1,040,426 names. I eagerly looked for the Download button, but none is to be found.
Some quick notes on OCR. Revisiting my DjVu viewer experiments, I was struck by how "dirty" the OCR text is. It's readable, but if we were to display the OCR text rather than the page images, it would be a little off-putting.
One year ago I released BioStor, which scratched my itch regarding finding articles in the Biodiversity Heritage Library. This anniversary seems a good time to think about where to take this project next, and also to ask whether it's been successful.