Rogue Scholar

Published July 28, 2021

The more precisely the position of some particle is determined, the less precisely its momentum can be predicted from initial conditions, and vice versa.

Natural Sciences

L⁻¹T⁻¹ETL

https://doi.org/10.59350/hzxpe-she63

Published July 23, 2021

Author Donny Winston

A lot of extract-transform-load (ETL) work requires unloading and un-transforming first. Rather than \(ETL\), it’s \(L_A^{-1}T_A^{-1}ET_BL_B\). What the data provider did is \(A\). What you want to do is \(B\). The data provider gave you a “dump” of their data. You don’t know what it means. If you did, you could extract (subset) from it according to your needs – filter entities by some meaningful criteria and collect selected attributes.

Natural Sciences

On the Wisdom of Opaque Identifiers

https://doi.org/10.59350/h5kq9-m3120

Published July 12, 2021

Author Donny Winston

Wikidata uses opaque identifiers for its catalogued information resources. For example, the statement “wd:Q42 wdt:P69 wd:Q691283” may map to the label sequence “‘Douglas Adams’ ’educated at’ ‘St John’s College’” with a language-locale preference of English-US. Opaque naming is wise for internationalization.

Natural Sciences

Object Persistence: a Matter of Service

https://doi.org/10.59350/58jzk-mxm74

Published July 5, 2021

Author Donny Winston

I recently encountered the Archival Resource Key (ARK) identifier scheme and read its latest draft specification. ¹ Its thesis is that the persistence – i.e., long-term identification and access – of any information resource – i.e., an “object” – is a matter of service.

Natural Sciences

Data Structures as Snapshots of Process

https://doi.org/10.59350/62xb4-ckd72

Published June 24, 2021

Author Donny Winston

Imagine a data system modeled as three parts: an interface, a processor, and a repository. The repository “contains” information. The processor receives symbol streams to alter or retrieve information from the repository, and the processor outputs symbol streams. The interface is the medium, the opaque surface, of symbol-stream exchange between you and the processor. ¹ What information is “in” the repository?

Natural Sciences

Cool LUIs Dont Change

https://doi.org/10.59350/w7b17-7f121

Published June 16, 2021

Author Donny Winston

After my last note on identifiers, Leo Talirz pointed me to a great riff ¹ on Tim Berners-Lee’s classic note ² on “cool URIs”. In the “Cool DOIs” article, Fenner breaks down a DOI into three parts: proxy, prefix, and suffix. A proxy is a server that maintains a map from prefixes to registrants. Example proxies are https://doi.org/ and https://hdl.handle.net/. An example prefix is 10.5281.

Natural Sciences

Identifiers Versus Labels

https://doi.org/10.59350/xj8et-1bj74

Published June 4, 2021

Author Donny Winston

Data protocols vary over project lifetimes, and many projects involve parameter sweeps. You might see filesystem directory structures evolve naming schemes like the following ¹ : # let's not overthink this at first. concentration_A_0.25/ # hierarchy is good, right? concentration_A/0.25/ # paths are getting too long. conc_A/0.25/ # change to percentages. clever!

Natural Sciences

Data Citation: Restoring ↔ Querying

https://doi.org/10.59350/j0yb8-vbz45

Published May 21, 2021

Author Donny Winston

Approaches to data citation may span classes of big-O complexity, for both space (memory/storage) and time (compute/transfer). Dataset revisions may be minted and persisted without any delta encoding / structural sharing. The main mechanism of reproduction for citations in this case is restoration. Space complexity is high, as storage needs are high.

Natural Sciences

On the Hourglass Model, for FAIR

https://doi.org/10.59350/q8bmr-73v49

Published May 17, 2021

Author Donny Winston

The hourglass model is an approach to layered system architecture where a middle layer is intentionally constricted in order to support flexibility in the implementation of layers above and below.

Natural Sciences

Data Citation via a Query and a Basis

https://doi.org/10.59350/xkq74-kfq53

Published May 12, 2021

Author Donny Winston

Against what bases are queries against your data evaluated? If you only expose a single “data base” that changes over time, then data citations cannot be a combination of query and basis. When citing a passage in a book, the edition/variation of its publication, i.e. the thing that is assigned an ISBN, is the basis. Optionally, a citation may include a “query” against this basis – a page number, page range, chapter number/title, etc.

Donny Winston

The Kent Uncertainty Principle - On Data and Reality

L⁻¹T⁻¹ETL

On the Wisdom of Opaque Identifiers

Object Persistence: a Matter of Service

Data Structures as Snapshots of Process

Cool LUIs Dont Change

Identifiers Versus Labels

Data Citation: Restoring ↔ Querying

On the Hourglass Model, for FAIR

Data Citation via a Query and a Basis