Natural Sciences

Donny Winston

Made as simple as possible, but not simpler.
Published

Many tests use oracles, where you know the answers for some inputs and check those correspondences. To cover more of the input state space, you can generate random inputs and check that some property holds for each corresponding output. You don't have an enumeration of exact answers as with oracles, but you can check things like the output always being greater than zero.
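A minimal sketch of the contrast, using only Python's standard library. The function under test, `normalize`, is hypothetical: it scales a list of positive numbers so they sum to 1. The oracle test checks one known input/output pair; the property test draws random inputs and checks invariants that must hold for every output.

```python
import random


def normalize(xs):
    """Hypothetical function under test: scale positive numbers to sum to 1."""
    total = sum(xs)
    return [x / total for x in xs]


def oracle_test():
    # Oracle: we know the exact answer for this particular input.
    assert normalize([1.0, 1.0, 2.0]) == [0.25, 0.25, 0.5]


def property_test(trials=1000, seed=0):
    rng = random.Random(seed)  # seeded so any failure is reproducible
    for _ in range(trials):
        n = rng.randint(1, 20)
        xs = [rng.uniform(0.1, 100.0) for _ in range(n)]  # random positive input
        ys = normalize(xs)
        # Property: output has the same length as the input.
        assert len(ys) == len(xs), xs
        # Property: every output is strictly greater than zero.
        assert all(y > 0 for y in ys), xs
        # Property: outputs sum to 1, within floating-point tolerance.
        assert abs(sum(ys) - 1.0) < 1e-9, xs
    return trials


oracle_test()
checked = property_test()
```

Seeding the random generator is a design choice worth keeping: a failing input can be replayed exactly, which matters once the property test starts finding bugs.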

Published

In a collaboration, data objects are produced at many sites. To make the data objects findable, you may steward a central, searchable index for their metadata. How then do you make the data objects accessible for download? One common solution is to centralize the custodianship – have all sites upload copies of their data objects to a central store. The central store may partition storage across several physical servers behind the scenes (e.g.

Published

One powerful mechanism of robustness is exploratory behavior, for which the desired outcome is produced by a generate-and-test mechanism. This organization allows the generator to work and be developed independently of the tester that accepts or rejects a particular result. One can make an analogy to biological evolution, where the generator is random mutation and the tester is natural selection.
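A small sketch of that organization, assuming a toy problem chosen purely for illustration: finding a Pythagorean triple by random search. The generator (`random_triples`, hypothetical) and the tester (`is_pythagorean`, hypothetical) know nothing about each other; the driver only connects them, so either side can be swapped out independently, much as mutation and selection operate separately.

```python
import random
from typing import Callable, Iterator, Optional, Tuple, TypeVar

T = TypeVar("T")


def generate_and_test(generator: Iterator[T],
                      tester: Callable[[T], bool],
                      max_tries: int = 100_000) -> Optional[T]:
    """Return the first candidate the tester accepts, or None after max_tries."""
    for _, candidate in zip(range(max_tries), generator):
        if tester(candidate):
            return candidate
    return None


def random_triples(rng: random.Random, limit: int = 30) -> Iterator[Tuple[int, int, int]]:
    """Generator: propose random (a, b, c) with a <= b, blind to the goal."""
    while True:
        a, b = sorted(rng.randint(1, limit) for _ in range(2))
        c = rng.randint(1, limit)
        yield (a, b, c)


def is_pythagorean(t: Tuple[int, int, int]) -> bool:
    """Tester: accept or reject a candidate without knowing how it was made."""
    a, b, c = t
    return a * a + b * b == c * c


result = generate_and_test(random_triples(random.Random(42)), is_pythagorean)
```

Replacing `is_pythagorean` with a different acceptance criterion, or `random_triples` with a smarter proposer, requires no change to the other component or to the driver.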

Published

Resource description refers to defining concepts and relationships that represent the content and structure of some subject matter (ontology) or a database (schema) in a formal language. The relationship between ontology and database schema is nuanced – Uschold provides a nice comparison.[1] You can formally describe resources using the Resource Description Framework (RDF), SQL's data definition language (DDL), etc.
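As a rough illustration, here is one small resource description expressed two ways: as SQL DDL (executed against an in-memory SQLite database via Python's standard `sqlite3` module) and as RDF-style triples written as plain Python tuples. The `experiment`/`sample` model and the `ex:` prefix are hypothetical; `rdfs:domain` and `rdfs:range` are standard RDF Schema terms.

```python
import sqlite3

# Schema-style description: tables, typed columns, and a foreign-key relationship.
DDL = """
CREATE TABLE experiment (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE sample (
    id            INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    experiment_id INTEGER NOT NULL REFERENCES experiment(id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(sample)")]

# PRAGMA foreign_key_list yields (id, seq, table, from, to, on_update, on_delete, match).
foreign_keys = [(row[2], row[3], row[4])
                for row in conn.execute("PRAGMA foreign_key_list(sample)")]

# Ontology-style description of the same relationship as RDF-like triples
# (hypothetical ex: vocabulary; plain tuples, no RDF library involved).
triples = [
    ("ex:Sample", "rdf:type", "rdfs:Class"),
    ("ex:Experiment", "rdf:type", "rdfs:Class"),
    ("ex:belongsToExperiment", "rdfs:domain", "ex:Sample"),
    ("ex:belongsToExperiment", "rdfs:range", "ex:Experiment"),
]
```

The DDL constrains what a database may store, while the triples state what a relationship means; that difference is part of the nuance the comparison above points to.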

Published

In the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), a repository is a means of exposing metadata to harvesters. The OAI-PMH spec goes into great detail about how a data provider should implement a repository so that a harvester can simply be a client application that issues one of six possible HTTP requests.
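To show how thin such a harvester can be, here is a sketch that only builds request URLs for the six OAI-PMH verbs (Identify, ListMetadataFormats, ListSets, ListIdentifiers, ListRecords, GetRecord) and checks their required arguments. The base URL and record identifier are hypothetical, no network call is made, and the argument rules are simplified: a `resumptionToken` request skips the required-argument check, but the sketch does not enforce that the token is the only argument.

```python
from urllib.parse import urlencode

# The six OAI-PMH verbs mapped to the arguments each one requires.
REQUIRED_ARGS = {
    "Identify": set(),
    "ListMetadataFormats": set(),
    "ListSets": set(),
    "ListIdentifiers": {"metadataPrefix"},
    "ListRecords": {"metadataPrefix"},
    "GetRecord": {"identifier", "metadataPrefix"},
}


def oai_request_url(base_url: str, verb: str, **args: str) -> str:
    """Build an OAI-PMH request URL, rejecting unknown verbs and missing arguments."""
    if verb not in REQUIRED_ARGS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    if "resumptionToken" not in args:
        missing = REQUIRED_ARGS[verb] - args.keys()
        if missing:
            raise ValueError(f"{verb} requires: {', '.join(sorted(missing))}")
    return f"{base_url}?{urlencode({'verb': verb, **args})}"


BASE = "https://example.org/oai"  # hypothetical repository endpoint
identify_url = oai_request_url(BASE, "Identify")
record_url = oai_request_url(BASE, "GetRecord",
                             identifier="oai:example.org:123",
                             metadataPrefix="oai_dc")
```

Because the repository carries all the protocol complexity, the client side reduces to composing a query string and parsing the XML response.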

Published

When you have several different applications (e.g. to perform simulations and analyses) that each have their own data model, it’s typical for each to also maintain its own siloed data store. Then, in order to use all the applications in concert to complete a research project, or to support an ongoing research program, you need to run extract-transform-load (ETL) pipelines to sync the data.
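A minimal sketch of one such ETL sync, assuming two hypothetical applications: a simulation app whose store keeps `sim_results(run_id, material, energy_ev)` and an analysis app whose store expects `observations(obs_id, material, energy_j)`. Both stores are in-memory SQLite databases here, and the band-gap values are illustrative only.

```python
import sqlite3

EV_TO_JOULE = 1.602176634e-19  # exact by the SI definition of the electronvolt


def extract(sim_db):
    """Extract: read rows in the simulation app's own data model."""
    return sim_db.execute("SELECT run_id, material, energy_ev FROM sim_results").fetchall()


def transform(rows):
    """Transform: rename keys, normalize labels, and convert units for the analysis app."""
    return [(f"run-{run_id}", material.lower(), energy_ev * EV_TO_JOULE)
            for run_id, material, energy_ev in rows]


def load(analysis_db, records):
    """Load: upsert so that re-running the sync does not create duplicates."""
    analysis_db.executemany(
        "INSERT OR REPLACE INTO observations (obs_id, material, energy_j) VALUES (?, ?, ?)",
        records,
    )
    analysis_db.commit()


# Siloed store of the (hypothetical) simulation app.
sim = sqlite3.connect(":memory:")
sim.execute("CREATE TABLE sim_results (run_id INTEGER PRIMARY KEY, material TEXT, energy_ev REAL)")
sim.executemany("INSERT INTO sim_results VALUES (?, ?, ?)", [(1, "Si", 1.12), (2, "GaAs", 1.42)])

# Siloed store of the (hypothetical) analysis app, with a different data model.
analysis = sqlite3.connect(":memory:")
analysis.execute("CREATE TABLE observations (obs_id TEXT PRIMARY KEY, material TEXT, energy_j REAL)")

load(analysis, transform(extract(sim)))
load(analysis, transform(extract(sim)))  # a second sync is harmless thanks to the upsert
```

Every pair of applications that must stay in sync needs a pipeline like this, and each one encodes a mapping between two data models that has to be maintained as either model evolves.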

Published

Earlier this week, I wrote that […] As luck would have it, the U.S. Department of Energy (DOE) posted a funding opportunity announcement (FOA) yesterday on Data Reduction for Science: […] There have been efforts for decades to identify and deal with this issue, with cute acronyms for relevant data like ROT (Redundant, Obsolete, and Trivial), WORN (Write Once Read Never), and WORSE (Write Once Read Seldom if Ever). However, the DOE FOA highlights that it