[january-dev] Typed datasets in January

Hi all,
as discussed with @jonahkichwacoders at EclipseCon 2016, I've been
investing some resources into prototyping the addition of Units of
Measurement (UoM) to January.

This would enable a system to know that (for example) a dataset in
metres can only be added to a dataset in metres (and not one in
seconds).  But, a dataset in metres can be divided by a dataset in
seconds to give a dataset in metres/second.  This additional metadata
can both remove some opportunity for error in implementation, and
prove useful to the user.
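
To make the rule concrete, here is a minimal sketch in plain Java. It is
not January's API: the classes Unit and TypedData are hypothetical, and a
unit is modelled only as integer exponents of metres and seconds.

```java
// Hypothetical sketch, not January's API.
// A unit is modelled as integer exponents of metres and seconds.
final class Unit {
    static final Unit METRE = new Unit(1, 0);
    static final Unit SECOND = new Unit(0, 1);

    final int metres;
    final int seconds;

    Unit(int metres, int seconds) {
        this.metres = metres;
        this.seconds = seconds;
    }

    // Dividing quantities subtracts the exponents of their units.
    Unit dividedBy(Unit other) {
        return new Unit(metres - other.metres, seconds - other.seconds);
    }

    boolean sameAs(Unit other) {
        return metres == other.metres && seconds == other.seconds;
    }
}

// A block of doubles that carries its unit with it (hypothetical).
final class TypedData {
    final Unit unit;
    final double[] values;

    TypedData(Unit unit, double... values) {
        this.unit = unit;
        this.values = values.clone();
    }

    // Addition is only allowed between identical units.
    TypedData add(TypedData other) {
        if (!unit.sameAs(other.unit)) {
            throw new IllegalArgumentException("unit mismatch");
        }
        checkLength(other);
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = values[i] + other.values[i];
        }
        return new TypedData(unit, out);
    }

    // Division is always allowed; the result carries the derived unit.
    TypedData divide(TypedData other) {
        checkLength(other);
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = values[i] / other.values[i];
        }
        return new TypedData(unit.dividedBy(other.unit), out);
    }

    private void checkLength(TypedData other) {
        if (values.length != other.values.length) {
            throw new IllegalArgumentException("length mismatch");
        }
    }
}
```

With this, dividing a TypedData in Unit.METRE by one in Unit.SECOND gives
a result whose unit has exponents (1, -1), while adding the two throws.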

This investment of time seems of great value to the scientific
community: it is a ground rule of "good science" that the units of
measured data are always explicitly stated (doing so would also have
avoided the loss of a Mars Orbiter).

The data I will be working with is structured similarly to
weather-balloon ascent data. The balloon is released, and as it ascends it
captures (for example) measurements of altitude, temperature,
pressure, humidity.  In this dataset, altitude is a continuous
dimension that is used as an index to the other measurements.  These
four sets of measurements are captured and stored in a single dataset.

While January is capable of storing this as an array of 4 * n doubles,
it is only at the IDataset level that metadata (such as Units of
Measurement) can be applied.   So, only one Unit of Measurement can be
specified - not the 4 actually in use (distance, temperature,
pressure, humidity).

The alternative is to store 4 separate datasets, each with correct UoM
metadata.  But, this devalues the data structure - since the
temperature dataset can only be exploited in conjunction with the
altitude dataset.  They're intrinsically linked, and the data can only
be extracted using the altitude index value.
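
The structure I have in mind looks roughly like the sketch below. Again
this is plain Java and not January's API: the class AscentTable, its
method names and the unit labels are all hypothetical, and the lookup
assumes the altitude column is sorted and is queried with an exact
reading.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not January's API: several columns share one
// altitude index, and each column keeps its own unit label.
final class AscentTable {
    private final double[] altitude; // index column, assumed sorted ascending
    private final Map<String, double[]> columns = new LinkedHashMap<>();
    private final Map<String, String> units = new LinkedHashMap<>();

    AscentTable(double[] altitudeMetres) {
        this.altitude = altitudeMetres.clone();
        units.put("altitude", "m");
    }

    void addColumn(String name, String unit, double[] values) {
        if (values.length != altitude.length) {
            throw new IllegalArgumentException("column length must match index");
        }
        columns.put(name, values.clone());
        units.put(name, unit);
    }

    String unitOf(String name) {
        return units.get(name);
    }

    // Look up a measurement by altitude rather than by array position.
    double valueAt(String name, double altitudeMetres) {
        double[] column = columns.get(name);
        if (column == null) {
            throw new IllegalArgumentException("unknown column: " + name);
        }
        int row = Arrays.binarySearch(altitude, altitudeMetres);
        if (row < 0) {
            throw new IllegalArgumentException("no reading at that altitude");
        }
        return column[row];
    }
}
```

The point of the design is that the temperature column never travels
without the altitude index it depends on, yet each column still has its
own unit.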

I expect that elapsed time is probably the most common index dimension
across science in general.  This introduces a further complication when
the time is stored, as is common practice, as a long timestamp value
(millis since the epoch).  It certainly isn't possible to mix long and
double values in a January dataset.
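
One possible workaround, sketched below under the assumption that times
relative to the first sample are sufficient, is to subtract in long
arithmetic and only then convert to double seconds. The class TimeIndex
is hypothetical and not part of January; the absolute start time would
have to be kept separately.

```java
// Hypothetical helper, not part of January.
final class TimeIndex {
    // Converts epoch-millisecond timestamps into elapsed seconds since
    // the first sample. The subtraction is done on longs, so only the
    // final division by 1000.0 involves floating point.
    static double[] elapsedSeconds(long[] epochMillis) {
        double[] out = new double[epochMillis.length];
        for (int i = 0; i < epochMillis.length; i++) {
            out[i] = (epochMillis[i] - epochMillis[0]) / 1000.0;
        }
        return out;
    }
}
```

This yields a double index column that can sit alongside double
measurements, at the cost of no longer storing the raw timestamps.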

At last Autumn's meeting of the London Eclipse User Group, Mark Basham
(DLC) indicated that it was possible to integrate datasets of
disparate types, with one or more datasets designated as the "index"
that can allow measurements to be extracted.   But, in my exploration
with January so far I've only encountered use of traditional array
index values, and can't find a way to integrate multiple datasets.

Am I missing something here, guys?

cheers,
Ian

