Re: [january-dev] Typed datasets in January

Hi all,

I'm another developer here at Diamond, working with Peter and Matt. I use January pretty heavily and am interested in both the UoM work and the question of "datasets that only have meaning with other datasets".

One feature I rely on a lot is the Metadata interface.

All IDatasets can carry Metadata, which might be axes (time, energy...), units, masks... anything that gives the data meaning beyond being just an NDarray.

What makes this interface really useful is the @Sliceable annotation, which marks datasets held in the metadata so that they are sliced whenever the main dataset is sliced. A lot of the data processing we do with January involves taking a very large ILazyDataset, backed by HDF5 or a stack of TIFF files, setting axes on the different dimensions of the dataset, then iterating through slices of the data. With @Sliceable, the metadata datasets are cut to the matching size as each slice of the main dataset is taken, which saves the large amount of code we used to have to write to do this manually.

I'm not sure if this is useful for your case, but if you took the altitude dataset, made an AxesMetadata object, inserted pressure, temperature etc. as datasets in the AxesMetadata, then set this on the altitude dataset, every slice you took from altitude would carry the corresponding pressure and temperature values with it.
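
To make the idea concrete, here is a minimal plain-Java sketch of the behaviour described above. It deliberately does not use the January API: `LinkedSliceSketch`, `Profile`, `with` and `slice` are hypothetical names invented for this illustration, and the numbers are made-up sample values. It only shows the principle that slicing the primary (altitude) data also slices every companion dataset to the same range.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class LinkedSliceSketch {

    /** Hypothetical stand-in: a primary 1-D dataset plus companions sliced with it. */
    static final class Profile {
        final double[] primary;
        final Map<String, double[]> companions = new LinkedHashMap<>();

        Profile(double[] primary) {
            this.primary = primary;
        }

        /** Attach a companion dataset; it must match the primary's length. */
        Profile with(String name, double[] values) {
            if (values.length != primary.length) {
                throw new IllegalArgumentException(name + " must have " + primary.length + " values");
            }
            companions.put(name, values);
            return this;
        }

        /** Take [start, stop) of the primary and the same range of every companion. */
        Profile slice(int start, int stop) {
            if (start < 0 || stop > primary.length || start > stop) {
                throw new IndexOutOfBoundsException("Bad slice " + start + ":" + stop);
            }
            Profile out = new Profile(Arrays.copyOfRange(primary, start, stop));
            for (Map.Entry<String, double[]> e : companions.entrySet()) {
                out.companions.put(e.getKey(), Arrays.copyOfRange(e.getValue(), start, stop));
            }
            return out;
        }
    }

    public static void main(String[] args) {
        // Illustrative values only: altitude in metres with two companion series.
        Profile ascent = new Profile(new double[] {0, 500, 1000, 1500, 2000})
                .with("temperature", new double[] {15.0, 12.0, 9.0, 6.0, 3.0})
                .with("pressure", new double[] {1013, 955, 899, 846, 795});

        Profile part = ascent.slice(1, 3);
        System.out.println("altitude:    " + Arrays.toString(part.primary));
        System.out.println("temperature: " + Arrays.toString(part.companions.get("temperature")));
        System.out.println("pressure:    " + Arrays.toString(part.companions.get("pressure")));
    }
}
```

In January itself this bookkeeping is what the @Sliceable metadata handles for you; the sketch just spells out by hand what would otherwise have to be written manually.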

For units we have a very simple UnitMetadata object that allows us to associate units with a dataset (but not do much more). This is in DawnSci rather than January because it has extra dependencies, but it would be nice to improve the functionality in this area.

Thanks,

Jake

Dr Jacob Filik
Senior Software Scientist
Tel: +441235 77 8690
 
Diamond Light Source Ltd.
Diamond House
Harwell Science & Innovation Campus
Didcot
Oxfordshire
OX11 0DE



-----Original Message-----
From: january-dev-bounces@xxxxxxxxxxx [mailto:january-dev-bounces@xxxxxxxxxxx] On Behalf Of Ian Mayo
Sent: 30 January 2017 14:33
To: january-dev@xxxxxxxxxxx
Subject: [january-dev] Typed datasets in January

Hi all,
as discussed with @jonahkichwacoders at EclipseCon 2016, I've been investing some resources into prototyping the addition of Units of Measurement (UoM) to January.

This would enable a system to know that (for example) a dataset in metres can only be added to another dataset in metres (and not to one in seconds), but that a dataset in metres can be divided by a dataset in seconds to give a dataset in metres/second.  This additional metadata can both remove some opportunities for implementation error and prove useful to the user.
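
As a rough illustration of the rule above, here is a minimal plain-Java sketch. It is not the prototype and not a January or UoM library API: `UnitSketch`, `Unit` and `Quantity` are hypothetical names, and units are modelled only as integer exponents of length and time. Addition insists on matching units, while division subtracts exponents to derive the new unit.

```java
public class UnitSketch {

    /** Hypothetical unit: integer exponents of two base dimensions (length, time). */
    static final class Unit {
        final int length;
        final int time;

        Unit(int length, int time) {
            this.length = length;
            this.time = time;
        }

        /** Dividing units subtracts exponents, e.g. m / s gives m^1 s^-1. */
        Unit dividedBy(Unit other) {
            return new Unit(length - other.length, time - other.time);
        }

        boolean sameAs(Unit other) {
            return length == other.length && time == other.time;
        }

        @Override
        public String toString() {
            return "m^" + length + " s^" + time;
        }
    }

    static final Unit METRE = new Unit(1, 0);
    static final Unit SECOND = new Unit(0, 1);

    /** Hypothetical dataset stand-in: values tagged with a single unit. */
    static final class Quantity {
        final double[] values;
        final Unit unit;

        Quantity(double[] values, Unit unit) {
            this.values = values;
            this.unit = unit;
        }

        /** Element-wise addition, allowed only when the units match. */
        Quantity plus(Quantity other) {
            if (!unit.sameAs(other.unit)) {
                throw new IllegalArgumentException("Cannot add " + unit + " to " + other.unit);
            }
            checkSameLength(other);
            double[] out = new double[values.length];
            for (int i = 0; i < out.length; i++) {
                out[i] = values[i] + other.values[i];
            }
            return new Quantity(out, unit);
        }

        /** Element-wise division; the result carries the derived unit. */
        Quantity dividedBy(Quantity other) {
            checkSameLength(other);
            double[] out = new double[values.length];
            for (int i = 0; i < out.length; i++) {
                out[i] = values[i] / other.values[i];
            }
            return new Quantity(out, unit.dividedBy(other.unit));
        }

        private void checkSameLength(Quantity other) {
            if (values.length != other.values.length) {
                throw new IllegalArgumentException("Length mismatch");
            }
        }
    }

    public static void main(String[] args) {
        Quantity distance = new Quantity(new double[] {10, 20}, METRE);
        Quantity elapsed = new Quantity(new double[] {2, 4}, SECOND);

        Quantity speed = distance.dividedBy(elapsed);
        System.out.println("speed unit: " + speed.unit);

        try {
            distance.plus(elapsed); // metres + seconds must be rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A real implementation would need more base dimensions and scale factors (kilometres vs metres, for instance); the sketch only shows where the checks would sit.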

This investment of time seemed to be of great value to the scientific community: it would appear to be a ground rule of "good science" that the units of measured data are always explicitly stated (it would also have avoided the loss of the Mars Climate Orbiter).

The data I will be working with is structured similarly to weather-balloon ascent data. The weather balloon is released, and as it ascends it captures (for example) measurements of altitude, temperature, pressure and humidity.  In this dataset, altitude is a continuous dimension that is used as an index to the other measurements.  These four sets of measurements are captured and stored in a single dataset.

While January is capable of storing this as an array of 4 * n doubles, it is only at the IDataset level that metadata (such as Units of Measurement) can be applied.  So, only one Unit of Measurement can be specified - not the 4 actually in use (distance, temperature, pressure, humidity).

The alternative is to store 4 separate datasets, each with correct UoM metadata.  But this devalues the data structure, since the temperature dataset can only be exploited in conjunction with the altitude dataset.  They're intrinsically linked, and the data can only be extracted using the altitude index value.

I expect that elapsed time is probably the most common index dimension across science in general.  This introduces a further complication if the data follows the common practice of storing time as a long timestamp value (milliseconds since the epoch).  It certainly isn't possible to mix long and double values in a January dataset.

At last autumn's meeting of the London Eclipse User Group, Mark Basham (DLS) indicated that it was possible to integrate datasets of disparate types, with one or more datasets designated as the "index" that can allow measurements to be extracted.  But in my exploration of January so far I've only encountered the use of traditional array index values, and can't find a way to integrate multiple datasets.

Am I missing something here guys?

cheers,
Ian



