
Digital Representation of Units of Measure (DRUM)

Members of the Task Group

  • Robert Hanisch, NIST, USA (Chair)
  • Stuart Chalk, University of North Florida, USA (Secretary)
  • Simon Cox, CSIRO, Australia (Member and TG Liaison)
  • Steven Emmerson, UCAR, USA (Member)
  • Jeremy Frey, University of Southampton, UK (Member)
  • Joachim Meier, PTB, Germany (Member)

Liaisons, CODATA Executive Committee

  • Richard Hartshorn, University of Canterbury, NZ
  • Simon Hodson, CODATA Executive Director

Issue to be Addressed

The current state of the digital representation of units of measure(ment) for physical quantities (DRUM) across domains is a significant obstacle to data interoperability and needs to be addressed urgently. Working toward a solution for digital units that works both for machines (in repositories, databases, software, scripts) and for humans (websites, books, spreadsheets) is a fundamental component of the CODATA Decadal Programme on “Making data work for cross-domain grand challenges”. Note that, while this group focuses on units of physical quantities, we recognize that this discussion sits within a broader discussion of units (see footnote).

Across the scientific disciplines, there is wide variation in knowledge about, focus on, and care with recording a unit of measure for each piece of experimental, calculated, modeled, or derived data. Much guidance is available on annotating units for human readers; however, there is no authoritative source on how to represent and store units of measure (in any unit system) in digital systems. This is a fundamental problem for data science today and a major obstacle to the future integration of large, heterogeneous datasets both within and across disciplines. It is the single most important issue for the development of general and domain repositories, for machine learning/artificial intelligence (ML/AI) and open data, and for implementing systems that support Findable, Accessible, Interoperable, and Reusable (FAIR) data.

Significance and Importance

Every measurement that produces a numerical value requires a unit of measure to be recorded with, and associated with, that value. In the current research environment, where the paradigm is shifting toward digital publication of research data in openly accessible formats, researchers annotate a value with its unit by adding a string of characters alongside the number in a computer system (database, spreadsheet, text file, etc.). While researchers may well report units in a common unit system (e.g., the SI), the guidelines for formatting these strings (e.g., from BIPM, ISO, and NIST) are often not followed. As a result, the same unit may be written in many different ways, which makes normalizing units difficult and is a significant barrier to data interoperability.
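To illustrate what normalization involves, the Python sketch below reduces a few common spellings of the same unit ("m/s", "m s-1", "m.s^-1", "m·s⁻¹") to one canonical form: a sorted tuple of (symbol, exponent) pairs. It is a hypothetical example only, not a method proposed by the TG or the API of any existing unit library; the function name `normalize`, the tiny symbol table, and the accepted separators are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny symbol table for illustration only:
# SI prefixes (e.g. "ms", "km") and derived units (e.g. "N", "J") are NOT handled.
KNOWN_SYMBOLS = {"m", "s", "kg", "K", "mol", "A", "cd"}

# Map superscript characters to ASCII so that "s⁻¹" is read like "s-1".
SUPERSCRIPTS = str.maketrans("⁻⁰¹²³⁴⁵⁶⁷⁸⁹", "-0123456789")

# One factor: a symbol, optionally followed by an integer exponent ("s", "s2", "s-1", "s^-1").
FACTOR = re.compile(r"([A-Za-z]+)(?:\^?(-?\d+))?")

# Characters assumed here to separate factors in a product: whitespace, ".", "·", "⋅", "*".
SEPARATORS = re.compile(r"[\s.·⋅*]+")


def normalize(unit: str) -> tuple[tuple[str, int], ...]:
    """Reduce a simple unit string to a sorted tuple of (symbol, exponent) pairs."""
    parts = unit.translate(SUPERSCRIPTS).split("/")
    if len(parts) > 2:
        # A repeated solidus such as "m/s/s" is treated as ambiguous here rather than guessed at.
        raise ValueError(f"repeated solidus in {unit!r}")
    exponents: Counter[str] = Counter()
    # Factors before the solidus get a positive sign; factors after it get a negative sign.
    for sign, part in zip((1, -1), parts):
        for factor in SEPARATORS.split(part.strip()):
            if factor in ("", "1"):
                continue  # empty pieces and a bare "1" (as in "1/s") carry no unit
            match = FACTOR.fullmatch(factor)
            if match is None or match.group(1) not in KNOWN_SYMBOLS:
                raise ValueError(f"unrecognized factor {factor!r} in {unit!r}")
            exponents[match.group(1)] += sign * int(match.group(2) or 1)
    # Drop cancelled symbols and sort so that factor order does not matter.
    return tuple(sorted((sym, exp) for sym, exp in exponents.items() if exp != 0))


if __name__ == "__main__":
    for spelling in ["m/s", "m s-1", "m.s^-1", "m·s⁻¹"]:
        print(spelling, "->", normalize(spelling))  # each prints (('m', 1), ('s', -1))
```

Even this toy version has to make policy decisions (which separators are allowed, whether a repeated solidus is an error, which symbols are recognized), and real unit representation systems must also handle prefixes, derived units, and the case sensitivity of symbols. Those are exactly the choices on which shared guidelines are needed.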

Outputs of the Task Group

Across the International Scientific Unions (ISUs) and affiliated organizations, there is a wide range of expertise in this area, in addition to technical resources on units available online. This task group will focus on:

  • Collection of current unit representation approaches in the ISUs (survey)
  • Identification and aggregation of current unit representation systems
  • Analysis of current unit representation systems: usage, advantages, and disadvantages
  • Aggregation of activities where different disciplines adopted/implemented new units of measure representation systems (e.g., UDUnits -> Climatology, NERC Vocabulary Service -> Oceanography)
  • Collection of issues, pain points, and friction points in the use of unit representation systems
  • Collection of best practices for using unit representation systems

Outcomes of this Activity (Impact and Effect)

  • A better understanding of options and issues with use of unit of measure representations
  • Convergence on the representation of units of measure that will allow greater interoperability of data
  • Guidance for ISUs toward normalization of units within a discipline
  • Digital repository developers to provide services around units for data ingest/exposure
  • Research scientists to annotate data ‘at birth’ with an unambiguous unit
  • Software vendors and developers to use consistent and compatible units in scientific data file formats
  • Journal publishers to automate the checking of units in scientific papers and datasets
  • Educational publishers to systematically represent units in texts

Envisaged Outputs the TG Intends to Deliver

  • Recommendation of unit of measure representation systems for different use cases
  • Guidelines for the annotation of data with a unit of measure in digital systems
  • Best practices for adopting and implementing the recommended DRUM system
  • ‘Metrology 101’ educational resource introducing the core concepts of metrology
  • Units of Measure Interoperability Service – hosted by NIST

Key Activities that the TG will Conduct

The following are activities the TG intends to run, depending on available funding. However, many events require the participation of liaisons to each of the ISUs, who will have to fund their own attendance. As a result, the pace at which the TG delivers its outcomes may depend on attendance at TG meetings.

  • Monthly virtual meetings (VMs)
  • TG meeting at the biennial CODATA Conference

Footnote

The broader area concerns assigning values to characteristics or qualities of things in the world. This includes nominal values, classifications, and orderings, as well as quantitative measures that are scaled with units. ‘Scales’ for nominal values are sometimes called ‘controlled vocabularies’ or ‘code-lists’; ordinal scales include the geological timescale, in which ordering relationships are fixed even as temporal positions are continually adjusted through better characterization of the chronometric scale; and taxonomies (in the biological sense) are a kind of hierarchical classification.