Use Case: Customization of environmental data

Coordination: Centre National de Recherches Météorologiques de Météo-France et du CNRS (CNRM)

Contact: Christophe Baehr

Multidisciplinary academic partners:

  • (Informatics) IRIT
  • (Ergonomics) CLLE and MSHS-T
  • (Meteorology) CNRM
  • (Environmental Sciences) OMP-GET

Nature of the data:

  • tabular data
  • sensor measurements
  • numerical data
  • environmental (meteorological) data

Scientific challenge:

  • customising access to datasets to make them accessible to users from another domain

From case study to project submission

This DataNoos case study focuses on the cross-fertilisation of data in the environmental sciences, and in particular on the difficulty of finding the right data, produced by different disciplines, to meet a specific need and specific uses. It thus contributes to defining the conditions necessary for datasets to be truly FAIR-compliant (Findable, Accessible, Interoperable and Reusable), including for people from disciplines other than data science.

Initial case study: Targeted weather report generation

The initial objective of this study was the personalisation of weather reports so that the reports are tailored to the user communities they are intended for.

Based on the data provided by the meteorological centre (weather station measurements, images, etc.) and presented in the form of maps, the forecaster has to take into account the needs of each user community to produce weather reports for those communities.

The study aimed to facilitate and assist this process. We planned to automate or facilitate the production of (verbal) data from weather data that a particular recipient community could understand spontaneously, according to its expectations. The chosen method was to define learning models that calculate correlations between data already produced manually: the inputs of the process (weather data provided to forecasters) and the outputs (weather reports targeted at particular communities such as sailors, civil protection, farmers, etc.).

Semantics4FAIR: semantic modelling of metadata in response to FAIR criteria

After initial contact with biology researchers studying pollen, and following the release of the ANR Flash 2019 Call for Projects supporting Open Science, we revised the objective to meet the needs of a non-meteorologist researcher who wants to retrieve weather data easily. This means improving the compliance of these datasets with the FAIR criteria. The submitted project, Semantics4FAIR, was selected by the ANR and started in January 2020.

To facilitate access to the data by non-specialists in meteorology, we proposed an interdisciplinary response combining computer scientists, researchers in human sciences (ergonomics), researchers producing the data (CNRM) and users of these data (CNRM and OMP). We chose an approach based on ontologies and formal vocabularies that unambiguously define the concepts, properties and entities needed to build rich and comprehensible metadata. The semantic representation allows reasoning about the data during search, and facilitates its alignment with other open data.
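
As a minimal illustration of how a formal vocabulary can support reasoning during search, the sketch below uses a tiny concept hierarchy so that a query for a broad concept also returns datasets annotated with narrower ones. The concept names, dataset identifiers and function names are all hypothetical and are not taken from the Semantics4FAIR ontology.

```python
# Hypothetical mini-vocabulary: each concept maps to its direct broader concept.
BROADER = {
    "Rainfall": "Precipitation",
    "Snowfall": "Precipitation",
    "Precipitation": "AtmosphericVariable",
    "AirTemperature": "AtmosphericVariable",
}

# Hypothetical dataset metadata: dataset identifier -> concept it measures.
DATASETS = {
    "station-hourly-rain": "Rainfall",
    "alpine-snow-depth": "Snowfall",
    "station-hourly-temp": "AirTemperature",
}


def ancestors(concept):
    """Return the concept followed by all of its broader concepts."""
    chain = [concept]
    while concept in BROADER:
        concept = BROADER[concept]
        chain.append(concept)
    return chain


def search(concept):
    """Find datasets measuring the queried concept or any narrower concept."""
    return sorted(
        ds for ds, measured in DATASETS.items() if concept in ancestors(measured)
    )
```

Under these assumptions, a non-specialist who searches for "Precipitation" would also be offered the rainfall and snowfall datasets, without needing to know the producer's exact terminology.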

The Semantics4FAIR ANR Flash project (2020 - 2022)

Adopting an ergonomic approach to work analysis, the project first seeks to understand why biology researchers cannot find the open Météo-France datasets that suit them. It aims to provide tools to relieve the bottleneck that exists between users and producers of data, as shown in the figure below.

Figure: The bottleneck between data producers and users

The next phase consisted of building ontologies accessible to users in order to describe the datasets better through semantic metadata. Formalising the metadata makes it possible to define homogeneous descriptions for all the datasets in a repository by defining forms (templates). It also makes it easier to define constraints and checks when this metadata is entered. When searching for datasets, it makes it possible to propose parameter values based on the datasets already described, and to filter the proposed values as certain parameters are chosen. Finally, the ontology's definitions and relationships are used to guide the development of description forms and searches by non-specialist users.
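
The mechanisms described above (a form with constraints checked at entry, then search values proposed from the datasets already described) can be sketched as follows. This is only an illustration under stated assumptions: the field names, allowed values and function names are hypothetical and do not reflect the project's actual templates or prototype.

```python
# Hypothetical description form: field name -> allowed values (None = free text).
TEMPLATE = {
    "title": None,
    "variable": {"temperature", "precipitation", "wind"},
    "frequency": {"hourly", "daily"},
}


def validate(record):
    """Return the list of constraint violations for one dataset description."""
    errors = []
    for field, allowed in TEMPLATE.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif allowed is not None and record[field] not in allowed:
            errors.append(f"invalid value for {field}: {record[field]}")
    return errors


def propose_values(repository, field, chosen):
    """Values of `field` found among the datasets that match the chosen filters."""
    matching = [
        r for r in repository if all(r.get(k) == v for k, v in chosen.items())
    ]
    return sorted({r[field] for r in matching})


# Hypothetical repository of datasets already described with the form.
REPOSITORY = [
    {"title": "A", "variable": "temperature", "frequency": "hourly"},
    {"title": "B", "variable": "temperature", "frequency": "daily"},
    {"title": "C", "variable": "wind", "frequency": "hourly"},
]
```

With no filter chosen, `propose_values(REPOSITORY, "frequency", {})` offers both frequencies; once the user selects `{"variable": "wind"}`, only the frequencies that actually occur for wind datasets remain.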

In a third step, two modules of a prototype repository of open meteorological datasets exploiting this ontology were implemented. The first module is used to define dataset description forms and then to describe datasets using these forms. Together, the described datasets constitute a repository. The second module is dedicated to searching for datasets within this repository.


The project website presents the project, its partners and its results.