At the Barcelona Supercomputing Center, scientists from different groups rely on a shared data archive curated by the Data and Diagnostics Team (DDT, composed of around 20 members), which is in charge, among other activities, of downloading, standardizing, checking, and curating model and observational data from external sources, as well as of developing common tools for data analysis.
Most of the tools in these data pipelines are written in Python and use the community-standard NetCDF and GRIB formats.
The successful candidate will join the Data and Diagnostics Team to develop a new Python suite implementing a data pipeline to manage and update the BSC storage of reanalysis and seasonal forecast data from the ECMWF Climate Data Store (CDS, https://cds.climate.copernicus.eu/datasets).
The successful candidate, with the support of the department's data managers, will study different ways to access the CDS, efficiently download and format the data using different community libraries, and integrate this into a workflow based on the workflow manager developed in the department (Autosubmit, https://autosubmit.readthedocs.io/en/master/).
In a second phase, the work will consist of generalizing this tool to other data platforms (NMME and other reanalyses).
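By way of illustration, the sketch below shows a minimal CDS retrieval using the community `cdsapi` client, one of the libraries the candidate could explore. The dataset name, request keys, and output filename are placeholders chosen for the example, not the pipeline's actual configuration.

```python
import cdsapi

# Minimal CDS retrieval sketch. Requires a ~/.cdsapirc file with a valid
# CDS API key (see the CDS documentation). The dataset name and request
# keys below are illustrative placeholders, not actual pipeline settings.
client = cdsapi.Client()

request = {
    "product_type": "reanalysis",
    "variable": "2m_temperature",
    "year": "2023",
    "month": "01",
    "day": "01",
    "time": "12:00",
    "format": "netcdf",
}

# retrieve(dataset, request, target) downloads the selected fields to disk.
client.retrieve("reanalysis-era5-single-levels", request, "era5_t2m_20230101.nc")
```

In practice, the suite would parameterize the dataset, variables, and date ranges, and retrieval calls of this kind could be wrapped as jobs in an Autosubmit workflow.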
Key Duties
- Develop a Python suite to download and format climate data from the Climate Data Store and other similar platforms
- Evaluate the efficiency of the suite's different options
- Integrate the suite into a data pipeline workflow
- Develop an automatic data checker in the data pipeline (a minimal sketch is given below)
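As a rough illustration of the last duty, the sketch below checks a downloaded file for expected variables, coordinates, and fully-empty fields. It assumes the data are stored as NetCDF files readable with xarray; the variable and coordinate names are placeholders, not the archive's actual conventions.

```python
import xarray as xr

# Sketch of an automatic data check on a downloaded NetCDF file.
# Expected variable and coordinate names are illustrative placeholders.
EXPECTED_VARS = {"t2m"}
EXPECTED_COORDS = {"time", "latitude", "longitude"}

def check_file(path: str) -> list[str]:
    """Return a list of problems found in the file (empty list means OK)."""
    problems = []
    ds = xr.open_dataset(path)

    missing_vars = EXPECTED_VARS - set(ds.data_vars)
    if missing_vars:
        problems.append(f"missing variables: {sorted(missing_vars)}")

    missing_coords = EXPECTED_COORDS - set(ds.coords)
    if missing_coords:
        problems.append(f"missing coordinates: {sorted(missing_coords)}")

    # Flag fully-empty fields, which usually indicate a failed download.
    for name in EXPECTED_VARS & set(ds.data_vars):
        if ds[name].isnull().all():
            problems.append(f"variable {name} contains only missing values")

    ds.close()
    return problems

if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        issues = check_file(path)
        print(f"{path}: {'OK' if not issues else '; '.join(issues)}")
```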
Closing date: Saturday, 14 December 2024