peakweather.dataset

class PeakWeatherDataset(root: str | None = None, pad_missing_values: bool = True, years: int | Sequence[int] | None = None, parameters: str | Sequence[str] | None = None, extended_topo_vars: str | Sequence[str] | None = 'none', extended_nwp_pars: str | Sequence[str] | None = 'none', imputation_method: Literal['locf', 'zero', None] = 'zero', interpolation_method: str = 'nearest', freq: str | None = None, compute_uv: bool = False, station_type: Literal['rain_gauge', 'meteo_station'] | None = None, aggregation_methods: dict[str, str] | None = None)

Bases: object

PeakWeather is a high-quality meteorological dataset derived from SwissMetNet, the automated measurement network operated by MeteoSwiss. It offers a robust resource for research and applications in spatiotemporal modeling.

PeakWeather includes high-frequency meteorological observations recorded every 10 minutes, collected from 302 ground stations distributed across Switzerland, covering the period from January 1st, 2017 to October 13th, 2025. The dataset also provides high-resolution topographic features at 50-meter resolution and ensemble forecasts from the ICON-CH1-EPS operational numerical weather prediction (NWP) model. The dataset is described in more detail in “PeakWeather: MeteoSwiss Weather Station Measurements for Spatiotemporal Deep Learning” (Zambon et al., 2025).

This class loads the PeakWeather dataset and provides utilities for accessing, preprocessing, and integrating the data into machine learning workflows.

Dataset size:

Time steps: 461952
Stations: 302
Channels: 8
Sampling interval: 10 minutes
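As a quick consistency check of the figures above, the sketch below converts the number of time steps into the covered period. It assumes that the first step is at 00:00 UTC on January 1st, 2017 and that the time index has no gaps; under those assumptions the last step falls on October 13th, 2025, matching the period stated above.

```python
from datetime import datetime, timedelta, timezone

# Figures taken from the "Dataset size" list above.
TIME_STEPS = 461_952
STEP = timedelta(minutes=10)
# Assumption: the first time step is at 00:00 UTC on 2017-01-01.
START = datetime(2017, 1, 1, tzinfo=timezone.utc)

# Timestamp of the last 10-minute step (the first step is START itself).
last_step = START + (TIME_STEPS - 1) * STEP

# Total span in days (144 ten-minute steps per day).
span_days = TIME_STEPS * STEP / timedelta(days=1)

print(last_step.isoformat())  # 2025-10-13T23:50:00+00:00
print(span_days)              # 3208.0
```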

Channels:

wind_direction: Wind direction (degree). Ten minutes mean.
wind_speed: Wind speed scalar (meter/second). Ten minutes mean.
wind_gust: Gust peak (meter/second). Maximum recorded over ten minutes.
pressure: Atmospheric pressure at barometric altitude (QFE) (hectopascal). Instant value.
precipitation: Precipitation (millimeter). Ten minutes total.
sunshine: Sunshine duration (minute). Ten minutes total.
temperature: Air temperature 2 m above ground (degree Celsius). Instant value.
humidity: Relative air humidity 2 m above ground (per cent). Instant value.

Static attributes:

stations_table: Information associated with the stations, including name, type, latitude, longitude, height, and topographical descriptors.
installation_table: Information about stations’ installation.
parameters_table: Descriptions of the measured quantities.

Parameters:

root (str, optional) – The root directory where the dataset is stored. If None, the dataset is stored in the current working directory. (default: None)
pad_missing_values (bool, optional) – If True, pad missing parameter values with NaN values. Padding missing data is recommended when working with arrays and tensors. (default: True)
years (int or list of int, optional) – The years to include in the dataset. If None, all available years are included. (default: None)
parameters (str or list of str, optional) – The parameters to include in the dataset. If None, all available parameters are included. Otherwise, the dataset will include only the requested parameters that are available. The stored dataframe will contain parameter columns sorted alphabetically, regardless of the order provided in parameters. (default: None)
extended_topo_vars (str or list of str, optional) – The topography variables to include in the dataset. If None or "none", no topography variables are included. Use "all" to include all available variables (see available_topography). (default: "none")
extended_nwp_pars (str or list of str, optional) – The NWP (ICON-CH1-EPS) parameters to include in the dataset. If None or "none", no NWP parameters are included. (default: "none")
imputation_method (str, optional) – The method to use for imputing missing values. Options are “locf” (last observation carried forward), “zero” (fill with zero), or None (no imputation). (default: "zero")
interpolation_method (str, optional) – The method to use for interpolating topography variables. Options are “linear”, “nearest”, “quadratic”, “cubic”, “barycentric”, “krogh”, “akima”, or “makima”. (default: "nearest")
freq (str, optional) – Resample frequency (e.g., “h” for hourly). If None, no resampling is applied. (default: None)
compute_uv (bool, optional) – Whether the u-v components of the wind should be computed and included in the dataset. (default: False)
station_type (str, optional) – The type of stations to consider, either meteorological stations or rain gauges. If not specified, all stations are included. (default: None)
aggregation_methods (dict, optional) – If given, allows specifying non-default aggregation strategies for selected parameters. The dictionary must map a parameter name to one of "mean", "max", "sum", "sum_straight", "last", "circ_mean", or "dir_from_uv". The difference between "sum_straight" and "sum" is that the former sums the available values in the resampling window, while the latter handles missing observations by implicitly imputing them with the mean of the available values. The aggregations "circ_mean" and "dir_from_uv" are intended for the parameter ‘wind_direction’: "circ_mean" computes the circular mean of the wind direction, while "dir_from_uv" computes wind direction from the direction of the aggregated u-v wind components. (default: None)
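The aggregation strategies can be illustrated on a single resampling window. The following sketch is not the library's implementation: it applies plain NumPy to hypothetical values, with NaN marking missing observations, and follows the descriptions of "sum", "sum_straight", and "circ_mean" given above.

```python
import numpy as np

# One hypothetical hourly window of six 10-minute precipitation values (mm).
# NaN marks missing observations.
window = np.array([0.2, np.nan, 0.4, 0.0, np.nan, 0.6])

# "sum_straight": add up only the values that are present.
sum_straight = np.nansum(window)

# "sum": missing steps are implicitly imputed with the mean of the
# available values, i.e. mean(available) * number of steps in the window.
sum_imputed = np.nanmean(window) * window.size

# "circ_mean" for wind direction (degrees): average unit vectors instead of
# raw angles, so that 350 and 10 degrees average to about 0, not 180.
directions = np.array([350.0, 10.0])
rad = np.deg2rad(directions)
circ_mean = np.rad2deg(np.arctan2(np.sin(rad).mean(), np.cos(rad).mean())) % 360.0

print(sum_straight, sum_imputed, circ_mean)
```

With two of six steps missing, "sum_straight" gives 1.2 mm while "sum" scales the mean of the four available values (0.3 mm) up to the full window, giving 1.8 mm.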

align_windows(obs: Windows, nwp: xr.Dataset, drop_extra_y_pars: bool, as_xarray: bool) → Tuple[Windows, xr.Dataset | ndarray]

Align observation windows with NWP forecast windows along the time axis.

Parameters:

obs (Windows) – Observation windows to align.
nwp (xarray.Dataset) – NWP forecast windows to align.
drop_extra_y_pars (bool) – If True, drop observation parameters that are not present in the NWP data. If False, keep all observation parameters and store the mapping from NWP to observation parameters in obs.nwp_to_y.
as_xarray (bool) – If True, return the aligned data as xarray.Dataset. If False, return as numpy.ndarray.

Returns:

Aligned observation windows and NWP forecast windows.

Return type:

Tuple[Windows, Union[xarray.Dataset, np.ndarray]]
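To illustrate what alignment on the NWP reference times can mean, the sketch below keeps only the observation windows whose forecast horizon starts at a 3-hourly ICON-CH1-EPS reference time (00, 03, ..., 21 UTC). The helper is_nwp_reference_time and the timestamps are hypothetical, and the assumption that the first horizon step coincides with the reference time is ours; align_windows may use a different offset.

```python
from datetime import datetime, timedelta, timezone


def is_nwp_reference_time(ts: datetime) -> bool:
    """Hypothetical helper: True if ts is a 3-hourly reference time (00, 03, ..., 21 UTC)."""
    ts = ts.astimezone(timezone.utc)
    return ts.minute == 0 and ts.second == 0 and ts.microsecond == 0 and ts.hour % 3 == 0


# Hypothetical horizon start times of consecutive windows, 10 minutes apart.
start = datetime(2024, 10, 1, 2, 50, tzinfo=timezone.utc)
horizon_starts = [start + i * timedelta(minutes=10) for i in range(24)]

# Indices of the windows whose horizon begins at a reference time.
aligned = [i for i, ts in enumerate(horizon_starts) if is_nwp_reference_time(ts)]
print(aligned)  # [1, 19] -> horizons starting at 03:00 and 06:00 UTC
```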

available_icon = {'humidity', 'precipitation', 'pressure', 'sunshine', 'temperature', 'wind_gust', 'wind_u', 'wind_v'}

available_parameters = {'humidity': 'ure200s0', 'precipitation': 'rre150z0', 'pressure': 'prestas0', 'sunshine': 'sre000z0', 'temperature': 'tre200s0', 'wind_direction': 'dkl010z0', 'wind_gust': 'fkl010z1', 'wind_speed': 'fkl010z0'}

available_topography = {'ASPECT_10000M_SIGRATIO1', 'ASPECT_2000M_SIGRATIO1', 'DEM', 'SLOPE_10000M_SIGRATIO1', 'SLOPE_2000M_SIGRATIO1', 'SN_DERIVATIVE_10000M_SIGRATIO1', 'SN_DERIVATIVE_2000M_SIGRATIO1', 'STD_10000M', 'STD_2000M', 'TPI_10000M', 'TPI_2000M', 'WE_DERIVATIVE_10000M_SIGRATIO1', 'WE_DERIVATIVE_2000M_SIGRATIO1'}

available_years = {2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024, 2025}

base_url = 'https://huggingface.co/datasets/MeteoSwiss/PeakWeather/resolve/v0.2.0/data/'
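The parameters argument of the constructor keeps only the requested parameters that are available and sorts them alphabetically. A minimal sketch of that selection rule, using a copy of available_parameters; select_parameters is a hypothetical helper, not part of the package:

```python
from typing import Optional, Sequence, Union

# Copied from available_parameters: public name -> SwissMetNet parameter code.
AVAILABLE_PARAMETERS = {
    "humidity": "ure200s0",
    "precipitation": "rre150z0",
    "pressure": "prestas0",
    "sunshine": "sre000z0",
    "temperature": "tre200s0",
    "wind_direction": "dkl010z0",
    "wind_gust": "fkl010z1",
    "wind_speed": "fkl010z0",
}


def select_parameters(parameters: Optional[Union[str, Sequence[str]]] = None) -> list[str]:
    """Hypothetical helper: requested parameters that exist, sorted alphabetically."""
    if parameters is None:
        return sorted(AVAILABLE_PARAMETERS)
    if isinstance(parameters, str):
        parameters = [parameters]
    # Unknown names are dropped; the input order is not preserved.
    return sorted(p for p in set(parameters) if p in AVAILABLE_PARAMETERS)


print(select_parameters(["wind_speed", "foo", "humidity"]))  # ['humidity', 'wind_speed']
```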

download() → None

Download the dataset if it is not already present.

get_observation_windows(window_size: int, horizon_size: int, stations: str | List[str] | None = None, parameters: str | List[str] | None = None, first_date: str | Timestamp | None = None, last_date: str | Timestamp | None = None, split: Literal['train', 'test'] | None = None, as_xarray: bool = False) → Windows

Get sliding windows of observations and mask. See get_windows for more details.

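get_observations(stations: str | List[str] | None = None, parameters: str | List[str] | None = None, first_date: str | Timestamp | None = None, last_date: str | Timestamp | None = None, split: Literal['train', 'test'] | None = None, as_numpy: bool = False, return_mask: bool = False, copy: bool = False) → FrameArray | tuple

Get observations for the specified stations and parameters.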

The observations are returned as a pandas DataFrame or numpy array, depending on the value of as_numpy. If return_mask is set to True, a tuple of (observations, mask) is returned.

The observations are filtered based on the specified stations, parameters, and date range. If no filtering is applied, all observations are returned. The date range is inclusive of the start date and exclusive of the end date.

Parameters:

stations (str or list, optional) – Station IDs to filter. If None, all stations are used. If specified, the output preserves the given station order. (default: None)
parameters (str or list, optional) – Parameter IDs to filter. If None, all parameters are used. If specified, the output preserves the given parameter order. (default: None)
first_date (str or pd.Timestamp, optional) – Start date for filtering. If None, no temporal filtering is applied. (default: None)
last_date (str or pd.Timestamp, optional) – End date for filtering. If None, no temporal filtering is applied. (default: None)
split (Literal['train', 'test'], optional) – Predefined data split to load. If given, first_date and last_date must be None. (default: None)
as_numpy (bool, optional) – If True, return the observations as an ndarray instead of a DataFrame. The returned array may have missing values filled with NaNs if the dataset is initialized with pad_missing_values set to False. (default: False)
return_mask (bool, optional) – If True, return the mask as well. (default: False)
copy (bool, optional) – If True, return a copy of the data. (default: False)

Returns:

The observations as a DataFrame or ndarray (if as_numpy is set to True). If return_mask is set to True, a tuple of (observations, mask) is returned.

Return type:

FrameArray or tuple
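The date filter is half-open: first_date is included and last_date is excluded. The sketch below shows that rule with the standard library, together with a train/test split at test_set_start. The helper in_range is hypothetical, and the split rule (train before test_set_start, test from it onwards) is an assumption, not something stated on this page.

```python
from datetime import datetime, timedelta, timezone

# Value of test_set_start, as listed among the class attributes.
TEST_SET_START = datetime(2024, 10, 1, tzinfo=timezone.utc)


def in_range(ts, first_date=None, last_date=None):
    """Hypothetical half-open filter: first_date <= ts < last_date; None disables a bound."""
    if first_date is not None and ts < first_date:
        return False
    if last_date is not None and ts >= last_date:
        return False
    return True


# Four 10-minute stamps around the split point: 23:40, 23:50, 00:00, 00:10.
stamps = [TEST_SET_START + k * timedelta(minutes=10) for k in range(-2, 2)]

# Assumed split rule: "train" ends just before test_set_start, "test" starts at it.
train = [ts for ts in stamps if in_range(ts, last_date=TEST_SET_START)]
test = [ts for ts in stamps if in_range(ts, first_date=TEST_SET_START)]
print(len(train), len(test))  # 2 2
```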

static get_uv_wind(wind_speed: ndarray, wind_direction: ndarray, direction_unit: Literal['deg', 'rad'] = 'deg') → Tuple[ndarray, ndarray]

Computes the u,v components of the wind given wind speed and direction.

The u component is the eastward component while v is the northward component.

Parameters:

wind_speed (np.ndarray) – The wind speed.
wind_direction (np.ndarray) – The wind direction, increasing clockwise where a northerly wind has 0 degrees.
direction_unit (Literal["deg", "rad"], optional) – Angle unit. (default: "deg")

Returns:

The tuple (u, v).

Return type:

Tuple[np.ndarray, np.ndarray]
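A minimal NumPy sketch of the u-v decomposition, assuming the standard meteorological convention implied above: the direction is where the wind comes from, measured clockwise from north. Under that convention a northerly wind blows towards the south and has a negative v. uv_from_speed_direction is a hypothetical stand-in; check the sign convention of get_uv_wind before mixing its output with these formulas.

```python
import numpy as np


def uv_from_speed_direction(wind_speed, wind_direction, direction_unit="deg"):
    """Hypothetical sketch: eastward (u) and northward (v) wind components.

    Assumes wind_direction is where the wind comes FROM, clockwise from north.
    """
    direction = np.asarray(wind_direction, dtype=float)
    theta = np.deg2rad(direction) if direction_unit == "deg" else direction
    speed = np.asarray(wind_speed, dtype=float)
    u = -speed * np.sin(theta)  # wind from the east (90 deg) blows westward: u < 0
    v = -speed * np.cos(theta)  # wind from the north (0 deg) blows southward: v < 0
    return u, v


# A 5 m/s northerly wind and a 5 m/s easterly wind.
u, v = uv_from_speed_direction(np.array([5.0, 5.0]), np.array([0.0, 90.0]))
print(u, v)  # u ~ [0, -5], v ~ [-5, 0] (up to floating-point rounding)
```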

static get_wind_direction(u: ndarray, v: ndarray) → ndarray

Given the u and v components, get the wind direction.

Parameters:

u (np.ndarray) – The eastward wind component.
v (np.ndarray) – The northward wind component.

Returns:

The wind direction.

Return type:

np.ndarray
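The inverse operation can be sketched with arctan2 under the same assumed convention: direction in degrees, measured clockwise from north, pointing to where the wind comes from. direction_from_uv is a hypothetical helper; the expected directions in the comments follow from that convention.

```python
import numpy as np


def direction_from_uv(u, v):
    """Hypothetical sketch: meteorological wind direction in degrees, in [0, 360)."""
    # Negate both components to turn "blowing towards" into "coming from".
    return np.rad2deg(np.arctan2(-np.asarray(u, dtype=float), -np.asarray(v, dtype=float))) % 360.0


# 1 m/s winds blowing towards the south, west, north and east.
u = np.array([0.0, -1.0, 0.0, 1.0])
v = np.array([-1.0, 0.0, 1.0, 0.0])
directions = direction_from_uv(u, v)
print(directions)  # approximately [0, 90, 180, 270]: from N, E, S, W
```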

static get_wind_speed(u: ndarray, v: ndarray) → ndarray

Given the u and v components, get the wind speed.

Parameters:

u (np.ndarray) – The eastward wind component.
v (np.ndarray) – The northward wind component.

Returns:

The wind speed.

Return type:

np.ndarray
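Wind speed is the magnitude of the (u, v) vector. The sketch below, with a hypothetical helper speed_from_uv, also shows why aggregating u-v components (as the "dir_from_uv" strategy does for direction) is not interchangeable with averaging scalar speeds: opposite winds cancel in the vector mean but not in the scalar mean.

```python
import numpy as np


def speed_from_uv(u, v):
    """Hypothetical helper: wind speed as the Euclidean norm of (u, v)."""
    return np.hypot(u, v)


# Two hypothetical observations: 4 m/s towards the east, then 4 m/s towards the west.
u = np.array([4.0, -4.0])
v = np.array([0.0, 0.0])

scalar_mean = speed_from_uv(u, v).mean()          # mean of the speeds
vector_mean = speed_from_uv(u.mean(), v.mean())   # speed of the mean vector
print(scalar_mean, vector_mean)  # 4.0 0.0
```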

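get_windows(window_size: int, horizon_size: int, stations: str | List[str] | None = None, parameters: str | List[str] | None = None, first_date: str | Timestamp | None = None, last_date: str | Timestamp | None = None, split: Literal['train', 'test', 'nwp_test'] | None = None, nwp_parameters: str | List[str] | None = None, drop_extra_y_pars: bool = True, as_xarray: bool = False) → Windows

Get sliding windows of observations, the mask of missing values and, when requested, NWP forecasts. Each input window has shape (window_size, num_stations, num_channels) and each target window has shape (horizon_size, num_stations, num_channels). The NWP forecasts have shape (horizon_size, num_stations, num_channels, num_ensemble_members).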

Parameters:

window_size (int) – Size of the input window.
horizon_size (int) – Size of the output horizon.
stations (str or list, optional) – Station IDs to filter. If None, all stations are used. If specified, the output preserves the given station order. (default: None)
parameters (str or list, optional) – Parameter IDs to filter. If None, all parameters are used. If specified, the output preserves the given parameter order. (default: None)
first_date (str or pd.Timestamp, optional) – Start date for filtering. If None, no temporal filtering is applied. (default: None)
last_date (str or pd.Timestamp, optional) – End date for filtering. If None, no temporal filtering is applied. (default: None)
split (Literal['train', 'test', 'nwp_test'], optional) – Predefined data split to load. If given, first_date and last_date must be None. (default: None)
nwp_parameters (str or list, optional) – The NWP parameters to return. If None, no NWP data is returned. If split is ‘nwp_test’, at least one NWP parameter must be specified. The windows are then aligned on the reference forecasting times available from the NWP model ICON-CH1-EPS, i.e., every 3 hours starting at midnight UTC. If a list is given, all listed parameters are loaded. The NWP data is then available in Windows.nwp. If no split is specified but nwp_parameters is not None, the windows are nonetheless aligned with the first available timestamp of the NWP data. (default: None)
drop_extra_y_pars (bool, optional) – If False, the parameters returned in y are selected according to parameters, as for x. If True, only the features corresponding to nwp_parameters are returned. (default: True)
as_xarray (bool, optional) – If True, return the Windows with x, mask_x, y and mask_y as a Dataset instead of an ndarray. The returned data may have missing values filled with NaNs if the dataset is initialized with pad_missing_values set to False. (default: False)

Returns:

A windowed view of observations and NWP forecasts.

Return type:

Windows
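The window shapes described above can be reproduced on a toy array. This sketch assumes that consecutive windows advance by one time step and that each target window immediately follows its input window; make_windows is hypothetical and uses numpy.lib.stride_tricks.sliding_window_view (NumPy 1.20 or later), which returns read-only views rather than copies.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(data, window_size, horizon_size):
    """Hypothetical sketch: split (time, stations, channels) data into windows.

    Returns x of shape (num_windows, window_size, stations, channels) and
    y of shape (num_windows, horizon_size, stations, channels), where y[i]
    starts right after x[i] ends.
    """
    t = data.shape[0]
    # sliding_window_view appends the window axis last: (n, stations, channels, w).
    x = sliding_window_view(data[: t - horizon_size], window_size, axis=0)
    y = sliding_window_view(data[window_size:], horizon_size, axis=0)
    # Move the window axis right after the window-index axis.
    return np.moveaxis(x, -1, 1), np.moveaxis(y, -1, 1)


# Toy data: 10 time steps, 3 stations, 2 channels.
data = np.arange(10 * 3 * 2, dtype=float).reshape(10, 3, 2)
x, y = make_windows(data, window_size=4, horizon_size=2)
print(x.shape, y.shape)  # (5, 4, 3, 2) (5, 2, 3, 2)
```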

had_values_before(cutoff_time: str | Timestamp) → Series

Return a binary mask indicating whether each station measured a parameter before the given cutoff time.

Parameters:

cutoff_time (str or pd.Timestamp) – Cutoff timestamp in UTC.

Returns:

Binary (Boolean) series with a MultiIndex (station, parameter).

Return type:

pd.Series
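A pandas sketch of the same idea, applied to a hypothetical observation mask with (station, parameter) columns. It treats "before" as strictly earlier than the cutoff and localizes a naive cutoff to UTC; both are assumptions about had_values_before, whose exact boundary handling is not documented here.

```python
import pandas as pd

# Hypothetical daily observation mask: True where a value was recorded.
index = pd.date_range("2017-01-01", periods=4, freq="D", tz="UTC")
columns = pd.MultiIndex.from_product(
    [["STA1", "STA2"], ["temperature", "wind_speed"]], names=["station", "parameter"]
)
mask = pd.DataFrame(False, index=index, columns=columns)
mask.loc[index[1]:, ("STA1", "temperature")] = True  # starts on day 2
mask.loc[index[3]:, ("STA2", "wind_speed")] = True   # starts on day 4


def had_values_before_sketch(mask: pd.DataFrame, cutoff_time) -> pd.Series:
    """Hypothetical sketch: did each (station, parameter) record anything before cutoff?"""
    cutoff = pd.Timestamp(cutoff_time)
    if cutoff.tzinfo is None:
        cutoff = cutoff.tz_localize("UTC")  # interpret a naive cutoff as UTC
    # Rows strictly before the cutoff, then "any observation" per column.
    return mask.loc[mask.index < cutoff].any(axis=0)


result = had_values_before_sketch(mask, "2017-01-03")
print(result)
```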

interpolate_topography(topographic_params: dict, stations_table: DataFrame) → DataFrame | None

Interpolate topography variables to station locations.

It requires the xarray package to be installed.

Parameters:

topographic_params (dict) – A dictionary containing the topography data for each variable.
stations_table (pd.DataFrame) – A DataFrame containing the station metadata.

Returns:

A pandas.DataFrame containing the interpolated topography data for each station. If no topography variables are provided, returns None.

Return type:

Optional[pd.DataFrame]
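With the default interpolation_method="nearest", each station takes the value of the closest grid cell. The NumPy sketch below illustrates that on a hypothetical 50 m grid in projected coordinates; nearest_on_grid is not the library's code, which requires xarray. In this sketch, stations outside the grid snap to the nearest edge cell.

```python
import numpy as np


def nearest_on_grid(field, grid_x, grid_y, station_x, station_y):
    """Hypothetical sketch of nearest-neighbour sampling of a gridded field.

    field has shape (len(grid_y), len(grid_x)); grid_x and grid_y are 1-D
    cell-centre coordinates in the same units as the station coordinates.
    """
    # For each station, the index of the closest column and row.
    ix = np.abs(grid_x[None, :] - np.asarray(station_x)[:, None]).argmin(axis=1)
    iy = np.abs(grid_y[None, :] - np.asarray(station_y)[:, None]).argmin(axis=1)
    return field[iy, ix]


# Hypothetical 50 m grid (10 columns x 6 rows) with a simple elevation ramp in metres.
grid_x = np.arange(0.0, 500.0, 50.0)
grid_y = np.arange(0.0, 300.0, 50.0)
field = grid_y[:, None] + 2.0 * grid_x[None, :]

values = nearest_on_grid(field, grid_x, grid_y, station_x=[120.0, 480.0], station_y=[30.0, 260.0])
print(values)  # [ 250. 1150.]
```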

load(aggregation_methods: dict[str, str] | None = None)

Load the dataset.

This method downloads the dataset if it is not already present and loads the data into memory. The data is returned as a tuple containing the observations, mask, and static tables.

The observations are resampled to the specified frequency and missing values are imputed using the specified method.

The topography data is interpolated to the station locations using the specified interpolation method.

Parameters:

aggregation_methods (dict, optional) – If given, applies different aggregation strategies for the specified parameters. (default: None)
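The imputation_method options can be sketched on a single series. impute is a hypothetical helper; in particular, leaving leading gaps as NaN under "locf" (there is no earlier observation to carry forward) and returning NaN at missing positions for None are assumptions of this sketch, not documented behavior of load.

```python
import numpy as np


def impute(values, mask, method):
    """Hypothetical sketch of "locf", "zero" and None on a 1-D series.

    mask is True where a value was observed.
    """
    out = np.where(mask, np.asarray(values, dtype=float), np.nan)
    if method == "zero":
        return np.where(mask, out, 0.0)
    if method == "locf":
        # Index of the most recent observed step at or before each position (-1 if none).
        idx = np.maximum.accumulate(np.where(mask, np.arange(out.size), -1))
        filled = out[np.clip(idx, 0, None)]
        return np.where(idx >= 0, filled, np.nan)
    if method is None:
        return out  # no imputation: missing positions stay NaN
    raise ValueError(f"unknown imputation method: {method!r}")


vals = np.array([1.0, 2.0, 0.0, 0.0, 5.0])
obs = np.array([True, True, False, False, True])
print(impute(vals, obs, "zero"))  # [1. 2. 0. 0. 5.]
print(impute(vals, obs, "locf"))  # [1. 2. 2. 2. 5.]
```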

load_icon_data(icon_parameter: str | Sequence[str]) → xr.Dataset

Returns an xarray.Dataset with the requested parameters.

Parameters:

icon_parameter (str or list of str) – The ICON parameter(s) to load. Each must be one of self.available_icon.

Raises:

ValueError – If the corresponding Zarr store is not available.

Returns:

The dataset with the ICON forecasts.

Return type:

xarray.Dataset

load_raw(aggregation_methods: dict[str, str] | None = None)

Load the raw dataset.

This method downloads the dataset if it is not already present and loads the data into memory. The data is returned as a tuple containing the observations, static tables, and optional topography data.

load_topography() → dict

Load the topography data.

This method loads the topography data specified in extended_topo_vars. The data is returned as a dictionary containing the topography data as xarray.Dataset objects, indexed by their variable names with the prefix topo_ stripped.

property missing_values: Series

Missing values for each parameter, considering stations equipped with the necessary sensor.
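One way to read "considering stations equipped with the necessary sensor" is that stations lacking a sensor are excluded from the count entirely. The NumPy sketch below computes missing fractions per channel under that reading; missing_fraction and the toy arrays are hypothetical, and the property may report counts rather than fractions.

```python
import numpy as np


def missing_fraction(mask, has_sensor):
    """Hypothetical sketch: share of missing values per channel.

    mask: bool array (time, stations, channels), True = observed.
    has_sensor: bool array (stations, channels), True = station has that sensor.
    """
    expected = has_sensor[None, :, :]  # broadcast over the time axis
    missing = (~mask & expected).sum(axis=(0, 1))
    total = mask.shape[0] * has_sensor.sum(axis=0)
    return missing / total


# 4 time steps, 2 stations, 2 channels; station 2 has no sensor for channel 2.
mask = np.ones((4, 2, 2), dtype=bool)
mask[0, 0, 1] = False    # station 1, channel 2: one gap
mask[:3, 1, 0] = False   # station 2, channel 1: three gaps
mask[:, 1, 1] = False    # station 2, channel 2: never observed (no sensor)
has_sensor = np.array([[True, True], [True, False]])

print(missing_fraction(mask, has_sensor))  # [0.375 0.25 ]
```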

np_windows_as_xr(windows: Windows, stations: str | List[str] | None, parameters: str | List[str] | None) → Windows

Convert numpy sliding windows to xarray Dataset.

Parameters:

windows (Windows) – Sliding windows in numpy format.
stations (list or str, optional) – Station IDs to filter. If None, all stations are used.
parameters (list or str, optional) – Parameter IDs to filter. If None, all parameters are used.

Returns:

Sliding windows in xarray.Dataset format.

Return type:

Windows

property num_parameters: int

Number of parameters in the dataset.

property num_stations: int

Number of stations in the dataset.

property num_time_steps: int

Number of time steps in the dataset.

property parameters: Index

Parameters measured by the stations in the dataset.

property required_file_names: Mapping[str, str]

The relative filepaths that must be present in order to skip downloading.

property required_files_paths: Mapping[str, str]

The absolute filepaths that must be present in order to skip downloading.

resample(df_observations: DataFrame, df_parameters: DataFrame) → DataFrame

Resample observations to the frequency specified by freq.

Parameters:

df_observations (pd.DataFrame) – Observations.
df_parameters (pd.DataFrame) – Parameter metadata including aggregation strategies.

Returns:

Resampled observations.

Return type:

pd.DataFrame
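Resampling with per-parameter aggregations maps naturally onto pandas. The sketch below aggregates hypothetical 10-minute data to hourly values with a mapping that mirrors the channel descriptions (means, totals, maxima); it relies on pandas' default left-closed, left-labelled bins, which may differ from the conventions resample actually uses. The frequency is written as "60min", an equivalent spelling of hourly.

```python
import numpy as np
import pandas as pd

# Two hours of hypothetical 10-minute observations for a single station.
index = pd.date_range("2024-01-01 00:00", periods=12, freq="10min", tz="UTC")
df = pd.DataFrame(
    {
        "temperature": np.linspace(0.0, 1.1, 12),  # instant values -> mean
        "precipitation": [0.1] * 12,               # 10-minute totals -> sum
        "wind_gust": [3.0] * 11 + [9.0],           # 10-minute maxima -> max
    },
    index=index,
)

# One aggregation per parameter, in the spirit of aggregation_methods.
hourly = df.resample("60min").agg(
    {"temperature": "mean", "precipitation": "sum", "wind_gust": "max"}
)
print(hourly)
```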

property root_dir: str

The root directory where the dataset is stored.

show_parameters_description()

Show description of the parameters in the dataset.

property stations: Index

IDs of stations in the dataset.

test_set_start = Timestamp('2024-10-01 00:00:00+0000', tz='UTC')

property urls: Mapping[str, str]

The URLs of the files to download.

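class Windows(x: DataFrame | ndarray | xr.Dataset, mask_x: DataFrame | ndarray | xr.Dataset, y: DataFrame | ndarray | xr.Dataset, mask_y: DataFrame | ndarray | xr.Dataset, index_x: DatetimeIndex | ndarray | None = None, index_y: DatetimeIndex | ndarray | None = None, nwp: ndarray | xr.Dataset | None = None, nwp_to_y: Sequence[int] | None = None)

Bases: object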

Windows is a data class containing sliding-window data as numpy.ndarray, pandas.DataFrame, or xarray.Dataset.

Parameters:

x (pandas.DataFrame | numpy.ndarray | xarray.Dataset) – Input data for the look-back window.
mask_x (pandas.DataFrame | numpy.ndarray | xarray.Dataset) – Boolean mask for x indicating valid or missing values (True = observed, False = missing).
y (pandas.DataFrame | numpy.ndarray | xarray.Dataset) – Target data for the forecast horizon window.
mask_y (pandas.DataFrame | numpy.ndarray | xarray.Dataset) – Boolean mask for y indicating valid or missing values (True = observed, False = missing).
index_x (pandas.DatetimeIndex | numpy.ndarray, optional) – Timestamps or indices corresponding to x. (default: None)
index_y (pandas.DatetimeIndex | numpy.ndarray, optional) – Timestamps or indices corresponding to y. (default: None)
nwp (numpy.ndarray | xarray.Dataset, optional) – Numerical Weather Prediction (ICON-CH1-EPS) data associated with the target horizon. (default: None)
nwp_to_y (Sequence[int], optional) – Mapping from NWP parameters to target parameters in y. (default: None)

index_x: DatetimeIndex | ndarray | None = None

index_y: DatetimeIndex | ndarray | None = None

mask_x: DataFrame | ndarray | xr.Dataset

mask_y: DataFrame | ndarray | xr.Dataset

nwp: ndarray | xr.Dataset | None = None

nwp_to_y: Sequence[int] | None = None

x: DataFrame | ndarray | xr.Dataset

y: DataFrame | ndarray | xr.Dataset
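The nwp_to_y field can be used to compare forecasts with targets. The sketch below assumes that nwp_to_y[i] is the index in y of the i-th NWP parameter, and uses random arrays shaped as described for get_windows (a single window, without a batch dimension). Parameter names and shapes are illustrative only.

```python
import numpy as np

# Assumed single-window shapes, following the get_windows description:
#   y:   (horizon_size, num_stations, num_channels)
#   nwp: (horizon_size, num_stations, num_nwp_channels, num_ensemble_members)
horizon, stations, members = 3, 2, 4
y_parameters = ["humidity", "precipitation", "temperature"]
nwp_parameters = ["temperature", "humidity"]

# Assumed meaning of nwp_to_y: for each NWP parameter, its channel index in y.
nwp_to_y = [y_parameters.index(p) for p in nwp_parameters]  # [2, 0]

rng = np.random.default_rng(0)
y = rng.normal(size=(horizon, stations, len(y_parameters)))
# Toy ensemble: matching targets plus small member-specific noise.
noise = rng.normal(scale=0.1, size=(horizon, stations, len(nwp_parameters), members))
nwp = y[..., nwp_to_y][..., None] + noise

# Ensemble-mean absolute error per NWP parameter, averaged over horizon and stations.
ens_mean = nwp.mean(axis=-1)
mae = np.abs(ens_mean - y[..., nwp_to_y]).mean(axis=(0, 1))
print(dict(zip(nwp_parameters, mae.round(3))))
```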