PeakWeather Demo¶
[1]:
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import matplotlib
import peakweather
print(peakweather.__version__)
from peakweather import PeakWeatherDataset
0.2.2
Loading the data¶
To get the dataset, simply initialize a PeakWeatherDataset object. This will, by default, download the data in the current working directory, unless it was previously downloaded.
[2]:
dataset = PeakWeatherDataset()
stations.parquet: 57.3kB [00:00, 194kB/s]
installation.parquet: 90.1kB [00:00, 348kB/s]
parameters.parquet: 8.19kB [00:00, 34.5kB/s]
disclaimer.txt: 8.19kB [00:00, 36.4kB/s]
2017.parquet: 56.9MB [00:01, 41.3MB/s]
2018.parquet: 57.3MB [00:01, 39.9MB/s]
2019.parquet: 57.5MB [00:01, 40.5MB/s]
2020.parquet: 57.1MB [00:01, 41.2MB/s]
2021.parquet: 56.5MB [00:01, 39.7MB/s]
2022.parquet: 56.8MB [00:01, 40.7MB/s]
2023.parquet: 57.5MB [00:01, 38.6MB/s]
2024.parquet: 57.5MB [00:01, 41.0MB/s]
2025.parquet: 45.1MB [00:01, 38.6MB/s]
With get_observations you can load the timeseries data as a dataframe or array. It is possible to load specific spatial and temporal subsets and to obtain a binary mask that specifies the availability of each measurement.
[3]:
df, mask = dataset.get_observations(
stations=["ABO", "GRO", "KLO"], # list of stations
parameters=["temperature", "wind_speed", "humidity"], # list of weather parameters
first_date="2024-08-02 16:32",
last_date="2024-08-06 23:26",
return_mask=True,
)
df.head()
[3]:
| nat_abbr | ABO | GRO | KLO | ||||||
|---|---|---|---|---|---|---|---|---|---|
| name | temperature | wind_speed | humidity | temperature | wind_speed | humidity | temperature | wind_speed | humidity |
| datetime | |||||||||
| 2024-08-02 16:40:00+00:00 | 19.400000 | 2.0 | 67.000000 | 24.100000 | 0.8 | 72.500000 | 26.299999 | 2.8 | 50.000000 |
| 2024-08-02 16:50:00+00:00 | 19.600000 | 2.7 | 67.300003 | 23.100000 | 0.9 | 79.099998 | 25.500000 | 3.2 | 56.200001 |
| 2024-08-02 17:00:00+00:00 | 19.200001 | 2.0 | 71.199997 | 22.900000 | 1.4 | 80.800003 | 25.400000 | 2.7 | 55.799999 |
| 2024-08-02 17:10:00+00:00 | 19.000000 | 0.9 | 73.900002 | 23.000000 | 2.3 | 78.300003 | 25.200001 | 2.7 | 57.000000 |
| 2024-08-02 17:20:00+00:00 | 19.100000 | 1.5 | 71.099998 | 22.799999 | 2.3 | 78.199997 | 25.000000 | 2.3 | 58.000000 |
[4]:
mask.head()
[4]:
| nat_abbr | ABO | GRO | KLO | ||||||
|---|---|---|---|---|---|---|---|---|---|
| name | temperature | wind_speed | humidity | temperature | wind_speed | humidity | temperature | wind_speed | humidity |
| datetime | |||||||||
| 2024-08-02 16:40:00+00:00 | True | True | True | True | True | True | True | True | True |
| 2024-08-02 16:50:00+00:00 | True | True | True | True | True | True | True | True | True |
| 2024-08-02 17:00:00+00:00 | True | True | True | True | True | True | True | True | True |
| 2024-08-02 17:10:00+00:00 | True | True | True | True | True | True | True | True | True |
| 2024-08-02 17:20:00+00:00 | True | True | True | True | True | True | True | True | True |
Visualization of the data¶
[5]:
def plot_weather_parameters(df, stations, parameters):
fig, ax = plt.subplots(len(parameters), figsize=(12, 7))
time_freq = mdates.date2num(df.index[1]) - mdates.date2num(df.index[0])
time_freq_minutes = int(time_freq * 24 * 60)
colors = matplotlib.colormaps["tab10"]
for i, param in enumerate(parameters):
for station in stations:
ax[i].plot(
df.index,
df[station, param],
label=f"{station} {param}",
color=colors(stations.index(station)),
)
ax[i].set_title(f"{param.capitalize()} ({time_freq_minutes} min)")
ax[i].set_ylabel(f"{param.capitalize()} ({dataset.parameters_table['unit'][param]})")
ax[i].legend()
ax[i].grid(True)
plt.tight_layout()
plt.show()
[6]:
plot_weather_parameters(df, stations=["ABO", "GRO", "KLO"], parameters=["temperature", "wind_speed", "humidity"])
Time series windowing¶
The get_windows function extracts sliding windows from the time series using a look-back window w and forecast horizon h. This prepares the data in a format that’s easy to use with framework-specific datasets and data loaders (e.g., PyTorch, TensorFlow, JAX).
This returns arrays of shape [n_w, w, n_s, n_f] for the inputs x and [n_w, h, n_s, n_f] for the outputs y, where n_w is the number of windows (or examples/samples), n_s is the number of stations and n_f is the number of features.
[7]:
windows = dataset.get_windows(
window_size=24, # number of lookback time steps
horizon_size=6, # number of lead times to be predicted
parameters=["temperature", "wind_speed", "humidity"],
stations=["ABO", "KLO", "GRO", "LUG"],
)
# [num_windows, num_time_steps, num_stations, num_parameters]
print(f"Windows x shape: \t{windows.x.shape}")
print(f"Windows mask_x shape: \t{windows.mask_x.shape}")
print(f"Windows y shape: \t{windows.y.shape}")
print(f"Windows mask_y shape: \t{windows.mask_y.shape}")
Windows x shape: (461923, 24, 4, 3)
Windows mask_x shape: (461923, 24, 4, 3)
Windows y shape: (461923, 6, 4, 3)
Windows mask_y shape: (461923, 6, 4, 3)
Temporal re-sampling and NWP forecasts¶
We can initialize the dataset to resample the time series to a coarser frequency, such as from the default 10-minute interval to hourly. This is useful for reducing the variability of highly dynamic parameters like wind speed, aligning with lower temporal resolutions, or shortening sequence length for training.
Hourly data also allows us comparing with the forecasts from the numerical weather prediction (NWP) model, which are provided as hourly forecasts within PeakWeather.
Each parameter has a default aggregation method used during temporal resampling, which can be overridden using the aggregation_method parameter.
[8]:
dataset = PeakWeatherDataset(
freq="h",
aggregation_methods={"temperature": "mean", "wind_gust": "max"},
extended_nwp_pars=["temperature", "wind_u", "wind_v"],
compute_uv=True,
)
temperature.zarr.zip: 1.28GB [00:26, 48.7MB/s]
wind_u.zarr.zip: 1.69GB [00:35, 47.5MB/s]
wind_v.zarr.zip: 1.70GB [00:34, 49.1MB/s]
Just like before, let’s look that temperature and wind speed for the stations of Adelboden, Grono and Kloten, this time with an hourly temporal resolution.
[9]:
df, mask = dataset.get_observations(
stations=["ABO", "GRO", "KLO"],
parameters=["temperature", "wind_speed", "humidity"],
first_date="2024-08-02 16:32",
last_date="2024-08-06 23:26",
return_mask=True,
)
[10]:
plot_weather_parameters(df, stations=["ABO", "GRO", "KLO"], parameters=["temperature", "wind_speed", "humidity"])
Let’s now compare data from the stations with forecasts from the NWP model. The ICON-CH1-EPS NWP model produces 11 ensemble members for an horizon of up to 33 hours ahead. The NWP model is initialized every 3 hours and by setting split="nwp_test", we can generate a subset of the test set with the windows aligned to the NWP model.
Other options are split="train" or split="test". In those cases, only observations windows will be produced, with a time delta between them that depends on the selected dataset frequency.
[11]:
windows = dataset.get_windows(
window_size=24, # number of lookback time steps
horizon_size=24, # number of lead times to be predicted
stations=["KLO", "ABO"],
split="nwp_test",
parameters=["temperature", "wind_u", "wind_v"],
nwp_parameters=["temperature", "wind_u", "wind_v"],
drop_extra_y_pars=False,
)
[12]:
# [num_windows, num_time_steps, num_stations, num_parameters]
print(f"""
Observations x shape: \t{windows.x.shape}
Observations mask_x shape: \t{windows.mask_x.shape}
Observations index_x shape: \t{windows.index_x.shape}
Observations y shape: \t{windows.y.shape}
Observations mask_y shape: \t{windows.mask_y.shape}
Observations index_y shape: \t{windows.index_y.shape}\n
""")
# [num_windows, num_time_steps, num_stations, num_parameters, num_ensemble_members]
print(f"NWP model fcasts shape: \t{windows.nwp.shape}\n")
print(f"""
{windows.x.shape[0]}\tNumber of windows
{windows.x.shape[1]}\tLookback window size
{windows.y.shape[1]}\tLead times to predict
{windows.x.shape[2]}\tNumber of considered stations
{windows.x.shape[3]}\tNumber of observed parameters
{windows.nwp.shape[3]}\tNumber of NWP forecasted parameters
{windows.nwp.shape[-1]}\tNumber of NWP ensemble members\n
""")
print(
f"{(windows.index_x[1][0] - windows.index_x[0][0]).total_seconds() / 3600}h"
"\t Time delta between windows"
)
Observations x shape: (3003, 24, 2, 3)
Observations mask_x shape: (3003, 24, 2, 3)
Observations index_x shape: (3003, 24)
Observations y shape: (3003, 24, 2, 3)
Observations mask_y shape: (3003, 24, 2, 3)
Observations index_y shape: (3003, 24)
NWP model fcasts shape: (11, 3003, 24, 2, 3)
3003 Number of windows
24 Lookback window size
24 Lead times to predict
2 Number of considered stations
3 Number of observed parameters
2 Number of NWP forecasted parameters
3 Number of NWP ensemble members
3.0h Time delta between windows
[13]:
fig, ax = plt.subplots(2, figsize=(10, 7))
colors = matplotlib.colormaps["tab10"]
w = 18
station = 1
index_to_station = {0: "KLO", 1: "ABO"}
# Plot temperature
ax[0].plot(
windows.index_x[w],
windows.x[w, :, station, 0],
label=f"Input {index_to_station[station]} observations",
color="black",
linestyle="dashed",
)
for ens_member in range(windows.nwp.shape[0]):
if ens_member == 0:
ax[0].plot(
windows.index_y[w],
windows.nwp[ens_member, w, :, station, 0],
label=f"NWP forecasts at {index_to_station[station]}",
color=colors(0)
)
else:
ax[0].plot(
windows.index_y[w],
windows.nwp[ens_member, w, :, station, 0],
color=colors(0)
)
ax[0].plot(
windows.index_y[w],
windows.y[w, :, station, 0],
label=f"Target {index_to_station[station]} observations",
color="black",
)
ax[0].set_title("Temperature")
ax[0].set_ylabel("Temperature (°C)")
ax[0].legend()
ax[0].xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d %H:%M"))
ax[0].xaxis.set_major_locator(mdates.AutoDateLocator())
ax[0].tick_params(axis="x", rotation=30)
ax[0].grid(True)
# Plot wind speed from u and v componented (eastward and northward)
ax[1].plot(
windows.index_x[w],
dataset.get_wind_speed(
u=windows.x[w, :, station, 1], v=windows.x[w, :, station, 2]
),
label=f"Input {index_to_station[station]} observations",
color="black",
linestyle="dashed",
)
for ens_member in range(windows.nwp.shape[0]):
nwp_wind_speed = dataset.get_wind_speed(
u=windows.nwp[..., 1], v=windows.nwp[..., 2]
)
if ens_member == 0:
ax[1].plot(
windows.index_y[w],
nwp_wind_speed[ens_member, w, :, station],
label=f"NWP forecasts at {index_to_station[station]}",
color=colors(1)
)
else:
ax[1].plot(
windows.index_y[w],
nwp_wind_speed[ens_member, w, :, station],
color=colors(1)
)
ax[1].plot(
windows.index_y[w],
dataset.get_wind_speed(
u=windows.y[w, :, station, 1], v=windows.y[w, :, station, 2]
),
label=f"Target {index_to_station[station]} observations",
color="black",
)
ax[1].set_title("Wind speed (from u and v)")
ax[1].set_ylabel("Wind speed (m/s)")
ax[1].legend()
ax[1].xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d %H:%M"))
ax[1].xaxis.set_major_locator(mdates.AutoDateLocator())
ax[1].tick_params(axis="x", rotation=30)
ax[1].grid(True)
plt.tight_layout()
plt.show()
PeakWeather can also output sliding windows as Xarray datasets, allowing the data to remain tightly coupled with its coordinates for easier indexing and analysis.
[14]:
icon_windows = dataset.get_windows(
window_size=24,
horizon_size=24,
stations=["KLO", "ABO"],
split="nwp_test",
parameters=["temperature", "wind_u", "wind_v"],
nwp_parameters=["temperature", "wind_u", "wind_v"],
as_xarray=True,
).nwp
icon_windows
[14]:
<xarray.Dataset> Size: 19MB
Dimensions: (reftime: 3003, lead: 24, nat_abbr: 2, realization: 11)
Coordinates:
* reftime (reftime) datetime64[ns] 24kB 2024-10-02 ... 2025-10-13
* lead (lead) timedelta64[ns] 192B 01:00:00 ... 1 days 00:00:00
* nat_abbr (nat_abbr) <U5 40B 'KLO' 'ABO'
* realization (realization) int64 88B 0 1 2 3 4 5 6 7 8 9 10
Data variables:
temperature (reftime, lead, nat_abbr, realization) float32 6MB 12.39 ......
wind_u (reftime, lead, nat_abbr, realization) float32 6MB 0.1056 .....
wind_v (reftime, lead, nat_abbr, realization) float32 6MB 0.9805 .....
Attributes:
LICENSE: CC-BY-4.0
parameter: Air temperature, Eastward wind, Northward wind.
period: 2017-01-01:2025-10-13
source: MeteoSwiss (ICON-CH1-EPS)
time_zone: UTCStatic attributes¶
Each station is either a rain_gauge, which measures precipitation (and sometimes temperature), or a meteo_station station, which records multiple weather parameters. Every station has a known geographic location, an elevation above sea level, and additional static attributes derived from topographic data interpolated to its position.
The full topographic features at a \(50m\) resolution can be loaded by initializing the PeakWeatherDataset dataset with the extended_topo_vars parameter.
[15]:
dataset.stations_table
[15]:
| station_name | latitude | longitude | station_height | swiss_easting | swiss_northing | ASPECT_2000M_SIGRATIO1 | WE_DERIVATIVE_2000M_SIGRATIO1 | TPI_2000M | SN_DERIVATIVE_10000M_SIGRATIO1 | dem | SN_DERIVATIVE_2000M_SIGRATIO1 | SLOPE_10000M_SIGRATIO1 | ASPECT_10000M_SIGRATIO1 | SLOPE_2000M_SIGRATIO1 | STD_2000M | STD_10000M | TPI_10000M | WE_DERIVATIVE_10000M_SIGRATIO1 | station_type | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| nat_abbr | ||||||||||||||||||||
| ABE | Aarberg | 47.057969 | 7.285350 | 444.00 | 2.588355e+06 | 1.211894e+06 | 310.688416 | 0.010961 | -6.219757 | -0.011349 | 442.315613 | -0.009424 | 0.872107 | 318.209137 | 0.828149 | 0.000000 | 44.313626 | -38.296173 | 0.010144 | rain_gauge |
| ABO | Adelboden | 46.491703 | 7.560703 | 1321.38 | 2.609372e+06 | 1.148939e+06 | 124.863129 | -0.167379 | -53.303345 | -0.026807 | 1317.771851 | 0.116605 | 1.702339 | 25.582031 | 11.529667 | 120.940262 | 374.830623 | -443.723877 | -0.012833 | meteo_station |
| AEG | Oberägeri | 47.133636 | 8.608206 | 724.43 | 2.688729e+06 | 1.220956e+06 | 190.899567 | 0.010865 | -20.030273 | -0.013760 | 724.173462 | 0.056423 | 1.099379 | 315.811829 | 3.288559 | 27.269887 | 141.396220 | -165.792114 | 0.013376 | meteo_station |
| AFI | Andelfingen | 47.604669 | 8.670289 | 360.00 | 2.692617e+06 | 1.273392e+06 | 193.235062 | 0.004745 | -12.622437 | -0.001833 | 357.506256 | 0.020173 | 0.299592 | 290.519653 | 1.187198 | 0.000000 | 31.905980 | -55.071655 | 0.004897 | rain_gauge |
| AGATT | Attelwil | 47.265233 | 8.050519 | 475.00 | 2.646308e+06 | 1.235106e+06 | 96.557785 | -0.020930 | -14.010315 | -0.009250 | 474.737488 | 0.002406 | 0.593415 | 333.268860 | 1.206895 | 0.000000 | 75.500339 | -105.056763 | 0.004659 | rain_gauge |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| WYN | Wynau | 47.255025 | 7.787475 | 421.99 | 2.626406e+06 | 1.233849e+06 | 319.106384 | 0.021612 | -16.944305 | -0.002912 | 421.595306 | -0.024955 | 0.167600 | 5.381668 | 1.890799 | 19.067440 | 16.002325 | -33.209930 | -0.000274 | meteo_station |
| ZER | Zermatt | 46.029272 | 7.752433 | 1638.35 | 2.624297e+06 | 1.097574e+06 | 126.072998 | -0.222615 | -173.187622 | 0.023503 | 1645.671875 | 0.162173 | 2.437473 | 123.514168 | 15.398737 | 213.252085 | 469.596215 | -791.969727 | -0.035491 | meteo_station |
| ZEV | Zervreila | 46.578797 | 9.118797 | 1738.00 | 2.728781e+06 | 1.159992e+06 | 109.527702 | -0.111410 | -240.550537 | -0.028125 | 1738.876587 | 0.039513 | 2.217149 | 43.410706 | 6.741610 | 146.893540 | 324.236704 | -571.910034 | -0.026606 | rain_gauge |
| ZWE | Zweisimmen | 46.550511 | 7.384917 | 936.00 | 2.595880e+06 | 1.155471e+06 | 273.443573 | 0.167828 | -93.677673 | -0.005795 | 939.551575 | -0.010099 | 2.334606 | 278.171326 | 9.543953 | 118.914438 | 316.598639 | -477.413879 | 0.040355 | rain_gauge |
| ZWK | Zwillikon | 47.290092 | 8.431958 | 461.00 | 2.675139e+06 | 1.238164e+06 | 166.458527 | -0.001038 | -20.879974 | 0.011795 | 461.351562 | 0.004308 | 1.648120 | 245.798660 | 0.253896 | 0.000000 | 93.148896 | -49.712555 | 0.026244 | rain_gauge |
302 rows × 20 columns
Detailed Dataset Initialisation¶
In the previous section, the dataset was initialised by relying on certain default parameters. Below you can find an example of more detailed initialisation. Refer to the official documentation for additional details.
[16]:
ds = PeakWeatherDataset(
root=None, # Path to the dataset
pad_missing_values=True, # Pad missing values with NaN
years=None, # Years to include in the dataset (None for all)
parameters=None, # Parameters to include in the dataset (None for all)
extended_topo_vars="none", # Optional extended topographic variables
extended_nwp_pars="none", # Optional extended NWP model (ICON) variables
imputation_method="zero", # Method for imputing missing values
freq="h", # Frequency of the data (e.g., "h" for hourly)
compute_uv=True, # Compute u and v (zonal and meridional) components of wind
station_type="meteo_station", # Which station type to load (None for all)
aggregation_methods={'temperature': 'mean'} # Use specific temporal aggregation
)
Topographic features¶
The dataset comes with a digital elevation model (DEM) and topographic features at a \(50m\) resolution. They can be downloaded with the extended_topo_vars parameter. The values are provided with the swiss EPSG:2056 coordinate system. As an example, let’s look at the DEM and aspect of the terrain over the swiss radar domain.
[17]:
ds = PeakWeatherDataset(
extended_topo_vars=["DEM", "ASPECT_10000M_SIGRATIO1" ]
)
ds.load_topography()['topo_DEM']
ASPECT_10000M_SIGRATIO1.zarr.zip: 417MB [00:08, 48.2MB/s]
DEM.zarr.zip: 557MB [00:10, 51.2MB/s]
[17]:
<xarray.Dataset> Size: 863MB
Dimensions: (y: 14002, x: 15402)
Coordinates:
* x (x) float64 123kB 2.225e+06 2.225e+06 ... 2.995e+06 2.995e+06
* y (y) float64 112kB 8.1e+05 8.1e+05 8.101e+05 ... 1.51e+06 1.51e+06
Data variables:
dem (y, x) float32 863MB ...
Attributes:
crs: EPSG:2056
description: Topographic descriptors computed from a blend of Dig...
domain: Swiss radar domain
history: Produced by topo-workflow
source: DHM25, DTM, NASADEM
spatial_resolution: 50 m[18]:
import matplotlib.pyplot as plt
da = ds.load_topography()['topo_DEM']["dem"][::5, ::5] # skip every 5 pixels
fig, ax = plt.subplots(1,2, figsize=(12, 5))
ax[0].imshow(
da.values,
origin="lower",
extent=[
da.x.min().item(),
da.x.max().item(),
da.y.min().item(),
da.y.max().item(),
],
aspect="auto",
)
ax[0].set_xlabel("x")
ax[0].set_ylabel("y")
ax[0].set_title("DEM")
da = ds.load_topography()['topo_ASPECT_10000M_SIGRATIO1']['ASPECT_10000M_SIGRATIO1'][::5, ::5] # skip every 5 pixels
ax[1].imshow(
da.values,
origin="lower",
extent=[
da.x.min().item(),
da.x.max().item(),
da.y.min().item(),
da.y.max().item(),
],
aspect="auto",
)
ax[1].set_xlabel("x")
ax[1].set_ylabel("y")
ax[1].set_title("ASPECT")
plt.show()