PeakWeather Demo

[1]:

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import matplotlib

import peakweather

print(peakweather.__version__)

from peakweather import PeakWeatherDataset

0.2.2

Loading the data

To get the dataset, initialize a PeakWeatherDataset object. By default, this downloads the data into the current working directory unless it has already been downloaded.

[2]:

dataset = PeakWeatherDataset()

stations.parquet: 57.3kB [00:00, 194kB/s]
installation.parquet: 90.1kB [00:00, 348kB/s]
parameters.parquet: 8.19kB [00:00, 34.5kB/s]
disclaimer.txt: 8.19kB [00:00, 36.4kB/s]
2017.parquet: 56.9MB [00:01, 41.3MB/s]
2018.parquet: 57.3MB [00:01, 39.9MB/s]
2019.parquet: 57.5MB [00:01, 40.5MB/s]
2020.parquet: 57.1MB [00:01, 41.2MB/s]
2021.parquet: 56.5MB [00:01, 39.7MB/s]
2022.parquet: 56.8MB [00:01, 40.7MB/s]
2023.parquet: 57.5MB [00:01, 38.6MB/s]
2024.parquet: 57.5MB [00:01, 41.0MB/s]
2025.parquet: 45.1MB [00:01, 38.6MB/s]

With get_observations you can load the time series data as a DataFrame or array. You can restrict the request to specific stations, parameters, and time ranges, and optionally obtain a boolean mask that indicates which measurements are available.

[3]:

df, mask = dataset.get_observations(
    stations=["ABO", "GRO", "KLO"],  # list of stations
    parameters=["temperature", "wind_speed", "humidity"],  # list of weather parameters
    first_date="2024-08-02 16:32",
    last_date="2024-08-06 23:26",
    return_mask=True,
)

df.head()

[3]:

nat_abbr	ABO			GRO			KLO
name	temperature	wind_speed	humidity	temperature	wind_speed	humidity	temperature	wind_speed	humidity
datetime
2024-08-02 16:40:00+00:00	19.400000	2.0	67.000000	24.100000	0.8	72.500000	26.299999	2.8	50.000000
2024-08-02 16:50:00+00:00	19.600000	2.7	67.300003	23.100000	0.9	79.099998	25.500000	3.2	56.200001
2024-08-02 17:00:00+00:00	19.200001	2.0	71.199997	22.900000	1.4	80.800003	25.400000	2.7	55.799999
2024-08-02 17:10:00+00:00	19.000000	0.9	73.900002	23.000000	2.3	78.300003	25.200001	2.7	57.000000
2024-08-02 17:20:00+00:00	19.100000	1.5	71.099998	22.799999	2.3	78.199997	25.000000	2.3	58.000000

[4]:

mask.head()

[4]:

nat_abbr	ABO			GRO			KLO
name	temperature	wind_speed	humidity	temperature	wind_speed	humidity	temperature	wind_speed	humidity
datetime
2024-08-02 16:40:00+00:00	True	True	True	True	True	True	True	True	True
2024-08-02 16:50:00+00:00	True	True	True	True	True	True	True	True	True
2024-08-02 17:00:00+00:00	True	True	True	True	True	True	True	True	True
2024-08-02 17:10:00+00:00	True	True	True	True	True	True	True	True	True
2024-08-02 17:20:00+00:00	True	True	True	True	True	True	True	True	True
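The mask has the same index and columns as the observations, so the two can be combined with ordinary pandas operations. The sketch below does not call PeakWeather. It builds a small synthetic `df`/`mask` pair with the same (station, parameter) column layout, using made-up values and one entry marked as missing purely for illustration. It then computes the fraction of available measurements per column and hides unavailable entries before averaging.

```python
import pandas as pd

# Synthetic stand-in for the (df, mask) pair: same (station, parameter)
# column layout as above, but values are made up for illustration.
index = pd.date_range("2024-08-02 16:40", periods=4, freq="10min", tz="UTC")
columns = pd.MultiIndex.from_product(
    [["ABO", "KLO"], ["temperature", "humidity"]], names=["nat_abbr", "name"]
)
df = pd.DataFrame(
    [
        [19.4, 67.0, 26.3, 50.0],
        [19.6, 67.3, 25.5, 56.2],
        [19.2, 71.2, 25.4, 55.8],
        [19.0, 73.9, 25.2, 57.0],
    ],
    index=index,
    columns=columns,
)
mask = pd.DataFrame(True, index=index, columns=columns)
mask.loc[index[1], ("KLO", "humidity")] = False  # pretend one value is missing

# Fraction of available measurements per (station, parameter) column.
availability = mask.mean()

# Replace unavailable entries with NaN so they are skipped by statistics.
valid_only = df.where(mask)
klo_humidity_mean = valid_only[("KLO", "humidity")].mean()

print(availability)
print(klo_humidity_mean)
```

Using `df.where(mask)` keeps the frame's shape intact, which is convenient when the result is later converted to an array.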

Visualization of the data

[5]:

def plot_weather_parameters(df, stations, parameters):
    fig, ax = plt.subplots(len(parameters), figsize=(12, 7))

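    # Sampling interval in minutes, inferred from the first two timestamps.
    # date2num returns days; round() avoids truncating e.g. 9.999... to 9.
    time_freq = mdates.date2num(df.index[1]) - mdates.date2num(df.index[0])
    time_freq_minutes = round(time_freq * 24 * 60)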

    colors = matplotlib.colormaps["tab10"]

    for i, param in enumerate(parameters):
        for station in stations:
            ax[i].plot(
                df.index,
                df[station, param],
                label=f"{station} {param}",
                color=colors(stations.index(station)),
            )
        ax[i].set_title(f"{param.capitalize()} ({time_freq_minutes} min)")
        ax[i].set_ylabel(f"{param.capitalize()} ({dataset.parameters_table['unit'][param]})")
        ax[i].legend()
        ax[i].grid(True)

    plt.tight_layout()
    plt.show()

[6]:

plot_weather_parameters(df, stations=["ABO", "GRO", "KLO"], parameters=["temperature", "wind_speed", "humidity"])

[Figure: three stacked panels showing temperature, wind speed, and humidity over time for stations ABO, GRO, and KLO]

Time series windowing

The get_windows method extracts sliding windows from the time series using a look-back window of length w and a forecast horizon of length h. This prepares the data in a format that is easy to use with framework-specific datasets and data loaders (e.g., PyTorch, TensorFlow, JAX).

The method returns arrays of shape [n_w, w, n_s, n_f] for the inputs x and [n_w, h, n_s, n_f] for the targets y, where n_w is the number of windows (i.e., examples or samples), n_s is the number of stations, and n_f is the number of features.
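To make the shape convention concrete, here is a minimal NumPy sketch of stride-1 windowing on a synthetic array. It is not PeakWeather's implementation: `make_windows` is a hypothetical helper, and the stride of one time step is an assumption. Under that assumption, a series with T time steps yields n_w = T - w - h + 1 windows.

```python
import numpy as np


def make_windows(data, window_size, horizon_size):
    """Split a [T, n_s, n_f] array into stride-1 (x, y) window pairs.

    Hypothetical helper for illustration; not part of PeakWeather.
    """
    total = window_size + horizon_size
    # sliding_window_view puts the window axis last: [n_w, n_s, n_f, total]
    views = np.lib.stride_tricks.sliding_window_view(data, total, axis=0)
    views = np.moveaxis(views, -1, 1)  # -> [n_w, total, n_s, n_f]
    return views[:, :window_size], views[:, window_size:]


T, n_s, n_f = 100, 4, 3
data = np.arange(T * n_s * n_f, dtype=np.float32).reshape(T, n_s, n_f)

x, y = make_windows(data, window_size=24, horizon_size=6)
print(x.shape, y.shape)  # (71, 24, 4, 3) (71, 6, 4, 3)
```

The returned arrays are views into `data`, so no values are copied until the windows are modified or batched.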

[7]:

windows = dataset.get_windows(
    window_size=24,  # number of lookback time steps
    horizon_size=6,  # number of lead times to be predicted
    parameters=["temperature", "wind_speed", "humidity"],
    stations=["ABO", "KLO", "GRO", "LUG"],
)

# [num_windows, num_time_steps, num_stations, num_parameters]
print(f"Windows x shape: \t{windows.x.shape}")
print(f"Windows mask_x shape: \t{windows.mask_x.shape}")
print(f"Windows y shape: \t{windows.y.shape}")
print(f"Windows mask_y shape: \t{windows.mask_y.shape}")

Windows x shape:        (461923, 24, 4, 3)
Windows mask_x shape:   (461923, 24, 4, 3)
Windows y shape:        (461923, 6, 4, 3)
Windows mask_y shape:   (461923, 6, 4, 3)
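Because targets can be unavailable, an error metric should normally be restricted to valid entries. The following sketch shows one way to do that with a boolean target mask. It uses synthetic arrays with the same axis order as above rather than real PeakWeather windows; `masked_mae` is a hypothetical helper, and marking unavailable targets as NaN is an assumption made only for this example.

```python
import numpy as np


def masked_mae(pred, target, mask):
    """Mean absolute error over the entries where mask is True.

    Hypothetical helper for illustration; not part of PeakWeather.
    """
    mask = np.asarray(mask, dtype=bool)
    n_valid = mask.sum()
    if n_valid == 0:
        return float("nan")
    abs_err = np.abs(pred - target)
    # Boolean indexing drops masked-out entries (including NaN targets).
    return float(abs_err[mask].sum() / n_valid)


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 24, 4, 3))  # [n_w, w, n_s, n_f]
y = rng.normal(size=(5, 6, 4, 3))   # [n_w, h, n_s, n_f]
mask_y = rng.random(size=y.shape) > 0.1
y[~mask_y] = np.nan  # assumption: unavailable targets are NaN here

# Persistence baseline: repeat the last observed step over the horizon.
pred = np.repeat(x[:, -1:], y.shape[1], axis=1)
score = masked_mae(pred, y, mask_y)
print(score)
```

The same pattern carries over to framework tensors, where the mask is typically multiplied into the per-element loss before averaging.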

Static attributes

Each station is either a rain_gauge, which measures precipitation (and sometimes temperature), or a meteo_station, which records multiple weather parameters. Every station has a known geographic location, an elevation above sea level, and additional static attributes derived from topographic data interpolated to its position.

The full topographic features at a 50 m resolution can be loaded by initializing PeakWeatherDataset with the extended_topo_vars parameter.

[15]:

dataset.stations_table

[15]:

	station_name	latitude	longitude	station_height	swiss_easting	swiss_northing	ASPECT_2000M_SIGRATIO1	WE_DERIVATIVE_2000M_SIGRATIO1	TPI_2000M	SN_DERIVATIVE_10000M_SIGRATIO1	dem	SN_DERIVATIVE_2000M_SIGRATIO1	SLOPE_10000M_SIGRATIO1	ASPECT_10000M_SIGRATIO1	SLOPE_2000M_SIGRATIO1	STD_2000M	STD_10000M	TPI_10000M	WE_DERIVATIVE_10000M_SIGRATIO1	station_type
nat_abbr
ABE	Aarberg	47.057969	7.285350	444.00	2.588355e+06	1.211894e+06	310.688416	0.010961	-6.219757	-0.011349	442.315613	-0.009424	0.872107	318.209137	0.828149	0.000000	44.313626	-38.296173	0.010144	rain_gauge
ABO	Adelboden	46.491703	7.560703	1321.38	2.609372e+06	1.148939e+06	124.863129	-0.167379	-53.303345	-0.026807	1317.771851	0.116605	1.702339	25.582031	11.529667	120.940262	374.830623	-443.723877	-0.012833	meteo_station
AEG	Oberägeri	47.133636	8.608206	724.43	2.688729e+06	1.220956e+06	190.899567	0.010865	-20.030273	-0.013760	724.173462	0.056423	1.099379	315.811829	3.288559	27.269887	141.396220	-165.792114	0.013376	meteo_station
AFI	Andelfingen	47.604669	8.670289	360.00	2.692617e+06	1.273392e+06	193.235062	0.004745	-12.622437	-0.001833	357.506256	0.020173	0.299592	290.519653	1.187198	0.000000	31.905980	-55.071655	0.004897	rain_gauge
AGATT	Attelwil	47.265233	8.050519	475.00	2.646308e+06	1.235106e+06	96.557785	-0.020930	-14.010315	-0.009250	474.737488	0.002406	0.593415	333.268860	1.206895	0.000000	75.500339	-105.056763	0.004659	rain_gauge
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
WYN	Wynau	47.255025	7.787475	421.99	2.626406e+06	1.233849e+06	319.106384	0.021612	-16.944305	-0.002912	421.595306	-0.024955	0.167600	5.381668	1.890799	19.067440	16.002325	-33.209930	-0.000274	meteo_station
ZER	Zermatt	46.029272	7.752433	1638.35	2.624297e+06	1.097574e+06	126.072998	-0.222615	-173.187622	0.023503	1645.671875	0.162173	2.437473	123.514168	15.398737	213.252085	469.596215	-791.969727	-0.035491	meteo_station
ZEV	Zervreila	46.578797	9.118797	1738.00	2.728781e+06	1.159992e+06	109.527702	-0.111410	-240.550537	-0.028125	1738.876587	0.039513	2.217149	43.410706	6.741610	146.893540	324.236704	-571.910034	-0.026606	rain_gauge
ZWE	Zweisimmen	46.550511	7.384917	936.00	2.595880e+06	1.155471e+06	273.443573	0.167828	-93.677673	-0.005795	939.551575	-0.010099	2.334606	278.171326	9.543953	118.914438	316.598639	-477.413879	0.040355	rain_gauge
ZWK	Zwillikon	47.290092	8.431958	461.00	2.675139e+06	1.238164e+06	166.458527	-0.001038	-20.879974	0.011795	461.351562	0.004308	1.648120	245.798660	0.253896	0.000000	93.148896	-49.712555	0.026244	rain_gauge

302 rows × 20 columns
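Since the station table is a regular DataFrame indexed by nat_abbr, station subsets can be selected with boolean indexing. The sketch below rebuilds a five-row excerpt of the table by hand (names, heights, and types copied from the output above) so that it runs without downloading anything. The 1000 m threshold is an arbitrary choice for illustration.

```python
import pandas as pd

# Hand-copied excerpt of the station table shown above.
stations = pd.DataFrame(
    {
        "station_name": ["Aarberg", "Adelboden", "Wynau", "Zermatt", "Zervreila"],
        "station_height": [444.00, 1321.38, 421.99, 1638.35, 1738.00],
        "station_type": [
            "rain_gauge",
            "meteo_station",
            "meteo_station",
            "meteo_station",
            "rain_gauge",
        ],
    },
    index=pd.Index(["ABE", "ABO", "WYN", "ZER", "ZEV"], name="nat_abbr"),
)

# Select meteo stations above an (arbitrary) elevation threshold.
is_meteo = stations["station_type"] == "meteo_station"
is_high = stations["station_height"] > 1000
high_meteo = stations[is_meteo & is_high].index.tolist()
print(high_meteo)  # ['ABO', 'ZER']

# Number of stations of each type in the excerpt.
type_counts = stations["station_type"].value_counts()
print(type_counts)
```

A list of abbreviations obtained this way has the same form as the `stations` argument used in the earlier cells.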

Detailed Dataset Initialisation

In the previous section, the dataset was initialised by relying on certain default parameters. Below you can find an example of more detailed initialisation. Refer to the official documentation for additional details.

[16]:

ds = PeakWeatherDataset(
    root=None,  # Path to the dataset
    pad_missing_values=True,  # Pad missing values with NaN
    years=None,  # Years to include in the dataset (None for all)
    parameters=None,  # Parameters to include in the dataset (None for all)
    extended_topo_vars="none",  # Optional extended topographic variables
    extended_nwp_pars="none",  # Optional extended NWP model (ICON) variables
    imputation_method="zero",  # Method for imputing missing values
    freq="h",  # Frequency of the data (e.g., "h" for hourly)
    compute_uv=True,  # Compute u and v (zonal and meridional) components of wind
    station_type="meteo_station",  # Which station type to load (None for all)
    aggregation_methods={"temperature": "mean"},  # Use specific temporal aggregation
)
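To illustrate what hourly aggregation with a per-parameter method means, here is a plain pandas sketch that resamples 10-minute values to hourly ones. It is not PeakWeather's internal code, and the library's bin labelling or closing side may differ. The temperature values are the ABO readings from the first table; the precipitation values and the choice of "sum" for precipitation are assumptions made only for this example.

```python
import pandas as pd

index = pd.date_range("2024-08-02 16:40", periods=5, freq="10min", tz="UTC")
obs = pd.DataFrame(
    {
        "temperature": [19.4, 19.6, 19.2, 19.0, 19.1],  # ABO values from above
        "precipitation": [0.0, 0.2, 0.1, 0.0, 0.3],  # made-up values
    },
    index=index,
)

# One aggregation method per parameter, analogous in spirit to a
# {parameter: method} mapping. "sum" for precipitation is an assumption.
methods = {"temperature": "mean", "precipitation": "sum"}

# pandas' default hourly bins are left-closed and left-labelled.
hourly = obs.resample(pd.Timedelta(hours=1)).agg(methods)
print(hourly)
```

With these inputs, the 16:00 bin contains the two readings at 16:40 and 16:50, and the 17:00 bin contains the remaining three.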