peakweather.utils

df_add_missing_columns(df: DataFrame, col0=None, col1=None, fill_value=nan) → DataFrame

Add missing columns to a DataFrame with MultiIndex columns, filling them with fill_value (NaN by default).

Parameters:
  • df (pd.DataFrame) – The input DataFrame.

  • col0 (list, optional) – The values required on the first level of the MultiIndex columns. If None, the values already present on that level are used.

  • col1 (list, optional) – The values required on the second level of the MultiIndex columns. If None, the values already present on that level are used.

  • fill_value (scalar) – The value to use for missing columns. Default is np.nan.

Returns:

The DataFrame with missing columns added.

Return type:

pd.DataFrame
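
The behaviour described above can be sketched with plain pandas. This is a minimal illustration, not the peakweather implementation: the helper name add_missing_columns_sketch is hypothetical, and it assumes that "missing" means every (col0, col1) pair in the Cartesian product of the two levels that is not yet a column.

```python
import numpy as np
import pandas as pd


def add_missing_columns_sketch(df, col0=None, col1=None, fill_value=np.nan):
    """Hypothetical stand-in for df_add_missing_columns (assumed semantics)."""
    # Fall back to the values already present on each column level.
    if col0 is None:
        col0 = df.columns.get_level_values(0).unique()
    if col1 is None:
        col1 = df.columns.get_level_values(1).unique()
    # Build every (col0, col1) pair and reindex onto it; pairs that are
    # absent from df become new columns filled with fill_value.
    full = pd.MultiIndex.from_product([col0, col1])
    return df.reindex(columns=full, fill_value=fill_value)


# Two stations, each reporting a different variable.
df = pd.DataFrame(
    [[1.0, 2.0], [3.0, 4.0]],
    columns=pd.MultiIndex.from_tuples([("A", "temp"), ("B", "wind")]),
)
out = add_missing_columns_sketch(df, col1=["temp", "wind"])
print(out.columns.tolist())
```

After the call, out has four columns; ("A", "wind") and ("B", "temp") did not exist in df and contain only NaN, while the original values are unchanged.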

sliding_window_view(data: ndarray, window_size: int) → ndarray

Creates a sliding window view of the input data.

Parameters:
  • data (np.ndarray) – The input data with shape (num_time_steps, *).

  • window_size (int) – The size of the sliding window.

Returns:

The sliding window view of the input data with shape (num_windows, window_size, *).

Return type:

np.ndarray
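
The shape contract above can be reproduced with NumPy alone. The sketch below is an assumption about the behaviour, not the peakweather source: it uses numpy.lib.stride_tricks.sliding_window_view (NumPy 1.20 or later), assumes a stride of one time step, and the helper name sliding_window_sketch is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view as np_sliding_window_view


def sliding_window_sketch(data, window_size):
    """Hypothetical stand-in: windows of consecutive time steps, stride 1."""
    # NumPy appends the window axis last: (num_windows, *, window_size).
    windows = np_sliding_window_view(data, window_size, axis=0)
    # Move it next to the window index: (num_windows, window_size, *).
    return np.moveaxis(windows, -1, 1)


data = np.arange(10).reshape(5, 2)   # 5 time steps, 2 features
out = sliding_window_sketch(data, window_size=3)
print(out.shape)
```

With 5 time steps and a window of 3 there are 5 - 3 + 1 = 3 windows, and window i equals data[i:i + 3].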

timestamps_from_xr(ds: xr.Dataset, delta: str, tz: str | None = 'UTC') → ndarray

Compute a 2D array of timezone-aware timestamps by combining a reference time coordinate with a time-delta coordinate.

Parameters:
  • ds (xarray.Dataset) – The input dataset containing a reftime coordinate of type datetime64[ns] and a time-delta coordinate.

  • delta (str) – The name of the offset coordinate (e.g., lag or lead) of type timedelta64[ns].

  • tz (str, optional) – The timezone of the returned timestamps. Default is 'UTC'.

Returns:

A 2D array of shape (num_reftime, num_deltas) containing timezone-aware pandas.Timestamp objects in the timezone tz (UTC by default), where each entry is reftime[i] + delta[j].

Return type:

np.ndarray
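
The reftime[i] + delta[j] grid can be illustrated without xarray. In the sketch below, the two NumPy arrays are hypothetical stand-ins for the values of the reftime coordinate and of the coordinate named by delta; it assumes the reference times are naive UTC values and shows only the combination step, not the function's actual code.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the reftime and time-delta coordinate values.
reftime = np.array(["2024-01-01T00:00", "2024-01-01T12:00"], dtype="datetime64[ns]")
lead = np.array([1, 2, 3], dtype="timedelta64[h]").astype("timedelta64[ns]")

# Broadcasting a column of reference times against a row of offsets
# yields the full (num_reftime, num_deltas) grid of valid times.
grid = reftime[:, None] + lead[None, :]

# Convert each entry to a timezone-aware pandas.Timestamp (assumed tz: UTC).
stamps = np.array(
    [[pd.Timestamp(t).tz_localize("UTC") for t in row] for row in grid],
    dtype=object,
)
print(stamps.shape)
print(stamps[1, 2])
```

Entry [1, 2] combines the second reference time (12:00) with the third offset (3 hours), giving 15:00 UTC on the same day.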

to_pandas_freq(freq: str)

Convert a frequency string to a pandas frequency object.

Parameters:

freq (str) – The frequency string.

Returns:

The pandas frequency object.

Return type:

pd.DateOffset

Raises:

ValueError – If the frequency string is not valid.
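
A conversion with this contract can be sketched using pandas' own parser. The helper name to_pandas_freq_sketch is hypothetical, and delegating to pandas.tseries.frequencies.to_offset is an assumption rather than the documented implementation; the set of strings accepted by to_pandas_freq may differ.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def to_pandas_freq_sketch(freq):
    """Hypothetical stand-in: parse a frequency string into a pandas offset."""
    try:
        return to_offset(freq)
    except ValueError as err:
        # Re-raise with a message that names the offending string.
        raise ValueError(f"Invalid frequency string: {freq!r}") from err


offset = to_pandas_freq_sketch("10min")
print(offset.n, pd.Timedelta(offset))

try:
    to_pandas_freq_sketch("banana")
    raised = False
except ValueError:
    raised = True
print(raised)
```

The returned offset can be passed wherever pandas expects a frequency, for example as the freq argument of pd.date_range.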

xr_to_np(a: xr.Dataset, pars: list | None = None, sample_dim: int | None = None, stack_dim: int = -1) → ndarray

Extract variables from an xarray Dataset and return them as a stacked ndarray.

Parameters:
  • a (xarray.Dataset) – The input dataset containing one or more data variables.

  • pars (list[str], optional) – The names of the variables to extract. If None, all data variables in a are used.

  • sample_dim (int, optional) – The index of the dimension in a that contains the samples, if present. If an int is given, that dimension is moved to the front, so the result has shape (samples, a.shape[~sample_dim]), i.e. the sample dimension followed by the remaining dimensions. If None, no dimension is moved.

  • stack_dim (int) – The dimension along which the arrays are stacked.

Returns:

A NumPy array in which the selected variables are stacked along stack_dim (the last axis by default). If each variable has shape (*dims), the returned array has shape (*dims, num_vars) for the default stack_dim=-1.

Return type:

np.ndarray
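
The effect of sample_dim and stack_dim on the output shape can be sketched with NumPy only. Here a plain dict of arrays is a hypothetical stand-in for the Dataset's data variables, and stack_vars_sketch is an illustrative helper under the semantics described above, not the peakweather implementation.

```python
import numpy as np

# Hypothetical stand-in for a Dataset with two variables of shape (4, 5, 6).
variables = {
    "temperature": np.zeros((4, 5, 6)),
    "wind_speed": np.ones((4, 5, 6)),
}


def stack_vars_sketch(variables, pars=None, sample_dim=None, stack_dim=-1):
    """Hypothetical stand-in for xr_to_np operating on a dict of arrays."""
    # Use every variable unless a subset is requested.
    names = list(variables) if pars is None else pars
    arrays = [variables[name] for name in names]
    # Optionally move the sample dimension to the front of each array.
    if sample_dim is not None:
        arrays = [np.moveaxis(arr, sample_dim, 0) for arr in arrays]
    # Stack the variables along a new axis at position stack_dim.
    return np.stack(arrays, axis=stack_dim)


default = stack_vars_sketch(variables)
moved = stack_vars_sketch(variables, sample_dim=1)
subset = stack_vars_sketch(variables, pars=["wind_speed"], stack_dim=0)
print(default.shape, moved.shape, subset.shape)
```

With the defaults, the two variables become the last axis of a (4, 5, 6, 2) array; with sample_dim=1 the dimension of size 5 leads, giving (5, 4, 6, 2).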