peakweather.utils

df_add_missing_columns(df: DataFrame, col0=None, col1=None, fill_value=nan) → DataFrame

Add missing columns to a DataFrame with MultiIndex columns, filling them with fill_value (NaN by default).

Parameters:

df (pd.DataFrame) – The input DataFrame.
col0 (list, optional) – The values for the first level of the MultiIndex columns. If None, the values already present in that level are used.
col1 (list, optional) – The values for the second level of the MultiIndex columns. If None, the values already present in that level are used.
fill_value (scalar) – The value to use for missing columns. Default is np.nan.

Returns:

The DataFrame with missing columns added.

Return type:

pd.DataFrame
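
As a rough illustration of the behaviour described above, the following sketch fills in absent (level 0, level 1) column pairs with plain pandas. It is not the library's implementation; the station and variable names are hypothetical, and reindexing over the product of both levels is an assumption about what "missing columns" means here.

```python
import numpy as np
import pandas as pd

# Toy frame with two-level columns; the pair ("B", "wind") is absent.
# The station/variable names are hypothetical.
cols = pd.MultiIndex.from_tuples([("A", "temp"), ("A", "wind"), ("B", "temp")])
df = pd.DataFrame(np.ones((2, 3)), columns=cols)

# With col0/col1 left as None, fall back to the values already present.
col0 = df.columns.get_level_values(0).unique()
col1 = df.columns.get_level_values(1).unique()

# Reindex over the full product of both levels; new columns get fill_value.
full = pd.MultiIndex.from_product([col0, col1])
filled = df.reindex(columns=full, fill_value=np.nan)
```

Here filled has four columns, and the previously absent ("B", "wind") column contains only NaN.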

sliding_window_view(data: ndarray, window_size: int) → ndarray

Creates a sliding window view of the input data.

Parameters:

data (np.ndarray) – The input array.
window_size (int) – The length of each window.

Returns:

The sliding window view of the input data with shape: (num_windows, window_size, *).

Return type:

np.ndarray
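
The documented output shape can be reproduced with NumPy's own stride tricks, as in the minimal sketch below. It assumes the windows slide along the first axis and is not necessarily how this function is implemented.

```python
import numpy as np

data = np.arange(10).reshape(5, 2)  # 5 steps, 2 features
window_size = 3

# NumPy appends the window axis last: shape (num_windows, 2, window_size).
windows = np.lib.stride_tricks.sliding_window_view(data, window_size, axis=0)

# Move the window axis next to the leading axis: (num_windows, window_size, 2).
windows = np.moveaxis(windows, -1, 1)
```

With 5 steps and a window of 3 there are 5 - 3 + 1 = 3 windows, and windows[i] equals data[i:i + window_size].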

timestamps_from_xr(ds: xr.Dataset, delta: str, tz: str | None = 'UTC') → ndarray

Compute a 2D array of timezone-aware timestamps by combining a reference time coordinate with a time-delta coordinate.

Parameters:

ds (xarray.Dataset) – The input dataset containing a reftime coordinate of type datetime64[ns] and a time-delta coordinate.
delta (str) – The name of the offset coordinate (e.g., lag or lead) of type timedelta64[ns].
tz (str, optional) – The timezone of the returned timestamps. Default is 'UTC'.

Returns:

A 2D array of shape (num_reftime, num_deltas) containing pandas.Timestamp objects localized to tz (UTC by default), where each entry is reftime[i] + offset[j].

Return type:

np.ndarray
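
The reftime[i] + offset[j] combination is an outer sum, which the following sketch reproduces with plain NumPy arrays standing in for the two Dataset coordinates. The sample values are hypothetical, and the conversion to an object array of pandas.Timestamp is an assumption about the output layout.

```python
import numpy as np
import pandas as pd

# Stand-ins for the reftime and lead coordinates of the dataset.
reftime = np.array(["2024-01-01T00", "2024-01-01T12"], dtype="datetime64[ns]")
lead = np.array([0, 1, 2], dtype="timedelta64[h]").astype("timedelta64[ns]")

# Broadcast to shape (num_reftime, num_deltas).
grid = reftime[:, None] + lead[None, :]

# Wrap each entry as a timezone-aware pandas.Timestamp in an object array.
stamps = np.array(
    [[pd.Timestamp(t).tz_localize("UTC") for t in row] for row in grid],
    dtype=object,
)
```

Entry stamps[1, 2] is the second reference time plus a two-hour lead, i.e. 2024-01-01 14:00 UTC.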

to_pandas_freq(freq: str)

Convert a frequency string to a pandas frequency object.
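
The return type is not documented here. As a point of comparison, pandas itself parses frequency strings with pandas.tseries.frequencies.to_offset; whether to_pandas_freq wraps this call is an assumption.

```python
from pandas.tseries.frequencies import to_offset

# Parse a frequency string into a pandas DateOffset object.
offset = to_offset("10min")

# Fixed-length offsets expose their multiple and their length in nanoseconds.
minutes = offset.n
nanos = offset.nanos
```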

xr_to_np(a: xr.Dataset, pars: list | None = None, sample_dim: int | None = None, stack_dim: int = -1) → ndarray

Extract variables from an xarray.Dataset and return them as a stacked ndarray.

Parameters:

a (xarray.Dataset) – The input dataset containing one or more data variables.
pars (list[str], optional) – The names of the variables to extract. If None, all data variables in a are used.
sample_dim (int, optional) – The index of the dimension of a that contains the samples, if present. If an int is given, that dimension is moved to the leading position, giving shape (samples, a.shape[~sample_dim]); None means there is no sample dimension to move.
stack_dim (int) – The dimension along which the arrays are stacked. Default is -1 (the last axis).

Returns:

A NumPy array in which the selected variables are stacked along stack_dim. With the default stack_dim=-1, variables of shape (*dims) yield an array of shape (*dims, num_vars).

Return type:

np.ndarray
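
The stacking and the optional sample-dimension move can be sketched with NumPy alone. In the example below a dict of arrays stands in for the Dataset's data variables; the variable names and shapes are hypothetical, and this is not the library's implementation.

```python
import numpy as np

# Stand-in for two data variables, each with shape (time, sample) = (4, 3).
variables = {
    "temperature": np.zeros((4, 3)),
    "humidity": np.ones((4, 3)),
}
pars = ["temperature", "humidity"]

# Stack along a new last axis (stack_dim=-1): (4, 3) -> (4, 3, 2).
stacked = np.stack([variables[p] for p in pars], axis=-1)

# Move the sample dimension (axis 1 here) to the front: (3, 4, 2).
sample_first = np.moveaxis(stacked, 1, 0)
```

The order of pars determines the order of the variables along the stacked axis.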