API Reference

The four key classes

  1. DataPreparer

  2. QualityController

  3. GaugeVsGriddedCorrelator

  4. CEHGEARSubDailyProducer

rainfall_gridder.DataPreparer(rainfall_data, rainfall_metadata, station_id_col, station_name_col, precipitation_col, date_time_col, start_date_col, end_date_col, easting_col, northing_col, gridded_rainfall_data, gridded_rainfall_col, rainfall_offset_hours, output_dir, min_n_timesteps, verbose=False)

Main data preparation algorithm.

Parameters:
  • rainfall_data (DataFrame)

  • rainfall_metadata (DataFrame)

  • station_id_col (str)

  • station_name_col (str)

  • precipitation_col (str)

  • date_time_col (str)

  • start_date_col (str)

  • end_date_col (str)

  • easting_col (str)

  • northing_col (str)

  • gridded_rainfall_data (Dataset)

  • gridded_rainfall_col (str)

  • rainfall_offset_hours (int)

  • output_dir (str | Path)

  • min_n_timesteps (int)

  • verbose (bool)
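
The constructor takes many arguments, so it can help to assemble them in one dictionary and check the names before use. The sketch below does only that: the parameter names are copied from the signature above, while the column names and values are hypothetical placeholders, and `check_kwargs` is an illustrative helper that is not part of `rainfall_gridder`. The commented-out call at the end assumes that `run` forwards `**kwargs` to the constructor, which this reference does not confirm.

```python
# Parameter names copied from the DataPreparer signature above.
DATA_PREPARER_REQUIRED = (
    "rainfall_data", "rainfall_metadata", "station_id_col", "station_name_col",
    "precipitation_col", "date_time_col", "start_date_col", "end_date_col",
    "easting_col", "northing_col", "gridded_rainfall_data",
    "gridded_rainfall_col", "rainfall_offset_hours", "output_dir",
    "min_n_timesteps",
)
DATA_PREPARER_OPTIONAL = ("verbose",)


def check_kwargs(kwargs, required=DATA_PREPARER_REQUIRED,
                 optional=DATA_PREPARER_OPTIONAL):
    """Return (missing, unknown) parameter names for a candidate kwargs dict.

    Hypothetical helper for illustration; not part of rainfall_gridder.
    """
    missing = [name for name in required if name not in kwargs]
    unknown = [name for name in kwargs
               if name not in required and name not in optional]
    return missing, unknown


# All values below are hypothetical. The three data objects are None
# placeholders standing in for the DataFrame / Dataset inputs.
prep_kwargs = {
    "rainfall_data": None,            # DataFrame of gauge observations
    "rainfall_metadata": None,        # DataFrame of station metadata
    "station_id_col": "station_id",
    "station_name_col": "station_name",
    "precipitation_col": "rainfall_mm",
    "date_time_col": "date_time",
    "start_date_col": "start_date",
    "end_date_col": "end_date",
    "easting_col": "easting",
    "northing_col": "northing",
    "gridded_rainfall_data": None,    # Dataset of gridded rainfall
    "gridded_rainfall_col": "rainfall_amount",
    "rainfall_offset_hours": 9,
    "output_dir": "outputs/prepared",
    "min_n_timesteps": 24,
    "verbose": True,
}

missing, unknown = check_kwargs(prep_kwargs)
print("missing:", missing, "unknown:", unknown)

# Assumed usage, not confirmed by this reference:
# from rainfall_gridder import DataPreparer
# prepared = DataPreparer.run(save_data=False, return_data=True, **prep_kwargs)
```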

rainfall_gridder.QualityController(rainfall_data, rainfall_metadata, station_id_col, station_name_col, date_time_col, precipitation_col, easting_col, northing_col, start_date_col, end_date_col, input_crs, min_n_timesteps, output_dir, time_res, smallest_rainfall_amount, min_n_neighbours, qc_framework, nearby_rainfall_data_loader_kwargs={}, verbose=False)

Main algorithm for running quality control.

Parameters:
  • rainfall_data (DataFrame)

  • rainfall_metadata (DataFrame)

  • station_id_col (str)

  • station_name_col (str)

  • date_time_col (str)

  • precipitation_col (str)

  • easting_col (str)

  • northing_col (str)

  • start_date_col (str)

  • end_date_col (str)

  • input_crs (str)

  • min_n_timesteps (int)

  • output_dir (str | Path)

  • time_res (str)

  • smallest_rainfall_amount (int | float)

  • min_n_neighbours (int)

  • qc_framework (str)

  • nearby_rainfall_data_loader_kwargs (dict)

  • verbose (bool)
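
The `easting_col`, `northing_col` and `min_n_neighbours` parameters suggest that stations are compared against nearby gauges, although this reference does not describe how neighbours are chosen. As a rough illustration of the idea only, the sketch below ranks stations by straight-line distance on projected easting/northing coordinates. The function name, station IDs and coordinates are all hypothetical, and the library's own neighbour search may work differently.

```python
import math


def nearest_neighbours(target_id, stations, n_neighbours):
    """Return the IDs of the n closest stations to target_id.

    stations maps station ID -> (easting, northing) in metres.
    Illustrative only; not the rainfall_gridder implementation.
    """
    target_x, target_y = stations[target_id]
    distances = [
        (math.hypot(x - target_x, y - target_y), station_id)
        for station_id, (x, y) in stations.items()
        if station_id != target_id
    ]
    distances.sort()  # closest first; ties broken by station ID
    return [station_id for _, station_id in distances[:n_neighbours]]


# Hypothetical stations on a projected grid (metres).
stations = {
    "A": (400000, 300000),
    "B": (401000, 300000),
    "C": (400000, 305000),
    "D": (450000, 350000),
}
neighbours = nearest_neighbours("A", stations, n_neighbours=2)
print(neighbours)
```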

rainfall_gridder.GaugeVsGriddedCorrelator(gauge_data, gauge_metadata, nearest_gridded_daily, station_id, precipitation_col, gridded_rainfall_col, date_time_col, start_date_col, end_date_col, station_id_col, easting_col, northing_col, rainfall_offset_hours, aggregate_gauge_to_daily=True)
Parameters:
  • gauge_data (DataFrame)

  • gauge_metadata (DataFrame)

  • nearest_gridded_daily (Dataset)

  • station_id (str)

  • precipitation_col (str)

  • gridded_rainfall_col (str)

  • date_time_col (str)

  • start_date_col (str)

  • end_date_col (str)

  • station_id_col (str)

  • easting_col (str)

  • northing_col (str)

  • rainfall_offset_hours (int)

  • aggregate_gauge_to_daily (bool)
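
The class has no description here, but `aggregate_gauge_to_daily` and `rainfall_offset_hours` hint at the general approach: sum sub-daily gauge values into days that start at an offset hour, then compare them with the daily gridded values. The sketch below illustrates that idea with a Pearson correlation. It is an assumption-laden illustration rather than the library's method: the data are made up, timestamps are assumed to mark the start of each interval, and the helper names are hypothetical.

```python
import math
from collections import defaultdict
from datetime import date, datetime, timedelta


def aggregate_to_daily(records, offset_hours):
    """Sum (timestamp, rainfall) pairs into days starting at offset_hours."""
    totals = defaultdict(float)
    for timestamp, rainfall in records:
        # Shift back so that e.g. 09:00-08:59 maps onto one calendar date.
        day = (timestamp - timedelta(hours=offset_hours)).date()
        totals[day] += rainfall
    return dict(totals)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical gauge observations (mm) and gridded daily totals (mm).
gauge_records = [
    (datetime(2020, 1, 1, 9), 1.0),
    (datetime(2020, 1, 1, 20), 2.0),
    (datetime(2020, 1, 2, 8), 0.5),   # still belongs to the 1 Jan rain day
    (datetime(2020, 1, 2, 9), 4.0),
    (datetime(2020, 1, 3, 3), 1.0),   # still belongs to the 2 Jan rain day
    (datetime(2020, 1, 3, 10), 0.2),
]
gridded_daily = {
    date(2020, 1, 1): 3.0,
    date(2020, 1, 2): 5.5,
    date(2020, 1, 3): 0.0,
}

gauge_daily = aggregate_to_daily(gauge_records, offset_hours=9)
common_days = sorted(set(gauge_daily) & set(gridded_daily))
r = pearson([gauge_daily[d] for d in common_days],
            [gridded_daily[d] for d in common_days])
print(gauge_daily, round(r, 3))
```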

rainfall_gridder.CEHGEARSubDailyProducer(rainfall_data, rainfall_metadata, station_id_col, time_step, time_res, precipitation_col, easting_col, northing_col, date_time_col, hour_at_start_of_day, verbose)
Parameters:
  • rainfall_data (DataFrame)

  • rainfall_metadata (DataFrame)

  • station_id_col (str)

  • time_step (datetime)

  • time_res (str)

  • precipitation_col (str)

  • easting_col (str)

  • northing_col (str)

  • date_time_col (str)

  • hour_at_start_of_day (int)

  • verbose (bool)

Full API

Top-level package for RainfallGridder.

class rainfall_gridder.CEHGEARSubDailyProducer(rainfall_data, rainfall_metadata, station_id_col, time_step, time_res, precipitation_col, easting_col, northing_col, date_time_col, hour_at_start_of_day, verbose)

Bases: object

Methods

calculate_distance_grid

get_cells_to_stat_disag

get_subdaily_rainfall_factors

produce_ceh_gear

run_interpolation

Parameters:
  • rainfall_data (DataFrame)

  • rainfall_metadata (DataFrame)

  • station_id_col (str)

  • time_step (datetime)

  • time_res (str)

  • precipitation_col (str)

  • easting_col (str)

  • northing_col (str)

  • date_time_col (str)

  • hour_at_start_of_day (int)

  • verbose (bool)

calculate_distance_grid(land_mask, gauge_x_grid=None, gauge_y_grid=None)
Return type:

DataArray

Parameters:
  • land_mask (DataArray)

  • gauge_x_grid (DataArray)

  • gauge_y_grid (DataArray)
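
This reference does not say what `calculate_distance_grid` computes. Going only by its name and its `land_mask` argument, one plausible reading is a grid holding the distance from each land cell to the nearest gauge. The sketch below illustrates that reading with plain nested lists in place of a DataArray. The function name, cell size and gauge position are hypothetical, and the roles of `gauge_x_grid` and `gauge_y_grid` are not covered.

```python
import math


def distance_to_nearest_gauge(land_mask, cell_size_m, gauges_xy):
    """Distance (m) from each land cell centre to its nearest gauge.

    land_mask is a 2D list of bools (True = land); sea cells get None.
    Illustrative only; not the rainfall_gridder implementation.
    """
    grid = []
    for row_idx, row in enumerate(land_mask):
        out_row = []
        for col_idx, is_land in enumerate(row):
            if not is_land:
                out_row.append(None)
                continue
            # Coordinates of the cell centre.
            x = (col_idx + 0.5) * cell_size_m
            y = (row_idx + 0.5) * cell_size_m
            out_row.append(
                min(math.hypot(x - gx, y - gy) for gx, gy in gauges_xy)
            )
        grid.append(out_row)
    return grid


# Hypothetical 2 x 3 grid of 1 km cells with a single gauge.
land_mask = [
    [True, True, False],
    [True, True, True],
]
distance_grid = distance_to_nearest_gauge(
    land_mask, cell_size_m=1000, gauges_xy=[(500, 500)]
)
for row in distance_grid:
    print(row)
```

A grid like this could then be compared against a threshold such as the `max_distance_to_gauge_m=50000` default of `get_cells_to_stat_disag`, though how the library uses that threshold is not documented here.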

get_cells_to_stat_disag(land_mask, daily_totals_grid, distance_grid=None, max_distance_to_gauge_m=50000)
Return type:

DataArray

Parameters:
  • land_mask (DataArray)

  • daily_totals_grid (DataArray)

  • distance_grid (DataArray)

  • max_distance_to_gauge_m (int)

get_subdaily_rainfall_factors(land_mask, daily_totals_grid, one_day_gridded_daily, cells_to_stat_disag, gridded_rainfall_col)
Return type:

Dataset

Parameters:
  • land_mask (DataArray)

  • daily_totals_grid (DataArray)

  • one_day_gridded_daily (Dataset)

  • cells_to_stat_disag (DataArray)

  • gridded_rainfall_col (str)
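
The method name suggests that sub-daily values are expressed as fractions of a daily total, which can then rescale a daily gridded value. That is an inference from the name alone, since the algorithm is not described here. The sketch below shows the arithmetic for a single grid cell. The function names and numbers are hypothetical, and returning zeros for a dry day is an arbitrary choice made for the example.

```python
def subdaily_factors(subdaily_amounts):
    """Fraction of the daily total that fell in each sub-daily step."""
    total = sum(subdaily_amounts)
    if total == 0:
        # Dry day: no information on timing, so return zeros (a choice
        # made for this sketch, not necessarily the library's behaviour).
        return [0.0] * len(subdaily_amounts)
    return [amount / total for amount in subdaily_amounts]


def disaggregate(daily_total, factors):
    """Split a daily total across sub-daily steps using the factors."""
    return [daily_total * factor for factor in factors]


# Hypothetical six-hourly interpolated gauge amounts (mm) for one cell.
interpolated = [1.0, 3.0, 0.0, 4.0]
factors = subdaily_factors(interpolated)

# Hypothetical daily gridded value (mm) for the same cell and day.
scaled = disaggregate(10.0, factors)
print(factors, scaled)
```

By construction the rescaled values sum to the daily gridded total, so the daily product is preserved while the timing comes from the gauges.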

produce_ceh_gear(land_mask, one_day_gridded_daily, gridded_rainfall_col, output_rainfall_name='rainfall')
Return type:

Dataset

Parameters:
  • land_mask (DataArray)

  • one_day_gridded_daily (Dataset)

  • gridded_rainfall_col (str)

  • output_rainfall_name (str)

run_interpolation(x_coords, y_coords, x_grid, y_grid)
Parameters:
  • x_coords (DataArray)

  • y_coords (DataArray)

  • x_grid (DataArray)

  • y_grid (DataArray)

class rainfall_gridder.DataPreparer(rainfall_data, rainfall_metadata, station_id_col, station_name_col, precipitation_col, date_time_col, start_date_col, end_date_col, easting_col, northing_col, gridded_rainfall_data, gridded_rainfall_col, rainfall_offset_hours, output_dir, min_n_timesteps, verbose=False)

Bases: object

Main data preparation algorithm.

Methods

run(save_data, return_data[, ...])

Run the data preparer and return and/or save the prepared data.

save_prepared_data([partition_by_columns])

Save data that has been prepared for gridding.

prepare_data_and_metadata_for_gridding

save_prepared_metadata

Parameters:
  • rainfall_data (DataFrame)

  • rainfall_metadata (DataFrame)

  • station_id_col (str)

  • station_name_col (str)

  • precipitation_col (str)

  • date_time_col (str)

  • start_date_col (str)

  • end_date_col (str)

  • easting_col (str)

  • northing_col (str)

  • gridded_rainfall_data (Dataset)

  • gridded_rainfall_col (str)

  • rainfall_offset_hours (int)

  • output_dir (str | Path)

  • min_n_timesteps (int)

  • verbose (bool)

prepare_data_and_metadata_for_gridding()
Return type:

None

classmethod run(save_data, return_data, partition_by_columns=None, **kwargs)

Run the data preparer and return and/or save the prepared data.

Parameters:
  • save_data (bool) – Whether to save data to output directory

  • return_data (bool) – Whether to return dataframes

  • partition_by_columns (list) – List of columns to partition the parquet files by if saving outputs

Return type:

None | tuple[DataFrame, DataFrame]

Returns:
  • prepared_data – Data run through algorithm

  • prepared_metadata – Metadata of data run through algorithm

save_prepared_data(partition_by_columns=None)

Save data that has been prepared for gridding.

Parameters:
  • partition_by_columns (list) – Columns that decide the partitioning of the output parquet file structure (default is station_id_col)

Return type:

None
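
Partitioned parquet datasets are commonly laid out as one directory per distinct value of each partition column, named `column=value`. The sketch below groups rows into such paths to show what partitioning by `station_id_col` might look like. It writes no files, the row data and `partition_paths` helper are hypothetical, and the layout the library actually produces is not confirmed by this reference.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def partition_paths(rows, partition_by_columns, output_dir):
    """Group rows by the directory a column=value layout would use.

    Illustrative only; no files are written and the real output
    structure of rainfall_gridder may differ.
    """
    groups = defaultdict(list)
    for row in rows:
        parts = [f"{column}={row[column]}" for column in partition_by_columns]
        groups[PurePosixPath(output_dir, *parts)].append(row)
    return dict(groups)


# Hypothetical prepared rows with a station ID column.
rows = [
    {"station_id": "A", "rainfall": 0.2},
    {"station_id": "B", "rainfall": 1.0},
    {"station_id": "A", "rainfall": 0.0},
]
partitions = partition_paths(rows, ["station_id"], "outputs/prepared")
for path, members in sorted(partitions.items()):
    print(path, len(members))
```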

save_prepared_metadata()
Return type:

None

class rainfall_gridder.GaugeVsGriddedCorrelator(gauge_data, gauge_metadata, nearest_gridded_daily, station_id, precipitation_col, gridded_rainfall_col, date_time_col, start_date_col, end_date_col, station_id_col, easting_col, northing_col, rainfall_offset_hours, aggregate_gauge_to_daily=True)

Bases: object

Methods

get_corr

Parameters:
  • gauge_data (DataFrame)

  • gauge_metadata (DataFrame)

  • nearest_gridded_daily (Dataset)

  • station_id (str)

  • precipitation_col (str)

  • gridded_rainfall_col (str)

  • date_time_col (str)

  • start_date_col (str)

  • end_date_col (str)

  • station_id_col (str)

  • easting_col (str)

  • northing_col (str)

  • rainfall_offset_hours (int)

  • aggregate_gauge_to_daily (bool)

get_corr()

class rainfall_gridder.QualityController(rainfall_data, rainfall_metadata, station_id_col, station_name_col, date_time_col, precipitation_col, easting_col, northing_col, start_date_col, end_date_col, input_crs, min_n_timesteps, output_dir, time_res, smallest_rainfall_amount, min_n_neighbours, qc_framework, nearby_rainfall_data_loader_kwargs={}, verbose=False)

Bases: object

Main algorithm for running quality control.

Methods

run(save_data, return_data[, ...])

Run the quality controller and return and/or save the prepared data.

save_qcd_data([partition_by_columns])

Save data that has been quality controlled for gridding.

update_shared_qc_kwargs(...)

Update all the shared keyword arguments.

get_nearest_neighbour

quality_control_data

save_qc_rulebase_summary

save_qcd_metadata

save_summary_of_qc

Parameters:
  • rainfall_data (DataFrame)

  • rainfall_metadata (DataFrame)

  • station_id_col (str)

  • station_name_col (str)

  • date_time_col (str)

  • precipitation_col (str)

  • easting_col (str)

  • northing_col (str)

  • start_date_col (str)

  • end_date_col (str)

  • input_crs (str)

  • min_n_timesteps (int)

  • output_dir (str | Path)

  • time_res (str)

  • smallest_rainfall_amount (int | float)

  • min_n_neighbours (int)

  • qc_framework (str)

  • nearby_rainfall_data_loader_kwargs (dict)

  • verbose (bool)

get_nearest_neighbour(nearby_rainfall_data_loader, station_id)

quality_control_data()

classmethod run(save_data, return_data, partition_by_columns=None, **kwargs)

Run the quality controller and return and/or save the prepared data.

Parameters:
  • save_data (bool) – Whether to save data to output directory

  • return_data (bool) – Whether to return dataframes

  • partition_by_columns (list) – List of columns to partition the parquet files by if saving outputs

Return type:

None | tuple[DataFrame, DataFrame]

Returns:
  • qc_data – Data run through algorithm

  • qc_metadata – Metadata of data run through algorithm

save_qc_rulebase_summary()
Return type:

None

save_qcd_data(partition_by_columns=None)

Save data that has been quality controlled for gridding.

Parameters:
  • partition_by_columns (list) – Columns that decide the partitioning of the output parquet file structure (default is station_id_col)

Return type:

None

save_qcd_metadata()
Return type:

None

save_summary_of_qc()
Return type:

None

update_shared_qc_kwargs(nearby_rainfall_data_loader)

Update all the shared keyword arguments.

TODO: Check that this updates properly in the loop.

Return type:

None

Parameters:
  • nearby_rainfall_data_loader (NearbyRainfallDataLoader)