API Reference¶

Rain gauge matching¶

`create_output_dataframes(matches)` ¶

Converts a list of Match objects into output dataframes, flattening the nested station objects into those dataframes.

Parameters:

Name	Type	Description	Default
`matches`	`list`	list of match class objects	required

Returns:

Type	Description
`csv`	pandas.DataFrame saved as csv for each type of match available (accepted, rank-rejected, auto-rejected)

`generate_manual_station_matching_notebook(output_dir)` ¶

Generate manual station matching ipynb from template (either allowing for backups or not).

Parameters:

Name	Type	Description	Default
`output_dir`	`str`	Path to outputs	required

`generate_manual_station_matching_script(output_dir, matching_script)` ¶

Generate manual station matching script from template (either allowing for backups or not).

Parameters:

Name	Type	Description	Default
`output_dir`	`str`	Path to outputs	required
`matching_script`		Which template script to copy across i.e. with or without backups	required

`run_matching_algorithm(left_df, right_df, save_outputs_to_csv=False, output_dir='outputs', save_manual_matching_script=False, save_manual_matching_notebook=False, allow_backups=False, overwrite_existing=False)` ¶

Wrapper function to run matching algorithm for two sets of stations / gauges

Parameters:

Name	Type	Description	Default
`left_df`	`DataFrame`	contains rows of stations with names and co-ordinates	required
`right_df`	`DataFrame`	contains rows of stations with names and co-ordinates	required
`save_outputs_to_csv`	`bool`	Whether to save the outputs to csv (default: False)	`False`
`output_dir`	`str`	Path to outputs	`'outputs'`
`save_manual_matching_script`	`bool`	Whether to generate manual matching script (default: False)	`False`
`save_manual_matching_notebook`	`bool`	Whether to generate manual matching notebook (default: False)	`False`
`allow_backups`	`bool`	Whether the outputted manual station matching should allow for backups (default: False)	`False`
`overwrite_existing`	`bool`	Whether to overwrite existing data, scripts and/or notebooks under output_dir	`False`

Returns:

Type	Description
`list`	contains Match class objects for pairs which returned a match
`DataFrame`	dataframe with each row containing an automatically accepted match for two stations
`DataFrame`	dataframe with each row containing a rank-rejected match for two stations (it scored worse than an accepted match for the left-hand station)
`DataFrame`	dataframe with each row containing an automatically rejected match for two stations (match was detected but with a worse score than the threshold so subject to manual review)

Rain gauge comparison¶

`row_from_comparison(comparison)` ¶

Generate dictionary from comparison object that will form row of output dataframe

Parameters:

Name	Type	Description	Default
`comparison`	`RainGauge_Comparison`	comparison containing attributes to be stored in dictionary	required

Returns:

Type	Description
`dict`	dictionary containing information gathered from class attributes
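
As a rough illustration of how an object's attributes can become one row of an output dataframe, the sketch below flattens a stand-in object with `vars()`. `FakeGauge`, `FakeComparison` and `row_from_object` are hypothetical names used only here; the real function may select or rename fields differently.

```python
from dataclasses import dataclass


@dataclass
class FakeGauge:
    """Hypothetical stand-in for a gauge object."""
    id: str
    name: str


@dataclass
class FakeComparison:
    """Hypothetical stand-in for a comparison object."""
    primary_gauge: FakeGauge
    secondary_gauge: FakeGauge
    overlap_timesteps: int


def row_from_object(obj, prefix=""):
    """Flatten an object's attributes into one dict, expanding nested objects."""
    row = {}
    for key, value in vars(obj).items():
        if hasattr(value, "__dict__"):
            # Nested object: prefix its attributes with the parent attribute name
            row.update(row_from_object(value, prefix=f"{prefix}{key}_"))
        else:
            row[f"{prefix}{key}"] = value
    return row


comparison = FakeComparison(
    FakeGauge("ABC001", "blue-mountain station"),
    FakeGauge("XYZ9", "blue mountain"),
    120,
)
row = row_from_object(comparison)
```

A list of such dictionaries can then be passed to `pandas.DataFrame` to build the output table.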

`run_comparison_algorithm(gauge_pair_metadata, left_hand_file_path, right_hand_file_path, datetime_format)` ¶

Wrapper function to run comparison on timeseries at two gauges and store output

Parameters:

Name	Type	Description	Default
`gauge_pair_metadata`		each sub-list contains a pair of gauge-ids as strings	required
`left_hand_file_path`	`str`	location of timeseries information containing left-hand gauge timeseries file	required
`right_hand_file_path`	`str`	location of timeseries information containing right-hand gauge timeseries file	required
`datetime_format`		datetime format used in gauge timeseries files	required

Returns:

Type	Description
`DataFrame`	dataframe with each row as a summary comparison of gauge timeseries

Rain gauge matching classes¶

`Comparison` ¶

Bases: object

An object comprised of two RainGauges with attributes and methods for outlining basic statistical information about the gauge timeseries

`__init__(primary_gauge, secondary_gauge)` ¶

Attributes:

Name	Type	Description
`primary_gauge`	`RainGauge_Comparison`	gauge from the primary network
`secondary_gauge`	`RainGauge_Comparison`	gauge from the secondary network
`raw_timeseries`	`DataFrame`	contains all time-steps of timeseries data for the overlapping period of both gauges' start and end dates
`timeseries`	`DataFrame`	contains non-NaN timeseries data for the overlapping period of both gauges' start and end dates
`non_zero_timeseries`	`DataFrame`	contains non-zero timeseries data for the overlapping period of both gauges' start and end dates
`overlap_start_date`	`DateTime`	first timestep of overlap between self.primary_gauge.timeseries and self.secondary_gauge.timeseries
`overlap_end_date`	`DateTime`	final timestep of overlap between self.primary_gauge.timeseries and self.secondary_gauge.timeseries
`overlap_length`	`TimeDelta`	length of overlap from first shared timestep to last shared timestep
`overlap_timesteps`	`int`	count of timesteps in overlapping period
`good_timesteps`	`int`	count of timesteps where both gauges record non-NaN values
`nan_timesteps`	`int`	count of timesteps where one or both gauges record NaN value
`sum_of_good_timesteps`	`TimeDelta`	sum of good timesteps shared by both timeseries
`identical_rows`	`int`	count of timesteps with identical measured parameter values or where both are NaN
`identical_non_nan_rows`	`int`	count of timesteps with identical measured parameter values
`identical_rows_percentage`	`str`	print out of identical timesteps / total timesteps as a percentage
`identical_non_nan_rows_percentage`	`str`	print out of identical non-NaN timesteps / total non-NaN timesteps as a percentage
`primary_accumulation`	`float`	sum of measured parameter at primary gauge across entire overlap period
`secondary_accumulation`	`float`	sum of measured parameter at secondary gauge across entire overlap period
`accumulation_difference`	`float`	absolute difference between primary and secondary accumulation
`accumulation_difference_percentage`	`str`	print out of (primary accumulation / secondary accumulation) - 1 as a percentage; -100% when nothing is recorded at the primary gauge and +100% when nothing is recorded at the secondary gauge
`r_squared`	`float`	r-squared value for overlapping period of two gauges
`spcc`	`float`	Spearman's correlation coefficient for overlapping period of two gauges
`non_zero_r_sqaured`	`float`	r-squared value for non-zero timesteps during overlapping period of two gauges
`non_zero_spcc`	`float`	Spearman's correlation coefficient for non-zero timesteps during overlapping period of two gauges

`get_accumulation_information()` ¶

Calculate summary statistics for accumulation at pair of stations / gauges
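
The accumulation attributes listed above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the package's implementation: `accumulation_summary` is a hypothetical name, NaN readings are assumed to be skipped, and the 0% result when both gauges record nothing is a guess.

```python
import math


def accumulation_summary(primary, secondary):
    """Summarise accumulation for two lists of readings over the overlap."""
    # Skip NaN readings when summing (an assumption about NaN handling)
    primary_total = sum(v for v in primary if not math.isnan(v))
    secondary_total = sum(v for v in secondary if not math.isnan(v))
    difference = abs(primary_total - secondary_total)

    if secondary_total == 0:
        # Ratio is undefined: assume 0% if both are empty, otherwise +100%
        ratio = 0.0 if primary_total == 0 else 1.0
    else:
        # Evaluates to -1 (-100%) when the primary gauge records nothing
        ratio = primary_total / secondary_total - 1
    return primary_total, secondary_total, difference, f"{ratio:.1%}"


totals = accumulation_summary([1.0, 2.0, 0.0, 1.0], [1.0, 1.0, 0.0, 3.0])
```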

`get_overlap()` ¶

Identifies overlapping period between start and end dates of stations / gauges within pair
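
One common way to find such an overlap is to take the later of the two start dates and the earlier of the two end dates. The sketch below shows that idea with the standard library; the function name `find_overlap` and the choice to return `None` for non-overlapping periods are assumptions, not documented behaviour.

```python
from datetime import datetime


def find_overlap(start_a, end_a, start_b, end_b):
    """Return (overlap_start, overlap_end), or None if the periods do not overlap."""
    overlap_start = max(start_a, start_b)  # later of the two start dates
    overlap_end = min(end_a, end_b)        # earlier of the two end dates
    if overlap_start > overlap_end:
        return None
    return overlap_start, overlap_end


overlap = find_overlap(
    datetime(2020, 1, 1), datetime(2020, 6, 30),
    datetime(2020, 3, 1), datetime(2020, 12, 31),
)
# Length of the shared period as a timedelta
overlap_length = overlap[1] - overlap[0]
```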

`get_row_information()` ¶

Calculate summary statistics for timestep similarity at pair of stations / gauges

`get_statistical_information()` ¶

Calculate summary statistics for correlation between a pair of stations / gauges

`get_timeseries()` ¶

Gets relevant timeseries (and metadata) using overlapping period identified for both stations / gauges

`get_timestep_information()` ¶

Identify number of 'good' timesteps shared by a pair of matched stations / gauges
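
Following the attribute descriptions for `good_timesteps` and `nan_timesteps`, a timestep counts as good only when both gauges record a non-NaN value. A minimal sketch, assuming the two series are already aligned to the same timesteps (`count_timesteps` is a hypothetical helper):

```python
import math


def count_timesteps(primary, secondary):
    """Count good and NaN timesteps for two aligned lists of readings."""
    good = sum(
        1
        for p, s in zip(primary, secondary)
        if not math.isnan(p) and not math.isnan(s)
    )
    total = len(primary)
    return {
        "overlap_timesteps": total,
        "good_timesteps": good,
        # Timesteps where one or both gauges record NaN
        "nan_timesteps": total - good,
    }


nan = float("nan")
counts = count_timesteps([0.2, nan, 0.0, 1.4], [0.2, 0.1, nan, 1.0])
```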

`prepare_comparison()` ¶

Run comparison functions

`Match` ¶

Bases: object

An object comprised of two RainGauges with attributes and methods for defining how well their metadata matches

`__init__(match_type, station_left, station_right, distance_score, distance_metres, string_score, common_substrings, common_banned_substrings=None, match_score=None)` ¶

`get_banned_common_strings()` ¶

Check if this is redundant

`set_auto_rejected()` ¶

Determines if scores meet criteria for an auto-rejected match

`set_match_score()` ¶

Calculates and sets match score from product of distance and string score

Returns:

Type	Description
`int`	score in [0, 1, 2, 3, 4, 6, 8, 1000, 2000, 3000, 4000, 6000, 8000]

`RainGauge` ¶

Bases: object

The most basic gauge object with a name, id, co-ordinates and a source

`__init__(id, name, source='Unspecified', easting=np.nan, northing=np.nan)` ¶

Attributes:

Name	Type	Description
`name`	`str`	everyday name of the station e.g., blue-mountain station
`id`	`str`	reference id of the station e.g., ABC001
`source`	`str`	source of data e.g., random_API
`easting`	`float`	EPSG27700 Easting value
`northing`	`float`	EPSG27700 Northing value

Functions:

Name	Description
`# TODO: Add class method to create raingauges`

`get_coordinates()` ¶

Combines self.easting and self.northing into a geometry object

Returns:

Type	Description
`Point`	coordinates in EPSG:27700 (British National Grid)

`RainGauge_Comparison` ¶

Bases: RainGauge

A version of the RainGauge with a basic timeseries and datetime metadata for more detailed comparison with another gauge

`__init__(*, folder_path, datetime_format, **kwargs)` ¶

Attributes:

Name	Type	Description
`folder_path`	`str`	where the timeseries file for this gauge is stored, filename should be a csv with id or name as filename e.g., ABC001.csv or blue-mountain station.csv
`datetime_format`		datetime format used in gauge timeseries files
`timeseries`	`DataFrame`	timeseries with data, generated from csv at location "{self.folder_path}/{self.id}.csv"
`start_date`	`DateTime`	first timestep in self.timeseries
`end_date`	`DateTime`	last timestep in self.timeseries

`get_coordinates()` ¶

Combines self.easting and self.northing into a geometry object

Returns:

Type	Description
`Point`	coordinates in EPSG:27700 (British National Grid)

`get_dates()` ¶

Extracts first and last timestep from timeseries dataframe

Returns:

Type	Description
`DateTime`	first timestep in timeseries
`DateTime`	final timestep in timeseries

`get_timeseries()` ¶

Extracts timeseries from csv file and checks those files contain correctly named columns

Returns:

Type	Description
`DataFrame`	timeseries containing datetime and measured parameter information

`prepare_gauge()` ¶

Runs functions to prepare gauge for timeseries comparison

`RainGauge_Matching` ¶

Bases: RainGauge

A version of the RainGauge with attributes and methods for matching metadata between gauges

`__init__(*, banned_strings=None, **kwargs)` ¶

Attributes:

Name	Type	Description
`banned_strings`	`list`	Begins empty, is calculated later based on frequency of sub-string occurrence across all gauges

`get_all_substrings()` ¶

Generates all alphanumeric substrings of a string

Returns:

Type	Description
`set`	unique sub-strings

`get_allowable_substrings()` ¶

Generates all allowable (not in self.banned) alphanumeric substrings of a string

Returns:

Type	Description
`set`	unique sub-strings that are not banned from the matching process

`get_common_substrings(other, mode=None)` ¶

Generates all common substrings between two station naming strings

Parameters:

Name	Type	Description	Default
`mode`	`str`	toggle for whether to exclude banned strings ('all' to ignore bans)	`None`

Returns:

Type	Description
`set`	unique common sub-strings
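
The idea of common sub-strings with an optional ban list can be sketched as a set intersection. This sketch assumes a "sub-string" is a lower-cased alphanumeric token of the station name, which the reference does not confirm; `name_tokens` and `common_tokens` are hypothetical helpers, and only the `mode='all'` toggle is taken from the description above.

```python
import re


def name_tokens(name):
    """Split a station name into lower-case alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", name.lower()))


def common_tokens(left, right, banned=frozenset(), mode=None):
    """Return tokens shared by both names; mode='all' ignores the ban list."""
    common = name_tokens(left) & name_tokens(right)
    if mode == "all":
        return common
    return common - set(banned)


banned = {"station"}
allowed = common_tokens("Blue-Mountain Station", "blue mountain station 2", banned)
everything = common_tokens(
    "Blue-Mountain Station", "blue mountain station 2", banned, mode="all"
)
```

Here `allowed` holds `blue` and `mountain`, while `everything` also keeps the banned token `station`.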

`get_coordinates()` ¶

Combines self.easting and self.northing into a geometry object

Returns:

Type	Description
`Point`	coordinates in EPSG:27700 (British National Grid)

`get_distance(other)` ¶

Calculates Euclidean distance between two sets of coordinates

Returns:

Type	Description
`float`	distance rounded to the nearest integer
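
For projected coordinates such as EPSG:27700, where eastings and northings are in metres, the Euclidean distance is the length of the hypotenuse between the two points. A minimal sketch using the standard library (`euclidean_distance` is a hypothetical stand-alone function; note that Python's `round` returns an `int` here):

```python
import math


def euclidean_distance(easting_a, northing_a, easting_b, northing_b):
    """Distance between two projected points, rounded to the nearest integer."""
    return round(math.hypot(easting_b - easting_a, northing_b - northing_a))


# Offsets of 300 m east and 400 m north form a 3-4-5 triangle
distance = euclidean_distance(530000.0, 180000.0, 530300.0, 180400.0)
```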

`get_distance_score(other)` ¶

Calculates distance score based on distance between two sets of coordinates

Returns:

Type	Description
`int`	distance score in [-1, 0, 1, 2, 3, 999]

`get_match(other)` ¶

Generates Match object if scoring criteria are met for pair of stations / gauges

Returns:

Type	Description
`Match`	object containing gauges and calculated scores

`get_string_score(other)` ¶

Calculates string score based on commonality of sub-strings (left hand station has priority for counting unique sub-strings)

Returns:

Type	Description
`int`	score based on string / sub-string commonality (-1 = identical basic string, n = number of sub-strings of left not in right, 999 = no commonality)
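
The three outcomes described in the table can be sketched as below. This is only an interpretation: it assumes sub-strings are alphanumeric tokens, that "identical basic string" means the names are equal once case and non-alphanumeric characters are removed, and it ignores banned strings. The names `basic`, `name_tokens` and `string_score` are hypothetical.

```python
import re


def basic(name):
    """Name with case and non-alphanumeric characters removed (assumed meaning)."""
    return "".join(re.findall(r"[a-z0-9]+", name.lower()))


def name_tokens(name):
    """Lower-case alphanumeric tokens of a station name."""
    return set(re.findall(r"[a-z0-9]+", name.lower()))


def string_score(left, right):
    """-1 for identical basic strings, 999 for nothing in common, else a count."""
    if basic(left) == basic(right):
        return -1
    left_tokens, right_tokens = name_tokens(left), name_tokens(right)
    if not left_tokens & right_tokens:
        return 999
    # Left-hand station has priority: count its tokens missing from the right
    return len(left_tokens - right_tokens)


identical = string_score("Blue-Mountain", "blue mountain")
partial = string_score("Blue Mountain North", "Blue Mountain")
unrelated = string_score("Blue Mountain", "Red Valley")
```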

`set_banned_strings(banned_strings)` ¶

Sets banned_strings attribute of a station

`convert_to_pandas_datetime(df, col_to_convert, datetime_format)` ¶

Convert designated column in pandas to datetime format

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	dataframe containing the column to be converted	required
`col_to_convert`	`str`	name of column to be converted	required
`datetime_format`		date format present in designated column	required

Returns:

Type	Description
`DataFrame`	copy of the dataframe with datetime column formatted
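
A conversion of this kind is typically done with `pandas.to_datetime` and an explicit `format` string. The sketch below is an assumption about the approach rather than the package's code; `to_datetime_copy` and the `time` and `rain` column names are made up for the example.

```python
import pandas as pd


def to_datetime_copy(df, col_to_convert, datetime_format):
    """Return a copy of df with one column parsed as datetimes."""
    converted = df.copy()  # leave the caller's dataframe untouched
    converted[col_to_convert] = pd.to_datetime(
        converted[col_to_convert], format=datetime_format
    )
    return converted


raw = pd.DataFrame(
    {"time": ["01/02/2020 09:00", "01/02/2020 09:15"], "rain": [0.0, 0.2]}
)
parsed = to_datetime_copy(raw, "time", "%d/%m/%Y %H:%M")
```

Passing the format explicitly avoids day/month ambiguity in strings such as `01/02/2020`.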

Utils¶

`required_columns(required_columns=REQUIRED_COLUMNS, easting_col='easting', northing_col='northing', data_names=None)` ¶

Decorator to ensure required columns exist in one or more DataFrame arguments.

Parameters:

Name	Type	Description	Default
`required_columns`	`list`	Columns that must exist	`REQUIRED_COLUMNS`
`easting_col`	`str`	Special columns for custom error message	`'easting'`
`northing_col`	`str`	Special columns for custom error message	`'northing'`
`data_names`	`list[str]`	Names of data to check	`None`
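
To show the general shape of a column-checking decorator, here is a minimal sketch. It is not the package's implementation: `require_columns` and `describe` are hypothetical names, the special handling of easting and northing columns is omitted, and the check works on any argument exposing a `.columns` attribute, such as a pandas DataFrame.

```python
import functools
import inspect


def require_columns(required, data_names):
    """Decorator factory: check that named arguments have the required columns."""
    def decorator(func):
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Map positional and keyword arguments onto parameter names
            bound = signature.bind(*args, **kwargs)
            for name in data_names:
                data = bound.arguments.get(name)
                if data is None:
                    continue  # argument not supplied; nothing to check
                missing = [c for c in required if c not in data.columns]
                if missing:
                    raise ValueError(f"'{name}' is missing columns: {missing}")
            return func(*args, **kwargs)
        return wrapper
    return decorator


@require_columns(["id", "name"], data_names=["left_df", "right_df"])
def describe(left_df, right_df):
    """Toy function that only runs when both inputs pass the column check."""
    return f"{len(left_df.columns)} and {len(right_df.columns)} columns"
```

Calling `describe` with a dataframe that lacks `name` raises a `ValueError` naming the offending argument before the function body runs.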

`required_comparison_columns(required_columns=REQUIRED_COMPARISON_COLUMNS, data_names=None)` ¶

Decorator to ensure required columns exist in one or more DataFrame arguments.

Parameters:

Name	Type	Description	Default
`required_columns`	`list`	Columns that must exist	`REQUIRED_COMPARISON_COLUMNS`
`data_names`	`list[str]`	Names of data to check	`None`

`required_timeseries_columns(required_columns=REQUIRED_TIMESERIES_COLUMNS, data_names=None)` ¶

Decorator to ensure required columns exist in one or more DataFrame arguments.

Parameters:

Name	Type	Description	Default
`required_columns`	`list`	Columns that must exist	`REQUIRED_TIMESERIES_COLUMNS`
`data_names`	`list[str]`	Names of data to check	`None`

`crs_to_crs(df, crs_in, crs_out, east_west_col_in, north_south_col_in, east_west_col_out, north_south_col_out)` ¶

Convert from one CRS projection to another

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	Input data to convert to another CRS	required
`crs_in`	`str \| int`	Projection of current data (e.g. 4326)	required
`crs_out`	`str \| int`	Target projection (e.g. 27700)	required
`east_west_col_in`	`str`	Name of eastward column of original projection	required
`north_south_col_in`	`str`	Name of northward column of original projection	required
`east_west_col_out`	`str`	Name of eastward column of target projection	required
`north_south_col_out`	`str`	Name of northward column of target projection	required

Returns:

Name	Type	Description
`df`	`DataFrame`	Data with new target projection columns
