API Reference

Rain gauge matching

create_output_dataframes(matches)

Converts a list of Match objects into output dataframes, with the nested station objects broken out into columns of those dataframes

Parameters:

Name Type Description Default
matches list

list of match class objects

required

Returns:

Type Description
csv

one pandas.DataFrame, saved as csv, for each type of match available (accepted, rank-rejected, auto-rejected)

generate_manual_station_matching_notebook(output_dir)

Generate manual station matching ipynb from template (either allowing for backups or not).

Parameters:

Name Type Description Default
output_dir str

Path to outputs

required

generate_manual_station_matching_script(output_dir, matching_script)

Generate manual station matching script from template (either allowing for backups or not).

Parameters:

Name Type Description Default
output_dir str

Path to outputs

required
matching_script

Which template script to copy across i.e. with or without backups

required

run_matching_algorithm(left_df, right_df, save_outputs_to_csv=False, output_dir='outputs', save_manual_matching_script=False, save_manual_matching_notebook=False, allow_backups=False, overwrite_existing=False)

Wrapper function to run matching algorithm for two sets of stations / gauges

Parameters:

Name Type Description Default
left_df DataFrame

contains rows of stations with names and co-ordinates

required
right_df DataFrame

contains rows of stations with names and co-ordinates

required
save_outputs_to_csv bool

Whether to save the outputs to csv (default: False)

False
output_dir str

Path to outputs

'outputs'
save_manual_matching_script bool

Whether to generate manual matching script (default: False)

False
save_manual_matching_notebook bool

Whether to generate manual matching notebook (default: False)

False
allow_backups bool

Whether the outputted manual station matching should allow for backups (default: False)

False
overwrite_existing bool

Whether to overwrite existing data, scripts and/or notebooks under output_dir

False

Returns:

Type Description
list

contains Match class objects for pairs which returned a match

DataFrame

dataframe with each row containing an automatically accepted match for two stations

DataFrame

dataframe with each row containing a rank-rejected match for two stations (it scored worse than an accepted match for the left-hand station)

DataFrame

dataframe with each row containing an automatically rejected match for two stations (match was detected but with a worse score than the threshold so subject to manual review)
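
The three output categories described above can be pictured with a small, self-contained sketch. It is not the package's implementation: the function name `partition_matches`, the `auto_reject_above` threshold, the tuple layout and the assumption that a lower score means a better match are all illustrative guesses.

```python
def partition_matches(matches, auto_reject_above):
    """Split (left_id, right_id, score) tuples into three lists.

    Hypothetical helper. Assumes a lower score is a better match and that
    `auto_reject_above` is the threshold beyond which a match needs review.
    """
    accepted, rank_rejected, auto_rejected = [], [], []
    left_ids_with_accepted_match = set()

    # Visit the best (lowest) scores first so each left-hand station
    # keeps its best-scoring candidate.
    for match in sorted(matches, key=lambda m: m[2]):
        left_id, _right_id, score = match
        if score > auto_reject_above:
            auto_rejected.append(match)
        elif left_id in left_ids_with_accepted_match:
            rank_rejected.append(match)
        else:
            left_ids_with_accepted_match.add(left_id)
            accepted.append(match)
    return accepted, rank_rejected, auto_rejected
```

In this sketch a left-hand station has at most one accepted match, and any other candidate for it that falls within the threshold is rank-rejected.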

Rain gauge comparison

row_from_comparison(comparison)

Generate dictionary from comparison object that will form row of output dataframe

Parameters:

Name Type Description Default
comparison RainGauge_Comparison

comparison containing attributes to be stored in dictionary

required

Returns:

Type Description
dict

dictionary containing information gathered from class attributes

run_comparison_algorithm(gauge_pair_metadata, left_hand_file_path, right_hand_file_path, datetime_format)

Wrapper function to run comparison on timeseries at two gauges and store output

Parameters:

Name Type Description Default
gauge_pairs

each sub-list contains a pair of gauge-ids as strings

required
left_hand_file_path str

location of timeseries information containing left-hand gauge timeseries file

required
right_hand_file_path str

location of timeseries information containing right-hand gauge timeseries file

required

Returns:

Type Description
DataFrame

dataframe with each row as a summary comparison of gauge timeseries

Rain gauge matching classes

Comparison

Bases: object

An object comprised of two RainGauges with attributes and methods for outlining basic statistical information about the gauge timeseries

__init__(primary_gauge, secondary_gauge)

Attributes:

Name Type Description
primary_gauge RainGauge_Comparison

gauge from the primary network

secondary_gauge RainGauge_Comparison

gauge from the secondary network

raw_timeseries DataFrame

contains all timesteps of timeseries data for the period where both gauges' start and end dates overlap

timeseries DataFrame

contains non-NaN timeseries data for the period where both gauges' start and end dates overlap

non_zero_timeseries DataFrame

contains non-zero timeseries data for the period where both gauges' start and end dates overlap

overlap_start_date DateTime

first timestep of overlap between self.primary_gauge.timeseries and self.secondary_gauge.timeseries

overlap_end_date DateTime

final timestep of overlap between self.primary_gauge.timeseries and self.secondary_gauge.timeseries

overlap_length TimeDelta

length of overlap from first shared timestep to last shared timestep

overlap_timesteps int

count of timesteps in overlapping period

good_timesteps int

count of timesteps where both gauges record non-NaN values

nan_timesteps int

count of timesteps where one or both gauges record NaN value

sum_of_good_timesteps TimeDelta

sum of good timesteps shared by both timeseries

identical_rows int

count of timesteps with identical measured parameter values or where both are NaN

identical_non_nan_rows int

count of timesteps with identical measured parameter values

identical_rows_percentage str

print out of identical timesteps / total timesteps as a percentage

identical_non_nan_rows_percentage str

print out of identical non-NaN timesteps / total non-NaN timesteps as a percentage

primary_accumulation float

sum of measured parameter at primary gauge across entire overlap period

secondary_accumulation float

sum of measured parameter at secondary gauge across entire overlap period

accumulation_difference float

absolute difference between primary and secondary accumulation

accumulation_difference_percentage str

print out of (primary accumulation / secondary accumulation) - 1 as a percentage; -100% when nothing was recorded at the primary gauge, +100% when nothing was recorded at the secondary gauge

r_squared float

r-squared value for overlapping period of two gauges

spcc float

Spearman's correlation coefficient for overlapping period of two gauges

non_zero_r_sqaured float

r-squared value for non-zero timesteps during overlapping period of two gauges

non_zero_spcc float

Spearman's correlation coefficient for non-zero timesteps during overlapping period of two gauges

get_accumulation_information()

Calculate summary statistics for accumulation at pair of stations / gauges
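
As a rough illustration of the accumulation attributes (primary_accumulation, secondary_accumulation, accumulation_difference, accumulation_difference_percentage), the sketch below works on two plain lists of readings. The function name `accumulation_summary`, the choice to skip NaN readings, and the handling of a zero secondary total are assumptions made for the example, not confirmed behaviour of the class.

```python
import math


def accumulation_summary(primary, secondary):
    """Return (primary_total, secondary_total, abs_difference, percentage_str).

    Hypothetical helper, not part of the package API.
    """
    # Skip NaN readings when summing (an assumption about missing data).
    primary_total = sum(v for v in primary if not math.isnan(v))
    secondary_total = sum(v for v in secondary if not math.isnan(v))
    difference = abs(primary_total - secondary_total)

    if secondary_total == 0:
        # Assumed convention: nothing at the secondary gauge reads as +100%,
        # and two empty records read as no difference.
        ratio = 0.0 if primary_total == 0 else 1.0
    else:
        ratio = primary_total / secondary_total - 1
    return primary_total, secondary_total, difference, f"{ratio:+.1%}"
```

For example, totals of 3.0 at the primary gauge and 4.0 at the secondary gauge give a ratio of 3.0 / 4.0 - 1 = -0.25, printed as -25.0%.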

get_overlap()

Identifies overlapping period between start and end dates of stations / gauges within pair
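
The overlap of two date ranges is the later of the two start dates through to the earlier of the two end dates. A minimal sketch using only the standard library follows; `overlap_period` is a hypothetical name, and returning None when the ranges do not overlap is an assumption.

```python
from datetime import datetime


def overlap_period(primary_start, primary_end, secondary_start, secondary_end):
    """Return (start, end) of the shared period, or None if there is none."""
    start = max(primary_start, secondary_start)  # later of the two starts
    end = min(primary_end, secondary_end)        # earlier of the two ends
    if start > end:
        return None
    return start, end


shared = overlap_period(
    datetime(2020, 1, 1), datetime(2020, 12, 31),
    datetime(2020, 6, 1), datetime(2021, 6, 1),
)
```

Here `shared` spans 1 June 2020 to 31 December 2020, and subtracting the start from the end gives a timedelta comparable to overlap_length.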

get_row_information()

Calculate summary statistics for timestep similarity at pair of stations / gauges
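
The identical-row attributes can be sketched on two aligned lists of readings. This follows the attribute descriptions (a row where both values are NaN counts as identical, but not as identical non-NaN); the name `row_similarity` and the "n/a" fallback for an empty denominator are assumptions.

```python
import math


def _percentage(part, whole):
    # Assumed fallback when there is nothing to compare.
    return "n/a" if whole == 0 else f"{part / whole:.1%}"


def row_similarity(primary, secondary):
    """Count identical timesteps in two aligned, equal-length lists."""
    identical = identical_non_nan = non_nan = 0
    for p, s in zip(primary, secondary):
        p_nan, s_nan = math.isnan(p), math.isnan(s)
        if p_nan and s_nan:
            identical += 1          # both missing counts as identical
        elif not p_nan and not s_nan:
            non_nan += 1
            if p == s:
                identical += 1
                identical_non_nan += 1
    return {
        "identical_rows": identical,
        "identical_non_nan_rows": identical_non_nan,
        "identical_rows_percentage": _percentage(identical, len(primary)),
        "identical_non_nan_rows_percentage": _percentage(identical_non_nan, non_nan),
    }
```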

get_statistical_information()

Calculate summary statistics for correlation between a pair of stations / gauges

get_timeseries()

Gets relevant timeseries (and metadata) using overlapping period identified for both stations / gauges

get_timestep_information()

Identify number of 'good' timesteps shared by a pair of matched stations / gauges
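
A 'good' timestep is one where both gauges record a non-NaN value. The sketch below counts them for two aligned lists and multiplies by the timestep length to get a duration similar to sum_of_good_timesteps. The name `timestep_counts` and the 15-minute default are hypothetical; the real timestep length depends on the data.

```python
import math
from datetime import timedelta


def timestep_counts(primary, secondary, timestep=timedelta(minutes=15)):
    """Return (overlap_timesteps, good_timesteps, nan_timesteps, good_duration).

    Assumes both lists cover the same overlapping period, one value per timestep.
    """
    overlap_timesteps = len(primary)
    good = sum(
        1 for p, s in zip(primary, secondary)
        if not (math.isnan(p) or math.isnan(s))
    )
    nan_timesteps = overlap_timesteps - good  # one or both values missing
    return overlap_timesteps, good, nan_timesteps, good * timestep
```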

prepare_comparison()

Run comparison functions

Match

Bases: object

An object comprised of two RainGauges with attributes and methods for defining how well their metadata matches

__init__(match_type, station_left, station_right, distance_score, distance_metres, string_score, common_substrings, common_banned_substrings=None, match_score=None)

get_banned_common_strings()

Check if this is redundant

set_auto_rejected()

Determines if scores meet criteria for an auto-rejected match

set_match_score()

Calculates and sets match score from product of distance and string score

Returns:

Type Description
int

score in [0, 1, 2, 3, 4, 6, 8, 1000, 2000, 3000, 4000, 6000, 8000]

RainGauge

Bases: object

The most basic gauge object with a name, id, co-ordinates and a source

__init__(id, name, source='Unspecified', easting=np.nan, northing=np.nan)

Attributes:

Name Type Description
name str

common name of the station e.g., blue-mountain station

id str

reference id of the station e.g., ABC001

source str

source of data e.g., random_API

easting float

EPSG27700 Easting value

northing float

EPSG27700 Northing value

get_coordinates()

Combines self.easting and self.northing into a geometry object

Returns:

Type Description
Point

coordinates in EPSG:27700 (British National Grid)

RainGauge_Comparison

Bases: RainGauge

A version of the RainGauge with a basic timeseries and datetime metadata for more detailed comparison with another gauge

__init__(*, folder_path, datetime_format, **kwargs)

Attributes:

Name Type Description
folder_path str

where the timeseries file for this gauge is stored, filename should be a csv with id or name as filename e.g., ABC001.csv or blue-mountain station.csv

datetime_format

datetime format used in gauge timeseries files

timeseries DataFrame

timeseries with data, generated from csv at location "{self.folder_path}/{self.id}.csv"

start_date DateTime

first timestep in self.timeseries

end_date DateTime

last timestep in self.timeseries

get_coordinates()

Combines self.easting and self.northing into a geometry object

Returns:

Type Description
Point

coordinates in EPSG:27700 (British National Grid)

get_dates()

Extracts first and last timestep from timeseries dataframe

Returns:

Type Description
DateTime

first timestep in timeseries

DateTime

final timestep in timeseries

get_timeseries()

Extracts timeseries from csv file and checks those files contain correctly named columns

Returns:

Type Description
DataFrame

timeseries containing datetime and measured parameter information

prepare_gauge()

Runs functions to prepare gauge for timeseries comparison

RainGauge_Matching

Bases: RainGauge

A version of the RainGauge with attributes and methods for matching metadata between gauges

__init__(*, banned_strings=None, **kwargs)

Attributes:

Name Type Description
banned_strings list

Begins empty; calculated later based on the frequency of sub-string occurrence across all gauges

get_all_substrings()

Generates all alphanumeric substrings of a string

Returns:

Type Description
set

unique sub-strings

get_allowable_substrings()

Generates all allowable (not in self.banned) alphanumeric substrings of a string

Returns:

Type Description
set

unique sub-strings that are not banned from the matching process

get_common_substrings(other, mode=None)

Generates all common substrings between two station naming strings

Parameters:

Name Type Description Default
mode str

toggle for whether to exclude banned strings ('all' to ignore bans)

None

Returns:

Type Description
set

unique common sub-strings
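
To make the banned-string toggle concrete, here is a minimal sketch that works on plain name strings. It reads "alphanumeric substrings" as lowercase alphanumeric tokens separated by spaces or punctuation, which is an assumption; the function names and the `banned` argument are illustrative rather than the class's actual internals.

```python
import re


def allowable_substrings(name, banned=frozenset()):
    """Lowercase alphanumeric tokens of `name`, minus any banned tokens."""
    tokens = set(re.findall(r"[a-z0-9]+", name.lower()))
    return tokens - set(banned)


def common_substrings(left, right, banned=frozenset(), mode=None):
    """Tokens shared by both names; mode='all' ignores the banned list."""
    if mode == "all":
        banned = frozenset()
    return allowable_substrings(left, banned) & allowable_substrings(right, banned)
```

With "station" banned, the names "Blue Mountain Station" and "Mountain Station" share only "mountain"; with mode='all' they share both "mountain" and "station".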

get_coordinates()

Combines self.easting and self.northing into a geometry object

Returns:

Type Description
Point

coordinates in EPSG:27700 (British National Grid)

get_distance(other)

Calculates Euclidean distance between two sets of coordinates

Returns:

Type Description
float

distance rounded to the nearest integer
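
Because EPSG:27700 coordinates are in metres on a flat grid, the Euclidean distance is a direct application of Pythagoras. A minimal sketch, taking raw easting and northing values rather than gauge objects (the function name is hypothetical):

```python
import math


def planar_distance(easting_a, northing_a, easting_b, northing_b):
    """Straight-line distance in metres, rounded to the nearest integer."""
    return round(math.hypot(easting_b - easting_a, northing_b - northing_a))
```

Note that Python's built-in round returns an int and rounds exact halves to the nearest even number.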

get_distance_score(other)

Calculates distance score based on distance between two sets of coordinates

Returns:

Type Description
int

distance score in [-1, 0, 1, 2, 3, 999]

get_match(other)

Generates Match object if scoring criteria are met for pair of stations / gauges

Returns:

Type Description
Match

object containing gauges and calculated scores

get_string_score(other)

Calculates string score based on commonality of sub-strings (left hand station has priority for counting unique sub-strings)

Returns:

Type Description
int

score based on string / sub-string commonality (-1 = identical basic string, n = number of sub-strings of left not in right, 999 = no commonality)
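
The three score cases listed above can be sketched as follows. The tokenisation (lowercase alphanumeric runs) and the definition of "identical basic string" (names equal once case and punctuation are stripped) are assumptions for illustration, and the function works on plain strings rather than gauge objects.

```python
import re

IDENTICAL = -1
NO_COMMONALITY = 999


def string_score(left, right):
    """Score name similarity, giving the left-hand name priority."""
    def basic(name):
        return re.sub(r"[^a-z0-9]", "", name.lower())

    def tokens(name):
        return set(re.findall(r"[a-z0-9]+", name.lower()))

    if basic(left) == basic(right):
        return IDENTICAL
    left_tokens, right_tokens = tokens(left), tokens(right)
    if not left_tokens & right_tokens:
        return NO_COMMONALITY
    # Number of left-hand tokens that the right-hand name lacks.
    return len(left_tokens - right_tokens)
```

The score is asymmetric: "Blue Mountain Station" against "Blue Mountain" scores 1, while the reverse scores 0.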

set_banned_strings(banned_strings)

Sets banned_strings attribute of a station

convert_to_pandas_datetime(df, col_to_convert, datetime_format)

Convert designated column in pandas to datetime format

Parameters:

Name Type Description Default
col_to_convert str

name of column to be converted

required
existing_format

date format present in designated column

required

Returns:

Type Description
DataFrame

copy of the dataframe with datetime column formatted

Utils

required_columns(required_columns=REQUIRED_COLUMNS, easting_col='easting', northing_col='northing', data_names=None)

Decorator to ensure required columns exist in one or more DataFrame arguments.

Parameters:

Name Type Description Default
required_columns list

Columns that must exist

REQUIRED_COLUMNS
easting_col str

Special columns for custom error message

'easting'
northing_col str

Special columns for custom error message

'northing'
data_names list[str]

Names of data to check

None
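
The general shape of a column-checking decorator can be sketched without pandas by inspecting any argument that exposes a `columns` attribute. Everything here is illustrative: `require_columns`, `FakeFrame`, `count_columns` and the column names are hypothetical, and the package's own decorator may differ in how it selects arguments and words its errors.

```python
import functools


def require_columns(required):
    """Decorator factory: fail early if a DataFrame-like argument lacks columns."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            labelled = list(enumerate(args)) + list(kwargs.items())
            for label, value in labelled:
                columns = getattr(value, "columns", None)
                if columns is None:
                    continue  # not DataFrame-like, nothing to check
                missing = [c for c in required if c not in columns]
                if missing:
                    raise ValueError(
                        f"argument {label!r} is missing columns: {missing}"
                    )
            return func(*args, **kwargs)
        return wrapper
    return decorator


class FakeFrame:
    """Minimal stand-in exposing `columns`, as a pandas DataFrame does."""
    def __init__(self, columns):
        self.columns = list(columns)


@require_columns(["id", "name", "easting", "northing"])
def count_columns(left, right):
    return len(left.columns), len(right.columns)
```

Calling `count_columns` with a frame that lacks, say, "northing" raises a ValueError before the function body runs.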

required_comparison_columns(required_columns=REQUIRED_COMPARISON_COLUMNS, data_names=None)

Decorator to ensure required columns exist in one or more DataFrame arguments.

Parameters:

Name Type Description Default
required_columns list

Columns that must exist

REQUIRED_COMPARISON_COLUMNS
data_names list[str]

Names of data to check

None

required_timeseries_columns(required_columns=REQUIRED_TIMESERIES_COLUMNS, data_names=None)

Decorator to ensure required columns exist in one or more DataFrame arguments.

Parameters:

Name Type Description Default
required_columns list

Columns that must exist

REQUIRED_TIMESERIES_COLUMNS
data_names list[str]

Names of data to check

None

crs_to_crs(df, crs_in, crs_out, east_west_col_in, north_south_col_in, east_west_col_out, north_south_col_out)

Convert from one CRS projection to another

Parameters:

Name Type Description Default
df DataFrame

Input data to convert to another CRS

required
crs_in str | int

Projection of current data (e.g. 4326)

required
crs_out str | int

Target projection (e.g. 27700)

required
east_west_col_in str

Name of eastward column of original projection

required
north_south_col_in str

Name of northward column of original projection

required
east_west_col_out str

Name of eastward column of target projection

required
north_south_col_out str

Name of northward column of target projection

required

Returns:

Name Type Description
df DataFrame

Data with new target projection columns