Comparison¶

1. Why might you want to run a comparison?¶

Have you just finished pairing your gauge networks, or do you just want a quick analysis of the differences in catch between collocated stations?

Running the comparison algorithm will help answer any of the following questions:

Is there an overlap between the timeseries?
How long is it?
In how many timesteps do both contain data?
What does that sum to?
What percentage of the overlap is good data?

What are the total accumulations?
What is the difference?
As a percentage?

What is the R²?
Of non-zero values only?
What is the Spearman's correlation?
Of non-zero values only?
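The questions above can be made concrete with a small, standard-library sketch. It is not the package's implementation: the function names, the dictionary keys, the choice to treat R² as the squared Pearson correlation, the choice to take the percentage difference relative to the left-hand total, and the definition of "non-zero" as both gauges being non-zero at a timestep are all assumptions made for illustration. The rainfall values are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation; NaN when there are too few points or no variance."""
    n = len(x)
    if n < 2:
        return math.nan
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def compare(left, right):
    """Compare two series given as {timestamp: rainfall or None} dictionaries."""
    shared = sorted(set(left) & set(right))  # timesteps present in both series
    pairs = [(left[t], right[t]) for t in shared
             if left[t] is not None and right[t] is not None]
    nonzero = [(a, b) for a, b in pairs if a != 0 and b != 0]  # assumed definition
    total_left = sum(a for a, _ in pairs)
    total_right = sum(b for _, b in pairs)
    difference = total_right - total_left  # sign convention is an assumption
    x, y = [a for a, _ in pairs], [b for _, b in pairs]
    xn, yn = [a for a, _ in nonzero], [b for _, b in nonzero]
    return {
        "overlap_timesteps": len(shared),
        "both_valid": len(pairs),
        "percent_valid": 100 * len(pairs) / len(shared) if shared else math.nan,
        "total_left": total_left,
        "total_right": total_right,
        "difference": difference,
        "percent_difference": 100 * difference / total_left if total_left else math.nan,
        "r_squared": pearson(x, y) ** 2,
        "r_squared_nonzero": pearson(xn, yn) ** 2,
        "spearman": spearman(x, y),
        "spearman_nonzero": spearman(xn, yn),
    }


# Made-up daily rainfall for two collocated gauges (None marks missing data).
left = {"2020-01-01": 0.0, "2020-01-02": 2.0, "2020-01-03": None,
        "2020-01-04": 4.0, "2020-01-05": 1.0, "2020-01-06": 0.0}
right = {"2020-01-02": 1.0, "2020-01-03": 3.0, "2020-01-04": 5.0,
         "2020-01-05": 2.0, "2020-01-06": 0.0, "2020-01-07": 0.5}

stats = compare(left, right)
for key, value in stats.items():
    print(f"{key}: {value}")
```

In this toy example the two series share five timesteps, four of which contain data in both, so 80% of the overlap is usable. The comparison module may define any of these quantities differently, so treat the numbers only as a guide to what each question is asking.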

2. Running the comparison workflow¶

2.1 Inputs¶

Whether you run the Comparison module as a standalone operation or as part of the complete workflow, the input table must contain the columns shown below.

The manual matching notebook outputs a dataset that is ready to be passed straight into run_comparison().

2.1.1 Gauge Pair Metadata¶

Table 1. Example paired gauge station metadata

2.1.2 Timeseries Files¶

You will need to provide the path to the folder where your timeseries csv files are stored. Each file should contain data for one gauge and be named either {id}.csv or {name}.csv.

Table 2. Example input timeseries file

Files need renaming?

If your timeseries files need to be renamed in bulk, see the advice in Data prep.
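Before running the comparison, it can help to confirm that every gauge has a file under one of the two accepted names. The sketch below is one way to do that with the standard library. The helper `find_timeseries_file`, the station ids and names, and the choice to try {id}.csv before {name}.csv are all hypothetical and are not part of the package.

```python
import tempfile
from pathlib import Path


def find_timeseries_file(folder, station_id, station_name):
    """Return the first of {id}.csv or {name}.csv found in folder, else None."""
    for stem in (station_id, station_name):
        candidate = Path(folder) / f"{stem}.csv"
        if candidate.is_file():
            return candidate
    return None


# Demonstration in a temporary folder holding a single, header-only file.
with tempfile.TemporaryDirectory() as folder:
    (Path(folder) / "Heathrow.csv").write_text("datetime,rainfall\n")
    found = find_timeseries_file(folder, "12345", "Heathrow")
    missing = find_timeseries_file(folder, "67890", "Gatwick")

print(found.name)  # file located by station name
print(missing)     # no file under either naming scheme
```

Looping this check over every row of the gauge pair metadata gives a quick list of stations whose files still need renaming.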

2.2 Code Setup¶

If the comparison is being run standalone, separate from the complete workflow, the following code will run the algorithm and generate a single summary table.

To run the comparison, the gauge pair metadata requires at least these 6 columns:
"station_left.id", "station_left.name", "station_left.source", "station_right.id", "station_right.name", "station_right.source".

The timeseries csv files also require at least these 2 columns: "datetime", "rainfall".

Columns need renaming?

If your data needs pre-processing to contain the required columns, see the advice in Data prep.
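A quick way to confirm the required columns are present before running the comparison is sketched below. The column names come from this page; the helper `missing_columns` and the example headers (including the extra "distance" and "quality_flag" columns) are hypothetical.

```python
# Required columns as listed in this section.
METADATA_COLUMNS = [
    "station_left.id", "station_left.name", "station_left.source",
    "station_right.id", "station_right.name", "station_right.source",
]
TIMESERIES_COLUMNS = ["datetime", "rainfall"]


def missing_columns(columns, required):
    """Return the required column names absent from columns, in required order."""
    present = set(columns)
    return [name for name in required if name not in present]


# Hypothetical metadata header that lacks one required column.
example_header = [
    "station_left.id", "station_left.name",
    "station_right.id", "station_right.name", "station_right.source",
    "distance",
]
metadata_gaps = missing_columns(example_header, METADATA_COLUMNS)
timeseries_gaps = missing_columns(
    ["datetime", "rainfall", "quality_flag"], TIMESERIES_COLUMNS
)

print(metadata_gaps)    # columns still to be added to the metadata
print(timeseries_gaps)  # empty list: extra columns are not a problem
```

With a pandas DataFrame, the same check works by passing `data.columns` as the first argument.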

Python

import pandas as pd

from raingaugematcha.run_comparison import run_comparison_algorithm

# Option 1: using a custom data input
data = pd.read_csv("path_to_your_csv_file.csv")

# Option 2: using an output created from matching (use instead of Option 1)
# data = pd.read_csv("final_station_matches.csv")

# Run the comparison for every gauge pair listed in the metadata
comparison_output = run_comparison_algorithm(
    gauge_pair_metadata=data,
    left_hand_file_path="path_to_first_set_of_timeseries",
    right_hand_file_path="path_to_second_set_of_timeseries",
)

2.3 Output¶

The comparison process will generate an output table like the following:

Table 3. Example output table

Version showing summarised information (not a direct output)