Data preparation

To use RainGaugeMatcha in a project:

import raingaugematcha

Matching Gauge Metadata Inputs

Comparison Gauge Metadata Input

If you have two files of matching gauges that you want to compare, the following combines them into a single table formatted for the gauge_pair_metadata parameter.

import pandas as pd

stations_left = pd.read_csv("path_to_your_left_file.csv")
stations_right = pd.read_csv("path_to_your_right_file.csv")

# This next step assumes both files have the same number of rows and are
# ordered so that each pair of gauges to be compared sits on the same row
gauge_pair_metadata = pd.concat([stations_left, stations_right], axis=1)

# Add source columns if required
gauge_pair_metadata["station_left.source"] = "Source of gauges, e.g. UKCEH"
gauge_pair_metadata["station_right.source"] = "Source of gauges, e.g. API"

# Rename columns if required
gauge_pair_metadata = gauge_pair_metadata.rename(columns={
    "your_station_left_id_column": "station_left.id",
    "your_station_left_name_column": "station_left.name",
    "your_station_right_id_column": "station_right.id",
    # These two not required if added above
    "your_station_left_source_column": "station_left.source",
    "your_station_right_source_column": "station_right.source",
})
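The concatenation above pairs rows by index label, so it silently misaligns gauges if the two tables differ in length or carry non-default indexes. A minimal sketch of a guard is shown here; `combine_gauge_pairs` is a hypothetical helper, not part of RainGaugeMatcha, and the column names in the sample data are made up for illustration.

```python
import pandas as pd


def combine_gauge_pairs(stations_left: pd.DataFrame, stations_right: pd.DataFrame) -> pd.DataFrame:
    """Concatenate two row-aligned station tables side by side (hypothetical helper)."""
    if len(stations_left) != len(stations_right):
        raise ValueError(
            f"Row counts differ: {len(stations_left)} left vs {len(stations_right)} right"
        )
    # Reset indexes so pd.concat pairs rows by position rather than by index label
    left = stations_left.reset_index(drop=True)
    right = stations_right.reset_index(drop=True)
    return pd.concat([left, right], axis=1)


# Illustrative data only; these column names are assumptions
left = pd.DataFrame({"left_id": ["A1", "A2"], "left_name": ["Alpha", "Beta"]})
right = pd.DataFrame({"right_id": ["B1", "B2"]})
pairs = combine_gauge_pairs(left, right)
```

The result can then be given source columns and renamed in the same way as gauge_pair_metadata above.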

Comparison Gauge Timeseries Files

If you need to rename the columns in the timeseries files, the following loops over every file in a directory.

import os
import pandas as pd

path_to_timeseries = "path_to_your_timeseries_files"

for file in os.listdir(path_to_timeseries):
    df = pd.read_csv(f"{path_to_timeseries}/{file}")

    # Rename columns
    new_df = df.rename(columns={
        "your_datetime_column": "datetime",
        "your_rainfall_column": "rainfall",
    })

    # Trim files down to speed up processing
    new_df = new_df[["datetime", "rainfall"]]

    new_df.to_csv(f"output_path/{file}", index=False)
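The loop above reads every entry in the directory and expects the output directory to exist already. As a hedged variant, the sketch below wraps the same rename-and-trim steps in a function that only picks up `.csv` files, creates the output directory if needed, and fails early when an expected column is absent. The name `rename_timeseries_columns` and its parameters are assumptions made for illustration, not part of RainGaugeMatcha.

```python
from pathlib import Path

import pandas as pd


def rename_timeseries_columns(input_dir, output_dir, column_map):
    """Rename and trim every CSV in input_dir, writing results to output_dir.

    Hypothetical helper, not part of RainGaugeMatcha.
    """
    input_dir = Path(input_dir)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for csv_path in sorted(input_dir.glob("*.csv")):
        df = pd.read_csv(csv_path).rename(columns=column_map)

        # Fail early if the rename did not produce the two expected columns
        missing = {"datetime", "rainfall"} - set(df.columns)
        if missing:
            raise KeyError(f"{csv_path.name} is missing columns: {sorted(missing)}")

        # Keep only the two columns needed, as in the loop above
        out_path = output_dir / csv_path.name
        df[["datetime", "rainfall"]].to_csv(out_path, index=False)
        written.append(out_path)
    return written
```

It could be called as `rename_timeseries_columns("path_to_your_timeseries_files", "output_path", {"your_datetime_column": "datetime", "your_rainfall_column": "rainfall"})`, which returns the list of files written.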