komanawa.gw_detect_power.change_detection_counterfactual#

created matt_dumont on: 25/01/24

Classes#

AutoDetectionPowerCounterFactual

This class is used to calculate the counterfactual detection power of a pair of auto created concentration time series. The user specifies an initial concentration, and a target concentration for both a base and alternative scenario. Other parameters include groundwater age distribution models and parameters, implementation time and the slope of the previous data.

BaseDetectionCalculator

Base class for detection power calculations, provides some general methods for power calculations

DetectionPowerCounterFactual

This class is used to calculate the counterfactual detection power of a pair of concentration time series The user specifies the true concentration time series for the base and alt scenarios and the noise level for both scenarios. The power is calculated by adding many noise realisations to the true concentration time series and running a paired t test or wilcoxon signed rank test to determine if the null hypothesis (The scenarios are the same) can be rejected.

Module Contents#

class AutoDetectionPowerCounterFactual(significance_mode, nsims=1000, p_value=0.05, min_samples=10, alternative='alt!=base', wx_zero_method='wilcox', wx_correction=False, wx_method='auto', ncores=None, log_level=logging.INFO, return_true_conc=False, return_noisy_conc_itters=0, only_significant_noisy=False)[source]#

Bases: DetectionPowerCounterFactual

Inheritance diagram of komanawa.gw_detect_power.change_detection_counterfactual.AutoDetectionPowerCounterFactual

This class is used to calculate the counterfactual detection power of a pair of auto created concentration time series. The user specifies an initial concentration, and a target concentration for both a base and alternative scenario. Other parameters include groundwater age distribution models and parameters, implementation time and the slope of the previous data.

The user then specifies the sampling duration, delay, and frequency. The power is calculated by adding many user specified noise realisations to both the base and alternative concentration time series and running a paired t test or wilcoxon signed rank test to determine if the null hypothesis (The scenarios are the same) can be rejected.

The Power is calculated as the percentage (0-100) of simulations which reject the null hypothesis.

Parameters:
  • significance_mode

    str, one of:

    • ’paired-t-test’: paired t test (parametric), scipy.stats.ttest_rel

    • ’wilcoxon-signed-rank-test’: wilcoxon signed rank test (non-parametric), scipy.stats.wilcoxon

  • nsims – number of noise simulations to run for each change detection (e.g. nsims=1000, power= number of detected changes/1000 noise simulations)

  • p_value – minimum p value (see also alternative), if p >= p_value the null hypothesis will not be rejected (base and alt are the same) p < p_value the null hypothesis will be rejected (base and alt are different)

  • min_samples – minimum number of samples required, less than this number of samples will raise an exception

  • alternative

    str, one of:

    • ’alt!=base’: two sided test (default),

    • ’alt<base’: one sided test ~

    • ’alt>base’

  • wx_zero_method

    str, one of:

    • “wilcox”: Discards all zero-differences (default); see [4].

    • “pratt”: Includes zero-differences in the ranking process, but drops the ranks of the zeros (more conservative); see [3]. In this case, the normal approximation is adjusted as in [5].

    • “zsplit”: Includes zero-differences in the ranking process and splits the zero rank between positive and negative ones.

    for more info see: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.wilcoxon.html

  • wx_correction – bool, If True, apply continuity correction by adjusting the Wilcoxon rank statistic by 0.5 towards the mean value when computing the z-statistic. Default is False.

  • wx_method – str, see scipy.stats.wilcoxon for more info

  • ncores – number of cores to use for multiprocessing, None will use all available cores

  • log_level – logging level for multiprocessing subprocesses

  • return_true_conc – return the true concentration time series for each simulation with power calcs (not supported with multiprocessing power calcs)

  • return_noisy_conc_itters – int <= nsims, default = 0 Number of noisy simulations to return if 0 then no noisy simulations are returned, not supported with multiprocessing power calcs

  • only_significant_noisy – bool if True then only return noisy simulations where a change was detected if there are fewer noisy simulations with changes detected than return_noisy_conc_itters all significant simulations will be returned. if there are no noisy simulations with changes detected then and empty dataframe is returned

mulitprocess_power_calcs(outpath: {Path, None, str}, idv_vals: numpy.ndarray, error_base_vals: {np.ndarray, float}, samp_years_vals: {np.ndarray, int}, samp_per_year_vals: {np.ndarray, int}, implementation_time_alt_vals: {np.ndarray, int}, initial_conc_vals: {np.ndarray, float}, target_conc_alt_vals: {np.ndarray, float}, prev_slope_vals: {np.ndarray, float}, max_conc_lim_vals: {np.ndarray, float}, min_conc_lim_vals: {np.ndarray, float}, mrt_model_vals: {np.ndarray, str}, mrt_vals: {np.ndarray, float}, target_conc_base_vals: {np.ndarray, float, None} = None, implementation_time_base_vals: {np.ndarray, int, None} = None, error_alt_vals: {np.ndarray, float, None} = None, delay_years_vals: {np.ndarray, int, None} = None, mrt_p1_vals: {np.ndarray, float, None} = None, frac_p1_vals: {np.ndarray, float, None} = None, f_p1_vals: {np.ndarray, float, None} = None, f_p2_vals: {np.ndarray, float, None} = None, seed_alt_vals: {np.ndarray, int, None} = None, seed_base_vals: {np.ndarray, int, None} = None, run=True, debug_mode=False, **kwargs)[source]#

multiprocessing wrapper for power_calc, see power_calc for details

Parameters:
  • outpath – path to save results to or None (no save)

  • idv_vals – id values for each simulation

  • error_base_vals – standard deviation of noise to add to the base time series for each simulation

  • samp_years_vals – sampling years for each simulation

  • samp_per_year_vals – sampling per year for each simulation

  • implementation_time_alt_vals – implementation time for the alternative scenario for each simulation

  • initial_conc_vals – initial concentration for each simulation

  • target_conc_alt_vals – target concentration for the alternative scenario for each simulation

  • prev_slope_vals – previous slope for each simulation

  • max_conc_lim_vals – maximum concentration limit for each simulation

  • min_conc_lim_vals – minimum concentration limit for the source for each simulation

  • mrt_model_vals – mrt model for each simulation

  • mrt_vals – mean residence time for each simulation

  • target_conc_base_vals – target concentration for the base scenario for each simulation, if None then target_conc_base = initial_conc

  • implementation_time_base_vals – implementation time for the base scenario for each simulation, if None then implementation_time_base = implementation_time_alt

  • error_alt_vals – standard deviation of the noise to add to the alt concentration time series, if None then error_alt = error_base

  • delay_years_vals – number of years to delay the start of the monitoring for each simulation, If the delay_years does not allow enough samples to be collected then an exception will be raised. If delay_years is 0 then the full length of the concentration time series will be used

  • mrt_p1_vals – mean residence time of the first piston flow model for each simulation Only used for binary_exponential_piston_flow model

  • frac_p1_vals – fraction of the first piston flow model for each simulation Only used for binary_exponential_piston_flow model

  • f_p1_vals – the exponential fraction of the first piston flow model for each simulation Only used for binary_exponential_piston_flow model

  • f_p2_vals – the exponential fraction of the second piston flow model for each simulation Only used for binary_exponential_piston_flow model

  • seed_alt_vals – random seed to generate the alternative noise for each simulation. One of: ndarray (integer seeds), int, None (no seeds passed, but will record the seed used)

  • seed_base_vals – random seed to generate the base noise for each simulation. One of: ndarray (integer seeds), int, None (no seeds passed, but will record the seed used)

Note seed_base != seed_alt (the same noise will be added to both time series, making the analysis useless)

Parameters:
  • run – if True run the simulations, if False just build the run_dict and print the number of simulations

  • debug_mode – if True run as single process to allow for easier debugging

  • kwargs – other kwargs to pass directly to the output dataframe must be either a single value or an array of values with the same shape as id_vals

Returns:

pd.DataFrame with the power calc results note power is percent 0-100

plot_iteration(y0_base, y0_alt, true_conc_base, true_conc_alt, ax=None)[source]#

plot the concentration data itteration and the true concentration data

Parameters:
  • y0_base – noisy concentration data for the base scenario

  • y0_alt – noisy concentration data for the alt scenario

  • true_conc_base – True concentration data for the base scenario

  • true_conc_alt – True concentration data for the alt scenario

  • ax – matplotlib axis to plot to, if None then a new figure and axis will be created

Returns:

power_calc(idv, error_base: float, mrt_model: str, samp_years: int, samp_per_year: int, implementation_time_alt: int, initial_conc: float, target_conc_alt: float, prev_slope: float, max_conc_lim: float, min_conc_lim: float, mrt: float = 0, target_conc_base: {float, None} = None, implementation_time_base: {int, None} = None, error_alt: {float, None} = None, delay_years: {int} = 0, mrt_p1: {float, None} = None, frac_p1: {float, None} = None, f_p1: {float, None} = None, f_p2: {float, None} = None, seed_base: {int, None} = None, seed_alt: {int, None} = None, testnitter=None, **kwargs)[source]#

calculate the counterfactual detection power of auto created concentration time series

Parameters:
  • idv – identifiers for the power calc sites, passed straight through to the output

  • error_base – standard deviation of the noise to add to the base concentration time series

  • mrt_model

    the model to use for the mean residence time options:

    • ’piston_flow’: use the piston flow model (no mixing, default)

    • ’binary_exponential_piston_flow’: use the binary exponential piston flow model

    • For unitary exponential_piston_flow model set frac_1 = 1 and mrt_p1 = mrt

    • For no lag, set mrt=0, mrt_model=’piston_flow’

  • samp_years – number of years to sample

  • samp_per_year – number of samples to collect each year

  • implementation_time_alt – number of years over which the target concentration_alt is reached

  • initial_conc – initial median value of the concentration

  • target_conc_alt – target concentration for the alt scenario

  • prev_slope – slope of the previous data (e.g. prior to the initial concentration)

  • max_conc_lim – maximum concentration limit user specified or None (default)

  • min_conc_lim – minimum concentration limit for the source, only used for the binary_exponential_piston_flow model

  • mrt – the mean residence time of the site

  • target_conc_base – the target concentration for the base scenario, if None then target_conc_base = initial_conc

  • implementation_time_base – number of years over which the target concentration_base is reached, if None then implementation_time_base = implementation_time_alt

  • error_alt – standard deviation of the noise to add to the alt concentration time series, if None then error_alt = error_base

  • delay_years – number of years to delay the start of the monitoring, If the delay_years does not allow enough samples to be collected then an exception will be raised. If delay_years is 0 then the full length of the concentration time series will be used

Options for binary_exponential_piston_flow model:

Parameters:
  • mrt_p1 – the mean residence time of the first piston flow model (only used for binary_exponential_piston_flow model)

  • frac_p1 – the fraction of the first piston flow model (only used for binary_exponential_piston_flow model)

  • f_p1 – the fraction of the first piston flow model (only used for binary_exponential_piston_flow model)

  • f_p2 – the fraction of the first piston flow model (only used for binary_exponential_piston_flow model)

  • seed_base – seed for the random number generator for the base scenario, if None then a random seed will be generated and returned with the output

  • seed_alt – seed for the random number generator for the alt scenario, if None then a random seed will be generated and returned with the output

  • testnitter – None (usually) or a different nitter then self.niter for testing run times

  • kwargs – any other kwargs to pass directly to the output Series

Returns:

pd.Series with the power calc results note power is percent 0-100

Possible other dataframes if self.return_true_conc is True or self.return_noisy_conc_itters > 0 in which case a dictionary will be returned: {‘power’: power_df, # always ‘true_conc’: true_conc_ts, if self.return_true_conc is True ‘noisy_conc’ : noisy_conc_ts, if self.return_noisy_conc_itters > 0 }

set_condensed_mode(target_conc_per=1, initial_conc_per=1, error_per=2, prev_slope_per=2, max_conc_lim_per=1, min_conc_lim_per=1, mrt_per=0, mrt_p1_per=2, frac_p1_per=2, f_p1_per=2, f_p2_per=2)[source]#

set calculator to condense the number of runs based by rounding the inputs to a specified precision

Parameters:
  • target_conc_per – precision to round target_conc to (2 = 0.01)

  • initial_conc_per – precision to round initial_conc to (2 = 0.01)

  • error_per – precision to round error to (2 = 0.01)

  • prev_slope_per – precision to round previous_slope to (2 = 0.01)

  • max_conc_lim_per – precision to round max_conc_lim to (2 = 0.01)

  • min_conc_lim_per – precision to round min_conc_lim to (2 = 0.01)

  • mrt_per – precision to round mrt to

  • mrt_p1_per – precision to round mrt_p1 to

  • frac_p1_per – precision to round frac_p1 to

  • f_p1_per – precision to round f_p1 to

  • f_p2_per – precision to round f_p2 to

Returns:

class BaseDetectionCalculator[source]#

Base class for detection power calculations, provides some general methods for power calculations

time_test_power_calc_itter(testnitter=10, **kwargs)[source]#

run a test power calc iteration to check for errors

Parameters:
  • testnitter – number of iterations to run

  • kwargs – kwargs for power_calc

Returns:

None

static truets_from_binary_exp_piston_flow(mrt, mrt_p1, frac_p1, f_p1, f_p2, initial_conc, target_conc, prev_slope, max_conc, min_conc, samp_per_year, samp_years, implementation_time, past_source_data=None, return_extras=False, low_mem=False, precision=2)[source]#

create a true concentration time series using binary piston flow model for the mean residence time note that this can be really slow for large runs and it may be better to create the source data first and then pass it to the power calcs via pass_true_conc

Parameters:
  • mrt – mean residence time years

  • mrt_p1 – mean residence time of the first pathway years

  • frac_p1 – fraction of the first pathway

  • f_p1 – ratio of the exponential volume to the total volume pathway 1

  • f_p2 – ratio of the exponential volume to the total volume pathway 2

  • initial_conc – initial concentration

  • target_conc – target concentration

  • prev_slope – previous slope of the concentration data

  • max_conc – maximum concentration limit user specified or None here the maximum concentration is specified as the maximum concentration of the source (before temporal mixing)

  • min_conc – minimum concentration limit user specified, the lowest concentration for the source

  • samp_per_year – samples per year

  • samp_years – number of years to sample

  • implementation_time – number of years to implement reductions

  • past_source_data – past source data, if None will use the initial concentration and the previous slope to estimate the past source data, this is only set as an option to allow users to preclude re-running the source data calculations if they have already been done so. Suggest that users only pass results from get_source_initial_conc_bepm with age_step = 0.01

  • return_extras – return extra variables for debugging

Returns:

true timeseries, max_conc, max_conc_time, frac_p2

static truets_from_piston_flow(mrt, initial_conc, target_conc, prev_slope, max_conc, samp_per_year, samp_years, implementation_time)[source]#

piston flow model for the mean residence time

Parameters:
  • mrt – mean residence time

  • initial_conc – initial concentration

  • target_conc – target concentration

  • prev_slope – previous slope of the concentration data mg/l/yr

  • max_conc – maximum concentration limit user specified or None

  • samp_per_year – samples per year

  • samp_years – number of years to sample

  • implementation_time – number of years to implement reductions

Returns:

true timeseries, max_conc, max_conc_time, frac_p2

class DetectionPowerCounterFactual(significance_mode, nsims=1000, p_value=0.05, min_samples=10, alternative='alt!=base', wx_zero_method='wilcox', wx_correction=False, wx_method='auto', ncores=None, log_level=logging.INFO, return_true_conc=False, return_noisy_conc_itters=0, only_significant_noisy=False)[source]#

Bases: komanawa.gw_detect_power.base_detection_calculator.BaseDetectionCalculator

Inheritance diagram of komanawa.gw_detect_power.change_detection_counterfactual.DetectionPowerCounterFactual

This class is used to calculate the counterfactual detection power of a pair of concentration time series The user specifies the true concentration time series for the base and alt scenarios and the noise level for both scenarios. The power is calculated by adding many noise realisations to the true concentration time series and running a paired t test or wilcoxon signed rank test to determine if the null hypothesis (The scenarios are the same) can be rejected.

The Power is calculated as the percentage (0-100) of simulations which reject the null hypothesis.

Parameters:
  • significance_mode

    str, one of:

    • ’paired-t-test’: paired t test (parametric), scipy.stats.ttest_rel

    • ’wilcoxon-signed-rank-test’: wilcoxon signed rank test (non-parametric), scipy.stats.wilcoxon

  • nsims – number of noise simulations to run for each change detection (e.g. nsims=1000, power= number of detected changes/1000 noise simulations)

  • p_value

    minimum p value (see also alternative), if:

    • p >= p_value the null hypothesis will not be rejected (base and alt are the same)

    • p < p_value the null hypothesis will be rejected (base and alt are different)

  • min_samples – minimum number of samples required, less than this number of samples will raise an exception

  • alternative

    str, one of:

    • ’alt!=base’: two sided test (default),

    • ’alt<base’: one sided test ~

    • ’alt>base’

  • wx_zero_method

    str, one of:

    • “wilcox”: Discards all zero-differences (default); see [4].

    • “pratt”: Includes zero-differences in the ranking process, but drops the ranks of the zeros (more conservative); see [3]. In this case, the normal approximation is adjusted as in [5].

    • “zsplit”: Includes zero-differences in the ranking process and splits the zero rank between positive and negative ones.

    for more info see: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.wilcoxon.html

  • wx_correction – bool, If True, apply continuity correction by adjusting the Wilcoxon rank statistic by 0.5 towards the mean value when computing the z-statistic. Default is False.

  • wx_method – str, see scipy.stats.wilcoxon for more info

  • ncores – number of cores to use for multiprocessing, None will use all available cores

  • log_level – logging level for multiprocessing subprocesses

  • return_true_conc – return the true concentration time series for each simulation with power calcs (not supported with multiprocessing power calcs)

  • return_noisy_conc_itters – int <= nsims, default = 0 Number of noisy simulations to return if 0 then no noisy simulations are returned, not supported with multiprocessing power calcs

  • only_significant_noisy – bool if True then only return noisy simulations where a change was detected if there are fewer noisy simulations with changes detected than return_noisy_conc_itters all significant simulations will be returned. if there are no noisy simulations with changes detected then and empty dataframe is returned

mulitprocess_power_calcs(outpath: {Path, None, str}, idv_vals: numpy.ndarray, true_conc_base_vals: {np.ndarray, list}, true_conc_alt_vals: {np.ndarray, list}, error_base_vals: {np.ndarray, None, float}, error_alt_vals: {np.ndarray, None, float} = None, seed_alt_vals_vals: {np.ndarray, int, None} = None, seed_base_vals_vals: {np.ndarray, int, None} = None, run=True, debug_mode=False, **kwargs)[source]#

multiprocessing wrapper for power_calc, see power_calc for details note that if a given run raises and exception the traceback for the exception will be included in the returned dataset under the column ‘python_error’ if ‘python_error’ is None then the run was successful to change the number of cores used pass n_cores to the constructor init

Parameters:
  • outpath – path to save results to or None (no save)

  • idv_vals – id values for each simulation

All values from here on out should be either a single value or an array of values with the same shape as id_vals

Parameters:
  • true_conc_base_vals – time series concentration dta for the ‘base’ scenario. Note sampling frequency is assumed to be correct.

  • true_conc_alt_vals – time series concentration dta for the ‘alt’ scenario. Note sampling frequency is assumed to be correct.

  • error_base_vals – standard deviation of noise to add to the base time series for each simulation

  • error_alt_vals – standard deviation of noise to add to the alt time series for each simulation

  • seed_alt_vals_vals

    random seed to generate the alternative noise. One of:

    • ndarray (integer seeds),

    • None (no seeds passed, but will record the seed used)

    • int (1 seed for all simulations)

  • seed_base_vals_vals

    random seed to generate the base noise. One of:

    • ndarray (integer seeds),

    • None (no seeds passed, but will record the seed used)

    • int (1 seed for all simulations)

Note seed_base != seed_alt (the same noise will be added to both time series, making the analysis useless) :param run: if True run the simulations, if False just build the run_dict and print the number of simulations :param debug_mode: if True run as single process to allow for easier debugging :param kwargs: any other kwargs to pass directly to the output dataframe :return: dataframe with input data and the results of all of the power calcs. note power is percent 0-100

plot_iteration(y0_base, y0_alt, true_conc_base, true_conc_alt, ax=None)[source]#

plot the concentration data itteration and the true concentration data

Parameters:
  • y0_base – noisy concentration data for the base scenario

  • y0_alt – noisy concentration data for the alt scenario

  • true_conc_base – True concentration data for the base scenario

  • true_conc_alt – True concentration data for the alt scenario

  • ax – matplotlib axis to plot to, if None then a new figure and axis will be created

Returns:

power_calc(idv, error_base: float, true_conc_base: numpy.ndarray, true_conc_alt: numpy.ndarray, error_alt: {float, None} = None, seed_base: {int, None} = None, seed_alt: {int, None} = None, testnitter=None, **kwargs)[source]#

calculate the counterfactual detection power of a pair of concentration time series note the power is calculated using the sampling frequency of the true_conc_base/alt, if you want to test the power at a different sampling frequency then you should resample the true_conc_base/alt before passing it to this function

Parameters:
  • idv – identifiers for the power calc sites, passed straight through to the output

  • error_base – standard deviation of the noise to add to the base concentration time series

  • true_conc_base – the true concentration timeseries for the base scenario

  • true_conc_alt – the true concentration timeseries for the alt scenario

  • error_alt – standard deviation of the noise to add to the alt concentration time series, if None then error_alt = error_base

  • seed_base – seed for the random number generator for the base scenario, if None then a random seed will be generated and returned with the output

  • seed_alt – seed for the random number generator for the alt scenario, if None then a random seed will be generated and returned with the output

  • testnitter – None (usually) or a different nitter then self.niter for testing run times

  • kwargs – any other kwargs to pass directly to the output Series

Returns:

pd.Series with the power calc results note power is percent 0-100

Possible other dataframes if self.return_true_conc is True or self.return_noisy_conc_itters > 0 in which case a dictionary will be returned: {‘power’: power_df, # always ‘true_conc’: true_conc_ts, if self.return_true_conc is True ‘noisy_conc’ : noisy_conc_ts, if self.return_noisy_conc_itters > 0 }