komanawa.gw_detect_power.timetest_slope#

created matt_dumont on: 3/10/23

Classes#

DetectionPowerSlope

The DetectionPowerSlope class is used to calculate the power of a change detection test based on observing

Functions#

timeit_test([methods, ndata, efficency_modes, n])

run an automated timeit test, must be outside of the function definition, prints results in scientific notation

Module Contents#

class DetectionPowerSlope(significance_mode='linear-regression', nsims=1000, min_p_value=0.05, min_samples=10, expect_slope='auto', efficent_mode=True, nparts=None, min_part_size=10, no_trend_alpha=0.5, mpmk_check_step=1, mpmk_efficent_min=10, mpmk_window=0.05, nsims_pettit=2000, ncores=None, log_level=logging.INFO, return_true_conc=False, return_noisy_conc_itters=0, only_significant_noisy=False, print_freq=None)[source]#

Bases: komanawa.gw_detect_power.base_detection_calculator.BaseDetectionCalculator

Inheritance diagram of komanawa.gw_detect_power.timetest_slope.DetectionPowerSlope

The DetectionPowerSlope class is used to calculate the power of a change detection test based on observing a slope in the concentration data. The user passes a True concentration time series and the power is calculated by adding many noise realisations to the concentration data and then running one of multiple change detection tests on the noisy data.

The Power is calculated as the percentage (0-100) of simulations which detect a slope.

Parameters:
  • significance_mode

    significance mode to use, options:

    • linear-regression: linear regression of the concentration data from time 0 to the end change detected if p < min_p_value

    • linear-regression-from-[max|min]: linear regression of the concentration data from the maximum concentration of the noise free concentration data to the end change detected if p < min_p_value

    • mann-kendall: mann-kendall test of the concentration data from time 0 to the end, change detected if p < min_p_value

    • mann-kendall-from-[max|min]: mann-kendall test of the concentration data from the maximum/minimum of the noise free concentration data to the end, change detected if p < min_p_value

    • n-section-mann-kendall: 2+ part mann-kendall test to identify change points. if change points are detected then a change is detected

    • pettitt-test: pettitt test to identify change points. if change points are detected then a change is detected

  • nsims – number of noise simulations to run for each change detection (e.g. nsims=1000, power= number of detected changes/1000 noise simulations)

  • min_p_value – minimum p value to consider a change detected

  • min_samples – minimum number of samples required, less than this number of samples will raise an exception

  • expect_slope

    expected slope of the concentration data, use depends on significance mode:

    • linear-regression, linear-regression-from-max, mann-kendall, or mann-kendall-from-max: * one of 1 (increasing), -1 (decreasing), or ‘auto’ will match the slope of the concentration data before noise is added

    • n-section-mann-kendall: expected trend in each part of the time series (1 increasing, -1 decreasing, 0 no trend)

    • pettitt-test: not used.

  • efficent_mode

    bool, default = True, if True then

    • For linear regression and MannKendall based tests: run the test on the noise free data to see if any change can be detected, if no change is detected then the test will not be on the noisy data

    • For MultiPartMannKendall test: the test will be run on the noise free data to detect best change points and then the test will be run on the noisy data for a smaller window centered on the True change point see: * mpmk_efficent_min, * mpmk_window

    • For Pettitt Test: Not implemented, will be ignored and a waring passed

  • nparts – number of parts to use for the n-section-mann-kendall test (not used for other tests)

  • min_part_size – minimum number of samples in each part for the n-section-mann-kendall test (not used for other tests)

  • no_trend_alpha – alpha value to use for the no trend sections in the n-section-mann-kendall test trendless sections are only accepted if p > no_trend_alpha (not used for other tests)

  • mpmk_check_step – int or function, default = 1, number of samples to check for a change point in the MultiPartMannKendall test, used in both efficent_mode=True and efficent_mode=False if mpmk is a function it must take a single argument (n, number of samples) and return an integer check step

  • mpmk_efficent_min – int, default = 10, minimum number of possible change points to assess only used if efficent_mode = True The minimum number of breakpoints to test (mpmk_efficent_min) is always respected (i.e. if the window size is less than the minimum number of breakpoints to test, then the window size will be increased to the minimum number of breakpoints to test, but the space between breakpoints will still be defined by check_step). You can specify the exact number of breakpoints to check by setting mpmk_efficent_min=n breakpoints and setting mpmk_window=0

  • mpmk_window – float, default = 0.05, define the window around the true detected change point to run the MultiPartMannKendall. The detction window is defined as: (cp - mpmk_window*n, cp + mpmk_window*n) where cp is the detected change point and n is the number of samples in the time series Whe`re both a mpmk_window and a check_step>1 is passed the mpmk_window will be used to de`fine the window size and the check_step will be used to define the step size within the window.`

  • nsims_pettit – number of simulations to run for calc`ulating the pvalue of the pettitt test (not used for other tests)

  • ncores – number of cores to use for multiprocessing, None will use all available cores

  • log_level – logging level for multiprocessing subprocesses

  • return_true_conc – return the true concentration time series for each simulation with power calcs (not supported with multiprocessing power calcs)

  • return_noisy_conc_itters – int <= nsims, default = 0 Number of noisy simulations to return. if 0 then no noisy simulations are returned, not supported with multiprocessing power calcs

  • only_significant_noisy – bool if True then only return noisy simulations where a change was detected if there are fewer noisy simulations with changes detected than return_noisy_conc_itters all significant simulations will be returned. if there are no noisy simulations with changes detected then and empty dataframe is returned

  • print_freq – None or int: if None then no progress will be printed, if int then progress will be printed every print_freq simulations (n%print_freq==0)

mulitprocess_power_calcs(outpath: {Path, None, str}, idv_vals: numpy.ndarray, error_vals: {np.ndarray, float}, true_conc_ts_vals: {np.ndarray, list}, seed_vals: {np.ndarray, int, None} = None, run=True, debug_mode=False, **kwargs)[source]#

multiprocessing wrapper for power_calc, see power_calc for details note that if a given run raises and exception the traceback for the exception will be included in the returned dataset under the column ‘python_error’ if ‘python_error’ is None then the run was successful to change the number of cores used pass n_cores to the constructor init

Parameters:
  • outpath – path to save results to or None (no save)

  • idv_vals – id values for each simulation

All values from here on out should be either a single value or an array of values with the same shape as id_vals

Parameters:
  • error_vals – standard deviation of noise to add for each simulation

  • true_conc_ts_vals – the true concentration time series for each simulation, note that this can be a list of arrays of different lengths for each simulation, as Numpy does not support jagged arrays

  • seed – ndarray (integer seeds), None (no seeds), or int (1 seed for all simulations)

  • run – if True run the simulations, if False just build the run_dict and print the number of simulations

  • debug_mode – if True run as single process to allow for easier debugging

  • kwargs – any other kwargs to pass directly to the output dataframe

Returns:

dataframe with input data and the results of all of the power calcs. note power is percent 0-100

plot_iteration(y0, true_conc, ax=None)[source]#

plot the concentration data itteration and the true concentration data if provided as well as the power test results and any predictions from the power test (e.g. the slope of the line used)

Parameters:
  • y0 – noisy concentration data

  • true_conc – true concentration data

Returns:

fig, ax

power_calc(idv, error: float, true_conc_ts: numpy.ndarray, seed: {int, None} = None, testnitter=None, **kwargs)[source]#

calculate the slope detection power of a given concentration time series, note the power is calculated using the sampling frequency of the true_conc_ts, if you want to test the power at a different sampling frequency then you should resample the true_conc_ts before passing it to this function

Parameters:
  • idv – identifiers for the power calc sites, passed straight through to the output

  • error – standard deviation of the noise

  • true_conc_ts – the true concentration timeseries for the power calc

  • seed – int or None for random seed

  • testnitter – None (usually) or a different nitter then self.niter for testing run times

  • kwargs – any other kwargs to pass directly to the output Series

Returns:

pd.Series with the power calc results note power is percent 0-100

Possible other dataframes if self.return_true_conc is True or self.return_noisy_conc_itters > 0 in which case a dictionary will be returned:

{‘power’: power_df, # always ‘true_conc’: true_conc_ts, if self.return_true_conc is True ‘noisy_conc’ : noisy_conc_ts, if self.return_noisy_conc_itters > 0 }

timeit_test(methods=methods, ndata=ndata, efficency_modes=efficency_modes, n=1)[source]#

run an automated timeit test, must be outside of the function definition, prints results in scientific notation units are seconds

Parameters:
  • methods – list of methods to test

  • ndata – list of data sizes to test

  • efficency_modes – list of efficency modes to test

  • n – number of times to test

Returns: