cpm.optimisation

cpm.optimisation.DifferentialEvolution(model=None, data=None, minimisation=minimise.LogLikelihood.bernoulli, prior=False, parallel=False, cl=None, libraries=['numpy', 'pandas'], ppt_identifier=None, display=False, **kwargs)

Class representing the Differential Evolution optimization algorithm.

Parameters:

Name Type Description Default
model cpm.generators.Wrapper

The model to be optimized.

None
data pd.DataFrame, pd.DataFrameGroupBy, list

The data used for optimization. If a pd.DataFrame, it is grouped by the ppt_identifier. If a pd.DataFrameGroupBy, the groups are assumed to be participants. If a list, each element is a dictionary containing the data for a single participant, including information about both the experiment and the results. See Notes for more information.

None
minimisation function

The loss function for the objective minimization function. Default is minimise.LogLikelihood.bernoulli. See the minimise module for more information. User-defined loss functions are also supported, but they must conform to the format of currently implemented ones.

minimise.LogLikelihood.bernoulli
prior bool

Whether to include priors in the optimisation. Default is False.

False
parallel bool

Whether to use parallel processing. Default is False.

False
cl int

The number of cores to use for parallel processing. Default is None. If cl is None and parallel is True, the number of cores is set to the number of cores available on the machine; otherwise it defaults to 2.

None
libraries list, optional

The libraries to import for parallel processing with ipyparallel under the IPython kernel. Default is ["numpy", "pandas"].

['numpy', 'pandas']
ppt_identifier str

The key in the participant data dictionary that contains the participant identifier. Default is None. Returned in the optimization details.

None
**kwargs dict

Additional keyword arguments. See the scipy.optimize.differential_evolution documentation for what is supported.

{}

Notes

The data parameter must contain all input to the model, including the observed data. The data parameter can be a pandas DataFrame, a pandas DataFrameGroupBy object, or a list of dictionaries. If the data parameter is a pandas DataFrame, it is assumed that the data needs to be grouped by the participant identifier, ppt_identifier. If the data parameter is a pandas DataFrameGroupBy object, the groups are assumed to be participants. If the data parameter is a list of dictionaries, each dictionary should contain the data for a single participant, including information about the experiment and the results. The observed data for each participant should be included in the dictionary under the key or column 'observed'. The 'observed' key should correspond, both in format and shape, to the 'dependent' variable calculated by the model Wrapper.
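For illustration, the accepted data formats can be built with plain pandas. The column names ppt and stimulus below are hypothetical; only the 'observed' column and the ppt_identifier column carry special meaning:

```python
import pandas as pd

# Hypothetical trial-level data for two participants. Only "observed"
# (matching the Wrapper's "dependent" output) and the ppt_identifier
# column ("ppt" here) have special meaning; "stimulus" is illustrative.
df = pd.DataFrame({
    "ppt": [1, 1, 2, 2],
    "stimulus": [0, 1, 0, 1],
    "observed": [1, 0, 1, 1],
})

# A DataFrame is grouped by ppt_identifier internally; the equivalent
# pd.DataFrameGroupBy object can also be passed directly:
grouped = df.groupby("ppt")

# The equivalent list-of-dictionaries format, one entry per participant:
data_as_list = [
    {
        "ppt": ppt,
        "stimulus": g["stimulus"].to_numpy(),
        "observed": g["observed"].to_numpy(),
    }
    for ppt, g in grouped
]
```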

export(details=False)

Exports the optimization results and fitted parameters as a pandas.DataFrame.

Parameters:

Name Type Description Default
details bool

Whether to include the various metrics related to the optimisation routine in the output.

False

Returns:

Type Description
pandas.DataFrame

A pandas DataFrame containing the optimization results and fitted parameters. If details is True, the DataFrame will also include the optimization details.

Notes

The DataFrame will not contain the population and population_energies keys from the optimization details. If you want to investigate these, use the details attribute.

optimise()

Performs the optimization process.

Returns:

Type Description
None

reset()

Resets the optimization results and fitted parameters.

Returns:

Type Description
None

cpm.optimisation.Fmin(model=None, data=None, initial_guess=None, minimisation=None, cl=None, parallel=False, libraries=['numpy', 'pandas'], prior=False, number_of_starts=1, ppt_identifier=None, display=False, **kwargs)

Class representing the Fmin search (unbounded) optimization algorithm using a downhill simplex.

Parameters:

Name Type Description Default
model cpm.generators.Wrapper

The model to be optimized.

None
data pd.DataFrame, pd.DataFrameGroupBy, list

The data used for optimization. If a pd.DataFrame, it is grouped by the ppt_identifier. If a pd.DataFrameGroupBy, the groups are assumed to be participants. If a list, each element is a dictionary containing the data for a single participant, including information about both the experiment and the results. See Notes for more information.

None
minimisation function

The loss function for the objective minimization function. See the minimise module for more information. User-defined loss functions are also supported.

None
prior bool

Whether to include the prior in the optimization. Default is False.

False
number_of_starts int

The number of random initialisations for the optimization. Default is 1.

1
initial_guess list or array-like

The initial guess for the optimization. Default is None. If number_of_starts is set and initial_guess is None, the initial guesses are randomly generated from a uniform distribution.

None
parallel bool

Whether to use parallel processing. Default is False.

False
cl int

The number of cores to use for parallel processing. Default is None. If cl is None and parallel is True, the number of cores is set to the number of cores available on the machine; otherwise it defaults to 2.

None
libraries list, optional

The libraries to import for parallel processing with ipyparallel under the IPython kernel. Default is ["numpy", "pandas"].

['numpy', 'pandas']
ppt_identifier str

The key in the participant data dictionary that contains the participant identifier. Default is None. Returned in the optimization details.

None
**kwargs dict

Additional keyword arguments. See the scipy.optimize.fmin documentation for what is supported.

{}

Notes

The data parameter must contain all input to the model, including the observed data. The data parameter can be a pandas DataFrame, a pandas DataFrameGroupBy object, or a list of dictionaries. If the data parameter is a pandas DataFrame, it is assumed that the data needs to be grouped by the participant identifier, ppt_identifier. If the data parameter is a pandas DataFrameGroupBy object, the groups are assumed to be participants. If the data parameter is a list of dictionaries, each dictionary should contain the data for a single participant, including information about the experiment and the results. The observed data for each participant should be included in the dictionary under the key or column 'observed'. The 'observed' key should correspond, both in format and shape, to the 'dependent' variable calculated by the model Wrapper.

The optimization process is repeated number_of_starts times, and only the best-fitting output from the best guess is stored.
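The multi-start logic can be sketched directly with scipy.optimize.fmin, which this class wraps. The quadratic loss below is a toy stand-in for a model's objective; only the best of the random starts is kept:

```python
import numpy as np
from scipy.optimize import fmin

# Toy objective with a known minimum at x = 3, standing in for a
# model's loss function.
def loss(x):
    return (x[0] - 3.0) ** 2

rng = np.random.default_rng(0)
starts = rng.uniform(-10, 10, size=(5, 1))  # five random initial guesses

# Run the optimizer once per start and keep only the best-fitting result.
results = [fmin(loss, x0, full_output=True, disp=False) for x0 in starts]
best = min(results, key=lambda r: r[1])  # r[1] is the achieved loss
best_x, best_loss = best[0], best[1]
```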

export()

Exports the optimization results and fitted parameters as a pandas.DataFrame.

Returns:

Type Description
pandas.DataFrame

A pandas DataFrame containing the optimization results and fitted parameters.

optimise()

Performs the optimization process.

Returns:

Type Description
None

reset(initial_guess=True)

Resets the optimization results and fitted parameters.

Parameters:

Name Type Description Default
initial_guess bool, optional

Whether to reset the initial guess (generates a new set of random numbers within parameter bounds). Default is True.

True

Returns:

Type Description
None

cpm.optimisation.FminBound(model=None, data=None, initial_guess=None, number_of_starts=1, minimisation=None, cl=None, parallel=False, libraries=['numpy', 'pandas'], prior=False, ppt_identifier=None, display=False, **kwargs)

Class representing the Fmin search (bounded) optimization algorithm using the L-BFGS-B method.

Parameters:

Name Type Description Default
model cpm.generators.Wrapper

The model to be optimized.

None
data pd.DataFrame, pd.DataFrameGroupBy, list

The data used for optimization. If a pd.DataFrame, it is grouped by the ppt_identifier. If a pd.DataFrameGroupBy, the groups are assumed to be participants. If a list, each element is a dictionary containing the data for a single participant, including information about both the experiment and the results. See Notes for more information.

None
minimisation function

The loss function for the objective minimization function. See the minimise module for more information. User-defined loss functions are also supported.

None
prior bool

Whether to include the prior in the optimization. Default is False.

False
number_of_starts int

The number of random initialisations for the optimization. Default is 1.

1
initial_guess list or array-like

The initial guess for the optimization. Default is None. If number_of_starts is set and initial_guess is None, the initial guesses are randomly generated from a uniform distribution.

None
parallel bool

Whether to use parallel processing. Default is False.

False
cl int

The number of cores to use for parallel processing. Default is None. If cl is None and parallel is True, the number of cores is set to the number of cores available on the machine; otherwise it defaults to 2.

None
libraries list, optional

The libraries to import for parallel processing with ipyparallel under the IPython kernel. Default is ["numpy", "pandas"].

['numpy', 'pandas']
ppt_identifier str

The key in the participant data dictionary that contains the participant identifier. Default is None. Returned in the optimization details.

None
**kwargs dict

Additional keyword arguments. See the scipy.optimize.fmin_l_bfgs_b documentation for what is supported.

{}

Notes

The data parameter must contain all input to the model, including the observed data. The data parameter can be a pandas DataFrame, a pandas DataFrameGroupBy object, or a list of dictionaries. If the data parameter is a pandas DataFrame, it is assumed that the data needs to be grouped by the participant identifier, ppt_identifier. If the data parameter is a pandas DataFrameGroupBy object, the groups are assumed to be participants. If the data parameter is a list of dictionaries, each dictionary should contain the data for a single participant, including information about the experiment and the results. The observed data for each participant should be included in the dictionary under the key or column 'observed'. The 'observed' key should correspond, both in format and shape, to the 'dependent' variable calculated by the model Wrapper.

The optimization process is repeated number_of_starts times, and only the best-fitting output from the best guess is stored.
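The underlying bounded routine, scipy.optimize.fmin_l_bfgs_b, can be exercised directly. In this toy sketch a hypothetical one-parameter loss is constrained to [0, 2], so the optimum lands on the boundary:

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

# Toy loss whose unconstrained minimum (x = 3) lies outside the bounds.
def loss(x):
    return (x[0] - 3.0) ** 2

# approx_grad=True makes L-BFGS-B estimate gradients numerically, as is
# necessary when the loss is a black-box model likelihood.
xopt, fopt, info = fmin_l_bfgs_b(
    loss, x0=np.array([1.0]), bounds=[(0.0, 2.0)], approx_grad=True
)
# xopt sits at the active upper bound, approximately [2.0]
```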

export()

Exports the optimization results and fitted parameters as a pandas.DataFrame.

Returns:

Type Description
pandas.DataFrame

A pandas DataFrame containing the optimization results and fitted parameters.

optimise(display=True)

Performs the optimization process.

Returns:

Type Description
None

reset(initial_guess=True)

Resets the optimization results and fitted parameters.

Parameters:

Name Type Description Default
initial_guess bool, optional

Whether to reset the initial guess (generates a new set of random numbers within parameter bounds). Default is True.

True

Returns:

Type Description
None

cpm.optimisation.Minimize(model=None, data=None, initial_guess=None, minimisation=None, method='Nelder-Mead', cl=None, parallel=False, libraries=['numpy', 'pandas'], prior=False, number_of_starts=1, ppt_identifier=None, display=False, **kwargs)

Class representing scipy's Minimize algorithm wrapped for subject-level parameter estimations.

Parameters:

Name Type Description Default
model cpm.generators.Wrapper

The model to be optimized.

None
data pd.DataFrame, pd.DataFrameGroupBy, list

The data used for optimization. If a pd.DataFrame, it is grouped by the ppt_identifier. If a pd.DataFrameGroupBy, the groups are assumed to be participants. If a list, each element is a dictionary containing the data for a single participant, including information about both the experiment and the results. See Notes for more information.

None
minimisation function

The loss function for the objective minimization function. See the minimise module for more information. User-defined loss functions are also supported.

None
number_of_starts int

The number of random initialisations for the optimization. Default is 1.

1
initial_guess list or array-like

The initial guess for the optimization. Default is None. If number_of_starts is set and initial_guess is None, the initial guesses are randomly generated from a uniform distribution.

None
parallel bool

Whether to use parallel processing. Default is False.

False
cl int

The number of cores to use for parallel processing. Default is None. If cl is None and parallel is True, the number of cores is set to the number of cores available on the machine; otherwise it defaults to 2.

None
libraries list, optional

The libraries to import for parallel processing with ipyparallel under the IPython kernel. Default is ["numpy", "pandas"].

['numpy', 'pandas']
ppt_identifier str

The key in the participant data dictionary that contains the participant identifier. Default is None. Returned in the optimization details.

None
**kwargs dict

Additional keyword arguments. See the scipy.optimize.minimize documentation for what is supported.

{}

Notes

The data parameter must contain all input to the model, including the observed data. The data parameter can be a pandas DataFrame, a pandas DataFrameGroupBy object, or a list of dictionaries. If the data parameter is a pandas DataFrame, it is assumed that the data needs to be grouped by the participant identifier, ppt_identifier. If the data parameter is a pandas DataFrameGroupBy object, the groups are assumed to be participants. If the data parameter is a list of dictionaries, each dictionary should contain the data for a single participant, including information about the experiment and the results. The observed data for each participant should be included in the dictionary under the key or column 'observed'. The 'observed' key should correspond, both in format and shape, to the 'dependent' variable calculated by the model Wrapper.

The optimization process is repeated number_of_starts times, and only the best-fitting output from the best guess is stored.
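Because Minimize forwards to scipy.optimize.minimize, its behaviour can be previewed on a toy loss directly. The two-parameter quadratic below is illustrative, not part of the cpm API:

```python
import numpy as np
from scipy.optimize import minimize

# Toy two-parameter loss with a known minimum at (1, -2).
def loss(x):
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

# Nelder-Mead is the default method used by this class.
res = minimize(loss, x0=np.zeros(2), method="Nelder-Mead")
# res.x is approximately [1.0, -2.0] and res.fun is close to 0
```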

export()

Exports the optimization results and fitted parameters as a pandas.DataFrame.

Returns:

Type Description
pandas.DataFrame

A pandas DataFrame containing the optimization results and fitted parameters.

optimise()

Performs the optimization process.

Returns:

Type Description
None

reset(initial_guess=True)

Resets the optimization results and fitted parameters.

Parameters:

Name Type Description Default
initial_guess bool, optional

Whether to reset the initial guess (generates a new set of random numbers within parameter bounds). Default is True.

True

Returns:

Type Description
None

cpm.optimisation.Bads(model=None, data=None, minimisation=minimise.LogLikelihood.continuous, prior=False, number_of_starts=1, initial_guess=None, parallel=False, cl=None, libraries=['numpy', 'pandas'], ppt_identifier=None, **kwargs)

Class representing the Bayesian Adaptive Direct Search (BADS) optimization algorithm.

Parameters:

Name Type Description Default
model cpm.generators.Wrapper

The model to be optimized.

None
data pd.DataFrame, pd.DataFrameGroupBy, list

The data used for optimization. If a pd.DataFrame, it is grouped by the ppt_identifier. If a pd.DataFrameGroupBy, the groups are assumed to be participants. If a list, each element is a dictionary containing the data for a single participant, including information about both the experiment and the results. See Notes for more information.

None
minimisation function

The loss function for the objective minimization function. Default is minimise.LogLikelihood.continuous. See the minimise module for more information. User-defined loss functions are also supported.

minimise.LogLikelihood.continuous
prior bool

Whether to include the prior in the optimization. Default is False.

False
number_of_starts int

The number of random initialisations for the optimization. Default is 1.

1
initial_guess list or array-like

The initial guess for the optimization. Default is None. If number_of_starts is set and initial_guess is None, the initial guesses are randomly generated from a uniform distribution.

None
parallel bool

Whether to use parallel processing. Default is False.

False
cl int

The number of cores to use for parallel processing. Default is None. If cl is None and parallel is True, the number of cores is set to the number of cores available on the machine; otherwise it defaults to 2.

None
libraries list, optional

The libraries to import for parallel processing with ipyparallel under the IPython kernel. Default is ["numpy", "pandas"].

['numpy', 'pandas']
ppt_identifier str

The key in the participant data dictionary that contains the participant identifier. Default is None. Returned in the optimization details.

None
**kwargs dict

Additional keyword arguments. See the pybads.bads documentation for what is supported.

{}

Notes

The data parameter must contain all input to the model, including the observed data. The data parameter can be a pandas DataFrame, a pandas DataFrameGroupBy object, or a list of dictionaries. If the data parameter is a pandas DataFrame, it is assumed that the data needs to be grouped by the participant identifier, ppt_identifier. If the data parameter is a pandas DataFrameGroupBy object, the groups are assumed to be participants. If the data parameter is a list of dictionaries, each dictionary should contain the data for a single participant, including information about the experiment and the results. The observed data for each participant should be included in the dictionary under the key or column 'observed'. The 'observed' key should correspond, both in format and shape, to the 'dependent' variable calculated by the model Wrapper.

The optimization process is repeated number_of_starts times, and only the best-fitting output from the best guess is stored.

The BADS algorithm is designed to handle both deterministic and noisy (stochastic) target functions. A deterministic target function returns exactly the same probability value for a given dataset and proposed set of parameter values, whereas a stochastic target function returns varying probability values for the same input (data and parameters). The vast majority of models use a deterministic target function. We recommend that users make this explicit to BADS by providing an options dictionary that includes the key uncertainty_handling set to False. Please see the BADS options documentation for more details.
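Declaring a deterministic target might look like the following sketch; the options keyword argument shown in the commented usage is an assumption about how the dictionary is forwarded through **kwargs to pybads:

```python
# Options intended for pybads: uncertainty_handling=False declares the
# target function deterministic, so BADS skips its noise-handling machinery.
bads_options = {"uncertainty_handling": False}

# Hypothetical usage (model and data defined elsewhere):
# fit = Bads(model=model, data=data, options=bads_options)
# fit.optimise()
```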

export()

Exports the optimization results and fitted parameters as a pandas.DataFrame.

Returns:

Type Description
pandas.DataFrame

A pandas DataFrame containing the optimization results and fitted parameters.

optimise()

Performs the optimization process.

Returns:

Type Description
None

reset(initial_guess=True)

Resets the optimization results and fitted parameters.

Parameters:

Name Type Description Default
initial_guess bool, optional

Whether to reset the initial guess (generates a new set of random numbers within parameter bounds). Default is True.

True

Returns:

Type Description
None

minimise

cpm.optimisation.minimise.LogLikelihood()

bernoulli(predicted=None, observed=None, negative=True, **kwargs)

Compute the log likelihood of the predicted values given the observed values for Bernoulli data.

Bernoulli(y|p) = p if y = 1 and 1 - p if y = 0

Parameters:

Name Type Description Default
predicted array-like

The predicted values. It must have the same shape as observed. See Notes for more details.

None
observed array-like

The observed values. It must have the same shape as predicted. See Notes for more details.

None
negative bool, optional

Flag indicating whether to return the negative log likelihood.

True

Returns:

Type Description
float

The summed log likelihood or negative log likelihood.

Notes

predicted and observed must have the same shape. observed is a binary variable, so it can only take the values 0 or 1. predicted must be a value between 0 and 1. Values are clipped away from 0 and 1 to avoid taking log(0). If any non-finite values are encountered, the corresponding log likelihood is set to np.log(1e-100).

Examples:

>>> import numpy as np
>>> observed = np.array([1, 0, 1, 0])
>>> predicted = np.array([0.7, 0.3, 0.6, 0.4])
>>> LogLikelihood.bernoulli(predicted, observed)
1.7350011354094463
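The example above can be reproduced with a minimal numpy sketch; the clipping constant here is illustrative and not necessarily the library's exact internal value:

```python
import numpy as np

def bernoulli_nll(predicted, observed, negative=True):
    # Clip probabilities away from 0 and 1 so the log never receives 0;
    # the exact clipping constant is illustrative.
    p = np.clip(np.asarray(predicted, dtype=float), 1e-10, 1 - 1e-10)
    y = np.asarray(observed)
    ll = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return -ll if negative else ll

observed = np.array([1, 0, 1, 0])
predicted = np.array([0.7, 0.3, 0.6, 0.4])
bernoulli_nll(predicted, observed)  # ≈ 1.7350011354094463
```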

categorical(predicted=None, observed=None, negative=True, **kwargs)

Compute the log likelihood of the predicted values given the observed values for categorical data.

Categorical(y|p) = p_y

Parameters:

Name Type Description Default
predicted array-like

The predicted values. It must have the same shape as observed. See Notes for more details.

None
observed array-like

The observed values. It must have the same shape as predicted. See Notes for more details.

None
negative bool, optional

Flag indicating whether to return the negative log likelihood.

True

Returns:

Type Description
float

The summed log likelihood or negative log likelihood.

Notes

predicted and observed must describe the same trials. observed is a vector of integers starting from 0 (the first possible response), where each integer indexes the response observed on that trial. If there are two choice options and n trials, observed has shape (n,) and predicted has shape (n, 2); on each trial, the likelihood is the predicted probability in the column indexed by the observed value.

Examples:

>>> import numpy as np
>>> observed = np.array([0, 1, 0, 1])
>>> predicted = np.array([[0.7, 0.3], [0.3, 0.7], [0.6, 0.4], [0.4, 0.6]])
>>> LogLikelihood.categorical(predicted, observed)
1.7350011354094463
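The example above can be reproduced with a short numpy sketch that indexes each trial's predicted probability by the observed response:

```python
import numpy as np

def categorical_nll(predicted, observed, negative=True):
    p = np.asarray(predicted, dtype=float)
    y = np.asarray(observed, dtype=int)
    # Pick out p_y: the predicted probability of the observed response
    # on each trial, then sum the logs.
    ll = np.sum(np.log(p[np.arange(len(y)), y]))
    return -ll if negative else ll

observed = np.array([0, 1, 0, 1])
predicted = np.array([[0.7, 0.3], [0.3, 0.7], [0.6, 0.4], [0.4, 0.6]])
categorical_nll(predicted, observed)  # ≈ 1.7350011354094463
```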

continuous(predicted, observed, negative=True, **kwargs)

Compute the log likelihood of the predicted values given the observed values for continuous data.

Parameters:

Name Type Description Default
predicted array-like

The predicted values.

required
observed array-like

The observed values.

required
negative bool, optional

Flag indicating whether to return the negative log likelihood.

True

Returns:

Type Description
float

The summed log likelihood or negative log likelihood.

Examples:

>>> import numpy as np
>>> observed = np.array([1, 0, 1, 0])
>>> predicted = np.array([0.7, 0.3, 0.6, 0.4])
>>> LogLikelihood.continuous(predicted, observed)
1.7350011354094463

cpm.optimisation.minimise.Distance()

MSE(predicted, observed, **kwargs)

Compute the mean squared error (MSE).

Parameters:

Name Type Description Default
predicted array-like

The predicted values.

required
observed array-like

The observed values.

required

Returns:

Type Description
float

The mean squared error.

SSE(predicted, observed, **kwargs)

Compute the sum of squared errors (SSE).

Parameters:

Name Type Description Default
predicted array-like

The predicted values.

required
observed array-like

The observed values.

required

Returns:

Type Description
float

The sum of squared errors.
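Both distance measures reduce to a few lines of numpy; the arrays below are illustrative:

```python
import numpy as np

def sse(predicted, observed):
    # Sum of squared errors between predictions and observations.
    return float(np.sum((np.asarray(predicted) - np.asarray(observed)) ** 2))

def mse(predicted, observed):
    # Mean squared error: SSE divided by the number of observations.
    return sse(predicted, observed) / np.asarray(observed).size

observed = np.array([1.0, 0.0, 1.0, 0.0])
predicted = np.array([0.7, 0.3, 0.6, 0.4])
sse(predicted, observed)  # 0.3**2 + 0.3**2 + 0.4**2 + 0.4**2 = 0.5
mse(predicted, observed)  # 0.5 / 4 = 0.125
```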

cpm.optimisation.minimise.Bayesian()

AIC(likelihood, n, k, **kwargs)

Calculate the Akaike Information Criterion (AIC).

Parameters:

Name Type Description Default
likelihood float

The log likelihood value.

required
n int

The number of data points.

required
k int

The number of parameters.

required

Returns:

Type Description
float

The AIC value.

BIC(likelihood, n, k, **kwargs)

Calculate the Bayesian Information Criterion (BIC).

Parameters:

Name Type Description Default
likelihood float

The log likelihood value.

required
n int

The number of data points.

required
k int

The number of parameters.

required

Returns:

Type Description
float

The BIC value.
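The standard information-criterion formulas are AIC = 2k - 2 log L and BIC = k ln n - 2 log L. The sketch below assumes the likelihood argument is the (positive-signed) log likelihood; the library's exact sign convention may differ:

```python
import numpy as np

# Standard information-criterion formulas; the assumption that the first
# argument is the positive-signed log likelihood is ours.
def aic(log_likelihood, n, k):
    return 2 * k - 2 * log_likelihood  # n is unused by AIC itself

def bic(log_likelihood, n, k):
    return k * np.log(n) - 2 * log_likelihood

# Using the Bernoulli example's summed log likelihood of about -1.735
# over n = 4 trials with k = 1 free parameter:
aic(-1.7350011354094463, n=4, k=1)  # ≈ 5.4700
bic(-1.7350011354094463, n=4, k=1)  # ≈ 4.8563
```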