cpm.optimisation
cpm.optimisation.DifferentialEvolution(model=None, data=None, minimisation=minimise.LogLikelihood.bernoulli, prior=False, parallel=False, cl=None, libraries=['numpy', 'pandas'], ppt_identifier=None, display=False, **kwargs)
Class representing the Differential Evolution optimization algorithm.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | cpm.generators.Wrapper | The model to be optimized. | None |
data | pd.DataFrame, pd.DataFrameGroupBy, list | The data used for optimization. If a pd.DataFrame, it is grouped by the ppt_identifier. | None |
minimisation | function | The loss function for the objective minimization function. | minimise.LogLikelihood.bernoulli |
prior | bool | Whether to include priors in the optimisation. | False |
parallel | bool | Whether to use parallel processing. | False |
cl | int | The number of cores to use for parallel processing. | None |
libraries | list, optional | The libraries to import for parallel processing. | ['numpy', 'pandas'] |
ppt_identifier | str | The key in the participant data dictionary that contains the participant identifier. | None |
**kwargs | dict | Additional keyword arguments. | {} |
Notes
The data parameter must contain all input to the model, including the observed data. It can be a pandas DataFrame, a pandas DataFrameGroupBy object, or a list of dictionaries. If it is a pandas DataFrame, the data are grouped by the participant identifier, ppt_identifier. If it is a pandas DataFrameGroupBy object, the groups are assumed to be participants. If it is a list of dictionaries, each dictionary should contain the data for a single participant, including information about the experiment and the results. The observed data for each participant should be included under the key or column 'observed', which must correspond, both in format and shape, to the 'dependent' variable calculated by the model Wrapper.
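The sketch below illustrates this layout with a long-format DataFrame and the documented constructor arguments. The model wrapper (here called wrapper) and the column names other than 'observed' and the participant identifier are placeholders; a real fit needs a cpm.generators.Wrapper built for your model.

```python
import numpy as np
import pandas as pd

from cpm.optimisation import DifferentialEvolution, minimise

# Illustrative long-format data: one row per trial, with a binary
# 'observed' column matching the model's 'dependent' output in shape.
data = pd.DataFrame({
    "ppt": np.repeat([1, 2], 4),            # participant identifier
    "stimulus": [1, 2, 1, 2, 1, 2, 1, 2],   # placeholder model input
    "observed": [1, 0, 1, 1, 0, 1, 0, 0],   # observed binary responses
})

# `wrapper` is a placeholder for a cpm.generators.Wrapper around your model.
fit = DifferentialEvolution(
    model=wrapper,
    data=data,                # grouped by ppt_identifier internally
    minimisation=minimise.LogLikelihood.bernoulli,
    ppt_identifier="ppt",
)
fit.optimise()                # fit each participant
estimates = fit.export()      # fitted parameters as a pandas.DataFrame
```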
export(details=False)
Exports the optimization results and fitted parameters as a pandas.DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
details | bool | Whether to include the various metrics related to the optimisation routine in the output. | False |
Returns:
Type | Description |
---|---|
pandas.DataFrame | A pandas DataFrame containing the optimization results and fitted parameters. If details is True, metrics from the optimisation routine are included as well. |
Notes
The DataFrame will not contain the population and population_energies keys from the optimization details. If you want to investigate them, please use the details attribute.
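A quick sketch of the difference, continuing the fit object from the example above:

```python
estimates = fit.export()              # fitted parameters only
detailed = fit.export(details=True)   # adds optimiser metrics, but still omits
                                      # population and population_energies
raw = fit.details                     # full optimiser output, including those keys
```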
optimise()
Performs the optimization process.
Returns:
Type | Description |
---|---|
None |  |
reset()
Resets the optimization results and fitted parameters.
Returns: - None
cpm.optimisation.Fmin(model=None, data=None, initial_guess=None, minimisation=None, cl=None, parallel=False, libraries=['numpy', 'pandas'], prior=False, number_of_starts=1, ppt_identifier=None, display=False, **kwargs)
Class representing the Fmin search (unbounded) optimization algorithm using a downhill simplex.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | cpm.generators.Wrapper | The model to be optimized. | None |
data | pd.DataFrame, pd.DataFrameGroupBy, list | The data used for optimization. If a pd.DataFrame, it is grouped by the ppt_identifier. | None |
minimisation | function | The loss function for the objective minimization function. | None |
prior | bool | Whether to include the prior in the optimization. | False |
number_of_starts | int | The number of random initialisations for the optimization. | 1 |
initial_guess | list or array-like | The initial guess for the optimization. | None |
parallel | bool | Whether to use parallel processing. | False |
cl | int | The number of cores to use for parallel processing. | None |
libraries | list, optional | The libraries to import for parallel processing. | ['numpy', 'pandas'] |
ppt_identifier | str | The key in the participant data dictionary that contains the participant identifier. | None |
**kwargs | dict | Additional keyword arguments. | {} |
Notes
The data parameter must contain all input to the model, including the observed data. It can be a pandas DataFrame, a pandas DataFrameGroupBy object, or a list of dictionaries. If it is a pandas DataFrame, the data are grouped by the participant identifier, ppt_identifier. If it is a pandas DataFrameGroupBy object, the groups are assumed to be participants. If it is a list of dictionaries, each dictionary should contain the data for a single participant, including information about the experiment and the results. The observed data for each participant should be included under the key or column 'observed', which must correspond, both in format and shape, to the 'dependent' variable calculated by the model Wrapper.
The optimization process is repeated number_of_starts times, and only the output of the best-fitting start is stored.
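A minimal multi-start sketch, under the same placeholder assumptions as the DifferentialEvolution example above (a wrapper built with cpm.generators.Wrapper and a long-format data frame with 'observed' and 'ppt' columns); the choice of loss function is illustrative:

```python
from cpm.optimisation import Fmin, minimise

fit = Fmin(
    model=wrapper,                                  # placeholder cpm.generators.Wrapper
    data=data,                                      # long-format data with 'observed' and 'ppt'
    minimisation=minimise.LogLikelihood.bernoulli,  # illustrative loss function
    number_of_starts=5,                             # five random initialisations per participant
    ppt_identifier="ppt",
)
fit.optimise()            # only the best-fitting start is kept
results = fit.export()
```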
export()
Exports the optimization results and fitted parameters as a pandas.DataFrame.
Returns:
Type | Description |
---|---|
pandas.DataFrame | A pandas DataFrame containing the optimization results and fitted parameters. |
optimise()
Performs the optimization process.
Returns: - None
reset(initial_guess=True)
Resets the optimization results and fitted parameters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
initial_guess | bool, optional | Whether to reset the initial guess (generates a new set of random numbers within parameter bounds). | True |
Returns:
Type | Description |
---|---|
None |  |
cpm.optimisation.FminBound(model=None, data=None, initial_guess=None, number_of_starts=1, minimisation=None, cl=None, parallel=False, libraries=['numpy', 'pandas'], prior=False, ppt_identifier=None, display=False, **kwargs)
Class representing the Fmin search (bounded) optimization algorithm using the L-BFGS-B method.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | cpm.generators.Wrapper | The model to be optimized. | None |
data | pd.DataFrame, pd.DataFrameGroupBy, list | The data used for optimization. If a pd.DataFrame, it is grouped by the ppt_identifier. | None |
minimisation | function | The loss function for the objective minimization function. | None |
prior | bool | Whether to include the prior in the optimization. | False |
number_of_starts | int | The number of random initialisations for the optimization. | 1 |
initial_guess | list or array-like | The initial guess for the optimization. | None |
parallel | bool | Whether to use parallel processing. | False |
cl | int | The number of cores to use for parallel processing. | None |
libraries | list, optional | The libraries to import for parallel processing. | ['numpy', 'pandas'] |
ppt_identifier | str | The key in the participant data dictionary that contains the participant identifier. | None |
**kwargs | dict | Additional keyword arguments. | {} |
Notes
The data parameter must contain all input to the model, including the observed data. It can be a pandas DataFrame, a pandas DataFrameGroupBy object, or a list of dictionaries. If it is a pandas DataFrame, the data are grouped by the participant identifier, ppt_identifier. If it is a pandas DataFrameGroupBy object, the groups are assumed to be participants. If it is a list of dictionaries, each dictionary should contain the data for a single participant, including information about the experiment and the results. The observed data for each participant should be included under the key or column 'observed', which must correspond, both in format and shape, to the 'dependent' variable calculated by the model Wrapper.
The optimization process is repeated number_of_starts times, and only the output of the best-fitting start is stored.
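A sketch of a parallelised fit, under the same placeholder assumptions as above; the core count is arbitrary:

```python
from cpm.optimisation import FminBound, minimise

fit = FminBound(
    model=wrapper,                   # placeholder cpm.generators.Wrapper
    data=data,                       # long-format data with 'observed' and 'ppt'
    minimisation=minimise.LogLikelihood.bernoulli,
    number_of_starts=3,
    parallel=True,                   # fit participants in parallel
    cl=4,                            # number of cores to use
    libraries=["numpy", "pandas"],   # imported for the parallel workers
    ppt_identifier="ppt",
)
fit.optimise()
results = fit.export()
```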
export()
Exports the optimization results and fitted parameters as a pandas.DataFrame.
Returns:
Type | Description |
---|---|
pandas.DataFrame | A pandas DataFrame containing the optimization results and fitted parameters. |
optimise(display=True)
Performs the optimization process.
Returns: - None
reset(initial_guess=True)
Resets the optimization results and fitted parameters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
initial_guess | bool, optional | Whether to reset the initial guess (generates a new set of random numbers within parameter bounds). | True |
Returns:
Type | Description |
---|---|
None |  |
cpm.optimisation.Minimize(model=None, data=None, initial_guess=None, minimisation=None, method='Nelder-Mead', cl=None, parallel=False, libraries=['numpy', 'pandas'], prior=False, number_of_starts=1, ppt_identifier=None, display=False, **kwargs)
Class representing scipy's Minimize algorithm wrapped for subject-level parameter estimations.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | cpm.generators.Wrapper | The model to be optimized. | None |
data | pd.DataFrame, pd.DataFrameGroupBy, list | The data used for optimization. If a pd.DataFrame, it is grouped by the ppt_identifier. | None |
minimisation | function | The loss function for the objective minimization function. | None |
number_of_starts | int | The number of random initialisations for the optimization. | 1 |
initial_guess | list or array-like | The initial guess for the optimization. | None |
parallel | bool | Whether to use parallel processing. | False |
cl | int | The number of cores to use for parallel processing. | None |
libraries | list, optional | The libraries to import for parallel processing. | ['numpy', 'pandas'] |
ppt_identifier | str | The key in the participant data dictionary that contains the participant identifier. | None |
**kwargs | dict | Additional keyword arguments. | {} |
Notes
The data parameter must contain all input to the model, including the observed data. It can be a pandas DataFrame, a pandas DataFrameGroupBy object, or a list of dictionaries. If it is a pandas DataFrame, the data are grouped by the participant identifier, ppt_identifier. If it is a pandas DataFrameGroupBy object, the groups are assumed to be participants. If it is a list of dictionaries, each dictionary should contain the data for a single participant, including information about the experiment and the results. The observed data for each participant should be included under the key or column 'observed', which must correspond, both in format and shape, to the 'dependent' variable calculated by the model Wrapper.
The optimization process is repeated number_of_starts times, and only the output of the best-fitting start is stored.
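Because this class wraps scipy's minimize routine, the method argument in the constructor (default 'Nelder-Mead') is presumably passed through to scipy. A sketch under the same placeholder assumptions as above:

```python
from cpm.optimisation import Minimize, minimise

fit = Minimize(
    model=wrapper,                   # placeholder cpm.generators.Wrapper
    data=data,                       # long-format data with 'observed' and 'ppt'
    minimisation=minimise.LogLikelihood.bernoulli,
    method="Nelder-Mead",            # presumably forwarded to scipy.optimize.minimize
    number_of_starts=2,
    ppt_identifier="ppt",
)
fit.optimise()
results = fit.export()
```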
export()
Exports the optimization results and fitted parameters as a pandas.DataFrame.
Returns:
Type | Description |
---|---|
pandas.DataFrame | A pandas DataFrame containing the optimization results and fitted parameters. |
optimise()
Performs the optimization process.
Returns: - None
reset(initial_guess=True)
Resets the optimization results and fitted parameters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
initial_guess | bool, optional | Whether to reset the initial guess (generates a new set of random numbers within parameter bounds). | True |
Returns:
Type | Description |
---|---|
None |  |
cpm.optimisation.Bads(model=None, data=None, minimisation=minimise.LogLikelihood.continuous, prior=False, number_of_starts=1, initial_guess=None, parallel=False, cl=None, libraries=['numpy', 'pandas'], ppt_identifier=None, **kwargs)
Class representing the Bayesian Adaptive Direct Search (BADS) optimization algorithm.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | cpm.generators.Wrapper | The model to be optimized. | None |
data | pd.DataFrame, pd.DataFrameGroupBy, list | The data used for optimization. If a pd.DataFrame, it is grouped by the ppt_identifier. | None |
minimisation | function | The loss function for the objective minimization function. | minimise.LogLikelihood.continuous |
prior | bool | Whether to include the prior in the optimization. | False |
number_of_starts | int | The number of random initialisations for the optimization. | 1 |
initial_guess | list or array-like | The initial guess for the optimization. | None |
parallel | bool | Whether to use parallel processing. | False |
cl | int | The number of cores to use for parallel processing. | None |
libraries | list, optional | The libraries required for parallel processing. | ['numpy', 'pandas'] |
ppt_identifier | str | The key in the participant data dictionary that contains the participant identifier. | None |
**kwargs | dict | Additional keyword arguments. | {} |
Notes
The data parameter must contain all input to the model, including the observed data. It can be a pandas DataFrame, a pandas DataFrameGroupBy object, or a list of dictionaries. If it is a pandas DataFrame, the data are grouped by the participant identifier, ppt_identifier. If it is a pandas DataFrameGroupBy object, the groups are assumed to be participants. If it is a list of dictionaries, each dictionary should contain the data for a single participant, including information about the experiment and the results. The observed data for each participant should be included under the key or column 'observed', which must correspond, both in format and shape, to the 'dependent' variable calculated by the model Wrapper.
The optimization process is repeated number_of_starts times, and only the output of the best-fitting start is stored.
The BADS algorithm is designed to handle both deterministic and noisy (stochastic) target functions. A deterministic target function returns exactly the same probability value for a given dataset and proposed set of parameter values; by contrast, a stochastic target function returns varying probability values for the same input (data and parameters). The vast majority of models use a deterministic target function. We recommend that users make this explicit to BADS by providing an options dictionary that includes the key uncertainty_handling set to False. Please see the BADS options documentation for more details.
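A sketch of passing that option, under the same placeholder assumptions as above and assuming that keyword arguments such as options are forwarded to the underlying BADS routine:

```python
from cpm.optimisation import Bads, minimise

fit = Bads(
    model=wrapper,                   # placeholder cpm.generators.Wrapper
    data=data,                       # long-format data with 'observed' and 'ppt'
    minimisation=minimise.LogLikelihood.continuous,
    ppt_identifier="ppt",
    # Declare the target function deterministic; assumes this options dict
    # is forwarded to BADS via **kwargs.
    options={"uncertainty_handling": False},
)
fit.optimise()
results = fit.export()
```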
export()
Exports the optimization results and fitted parameters as a pandas.DataFrame.
Returns:
Type | Description |
---|---|
pandas.DataFrame | A pandas DataFrame containing the optimization results and fitted parameters. |
optimise()
Performs the optimization process.
Returns: - None
reset(initial_guess=True)
Resets the optimization results and fitted parameters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
initial_guess | bool, optional | Whether to reset the initial guess (generates a new set of random numbers within parameter bounds). | True |
Returns:
Type | Description |
---|---|
None |  |
minimise
cpm.optimisation.minimise.LogLikelihood()
bernoulli(predicted=None, observed=None, negative=True, **kwargs)
Compute the log likelihood of the predicted values given the observed values for Bernoulli data.
Bernoulli(y|p) = p if y = 1 and 1 - p if y = 0
Parameters:
Name | Type | Description | Default |
---|---|---|---|
predicted | array-like | The predicted values. It must have the same shape as observed. | None |
observed | array-like | The observed values. It must have the same shape as predicted. | None |
negative | bool, optional | Flag indicating whether to return the negative log likelihood. | True |
Returns:
Type | Description |
---|---|
float | The summed log likelihood or negative log likelihood. |
Notes
predicted and observed must have the same shape. observed is a binary variable, so it can only take the values 0 or 1. predicted must contain values between 0 and 1. Values are clipped to avoid log(0) and log(1). If any non-finite values are encountered, the corresponding log likelihood is set to np.log(1e-100).
Examples:
>>> import numpy as np
>>> observed = np.array([1, 0, 1, 0])
>>> predicted = np.array([0.7, 0.3, 0.6, 0.4])
>>> LogLikelihood.bernoulli(predicted, observed)
1.7350011354094463
categorical(predicted=None, observed=None, negative=True, **kwargs)
Compute the log likelihood of the predicted values given the observed values for categorical data.
Categorical(y|p) = p_y
Parameters:
Name | Type | Description | Default |
---|---|---|---|
predicted | array-like | The predicted values. It must correspond to observed. | None |
observed | array-like | The observed values. It must correspond to predicted. | None |
negative | bool, optional | Flag indicating whether to return the negative log likelihood. | True |
Returns:
Type | Description |
---|---|
float | The log likelihood or negative log likelihood. |
Notes
predicted and observed must describe the same trials: observed is a vector of integers starting from 0 (the first possible response), where each integer indexes the response observed on that trial, and each row of predicted contains the probabilities of the possible responses for that trial. If there are two choice options, observed has a shape of (n,) and predicted has a shape of (n, 2); the likelihood of each trial is the predicted probability in the column corresponding to the observed value.
Examples:
>>> import numpy as np
>>> observed = np.array([0, 1, 0, 1])
>>> predicted = np.array([[0.7, 0.3], [0.3, 0.7], [0.6, 0.4], [0.4, 0.6]])
>>> LogLikelihood.categorical(predicted, observed)
1.7350011354094463
continuous(predicted, observed, negative=True, **kwargs)
Compute the log likelihood of the predicted values given the observed values for continuous data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
predicted | array-like | The predicted values. | required |
observed | array-like | The observed values. | required |
negative | bool, optional | Flag indicating whether to return the negative log likelihood. | True |
Returns:
Type | Description |
---|---|
float | The summed log likelihood or negative log likelihood. |
Examples:
>>> import numpy as np
>>> observed = np.array([1, 0, 1, 0])
>>> predicted = np.array([0.7, 0.3, 0.6, 0.4])
>>> LogLikelihood.continuous(predicted, observed)
1.7350011354094463
cpm.optimisation.minimise.Distance()
MSE(predicted, observed, **kwargs)
Compute the mean squared error (MSE).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
predicted | array-like | The predicted values. | required |
observed | array-like | The observed values. | required |
Returns:
Type | Description |
---|---|
float | The mean squared error. |
SSE(predicted, observed, **kwargs)
Compute the sum of squared errors (SSE).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
predicted |
array-like
|
The predicted values. |
required |
observed |
array-like
|
The observed values. |
required |
Returns:
Type | Description |
---|---|
float
|
The sum of squared errors. |
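A minimal illustration of both measures, following the calling style of the log-likelihood examples above; the hand computations use the standard definitions, which these functions are assumed to match:

```python
import numpy as np
from cpm.optimisation import minimise

observed = np.array([1.0, 0.0, 1.0, 0.0])
predicted = np.array([0.7, 0.3, 0.6, 0.4])

sse = minimise.Distance.SSE(predicted, observed)   # sum of squared errors
mse = minimise.Distance.MSE(predicted, observed)   # mean squared error

# Hand computation of the same quantities:
sse_by_hand = np.sum((predicted - observed) ** 2)   # 0.09 + 0.09 + 0.16 + 0.16 = 0.5
mse_by_hand = np.mean((predicted - observed) ** 2)  # 0.5 / 4 = 0.125
```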
cpm.optimisation.minimise.Bayesian()
AIC(likelihood, n, k, **kwargs)
Calculate the Akaike Information Criterion (AIC).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
likelihood | float | The log likelihood value. | required |
n | int | The number of data points. | required |
k | int | The number of parameters. | required |
Returns:
Type | Description |
---|---|
float | The AIC value. |
BIC(likelihood, n, k, **kwargs)
Calculate the Bayesian Information Criterion (BIC).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
likelihood | float | The log likelihood value. | required |
n | int | The number of data points. | required |
k | int | The number of parameters. | required |
Returns:
Type | Description |
---|---|
float | The BIC value. |
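Both criteria follow the standard definitions AIC = 2k - 2*log(L) and BIC = k*log(n) - 2*log(L). The sketch below computes them by hand; whether the likelihood argument of Bayesian.AIC and Bayesian.BIC expects the log likelihood or its negative is not stated here, so only the hand computation is shown:

```python
import numpy as np

# Worked example with the standard formulas
# AIC = 2*k - 2*log(L) and BIC = k*log(n) - 2*log(L).
log_likelihood = -1.735   # e.g. the Bernoulli example above (negative LL = 1.735)
n = 4                     # number of data points
k = 1                     # number of free parameters

aic = 2 * k - 2 * log_likelihood          # 2 + 3.47 = 5.47
bic = k * np.log(n) - 2 * log_likelihood  # 1.386 + 3.47 ≈ 4.86
```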