cpm.applications

Reinforcement Learning

cpm.applications.reinforcement_learning.RLRW(data=None, dimensions=2, parameters_settings=None, generate=False)

Bases: Wrapper

The class implements a simple reinforcement learning model for multi-armed bandit tasks, using a standard prediction-error update rule and a Softmax decision rule. The model is an n-dimensional, k-armed implementation of Model 3 from Wilson and Collins (2019), which largely corresponds to the model presented by Sutton & Barto (2021) in Chapter 14.

Parameters:
  • data

    The data to be fit by the model. The data must contain columns for the choice and reward for each dimension. See Notes for more information on which columns you should include.

  • dimensions

    The number of distinct stimuli present in the data.

  • parameters_settings

    The parameters to be fit by the model. The parameters must be specified as a list of lists, with each list containing the value, lower, and upper bounds of the parameter. See Notes for more information on how to specify parameters and for the default settings.

Returns:
  • Wrapper

    A cpm.generators.Wrapper object.

Examples:

>>> import numpy
>>> import pandas
>>> from cpm.applications import RLRW
>>> from cpm.datasets import load_bandit_data
>>> twoarm = load_bandit_data()
>>> model = RLRW(data=twoarm, dimensions=4)
>>> model.run()

Notes

The model implementation uses two parameters:

  • alpha: the learning rate, which determines how much the model updates its values based on the prediction error.
  • temperature: the inverse temperature, which determines choice stochasticity, that is, how sensitive the model is to value differences.
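
As a rough illustration of how the two parameters interact, the sketch below applies a delta-rule update and a softmax to a two-armed bandit. It is a minimal stand-alone example, not the code used inside RLRW; the function names `softmax` and `update_value` and the initial values of zero are assumptions made for illustration.

```python
import math

def softmax(values, temperature):
    """Map values to choice probabilities; a larger temperature makes choices more deterministic."""
    scaled = [temperature * v for v in values]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def update_value(values, choice, reward, alpha):
    """Delta rule: move the chosen option's value towards the received reward."""
    prediction_error = reward - values[choice]
    updated = list(values)
    updated[choice] += alpha * prediction_error
    return updated

values = [0.0, 0.0]  # assumed starting values for two arms
policy_before = softmax(values, temperature=5.0)
values = update_value(values, choice=1, reward=1.0, alpha=0.5)
policy_after = softmax(values, temperature=5.0)
```

After a single rewarded choice of the second arm, its value rises by `alpha` times the prediction error, and the softmax shifts probability towards it.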

Data must contain the following columns:

  • choice: the choice of the participant from the available options, starting from 0.
  • arm_n: the stimulus identifier for each option (arms in the bandit task), where n is the option available on a given trial. If there is more than one option, the stimulus identifiers should be specified as separate columns named arm_1, arm_2, arm_3, etc. or arm_left, arm_middle, arm_right, etc.
  • reward_n: the reward given for each option, where n is the corresponding arm of the bandit available on a given trial. If there is more than one option, the rewards should be specified as separate columns named reward_1, reward_2, reward_3, etc.
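
The column layout above can be sketched with plain Python dictionaries, one per trial. The stimulus identifiers and rewards below are made up, and `missing_columns` is a hypothetical helper rather than part of cpm; in practice the same rows would be held in a pandas DataFrame.

```python
def missing_columns(trial, n_options):
    """List the required column names that are absent from one trial."""
    required = {"choice"}
    for n in range(1, n_options + 1):
        required.update({f"arm_{n}", f"reward_{n}"})
    return sorted(required - set(trial))

# Two trials of a task with two options per trial (illustrative values only).
trials = [
    {"arm_1": 1, "arm_2": 2, "reward_1": 1, "reward_2": 0, "choice": 0},
    {"arm_1": 3, "arm_2": 4, "reward_1": 0, "reward_2": 1, "choice": 1},
]
problems = [missing_columns(trial, n_options=2) for trial in trials]
```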

parameters_settings must be a 2D array, like [[0.5, 0, 1], [5, 1, 10]], where the first list specifies the alpha parameter and the second list specifies the temperature parameter. The first element of each list is the initial value of the parameter, the second element is the lower bound, and the third element is the upper bound. The default settings are 0.5 for alpha with a lower bound of 0 and an upper bound of 1, and 5 for temperature with a lower bound of 1 and an upper bound of 10.
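
A quick sanity check of such a nested list might look like the sketch below. `check_parameters_settings` is a hypothetical helper written for this example and is not provided by cpm; it only verifies that each initial value lies within its bounds.

```python
def check_parameters_settings(settings, names=("alpha", "temperature")):
    """Return {name: (value, lower, upper)} after checking that each value is within its bounds."""
    if len(settings) != len(names):
        raise ValueError(f"expected {len(names)} parameters, got {len(settings)}")
    checked = {}
    for name, (value, lower, upper) in zip(names, settings):
        if not lower <= value <= upper:
            raise ValueError(f"{name}: initial value {value} is outside [{lower}, {upper}]")
        checked[name] = (value, lower, upper)
    return checked

# The default settings described above.
defaults = check_parameters_settings([[0.5, 0, 1], [5, 1, 10]])
```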

References

Wilson, R. C., & Collins, A. G. E. (2019). Ten simple rules for the computational modeling of behavioral data. eLife, 8, e49547.

Decision Making

cpm.applications.decision_making.PTSM(data=None, parameters_settings=None, generate=False, utility_curve=None, weighting='tk')

Bases: Wrapper

A simplified version of the Prospect Theory-based Softmax Model (PTSM) for decision-making tasks based on Tversky & Kahneman (1992), similar to the initial publication of the theory in Kahneman & Tversky (1979). It differs from cpm.applications.decision_making.PTSM2025 and cpm.applications.decision_making.PTSM1992 in that it does not use different utility and weight curvature parameters for gains and losses.

Parameters:
  • data (DataFrame, default: None ) –

    The data, where each row is a trial and each column is an input to the model. Expected to have columns: 'safe_magnitudes', 'risky_magnitudes', 'risky_probability', 'observed'.

  • parameters_settings (dict, default: None ) –

    A dictionary containing the initial values and bounds for the model parameters. Each key must correspond to the name of the parameter, and contain a list in the form of [initial, lower_bound, upper_bound]. If not provided, default values are used. See Notes.

  • utility_curve (callable, default: None ) –

    A callable function that defines the utility curve. If provided, it overrides the default power function used for utility transformations. Its first argument should be the magnitude, and the second argument should be the curvature parameter (alpha). If None, a power function is used, see Notes.

  • weighting (str, default: 'tk' ) –

    The probability weighting function to use. Options include:

    - "power": use a simple power function (p^gamma)
    - "tk": use the Tversky–Kahneman (1992) weighting function.
    

    See cpm.models.activation.ProspectUtility for explanation and alternatives.

Returns:
  • Wrapper

    An instance of the PTSM model, which can be used to fit data and generate predictions.
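
The `utility_curve` argument described above takes the magnitude first and the curvature parameter second. The sketch below shows one hypothetical callable with that signature, a sign-preserving logarithmic curve. It is only an illustration of the expected argument order; whether the model passes scalars or arrays to the callable is not stated here, so a vectorised version may be needed in practice.

```python
import math

def log_utility(magnitude, alpha):
    """Hypothetical utility curve: logarithmic compression that keeps the sign of the magnitude."""
    return math.copysign(math.log1p(alpha * abs(magnitude)), magnitude)

# It would be passed to the model as, for example:
#   PTSM(data=..., utility_curve=log_utility)
gain = log_utility(3.0, 1.0)
loss = log_utility(-3.0, 1.0)
```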

Notes

The model parameters are initialized with the following default values if not specified (values are in the form [initial, lower_bound, upper_bound]):

- `alpha`: [1.0, 1e-2, 5.0] (utility curvature for both gains and losses)
- `lambda_loss`: [1.0, 1e-2, 5.0] (loss sensitivity)
- `gamma`: [0.5, 1e-2, 5.0] (curvature for the weighting function for both gains and losses)
- `temperature`: [5.0, 1e-2, 15.0] (temperature parameter for softmax)

The priors for the parameters are set as follows:

- `alpha`: truncated normal with mean 1.0 and standard deviation 1.0.
- `lambda_loss`: truncated normal with mean 2.5 and standard deviation 1.0.
- `gamma`: truncated normal with mean 2.5 and standard deviation 1.0.
- `temperature`: truncated normal with mean 10.0 and standard deviation 5.0.

Model Specification

The model computes the subjective utility of the safe and risky options using a utility function, which can be either a power function or a user-defined utility curve. Following Tversky & Kahneman (1992), the subjective utility of an option is the sum of its outcome utilities, each weighted by a transformed probability:

\[ \mathcal{U}(o) = \sum_{i=1}^{n} w(p_i) \cdot u(x_i) \]

where \(w\) is a weighting function of the probability \(p\) of a potential outcome, and \(u\) is the utility function of the magnitude \(x\) of a potential outcome. The choice option is denoted by \(o\). If a utility curve is not provided, \(u\) is defined as a power function for both gains and losses, implemented after Equation 5 in Tversky & Kahneman (1992):

\[ u(x) = \begin{cases} x^\alpha & \text{if } x \geq 0 \\ -\lambda \cdot (-x)^\alpha & \text{if } x < 0 \end{cases} \]

where \(\alpha\) is the utility curvature parameter for both gains and losses, and \(\lambda\) is the loss aversion parameter. The weighting function is implemented after Equation 6 in Tversky & Kahneman (1992):

\[ w(p) = \frac{p^\gamma}{(p^\gamma + (1 - p)^\gamma)^{1/\gamma}} \]

where \(\gamma\) (gamma) is the discriminability parameter of the weighting function for both gains and losses. The model then applies the softmax function, with the temperature parameter denoted by \(\beta\), to compute the choice probabilities:

\[ p(o_i) = \frac{e^{\beta \cdot \mathcal{U}(o_i)}}{\sum_{j=1}^{n} e^{\beta \cdot \mathcal{U}(o_j)}} \]
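
The three equations above can be chained for a single safe-versus-risky trial as in the following sketch. It is a stand-alone illustration under stated assumptions, not the code used inside PTSM: the function names are invented here, and the risky option is assumed to pay its magnitude with probability `p_risky` and nothing otherwise.

```python
import math

def power_utility(x, alpha, lambda_loss):
    """u(x): power function for gains, scaled by lambda_loss for losses."""
    if x >= 0:
        return x ** alpha
    return -lambda_loss * (-x) ** alpha

def tk_weight(p, gamma):
    """w(p): Tversky-Kahneman (1992) probability weighting function."""
    numerator = p ** gamma
    return numerator / (numerator + (1 - p) ** gamma) ** (1 / gamma)

def softmax(utilities, temperature):
    """Softmax over utilities, computed in a numerically stable way."""
    scaled = [temperature * u for u in utilities]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def choice_probabilities(safe, risky, p_risky, alpha, lambda_loss, gamma, temperature):
    """Return [p(safe), p(risky)] for one trial."""
    u_safe = power_utility(safe, alpha, lambda_loss)  # certain outcome, w(1) = 1
    u_risk = tk_weight(p_risky, gamma) * power_utility(risky, alpha, lambda_loss)
    return softmax([u_safe, u_risk], temperature)

# Default initial parameter values from the Notes above.
policy = choice_probabilities(safe=5.0, risky=10.0, p_risky=0.5,
                              alpha=1.0, lambda_loss=1.0, gamma=0.5, temperature=5.0)
```

With `gamma` below 1 a probability of 0.5 is underweighted, so the safe option is preferred here even though both options have the same expected value.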

Model output

The model outputs the following trial-level information:

- `policy`: the softmax probabilities for each option.
- `dependent`: the probability of choosing the risky option.
- `observed`: the observed (participant's) choice (0 for safe, 1 for risky).
- `chosen`: the chosen option based on the softmax probabilities.
- `is_optimal`: whether the chosen option is optimal (1 if chosen option is objectively better, 0 otherwise).
- `objective_best`: the objectively better option (1 for risky, 0 for safe) determined by the objective evidence for each.
- `ev_safe`: the expected value of the safe option.
- `ev_risk`: the expected value of the risky option.
- `u_safe`: the utility of the safe option.
- `u_risk`: the utility of the risky option.

See Also

cpm.models.decision.Softmax : for mapping utilities to choice probabilities.

cpm.models.activation.ProspectUtility : for the Prospect Utility class that computes subjective utilities and weighted probabilities.

References

Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47(2), 263–291.

Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5, 297–323.

cpm.applications.decision_making.PTSM1992(data=None, parameters_settings=None, utility_curve=None, weighting='tk')

Bases: Wrapper

A Prospect Theory-based Softmax Model (PTSM) for decision-making tasks based on Tversky & Kahneman (1992), similar to the initial publication of the theory in Kahneman & Tversky (1979). It computes expected utility by combining transformed magnitudes and weighted probabilities, suitable for safe–risky decision paradigms.

The model computes objective EV internally (ev_safe vs. ev_risk) and outputs trial-level information (including whether the chosen option is optimal).

Additionally, the model accepts a "weighting" argument that determines which probability weighting function to use when computing the subjective weighting of risky probabilities.

Parameters:
  • data (DataFrame, default: None ) –

    The data, where each row is a trial and each column is an input to the model. Expected to have columns: 'safe_magnitudes', 'risky_magnitudes', 'risky_probability', 'observed'.

  • parameters_settings (dict, default: None ) –

    A dictionary containing the initial values and bounds for the model parameters. Each key must correspond to the name of the parameter, and contain a list in the form of [initial, lower_bound, upper_bound]. If not provided, default values are used. See Notes.

  • utility_curve (callable, default: None ) –

    A callable function that defines the utility curve. If provided, it overrides the default power function used for utility transformations. Its first argument should be the magnitude. The following variables are also passed to this function: alpha, beta and lambda_loss. If None, a power function is used, see Notes.

  • weighting (str, default: 'tk' ) –

    The probability weighting function to use. Options include:

    - "power": use a simple power function (p^gamma)
    - "tk": use the Tversky–Kahneman (1992) weighting function.
    

    See cpm.models.activation.ProspectUtility for explanation and alternatives.

Returns:
  • Wrapper

    An instance of the PTSM1992 model, which can be used to fit data and generate predictions.

Notes

The model parameters are initialized with the following default values if not specified (values are in the form [initial, lower_bound, upper_bound]):

- `alpha`: [1.0, 0, 5.0] (utility curvature for gains)
- `beta`: [1.0, 0, 5.0] (utility curvature for losses)
- `lambda_loss`: [1.0, 0, 5.0] (loss sensitivity)
- `gamma`: [0.5, 0.001, 5.0] (weighting function curvature for gains)
- `delta`: [0.5, 0.001, 5.0] (weighting function curvature for losses)
- `temperature`: [5.0, 0.001, 20.0] (temperature parameter for softmax)

The priors for the parameters are set as follows:

- `alpha`: truncated normal with mean 2.5 and standard deviation 1.0.
- `beta`: truncated normal with mean 2.5 and standard deviation 1.0.
- `lambda_loss`: truncated normal with mean 2.5 and standard deviation 1.0.
- `gamma`: truncated normal with mean 2.5 and standard deviation 1.0.
- `delta`: truncated normal with mean 0 and standard deviation 1.0.
- `temperature`: truncated normal with mean 10 and standard deviation 2.5.

Model Specification

The model computes the subjective utility of the safe and risky options using a utility function, which can be either a power function or a user-defined utility curve. Following Tversky & Kahneman (1992), the subjective utility of an option is the sum of its outcome utilities, each weighted by a transformed probability:

\[ \mathcal{U}(o) = \sum_{i=1}^{n} w(p_i) \cdot u(x_i) \]

where \(w\) is a weighting function of the probability \(p\) of a potential outcome, and \(u\) is the utility function of the magnitude \(x\) of a potential outcome. The choice option is denoted by \(o\). If a utility curve is not provided, \(u\) is defined as a power function for both gains and losses, implemented after Equation 5 in Tversky & Kahneman (1992):

\[ u(x) = \begin{cases} x^\alpha & \text{if } x \geq 0 \\ -\lambda \cdot (-x)^\beta & \text{if } x < 0 \end{cases} \]

where \(\alpha\) is the utility curvature parameter for gains, \(\beta\) is the curvature parameter for losses, and \(\lambda\) is the loss aversion parameter. The weighting function is implemented after Equation 6 in Tversky & Kahneman (1992):

\[ w^{+}(p) = \frac{p^\gamma}{(p^\gamma + (1 - p)^\gamma)^{1/\gamma}}, w^{-}(p) = \frac{p^\delta}{(p^\delta + (1 - p)^\delta)^{1/\delta}} \]

where \(\gamma\) (gamma) is the discriminability parameter of the weighting function for gains, and \(\delta\) (delta) is the discriminability parameter of the weighting function for losses.

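The model then applies the softmax function to compute the choice probabilities, where \(\tau\) denotes the temperature parameter (written as \(\tau\) here because \(\beta\) already denotes the loss curvature):

\[ p(o_i) = \frac{e^{\tau \cdot \mathcal{U}(o_i)}}{\sum_{j=1}^{n} e^{\tau \cdot \mathcal{U}(o_j)}} \]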
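
The separate gain and loss parameters can be illustrated with the sketch below, which picks the utility curvature and the weighting curvature according to the sign of the outcome. It is a stand-alone example with invented function names, not the code used inside PTSM1992, and it treats a magnitude of zero as a gain by assumption.

```python
def utility(x, alpha, beta, lambda_loss):
    """u(x): curvature alpha for gains, curvature beta and scaling lambda_loss for losses."""
    if x >= 0:
        return x ** alpha
    return -lambda_loss * (-x) ** beta

def tk_weight(p, curvature):
    """Tversky-Kahneman (1992) weighting; pass gamma for gains or delta for losses."""
    numerator = p ** curvature
    return numerator / (numerator + (1 - p) ** curvature) ** (1 / curvature)

def weighted_utility(x, p, alpha, beta, lambda_loss, gamma, delta):
    """w(p) * u(x), using w+ for gains and w- for losses."""
    curvature = gamma if x >= 0 else delta
    return tk_weight(p, curvature) * utility(x, alpha, beta, lambda_loss)

# Linear utility and weighting, with losses counted twice as heavily as gains.
params = dict(alpha=1.0, beta=1.0, lambda_loss=2.0, gamma=1.0, delta=1.0)
gain = weighted_utility(10.0, 0.5, **params)
loss = weighted_utility(-10.0, 0.5, **params)
```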

Model output

The model outputs the following trial-level information:

- `policy`: the softmax probabilities for each option.
- `dependent`: the probability of choosing the risky option.
- `observed`: the observed (participant's) choice (0 for safe, 1 for risky).
- `chosen`: the chosen option based on the softmax probabilities.
- `is_optimal`: whether the chosen option is optimal (1 if chosen option is objectively better, 0 otherwise).
- `objective_best`: the objectively better option (1 for risky, 0 for safe) determined by the objective evidence for each.
- `ev_safe`: the expected value of the safe option.
- `ev_risk`: the expected value of the risky option.
- `u_safe`: the utility of the safe option.
- `u_risk`: the utility of the risky option.

See Also

cpm.models.decision.Softmax : for mapping utilities to choice probabilities.

cpm.models.activation.ProspectUtility : for the Prospect Utility class that computes subjective utilities and weighted probabilities.

References

Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47(2), 263–291.

Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5, 297–323.

cpm.applications.decision_making.PTSM2025(data=None, parameters_settings=None, utility_curve=None, variant='alpha')

Bases: Wrapper

A Prospect Theory Softmax Model loosely based on Chew et al. (2019), incorporating a bias term (phi_gain / phi_loss) in the softmax function for gains and losses, a utility curvature parameter (alpha) for non-linear utility transformations, and an ambiguity aversion parameter (eta).

Parameters:
  • data (DataFrame, default: None ) –

    Data containing the trials to be modeled, where each row represents a trial in the experiment (a state), and each column represents a variable (e.g., safe_magnitudes, risky_magnitudes, risky_probability, ambiguity, observed variable).

  • parameters_settings (dict, default: None ) –

    A dictionary containing the initial values and bounds for the model parameters. Each key must correspond to the name of the parameter, and contain a list in the form of [initial, lower_bound, upper_bound]. If not provided, default values are used. See Notes.

  • utility_curve (callable, default: None ) –

    A callable function that defines the utility curve. If provided, it overrides the default power function used for utility transformations. Its first argument should be the magnitude, and the second argument should be the curvature parameter (alpha). If None, a power function is used, see Notes.

  • variant (str, default: 'alpha' ) –

    The variant of the model to use. Options are "alpha" for the full model with a non-linear curvature or "standard" for a simplified version without curvature. Default is "alpha".

Returns:
  • Wrapper

    An instance of the PTSM2025 model, which can be used to fit data and generate predictions.

Notes

The model parameters are initialized with the following default values if not specified (values are in the form [initial, lower_bound, upper_bound]):

- `eta`: [0.0, -0.49, 0.49] (ambiguity aversion)
- `phi_gain`: [0.0, -10.0, 10.0] (gain sensitivity)
- `phi_loss`: [0.0, -10.0, 10.0] (loss sensitivity)
- `temperature`: [5.0, 0.001, 20.0] (temperature parameter)
- `alpha`: [1.0, 0.001, 5.0] (utility curvature parameter)

The priors for the parameters are set as follows:

- `eta`: truncated normal with mean 0.0 and standard deviation 0.25.
- `phi_gain`: truncated normal with mean 0.0 and standard deviation 2.5.
- `phi_loss`: truncated normal with mean 0.0 and standard deviation 2.5.
- `temperature`: truncated normal with mean 10.0 and standard deviation 5.
- `alpha`: truncated normal with mean 1.0 and standard deviation 1.

Model Description

In what follows, we briefly describe the model's operations. First, the model calculates the subjective probability of the risky option, adjusting for ambiguity aversion using the parameter eta, denoted with \(\eta\). The subjective probability is computed as:

\[ p_{subjective} = p_{risky} - \eta \cdot ambiguity \]

where \(p_{risky}\) is the original probability of the risky choice and \(ambiguity\) is the ambiguity associated with the risky option, either 0 for non-ambiguous or 1 for ambiguous cases. The utility of the safe and risky options is then computed using a utility function, which can be either a power function or a user-defined utility curve. If a utility curve is not provided, the model uses the following power function with curvature parameter alpha, denoted with \(\alpha\):

\[ u(x) = \begin{cases} x^\alpha & \text{if } x \geq 0 \\ -|x|^\alpha & \text{if } x < 0 \end{cases} \]
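
The two steps described so far, the ambiguity adjustment and the power utility, can be written out as the following sketch. The function names are chosen for this example and are not part of cpm.

```python
def subjective_probability(p_risky, eta, ambiguity):
    """Shift the stated probability by eta on ambiguous trials (ambiguity is 0 or 1)."""
    return p_risky - eta * ambiguity

def power_utility(x, alpha):
    """Sign-preserving power function with curvature alpha."""
    if x >= 0:
        return x ** alpha
    return -(abs(x) ** alpha)

p_clear = subjective_probability(0.5, eta=0.2, ambiguity=0)
p_ambiguous = subjective_probability(0.5, eta=0.2, ambiguity=1)
u_loss = power_utility(-4.0, alpha=0.5)
```

A positive `eta` lowers the subjective probability on ambiguous trials only, while non-ambiguous trials are left unchanged.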

The model then applies loss aversion and gain sensitivity adjustments based on the sign of the risky choice magnitude. Here, the gain sensitivity phi_gain, denoted as \(\phi_{gain}\), is applied when the risky choice is positive, and the loss sensitivity phi_loss, denoted as \(\phi_{loss}\), is applied when the risky choice is negative. The adjusted probability of choosing the risky option, \(p(A_{risky})\), is computed using a softmax function:

\[ p(A_{risky}) = \frac{e^{\beta (u_{risky} + \phi_{t})}}{e^{\beta (u_{risky} + \phi_{t})} + e^{\beta u_{safe}}} \]

where \(\beta\) is the temperature parameter, \(u_{risky}\) is the utility of the risky option, \(u_{safe}\) is the utility of the safe option, and \(\phi_{t}\) is either \(\phi_{gain}\) or \(\phi_{loss}\), depending on the sign of the risky choice magnitude. Note that in Chew et al. (2019), the model only has a gambling bias term for the gain loss, which is added to the difference between the safe and risky utilities and only then transformed to a probability via a sigmoid function.

Furthermore, the model generates a response based on the computed probabilities, where the choice is sampled from a Bernoulli distribution with the computed policy as the probability of choosing the risky option.
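
The biased softmax and the Bernoulli response step can be sketched as follows. This is a minimal stand-alone illustration, not the code used inside PTSM2025; the function name is invented here, a risky magnitude of exactly zero is treated as a gain by assumption, and the fixed seed is only there to make the example reproducible.

```python
import math
import random

def p_risky_choice(u_risky, u_safe, risky_magnitude, phi_gain, phi_loss, temperature):
    """Two-option softmax with a bias term added to the risky utility."""
    phi = phi_gain if risky_magnitude >= 0 else phi_loss
    a = temperature * (u_risky + phi)
    b = temperature * u_safe
    peak = max(a, b)  # subtract the maximum to avoid overflow in exp
    exp_a, exp_b = math.exp(a - peak), math.exp(b - peak)
    return exp_a / (exp_a + exp_b)

unbiased = p_risky_choice(5.0, 5.0, 10.0, phi_gain=0.0, phi_loss=0.0, temperature=5.0)
biased = p_risky_choice(5.0, 5.0, 10.0, phi_gain=1.0, phi_loss=0.0, temperature=5.0)

# Bernoulli draw: 1 (risky) with probability `biased`, otherwise 0 (safe).
rng = random.Random(0)
choice = int(rng.random() < biased)
```

With equal utilities and no bias the two options are equally likely; a positive `phi_gain` tilts the choice towards the risky option on gain trials.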

Model Output

For each trial, the model outputs the following variables:

- `policy`: The computed probabilities for the risky options.
- `model_choice`: The model's predicted choice (0 for safe, 1 for risky).
- `real_choice`: The observed (participant's) choice from the data.
- `u_safe`: The utility of the safe option.
- `u_risk`: The utility of the risky option.
- `dependent`: The computed probability of a risky choice according to the model, which can be used for further analysis or fitting.

References

Chew, B., Hauser, T. U., Papoutsi, M., Magerkurth, J., Dolan, R. J., & Rutledge, R. B. (2019). Endogenous fluctuations in the dopaminergic midbrain drive behavioral choice variability. Proceedings of the National Academy of Sciences, 116(37), 18732–18737. https://doi.org/10.1073/pnas.1900872116

Metacognition

cpm.applications.signal_detection.EstimatorMetaD(data=None, bins=None, cl=None, parallel=False, libraries=['numpy', 'pandas'], prior=False, display=False, ppt_identifier=None, ignore_invalid=False, **kwargs)

Class to estimate metacognitive parameters using the meta-d' model proposed by Maniscalco and Lau (2012).

Parameters:
  • data ((DataFrame, DataFrameGroupBy), default: None ) –

    DataFrame containing the data to be analyzed. See Note below.

  • bins (int, default: None ) –

    Number of bins to use for binning the confidence ratings.

  • cl (int, default: None ) –

    Number of cores to use for parallel processing. If None, all available cores will be used.

  • parallel (bool, default: False ) –

    If True, parallel processing will be used.

  • libraries (list of str, default: ["numpy", "pandas"] ) –

    List of libraries to use for parallel processing in Jupyter. Default is ["numpy", "pandas"].

  • prior (bool, default: False ) –

    If True, the log likelihoods will incorporate prior density of parameters.

  • display (int, default: 0 ) –

    Level of algorithm's verbosity:

    - 0 (default): work silently.
    - 1: display a termination report.
    - 2: display progress during iterations.
    - 3: display progress during iterations (more complete report).

  • ppt_identifier (str, default: None ) –

    Identifier for participants in the data. If None, the default identifier will be used.

  • ignore_invalid (bool, default: False ) –

    If True, invalid confidence ratings will be ignored during binning. If False, an error will be raised if invalid ratings are found. We recommend setting this to False (Default).

  • **kwargs (additional keyword arguments, default: {} ) –

    Additional keyword arguments to be passed to the optimization function.

Returns:
  • An EstimatorMetaD object.

Note

The data DataFrame should contain the following columns:

  • 'participant': Identifier for each participant.
  • 'signal' (integer): Stimulus presented to the participant, for example, 0 for S1 and 1 for S2.
  • 'response' (integer): Participant's response to the stimulus.
  • 'confidence' (integer, float): Participant's confidence rating for their response.
  • 'accuracy' (integer): Accuracy of the participant's response. 0 = incorrect, 1 = correct.
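
To make the column layout concrete, the sketch below builds a few made-up trials for one participant and computes the ordinary type-1 sensitivity d' from the 'signal' and 'response' columns. This is standard signal detection arithmetic and only a small part of what the meta-d' fit involves; it is not the estimator's own code. The helper name is hypothetical, responses are assumed to use the same 0/1 coding as the signal, and the calculation fails if a hit or false-alarm rate is exactly 0 or 1.

```python
from statistics import NormalDist

def type1_dprime(trials):
    """d' = z(hit rate) - z(false-alarm rate), treating signal == 1 as the target."""
    target = [t for t in trials if t["signal"] == 1]
    noise = [t for t in trials if t["signal"] == 0]
    hit_rate = sum(t["response"] == 1 for t in target) / len(target)
    false_alarm_rate = sum(t["response"] == 1 for t in noise) / len(noise)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

# (signal, response, confidence) triples for one participant; values are illustrative only.
raw = [(1, 1, 4), (1, 1, 3), (1, 1, 2), (1, 0, 1),
       (0, 0, 4), (0, 0, 3), (0, 0, 2), (0, 1, 1)]
trials = [
    {"participant": "p01", "signal": s, "response": r,
     "confidence": c, "accuracy": int(s == r)}
    for s, r, c in raw
]
dprime = type1_dprime(trials)
```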

export()

Exports the optimization results and fitted parameters as a pandas.DataFrame.

Returns:
  • DataFrame

    A pandas DataFrame containing the optimization results and fitted parameters.

optimise()

Estimates the metacognitive parameters using the meta-d model. Here, we use a Trust-Region Constrained Optimization algorithm (Conn et al., 2000) to fit the model to the data. We use the trust-constr method from scipy.optimize.minimize to perform the optimization, and minimise the negative log-likelihood of the data given the model parameters. The optimization is performed for each participant in the data.

Notes

If you want to tune the behaviour of the optimization, you can do so by passing additional keyword arguments to the class constructor. See the scipy.optimize.minimize documentation for more details on the available options. By default, the optimization will use the trust-constr method with the default options specified in the scipy.optimize.minimize documentation.

References

Conn, A. R., Gould, N. I. M., & Toint, P. L. (2000). Trust Region Methods. Society for Industrial and Applied Mathematics. https://doi.org/10.1137/1.9780898719857