cpm.applications
Reinforcement Learning
cpm.applications.reinforcement_learning.RLRW(data=None, dimensions=2, parameters_settings=None, generate=False)
Bases: Wrapper
The class implements a simple reinforcement learning model for multi-armed bandit tasks using a standard prediction-error update rule and a Softmax decision rule. The model is an n-dimensional and k-armed implementation of model 3 from Wilson and Collins (2019), which largely corresponds to the model presented by Sutton & Barto (2021) in Chapter 14.
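The update and decision rules described above are, in standard notation (the symbols used here are assumed, not taken from the implementation):
\[
Q_{t+1}(a) = Q_{t}(a) + \alpha \big( r_{t} - Q_{t}(a) \big), \qquad
P(a) = \frac{e^{\tau\, Q_{t}(a)}}{\sum_{k} e^{\tau\, Q_{t}(k)}},
\]
where \(Q_{t}(a)\) is the value of arm \(a\) on trial \(t\), \(r_{t}\) is the obtained reward, \(\alpha\) is the learning rate, and \(\tau\) is the inverse temperature.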
Examples:
>>> import numpy
>>> import pandas
>>> from cpm.applications import RLRW
>>> from cpm.datasets import load_bandit_data
>>> twoarm = load_bandit_data()
>>> model = RLRW(data=twoarm, dimensions=4)
>>> model.run()
Notes
The model implementation uses two parameters:
- alpha: the learning rate, which determines how much the model updates its values based on the prediction error.
- temperature: the inverse temperature, which determines choice stochasticity, i.e. how sensitive the model is to value differences.
Data must contain the following columns:
- choice: the choice of the participant from the available options, starting from 0.
- arm_n: the stimulus identifier for each option (arms in the bandit task), where n is the option available on a given trial. If more than one option is available, the stimulus identifiers should be specified as separate columns, such as arm_1, arm_2, arm_3, etc., or arm_left, arm_middle, arm_right, etc.
- reward_n: the reward given for each option, where n is the corresponding arm of the bandit available on a given trial. If more than one option is available, the rewards should be specified as separate columns, such as reward_1, reward_2, reward_3, etc.
parameters_settings must be a 2D array, like [[0.5, 0, 1], [5, 1, 10]], where the first list specifies the alpha parameter and the second list specifies the temperature parameter. The first element of each list is the initial value of the parameter, the second element is the lower bound, and the third element is the upper bound. The default settings are 0.5 for alpha with a lower bound of 0 and an upper bound of 1, and 5 for temperature with a lower bound of 1 and an upper bound of 10.
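For example, a sketch of overriding the defaults, reusing the dataset loaded in the Examples above:
>>> model = RLRW(data=twoarm, dimensions=4, parameters_settings=[[0.3, 0, 1], [2, 1, 10]])
>>> model.run()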
References
Wilson, R. C., & Collins, A. G. E. (2019). Ten simple rules for the computational modeling of behavioral data. eLife, 8, e49547.
Decision Making
cpm.applications.decision_making.PTSM(data=None, parameters_settings=None, generate=False, utility_curve=None, weighting='tk')
Bases: Wrapper
A simplified version of the Prospect Theory-based Softmax Model (PTSM) for decision-making tasks based on Tversky & Kahneman (1992), similar to the initial publication of the theory in Kahneman & Tversky (1979). It differs from cpm.applications.decision_making.PTSM2025 and cpm.applications.decision_making.PTSM1992 in that it does not use different utility and weighting curvature parameters for gains and losses.
Notes
The model parameters are initialized with the following default values if not specified (values are in the form [initial, lower_bound, upper_bound]):
- `alpha`: [1.0, 1e-2, 5.0] (utility curvature for both gains and losses)
- `lambda_loss`: [1.0, 1e-2, 5.0] (loss sensitivity)
- `gamma`: [0.5, 1e-2, 5.0] (curvature for the weighting function for both gains and losses)
- `temperature`: [5.0, 1e-2, 15.0] (temperature parameter for softmax)
The priors for the parameters are set as follows:
- `alpha`: truncated normal with mean 1.0 and standard deviation 1.0.
- `lambda_loss`: truncated normal with mean 2.5 and standard deviation 1.0.
- `gamma`: truncated normal with mean 2.5 and standard deviation 1.0.
- `temperature`: truncated normal with mean 10.0 and standard deviation 5.0.
Model Specification
The model computes the subjective utility of the safe and risky options using a utility function, which can be either a power function or a user-defined utility curve. If a utility curve is not provided, the model uses a power function with curvature parameter \(\alpha\) after Tversky & Kahneman (1992). The subjective value of each option combines the weighted probability and the utility of its outcome.
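As a sketch, assuming a single potential outcome per option, this can be written as
\[
V(o) = w(p_{o})\, u(x_{o}),
\]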
where \(w\) is a weighting function of the probability \(p\) of a potential outcome, and \(u\) is the utility function of the magnitude \(x\) of a potential outcome. Choice options are denoted with \(o\). The utility function \(u\) is defined as a power function for both gains and losses, implemented after Equation 5 in Tversky & Kahneman (1992):
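With the single curvature parameter \(\alpha\) applied to both gains and losses, Equation 5 takes the form
\[
u(x) =
\begin{cases}
x^{\alpha} & \text{if } x \geq 0, \\
-\lambda\, (-x)^{\alpha} & \text{if } x < 0,
\end{cases}
\]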
where \(\alpha\) is the utility curvature parameter for both gains and losses, and \(\lambda\) is the loss aversion parameter. The weighting function is implemented after Equation 6 in Tversky & Kahneman (1992):
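With a single discriminability parameter for gains and losses, Equation 6 takes the form
\[
w(p) = \frac{p^{\gamma}}{\big(p^{\gamma} + (1 - p)^{\gamma}\big)^{1/\gamma}},
\]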
where gamma, denoted via \(\gamma\), is the discriminability parameter of the weighting function for both gains and losses.
The model then applies the softmax function to compute the choice probabilities:
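A standard two-option form, consistent with cpm.models.decision.Softmax and writing the temperature parameter as \(\beta\) (notation assumed here), is
\[
P(\text{risky}) = \frac{e^{\beta\, V(\text{risky})}}{e^{\beta\, V(\text{risky})} + e^{\beta\, V(\text{safe})}}, \qquad P(\text{safe}) = 1 - P(\text{risky}).
\]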
Model output
The model outputs the following trial-level information:
- `policy`: the softmax probabilities for each option.
- `dependent`: the probability of choosing the risky option.
- `observed`: the observed (participant's) choice (0 for safe, 1 for risky).
- `chosen`: the chosen option based on the softmax probabilities.
- `is_optimal`: whether the chosen option is optimal (1 if chosen option is objectively better, 0 otherwise).
- `objective_best`: the objectively better option (1 for risky, 0 for safe) determined by the objective evidence for each.
- `ev_safe`: the expected value of the safe option.
- `ev_risk`: the expected value of the risky option.
- `u_safe`: the utility of the safe option.
- `u_risk`: the utility of the risky option.
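A minimal usage sketch (the DataFrame and its column layout are assumed here and must match the safe-risky task format the model expects):
>>> import pandas
>>> from cpm.applications.decision_making import PTSM
>>> data = pandas.read_csv("risky_choice_data.csv")  # hypothetical file
>>> model = PTSM(data=data, weighting="tk")
>>> model.run()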
See Also
cpm.models.decision.Softmax : for mapping utilities to choice probabilities.
cpm.models.activation.ProspectUtility : for the Prospect Utility class that computes subjective utilities and weighted probabilities.
References
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47(2), 263–291.
Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5, 297–323.
cpm.applications.decision_making.PTSM1992(data=None, parameters_settings=None, utility_curve=None, weighting='tk')
Bases: Wrapper
A Prospect Theory-based Softmax Model (PTSM) for decision-making tasks based on Tversky & Kahneman (1992), similar to the initial publication of the theory in Kahneman & Tversky (1979). It computes expected utility by combining transformed magnitudes and weighted probabilities, suitable for safe–risky decision paradigms.
The model computes objective EV internally (ev_safe vs. ev_risk) and outputs trial-level information (including whether the chosen option is optimal).
Additionally, the model accepts a "weighting" argument that determines which probability weighting function to use when computing the subjective weighting of risky probabilities.
Notes
The model parameters are initialized with the following default values if not specified (values are in the form [initial, lower_bound, upper_bound]):
- `alpha`: [1.0, 0, 5.0] (utility curvature for gains)
- `beta`: [1.0, 0, 5.0] (utility curvature for losses)
- `lambda_loss`: [1.0, 0, 5.0] (loss sensitivity)
- `gamma`: [0.5, 0.001, 5.0] (curvature of the weighting function for gains)
- `delta`: [0.5, 0.001, 5.0] (curvature of the weighting function for losses)
- `temperature`: [5.0, 0.001, 20.0] (temperature parameter for softmax)
The priors for the parameters are set as follows:
- `alpha`: truncated normal with mean 2.5 and standard deviation 1.0.
- `beta`: truncated normal with mean 2.5 and standard deviation 1.0.
- `lambda_loss`: truncated normal with mean 2.5 and standard deviation 1.0.
- `gamma`: truncated normal with mean 2.5 and standard deviation 1.0.
- `delta`: truncated normal with mean 0 and standard deviation 1.0.
- `temperature`: truncated normal with mean 10 and standard deviation 2.5.
Model Specification
The model computes the subjective utility of the safe and risky options using a utility function, which can be either a power function or a user-defined utility curve. If a utility curve is not provided, the model uses a power function with curvature parameter \(\alpha\) after Tversky & Kahneman (1992). The subjective value of each option combines the weighted probability and the utility of its outcome.
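As a sketch, assuming a single potential outcome per option, this can be written as
\[
V(o) = w(p_{o})\, u(x_{o}),
\]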
where \(w\) is a weighting function of the probability \(p\) of a potential outcome, and \(u\) is the utility function of the magnitude \(x\) of a potential outcome. Choice options are denoted with \(o\). The utility function \(u\) is defined as a power function for both gains and losses, implemented after Equation 5 in Tversky & Kahneman (1992):
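Equation 5, with separate curvature parameters for gains and losses, reads
\[
u(x) =
\begin{cases}
x^{\alpha} & \text{if } x \geq 0, \\
-\lambda\, (-x)^{\beta} & \text{if } x < 0,
\end{cases}
\]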
where \(\alpha\) is the utility curvature parameter for gains, \(\beta\) is the curvature parameter for losses, and \(\lambda\) is the loss aversion parameter. The weighting function is implemented after Equation 6 in Tversky & Kahneman (1992):
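Equation 6 defines separate weighting functions for gains and losses:
\[
w^{+}(p) = \frac{p^{\gamma}}{\big(p^{\gamma} + (1 - p)^{\gamma}\big)^{1/\gamma}}, \qquad
w^{-}(p) = \frac{p^{\delta}}{\big(p^{\delta} + (1 - p)^{\delta}\big)^{1/\delta}},
\]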
where gamma, denoted via \(\gamma\), is the discriminability parameter of the weighting function for gains, and delta, denoted via \(\delta\), is the discriminability parameter of the weighting function for losses.
The model then applies the softmax function to compute the choice probabilities:
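A standard two-option form, consistent with cpm.models.decision.Softmax and writing the temperature parameter as \(\tau\) (notation assumed here), is
\[
P(\text{risky}) = \frac{e^{\tau\, V(\text{risky})}}{e^{\tau\, V(\text{risky})} + e^{\tau\, V(\text{safe})}}, \qquad P(\text{safe}) = 1 - P(\text{risky}).
\]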
Model output
The model outputs the following trial-level information:
- `policy`: the softmax probabilities for each option.
- `dependent`: the probability of choosing the risky option.
- `observed`: the observed (participant's) choice (0 for safe, 1 for risky).
- `chosen`: the chosen option based on the softmax probabilities.
- `is_optimal`: whether the chosen option is optimal (1 if chosen option is objectively better, 0 otherwise).
- `objective_best`: the objectively better option (1 for risky, 0 for safe) determined by the objective evidence for each.
- `ev_safe`: the expected value of the safe option.
- `ev_risk`: the expected value of the risky option.
- `u_safe`: the utility of the safe option.
- `u_risk`: the utility of the risky option.
See Also
cpm.models.decision.Softmax : for mapping utilities to choice probabilities.
cpm.models.activation.ProspectUtility : for the Prospect Utility class that computes subjective utilities and weighted probabilities.
References
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47(2), 263–291.
Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5, 297–323.
cpm.applications.decision_making.PTSM2025(data=None, parameters_settings=None, utility_curve=None, variant='alpha')
Bases: Wrapper
A Prospect Theory Softmax Model loosely based on Chew et al. (2019), incorporating a bias term (phi_gain / phi_loss) in the softmax function for gains and losses, a utility curvature parameter (alpha) for non-linear utility transformations, and an ambiguity aversion parameter (eta).
Notes
The model parameters are initialized with the following default values if not specified (values are in the form [initial, lower_bound, upper_bound]):
- `eta`: [0.0, -0.49, 0.49] (ambiguity aversion)
- `phi_gain`: [0.0, -10.0, 10.0] (gain sensitivity)
- `phi_loss`: [0.0, -10.0, 10.0] (loss sensitivity)
- `temperature`: [5.0, 0.001, 20.0] (temperature parameter)
- `alpha`: [1.0, 0.001, 5.0] (utility curvature parameter)
The priors for the parameters are set as follows:
- `eta`: truncated normal with mean 0.0 and standard deviation 0.25.
- `phi_gain`: truncated normal with mean 0.0 and standard deviation 2.5.
- `phi_loss`: truncated normal with mean 0.0 and standard deviation 2.5.
- `temperature`: truncated normal with mean 10.0 and standard deviation 5.
- `alpha`: truncated normal with mean 1.0 and standard deviation 1.
Model Description
In what follows, we briefly describe the model's operations. First, the model calculates the subjective probability of the risky option, adjusting for ambiguity aversion using the parameter eta, denoted with \(\eta\). The subjective probability is computed as:
where \(p_{risky}\) is the original probability of the risky choice and \(ambiguity\) is the ambiguity associated with the risky option, either 0 for non-ambiguous or 1 for ambiguous cases.
The utility of the safe and risky options is then computed using a utility function, which can be either a power function or a user-defined utility curve.
If a utility curve is not provided, the model uses the following power function with curvature parameter alpha, denoted with \(\alpha\):
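A plausible form (the handling of negative magnitudes is an assumption here) is
\[
u(x) = \operatorname{sign}(x)\, |x|^{\alpha}.
\]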
The model then applies loss aversion and gain sensitivity adjustments based on the sign of the risky choice magnitude. Here, the gain sensitivity phi_gain, denoted as \(\phi_{gain}\), is applied when the risky choice is positive, and the loss sensitivity phi_loss, denoted as \(\phi_{loss}\), is applied when the risky choice is negative. The adjusted probability of choosing the risky option, \(p(A_{risky})\), is computed using a softmax function:
where \(\beta\) denotes the temperature parameter, \(u_{risky}\) is the utility of the risky option, \(u_{safe}\) is the utility of the safe option, and \(\phi_{t}\) is either \(\phi_{gain}\) or \(\phi_{loss}\) depending on the sign of the risky choice magnitude. Note that in Chew et al. (2019), the model includes a gambling bias term only for gains, which is added to the difference between the safe and risky utilities and only then transformed into a probability via a sigmoid function.
Furthermore, the model generates a response based on the computed probabilities, where the choice is sampled from a Bernoulli distribution with the computed policy as the probability of choosing the risky option.
Model Output
For each trial, the model outputs the following variables:
- `policy`: The computed probabilities for the risky options.
- `model_choice`: The model's predicted choice (0 for safe, 1 for risky).
- `real_choice`: The observed (participant's) choice from the data.
- `u_safe`: The utility of the safe option.
- `u_risk`: The utility of the risky option.
- `dependent`: The computed probability of a risky choice according to the model, which can be used for further analysis or fitting.
References
Chew, B., Hauser, T. U., Papoutsi, M., Magerkurth, J., Dolan, R. J., & Rutledge, R. B. (2019). Endogenous fluctuations in the dopaminergic midbrain drive behavioral choice variability. Proceedings of the National Academy of Sciences, 116(37), 18732–18737. https://doi.org/10.1073/pnas.1900872116
Metacognition
cpm.applications.signal_detection.EstimatorMetaD(data=None, bins=None, cl=None, parallel=False, libraries=['numpy', 'pandas'], prior=False, display=False, ppt_identifier=None, ignore_invalid=False, **kwargs)
Class to estimate metacognitive parameters using the meta-d model proposed by Maniscalco and Lau (2012).
Note
The data DataFrame should contain the following columns:
- 'participant': Identifier for each participant.
- 'signal' (integer): Stimulus presented to the participant, for example, 0 for S1 and 1 for S2.
- 'response' (integer): Participant's response to the stimulus.
- 'confidence' (integer, float): Participant's confidence rating for their response.
- 'accuracy' (integer): Accuracy of the participant's response. 0 = incorrect, 1 = correct.
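A minimal usage sketch with hypothetical data laid out as above (the bins argument is assumed here to be the number of confidence levels):
>>> import pandas
>>> from cpm.applications.signal_detection import EstimatorMetaD
>>> data = pandas.DataFrame({
...     "participant": [1, 1, 1, 1],
...     "signal": [0, 1, 0, 1],
...     "response": [0, 1, 1, 1],
...     "confidence": [1, 3, 2, 4],
...     "accuracy": [1, 1, 0, 1],
... })
>>> estimator = EstimatorMetaD(data=data, bins=4)
>>> estimator.optimise()
>>> results = estimator.export()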
export()
Exports the optimization results and fitted parameters as a pandas.DataFrame.
optimise()
Estimates the metacognitive parameters using the meta-d model.
Here, we use a Trust-Region Constrained Optimization algorithm (Conn et al., 2000) to fit the model to the data.
We use the trust-constr method from scipy.optimize.minimize to perform the optimization, and minimise the negative log-likelihood of the data given the model parameters.
The optimization is performed for each participant in the data.
Notes
If you want to tune the behaviour of the optimization, you can do so by passing additional keyword arguments to the class constructor. See the scipy.optimize.minimize documentation for more details on the available options. By default, the optimization will use the trust-constr method with the default options specified in the scipy.optimize.minimize documentation.
References
Conn, A. R., Gould, N. I. M., & Toint, P. L. (2000). Trust Region Methods. Society for Industrial and Applied Mathematics. https://doi.org/10.1137/1.9780898719857