cpm.applications
cpm.applications.reinforcement_learning.RLRW(data=None, dimensions=2, parameters_settings=None, generate=False)
Bases: Wrapper
This class implements a simple reinforcement learning model for multi-armed bandit tasks, using a standard update rule based on the prediction error and a Softmax decision rule. The model is an n-dimensional, k-armed implementation of model 3 from Wilson and Collins (2019).
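The two rules named above can be sketched as follows. This is a minimal illustration of a Rescorla-Wagner update and a Softmax choice rule, not the library's internal implementation; the function names are made up for the example.

```python
import numpy as np

def rescorla_wagner_update(q, choice, reward, alpha):
    """Move the chosen arm's value toward the reward by alpha * prediction error."""
    q = q.copy()
    q[choice] += alpha * (reward - q[choice])
    return q

def softmax(q, temperature):
    """Convert option values into choice probabilities."""
    exps = np.exp(temperature * (q - q.max()))  # subtract max for numerical stability
    return exps / exps.sum()

# Example: two-armed bandit with alpha = 0.5 and temperature = 5 (the defaults in the text)
q = np.array([0.0, 0.0])
q = rescorla_wagner_update(q, choice=0, reward=1.0, alpha=0.5)
p = softmax(q, temperature=5)
# After one rewarded choice of arm 0, q[0] = 0.5 and arm 0 is now preferred
```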
Examples:
>>> import numpy
>>> import pandas
>>> from cpm.applications import RLRW
>>> from cpm.datasets import load_bandit_data
>>> twoarm = load_bandit_data()
>>> model = RLRW(data=twoarm, dimensions=4)
>>> model.run()
Notes
Data must contain the following columns:
- choice: the choice of the participant from the available options, starting from 0.
- arm_n: the stimulus identifier for each option (the arms in the bandit task), where n is the option available on a given trial. If more than one option is available, the stimulus identifiers should be specified as separate columns: arm_1, arm_2, arm_3, etc., or arm_left, arm_middle, arm_right, etc.
- reward_n: the reward given for each option, where n is the corresponding arm of the bandit available on a given trial. If more than one option is available, the rewards should be specified as separate columns: reward_1, reward_2, reward_3, etc.
parameters_settings must be a 2D array, like [[0.5, 0, 1], [5, 1, 10]], where the first list specifies the alpha parameter and the second list specifies the temperature parameter. The first element of each list is the initial value of the parameter, the second element is the lower bound, and the third element is the upper bound. The default settings are 0.5 for alpha with a lower bound of 0 and an upper bound of 1, and 5 for temperature with a lower bound of 1 and an upper bound of 10.
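The parameters_settings structure described above can be written out as follows. The values are the defaults stated in the text; the unpacked variable names are illustrative only.

```python
# Each row: [initial value, lower bound, upper bound]
parameters_settings = [
    [0.5, 0, 1],   # alpha (learning rate)
    [5, 1, 10],    # temperature (Softmax scaling)
]

# Illustrative unpacking of the two rows
(alpha_init, alpha_lo, alpha_hi), (temp_init, temp_lo, temp_hi) = parameters_settings
```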
References
Wilson, R. C., & Collins, A. G. E. (2019). Ten simple rules for the computational modeling of behavioral data. eLife, 8, e49547.
cpm.applications.signal_detection.EstimatorMetaD(data=None, bins=None, cl=None, parallel=False, libraries=['numpy', 'pandas'], prior=False, display=False, ppt_identifier=None, ignore_invalid=False, **kwargs)
Class to estimate metacognitive parameters using the meta-d model proposed by Maniscalco and Lau (2012).
Note
The data DataFrame should contain the following columns:
- 'participant': Identifier for each participant.
- 'signal' (integer): Stimulus presented to the participant, for example, 0 for S1 and 1 for S2.
- 'response' (integer): Participant's response to the stimulus.
- 'confidence' (integer, float): Participant's confidence rating for their response.
- 'accuracy' (integer): Accuracy of the participant's response. 0 = incorrect, 1 = correct.
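A minimal input DataFrame with the required columns might look like this; the values are made up purely for illustration.

```python
import pandas as pd

data = pd.DataFrame({
    "participant": [1, 1, 1, 1],
    "signal":      [0, 1, 0, 1],   # 0 for S1, 1 for S2
    "response":    [0, 1, 1, 1],   # participant's response to the stimulus
    "confidence":  [3, 4, 2, 4],   # confidence rating for each response
    "accuracy":    [1, 1, 0, 1],   # 1 where response matches signal, else 0
})
```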
export()
Exports the optimization results and fitted parameters as a pandas.DataFrame.
optimise()
Estimates the metacognitive parameters using the meta-d model.
Here, we use a Trust-Region Constrained Optimization algorithm (Conn et al., 2000) to fit the model to the data.
We use the trust-constr method from scipy.optimize.minimize to perform the optimization, minimising the negative log-likelihood of the data given the model parameters.
The optimization is performed for each participant in the data.
Notes
If you want to tune the behaviour of the optimization, you can do so by passing additional keyword arguments to the class constructor. See the scipy.optimize.minimize documentation for more details on the available options. By default, the optimization uses the trust-constr method with the default options specified in the scipy.optimize.minimize documentation.
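As a sketch of the kind of call made under the hood, the following minimizes a toy negative log-likelihood (a simple Bernoulli likelihood, not the actual meta-d likelihood) with the trust-constr method:

```python
import numpy as np
from scipy.optimize import minimize

def negative_log_likelihood(params):
    """Toy NLL: Bernoulli likelihood of 7 successes out of 10 trials."""
    p = params[0]
    k, n = 7, 10
    return -(k * np.log(p) + (n - k) * np.log(1 - p))

result = minimize(
    negative_log_likelihood,
    x0=[0.5],                       # starting value for the parameter
    method="trust-constr",          # Trust-Region Constrained Optimization
    bounds=[(1e-6, 1 - 1e-6)],      # keep p strictly inside (0, 1)
)
# The maximum-likelihood estimate for a Bernoulli probability is k / n = 0.7
```

Additional keyword arguments (e.g. tol, options) passed through to scipy.optimize.minimize tune the optimizer in the same way.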
References
Conn, A. R., Gould, N. I. M., & Toint, P. L. (2000). Trust Region Methods. Society for Industrial and Applied Mathematics. https://doi.org/10.1137/1.9780898719857