cpm.applications
cpm.applications.RLRW(data=None, dimensions=2, parameters_settings=None, generate=False)
Bases: Wrapper
The class implements a simple reinforcement learning model for multi-armed bandit tasks, using a standard prediction-error update rule and a Softmax decision rule. The model is an n-dimensional, k-armed implementation of model 3 from Wilson and Collins (2019).
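The two ingredients named above can be sketched in plain Python. This is a minimal illustration, not the class's internal code: the helper names `softmax` and `update_value` are hypothetical, and it assumes that the temperature multiplies the values (an inverse temperature), as in model 3 of Wilson and Collins (2019).

```python
import math


def softmax(values, temperature):
    """Turn action values into choice probabilities.

    Assumption: `temperature` scales the values (inverse temperature),
    so larger settings make choices more deterministic.
    """
    peak = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp(temperature * (v - peak)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def update_value(value, reward, alpha):
    """Delta rule: move the value toward the reward by a fraction alpha."""
    prediction_error = reward - value
    return value + alpha * prediction_error


# Two arms with equal starting values, using the documented default settings.
values = [0.0, 0.0]
alpha, temperature = 0.5, 5.0

probabilities = softmax(values, temperature)  # equal values give equal odds
values[1] = update_value(values[1], reward=1.0, alpha=alpha)  # arm 1 is rewarded
updated_probabilities = softmax(values, temperature)
```

After a single rewarded trial, the value of the chosen arm moves halfway toward the reward, and the Softmax rule then favours that arm on the next trial.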
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | | The data to be fit by the model. The data must contain columns for the choice and reward for each dimension. See Notes for more information on which columns to include. | None |
dimensions | | The number of distinct stimuli present in the data. | 2 |
parameters_settings | | The parameters to be fit by the model. The parameters must be specified as a list of lists, with each list containing the value, lower bound, and upper bound of the parameter. See Notes for more information on how to specify parameters and for the default settings. | None |
Returns:
Type | Description |
---|---|
cpm.generators.Wrapper | A cpm.generators.Wrapper object. |
Examples:
>>> import numpy
>>> import pandas
>>> from cpm.applications import RLRW
>>> from cpm.datasets import load_bandit_data
>>> twoarm = load_bandit_data()
>>> model = RLRW(data=twoarm, dimensions=4)
>>> model.run()
Notes
Data must contain the following columns:
- choice: the choice of the participant from the available options, starting from 0.
- arm_n: the stimulus identifier for each option (arms in the bandit task), where n is the option available on a given trial. If there is more than one option, specify the stimulus identifiers as separate columns: arm_1, arm_2, arm_3, etc., or arm_left, arm_middle, arm_right, etc.
- reward_n: the reward given after each option, where n is the corresponding arm of the bandit available on a given trial. If there is more than one option, specify the rewards as separate columns: reward_1, reward_2, reward_3, etc.
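The column layout described in the list above might look like the following. This is an illustrative sketch only: the trial values are made up, the helper `missing_columns` is hypothetical and not part of cpm, and it assumes numeric suffixes (arm_1, reward_1, ...) rather than named ones such as arm_left.

```python
# Hypothetical records for three trials, each showing two of four stimuli.
trials = {
    "arm_1": [1, 3, 2],     # stimulus identifier shown as the first option
    "arm_2": [2, 4, 4],     # stimulus identifier shown as the second option
    "reward_1": [1, 0, 1],  # reward attached to the first option
    "reward_2": [0, 1, 0],  # reward attached to the second option
    "choice": [0, 1, 0],    # chosen option, counted from 0
}


def missing_columns(columns, n_options):
    """List the expected column names that are absent from `columns`."""
    required = ["choice"]
    for n in range(1, n_options + 1):
        required += [f"arm_{n}", f"reward_{n}"]
    return [name for name in required if name not in columns]


complete = missing_columns(trials, n_options=2)
incomplete = missing_columns(trials, n_options=3)
```

A dictionary like `trials` can be converted with `pandas.DataFrame(trials)` to obtain a table with one row per trial.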
parameters_settings must be a 2D array (a list of lists), such as [[0.5, 0, 1], [5, 1, 10]], where the first list specifies the alpha parameter and the second list specifies the temperature parameter. In each list, the first element is the initial value of the parameter, the second is the lower bound, and the third is the upper bound. By default, alpha is 0.5 with a lower bound of 0 and an upper bound of 1, and temperature is 5 with a lower bound of 1 and an upper bound of 10.
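As a concrete sketch of this format, the default settings can be written out and checked before use. The helper `within_bounds` is a hypothetical convenience, not part of cpm.

```python
# Each inner list is [initial value, lower bound, upper bound].
parameters_settings = [
    [0.5, 0, 1],  # alpha (learning rate)
    [5, 1, 10],   # temperature
]


def within_bounds(settings):
    """Check that every initial value lies between its lower and upper bound."""
    return all(lower <= value <= upper for value, lower, upper in settings)


defaults_ok = within_bounds(parameters_settings)
out_of_range = within_bounds([[1.5, 0, 1], [5, 1, 10]])  # alpha above its bound
```

Going by the signature above, such a list would be passed as `RLRW(data=..., parameters_settings=parameters_settings)`.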
References
Wilson, R. C., & Collins, A. G. E. (2019). Ten simple rules for the computational modeling of behavioral data. eLife, 8, e49547.