Build your model
In cpm
the way you build models is by writing a function that specifies the transformation from your independent variables to your dependent variables for a single trial.
This means that the code you need to write is rather negligible.
It makes it easy to focus on the things that matter the most - specifying the model!
The function
Let us quickly write some function that specifies our model. On each trial, it will have to know the stimuli that were presented and the feedback that it gets for each action. It will also need to know the parameters that it can use to make decisions.
For now, let us assume that we have a trial that looks like this:
## create a single trial as a dictionary
trial = {
"trials": np.array([1, 2]),
"feedback": np.array([1, 0]),
}
Here we hav all the information we need for a given trial - all input to the model other than the parameters and its initial state. In reinforcement learning, this is what we call the state of the environment. The model will use this information to make a decision and update its internal state.
One advantage of cpm
is that most components that you will need to build sequential decision-making models are already implemented. This means that you can focus on the model itself, rather than the implementation details. Here we will use the learning
and decision
modules to build a simple model based on the Rescorla-Wagner update rule and a Greedy-decision rule.
from cpm.models import learning, decision, utils
import copy
def model(parameters, trial):
# pull out the parameters
alpha = parameters.alpha
temperature = parameters.temperature
values = np.array(parameters.values)
# pull out the trial information
stimulus = trial.get('trials')
feedback = trial.get("feedback")
mute = np.zeros(4) # mute learning for all cues not presented
# activate the value of each available action
# here there are two possible actions, that can take up on 4 different values
# so we subset the values to only include the ones that are activated...
# ...according to which stimuli was presented
activation = values[stimulus - 1]
# convert the activations to a 2x1 matrix, where rows are actions/outcomes
activations = activation.reshape(2, 1)
# calculate a policy based on the activations
response = decision.Softmax(activations=activations, temperature=epsilon)
response.compute() # compute the policy
choice = response.choice() # get the choice based on the policy
reward = feedback[choice] # get the reward of the chosen action
# update the value of the chosen action
mute[stimulus[choice] - 1] = 1 # unmute the learning for the chosen action
teacher = np.array([reward])
update = learning.SeparableRule(weights=values, feedback=teacher, input=mute, alpha=alpha)
update.compute()
values += update.weights.flatten()
## compile output
output = {
"policy" : response.policies, # policies
"response" : choice, # choice based on the policy
"reward" : reward, # reward of the chosen action
"values" : values, # updated values
"change" : update.weights, # change in the values
"activation" : activations.flatten(), # activation of the values
"dependent" : response.policies, # dependent variable
}
return output
If you want to learn more about this model, you can check out the tutorial on the model.
The immediately obvious thing is that the function takes two arguments:
- parameters : the freely-varying parameters of the model and its initial state of the model. It must be a
cpm.Parameters
class. We already covered it in the tutorial on parameters. - trial : this essentially includes all the information that we will need to do the computations we specified in the model. This is pulled from the data we covered in the data format section.
The function should return a dictionary that includes all the information that you want to save from the model. This can include the dependent variables, the policy, the response, the reward, the values, and the change in the values. You can also include any other information that you might need for analysis. For example, if you want to update the values
in the parameters
object, you can simply include it in the output, and it will update it in the parameters. Similarly if you want to know any variables on a trial-by-trial level that is not part of the parameters
object, cpm
will save it as long as it is part of the model function output.
Applying it to data
So, where do you loop through the trials? cpm
has built-in tools that frees you up from writing complicated for loops to apply the model to each trial. cpm
also compiles the data you need into a neat pandas.DataFrame
. We will use the cpm.generators.Wrapper
for this. Wrapper only does the simulation for one participant, so the data we need to input is a dictionary as opposed to a list of dictionaries.
from cpm.generators import Wrapper
decision_model = Wrapper(model = model, parameters = parameters, data = data)
If you want to run the model on the data, simply use the run()
method, after which you can export()
the simulation details as a pandas.DataFrame
:
decision_model.run()
decision_model_output = decision_model.export()
Simulating participants
Now we usually have many participants, each with different trial order and distribution of rewards. What we can do here is to use the cpm.generators.Simulator
to apply the model to all participants' trial orders.
from cpm.generators import Simulator
simulator = Simulator(wrapper = decision_model, parameters = parameters, data = complete_data)
simulator.run()
simulation = simulator.export()
This works largely as Wrapper
does, apart from two caveats:
- The data we provide for
Simulator
is a list of dictionaries, see data format section. - The parameters required for the simulation can be a Parameters object or a list of dictionaries whose length is equal to data. If it is a Parameters object, Simulator will use the same parameters for all simulations. It is a list of dictionaries, it will use match the parameters with data, so that for example parameters[6] will be used for the simulation of data[6].