import cpm

version = cpm.__version__
## raise an error if cpm is older than 0.18.5
## (compare version components numerically; comparing version strings lexicographically is unreliable)
if tuple(map(int, version.split("."))) < (0, 18, 5):
    raise RuntimeError("cpm version is too old, please update to 0.18.5 or later")
else:
    print(f"cpm version is up to date\nversion: {version}\nauthor: {cpm.__author__}")
cpm version is up to date
version: 0.19.0
author: Lenard Dome
Example 2: Reinforcement learning with a two-armed bandit
This is an intermediate-level tutorial, where we assume that you are familiar with the basics of reinforcement learning and the two-armed bandit problem. In this example, we will apply and fit a reinforcement learning model to a two-armed bandit task. The model consists of a Softmax choice policy and a prediction-error learning rule.
In the following tutorial, we will use `prettyformatter` to print some nice and organised output in the Jupyter Notebook. In your own Python script, you do not need to use `prettyformatter`.
Two-armed bandits
The two-armed bandit is a two-alternative forced-choice reinforcement learning problem, part of the larger family of multi-armed bandit problems. In these problems, the person is faced with multiple choices, each with a different degree of reward. The goal of the person is to learn which choice is the best and to maximize the reward over time. In this example, we will consider a two-armed bandit problem, where the person is faced with two choices (select the item on the left or the item on the right), each with a different reward. There are 4 different items that can appear in combinations of two, and the reward probability for each item varies. For example, if item 1 has a 0.7 chance of giving a reward, then you can expect to receive a reward 70% of the time you select item 1. The problem is that the underlying structure of the item-reward mapping is unknown to the learner.
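To make this concrete, here is a minimal sketch of how such a task generates feedback. The reward probabilities here are made up for illustration; they are not the ones used in the dataset below.

import numpy

rng = numpy.random.default_rng(42)
## hypothetical reward probabilities for the 4 items
reward_probabilities = numpy.array([0.7, 0.2, 0.5, 0.9])

chosen_item = 0  # suppose the person selects item 1 (0-indexed)
reward = rng.binomial(1, reward_probabilities[chosen_item])  # 1 with probability 0.7
print(reward)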
Import the data
First, we will import the data and get it ready for the toolbox.
## toolbox and libraries
import cpm
import ipyparallel as ipp ## for parallel computing with ipython (specific for Jupyter Notebook)
import pandas as pd
import numpy
import matplotlib.cm as cmx
import matplotlib.pyplot as plt
from prettyformatter import pprint
experiment = pd.read_csv('bandit_small.csv', header=0)
experiment.head(10)
| | index | ppt | trial | stimulus_left | stimulus_right | reward_left | reward_right | responses | feedback |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 1 | 2 | 4 | 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 2 | 2 | 4 | 1 | 1 | 2 | 1 |
| 2 | 2 | 1 | 3 | 2 | 1 | 0 | 0 | 1 | 0 |
| 3 | 3 | 1 | 4 | 2 | 1 | 0 | 0 | 2 | 0 |
| 4 | 4 | 1 | 5 | 1 | 4 | 0 | 0 | 1 | 0 |
| 5 | 5 | 1 | 6 | 4 | 3 | 1 | 1 | 1 | 1 |
| 6 | 6 | 1 | 7 | 3 | 1 | 1 | 0 | 1 | 1 |
| 7 | 7 | 1 | 8 | 3 | 2 | 0 | 0 | 2 | 0 |
| 8 | 8 | 1 | 9 | 1 | 4 | 0 | 0 | 1 | 0 |
| 9 | 9 | 1 | 10 | 2 | 3 | 0 | 0 | 1 | 0 |
Let us look at what each column represents:

- `index`: variable to identify each row - this variable is clutter.
- `ppt`: the participant number.
- `trial`: the trial number.
- `stimulus_left`: the stimulus presented on the left side.
- `stimulus_right`: the stimulus presented on the right side.
- `reward_left`: the reward received when the left stimulus is selected.
- `reward_right`: the reward received when the right stimulus is selected (see the sketch after this list for how to recover each item's average payoff from these columns).
- `responses`: the response of the participant (1 for left, 2 for right).
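Since the item-reward mapping is hidden from the participant, we can peek at it ourselves. Below is a minimal sketch, an illustrative check rather than part of the original tutorial, that stacks the left and right columns into long format and averages the payoff for each item:

## long format: one row per (stimulus, potential reward) pair
left = experiment[["stimulus_left", "reward_left"]].rename(
    columns={"stimulus_left": "stimulus", "reward_left": "reward"}
)
right = experiment[["stimulus_right", "reward_right"]].rename(
    columns={"stimulus_right": "stimulus", "reward_right": "reward"}
)
## the average payoff per item approximates its reward probability
print(pd.concat([left, right]).groupby("stimulus")["reward"].mean())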
Here, we will quickly add a column, `observed`, to tell the toolbox which dependent variable we actually want to predict.
experiment['observed'] = experiment['responses'] - 1
The model
Let us quickly go through the model we will use.
Each stimulus has an associated value, which is the expected reward that can be obtained from selecting that stimulus.
Let $Q(a)$ be the estimated value of action $a$. On each trial $t$, there are two stimuli present, so $Q(a)$ could be $Q(\text{left})$ or $Q(\text{right})$, where the corresponding Q-values are derived from the associated values of the stimuli presented on the left and right. More formally, the expected value of the action $a$ selected at time $t$ is given by:
$$ Q_t(a) = \mathbb{E}[R_t | A_t = a] $$
where $R_t$ is the reward received at time $t$, and $A_t$ is the action selected at time $t$.
On each trial $t$, the Softmax policy will select an action (left or right) based on the following policy:
$$ P(a_t) = \frac{e^{Q_{a,t} \epsilon}}{\sum_{i = 1}^{k}{e^{Q_{i,t} \epsilon}}} $$
where $\epsilon$ is the inverse temperature parameter, $Q_{a,t}$ is the estimated value of action $a$ at time $t$, and $k$ is the number of actions available; in our case, $k = 2$. On each trial, the model then selects an action (left or right) according to:
$$ A_t = \begin{cases} \text{left} & \text{with probability } P(a_{\text{left}}) \\ \text{right} & \text{with probability } P(a_{\text{right}}) \end{cases} $$
where $A_t$ is the action selected at time $t$.
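To make the policy concrete, here is a minimal numpy sketch of the Softmax rule above (plain numpy for illustration; the toolbox's own `cpm.models.decision.Softmax` class is used later):

import numpy

def softmax(q_values, epsilon):
    ## P(a) is proportional to exp(Q_a * epsilon), where epsilon is the inverse temperature
    exponents = numpy.exp(q_values * epsilon)
    return exponents / exponents.sum()

## e.g. Q(left) = 0.25, Q(right) = 0.75 with inverse temperature 5
print(softmax(numpy.array([0.25, 0.75]), 5.0))  # ~[0.076, 0.924]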
The model will calculate the prediction error according to the following learning rule:
$$ \Delta Q_t(A_t) = \alpha \times \Big[ R_t - Q_t(A_t) \Big] $$
where $\alpha$ is the learning rate, and $R_t$ is the reward received at time $t$. Note that this rule is often reported as the Rescorla-Wagner learning rule, where for each action/outcome, the prediction error is the difference between the reward received and the sum of all values present on each trial for that action. In our case, we only have one value for each action, so the Rescorla-Wagner learning rule is reduced to the Bush and Mosteller separable error term. Q-values are then updated as follows:
$$ Q_{t+1}(A_t) = Q_t(A_t) + \Delta Q_t(A_t) $$
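For example, one learning step under this rule looks as follows. This is a plain numpy sketch for illustration; the toolbox's `cpm.models.learning.SeparableRule` used below implements the same idea.

import numpy

alpha = 0.1                      # learning rate
q = numpy.array([0.25, 0.25])    # Q(left), Q(right)
reward = 1.0                     # reward received on this trial
chosen = 1                       # the right action was selected

delta = alpha * (reward - q[chosen])  # prediction error, scaled by the learning rate
q[chosen] += delta                    # Q_{t+1}(A_t) = Q_t(A_t) + delta
print(q)                              # [0.25, 0.325]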
Building the model

In order to use the toolbox, you have to specify the computations for one single trial. Rest assured, you do not have to build the model from scratch. We have fully-fledged models in `cpm.applications` that you can use, but we also have all the building blocks in `cpm.models` that you can use to build your own model.
Parameters

Before we build the model, we also have to talk about the `cpm.generators.Parameters` class. This class is used to specify the parameters of the model, including various bounds and priors. Let us specify the parameters for the model.
from cpm.generators import Parameters, Value
parameters = Parameters(
    # freely varying parameters are indicated by specifying priors
    alpha=Value(
        value=0.5,
        lower=1e-10,
        upper=1,
        prior="truncated_normal",
        args={"mean": 0.5, "sd": 0.25},
    ),
    temperature=Value(
        value=1,
        lower=0,
        upper=10,
        prior="truncated_normal",
        args={"mean": 5, "sd": 2.5},
    ),
    # everything without a prior is part of the initial state of the model or a
    # fixed construct (e.g. exemplars in general-context models of categorization)
    values=numpy.array([0.25, 0.25, 0.25, 0.25]),
)
You can immediately see that we have two types of variables in the `Parameters` class: those with priors and those without. The ones with defined priors are the ones we will estimate later on; they are the freely varying parameters (alpha and temperature). All freely varying parameters have to be defined with priors. Parameters without priors are part of the initial state of the model, such as the starting Q-values for each stimulus. Here we initialised them at 0.25.
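The model function below accesses these entries as attributes. A quick sketch of what that looks like (the printed representation will depend on the toolbox version):

## freely varying parameters and initial state are both attributes
print(parameters.alpha)   # the alpha parameter, initialised at 0.5
print(parameters.values)  # the starting Q-values, one per stimulus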
That was enough preparation, let us build a model. The model will take in the parameters class and the data, and it will output all variables of interest we specify.
Note: The model will take in a variable (as its second argument) that is, in our case, a row of the `pandas.DataFrame`, behaving as a `pandas.Series`.
@ipp.require('numpy')
def model(parameters, trial):
    # pull out the parameters
    alpha = parameters.alpha
    temperature = parameters.temperature
    values = numpy.array(parameters.values)  # copying prevents us from accidentally overwriting the original values
    # pull out the trial information
    stimulus = numpy.array([trial.stimulus_left, trial.stimulus_right]).astype(int)
    feedback = numpy.array([trial.reward_left, trial.reward_right])
    human_choice = trial.responses.astype(int) - 1  # convert to 0-indexing
    mute = numpy.zeros(4)  # mute learning for all cues not presented
    # activate the value of each available action
    # here there are two possible actions that can take on 4 different values,
    # so we subset the values to only include the ones that are activated...
    # ...according to which stimuli were presented
    activation = values[stimulus - 1]
    # convert the activations to a 2x1 matrix, where rows are actions/outcomes
    activations = activation.reshape(2, 1)
    # calculate a policy based on the activations
    response = cpm.models.decision.Softmax(activations=activations, temperature=temperature)
    response.compute()  # compute the policy
    if numpy.isnan(response.policies).any():
        # if the policy is NaN for a given action, then we need to set it to 1
        print(f"NaN in policy with parameters: {alpha.value}, {temperature.value}\n")
        print(response.policies)
        response.policies[numpy.isnan(response.policies)] = 1
    model_choice = response.choice()  # get the choice based on the policy
    reward = feedback[human_choice]  # get the reward of the chosen action
    # update the value of the chosen action
    mute[stimulus[human_choice] - 1] = 1  # unmute learning for the chosen action
    teacher = numpy.array([reward])
    update = cpm.models.learning.SeparableRule(weights=values, feedback=teacher, input=mute, alpha=alpha)
    update.compute()
    values += update.weights.flatten()
    ## compile the output
    output = {
        "trial": trial.trial.astype(int),      # trial number
        "policy": response.policies,           # policies
        "response": model_choice,              # choice based on the policy
        "reward": reward,                      # reward of the chosen action
        "values": values,                      # updated values
        "change": update.weights,              # change in the values
        "activation": activations.flatten(),   # activation of the values
        "dependent": numpy.array([response.policies[1]]),  # dependent variable
    }
    return output
The important bit here is that everything we want to store must be specified in the model output. Variables we want to update across trials, such as the values, also have to be included in the model output; other methods in the toolbox use them to identify key variables. The most important entry is the `dependent` key, which contains the quantity we fit the model to later.
Fitting to data

First, we wrap our model specification in a `Wrapper`, which loops it through the data.
from cpm.generators import Simulator, Wrapper
wrapper = Wrapper(model=model, parameters=parameters, data=experiment[experiment.ppt == 1])
wrapper.run()
Now that we have run the model on a single session, let's investigate the output. `cpm` is great for this because everything you might need to look at is stored inside the `Wrapper` object.
## print the last trial
pd.Series(wrapper.simulation[-1])
trial                                                        96
policy                 [0.47040213577587603, 0.529597864224124]
response                                                      0
reward                                                        0
values        [0.7519550323486328, 0.31280517578125, 0.46469...
change                 [[-0.0, -0.31280517578125, -0.0, -0.0]]
activation               [0.6256103515625, 0.7441403864537506]
dependent                                   [0.529597864224124]
dtype: object
Maybe a better way to do this is to export the simulation details as a `pandas.DataFrame`:
output = wrapper.export() # export sim data into pandas dataframe
output.tail()
| | trial_0 | policy_0 | policy_1 | response_0 | reward_0 | values_0 | values_1 | values_2 | values_3 | change_0 | change_1 | change_2 | change_3 | activation_0 | activation_1 | dependent_0 | ppt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 91 | 92 | 0.622632 | 0.377368 | 0 | 1 | 0.751955 | 0.625610 | 0.464693 | 0.953123 | 0.0 | 0.374390 | 0.0 | 0.000000 | 0.751955 | 0.251221 | 0.377368 | 0 |
| 92 | 93 | 0.418846 | 0.581154 | 0 | 1 | 0.751955 | 0.625610 | 0.464693 | 0.976562 | 0.0 | 0.000000 | 0.0 | 0.023438 | 0.625610 | 0.953123 | 0.581154 | 0 |
| 93 | 94 | 0.444083 | 0.555917 | 0 | 0 | 0.751955 | 0.625610 | 0.464693 | 0.488281 | -0.0 | -0.000000 | -0.0 | -0.488281 | 0.751955 | 0.976562 | 0.555917 | 0 |
| 94 | 95 | 0.494103 | 0.505897 | 0 | 1 | 0.751955 | 0.625610 | 0.464693 | 0.744140 | 0.0 | 0.000000 | 0.0 | 0.255860 | 0.464693 | 0.488281 | 0.505897 | 0 |
| 95 | 96 | 0.470402 | 0.529598 | 0 | 0 | 0.751955 | 0.312805 | 0.464693 | 0.744140 | -0.0 | -0.312805 | -0.0 | -0.000000 | 0.625610 | 0.744140 | 0.529598 | 0 |
Fitting the model
Now that we have the model, we can fit it to the data.
from cpm.optimisation import minimise, FminBound
fit = FminBound(
    model=wrapper,  # Wrapper class with the model we specified before
    data=experiment.groupby('ppt'),  # the data grouped by participant
    minimisation=minimise.LogLikelihood.bernoulli,
    parallel=True,
    libraries=["numpy", "cpm", "pandas"],
    cl=5,
    prior=False,
    ppt_identifier="ppt",
    display=False,
    number_of_starts=5,
    # everything below is optional and passed directly to the scipy implementation of the optimiser
    approx_grad=True
)
fit.optimise()
Starting optimization 1/5 from [0.1623115 1.79551212]
Detected parallel execution method: ipyparallel
Starting 5 engines with <class 'ipyparallel.cluster.launcher.LocalEngineSetLauncher'>
Starting optimization 2/5 from [0.97500821 5.14067959]
Detected parallel execution method: ipyparallel
Starting 5 engines with <class 'ipyparallel.cluster.launcher.LocalEngineSetLauncher'>
Starting optimization 3/5 from [0.29881021 1.89817652]
Detected parallel execution method: ipyparallel
Starting 5 engines with <class 'ipyparallel.cluster.launcher.LocalEngineSetLauncher'>
Starting optimization 4/5 from [0.79814141 6.17556695]
Detected parallel execution method: ipyparallel
Starting 5 engines with <class 'ipyparallel.cluster.launcher.LocalEngineSetLauncher'>
Starting optimization 5/5 from [0.68204044 7.04200288]
Detected parallel execution method: ipyparallel
Starting 5 engines with <class 'ipyparallel.cluster.launcher.LocalEngineSetLauncher'>
optimisation = fit.export()
optimisation.to_csv('bandit_optimised.csv')
optimisation
| | x_0 | x_1 | grad_0 | grad_1 | task_0 | funcalls_0 | nit_0 | warnflag_0 | hessian_0 | hessian_1 | hessian_2 | hessian_3 | ppt_0 | fun_0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.076392 | 4.943615 | 1.421085e-06 | -7.105427e-07 | CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL | 42 | 11 | 0 | 1038.688067 | 27.070542 | 27.070542 | 0.890521 | 1 | 48.885637 |
| 1 | 0.056893 | 6.042443 | -1.705303e-05 | 0.000000e+00 | CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH | 57 | 14 | 0 | 2153.664420 | 35.756011 | 35.756011 | 0.629376 | 2 | 47.533037 |
| 2 | 0.356632 | 1.065564 | 0.000000e+00 | 0.000000e+00 | CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL | 30 | 7 | 0 | 16.604957 | 4.121969 | 4.121969 | 3.233523 | 3 | 64.615781 |
| 3 | 0.446523 | 1.775725 | 1.989520e-05 | -8.526513e-06 | CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH | 33 | 9 | 0 | 45.754371 | 4.659366 | 4.659366 | 3.575179 | 4 | 59.814704 |
| 4 | 0.090663 | 10.000000 | 7.105427e-07 | -8.154309e-01 | CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL | 42 | 11 | 0 | 1792.445878 | 9.953955 | 9.953955 | 0.262459 | 5 | 23.110100 |
| 5 | 0.241919 | 1.545130 | 1.492140e-05 | 1.350031e-05 | CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH | 36 | 10 | 0 | 29.419178 | 6.800688 | 6.800688 | 2.729972 | 6 | 62.970022 |
| 6 | 0.127897 | 4.204480 | 4.973799e-06 | 0.000000e+00 | CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL | 39 | 10 | 0 | 266.971832 | 13.598897 | 13.598897 | 1.231781 | 7 | 49.503374 |
| 7 | 0.337155 | 2.836729 | 2.842171e-06 | 0.000000e+00 | CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL | 36 | 10 | 0 | 62.884546 | 5.283081 | 5.283081 | 2.532645 | 8 | 51.435683 |
| 8 | 0.024140 | 10.000000 | 1.421085e-06 | -1.847553e-02 | CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL | 81 | 18 | 0 | 15382.111265 | 45.921130 | 45.921130 | 0.131931 | 9 | 57.344351 |
| 9 | 0.288536 | 1.416182 | -2.131628e-06 | -7.105427e-07 | CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL | 27 | 7 | 0 | 47.786772 | 4.427434 | 4.427434 | 2.556318 | 10 | 63.766870 |
| 10 | 0.090579 | 8.406898 | 2.842171e-06 | -7.105427e-07 | CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL | 45 | 11 | 0 | 649.205646 | 13.941505 | 13.941505 | 0.389036 | 11 | 41.119225 |
| 11 | 0.306509 | 2.612242 | 1.421085e-06 | 0.000000e+00 | CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL | 30 | 8 | 0 | 23.231508 | 6.180601 | 6.180601 | 2.806195 | 12 | 53.728642 |
| 12 | 0.158429 | 1.105030 | -1.605827e-04 | 5.684342e-06 | CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH | 24 | 6 | 0 | 46.100891 | 6.211236 | 6.211236 | 2.733968 | 13 | 64.814608 |
| 13 | 0.081189 | 9.373663 | 1.421085e-05 | 1.421085e-06 | CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH | 48 | 11 | 0 | 1006.719169 | 12.753828 | 12.753828 | 0.235702 | 14 | 35.623085 |
| 14 | 0.026190 | 6.001012 | -3.510081e-04 | -2.131628e-06 | CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH | 75 | 20 | 0 | 8924.348496 | 52.302440 | 52.302440 | 0.290672 | 15 | 60.635315 |
| 15 | 0.945101 | 1.859170 | 1.421085e-06 | 7.105427e-07 | CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL | 30 | 8 | 0 | 29.346566 | 5.983237 | 5.983237 | 4.567092 | 16 | 55.400534 |
| 16 | 0.244680 | 3.310229 | -7.105427e-07 | 0.000000e+00 | CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL | 36 | 10 | 0 | 146.869823 | 9.183604 | 9.183604 | 1.263730 | 17 | 57.858745 |
| 17 | 0.691118 | 1.519410 | -1.421085e-06 | -1.421085e-06 | CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL | 27 | 7 | 0 | 21.069774 | 5.336049 | 5.336049 | 5.016023 | 18 | 59.680878 |
| 18 | 0.056792 | 10.000000 | 1.421085e-06 | -4.106510e-02 | CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL | 48 | 11 | 0 | 2881.848481 | 24.543583 | 24.543583 | 0.256856 | 19 | 39.271600 |
| 19 | 0.451195 | 0.780055 | 0.000000e+00 | -1.563194e-05 | CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH | 42 | 10 | 0 | 6.905391 | 3.186109 | 3.186109 | 5.325029 | 910 | 64.864891 |
| 20 | 0.035755 | 10.000000 | -2.984279e-05 | -9.245155e-02 | CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH | 54 | 12 | 0 | 6827.491311 | 31.094242 | 31.094242 | 0.169086 | 911 | 50.384075 |
| 21 | 0.042693 | 4.714004 | -2.486900e-05 | 7.105427e-07 | CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH | 66 | 18 | 0 | 2836.535085 | 41.385033 | 41.385033 | 0.606489 | 912 | 58.188428 |
| 22 | 0.279705 | 2.807436 | -7.105427e-06 | -1.421085e-06 | CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL | 27 | 7 | 0 | 85.751498 | 5.909514 | 5.909514 | 2.089465 | 913 | 56.070448 |
| 23 | 0.025635 | 9.613464 | -5.618990e+00 | -6.518092e-02 | CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH | 69 | 13 | 0 | 14625.566698 | 48.199061 | 48.199061 | 0.163541 | 914 | 56.315035 |
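The `x_0` and `x_1` columns hold each participant's estimated learning rate and temperature, and `fun_0` holds the negative log-likelihood at the optimum (these mappings follow the plotting labels below). A small sketch of tidying these columns into something more readable:

## rename the exported columns for readability
estimates = optimisation[["ppt_0", "x_0", "x_1", "fun_0"]].rename(
    columns={"ppt_0": "ppt", "x_0": "alpha", "x_1": "temperature", "fun_0": "negative_log_likelihood"}
)
print(estimates.describe())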
import matplotlib.pyplot as plt
# Create a figure and a set of subplots
fig, axs = plt.subplots(1, 3, figsize=(16, 3)) # 1 row, 3 columns, and optional figure size
# Plot on the first subplot with a specific color
optimisation.x_0.plot(kind="hist", label="Learning Rate", color="blue", ax=axs[0])
axs[0].set_title('Learning Rate')
# Plot on the second subplot with a different color
optimisation.x_1.plot(kind="hist", label="Temperature", color="red", ax=axs[1])
axs[1].set_title('Temperature')
optimisation.fun_0.plot(kind="hist", label="Negative Log Likelihood", color="green", ax=axs[2])
axs[2].set_title('Negative Log Likelihood')
# Display the plot
plt.show()
Simulation

We can instantly simulate model output from these results by using the `cpm.generators.Simulator` class.
from cpm.generators import Simulator
simulator = Simulator(wrapper=wrapper,
                      parameters=fit.parameters,  # parameters from the optimiser object are directly accessible for simulation
                      data=experiment.groupby("ppt"))  # data grouped by participant
simulator.run()
simulation_data = simulator.export()
simulation_data.to_csv('bandit_simulation.csv')
simulation_data.tail()
| | trial_0 | policy_0 | policy_1 | response_0 | reward_0 | values_0 | values_1 | values_2 | values_3 | change_0 | change_1 | change_2 | change_3 | activation_0 | activation_1 | dependent_0 | ppt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2299 | 92 | 0.261834 | 0.738166 | 1 | 0 | 0.213470 | 0.313047 | 0.425371 | 0.485744 | -0.000000 | -0.008236 | -0.000000 | -0.000000 | 0.213470 | 0.321284 | 0.738166 | 23 |
| 2300 | 93 | 0.068021 | 0.931979 | 1 | 1 | 0.213470 | 0.313047 | 0.425371 | 0.498927 | 0.000000 | 0.000000 | 0.000000 | 0.013183 | 0.213470 | 0.485744 | 0.931979 | 23 |
| 2301 | 94 | 0.277423 | 0.722577 | 0 | 1 | 0.233633 | 0.313047 | 0.425371 | 0.498927 | 0.020163 | 0.000000 | 0.000000 | 0.000000 | 0.213470 | 0.313047 | 0.722577 | 23 |
| 2302 | 95 | 0.669767 | 0.330233 | 0 | 0 | 0.233633 | 0.313047 | 0.414467 | 0.498927 | -0.000000 | -0.000000 | -0.010904 | -0.000000 | 0.498927 | 0.425371 | 0.330233 | 23 |
| 2303 | 96 | 0.692527 | 0.307473 | 0 | 1 | 0.233633 | 0.313047 | 0.429477 | 0.498927 | 0.000000 | 0.000000 | 0.015010 | 0.000000 | 0.498927 | 0.414467 | 0.307473 | 23 |
left_option = experiment.stimulus_left
right_option = experiment.stimulus_right
model_chosen_left = 1 - simulation_data.response_0
model_chosen_right = simulation_data.response_0
ppt = simulation_data.ppt
trials = simulation_data.trial_0
left = pd.DataFrame({
    "stimulus": left_option,
    "response": model_chosen_left,
    "trial": trials,
    "ppt": ppt
})
right = pd.DataFrame({
    "stimulus": right_option,
    "response": model_chosen_right,
    "trial": trials,
    "ppt": ppt
})
plot_data = pd.concat([left, right], axis=0).sort_values(by="trial")
grouped = plot_data.groupby(['stimulus', 'trial']).mean().reset_index()
grouped.columns = ['stimulus', 'trial', 'response', 'ppt']
## calculate rolling mean
grouped['response_mean'] = grouped.groupby('stimulus')['response'].transform(lambda x: x.rolling(window=10, min_periods=1).mean())
grouped['response_std'] = grouped.groupby('stimulus')['response'].transform(lambda x: x.rolling(window=10, min_periods=1).std())
grouped
| | stimulus | trial | response | ppt | response_mean | response_std |
|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 0.571429 | 13.071429 | 0.571429 | NaN |
| 1 | 1 | 2 | 0.454545 | 10.181818 | 0.512987 | 0.082649 |
| 2 | 1 | 3 | 0.333333 | 10.333333 | 0.453102 | 0.119054 |
| 3 | 1 | 4 | 0.538462 | 9.384615 | 0.474442 | 0.106164 |
| 4 | 1 | 5 | 0.250000 | 9.250000 | 0.429554 | 0.136117 |
| ... | ... | ... | ... | ... | ... | ... |
| 379 | 4 | 92 | 0.833333 | 10.166667 | 0.811681 | 0.103698 |
| 380 | 4 | 93 | 0.800000 | 10.266667 | 0.825014 | 0.090743 |
| 381 | 4 | 94 | 0.583333 | 9.333333 | 0.791681 | 0.112055 |
| 382 | 4 | 95 | 0.750000 | 10.875000 | 0.766681 | 0.085047 |
| 383 | 4 | 96 | 0.769231 | 12.461538 | 0.770877 | 0.083914 |

384 rows × 6 columns
import matplotlib.pyplot as plt
import seaborn as sns # Seaborn is used for easy color palette access
# Create a figure and a single subplot
fig, ax = plt.subplots(figsize=(10, 6))
# Get a list of unique stimuli to iterate over
unique_stimuli = grouped['stimulus'].unique()
# Generate a color palette with the same number of colors as there are unique stimuli
colors = sns.color_palette("deep", len(unique_stimuli))
# Iterate over each unique stimulus
for stimulus, color in zip(unique_stimuli, colors):
    # Filter the grouped DataFrame for the current stimulus
    df_subset = grouped[grouped['stimulus'] == stimulus]
    # Plot each subset with its corresponding color
    ax.plot(df_subset['trial'], df_subset['response_mean'], label=stimulus, color=color)
    # Plot the standard deviation as a shaded region around the mean
    ax.fill_between(df_subset['trial'],
                    df_subset['response_mean'] - df_subset['response_std'],
                    df_subset['response_mean'] + df_subset['response_std'],
                    color=color, alpha=0.2)
# Scale y axis to be between 0 and 1
ax.set_ylim([0, 1])
# Add a title and labels
ax.set_title('Model Choices Over Time')
ax.set_xlabel('Trial')
ax.set_ylabel('Probability of Choosing Item')
# Add legend to distinguish different lines
ax.legend(title='Stimulus')
# Add caption
plt.figtext(0.75, -0.05, '\n\nSimulated Data from bandit task with best fitting parameters', horizontalalignment='right')
# Show the plot
plt.show()