cpm.models
cpm.models.activation
CompetitiveGating(input=None, values=None, salience=None, P=1, **kwargs)
A competitive attentional gating function (an attentional activation function) that incorporates stimulus salience in addition to the stimulus vector to modulate the weights. It formalises the hypothesis that each stimulus has an underlying salience that competes to capture attentional focus (Paskewitz & Jones, 2020; Kruschke, 2001).
Parameters:
Examples:
>>> input = np.array([1, 1, 0])
>>> values = np.array([[0.1, 0.9, 0.8], [0.6, 0.2, 0.1]])
>>> salience = np.array([0.1, 0.2, 0.3])
>>> att = CompetitiveGating(input, values, salience, P = 1)
>>> att.compute()
array([[0.03333333, 0.6 , 0. ],
[0.2 , 0.13333333, 0. ]])
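The gating in this example can be reproduced with plain NumPy. The sketch below assumes the computation raises the salience of each active stimulus to the power P, normalises these gains so they sum to 1, and then scales each column of values; it is an illustration, not the library's own code.

import numpy as np

# Hypothetical sketch of the gating computation (not cpm's implementation):
# active saliences are raised to the power P, normalised, and applied to values.
stimulus = np.array([1, 1, 0])
values = np.array([[0.1, 0.9, 0.8], [0.6, 0.2, 0.1]])
salience = np.array([0.1, 0.2, 0.3])
P = 1

gain = (salience * stimulus) ** P
gain = gain / gain.sum()   # [1/3, 2/3, 0]
gated = values * gain      # roughly [[0.0333, 0.6, 0.], [0.2, 0.1333, 0.]], as above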
References
Kruschke, J. K. (2001). Toward a unified model of attention in associative learning. Journal of Mathematical Psychology, 45(6), 812-863.
Paskewitz, S., & Jones, M. (2020). Dissecting EXIT. Journal of Mathematical Psychology, 97, 102371.
compute()
Compute the activations mediated by underlying salience.
Returns:
Offset(input=None, offset=0, index=0, **kwargs)
A class for adding a scalar to one element of an input array. In practice, this can be used to "shift" or "offset" the "value" of one particular stimulus, for example to represent a consistent bias for (or against) that stimulus.
Parameters:
Examples:
>>> vals = np.array([2.1, 1.1])
>>> offsetter = Offset(input = vals, offset = 1.33, index = 0)
>>> offsetter.compute()
array([3.43, 1.1])
compute()
Add the offset to the requested input element.
Returns:
ProspectUtility(magnitudes=None, probabilities=None, alpha_pos=1, alpha_neg=None, lambda_loss=1, beta=1, delta=1, weighting='tk', **kwargs)
A class for computing choice utilities based on prospect theory.
Parameters:
Notes
The different weighting functions currently implemented are:
- `tk`: Tversky & Kahneman (1992).
- `pd`: Prelec (1998).
- `gw`: Gonzalez & Wu (1999).
Following Tversky & Kahneman (1992), the expected utility U of a choice option is defined as:
U = sum(w(p) * u(x)),
where w is a weighting function of the probability p of a potential outcome, and u is the utility function of the magnitude x of a potential outcome. These functions are defined as follows (equations 6 and 5, respectively, in Tversky & Kahneman, 1992, p. 309):
w(p) = p^beta / (p^beta + (1 - p)^beta)^(1/beta),
u(x) = ifelse(x >= 0, x^alpha_pos, -lambda * (-x)^alpha_neg),
where beta is the discriminability parameter of the weighting function; alpha_pos and alpha_neg are the risk attitude parameters in the gain and loss domains respectively, and lambda is the loss aversion parameter.
Several other definitions of the weighting function have been proposed in the literature, most notably in Prelec (1998) and Gonzalez & Wu (1999). Prelec (1998, equation 3.2, p. 503) proposed the following definition:
w(p) = exp(-delta * (-log(p))^beta),
where delta and beta are the attractiveness and discriminability parameters of the weighting function. Gonzalez & Wu (1999, equation 3, p. 139) proposed the following definition:
w(p) = (delta * p^beta) / ((delta * p^beta) + (1-p)^beta).
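As an illustration of these formulas, the sketch below implements the 'tk' weighting and utility functions in plain NumPy and evaluates the first prospect from the example below; the parameter values mirror that example, and the code is not the library's implementation.

import numpy as np

# Plain-NumPy sketch of the 'tk' weighting and utility functions defined above;
# illustrative only, not cpm's implementation.
alpha_pos, alpha_neg, lambda_loss, beta = 0.85, 0.85, 1.0, 0.9

def w_tk(p, beta):
    # Tversky & Kahneman (1992) probability weighting function
    return p**beta / (p**beta + (1 - p)**beta) ** (1 / beta)

def utility(x, alpha_pos, alpha_neg, lambda_loss):
    # Gains are raised to alpha_pos; losses to alpha_neg, scaled by loss aversion.
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    gains = x >= 0
    out[gains] = x[gains] ** alpha_pos
    out[~gains] = -lambda_loss * (-x[~gains]) ** alpha_neg
    return out

magnitudes = np.array([1.0, 40.0])
probabilities = np.array([0.95, 0.05])
U = np.sum(w_tk(probabilities, beta) * utility(magnitudes, alpha_pos, alpha_neg, lambda_loss))
# U is approximately 2.45, matching the first option in the example below.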
Examples:
>>> vals = np.array([np.array([1, 40]), np.array([10])], dtype=object)
>>> probs = np.array([np.array([0.95, 0.05]), np.array([1])], dtype=object)
>>> prospect = ProspectUtility(
magnitudes=vals, probabilities=probs, alpha_pos = 0.85, beta = 0.9
)
>>> prospect.compute()
array([2.44583162, 7.07945784])
References
Gonzalez, R., & Wu, G. (1999). On the shape of the probability weighting function. Cognitive Psychology, 38(1), 129-166.
Prelec, D. (1998). The probability weighting function. Econometrica, 497-527.
Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5, 297-323.
compute()
Compute the expected utility of each choice option.
Returns:
SigmoidActivation(input=None, weights=None, **kwargs)
Represents a sigmoid activation function.
Parameters:
compute()
Compute the activation value using the sigmoid function.
Returns:
cpm.models.decision
ChoiceKernel(temperature_activations=0.5, temperature_kernel=0.5, activations=None, kernel=None, **kwargs)
A class representing a choice kernel based on a softmax function that incorporates the frequency of choosing an action. It is based on Equation 7 in Wilson and Collins (2019).
Parameters:
Notes
In order to get Equation 6 from Wilson and Collins (2019), either set activations
to None (default) or set it to 0.
See Also
cpm.models.learning.KernelUpdate: A class representing a kernel update (Equation 5; Wilson and Collins, 2019) that updates the kernel values.
References
Wilson, R. C., & Collins, A. G. E. (2019). Ten simple rules for the computational modeling of behavioral data. eLife, 8, Article e49547.
Examples:
>>> activations = np.array([[0.1, 0, 0.2], [-0.6, 0, 0.9]])
>>> kernel = np.array([0.1, 0.9])
>>> choice_kernel = ChoiceKernel(temperature_activations=1, temperature_kernel=1, activations=activations, kernel=kernel)
>>> choice_kernel.compute()
array([0.44028635, 0.55971365])
GreedyRule(activations=None, epsilon=0, **kwargs)
A class representing an ε-greedy rule based on Daw et al. (2006).
Parameters:
Attributes:
References
Daw, N. D., O’Doherty, J. P., Dayan, P., Seymour, B., & Dolan, R. J. (2006). Cortical substrates for exploratory decisions in humans. Nature, 441(7095), Article 7095. https://doi.org/10.1038/nature04766
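No usage example is given for this class, so the sketch below illustrates a generic epsilon-greedy choice in plain NumPy: with probability 1 - epsilon the option with the highest activation is chosen, otherwise an option is drawn uniformly at random. It shows the rule in general, not necessarily cpm's exact tie-breaking or output format.

import numpy as np

# Generic epsilon-greedy choice (illustrative only, not cpm's implementation).
rng = np.random.default_rng(2024)
epsilon = 0.1
activations = np.array([0.1, 0.0, 0.2])

if rng.random() < epsilon:
    choice = int(rng.integers(len(activations)))  # explore: any option at random
else:
    choice = int(np.argmax(activations))          # exploit: highest activation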
choice()
Chooses the action based on the greedy rule.
Returns:
compute()
Computes the greedy rule.
Returns:
config()
Returns the configuration of the greedy rule.
Returns:
Sigmoid(temperature=None, activations=None, beta=0, **kwargs)
A class representing a sigmoid function that takes an n by m array of activations and returns an array of n outputs, where n is the number of outputs and m is the number of inputs.
The sigmoid function is defined as: 1 / (1 + e^(-temperature * (x - beta))).
Parameters:
Examples:
>>> from cpm.models.decision import Sigmoid
>>> import numpy as np
>>> temperature = 7
>>> activations = np.array([[0.1, 0.2]])
>>> sigmoid = Sigmoid(temperature=temperature, activations=activations)
>>> sigmoid.compute()
array([[0.66818777, 0.80218389]])
choice()
Chooses the action based on the sigmoid function.
Returns:
Notes
The choice is based on the probabilities of the sigmoid function, but it is not guaranteed that the policy values will sum to 1. Therefore, the policies are normalised to sum to 1 when generating a discrete choice.
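A minimal sketch of that normalisation in plain NumPy (illustrative only, not the library's code): the sigmoid outputs are rescaled to sum to 1 before a discrete choice is drawn.

import numpy as np

# Normalise sigmoid outputs (which need not sum to 1) before sampling a choice.
rng = np.random.default_rng(0)
policies = np.array([0.66818777, 0.80218389])  # sigmoid outputs from the example above
probabilities = policies / policies.sum()      # now sums to 1
choice = rng.choice(len(probabilities), p=probabilities)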
compute()
Computes the Sigmoid function.
Returns:
Softmax(temperature=None, xi=None, activations=None, **kwargs)
Softmax class for computing policies based on activations and temperature.
The softmax function is defined as: e^(temperature * x) / sum(e^(temperature * x)).
Parameters:
Notes
The inverse temperature parameter beta represents the degree of randomness in the choice process. As beta approaches positive infinity, choices become more deterministic, such that the option with the greatest activation is most likely to be chosen (the function approximates a step function). By contrast, as beta approaches zero, choices become random (i.e., the probabilities of the choice options are approximately equal) and therefore independent of the options' activations.
activations must be a 2D array, where each row represents an outcome and each column represents a stimulus or other arbitrary features and variables. If multiple values are provided for each outcome, the softmax function sums them. Note that if you have one value for each outcome (i.e., a classical bandit-like problem) and you represent it as a 1D array, you must reshape it into the format specified for activations: if you have 3 stimuli that are all actionable, [0.1, 0.5, 0.22], you should pass a 2D array of shape (3, 1), [[0.1], [0.5], [0.22]]. You can see Example 2 for a demonstration, as well as the sketch after the examples below.
Examples:
>>> from cpm.models.decision import Softmax
>>> import numpy as np
>>> temperature = 5
>>> activations = np.array([0.1, 0, 0.2])
>>> softmax = Softmax(temperature=temperature, activations=activations)
>>> softmax.compute()
array([0.30719589, 0.18632372, 0.50648039])
>>> softmax.choice() # This will randomly choose one of the actions based on the computed probabilities.
2
>>> Softmax(temperature=temperature, activations=activations).compute()
array([0.30719589, 0.18632372, 0.50648039])
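The reshaping described in the Notes might look like the following; this is a sketch that assumes the (number of outcomes, number of values) layout described above, and the rough probabilities in the comment follow from the softmax formula rather than from running the library.

import numpy as np
from cpm.models.decision import Softmax

# Reshape a 1D array of three bandit-like action values into the 2D layout
# described in the Notes: one row per outcome.
action_values = np.array([0.1, 0.5, 0.22])
activations = action_values.reshape(3, 1)   # [[0.1], [0.5], [0.22]]
policy = Softmax(temperature=5, activations=activations).compute()
# With temperature 5 the formula gives roughly [0.10, 0.72, 0.18].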
choice()
Choose an action based on the computed policies.
Returns:
compute()
Compute the policies based on the activations and temperature.
Returns:
irreducible_noise()
Extended softmax computation of policies based on activations, with inverse temperature and irreducible noise parameters.
The softmax function with irreducible noise is defined as:
(e^(beta * x) / sum(e^(beta * x))) * (1 - xi) + (xi / length(x)),
where x is the input array of activations, beta is the inverse temperature parameter, and xi is the irreducible noise parameter.
Notes
The irreducible noise parameter xi accounts for attentional lapses in the choice process. Specifically, the terms (1 - xi) and xi/length(x) proportionally scale the choice probabilities towards 1/length(x). Relatively speaking, this increases the probability that an option is selected if its activation is exceptionally low. This may seem counterintuitive in theory, but in practice it enables the model to capture highly surprising responses that can occur during attentional lapses.
Returns:
Examples:
>>> activations = np.array([[0.1, 0, 0.2], [-0.6, 0, 0.9]])
>>> noisy_softmax = Softmax(temperature=1.5, xi=0.1, activations=activations)
>>> noisy_softmax.irreducible_noise()
array([0.4101454, 0.5898546])
cpm.models.learning
DeltaRule(alpha=None, zeta=None, weights=None, feedback=None, input=None, **kwargs)
DeltaRule class computes the prediction error for a given input and target value.
Parameters:
See Also
cpm.models.learning.SeparableRule : A class representing a learning rule based on the separable error-term of Bush and Mosteller (1951).
Notes
The delta rule uses a summed error term: the error is defined as the difference between the target value and the summed activation of all stimuli present on the current trial/state for a given output unit. For a separable error term, see the Bush and Mosteller (1951) rule.
The current implementation is based on Gluck and Bower's (1988) delta rule, an extension of the Rescorla and Wagner (1972) learning rule to multi-outcome learning.
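To make the summed error term concrete, the sketch below reproduces the first example that follows using plain NumPy; it illustrates the update and is not the library's code.

import numpy as np

# Summed-error (delta) rule: errors are computed against the summed prediction.
alpha = 0.1
weights = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])  # outcomes x stimuli
feedback = np.array([1, 0])                              # teaching signal per outcome
stimuli = np.array([1, 1, 0])                            # active stimuli on this trial

prediction = weights @ stimuli            # summed activation per outcome: [0.3, 0.9]
error = feedback - prediction             # summed error term: [0.7, -0.9]
delta = alpha * np.outer(error, stimuli)  # weight change, same shape as weights
# delta -> [[0.07, 0.07, 0.], [-0.09, -0.09, -0.]]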
Examples:
>>> import numpy as np
>>> from cpm.models.learning import DeltaRule
>>> weights = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
>>> teacher = np.array([1, 0])
>>> input = np.array([1, 1, 0])
>>> delta_rule = DeltaRule(alpha=0.1, zeta=0.1, weights=weights, feedback=teacher, input=input)
>>> delta_rule.compute()
array([[ 0.07, 0.07, 0. ],
[-0.09, -0.09, -0. ]])
>>> delta_rule.noisy_learning_rule()
array([[ 0.05755793, 0.09214091, 0.],
[-0.08837513, -0.1304325 , 0.]])
This implementation generalises to n-dimensional matrices, which means that it can be applied to both single- and multi-outcome learning paradigms.
>>> weights = np.array([0.1, 0.6, 0., 0.3])
>>> teacher = np.array([1])
>>> input = np.array([1, 1, 0, 0])
>>> delta_rule = DeltaRule(alpha=0.1, weights=weights, feedback=teacher, input=input)
>>> delta_rule.compute()
array([[0.03, 0.03, 0. , 0. ]])
References
Gluck, M. A., & Bower, G. H. (1988). From conditioning to category learning: An adaptive network model. Journal of Experimental Psychology: General, 117(3), 227–247.
Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64-99). New York: Appleton-Century-Crofts.
Widrow, B., & Hoff, M. E. (1960, August). Adaptive switching circuits. In IRE WESCON convention record (Vol. 4, No. 1, pp. 96-104).
compute()
Compute the prediction error using the delta learning rule. It is based on Gluck and Bower's (1988) delta rule, an extension of Rescorla and Wagner (1972), which is formally identical to the rule of Widrow and Hoff (1960).
Returns:
noisy_learning_rule()
Add random noise to the prediction error computed from the delta learning rule, as specified by Findling et al. (2019). It is inspired by Weber's law of intensity sensation.
Returns:
References
Findling, C., Skvortsova, V., Dromnelle, R., Palminteri, S., and Wyart, V. (2019). Computational noise in reward-guided learning drives behavioral variability in volatile environments. Nature Neuroscience 22, 2066–2077
reset()
Reset the weights to zero.
HumbleTeacher(alpha=None, weights=None, feedback=None, input=None, **kwargs)
A humble teacher learning rule (Kruschke, 1992; Love, Gureckis, and Medin, 2004) for multi-dimensional outcome learning.
Attributes:
Parameters:
Notes
The humble teacher learning rule is based on the idea that output-node activations larger than the teaching signal should not be counted as errors but should instead be rewarded. The humble teacher therefore treats teaching signals as discrete (nominal) values: they do not indicate the degree of membership between stimuli and outcome label, the degree of causality between stimuli and outcome, or the degree of correctness of the output.
References
Kruschke, J. K. (1992). ALCOVE: An exemplar-based connectionist model of category learning. Psychological Review, 99, 22–44.
Examples:
>>> import numpy as np
>>> from cpm.models.learning import HumbleTeacher
>>> weights = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
>>> teacher = np.array([0, 1])
>>> input = np.array([1, 1, 1])
>>> humble_teacher = HumbleTeacher(alpha=0.1, weights=weights, feedback=teacher, input=input)
>>> humble_teacher.compute()
array([[-0.06, -0.06, -0.06],
[ 0. , 0. , 0. ]])
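The output above is consistent with a teaching signal that is clamped at the activation whenever the output already overshoots it in the appropriate direction. The plain-NumPy sketch below illustrates that reading; it is an assumption about the computation, not the library's code.

import numpy as np

# Hypothetical humble-teacher update: the target is clamped at the activation
# when the output already exceeds the nominal teaching signal.
alpha = 0.1
weights = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
feedback = np.array([0, 1])
stimuli = np.array([1, 1, 1])

activation = weights @ stimuli                       # [0.6, 1.5]
target = np.where(feedback > 0,
                  np.maximum(activation, feedback),  # present outcome: at least the signal
                  np.minimum(activation, feedback))  # absent outcome: at most the signal
delta = alpha * np.outer(target - activation, stimuli)
# delta -> [[-0.06, -0.06, -0.06], [0., 0., 0.]], matching the example above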
compute()
Compute the weight updates using the humble teacher learning rule.
Returns:
KernelUpdate(response, alpha, kernel, input, **kwargs)
A class representing a learning rule for updating the choice kernel as specified by Equation 5 in Wilson and Collins (2019).
Parameters:
Notes
The kernel update component is used to represent how likely a given response is to be chosen based on the frequency it was chosen in the past. This can then be integrated into a choice kernel decision policy.
See Also
cpm.models.decision.ChoiceKernel : A class representing a choice kernel decision policy.
References
Wilson, R. C., & Collins, A. G. E. (2019). Ten simple rules for the computational modeling of behavioral data. eLife, 8, Article e49547.
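Since no example is shown here, the sketch below illustrates the choice-kernel update of Equation 5 in Wilson and Collins (2019) in plain NumPy, assuming the response is coded as a one-hot vector; it is an illustration, not necessarily cpm's exact parameterisation.

import numpy as np

# Choice-kernel update: the kernel drifts towards the chosen response.
alpha = 0.1
kernel = np.array([0.1, 0.9])   # current kernel value per response option
response = np.array([1, 0])     # assumed one-hot coding of the chosen response

updated = kernel + alpha * (response - kernel)
# updated -> [0.19, 0.81]: the chosen option moves towards 1, the other towards 0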
compute()
Compute the change in the kernel based on the given response, rate, and kernel, and return the updated kernel.
Returns:
config()
Get the configuration of the kernel update component.
Returns:
QLearningRule(alpha=0.5, gamma=0.1, values=None, reward=None, maximum=None, *args, **kwargs)
Q-learning rule (Watkins, 1989) for a one-dimensional array of Q-values.
Parameters:
Notes
The Q-learning rule is a model-free reinforcement learning algorithm that is used to learn the value of an action in a given state. It is defined as
Q(s, a) = Q(s, a) + alpha * (r + gamma * max(Q(s', a')) - Q(s, a)),
where Q(s, a) is the value of action a in state s, r is the reward received in the current state, gamma is the discount factor, and max(Q(s', a')) is the maximum estimated reward for the next state.
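For instance, with alpha = 0.1, gamma = 0.8, Q(s, a) = 1, r = 1 and max(Q(s', a')) = 10 (the first entry of the example below), the update is 1 + 0.1 * (1 + 0.8 * 10 - 1) = 1.8.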
Examples:
>>> import numpy as np
>>> from cpm.models.learning import QLearningRule
>>> values = np.array([1, 0.5, 0.99])
>>> component = QLearningRule(alpha=0.1, gamma=0.8, values=values, reward=1, maximum=10)
>>> component.compute()
array([1.8 , 1.35 , 1.791])
References
Watkins, C. J. C. H. (1989). Learning from delayed rewards.
Watkins, C. J., & Dayan, P. (1992). Q-learning. Machine learning, 8, 279-292.
compute()
Compute the change in values based on the given values, reward, and parameters, and return the updated values.
Returns:
SeparableRule(alpha=None, zeta=None, weights=None, feedback=None, input=None, **kwargs)
A class representing a learning rule based on the separable error-term of Bush and Mosteller (1951).
Parameters:
See Also
cpm.models.learning.DeltaRule : An extension of the Rescorla and Wagner (1972) learning rule by Gluck and Bower (1988) to allow multi-outcome learning.
Notes
This type of learning rule was among the earliest formal models of associative learning (Le Pelley, 2004), which were based on standard linear operators (Bush & Mosteller, 1951; Estes, 1950; Kendler, 1971).
References
Bush, R. R., & Mosteller, F. (1951). A mathematical model for simple learning. Psychological Review, 58, 313–323
Estes, W. K. (1950). Toward a statistical theory of learning. Psychological Review, 57, 94–107
Kendler, T. S. (1971). Continuity theory and cue dominance. In J. T. Spence (Ed.), Essays in neobehaviorism: A memorial volume to Kenneth W. Spence. New York: Appleton-Century-Crofts.
Le Pelley, M. E. (2004). The role of associative history in models of associative learning: A selective review and a hybrid model. Quarterly Journal of Experimental Psychology Section B, 57(3), 193-243.
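No example is given for this rule, so the sketch below contrasts a separable (per-association) error term with the summed error of the delta rule, in plain NumPy; it is an illustration, not necessarily cpm's exact parameterisation.

import numpy as np

# Separable error term in the spirit of Bush and Mosteller (1951): each weight is
# compared against its own outcome's teaching signal rather than a summed prediction.
alpha = 0.1
weights = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])  # outcomes x stimuli
feedback = np.array([1, 0])                              # teaching signal per outcome
stimuli = np.array([1, 1, 0])                            # active stimuli

error = feedback[:, None] - weights        # per-association error
delta = alpha * error * stimuli[None, :]   # weight change, same shape as weights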
compute()
Computes the prediction error using the learning rule.
Returns:
ndarray: The prediction error for each stimulus-outcome mapping. It has the same shape as the weights input argument.
noisy_learning_rule()
Add random noise to the prediction error computed from the learning rule, as specified by Findling et al. (2019). It is inspired by Weber's law of intensity sensation.
Returns:
References
Findling, C., Skvortsova, V., Dromnelle, R., Palminteri, S., and Wyart, V. (2019). Computational noise in reward-guided learning drives behavioral variability in volatile environments. Nature Neuroscience 22, 2066–2077
reset()
Resets the weights to zero.