cpm.utils

cpm.utils.data.pandas_to_dict(df=None, participant='ppt', stimuli='stimulus', feedback='feedback', **kwargs)

Convert a pandas dataframe to a dictionary suitable for use with the CPM wrappers.

The dataframe should contain one column per stimulus, one column per feedback signal, and one column holding the participant identifier. Each row is a single trial, and each participant must have a unique number in the participant column.

Parameters:
  • df (pandas dataframe, default: None ) –

    The dataframe to convert

  • stimuli (str, default: 'stimulus' ) –

    The prefix for each stimulus column in the pandas DataFrame, by default "stimulus".

  • participant (str, default: 'ppt' ) –

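    The column name for the participant number, by default "ppt".

  • feedback (str, default: 'feedback' ) –

    The name (or prefix) of the feedback column(s) in the pandas DataFrame, by default "feedback".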

  • **kwargs (dict, default: {} ) –

    Any other keyword arguments to pass to the pandas DataFrame.

Returns:
  • list

    A list of dictionaries, each dictionary containing the stimuli and feedback for a single participant.
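The expected layout can be illustrated without pandas. The sketch below is not the library's implementation: it groups plain-Python rows (standing in for DataFrame rows) by participant and collects the columns whose names start with the stimulus and feedback prefixes. The function name `rows_to_participant_dicts` and the output keys `ppt`, `trials`, and `feedback` are assumptions made for illustration; the keys produced by `pandas_to_dict` may differ.

```python
from collections import defaultdict


def rows_to_participant_dicts(rows, participant="ppt", stimuli="stimulus", feedback="feedback"):
    """Group trial rows by participant (hypothetical helper, not part of cpm)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[participant]].append(row)

    output = []
    for ppt, trials in grouped.items():
        # Columns are selected by prefix, mirroring the 'stimuli' parameter description.
        stim_cols = sorted(k for k in trials[0] if k.startswith(stimuli))
        fb_cols = sorted(k for k in trials[0] if k.startswith(feedback))
        output.append({
            "ppt": ppt,  # assumed key name
            "trials": [[t[c] for c in stim_cols] for t in trials],  # assumed key name
            "feedback": [[t[c] for c in fb_cols] for t in trials],  # assumed key name
        })
    return output


# One row per trial, one participant identifier per row.
rows = [
    {"ppt": 1, "stimulus_1": 1, "stimulus_2": 0, "feedback": 1},
    {"ppt": 1, "stimulus_1": 0, "stimulus_2": 1, "feedback": 0},
    {"ppt": 2, "stimulus_1": 1, "stimulus_2": 1, "feedback": 1},
]
per_participant = rows_to_participant_dicts(rows)
print(len(per_participant))
```

The result is one dictionary per participant, each holding that participant's trials in their original order.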

cpm.utils.data.dict_to_pandas(dict)

Convert a dictionary to a pandas dataframe.

Parameters:
  • dict (dict) –

    The dictionary to convert.

Returns:
  • pandas( dataframe ) –

    The pandas dataframe converted from dict.

cpm.utils.metad.count_trials(data=pd.DataFrame, stimuli='Stimuli', responses='Responses', accuracy='Accuracy', confidence='Confidence', nRatings=4, padding=False, padAmount=None)

Convert raw behavioral data to nR_S1 and nR_S2 response counts.

Given data from an experiment where an observer discriminates between two stimulus alternatives on every trial and provides confidence ratings, converts trial-by-trial experimental information for N trials into response counts.

Parameters:
  • data (pandas.DataFrame) –

    Dataframe containing stimuli, accuracy and confidence ratings.

  • stimuli (str, default: 'Stimuli' ) –

    Stimuli ID (0 or 1). If a dataframe is provided, should be the name of the column containing the stimuli ID. Default is 'Stimuli'.

  • responses (str, default: 'Responses' ) –

    Response (0 or 1). If a dataframe is provided, should be the name of the column containing the responses. Default is 'Responses'.

  • accuracy (str, default: 'Accuracy' ) –

    Response accuracy (0 or 1). If a dataframe is provided, should be the name of the column containing the response accuracy. Default is 'Accuracy'.

  • confidence (str, default: 'Confidence' ) –

    Confidence ratings. If a dataframe is provided, should be the name of the column containing the confidence ratings. Default is 'Confidence'.

  • nRatings (int, default: 4 ) –

    Total number of subjective ratings available to the subject, e.g. if the subject can rate confidence on a scale of 1-4, then nRatings = 4. Default is 4.

  • padding (bool, default: False ) –

    If True, the value of padAmount is added to each response count in the output. Padding is desirable when trial counts of 0 interfere with model fitting. If False, trial counts are not manipulated and 0s may be present in the response count output. Default is False.

  • padAmount (float, default: None ) –

    The value to add to each response count if padding is True. Default is 1/(2*nRatings).

Returns:
  • nR_S1, nR_S2 :

    Vectors containing the total number of responses in each accuracy category, conditional on presentation of S1 and S2.

Notes

All trials where stimuli is not 0 or 1, accuracy is not 0 or 1, or confidence is not in the range [1, nRatings], are automatically omitted.

The inputs can be responses, accuracy, or both. If both responses and accuracy are provided, they are checked for consistency. If only accuracy is provided, the responses vector is inferred automatically.
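The relationship between stimuli, responses, and accuracy described above can be sketched in a few lines. This is an illustration under the assumption that stimuli and responses are both coded 0 or 1, so a correct trial is one where the response equals the stimulus; the helper names `infer_responses` and `is_consistent` are hypothetical and not part of cpm.

```python
def infer_responses(stimuli, accuracy):
    """Recover the response on each trial from the stimulus and the accuracy.

    A correct trial (accuracy == 1) means the response matched the stimulus;
    an incorrect trial means the other alternative was chosen.
    """
    return [s if a == 1 else 1 - s for s, a in zip(stimuli, accuracy)]


def is_consistent(stimuli, responses, accuracy):
    """Check that accuracy agrees with 'response equals stimulus' on every trial."""
    return all((s == r) == bool(a) for s, r, a in zip(stimuli, responses, accuracy))


stimuli = [0, 1, 0, 0, 1, 1, 1, 1]
accuracy = [0, 1, 1, 1, 0, 0, 1, 1]
responses = infer_responses(stimuli, accuracy)
print(responses)
print(is_consistent(stimuli, responses, accuracy))
```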

If nR_S1 = [100 50 20 10 5 1], then when stimulus S1 was presented, the subject had the following accuracy counts:

  • responded S1, confidence=3 : 100 times
  • responded S1, confidence=2 : 50 times
  • responded S1, confidence=1 : 20 times
  • responded S2, confidence=1 : 10 times
  • responded S2, confidence=2 : 5 times
  • responded S2, confidence=3 : 1 time

The ordering of accuracy / confidence counts for S2 should be the same as it is for S1, e.g. if nR_S2 = [3 7 8 12 27 89], then when stimulus S2 was presented, the subject had the following accuracy counts:

  • responded S1, confidence=3 : 3 times
  • responded S1, confidence=2 : 7 times
  • responded S1, confidence=1 : 8 times
  • responded S2, confidence=1 : 12 times
  • responded S2, confidence=2 : 27 times
  • responded S2, confidence=3 : 89 times
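The ordering and the omission rule in these notes can be made concrete with a small pure-Python sketch. It is not the cpm implementation: the function name `count_trials_sketch` and its list-based arguments are assumptions for illustration. It drops invalid trials, counts "responded S1" from highest to lowest confidence, then "responded S2" from lowest to highest, and optionally adds the padding value 1/(2*nRatings) to every cell.

```python
def count_trials_sketch(stimuli, responses, confidence, n_ratings=4,
                        padding=False, pad_amount=None):
    """Count responses per confidence level, conditional on the stimulus (illustrative)."""
    # Omit trials with invalid stimulus, response, or confidence values.
    valid = [
        (s, r, c)
        for s, r, c in zip(stimuli, responses, confidence)
        if s in (0, 1) and r in (0, 1) and 1 <= c <= n_ratings
    ]

    def count(stim, resp, conf):
        return sum(1 for trial in valid if trial == (stim, resp, conf))

    # Responded S1 with confidence n..1, then responded S2 with confidence 1..n.
    order = [(0, c) for c in range(n_ratings, 0, -1)]
    order += [(1, c) for c in range(1, n_ratings + 1)]

    nr_s1 = [count(0, resp, conf) for resp, conf in order]
    nr_s2 = [count(1, resp, conf) for resp, conf in order]

    if padding:
        if pad_amount is None:
            pad_amount = 1 / (2 * n_ratings)
        nr_s1 = [n + pad_amount for n in nr_s1]
        nr_s2 = [n + pad_amount for n in nr_s2]
    return nr_s1, nr_s2


stimuli = [0, 1, 0, 0, 1, 1, 1, 1]
responses = [1, 1, 0, 0, 0, 0, 1, 1]
confidence = [1, 2, 3, 4, 4, 3, 2, 1]
nr_s1, nr_s2 = count_trials_sketch(stimuli, responses, confidence, n_ratings=4)
print(nr_s1, nr_s2)
```

Each returned vector has 2 * nRatings entries, which is why the six-element examples above correspond to a three-point confidence scale.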

Examples:

>>> import pandas as pd
>>> from cpm.utils.metad import count_trials
>>> data = pd.DataFrame({
...     "Stimuli": [0, 1, 0, 0, 1, 1, 1, 1],
...     "Responses": [1, 1, 0, 0, 0, 0, 1, 1],
...     "Accuracy": [0, 1, 1, 1, 0, 0, 1, 1],
...     "Confidence": [1, 2, 3, 4, 4, 3, 2, 1],
... })
>>> nR_S1, nR_S2 = count_trials(data=data, nRatings=4)
>>> print(nR_S1, nR_S2)

Reference

This function is adapted from the Python version of trials2counts.m by Maniscalco & Lau [1] retrieved at: http://www.columbia.edu/~bsm2105/type2sdt/trials2counts.py

[1] Maniscalco, B., & Lau, H. (2012). A signal detection theoretic approach for estimating metacognitive sensitivity from confidence ratings. Consciousness and Cognition, 21(1), 422–430. https://doi.org/10.1016/j.concog.2011.09.021

cpm.utils.metad.bin_ratings(ratings=None, nbins=4, verbose=True, ignore_invalid=False)

Convert from continuous to discrete ratings.

If the quantiles are equal at the high or low end, the ratings are resampled to ensure proper assignment of binned confidence.

Parameters:
  • ratings (list | ndarray, default: None ) –

    Ratings on a continuous scale.

  • nbins (int, default: 4 ) –

    The number of discrete rating bins to resample into. Default is 4.

  • verbose (boolean, default: True ) –

    If True, warnings are issued.

  • ignore_invalid (bool, default: False ) –

    If False (default), an error is raised when the confidence ratings cannot be discretised. This is mostly due to identical values, in which case SDT values should not be extracted from the data. If True, the discretisation proceeds anyway. This option can be useful for plotting.

Returns:
  • discreteRatings( ndarray ) –

    New rating array only containing integers between 1 and nbins.

  • out( dict ) –

    Dictionary containing logs of the discretisation process:

    • 'confBins' (list or 1d array-like): if the ratings were resampled, a list containing the new ratings and the new low or high threshold, appended before or after the ratings, respectively. Otherwise, only the ratings.
    • 'rebin' (boolean): if True, the ratings were resampled due to a larger number of high or low ratings.
    • 'binCount' (int): number of bins.

Warning

This function automatically controls for bias in high or low confidence ratings. If the first two or the last two quantiles have identical values, low or high confidence trials are excluded (respectively), and the function is run again on the remaining data.

Raises:
  • ValueError:

    If the confidence ratings contain too many identical values and ignore_invalid is False.
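The core idea of quantile-based discretisation can be sketched in plain Python. This is a simplified illustration, not the cpm implementation: the names `linear_quantile` and `bin_ratings_sketch` are hypothetical, bin edges are assumed to be evenly spaced quantiles computed with linear interpolation, and a value sitting exactly on an interior edge is assigned to the higher bin. Where the documented function would exclude the tied trials and rerun, this sketch simply raises a ValueError.

```python
def linear_quantile(sorted_values, q):
    """Quantile with linear interpolation between neighbouring order statistics."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def bin_ratings_sketch(ratings, nbins=4):
    """Map continuous ratings to integers 1..nbins using quantile edges (illustrative)."""
    ordered = sorted(ratings)
    edges = [linear_quantile(ordered, b / nbins) for b in range(nbins + 1)]

    # Simplification: no resampling step, tied edge quantiles are treated as an error.
    if edges[0] == edges[1] or edges[-2] == edges[-1]:
        raise ValueError("identical quantiles at the low or high end")

    discrete = []
    for rating in ratings:
        label = None
        for b in range(nbins):
            if edges[b] <= rating <= edges[b + 1]:
                label = b + 1  # a later matching bin overwrites an earlier one
        discrete.append(label)
    return discrete, edges


ratings = [0, 10, 20, 30, 40, 50, 60, 70, 80]
discrete, edges = bin_ratings_sketch(ratings, nbins=4)
print(discrete)
print(edges)
```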

Examples:

>>> import numpy as np
>>> from cpm.utils.metad import bin_ratings
>>> ratings = np.array([
...     96, 98, 95, 90, 32, 58, 77,  6, 78, 78, 62, 60, 38, 12,
...     63, 18, 15, 13, 49, 26,  2, 38, 60, 23, 25, 39, 22, 33,
...     32, 27, 40, 13, 35, 16, 35, 73, 50,  3, 40, 0, 34, 47,
...     52,  0,  0,  0, 25,  1, 16, 37, 59, 20, 25, 23, 45, 22,
...     28, 62, 61, 69, 20, 75, 10, 18, 61, 27, 63, 22, 54, 30,
...     36, 66, 14,  2, 53, 58, 88, 23, 77, 54])
>>> discreteRatings, out = bin_ratings(ratings)
>>> discreteRatings, out
(array([4, 4, 4, 4, 2, 3, 4, 1, 4, 4, 4, 4, 3, 1, 4, 1, 1, 1, 3, 2, 1, 3,
    4, 2, 2, 3, 2, 2, 2, 2, 3, 1, 3, 1, 3, 4, 3, 1, 3, 1, 2, 3, 3, 1,
    1, 1, 2, 1, 1, 3, 3, 2, 2, 2, 3, 2, 2, 4, 4, 4, 2, 4, 1, 1, 4, 2,
    4, 2, 3, 2, 3, 4, 1, 1, 3, 3, 4, 2, 4, 3]),
{'confBins': array([ 0., 20., 35., 60., 98.]), 'rebin': 0, 'binCount': 21})