In [1]:
import cpm
from packaging import version

## cpm checks
print(cpm.__version__)
if version.parse(cpm.__version__) < version.parse("0.22"):
    raise ImportError("cpm version must be >= 0.22. Please install the latest version using: pip install --upgrade cpm")
0.24.6

Model Recovery and Landscaping (cpm.models, cpm.generators, cpm.optimisation)¶

We have already run the model-recovery example; the script can be found in the model-recovery-run.py file. In this example, we will import the results and inspect them visually.

Import results¶

In [3]:
import pandas as pd
import numpy as np

## plotting libraries
import matplotlib.pyplot as plt
import seaborn as sns

data = pd.read_csv("model-recovery-results.csv")
In [4]:
data = data[["target", "model", "run", "fun"]]
data
Out[4]:
target model run fun
0 delta delta 0 23.059331
1 delta delta 0 26.683399
2 delta delta 0 37.170220
3 delta delta 0 33.120735
4 delta delta 0 35.196873
... ... ... ... ...
21595 anticorrelated anticorrelated 99 38.584292
21596 anticorrelated anticorrelated 99 33.349922
21597 anticorrelated anticorrelated 99 25.438968
21598 anticorrelated anticorrelated 99 60.340156
21599 anticorrelated anticorrelated 99 26.766203

21600 rows × 4 columns
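Each row records fun, the optimiser's objective value (a negative log-likelihood, as the sign flip in the landscaping code below confirms) for fitting model to data generated by target in simulation run. A quick structural check, not part of the original notebook:

## Sanity check: distinct simulation runs per (target, model) pair.
print(data.groupby(['target', 'model'])['run'].nunique())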

In [5]:
results = data.groupby(["target", "run", "model"]).sum().reset_index()
## For each (target, run) pair, find which model had the lowest fun
results['lowest_fun_model'] = results.groupby(['target', 'run'])['fun'].transform(lambda x: x.idxmin())
results['is_lowest'] = results.index == results['lowest_fun_model']
results[['target', 'run', 'model', 'fun', 'is_lowest']]
Out[5]:
target run model fun is_lowest
0 anticorrelated 0 anticorrelated 866.239182 True
1 anticorrelated 0 delta 1038.793504 False
2 anticorrelated 0 kernel 1051.874477 False
3 anticorrelated 1 anticorrelated 925.886793 True
4 anticorrelated 1 delta 1073.461200 False
... ... ... ... ... ...
895 kernel 98 delta 1082.442799 True
896 kernel 98 kernel 1084.175883 False
897 kernel 99 anticorrelated 965.265362 False
898 kernel 99 delta 866.880823 False
899 kernel 99 kernel 859.026212 True

900 rows × 5 columns
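As an aside, the same flag can be computed more directly by comparing each summed fit against the minimum within its (target, run) group; a sketch equivalent to the cell above:

## Equivalent one-liner: True where fun equals the group minimum.
results['is_lowest_alt'] = results['fun'] == results.groupby(['target', 'run'])['fun'].transform('min')
assert (results['is_lowest'] == results['is_lowest_alt']).all()  # identical unless a group has tied minima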

In [6]:
confusion_matrix = (
    results[results.is_lowest]
    .groupby(['target']).model.value_counts()
    .astype('double')
    .reset_index()
    .pivot(index='target', columns='model', values='count')
    .fillna(0)
    .astype(int)
    .reset_index()
)
plt.figure(figsize=(8, 6))
sns.heatmap(
    confusion_matrix.set_index('target'),
    annot=True,
    fmt='d',
    cmap='cividis',
    annot_kws={"size": 12}
)
plt.figure(figsize=(8, 6))
sns.heatmap(
    confusion_matrix.set_index('target'),
    annot=True,
    fmt='d',
    cmap='cividis',
    annot_kws={"size": 12}
)
plt.title('Model Recovery Confusion Matrix\n', fontsize=18)
plt.ylabel('True Model (Target)', fontsize=16, labelpad=10)
plt.xlabel('Recovered Model', fontsize=16, labelpad=10)
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.show()
[Heatmap: Model Recovery Confusion Matrix. Rows: True Model (Target); columns: Recovered Model; cells: number of runs won.]
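To read the matrix as recovery rates rather than counts, each row can be normalised by the number of runs per data-generating model (100 here); a small sketch, not part of the original notebook:

## Convert counts to row-wise proportions: each row sums to the number of runs.
proportions = confusion_matrix.set_index('target')
proportions = proportions.div(proportions.sum(axis=1), axis=0)
print(proportions.round(2))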

Landscaping analysis¶

Landscaping (Navarro et al., 2004) is a technique for systematically exploring how well different models perform on a dataset generated by one of them. It belongs to a family of global model analysis techniques designed to step back from a particular data sample and focus on the behaviour of the models as a whole. Landscaping adds another dimension to fitting data by deepening our understanding of both the models and the data.

At its core, landscaping provides a visual representation of how two models relate to each other and to experimental data. One model is fitted to many datasets generated by that same model under a specific experimental design, as sketched below; another model is then fitted to the same datasets, and the fits from both models are plotted against each other. The resulting scatterplot is known as a landscape.
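In outline, the procedure implemented in model-recovery-run.py looks roughly like the following sketch. The simulate and fit helpers stand in for the cpm.generators and cpm.optimisation machinery; they are placeholders for illustration, not the library's actual API:

## Schematic of the model-recovery/landscaping loop (hypothetical helpers;
## the real implementation lives in model-recovery-run.py).
records = []
for target in models:                       # data-generating model
    for run in range(n_runs):               # independent simulated datasets
        dataset = simulate(target, design)  # generate data under the design
        for model in models:                # fit every candidate model
            fun = fit(model, dataset)       # negative log-likelihood at the optimum
            records.append({"target": target, "model": model, "run": run, "fun": fun})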

In [18]:
landscaping = data
pivoted = landscaping.pivot_table(index=['target', 'run'], columns='model', values='fun').reset_index()
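pivoted now has one row per (target, run) and one column of fun values per model (means per run, since pivot_table averages over the individual fits). A quick look, not in the original notebook:

## Inspect the reshaped table: one row per (target, run), one column per model.
print(pivoted.head())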
In [55]:
model_names = ['anticorrelated', 'delta', 'kernel']
num_models = len(model_names)

for index, target in enumerate(results['target'].unique()):
    subset = pivoted[pivoted['target'] == target]
    fig, axes = plt.subplots(num_models, num_models, figsize=(3*num_models, 3*num_models), squeeze=False)
    for i, x_model in enumerate(model_names):
        for j, y_model in enumerate(model_names):
            ax = axes[i, j]
            if x_model == y_model:
                ax.text(0.5, 0.5, 'X', fontsize=60, color='gray', ha='center', va='center', alpha=0.3, transform=ax.transAxes)
                ax.axis('off')
                ax.set_title(f"{x_model} vs {y_model}\n(same model)", fontsize=14)
                continue
            x = -subset[x_model]  # flip the sign: fun stores negative log-likelihoods
            y = -subset[y_model]
            distance = np.abs(x - y)  # colour points by how far apart the fits are
            ax.scatter(x, y, c=distance, cmap='viridis', alpha=0.7, s=10)
            min_val = min(x.min(), y.min())
            max_val = max(x.max(), y.max())
            ax.plot([min_val, max_val], [min_val, max_val], color='red', linestyle='--', linewidth=2)
            ax.set_xlabel(f"{x_model} LL", fontsize=12)
            ax.set_ylabel(f"{y_model} LL", fontsize=12)
            ax.set_title(f"{x_model} vs {y_model}\n", fontsize=14)
            # Set same limits and ticks
            ax.set_xlim(min_val-1, max_val + 1)
            ax.set_ylim(min_val-1, max_val + 1)
            ticks = np.linspace(int(np.floor(min_val)), int(np.ceil(max_val)), num=5, dtype=int)
            ax.set_xticks(ticks)
            ax.set_yticks(ticks)
    fig.suptitle(f"\nFigure {index + 1}: data-generating model {target}", fontsize=18, y=1.02)
    plt.tight_layout(rect=[0, 0, 1, 0.97])
    plt.show()
[Figure 1: landscape scatterplot grid, data-generating model anticorrelated.]
[Figure 2: landscape scatterplot grid, data-generating model delta.]
[Figure 3: landscape scatterplot grid, data-generating model kernel.]

Let us look at the results in Figure 3. The first thing to notice is that we are now plotting log-likelihoods rather than negative log-likelihoods, so the bigger the number, the better the fit. The data-generating model is the kernel model, so we expect the fits to move towards the kernel axis and away from the anticorrelated and delta models.

The bottom-left panel of Figure 3 shows the anticorrelated model fit plotted against the kernel model fit. There is a large shift away from the diagonal line, which indicates that the two models are qualitatively distinguishable and that the data are informative enough to decide between them meaningfully. As expected, the kernel model wins over the anticorrelated model.

On the other hand, consider the bottom-middle panel of Figure 3. The kernel and delta models are very similar: the fits lie close to the diagonal line, with only a slight shift towards the kernel model (i.e., the kernel model fit slightly better). This suggests there is little chance of finding data that discriminate between these two models.
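To put a number on that shift, one can average the per-run log-likelihood differences on the kernel-generated datasets; a small sketch, not part of the original notebook:

## Mean log-likelihood advantage of the kernel model on kernel-generated data.
kernel_runs = pivoted[pivoted['target'] == 'kernel']
ll = -kernel_runs[model_names]  # flip sign back to log-likelihoods
print((ll['kernel'] - ll['delta']).mean())
print((ll['kernel'] - ll['anticorrelated']).mean())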

References¶

Navarro, D. J., Pitt, M. A., & Myung, I. J. (2004). Assessing the distinguishability of models and the informativeness of data. Cognitive Psychology, 49(1), 47–84. https://doi.org/10.1016/j.cogpsych.2003.11.001

Appendix: Model Descriptions¶
