RL Modelling Tutorial¶
Welcome to the avoidance learning reinforcement learning models repo. This repo was built for the PEAC lab to computationally model empirical data from the various avoidance learning tasks. It can fit RL models to empirical data and validate these models through parameter and model recovery methods.
Note that this tutorial is designed to run in Google Colab rather than from the repo itself (since it clones the repo).
Supported models
QLearning, ActorCritic
Relative, Advantage
Hybrid2012, Hybrid2021
StandardHybrid2012, StandardHybrid2021
Standard models
Standard models are implemented as described in the corresponding citation that introduces each model, with the addition of counterfactual learning rates.
The Hybrid models have alternative versions without counterfactual learning rates: StandardHybrid2012 and StandardHybrid2021.
QLearning: Standard Q-Learning Model
ActorCritic: Standard Actor-Critic Model
Relative: Standard Relative Model (Palminteri et al., 2015)
Advantage: Simplified Relative Model (Williams et al., 2025)
Hybrid2012+bias: Hybrid 2012 Model (Gold et al., 2012)
Hybrid2021+bias+decay: Hybrid 2021 Model (Geana et al., 2021)
Optional Parameters
You can add optional parameters to a model by appending them to the model name using a + sign (see the example after this list):
+bias: Adds a valence bias to the model (e.g. Hybrid2012+bias), only usable with Hybrid models
+novel: Adds a free parameter for the novel stimulus (e.g. QLearning+novel), usable with all models
+decay: Adds a decay parameter to the model (e.g. QLearning+decay), usable with all models
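For illustration, here are a few hypothetical model specifications showing how optional parameters are appended and chained (the exact set you use will depend on your research question):
# Optional parameters are appended to the base model name with a '+', and can be chained
example_models = [
    'QLearning+novel',        # novel-stimulus parameter (usable with all models)
    'ActorCritic+decay',      # decay parameter (usable with all models)
    'Hybrid2012+bias',        # valence bias (Hybrid models only)
    'Hybrid2021+bias+decay',  # several optional parameters chained together
]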
Project Pipeline¶
This repo is one part of a project pipeline, which requires the coordination of multiple repos. Projects begin with a task repo, which is used to collect behavioural data from participants, either locally or on Prolific. The collected data must then be pushed through a data extraction repo to prepare CSV files for analysis. These CSV files are used in the analysis repo, which creates a PDF report (AL/reports), ending the project pipeline.
Optionally, you can run computational reinforcement learning models using the modelling repo (this repo) and add the results to the report. This step is a little clunky because it requires some back-and-forth between the analysis repo and this modelling repo. Specifically, the analysis repo must first be run (with load_models=False) to create the two CSV files that this repo needs (AL/data/pain_learning_processed.csv and AL/data/pain_transfer_processed.csv). These files must then be manually moved into this repo's data directory (RL/data). This repo can then be used to model the data, which creates a new directory called modelling (RL/modelling). This folder must then be manually moved to the analysis repo as AL/modelling. Finally, re-run the analysis repo (with load_models=True) and the modelling results will be included in the PDF report.
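As a rough sketch of the manual file movement described above (the repo locations below are placeholders; you can just as easily copy the files by hand):
import shutil
from pathlib import Path

# Hypothetical local checkouts of the two repos; adjust these paths to your setup
analysis_repo = Path('~/projects/avoid_learning_analysis').expanduser()
rl_repo = Path('~/projects/avoid_learning_rl_models').expanduser()

# 1) Copy the CSVs produced by the analysis repo (run with load_models=False) into RL/data
for name in ['pain_learning_processed.csv', 'pain_transfer_processed.csv']:
    shutil.copy(analysis_repo / 'AL' / 'data' / name, rl_repo / 'RL' / 'data' / name)

# 2) Run the modelling pipeline (this repo), which creates RL/modelling

# 3) Copy the results back into the analysis repo as AL/modelling,
#    then re-run the analysis repo with load_models=True
shutil.copytree(rl_repo / 'RL' / 'modelling', analysis_repo / 'AL' / 'modelling', dirs_exist_ok=True)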
Cloning the Repo¶
We will begin by cloning the repo, installing dependencies, and then adding the repo to the system path. Adding the repo to the system path is only necessary for this tutorial. We also change directory into the repo. When using the repo locally, you can create your script in the AL source folder, in the same manner as AL_main.py (avoid_learning_analysis/AL/AL_main.py).
import sys
import os
# We will now clone the repo, pull any updates, and install dependencies
!git clone https://github.com/petzschnerlab/avoid_learning_rl_models.git
%cd avoid_learning_rl_models/
!git pull
!pip install .
# Only necessary for Google Colab
sys.path.insert(0, os.path.abspath("/content/avoid_learning_rl_models/RL"))
The Pipeline¶
Next, we will import the Pipeline class. This class is the entry point to this repo. It will take in all of your parameters and run the corresponding analyses.
from helpers.pipeline import Pipeline
The Help Function¶
The pipeline has a help function that outlines information about the repo and then describes all of the parameters. These details are also available in the documentation. We will use the help=True parameter to display this help information below.
This parameter can be passed to the Pipeline during initialization:
pipeline = Pipeline(help=True)
or to the run method of the class:
pipeline = Pipeline()
pipeline.run(help=True)
The help information gets truncated in Jupyter notebooks, but you can view the whole output by clicking scrollable element.
pipeline = Pipeline(help=True)
Fitting Models¶
There are two modes in this repo, fit and validation. In FIT mode, models are fitted to empirical data. In VALIDATION mode, parameter recovery or model recovery is performed, depending on the recovery parameter. We will begin by fitting several RL models to our empirical data.
We will define a typical set of parameters for this package below; see the help information above to understand what each parameter does. In this example, we extract our priors using the fixed_priors function. We will be using the learning and transfer filenames tutorial_learning_data.csv and tutorial_transfer_data.csv, respectively, but the default filenames exported by the analysis repo are pain_learning_processed.csv and pain_transfer_processed.csv. Finally, it is highly recommended to use multiprocessing if possible, as we do here (multiprocessing=True). When you have many participants, many models, and many runs per model, fitting will take a very long time without multiprocessing (and still a good amount of time with it).
A final note: we set a seed when initializing the pipeline to ensure replicability, because this repo uses randomization.
from helpers.priors import fixed_priors
models = [
'QLearning+novel',
'ActorCritic+novel',
'Advantage+novel',
]
fixed = fixed_priors(models)
fit_params = {
'mode': 'fit',
'learning_filename': 'RL/data/tutorial_learning_data.csv',
'transfer_filename': 'RL/data/tutorial_transfer_data.csv',
'models': models,
'random_params': 'normal',
'fixed': fixed,
'multiprocessing': True,
}
pipeline = Pipeline(seed=1251)
pipeline.run(**fit_params)
Fitting Results¶
Now that we have finished fitting the models to our data, we can look at a couple of plots to see the results. After the fitting procedure, a new folder, RL/modelling, is created. This folder contains all important results (statistics, plots, etc.). The intent is to move this folder to the analysis repo so that your modelling results can be added to your PDF report. For this tutorial we will look at a couple of plots to view our results.
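If you would like a quick overview of everything that was produced, you can simply list the contents of the new folder (the exact files depend on the models and options you ran):
import os

# Walk the modelling output directory and print every file it contains
for root, _, files in os.walk('RL/modelling'):
    for f in sorted(files):
        print(os.path.join(root, f))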
Let's begin by viewing the BIC plot, which will tell us which model fit the data best. Remember, the lower this value, the better the fit.
from IPython.display import Image, display
display(Image(filename='RL/modelling/BIC-model-comparisons.png'))
caption = (
'Model selection using Bayesian information criterion (BIC). Grand averaged BIC metrics for each model across all participants. '
'Lower values indicate better model fit. The black border indicates the best model.'
)
print(caption)
We can also view the fits in the BIC table, which will also show us the best-fitting model for each group.
import pandas as pd
BIC_fits = pd.read_csv('RL/modelling/group_BIC.csv')
BIC_fits
The fitting procedure also runs model simulations using the fitted parameters of each participant. We can view these simulation results as well; below, we view the simulations of the best model.
best_model = BIC_fits['best_model'].values[-1]
print(best_model)
filename = f'RL/modelling/model_behaviours/{best_model}_model_behaviours.png'
display(Image(filename, width=800, height=600))
caption = (
'Posterior Predictive checks of the best-fitted computational model. '
'a. Learning Phase: Model performance across binned learning trials for the reward and punishment contexts for each group. '
'Shaded regions represent 95% confidence intervals. Blue and red diamonds indicate empirical means of participant accuracy. '
'b. Transfer Phase: Choice rates for each stimulus type during transfer trials for each group. '
'Choice rate is computed as the percentage of times a stimulus type was chosen, given the number of times it was presented. '
'Bar plots show the mean and 95% confidence intervals of the choice rate for each stimulus type across participants within each group. '
'Grey diamonds indicate empirical means of participant choice rates. '
'Abbreviations: HR – high reward rate (75% reward), LR – low reward rate (25% reward), LP – low punishment rate (25% loss), '
'HP – high punishment rate (75% loss), N – novel stimulus.'
)
print(caption)
Once we are happy with our chosen best model and its behaviours, we can look at each of its parameters and how they vary across groups.
filename = f'RL/modelling/parameter_fits/{best_model}-model-fits.png'
display(Image(filename, width=800, height=600))
caption = (
'Group effects for all parameters. Fitted parameter values are displayed as a bar plot, '
'showing the mean and 95% confidence intervals across participants for each group.'
)
print(caption)
Finally, this mode produces a fit_data_FIT.pkl file. This file is a dictionary where the keys are the model names and the value of each key is a dataframe of fitted values for each participant.
import pickle
with open('RL/modelling/fit_data_FIT.pkl', 'rb') as f:
full_fit = pickle.load(f)
print("Type of loaded object:", type(full_fit))
print("Top-level keys in the loaded fit data (model names):", full_fit.keys())
print("First 20 rows of the best fit data:")
best_fit = full_fit[best_model]
best_fit.head(20)
Validating Models¶
Next, we will be looking at how to validate your models. In practice, it is best to do this before the fitting procedure we have just outlined above. Model validation ensures that your model is functioning as intended and that the model parameters (e.g., the upper and lower bounds) are appropriate for your model.
Using the validation mode requires a recovery parameter, which indicates whether you are conducting parameter or model recovery. Parameter recovery is the process of generating data with known parameters for a given model and then fitting that data with the same model to determine whether the parameters are recoverable. Model recovery is the process of generating data with known parameters for all given models; each model is then fitted to all of the generated data (regardless of which model generated it) to test which model best fits each model's data. Ideally, the model that generated the data should be the best fit for the corresponding data.
We will next define some typical parameters to use in these validations. We also include an optional bounds parameter below. This parameter should be a nested dictionary where the top-level keys are model base names (i.e., without optional parameters such as novel) and the lower level contains the parameter bounds to be overwritten. Bounds constrain parameters during fitting (and during data generation) such that the indicated parameters will not leave the bounds. Below, we only constrain temperature for all models, because if this value is too large the model converges towards responding randomly, and all other parameters are then ignored (and thus unrecoverable). Any bounds that are not overwritten in this way are determined by their defaults, which are declared in the RLModel class (RL/models/rl_models.py).
from helpers.priors import fixed_priors
models = [
'QLearning+novel',
'ActorCritic+novel',
'Advantage+novel',
]
fixed = fixed_priors(models)
bounds = {
'QLearning': {'temperature': (0.01, 0.20)},
'ActorCritic': {'temperature': (0.01, 0.20)},
'Advantage': {'temperature': (0.01, 0.20)},
}
params = {
'mode': 'validation',
'learning_filename': 'RL/data/tutorial_learning_data.csv',
'transfer_filename': 'RL/data/tutorial_transfer_data.csv',
'models': models,
'random_params': 'random',
'fixed': fixed,
'bounds': bounds,
'multiprocessing': True,
}
pipeline = Pipeline(seed=1251)
Parameter Recovery¶
Let's run the parameter recovery validation first. You'll see below that we pass the recovery parameter directly to the function, but we could also have included it within the params dictionary above. However, since we will run the run method twice (once for parameter recovery and once for model recovery), with all other parameters remaining the same, it is a bit cleaner to run it this way.
pipeline.run(recovery='parameter', **params)
Parameter recovery provides us with a few files. Let's first visualize how well parameter recovery worked by plotting a series of scatterplots, with the true parameters on the x axis and the fitted parameters on the y axis. Successful parameter recovery is indicated by a strong positive correlation between the true and fitted values for each parameter, which will be the case for many of our models.
from IPython.display import Image, display
display(Image(filename='RL/modelling/correlations/recovery_correlation_plot.png', width=800, height=600))
caption = (
'Parameter recovery results. Parameter recovery for each parameter within each model. '
'Data were generated for each model using randomly determined parameters (true values) and then fitted by that model (fit values) '
'to assess the model’s ability to recover parameters. Pearson r correlations for each parameter determine the degree to which the parameter was recoverable. '
'These values are presented in Supplementary Table [SUPP PARAM REC]. Grey dashed lines indicate a perfect recovery.'
)
print(caption)
The plot can give us a general idea of recovery success, but we should check the actual correlations for each model to make sure.
param_recovery_corrs = pd.read_csv('RL/modelling/param_recovery_corrs.csv')
param_recovery_corrs
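As a quick, optional sanity check, you could flag parameters whose recovery correlation falls below some threshold. The column names below ('model', 'parameter', 'pearson_r') are assumptions for illustration; check param_recovery_corrs.columns and adjust accordingly.
# Hypothetical check: list weakly recovered parameters (column names are assumptions)
r_threshold = 0.5
weak = param_recovery_corrs[param_recovery_corrs['pearson_r'] < r_threshold]
print(f'Parameters with recovery r < {r_threshold}:')
print(weak[['model', 'parameter', 'pearson_r']])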
Finally, this method also provides us with a fit_data file named fit_data_PARAMETER.pkl. This file has the same format as the one described above for the fitting procedure, but contains the values fitted during the parameter recovery process.
import pickle
with open('RL/modelling/fit_data_PARAMETER.pkl', 'rb') as f:
full_fit = pickle.load(f)
print("Type of loaded object:", type(full_fit))
print("Top-level keys in the loaded fit data (model names):", full_fit.keys())
print("First 20 rows of the best fit data:")
best_fit = full_fit[best_model]
best_fit.head(20)
Model Recovery¶
Once we are happy with parameter recovery, we can move on to model recovery. Model recovery involves every model generating data for all participants and then all models fitting all of the generated data. This is by far the lengthiest process in this repo. You'll note below that we again provide the recovery='model' parameter directly, which again could simply have been added to our params dictionary if we preferred. You will also notice that we set generate_data=False here. Setting this to False means that new data will not be generated; instead, the data generated during parameter recovery will be reused. We suggest following this procedure so that you only have one set of generated data for both validations. If we generated data here, the old files would be overwritten.
pipeline.run(recovery='model', **params, generate_data=False)
Model recovery does not output too many results. The main result we are interested in is which model best fit each dataset (again, the different datasets correspond to data generated by different models). Successful model recovery would demonstrate that each model best fits its own generated data. We can view this in a confusion matrix below. The BIC values used here are normalized to range between -100 and 100, where -100 is the best fit across all analyses and 100 is the worst.
filename = 'RL/modelling/model_recovery.png'
display(Image(filename, width=800, height=800))
caption = (
'Model recovery results. Model recovery for each model using normalized BIC metrics. '
'Data were generated for each model using randomly determined parameters and then fitted by all models to assess each model’s '
'ability to fit the corresponding model’s data. BIC values are normalized to the range [0, 1], scaled by 200, and then shifted by 100 to center around 0. '
'A BIC of -100 indicates the best fit overall and a value of +100 indicates the worst fit overall. '
'Within each column, one model fit is highlighted (bolded and bordered) indicating the best fitted model to explain the generated data. '
'Successful model recovery is when the model best fits itself (as we see here). The values printed are the normalized BIC values.'
)
print(caption)
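To make the normalization in the caption concrete, here is a minimal sketch of the transformation it describes (an illustration, not the repo's internal implementation):
import numpy as np

# Min-max normalize BIC to [0, 1], scale by 200, and shift by 100 so that the
# best fit overall maps to -100 and the worst fit overall maps to +100
def normalize_bic(bic_values):
    bic = np.asarray(bic_values, dtype=float)
    scaled = (bic - bic.min()) / (bic.max() - bic.min())  # 0 = lowest (best) BIC
    return scaled * 200 - 100

print(normalize_bic([1200, 1350, 1500]))  # -> [-100.    0.  100.]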
Again, a fit_data file was built during this validation, fit_data_MODEL.pkl, and we can look into that as well. Just like the other fit datasets, it is a dictionary with model names as keys and dataframes as values. The dataframes are slightly different than before, however, in that they now include a new column called model, which indicates which model generated the data being fit. In other words, the dictionary key is the model that fit the data, and the model column in the dataframe is the model that generated the data.
import pickle
with open('RL/modelling/fit_data_MODEL.pkl', 'rb') as f:
full_fit = pickle.load(f)
print("Type of loaded object:", type(full_fit))
print("Top-level keys in the loaded fit data (model names):", full_fit.keys())
print("First 10 rows of the best fit data:")
best_fit = full_fit[best_model]
best_fit.head(10)
Let's also look at the tail of this dataframe to demonstrate that the model column includes all of the models.
print(best_fit['model'].unique())
best_fit.tail(10)
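To make this structure concrete, we can group the dataframe by its model column: the dictionary key (here, best_model) is the model that did the fitting, and each group corresponds to the model whose generated data was being fit.
# Count rows per generating model to confirm that this fitting model
# was applied to data generated by every model
print(best_fit.groupby('model').size())

# If the dataframe also contains a BIC column (an assumption -- check best_fit.columns),
# the mean BIC per generating model could be compared in the same way:
# print(best_fit.groupby('model')['BIC'].mean())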
Conclusion¶
That brings us to the end of this tutorial. Again, the next suggested step is to migrate the RL/modelling folder to the analysis repo as AL/modelling and re-run the analysis repo (with load_models=True) to add the modelling results to the PDF report.