AL Analysis Tutorial¶
Welcome to the avoidance learning analysis repo. This repo was built for the PEAC lab to analyse behavioural data from the various avoidance learning tasks. It loads the data, runs statistics on it, plots the results, and generates a PDF report presenting the main findings of the study.
Note that this tutorial is designed to run in Google Colab rather than from within the repo itself, since it clones the repo.
Project Pipeline¶
This repo is one part of a project pipeline, which requires the coordination of multiple repos. Projects begin with a task repo, which is used to collect behavioural data from participants either locally or on Prolific. The collected data must then be pushed through a data extraction repo to prepare CSV files for analysis. These CSV files are used in the analysis repo (this repo), which creates a PDF report (AL/reports), ending the project pipeline.
Optionally, you can run computational reinforcement learning models using the modelling repo, and the results can be added to the report here. This is a bit clunky because it requires some back-and-forth between this repo and the modelling repo. Specifically, this repo must first be run with load_models=False (see Parameters in the documentation) to create two CSV files that the modelling repo needs (AL/data/pain_learning_processed.csv and AL/data/pain_transfer_processed.csv). These files must then be manually moved into the modelling repo's data directory (RL/data). The modelling repo can then be used to model the data, which creates a new directory called modelling (RL/modelling). This folder must then be manually moved to this analysis repo as AL/modelling. Finally, you can re-run this repo with load_models=True and the modelling results will be included in the PDF report.
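The two manual transfer steps above could also be scripted. The following is a minimal sketch, not part of either repo: the function names export_to_modelling and import_modelling_results are hypothetical, the al_root and rl_root arguments are assumed to point at the AL and RL folders of your local checkouts, and the files are copied rather than moved so that the originals stay in place.

```python
import shutil
from pathlib import Path

# File names taken from the tutorial text
PROCESSED_FILES = ["pain_learning_processed.csv", "pain_transfer_processed.csv"]


def export_to_modelling(al_root, rl_root):
    """Copy the two processed CSV files from AL/data to RL/data (hypothetical helper)."""
    target_dir = Path(rl_root) / "data"
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in PROCESSED_FILES:
        source = Path(al_root) / "data" / name
        if not source.is_file():
            # These files only exist after the analysis has been run once
            raise FileNotFoundError(f"Run the analysis with load_models=False first: {source}")
        shutil.copy2(source, target_dir / name)
        copied.append(target_dir / name)
    return copied


def import_modelling_results(rl_root, al_root):
    """Copy RL/modelling into the analysis repo as AL/modelling (hypothetical helper)."""
    source = Path(rl_root) / "modelling"
    if not source.is_dir():
        raise FileNotFoundError(f"No modelling results found at {source}")
    target = Path(al_root) / "modelling"
    # dirs_exist_ok (Python 3.8+) allows overwriting results from an earlier run
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target
```

With helpers like these, the sequence would be: run the analysis with load_models=False, call export_to_modelling, run the modelling repo, call import_modelling_results, then re-run the analysis with load_models=True.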
Cloning the Repo¶
We will begin by cloning the repo, installing dependencies, and then adding the repo to the system path. Adding the repo to the system path is only necessary for this tutorial. We also change directory to the repo. When working locally, you can create your script in the AL source folder, in the same manner as AL_main.py (avoid_learning_analysis/AL/AL_main.py).
import sys
import os
# We will now clone the repo, pull any updates, and install dependencies
!git clone https://github.com/petzschnerlab/avoid_learning_analysis.git
%cd avoid_learning_analysis/
!git pull
!pip install .
#Only necessary for Google Colab
sys.path.insert(0, os.path.abspath("/content/avoid_learning_analysis/AL"))
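If the same notebook is sometimes run outside Colab, the path insertion can be guarded. The sketch below is only an illustration and not part of the repo: the helper name add_repo_to_path is hypothetical, and checking for the google.colab module in sys.modules is a common convention for detecting Colab, not something this repo prescribes.

```python
import os
import sys


def add_repo_to_path(repo_src, in_colab=None):
    """Prepend repo_src to sys.path when running in Colab (hypothetical helper).

    in_colab can be set explicitly; when None, Colab is assumed to be
    detectable through the presence of the 'google.colab' module.
    Returns True when the path is on sys.path afterwards, False when skipped.
    """
    if in_colab is None:
        in_colab = "google.colab" in sys.modules
    if not in_colab:
        return False
    repo_src = os.path.abspath(repo_src)
    if repo_src not in sys.path:
        sys.path.insert(0, repo_src)
    return True


# Path taken from the tutorial cell above
add_repo_to_path("/content/avoid_learning_analysis/AL")
```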
The Pipeline¶
Next, we will import the Pipeline class. This class is the entry point to this repo. It will take in all of your parameters and run the corresponding analyses.
from helpers.pipeline import Pipeline
The Help Function¶
The pipeline has a help function that outlines some information about the repo and then describes all of the parameters. These details are also available in the documentation. We will use the help=True parameter to see this help information below.
This parameter can be passed to the Pipeline during initialization:
pipeline = Pipeline(help=True)
or to the run method of the class:
pipeline = Pipeline()
pipeline.run(help=True)
The help information gets truncated in Jupyter notebooks, but you can view the whole output by clicking the scrollable element.
pipeline = Pipeline(help=True)
Running the Pipeline¶
Running the pipeline requires passing parameters to the run method. For this package, there are two required parameters, file_path and file_name.
file_path: Path to the data file(s) to be loaded. From this path, you can load several different files using the file_name parameter.
file_name: Name of the file(s) to be loaded. These file names should be relative to the file_path parameter. You can load multiple files by providing a list of file names, or a single file by providing its name as a string. You can add further path information here if your data splits at the point of file_path. For example, file_path = "path/to/data" and file_name = ["subfolder1/data1.csv", "subfolder2/data2.csv"] will load two files from different subfolders.
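To make the relationship between the two parameters concrete, here is a minimal sketch of how a base path and one or more relative file names combine into full paths. The function resolve_data_files is hypothetical and written only for illustration; how the pipeline resolves paths internally is not shown in this tutorial.

```python
from pathlib import PurePosixPath


def resolve_data_files(file_path, file_name):
    """Join file_path with a single file name or a list of file names."""
    # A single string is treated as a one-item list
    names = [file_name] if isinstance(file_name, str) else list(file_name)
    return [str(PurePosixPath(file_path) / name) for name in names]


print(resolve_data_files("path/to/data", ["subfolder1/data1.csv", "subfolder2/data2.csv"]))
# ['path/to/data/subfolder1/data1.csv', 'path/to/data/subfolder2/data2.csv']

print(resolve_data_files("AL/data", "tutorial_data.csv"))
# ['AL/data/tutorial_data.csv']
```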
We will define a typical set of parameters for this package below; see the help information above to understand what each parameter does.
Processing the data will take a bit of time, so please be patient.
%%capture
params = {
    'author': 'Chad C. Williams',
    'file_path': os.path.join('AL', 'data'),
    'file_name': 'tutorial_data.csv',
    'accuracy_exclusion_threshold': 70,  # Exclusion threshold for accuracy
    'RT_low_threshold': 200,  # Lower exclusion threshold for RT
    'RT_high_threshold': 5000,  # Upper exclusion threshold for RT
    'load_stats': True,  # Run stats on data
    'load_posthocs': True,  # Run posthoc tests on data
    'hide_posthocs': True,  # Hide posthoc results from the report
}
pipeline = Pipeline()
pipeline.run(**params)
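Because processing takes a while, it can be worth confirming that the two required parameters are present before calling run. The check below is a small sketch under that assumption; the helper missing_required is hypothetical and not part of the Pipeline class, which may well perform its own validation.

```python
import os

# The two required parameters named in this tutorial
REQUIRED = ("file_path", "file_name")


def missing_required(params):
    """Return the required parameter names that are absent from params."""
    return [key for key in REQUIRED if key not in params]


params = {
    "file_path": os.path.join("AL", "data"),
    "file_name": "tutorial_data.csv",
}

missing = missing_required(params)
if missing:
    raise ValueError(f"Missing required parameters: {missing}")
```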
The Report¶
The report is saved as a PDF and displays the main findings of the analyses. You can find this PDF under AL/reports/PEAC_report_pain.pdf (this is the default name, but it can be changed with the print_filename parameter). We will also display it below, but it's best to view this report directly, so navigate to the report and see your findings!
Keep in mind that the tutorial data only contains five participants per group, so our plots and statistics will not look too great in this example.
import base64
from IPython.display import HTML
pdf_path = "AL/reports/PEAC_report_pain.pdf"
with open(pdf_path, "rb") as f:
    pdf_bytes = f.read()
encoded = base64.b64encode(pdf_bytes).decode("utf-8")
pdf_display = f'<embed src="data:application/pdf;base64,{encoded}" width="700" height="900" type="application/pdf">'
HTML(pdf_display)
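If you want to display more than one report, the embedding step above can be wrapped in a small function. This is a sketch of the same base64 approach; the name pdf_embed_html and its width and height defaults are illustrative choices, not part of the repo.

```python
import base64


def pdf_embed_html(pdf_bytes, width=700, height=900):
    """Build an <embed> tag that shows a PDF inline from its raw bytes."""
    encoded = base64.b64encode(pdf_bytes).decode("utf-8")
    return (
        f'<embed src="data:application/pdf;base64,{encoded}" '
        f'width="{width}" height="{height}" type="application/pdf">'
    )


# In a notebook, assuming the report exists at this path:
# from IPython.display import HTML
# with open("AL/reports/PEAC_report_pain.pdf", "rb") as f:
#     HTML(pdf_embed_html(f.read()))
```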
Although the report gives you a general overview of all findings, you may want to look at the files used to build it more directly. Let's begin by observing the participant pain scores across the groups.
from IPython.display import Image, display, Markdown
display(Image(filename='AL/plots/pain/demo-clinical-scores.png'))
caption = (
    'Pain metrics for each group. Boxplots show the mean and 95% confidence intervals of the corresponding metric for each group. '
    'Half-violin plots show the distribution of the scores of the corresponding metric for each group. '
    'Scatter points show the scores of the corresponding metric for each participant within each group.'
)
display(Markdown(caption))
Next, we can view the behavioural data for both the learning and transfer phases across our groups.
display(Image(filename='AL/plots/pain/empirical-performance.png', width=800, height=600))
caption = (
    'Empirical findings of learning accuracy and transfer choice rates. '
    'a. Learning Phase: Behavioral performance across binned learning trials for the reward and punishment contexts for each group. Shaded regions represent 95% confidence intervals. '
    'b. Transfer Phase: Choice rates for each stimulus type during transfer trials for each group. '
    'Choice rate is computed as the percentage of times a stimulus type was chosen, given the number of times it was presented. '
    'Bar plots show the mean and 95% confidence intervals of the choice rate for each stimulus type across participants within each group. '
    'Abbreviations: HR – high reward rate (75% reward), LR – low reward rate (25% reward), LP – low punishment rate (25% loss), HP – high punishment rate (75% loss), N – novel stimulus.'
)
display(Markdown(caption))
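The choice rate defined in the caption above can be illustrated with a short worked example. This sketch is not the repo's implementation: it assumes each transfer trial presents a pair of stimulus types and records which one was chosen, and the trials listed are made up for illustration.

```python
from collections import Counter


def choice_rates(trials):
    """Compute the choice rate (in percent) for each stimulus type.

    trials: iterable of (presented_pair, chosen) tuples.
    Choice rate = 100 * times chosen / times presented.
    """
    presented = Counter()
    chosen = Counter()
    for pair, choice in trials:
        presented.update(pair)  # count each stimulus type shown on this trial
        chosen[choice] += 1
    return {stim: 100 * chosen[stim] / presented[stim] for stim in presented}


# Made-up transfer trials for one participant (not real data)
trials = [
    (("HR", "LP"), "HR"),
    (("HR", "N"), "HR"),
    (("LR", "HP"), "LR"),
    (("HR", "LR"), "LR"),
    (("LP", "HP"), "LP"),
    (("N", "HP"), "N"),
]

rates = choice_rates(trials)
print(rates["LP"])  # 50.0 (chosen once out of two presentations)
print(rates["HP"])  # 0.0 (never chosen across three presentations)
```

Averaging these per-participant rates within each group would give the bar heights described in the caption.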
Computational Modelling of Empirical Data (Optional)¶
Now that the data analysis is done, you might want to proceed to computationally modelling the data using our modelling repo. The modelling repo requires two files that the run method built for us, specifically AL/data/pain_learning_processed.csv and AL/data/pain_transfer_processed.csv. This tutorial ends here, but if you want to continue with computational modelling, go to the tutorial in the modelling repo, which already contains these data for you to continue with.