Neuro-computations of Observational Learning vs. Experiential Learning

Mar 1, 2025 · 3 min read

An Ice Cream Decision

ice cream problem
You’re at an ice cream shop with your friend Bob. There are three options: chocolate, vanilla, and strawberry.

  • Bob orders chocolate again; that is what he bought last time.
  • You tried strawberry last time and didn’t like it.
  • You assume Bob’s taste is similar to yours.

So, you choose chocolate—based on your own experience and Bob’s.

We face situations like this daily, where we blend firsthand experience with observed behavior. This study explores the neuro-computational basis of how we integrate observational and experiential learning.

A Social Pearl Hunting Task

To answer this question, I designed a social pearl hunting task adapted from Charpentier et al., 2024.

social pearl hunting task

Participants gather pearls using two information sources:

  1. Observational learning (OL): Watch a partner choose a coral, then see the resulting shell. The partner knows which shell is better but can only choose corals.
  2. Experiential learning (EL): Choose a shell and receive direct feedback (pearl or not).

Disentangling Observational and Experiential Strategies

OL strategy
OL-consistent choice: the partner chooses the blue-orange coral, which generates a blue shell. Next, they repeat their choice, indicating a preference for the blue shell.
In an OL-consistent choice, one infers the partner's preference by observing two consecutive trials: if the partner repeats the coral choice, they want the current shell; if they switch, they want the other shell.

EL strategy
EL-consistent choice: you choose the blue shell and do not receive a pearl; on the next trial, you should switch to the yellow shell.
In an EL-consistent choice, one repeats the past choice if it yielded a pearl and switches otherwise.
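For concreteness, here is a minimal Python sketch of the two choice rules described above. The function and variable names are illustrative, not part of the actual task code, and shells are simply labeled "blue" and "yellow".

```python
# Minimal sketch of the two choice rules (hypothetical names, two shells only).

def ol_consistent_choice(prev_coral, curr_coral, shell_from_prev_coral, other_shell):
    """Infer the partner's preferred shell from two consecutive coral choices.

    Repeating the coral implies the partner wants the shell it produced;
    switching implies they want the other shell.
    """
    if curr_coral == prev_coral:
        return shell_from_prev_coral   # repeat -> partner wants the current shell
    return other_shell                 # switch -> partner wants the other shell


def el_consistent_choice(my_prev_shell, got_pearl, other_shell):
    """Win-stay / lose-shift rule on one's own outcomes."""
    return my_prev_shell if got_pearl else other_shell


# Example: the partner repeats the blue-orange coral that produced a blue shell,
# while your own blue choice yielded no pearl.
print(ol_consistent_choice("blue-orange", "blue-orange", "blue", "yellow"))  # -> "blue"
print(el_consistent_choice("blue", got_pearl=False, other_shell="yellow"))   # -> "yellow"
```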

Reliability-based Arbitration

Computational framework
Reliability-based arbitration model
We propose a reliability-based arbitration model to explain the decisions:

  • Two systems compute shell values using prediction errors:
    • State Prediction Error (SPE) for OL: mismatch between predicted and actual shell from the partner’s coral.
    • Reward Prediction Error (RPE) for EL: mismatch between expected and actual reward.
  • Values computed by the two systems are flexibly combined according to their reliability, which is itself a function of the prediction errors (see the sketch after this list).
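Below is a minimal Python sketch of this arbitration scheme. The learning rate, reliability updates, and softmax choice rule are simplified illustrations under assumed parameter names; they are not the exact equations fitted in the study.

```python
import numpy as np

# Minimal sketch of reliability-based arbitration between OL and EL.
# Shells are indexed 0 and 1; a pearl outcome is coded as 1, no pearl as 0.

rng = np.random.default_rng(0)

lr = 0.2                      # learning rate (assumed, shared by both systems)
V_ol = np.zeros(2)            # shell values inferred from the partner (OL)
V_el = np.zeros(2)            # shell values from one's own outcomes (EL)
rel_ol, rel_el = 0.5, 0.5     # running reliabilities of the two systems

def observe_partner(inferred_shell):
    """OL update: a state-prediction-error-like signal on the shell the
    partner appears to prefer (inferred from repeat vs. switch)."""
    global rel_ol
    spe = 1.0 - V_ol[inferred_shell]            # mismatch between prediction and observation
    V_ol[inferred_shell] += lr * spe
    V_ol[1 - inferred_shell] = 1.0 - V_ol[inferred_shell]
    rel_ol += lr * ((1.0 - abs(spe)) - rel_ol)  # low prediction error -> high reliability

def experience_outcome(chosen_shell, pearl):
    """EL update: a standard reward prediction error on the chosen shell."""
    global rel_el
    rpe = pearl - V_el[chosen_shell]
    V_el[chosen_shell] += lr * rpe
    rel_el += lr * ((1.0 - abs(rpe)) - rel_el)

def integrated_value():
    """Combine the two value estimates in proportion to their reliability."""
    w = rel_ol / (rel_ol + rel_el)              # arbitration weight toward OL
    return w * V_ol + (1.0 - w) * V_el

def choose(beta=3.0):
    """Softmax choice over the integrated shell values."""
    v = integrated_value()
    p = np.exp(beta * v) / np.exp(beta * v).sum()
    return rng.choice(2, p=p)
```

The key design idea is that whichever system has recently produced smaller prediction errors earns a larger arbitration weight, so behavior shifts toward the more trustworthy information source.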

Neural Signatures of Decision Systems

Where does the brain compute these values?

Using model-based fMRI analysis, we found an OL-based decision signal in the dorsomedial prefrontal cortex (dmPFC), temporoparietal junction (TPJ), and superior temporal gyrus (STG), and an EL-based decision value in the ventromedial prefrontal cortex (vmPFC).
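To illustrate what "model-based" means here, the sketch below builds a value-modulated regressor for a general linear model of the BOLD signal. The repetition time, trial onsets, decision values, and HRF parameters are all hypothetical placeholders, not the study's actual design.

```python
import numpy as np
from scipy.stats import gamma

# Sketch: trial-by-trial decision values from the arbitration model become a
# parametric modulator, convolved with a hemodynamic response function (HRF)
# and entered as one column of the GLM design matrix.

rng = np.random.default_rng(0)
TR = 1.0                                   # repetition time in seconds (assumed)
n_scans = 200
onsets = np.arange(10, 190, 12)            # hypothetical trial onsets (s)
values = rng.random(len(onsets))           # model-derived decision values (placeholder)

def hrf(t):
    """Double-gamma HRF approximation (peak ~6 s, undershoot ~16 s)."""
    return gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0

# Value-modulated stick function at scan resolution, mean-centered
stick = np.zeros(n_scans)
for onset, v in zip(onsets, values):
    stick[int(onset / TR)] = v - values.mean()

regressor = np.convolve(stick, hrf(np.arange(0, 32, TR)))[:n_scans]
# Voxels whose BOLD time course loads on `regressor` are said to encode the
# decision value of the corresponding system.
```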

OL and EL value

Integrated value
We also saw the integrated decision value in both vmPFC and STG.

Conference Talk

In April 2024, I presented these results at the Social & Affective Neuroscience Society (SANS) annual meeting as a Data Blitz talk titled Neuro-computational Mechanism of Reliability-based Arbitration between Observational and Experiential Learning.

📥 Download PDF