Grounded in TRIBE v2 — Meta AI's fMRI brain encoding model
What does a 10/10 video look like to your brain?
TRIBE v2 predicts how every cortical vertex in your brain responds to video — frame by frame. This blueprint is what maximises those responses across all five brain networks simultaneously.
Target Signal Scores for a 10/10 Video
Attention: ≥ 70%
Driven by the frontoparietal network. Spikes when the viewer sees a face, hears their name or a question, encounters a cut, or sees unexpected motion. Needs a reset stimulus every 20–30 s or it decays exponentially.
Emotion: ≥ 65%
Driven by the limbic system (amygdala, anterior cingulate). Peaks during close-up faces showing genuine emotion, music swells, relatable social situations, and surprising reveals. Music alone can sustain a 15–20% floor.
Visual: ≥ 60%
Driven by V1–V4 and MT+ (motion area). High-contrast edges, fast cuts, on-screen text, and motion in the periphery all activate visual cortex. A static talking head with no overlays will sit around 20–30%.
Cognitive Load: 30–50%
Driven by the prefrontal cortex. Too low = boring, under-stimulated. Too high = confused, disengaged. The sweet spot is moderate challenge — a concept introduced, immediately explained, then linked to something the viewer already knows.
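The four target bands above can be captured as a small checker. This is a minimal sketch, not TRIBE's actual API: the `TARGETS` table and `grade` helper are illustrative assumptions, with scores on a 0–100 scale.

```python
# Illustrative sketch (not TRIBE's API): the four target bands from this
# page encoded as data, plus a grader for a video's predicted scores.
# All names and the pass/fail logic are assumptions for illustration.

TARGETS = {
    "attention": (70, 100),      # >= 70 %
    "emotion": (65, 100),        # >= 65 %
    "visual": (60, 100),         # >= 60 %
    "cognitive_load": (30, 50),  # sweet spot: 30-50 %
}

def grade(scores: dict) -> dict:
    """Return pass/fail per signal for a dict of 0-100 scores."""
    report = {}
    for signal, (lo, hi) in TARGETS.items():
        value = scores.get(signal, 0)
        report[signal] = lo <= value <= hi
    return report

print(grade({"attention": 74, "emotion": 68, "visual": 61, "cognitive_load": 42}))
```

Note that cognitive load is the only band with an upper bound: a score of 60 fails, matching the "too high = confused" caveat above.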
The Perfect 60-Second Video — Second by Second
0–1 s · Pattern Disrupt · Attention ↑↑ Visual ↑
Show a face in close-up, start mid-sentence, or show the end result first. Never start with a logo, title card, or silence.
🧠 MrBeast always shows the most extreme moment in the first frame. Your brain instantly needs context — so it keeps watching.
1–3 s · Hook
One bold, specific claim or question. 'I quit my job and made $1M in 90 days' or 'Why your hook is costing you 80% of viewers.' Spoken and shown as text simultaneously.
🧠 Text on screen doubles retention during this window — visual + language networks fire together.
3–8 s · Promise & Stakes · Emotion ↑ Attention steady
Tell them exactly what they will get and why it matters to them personally. Make the cost of NOT watching explicit.
🧠 The default mode network (narrative/social) activates when you speak about the viewer's own life. Use 'you', not 'people'.
8–20 s · First Value Beat · Visual ↑ Cognitive load rising (good)
Deliver something immediately useful. A stat, a reveal, a demo. Cut away from the talking head to a visual — B-roll, screen recording, or graphic.
🧠 Cut to a close-up of whatever you're talking about. Visual cortex responds more strongly to object-focused shots than to wide static shots.
20–25 s · Pattern Interrupt #1 · Attention reset ↑↑
Change something: zoom in sharply, cut to a different angle, add a sound effect, show a reaction. Anything unexpected.
🧠 Frontoparietal attention decays on a ~20 s cycle. A single unexpected stimulus fully resets the curve. Miss this window and you lose 15–25% of viewers.
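The decay-and-reset behaviour described here can be sketched as a toy simulation. The exponential form and the 20 s time constant are assumptions read off this page, not TRIBE internals, and `attention_curve` is a hypothetical helper.

```python
import math

# Toy model of the claim above: frontoparietal attention decays
# exponentially and a pattern interrupt fully resets it. The time
# constant (tau = 20 s) and 1 s resolution are assumptions.

def attention_curve(duration_s, interrupts, tau=20.0, step=1.0):
    """Simulate attention (1.0 = fully reset) over time with resets."""
    levels = []
    last_reset = 0.0
    t = 0.0
    while t <= duration_s:
        if any(abs(t - i) < step / 2 for i in interrupts):
            last_reset = t  # pattern interrupt: restart the decay clock
        levels.append(math.exp(-(t - last_reset) / tau))
        t += step
    return levels

with_reset = attention_curve(60, interrupts=[20, 40])
no_reset = attention_curve(60, interrupts=[])
print(f"attention at t=30 s, with reset at 20 s: {with_reset[30]:.2f}")
print(f"attention at t=30 s, no reset:           {no_reset[30]:.2f}")
```

In this toy model an interrupt at 20 s keeps attention at t=30 s well above the uninterrupted curve, which is the mechanism behind "miss this window and you lose viewers".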
25–45 s · Core Content · All signals moderate-high
The main argument, demo, or story. Break it into 3 short chunks of no more than 7 seconds each. Cut between chunks. Add music under voice.
🧠 Underlying music (no lyrics, 120–140 BPM for energetic content) adds a consistent +10–15 pts to emotion throughout this window.
45–50 s · Emotional Peak · Emotion ↑↑ Attention ↑↑
The payoff, the reveal, the transformation, or the most relatable moment. Show a real face reacting. Use a music swell if possible.
🧠 Amygdala response peaks at genuine human faces showing positive or surprised emotion. Staged reactions score 30–40% lower than authentic ones.
50–60 s · CTA & Open Loop
One clear instruction. Then seed the next video with a cliffhanger — something unresolved that triggers the narrative gap in the default mode network.
🧠 Loop-worthy endings (ending on the same frame you started with) double average session watch time, according to A/B data from creators using TRIBE-graded content.
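The timeline above can be written down as plain data for planning an edit. Segment boundaries come from this page; the names for the 1–3 s and 50–60 s beats are inferred from their descriptions, and `segment_at` is a hypothetical helper.

```python
# The second-by-second blueprint above, encoded as (start, end, name)
# tuples. Boundaries are from this page; "Hook" and "CTA & Open Loop"
# are inferred labels for the beats described in the text.

BLUEPRINT = [
    (0, 1, "Pattern Disrupt"),
    (1, 3, "Hook"),
    (3, 8, "Promise & Stakes"),
    (8, 20, "First Value Beat"),
    (20, 25, "Pattern Interrupt #1"),
    (25, 45, "Core Content"),
    (45, 50, "Emotional Peak"),
    (50, 60, "CTA & Open Loop"),
]

def segment_at(t):
    """Name of the blueprint segment covering second t of a 60 s video."""
    for start, end, name in BLUEPRINT:
        if start <= t < end:
            return name
    return "outside blueprint"

print(segment_at(22))  # -> Pattern Interrupt #1
```

A table like this makes it easy to annotate a shot list with which beat each cut lands in.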
Which Brain Networks Drive Each Signal
Frontoparietal / Attention
Novel stimuli · Unexpected cuts · Faces appearing · Direct questions · Your name spoken
Default Mode / Narrative
Storytelling · Second-person language ('you') · Unresolved questions · Social situations · Relatable failures
Visual Cortex (V1–MT+)
Motion in frame · High contrast · Close-up product shots · On-screen text · Fast cut rhythm
Prefrontal / Cognitive Load
Clear articulation · Short sentences · Analogies to known concepts · Rhetorical questions · Repetition of key terms
Signal Profiles by Content Format
These are signal tendencies based on how each format activates specific brain networks — not measurements of any specific video. Upload your own video to get real scores.
🎙
Static Talking Head
Attention: Low
Emotion: Variable
Visual: Low
Cognitive Load: Variable
Why it works
+Language network active
+High cognitive engagement if delivery is strong
+Feels intimate and trustworthy
Natural weaknesses
−Visual cortex habituates within 8–10 s of no change
−Attention decays rapidly without pattern interrupts
→Fix: Cut to a close-up of anything you're describing every 6–8 s. Add text overlay on key statements. Even a zoom-in resets visual cortex.
🎬
Talking Head + B-roll
Attention: Mid
Emotion: Mid
Visual: High
Cognitive Load: Mid
Why it works
+Visual cortex gets novelty on every B-roll cut
+Cognitive load balanced by showing while explaining
+Attention sustained by dual-channel stimulation
Natural weaknesses
−B-roll that doesn't match the audio creates cognitive dissonance — load spikes
−Over-cutting loses emotional continuity of the speaker's face
→Fix: Return to the speaker's face for emotional beats. Cut to B-roll only when explaining a concrete object, place, or action.
✨
Animation / Motion Graphics
Attention: Mid
Emotion: Variable
Visual: High
Cognitive Load: Mid
Why it works
+Visual cortex continuously stimulated — every frame changes
+Cognitive load manageable when animation paces the reveal
+No talking head means no emotion floor — music carries it
Natural weaknesses
−Emotion depends almost entirely on music and voice tone
−Attention still decays if animation is repetitive — needs visual novelty, not just motion
→Fix: Use music that changes tonality at key story beats. Voice pace should slow down when animation reveals a complex diagram.
🖥
Screen Recording / Tutorial
Attention: Variable
Emotion: Low
Visual: Low
Cognitive Load: High
Why it works
+Cognitive load intentionally elevated — learning state is engaged
+Goal-directed content keeps attention if the payoff is clear
Natural weaknesses
−Lowest visual score of any format — screen content has low spatial frequency vs natural video
−Emotion floor is near zero without a face or voice variation
−Attention drops sharply if the viewer loses the thread of what you're doing
→Fix: Use picture-in-picture face cam for emotional anchoring. Zoom into the exact area of screen you're working on. Say what you're about to do before you do it.
🎞
Faceless / Cinematic B-roll
Natural weaknesses
−No face means no fusiform activation — attention must come from narrative tension
−Pacing too slow drops attention below the recovery threshold
→Fix: Cut to a close human face every 20–30 s, even briefly. Establish narrative stakes in the first 5 s or attention falls below the recovery threshold.
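The format cards above can be condensed into a lookup table. The Low/Mid/High/Variable labels are this page's qualitative tendencies, not measured scores; the table structure and `weakest_signals` helper are illustrative assumptions.

```python
# The format tendencies above as a lookup table. Labels are this page's
# qualitative ratings; key names and structure are assumptions.

FORMAT_PROFILES = {
    "static_talking_head": {"attention": "Low", "emotion": "Variable",
                            "visual": "Low", "cognitive_load": "Variable"},
    "talking_head_broll":  {"attention": "Mid", "emotion": "Mid",
                            "visual": "High", "cognitive_load": "Mid"},
    "animation":           {"attention": "Mid", "emotion": "Variable",
                            "visual": "High", "cognitive_load": "Mid"},
    "screen_recording":    {"attention": "Variable", "emotion": "Low",
                            "visual": "Low", "cognitive_load": "High"},
}

def weakest_signals(fmt):
    """Signals labelled Low for a format -- the first places to fix."""
    return [s for s, level in FORMAT_PROFILES[fmt].items() if level == "Low"]

print(weakest_signals("screen_recording"))  # -> ['emotion', 'visual']
```

For a screen recording, the table points straight at the fixes above: a face cam for emotion and aggressive zooms for visual.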
Do's and Don'ts
What activates the brain
✓
Start with a face in close-up: Fusiform face area activates instantly. No other stimulus gets attention this fast.
✓
Add text overlay on key statements: Visual + language networks fire simultaneously — doubles neural encoding strength.
✓
Use music under voice (no lyrics): +10–15 pts to emotion score. The limbic system responds to music even when attention is on speech.
✓
Cut every 4–7 seconds: Each cut resets the visual cortex novelty response. Static shots see linear attention decay.
✓
Ask a direct question every 30 s: Questions trigger the default mode network's narrative gap mechanism — the brain won't let go.
✓
Show the result before the process: Goal-oriented framing keeps attention elevated through the explanation.
✓
Use the word 'you', not 'people': Second-person language activates self-referential processing — the most attentive brain state.
✓
End on an open loop: Unresolved narrative gaps in the DMN increase session-level watch time.
What kills engagement
✗
Open with a logo or title card: Zero face, zero motion, zero stakes. Attention is at its most fragile in seconds 0–2.
✗
Hold a static talking head for more than 10 seconds: Visual cortex habituates. After 10 s of no change, the visual score drops 20–30 pts.
✗
List more than 3 items in a row: Working memory overloads at item 4+. Cognitive load spikes into the red zone.
✗
Speak in a monotone: Acoustic variation drives emotion. Flat prosody means a flat limbic response and an emotion floor around 25%.
✗
Use jargon without explaining it: Unexpected complexity causes disengagement, not confusion — viewers give up rather than struggle.
✗
Show a wall of text: Reading competes with listening for language-network bandwidth. Both suffer.
✗
Put your CTA before the value: Asking before delivering triggers avoidance. Emotion drops, attention drops.
✗
Fade to black for transitions: Black frames are literal signal voids — every signal flatlines during them.
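Two of the rhythm rules in these lists (cut every 4–7 seconds; never hold a static shot past ~10 s) can be checked mechanically against an edit. This sketch assumes cuts arrive as a list of timestamps; the function name and input format are hypothetical.

```python
# Sketch of a cut-rhythm check: flag every shot held past the ~10 s
# habituation threshold named above (a looser bound than the 4-7 s
# target cadence). Input format and name are assumptions.

def check_cut_rhythm(cuts, duration):
    """Return (shot_start, shot_length) for every shot held > 10 s."""
    bounds = [0.0] + sorted(cuts) + [duration]
    too_long = []
    for start, end in zip(bounds, bounds[1:]):
        if end - start > 10.0:
            too_long.append((start, end - start))
    return too_long

# A 30 s edit with cuts at 5 s and 12 s: the final 18 s shot is flagged.
print(check_cut_rhythm([5.0, 12.0], duration=30.0))  # -> [(12.0, 18.0)]
```

The same list of timestamps could also be scored against the 4–7 s cadence by warning on any gap above 7 s instead of 10 s.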
What this page is based on — and what it isn't

TRIBE v2 (Meta AI, 2024) is a multimodal brain encoding model trained on fMRI recordings of participants watching naturalistic video. It predicts per-vertex cortical activation from audio + video + text features simultaneously.

The signal targets, timeline structure, brain network triggers, and format profiles on this page are derived from the mechanisms of how TRIBE's constituent networks work — frontoparietal attention dynamics, limbic response curves, visual cortex habituation, and DMN narrative activation. They are not scores measured from any specific video.

No third-party video was downloaded, run through the analyzer, or scored to produce this page. Upload your own content to get real, measured predictions.