Emotional Authenticity Detection (Video) - Project Index

Overview

The premise: emotional honesty leaks. When the verbal channel and the non-verbal channels disagree (a smile that doesn't reach the eyes, a steady sentence over a stressed voice), that incongruence is measurable. This project analyses video across face, voice, and body to find those mismatches and surface them.

The framing matters and is baked into the design: it's aimed at self-analysis and personal development. The interesting use is watching your own recordings back and noticing where you weren't as composed (or as fine) as you thought, not auditing anyone else.

Background

This came out of the same cluster of interests as the EEG and psychological-profiling work (trying to make internal states legible) and has a sibling in an escape-room emotion-detection experiment, where the same machinery was pointed at reading visitors' reactions. Here the lens is turned inward instead.

How It Works

It's a GPU-heavy multimodal pipeline (tuned for a 24 GB RTX 3090) that processes each modality separately and then hunts for disagreement between them:

Face processor: facial-expression and micro-expression analysis, sampling at up to 200 fps for the brief, involuntary flickers that ordinary 30 fps capture misses.
Voice processor: voice-stress and prosody analysis on the audio track.
Multimodal integrator: the heart of it. Aligns the channels, measures incongruence against detection thresholds, and flags where verbal and non-verbal signals diverge.
Timeline viewer + training pipeline: a viewer to scrub the analysis back over the video, and a pipeline for training the underlying models.
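
The integrator's align-then-compare step can be sketched roughly as follows. This is an illustration, not the project's code: the `Sample` type, the idea that each channel emits a comparable score in [0, 1], the 0.5 s window, and the 0.4 threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    t: float      # timestamp in seconds
    score: float  # hypothetical per-channel stress score in [0, 1]


def align_to_windows(samples, window_s, n_windows):
    """Average one channel's scores inside fixed-length windows (None if empty)."""
    buckets = [[] for _ in range(n_windows)]
    for s in samples:
        idx = int(s.t // window_s)
        if 0 <= idx < n_windows:
            buckets[idx].append(s.score)
    return [mean(b) if b else None for b in buckets]


def find_incongruence(face, voice, window_s=0.5, threshold=0.4):
    """Return (window_start_s, face_mean, voice_mean) where the channels diverge."""
    if not face or not voice:
        return []
    duration = max(s.t for s in face + voice)
    n_windows = int(duration // window_s) + 1
    face_w = align_to_windows(face, window_s, n_windows)
    voice_w = align_to_windows(voice, window_s, n_windows)
    flagged = []
    for i, (f, v) in enumerate(zip(face_w, voice_w)):
        if f is None or v is None:
            continue  # a window with a missing channel cannot be compared
        if abs(f - v) > threshold:
            flagged.append((i * window_s, f, v))
    return flagged
```

Windowing is what lets channels with different sample rates be compared: the high-rate face stream and the lower-rate voice stream are both reduced to one value per window before the threshold is applied.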

Models are PyTorch, processing runs in batched sequences (up to 150 frames), and the system falls back to CPU if no CUDA device is present, slowly and with a warning.
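
A minimal sketch of that device fallback and sequence batching, under stated assumptions: `select_device`, `iter_sequences`, and the warning text are hypothetical names chosen for illustration, and the guarded import only exists so the snippet also runs where PyTorch is not installed. `torch.cuda.is_available()` is the standard PyTorch check.

```python
import warnings

try:
    import torch
except ImportError:  # keeps the sketch runnable where PyTorch is absent
    torch = None

MAX_SEQ_LEN = 150  # frames per batched sequence, the upper bound quoted above


def select_device():
    """Prefer CUDA; otherwise warn and fall back to CPU."""
    if torch is not None and torch.cuda.is_available():
        return "cuda"
    warnings.warn("No CUDA device found; falling back to CPU (expect slow processing).")
    return "cpu"


def iter_sequences(n_frames, max_len=MAX_SEQ_LEN):
    """Yield (start, end) frame-index ranges of at most max_len frames."""
    for start in range(0, n_frames, max_len):
        yield start, min(start + max_len, n_frames)
```

For a 400-frame clip, `iter_sequences(400)` yields the ranges (0, 150), (150, 300), and (300, 400), so the final sequence is simply shorter rather than padded. Whether the real pipeline pads or truncates the last sequence is not stated here.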

Where It Landed

Archived as a working multimodal prototype with a viewer. The architecture is complete (face, voice, integration, timeline, training) and it produces real per-frame analysis output. It stayed an exploration rather than a polished app.

Three-channel pipeline (face, voice, body) with explicit incongruence detection.
Micro-expression path runs at a much higher frame rate than the main analysis.
Timeline viewer for reviewing results against the source video.
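
The frame-rate gap behind the micro-expression path is easy to quantify. The sketch below uses the 30 fps and 200 fps figures from this index; the 40 ms event duration is an illustrative assumption, not a number from the project, and both function names are hypothetical.

```python
def frame_interval_ms(fps: int) -> float:
    """Time between consecutive frames, in milliseconds."""
    return 1000 / fps


def frames_spanned(event_ms: int, fps: int) -> int:
    """Whole frame intervals that fit inside an event of the given duration."""
    return event_ms * fps // 1000


for fps in (30, 200):
    print(fps, "fps:", frames_spanned(40, fps), "frame interval(s) in a 40 ms event")
```

At 30 fps a 40 ms event fits about one frame interval, so it can easily fall between or across frames; at 200 fps the same event fits eight, which is enough to see onset and offset.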