Overview
A privacy-first computer-vision system that watches a classroom the way an attentive teacher does: not to identify anyone, but to read the room. It scores engagement from visual attention, posture, movement, and interaction, then flags the moments worth noticing: a confusion cascade rippling across the back rows, a slow slide into end-of-period fatigue, a stretch where nobody's looking at the board.
The whole thing is meant to run on the edge, on a Raspberry Pi 5 with a Hailo-8L accelerator bolted on, so frames never leave the room and no faces are ever stored.
Background
The premise: a teacher running a full room can't watch twenty faces at once, and by the time disengagement is obvious it's usually too late to course-correct. The question was whether cheap edge hardware could surface those signals in real time without turning the classroom into a surveillance device.
Privacy was the hard constraint, not an afterthought. Tracking is anonymous: students get throwaway IDs for the session, frames are deleted as they're processed in production mode, and there's no facial recognition anywhere in the pipeline. The same engagement-detection core was later sketched against a museum-visitor application, which is where a lot of the “read a crowd, don't identify the crowd” thinking got tested.
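The throwaway-ID idea can be illustrated with a minimal sketch. Everything named here (`SessionIdRegistry`, `id_for`, `end_session`, the `s-` prefix) is hypothetical and not taken from the project; the only assumption is that the tracker hands out an integer slot per person it is currently following.

```python
import secrets


class SessionIdRegistry:
    """Maps tracker slots to random IDs that exist only for one session.

    Hypothetical helper for illustration; not the project's actual code.
    """

    def __init__(self):
        self._ids = {}  # tracker slot -> random session ID

    def id_for(self, track_slot: int) -> str:
        # The ID is a random token, not derived from any face or body feature.
        if track_slot not in self._ids:
            self._ids[track_slot] = "s-" + secrets.token_hex(4)
        return self._ids[track_slot]

    def end_session(self) -> None:
        # Dropping the mapping means IDs cannot be linked across sessions.
        self._ids.clear()
```

Within a session the same slot always resolves to the same ID, which is enough for per-student trends; once `end_session()` runs, nothing remains to tie a score back to a person.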
How It Works
Python, OpenCV, and MediaPipe handle face detection, 33-keypoint pose estimation, gaze classification (screen / board / notes / distracted), and gesture recognition (hand raised, note-taking, nodding). A hot-swappable inference backend lets it run on CPU for development and on the Hailo NPU in production: same code, different engine. It's built to track up to ~20 students simultaneously.
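One way to get the "same code, different engine" behaviour is a small backend interface with a factory in front of it. The sketch below is an assumption about the shape of such a design, not the project's implementation: `InferenceBackend`, `CpuBackend`, `HailoBackend`, `make_backend`, and `analyze` are hypothetical names, and both backends are placeholders that make no real model or Hailo SDK calls.

```python
from typing import Protocol, Sequence


class InferenceBackend(Protocol):
    """Anything with an infer() method can serve as a backend."""

    def infer(self, frame: Sequence[float]) -> dict: ...


class CpuBackend:
    name = "cpu"

    def infer(self, frame: Sequence[float]) -> dict:
        # Placeholder: a real backend would run the models on the CPU here.
        return {"backend": self.name, "n_values": len(frame)}


class HailoBackend:
    name = "hailo"

    def infer(self, frame: Sequence[float]) -> dict:
        # Placeholder: stands in for dispatch to the NPU; no SDK calls shown.
        return {"backend": self.name, "n_values": len(frame)}


BACKENDS = {"cpu": CpuBackend, "hailo": HailoBackend}


def make_backend(kind: str) -> InferenceBackend:
    """Pick an engine by name, e.g. from a config value."""
    try:
        return BACKENDS[kind]()
    except KeyError:
        raise ValueError(f"unknown backend: {kind!r}") from None


def analyze(frame: Sequence[float], backend: InferenceBackend) -> dict:
    # The analysis code only sees the interface, never a concrete engine.
    return backend.infer(frame)
```

Switching engines then comes down to one string, for example `analyze(frame, make_backend("cpu"))` during development and `make_backend("hailo")` on the Pi.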
Engagement is a weighted multi-factor score rather than a single magic number: visual attention, postural engagement, movement, and interaction each contribute, updated continuously across multiple time scales (10s / 1m / 10m / whole session). On top of the raw score sit the more interesting modules: micro-expression detection, confusion cascade detection, and a four-stage fatigue progression model.
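A rough sketch of the scoring idea follows. The weights are invented for illustration (the write-up does not state the real ones), each factor is assumed to be normalized to [0, 1], and `engagement_score`, `RollingMean`, and `make_windows` are hypothetical names. The multi-scale view is approximated with a time-windowed mean per scale, with an infinite window standing in for the whole session.

```python
from collections import deque

# Hypothetical weights; the actual values are not given in the source.
WEIGHTS = {"attention": 0.4, "posture": 0.25, "movement": 0.15, "interaction": 0.2}


def engagement_score(factors: dict) -> float:
    """Weighted sum of the four factors, each assumed to lie in [0, 1]."""
    return sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)


class RollingMean:
    """Mean of the samples seen within the last `window_s` seconds."""

    def __init__(self, window_s: float):
        self.window_s = window_s
        self._samples = deque()  # (timestamp, value) pairs, oldest first
        self._total = 0.0

    def add(self, t: float, value: float) -> float:
        self._samples.append((t, value))
        self._total += value
        # Evict samples that have fallen out of the window.
        while self._samples[0][0] <= t - self.window_s:
            _, old = self._samples.popleft()
            self._total -= old
        return self._total / len(self._samples)


def make_windows() -> dict:
    # One accumulator per time scale; an infinite window never evicts.
    return {
        "10s": RollingMean(10),
        "1m": RollingMean(60),
        "10m": RollingMean(600),
        "session": RollingMean(float("inf")),
    }
```

Feeding each new score into every window, for example `{k: w.add(t, score) for k, w in windows.items()}`, gives a short-term reading that reacts quickly alongside longer averages that show drift over the period.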
A Flask + SocketIO dashboard pushes the live picture to a browser (video stream, Plotly charts, CSV export) so the room's state is legible at a glance instead of buried in logs.
Current Status
Archived as a Summer 2025 build. It got far enough to have a working analysis engine, a real-time dashboard, and a stack of supporting modules (temporal pattern recognition, gaze/confusion detection, calibration, social-dynamics analysis) plus an executive summary and full feature spec, which is further than most experiments get.
- Working multi-factor engagement scoring with CPU and Hailo backends.
- Flask/SocketIO dashboard with live video, Plotly charts, and CSV export.
- Never crossed into a real classroom deployment; shelved at the prototype stage.