Student Absence Anomaly Detection - Project Index

Overview

Most attendance flags are blunt: total days missed, hit a threshold, done. This tool looks at the shape of an absence instead. A student who misses scattered single periods clustered around mid-day looks very different from one who's out with the flu, and the difference is measurable. The tool scores each student-day for how anomalous the pattern is, classifies what kind of pattern it is, and ranks risk.

Crucially, it's all classical statistics: it runs on a standard machine with no ML module, no GPU, and no model training.

Background

The motivating observation: tests happen mid-day, and students who want to dodge them learn to be “absent” for exactly the periods that matter while staying under whatever total-absence threshold triggers a consequence. A pure day-count never catches that. You need to weight the middle of the day, notice non-consecutive gaps, and watch for coordination across peers.

It grew up alongside the attendance-data prototyping work: the statistical functions that distinguish consecutive from scattered absences came out of that experimentation, then got hardened into this detector.

How It Works

Python does the heavy lifting, with NumPy, Pandas, SciPy, and statsmodels. The core is a period-weighted anomaly score with a bell-curve emphasis on middle periods (where tests usually land), combined with non-consecutivity detection and spread analysis. Each student-day comes back with an anomaly score (0 to 1), a pattern description, the contributing factors, and a risk level.

# attendance is a per-period present/absent vector
detector = AbsenceAnomalyDetector(num_periods=8, anomaly_threshold=0.7)
result = detector.analyze_single_day(
    attendance=[0,1,0,1,0,1,0,0],
    student_id='S001', school_id='SCH001', date='2024-01-15')
# -> anomaly_score, is_anomaly, pattern_description, advanced_patterns
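
The period-weighted idea described above can be sketched in a few lines. This is a minimal illustration, not the detector's actual formula: the function names, the Gaussian spread (`sigma`), the `mid_weight` blend, and the convention that 1 means absent are all assumptions made for the example.

```python
import math

def period_weights(num_periods, sigma=None):
    """Bell-curve weights centered on the middle of the day (sum to 1)."""
    center = (num_periods - 1) / 2
    # Assumed spread: a quarter of the day. The real detector may differ.
    sigma = sigma if sigma is not None else num_periods / 4
    raw = [math.exp(-((p - center) ** 2) / (2 * sigma ** 2))
           for p in range(num_periods)]
    total = sum(raw)
    return [w / total for w in raw]

def absence_runs(absent):
    """Lengths of each consecutive run of absent periods."""
    runs, current = [], 0
    for flag in absent:
        if flag:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs

def sketch_score(absent, mid_weight=0.6):
    """Hypothetical 0-to-1 score; `absent` uses 1 = absent (assumed)."""
    n_absent = sum(absent)
    if n_absent == 0:
        return 0.0
    weights = period_weights(len(absent))
    # How much of the best possible mid-day weight the absences capture.
    hit = sum(w for w, flag in zip(weights, absent) if flag)
    best = sum(sorted(weights, reverse=True)[:n_absent])
    midday = hit / best
    # 1.0 when every absence is isolated, lower for one solid block.
    fragmentation = len(absence_runs(absent)) / n_absent
    # A full-day absence (e.g. illness) is not suspicious by shape alone.
    partial = 1 - n_absent / len(absent)
    return partial * (mid_weight * midday + (1 - mid_weight) * fragmentation)

scattered_midday = sketch_score([0, 0, 1, 0, 1, 0, 0, 0])
morning_block = sketch_score([1, 1, 0, 0, 0, 0, 0, 0])
full_day = sketch_score([1] * 8)
```

Under these assumptions, two isolated mid-day absences score higher than a two-period block at the start of the day, and a full-day absence scores zero, which matches the intuition that scattered mid-day gaps are the interesting shape.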

Above the per-day score sit pattern classifiers: strategic period avoidance, test-dodging clusters, social coordination (peers absent together), end-of-day fatigue, and threshold-gaming. District-level comparison adds ANOVA, Kruskal-Wallis, effect sizes, and regression for predictive modeling. There's a Tkinter GUI on top of the CLI for single-day analysis, batch CSV runs, school comparison, and the visualization suite.
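
As a rough illustration of the district-level comparison, the sketch below runs a one-way ANOVA and a Kruskal-Wallis test across schools with SciPy and computes eta squared as an effect size. The `compare_schools` helper, the school IDs beyond `SCH001`, and the score values are made up for the example; the project's own comparison code may be organized differently.

```python
import numpy as np
from scipy import stats

# Made-up per-school anomaly scores, for illustration only.
scores_by_school = {
    "SCH001": np.array([0.12, 0.30, 0.25, 0.41, 0.18, 0.22]),
    "SCH002": np.array([0.35, 0.52, 0.47, 0.61, 0.44, 0.39]),
    "SCH003": np.array([0.15, 0.28, 0.33, 0.20, 0.26, 0.31]),
}

def compare_schools(groups):
    """Hypothetical helper: parametric and rank-based tests plus eta squared."""
    samples = list(groups.values())
    anova = stats.f_oneway(*samples)   # assumes roughly normal groups
    kruskal = stats.kruskal(*samples)  # rank-based, no normality assumption

    # Eta squared: share of total variance explained by school membership.
    all_scores = np.concatenate(samples)
    grand_mean = all_scores.mean()
    ss_between = sum(len(s) * (s.mean() - grand_mean) ** 2 for s in samples)
    ss_total = ((all_scores - grand_mean) ** 2).sum()

    return {
        "anova_F": float(anova.statistic),
        "anova_p": float(anova.pvalue),
        "kruskal_H": float(kruskal.statistic),
        "kruskal_p": float(kruskal.pvalue),
        "eta_squared": float(ss_between / ss_total),
    }

comparison = compare_schools(scores_by_school)
```

Reporting both tests side by side is a common hedge: anomaly scores bounded in 0 to 1 are often skewed, so the rank-based result is a useful check on the ANOVA.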

Current Status

Archived as a Summer 2025 build. It reached a genuinely complete state (CLI and GUI, batch import/export, a user guide, and a stack of mathematical documentation in the work vault), but it's parked rather than in active operational use.

Period-weighted anomaly scoring with pattern classification and risk levels.
Tkinter GUI plus batch CSV processing and district-level statistical comparison.
Runs ML-free on a standard machine; SQL connectivity sketched for live data.