Overview
This one is research more than code: a deep survey of how AI systems are being built to store and retrieve autobiographical memory, the messy, time-stamped, first-person kind. The question driving it was whether the memory architectures that large language models use could be bent toward augmenting human recall: predicting which cue brings back which memory, recognising the pattern behind a faded one, stitching photos and text and audio into something that behaves like episodic memory.
The output is a written map of the field as of mid-2025 rather than a running system. It sits next to the EEG and face-recognition explorations as part of a broader poke at how memory and attention actually work, and what a respectful personal tool around them would even look like.
Background
The itch was simple: human memory is associative, temporal, and lossy, and the current crop of memory-augmented models is converging on architectures that mirror exactly those properties: sensory, short-term, and long-term tiers, with decay and consolidation baked in. If a model already organises memory roughly the way a brain does, maybe it can help a brain retrieve.
So this became a literature dig into the 2020-2025 explosion of work: neuroscience-inspired models, the datasets that train them, the open-source frameworks shipping today, and the sizeable pile of ethical landmines that show up the moment you point any of this at a real person's life.
What The Survey Covers
The notes pull together several threads that keep recurring across the recent research:
- Brain-inspired architectures: Hierarchical Memory Transformers and Recurrent Memory Transformers that mimic a three-tier sensory/short-term/long-term hierarchy; DeepMind's Compressive Transformer, which borrows from sleep-based memory consolidation to compress older activations rather than drop them.
- Cue-memory association: the CRAM (Cue-Recalled Autobiographical Memory) dataset of 14,242 memories across 4,244 subjects, structured by spatial detail, people, and objects, which is the kind of data you'd need to predict recall from a cue.
- Temporal modelling: Temporal Convolutional Networks, BiLSTM-with-attention stacks, and Hierarchical Temporal Memory, all aimed at the time-ordered structure that makes a memory episodic rather than a flat fact.
- Multimodal fusion: vision-language and audio-language models that extract memory cues from photos and recordings, plus SynapticRAG, which folds time directly into memory vectors in a deliberately synapse-like way.
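To make the cue-association and temporal threads a little more concrete, here is a minimal Python sketch of time-aware recall scoring: cosine similarity between a cue and each stored memory, discounted by an exponential decay on the memory's age. It is not SynapticRAG's actual method or any framework's API; the `Memory` class, `recall_score`, `rank_memories`, the toy two-dimensional embeddings, and the one-year half-life are all assumptions invented for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    label: str
    embedding: list[float]  # hypothetical feature vector for the memory
    age_days: float         # time elapsed since the remembered event


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recall_score(cue: list[float], memory: Memory,
                 half_life_days: float = 365.0) -> float:
    """Similarity to the cue, halved for every half-life of age (assumed decay)."""
    decay = 0.5 ** (memory.age_days / half_life_days)
    return cosine(cue, memory.embedding) * decay


def rank_memories(cue: list[float], memories: list[Memory],
                  half_life_days: float = 365.0) -> list[Memory]:
    """Return memories ordered from most to least likely to be recalled."""
    return sorted(
        memories,
        key=lambda m: recall_score(cue, m, half_life_days),
        reverse=True,
    )


# Toy data: an old memory that matches the cue exactly,
# and a fresh one that matches it only partially.
beach = Memory("beach trip", [1.0, 0.0], age_days=730.0)
party = Memory("birthday party", [0.8, 0.6], age_days=0.0)
cue = [1.0, 0.0]

for m in rank_memories(cue, [beach, party]):
    print(m.label, round(recall_score(cue, m), 3))
```

The design choice worth noticing is that recency and relevance trade off multiplicatively: a perfect match from two years ago can lose to a partial match from today, which is the lossy, time-weighted behaviour the architectures above are reaching for.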
A recurring sobering note: even strong models stay bad at this. On contamination-free episodic benchmarks, GPT-4-class systems land around 0.6 F1 on multi-event retrieval, a reminder that "remembering" is still genuinely hard, not solved.
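For a sense of what a score like that means, here is a small Python sketch of set-based F1 over retrieved events. The benchmarks' exact scoring procedure isn't covered in these notes, so treat the `f1_score` function and the event names as illustrative assumptions; it is simply the standard harmonic mean of precision and recall.

```python
def f1_score(predicted: set[str], gold: set[str]) -> float:
    """Set-based F1: harmonic mean of precision and recall."""
    if not predicted and not gold:
        return 1.0  # nothing to recall and nothing recalled
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical episode: four events actually happened; the model
# recalls two of them and adds one that never occurred.
gold_events = {"met Ana", "missed train", "lost keys", "rained"}
recalled_events = {"met Ana", "missed train", "bought umbrella"}

print(round(f1_score(recalled_events, gold_events), 3))
```

With four true events, two of them recalled, and one spurious extra, precision is 2/3 and recall is 1/2, so F1 comes out at 4/7, or about 0.57. A score near 0.6 is therefore compatible with missing half of what happened while also inventing something that didn't.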
Where It Landed
Archived as a reference document. It did its job: it told me what the state of the art actually is, what's downloadable today, and where the ethical tripwires sit before I'd consider building anything that touched real memories.
- Maps the architectures, datasets, and shipping frameworks (Mem0, Letta/MemGPT, MemoRAG) worth knowing about.
- Flags the genuinely thorny ethics (consent, false-memory generation, data ownership) as gating concerns rather than footnotes.
- Never crossed from survey into implementation; that was the right call given everything above.