GenEHR decoder knowledge graphs — retro500K cohorts

Decoder-conditional association graphs from the GenEHR electronic health record foundation model (LLaMA-large, mixed-radix time encoding, age + demographic conditioning), over the retro500K MGH cohort. Each model and cohort comes in two backbones: mass-corrected (edges chosen to surface cross-modality links) and conditional (edges chosen by raw next-code probability). Inside every graph, use the "rank by" control to re-score the shown edges and the top selector to jump between views.

R2 (visit-ordered)

LLaMA-large HMTE decoder; within-visit events kept in observed order (standard cross-entropy).

Mass-corrected, modality-balanced backbone: each source keeps its top targets in every modality; no connected code lacks cross-modality links (~75% cross-modal)
Conditional backbone: raw p(next|source); frequent within-modality successors, 15–24% cross-modal

R4 (visit-invariant)

Adds a within-visit event-order invariance loss and within-visit shuffling; highest cross-modality signal.

Mass-corrected, modality-balanced backbone: each source keeps its top targets in every modality; no connected code lacks cross-modality links (~75% cross-modal)
Conditional backbone: raw p(next|source); frequent within-modality successors, 15–24% cross-modal

Two backbones decide which edges a graph keeps:
  Mass-corrected, modality-balanced — ranks targets by mass-corrected PMI (offsets each modality's baseline mass) and keeps each source's top targets separately within every modality. So a code connects to its strongest diagnoses, medications, labs and procedures, and no connected code is left without cross-modality links.
  Conditional — keeps each source's top targets by raw p(next|source), so edges follow the model's unadjusted next-code distribution and skew within-modality.
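
As a rough illustration of the two selection rules, here is a minimal Python sketch. It is not the GenEHR implementation: the function names, the per-source cutoff k, and the toy numbers are all hypothetical, and the mass-corrected PMI values are taken as given inputs rather than computed.

```python
from collections import defaultdict

# Toy inputs for one source code. All values are made up for illustration
# and are not GenEHR outputs.
MODALITY = {
    "I48": "diagnosis", "I50": "diagnosis", "I10": "diagnosis",
    "warfarin": "medication", "metoprolol": "medication",
    "INR": "lab",
}
P_NEXT = {"I48": {"I50": 0.30, "I10": 0.25, "warfarin": 0.10,
                  "metoprolol": 0.08, "INR": 0.05}}
MC_PMI = {"I48": {"I50": 1.2, "I10": 0.3, "warfarin": 2.5,
                  "metoprolol": 1.0, "INR": 2.0}}


def balanced_backbone(scores, modality, k=1):
    """Keep each source's top-k targets separately within every modality."""
    edges = []
    for source, targets in scores.items():
        by_modality = defaultdict(list)
        for target, score in targets.items():
            by_modality[modality[target]].append((score, target))
        for candidates in by_modality.values():
            for score, target in sorted(candidates, reverse=True)[:k]:
                edges.append((source, target, score))
    return edges


def conditional_backbone(p_next, k=2):
    """Keep each source's top-k targets by raw p(next|source)."""
    edges = []
    for source, targets in p_next.items():
        ranked = sorted(targets.items(), key=lambda kv: kv[1], reverse=True)
        edges.extend((source, target, p) for target, p in ranked[:k])
    return edges


def cross_modal_fraction(edges, modality):
    """Share of edges whose source and target belong to different modalities."""
    if not edges:
        return 0.0
    cross = sum(modality[s] != modality[t] for s, t, _ in edges)
    return cross / len(edges)


balanced = balanced_backbone(MC_PMI, MODALITY, k=1)
conditional = conditional_backbone(P_NEXT, k=2)
print(sorted(t for _, t, _ in balanced), cross_modal_fraction(balanced, MODALITY))
print([t for _, t, _ in conditional], cross_modal_fraction(conditional, MODALITY))
```

In this toy case the conditional rule keeps only diagnosis successors of I48, while the per-modality rule also keeps a medication and a lab, which is the skew described above.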

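Edge scores (the "rank by" control re-scores the shown edges), with s the source code and t the candidate next code:
  Conditional = p(t|s), i.e. p(next|source)
  PMI = conditional PMI = log[ p(t|s) / p(t) ]
  Mass-corrected PMI = log[ p(t|s) / p̃(t) ], where p̃(t) is the baseline for t after offsetting its modality's baseline mass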
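
The three scores can be sketched in a few lines of Python. This is an illustrative sketch only: the natural logarithm is an assumption (the base is not stated here), the corrected baseline p̃(t) is passed in as a given value rather than derived, and the dictionary keys, function names, and toy probabilities are hypothetical.

```python
import math


def conditional(edge):
    # Raw next-code probability p(t|s).
    return edge["p_t_given_s"]


def pmi(edge):
    # log[ p(t|s) / p(t) ]; natural log assumed.
    return math.log(edge["p_t_given_s"] / edge["p_t"])


def mass_corrected_pmi(edge):
    # log[ p(t|s) / p~(t) ]; p~(t) is supplied, not computed here.
    return math.log(edge["p_t_given_s"] / edge["p_tilde_t"])


SCORES = {
    "conditional": conditional,
    "pmi": pmi,
    "mass_corrected_pmi": mass_corrected_pmi,
}


def rank_by(edges, score_name):
    """Re-score a fixed set of edges and return them best first."""
    return sorted(edges, key=SCORES[score_name], reverse=True)


# Toy edges from one source; the probabilities are made up.
EDGES = [
    {"target": "I10", "p_t_given_s": 0.30, "p_t": 0.30, "p_tilde_t": 0.40},
    {"target": "warfarin", "p_t_given_s": 0.05, "p_t": 0.005, "p_tilde_t": 0.002},
]

for name in SCORES:
    print(name, [e["target"] for e in rank_by(EDGES, name)])
```

A target that is common everywhere can top the conditional ranking yet score near zero on PMI, because its conditional probability is no higher than its baseline.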

Tip: type a code or name (e.g. I48, metformin, warfarin) and press Enter to focus its nearest neighbours; the focus panel keeps "balance modalities" on so that cross-modality partners surface.