How this works

Methodology

Every claim is a graded relationship between a subject and an outcome. Every study is logged as an appraisal that supports, contradicts, tests null, or is mixed. The score is the evidence, weighted by how good the evidence is.

The consensus graph: every dot is a claim, an entity (a food, drug, gene, pathway), or a single study's verdict; every line is a relationship. Thousands of appraisals, woven into one auditable structure.

TL;DR — We gather every study we can find on a claim, weight each one by how strong its design is (a pooled review of human trials counts far more than a mouse study or a hunch), and add it up into one honest score: supported, contradicted, or too soon to say. We also show how far up the "evidence ladder" the claim has actually climbed, so a popular idea resting on mouse data can't pass itself off as proven.

How science actually settles things

No single study proves anything. A real finding has to survive being repeated by different people, in different places, with the right comparisons in place. So instead of trusting one headline, we look at the whole pile of studies on a claim and ask three things: how much evidence is there, how good is it, and which way does it point.

Not all evidence is equal. Ideas climb a ladder. At the bottom: a plausible theory, or an effect in cells in a dish. Then animal studies. Then population studies that spot patterns in people. Near the top: randomized human trials, where people are assigned by chance to the treatment or not, which is the cleanest way to tell cause from coincidence. At the very top: a meta-analysis that pools many human trials together. A claim is only as proven as the highest rung it has actually reached, and "consistent in mice" is not "works in people."

We weight studies by where they sit on that ladder, and by their quality (a big, careful trial outranks a small, shaky one). We count the ones that disagree, not just the ones that agree. We flag conflicts of interest, because who paid for a study is context worth knowing. And when a source is retracted, it comes out. The number you see is that whole reckoning, in the open. Below is the precise model.

The model

The model is a typed claim–evidence consensus graph. Each claim is a graded Subject–Predicate–Object triple; each source is ingested as a (source × claim) appraisal carrying one of four verdicts: supports, contradicts, tested-null, or mixed.
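One way to picture the structure described above is as a few small record types. This is a minimal sketch, not the site's actual schema: the class names, field names, and the placeholder claim are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    """The four appraisal verdicts named in the text."""
    SUPPORTS = "supports"
    CONTRADICTS = "contradicts"
    TESTED_NULL = "tested-null"
    MIXED = "mixed"


@dataclass(frozen=True)
class Claim:
    """A Subject-Predicate-Object triple (field names are hypothetical)."""
    subject: str
    predicate: str
    obj: str


@dataclass(frozen=True)
class Appraisal:
    """One (source x claim) pairing: what a single source says about a single claim."""
    source_id: str   # hypothetical identifier, for example a DOI
    claim: Claim
    verdict: Verdict
    grade: str       # rung on the evidence ladder, e.g. "animal" or "meta-analysis"
    quality: str     # "low", "moderate", or "high"


# Placeholder data, invented purely to show the shapes.
claim = Claim(subject="food X", predicate="lowers", obj="marker Y")
appraisal = Appraisal("source-001", claim, Verdict.SUPPORTS, "animal", "moderate")

# The graph can then be held as claim -> list of appraisals.
graph: dict[Claim, list[Appraisal]] = {claim: [appraisal]}
```

Because the records are frozen, a claim can serve as a dictionary key, so every appraisal of the same triple lands in the same bucket.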

How the score is computed

score = Σ(grade_weight × quality_mult × sign) / Σ(weight)
grade_weight: mechanism/in-vitro 1, animal 2, observational 3, RCT/n-of-1 5, meta-analysis 8
quality_mult: low 0.5, moderate 1, high 1.5
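The formula can be sketched in a few lines of Python. Two points are assumptions rather than anything the text states: the denominator weight is read as grade_weight × quality_mult (which keeps the score between −1 and 1), and the signs are taken as +1 for supports, −1 for contradicts and tested-null, and 0 for mixed. The function name and the tuple layout are hypothetical.

```python
# Weights as given in the text.
GRADE_WEIGHT = {
    "mechanism": 1, "in-vitro": 1, "animal": 2, "observational": 3,
    "rct": 5, "n-of-1": 5, "meta-analysis": 8,
}
QUALITY_MULT = {"low": 0.5, "moderate": 1.0, "high": 1.5}

# ASSUMPTION: the text does not spell out the signs. tested-null is scored as -1
# here (null results on directional claims count as contradicting) and mixed as 0.
SIGN = {"supports": 1, "contradicts": -1, "tested-null": -1, "mixed": 0}


def consensus_score(appraisals):
    """Weighted mean of verdict signs over (grade, quality, verdict) tuples.

    ASSUMPTION: each study's weight in the denominator is
    grade_weight * quality_mult. Returns None when there is nothing to score.
    """
    numerator = 0.0
    denominator = 0.0
    for grade, quality, verdict in appraisals:
        weight = GRADE_WEIGHT[grade] * QUALITY_MULT[quality]
        numerator += weight * SIGN[verdict]
        denominator += weight
    if denominator == 0:
        return None
    return numerator / denominator


# Invented example: one strong meta-analysis in favour, one moderate RCT against,
# one weak animal study in favour.
example = [
    ("meta-analysis", "high", "supports"),   # weight 8 * 1.5 = 12
    ("rct", "moderate", "contradicts"),      # weight 5 * 1.0 = 5
    ("animal", "low", "supports"),           # weight 2 * 0.5 = 1
]
score = consensus_score(example)  # (12 - 5 + 1) / 18, roughly 0.44
```

In this made-up case the lone dissenting trial pulls the score down from 1.0 to about 0.44, which is the point of counting the studies that disagree as well as the ones that agree.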

Consensus states

strong-support: score ≥ 0.6
leans-support: 0.2 to 0.6
contested: −0.2 to 0.2
leans-against: −0.6 to −0.2
refuted: score ≤ −0.6
insufficient: fewer than 2 independent groups and no meta-analysis
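The thresholds above map onto a simple cascade of comparisons. The sketch below is illustrative only. The function and parameter names are hypothetical, and the handling of exact boundary values is an assumption: a score of exactly ±0.2 is treated as contested, while ±0.6 follows the ≥ and ≤ signs in the list.

```python
def consensus_state(score, independent_groups, has_meta_analysis):
    """Map a score in [-1, 1] to one of the named consensus states.

    ASSUMPTION: the 'insufficient' check runs first, so a thinly sourced
    claim never receives a directional label, whatever its score.
    """
    if independent_groups < 2 and not has_meta_analysis:
        return "insufficient"
    if score >= 0.6:
        return "strong-support"
    if score > 0.2:
        return "leans-support"
    if score >= -0.2:       # ASSUMPTION: exactly +/-0.2 counts as contested
        return "contested"
    if score > -0.6:
        return "leans-against"
    return "refuted"


state = consensus_state(0.44, independent_groups=3, has_meta_analysis=True)
```

Checking sufficiency before looking at the score means a single enthusiastic lab cannot produce a "strong-support" label on its own.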

The evidence ceiling

The evidence ceiling is the highest study grade behind a claim, on the ladder from mechanism up to meta-analysis. A strong-support score with an 'animal' ceiling means 'consistent in mice', not 'proven in humans'.
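Computing the ceiling amounts to taking a maximum over ladder positions. In this minimal sketch the rung numbers are an assumption that mirrors the order of the grade weights (in-vitro shares a rung with mechanism, n-of-1 with RCT), and the function name is hypothetical.

```python
# ASSUMPTION: rung order follows the ladder described in the text, lowest to highest.
LADDER_RUNG = {
    "mechanism": 0, "in-vitro": 0, "animal": 1, "observational": 2,
    "rct": 3, "n-of-1": 3, "meta-analysis": 4,
}


def evidence_ceiling(grades):
    """Return the highest-rung grade among a claim's studies, or None if there are none."""
    grades = list(grades)
    if not grades:
        return None
    return max(grades, key=LADDER_RUNG.__getitem__)


# A claim backed only by bench and mouse work tops out at 'animal',
# however one-sided its score may be.
ceiling = evidence_ceiling(["mechanism", "animal", "animal", "in-vitro"])
```

Reporting the ceiling next to the score keeps the two questions separate: which way the evidence points, and how far up the ladder it has climbed.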

Honesty rules

Null results for directional claims are graded contradicts, not neutral.
Retracted sources are removed via an automated Crossref/Retraction-Watch check.
Single-research-network claims are flagged 'insufficient'.
Industry conflicts of interest are surfaced, not hidden.
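These rules can be read as a clean-up pass that runs before scoring. The sketch below is a hedged illustration, not the site's pipeline: the dictionary keys (`source_id`, `verdict`, `network`, `industry_coi`) and the function names are hypothetical, and the retraction lookup is stood in for by a plain set of IDs, since the real Crossref/Retraction Watch check is not described here.

```python
def apply_honesty_rules(appraisals, retracted_ids, claim_is_directional=True):
    """Drop retracted sources and regrade nulls on directional claims.

    `retracted_ids` is a stand-in for the result of the automated retraction
    check; how that check is actually performed is not shown here.
    """
    kept = []
    for appraisal in appraisals:
        if appraisal["source_id"] in retracted_ids:
            continue  # retracted sources come out entirely
        verdict = appraisal["verdict"]
        if claim_is_directional and verdict == "tested-null":
            verdict = "contradicts"  # a null result is not neutral for a directional claim
        # Copy the record so the conflict-of-interest flag travels with it.
        kept.append({**appraisal, "verdict": verdict})
    return kept


def is_single_network(appraisals):
    """True when all remaining appraisals trace back to fewer than two research networks."""
    return len({a["network"] for a in appraisals}) < 2


# Invented records, purely to exercise the rules.
raw = [
    {"source_id": "s1", "verdict": "supports", "network": "lab-A", "industry_coi": True},
    {"source_id": "s2", "verdict": "tested-null", "network": "lab-A", "industry_coi": False},
    {"source_id": "s3", "verdict": "supports", "network": "lab-B", "industry_coi": False},
]
cleaned = apply_honesty_rules(raw, retracted_ids={"s3"})
flag_insufficient = is_single_network(cleaned)
```

In this example, removing the retracted source leaves only lab-A's work, so the claim would be flagged 'insufficient' even though two appraisals remain.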

Inspiration

The memory/claim-graph design was inspired by recent agent long-term-memory research. The consensus engine is a typed evidence graph, in contrast to the underlying AI agent's own memory, which is file-based.

Educational only, not medical advice. The grading reflects published evidence, not clinical guidance for any individual.