Representing All

Medical Knowledge

Bijan Parsia


Yes, all.

All in breadth.

All in depth.

From and for all contexts.

Why settle for a small problem you can't solve
when you can tackle a big problem you can't solve?

My Goal

I want to build clinical information systems that are up to date with our best clinical knowledge, easy to maintain, and help people more than they hurt.

By people I mean anyone

(clinical, non-clinical, expert, lay person, or novice)

An Experience

Atrial fib stroke

A "Simple" Problem

Should a patient with chronic non-valvular atrial fibrillation be on prophylactic anti-coagulation therapy?

  • Tons of research
  • Strong consensus in guidelines
  • Practice is not aligned

How big a problem?

No more than 50-60% of patients affected by atrial fibrillation (AF) receive anticoagulation. In the setting of AF, VKAs are safe and effective when properly managed, reducing stroke and systemic embolism by more than 60%.

— Molteni and Cimminiello, Warfarin and atrial fibrillation: from ideal to real the warfarin affaire, 2014

How big a problem?

It is not clear how adoption of the CHA2DS2-VASc score would change actual clinical practice. Multiple studies have demonstrated that physicians do not adhere well to the current anticoagulation guidelines, with many low-risk patients receiving oral anticoagulation and many high-risk patients receiving neither oral anticoagulation nor aspirin.

— Mason et al., Impact of the CHA2DS2-VASc Score on Anticoagulation Recommendations for Atrial Fibrillation, 2012

Why do we mis-anticoagulate?

  • The decision seems simple!
  • However,
    • Warfarin is tricky
    • All anticoagulants have risks
    • Some risks are vague
      • Falls!?
    • Balancing risk and benefit is v. hard
    • Patient and clinician values misaligned

Solution: Risk tools + increasingly shrill guidance

(Conceptual) Tools

Risk assessment rules are (we hope) evidence-based,
easy-to-use instruments for providing a
reasonably objective indicator of a patient's risks.

  • For AFib, we (currently) have
    • CHA2DS2-VASc (for stroke risk) and
    • HAS-BLED (for bleeding risk)

Tool Characteristics (Uncomputerised)

  • For ease of use:
    • Score based system
      • (e.g., 0-9 based on counting risk factors)
    • Relatively few risk factors
      • (mnemonicable! CHA2DS2-VASc ≈ 8)
    • Attempt to find clear strata
      • CV = 0 = Very low risk
      • Aim to support clear guidance!
  • Evidence based
    • Many studies and parameterisations
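
To make "counting risk factors" concrete, here is a minimal sketch of the CHA2DS2-VASc count using the standard published weights. The `Patient` class and its field names are made up for illustration; they are not from any particular system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    """Hypothetical record holding only the CHA2DS2-VASc inputs."""
    age: int
    female: bool = False
    heart_failure: bool = False
    hypertension: bool = False
    diabetes: bool = False
    prior_stroke_tia: bool = False
    vascular_disease: bool = False


def cha2ds2_vasc(p: Patient) -> int:
    """Sum the standard CHA2DS2-VASc points (range 0-9)."""
    score = 0
    score += 1 if p.heart_failure else 0        # C
    score += 1 if p.hypertension else 0         # H
    if p.age >= 75:                             # A2
        score += 2
    elif p.age >= 65:                           # A (65-74)
        score += 1
    score += 1 if p.diabetes else 0             # D
    score += 2 if p.prior_stroke_tia else 0     # S2
    score += 1 if p.vascular_disease else 0     # V
    score += 1 if p.female else 0               # Sc
    return score


example = Patient(age=70, hypertension=True)
print(cha2ds2_vasc(example))
```

The point of the sketch is how little is in it: a handful of booleans and an age band, which is exactly what makes the score usable at the bedside.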


  • A typical example (Lane and Yip, 2012):
    • CV = 0: recommend no antithrombotic therapy.
    • CV = 1: recommend antithrombotic therapy with oral anticoagulation or antiplatelet therapy but preferably oral anticoagulation.
    • CV ≥ 2: recommend oral anticoagulation.
    • HB ≥ 3 "indicates" that
      • caution is warranted when prescribing oral anticoagulation and
      • regular review is recommended.
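
The guidance above is small enough to write down as a lookup. This is only a sketch of the thresholds as quoted on this slide; the function name `recommend` and the wording of the returned strings are mine, and it is not clinical software.

```python
from typing import List


def recommend(cv: int, hb: int) -> List[str]:
    """Map a CHA2DS2-VASc score (cv) and a HAS-BLED score (hb)
    to the guidance text quoted on the slide."""
    if cv < 0 or hb < 0:
        raise ValueError("scores cannot be negative")

    if cv == 0:
        advice = ["no antithrombotic therapy"]
    elif cv == 1:
        advice = ["oral anticoagulation or antiplatelet therapy, "
                  "preferably oral anticoagulation"]
    else:
        advice = ["oral anticoagulation"]

    # HB >= 3 does not change the recommendation, it only adds caveats.
    if hb >= 3:
        advice.append("caution when prescribing oral anticoagulation")
        advice.append("regular review")
    return advice


for line in recommend(cv=2, hb=3):
    print(line)
```

Note that a high HAS-BLED score only appends caveats and never removes the anticoagulation recommendation, which is one reason the guidance is harder to act on than it looks.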

Not clear in practice!

Thought: Kill scores

  • Scores are there for ease of use
    • Bedside, "in the head" computation
      • And yet, calculators (paper and electronic) abound
  • Scores are strange
    • Too granular
      • "Zero, one, two, many"
      • "Zero, one, many"
      • "Zero, many"
    • Too coarse!
      • CV = 1 means many things
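
How many things can one score mean? A quick sketch that enumerates every combination of CHA2DS2-VASc inputs (standard weights, three mutually exclusive age bands) and counts the distinct patient profiles behind each score. The names `ONE_POINT`, `AGE_POINTS` and `profiles_by_score` are illustrative.

```python
from collections import Counter
from itertools import product

# Risk factors worth one point each (age is handled separately).
ONE_POINT = ("heart failure", "hypertension", "diabetes",
             "vascular disease", "female sex")
# Age bands are mutually exclusive: <65, 65-74, >=75.
AGE_POINTS = {"<65": 0, "65-74": 1, ">=75": 2}


def profiles_by_score() -> Counter:
    """Count distinct risk-factor profiles that yield each score."""
    counts: Counter = Counter()
    for flags in product((False, True), repeat=len(ONE_POINT)):
        for stroke in (False, True):            # prior stroke/TIA: 2 points
            for age_pts in AGE_POINTS.values():
                score = sum(flags) + (2 if stroke else 0) + age_pts
                counts[score] += 1
    return counts


counts = profiles_by_score()
for score in sorted(counts):
    print(f"CV = {score}: {counts[score]} distinct profiles")
```

Even at this level of abstraction, CV = 1 already covers six different patients (a 70-year-old man with nothing else, a 50-year-old woman with nothing else, and so on), and the score treats them as the same.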

Too coarse!

Olesen et al., BMJ 2011

Solution: COMPUTERS!!!

The answer is always MOER COMPUTERS

A "Simple" Problem

You may ask yourself...How did I get here?!

Getting to baseline

We derived a theoretical annual risk of stroke without treatment...from a large cohort (n = 73 538) of ‘real world’ patients in the Danish National Patient Registry who have non-valvular AF and were not treated with warfarin....The rates in the Danish non-OAC cohort were adjusted to account for antiplatelet use within each group...

LaHaye et al. 2012 use the same data as Olesen et al. 2011!

Different analysis :(

Two Derivations!

When the days go by...

Another problem

  • The bleeding rates...suck
    • Maybe the Danish drink a lot, get into bar fights, and cut each other up...I don't know and I don't want to know!
  • LaHaye et al. introduce a bleeding ratio
    • Up to 5-fold reduction
      • Based on "patient discounting of bleed risk"
      • Say what?!
  • Without ratio, one doctor said:
    • "On this basis, I'll never prescribe anticoagulation"
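
To see why the ratio matters so much, here is a toy sketch. It is NOT LaHaye et al.'s formula: I am simply assuming the ratio divides the bleeding harm before it is weighed against the stroke benefit, and the event rates are made-up numbers chosen only to show the sign flip.

```python
def net_benefit(strokes_prevented: float,
                bleeds_caused: float,
                bleeding_ratio: float = 1.0) -> float:
    """Toy net benefit in events per 100 patient-years.

    Assumption (mine, for illustration): a bleed counts as
    1 / bleeding_ratio of a stroke. Positive favours anticoagulation.
    """
    if bleeding_ratio <= 0:
        raise ValueError("bleeding_ratio must be positive")
    return strokes_prevented - bleeds_caused / bleeding_ratio


# Hypothetical rates, not taken from any study.
for ratio in (1.0, 2.0, 5.0):
    value = net_benefit(strokes_prevented=2.0, bleeds_caused=3.0,
                        bleeding_ratio=ratio)
    verdict = "treat" if value > 0 else "do not treat"
    print(f"ratio {ratio}: net benefit {value:+.2f} -> {verdict}")
```

With these invented numbers the recommendation flips purely on the choice of ratio, so a parameter based on "patient discounting" ends up doing most of the work.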

Water flowing underground...

Bleeding ratio

Another Parameterisation

Everything but the names might be different!

What changed?

The names might be different

Can we combine?

Once in a lifetime...

The treatments!

Once in a lifetime...

Where did this come from?

My god...what have I done...

I have a citation...

...and some persistence.

Do you see how to get to 2.3?

Tip of the iceberg

  • You don't want to know the rest!
    • Crazy numbers
    • Lots of "clinical judgement"
    • Sanity checks are few
      • And...often unhelpful
  • This should be simple!

All I wanted to do was to make a little tool!

Main Personal Outcome of This Exercise

I am terrified of doctors.

I'm less depressed about empirical computer science.


Risk game map fixed

Thinking about details

  • "Easy" Details
    • For the baseline
      • Gimme the (anonymized) patient registries!
      • Make the statistical model execution reproducible
        • Jupyter, RMarkdown, whatever!
    • For the relative risks
      • Micropublish!
      • Cite better!
      • Live tables
        • Executable, all transformations available
          • Don't make me deal with odds and hazards!
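
What "all transformations available" might mean in practice: a small sketch of the conversions I keep redoing by hand. The function names are mine. The rate and hazard conversions assume a constant hazard over the period, and `apply_hazard_ratio` assumes proportional hazards; those assumptions are mine and a given paper may not meet them.

```python
import math


def prob_to_odds(p: float) -> float:
    return p / (1.0 - p)


def odds_to_prob(odds: float) -> float:
    return odds / (1.0 + odds)


def rate_to_risk(rate: float, years: float = 1.0) -> float:
    """Event rate per person-year -> cumulative risk (constant hazard)."""
    return 1.0 - math.exp(-rate * years)


def risk_to_rate(risk: float, years: float = 1.0) -> float:
    """Cumulative risk -> event rate per person-year (constant hazard)."""
    return -math.log(1.0 - risk) / years


def apply_risk_ratio(baseline_risk: float, rr: float) -> float:
    return min(1.0, baseline_risk * rr)


def apply_odds_ratio(baseline_risk: float, odds_ratio: float) -> float:
    return odds_to_prob(prob_to_odds(baseline_risk) * odds_ratio)


def apply_hazard_ratio(baseline_risk: float, hr: float) -> float:
    """Proportional hazards: S_treated = S_baseline ** hr."""
    return 1.0 - (1.0 - baseline_risk) ** hr


# Same nominal effect size (0.4), three different meanings.
baseline = 0.30  # hypothetical baseline risk
print("risk ratio  :", round(apply_risk_ratio(baseline, 0.4), 4))
print("odds ratio  :", round(apply_odds_ratio(baseline, 0.4), 4))
print("hazard ratio:", round(apply_hazard_ratio(baseline, 0.4), 4))
```

The three results differ, and the gap widens as the baseline risk rises, so a table that says "0.4" without saying which kind of ratio it is cannot be reused safely.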

Thinking about details

We want to apply parameterisations to new things!

The mess

The mess is in the literature, now

We need a new literature

A Slogan

The foundation of a new literature is

  1. data disintermediation and
  2. analytics aggregation

A Slogan Explained

I want other people to

  1. get out of my way (data disintermediation) and
  2. help me out (analytics aggregation)

Data Disintermediation

Get out of my way

  • Getting out of the way is hard!
    • No such thing as raw data
    • If it did exist it would be unusable

This is the easy very hard thing

Requires conceptual, informatical, and statistical clarity

Analytics Aggregation

Help me out

  • Analyses should stack
    • Or tell me the reason why not!
  • Analyses should be reusable
    • On new data!
      • And if not, they should tell me why not!
  • Code (and its ilk) is not enough
    • We need conceptual clarity
    • We need know-how
    • We need to share interpretations
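
A sketch of the code-level part of this, which the last bullets say is not enough on its own: each analysis declares what it needs and what it produces, so a stack of analyses can be run on new data or can say why not. Everything here (`Analysis`, `why_not`, `stack`, the example inputs) is hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, float]


@dataclass
class Analysis:
    name: str
    requires: List[str]          # inputs this step needs
    produces: str                # output this step adds
    fn: Callable[[Row], float]

    def why_not(self, row: Row) -> List[str]:
        """Reasons this analysis cannot run on the given data."""
        return [f"{self.name}: missing input '{key}'"
                for key in self.requires if key not in row]

    def apply(self, row: Row) -> Row:
        problems = self.why_not(row)
        if problems:
            raise ValueError("; ".join(problems))
        return {**row, self.produces: self.fn(row)}


def stack(analyses: List[Analysis], row: Row) -> Row:
    """Run analyses in order, each building on the previous output."""
    for analysis in analyses:
        row = analysis.apply(row)
    return row


event_rate = Analysis("event rate", ["events", "person_years"], "rate",
                      lambda r: r["events"] / r["person_years"])
annual_risk = Analysis("annual risk", ["rate"], "risk",
                       lambda r: 1.0 - math.exp(-r["rate"]))

result = stack([event_rate, annual_risk],
               {"events": 12.0, "person_years": 400.0})
print(result)
print(annual_risk.why_not({"events": 12.0}))
```

This only checks that the right column names are present. Whether "events" means the same thing in two registries is the conceptual problem, and no amount of this kind of plumbing answers it.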

Analytics Aggregation?

Help me out

  • A promise of OWL
    • Shared conceptual models
  • A promise of OBDA
    • Use that model with your untouched data

A start toward aggregating understanding of data

Analytics Aggregation — A hint

Inference services!

Getting to all

Close up view of Great Buddha Hall Daibutsu in Tōdai-ji temple complex. Nara, Nara Prefecture, Kansai Region, Japan

Daibutsu in Tōdai-ji

A joke

What did the Buddhist say to the hot dog vendor?

Make me one with everything.

Another joke

What did the Semanticwebbist say to the data vendor?

Make me one with anything.

Not a joke

Anyone can say anything about anything

This should be a positive, not a negative, right?

Four kinds of knowledge

What does success look like?

An extension to the OWL infrastructure that allows us to

  • encode a conceptually and analytically complete description of two studies
    • including the data
  • align the conceptual structure of those ontologies
  • infer a meta-analysis

This bridges the conceptual, evidential, and informatical!
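
For scale: once two studies are aligned, the final arithmetic of a meta-analysis is tiny. Below is a minimal sketch of standard fixed-effect, inverse-variance pooling on the log scale. The two study estimates are invented, and the hard part (deciding that the studies measure the same thing) is assumed away.

```python
import math
from typing import List, Tuple


def pool_fixed_effect(studies: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Inverse-variance fixed-effect pooling.

    studies: (log effect estimate, standard error) per study.
    Returns (pooled log estimate, pooled standard error).
    """
    weights = [1.0 / se ** 2 for _, se in studies]
    total = sum(weights)
    pooled = sum(w * est for w, (est, _) in zip(weights, studies)) / total
    return pooled, math.sqrt(1.0 / total)


# Hypothetical hazard ratios with standard errors on the log scale.
studies = [(math.log(0.40), 0.20), (math.log(0.55), 0.30)]
log_hr, se = pool_fixed_effect(studies)
low, high = log_hr - 1.96 * se, log_hr + 1.96 * se
print(f"pooled HR {math.exp(log_hr):.2f} "
      f"(95% CI {math.exp(low):.2f} to {math.exp(high):.2f})")
```

Everything that makes this legitimate (same population, same outcome definition, same kind of ratio) lives outside the code, in the conceptual alignment the bullets above ask for.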

What does success look like?

We can generate a postgraduate MCQ-based exam from:

  1. A set of patient records for a particular clinician
  2. Data about disease prevalence
  3. Models of disease progression
  4. Knowledge about diagnosis and treatment options

Diverse information artefacts are key!

Not a solved problem

  • I see (some of the) pieces
    • OWL is ok for conceptual knowledge
    • OBDA gets us closer to data
    • Lots of cool ML stuff with OWL (concept drift!)
  • I see missing pieces
    • Integrating probability and OWL tricky
    • Qualitative evidence
    • People don't do what they can now (e.g., R)
    • Procedural is challenging

People keep neglecting the conceptual

The challenge isn't small

Come fail with me...gloriously!

Needed: Formalisms, tools, techniques, attempts...

The best way to predict the future is to invent it.
Alan Kay


A lot of this came out of my sabbatical time when I was hosted by Siemens HS (now Cerner HS).

In particular, Jodi Wachs, Shipeng Yu, Marc Overhage, James Walker and Balaji Krishnapuram and I, in various combinations, spent person-too-longs banging our heads against that-which-should-be-easier-than-this-madness.

Everything kooky or logicy can be attributed to me!