TopCause does root cause analysis

v1.72.0. TopCause is an algorithm that answers the question: "What's the single biggest change I can make to improve my outcome?"

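TopCause takes two inputs: the data (a DataFrame with one column per feature, called X below) and the outcome you want to improve (a column such as weight).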

Say a rugby team is recruiting heavy people and has weight data like this:

male   age  height  weight
   1  90.0   151.7    47.8
   0  90.0   139.7    36.4
   0  90.0   136.5    31.8
   1  20.0   156.8    53.0
   0  10.0   145.4    41.2
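
To make the shape of this data concrete, here is a minimal sketch that parses the five sample rows with Python's standard csv module. It assumes weight.csv is comma-separated with the header shown above; SAMPLE_CSV and load_rows are names made up for this illustration and are not part of Gramex.

```python
import csv
import io

# Assumed contents of weight.csv: the five sample rows above, comma-separated.
SAMPLE_CSV = """male,age,height,weight
1,90.0,151.7,47.8
0,90.0,139.7,36.4
0,90.0,136.5,31.8
1,20.0,156.8,53.0
0,10.0,145.4,41.2
"""


def load_rows(text):
    """Parse CSV text into a list of dicts with float values."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: float(value) for key, value in row.items()} for row in reader]


rows = load_rows(SAMPLE_CSV)
# Find the heaviest person in the sample and show their height and weight.
heaviest = max(rows, key=lambda row: row["weight"])
print(len(rows), heaviest["height"], heaviest["weight"])  # 5 156.8 53.0
```

Each row becomes one record with a value for every column; weight is the outcome, and the remaining columns are candidate drivers.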


If they want to know "What's the single biggest driver of weight?", TopCause can answer that. Here is topcausecalc.py, which defines a FunctionHandler function that returns the drivers.

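import gramex.cache
import gramex.ml
from gramex.transforms import handler


@handler
def drivers():
    # Load the weight data (cached, so the file is re-read only when it changes)
    data = gramex.cache.open('weight.csv')
    model = gramex.ml.TopCause()
    # Fit on all columns, with weight as the outcome to improve
    model.fit(data, data['weight'])
    # result_ is a DataFrame with one row per column in data
    return model.result_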

To set this up, use this gramex.yaml:

url:
  topcause-drivers:
    pattern: drivers
    handler: FunctionHandler
    kwargs:
      function: topcausecalc.drivers


TopCause results

The result in model.result_ is a DataFrame. For every column (feature) in X, there is a row in the result that shows the impact of that feature.
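
To give a feel for what "impact" could mean here, the sketch below estimates a gain for height by comparing the average weight of the tallest people with the overall average. This is only an illustration under a simplifying assumption (take the top few rows by feature value); it is not TopCause's actual algorithm, and gain_from_top_group and top_n are hypothetical names.

```python
from statistics import mean

# (height in cm, weight in kg) pairs taken from the sample rows shown earlier.
people = [(151.7, 47.8), (139.7, 36.4), (136.5, 31.8), (156.8, 53.0), (145.4, 41.2)]


def gain_from_top_group(pairs, top_n=2):
    """Return (typical feature value of the top group, gain in mean outcome).

    top_n is a hypothetical knob for this sketch, not a TopCause parameter.
    """
    overall = mean(outcome for _, outcome in pairs)
    # Keep the top_n rows with the highest feature value.
    top = sorted(pairs, key=lambda pair: pair[0], reverse=True)[:top_n]
    value = mean(feature for feature, _ in top)
    gain = mean(outcome for _, outcome in top) - overall
    return value, gain


value, gain = gain_from_top_group(people)
print(f"picking height near {value:.2f} cm raises mean weight by about {gain:.2f} kg")
```

With only five rows the numbers are not meaningful on their own; the point is that a gain compares the outcome for a favourable group against the overall average.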

Here is a sample row:

        value  gain        p  type
height  164.5  12.7  8.4e-13  num

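The columns show:

  • value: the feature value to pick for the best outcome
  • gain: how much the average outcome improves when you pick that value
  • p: the probability that the feature does not really affect the outcome (smaller means more confident)
  • type: how the column was treated, e.g. num for a number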

The above example returns:

        value  gain         p  type
weight   55.0  16.9  1.8e-267  num
height  164.5  12.7   8.4e-13  num
male      NaN   NaN     0.057  num
age       NaN   NaN     0.453  num

This example says that:

  1. weight has the biggest impact on weight (obviously) – let’s ignore this
  2. height has the second biggest impact on weight. Specifically:
    • value: Picking people with the (high) height of 164.5 cm
    • gain: This can increase average weight by 12.7 kg.
    • p: The probability of error is tiny (8.4e-13), i.e. height almost certainly impacts weight
    • type: This column was treated as a number
  3. male does not impact weight with enough confidence. There’s a 5.7% chance it doesn’t. (The default cutoff is 5%)
  4. age does not impact weight with enough confidence. There’s a 45.3% chance it doesn’t.

Summary: Recruiting tall people (~164.5 cm) can increase the team's average weight by ~12.7 kg.
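
The same reading of the table can be done in code. The sketch below copies the result rows into plain dicts (None stands in for NaN), drops the target column and any feature whose p is above the 5% cutoff, and picks the largest remaining gain. top_driver is a hypothetical helper written for this example, not a Gramex function.

```python
# Rows copied from the result table above; None stands in for NaN.
results = [
    {"feature": "weight", "value": 55.0, "gain": 16.9, "p": 1.8e-267},
    {"feature": "height", "value": 164.5, "gain": 12.7, "p": 8.4e-13},
    {"feature": "male", "value": None, "gain": None, "p": 0.057},
    {"feature": "age", "value": None, "gain": None, "p": 0.453},
]


def top_driver(rows, target, cutoff=0.05):
    """Return the significant feature with the largest gain, skipping the target itself."""
    candidates = [
        row for row in rows
        if row["feature"] != target and row["gain"] is not None and row["p"] < cutoff
    ]
    return max(candidates, key=lambda row: row["gain"], default=None)


best = top_driver(results, target="weight")
print(best["feature"], best["value"], best["gain"])  # height 164.5 12.7
```

Raising the cutoff to 0.06 in this sketch would let male pass the significance filter, but it has no gain recorded, so height would still be the top driver.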

TopCause configuration

The constructor gramex.ml.TopCause() accepts these parameters: