Entropy & Information Gain
At each split, the tree asks: "Which question reduces diagnostic uncertainty the most?"
Entropy measures impurity — how mixed malignant/benign cases are at a node:
H = −Σ pᵢ log₂(pᵢ)
H = 0 → pure (all one class)
H = 1 → maximally mixed (50/50 — the maximum for two classes)
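The entropy formula above can be sketched in a few lines of Python; the class-count lists below (e.g. 10 malignant / 0 benign) are illustrative, not from the source:

```python
import math

def entropy(counts):
    """Shannon entropy H = −Σ pᵢ·log₂(pᵢ), in bits, from class counts at a node."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:          # skip empty classes; 0·log₂(0) is taken as 0
            p = c / total
            h -= p * math.log2(p)
    return h

print(entropy([10, 0]))    # pure node → 0.0
print(entropy([5, 5]))     # 50/50 node → 1.0
```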
Information Gain is the entropy drop after a split:
IG = H(parent) − Σ (|child|/|parent|)·H(child)
The tree always picks the split with the highest IG — the most informative question.
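The IG formula can be checked with a small worked example. This sketch splits a hypothetical 10/10 node into two children of [8, 2] and [2, 8]; the numbers are illustrative, not from the source:

```python
import math

def entropy(counts):
    """Shannon entropy (bits) from class counts at a node."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent, children):
    """IG = H(parent) − Σ (|child|/|parent|)·H(child)."""
    n = sum(parent)
    weighted = sum(sum(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted

# Hypothetical split: a mixed 10/10 node → children [8, 2] and [2, 8]
ig = information_gain([10, 10], [[8, 2], [2, 8]])
print(round(ig, 3))        # → 0.278
```

A split that left both children at 50/50 would score IG = 0; a split producing two pure children would score the full 1 bit.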