Statistical Visualization

1. Skewness: Income Distribution Analogy

Watch extreme earners physically enter the population, dragging the Mean away from the Median.

Skew Parameter (α):

Symmetric

A Concrete Example (N=20)

Imagine exactly 20 people earning between ₹10,000 and ₹10,500. Their Mean and Median are virtually identical (around ₹10,250). This builds a Symmetric distribution.

Try it yourself: Use the buttons below to inject a single extreme outlier into this 20-person population and calculate exactly how the Mean and Skewness react, while the Median stays stubborn!


Formula (3rd Moment): Skewness = Σ(X - μ)³ / (N · σ³)
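The formula above can be tried on a small dataset. The following is a minimal sketch, not the visualizer's own code: the evenly spaced incomes and the ₹1,000,000 outlier are hypothetical values chosen for illustration, and `skewness` is a helper name introduced here.

```python
from statistics import mean, median

def skewness(values):
    """Population skewness: sum((x - mu)^3) / (N * sigma^3)."""
    n = len(values)
    mu = mean(values)
    sigma = (sum((x - mu) ** 2 for x in values) / n) ** 0.5
    return sum((x - mu) ** 3 for x in values) / (n * sigma ** 3)

# Hypothetical data: 20 incomes spaced evenly between 10,000 and 10,475.
incomes = [10_000 + 25 * i for i in range(20)]
# Inject one illustrative extreme earner into the population.
with_outlier = incomes + [1_000_000]

for label, data in (("symmetric", incomes), ("with outlier", with_outlier)):
    print(f"{label}: mean={mean(data):.1f} "
          f"median={median(data):.1f} skewness={skewness(data):.2f}")
```

In this sketch the mean jumps by tens of thousands of rupees while the median moves by only a few rupees, and the skewness goes from zero to strongly positive.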

2. Kurtosis: Physical Mass Transfer

Watch the "shoulders" collapse as mass pushes into the peak and tails (Red Zones).

Extreme Tail Volatility (%):

Mesokurtic (Normal Tails)

Understanding Kurtosis

Kurtosis measures the "tailedness" of a distribution using the fourth standardized moment. High kurtosis signals that extreme values (outliers) occur more often than they would under a normal distribution.

1. Mesokurtic (Excess Kurtosis ≈ 0): Similar to a normal distribution. Baseline predictable risk.
2. Platykurtic (Excess Kurtosis < 0): Thinner tails and a flatter peak. Data are spread more evenly across the range, with few extreme outliers. Fewer major fluctuations, indicating less risk.
3. Leptokurtic (Excess Kurtosis > 0): Fatter tails and a sharper, taller peak. Clustered data points with a much higher frequency of extreme outlier events (high risk).

Dynamic Visualizer: Watch how physically increasing the tail volatility sweeps mass away from the "shoulders" of the curve and violently pumps it into the peak and the extreme edges (the red danger zones)!

Formula (4th Moment): Kurtosis = Σ(X - μ)⁴ / (N · σ⁴)

Excess Kurtosis = Kurtosis - 3
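The three kurtosis categories can be checked numerically. This is a rough sketch under stated assumptions: the three simulated datasets (normal, uniform, and a normal mixture with occasional wide draws) are illustrative choices, and `excess_kurtosis` is a helper name introduced here.

```python
import random

def excess_kurtosis(values):
    """Population kurtosis sum((x - mu)^4) / (N * sigma^4), minus 3."""
    n = len(values)
    mu = sum(values) / n
    variance = sum((x - mu) ** 2 for x in values) / n
    fourth_moment = sum((x - mu) ** 4 for x in values) / n
    return fourth_moment / variance ** 2 - 3

rng = random.Random(0)  # fixed seed so the run is repeatable
size = 50_000

mesokurtic = [rng.gauss(0, 1) for _ in range(size)]
platykurtic = [rng.uniform(-1, 1) for _ in range(size)]
# Heavy tails: about 5% of draws come from a much wider normal distribution.
leptokurtic = [rng.gauss(0, 5 if rng.random() < 0.05 else 1)
               for _ in range(size)]

for label, data in (("normal", mesokurtic),
                    ("uniform", platykurtic),
                    ("heavy-tailed mixture", leptokurtic)):
    print(f"{label}: excess kurtosis = {excess_kurtosis(data):.2f}")
```

The normal sample should land near 0, the uniform sample clearly below 0, and the mixture well above 0.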

Calculate from your Dataset

Want to see how your own data holds up? Upload a CSV or TXT file containing numerical data (comma or space separated) to instantly calculate its Skewness and Kurtosis!

Standard Normal (Z-Score)

The ultimate statistical benchmark. We "morph" any complex data into a simplified Mean=0, SD=1 scale for fair comparison.

1. Normal Distribution Morphing

The "Height" Analogy

Comparing heights in cm (μ=170) vs inches (μ=67) is hard. Standardization "morphs" both into Z-scores so you can compare them apples-to-apples.

Manual Steps

Step A: Shift — Subtract the Mean (μ).
Step B: Scale — Divide by SD (σ).
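Steps A and B can be sketched in a few lines. The height values below are hypothetical, and `standardize` is a helper name introduced for this illustration; it uses the population SD.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Return Z-scores: shift by the mean, then scale by the SD."""
    mu = fmean(values)
    sigma = pstdev(values)
    shifted = [x - mu for x in values]       # Step A: Shift
    return [d / sigma for d in shifted]      # Step B: Scale

# Hypothetical heights, first in cm and then the same people in inches.
heights_cm = [160.0, 165.0, 170.0, 175.0, 180.0]
heights_in = [h / 2.54 for h in heights_cm]

z_cm = standardize(heights_cm)
z_in = standardize(heights_in)

print([round(z, 3) for z in z_cm])
print([round(z, 3) for z in z_in])
```

Both lists of Z-scores come out the same (up to floating-point rounding), because the unit of measurement cancels once the data are shifted and scaled.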

Initial Mean: 50

Initial SD: 15

Progress (%)

                        Target Mean: 0.00
                        Target SD: 1.00
                    

2. Standard Normal (Z-Table)

Once standardized, any value becomes a Z-score. This allows us to calculate probability (area under the curve) using a standard table.

The Z-Score Formula: Z = (X - μ) / σ

Where X is the raw value, μ is the mean, and σ is the SD.
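The table lookup can be reproduced with Python's standard library. This is a small sketch: the raw value 65 is hypothetical, the mean of 50 and SD of 15 are borrowed from the morphing panel's initial settings, and `z_score` and `area_within` are helper names introduced here.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def z_score(x, mu, sigma):
    """Z = (X - mu) / sigma."""
    return (x - mu) / sigma

def area_within(k):
    """Probability that Z falls within plus or minus k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

z = z_score(65, mu=50, sigma=15)
print(f"Z = {z:.2f}, area to the left = {std_normal.cdf(z):.4f}")

for k in (1, 2, 3):
    print(f"within ±{k}σ: {area_within(k):.4f}")
```

The three areas correspond to the familiar 68%, 95%, and 99.7% bands.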

Show Areas

±1σ ±2σ ±3σ

Z-Score Value: 0

Types of Variables

The foundation of any statistical analysis starts with understanding your data types.

Qualitative (Categorical)

Describes qualities or characteristics. Values are labels or names.

Nominal: Naming only. No intrinsic order.

Example: Blood Group (A, B, AB, O), Gender, Eye Color

Ordinal: Order exists, but distances between points are unknown.

Example: Pain Scale (Mild, Moderate, Severe), Cancer Stage (I, II, III, IV)

Quantitative (Numerical)

Measured quantities. Values are expressed as numbers.

Discrete: Whole numbers only. Counts of items.

Example: Number of patients in a ward, Heart rate (beats/min)

Continuous (Interval/Ratio): Can take any value in a range (decimals).

Example: Height, Weight, Blood Glucose, Blood Pressure

Understanding Scale properties & Absolute Zero

Interval Scale

Characterized by equal intervals between successive values but lacking a true zero point.

Scientific Note: Zero is an arbitrary label. 0°C represents the freezing point of water, not the complete absence of thermal energy. Consequently, ratios are mathematically invalid (e.g., 40°C is not "twice as hot" as 20°C).

Ratio Scale

Possesses all properties of an interval scale with the addition of an Absolute Zero point.

Scientific Note: Zero represents the total absence of the variable. 0 Kelvin (absolute zero) is the theoretical temperature at which thermal motion reaches its minimum. This allows for proportional comparisons (e.g., 200 mg is exactly twice the mass of 100 mg).
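A short calculation shows why ratios only make sense on a ratio scale. This sketch converts the two Celsius readings from the interval-scale note to Kelvin; `celsius_to_kelvin` is a helper name introduced here.

```python
def celsius_to_kelvin(celsius):
    """Convert an interval-scale reading (°C) to the ratio scale (K)."""
    return celsius + 273.15

# Ratio taken directly on the interval scale: looks like "twice as hot".
naive_ratio = 40 / 20

# Ratio taken on the ratio scale, which has a true zero.
true_ratio = celsius_to_kelvin(40) / celsius_to_kelvin(20)

print(f"Celsius ratio: {naive_ratio:.2f}")
print(f"Kelvin ratio:  {true_ratio:.3f}")
```

On the Kelvin scale, 40°C is only about 7% hotter than 20°C, not 100% hotter.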

Visualization Library

Main Category	Variable Type	Scientific Plot Types
Qualitative	Nominal / Ordinal
Qualitative	Associations
Quantitative	Discrete / Continuous
	Associations
	Clustering

Levels of Prevention

A framework for disease control and health preservation.

Primary Prevention

Actions taken prior to the onset of disease to remove the possibility that it will ever occur. Target: Healthy Population.

Clinical Example:

Immunization (COVID-19/Polio), use of seatbelts, nutritional supplementation (Folic acid in pregnancy).

Secondary Prevention

Actions which halt the progress of a disease at its incipient stage and prevent complications. Target: Early Disease.

Clinical Example:

Mammography for breast cancer screening, Pap smears, BP screening, early treatment of hypertension.

Tertiary Prevention

Measures available in the late stages to mitigate impact, limit disability, and rehabilitate. Target: Established Disease.

Clinical Example:

Physiotherapy for stroke patients, cardiac rehabilitation after MI, speech therapy after neurological insult.

Quaternary Prevention

Actions taken to identify patients at risk of over-medicalization and protect them from unnecessary interventions. Target: Those at risk of iatrogenesis.

Clinical Example:

Avoiding unwarranted prostate biopsy in elderly patients, deprescribing for patients with polypharmacy, ethical end-of-life care.

🔬 Prevention Matrix: Disease Timeline × Intervention Level

Disease Stage ↓	🛡️ Primary	🔍 Secondary	🏥 Tertiary	⚖️ Quaternary
Well / Susceptible	Vaccination; health education	—	—	Avoid over-screening; prevent incidentalomas
Pre-clinical / Latent	Chemoprophylaxis (e.g., statins in high-risk patients)	Screening (mammography, Pap smear)	—	Right test, right patient
Clinical Disease	—	Early treatment (reduce severity)	Definitive treatment (surgery, chemo, ICU)	Avoid polypharmacy
Disability / Sequelae	—	—	Rehabilitation (physio, prosthetics)	Ethical end-of-life care (palliative care, dignity)

Legend: █ = active intervention zone; — = not applicable at this stage. Cell color intensity indicates relevance strength.

Statistical Errors & Power

Understanding the outcome of hypothesis testing: Being right, or being wrong.

Slide the parameters below to see how alpha, sample size, and effect size impact our ability to detect a true difference (Power).

Significance Level (α): 5%. Risk of Type I error (False Positive).

Effect Size (Δ): 2.0. Difference between Group Means.

Sample Precision (1/σ): 1.0. Precision is the inverse of the Standard Error (SE = σ/√n), so the slider's 1/σ label is best read as 1/SE. Higher precision means either a larger sample size (n) or lower variability (σ). Both make the sampling distributions narrower, making it easier to distinguish H₀ from Ha. Increasing this slider simulates collecting more data or using more precise measurements.

Type I Error (α): 0.05

Type II Error (β): 0.20

Power (1-β): 80%
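The link between the three sliders and power can be sketched with a two-sided z-test. This is an assumption-laden illustration: the visualizer's own model is not shown in the text, so these numbers need not match its readouts. Here `precision` is taken to mean 1/SE, and `power_two_sided` is a helper name introduced for the example.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def power_two_sided(alpha, effect, precision):
    """Power of a two-sided z-test, assuming precision = 1 / standard error."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = effect * precision  # effect size expressed in SE units
    # Probability of landing in either rejection region when Ha is true.
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)

for precision in (0.5, 1.0, 1.4, 2.0):
    power = power_two_sided(alpha=0.05, effect=2.0, precision=precision)
    print(f"precision={precision:.1f}: power={power:.3f}, beta={1 - power:.3f}")
```

Under these assumptions, power rises steadily as precision increases, and with no true effect (Δ = 0) the rejection rate falls back to α.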

Type I Error (α) — "False Alarm"

Rejecting H₀ when it is actually true. Saying a drug works when it doesn't.

Type II Error (β) — "Missed Discovery"

Failing to reject H₀ when Ha is true. Missing a drug that actually works.

📊 Decision Matrix: Reality vs. Conclusion

	H₀ True (No real effect)	H₀ False (Real effect exists)
Reject H₀	❌ Type I Error (α) False Positive	✅ Correct (Power = 1−β) True Positive
Fail to Reject H₀	✅ Correct (1−α) True Negative	⚠️ Type II Error (β) False Negative

🏥 Clinical Example: Antihypertensive Drug Trial

A pharma company tests Drug X (new) vs Placebo for reducing systolic BP.
H₀: Drug X = Placebo (no difference in SBP reduction).
Ha: Drug X ≠ Placebo (there is a difference in SBP reduction; the hoped-for direction is a greater reduction with Drug X).

If Type I Error occurs:

The drug regulatory authority approves Drug X even though it has no real benefit. Patients take an ineffective drug with possible side effects. Resources are wasted.

If Type II Error occurs:

Study concludes Drug X doesn't work, so it's shelved. Patients miss out on an effective treatment that could have saved lives.

🎯 How to Control These Errors

Controlling Type I Error (α)

Lower the significance threshold (e.g., α = 0.01 instead of 0.05)
Apply Bonferroni correction for multiple comparisons
Use pre-registration to prevent p-hacking. P-hacking means analyzing data in many different ways (testing multiple variables, subgroups, or endpoints) until a statistically significant p-value appears by chance. Pre-registration means publicly declaring your hypothesis, primary outcome, sample size, and analysis plan before collecting data (e.g., on ClinicalTrials.gov), so you cannot cherry-pick results after the fact.
Apply False Discovery Rate (FDR) control in genomics
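Two of the corrections listed above can be sketched directly. The p-values are hypothetical, and `bonferroni` and `benjamini_hochberg` are helper names introduced here; the FDR method shown is the Benjamini-Hochberg step-up procedure, which is one common choice rather than the only one.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0 only where p <= alpha / m (m = number of tests)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure for FDR control."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value is <= (k / m) * alpha.
    max_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for index in order[:max_rank]:
        rejected[index] = True
    return rejected

p_values = [0.001, 0.012, 0.020, 0.045, 0.300]  # hypothetical results
print("Bonferroni:", bonferroni(p_values))
print("BH (FDR):  ", benjamini_hochberg(p_values))
```

With these values Bonferroni keeps only the smallest p-value, while the FDR procedure keeps the three smallest, which reflects its less conservative error criterion.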

Controlling Type II Error (β)

Increase sample size — the single most effective method
Target a larger, clinically meaningful effect size (larger effects are easier to detect)
Reduce measurement variability (better instruments, training)
Use one-tailed tests when direction is known a priori

⚡ Key Insight: For a fixed sample size and effect size, Type I and Type II errors have an inverse relationship. Making α stricter (reducing false positives) increases β (more false negatives), and vice versa. To reduce both simultaneously, increase the sample size or reduce measurement variability.
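The trade-off can be tabulated for a simple case. This is a sketch assuming a two-sided z-test for a mean shift; the effect size (Δ = 2), population SD (σ = 10), and sample sizes are hypothetical, and `beta` is a helper name introduced here.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def beta(alpha, delta, sigma, n):
    """Type II error rate of a two-sided z-test for a mean shift delta."""
    se = sigma / sqrt(n)
    z_crit = Z.inv_cdf(1 - alpha / 2)
    power = Z.cdf(delta / se - z_crit) + Z.cdf(-delta / se - z_crit)
    return 1 - power

for n in (25, 100, 400):
    for alpha in (0.05, 0.01):
        b = beta(alpha, delta=2.0, sigma=10.0, n=n)
        print(f"n={n:3d} alpha={alpha:.2f} beta={b:.3f}")
```

Within each sample size, the stricter α gives the larger β. Moving from n = 100 at α = 0.05 to n = 400 at α = 0.01 lowers both error rates at once.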

1. Define Population

Min Value

Max Value

Population Size

2. Draw Samples

Sample Size (n): 30

Repeated Samples (k)

Confidence Intervals to Show:

90% 95% 99%


Population Distribution

Pop Mean (μ) -

Pop SD (σ) -

Sampling Distribution (Means)

Sample Mean (x̄) -

Sample SD (s) -

Standard Error (Theoretical) σ/√n -

Standard Error (Simulated) SD(x̄) -

Visual History: Confidence Intervals

Statistical Definitions

Population Mean (μ)

The true average value of the entire population.

Formula:

μ = ΣX / N

Population SD (σ)

Typical (root-mean-square) distance of population values from the mean.

Formula:

σ = √[ Σ(X-μ)² / N ]

Sample Mean (x̄)

The average value calculated from a specific sample of size n.

Formula:

x̄ = Σx / n

Sample SD (s)

Estimated spread of the population using (n-1) degrees of freedom.

Formula:

s = √[ Σ(x-x̄)² / (n-1) ]

Standard Error (SE)

Quantifies the precision of the sample mean. Used for calculating Confidence Intervals and conducting Hypothesis Testing.

Formula:

SE = σ / √n

95% Confidence Interval (CI)

A range of plausible values for the true population mean, built so that about 95% of intervals constructed this way from repeated samples contain it. The interval width is determined by the Standard Error (SE).

Formula:

CI = x̄ ± (1.96 * SE)
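The definitions in this section can be strung together on one simulated draw. This is a minimal sketch: the uniform population on 0 to 100, its size, and the seed are hypothetical settings, and the SE uses the known population σ as in the formula above.

```python
import random
from math import sqrt
from statistics import fmean, pstdev, stdev

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 values spread uniformly from 0 to 100.
population = [rng.uniform(0, 100) for _ in range(10_000)]
mu = fmean(population)        # population mean
sigma = pstdev(population)    # population SD (divides by N)

n = 30
sample = rng.sample(population, n)
x_bar = fmean(sample)         # sample mean
s = stdev(sample)             # sample SD (divides by n - 1)
se = sigma / sqrt(n)          # standard error of the mean
ci = (x_bar - 1.96 * se, x_bar + 1.96 * se)

print(f"mu={mu:.2f} sigma={sigma:.2f}")
print(f"x_bar={x_bar:.2f} s={s:.2f} SE={se:.2f}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```

When σ is unknown, a common substitute is s / √n, usually paired with a t critical value instead of 1.96.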

CLT & Why it Matters?

Sample means follow an approximately Normal Distribution regardless of population shape, provided n is large enough and the population has finite variance.

Significance:

Allows making inferential predictions about a population without knowing its true distribution shape.
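The theorem can be checked by simulation, in the spirit of the sampling panel. This sketch assumes a deliberately skewed (exponential) population; the population size, n = 30, k = 2,000 repeated samples, and the seed are hypothetical settings.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical, strongly right-skewed population.
population = [rng.expovariate(1.0) for _ in range(20_000)]
mu = fmean(population)
sigma = pstdev(population)

n, k = 30, 2_000
sample_means = [fmean(rng.sample(population, n)) for _ in range(k)]

se_theoretical = sigma / sqrt(n)       # sigma / sqrt(n)
se_simulated = pstdev(sample_means)    # SD of the k sample means

# Share of 95% intervals (x_bar ± 1.96 * SE) that contain the true mean.
coverage = sum(abs(m - mu) <= 1.96 * se_theoretical
               for m in sample_means) / k

print(f"SE theoretical={se_theoretical:.4f} simulated={se_simulated:.4f}")
print(f"95% CI coverage: {coverage:.3f}")
```

Even though the population is far from normal, the simulated SE should sit close to σ/√n and the coverage close to 95%.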
