Hypothesis Testing Basics
Purpose and Rationale
Why Do We Need Hypothesis Testing?
Hypothesis testing serves several crucial purposes in statistical analysis:
- **Making Data-Driven Decisions**
  - Provides a structured framework for making decisions based on sample data
  - Helps avoid making decisions based on intuition or anecdotal evidence
  - Allows us to quantify the strength of evidence for or against a claim
- **Scientific Method Application**
  - Enables systematic testing of theories and claims
  - Provides a way to falsify hypotheses (following Popper's philosophy)
  - Allows for replication and verification of results
- **Risk Management**
  - Helps quantify the probability of making incorrect decisions
  - Provides a way to control Type I and Type II errors
  - Allows for setting acceptable levels of risk in decision-making
The Rationale Behind Hypothesis Testing
- **Statistical Inference**
  - We can't observe entire populations, so we use samples
  - Sample results vary due to random sampling
  - We need a method to distinguish between:
    - Real effects/patterns
    - Random variation in the data
- **Burden of Proof**
  - Starts with a skeptical position (the null hypothesis)
  - Requires strong evidence to reject the null
  - Protects against false discoveries
  - Similar to "innocent until proven guilty" in legal systems
- **Quantifying Uncertainty**
  - Provides a way to measure the strength of evidence
  - Allows for comparison of different studies
  - Helps in making informed decisions under uncertainty
Introduction to Hypothesis Testing
| Aspect | Description |
|---|---|
| Purpose | A formal statistical technique for answering binary (yes/no) questions about a population using sample data |
| Contrast with | Confidence Interval, which estimates a parameter by providing a range of plausible values |
| Key Feature | Compares two competing claims about a population parameter |
Examples of Hypothesis Testing Questions
| Type | Example |
|---|---|
| Fairness Test | Is a coin fair? (yes/no) |
| Treatment Effect | Is a new drug more effective than an existing one? (yes/no) |
| Quality Control | Does a manufacturing process meet specifications? (yes/no) |
| Population Parameter | Is the mean height of a population different from a known value? |
| Relationship | Is there a relationship between two categorical variables? |
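The first question in the table — is a coin fair? — can be answered with an exact binomial test. The sketch below uses only Python's standard library; the 60-heads-in-100-tosses figure is a hypothetical example, not data from the source.

```python
import math

def binom_pmf(n, k, p):
    """Probability of exactly k heads in n tosses when P(heads) = p."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def coin_fairness_pvalue(n, k):
    """Two-sided exact binomial p-value for H0: p = 0.5.

    Sums the probabilities of every outcome at least as far from
    the expected count n/2 as the observed count k.
    """
    observed_dev = abs(k - n / 2)
    return sum(binom_pmf(n, i, 0.5)
               for i in range(n + 1)
               if abs(i - n / 2) >= observed_dev)

# 60 heads in 100 tosses: is the coin fair?
p_value = coin_fairness_pvalue(100, 60)
print(round(p_value, 4))  # about 0.057: borderline evidence at α = 0.05
```

Note the binary flavor of the conclusion: the test does not estimate how biased the coin is, it only weighs the evidence against "the coin is fair."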
Court Case Analogy
| Concept | Court Case | Hypothesis Testing |
|---|---|---|
| Initial Assumption | Innocence | Null hypothesis ($H_0$) |
| Alternative | Guilt | Alternative hypothesis ($H_a$) |
| Evidence Required | Beyond reasonable doubt | Small p-value |
| Decision | Not guilty ≠ innocent | Fail to reject $H_0$ ≠ $H_0$ is true |
| Burden of Proof | On the prosecution | On the alternative hypothesis |
Core Components: The Hypotheses
| Component | Description | Format |
|---|---|---|
| Null Hypothesis ($H_0$) | Represents skepticism, the status quo, or no effect | A statement of equality, e.g. $H_0: \mu = \mu_0$ |
| Alternative Hypothesis ($H_a$) | Represents what we aim to find evidence for | A statement of inequality, e.g. $H_a: \mu \neq \mu_0$ (or $<$, $>$) |
Detailed Hypothesis Testing Procedure
| Step | Description | Key Components |
|---|---|---|
| 1 | Define Hypotheses | State $H_0$ and $H_a$; choose the appropriate test type; determine whether the test is one- or two-tailed |
| 2 | Collect Data and Check Conditions | Gather random sample(s); verify sample-size conditions; check distribution assumptions; ensure independence |
| 3 | Calculate Test Statistic | Compute the sample statistic; standardize it using the standard error; account for degrees of freedom if needed |
| 4 | Determine P-value | Identify the sampling distribution; calculate the probability of results at least as extreme; account for the test direction (one/two-tailed) |
| 5 | Make Decision | Compare the p-value to the significance level; state the conclusion in context; consider practical significance |
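The procedure above can be sketched end to end for a one-sample proportion test. Steps 1–2 (hypotheses and data collection) are the analyst's setup; the function below covers steps 3–5. The counts are hypothetical, and the sketch assumes the sample is large enough for the normal approximation.

```python
import math

def one_sample_z_test(successes, n, p0, two_tailed=True):
    """z-test for H0: p = p0; returns the test statistic and p-value.

    If two_tailed is False, the p-value is one-tailed in the
    direction of the observed deviation from p0.
    """
    # Step 3: standardize the sample proportion with the null standard error
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    # Step 4: p-value from the standard normal sampling distribution
    tail = 0.5 * math.erfc(abs(z) / math.sqrt(2))  # P(Z >= |z|)
    p_value = 2 * tail if two_tailed else tail
    return z, p_value

# Step 5: compare to α = 0.05 and state the conclusion in context
z, p = one_sample_z_test(60, 100, 0.5)
print(f"z = {z:.2f}, p = {p:.4f}")  # z = 2.00, p = 0.0455 → reject H0
```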
Key Considerations
| Aspect | Description |
|---|---|
| Sample Size | Affects power and the validity of assumptions |
| Significance Level | Pre-determined threshold (often α = 0.05) |
| Test Direction | One-tailed vs. two-tailed affects the p-value calculation |
| Conditions | Must be verified for valid inference |
| Practical Significance | Statistical significance ≠ practical importance |
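The last row deserves a concrete illustration: with a large enough sample, even a negligible effect yields a small p-value. A minimal sketch using a normal-approximation z-test (the proportions and sample sizes are made up):

```python
import math

def two_tailed_z_pvalue(p_hat, p0, n):
    """Two-tailed normal-approximation p-value for H0: p = p0."""
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * P(Z >= |z|)

# The same tiny effect (50.1% vs. 50%) is "significant" only because n is huge:
print(two_tailed_z_pvalue(0.501, 0.5, 1_000_000))  # ≈ 0.046, below α = 0.05
print(two_tailed_z_pvalue(0.501, 0.5, 1_000))      # ≈ 0.95, nowhere near
```

A 0.1-percentage-point difference is rarely worth acting on, so the first result is statistically significant but practically unimportant.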
Common Misconceptions
| Misconception | Reality |
|---|---|
| "Fail to reject" means $H_0$ is true | It only means there is insufficient evidence to reject $H_0$ |
| A small p-value proves $H_a$ | It only provides evidence against $H_0$ |
| A large p-value proves $H_0$ | It only means there is insufficient evidence against $H_0$ |
| Statistical significance = practical importance | They are different concepts |
Related Topics
- Hypothesis Testing Key Concepts - Detailed explanation of null distribution, test statistics, and p-values
- Hypothesis Testing for Proportions - Specific tests for population proportions
- Hypothesis Testing for Means - Specific tests for population means
- Central Limit Theorem - Understanding the theoretical foundation for hypothesis testing
- Confidence Interval - Alternative approach to statistical inference
- Type I and Type II Errors - Understanding potential errors in hypothesis testing
- Statistical Significance - Understanding what makes results significant
Comprehensive Guide to Hypothesis Tests
| Section / Term | What it refers to | How to interpret it | Why it matters |
|---|---|---|---|
| Call | The formula you passed to `lm()` (e.g. `Net_Tuition ~ Enrollment + Type`) | Confirms the model you actually fit: response on the left, predictors on the right | Quick specification check |
| Coefficients block | Estimates and tests for each parameter in $\hat{Y}=b_0+b_1X_1+\dots+b_kX_k$ | — | — |
| Estimate | The fitted coefficient $b_j$ | Slope: expected change in $Y$ for a 1-unit rise in that predictor, holding the others constant. Intercept: predicted $Y$ when all predictors are 0 | Direction & magnitude of each relationship |
| Std. Error | The standard error of the estimate | Smaller SE ⇒ more precise estimate | Precision of the estimate |
| t value | $t=\dfrac{\text{Estimate}}{\text{Std. Error}}$ | Large \|t\| ⇒ the estimate is many SEs from 0 ⇒ evidence that the true coefficient ≠ 0 | Test statistic for $H_0: \beta_j = 0$ |
| Pr(>\|t\|) | Two-sided p-value for that t-test | If p < α (e.g. 0.05), reject $H_0: \beta_j = 0$ | Significance of each predictor |
| Residual standard error | Typical size of the prediction error | Lower ⇒ tighter fit | Absolute measure of model accuracy |
| df (Residuals) | Degrees of freedom remaining after estimating the coefficients | Used in the t- and F-tests | Calibration of p-values |
| Multiple R-squared | Proportion of the variance in $Y$ explained by the model | Ranges 0–1; higher ⇒ stronger explanatory power | Strength of fit |
| Adjusted R-squared | $R^2$ penalised for unnecessary predictors | Use it to compare models of different sizes | Strength of fit (penalised) |
| F-statistic | Tests $H_0: \beta_1=\dots=\beta_k=0$ (no slopes) | Large F ⇒ at least one slope ≠ 0 | Overall model significance |
| p-value (for F) | Probability of an F that large (or larger) under $H_0$ | p < α ⇒ the model is statistically useful overall | Global test of usefulness |