Whoville Statistics: Regression, Probability, and Distributions Flashcards
Master Whoville Statistics: Regression, Probability, and Distributions with these flashcards. Review key terms, definitions, and concepts using active recall to strengthen your understanding and ace your exams.
Front
Correlation
Back
A numerical measure of the linear association between two variables. Values range from $-1$ to $1$, with sign indicating direction and magnitude indicating strength.
Front
Slope
Back
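In regression, the estimated change in the response variable for a one-unit increase in the predictor. For simple linear regression it is computed as $b_1 = r \frac{s_{Y}}{s_{X}}$, where $r$ is the sample correlation and $s_{Y}$, $s_{X}$ are the sample standard deviations of the response and the predictor.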
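As a quick check of the slope formula, the sketch below computes $r$, $s_X$, and $s_Y$ for a small made-up data set and compares $r \frac{s_Y}{s_X}$ with the slope obtained directly from the sums of squares. The data values and variable names are assumptions chosen for illustration only.

```python
from statistics import mean, stdev

# Hypothetical data, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar, y_bar = mean(x), mean(y)
s_x, s_y = stdev(x), stdev(y)  # sample standard deviations (n - 1 divisor)

# Sum of cross-products of deviations from the means.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Sample correlation coefficient.
r = s_xy / ((n - 1) * s_x * s_y)

# Slope from the flashcard formula b1 = r * s_Y / s_X.
b1_from_r = r * s_y / s_x

# Slope computed directly as S_xy / S_xx; the two should agree.
b1_direct = s_xy / sum((xi - x_bar) ** 2 for xi in x)

print(f"r = {r:.4f}, b1 (from r) = {b1_from_r:.4f}, b1 (direct) = {b1_direct:.4f}")
```

Both routes give the same slope because $r$ already contains the cross-product term; multiplying by $s_Y / s_X$ only rescales it into the units of the response per unit of the predictor.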
Front
Intercept
Back
The predicted value of the response when the predictor equals zero. Interpretation can be meaningless if $x=0$ is outside the observed range.
Front
Least-squares line
Back
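The line that minimizes the sum of squared residuals between observed values and predicted values. Under the Gauss-Markov assumptions (a linear model whose errors have mean zero, constant variance, and no correlation with one another), its coefficients are the best linear unbiased estimators.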
Front
Residual
Back
The difference between an observed value and its predicted value from a regression model. Residual = observed $-$ predicted.
Front
Coefficient of determination
Back
Denoted $R^2$, it measures the proportion of variance in the response explained by the predictor(s). In simple regression, $R^2 = r^2$.
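The residual and $R^2$ definitions can be verified numerically. The sketch below fits a least-squares line to a small made-up data set, computes the residuals as observed minus predicted, and checks that $R^2 = 1 - \text{SSE}/\text{SST}$ matches $r^2$. The data and the names `sse` and `s_yy` are illustrative assumptions.

```python
from statistics import mean

# Hypothetical data, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

x_bar, y_bar = mean(x), mean(y)
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares (SST)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares slope and intercept.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Residual = observed - predicted.
predicted = [b0 + b1 * xi for xi in x]
residuals = [yi - y_hat for yi, y_hat in zip(y, predicted)]

# R^2 as the proportion of variance explained, and r for comparison.
sse = sum(e ** 2 for e in residuals)
r_squared = 1 - sse / s_yy
r = s_xy / (s_xx * s_yy) ** 0.5

print(f"R^2 = {r_squared:.4f}, r^2 = {r ** 2:.4f}")
```

With an intercept in the model, the least-squares residuals also sum to zero, which is a handy sanity check on any hand calculation.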
Front
Prediction vs. Causation
Back
Regression describes associations and supports prediction, but it does not establish causation unless the study design (e.g., a randomized experiment) supports causal claims. Correlation alone is insufficient.
Front
Binomial model
Back
Models the number of successes in $n$ independent trials with constant success probability $p$. Denoted $Bin(n,p)$ with mean $np$ and variance $np(1-p)$.
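The mean and variance formulas can be confirmed by summing over the probability mass function directly. The sketch below does this for an arbitrary choice of $n = 10$ and $p = 0.3$; the helper name `binomial_pmf` is hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Illustrative parameters (assumed, not from the source).
n, p = 10, 0.3

pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# The probabilities sum to 1, and the moments match np and np(1 - p).
total = sum(pmf)
mean_from_pmf = sum(k * pk for k, pk in enumerate(pmf))
var_from_pmf = sum((k - mean_from_pmf) ** 2 * pk for k, pk in enumerate(pmf))

print(f"sum = {total:.6f}, mean = {mean_from_pmf:.6f}, variance = {var_from_pmf:.6f}")
print(f"np = {n * p:.6f}, np(1 - p) = {n * p * (1 - p):.6f}")
```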
Front
Normal approximation
Back
Approximating a binomial $Bin(n,p)$ by a normal distribution with mean $np$ and variance $np(1-p)$. The approximation works when $n$ is large and $p$ is not too close to 0 or 1, and it improves when a continuity correction is applied.
Front
Continuity correction
Back
Adjustment when approximating discrete distributions (like the binomial) by a continuous distribution (the normal). E.g., for binomial $X$ and its approximating normal $Y$, $P(X\ge k)$ is approximated by $P(Y>k-0.5)$.
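To see what the continuity correction buys, the sketch below compares the exact binomial tail $P(X \ge k)$ with the normal approximation computed with and without the $0.5$ shift. The parameters $n = 40$, $p = 0.5$, $k = 25$ are assumptions chosen for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

# Illustrative parameters (assumed, not from the source).
n, p, k = 40, 0.5, 25

# Exact tail probability P(X >= k) from the binomial pmf.
exact = sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))

# Approximating normal with mean np and variance np(1 - p).
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

# With continuity correction: P(X >= k) is approximated by P(Y > k - 0.5).
with_cc = 1 - approx.cdf(k - 0.5)

# Without the correction, the cutoff is taken at k itself.
without_cc = 1 - approx.cdf(k)

print(f"exact = {exact:.4f}")
print(f"with correction = {with_cc:.4f}, without = {without_cc:.4f}")
```

The shift by $0.5$ reflects that the bar for the integer $k$ in a binomial histogram spans $k - 0.5$ to $k + 0.5$, so the area for "at least $k$" starts at $k - 0.5$.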
Front
Expected value
Back
The weighted average of all possible values of a random variable, using their probabilities. For discrete $X$, $E[X]=\sum x p(x)$.
Front
Variance
Back
The expected squared deviation from the mean: $Var(X)=E[(X-\mu)^2]=E[X^2]-\mu^2$. It measures spread of the distribution.
Front
Standard deviation
Back
The square root of the variance. It is on the same scale as the variable and describes typical deviation from the mean.
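The expected value, variance, and standard deviation definitions can be worked through for a fair six-sided die, an example chosen here for illustration. The sketch uses exact fractions so that the two variance formulas, $E[(X-\mu)^2]$ and $E[X^2]-\mu^2$, can be compared without rounding.

```python
from fractions import Fraction
from math import sqrt

# A fair six-sided die: each face has probability 1/6 (illustrative example).
values = range(1, 7)
prob = Fraction(1, 6)

# E[X] = sum of x * p(x).
expected = sum(x * prob for x in values)

# Variance from the definition E[(X - mu)^2].
var_definition = sum((x - expected) ** 2 * prob for x in values)

# Variance from the shortcut E[X^2] - mu^2.
var_shortcut = sum(x ** 2 * prob for x in values) - expected ** 2

# Standard deviation: square root of the variance, on the scale of X.
std_dev = sqrt(var_definition)

print(f"E[X] = {expected}, Var(X) = {var_definition}, SD = {std_dev:.4f}")
```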
Front
Bayes' theorem
Back
A formula to reverse conditional probabilities: $P(A|B)=\dfrac{P(B|A)P(A)}{P(B)}$. Useful when updating probabilities given new evidence.
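A common use of the formula is a screening test. In the sketch below, $A$ is "has the condition" and $B$ is "tests positive"; $P(B)$ is expanded with the law of total probability. The prevalence, sensitivity, and false-positive rate are made-up numbers for illustration.

```python
# Hypothetical inputs, invented purely for illustration.
p_condition = 0.01           # P(A): prior probability of the condition
p_pos_given_condition = 0.95  # P(B|A): sensitivity
p_pos_given_healthy = 0.05    # P(B|not A): false-positive rate

# Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A).
p_positive = (
    p_pos_given_condition * p_condition
    + p_pos_given_healthy * (1 - p_condition)
)

# Bayes' theorem: P(A|B) = P(B|A)P(A) / P(B).
posterior = p_pos_given_condition * p_condition / p_positive

print(f"P(condition | positive test) = {posterior:.4f}")
```

Even with a fairly accurate test, the posterior stays well below the sensitivity because the condition is rare, which is exactly the kind of reversal the formula is designed to handle.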
Front
Finite population correction
Back
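An adjustment when sampling without replacement from a finite population of size $N$: the standard error is multiplied by $\sqrt{\frac{N-n}{N-1}}$, where $n$ is the sample size. When the sample is less than about 5% of the population, the correction is negligible.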
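The sketch below evaluates the standard finite population correction factor, $\sqrt{(N-n)/(N-1)}$, for a sample that is a small fraction of the population and for one that is a large fraction. The function name `fpc` and the sample and population sizes are illustrative assumptions.

```python
from math import sqrt

def fpc(n, N):
    """Finite population correction factor for sample size n, population size N."""
    return sqrt((N - n) / (N - 1))

# Hypothetical sizes, invented purely for illustration.
small_fraction = fpc(50, 10_000)  # sample is 0.5% of the population
large_fraction = fpc(50, 200)     # sample is 25% of the population

print(f"n/N = 0.5%: factor = {small_fraction:.4f}")
print(f"n/N = 25%:  factor = {large_fraction:.4f}")
```

A factor close to 1 leaves the standard error essentially unchanged, which is why the correction is usually skipped for samples under about 5% of the population.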
Front
Z-score
Back
Standardized value representing how many standard deviations a data point is from the mean: $z=\dfrac{x-\mu}{\sigma}$. Used to find probabilities under the normal curve.
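As a small illustration, the sketch below standardizes a value and looks up the corresponding probability under the standard normal curve using Python's `statistics.NormalDist`. The mean, standard deviation, and data point are assumed values.

```python
from statistics import NormalDist

# Hypothetical population parameters and data point.
mu, sigma = 100.0, 15.0
x = 130.0

# z = (x - mu) / sigma: number of standard deviations from the mean.
z = (x - mu) / sigma

# P(Z <= z) under the standard normal curve.
p_below = NormalDist().cdf(z)

# The same probability computed on the original scale.
p_below_original = NormalDist(mu=mu, sigma=sigma).cdf(x)

print(f"z = {z:.2f}, P(X <= {x}) = {p_below:.4f}")
```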
Front
Prediction interval
Back
An interval estimate for an individual future observation that accounts for both uncertainty in the regression parameters and residual variability. Wider than a confidence interval for the mean.
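For simple linear regression, the usual textbook form of the two intervals makes the difference in width explicit. The notation here is an assumption, not taken from the cards: $x_0$ is the new predictor value, $\hat{y}_0$ the fitted value at $x_0$, $s_e$ the residual standard error, and $t^{*}_{n-2}$ the critical value from a $t$ distribution with $n-2$ degrees of freedom. The prediction interval for an individual observation is

$$\hat{y}_0 \pm t^{*}_{n-2}\, s_e \sqrt{1 + \frac{1}{n} + \frac{(x_0-\bar{x})^2}{\sum_{i}(x_i-\bar{x})^2}}$$

while the confidence interval for the mean response at $x_0$ is

$$\hat{y}_0 \pm t^{*}_{n-2}\, s_e \sqrt{\frac{1}{n} + \frac{(x_0-\bar{x})^2}{\sum_{i}(x_i-\bar{x})^2}}$$

The extra $1$ under the square root is the residual variability of a single observation, which is why the prediction interval is always the wider of the two.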
Front
Sample size rule
Back
For the normal approximation of a binomial, check that $np$ and $n(1-p)$ are both reasonably large (common thresholds are at least 5 or at least 10). When either count is too small, the binomial is too skewed for the normal curve to fit well.
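The rule is easy to encode as a check before applying the normal approximation. The function name `normal_approx_ok` and its `threshold` parameter are hypothetical, and the default of 10 is just the stricter of the two common thresholds.

```python
def normal_approx_ok(n, p, threshold=10):
    """Return True if both np and n(1 - p) meet the chosen threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

# Illustrative cases (values assumed for the example).
print(normal_approx_ok(100, 0.5))               # np = 50, n(1 - p) = 50
print(normal_approx_ok(20, 0.1))                # np = 2, too few expected successes
print(normal_approx_ok(20, 0.3, threshold=5))   # np = 6, n(1 - p) = 14
```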