class: center, middle, inverse, title-slide

# Week 13 - Uncertainty
## Hypothesis Testing
### Danilo Freire
### 22 April 2019

---

<style>
.remark-slide-number {
  position: inherit;
}

.remark-slide-number .progress-bar-container {
  position: absolute;
  bottom: 0;
  height: 6px;
  display: block;
  left: 0;
  right: 0;
}

.remark-slide-number .progress-bar {
  height: 100%;
  background-color: #EB811B;
}

.orange {
  color: #EB811B;
}
</style>

# Today's Agenda

.font150[
* Null and alternative hypotheses
* One-sample tests
* Two-sample tests
* Assignment 5
* Final project
]

---

# Hypothesis Testing

.font150[
* Goal: Try to determine whether a result is due to chance or not
* Method: "Proof" by contradiction
* We try to reject the _null hypothesis_
  - _X_ is uncorrelated with/has no effect on _Y_
  - _X_ is no different from _Y_
]

---

# Proof by Contradiction

.font140[
* Formulated by Aristotle
* Starts with the _law of the excluded middle_: either a proposition is true, or its negation is true
* `\(\vdash P \lor \lnot P\)` for any proposition `\(P\)`
]

--

.font140[
* First, we make a statement _P_, which we believe is true
* Then, we assume _P_ is false
* Lastly, we show _P_ cannot be false given our data/reasoning
* Thus, _P_ must be true
]

---

# Hypothesis Testing

.font150[
* With statistics, we can never reject the null hypothesis with 100% confidence
* Why? Because statistics is _probabilistic_
* So we use a probabilistic version of proof by contradiction
]

---

# Hypothesis Testing

.font150[
* We construct the _null hypothesis_ `\(\rightarrow H_0\)` (what we want to refute), and the _alternative hypothesis_ `\(\rightarrow H_1\)`
* We select a test statistic `\(T\)`
* Figure out the sampling distribution of `\(T\)` under `\(H_0\)`
* Is the observed value of `\(T\)` likely to occur under `\(H_0\)`?
  - Yes - fail to reject `\(H_0\)`
  - No - reject `\(H_0\)`
]

---

# Paul the Octopus

.center[![:scale 90%](paul.png)]

---

# Paul the Octopus

.font150[
* 2010 World Cup
  - Group: .orange[Germany] vs Australia
  - Group: Germany vs .orange[Serbia]
  - Group: Ghana vs .orange[Germany]
  - Round of 16: .orange[Germany] vs England
  - Quarter-final: Argentina vs .orange[Germany]
  - Semi-final: Germany vs .orange[Spain]
  - 3rd place: Uruguay vs .orange[Germany]
  - Final: Netherlands vs .orange[Spain]
]

---

# Paul the Octopus

.font150[
* Question: Did Paul the Octopus get lucky?
* Null hypothesis: Paul is randomly choosing the winner
* Test statistic: Number of correct answers
* Reference distribution: `Binomial(8, 0.5)`
* The probability that Paul gets them all correct: `\(\frac{1}{2^8} \approx 0.004\)`
]

---

# More Data about Paul

.font150[
* UEFA Euro 2008
  - Group: .orange[Germany] vs Poland
  - Group: Croatia vs .orange[Germany] (wrong)
  - Group: Austria vs .orange[Germany]
  - Quarter-final: Portugal vs .orange[Germany]
  - Semi-final: .orange[Germany] vs Turkey
  - Final: .orange[Germany] vs Spain (wrong)
* A total of 14 matches
* 12 correct guesses
]

---

# More Data about Paul

.font150[
* .orange[p-value:] Probability that, under the null, you observe something at least as extreme as what you actually observed
* `\(Pr(\{12, 13, 14\}) \approx 0.006\)`

```r
pbinom(11, size = 14, prob = 0.5, lower.tail = FALSE)
```

```
## [1] 0.006469727
```
]

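---

# Checking the P-Value

.font120[
* We can double-check this number with the exact binomial test, or by summing the binomial probabilities directly:

```r
# Exact test of H0: p = 0.5 vs H1: p > 0.5 (Paul beats chance);
# the reported p-value is Pr(X >= 12), about 0.006
binom.test(12, n = 14, p = 0.5, alternative = "greater")

# Direct calculation: Pr(X = 12) + Pr(X = 13) + Pr(X = 14)
sum(dbinom(12:14, size = 14, prob = 0.5))
```
]
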
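---

# Simulating the Null Distribution

.font120[
* We can also approximate the sampling distribution of the test statistic under `\(H_0\)` by simulation; a minimal sketch (the object names are just illustrative):

```r
set.seed(1234)  # only for reproducibility

# 100,000 "octopuses" guessing 14 matches at random
fake_guesses <- rbinom(100000, size = 14, prob = 0.5)

# Share of random guessers doing at least as well as Paul;
# this should be close to the exact p-value of about 0.006
mean(fake_guesses >= 12)
```
]
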
---

# P-Value

.font140[
* The .orange[p-value] is the probability, computed under `\(H_0\)`, of observing a value of the test statistic at least as extreme as its observed value
* A smaller p-value represents stronger evidence against `\(H_0\)`
* A p-value less than `\(\alpha\)` indicates statistical significance
* `\(\alpha\)` is usually 0.05
* .orange[Remember:] the p-value is NOT the probability that `\(H_0\)` `\((H_1)\)` is true (false)
* Statistical significance does not necessarily imply scientific significance
]

---

# One-Sample Test - Continuous Variable

.font150[
* We can also test whether `\(\mu\)` from a sample corresponds to a value we are interested in
* For example: you run a factory and want to try a new machine. Does the machine actually improve your results, or are the results just due to chance?
* R function: `t.test()`
]

---

# One-Sample Test - Continuous Variable

.font150[
* Example:

* A bottle-filling machine is set to fill bottles with soft drink to a volume of 500 ml. The actual volume is known to follow a normal distribution. The manufacturer believes the machine is under-filling bottles. A sample of 20 bottles is taken and the volume of liquid inside is measured.
]

---

# One-Sample Test - Continuous Variable

.font110[

```r
bottles <- c(484.11, 459.49, 471.38, 512.01, 494.48,
             528.63, 493.64, 485.03, 473.88, 501.59,
             502.85, 538.08, 465.68, 495.03, 475.32,
             529.41, 518.13, 464.32, 449.08, 489.27)
mean(bottles)
```

```
## [1] 491.5705
```

```r
t.test(bottles, mu = 500)
```

```
## 
##  One Sample t-test
## 
## data:  bottles
## t = -1.5205, df = 19, p-value = 0.1449
## alternative hypothesis: true mean is not equal to 500
## 95 percent confidence interval:
##  479.9667 503.1743
## sample estimates:
## mean of x 
##  491.5705
```
]

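---

# One-Sample Test - One-Sided Alternative

.font120[
* The manufacturer suspects .orange[under]-filling, so a one-sided alternative arguably matches the question better; a sketch of the same test with `alternative = "less"`:

```r
# H1: the true mean volume is less than 500 ml
# (uses the `bottles` vector defined on the previous slide)
t.test(bottles, mu = 500, alternative = "less")
```
]
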
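---

# One-Sample Test - Computing t by Hand

.font120[
* The one-sample t statistic is just `\(t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}\)`; as a quick check, computing it by hand reproduces the value `t.test()` reported:

```r
# (sample mean - hypothesised mean) / standard error of the mean;
# matches the t = -1.5205 printed by t.test(bottles, mu = 500)
(mean(bottles) - 500) / (sd(bottles) / sqrt(length(bottles)))
```
]
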
---

# Two-Sample Test

.font150[
* We can also test whether two different samples have the same `\(\mu\)`
* This is particularly useful for randomised experiments
* We test whether the treatment and control groups have similar means
* If the means are similar, we fail to reject the null hypothesis that the treatment has no effect
* We use the same `t.test()` function
]

---

# Two-Sample Test

.font120[

```r
resume <- read.csv("https://raw.githubusercontent.com/pols1600/pols1600.github.io/master/datasets/causality/resume.csv")

call_blacks <- subset(resume$call, resume$race == "black")
call_whites <- subset(resume$call, resume$race == "white")

t.test(call_blacks, call_whites)
```

```
## 
##  Welch Two Sample t-test
## 
## data:  call_blacks and call_whites
## t = -4.1147, df = 4711.6, p-value = 3.943e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.04729503 -0.01677067
## sample estimates:
##  mean of x  mean of y 
## 0.06447639 0.09650924
```
]

---

# t.test() or lm()?

.font110[

```r
summary(lm(call ~ race, data = resume))
```

```
## 
## Call:
## lm(formula = call ~ race, data = resume)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.09651 -0.09651 -0.06448 -0.06448  0.93552 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.064476   0.005505  11.713  < 2e-16 ***
## racewhite   0.032033   0.007785   4.115 3.94e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2716 on 4868 degrees of freedom
## Multiple R-squared:  0.003466, Adjusted R-squared:  0.003261 
## F-statistic: 16.93 on 1 and 4868 DF,  p-value: 3.941e-05
```
]

---

# t.test() or lm()?

.font150[
* Both produce essentially the same results (`t.test()` defaults to the Welch correction for unequal variances, hence the slightly different degrees of freedom)
* The intercept in the linear model is the mean of the control group (when all other variables are zero)
* The coefficient is the difference in means between the treatment and control groups
* I suggest you use `lm()`: you can add control variables, interactions, etc
]

---

class: inverse, center, middle

# Questions?

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html>

---

class: inverse, center, middle

# See you on Wednesday!

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html>