Week 03 - Observational Studies

# Week 03 - Observational Studies
## Descriptive Statistics for a Single Variable
<html>
<div style="float:left">

</div>
<hr color='#EB811B' size=1px width=800px>
</html>
### Danilo Freire
### 8th February 2019

---

<style>

.remark-slide-number {
  position: inherit;
}

.remark-slide-number .progress-bar-container {
  position: absolute;
  bottom: 0;
  height: 6px;
  display: block;
  left: 0;
  right: 0;
}

.remark-slide-number .progress-bar {
  height: 100%;
  background-color: #EB811B;
}

.orange {
  color: #EB811B;
}
</style>

# Today's Agenda

* Standard deviation

* Leader assassination DiD

* Internal and external validity

* Replication crisis
]

---

# Quantiles

* We can split a variable in many ways:
	- Quartiles
	- Quantiles
	- Percentiles

* Which quantile is the median?
]
---

# Quantiles

* What is the median of {1, 2, 3, 4, 20}?

* Interquartile range (IQR): the difference between the 75th and the 25th percentile    
]

---

# Standard Deviation

* `$SD = (\sqrt{\frac{1}{n} \sum^{N}_{i = 1} (x_{i} - \bar{x})^{2}})$`
]
--

.font150[
* where `$\bar{x}$` indicates the sample mean, that is, `$\bar{x} = \frac{1}{n} \sum^{N}_{i = 1} x_{i}$`, and `$n$` is the sample size

* Almost all data entries are located within 2 or 3 standard deviations from the mean
]
---

# R Examples

```r
median(leaders$age)
```

```
## [1] 52.5
```

```r
IQR(leaders$age)
```

```
## [1] 16.75
```

```r
quantile(leaders$age)
```

```
##    0%   25%   50%   75%  100% 
## 18.00 45.00 52.50 61.75 81.00
```

```r
quantile(leaders$age, probs = seq(0, 1, by = 0.1)) # deciles
```

```
##   0%  10%  20%  30%  40%  50%  60%  70%  80%  90% 100% 
## 18.0 39.0 43.0 47.0 50.0 52.5 57.0 60.0 64.0 70.0 81.0
```

```r
quantile(leaders$age, probs = c(.34, .55, .93)) # 34th, 55th, 93th percentiles
```

```
## 34% 55% 93% 
##  48  55  71
```
---

# R Examples

```r
mean(leaders$age)
```

```
## [1] 53.524
```

```r
sd(leaders$age)
```

```
## [1] 12.03982
```

```r
summary(leaders$age)
```

```
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   18.00   45.00   52.50   53.52   61.75   81.00
```
---

# Questions?

---

# Leader Assassination DD

* Does successful leader assassination cause democratization? Does successful leader assassination lead countries to war?  Answer these questions by analyzing the data.  Be sure to state your assumptions and provide a brief interpretation of the results.]
---

# Leader Assassination DD

```r
str(leaders)
```

```
## 'data.frame':	250 obs. of  11 variables:
##  $ year          : int  1929 1933 1934 1924 1931 1968 1992 1908 1916 1929 ...
##  $ country       : Factor w/ 88 levels "Afghanistan",..: 1 1 1 2 2 3 3 4 4 4 ...
##  $ leadername    : Factor w/ 196 levels "Abdullah Al-Hussein",..: 72 128 77 196 196 27 26 6 46 92 ...
##  $ age           : int  39 53 50 29 36 41 73 48 76 77 ...
##  $ politybefore  : num  -6 -6 -6 0 -9 -9 -2 1 2 2 ...
##  $ polityafter   : num  -6 -7.33 -8 -9 -9 ...
##  $ interwarbefore: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ interwarafter : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ civilwarbefore: int  1 0 0 0 0 0 0 0 0 0 ...
##  $ civilwarafter : int  0 0 0 0 0 0 1 0 0 0 ...
##  $ result        : Factor w/ 10 levels "dies between a day and a week",..: 6 3 9 10 6 10 3 7 6 6 ...
```

```r
# create success variable
leaders$success <- ifelse(leaders$result == "dies between a day and a week" |
                            leaders$result == "dies between a week and a month" |
                            leaders$result == "dies within a day after the attack" |
                            leaders$result == "dies, timing unknown",1, 0)
```
---

# Leader Assassination DD

```r
# polity score before and after assassination attempt
diff.pol.succ <- mean(leaders$polityafter[leaders$success == 1]) - 
                   mean(leaders$politybefore[leaders$success == 1]) # successful

diff.pol.succ
```

```
## [1] -0.05864198
```

```r
diff.pol.unsucc <- mean(leaders$polityafter[leaders$success == 0]) - 
                     mean(leaders$politybefore[leaders$success == 0]) # unsuccessful

diff.pol.unsucc
```

```
## [1] -0.1513605
```

```r
## difference in differences
diff.pol.succ - diff.pol.unsucc
```

```
## [1] 0.09271857
```
---

# Leader Assassination DD

```r
# create variable for warbefore and warafter
leaders$warbefore <- ifelse(leaders$interwarbefore == 1 |
                              leaders$civilwarbefore == 1, 1, 0)
leaders$warafter <- ifelse(leaders$interwarafter == 1 |
                             leaders$civilwarafter == 1, 1, 0)

## compare war before to war after among successful and unsuccessful
diff.war.succ <- mean(leaders$warafter[leaders$success == 1]) - 
                   mean(leaders$warbefore[leaders$success == 1])

diff.war.unsucc <- mean(leaders$warafter[leaders$success == 0]) - 
                     mean(leaders$warbefore[leaders$success == 0])

diff.war.succ - diff.war.unsucc # difference in differences
```

```
## [1] -0.07161754
```

.font130[
Using the difference-in-difference approach, we find very little difference in the contries' polity score and in the proportion of countries engaged in war. Leader assassination does not seem to cause countries to democratise or engage in war.]
---

# Internal and External Validity

* .orange[Internal validity]: the degree to which we can attribute the results to the treatment and not to other factors

* However, observational studies have greater _external validity_

* .orange[External validity]: the extent to which the results can be generalised 
]

---

# Replication Crisis

Website: <https://theconversation.com/you-cant-characterize-human-nature-if-studies-overlook-85-percent-of-people-on-earth-106670>

---

# Replication Crisis

Website: <https://www.theatlantic.com/science/archive/2018/11/psychologys-replication-crisis-real/576223>

---

# Replication Crisis

Website: <https://theconversation.com/you-cant-characterize-human-nature-if-studies-overlook-85-percent-of-people-on-earth-106670>

---

# What Should We Do?

* .orange[Field experiments:] conduct experiments in realistic settings

* .orange[Larger sample sizes:] large samples tend to be more representative of the underlying population

* .orange[Open methods and open data:] share your code and datasets so other can verify them

* .orange[Preregistration:] state your hypotheses _before_ running the experiment

]
---

# We're Getting Better

Website: <https://www.vox.com/science-and-health/2018/8/27/17761466/psychology-replication-crisis-nature-social-science>

---

# Wrap-up

* If possible, preregister your hypotheses and make your data and code available (RMarkdown can help!)

* Replicate your and other people's work

* Is science wrong? _No_, but there are many wrong findings maskerading as science

* Keep those things in mind while reading a study that seems to good to be true: _it most likely is_]

---

# Homework

* BBC podcast on the same problem: <https://pca.st/n5b3>

* `CAUSALITY02`
]

---

# See you next week!