class: center, middle, inverse, title-slide # Week 03 - Observational Studies ## Descriptive Statistics for a Single Variable
### Danilo Freire ### 8th February 2019 --- <style> .remark-slide-number { position: inherit; } .remark-slide-number .progress-bar-container { position: absolute; bottom: 0; height: 6px; display: block; left: 0; right: 0; } .remark-slide-number .progress-bar { height: 100%; background-color: #EB811B; } .orange { color: #EB811B; } </style> # Today's Agenda .font150[ * Quantiles * Standard deviation * Leader assassination DiD * Internal and external validity * Replication crisis ] --- # Quantiles .font150[ * Sometimes it is useful to look at the distribution of a given variable * We can split a variable in many ways: - Quartiles - Quantiles - Percentiles * Which quantile is the median? ] --- # Quantiles .font150[ * What is the median of {2, 5, 6, 10}? * What is the median of {1, 2, 3, 4, 20}? * Interquartile range (IQR): the difference between the 75th and the 25th percentile ] --- # Standard Deviation .font150[ * Average distance of each data point to the mean * `\(SD = (\sqrt{\frac{1}{n} \sum^{N}_{i = 1} (x_{i} - \bar{x})^{2}})\)` ] -- .font150[ * where `\(\bar{x}\)` indicates the sample mean, that is, `\(\bar{x} = \frac{1}{n} \sum^{N}_{i = 1} x_{i}\)`, and `\(n\)` is the sample size * Almost all data entries are located within 2 or 3 standard deviations from the mean ] --- # R Examples ```r median(leaders$age) ``` ``` ## [1] 52.5 ``` ```r IQR(leaders$age) ``` ``` ## [1] 16.75 ``` ```r quantile(leaders$age) ``` ``` ## 0% 25% 50% 75% 100% ## 18.00 45.00 52.50 61.75 81.00 ``` ```r quantile(leaders$age, probs = seq(0, 1, by = 0.1)) # deciles ``` ``` ## 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% ## 18.0 39.0 43.0 47.0 50.0 52.5 57.0 60.0 64.0 70.0 81.0 ``` ```r quantile(leaders$age, probs = c(.34, .55, .93)) # 34th, 55th, 93th percentiles ``` ``` ## 34% 55% 93% ## 48 55 71 ``` --- # R Examples ```r mean(leaders$age) ``` ``` ## [1] 53.524 ``` ```r sd(leaders$age) ``` ``` ## [1] 12.03982 ``` ```r summary(leaders$age) ``` ``` ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 18.00 45.00 52.50 53.52 61.75 81.00 ``` --- class: inverse, center, middle # Questions? <html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html> --- # Leader Assassination DD .font150[ * **Question 5** * Does successful leader assassination cause democratization? Does successful leader assassination lead countries to war? Answer these questions by analyzing the data. Be sure to state your assumptions and provide a brief interpretation of the results.] --- # Leader Assassination DD ```r str(leaders) ``` ``` ## 'data.frame': 250 obs. of 11 variables: ## $ year : int 1929 1933 1934 1924 1931 1968 1992 1908 1916 1929 ... ## $ country : Factor w/ 88 levels "Afghanistan",..: 1 1 1 2 2 3 3 4 4 4 ... ## $ leadername : Factor w/ 196 levels "Abdullah Al-Hussein",..: 72 128 77 196 196 27 26 6 46 92 ... ## $ age : int 39 53 50 29 36 41 73 48 76 77 ... ## $ politybefore : num -6 -6 -6 0 -9 -9 -2 1 2 2 ... ## $ polityafter : num -6 -7.33 -8 -9 -9 ... ## $ interwarbefore: int 0 0 0 0 0 0 0 0 0 0 ... ## $ interwarafter : int 0 0 0 0 0 0 0 0 0 0 ... ## $ civilwarbefore: int 1 0 0 0 0 0 0 0 0 0 ... ## $ civilwarafter : int 0 0 0 0 0 0 1 0 0 0 ... ## $ result : Factor w/ 10 levels "dies between a day and a week",..: 6 3 9 10 6 10 3 7 6 6 ... ``` ```r # create success variable leaders$success <- ifelse(leaders$result == "dies between a day and a week" | leaders$result == "dies between a week and a month" | leaders$result == "dies within a day after the attack" | leaders$result == "dies, timing unknown",1, 0) ``` --- # Leader Assassination DD ```r # polity score before and after assassination attempt diff.pol.succ <- mean(leaders$polityafter[leaders$success == 1]) - mean(leaders$politybefore[leaders$success == 1]) # successful diff.pol.succ ``` ``` ## [1] -0.05864198 ``` ```r diff.pol.unsucc <- mean(leaders$polityafter[leaders$success == 0]) - mean(leaders$politybefore[leaders$success == 0]) # unsuccessful diff.pol.unsucc ``` ``` ## [1] -0.1513605 ``` ```r ## difference in differences diff.pol.succ - diff.pol.unsucc ``` ``` ## [1] 0.09271857 ``` --- # Leader Assassination DD ```r # create variable for warbefore and warafter leaders$warbefore <- ifelse(leaders$interwarbefore == 1 | leaders$civilwarbefore == 1, 1, 0) leaders$warafter <- ifelse(leaders$interwarafter == 1 | leaders$civilwarafter == 1, 1, 0) ## compare war before to war after among successful and unsuccessful diff.war.succ <- mean(leaders$warafter[leaders$success == 1]) - mean(leaders$warbefore[leaders$success == 1]) diff.war.unsucc <- mean(leaders$warafter[leaders$success == 0]) - mean(leaders$warbefore[leaders$success == 0]) diff.war.succ - diff.war.unsucc # difference in differences ``` ``` ## [1] -0.07161754 ``` .font130[ Using the difference-in-difference approach, we find very little difference in the contries' polity score and in the proportion of countries engaged in war. Leader assassination does not seem to cause countries to democratise or engage in war.] --- # Internal and External Validity .font150[ * Because of randomisation, we know that RCTs have strong _internal validity_ * .orange[Internal validity]: the degree to which we can attribute the results to the treatment and not to other factors * However, observational studies have greater _external validity_ * .orange[External validity]: the extent to which the results can be generalised ] --- # Replication Crisis .center[![:scale 100%](rep01.png)] Website: <https://theconversation.com/you-cant-characterize-human-nature-if-studies-overlook-85-percent-of-people-on-earth-106670> --- # Replication Crisis .center[![:scale 85%](rep02.png)] Website: <https://www.theatlantic.com/science/archive/2018/11/psychologys-replication-crisis-real/576223> --- # Replication Crisis .center[![:scale 100%](rep03.png)] Website: <https://theconversation.com/you-cant-characterize-human-nature-if-studies-overlook-85-percent-of-people-on-earth-106670> --- # What Should We Do? .font150[ * .orange[Replications:] see whether the same results hold under different conditions * .orange[Field experiments:] conduct experiments in realistic settings * .orange[Larger sample sizes:] large samples tend to be more representative of the underlying population * .orange[Open methods and open data:] share your code and datasets so other can verify them * .orange[Preregistration:] state your hypotheses _before_ running the experiment ] --- # We're Getting Better .center[![:scale 100%](rep04.png)] Website: <https://www.vox.com/science-and-health/2018/8/27/17761466/psychology-replication-crisis-nature-social-science> --- # Wrap-up .font150[ * Not all experiments are true, many don't replicate * If possible, preregister your hypotheses and make your data and code available (RMarkdown can help!) * Replicate your and other people's work * Is science wrong? _No_, but there are many wrong findings maskerading as science * Keep those things in mind while reading a study that seems to good to be true: _it most likely is_] --- # Homework .font150[ * John Oliver on the replication crisis: <https://youtu.be/0Rnq1NpHdmw> * BBC podcast on the same problem: <https://pca.st/n5b3> * `CAUSALITY02` ] --- class: inverse, center, middle # See you next week! <html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html>