# Week 05 - Measurement
## Bivariate Relationships: Scatter Plots and Correlations
<html>
<div style="float:left">

</div>
<hr color='#EB811B' size=1px width=800px>
</html>
### Danilo Freire
### 22nd February 2019

---

<style>

.remark-slide-number {
  position: inherit;
}

.remark-slide-number .progress-bar-container {
  position: absolute;
  bottom: 0;
  height: 6px;
  display: block;
  left: 0;
  right: 0;
}

.remark-slide-number .progress-bar {
  height: 100%;
  background-color: #EB811B;
}

.orange {
  color: #EB811B;
}
</style>

# Today's Agenda

* Scatter plots

* Time-series plots

* Correlations

---

# Scatter Plots

* By convention, _x_ (horizontal axis) is the independent variable (what you change) and _y_ (vertical axis) is the dependent variable (what you want to explain)

* Each observation is drawn as a point at its Cartesian coordinates (x, y)

* In R: `plot(x, y)`
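A minimal sketch using R's built-in `mtcars` dataset, chosen here only for illustration (it is not part of the lecture data). The formula interface `y ~ x` draws the same plot as `plot(x, y)` and makes the convention explicit: the variable on the left of `~` goes on the vertical axis.

```r
# mtcars ships with base R; weight (wt) plays the role of the independent variable
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Same plot with the formula interface: y ~ x
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
```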
---

# Scatter Plots

* `main`, `xlab`, `ylab`, `ylim`, `xlim` and `col` work as we've seen before

* `pch =` sets the plotting symbol ("point character")

* You can add another variable to the same graph with `points()`
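To see what each `pch` value looks like, one quick sketch is to draw the standard symbols 0 to 25 side by side. The object name `shapes` is just an illustrative choice; symbols 21 to 25 can additionally take a fill colour through the `bg` argument.

```r
# Draw the standard plotting symbols, pch = 0 to 25, in a single row
shapes <- 0:25
plot(shapes, rep(1, length(shapes)), pch = shapes,
     cex = 2,                 # enlarge the symbols
     yaxt = "n",              # the y-axis carries no information here
     xlab = "pch value", ylab = "",
     main = "Plotting symbols")
```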
---

# Scatter Plots

```r
set.seed(12345)                       # reproducibility
x <- rnorm(n = 100, mean = 5, sd = 2) # 100 draws from a normal distribution
y <- x + rnorm(100, 0, 1)             # arguments matched by position, no names needed
df <- data.frame(x,y)                 # just to see them side-by-side
head(df, 10)                          # first 10 observations
```

```
##           x         y
## 1  6.171058 6.3949830
## 2  6.418932 5.2627087
## 3  4.781393 5.2038119
## 4  4.093006 2.7682504
## 5  6.211775 6.3528592
## 6  1.364088 0.8280401
## 7  6.260197 5.9485910
## 8  4.447632 6.0037414
## 9  4.431681 3.9836472
## 10 3.161356 3.4824795
```
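Because `y` is built as `x` plus independent noise, its population correlation with `x` can be worked out by hand: Cov(x, y) = Var(x) = 4 and Var(y) = 4 + 1 = 5, so ρ = 4 / √(4 × 5) = 2 / √5 ≈ 0.894. The sketch below regenerates the same data and puts that value next to the sample correlation from `cor()`. The sample value depends on the seed, so it is not reproduced here.

```r
set.seed(12345)                        # same seed as above
x <- rnorm(n = 100, mean = 5, sd = 2)
y <- x + rnorm(100, 0, 1)

r   <- cor(x, y)                       # sample Pearson correlation (the default method)
rho <- 2 / sqrt(5)                     # population value: Cov(x, y) / sqrt(Var(x) * Var(y))
c(sample = r, population = rho)
```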
---

# Scatter Plots

```r
plot(df$x, df$y, main = "Scatter Plot", pch = 16, col = "blue")     # filled circles in blue
```
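The default axis labels (`df$x`, `df$y`) are not very informative. A sketch that sets `xlab`, `ylab`, `xlim` and `ylim` explicitly, using one common range for both axes so the two variables are directly comparable; the dashed reference line `y = x` added with `abline()` is an illustrative extra, chosen because `y` was generated as `x` plus noise.

```r
set.seed(12345)
x <- rnorm(n = 100, mean = 5, sd = 2)
y <- x + rnorm(100, 0, 1)
df <- data.frame(x, y)

# One range covering both variables, used for both axes
lims <- range(c(df$x, df$y))
plot(df$x, df$y, main = "Scatter Plot", pch = 16, col = "blue",
     xlab = "x (independent variable)", ylab = "y (dependent variable)",
     xlim = lims, ylim = lims)
abline(a = 0, b = 1, lty = 2)   # dashed reference line y = x (intercept 0, slope 1)
```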

---

# Scatter Plots

```r
z <- runif(n = 100, min = 0, max = 10) # add another variable (one value per row)
df <- data.frame(x, y, z)
head(df, 10)
```

```
##           x         y         z
## 1  6.171058 6.3949830 0.7548045
## 2  6.418932 5.2627087 4.7438424
## 3  4.781393 5.2038119 2.6458955
## 4  4.093006 2.7682504 2.3074607
## 5  6.211775 6.3528592 5.9619939
## 6  1.364088 0.8280401 1.5892558
## 7  6.260197 5.9485910 8.5505484
## 8  4.447632 6.0037414 2.3745380
## 9  4.431681 3.9836472 7.9711170
## 10 3.161356 3.4824795 0.7848559
```
---

# Scatter Plots

```r
plot(df$x, df$y, main = "Scatter Plot", pch = 16, col = "blue")
*points(df$x, df$z, pch = 17, col = "red")      # add z against the same x
```
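Plotting `z` next to `y` already suggests the difference: `y` tracks `x`, while `z` was drawn independently of both. One way to put numbers on this is a correlation matrix; the sketch below regenerates the same three variables and passes them to `cor()`. Exact values depend on the seed, so none are quoted here.

```r
set.seed(12345)
x <- rnorm(n = 100, mean = 5, sd = 2)
y <- x + rnorm(100, 0, 1)
z <- runif(n = 100, min = 0, max = 10)   # generated independently of x and y

cm <- cor(data.frame(x, y, z))           # 3 x 3 matrix of pairwise Pearson correlations
round(cm, 2)                             # x-y should be strongly positive, x-z near zero
```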

---

# Time-Series Plots

* Add `type = "l"` (for "line") to your `plot()` call

* Add further series with `lines()`

* Be sure to use _the same x variable in both calls_
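`type = "l"` is one of several drawing styles accepted by `plot()`; `"p"` (points, the default) and `"b"` (both points and lines) are also common. A sketch comparing three of them on the same series, using `par(mfrow = ...)` to place the panels side by side and restoring the previous layout afterwards:

```r
set.seed(1)
years <- seq(from = 1950, to = 2010, by = 10)
k <- rnorm(n = 7, mean = 5, sd = 5)

op <- par(mfrow = c(1, 3))               # three panels in one row; keep old settings
plot(years, k, type = "p", main = 'type = "p"')
plot(years, k, type = "l", main = 'type = "l"')
plot(years, k, type = "b", main = 'type = "b"')
par(op)                                  # restore the previous layout
```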
---

# Time-Series Plots

```r
set.seed(1)                                     # reproducibility
years <- seq(from = 1950, to = 2010, by = 10)   # 1950, 1960, ..., 2010: 7 values
k <- rnorm(n = 7, mean = 5, sd = 5)             # one value per year
plot(years, k, main = "Time-Series Plot", type = "l", col = "brown")
```

---

# Time-Series Plots

```r
set.seed(3)                               # different random numbers
z <- rnorm(n = 7, mean = 5, sd = 2.5)
plot(years, k, main = "Time-Series Plot", type = "l", col = "brown")
*lines(years, z, type = "l", col = "blue") # add z to plot
text(2009, 8, "K variable", col = "brown")
text(2009, 4, "Z variable", col = "blue")
```
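Placing labels by hand with `text()` means guessing coordinates. An alternative sketch uses `legend()`, which only needs a position keyword (`"topleft"` here is an arbitrary choice) plus the same colours and line type as the plotted series. Setting `ylim` to the range of both series keeps the second line inside the plot area.

```r
set.seed(1)
years <- seq(from = 1950, to = 2010, by = 10)
k <- rnorm(n = 7, mean = 5, sd = 5)
set.seed(3)
z <- rnorm(n = 7, mean = 5, sd = 2.5)

plot(years, k, main = "Time-Series Plot", type = "l", col = "brown",
     ylim = range(c(k, z)))              # leave room for both series
lines(years, z, col = "blue")            # lines() already draws lines by default
legend("topleft", legend = c("K variable", "Z variable"),
       col = c("brown", "blue"), lty = 1)
```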

---