Hi everyone,

I’m having trouble figuring out how to “properly” set an informative prior for a regular bivariate correlation model. I have a proposed solution, but I’m not sure whether it makes sense or not, and would greatly appreciate any input.

To get going, let’s simulate 50 data points with a known correlation of 0.2:

```
library(MASS)
library(tidyverse)
library(brms)

set.seed(2021)
data <- as.data.frame(mvrnorm(50, mu = c(0, 0),
                              Sigma = matrix(c(1, 0.2, 0.2, 1), ncol = 2),
                              empirical = TRUE)) %>%
  mutate(x = V1, y = V2) %>%
  dplyr::select(x, y)
cor(data$x, data$y)
```

Ordinarily, I would approach this using `mvbind` and an `LKJ` prior:

```
m0 <- brm(data = data,
          family = gaussian,
          bf(mvbind(x, y) ~ 0) + set_rescor(TRUE),
          prior = c(prior(normal(0, 1), class = sigma, resp = x),
                    prior(normal(0, 1), class = sigma, resp = y),
                    prior(lkj(1), class = rescor)),
          sample_prior = "only",
          control = list(adapt_delta = 0.9),
          iter = 4000, warmup = 500, chains = 4, cores = 4, seed = 2021)
```

But since the data are standardized, a linear regression with a suppressed intercept should yield an **equivalent** estimate of the correlation, right? And if so, a `uniform(-1, 1)` prior on the slope should be equivalent, or at least very similar, to the `mvbind` + `LKJ(1)` approach, like this:

```
m1 <- brm(data = data,
          family = gaussian,
          y ~ 0 + x,
          prior = c(prior(uniform(-1, 1), class = b, lb = -1, ub = 1),
                    prior(normal(0, 1), class = sigma)),
          sample_prior = "only",
          control = list(adapt_delta = 0.9),
          iter = 4000, warmup = 500, chains = 4, cores = 4, seed = 2021)
```
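As a quick sanity check of that equivalence claim (my own sketch, not part of the models above): on standardized data, the no-intercept OLS slope is identical to the sample correlation, so the two parameterizations target the same quantity.

```
library(MASS)

set.seed(2021)
d <- as.data.frame(mvrnorm(50, mu = c(0, 0),
                           Sigma = matrix(c(1, 0.2, 0.2, 1), ncol = 2),
                           empirical = TRUE))
names(d) <- c("x", "y")

# With empirical = TRUE the sample moments are exact,
# so both quantities equal 0.2 to machine precision
slope <- unname(coef(lm(y ~ 0 + x, data = d)))
all.equal(slope, cor(d$x, d$y))  # TRUE
```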

In practice, however, I usually use an `LKJ(2)` prior to rule out extreme values:

```
m2 <- brm(data = data,
          family = gaussian,
          bf(mvbind(x, y) ~ 0) + set_rescor(TRUE),
          prior = c(prior(normal(0, 1), class = sigma, resp = x),
                    prior(normal(0, 1), class = sigma, resp = y),
                    prior(lkj(2), class = rescor)),
          sample_prior = "only",
          control = list(adapt_delta = 0.9),
          iter = 4000, warmup = 500, chains = 4, cores = 4, seed = 2021)
```

Here the problems start, since I cannot use an `LKJ(2)` prior in the linear regression approach. The closest would be something like a `Beta`, but that’s bounded between 0 and 1. I experimented with different truncated normals, such as a `normal(0, 0.5)`:

```
m3 <- brm(data = data,
          family = gaussian,
          y ~ 0 + x,
          prior = c(prior(normal(0, 0.5), class = b, lb = -1, ub = 1),
                    prior(normal(0, 1), class = sigma)),
          sample_prior = "only",
          control = list(adapt_delta = 0.9),
          iter = 4000, warmup = 500, chains = 4, cores = 4, seed = 2021)
```
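For what it’s worth, there may be an exact match available here (an aside of my own, based on the standard LKJ marginal result rather than anything in the models above): for a 2×2 correlation matrix, `LKJ(eta)` implies a `Beta(eta, eta)` marginal on `(rho + 1) / 2`, so the `LKJ(2)` prior corresponds exactly to a `Beta(2, 2)` stretched from (0, 1) onto (−1, 1):

```
# Marginal density LKJ(2) implies on rho for a 2x2 matrix: proportional to (1 - rho^2)
rho  <- seq(-0.95, 0.95, by = 0.05)
lkj2 <- (1 - rho^2) / integrate(function(r) 1 - r^2, -1, 1)$value

# Beta(2, 2) on (rho + 1) / 2, with the Jacobian 1/2 for the change of scale
stretched_beta <- dbeta((rho + 1) / 2, 2, 2) / 2

all.equal(lkj2, stretched_beta)  # TRUE
```

So rather than hunting for a truncated normal that approximates `LKJ(2)`, one could in principle put this stretched-Beta density on the slope directly (e.g. via a custom prior in Stan code), though a truncated normal may be close enough in practice.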

It’s fairly close, but there are slight differences in where most of the probability mass lands. Finally, I extended this to a form of informative prior, using a `normal(0.2, 0.5)`:

```
m4 <- brm(data = data,
          family = gaussian,
          y ~ 0 + x,
          prior = c(prior(normal(0.2, 0.5), class = b, lb = -1, ub = 1),
                    prior(normal(0, 1), class = sigma)),
          sample_prior = "only",
          control = list(adapt_delta = 0.9),
          iter = 4000, warmup = 500, chains = 4, cores = 4, seed = 2021)
```

Here’s how the prior distributions look:

And here’s how the posterior distributions look:

Overall, very similar (at least for this simulated, well-behaved data set).
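As a supplement to the plots (a sketch of my own, using the fact that `LKJ(2)` implies a (1 − ρ²)-shaped density on a 2×2 correlation), the four implied priors on the correlation can also be overlaid analytically in base R:

```
rho <- seq(-0.99, 0.99, length.out = 200)

# Normal truncated to (-1, 1), renormalized to integrate to 1
trunc_norm <- function(rho, mean, sd)
  dnorm(rho, mean, sd) / (pnorm(1, mean, sd) - pnorm(-1, mean, sd))

dens <- data.frame(
  rho      = rho,
  lkj1     = dunif(rho, -1, 1),               # m0/m1: uniform on (-1, 1)
  lkj2     = dbeta((rho + 1) / 2, 2, 2) / 2,  # m2: Beta(2, 2) stretched to (-1, 1)
  norm_05  = trunc_norm(rho, 0,   0.5),       # m3
  norm_205 = trunc_norm(rho, 0.2, 0.5)        # m4
)

matplot(dens$rho, dens[, -1], type = "l", lty = 1,
        xlab = "correlation", ylab = "prior density")
```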

**My questions boil down to:**

- Am I correct in assuming equivalence between the “regular” `mvbind(x, y)` approach and the linear regression `y ~ 0 + x` approach, as long as the data are standardized?
- If the above is correct, is the use of truncated normals (as a substitute for the `LKJ` prior and as a way of injecting prior knowledge) reasonable?
- Are there alternative approaches that might be better suited?

- Operating System: Debian 11
- brms Version: 2.16.1