Bayesian variable selection or Bayesian estimation for differences in the Markov random field model for binary and/or ordinal variables in two independent samples.
Source:R/bgmCompare.R
bgmCompare.Rd
The bgmCompare
function estimates the pseudoposterior distribution of
the parameters of a Markov Random Field model for mixed binary and ordinal
variables, and the differences in pairwise interactions and category thresholds
between two groups. The groups are assumed to be two independent samples.
Usage
bgmCompare(
x,
y,
difference_selection = TRUE,
main_difference_model = c("Free", "Collapse", "Constrain"),
variable_type = "ordinal",
reference_category,
pairwise_difference_scale = 1,
main_difference_scale = 1,
pairwise_difference_prior = c("Bernoulli", "Beta-Bernoulli"),
main_difference_prior = c("Bernoulli", "Beta-Bernoulli"),
pairwise_difference_probability = 0.5,
main_difference_probability = 0.5,
pairwise_beta_bernoulli_alpha = 1,
pairwise_beta_bernoulli_beta = 1,
main_beta_bernoulli_alpha = 1,
main_beta_bernoulli_beta = 1,
interaction_scale = 2.5,
threshold_alpha = 0.5,
threshold_beta = 0.5,
iter = 10000,
burnin = 1000,
na.action = c("listwise", "impute"),
save = FALSE,
display_progress = TRUE
)
Arguments
- x
A data frame or matrix with \(n_1\) rows and
p
columns containing binary and ordinal responses for the first group. Regular ordinal variables are recoded as non-negative integers(0, 1, ..., m)
if not already done. Unobserved categories are collapsed into other categories after recoding (i.e., if category 1 is unobserved, the data are recoded from (0, 2) to (0, 1)). Blume-Capel ordinal variables are also coded as non-negative integers if not already done. However, since “distance” from the reference category plays an important role in this model, unobserved categories are not collapsed after recoding.- y
A data frame or matrix with \(n_2\) rows and
p
columns containing binary and ordinal responses for the second group. The variables or columns iny
must match the variables or columns inx
. In the paired samples design, the rows inx
must match the rows iny
. Note thatx
andy
are recoded independently, although the function checks that the number of different responses observed matches betweenx
andy
.- difference_selection
Logical. If
TRUE
,bgmCompare
will model the inclusion or exclusion of the between samples parameter differences; ifFALSE
, it will estimate all between-sample parameter differences. Default isTRUE
.- main_difference_model
A string specifying options for how the bgmCompare function should handle the comparison of threshold parameters when the observed categories in the samples do not match. The "Collapse" option tells bgmCompare to collapse the two categories into one (for the data set where both categories were observed). The "Constrain" option sets the difference between the category thresholds in the two data sets to zero if the category is not observed in one of the two data sets. The "Free" option tells bgmCompare to estimate a separate set of thresholds in the two samples and to not model their differences.
- variable_type
A string or vector specifying the type of variables in
x
(andy
). Supported types are "ordinal" and "blume-capel", with binary variables treated as "ordinal". Default is "ordinal".- reference_category
The reference category in the Blume-Capel model. Should be an integer within the range of integer values observed for the "blume-capel" variable. Can be a single number that sets the reference category for all Blume-Capel variables at once, or a vector of length
p
, where thei
-th element is the reference category for thei
variable if it is a Blume-Capel variable, and elements for other variable types are ignored. The value of the reference category is also recoded when `bgmCompare` recodes the corresponding observations. Required only if there is at least one variable of type "blume-capel".- pairwise_difference_scale
The scale of the Cauchy distribution that is used as the prior for the pairwise difference parameters. Defaults to
1
.- main_difference_scale
The scale of the Cauchy distribution that is used as the prior for the threshold difference parameters. Defaults to
1
.- pairwise_difference_prior
A character string that specifies the model to use for the inclusion probability of pairwise differences. Options are "Bernoulli" or "Beta-Bernoulli". Default is "Bernoulli".
- main_difference_prior
A character string that specifies the model to use for the inclusion probability of threshold differences. Options are "Bernoulli" or "Beta-Bernoulli". Default is "Bernoulli".
- pairwise_difference_probability
The inclusion probability for a pairwise difference in the Bernoulli model. Can be a single probability or a matrix of
p
rows andp
columns specifying the probability of a difference for each edge pair. Defaults to 0.5.- main_difference_probability
The inclusion probability for a threshold difference in the Bernoulli model. Defaults to 0.5, implying no prior preference. Can be a single probability or a vector of length
p
specifying the probability of a difference between the category thresholds for each variable. Defaults to 0.5.- pairwise_beta_bernoulli_alpha
The alpha parameter of the beta distribution for the Beta-Bernoulli model for group differences in pairwise interactions. Default is 1.
- pairwise_beta_bernoulli_beta
The beta parameter of the beta distribution for the Beta-Bernoulli model for group differences in pairwise interactions. Default is 1.
- main_beta_bernoulli_alpha
The alpha parameter of the beta distribution for the Beta-Bernoulli model for group differences in category thresholds. Default is 1.
- main_beta_bernoulli_beta
The beta parameter of the beta distribution for the Beta-Bernoulli model for group differences in category thresholds. Default is 1.
- interaction_scale
The scale of the Cauchy distribution that is used as a prior for the nuisance pairwise interaction parameters. Defaults to
2.5
.- threshold_alpha, threshold_beta
The shape parameters of the beta-prime prior density for the nuisance threshold parameters. Must be positive values. If the two values are equal, the prior density is symmetric about zero. If
threshold_beta
is greater thanthreshold_alpha
, the distribution is left-skewed, and ifthreshold_beta
is less thanthreshold_alpha
, it is right-skewed. Smaller values tend to result in more diffuse prior distributions.- iter
The function uses a Gibbs sampler to sample from the posterior distribution of the model parameters and indicator variables. How many iterations should this Gibbs sampler run? The default of
1e4
is for illustrative purposes. For stable estimates, it is recommended to run the Gibbs sampler for at least1e5
iterations.- burnin
The number of iterations of the Gibbs sampler before saving its output. Since it may take some time for the Gibbs sampler to converge to the posterior distribution, it is recommended not to set this number too low.
- na.action
How do you want the function to handle missing data? If
na.action = "listwise"
, listwise deletion is used. Ifna.action = "impute"
, missing data will be imputed iteratively during Gibbs sampling. Since imputation of missing data can have a negative impact on the convergence speed of the Gibbs sampling procedure, it is recommended to run the procedure for more iterations.- save
Should the function collect and return all samples from the Gibbs sampler (
save = TRUE
)? Or should it only return the (model-averaged) posterior means (save = FALSE
)? Defaults toFALSE
.- display_progress
Should the function show a progress bar (
display_progress = TRUE
)? Or not (display_progress = FALSE
)? The default isTRUE
.
Value
If save = FALSE
(the default), the result is a list of class
“bgmCompare” containing the following matrices:
indicator
: A matrix withp
rows andp
columns containing the posterior inclusion probabilities of the differences in pairwise interactions on the off-diagonal and the posterior inclusion probabilities of the differences in category thresholds on the diagonal.difference_pairwise
: A matrix withp
rows andp
columns, containing model-averaged posterior means of the differences in pairwise interactions.difference_threshold
: A matrix withp
rows andmax(m)
columns, containing model-averaged posterior means of the differences in category thresholds.interactions
: A matrix withp
rows andp
columns, containing posterior means of the nuisance pairwise interactions.thresholds
: A matrix withp
rows andmax(m)
columns containing the posterior means of the nuisance category thresholds. In the case of “blume-capel” variables, the first entry is the parameter for the linear effect and the second entry is the parameter for the quadratic effect, which models the offset to the reference category.
If save = TRUE
, the result is a list of class “bgmCompare”
containing the following matrices:
indicator_pairwise
: A matrix withiter
rows andp * (p - 1) / 2
columns containing the inclusion indicators for the differences in pairwise interactions from each iteration of the Gibbs sampler.difference_pairwise
: A matrix withiter
rows andp * (p - 1) / 2
columns, containing parameter states for the differences in pairwise interactions from each iteration of the Gibbs sampler.indicator_threshold
: A matrix withiter
rows andsum(m)
columns, containing the inclusion indicators for the differences in category thresholds from each iteration of the Gibbs sampler.difference_threshold
: A matrix withiter
rows andsum(m)
columns, containing the parameter states for the differences in category thresholds from each iteration of the Gibbs sampler.interactions
: A matrix withiter
rows andp * (p - 1) / 2
columns, containing parameter states for the nuisance pairwise interactions in each iteration of the Gibbs sampler.thresholds
: A matrix withiter
rows andsum(m)
columns, containing parameter states for the nuisance category thresholds in each iteration of the Gibbs sampler.
Column averages of these matrices provide the model-averaged posterior means.
In addition to the results of the analysis, the output lists some of the arguments of its call. This is useful for post-processing the results.
Details
In the first group, the pairwise interactions between the variables \(i\) and \(j\) are modeled as $$\sigma_{\text{ij}} = \theta_{\text{ij}} + \delta_{\text{ij}} / 2,$$ and in the second group as $$\sigma_{\text{ij}} = \theta_{\text{ij}} - \delta_{\text{ij}} / 2,$$ The pairwise interaction parameter \(\theta_{\text{ij}}\) denotes an overall effect that is considered nuisance, and attention is focused on the pairwise difference parameter \(\delta_{\text{ij}}\), which reflects the difference in the pairwise interaction between the two groups.
The bgmCompare
function supports two types of ordinal variables, which
can be mixed. The default ordinal variable introduces a threshold parameter
for each category except the lowest category. For this variable type, the threshold parameter for
variable \(i\), category \(c\), is modeled as
$$\mu_{\text{ic}} = \tau_{\text{ic}} + \epsilon_{\text{ic}} / 2,$$
in the first group and in the second group as
$$\mu_{\text{ic}} = \tau_{\text{ic}} - \epsilon_{\text{ic}} / 2,$$
The category threshold parameter \(\tau_{\text{ic}}\) denotes
an overall effect that is considered nuisance, and attention is focused on the
threshold difference parameter \(\epsilon_{\text{ic}}\),
which reflects the difference in threshold of for variable \(i\), category
\(c\) between the two groups.
The Blume-Capel ordinal variable assumes that there is a specific reference category, such as “neutral” in a Likert scale, and responses are scored according to their distance from this reference category. In the first group, the threshold parameters are modelled as $$\mu_{\text{ic}} = (\tau_{\text{i1}} + \epsilon_{\text{i1}} / 2) \times \text{c} + (\tau_{\text{i2}} + \epsilon_{\text{i2}} / 2) \times (\text{c} - \text{r})^2,$$ and in the second groups as $$\mu_{\text{ic}} = (\tau_{\text{i1}} - \epsilon_{\text{i1}} / 2) \times \text{c} + (\tau_{\text{i2}} - \epsilon_{\text{i2}} / 2) \times (\text{c} - \text{r})^2.$$ The linear and quadratic category threshold parameters \(\tau_{\text{i1}}\) and \(\tau_{\text{i2}}\) denote overall effects that are considered nuisance, and attention is focused on the two threshold difference parameters \(\epsilon_{\text{i1}}\) and \(\epsilon_{\text{i2}}\), which reflect the differences in the quadratic model for the variable \(i\) between the two groups.
Bayesian variable selection is used to model the presence or absence of the difference parameters \(\delta\) and \(\epsilon\), which allow us to assess parameter differences between the two groups. Independent spike and slab priors are specified for these difference parameters. The spike and slab priors use binary indicator variables to select the difference parameters, assigning them a diffuse Cauchy prior with an optional scaling parameter if selected, or setting the difference parameter to zero if not selected.
The function offers two models for the probabilistic inclusion of parameter differences:
Bernoulli Model: This model assigns a fixed probability of selecting a parameter difference, treating them as independent events. A probability of 0.5 indicates no preference, giving equal prior weight to all configurations.
Beta-Bernoulli Model: Introduces a beta distribution prior for the inclusion probability that models the complexity of the configuration of the difference indicators. When the alpha and beta shape parameters of the beta distribution are set to 1, the model assigns the same prior weight to the number of differences present (i.e., a configuration with two differences or with four differences is a priori equally likely).
Inclusion probabilities can be specified for pairwise interactions with
pairwise_difference_probability
and for category thresholds with
threshold_difference_probability
.
The pairwise interaction parameters \(\theta\), the category threshold parameters \(\tau\), and, in the not yet implemented paired-samples designs, the between-sample interactions \(\omega\) are considered nuisance parameters that are common to all models. The pairwise interaction parameters \(\theta\) and the between-sample interactions \(\omega\) are assigned a diffuse Cauchy prior with an optional scaling parameter. The exponent of the category threshold parameters \(\tau\) are assigned beta-prime distribution with optional scale values.