Multivariate logistic regression


Multivariate logistic regression is a type of data analysis that predicts any number of outcomes from multiple independent variables. It assumes that the natural logarithm of the odds of an outcome has a linear relationship with the independent variables.

Procedure

First, the baseline odds of a specific outcome, compared with not having that outcome, are calculated; their natural logarithm gives the constant of the model. Next, the independent variables are incorporated into the model, giving a regression coefficient and a "P" value for each independent variable. The "P" value indicates whether the independent variable's effect on the odds of having the outcome is statistically significant.
It is desirable to use as few variables as necessary and to have at least 10 to 20 times as many observations as independent variables.
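The first step and the rule of thumb above can be sketched in Python. This is a minimal illustration, not a complete fitting procedure: the function names `baseline_log_odds` and `enough_observations` are hypothetical, outcomes are assumed to be coded as 1 (has the outcome) and 0 (does not), and the example data are made up.

```python
import math

def baseline_log_odds(outcomes):
    """Constant of an intercept-only model: log of (outcome count / non-outcome count)."""
    n_yes = sum(outcomes)
    n_no = len(outcomes) - n_yes
    return math.log(n_yes / n_no)

def enough_observations(n_obs, n_vars, per_var=10):
    """Rule of thumb: at least per_var (10 to 20) observations per independent variable."""
    return n_obs >= per_var * n_vars

# Hypothetical sample: 30 of 100 observations have the outcome
outcomes = [1] * 30 + [0] * 70
print(round(baseline_log_odds(outcomes), 3))  # -0.847
print(enough_observations(100, 5))            # True
```

Using the stricter factor of 20 (`per_var=20`) lowers the number of independent variables a sample of a given size can support.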

Formula

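Multivariate logistic regression uses a formula similar to univariate logistic regression, but with multiple independent variables:

$$\pi = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_v x_v}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_v x_v}}$$

where π is the probability of the outcome, β0 is the constant, β1, ..., βv are the regression coefficients of the independent variables x1, ..., xv, and v is the number of independent variables. Taking the natural logarithm of the odds shows that, on the log-odds scale, multivariate logistic regression is a standard linear regression model:

$$\ln\left(\frac{\pi}{1 - \pi}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_v x_v$$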
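As a rough sketch of how the logistic model with v independent variables is evaluated, the Python code below forms the constant plus the weighted sum of the variables (the log-odds) and passes it through the logistic function to get a probability. The function names and the coefficient values are hypothetical, chosen only for illustration.

```python
import math

def linear_predictor(betas, xs):
    """beta_0 + beta_1*x_1 + ... + beta_v*x_v, where betas[0] is the constant."""
    return betas[0] + sum(b * x for b, x in zip(betas[1:], xs))

def predicted_probability(betas, xs):
    """Logistic transform: e^eta / (1 + e^eta), written as 1 / (1 + e^-eta)."""
    eta = linear_predictor(betas, xs)
    return 1.0 / (1.0 + math.exp(-eta))

def log_odds(p):
    """Natural logarithm of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

# Hypothetical model with v = 2 independent variables
betas = [-0.847, 0.5, -1.2]
xs = [2.0, 1.0]
p = predicted_probability(betas, xs)
eta = linear_predictor(betas, xs)
# Applying the log-odds to the probability recovers the linear predictor
print(math.isclose(log_odds(p), eta))  # True
```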

Types

The two main types of regression compared in this context are linear regression and logistic regression.

Linear regression

Simple linear regression models a linear relationship between the dependent variable and a single independent variable, so its results can be plotted on a graph as a straight line.
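For comparison, a straight-line fit with one independent variable can be computed in closed form by ordinary least squares. The sketch below is a minimal Python illustration; the function name and the sample points are hypothetical.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares for one independent variable: y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Points lying exactly on the line y = 1 + 2x
intercept, slope = fit_simple_linear([0, 1, 2, 3], [1, 3, 5, 7])
print(intercept, slope)  # 1.0 2.0
```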

Logistic regression

In contrast, logistic regression models a nonlinear relationship: plotting the predicted probability of the outcome against an independent variable produces an S-shaped curve called a sigmoid. Unlike the linear regression above, multivariate logistic regression produces results based on two or more independent variables.
The odds ratio associated with a single independent variable can change when other independent variables are also accounted for. These changes are usually insignificant, but they can indicate errors.
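In a logistic model, the odds ratio for a one-unit increase in one independent variable, with the others held fixed, equals e raised to that variable's coefficient. The Python sketch below checks this numerically; the coefficients are hypothetical. In a fitted model, adding other variables changes the estimated coefficient, which is why the odds ratio can change.

```python
import math

def odds(p):
    """Odds of an outcome with probability p."""
    return p / (1.0 - p)

def prob(betas, xs):
    """Logistic model probability; betas[0] is the constant."""
    eta = betas[0] + sum(b * x for b, x in zip(betas[1:], xs))
    return 1.0 / (1.0 + math.exp(-eta))

betas = [-1.0, 0.7, -0.3]  # hypothetical constant and coefficients
base = [2.0, 1.0]
bumped = [3.0, 1.0]        # x_1 increased by one unit, x_2 held fixed
ratio = odds(prob(betas, bumped)) / odds(prob(betas, base))
print(round(ratio, 4), round(math.exp(betas[1]), 4))  # 2.0138 2.0138
```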

Assumptions

Multivariate logistic regression assumes that the observations are independent of one another. It also assumes that the natural logarithm of the odds and the independent variables have a linear relationship. However, it does not assume that the dependent variables are normally distributed.
Null hypothesis
A null hypothesis is an assumption that the independent variables do not have any impact on the dependent variable.
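One common way to obtain the "P" value for a single coefficient is the Wald test, which tests the null hypothesis that this one coefficient is zero (a likelihood-ratio test is another option, and can test several coefficients at once). The Python sketch below assumes the coefficient and its standard error have already been estimated by a fitted model; the function name and the numbers are hypothetical.

```python
import math

def wald_test(coef, std_err):
    """Two-sided Wald test of H0: coefficient = 0, using the standard normal distribution."""
    z = coef / std_err
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

z, p = wald_test(0.98, 0.5)  # hypothetical estimate and standard error
print(round(z, 2), round(p, 3))  # 1.96 0.05
```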

Dependent variables

There are three main types of logistic regression dependent variables: binary, multi-class, and ordinal.
Binary
A binary dependent variable has only two possible outcomes, which are mutually exclusive (for example, yes or no).
Multi-class
A multi-class dependent variable has at least three qualitative outcomes, each usually represented by a fixed numerical code.
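One common approach for a multi-class dependent variable is multinomial logistic regression, in which each non-reference class has its own constant and coefficients, and the reference class has all coefficients fixed at zero. The Python sketch below turns these per-class linear predictors into probabilities with the softmax function; the function name and the coefficient values are hypothetical.

```python
import math

def multinomial_probs(class_betas, xs):
    """class_betas: one list [beta_0, ..., beta_v] per non-reference class.
    The reference class (index 0) has a linear predictor of zero."""
    etas = [0.0]
    for betas in class_betas:
        etas.append(betas[0] + sum(b * x for b, x in zip(betas[1:], xs)))
    m = max(etas)  # subtract the maximum for numerical stability
    exps = [math.exp(e - m) for e in etas]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical three-class outcome with one independent variable
probs = multinomial_probs([[0.5, 1.0], [-0.2, 0.3]], [1.0])
print(len(probs), round(sum(probs), 6))  # 3 1.0
```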
Ordinal
An ordinal dependent variable has at least three possible outcomes that follow a natural order (for example, low, medium, and high).
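One common model for an ordinal dependent variable is the proportional-odds (cumulative logit) model, which uses one shared set of coefficients and a separate threshold for each cut between adjacent categories. The Python sketch below uses the parameterization P(Y ≤ k) = sigmoid(θk − x·β); this sign convention, the function name, and the numbers are assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def ordinal_probs(thresholds, coefs, xs):
    """Proportional-odds model: P(Y <= k) = sigmoid(theta_k - x.beta).
    thresholds must be strictly increasing; returns one probability per category."""
    eta = sum(b * x for b, x in zip(coefs, xs))
    cumulative = [sigmoid(t - eta) for t in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = k) = P(Y <= k) - P(Y <= k - 1)
        previous = c
    return probs

# Three ordered outcomes (low < medium < high) need two thresholds
probs = ordinal_probs([-1.0, 1.0], [0.8], [0.5])
print(len(probs), round(sum(probs), 6))  # 3 1.0
```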

Models

The following models are used with multivariate logistic regression:

Logit models

Logit models distinguish between independent and dependent variables.

Log-linear models

Unlike logit models, log-linear models do not distinguish between independent and dependent variables.
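A log-linear model describes the counts in a contingency table, treating the row and column variables symmetrically rather than labelling one as dependent. As a minimal illustration, the Python sketch below computes the fitted counts of the independence model, log mij = λ + λi(row) + λj(column), whose maximum-likelihood fit is row total × column total / n. The function name and the 2×2 table of counts are hypothetical.

```python
def independence_fit(table):
    """Fitted counts of the log-linear independence model for a two-way table."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return [[r * c / n for c in col_totals] for r in row_totals]

fitted = independence_fit([[10, 20], [30, 40]])
print(fitted)  # [[12.0, 18.0], [28.0, 42.0]]
```

Under independence, the fitted odds ratio (12 × 42) / (18 × 28) equals 1, so its logarithm is zero.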

Probit models

Probit models function similarly to logit models because the normal and logistic distributions are similar. However, since probit coefficients are interpreted in terms of standard deviations instead of odds ratios, these models are also more similar to linear models than logit models are.
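The similarity between the two links can be checked numerically. The probit model uses the standard normal cumulative distribution function, while the logit model uses the logistic function; after rescaling the logistic input by about 1.702 (a commonly cited constant), the two curves differ by less than 0.01 everywhere. The Python sketch below checks this on a grid of points; the function names are hypothetical.

```python
import math

def probit_prob(z):
    """Standard normal CDF, Phi(z), used by the probit model."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logit_prob(z):
    """Logistic function used by the logit model."""
    return 1.0 / (1.0 + math.exp(-z))

# Compare Phi(z) with the rescaled logistic curve on z in [-5, 5]
grid = [i / 100 for i in range(-500, 501)]
max_gap = max(abs(probit_prob(z) - logit_prob(1.702 * z)) for z in grid)
print(max_gap < 0.01)  # True
```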

Uses

Scientists

When scientists use logistic regression, they usually include as many independent variables as necessary.

Doctors and physicians

Multivariate logistic regression is used by physicians to:
  • associate certain characteristics with certain outcomes
  • determine the effects of certain techniques
  • give people with certain conditions appropriate treatments
  • develop appropriate models

Market

Multivariate logistic regression is also used to analyze customer preferences for products.

Artificial intelligence

Multivariate logistic regression is also used in machine learning.
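In machine learning, logistic regression coefficients are often learned by iterative optimization of the log-likelihood. The Python sketch below uses plain batch gradient ascent; this is a simple substitute for the Newton-type methods that statistical software usually uses, and the function name, learning rate, iteration count, and data are assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, n_iter=2000):
    """Batch gradient ascent on the mean log-likelihood.
    X: feature rows without the constant; y: 0/1 outcomes.
    Returns [beta_0, beta_1, ..., beta_v]."""
    n = len(y)
    v = len(X[0])
    betas = [0.0] * (v + 1)
    for _ in range(n_iter):
        grad = [0.0] * (v + 1)
        for row, target in zip(X, y):
            p = sigmoid(betas[0] + sum(b * x for b, x in zip(betas[1:], row)))
            err = target - p  # gradient of the log-likelihood for this observation
            grad[0] += err
            for j, x in enumerate(row):
                grad[j + 1] += err * x
        betas = [b + lr * g / n for b, g in zip(betas, grad)]
    return betas

# Hypothetical data: larger x goes with the outcome more often
X = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]]
y = [0, 0, 1, 0, 1, 0, 1, 1]
betas = fit_logistic(X, y)

# With no independent variables, the fitted constant matches the baseline log-odds
intercept_only = fit_logistic([[] for _ in range(100)], [1] * 30 + [0] * 70)
print(round(intercept_only[0], 3))  # -0.847
```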

In comparison to multivariable logistic regression

Both multivariate and multivariable logistic regression relate multiple independent variables to outcomes. The difference is that multivariate logistic regression relates the independent variables to multiple outcomes, whereas multivariable logistic regression relates them to a single outcome.