How to Build a Logistic Regression Model from Scratch in R

Author

Lam Fu Yuan, Kevin

Published

December 8, 2022

In this document, I demonstrate how the Newton-Raphson Method can be used to approximate the Maximum Likelihood Estimates of the parameters of a Logistic Regression model.

Learning Objectives
  1. Describe the Newton-Raphson Method to approximate the Maximum Likelihood Estimates of the parameters of a Logistic Regression model.
  2. Use the Newton-Raphson Method to approximate the Maximum Likelihood Estimates of the parameters of a Logistic Regression model in R.

1 Logistic Regression

Let $\mathbf{Y}=(Y_1, Y_2, \ldots, Y_n)^T$ be a random sample of size $n$ where $Y_i \sim B(1, \pi_i)$, $i=1,2,\ldots,n$, are $n$ independent random scalars which follow a Bernoulli distribution with parameter $\pi_i$. Let $\mathbf{y}=(y_1, y_2, \ldots, y_n)^T$ be an observed random sample in which $y_i$, $i=1,2,\ldots,n$, are $n$ realisations of the random scalars $Y_i$, $i=1,2,\ldots,n$. The probability mass function of $Y_i$ is

$$f(y_i; \pi_i) = P(y_i; \pi_i) = P(Y_i = y_i) = \pi_i^{y_i}(1-\pi_i)^{1-y_i}$$

Let $\mathbf{X}=(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n)^T$ be an $n \times p$ data matrix in which each row $\mathbf{x}_i^T$, $i=1,2,\ldots,n$, represents an observation and each column represents a covariate. Let $\boldsymbol{\beta}=(\beta_1, \beta_2, \ldots, \beta_p)^T$ be the parameters of the Multiple Logistic Regression model. The model is

$$\log\left(\frac{\pi_i}{1-\pi_i}\right) = \mathbf{x}_i^T\boldsymbol{\beta} = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip}$$

or

$$\pi_i = \frac{e^{\mathbf{x}_i^T\boldsymbol{\beta}}}{1 + e^{\mathbf{x}_i^T\boldsymbol{\beta}}} = \frac{e^{\beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip}}}{1 + e^{\beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip}}}$$
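As a quick numerical illustration, $\pi_i$ can be computed directly from the linear predictor in R. This is a minimal sketch with hypothetical values for $\mathbf{x}_i$ and $\boldsymbol{\beta}$; they are not taken from the data used in Section 3.1.

x <- c(1, 3.5, 600)                # one observation: intercept, gpa, gre
b <- c(-4.9, 0.75, 0.0027)         # hypothetical parameter values
eta <- sum(x * b)                  # linear predictor x_i^T beta
pi_i <- exp(eta) / (1 + exp(eta))  # pi_i; equivalently plogis(eta)
pi_i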

2 Maximum Likelihood Estimation

In Maximum Likelihood Estimation, the problem is to find values of $\boldsymbol{\beta}=(\beta_1, \beta_2, \ldots, \beta_p)^T$ so as to

$$\max_{\boldsymbol{\beta}} L(\boldsymbol{\beta})$$

where $L(\boldsymbol{\beta})$ is the likelihood function.

2.1 Likelihood Function

The likelihood function for $\boldsymbol{\beta}$ is

$$L(\boldsymbol{\beta}) = \prod_{i=1}^{n} f(y_i; \pi_i) = \prod_{i=1}^{n} P(y_i; \pi_i) = \prod_{i=1}^{n} P(Y_i = y_i) = \prod_{i=1}^{n} \pi_i^{y_i}(1-\pi_i)^{1-y_i} = \prod_{i=1}^{n} \left(\frac{e^{\mathbf{x}_i^T\boldsymbol{\beta}}}{1 + e^{\mathbf{x}_i^T\boldsymbol{\beta}}}\right)^{y_i} \left(\frac{1}{1 + e^{\mathbf{x}_i^T\boldsymbol{\beta}}}\right)^{1-y_i}$$

Therefore, the log-likelihood function for $\boldsymbol{\beta}$ is

$$\begin{aligned}
l(\boldsymbol{\beta}) &= \sum_{i=1}^{n} \left[ y_i \log\left(\frac{e^{\mathbf{x}_i^T\boldsymbol{\beta}}}{1 + e^{\mathbf{x}_i^T\boldsymbol{\beta}}}\right) + (1-y_i)\log\left(\frac{1}{1 + e^{\mathbf{x}_i^T\boldsymbol{\beta}}}\right) \right] \\
&= \sum_{i=1}^{n} \left[ y_i \mathbf{x}_i^T\boldsymbol{\beta} - y_i \log\left(1 + e^{\mathbf{x}_i^T\boldsymbol{\beta}}\right) - \log\left(1 + e^{\mathbf{x}_i^T\boldsymbol{\beta}}\right) + y_i \log\left(1 + e^{\mathbf{x}_i^T\boldsymbol{\beta}}\right) \right] \\
&= \sum_{i=1}^{n} \left[ y_i \mathbf{x}_i^T\boldsymbol{\beta} - \log\left(1 + e^{\mathbf{x}_i^T\boldsymbol{\beta}}\right) \right]
\end{aligned}$$
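The log-likelihood is straightforward to evaluate numerically. Below is a minimal sketch in R, assuming a data matrix X, a response vector y, and a coefficient vector b of matching dimensions; logLikLogReg is a hypothetical helper and is not part of the fitting code in Section 3.1.

logLikLogReg <- function(X, y, b) {
  eta <- as.vector(X %*% b)         # linear predictors x_i^T beta
  sum(y * eta - log(1 + exp(eta)))  # sum over i of [y_i x_i^T beta - log(1 + e^{x_i^T beta})]
}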

2.2 First Derivative

The first derivative of the log-likelihood function with respect to the $r$-th parameter $\beta_r$, $r=1,2,\ldots,p$, is

$$\begin{aligned}
\frac{dl}{d\beta_r} &= \frac{d}{d\beta_r} \sum_{i=1}^{n} \left[ y_i \mathbf{x}_i^T\boldsymbol{\beta} - \log\left(1 + e^{\mathbf{x}_i^T\boldsymbol{\beta}}\right) \right] \\
&= \frac{d}{d\beta_r} \sum_{i=1}^{n} \left[ y_i x_{i1}\beta_1 + y_i x_{i2}\beta_2 + \cdots + y_i x_{ip}\beta_p - \log\left(1 + e^{x_{i1}\beta_1 + x_{i2}\beta_2 + \cdots + x_{ip}\beta_p}\right) \right] \\
&= \sum_{i=1}^{n} \left( y_i x_{ir} - \frac{e^{x_{i1}\beta_1 + x_{i2}\beta_2 + \cdots + x_{ip}\beta_p}}{1 + e^{x_{i1}\beta_1 + x_{i2}\beta_2 + \cdots + x_{ip}\beta_p}} x_{ir} \right) \\
&= \sum_{i=1}^{n} \left[ \left( y_i - \frac{e^{\mathbf{x}_i^T\boldsymbol{\beta}}}{1 + e^{\mathbf{x}_i^T\boldsymbol{\beta}}} \right) x_{ir} \right] \\
&= \sum_{i=1}^{n} (y_i - \pi_i) x_{ir}
\end{aligned}$$

Therefore,

$$\nabla l(\boldsymbol{\beta}) = \frac{dl}{d\boldsymbol{\beta}} = \begin{bmatrix} \dfrac{dl}{d\beta_1} & \dfrac{dl}{d\beta_2} & \cdots & \dfrac{dl}{d\beta_p} \end{bmatrix}^T = \begin{bmatrix} (y_1-\pi_1)x_{11} + (y_2-\pi_2)x_{21} + \cdots + (y_n-\pi_n)x_{n1} \\ (y_1-\pi_1)x_{12} + (y_2-\pi_2)x_{22} + \cdots + (y_n-\pi_n)x_{n2} \\ \vdots \\ (y_1-\pi_1)x_{1p} + (y_2-\pi_2)x_{2p} + \cdots + (y_n-\pi_n)x_{np} \end{bmatrix} = \begin{bmatrix} x_{11} & x_{21} & \cdots & x_{n1} \\ x_{12} & x_{22} & \cdots & x_{n2} \\ \vdots & \vdots & & \vdots \\ x_{1p} & x_{2p} & \cdots & x_{np} \end{bmatrix} \begin{bmatrix} y_1-\pi_1 \\ y_2-\pi_2 \\ \vdots \\ y_n-\pi_n \end{bmatrix} = \mathbf{X}^T(\mathbf{y}-\boldsymbol{\pi})$$
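In R, the whole gradient reduces to a single matrix product. A minimal sketch, again assuming X, y, and b are defined as above; scoreLogReg is a hypothetical helper.

scoreLogReg <- function(X, y, b) {
  p <- as.vector(exp(X %*% b) / (1 + exp(X %*% b)))  # fitted probabilities pi_i
  t(X) %*% (y - p)                                   # gradient X^T (y - pi)
}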

2.3 Second Derivative

The second derivative of the log-likelihood function with respect to the $r$-th and $s$-th parameters $\beta_r$ and $\beta_s$, $r,s=1,2,\ldots,p$, is

$$\begin{aligned}
\frac{d^2 l}{d\beta_r d\beta_s} &= \frac{d}{d\beta_s} \sum_{i=1}^{n} \left[ \left( y_i - \frac{e^{\mathbf{x}_i^T\boldsymbol{\beta}}}{1 + e^{\mathbf{x}_i^T\boldsymbol{\beta}}} \right) x_{ir} \right] \\
&= \frac{d}{d\beta_s} \sum_{i=1}^{n} \left[ y_i x_{ir} - \left( \frac{e^{x_{i1}\beta_1 + x_{i2}\beta_2 + \cdots + x_{ip}\beta_p}}{1 + e^{x_{i1}\beta_1 + x_{i2}\beta_2 + \cdots + x_{ip}\beta_p}} \right) x_{ir} \right] \\
&= -\sum_{i=1}^{n} \left[ \frac{e^{x_{i1}\beta_1 + x_{i2}\beta_2 + \cdots + x_{ip}\beta_p}}{1 + e^{x_{i1}\beta_1 + x_{i2}\beta_2 + \cdots + x_{ip}\beta_p}} x_{ir} x_{is} - \frac{\left(e^{x_{i1}\beta_1 + x_{i2}\beta_2 + \cdots + x_{ip}\beta_p}\right)^2}{\left(1 + e^{x_{i1}\beta_1 + x_{i2}\beta_2 + \cdots + x_{ip}\beta_p}\right)^2} x_{ir} x_{is} \right] \\
&= -\sum_{i=1}^{n} \left[ \frac{e^{\mathbf{x}_i^T\boldsymbol{\beta}}}{1 + e^{\mathbf{x}_i^T\boldsymbol{\beta}}} x_{ir} x_{is} - \frac{\left(e^{\mathbf{x}_i^T\boldsymbol{\beta}}\right)^2}{\left(1 + e^{\mathbf{x}_i^T\boldsymbol{\beta}}\right)^2} x_{ir} x_{is} \right] \\
&= -\sum_{i=1}^{n} \left[ \frac{e^{\mathbf{x}_i^T\boldsymbol{\beta}} + \left(e^{\mathbf{x}_i^T\boldsymbol{\beta}}\right)^2}{\left(1 + e^{\mathbf{x}_i^T\boldsymbol{\beta}}\right)^2} x_{ir} x_{is} - \frac{\left(e^{\mathbf{x}_i^T\boldsymbol{\beta}}\right)^2}{\left(1 + e^{\mathbf{x}_i^T\boldsymbol{\beta}}\right)^2} x_{ir} x_{is} \right] \\
&= -\sum_{i=1}^{n} \frac{e^{\mathbf{x}_i^T\boldsymbol{\beta}}}{\left(1 + e^{\mathbf{x}_i^T\boldsymbol{\beta}}\right)^2} x_{ir} x_{is} \\
&= -\sum_{i=1}^{n} \pi_i (1-\pi_i) x_{ir} x_{is}
\end{aligned}$$

Therefore,

$$\nabla^2 l(\boldsymbol{\beta}) = \begin{bmatrix} \dfrac{d^2 l}{d\beta_1^2} & \dfrac{d^2 l}{d\beta_1 d\beta_2} & \cdots & \dfrac{d^2 l}{d\beta_1 d\beta_p} \\ \vdots & \vdots & & \vdots \\ \dfrac{d^2 l}{d\beta_p d\beta_1} & \dfrac{d^2 l}{d\beta_p d\beta_2} & \cdots & \dfrac{d^2 l}{d\beta_p^2} \end{bmatrix} = \begin{bmatrix} -\sum_{i=1}^{n} \pi_i(1-\pi_i) x_{i1} x_{i1} & \cdots & -\sum_{i=1}^{n} \pi_i(1-\pi_i) x_{i1} x_{ip} \\ \vdots & & \vdots \\ -\sum_{i=1}^{n} \pi_i(1-\pi_i) x_{ip} x_{i1} & \cdots & -\sum_{i=1}^{n} \pi_i(1-\pi_i) x_{ip} x_{ip} \end{bmatrix} = -\begin{bmatrix} x_{11} & x_{21} & \cdots & x_{n1} \\ \vdots & \vdots & & \vdots \\ x_{1p} & x_{2p} & \cdots & x_{np} \end{bmatrix} \begin{bmatrix} \pi_1(1-\pi_1) & 0 & \cdots & 0 \\ 0 & \pi_2(1-\pi_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \pi_n(1-\pi_n) \end{bmatrix} \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix} = -\mathbf{X}^T \mathbf{W} \mathbf{X}$$

where $\mathbf{W} = \mathrm{diag}(\pi_i(1-\pi_i))$, $i=1,2,\ldots,n$.
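In R, the Hessian is likewise a single matrix expression. A minimal sketch, assuming X and b are defined as before; hessianLogReg is a hypothetical helper.

hessianLogReg <- function(X, b) {
  p <- as.vector(exp(X %*% b) / (1 + exp(X %*% b)))  # fitted probabilities pi_i
  W <- diag(p * (1 - p), nrow = length(p))           # W = diag(pi_i (1 - pi_i))
  -t(X) %*% W %*% X                                  # Hessian -X^T W X
}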

3 Newton-Raphson Method

Let $\theta$ be a parameter and $f$ be a function of $\theta$. Suppose that we are interested in finding the root of $f$ (i.e., the value of $\theta$ at which $f(\theta)=0$). The Newton-Raphson Method is a numerical method for approximating the root of $f$ (Thomas et al., 2005).

Algorithm 1

Newton-Raphson Method

  1. Guess a first approximation of the root of $f$.
  2. Use the first approximation to get a second approximation, the second approximation to get a third approximation, and so on, using the formula $\theta_{k+1} = \theta_k - \frac{f(\theta_k)}{f'(\theta_k)}$, where $\theta_k$ is the Newton-Raphson approximation of the root of $f$ in the $k$-th iteration; a worked example follows this list.
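As an illustration, the sketch below applies the method to a simple univariate problem: approximating the root of $f(\theta) = \theta^2 - 2$ (i.e., $\sqrt{2}$). The function, its derivative, and the starting value are illustrative only.

f  <- function(theta) theta^2 - 2        # f(theta)
fp <- function(theta) 2 * theta          # f'(theta)
theta <- 1                               # first approximation
for (k in 1:5) {
  theta <- theta - f(theta) / fp(theta)  # theta_{k+1} = theta_k - f(theta_k) / f'(theta_k)
}
theta                                    # approximately 1.414214

After only a few iterations the approximation agrees with $\sqrt{2}$ to six decimal places; the same rapid convergence shows up in the logistic regression fit below, which needs only four iterations.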

3.1 R

In this section, I demonstrate how the Newton-Raphson Method can be used to approximate the Maximum Likelihood Estimates of the parameters of a Logistic Regression model in R.

First, I load and prepare the data: the observed random sample $\mathbf{y}$ is assigned to the variable y, and the data matrix $\mathbf{X}$ (an intercept column plus the gpa and gre covariates) is assigned to the variable X.

df <- read.csv(file="https://stats.idre.ucla.edu/stat/data/binary.csv") # Graduate admission data
X <- as.matrix(data.frame(intercept=rep(x=1, times=nrow(df)), gpa=df$gpa, gre=df$gre)) # Data matrix with intercept column
y <- df$admit # Response vector (admitted = 1, not admitted = 0)

Second, I define the function newtonRaphsonLogReg which uses the Newton-Raphson Method to approximate the Maximum Likelihood Estimates of the parameters of a Logistic Regression model.
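The function applies the Newton-Raphson update to the root-finding problem $\nabla l(\boldsymbol{\beta}) = \mathbf{0}$. Using the gradient and Hessian derived above, the multivariate version of the update formula is

$$\boldsymbol{\beta}_{k+1} = \boldsymbol{\beta}_k - \left[\nabla^2 l(\boldsymbol{\beta}_k)\right]^{-1} \nabla l(\boldsymbol{\beta}_k) = \boldsymbol{\beta}_k + \left(\mathbf{X}^T \mathbf{W} \mathbf{X}\right)^{-1} \mathbf{X}^T (\mathbf{y} - \boldsymbol{\pi}) = \left(\mathbf{X}^T \mathbf{W} \mathbf{X}\right)^{-1} \mathbf{X}^T \mathbf{W} \mathbf{z}$$

where $\boldsymbol{\pi}$ and $\mathbf{W}$ are evaluated at $\boldsymbol{\beta}_k$ and $\mathbf{z} = \mathbf{X}\boldsymbol{\beta}_k + \mathbf{W}^{-1}(\mathbf{y} - \boldsymbol{\pi})$ is the adjusted response vector computed inside the loop. The rearranged right-hand form, commonly known as Iteratively Reweighted Least Squares, is the one implemented below.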

#' newtonRaphsonLogReg
#' @param X: A data matrix
#' @param y: A response vector
#' @param maxit: (Optional) maximum number of iterations
#' @param thr: (Optional) convergence threshold
#' @return Maximum likelihood estimates of regression parameters

newtonRaphsonLogReg <- function(X, y, maxit=10, thr=0.001) {
  b <- solve(t(X)%*%X)%*%t(X)%*%y # Initial Guess
  k <- 1 # Initial Iteration Count
  d <- max(abs(b)) # Initial Delta
  while(k<=maxit & d>thr) {
    p <- as.vector(exp(X%*%b)/(1+exp(X%*%b))) # Probabilities
    W <- diag(x=p*(1-p),nrow=length(p),ncol=length(p)) # Weight Matrix
    z <- X%*%b+solve(W)%*%(y-p) # Adjusted Response Vector
    b_new <- solve(t(X)%*%W%*%X)%*%t(X)%*%W%*%z # New Guess
    d <- max(abs(b_new-b)) # Update Delta
    b <- b_new # Update Guess
    cat(paste0("[Iteration ", k, "]: ", d, "\n"))
    k <- k+1
  }
  return(b)
}

Last, I call the function on the data.

newtonRaphsonLogReg(X=X, y=y)
[Iteration 1]: 3.56198602366629
[Iteration 2]: 0.824206953925452
[Iteration 3]: 0.0351788519326073
[Iteration 4]: 7.20576835240294e-05
                  [,1]
intercept -4.949378062
gpa        0.754686856
gre        0.002690684

The estimates match those obtained using the built-in glm function.

summary(glm(admit~gpa+gre, family="binomial", data=df))

Call:
glm(formula = admit ~ gpa + gre, family = "binomial", data = df)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -4.949378   1.075093  -4.604 4.15e-06 ***
gpa          0.754687   0.319586   2.361   0.0182 *  
gre          0.002691   0.001057   2.544   0.0109 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 499.98  on 399  degrees of freedom
Residual deviance: 480.34  on 397  degrees of freedom
AIC: 486.34

Number of Fisher Scoring iterations: 4

4 References

Thomas, G. B., Weir, M. D., Hass, J., & Giordano, F. R. (2005). Thomas’ calculus. Addison-Wesley.