In this document, I demonstrate how the Newton-Raphson Method can be used to approximate the Maximum Likelihood Estimates of the parameters of a Logistic Regression model.
1 Logistic Regression
Let $y_1, \dots, y_n$ be observations of independent Bernoulli random variables, where $y_i \in \{0, 1\}$, and let $x_i = (x_{i1}, \dots, x_{ik})^\top$ denote the vector of $k$ covariates observed with $y_i$. Let $\beta = (\beta_1, \dots, \beta_k)^\top$ denote the vector of regression parameters. The Logistic Regression model assumes that
$$p_i := P(y_i = 1 \mid x_i) = \frac{\exp(x_i^\top \beta)}{1 + \exp(x_i^\top \beta)}.$$
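To make the model concrete, the probabilities can be computed directly in R. The following is a minimal sketch; the toy design matrix and parameter vector below are arbitrary:

X <- cbind(intercept=1, x1=c(2.1, 3.5, 3.0)) # toy design matrix with an intercept column
b <- c(-1, 0.5) # toy parameter vector
p <- exp(X %*% b)/(1 + exp(X %*% b)) # p_i = exp(x_i'b)/(1 + exp(x_i'b))
p # each probability lies strictly between 0 and 1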
2 Maximum Likelihood Estimation
In Maximum Likelihood Estimation, the problem is to find values of the parameters that maximize the likelihood of the observed data:
$$\hat{\beta} = \operatorname*{arg\,max}_{\beta} L(\beta),$$
where $L(\beta)$ is the likelihood function defined in the next subsection.
2.1 Likelihood Function
The likelihood function for $\beta$ is
$$L(\beta) = \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i}.$$
Therefore, the log-likelihood function for $\beta$ is
$$\ell(\beta) = \sum_{i=1}^{n} \left[ y_i \log(p_i) + (1 - y_i)\log(1 - p_i) \right] = \sum_{i=1}^{n} \left[ y_i x_i^\top \beta - \log\left(1 + \exp(x_i^\top \beta)\right) \right].$$
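The simplified form on the right-hand side translates directly into R. Below is a minimal sketch; the helper name loglik and the toy inputs are arbitrary:

loglik <- function(b, X, y) {
  eta <- as.vector(X %*% b) # linear predictors x_i'b
  sum(y*eta - log(1 + exp(eta))) # simplified form of the log-likelihood
}
loglik(b=c(-1, 0.5), X=cbind(1, c(2.1, 3.5, 3.0)), y=c(0, 1, 1)) # toy inputs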
2.2 First Derivative
The first derivative of the log-likelihood function with respect to the $r$-th parameter $\beta_r$ is
$$\frac{\partial \ell(\beta)}{\partial \beta_r} = \sum_{i=1}^{n} (y_i - p_i)\, x_{ir}.$$
Therefore, in matrix notation, the gradient of the log-likelihood is
$$\nabla \ell(\beta) = X^\top (y - p),$$
where $X$ is the $n \times k$ design matrix, $y = (y_1, \dots, y_n)^\top$, and $p = (p_1, \dots, p_n)^\top$.
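In code, the gradient is a single matrix product. A minimal sketch, with an arbitrary helper name score:

score <- function(b, X, y) {
  p <- as.vector(exp(X %*% b)/(1 + exp(X %*% b))) # fitted probabilities
  t(X) %*% (y - p) # gradient X'(y - p)
}

At the maximum likelihood estimates, score() returns (numerically) a vector of zeros.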
2.3 Second Derivative
The second derivative of the log-likelihood function with respect to the $r$-th and $s$-th parameters is
$$\frac{\partial^2 \ell(\beta)}{\partial \beta_r \, \partial \beta_s} = -\sum_{i=1}^{n} p_i (1 - p_i)\, x_{ir} x_{is}.$$
Therefore, in matrix notation, the Hessian of the log-likelihood is
$$\nabla^2 \ell(\beta) = -X^\top W X,$$
where $W$ is the $n \times n$ diagonal weight matrix with entries $W_{ii} = p_i(1 - p_i)$.
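The Hessian translates just as directly. A minimal sketch, again with an arbitrary helper name:

hessian <- function(b, X) {
  p <- as.vector(exp(X %*% b)/(1 + exp(X %*% b))) # fitted probabilities
  W <- diag(x=p*(1-p), nrow=length(p), ncol=length(p)) # diagonal weight matrix W
  -t(X) %*% W %*% X # Hessian -X'WX
}

Because every weight $p_i(1 - p_i)$ is positive, $-X^\top W X$ is negative definite whenever $X$ has full column rank, so the log-likelihood is concave and the Newton-Raphson iteration converges to the unique maximum from reasonable starting values.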
3 Newton-Raphson Method
Let $\beta^{(k)}$ denote the approximation of $\hat{\beta}$ at iteration $k$. The Newton-Raphson Method updates the current approximation by
$$\beta^{(k+1)} = \beta^{(k)} - \left[\nabla^2 \ell(\beta^{(k)})\right]^{-1} \nabla \ell(\beta^{(k)}) = \beta^{(k)} + (X^\top W X)^{-1} X^\top (y - p),$$
where $W$ and $p$ are evaluated at $\beta^{(k)}$. Equivalently, defining the adjusted response vector $z = X\beta^{(k)} + W^{-1}(y - p)$, the update can be written as the weighted least squares solution
$$\beta^{(k+1)} = (X^\top W X)^{-1} X^\top W z,$$
which is the form used in the implementation below. The iteration is repeated until the change between successive approximations falls below a convergence threshold.
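Before turning to the full implementation, here is a minimal sketch of the update in the simplest possible case: an intercept-only model, where the MLE is known in closed form to be the log-odds of the sample mean. The data and starting value below are arbitrary:

y_toy <- c(1, 0, 0, 1, 1, 0, 1, 1) # toy binary responses
theta <- 0 # initial guess for the log-odds
for (k in 1:10) {
  p_k <- exp(theta)/(1 + exp(theta)) # current fitted probability
  g <- sum(y_toy - p_k) # first derivative of the log-likelihood
  h <- -length(y_toy)*p_k*(1 - p_k) # second derivative of the log-likelihood
  theta <- theta - g/h # Newton-Raphson update
}
theta # converges to log(mean(y_toy)/(1 - mean(y_toy)))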
3.1 R
In this section, I demonstrate how the Newton-Raphson Method can be used to approximate the Maximum Likelihood Estimates of the parameters of a Logistic Regression model in R.
First, I load and prepare the data. The observed responses (admit) are stored in the vector y, and the covariates (gpa and gre), together with a leading column of ones for the intercept, form the design matrix X.

df <- read.csv(file="https://stats.idre.ucla.edu/stat/data/binary.csv")
X <- as.matrix(data.frame(intercept=rep(x=1, times=nrow(df)), gpa=df$gpa, gre=df$gre))
y <- df$admit
Second, I define the function newtonRaphsonLogReg(), which implements the update derived above.
#' newtonRaphsonLogReg
#' @param X: A data matrix
#' @param y: A response vector
#' @param maxit: (Optional) maximum number of iterations
#' @param thr: (Optional) convergence threshold
#' @return Maximum likelihood estimates of regression parameters
newtonRaphsonLogReg <- function(X, y, maxit=10, thr=0.001) {
  b <- solve(t(X)%*%X)%*%t(X)%*%y # Initial Guess (ordinary least squares)
  k <- 1 # Initial Iteration Count
  d <- max(abs(b)) # Initial Delta
  while(k<=maxit & d>thr) {
    p <- as.vector(exp(X%*%b)/(1+exp(X%*%b))) # Probabilities
    W <- diag(x=p*(1-p), nrow=length(p), ncol=length(p)) # Weight Matrix
    z <- X%*%b+solve(W)%*%(y-p) # Adjusted Response Vector
    b_new <- solve(t(X)%*%W%*%X)%*%t(X)%*%W%*%z # New Guess
    d <- max(abs(b_new-b)) # Update Delta
    b <- b_new # Update Guess
    cat(paste0("[Iteration ", k, "]: ", d, "\n"))
    k <- k+1
  }
  return(b)
}
Last, I apply the function to the prepared data.
newtonRaphsonLogReg(X=X, y=y)
[Iteration 1]: 3.56198602366629
[Iteration 2]: 0.824206953925452
[Iteration 3]: 0.0351788519326073
[Iteration 4]: 7.20576835240294e-05
[,1]
intercept -4.949378062
gpa 0.754686856
gre 0.002690684
The results are similar to those obtained using the built-in glm() function.
summary(glm(admit~gpa+gre, family="binomial", data=df))
Call:
glm(formula = admit ~ gpa + gre, family = "binomial", data = df)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.949378 1.075093 -4.604 4.15e-06 ***
gpa 0.754687 0.319586 2.361 0.0182 *
gre 0.002691 0.001057 2.544 0.0109 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 499.98 on 399 degrees of freedom
Residual deviance: 480.34 on 397 degrees of freedom
AIC: 486.34
Number of Fisher Scoring iterations: 4
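As a further sanity check, the estimates can also be recovered by passing the negative log-likelihood and its gradient to R's general-purpose optimizer optim(), reusing the X and y defined above. This is a sketch rather than part of the main method; the term log(1 + exp(eta)) is evaluated in a numerically stable way to guard against overflow:

negloglik <- function(b, X, y) {
  eta <- as.vector(X %*% b)
  log1pe <- ifelse(eta > 0, eta + log1p(exp(-eta)), log1p(exp(eta))) # stable log(1 + exp(eta))
  -sum(y*eta - log1pe)
}
neggrad <- function(b, X, y) {
  p <- 1/(1 + exp(-as.vector(X %*% b))) # fitted probabilities, stable for large |eta|
  -as.vector(t(X) %*% (y - p)) # negative gradient -X'(y - p)
}
optim(par=rep(0, times=ncol(X)), fn=negloglik, gr=neggrad, X=X, y=y, method="BFGS")$par

If everything is set up correctly, the resulting estimates should agree with the Newton-Raphson and glm() results to several decimal places.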