The logit is the log of the odds. It can be mapped back to a probability (with the sigmoid function) and the probability thresholded to produce a class.
The logistic sigmoid is defined as \[f(x) = \frac{1}{1+e^{-x}} = \frac{e^x}{e^x+1}\] The sigmoid transforms values from \(-\infty \lt x \lt \infty\) into the interval \(0 < f(x) < 1\).
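The definition above can be sketched directly in Python. This is a minimal illustration (the function name `sigmoid` is ours); the two algebraically equivalent forms of \(f(x)\) are used on either side of zero to avoid overflow in `exp` for large \(|x|\):

```python
import math

def sigmoid(x):
    """Logistic sigmoid: maps any real x into the open interval (0, 1)."""
    if x >= 0:
        # 1 / (1 + e^{-x}) is safe here: e^{-x} cannot overflow for x >= 0.
        return 1.0 / (1.0 + math.exp(-x))
    # For x < 0, use the equivalent form e^x / (e^x + 1) so exp never overflows.
    ex = math.exp(x)
    return ex / (ex + 1.0)

print(sigmoid(0.0))   # 0.5, the midpoint of the curve
print(sigmoid(5.0))   # close to 1
print(sigmoid(-5.0))  # close to 0
```

Note the symmetry \(f(-x) = 1 - f(x)\), which follows from the two equivalent forms of the definition.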
Suppose we modeled \(p(x)\) as a linear function of \(x\). The problem is that \(p\) is a probability, which must lie between 0 and 1, whereas a linear function is unbounded. To address this, we instead model the log-odds of \(p(x)\) as a linear function of \(x\), using the logit transformation: \[\log \frac{p(x)}{1-p(x)} = \alpha_0 + \alpha \cdot x\] The logit maps probabilities in \((0,1)\) onto the whole real line, so its inverse maps the unbounded linear predictor back into \((0,1)\).
Solving for \(p(x)\): exponentiate both sides, isolate \(p(x)\), and factor out the coefficient. We get: \[p(x) = \frac{e^{\alpha_0 + \alpha \cdot x}}{e^{\alpha_0 + \alpha \cdot x}+1}\]
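A quick numerical check of this derivation: computing \(p(x)\) from the linear predictor and then applying the logit should recover \(\alpha_0 + \alpha \cdot x\) exactly. The function names `predict_proba` and `logit` are illustrative, not from a particular library:

```python
import math

def predict_proba(x, alpha0, alpha):
    """p(x) = e^{alpha0 + alpha*x} / (e^{alpha0 + alpha*x} + 1)."""
    z = alpha0 + alpha * x
    return 1.0 / (1.0 + math.exp(-z))  # algebraically equal to e^z / (e^z + 1)

def logit(p):
    """Inverse of the sigmoid: the log-odds log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

# logit(p(x)) recovers the linear predictor alpha0 + alpha*x.
alpha0, alpha, x = -1.0, 0.5, 3.0
print(logit(predict_proba(x, alpha0, alpha)))  # should equal -1.0 + 0.5*3.0 = 0.5
```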
Since logistic regression predicts probabilities, we can fit it by maximum likelihood. For each training point \(x_i\) with observed class \(y_i \in \{0, 1\}\), the likelihood is \[L(\alpha_0, \alpha) = \prod_{i=1}^n p(x_i)^{y_i} \left(1-p(x_i) \right)^{1-y_i}\] Taking the log of both sides turns the product into a sum: \[\log L(\alpha_0, \alpha) = \sum_{i=1}^n y_i \cdot \log p(x_i) + (1-y_i) \cdot \log\left(1-p(x_i)\right)\]