Discriminative vs Generative Models

Introduction

Understanding the two main types of machine learning models, discriminative and generative, is crucial: these two categories of modeling form the backbone of many machine-learning applications.

Discriminative Models

Discriminative models act as discriminators, making decisions based on data features. They work by directly learning the decision boundaries between different classes. These models are often employed for classification tasks, where the goal is to determine the category to which a new piece of data belongs. It's worth noting that while these models are excellent at defining the decision surface, they do not model the underlying data distribution. So, how do these models make a decision?

We'll delve deeper into this in a future session on decision theory, but in a nutshell, discriminative models make comparisons against learned thresholds or decision boundaries. This extends naturally to higher-dimensional data with more than one feature, such as height and eye color, where the "threshold" becomes a boundary over several features, as sketched below.
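As a toy illustration (all parameter values and the numeric encoding of eye color are hypothetical), a one-feature classifier may simply compare against a learned threshold, while a two-feature classifier compares a learned linear combination of features against a boundary:

```python
# Hypothetical learned threshold for a single-feature rule.
height_threshold = 170.0

def classify_1d(height):
    # Compare one feature against a learned threshold.
    return 1 if height > height_threshold else 0

# With two features, the learned "threshold" becomes a decision boundary:
# w1 * height + w2 * eye_darkness + b > 0  ->  class 1, otherwise class 0.
w1, w2, b = 0.04, -1.2, -6.5   # hypothetical learned weights and bias

def classify_2d(height, eye_darkness):
    # Compare a linear combination of two features against the boundary.
    return 1 if (w1 * height + w2 * eye_darkness + b) > 0 else 0

print(classify_1d(180.0), classify_2d(180.0, 0.3))
```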

One common example of a discriminative model is logistic regression, where the output is a decision: 1 or 0. 

In logistic regression, the probability P(y = 1 | X) is given by:

P(y = 1 | X) = 1 / (1 + e^(−θᵀX))

In this equation, θ represents the weight vector that the model learns during training, and X is the input data vector; the output is the probability that the data belongs to the class y = 1. In the binary case, no separate weights are needed for y = 0, since P(y = 0 | X) = 1 − P(y = 1 | X). In a more complex, multi-class setup such as a neural network, each class does get its own set of weights, which can be viewed as edge values connecting the input to that class's output node. This allows the network to learn a different decision boundary for each class, thereby making it capable of multi-class classification. In such scenarios, each output node has its own set of weights, ensuring that the decision boundaries for different classes are independent and optimized based on the respective class data.
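To make this concrete, here is a minimal sketch of the logistic regression prediction step using NumPy; the weight vector θ and the bias are assumed to have already been learned during training, and the example values are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    # Maps any real-valued score to the (0, 1) interval.
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, theta, bias=0.0):
    # P(y = 1 | X) for each row of X, given the learned weight vector theta.
    return sigmoid(X @ theta + bias)

# Hypothetical learned parameters for two features.
theta = np.array([0.8, -0.5])
X_new = np.array([[1.2, 0.3],
                  [-0.4, 1.1]])

probs = predict_proba(X_new, theta)
labels = (probs >= 0.5).astype(int)   # threshold the probability at 0.5
print(probs, labels)
```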

Another example is the Support Vector Machine (SVM), which learns a decision boundary that separates classes within the sample space (in its basic form two classes; multi-class problems are typically handled by combining several binary SVMs). The name might be confusing at first glance: support vectors are the data points that lie closest to the decision boundary, and 'machine' in this context simply refers to the algorithmic model that performs the task. Combining these two elements, an SVM is a discriminative model that makes decisions based on a boundary shaped by these crucial support vectors in the sample space.

The visual above demonstrates how a Support Vector Machine (SVM) learns a decision boundary to separate two classes within the sample space.
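As an illustrative sketch (assuming scikit-learn is available; the synthetic data and parameters are made up for demonstration), the snippet below fits a linear SVM on two small clusters and inspects the support vectors that shape the decision boundary.

```python
import numpy as np
from sklearn.svm import SVC

# Tiny synthetic dataset: two clusters in a 2-D feature space.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=-2.0, scale=0.5, size=(20, 2)),
               rng.normal(loc=2.0, scale=0.5, size=(20, 2))])
y = np.array([0] * 20 + [1] * 20)

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# The support vectors are the points closest to the decision boundary.
print("support vectors per class:", clf.n_support_)
print("support vectors:\n", clf.support_vectors_)
print("prediction for a new point:", clf.predict([[0.5, -0.2]]))
```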

Generative Models

As the name suggests, generative models create something new: samples that weren't initially part of the collected data. These models don't just learn a decision boundary; they learn the joint probability distribution P(X, y). This makes them versatile and applicable to tasks beyond just classification.

What's unique about these generated samples? The models learn the underlying distribution, which can be thought of as a compact, approximate representation of the true distribution. With enough data and training, these models can approximate the true distribution of the sample space increasingly well. This is particularly useful in tasks requiring a deep understanding of data distributions, such as anomaly detection.

It's important to understand that a distribution is characterized either by a Probability Mass Function (PMF) for discrete random variables or a Probability Density Function (PDF) for continuous random variables. Once we know the PMF or PDF, we can derive the Cumulative Distribution Function (CDF), by summing in the discrete case or integrating in the continuous case. With a known CDF, generating a new sample can often be done through inverse transformation. One additional note: in the context of generative modeling, both discrete and continuous distributions can be encountered, so it's good to know the difference between a PMF and a PDF.
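As a small sketch of the inverse-transform idea, the snippet below samples from an exponential distribution, chosen here because its CDF has a simple closed-form inverse: draw u uniformly from (0, 1) and map it back through the inverse CDF.

```python
import numpy as np

def sample_exponential(rate, size, rng):
    # Exponential CDF: F(x) = 1 - exp(-rate * x)
    # Inverse CDF:     F^{-1}(u) = -ln(1 - u) / rate
    u = rng.uniform(0.0, 1.0, size)
    return -np.log(1.0 - u) / rate

rng = np.random.default_rng(42)
samples = sample_exponential(rate=2.0, size=10_000, rng=rng)
print("empirical mean:", samples.mean(), "(theoretical mean: 0.5)")
```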

One powerful example of generative modeling is Bayesian inference. In Bayesian models, we learn a distribution for each class. For instance, in a Gaussian generative model, the joint distribution of a particular class y = k and its features X can be characterized by:

P(X, y = k) = P(y = k) · N(X | μ_k, Σ_k)

where P(y = k) is the class prior and N(X | μ_k, Σ_k) is a Gaussian with class-specific mean μ_k and covariance Σ_k.

To generate a sample from a particular class, we can use inverse-transform techniques to map a randomly chosen probability value back to the sample space of that class's distribution.
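A minimal sketch of this idea (assuming the class priors, per-class means, and covariances have already been estimated from data; the values below are hypothetical): first draw a class from the prior P(y), then draw features from that class's Gaussian.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical learned parameters for two classes in a 2-D feature space.
priors = np.array([0.4, 0.6])                         # P(y = k)
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]  # mu_k
covs = [np.eye(2), 0.5 * np.eye(2)]                   # Sigma_k

def sample(n):
    # Sample from the joint P(X, y) = P(y) * N(X | mu_y, Sigma_y).
    ys = rng.choice(len(priors), size=n, p=priors)
    Xs = np.array([rng.multivariate_normal(means[k], covs[k]) for k in ys])
    return Xs, ys

X_new, y_new = sample(5)
print(y_new)
print(X_new)
```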

Another useful application of generative models is anomaly detection. In this context, a generative model is trained to learn the normal behavior of a system, represented by some underlying data distribution. Once the model is trained, it can be used to identify anomalies, or outliers: data points that deviate significantly from the learned distribution.

For instance, a model could learn the distribution of normal network traffic, say as a Gaussian over a set of traffic features; any activity that deviates significantly from this learned distribution could then be flagged as a potential cybersecurity threat. See the following for a visual demonstration:


The green histogram represents the normal data distribution, while the red points are anomalies that deviate significantly from this distribution. In a real-world application, such points could be flagged for further investigation.
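As a sketch of this approach (the data, features, and likelihood cutoff are illustrative), the snippet below fits a one-dimensional Gaussian to "normal" traffic measurements and flags new points whose likelihood under that Gaussian falls below a threshold.

```python
import numpy as np

rng = np.random.default_rng(1)

# "Normal" training data, e.g., requests per second under typical load.
normal_traffic = rng.normal(loc=100.0, scale=10.0, size=5000)

# Fit a Gaussian to the normal behavior.
mu, sigma = normal_traffic.mean(), normal_traffic.std()

def gaussian_pdf(x, mu, sigma):
    # Density of N(mu, sigma^2) evaluated at x.
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# New observations: mostly normal, plus a few extreme values.
new_points = np.array([98.0, 105.0, 160.0, 40.0])
threshold = 1e-4  # illustrative likelihood cutoff

likelihoods = gaussian_pdf(new_points, mu, sigma)
anomalies = new_points[likelihoods < threshold]
print("flagged as anomalous:", anomalies)
```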

Conclusion

Understanding these models is a fundamental requirement for solving real-world problems using machine learning, particularly in fields requiring a deep grasp of data distributions and decision boundaries.