SMOTE (Synthetic Minority Oversampling Technique) is an oversampling method based on nearest-neighbor information. It was originally developed for numeric columns: the minority class is upsampled by taking each minority sample, picking one of its nearest minority-class neighbors, and creating a new sample as a linear combination of the two. SMOTE-NC extends this to categorical columns by assigning each new sample the most frequent category among its nearest neighbors. This interpolation is better than naive resampling, which only duplicates randomly chosen existing samples. On the other hand, linear interpolation cannot produce minority samples as varied and complex as those created by synthetic data generation.
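
To make the interpolation step concrete, here is a minimal sketch in plain Python. It covers numeric columns only and is an illustration of the idea, not a reference implementation: the function name `smote_sketch`, its parameters (`n_per_sample`, `k`, `seed`), and the toy data are all assumptions made for this example.

```python
import math
import random


def smote_sketch(minority, n_per_sample=1, k=5, seed=0):
    """Create synthetic minority points by linear interpolation.

    Hypothetical helper for illustration; assumes at least two
    minority samples, each given as a list of numeric features.
    """
    rng = random.Random(seed)
    synthetic = []
    for i, base in enumerate(minority):
        # Rank the other minority samples by Euclidean distance to `base`
        # and keep the k closest ones (fewer if the class is very small).
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbors = others[:k]
        for _ in range(n_per_sample):
            # Pick one neighbor and a random position on the line
            # segment between `base` and that neighbor.
            neighbor = rng.choice(neighbors)
            gap = rng.random()
            synthetic.append(
                [b + gap * (n - b) for b, n in zip(base, neighbor)]
            )
    return synthetic


# Toy minority class with two numeric features (made-up values).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_sketch(minority, n_per_sample=2, k=2)
```

Every generated point lies on a straight line between two existing minority samples, so it can never leave the region those samples already span. That is the limitation of linear interpolation described above.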