Nonparametric Density Function Estimation Using the Epanechnikov Kernel: A Comparison of Kernels and Adaptive Bandwidth with Classification Applications Using R
Keywords:
Density function estimation - Epanechnikov kernel - Adaptive bandwidth - R statistical package
Abstract
This study estimates probability density functions using nonparametric methods, which reveal the underlying structure of a data distribution without prior assumptions about its form. The focus is on kernel density estimation (KDE) with the Epanechnikov kernel, chosen for its analytical properties, notably its optimality with respect to asymptotic mean integrated squared error, which makes it slightly more efficient than the Gaussian kernel.
The two kernels were compared using the mean squared error (MSE) criterion, with particular attention to the bandwidth parameter (h), which largely determines estimation quality. Optimal fixed bandwidths were selected using Silverman’s rule of thumb and least squares cross-validation (LSCV). The study was then extended to adaptive approaches, which use narrower bandwidths in regions of high data density and wider bandwidths in low-density regions, yielding better estimates for heterogeneous data.
Both fixed and adaptive methods were implemented in R and applied to real datasets, and the resulting density estimates were used in Bayesian classification tasks. Adaptive bandwidth selection reduced MSE by 30–50% relative to the fixed-bandwidth approach. Overall, the comparison indicates that adaptive KDE is an efficient basis for accurate classification models that capture complex data structures.
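The fixed and adaptive estimators summarized in the abstract can be sketched in base R. This is a minimal illustration under stated assumptions, not the study's actual code: the data are simulated, the helper names `epan`, `kde_fixed`, `kde_adaptive`, and `bayes_kde` are hypothetical, and the adaptive scheme shown is Abramson's square-root law, which the abstract does not name. The factor `sqrt(5)` converts a bandwidth expressed as a kernel standard deviation (the convention of `bw.nrd0` and `bw.ucv`) into the half-width of the Epanechnikov kernel.

```r
# Sketch: fixed vs. adaptive Epanechnikov KDE on simulated data (base R only).
set.seed(1)

# Assumed heterogeneous sample: a tight cluster plus a diffuse one.
x <- c(rnorm(150, mean = 0, sd = 0.5), rnorm(50, mean = 4, sd = 1.5))
y <- rep(c("a", "b"), times = c(150, 50))   # simulated class labels

# Epanechnikov kernel: K(u) = 0.75 * (1 - u^2) for |u| <= 1, zero elsewhere.
epan <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)

# Fixed-bandwidth KDE on a grid; h is the kernel half-width.
kde_fixed <- function(grid, x, h) {
  sapply(grid, function(g) mean(epan((g - x) / h)) / h)
}

# Adaptive KDE (Abramson-style, an assumption of this sketch):
# h_i = h * sqrt(G / pilot(x_i)), where G is the geometric mean of the
# pilot density at the data points. Dense regions get narrower kernels.
kde_adaptive <- function(grid, x, h) {
  pilot  <- kde_fixed(x, x, h)                  # strictly positive at data points
  lambda <- sqrt(exp(mean(log(pilot))) / pilot) # local bandwidth factors
  h_i    <- h * lambda
  sapply(grid, function(g) mean(epan((g - x) / h_i) / h_i))
}

# Bandwidths: Silverman's rule of thumb and least squares cross-validation.
h_silverman <- sqrt(5) * bw.nrd0(x)
h_lscv      <- sqrt(5) * bw.ucv(x)   # may warn if the minimum is at a boundary

grid    <- seq(min(x) - 10, max(x) + 10, length.out = 2001)
dx      <- grid[2] - grid[1]
f_fixed <- kde_fixed(grid, x, h_silverman)
f_adapt <- kde_adaptive(grid, x, h_silverman)

# Bayes rule with KDE class-conditional densities:
# choose the class k that maximizes prior_k * f_k(x0).
bayes_kde <- function(x0, x, y) {
  classes <- sort(unique(y))
  score <- sapply(classes, function(k) {
    xk <- x[y == k]
    mean(y == k) * kde_fixed(x0, xk, sqrt(5) * bw.nrd0(xk))
  })
  classes[which.max(score)]
}

bayes_kde(0, x, y)   # classify a new observation at x0 = 0
```

For the fixed-bandwidth case, base R's `density(x, bw = "nrd0", kernel = "epanechnikov")` returns a comparable estimate directly, and `bw = "ucv"` switches to cross-validation. Both selectors are based on a Gaussian kernel, so the cross-validated value is only approximate for the Epanechnikov kernel.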