To address the poor mining accuracy and parameter sensitivity of existing algorithms on high-dimensional heterogeneous data, this paper proposes a local outlier mining algorithm based on neighborhood density. Region segmentation first splits the high-dimensional data into reasonable sub-regions, which reduces the difficulty of processing large volumes of high-dimensional data. The kernel neighborhood density then replaces the average neighborhood density, so that the density calculation is independent of data heterogeneity. Finally, the neighborhood state and outlier state of the data are determined on the basis of the neighborhood density to improve the accuracy of outlier mining. Simulation results on artificial and UCI data sets show that data volume and data dimension are the main factors affecting outlier mining. The proposed algorithm significantly outperforms the comparison algorithm in accuracy, coverage, and efficiency, and it adapts better to different types of data sets.
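The density step can be illustrated with a minimal sketch. It is not the paper's implementation: the Gaussian kernel, the `bandwidth` parameter, the inverse-mean-distance form of the average density, the ratio-based score, and all function names are assumptions made here for illustration only, since the abstract does not specify them.

```python
import math

def knn_distances(points, i, k):
    """Return the k smallest distances from points[i] to the other points."""
    dists = sorted(math.dist(points[i], q) for j, q in enumerate(points) if j != i)
    return dists[:k]

def average_density(points, i, k):
    """Assumed baseline: inverse of the mean k-nearest-neighbor distance."""
    d = knn_distances(points, i, k)
    return 1.0 / (sum(d) / len(d) + 1e-12)

def kernel_density(points, i, k, bandwidth):
    """Assumed kernel variant: mean Gaussian weight of the k-NN distances."""
    d = knn_distances(points, i, k)
    return sum(math.exp(-((x / bandwidth) ** 2) / 2.0) for x in d) / len(d)

def outlier_scores(points, k=3, bandwidth=1.0):
    """Score each point as (mean density of its k neighbors) / (own density)."""
    n = len(points)
    dens = [kernel_density(points, i, k, bandwidth) for i in range(n)]
    scores = []
    for i in range(n):
        # Indices of the k nearest neighbors of point i.
        neighbors = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )[:k]
        neighbor_density = sum(dens[j] for j in neighbors) / len(neighbors)
        scores.append(neighbor_density / (dens[i] + 1e-12))
    return scores

# Four tightly clustered points and one isolated point (hypothetical data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = outlier_scores(points, k=3, bandwidth=1.0)
most_outlying = scores.index(max(scores))
```

In this toy data the clustered points receive scores close to 1, while the isolated point receives a much larger score because its kernel density is far below that of its neighbors. The kernel weights are bounded between 0 and 1, whereas the inverse-distance baseline is unbounded as distances shrink; whether this matches the paper's notion of independence from data heterogeneity is not established by the abstract.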
1. Introduction
2. The principle of outlier mining in heterogeneous data
3. Data set region segmentation
4. Outlier mining algorithm
5. Simulation and result analysis
6. Conclusion