bioRxiv (Cold Spring Harbor Laboratory)
Low-cost scalable discretization, prediction and feature selection for complex systems
July 2019 • Susanne Gerber, Lukáš Pospíšil, Mohit Navandar, Illia Horenko
Abstract
Finding reliable discrete approximations of complex systems is a key prerequisite when applying many of the most popular modeling tools. Common discretization approaches (for example, the widely used K-means clustering) are crucially limited in terms of quality and cost. We introduce a low-cost, improved-quality Scalable Probabilistic Approximation (SPA) algorithm, allowing for simultaneous data-driven optimal discretization, feature selection and prediction. Cross-validated applications of SPA to a range…
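For orientation, here is a minimal sketch of the K-means baseline the abstract refers to. It is not the SPA algorithm, which the abstract does not specify. Lloyd's algorithm turns continuous data into discrete states: it alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. The function name `kmeans_discretize` and its parameters are hypothetical and serve only as illustration. Each iteration costs time proportional to N·K·D (points × clusters × dimensions).

```python
import random


def kmeans_discretize(points, k, n_iter=100, seed=0):
    """Hypothetical helper: discretize points (tuples) into k states via Lloyd's K-means.

    Returns (centroids, labels), where labels[i] is the discrete state of points[i].
    """
    rng = random.Random(seed)
    # Initialize centroids with k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Converged: unchanged assignments would reproduce the same centroids.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return centroids, labels


data = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
centroids, labels = kmeans_discretize(data, k=2)
print(labels)
```

On this toy 1-D data, the two well-separated groups end up in different discrete states. Because the result depends on the random initialization, practical uses usually run several restarts and keep the solution with the lowest within-cluster sum of squares.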