Lecture: The fuzzy border between Molecular Simulations and data science

Alejandro Rodríguez García (International Center for Theoretical Physics, Trieste, Italy), a former student of IQTCUB, will give a lecture next week entitled "The fuzzy border between Molecular Simulations and data science".

Data sets can be considered an ensemble of realizations drawn from a density distribution. Obtaining a synthetic description of this distribution allows rationalizing the underlying generating process and building human-readable models. In simple cases, visualizing the distribution in a suitable low-dimensional projection is enough to capture its main features, but real-world data sets are often embedded in a high-dimensional space. Therefore, I present a procedure that obtains such a synthetic description automatically, using only the pairwise distances (or similarities) between data points. This methodology is based on a reliable estimation of the intrinsic dimension of the data set and of its probability density function, coupled with a modified Density Peaks clustering algorithm. The final outcome of all this machinery working together is a hierarchical tree that summarizes the main features of the data set, together with a classification that assigns each data point to the feature it belongs to.
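
For readers unfamiliar with density-based clustering, the following is a minimal sketch of the basic Density Peaks idea working from a pairwise distance matrix alone. It is not the modified algorithm presented in the lecture: the Gaussian density estimate, the cutoff `d_c`, the fixed `n_clusters`, and the function name `density_peaks` are all assumptions made for illustration, and the intrinsic dimension estimate and the hierarchical tree are left out.

```python
import numpy as np

def density_peaks(dist, d_c, n_clusters):
    """Illustrative Density Peaks clustering from a symmetric distance matrix.

    dist       : (n, n) array of pairwise distances
    d_c        : kernel width for the density estimate (assumed, user-chosen)
    n_clusters : number of density peaks to keep (assumed, user-chosen)
    """
    n = dist.shape[0]
    # Local density: Gaussian kernel sum, minus the self-contribution exp(0) = 1.
    rho = np.exp(-(dist / d_c) ** 2).sum(axis=1) - 1.0
    order = np.argsort(-rho)  # point indices sorted by decreasing density

    # delta: distance to the nearest point of higher density.
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no higher-density neighbour.
            delta[i] = dist[i].max()
            continue
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        nearest_higher[i] = j

    # Peaks are points with both high density and large delta.
    gamma = rho * delta
    gamma[order[0]] = np.inf  # the densest point is always kept as a peak
    centers = np.argsort(-gamma)[:n_clusters]

    # Every other point inherits the label of its nearest higher-density point.
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, centers
```

Because points are labelled in order of decreasing density, each point's nearest higher-density neighbour already has a label when the point is reached, so the assignment needs a single pass.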


The event will take place on Tuesday 5th March at 12h in Aula QF.