Speaker: David Bindel (Cornell)
Title: Latent factor models
Abstract: Approximate low-rank factorizations pervade matrix data analysis and are often interpreted in terms of latent factor models. After discussing the ubiquitous singular value decomposition (aka PCA), we turn to factorizations such as the interpolative decomposition and the CUR factorization, which offer advantages in interpretability and ease of computation. We then discuss constrained approximate factorizations, particularly non-negative matrix factorizations and topic models, which are often especially useful for decomposing data into sparse parts. Unfortunately, these decompositions may be very expensive to compute, at least in principle. But in many practical applications one can make a separability assumption that allows for relatively inexpensive algorithms. In particular, we show how the separability assumption enables efficient linear-algebra-based algorithms for topic modeling, and how linear algebraic preprocessing can be used to clean up the data and improve the quality of the resulting topics.