FFT 2023

Rob Patro (UMD)


Promises and pitfalls of dimensionality reduction and harmonic analysis in single-cell analysis


Single-cell sequencing has proven to be an exciting and potentially transformational biotechnology. For example, popular existing protocols for single-cell RNA sequencing (scRNA-seq) can measure the transcriptome across tens of thousands of cells per sample, producing sequencing data that can be demultiplexed and counted in silico to yield gene-by-cell expression matrices, which serve as the starting point for a plethora of downstream analyses. These analyses are broad in scope, ranging, for example, from the study of organogenesis to the exploration of treatment resistance in tumor tissue, yet they share a common starting point: a large, highly sparse matrix of the inferred counts of each annotated gene within each assayed cell.
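As a rough illustration of what "counted in silico" can mean, the sketch below assumes reads have already been demultiplexed into (cell barcode, UMI, gene) records. The record format, the function name `count_matrix`, and the use of exact UMI matching for deduplication are simplifying assumptions for illustration, not the processing pipeline discussed in the talk. The result is stored as a SciPy CSR sparse matrix, so memory scales with the number of nonzero entries rather than with genes times cells.

```python
import numpy as np
from scipy import sparse


def count_matrix(records, genes, barcodes):
    """Hypothetical sketch: build a sparse gene-by-cell UMI count matrix.

    records: iterable of (barcode, umi, gene) tuples, one per aligned read.
    Simplified on purpose: exact UMI matching only, with no barcode or UMI
    error correction and no resolution of multi-mapping reads.
    """
    gene_idx = {g: i for i, g in enumerate(genes)}
    cell_idx = {b: j for j, b in enumerate(barcodes)}
    # Collapse PCR duplicates: each distinct (barcode, umi, gene) counts once.
    # Reads from unknown barcodes or unannotated genes are dropped.
    unique = {(b, u, g) for b, u, g in records if b in cell_idx and g in gene_idx}
    rows = np.array([gene_idx[g] for _, _, g in unique], dtype=np.int64)
    cols = np.array([cell_idx[b] for b, _, _ in unique], dtype=np.int64)
    data = np.ones(len(unique), dtype=np.int64)
    # Converting COO to CSR sums duplicate (gene, cell) entries into counts.
    shape = (len(genes), len(barcodes))
    return sparse.coo_matrix((data, (rows, cols)), shape=shape).tocsr()
```

For example, two reads sharing barcode "AAAC", UMI "u1", and gene "GeneA" contribute a count of 1, while a third read with a different UMI for the same gene and cell raises that entry to 2.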

The high ambient dimensionality of these data, combined with their high degree of sparsity and measurement noise, makes the direct analysis of these expression matrices challenging. Standard analyses therefore apply heavy filtering and various forms of dimensionality reduction to the data. The resulting low-dimensional representations are then used for tasks such as visualization, clustering, classification, and the inference of dynamical cell states.
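A minimal sketch of the kind of filtering and dimensionality reduction described above, assuming a dense gene-by-cell NumPy array, simple detection-count thresholds, library-size normalization followed by a log(1 + x) transform, and PCA computed via SVD. The function name and default thresholds are hypothetical; real pipelines vary in nearly every one of these choices.

```python
import numpy as np


def filter_normalize_pca(counts, min_genes_per_cell=2, min_cells_per_gene=2,
                         n_components=2, target_sum=1e4):
    """Hypothetical preprocessing sketch: filter, normalize, log-transform, PCA.

    counts: dense gene-by-cell array, shape (n_genes, n_cells).
    Returns (embedding, gene_mask, cell_mask); embedding has one row per kept cell.
    """
    counts = np.asarray(counts, dtype=float)
    # Keep cells with enough detected genes, then genes detected in enough kept cells.
    cell_mask = (counts > 0).sum(axis=0) >= min_genes_per_cell
    kept = counts[:, cell_mask]
    gene_mask = (kept > 0).sum(axis=1) >= min_cells_per_gene
    kept = kept[gene_mask, :]
    # Scale each cell to target_sum total counts, then apply log(1 + x).
    lib_size = kept.sum(axis=0, keepdims=True)
    lib_size[lib_size == 0] = 1.0  # avoid 0/0 for cells emptied by gene filtering
    logged = np.log1p(kept / lib_size * target_sum)
    # PCA: center each gene across cells, then keep the top singular directions.
    x = logged.T
    x = x - x.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    embedding = u[:, :n_components] * s[:n_components]
    return embedding, gene_mask, cell_mask
```

Each threshold, the normalization target, and the number of retained components is a modeling decision whose downstream effect is rarely quantified, which is exactly the concern raised in the next paragraph.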

Unfortunately, the overall effects and adequacy of standard filtering and dimensionality reduction approaches are not fully understood, nor are there robust techniques for inferring the appropriate "intrinsic" dimensionality of the data or for characterizing how well the data adhere to various "manifold" assumptions. In fact, even the most appropriate measure of distance between the gene expression vectors of cells remains something of an open question.
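Two small illustrations of the questions raised above, offered as assumptions rather than as methods from the talk. The participation ratio of PCA eigenvalues, (Σλ)² / Σλ², is one simple heuristic estimate of linear intrinsic dimensionality (it can overstate the dimension of a curved manifold), and comparing cosine with Euclidean distance shows how the choice of metric changes which cells look similar. The helper names `participation_ratio` and `pairwise_distances` are hypothetical.

```python
import numpy as np


def participation_ratio(x):
    """Heuristic linear intrinsic-dimension estimate: (sum l)^2 / sum l^2.

    x: cells-by-features array; l are eigenvalues of its sample covariance.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean(axis=0, keepdims=True)
    # Covariance eigenvalues are squared singular values divided by (n - 1).
    s = np.linalg.svd(x, compute_uv=False)
    lam = s**2 / (x.shape[0] - 1)
    return lam.sum() ** 2 / (lam**2).sum()


def pairwise_distances(x, metric="euclidean"):
    """Dense cell-by-cell distance matrix for a cells-by-features array."""
    x = np.asarray(x, dtype=float)
    if metric == "euclidean":
        sq = (x**2).sum(axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
        d = np.sqrt(np.maximum(d2, 0.0))  # clip tiny negative rounding errors
        np.fill_diagonal(d, 0.0)
        return d
    if metric == "cosine":
        # Cosine distance ignores overall scale, e.g. sequencing depth.
        unit = x / np.linalg.norm(x, axis=1, keepdims=True)
        return np.clip(1.0 - unit @ unit.T, 0.0, 2.0)
    raise ValueError(f"unsupported metric: {metric}")
```

For a cell and a copy of it with every count doubled (as with a more deeply sequenced library), the cosine distance is zero, while the Euclidean distance equals the norm of the original expression vector.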

In this perspective talk, I will give an overview of several current challenges in the initial processing of single-cell RNA-seq data. I will also highlight methods and approaches developed by the community that make effective use of principles from harmonic analysis to analyze these noisy, high-dimensional measurements. Finally, I will attempt to characterize some open questions whose resolution may have broad impacts on the way we process these data and, therefore, on the manifold subsequent analyses that are performed.
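One generic way harmonic-analysis ideas enter single-cell analysis is graph signal processing: treat a gene's expression as a signal on a k-nearest-neighbor graph of cells, use the eigenvectors of the graph Laplacian L = D − W as a Fourier basis, and suppress the high-frequency components that vary sharply between neighboring cells. The sketch below is a plain low-pass projection under assumed choices (Gaussian edge weights with a median-distance bandwidth, a dense eigendecomposition, and a hard cutoff `n_keep`); the function names are hypothetical, it is not any specific published method, and dense `eigh` only scales to small numbers of cells.

```python
import numpy as np


def knn_graph(x, k=10):
    """Symmetric kNN affinity matrix W with Gaussian edge weights.

    x: cells-by-features array (e.g. a PCA embedding).
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    sq = (x**2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    np.fill_diagonal(d2, np.inf)  # exclude self-loops from the neighbor search
    nbrs = np.argsort(d2, axis=1)[:, :k]
    # Bandwidth: median distance to the k nearest neighbors (an assumed heuristic).
    sigma = max(np.sqrt(np.median(d2[np.arange(n)[:, None], nbrs])), 1e-12)
    rows = np.repeat(np.arange(n), k)
    cols = nbrs.ravel()
    w = np.zeros((n, n))
    w[rows, cols] = np.exp(-d2[rows, cols] / (2.0 * sigma**2))
    return np.maximum(w, w.T)  # keep an edge if either cell lists the other


def graph_lowpass(w, signal, n_keep):
    """Project a signal on the cells onto the n_keep lowest-frequency Laplacian modes."""
    lap = np.diag(w.sum(axis=1)) - w  # combinatorial Laplacian L = D - W
    _, evecs = np.linalg.eigh(lap)  # eigenvalues ascending: low to high frequency
    coeffs = evecs.T @ signal  # graph Fourier transform
    coeffs[n_keep:] = 0.0  # discard high-frequency components
    return evecs @ coeffs  # inverse transform
```

Keeping only the lowest-frequency modes can never increase the signal's Dirichlet energy xᵀLx, which is the precise sense in which the output is smoother over the cell graph; the cutoff itself is a free parameter, echoing the open question of intrinsic dimensionality.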