
Bioinformatic Workflows

Principal component analysis (PCA)

Principal component analysis (PCA) is an unsupervised method used to simplify complex, high-dimensional data while retaining its main trends and patterns. It does so by projecting the data onto a small number of new axes (principal components) that capture most of the variance. This technique is very useful for summarizing genomic data with tens to thousands of dimensions (e.g. genes x samples). Besides PCA, other mathematical approaches exist, such as t-Distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP). Both are nonlinear methods and are often better than PCA at preserving local structure, i.e. which data points are close neighbors in the high-dimensional space. UMAP has been particularly useful for defining cell types in mixed populations from single-cell RNA-seq experiments, because it captures local relationships within groups of transcriptomes in addition to global relationships between distinct groups.
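To make the idea concrete, the following is a minimal sketch of how the first principal component could be computed for a tiny samples x genes matrix, using only the Python standard library. It is not the pipeline offered here: the expression values are made up for illustration, the function names (`center_columns`, `covariance_matrix`, `first_component`) are hypothetical, and power iteration is used only because it is simple to read. In practice a numerical library would normally be used instead.

```python
import math
import random


def center_columns(matrix):
    """Subtract each column (gene) mean so every gene has mean zero."""
    n_rows = len(matrix)
    means = [sum(column) / n_rows for column in zip(*matrix)]
    return [[value - mean for value, mean in zip(row, means)] for row in matrix]


def covariance_matrix(centered):
    """Sample covariance (genes x genes) of already centered data."""
    n_rows = len(centered)
    columns = list(zip(*centered))
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (n_rows - 1) for cj in columns]
        for ci in columns
    ]


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def first_component(cov, iterations=500, seed=0):
    """Power iteration: approximates the eigenvector of the covariance
    matrix with the largest eigenvalue, i.e. the first principal component."""
    rng = random.Random(seed)
    vector = [rng.random() + 0.1 for _ in cov]
    for _ in range(iterations):
        product = mat_vec(cov, vector)
        norm = math.sqrt(sum(p * p for p in product))
        if norm == 0.0:  # data has no variance at all
            break
        vector = [p / norm for p in product]
    # Rayleigh quotient: variance of the data along the component.
    eigenvalue = sum(v * p for v, p in zip(vector, mat_vec(cov, vector)))
    return vector, eigenvalue


# Hypothetical toy data: 4 samples (rows) x 3 genes (columns).
# Samples 1-2 and samples 3-4 are meant to mimic two conditions.
expression = [
    [10.0, 8.0, 1.0],
    [11.0, 9.0, 1.2],
    [2.0, 1.0, 1.1],
    [1.0, 2.0, 0.9],
]

centered = center_columns(expression)
cov = covariance_matrix(centered)
pc1, variance_pc1 = first_component(cov)

# Fraction of the total variance captured by the first component.
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained_ratio = variance_pc1 / total_variance

# Coordinate of each sample along the first principal component.
scores = mat_vec(centered, pc1)

print("PC1 loadings:", [round(c, 3) for c in pc1])
print("PC1 scores:", [round(s, 3) for s in scores])
print("Explained variance ratio:", round(explained_ratio, 3))
```

In this toy example the two groups of samples fall on opposite sides of zero along the first component, which is the kind of separation a PCA plot is meant to reveal. The sign of a principal component is arbitrary, so only the relative positions of the samples are meaningful.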

Available pipelines