Gambardella, Gennaro and di Bernardo, Diego (2019) A Tool for Visualization and Analysis of Single-Cell RNA-Seq Data Based on Text Mining. Frontiers in Genetics, 10. ISSN 1664-8021
Abstract
Gene expression in individual cells can now be measured for thousands of cells in a single experiment thanks to innovative sample-preparation and sequencing technologies. State-of-the-art computational pipelines for single-cell RNA-sequencing data, however, still employ computational methods that were developed for traditional bulk RNA-sequencing data, thus not accounting for the peculiarities of single-cell data, such as sparseness and zero-inflated counts. Here, we present a ready-to-use pipeline named gf-icf (gene frequency–inverse cell frequency) for normalization of raw counts, feature selection, and dimensionality reduction of scRNA-seq data for their visualization and subsequent analyses. Our work is based on a data transformation model named term frequency–inverse document frequency (TF-IDF), which has been extensively used in the field of text mining where extremely sparse and zero-inflated data are common. Using benchmark scRNA-seq datasets, we show that the gf-icf pipeline outperforms existing state-of-the-art methods in terms of improved visualization and ability to separate and distinguish different cell types.
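To make the TF-IDF analogy concrete, the following is a minimal sketch of how such a transformation might be applied to a genes-by-cells count matrix, treating each cell as a "document" and each gene as a "term". It is an illustration under stated assumptions, not the authors' implementation: the function name `gf_icf_sketch`, the smoothed logarithmic form of the inverse cell frequency, and the final per-cell L2 rescaling are all choices made here for the example, and the published gf-icf pipeline may define these steps differently.

```python
import numpy as np


def gf_icf_sketch(counts):
    """TF-IDF-style transform of a genes x cells matrix of raw counts.

    Hypothetical helper for illustration only; the exact weighting used by
    the gf-icf pipeline is not given in the abstract.
    """
    counts = np.asarray(counts, dtype=float)
    n_cells = counts.shape[1]

    # Gene frequency: each count divided by the total counts of its cell.
    cell_totals = counts.sum(axis=0, keepdims=True)
    gf = np.divide(counts, cell_totals,
                   out=np.zeros_like(counts), where=cell_totals > 0)

    # Inverse cell frequency (assumed smoothed form): genes detected in
    # few cells receive a larger weight than genes detected everywhere.
    cells_with_gene = (counts > 0).sum(axis=1, keepdims=True)
    icf = np.log((n_cells + 1) / (cells_with_gene + 1)) + 1.0

    scores = gf * icf

    # Assumed final step: rescale each cell (column) to unit L2 norm.
    norms = np.linalg.norm(scores, axis=0, keepdims=True)
    return np.divide(scores, norms,
                     out=np.zeros_like(scores), where=norms > 0)


# Toy example: 3 genes (rows) measured in 3 cells (columns).
toy_counts = np.array([[5, 0, 3],
                       [0, 2, 0],
                       [1, 1, 1]])
toy_scores = gf_icf_sketch(toy_counts)
```

In this toy matrix the second gene is detected in only one cell, so it is up-weighted relative to the third gene, which is detected in every cell. Zero counts remain zero after the transformation, so the sparsity of the data is preserved.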
| Item Type: | Article |
|---|---|
| Subjects: | Afro Asian Library > Medical Science |
| Divisions: | Faculty of Law, Arts and Social Sciences > School of Art |
| Depositing User: | Unnamed user with email support@afroasianlibrary.com |
| Date Deposited: | 09 Feb 2023 08:25 |
| Last Modified: | 24 Aug 2024 13:21 |
| URI: | http://classical.academiceprints.com/id/eprint/190 |