Database of Small Human Noncoding RNAs

DASHR v2.0 data summary

We categorized all 802 small RNA sequencing experiments integrated into DASHR v2.0 into four data collections:

  1. DASHR1-GEO data collection: consists of 197 smRNA-seq datasets from DASHR v1.0 (1);
  2. DASHR2-GEO data collection: consists of 365 new Illumina smRNA-seq experiments curated from GEO/SRA;
  3. ENCODE-GEO data collection: consists of all 72 short total RNA sequencing experiments (whole-cell) available in the ENCODE transcriptome data (GSE24565);
  4. ENCODE-portal data collection: consists of all 168 small RNA-seq experiments in the ENCODE portal.
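
As a quick consistency check, the per-collection counts in the list above can be tallied against the stated total of 802 experiments. This is only an illustrative sketch; the variable names are hypothetical and not part of DASHR.

```python
# Per-collection experiment counts, copied from the list above.
collection_sizes = {
    "DASHR1-GEO": 197,
    "DASHR2-GEO": 365,
    "ENCODE-GEO": 72,
    "ENCODE-portal": 168,
}

# Sum across the four collections and compare with the stated total.
total_experiments = sum(collection_sizes.values())
print(total_experiments)
```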

Each data collection in DASHR v2.0 is available for both the GRCh37/hg19 and GRCh38/hg38 reference genomes.

Note that the DASHR1-GEO and DASHR2-GEO sequencing datasets were generated using the TruSeq Small RNA Library Preparation Kit (Illumina), while the ENCODE datasets were generated using a different, short total RNA-seq protocol.

DASHR1-GEO data collection

This data collection includes the 197 smRNA-seq datasets from DASHR v1.0.

List of studies and samples included in the DASHR1-GEO data collection

DASHR2-GEO data collection

As in the previous DASHR release, we continued to manually collect publicly available raw Illumina smRNA-seq data from GEO and SRA (12). In total, the DASHR2-GEO collection includes 365 datasets derived from normal (non-diseased) human tissue experiments and cell types (we excluded any experiments with a sequencing read length >51 nt that corresponded to longer RNAs). Datasets in the DASHR2-GEO collection were grouped by their corresponding experimental studies into 34 tissues/cell types.
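
The read-length exclusion rule described above can be sketched as a simple filter. The curation itself was manual, so this is only an illustration of the >51 nt threshold; the record fields and sample names below are hypothetical.

```python
# Threshold stated in the text: experiments with read length >51 nt were excluded.
MAX_READ_LENGTH_NT = 51

def keep_dataset(read_length_nt: int) -> bool:
    """Return True if the read length does not exceed the threshold."""
    return read_length_nt <= MAX_READ_LENGTH_NT

# Hypothetical candidate datasets, for illustration only.
candidates = [
    {"sample": "sample_A", "read_length_nt": 36},
    {"sample": "sample_B", "read_length_nt": 51},
    {"sample": "sample_C", "read_length_nt": 76},
]

# Keep only datasets at or below the threshold.
kept = [d["sample"] for d in candidates if keep_dataset(d["read_length_nt"])]
print(kept)
```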

List of studies and samples included in the DASHR2-GEO data collection

ENCODE-GEO data collection

The raw sequencing data with read lengths of 36 bp, 76 bp, or 101 bp (FASTQ format) were obtained from the ENCODE GEO entry (accession: GSE24565).
For the ENCODE-GEO data collection, we included all 72 datasets from experiments performed on “whole cell”, as specified under the “localization” description.

List of samples included in the DASHR v2.0 ENCODE-GEO data collection

ENCODE-portal data collection

The raw sequencing data with read lengths of 36 bp, 76 bp, or 101 bp (FASTQ format) for the ENCODE project were obtained from the ENCODE data portal.

The ENCODE-portal data collection in DASHR v2.0 includes 168 datasets from the ENCODE portal (https://www.encodeproject.org/). We included all experiments from the “small RNA-seq” assay category released before August 2016, and filtered out experiments with a “not_compliant” audit status, as well as experiments restricted to a particular cell compartment.
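
The inclusion criteria above can be sketched as a single predicate. This is a minimal illustration under stated assumptions: the record fields (`assay`, `date_released`, `audit_status`, `compartment`) are hypothetical and do not reflect the actual ENCODE portal schema, and `compartment` set to `None` is assumed to mean the experiment is not restricted to a cell compartment.

```python
from datetime import date

# Cutoff implied by "released before August 2016".
RELEASE_CUTOFF = date(2016, 8, 1)

def passes_portal_filters(experiment: dict) -> bool:
    """Apply the four inclusion criteria described in the text."""
    return (
        experiment["assay"] == "small RNA-seq"
        and experiment["date_released"] < RELEASE_CUTOFF
        and experiment["audit_status"] != "not_compliant"
        and experiment["compartment"] is None  # assumed encoding: None = not compartment-restricted
    )

# Hypothetical experiment records, for illustration only.
experiments = [
    {"id": "exp_1", "assay": "small RNA-seq", "date_released": date(2015, 3, 10),
     "audit_status": "ok", "compartment": None},
    {"id": "exp_2", "assay": "small RNA-seq", "date_released": date(2016, 9, 1),
     "audit_status": "ok", "compartment": None},          # released too late
    {"id": "exp_3", "assay": "small RNA-seq", "date_released": date(2014, 6, 2),
     "audit_status": "not_compliant", "compartment": None},  # fails audit
    {"id": "exp_4", "assay": "small RNA-seq", "date_released": date(2014, 6, 2),
     "audit_status": "ok", "compartment": "nucleus"},     # compartment-restricted
    {"id": "exp_5", "assay": "polyA RNA-seq", "date_released": date(2014, 6, 2),
     "audit_status": "ok", "compartment": None},          # wrong assay category
]

selected = [e["id"] for e in experiments if passes_portal_filters(e)]
print(selected)
```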

List of samples included in the DASHR v2.0 ENCODE-portal data collection

DASHR annotation

DASHR v2.0 gene and sncRNA mature product annotations have been updated to include:
1. sncRNA gene and sncRNA mature product annotations for both GRCh37/hg19 and GRCh38/hg38;
2. annotations for non-small RNA genes and other genomic elements for both GRCh37/hg19 and GRCh38/hg38.

DASHR v2.0 contains 68,135 (GRCh37/hg19) and 65,156 (GRCh38/hg38) annotations for sncRNA genes and mature RNA products, as well as 1,469,297 (GRCh37/hg19) and 1,811,078 (GRCh38/hg38) annotations for non-small RNA genes and other genomic elements.

Summary of DASHR v2.0 annotations (XLS)