A method to remove highly abundant globin transcripts from human blood RNA samples


RNA extracted from blood samples often contains a high proportion of globin mRNA transcripts (Jang et al., 2020). Because globin RNA species are so abundant, they can mask the remaining transcripts of interest in the sequencing data. To prevent this, globin transcripts can be removed from the extracted RNA prior to library preparation.

We have found that the GLOBINclear-Human Kit (ThermoFisher Scientific, AM1980) effectively depletes globin transcripts from RNA extracted from human whole blood. The kit uses hybridisation technology with incorporated biotin/streptavidin binding to physically remove transcripts from the globin genes HBA1, HBA2, and HBB.

We compared the sequencing and analysis performance of non-depleted human blood RNA samples with that of globin-depleted equivalents. Both samples came from the same template RNA stock, and globin depletion was performed prior to library preparation with the cDNA-PCR Sequencing Kit. The samples were then sequenced, and the resulting reads were oriented, trimmed and aligned to the human genome. Aligned reads were assigned to genes and counted. In Figures 1 and 2, globin counts are consistently reduced in the globin-depleted samples, allowing the non-globin transcripts to be sequenced at higher levels.
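The globin percentage reported for each library can be derived directly from a table of per-gene read counts. The following is a minimal sketch, assuming the counts are available as a mapping of gene name to read count. The function name `globin_percentage` and the example counts are hypothetical and are not taken from the data behind the figures.

```python
# Globin genes targeted by the depletion, as named in the text.
GLOBIN_GENES = {"HBA1", "HBA2", "HBB"}


def globin_percentage(gene_counts):
    """Return the percentage of gene-assigned reads that map to globin genes.

    gene_counts: dict mapping gene name -> number of assigned reads.
    """
    total = sum(gene_counts.values())
    if total == 0:
        raise ValueError("gene_counts contains no assigned reads")
    globin = sum(n for gene, n in gene_counts.items() if gene in GLOBIN_GENES)
    return 100.0 * globin / total


# Made-up counts for illustration only (not real data).
control = {"HBA1": 300, "HBA2": 250, "HBB": 250, "GAPDH": 120, "ACTB": 80}
depleted = {"HBA1": 2, "HBA2": 1, "HBB": 2, "GAPDH": 600, "ACTB": 395}

print(f"control:  {globin_percentage(control):.1f}% globin")
print(f"depleted: {globin_percentage(depleted):.1f}% globin")
```

Only reads that were assigned to a gene enter the denominator here. Unassigned reads would need to be handled separately if they are of interest.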

Figure 1. Percentage of reads in replicate libraries that assign to globin genes HBA1, HBA2, and HBB or any other gene. Globin counts are consistently reduced from approximately 80% of reads in non-depleted control libraries to less than 1% of reads in depleted libraries.


Figure 2. Raw read length distributions for a subset of 2 million aligned reads for non-depleted (left) and globin-depleted (right) libraries. In the globin-depleted library, the three globin genes no longer dominate the sample.

Our data demonstrate that the GLOBINclear-Human Kit (ThermoFisher Scientific, AM1980) is highly effective in removing globin RNA species from blood samples. Over 99% of reads fall outside the three globin genes in the depleted samples, whereas these reads make up only approximately 20% of non-depleted samples (Figures 1 and 2). Globin depletion was found not to impact overall read count, meaning the number of reads from other transcripts is substantially increased. This enables better characterisation of transcript diversity in depleted samples and detection of lower-abundance transcripts.
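The size of this gain follows from simple arithmetic on the approximate fractions quoted above. The sketch below is illustrative only: it assumes equal total read counts in both libraries, uses the rounded figures of 80% and 1% globin reads, and the function name `non_globin_fold_change` is hypothetical.

```python
def non_globin_fold_change(globin_frac_control, globin_frac_depleted,
                           total_control=1.0, total_depleted=1.0):
    """Fold change in non-globin reads between depleted and control libraries.

    Fractions are given in the range 0-1. Totals default to equal read
    counts, matching the observation that depletion did not change the
    overall read count (an assumption of this sketch).
    """
    non_globin_control = (1.0 - globin_frac_control) * total_control
    non_globin_depleted = (1.0 - globin_frac_depleted) * total_depleted
    if non_globin_control <= 0:
        raise ValueError("control library has no non-globin reads")
    return non_globin_depleted / non_globin_control


# Approximate values from the text: ~80% globin before, ~1% after depletion.
fold = non_globin_fold_change(0.80, 0.01)
print(f"non-globin reads increase about {fold:.2f}-fold")
```

With these rounded inputs, the non-globin share of reads rises from about 20% to about 99%, which is roughly a five-fold increase in reads available for other transcripts.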

Gene count correlation between the non-depleted and depleted libraries is high (Figure 3), indicating the method does not bias quantification of the rest of the transcriptome. The globin-depleted libraries were also found to have transcript 5'-to-3' coverage similar to that of non-depleted libraries, showing that the depletion method does not result in degradation or fragmentation of RNA (Figure 4).

Figure 3. Correlation in gene counts. The globin-depleted sample is shown on the x-axis and the control on the y-axis. Shades of blue indicate areas containing 50%, 80%, 95%, and 99% of the data. The three globin genes (purple points) have much higher counts in the non-depleted sample.


Figure 4. Reads and coverage displayed in the Integrative Genomics Viewer (IGV) using an equal number of reads from each sample. (A) The HBB globin gene. HBB reads are reduced in depleted samples in comparison with non-depleted samples. (B) GAPDH. The depletion of globin reads resulted in increased counts from other genes.

References

(1) Jang JS, Berg B, Holicky E, et al. Comparative evaluation for the globin gene depletion methods for mRNA sequencing using the whole blood-derived total RNAs. BMC Genomics. 2020;21(1):890. Published 2020 Dec 11. doi:10.1186/s12864-020-07304-4

Last updated: 9/4/2023
