Article ID: 1672-6987(2024)04-0146-13; DOI: 10.16351/j.1672-6987.2024.04.020
Unsupervised Denoising of Single-Cell Multi-Omics Data Based on Multi-Head Autoencoder Network
LI Shuangyi a,b; LIU Farong a,b; REN Sheng a,b; YU Bin b*
(a. College of Mathematics and Physics; b. College of Data Science, Qingdao University of Science and Technology, Qingdao 266061, Shandong, China)
Abstract: Single-cell multi-omics sequencing is widely used in biomedical research and generates large volumes of diverse omics data. However, raw single-cell multi-omics data contain multiple types of sequencing noise and redundant information, which complicates downstream biomedical analysis. Existing denoising methods rely mainly on a single data-distribution assumption and are tailored to one omics type at a time, which severely limits their ability to process different omics data jointly. This study proposes a denoising method for single-cell multi-omics data, called scMAED (single-cell multi-omics data via a multi-head autoencoder network to denoising). The model adds a classification decoder to a multi-head autoencoder network so as to remove as much noise as possible in an unsupervised manner. First, two encoders independently learn the internal features of the multi-omics data, and their low-dimensional outputs are combined and decoded jointly. Second, the classification decoder makes no data-distribution assumption and uses predicted cell-cluster labels to feed information back to the model, so that complex noise is removed as fully as possible. Finally, principal component analysis and t-SNE are used for visualization. The model is evaluated on simulated datasets and real mouse datasets. The results show that scMAED outperforms the comparison methods used in the experiments in denoising and can substantially improve the quality of single-cell multi-omics data.
Key words: single-cell multi-omics data; deep learning; multi-head autoencoder network; noise reduction
CLC number: Q 811.4; Document code: A
Citation: LI Shuangyi, LIU Farong, REN Sheng, et al. Unsupervised denoising of single-cell multi-omics data based on multi-head autoencoder network[J]. Journal of Qingdao University of Science and Technology (Natural Science Edition), 2024, 45(4): 146-158.
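Received: 2023-10-03
Funding: National Natural Science Foundation of China (62172248); Natural Science Foundation of Shandong Province (ZR2021MF098).
About the author: LI Shuangyi (born 1998), male, master's student. *Corresponding author.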
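The architecture outlined in the abstract (two encoders, joint decoding of the combined low-dimensional features, and a classification decoder driven by predicted cluster labels) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the class name, single-layer networks, layer sizes, ReLU activation, mean-squared-error reconstruction loss, cross-entropy term, the `clf_weight` parameter, and the randomly drawn pseudo-labels are all assumptions made for illustration, and no training loop is shown.

```python
import numpy as np


def dense(rng, n_in, n_out):
    """Return (weights, bias) for one linear layer with small random weights."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


def relu(x):
    return np.maximum(x, 0.0)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


class MultiHeadAutoencoderSketch:
    """Hypothetical two-encoder autoencoder with an extra classification head."""

    def __init__(self, dim_a, dim_b, latent=8, n_clusters=4, seed=0):
        rng = np.random.default_rng(seed)
        self.enc_a = dense(rng, dim_a, latent)           # encoder for omics A
        self.enc_b = dense(rng, dim_b, latent)           # encoder for omics B
        self.dec_a = dense(rng, 2 * latent, dim_a)       # reconstructs omics A
        self.dec_b = dense(rng, 2 * latent, dim_b)       # reconstructs omics B
        self.clf = dense(rng, 2 * latent, n_clusters)    # classification decoder

    def forward(self, x_a, x_b):
        # Each encoder sees only its own omics matrix (cells x features).
        z_a = relu(x_a @ self.enc_a[0] + self.enc_a[1])
        z_b = relu(x_b @ self.enc_b[0] + self.enc_b[1])
        # The low-dimensional features are concatenated and decoded jointly.
        z = np.concatenate([z_a, z_b], axis=1)
        rec_a = z @ self.dec_a[0] + self.dec_a[1]
        rec_b = z @ self.dec_b[0] + self.dec_b[1]
        probs = softmax(z @ self.clf[0] + self.clf[1])
        return rec_a, rec_b, probs


def total_loss(model, x_a, x_b, pseudo_labels, clf_weight=1.0):
    """Reconstruction error plus cross-entropy against predicted cluster labels."""
    rec_a, rec_b, probs = model.forward(x_a, x_b)
    mse = np.mean((rec_a - x_a) ** 2) + np.mean((rec_b - x_b) ** 2)
    picked = probs[np.arange(len(pseudo_labels)), pseudo_labels]
    cross_entropy = -np.mean(np.log(picked + 1e-12))
    return mse + clf_weight * cross_entropy


# Toy usage: 5 cells, 20 features in omics A, 10 features in omics B.
rng = np.random.default_rng(1)
x_a = rng.normal(size=(5, 20))
x_b = rng.normal(size=(5, 10))
# Placeholder labels; in practice they would come from a clustering step.
pseudo_labels = rng.integers(0, 4, size=5)

model = MultiHeadAutoencoderSketch(dim_a=20, dim_b=10)
rec_a, rec_b, probs = model.forward(x_a, x_b)
loss = total_loss(model, x_a, x_b, pseudo_labels)
```

In this sketch the reconstructions `rec_a` and `rec_b` play the role of the denoised matrices, while the classification term is the only part of the loss that makes no assumption about how the counts are distributed.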