Publications
Publications by category in reverse chronological order. Generated by jekyll-scholar.
2025
- [Nat. Mach. Intell.] Rigorous integration of single-cell ATAC-seq data using regularized barycentric mapping. Shuchen Zhu†, Heyang Hua†, and Shengquan Chen*. Nature Machine Intelligence, 2025
Single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) deciphers genome-wide chromatin accessibility, providing profound insights into gene regulation mechanisms. With the rapid advance of sequencing technology, scATAC-seq data typically encompasses numerous samples from various conditions, leading to complex batch effects, thus necessitating reliable integration tools. While numerous batch integration tools exist for single-cell RNA sequencing (scRNA-seq) data, inherent data characteristic differences limit their effectiveness on scATAC-seq data. Existing integration methods for scATAC-seq data suffer from several fundamental limitations, such as disrupting the biological heterogeneity and focusing solely on low-dimensional correction, which can lead to data distortion and hinder downstream analysis. Here we propose Fountain, a deep learning framework for scATAC-seq data integration via rigorous barycentric mapping. Fountain regularizes barycentric mapping with geometric data information to achieve biological heterogeneity-preserving integration. Comprehensive experiments across diverse real-world datasets demonstrate the advantages of Fountain over existing methods in batch correction and biological conservation. Additionally, the trained Fountain model can integrate data from new batches alongside already integrated data without retraining, enabling continuous online data integration. Moreover, Fountain’s reconstruction strategy generates batch-corrected ATAC profiles, improving the capture of cellular heterogeneity and revealing cell type-specific implications such as expression enrichment analysis and partitioned heritability analysis.
- [Nat. Commun.] Triple-effect correction for Cell Painting data with contrastive and domain-adversarial learning. Chengwei Yan†, Yu Zhang†, Jiuxin Feng†, and 8 more authors. Nature Communications, 2025
Cell Painting (CP), as a high-throughput imaging technology, generates extensive cell-stained imaging data, providing unique morphological insights for biological research. However, CP data contains three types of technical effects, referred to as triple effects: batch effects and gradient-influenced row and column effects (well position effects). The interaction of various technical effects can obscure true biological signals and complicate the characterization of CP data, making correction essential for reliable analysis. Here, we propose cpDistiller, a triple-effect correction method specially designed for CP data, which leverages a pre-trained segmentation model coupled with a semi-supervised Gaussian mixture variational autoencoder utilizing contrastive and domain-adversarial learning. Through extensive qualitative and quantitative experiments across various CP profiles, we demonstrate that cpDistiller effectively corrects triple effects, especially well position effects, a challenge that no current methods address, while preserving cellular heterogeneity. Moreover, cpDistiller effectively captures system-level phenotypic responses to genetic perturbations and reliably infers gene functions and interactions both when combined with scRNA-seq data and independently. cpDistiller also excels at identifying gene and compound targets, which is a critical step in drug discovery and broader biological research.
- [Briefings Bioinf.] Graph neural networks for single-cell omics data: a review of approaches and applications. Sijie Li†, Heyang Hua†, and Shengquan Chen*. Briefings in Bioinformatics, 2025
Rapid advancement of sequencing technologies now allows for the utilization of precise signals at single-cell resolution in various omics studies. However, the massive volume, ultra-high dimensionality, and high sparsity nature of single-cell data have introduced substantial difficulties to traditional computational methods. The intricate non-Euclidean networks of intracellular and intercellular signaling molecules within single-cell datasets, coupled with the complex, multimodal structures arising from multi-omics joint analysis, pose significant challenges to conventional deep learning operations reliant on Euclidean geometries. Graph neural networks (GNNs) have extended deep learning to non-Euclidean data, allowing cells and their features in single-cell datasets to be modeled as nodes within a graph structure. GNNs have been successfully applied across a broad range of tasks in single-cell data analysis. In this survey, we systematically review 107 successful applications of GNNs and their six variants in various single-cell omics tasks. We begin by outlining the fundamental principles of GNNs and their six variants, followed by a systematic review of GNN-based models applied in single-cell epigenomics, transcriptomics, spatial transcriptomics, proteomics, and multi-omics. In each section dedicated to a specific omics type, we have summarized the publicly available single-cell datasets commonly utilized in the articles reviewed in that section, totaling 77 datasets. Finally, we summarize the potential shortcomings of current research and explore directions for future studies. We anticipate that this review will serve as a guiding resource for researchers to deepen the application of GNNs in single-cell omics.
- [Front. Comput. Sci.] Facilitating single-cell chromatin accessibility research with a user-friendly database. Heyang Hua†, Sijie Li†, Haitian Liang, and 1 more author. Frontiers of Computer Science, 2025
Single-cell chromatin accessibility (scATAC-seq) technology has emerged as a powerful tool for studying gene regulation and cellular heterogeneity. However, the analysis of scATAC-seq data is often complex and requires specialized knowledge and tools. To address this challenge, we have developed a user-friendly database called scATACdb, which provides a comprehensive resource for scATAC-seq data analysis. The database includes a wide range of features, such as data visualization, quality control, and functional annotation, making it accessible to researchers with varying levels of expertise. In this paper, we present the design and implementation of scATACdb, highlighting its key features and functionalities. We also demonstrate its utility through several case studies, showcasing how it can facilitate the analysis of scATAC-seq data and enhance our understanding of gene regulation. We believe that scATACdb will be a valuable resource for researchers in the field of single-cell genomics and will contribute to advancing our knowledge of gene regulation at the single-cell level.
2024
- [Protein Cell] MultiKano: an automatic cell type annotation tool for single-cell multi-omics data based on Kolmogorov–Arnold network and data augmentation. Siyu Li†, Xinhao Zhuang†, Songbo Jia†, and 8 more authors. Protein & Cell, 2024
The breakthrough in single-cell omics sequencing technologies has provided an unprecedented level of detail, allowing biologists to explore the patterns of gene activity, and the dynamics of cellular function at the resolution of individual cells. At the forefront of this revolution is single-cell RNA sequencing (scRNA-seq), which measures gene expression of individual cells to characterize transcriptional heterogeneity. Additionally, other single-cell assays, such as single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq), shed light on cellular heterogeneity at the epigenetic level, enhancing our understanding of transcriptional regulation. However, while single-omics sequencing techniques provide valuable insights, they may not capture the intricate relationships between biomolecules in single cells due to their restriction to only one type of omics data. To bridge this gap, recent advancements have led to the development of several joint profiling methods (Cao et al., 2018; Chen et al., 2019; Luecken et al., 2021; Ma et al., 2020), which enable the simultaneous measurement of gene expression and chromatin accessibility, offering a holistic view of the gene regulatory landscape in individual cells.
- [INSC] scCrab: A Reference-Guided Cancer Cell Identification Method based on Bayesian Neural Networks. Heyang Hua†, Wenxin Long†, Yan Pan†, and 4 more authors. Interdisciplinary Sciences: Computational Life Sciences, 2024
Single-cell RNA sequencing (scRNA-seq) has revolutionized our understanding of cellular heterogeneity and gene expression dynamics. However, the analysis of scRNA-seq data is often complicated by the presence of noise and batch effects, which can obscure the underlying biological signals. To address this challenge, we have developed a novel method called scCrab, which utilizes a reference-guided approach based on Bayesian neural networks to identify cancer cells in scRNA-seq datasets. By leveraging prior knowledge from reference datasets, scCrab can effectively distinguish cancer cells from normal cells, even in the presence of noise and batch effects. We demonstrate the effectiveness of scCrab through extensive experiments on both simulated and real-world datasets, showing that it outperforms existing methods in terms of accuracy and robustness. Our method provides a powerful tool for analyzing scRNA-seq data and has the potential to advance our understanding of cancer biology.