Publications

Group highlights

At the end of this page, you can find the full list of publications. All papers are also available on Google Scholar.

The nuclear receptor ERR cooperates with the cardiogenic factor GATA4 to orchestrate transcriptional control of cardiomyocyte differentiation

We find that estrogen-related receptor (ERR) signaling is necessary for the induction of genes involved in mitochondrial and cardiac-specific contractile processes during human induced pluripotent stem cell-derived cardiomyocyte (hiPSC-CM) differentiation.

Sakamoto, T., Batmanov, K., Wan, S., Guo, Y., Lai, L., Vega, R.B. & Kelly, D.P*.

Nature Communications 13, 1991 (2022)

SHARP: hyperfast and accurate processing of single-cell RNA-seq data via ensemble random projection

To process large-scale single-cell RNA-sequencing (scRNA-seq) data effectively without excessive distortion during dimension reduction, we present SHARP, an ensemble random projection-based algorithm that is scalable to clustering 10 million cells. Comprehensive benchmarking tests on 17 public scRNA-seq datasets demonstrate that SHARP outperforms existing methods in terms of speed and accuracy.

Wan, S., Kim, J., & Won, K. J*.

Genome Research 30 (2), 205-213 (2020)

FUEL-mLoc: feature-unified prediction and explanation of multi-localization of cellular proteins in multiple organisms

We present FUEL-mLoc, an interpretable and efficient web server for Feature-Unified prediction and Explanation of multi-Localization of cellular proteins in multiple organisms.

Wan, S*., Mak, M. W*., & Kung, S. Y.

Bioinformatics 33 (5), 749-750 (2017)

Ensemble linear neighborhood propagation for predicting subchloroplast localization of multi-location proteins

This paper proposes LNP-Chlo, a multi-label predictor based on ensemble linear neighborhood propagation (LNP), which leverages hybrid sequence-based feature information from both labeled and unlabeled proteins to predict the localization of both single- and multi-label chloroplast proteins. Experimental results on a stringent benchmark dataset and a novel independent dataset suggest that LNP-Chlo performs at least 6% (absolute) better than state-of-the-art predictors. This paper also demonstrates that ensemble LNP significantly outperforms LNP based on individual features.

Wan, S*., Mak, M. W*., & Kung, S. Y.

Journal of Proteome Research 15 (12), 4755-4762 (2016)

Mem-mEN: predicting multi-functional types of membrane proteins by interpretable elastic nets

This paper proposes an efficient predictor, namely Mem-mEN, which can produce sparse and interpretable solutions for predicting membrane proteins with single- and multi-label functional types.

Wan, S*., Mak, M. W*., & Kung, S. Y.

IEEE/ACM Transactions on Computational Biology and Bioinformatics 13(4), 706–718 (2016)

mGOASVM: Multi-label protein subcellular localization based on gene ontology and support vector machines

We propose an efficient multi-label predictor, mGOASVM, for predicting the subcellular localization of multi-location proteins. On a virus dataset and a plant dataset, mGOASVM achieves an actual accuracy of 88.9% and 87.4%, respectively, significantly higher than the accuracies achieved by state-of-the-art predictors such as iLoc-Virus and iLoc-Plant.

Wan, S., Mak, M. W*., & Kung, S. Y.

BMC Bioinformatics 13, 1-16 (2012)

 

Full list of publications

Books

  1. Machine learning for protein subcellular localization prediction
    Wan, S., & Mak, MW.
    De Gruyter, ISBN 978-1-5015-0150-0, 2015, Germany

  2. Bioinformatics and machine learning for cancer biology
    Wan, S., Fan, Y., Jiang, C., & Li, S.
    MDPI, ISBN 978-3-0365-4814-2, 2022, Switzerland. (edited book)

Journal Articles (*: corresponding author; #: co-first author; underlined names are lab members)

  1. Functional Connectivity Alterations in Cocaine Use Disorder: Insights from the Triple Network Model and the Addictions Neuroclinical Assessment Framework
    Xu, Z., Liu, R., Azzam, M., Wan, S., Wang, J*.
    bioRxiv 2024.11.12.623073 (2024)

  2. SAMP: Identifying antimicrobial peptides by an ensemble learning model based on proportionalized split amino acid composition
    Feng, J., Sun, M., Liu, C., Zhang, W., Xu, C., Wang, J., Wang, G., Wan, S*.
    Briefings in Functional Genomics (accepted) (2024)

  3. A review of artificial intelligence-based brain age estimation and its applications for related diseases
    Azzam, M., Xu, Z., Liu, R., Li, L., Soh, K.M., Challagundla, K.B., Wan, S., Wang, J*.
    Briefings in Functional Genomics, elae042 (2024)

  4. WIMOAD: Weighted integration of multi-omics data for Alzheimer’s Disease (AD) diagnosis
    Xiao, H., Wang, J., Wan, S*.
    bioRxiv 2024.09.25.614862 (2024)

  5. RanBALL: An Ensemble Random Projection Model for Identifying Subtypes of B-cell Acute Lymphoblastic Leukemia
    Li, L., Xiao, H., Wu, X., Tang, Z., Khoury, J., Wang, J., Wan, S*.
    bioRxiv 2024.09.24.614777 (2024)

  6. A prognostic framework for predicting lung signet ring cell carcinoma via a machine learning based cox proportional hazard model
    Chen, H., Xu, Y., Lin, H., Wan, S*., Luo, L*.
    Journal of Cancer Research and Clinical Oncology 150(364), 1-15 (2024)

  7. Multi-Omics based artificial intelligence for cancer research
    Li, L.#, Sun, M.#, Wang, J., Wan, S*.
    Advances in Cancer Research 163, 303-356 (2024)

  8. The context-dependent epigenetic and organogenesis programs determine 3D vs. 2D cellular fitness of MYC-driven cancer
    Fang, J.#, Singh, S.#, Wells, B., Wu, Q., Jin, H., Janke, L., Wan, S., Steele, J., Connelly, J., Murphy, A., Wang, R., Davidoff, A., Ashcroft, M., Pruett-Miller, S., Yang, J.
    Research Square 10.21203/rs.3.rs-4390765/v1 (2024)

  9. Artificial intelligence for omics data analysis
    Ahmed, Z.#, Wan, S.#, Zhang, F.#, & Zhong, W.#
    BMC Methods 1, 4 (2024)

  10. A review for artificial intelligence based protein subcellular localization
    Xiao, H., Zou, Y., Wang, J., & Wan, S*.
    Biomolecules 14, 409 (2024)

  11. Procyanidin alleviates ferroptosis and inflammation of LPS-induced RAW264.7 cell via the Nrf2/HO-1 pathway
    Zeng, J., Weng, Y., Lai, T., Chen, L., Li, Y., Huang, Q., Zhong, S., Wan, S*., & Luo, L*.
    Naunyn-Schmiedeberg’s Archives of Pharmacology (2023)

  12. Editorial: Bioinformatics analysis of omics data for biomarker identification in clinical research, Volume II
    Sun, M.#, Li, L.#, Xiao, H.#, Feng, J.#, Wang, J., & Wan, S*.
    Frontiers in Genetics 14, 1256468 (2023)

  13. USP1 expression driven by EWS::FLI1 transcription factor stabilizes Survivin and mitigates replication stress in Ewing sarcoma
    Mallard, H. J., Wan, S., Nidhi, P., Hanscom-Trofy, Y. D., Mohapatra, B., Woods, N. T., Lopez-Guerrero, J. A., Llombart-Bosch, A., Machado, I., Scotlandi, K., Kreiling, N. F., Perry, M. C., Mirza, S., Coulter, D. W., Band, V., Band, H., & Ghosal, G.
    Molecular Cancer Research, MCR-23-0323 (2023)

  14. Embedded bioprinting of breast tumor cells and organoids using low concentration collagen based bioinks
    Shi, W., Mirza, S., Kuss, M., Liu, B., Hartin, A., Wan, S., Kong, Y., Mohapatra, B., Krishnan, M., Band, H., Band, V., & Duan, B.
    Advanced Healthcare Materials e2300905 (2023)

  15. Editorial: Ferroptosis as a novel therapeutic target for inflammation-related diseases
    Liang, Y., Su, Z., Mao, X., Wan, S*., & Luo, L*.
    Frontiers in Pharmacology 14, 1152326 (2023)

  16. Editorial: Single cell meets metabolism and cancer biology
    Wang, J., & Wan, S*.
    Frontiers in Oncology 13, 1125186 (2023)

  17. Etiology of oncogenic fusions in 5,190 childhood cancers and its clinical and therapeutic implication
    Liu, Y., Klein, J., Bajpai, R., Dong, L., Tran, Q., Kolekar, P., Smith, J. L., Ries, R. E., Huang, B. J., Wang, Y. C., Alonzo, T. A., Tian, L., Mulder, H. L., Shaw, T. I., Ma, J., Walsh, M. P., Song, G., Westover, T., Autry, R. J., Gout, A. M., Wheeler, D.A., Wan, S., Wu, G., Yang, J.J., Evans, W.E., Loh, M., Easton, J., Zhang, JH., Klco, J.M., & Ma, X.
    Nature Communications 14 (1), 1739 (2023)

  18. The nuclear receptor ERR cooperates with the cardiogenic factor GATA4 to orchestrate transcriptional control of cardiomyocyte differentiation
    Sakamoto, T., Batmanov, K., Wan, S., Guo, Y., Lai, L., Vega, R.B. & Kelly, D.P.
    Nature Communications 13, 1991 (2022)

  19. Alzheimer’s disease-associated U1 snRNP splicing dysfunction causes neuronal hyperexcitability and cognitive impairment
    Chen, P. C., Han, X., Shaw, T. I., Fu, Y., Sun, H., Niu, M., Wang, Z., Jiao, Y., Teubner, B. J. W., Eddins, D., Beloate, L. N., Bai, B., Mertz, J., Li, Y., Cho, J. H., Wang, X., Wu, Z., Liu, D., Poudel, S., Yuan, Z. F., Mancieri, A., Low, J., Lee, H.M., Patton, M., Earls, L., Stewart, E., Vogel, P., Wan, S., Serrano, G., Beach, T., Dyer, M., Smeyne, R., Moldoveanu, T., Chen, T., Wu, G., Zakharenko, S., Yu, G., & Peng, J.
    Nature Aging 2(10), 923–940 (2022)

  20. A sequence obfuscation method for protecting personal genomic privacy
    Wan, S*., & Wang, J*.
    Frontiers in Genetics 13, 876686 (2022)

  21. Genomic profiling identifies genes and pathways dysregulated by HEY1–NCOA2 fusion and shines a light on mesenchymal chondrosarcoma tumorigenesis
    Qi, W., Rosikiewicz, W., Yin, Z., Xu, B., Jiang, H., Wan, S., Fan, Y., Wu, G., & Wang, L.
    The Journal of Pathology 257 (5), 579-592 (2022)

  22. Special issue on bioinformatics and machine learning for cancer biology
    Wan, S*., Jiang, C., Li, S., & Fan, Y.
    Biology 11 (3), 361 (2022)

  23. Identification of a modular super-enhancer in murine retinal development
    Honnell, V., Norrie, J. L., Patel, A. G., Ramirez, C., Zhang, J., Lai, Y. H., Wan, S., & Dyer, M. A.
    Nature Communications 13 (1), 253 (2022)

  24. Improving bulk RNA-seq classification by transferring gene signature from single cells in acute myeloid leukemia
    Wang, R., Zheng, X., Wang, J., Wan, S., Song, F., Wong, M. H., Leung, K. S., & Cheng, L.
    Briefings in Bioinformatics 23 (2), bbac002 (2022)

  25. Editorial: Transcriptional regulation in metabolism and immunology
    Jiang, C., Wan, S., Hu, P., Li, Y., & Li, S.
    Frontiers in Genetics 13, 845697 (2022)

  26. Targeting the spliceosome through RBM39 degradation results in exceptional responses in high-risk neuroblastoma models
    Singh, S., Quarni, W., Goralski, M., Wan, S., Jin, H., Van de Velde, L. A., Fang, J., Wu, Q., Abu-Zaid, A., Wang, T., Singh, R., Craft, D., Fan, Y., Confer, T., Johnson, M., Akers, W. J., Wang, R., Murray, P. J., Thomas, P. G., Nijhawan, D., Davidoff, A.M., & Yang, J.
    Science Advances 7 (47), eabj5405 (2021)

  27. YAP/TAZ maintain the proliferative capacity and structural organization of radial glial cells during brain development
    Lavado, A., Gangwar, R., Paré, J., Wan, S., Fan, Y., & Cao, X.
    Developmental Biology 480, 39-49 (2021)

  28. SHARP: hyperfast and accurate processing of single-cell RNA-seq data via ensemble random projection
    Wan, S., Kim, J., & Won, K.J.
    Genome Research 30 (2), 205-213 (2020)

  29. A critical role for estrogen-related receptor signaling in cardiac maturation
    Sakamoto, T., Matsuura, T. R., Wan, S., Ryba, D. M., Kim, J. U., Won, K. J., Lai, L., Petucci, C., Petrenko, N., Musunuru, K., Vega, R. B., & Kelly, D. P.
    Circulation Research 126 (12), 1685-1702 (2020)

  30. MondoA drives muscle lipid accumulation and insulin resistance
    Ahn, B., Wan, S., Jaiswal, N., Vega, R. B., Ayer, D. E., Titchenell, P. M., Han, X., Won, K. J., & Kelly, D. P.
    JCI Insight 4 (15) (2019)

  31. Predicting subcellular localization of multi-location proteins by improving support vector machines with adaptive-decision schemes
    Wan, S*., & Mak, M.W.*
    International Journal of Machine Learning and Cybernetics 9, 399-411 (2018)

  32. Is congenital amusia a disconnection syndrome? A study combining tract-and network-based analysis
    Wang, J., Zhang, C., Wan, S., & Peng, G.
    Frontiers in Human Neuroscience 11, 473 (2017)

  33. Gram-LocEN: Interpretable prediction of subcellular multi-localization of Gram-positive and Gram-negative bacterial proteins
    Wan, S*., Mak, M.W.*, & Kung, S.Y.
    Chemometrics and Intelligent Laboratory Systems 162, 1-9 (2017)

  34. FUEL-mLoc: feature-unified prediction and explanation of multi-localization of cellular proteins in multiple organisms
    Wan, S*., Mak, M.W.*, & Kung, S.Y.
    Bioinformatics 33 (5), 749-750 (2017)

  35. Transductive learning for multi-label protein subchloroplast localization prediction
    Wan, S*., Mak, M.W., & Kung, S.Y.
    IEEE/ACM Transactions on Computational Biology and Bioinformatics 14(1), 212–224 (2017)

  36. Ensemble linear neighborhood propagation for predicting subchloroplast localization of multi-location proteins
    Wan, S*., Mak, M.W.*, & Kung, S.Y.
    Journal of Proteome Research 15 (12), 4755-4762 (2016)

  37. Benchmark data for identifying multi-functional types of membrane proteins
    Wan, S*., Mak, M.W.*, & Kung, S.Y.
    Data in Brief 8, 105-107 (2016)

  38. Mem-ADSVM: A two-layer multi-label predictor for identifying multi-functional types of membrane proteins
    Wan, S*., Mak, M.W.*, & Kung, S.Y.
    Journal of Theoretical Biology 398, 32-42 (2016)

  39. Sparse regressions for predicting and interpreting subcellular localization of multi-label proteins
    Wan, S*., Mak, M.W.*, & Kung, S.Y.
    BMC Bioinformatics 17 (1), 1-17 (2016)

  40. Mem-mEN: predicting multi-functional types of membrane proteins by interpretable elastic nets
    Wan, S*., Mak, M.W., & Kung, S.Y.
    IEEE/ACM Transactions on Computational Biology and Bioinformatics 13(4), 706–718 (2016)

  41. mLASSO-Hum: a LASSO-based interpretable human-protein subcellular localization predictor
    Wan, S*., Mak, M.W., & Kung, S.Y.
    Journal of Theoretical Biology 382, 223-234 (2015)

  42. mPLR-Loc: An adaptive-decision multi-label classifier based on penalized logistic regression for protein subcellular localization prediction
    Wan, S., Mak, M.W., & Kung, S.Y.
    Analytical Biochemistry 473, 14-27 (2015)

  43. R3P-Loc: A compact multi-label predictor using ridge regression and random projection for protein subcellular localization
    Wan, S., Mak, M.W., & Kung, S.Y.
    Journal of Theoretical Biology 360, 34-45 (2014)

  44. HybridGO-Loc: mining hybrid features on gene ontology for predicting subcellular localization of multi-location proteins
    Wan, S., Mak, M.W., & Kung, S.Y.
    PloS One 9 (3), e89545 (2014)

  45. Semantic similarity over gene ontology for multi-label protein subcellular localization
    Wan, S*., Mak, M.W., & Kung, S.Y.
    Engineering 5 (10), 68 (2013)

  46. GOASVM: A subcellular location predictor by incorporating term-frequency gene ontology into the general form of Chou’s pseudo-amino acid composition
    Wan, S*., Mak, M.W., & Kung, S.Y.
    Journal of Theoretical Biology 323, 40-48 (2013)

  47. mGOASVM: Multi-label protein subcellular localization based on gene ontology and support vector machines
    Wan, S., Mak, M.W. & Kung, S.Y.
    BMC Bioinformatics 13, 1-16 (2012)

Conference Papers

  1. Processing millions of single cells by SHARP
    Wan, S., Kim, J. & Won, KJ.
    The 11th ACM Conference on Bioinformatics, Computational Biology and Health Informatics (ACM BCB 2020), virtual online, Sep (2020)

  2. Hyper-fast and accurate clustering of ultra-large-scale single-cell data with ensemble random projection
    Wan, S., Kim, J., Fan, Y., & Won, KJ.
    The 2020 International Conference on Machine Learning (ICML) Workshop on Computational Biology, virtual online, Jul (2020)

  3. Protecting genomic privacy by a sequence-similarity based obfuscation method
    Wan, S., Mak, M.W., & Kung, S.Y.
    arXiv preprint arXiv:1708.02629 (2017)

  4. Ratio utility and cost analysis for privacy preserving subspace projection
    Wan, S., & Kung, S.Y.
    arXiv preprint arXiv:1702.07976 (2017)

  5. Ensemble random projection for multi-label classification with application to protein subcellular localization
    Wan, S., Mak, MW., Zhang, B., Wang, Y., & Kung, SY.
    2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’14), Florence, Italy, May 2014, pp. 5999-6003 (2014)

  6. An ensemble classifier with random projection for predicting multi-label protein subcellular localization
    Wan, S., Mak, MW., Zhang, B., Wang, Y., & Kung, SY.
    The 2013 IEEE International Conference on Bioinformatics and Biomedicine (BIBM’2013), Shanghai, China, Dec. 2013, pp. 35-42 (2013)

  7. Adaptive thresholding for multi-label SVM classification with application to protein subcellular localization prediction
    Wan, S., Mak, MW., & Kung, SY.
    2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’13), Vancouver, Canada, May 2013, pp. 3547-3551 (2013)

  8. GOASVM: Protein subcellular localization prediction based on Gene ontology annotation and SVM
    Wan, S., Mak, MW., & Kung, SY.
    2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’12), Kyoto, Japan, Mar. 2012, pp. 2229-2232 (2012)

  9. Protein subcellular localization prediction based on profile alignment and Gene Ontology
    Wan, S., Mak, MW. & Kung, SY.
    2011 IEEE International Workshop on Machine Learning for Signal Processing (MLSP’11), Beijing, China, Sep. 2011, pp. 1-6 (2011)

  10. A method of continuous data flow embedded within speech signals
    Wan, S., Yao, C., Hu, Y., Zhang, G.
    The 2nd International Conference on Signal Acquisition and Processing (ICSAP’10), Bangalore, India, Feb. 2010, pp. 362-365 (2010)

Conference Abstracts

  1. RanBALL: Identifying B-cell acute lymphoblastic leukemia subtypes based on an ensemble random projection model, Cancer Research, vol. 84 (6_Supplement), pp. 4907-4907.
    Li, L., Xiao, H., Khoury, J. D., Wang, J., & Wan, S*.
    AACR Annual Meeting 2024, San Diego, CA, Apr. 2024

  2. Reducing health disparities for prostate adenocarcinoma by integrating multi-omics data via a multi-modal transfer learning approach, Cancer Research, vol. 84 (6_Supplement), pp. 4800-4800.
    Li, L., Wang, J., & Wan, S*.
    AACR Annual Meeting 2024, San Diego, CA, Apr. 2024

  3. SAMP: An accurate ensemble model based on proportionalized split amino acid composition for identifying antimicrobial peptides
    Feng, J., Sun, M., Zhang, W., Wang, G., & Wan, S*.
    Antimicrobial Peptides, Yesterday, Today and Tomorrow 2023, Omaha, NE, Oct (2023)

  4. B-cell acute lymphoblastic leukemia subtype identification with an ensemble random projection-based machine learning model
    Li, L., Xiao, H., & Wan, S.
    CHRI Scientific Conference 2023, Omaha, NE, Nov (2023)

  5. Integrating multi-omics data by a multi-modal transfer learning model to reduce healthcare disparities for kidney renal clear cell carcinoma
    Li, L., & Wan, S.
    CHRI Scientific Conference 2023, Omaha, NE, Nov (2023)

  6. RanBALL: An ensemble random projection-based model for identifying B-cell acute lymphoblastic leukemia subtypes
    Li, L., Xiao, H., & Wan, S.
    PCRG symposium 2023, Omaha, NE, Aug (2023)

  7. RNA-seq and ChIP-seq profiling identifies genes and pathways dysregulated by HEY1-NCOA2 fusion and shed a light on mesenchymal chondrosarcoma tumorigenesis
    Qi, W., Rosikiewicz, W., Yin, Z., Xu, B., Wan, S., Fan, Y., Wu, G., & Wang, L.
    AACR Annual Meeting 2021, Philadelphia, PA, Apr (2021)

  8. The estrogen-related receptor (ERR) drives cardiac myocyte maturation in cooperation with GATA4
    Sakamoto, T., Wan, S., Batmanov, K., & Kelly, DP.
    Circulation Research, 127(A222-A222) (2020)

  9. Hyper-fast and accurate clustering of ultra-large-scale single-cell data with ensemble random projection
    Wan, S., Kim, J., Fan, Y., & Won, KJ.
    Cell Symposia, The Conceptual Power of Single-Cell Biology, San Francisco, CA, USA, Apr (2020). (postponed due to COVID-19 outbreak)

  10. The estrogen-related receptor coordinates transcription of genes involved in mitochondrial and contractile maturation in human induced pluripotent stem cell-derived cardiac myocytes
    Sakamoto, T., Wan, S., Won, K.J., & Kelly, D.P.
    Circulation, vol. 140 (Suppl_1), pp. A11803-A11803 (2019). (presented at the American Heart Association Scientific Sessions (AHA2019), Philadelphia, PA, USA, Nov 2019)

  11. Estrogen-related receptor signaling is critical for postnatal cardiac maturation
    Matsuura, T.R., Sakamoto, T., Ryba, D.M., Wan, S., & Kelly, D.P.
    Circulation, vol. 140 (Suppl_1), pp. A11803-A11803 (2019). (presented at the American Heart Association Scientific Sessions (AHA2019), Philadelphia, PA, USA)

  12. MondoA mediates myocyte lipid accumulation and insulin resistance driven by chronic nutrient excess
    Ahn, B., Wan, S., Won, K.J., Jaiswal, N., Titchenell, P.M., & Kelly, D.P.
    American Diabetes Association’s 79th Scientific Sessions (ADA2019), San Francisco, CA, USA, Jun (2019) (oral)

  13. Hyper-fast and accurate processing of large-scale single-cell transcriptomics data via ensemble random projection
    Wan, S., Kim, J., & Won, K.J.
    RECOMB/ISCB Conference on Regulatory & Systems Genomics with DREAM Challenges (RSG DREAM 2018), New York, USA, Dec (2018)