Identification of candidate genes in ischemic cardiomyopathy by gene expression omnibus database

Background: Ischemic cardiomyopathy (ICM) is one of the most common causes of death worldwide. This study aimed to identify candidate genes for ICM.
Methods: We identified differentially expressed genes (DEGs) in ICM compared with healthy controls. Based on these DEGs, we performed functional annotation and constructed protein-protein interaction (PPI) and transcriptional regulatory networks. The expression of selected candidate genes was confirmed using a published dataset and quantitative real-time polymerase chain reaction (qRT-PCR).
Results: From three Gene Expression Omnibus (GEO) datasets, we obtained 1081 DEGs (578 up-regulated and 503 down-regulated genes) between ICM and healthy controls. Functional annotation analysis revealed that cardiac muscle contraction, hypertrophic cardiomyopathy, arrhythmogenic right ventricular cardiomyopathy and dilated cardiomyopathy were significantly enriched pathways in ICM. SNRPB, BLM, RRS1, CDK2, BCL6, BCL2L1, FKBP5, IPO7, TUBB4B and ATP1A1 were considered hub proteins. PALLD, THBS4, ATP1A1, NFASC, FKBP5, ECM2 and BCL2L1 were the top six transcription factors (TFs) with the most downstream genes. The expression of six DEGs (MYH6, THBS4, BCL6, BLM, IPO7 and SERPINA3) was consistent with our integration analysis and the GSE116250 validation results.
Conclusions: The candidate DEGs and TFs may be related to the ICM process. This study provides a novel perspective for understanding the mechanism of ICM and developing new therapeutic approaches.

technology analysis, bioinformatics have become the most frequently used means of identifying potential biomarkers in a variety of diseases [5][6][7]. Many fetal and immediate-early genes are reportedly deregulated in the ischemic heart [8]. Several groups have performed global gene expression profiling to identify key genes underlying the mechanisms of ICM [9][10][11]. For example, Qiao et al. reported that differentially expressed genes (DEGs) and transcription factors (TFs) play pivotal roles in ICM progression by regulating gene expression [12]. Li et al. found that functional annotation and pathway analysis of DEGs was conducive to further study of the interactions between DEGs in ICM [13]. Wang et al. found that PHLDA1 might be a novel molecular marker for ICM [14]. Previous studies identified changes in the protein levels of TFs including GATA4, NFAT1, MEF2C, CSX/NKX2-5, NF-κB, STAT-3 and AP-1 in cardiomyopathy and cardiopathy models [15][16][17][18][19]. However, the molecular mechanism that coordinates transcription in ICM is not completely understood. Therefore, it is essential to elucidate the pathogenic mechanism and develop new diagnostic biomarkers.
Gene microarray data have become an effective means of searching for DEGs in multiple diseases, which helps to reveal underlying mechanisms. Genes that cause complex diseases are often involved in common biological processes in various biological networks [20,21]. Analyzing biological data at the level of network modules can therefore improve the understanding of a disease.
Here, an integrated analysis of multiple GEO datasets was performed to identify DEGs between ICM and healthy controls. Bioinformatics methods were applied to obtain ICM-related pathways and TFs. The purpose of our study was to better understand the molecular events and pathways of ICM and to support the development of new therapeutic approaches for ICM.

Methods
The analysis of microarray data
The expression profiles of ICM and healthy controls were downloaded from the GEO database (http://www.ncbi.nlm.nih.gov/geo) with the keywords ("ischemia"[MeSH Terms] OR ischemic[All Fields]) AND "cardiomyopathies"[MeSH Terms] OR cardiomyopathy[All Fields]. Three datasets, GSE46224 [22], GSE52601 [23] and GSE5406 [24], were selected for analysis according to the following criteria: (1) the dataset was a whole-genome mRNA expression profile obtained by array; (2) the dataset was derived from left ventricular tissue samples of ICM patients and healthy controls; (3) the data were either normalized or original.

Identification of DEGs
MetaMA, an R package, was applied to combine data from the three GEO datasets. The Benjamini-Hochberg procedure was used to adjust the P-values for the false discovery rate (FDR). DEGs were selected using the criterion FDR < 0.05. Hierarchical clustering analysis of the top 100 DEGs was performed in R.
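The FDR-based selection step can be illustrated with a minimal Python sketch of the Benjamini-Hochberg adjustment. This is not the MetaMA implementation used in the study; the function names, the gene labels and the P-values below are hypothetical and serve only to show how an FDR < 0.05 cutoff is applied.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values, in the input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that adjusted values
    # never increase as the raw P-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(gene_p_values, fdr_cutoff=0.05):
    """Keep genes whose adjusted P-value is below the cutoff."""
    genes = list(gene_p_values)
    fdr = benjamini_hochberg([gene_p_values[g] for g in genes])
    return {g: q for g, q in zip(genes, fdr) if q < fdr_cutoff}


# Hypothetical P-values, for illustration only (not data from this study).
example = {"geneA": 0.001, "geneB": 0.008, "geneC": 0.039,
           "geneD": 0.041, "geneE": 0.6}
# geneA and geneB pass; geneC and geneD have raw P < 0.05 but are
# rejected after adjustment.
print(sorted(select_degs(example)))
```

The example shows why the adjusted value rather than the raw P-value is used as the selection criterion when many genes are tested at once.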

Functional annotation
Gene Ontology (GO) classification and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed using GeneCodis (http://genecodis.cnb.csic.es/analysis). Terms with FDR < 0.05 were considered significant.
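The idea behind term enrichment can be sketched with a one-sided hypergeometric test, a common choice for over-representation analysis. The text does not state which test GeneCodis applied here, so the test itself, the function name and the counts below are assumptions made for illustration.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_in_term, n_degs, n_overlap):
    """P(X >= n_overlap) when n_degs genes are drawn without replacement
    from n_background genes, n_in_term of which are annotated to the term."""
    upper = min(n_in_term, n_degs)
    total = comb(n_background, n_degs)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_degs - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 background genes, 5 in the pathway,
# 4 DEGs, 3 of which fall in the pathway.
p = hypergeom_enrichment_p(20, 5, 4, 3)
print(p)
```

In a full analysis, one such P-value would be computed per GO term or KEGG pathway and then adjusted for multiple testing before applying the FDR < 0.05 threshold.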

ICM-specific protein-protein interaction (PPI) network
The top 50 DEGs in ICM were used to construct the PPI network with the Biological General Repository for Interaction Datasets (BioGRID) (http://thebiogrid.org/), and the network was visualized with Cytoscape (3.6.1) (http://www.cytoscape.org/). Nodes represent proteins and edges represent the interactions between them.
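Hub proteins in such a network are often ranked by degree, that is, the number of distinct interaction partners. The text does not state the criterion used in this study, so the following Python sketch assumes degree-based ranking; the function name and the protein labels P1 to P5 are hypothetical.

```python
from collections import defaultdict


def rank_hubs(edges, top_n=10):
    """Rank proteins by the number of distinct interaction partners."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)
    # Sort by descending degree, then by name for a stable order.
    ranked = sorted(neighbors, key=lambda p: (-len(neighbors[p]), p))
    return [(p, len(neighbors[p])) for p in ranked[:top_n]]


# Hypothetical interaction list (placeholder names, not BioGRID data).
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
         ("P2", "P3"), ("P3", "P1"), ("P5", "P5")]
print(rank_hubs(edges, top_n=2))
```

Storing partners in sets means that an interaction reported in both directions is counted once.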

ICM-specific transcriptional regulatory networks
The promoters of the top 20 up-regulated and top 20 down-regulated DEGs were obtained from UCSC (http://genome.ucsc.edu). The TFs regulating these DEGs were identified with the matching tool in TRANSFAC. The ICM-specific transcriptional regulatory network was built with Cytoscape.

Validation in the GEO dataset
The GSE116250 dataset [25] was downloaded from the GEO database and used to validate the expression patterns of selected DEGs. GSE116250 was published on Nov 14, 2018 and comprises left ventricular tissue samples from 13 ICM patients and 14 healthy controls.

Confirmation by qRT-PCR
Patients presenting to Beijing Anzhen Hospital for coronary angiography from July 2018 to December 2018 were recruited consecutively. Subjects were included as cases when they had a left ventricular ejection fraction (LVEF) of ≤40% and fulfilled one of the following criteria: a history of myocardial infarction or revascularization (cardiac bypass surgery or percutaneous coronary intervention), ≥75% stenosis of the left main or proximal left anterior descending artery (LAD), or ≥75% stenosis of two or more epicardial vessels [26]. Subjects with LVEF > 50% and < 50% stenosis in any main coronary artery were included as controls. Ten patients diagnosed with ICM and 10 controls were enrolled. The detailed characteristics of the patients are listed in Table 1. All participants fasted for 12 h, and blood samples were collected by venipuncture between 7:00 and 8:00 the next morning. This study was approved by the ethics committee of our hospital, and signed informed consent was obtained from all participants.
Total RNA was isolated with the total RNA kit (Invitrogen, China). The Fast Quant RT Kit (Invitrogen, China) was used to synthesize complementary DNA. qRT-PCR was then performed with Super Real PreMix Plus SYBR Green (Invitrogen, USA) on an ABI 7500 real-time PCR system. Amplification was performed under the following conditions: 15 min at 95°C followed by 40 cycles of 10 s at 95°C, 30 s at 55°C, 32 s at 72°C, and 15 s at 95°C, 60 s at 60°C, 15 s extension at 95°C. The 2^−ΔΔCt method was used to analyze the data. The PCR primers are listed in Table 2.
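The 2^−ΔΔCt calculation can be sketched as follows. The reference gene is not named in this section, so the sketch assumes one reference (housekeeping) gene measured in every sample; the function name and the Ct values are hypothetical.

```python
from statistics import mean


def relative_expression(case_target_ct, case_ref_ct,
                        ctrl_target_ct, ctrl_ref_ct):
    """Fold change of a target gene in cases relative to controls (2^-ddCt)."""
    # dCt: target Ct normalized to the reference gene within each group.
    delta_ct_case = mean(case_target_ct) - mean(case_ref_ct)
    delta_ct_ctrl = mean(ctrl_target_ct) - mean(ctrl_ref_ct)
    # ddCt: difference between the case and control dCt values.
    delta_delta_ct = delta_ct_case - delta_ct_ctrl
    return 2 ** (-delta_delta_ct)


# Hypothetical Ct values, for illustration only.
fold = relative_expression(
    case_target_ct=[26.0, 26.0], case_ref_ct=[18.0, 18.0],
    ctrl_target_ct=[24.0, 24.0], ctrl_ref_ct=[18.0, 18.0],
)
print(fold)  # 0.25
```

A value below 1 indicates lower expression in cases than in controls, and a value above 1 indicates higher expression.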

Results
DEGs in ICM
Three datasets (GSE46224, GSE52601 and GSE5406) were obtained from GEO (Table 3). Compared with healthy controls, 1081 DEGs (578 up-regulated and 503 down-regulated) were identified in ICM. All DEGs between ICM and healthy controls are listed in Supplementary Table S1, and the top 40 DEGs are shown in Table 4. Hierarchical clustering of the top 100 DEGs is shown in Fig. 1.

ICM-specific transcriptional regulatory networks
According to TRANSFAC, 64 TFs targeting 40 DEGs (the top 20 up-regulated and top 20 down-regulated genes) were identified.

Validation in GSE116250
Six DEGs (MYH6, THBS4, BCL6, BLM, IPO7 and SERPINA3) were selected for verification in the GSE116250 dataset. Among them, BCL6, BLM and IPO7 were hub genes of the ICM-specific PPI network, THBS4 was one of the top TFs covering the most downstream DEGs, and MYH6 and SERPINA3 were among the top 40 DEGs in ICM. As displayed in Fig. 6, the expression of the six DEGs was consistent with our integration results: MYH6, BCL6, BLM, IPO7 and SERPINA3 were down-regulated, while THBS4 was up-regulated, in ICM compared with healthy controls.

Validation by qRT-PCR
The six DEGs validated in GSE116250 were chosen for qRT-PCR verification (Fig. 6). As shown in Fig. 7, MYH6, BCL6, BLM, IPO7 and SERPINA3 were down-regulated and THBS4 was up-regulated in ICM compared with controls. In general, the qRT-PCR results were consistent with our integration results and the GSE116250 validation results.

Fig. 4 The ICM-specific PPI network. Ellipses represent nodes and lines represent edges. Green represents down-regulation and red represents up-regulation. The black border indicates the top 20 up- or down-regulated DEGs.

Discussion
ICM continues to be one of the major diseases that threaten human health [1]. To clarify its mechanism and develop more effective early-stage therapeutic strategies, new therapeutic targets for ICM are needed. With the emergence of high-throughput microarrays, a number of public resources have been built, among which the National Center for Biotechnology Information (NCBI) GEO is the largest [27]. Bioinformatics analysis based on the GEO database provides a valuable basis for revealing the pathogenesis of multiple diseases [28][29][30]. Integrated analysis of microarrays from different platforms yields genome-wide expression profiles with a larger sample size, and therefore greater statistical power, than an individual microarray study. Investigating abnormal gene expression in disease states mediated by upstream TFs can help to reveal the pathophysiological changes of complex diseases [31]. In this study, we carried out an integrated analysis of three gene expression datasets to identify DEGs associated with ICM. A total of 1081 DEGs were identified in ICM with FDR < 0.05. ICM-related pathways and TFs were also obtained by bioinformatics methods. We selected MYH6, THBS4, BCL6, BLM, IPO7 and SERPINA3 to verify their expression in ICM. The qRT-PCR expression of these six DEGs was consistent with our GEO analysis, which supports the reliability of our results.
MYH6 encodes the alpha heavy chain subunit of cardiac myosin in the developing atria. Mutations of MYH6 have been reported to be associated with hypertrophic and dilated cardiomyopathy [32,33]. MYH6 has also been associated with congenital heart disease, and an increase in MYH6 mutations may be associated with the condition [34]. Mutations in the head domain of MYH6 play a pivotal role in the development of familial secundum-type atrial septal defects [35]. Jiang et al. found that silencing mutant MYH6 transcripts in mice inhibited hypertrophic cardiomyopathy [36]. Castellana et al. reported that desmoglein-2/desmocollin-2/MYH6 mutations might determine a mild hypertrophic phenotype associated with both ventricular tachyarrhythmias and atrio-ventricular block [37]. Granados-Riveron et al. reported that MYH6 mutations affecting myofibril formation are associated with congenital heart defects, whereas others have identified mutations of the same gene in patients with hypertrophic and dilated cardiomyopathy [38]. Here, MYH6 was down-regulated in patients with ICM in both the integration analysis and the qRT-PCR confirmation. KEGG pathway enrichment analysis showed that MYH6 was involved in the significantly enriched pathways of cardiac muscle contraction, hypertrophic cardiomyopathy and dilated cardiomyopathy. Therefore, we hypothesized that MYH6 might play a key role in ICM by regulating these pathways.
THBS4 is one of the secreted glycoproteins involved in wound healing and tissue remodeling via modulating the repair and remodeling of the extracellular matrix [39,40]. THBS4 has been found to be persistently abnormally expressed in multiple solid cancers [41][42][43]. Recent research has indicated that THBS4 is involved in the pathogenesis of severe hypertrophic cardiomyopathy and heart failure [44].
In this study, THBS4 was one of the top 6 TFs covering the most downstream DEGs and was up-regulated in both the integration analysis and the qRT-PCR confirmation. These results suggest that THBS4 may play a key role in the pathogenesis of ICM.
SERPINA3 is a member of the serine protease inhibitor superfamily. It is an acute phase response gene that is up-regulated during inflammation [45]. Masanori et al. found that SERPINA3 may be a novel diagnostic and pharmacological target for heart failure [46]. SERPINA3 has also been reported to be involved in the pathogenesis of myocardial ischemia-reperfusion injury [47]. Herein, SERPINA3 was one of the top 40 DEGs and was down-regulated in both the integration analysis and the qRT-PCR confirmation. Therefore, we hypothesized that SERPINA3 may be involved in the development of ICM.
However, this study has several limitations. The small sample size (10 samples per group) for qRT-PCR confirmation might affect the reliability of our results. Although the validation based on GSE116250 suggested that our qRT-PCR results were generally convincing, studies with larger sample sizes are needed to confirm this conclusion. In addition, the identification of DEGs in ICM is a pilot study, and further experiments in model systems or cell lines are needed to reveal their biological functions in ICM.

Conclusions
Functional annotation, PPI network analysis and ICM-specific transcriptional regulatory network analysis were performed to identify DEGs, TFs and pathways in ICM, which provides a perspective for revealing the pathology of ICM and developing therapeutic targets.
Additional file 1. All differentially expressed genes between ICM and healthy control.