Identification of candidate biomarkers and therapeutic agents for heart failure by bioinformatics analysis

Introduction Heart failure (HF) is a heterogeneous clinical syndrome and affects millions of people all over the world. HF occurs when the cardiac overload and injury, which is a worldwide complaint. The aim of this study was to screen and verify hub genes involved in developmental HF as well as to explore active drug molecules. Methods The expression profiling by high throughput sequencing of GSE141910 dataset was downloaded from the Gene Expression Omnibus (GEO) database, which contained 366 samples, including 200 heart failure samples and 166 non heart failure samples. The raw data was integrated to find differentially expressed genes (DEGs) and were further analyzed with bioinformatics analysis. Gene ontology (GO) and REACTOME enrichment analyses were performed via ToppGene; protein–protein interaction (PPI) networks of the DEGs was constructed based on data from the HiPPIE interactome database; modules analysis was performed; target gene—miRNA regulatory network and target gene—TF regulatory network were constructed and analyzed; hub genes were validated; molecular docking studies was performed. Results A total of 881 DEGs, including 442 up regulated genes and 439 down regulated genes were observed. Most of the DEGs were significantly enriched in biological adhesion, extracellular matrix, signaling receptor binding, secretion, intrinsic component of plasma membrane, signaling receptor activity, extracellular matrix organization and neutrophil degranulation. The top hub genes ESR1, PYHIN1, PPP2R2B, LCK, TP63, PCLAF, CFTR, TK1, ECT2 and FKBP5 were identified from the PPI network. Module analysis revealed that HF was associated with adaptive immune system and neutrophil degranulation. The target genes, miRNAs and TFs were identified from the target gene—miRNA regulatory network and target gene—TF regulatory network. Furthermore, receiver operating characteristic (ROC) curve analysis and RT-PCR analysis revealed that ESR1, PYHIN1, PPP2R2B, LCK, TP63, PCLAF, CFTR, TK1, ECT2 and FKBP5 might serve as prognostic, diagnostic biomarkers and therapeutic target for HF. The predicted targets of these active molecules were then confirmed. Conclusion The current investigation identified a series of key genes and pathways that might be involved in the progression of HF, providing a new understanding of the underlying molecular mechanisms of HF. Supplementary Information The online version contains supplementary material available at 10.1186/s12872-021-02146-8.


Introduction
Heart failure (HF) is a cardiovascular disease characterized by tachycardia, tachypnoea, pulmonary rales, pleural effusion, raised jugular venous pressure, peripheral oedema and hepatomegaly [1]. Morbidity and mortality linked with HF is a prevalent worldwide health problem holding a universal position as the leading cause of death [2]. The numbers of cases of HF are rising globally and it has become a key health issue. According to a survey, the prevalence HF is expected to exceed 50% of the global population [3]. Research suggests that modification in multiple genes and signaling pathways are associated in controlling the advancement of HF. However, a lack of investigation on the precise molecular mechanisms of HF development limits the treatment efficacy of the disease at present.
Previous study showed that HF was related to the expression of MECP2 [4] RBM20 [5], CaMKII [6], troponin I [7] and SERCA2a [8]. Toll-Like receptor signaling pathway [9], activin type II receptor signaling pathway [10], CaMKII signaling pathways [11], Drp1 signaling pathways [12] and JAK-STAT signaling pathway [13] were liable for progression of HF. More investigations are required to focus on treatments that enhance the outcome of patients with HF, to strictly make the diagnosis of the disease based on screening of biomarkers. These investigations can upgrade prognosis of patients by lowering the risk of advancement of HF and related complications. So it is essential to recognize the mechanism and find biomarkers with a good specificity and sensitivity.
The recent high-throughput RNA sequencing data has been widely employed to screen the differentially expressed genes (DEGs) between normal samples and HF samples in human beings, which makes it accessible for us to further explore the entire molecular alterations in HF at multiple levels involving DNA, RNA, proteins, epigenetic alterations, and metabolism [14]. However, there still exist obstacles to put these RNA seq data in application in clinic for the reason that the number of DEGs found by expression profiling by high throughput sequencing were massive and the statistical analyses were also too sophisticated [15][16][17][18][19] In this study, first, we had chosen dataset GSE141910 from Gene Expression Omnibus (GEO) (http:// www. ncbi. nlm. nih. gov/ geo/) [20]. Second, we applied for limma tool in R software to obtain the differentially expressed genes (DEGs) in this dataset. Third, the Top-pGene was used to analyze these DEGs including biological process (BP), cellular component (CC) and molecular function (MF) REACTOME pathways. Fourth, we established protein-protein interaction (PPI) network and then applied Cytotype PEWCC1 for module analysis of the DEGs which would identify some hub genes. Fifth, we established target gene-miRNA regulatory network and target gene-TF regulatory network. In addition, we further validated the hub genes by receiver operating characteristic (ROC) curve analysis and RT-PCR analysis.
Finally, we performed molecular docking studies for over expressed hub genes. Results from the present investigation might provide new vision into potential prognostic and therapeutic targets for HF.

Data resource
Expression profiling by high throughput sequencing with series number GSE141910 based on platform GPL16791 was downloaded from the GEO database. The dataset of GSE141910 contained 200 heart failure samples and 166 non heart failure samples. It was downloaded from the GEO database in NCBI based on the platform of GPL16791 Illumina HiSeq 2500 (Homo sapiens).

Identification of DEGs in HF
DEGs of dataset GSE141910 between HF groups and non heart failure groups were respectively analyzed using the limma package in R [21]. Fold changes (FCs) in the expression of individual genes were calculated and DEGs with P < 0.05, |log FC|> 1.158 for up regulated genes and |log FC|< − 0.83 for down regulated genes were considered to be significant. Hierarchical clustering and visualization were used by Heat-map package of R.

Protein-protein interaction network construction and module screening
PPI networks are used to establish all protein coding genes into a massive biological network that serves an advance compassionate of the functional system of the proteome [25]. The HiPPIE interactome (https:// cbdm. uni-mainz. de/ hippie/) [26] database furnish information regarding predicted and experimental interactions of proteins. In the current investigation, the DEGs were mapped into the HiPPIE interactome database to find significant protein pairs with a combined score of > 0.4. The PPI network was subsequently constructed using Cytoscape software, version 3.8.2 (www. cytos cape. org) [27]. The nodes with a higher node degree [28], higher betweenness centrality [29], higher stress centrality [30] and higher closeness centrality [31] were considered as hub genes. Additionally, cluster analysis for identifying significant function modules with a degree cutoff > 2 in the PPI network was performed using the PEWCC1 (http:// apps. cytos cape. org/ apps/ PEWCC1) [32] in Cytoscape.

Target gene-miRNA regulatory network construction
The miRNet database (https:// www. mirnet. ca/) [33] contains information on miRNA and the regulated genes. Using information collected from the miRNet database, hub genes were matched with their associated miRNA. The target gene-miRNA regulatory network then was constructed using Cytoscape software. MiRNAs and target are selected based on highest node degree.

Target gene-TF regulatory network construction
The NetworkAnalyst database (https:// www. netwo rkana lyst. ca/) [34] contains information on TF and the regulated genes. Using information collected from the Net-workAnalyst database, hub genes were matched with their associated TF. The target gene-TF regulatory network then was constructed using Cytoscape software. TFs and target genes are selected based on highest node degree.

Receiver operating characteristic (ROC) curve analysis
Then ROC curve analysis was implementing to classify the sensitivity and specificity of the hub genes for HF diagnosis and we investigated how large the area under the curve (AUC) was by using the statistical package pROC in R software [35].

RT-PCR analysis
H9C2 cells (ATCC) were cultured in Dulbecco's minimal essential medium (DMEM) (Sigma-Aldrich) supplemented with 10% fetal calf serum (Sigma-Aldrich) and 1% streptomycin (Sigma-Aldrich) at 37 °C in 5% CO 2 . HL-1 cells (ATCC) was culture in Claycomb medium (Sigma-Aldrich) supplemented with 10% fetal bovine serum (Sigma-Aldrich), 1% streptomycin (Sigma-Aldrich), 1% glutamax (Sigma-Aldrich) and 0.1 mM norepinephrine (Sigma-Aldrich) at 37 °C in 5% CO 2 . Total RNA was isolated from cell culture of H9C2 for HF and HL-1 for normal control using the TRI Reagent (Sigma, USA). cDNA was synthesized using 2.0 μg of total RNA with the Reverse transcription cDNA kit (Thermo Fisher Scientific, Waltham, MA, USA). The 7 Flex real-time PCR system (Thermo Fisher Scientific, Waltham, MA, USA) was employed to detect the relative mRNA expression. The relative expression levels were determined by the 2-ΔΔCt method and normalized to internal control beta-actin [36]. All RT-PCR reactions were performed in triplicate. The primers used to explore mRNA expression of ten hub genes were listed in Table 1.

Identification of candidate small molecules
SYBYL-X 2.0 perpetual drug design software has been used for surflex-docking studies of the designed novel molecules and the standard on over expressed genes of PDB protein. Using ChemDraw Software, all designed molecules and standards were sketched, imported and saved using open babel free software in sdf. template. The protein of over expressed genes of ESR1, LCK, PPP2R2B, TP63 and their co-crystallised protein of PDB code 4PXM, 1KSW, 2HV7, 3VD8 and 6RU6 were extracted from Protein Data Bank [37][38][39][40]. Optimizations of the designed molecules were performed by standard process by applying Gasteiger Huckel (GH) charges together with the TRIPOS force field. In addition, energy minimization was achieved using MMFF94s and MMFF94 algorithm methods. The preparation of the protein was done after protein incorporation. The co-crystallized ligand and all water molecules have been eliminated from the crystal structure; more hydrogen's were added and the side chain was set, TRIPOS force field was used for the minimization of structure. The interaction efficiency of the compounds with the receptor was expressed in kcal/mol units by the Surflex-Dock score. The best location was integrated into the molecular region by the interaction between the protein and the ligand. Using Discovery Studio Visualizer, the visualisation of ligand interaction with receptor is performed.

Identification of DEGs in HF
We identified 881 DEGs in the GSE141910 dataset using the limma package in R. Based on the limma analysis, using the adj P val < 0.05, |log FC|> 1.158 for up regulated genes and |log FC|< − 0.83 for down Table 1 The sequences of primers for quantitative RT-PCR  Table S1. The volcano plot for DEGs is illustrated in Fig. 1. Figure 2 is the hierarchical clustering heat-map.

Functional enrichment analysis
Results of GO analysis showed that the up regulated genes were significantly enriched in BP, CC, and MF, including biological adhesion, regulation of immune system process, extracellular matrix, cell surface, signaling receptor binding and molecular function regulator   (Table 2). Pathway analysis showed that the up regulated genes were significantly enriched in extracellular matrix organization and immunoregulatory interactions between a lymphoid and a non-lymphoid cell (Table 3); the down regulated genes were significantly enriched in neutrophil degranulation and SLC-mediated transmembrane transport (Table 3).

Protein-protein interaction (PPI) network and module analysis
Based on the HiPPIE interactome database, the PPI network for the DEGs (including 6541 nodes and 13,909 edges) was constructed (Fig. 3A). Up regulated gene with higher node degree, higher betweenness centrality, higher stress centrality and higher closeness centrality were as follows: ESR1, PYHIN1, PPP2R2B, LCK, TP63 and so on. Down regulated genes had higher node degree, higher betweenness centrality, higher stress centrality and higher closeness centrality were as follows PCLAF, CFTR, TK1, ECT2, FKBP5 and so on. The node degree, betweenness centrality, stress centrality and closeness centrality are listed in Table 4. Additionally, two significant modules, including module 1 (10 nodes and 24 edges) and module 2 (5 nodes and 10 edges) (Fig. 3B) and module 3 (55 nodes and 115 edges), were acquired by PEWCC1 plug-in (Fig. 3C). Furthermore, GO terms and REACTOME pathways were significantly enriched by module 1, including adaptive immune system, immunoregulatory interactions between a lymphoid and a non-lymphoid cell, hemostasis, biological adhesion and regulation of immune system process. Meanwhile, the nodes in module 2 were significantly enriched in GO terms and REACTOME pathways, including neutrophil degranulation and secretion.

Target gene-TF regulatory network construction
Associations between 330 TFs and their 247 target genes were collected from the target gene-TF regulatory network (Fig. 5). TFs of ESRRA, RERE, HMG20B, THRAP3, ATF1, MXD3, ARID4B, CBFB, TAF7 and CREM, which exhibited a high degree of interaction, were selected from this network. Furthermore, the results also showed that FSCN1 was the target of ESRRA, APOA1 was the target of RERE, COL1A1 was the target of HMG20B, HBB was the target of THRAP3, LCK was the target of ATF1, SOCS3 was the target of MXD3, BCL6 was the target of ARID4B, FKBP5 was the target of CBFB, ANLN was the target of TAF7 and ATP2A2 was the target of CREM, and are listed in Table 5.

Receiver operating characteristic (ROC) curve analysis
First of all, we performed the ROC curve analysis among 10 hub genes based on the GSE141910. The results showed that ESR1, PYHIN1, PPP2R2B, LCK, TP63, PCLAF, CFTR, TK1, ECT2 and FKBP5 achieved an AUC value of > 0.7, demonstrating that these ten genes have high sensitivity and specificity for HF, suggesting they can be served as biomarkers for the diagnosis of HF (Fig. 6).

RT-PCR analysis
RT-PCR was used to validate the hub genes between normal and HF cell lines. The results suggested that the mRNA expression level of ESR1, PYHIN1, PPP2R2B, LCK and TP63 were significantly increased in HF compared with that in normal, while PCLAF, CFTR, TK1, ECT2 and FKBP5 were significantly decreased in HF compared with that in normal and are shown in Fig. 7.
The molecules HIM6, HIM10 obtained good binding score of more 5 to 6.999 with all proteins and the molecules HIM11, HIM12, HIM14, HTZ9, HTZ10 and HTZ12 obtained binding score above 5 and less than 9 with PDB protein code of 2HV7, 3VD8 and 6RUR respectively. The molecule HIM11 obtained highest binding score of 8.678 with 2HV7 and its interaction with amino acids are molecule HIM11 (Fig. 9) has obtained with a high binding score with PDB protein 2HV7, the interactions of molecule is the C6 side chin acyl carbonyl C=O formed hydrogen bond interaction with amino acid GLN-207 with bond length 1.92 A° and 3' N-H group of imidazole ring formed hydrogen bond interaction with VAL-305 with bond length 2.36 A° respectively. It also formed other interactions of carbon hydrogen bond of -CH 3 group of carboxylate at C6 with PRO-304 and amide-pi stacked and pi-pi stacked interaction of electrons of aromatic ring A with ALA-204 and ring C with HIS-155 and HIS-308. Molecule formed pi-alkyl interaction of ring B with PRO-304 and all interactions with amino acids and bond length are depicted by 3D and 2D figures ( Fig. 10 and Fig. 11

Conclusions
The present investigation aimed at characterizing the expression profiling by high throughput sequencing of the HF patients. Our bioinformatics analyses revealed key gene signatures as candidate biomarkers in HF. Hub genes (ESR1, PYHIN1, PPP2R2B, LCK, TP63, PCLAF, CFTR, TK1, ECT2 and FKBP5) were diagnosed as an essential genetic factors in HF. In general, DEGs linked with HF genes, including already known markers of HF and other HF related diseases, and novel biomarkers, were diagnosed. Potential implicated miRNAs and TFs were also diagnosed. The diagnosed hub genes might represent candidate diagnostic and prognostic biomarkers, and therapeutic targets. The current investigation reported novel genes and signaling pathways in HF, and further investigation is required.
Additional file 1: Table S1. The statistical metrics for key differentially expressed genes (DEGs).