Identification of gene expression location of angiotensin‐converting enzyme‐2 SNPs as a receptor for SARS‐CoV‐2 in different populations by using various databases

The World Health Organization (WHO) has declared coronavirus disease (COVID‐19), caused by Severe Acute Respiratory Syndrome Coronavirus‐2 (SARS‐CoV‐2), a worldwide pandemic. Rapidly rising numbers of patients have been reported in almost every country, along with growing mortality rates. The uncontrolled growth in patient numbers may be due to factors such as limited treatment options, limited vaccine availability and unidentified targets of SARS‐CoV‐2. Previous studies have revealed that the molecular target of SARS‐CoV‐2 is analogous to that of SARS (2003), i.e. angiotensin‐converting enzyme‐2 (ACE‐2). Therefore, characterization of ACE‐2 may enrich existing information and facilitate the development of drugs targeted toward SARS‐CoV‐2. This study aims to screen the expression of the ACE‐2 gene and its relationship to the types of SNP variants relevant to SARS‐CoV‐2. We explored several databases, e.g. GTEx portal, HaploReg, 1000 Genome and Ensembl, to identify gene variants of ACE‐2. We showed that ACE‐2 is highly expressed in the testes and small intestine, while its lowest level is observed in lymphocytes. Subsequently, we observed 17 gene variants containing a missense mutation potentially damaging at the protein level. Among these variants, single nucleotide polymorphism (SNP) rs370187012 shows the highest damage‐level score, while the lowest effect is in SNP rs4646116. The highest frequency of the C allele was observed in European populations (1%). In addition to showing that ACE‐2 is expressed in several organs, we concluded that ACE‐2 gene variation can be found in African, American, Southeast and East Asian, and European populations. The polymorphisms of ACE‐2 affect the ACE‐2 protein structure and the binding capacity of the ACE‐2 receptor with the S‐protein of SARS‐CoV‐2.


Introduction
The continuous increase in patients suffering from SARS-CoV-2 infection across continents, i.e. Asia Pacific, Europe, America, and Africa, has threatened humans globally. In response to the uncontrolled number of SARS-CoV-2 patients, WHO announced on 11 March 2020 that the global SARS-CoV-2 outbreak had reached pandemic status. According to an update from Worldometers (Worldometer 2020a), as of 17 January 2021, 94,949,414 confirmed cases, 2,030,914 deaths, and 67,773,034 recovered COVID-19 patients had been reported among 215 countries since the first case was allegedly reported in Wuhan, Hubei, China in December 2019. Each country has adopted various strategies to strengthen prevention of the disease, mainly by restricting flights to and from pandemic areas.
This coronavirus was first known as the "2019 novel coronavirus". The disease was later officially named "COVID-19" and the virus itself SARS-CoV-2. SARS-CoV-2 is thus the cause, while COVID-19 refers to the disease caused by SARS-CoV-2. Naming the disease is the authority of WHO through the standardized International Classification of Diseases (ICD). The International Committee on Taxonomy of Viruses (ICTV) formally announced the name SARS-CoV-2 on 11 February 2020, considering that the virus is genetically analogous to the SARS coronavirus that spread in 2003.
Given the current surge of cases, which occur mainly in the Asian population, scientists have turned their attention to this population and are examining the underlying factors. One field that may provide insight into this question is genetics/genomics. Genetics studies the genetic information carried by a cell or organism, while genomics considers the genome, i.e. the whole set of that genetic information. Thus far, scientists have been trying their best to cope with SARS-CoV-2, e.g. by identifying the origin of the virus through sequencing/mapping of SARS-CoV-2 and comparing it with sequences from animals considered or suspected to be the origin of the virus, such as bats found in the traditional market in Wuhan.
Scientists have been identifying various genes that may be affected by the virus. Others have also attempted to identify the receptor of SARS-CoV-2 itself. Cao and coworkers compared human gene characteristics in several populations and correlated them with vulnerability to COVID-19 (Cao et al. 2020). They highlighted angiotensin-converting enzyme-2 (ACE-2) as one of the receptors of SARS-CoV-2. ACE-2 receptors are generally found in the respiratory tract. Several previous studies support the hypothesis that ACE-2 is the receptor or principal target of SARS-CoV-2 (Zhou et al. 2020; Lu et al. 2020). Additionally, in vitro studies showed a positive correlation between ACE-2 expression and SARS-CoV infection in 2003 (Hofmann et al. 2004; Li et al. 2007). SARS-CoV-2 binds ACE-2 with strong affinity through its S-protein (Li et al. 2005). There is an urgent need to screen the expression of the ACE-2 gene and its relation to SNP variant types in the context of SARS-CoV-2. Previous studies have not explored the types of variation in the ACE-2 gene that may confer susceptibility to SARS-CoV-2, especially in the setting of the pandemic. Given the fundamental role of ACE-2 in SARS-CoV-2 cell entry and its potential as a therapeutic target for antiviral therapy, in this study we used several databases to investigate the expression profiles and SNPs of ACE-2. Using these databases, we found results consistent with previous studies (Hikmet et al. 2020; Paniri et al. 2021) showing that ACE-2 is expressed in several organs, including the respiratory tract. The polymorphisms of ACE-2 affect the ACE-2 protein structure and function, and thereby the ACE-2-dependent cell entry of SARS-CoV-2. This information will be beneficial for further research on SARS-CoV-2.

Materials and Methods
To examine the expression of the ACE-2 gene, we used the GTEx portal database (GTExPortal 2020), a site that contains expression data for all genes, and we also used it to identify the types of SNPs in the ACE-2 gene. The GTEx portal database was accessed on 20 June 2020. Afterward, to confirm the variants that affect the protein level, i.e. variants in the protein-coding sequence (missense mutations), we used HaploReg version 4.1 (HaploReg 2020) and the 1000 Genome project (AsiaEnsemble 2020). Upon identifying the non-synonymous SNPs or missense mutations, we employed the Ensembl database to compare the SNPs among worldwide populations. Ensembl includes a tool that more specifically predicts protein changes, i.e. PolyPhen-2. This tool divides the protein changes into three categories based on a score for the level of SNP damage to the protein, which ranges from 0 to 1: 0.00-0.15 (benign), 0.15-0.85 (possibly damaging), and 0.85-1.00 (probably damaging). This study investigated several populations affected by SARS-CoV-2, including those of Europe, Southeast and South Asia, America, and Africa.
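The three PolyPhen-2 categories described above can be sketched as a small lookup. This is only an illustration of the thresholds quoted in the text, not the PolyPhen-2 algorithm itself; the function name `classify_polyphen` is hypothetical, and because the quoted ranges share the boundary values 0.15 and 0.85, assigning a boundary score to the lower category is an assumption made here.

```python
def classify_polyphen(score: float) -> str:
    """Map a PolyPhen-2 score in [0, 1] to the category ranges quoted in the text.

    Assumption: a score exactly on a boundary (0.15 or 0.85) is placed in the
    lower category; the text does not say which side the boundaries belong to.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"PolyPhen-2 score must lie in [0, 1], got {score}")
    if score <= 0.15:
        return "benign"
    if score <= 0.85:
        return "possibly damaging"
    return "probably damaging"


if __name__ == "__main__":
    # Scores reported later in this paper for two of the ACE-2 SNPs.
    for rsid, score in [("rs370187012", 0.999), ("rs4646116", 0.001)]:
        print(rsid, score, classify_polyphen(score))
```

Applied to the two scores reported in the Discussion, the lookup places rs370187012 (0.999) in the probably damaging range and rs4646116 (0.001) in the benign range.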

Identification of ACE-2 gene expression over various organs
The well-known resource containing gene expression data across human organs is the expression quantitative trait loci (eQTL) data hosted in the GTEx database (Ardlie et al. 2015). Comparing these data with accessible genome databases such as the 1000 Genome database allows us to identify the gene variants in a certain population, e.g. Asia, where most SARS-CoV-2 outbreaks occurred. Using the GTEx website (GTExPortal 2020), results showed ACE-2 expression in several organs (Figure 1). Figure 1 shows the expression of ACE-2 in several organs, scored as log10(Transcripts Per Million (TPM) + 1). ACE-2 was highly expressed in the testis (median TPM in male 46.53), small intestine or the terminal ileum (median TPM in female 50.06; male 24.45), adipose visceral (median TPM in female 9.409; male 8.543), kidney (median TPM in female 10.70; male 7.676), heart left ventricle (median TPM in female 8.991; male 6.896), thyroid (median TPM in female 6.172; male 6.414), heart atrial appendage (median TPM in female 5.706; male 5.422), and colon transverse (median TPM in female 4.679; male 2.775), while the lowest level was observed in lymphocytes (median TPM in female 0.02405; male 0.01637). We also identified that ACE-2 was expressed in the human lung (median TPM in female 0.94; male 1.044), the organ known as the host site for SARS-CoV-2. We also found that the ACE-2 gene tended to be expressed more in female tissues (cervix ectocervix, cervix endocervix, uterus, fallopian tube, ovary, and vagina) than in male tissues (testis, kidney medulla and prostate). However, sequencing/RNA mapping showed that the male Asian population had higher ACE-2 gene expression than the female population. This finding is in accordance with the data reported by Worldometers (Worldometer 2020b), which indicate that the death rate of male patients (61.8%) is higher than that of female patients (38.2%).
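As a rough illustration of the log10(TPM + 1) scoring used in Figure 1, the following sketch transforms a handful of the median TPM values quoted above and ranks the tissues. The dictionary keys and the function names `log_tpm` and `rank_by_expression` are hypothetical; only the numeric values come from the text, and the subset of tissues is chosen for brevity.

```python
import math

# Median TPM values quoted in the text for a subset of tissues.
MEDIAN_TPM = {
    "small intestine, terminal ileum (female)": 50.06,
    "testis (male)": 46.53,
    "small intestine, terminal ileum (male)": 24.45,
    "kidney (female)": 10.70,
    "lung (male)": 1.044,
    "lung (female)": 0.94,
    "lymphocytes (female)": 0.02405,
    "lymphocytes (male)": 0.01637,
}


def log_tpm(tpm: float) -> float:
    """Return log10(TPM + 1); the +1 keeps a TPM of zero defined (it maps to 0)."""
    if tpm < 0:
        raise ValueError("TPM cannot be negative")
    return math.log10(tpm + 1.0)


def rank_by_expression(median_tpm: dict) -> list:
    """Return (tissue, log10(TPM + 1)) pairs sorted from highest to lowest."""
    scored = [(tissue, log_tpm(value)) for tissue, value in median_tpm.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for tissue, value in rank_by_expression(MEDIAN_TPM):
        print(f"{tissue:45s} {value:.3f}")
```

The transform compresses the range: a roughly 3000-fold difference in TPM between the terminal ileum and lymphocytes becomes a difference of less than two log units, which is why low-expression tissues such as lung remain visible on the same axis.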

Identification of ACE-2 gene variation affecting the protein change
Single nucleotide polymorphisms (SNPs) can affect protein function, structure, stability, and abundance (Calcagnile et al. 2020). To identify the gene variation affecting the protein, we used HaploReg ver. 4.1 and the 1000 Genome project databases, which contain 2.6 million SNP gene variations (Devuyst 2015). We observed 128 SNPs with a variety of missense mutations. Through the Ensembl database, we further identified 17 SNPs that may alter the protein, with the following predicted effects: probably damaging, possibly damaging, and benign (Table 1). We further found that these 17 ACE-2 gene variations have allele frequencies of <1% in the African, American, and Asian populations, indicating that these ACE-2 variants are rarely observed in those populations. However, for SNP rs4646116, the allele frequency in the European population was over 1%. This study also showed that the ACE-2 variant SNP rs370187012 is the most likely to alter the protein, being predicted as probably damaging. The Ensembl database also indicates an rs370187012 allele frequency of <0.1% in the Southeast Asian population, one of the regions most affected by the SARS-CoV-2 and COVID-19 outbreak (Table 1).
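The frequency comparison above rests on a simple ratio: the number of observed copies of the variant allele divided by the total number of alleles sampled in a population. A minimal sketch follows, assuming the 1% cut-off used in the text as the rarity threshold. The function names and the allele counts in the usage lines are hypothetical and are not taken from Table 1.

```python
def allele_frequency(alt_count: int, total_alleles: int) -> float:
    """Fraction of sampled alleles that carry the variant."""
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    if not 0 <= alt_count <= total_alleles:
        raise ValueError("alt_count must lie between 0 and total_alleles")
    return alt_count / total_alleles


def is_rare(frequency: float, threshold: float = 0.01) -> bool:
    """True when the frequency is below the threshold (1% by default, as in the text)."""
    return frequency < threshold


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for label, alt, total in [("population A", 1, 1000), ("population B", 12, 1000)]:
        freq = allele_frequency(alt, total)
        print(label, f"{freq:.3%}", "rare" if is_rare(freq) else "not rare")
```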

Discussion
Increasing COVID-19 cases have been a massive threat around the world. Scientists are still searching for a breakthrough to overcome SARS-CoV-2 by identifying the causes and the biological receptors targeted by this virus. For instance, since the SARS receptor (ACE-2) was identified in 2003 (Hofmann et al. 2004), an initial step against the new virus is probably to test whether the same target receptor binds the new virus. In this study, we used a gene expression database to identify the organs that express ACE-2 most highly. Several previous studies have reported that ACE-2 is most highly expressed in the respiratory system and may thus offer a possible pathway to treat COVID-19. However, from our data, we found that the expression of ACE-2 in the respiratory system was not very high (Figure 1). Higher expression of ACE-2 may be due to comorbidities, including hypertension, coronary artery disease, chronic obstructive pulmonary disease, diabetes mellitus, chronic kidney disease and obesity, and to smoking (Guan et al. 2020). Furthermore, changes in the expression levels of ACE-2 correlate with SNPs, splicing, and transcription processing, which could raise the vulnerability of individuals to COVID-19 infection. We further assessed the correlation between populations affected by COVID-19 and the gene variants. A study by Cao and coworkers showed that the SNP rs4646127, located in an intron of ACE-2, has its highest allele frequency in the populations of China (0.997) and Southeast Asia (0.994) (Cao et al. 2020). By comparison, the allele frequency of rs4646127 in Europe and the US was much lower than in China, at 0.651 and 0.754, respectively. Nevertheless, this SNP is intronic, i.e. it does not lie in the protein-coding sequence and so is not the type of variant that can directly trigger a change at the protein level.
In this study, we employed several gene variation databases, including the 1000 Genome database and the GTEx database, which may strengthen the SNP signal and help detect possible phenotype changes, in this case those correlated with COVID-19. The presence of SNPs within the coding region of ACE-2 can alter the amino acid sequence. Such a change at the interaction site can affect the binding capacity of the ACE-2 receptor with the S-protein of SARS-CoV-2. Furthermore, SNPs within the promoter or 3′UTR can cause downregulation of the ACE-2 gene, resulting in lower levels of ACE-2 receptor at the cell surface for interaction with virus particles (Chaudhary 2020). We found 17 ACE-2 variants that have the potential to affect the protein. This may provide clues for assessing COVID-19 severity in relation to the ACE-2 gene and coexisting diseases such as pneumonia. The SNP rs370187012, with a score of 0.999, was predicted to be damaging for the R710H substitution, while SNP rs4646116 showed the lowest score, 0.001, and was predicted to be benign, i.e. unlikely to change the protein. However, further steps are required to establish the vulnerability to COVID-19 conferred by ACE-2 gene variants in SARS-CoV-2-infected patients. Furthermore, it is noteworthy that SNP rs4646116 appears in the European population with a frequency of more than 1%, whereas the other populations, viz. the US and Asia, show lower frequencies (less than 1%). The rs4646116 (K26R), rs149039346 (S692P), and rs41303171 (N720D) substitutions are located in helix, helix and coil structures, respectively. Phyre2 software indicated that the S692P change probably causes disorder, with relatively high confidence. Changes in ACE-2 structure affect SARS-CoV-2 cell entry, and therefore its replication in host cells, especially in the lung (Liu et al. 2020; Sommerstein et al. 2020). The polymorphisms of ACE-2 affect the ACE-2 protein structure, which in turn affects the binding capacity of the ACE-2 receptor with the S-protein of SARS-CoV-2.
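The substitutions named above (R710H, K26R, S692P, N720D) use the common one-letter shorthand: reference amino acid, residue position, variant amino acid. The sketch below shows how such a code can be unpacked into its parts. The function name `parse_substitution` is hypothetical, and the parser handles only this simple single-residue form, not the full range of protein variant notations.

```python
import re

# Standard one-letter to three-letter codes for the 20 common amino acids.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}

_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(code: str) -> tuple:
    """Split a code such as 'K26R' into (reference, position, variant).

    Residues are returned as three-letter names; the position is an int.
    """
    match = _SUBSTITUTION.match(code.strip())
    if match is None:
        raise ValueError(f"not a single-residue substitution: {code!r}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    if ref not in THREE_LETTER or alt not in THREE_LETTER:
        raise ValueError(f"unknown amino acid letter in {code!r}")
    return THREE_LETTER[ref], pos, THREE_LETTER[alt]


if __name__ == "__main__":
    for code in ["R710H", "K26R", "S692P", "N720D"]:
        ref, pos, alt = parse_substitution(code)
        print(f"{code}: {ref} at position {pos} replaced by {alt}")
```

Read this way, K26R is a lysine-to-arginine change at residue 26, and S692P replaces a serine with a proline at residue 692.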
Overall, our data are restricted to an early identification of ACE-2 gene variation using database studies and prioritizing missense variants. Further experimental validation of the SNP types within and between populations of COVID-19 patients is required to verify the allele differences among populations. This may provide more data on the vulnerability associated with various ACE-2 SNPs in COVID-19 patients.

Conclusions
A thorough database search has provided insight into ACE-2 gene variation and evidence that ACE-2 is expressed in the testis, small intestine (terminal ileum), visceral adipose tissue, kidney, heart left ventricle, thyroid, heart atrial appendage, transverse colon and the respiratory tract, which is well known as the SARS-CoV-2 target. Furthermore, we identified 17 missense SNP variants that may play a role in protein changes, with SNP rs370187012 and SNP rs4646116 showing the highest and lowest predicted protein alterations, respectively.