Cloning and in silico study of an endoglucanase from a thermophilic bacterium isolated from a hydrothermal vent of West Kawio, Sangihe-Talaud waters, North Sulawesi, Indonesia

Endoglucanase is used in industries that operate at high temperatures, such as the bioethanol, detergent, paper, and animal feed industries. However, most available endoglucanases have very low stability at high temperatures. Therefore, this study aimed to identify a new thermostable endoglucanase that is able to maintain its activity at high temperatures. Five isolates of thermophilic bacteria had previously been isolated from the hydrothermal vent of West Kawio, Indonesia. Among them, the DSI2 isolate showed the highest endoglucanase activity and was identified as Bacillus safensis DSI2. The EgDSI2 gene was cloned from B. safensis DSI2. The gene is 1851 bp long and encodes a protein of 616 amino acids. The encoded protein, EgDSI2, has high sequence identity to other B. safensis endoglucanases, and its molecular mass was predicted with the Compute pI/Mw tool to be 69.41 kDa. EgDSI2 is rich in hydrophobic amino acids. The enzyme has a higher percentage of Ala and Pro, and a lower percentage of Gly, than thermolabile endoglucanases from two Bacillus species. EgDSI2 harbors a catalytic domain belonging to glycoside hydrolase family 9 (GH9) and a type 3 cellulose-binding domain (CBM3). Endoglucanases with a GH9-CBM3 modular organization have been reported to be active over a wide pH range, to have a high optimum temperature, and to be thermostable. Therefore, EgDSI2 has potential applications in these industries.


Introduction
Cellulose is the most abundant renewable biopolymer on earth and the dominant waste material from agriculture (Patel et al. 2014). A promising strategy for the efficient utilization of this renewable resource is hydrolysis with cellulase to release glucose, which in turn can be fermented to produce ethanol (Sukumaran et al. 2005). Cellulase is an enzyme system consisting of three types of enzymes that work synergistically to hydrolyze cellulose into glucose, i.e. endoglucanase (EC 3.2.1.4), exoglucanase (EC 3.2.1.74), and β-glucosidase (EC 3.2.1.21) (Lynd et al. 2002). Endoglucanase (Eg) hydrolyzes cellulose by randomly cleaving internal glycosidic bonds, producing oligosaccharides and thus new chain ends. Exoglucanase, or cellobiohydrolase, acts on the ends of the oligosaccharide chains, releasing cellobiose. β-glucosidase finally hydrolyzes cellobiose to glucose. Thus, among the three cellulases, Eg, also called carboxymethyl cellulase (CMCase), is a key enzyme that initiates cellulose hydrolysis (Wood and Bhat 1988; Zhang et al. 2006; Gupta and Verma 2015). If Eg biosynthesis is increased, more cellulose will be hydrolyzed to glucose, which can ultimately be converted to ethanol through fermentation (Srinivas and Panda 1998).
In addition to the bioethanol industry, cellulases are also used in the detergent, paper, food, and animal feed industries (Sukumaran et al. 2005). These industries apply high temperatures to ease the mixing of materials, increase substrate solubility, and decrease the risk of contamination (Turner et al. 2007). However, high temperature limits the wide industrial use of cellulases, because it may cause protein denaturation and loss of catalytic function. Most of the available cellulases have very low activity at high temperatures. Therefore, new thermostable cellulases need to be identified (Zarafeta et al. 2016).
New Egs have been reported from various thermophilic bacteria. Li et al. (2008) obtained a thermostable Eg with an optimum temperature of 50 °C from Bacillus subtilis DR, which was isolated from a hot spring. Yang et al. (2010) obtained an Eg with maximum activity at 60 °C from B. subtilis strain I15, isolated from compost. Moeis et al. (2014) obtained an Eg with optimum activity at 50 °C from Bacillus sp. RP1, isolated from a hot spring. Zarafeta et al. (2016) obtained a thermostable cellulase with optimum activity at 70 °C from a Thermoanaerobacterium strain isolated from a hot spring. De Marco et al. (2017) obtained a thermostable Eg with optimum activity at 60 °C from B. licheniformis 380, isolated from compost. Dos Santos et al. (2018) obtained a thermostable Eg with an optimum temperature of 60 °C from the marine Bacillus sp. SR22. These studies show that thermophilic bacteria that thrive in high-temperature habitats may produce thermostable Egs.
Enzyme thermostability is determined mainly by the amino acid sequence (Ebrahimi et al. 2011). Thermostable enzymes have characteristic amino acid compositions (Nakashima et al. 2003). They contain high amounts of charged and hydrophobic amino acids (Sadeghi et al. 2006), favor charged amino acids capable of forming ion pairs (mainly Glu, Arg, and Lys), and have more Pro and fewer Gly residues than their thermolabile counterparts (Kumar et al. 2007).
Thermophilic bacterial isolates DSI1, DSI2, DSI3, DSI4, and DSI5 had been obtained in previous research (unpublished data) from deep-sea water surrounding the hydrothermal vent of West Kawio, Sangihe-Talaud waters, North Sulawesi, Indonesia (5°72'0"N, 127°14'0"E). The samples were collected in 2010 during the INDEX-SATAL expedition, a collaborative research program between Indonesia (the Ministry of Marine Affairs and Fisheries, Republic of Indonesia, and the Agency for the Assessment and Application of Technology, Republic of Indonesia [BPPT RI]) and the United States of America (the National Oceanic and Atmospheric Administration [NOAA]). The deep-sea water was sampled at 1500–3000 m below the sea surface. The physicochemical parameters of the sampling environment were as follows: pressure 317 atm, temperature around the vent 35–80 °C, pH 2.8–6.5, and salinity 35–40 ppt. The bacterial isolates had neither been screened for their ability to produce thermostable Egs nor identified. This study aimed to screen the thermophilic bacterial isolates for their ability to produce thermostable Egs, to identify the isolate with the highest Eg production based on 16S rRNA gene sequence analysis, to clone its Eg gene into the pET-32b vector, and to analyze the thermostable Eg in silico to determine the abundance of certain amino acids compared with thermolabile Egs from mesophilic bacteria.

Screening for Eg activity
Eg activity of the DSI bacterial isolates was screened using the Congo red plate assay of Teather and Wood (1982) and Sheng et al. (2012), with modifications. A single colony of each isolate was picked with a sterile toothpick and spotted in the middle of the screening medium in a Petri dish. The screening medium was a mixture of 90% LB and 10% BHMS supplemented with 0.8 g/L MgSO4, 15 g/L agar, and 1% (m/v) carboxymethyl cellulose (CMC) as the endoglucanase substrate. Screening was carried out in triplicate for each isolate. The plates were incubated at 50 °C for 48 h, and the diameters of the bacterial colonies were measured.
The plates were then flooded with 0.1% (m/v) Congo red solution until the entire surface of the medium was submerged, left to stand for 15 min to stain the medium, and the Congo red solution was removed. Next, the plates were flooded with 1 M NaCl solution until the entire surface was submerged, left to stand for 15 min, and the NaCl solution was removed. The diameters of the clear zones around the colonies were measured. The isolate with the largest cellulolytic index (the ratio of the clear zone diameter to the colony diameter) was selected for further analysis.
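The selection rule above reduces to a ratio averaged over the triplicate plates. The sketch below shows that arithmetic; the isolate names and diameters are hypothetical placeholders, not the measurements reported in Table 3.

```python
from statistics import mean


def cellulolytic_index(clear_zone_mm: float, colony_mm: float) -> float:
    """Ratio of the clear-zone diameter to the colony diameter."""
    if colony_mm <= 0:
        raise ValueError("colony diameter must be positive")
    return clear_zone_mm / colony_mm


def mean_index(replicates: list[tuple[float, float]]) -> float:
    """Average index over replicate plates given as (clear_zone, colony) pairs."""
    return mean(cellulolytic_index(zone, colony) for zone, colony in replicates)


def select_best(isolates: dict[str, list[tuple[float, float]]]) -> str:
    """Name of the isolate with the largest mean cellulolytic index."""
    return max(isolates, key=lambda name: mean_index(isolates[name]))


# Hypothetical triplicate diameters in mm (illustrative only, not the study's data)
plates = {
    "isolate_A": [(12.0, 4.0), (11.0, 4.0), (13.0, 4.5)],
    "isolate_B": [(9.0, 6.0), (8.5, 6.0), (9.5, 6.5)],
}
for name, reps in plates.items():
    print(name, round(mean_index(reps), 2))
print("selected:", select_best(plates))
```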

DNA isolation
Genomic DNA of the DSI2 isolate was isolated by the chloroform-isoamyl alcohol method. A single colony of the DSI2 isolate was inoculated into 15 mL LB broth in a 50 mL centrifuge tube and incubated at 50 °C and 150 rpm for 18 h. One mL of the culture was transferred to a 1.5 mL microcentrifuge tube and centrifuged at 18,800 × g for 1 min, and the supernatant was discarded. This step was performed six times. The final cell pellet was resuspended in 750 µL lysis buffer (25 mM EDTA, 50 mM Tris-Cl, 0.5% SDS) and vortexed, and then 750 µL chloroform-isoamyl alcohol (24:1) was added. The mixture was incubated at 80 °C for 10 min and centrifuged at 18,800 × g for 3 min. The aqueous phase was transferred to a new 1.5 mL microcentrifuge tube, 750 µL chloroform-isoamyl alcohol (24:1) was added, and the tube was centrifuged at 18,800 × g for 3 min. The aqueous phase (500 µL) was transferred to a new 1.5 mL microcentrifuge tube. This chloroform-isoamyl alcohol extraction was repeated once more, and 400 µL of the aqueous phase was transferred to a new 1.5 mL microcentrifuge tube, to which 40 µL of 0.8 M LiCl and 1 mL absolute ethanol were added. The mixture was incubated at −20 °C for 30 min and then centrifuged at 18,800 × g for 3 min. The supernatant was discarded, 200 µL of 70% ethanol was added to the tube, and the tube was centrifuged at 18,800 × g for 3 min. After the supernatant was discarded, the dried DNA pellet was dissolved in 50 µL TE-RNase buffer pH 8.0. The DNA solution was stored at −20 °C. Plasmid DNA was isolated using the Presto™ Mini Plasmid Kit (Geneaid, New Taipei City, Taiwan).

Gel electrophoresis
Electrophoresis was performed at 100 V for 30 min using a 1% agarose gel, 1X TAE buffer, 10 μL DNA, and 3 μL GeneRuler™ 1 kb DNA Ladder (Thermo Scientific, Vilnius, Lithuania). The gel was then immersed in ethidium bromide solution (2 μg/mL) for 510 s and rinsed in distilled water for 510 min. DNA bands were visualized with a UV transilluminator.

Polymerase chain reaction (PCR)
The 16S rRNA gene of the DSI2 isolate was PCR-amplified at Macrogen, Inc., South Korea, using the universal primers 27F (5'-AGAGTTTGATCMTGGCTCAG-3') and 1492R (5'-TACGGYTACCTTGTTACGACTT-3'). The endoglucanase gene EgDSI2 from the DSI2 isolate was PCR-amplified in our laboratory with a thermocycler (Applied Biosystems 2720) using custom-designed forward and reverse primers (see Materials and methods 2.9). PCR was conducted following the protocol for Q5® High-Fidelity Master Mix (New England Biolabs, Ipswich, USA). The reaction consisted of 1X Q5® High-Fidelity Master Mix, 0.5 µM forward primer, 0.5 µM reverse primer, 560 ng template DNA, and nuclease-free water (Promega, Madison, USA) to a total volume of 50 µL. The thermocycling conditions were initial denaturation at 98 °C for 30 s, followed by 35 cycles of denaturation at 98 °C for 10 s, annealing at 70 °C for 30 s, and elongation at 72 °C for 1 min, with a final elongation at 72 °C for 2 min. The annealing temperature was determined with the Tm Calculator program version 1.11.0 (https://tmcalculator.neb.com). The PCR product was confirmed by agarose gel electrophoresis.

DNA purification
DNA was purified from agarose gels using the GenepHlow™ Gel/PCR Kit DFH100 (Geneaid, New Taipei City, Taiwan) according to the kit's manual.

DNA sequencing
Sequencing of the 16S rRNA and EgDSI2 genes was carried out at Macrogen, Inc., South Korea. The sequencing reactions were performed by the Sanger BigDye terminator method on an ABI PRISM 3730XL DNA Analyzer. The primers for sequencing the 16S rRNA gene were the universal primers 27F and 1492R. The primers for sequencing EgDSI2 were the T7 promoter (5'-TAATACGACTCACTATAGGG-3') and T7 terminator (5'-GCTAGTTATTGCTCAGCGG-3') primers.
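Reads primed from 1492R or from the T7 terminator primer come from the opposite strand, so they are normally reverse-complemented before being assembled with the forward reads into a contig. The assembly procedure is not described in the text; the helper below is only a minimal sketch of that one step, assuming sequences written in IUPAC codes (27F and 1492R contain the degenerate bases M and Y).

```python
# IUPAC complement table: A<->T, C<->G, M<->K, R<->Y, V<->B, H<->D; W, S, N map to themselves
_COMPLEMENT = str.maketrans("ACGTMRWSYKVHDBN", "TGCAKYWSRMBDHVN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence that may contain IUPAC ambiguity codes."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


primer_1492r = "TACGGYTACCTTGTTACGACTT"
print(reverse_complement(primer_1492r))  # AAGTCGTAACAAGGTARCCGTA
```

Reverse-complementing 1492R gives the sequence a forward read would show at the 3' end of the 16S rRNA amplicon, which is a quick sanity check when trimming primer sites from a contig.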

Phylogenetic analysis
Ten 16S rRNA gene sequences with the highest similarity to that of the DSI2 isolate were taken as in-group sequences, and the 16S rRNA sequence of Geobacillus stearothermophilus strain BCRC 10285 was used as the outgroup. Multiple sequence alignment (MSA) was performed with ClustalX version 2.1 (Larkin et al. 2007). The MSA result in FASTA format was opened in BioEdit version 7.2.5 (Hall 1999) to trim ambiguous bases at the beginning and end of the alignment until all sequences started and ended at the same site. Phylogenetic analysis was conducted with MEGA X (Kumar et al. 2018) using the neighbor-joining method. Confidence in the tree topology was evaluated by bootstrap analysis with 1000 replicates.
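The trimming and tree-building steps can be sketched in a few lines. This is an illustration of the neighbor-joining algorithm (Saitou and Nei), not the MEGA X implementation: the distance model used in MEGA X is not stated in the text, so the sketch assumes uncorrected p-distances. Bootstrap support would be obtained by resampling alignment columns with replacement 1000 times and rebuilding the tree each time, which is not shown.

```python
from itertools import combinations


def trim_alignment(aln: dict[str, str]) -> dict[str, str]:
    """Cut leading/trailing columns until every sequence starts and ends with a base."""
    start = max(len(s) - len(s.lstrip("-")) for s in aln.values())
    end = min(len(s.rstrip("-")) for s in aln.values())
    return {name: s[start:end] for name, s in aln.items()}


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in sites) / len(sites)


def distance_matrix(aln: dict[str, str]) -> dict[str, dict[str, float]]:
    """Off-diagonal pairwise distances as a nested dict."""
    D: dict[str, dict[str, float]] = {n: {} for n in aln}
    for i, j in combinations(aln, 2):
        D[i][j] = D[j][i] = p_distance(aln[i], aln[j])
    return D


def neighbor_joining(D: dict[str, dict[str, float]]):
    """Return the list of joins (i, j, branch_i, branch_j) and a Newick string."""
    D = {i: dict(row) for i, row in D.items()}  # work on a copy
    joins = []
    while len(D) > 2:
        n = len(D)
        r = {i: sum(row.values()) for i, row in D.items()}  # row sums
        # Join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r_i - r_j
        i, j = min(combinations(sorted(D), 2),
                   key=lambda p: (n - 2) * D[p[0]][p[1]] - r[p[0]] - r[p[1]])
        bi = 0.5 * D[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        bj = D[i][j] - bi
        joins.append((i, j, bi, bj))
        u = f"({i}:{bi:.4g},{j}:{bj:.4g})"
        # Distance from the new internal node u to every remaining node k
        new_row = {k: 0.5 * (D[i][k] + D[j][k] - D[i][j]) for k in D if k not in (i, j)}
        del D[i], D[j]
        for k, row in D.items():
            row.pop(i, None)
            row.pop(j, None)
            row[u] = new_row[k]
        D[u] = new_row
    a, b = sorted(D)
    return joins, f"({a},{b}:{D[a][b]:.4g});"


aln = trim_alignment({"s1": "--ACGTACGT", "s2": "-AACGTACGA", "s3": "TTACGAACGA"})
joins, tree = neighbor_joining(distance_matrix(aln))
print(tree)
```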

EgDSI2 primer design
Primers for EgDSI2 were designed based on ten Eg sequences of the same length (616 amino acids) from Bacillus safensis strains (Table 1) using CEMAsuite version 2.0.9 (Lane et al. 2015). The ends of the consensus coding sequence (CDS) were chosen as the primer sequences to amplify the whole 1851-bp gene, from start codon to stop codon. Extra bases (GCAATAGA) and an NdeI restriction site (CA↓TATG) were added to the 5' end of the forward primer, and extra bases (TCGT) and a BamHI restriction site (G↓GATCC) were added to the 5' end of the reverse primer. The forward and reverse primers were FEg 5'-GCAATAGACATATGGCATCTTACAACTATGTAGAG… Primer specificity was tested with Primer-BLAST (https://www.ncbi.nlm.nih.gov/tools/primer-blast/) (Ye et al. 2012), and primer quality was evaluated with Clone Manager 9.
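The forward primer can be decomposed into the three parts described above: 8 extra bases, the 6-bp NdeI site, and the gene-specific 3' portion. The sketch below rebuilds FEg from those parts and adds a few quick checks. Two points are assumptions: the split of the 3' portion is read off the description (the ATG inside CATATG presumably doubles as the start codon, as is usual for NdeI cloning in pET vectors), and the Wallace rule shown is only a crude estimate, not the algorithm used by the NEB Tm Calculator. The reverse primer is not reproduced because its sequence is not available in the text.

```python
NDEI = "CATATG"   # NdeI recognition site (cuts CA^TATG)
BAMHI = "GGATCC"  # BamHI recognition site (cuts G^GATCC)


def tailed_primer(clamp: str, site: str, annealing: str) -> str:
    """5' extra bases + restriction site + gene-specific 3' portion."""
    return clamp + site + annealing


def gc_percent(seq: str) -> float:
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Crude Wallace rule: 2 degC per A/T plus 4 degC per G/C (short oligos only)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of a (palindromic) recognition site on one strand."""
    seq = seq.upper()
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]


feg = tailed_primer("GCAATAGA", NDEI, "GCATCTTACAACTATGTAGAG")
print(feg)                    # GCAATAGACATATGGCATCTTACAACTATGTAGAG
print(find_sites(feg, NDEI))  # [8]
annealing = feg[len("GCAATAGA") + 3:]  # assumes the NdeI ATG is part of the CDS
print(len(annealing), round(gc_percent(annealing), 1), wallace_tm(annealing))
```

Because NdeI and BamHI sites are palindromic, scanning one strand is enough. The same `find_sites` call run on the full 1851-bp gene would flag any internal NdeI or BamHI site that would cut the insert during the double digest; the text does not report such a check, so it is offered only as a common design step.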

Construction of recombinant plasmid
The recombinant plasmid was constructed by the restriction-ligation method. EgDSI2 and the pET-32b vector (Novagen, United States) were double-digested with the NdeI and BamHI restriction enzymes (Thermo Scientific, Vilnius, Lithuania). The optimal conditions for the double-digest reactions were determined with the DoubleDigest Calculator (Thermo Scientific) (https://www.thermofisher.com/id/en/home/brands/thermo-scientific/molecular-biology/thermo-scientific-restriction-modifying-enzymes/restriction-enzymes-thermo-scientific/double-digest-calculator-thermo-scientific.html). pET-32b and the EgDSI2 PCR product were each double-digested in Tango buffer at a 2X final concentration (Thermo Scientific, Waltham, USA) with 20 U NdeI and 20 U BamHI. The reactions were incubated at 37 °C for 15 min. The entire double-digestion reactions were loaded into the wells of a 0.8% agarose gel for electrophoresis, and DNA bands of the expected sizes were extracted from the gel.
The double-digested EgDSI2 (12 µL) and pET-32b (5 µL) were ligated in T4 DNA ligase buffer (Thermo Scientific, Waltham, USA) with 1 Weiss U of T4 DNA ligase (Thermo Scientific, Waltham, USA). The ligation mixture was incubated at 4 °C in a water bath for 16 h. The ligation product was named pET-EgDSI2.
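The text gives the ligation volumes (12 µL insert, 5 µL vector) but not the DNA concentrations, so the insert:vector molar ratio cannot be recovered from it. The sketch below shows the standard conversion for planning such a ligation. The 50 ng of vector and the 3:1 molar ratio are hypothetical choices, the ~650 g/mol per base pair is an approximation, and the insert length uses the 1851-bp gene while ignoring the few bases added by the primers.

```python
def insert_mass_ng(vector_ng: float, vector_bp: int, insert_bp: int,
                   molar_ratio: float = 3.0) -> float:
    """Mass of insert giving the requested insert:vector molar ratio."""
    return vector_ng * (insert_bp / vector_bp) * molar_ratio


def pmol_dsdna(ng: float, bp: int, g_per_mol_per_bp: float = 650.0) -> float:
    """Approximate picomoles of double-stranded DNA (~650 g/mol per base pair)."""
    return ng * 1000.0 / (bp * g_per_mol_per_bp)


# Hypothetical amounts: 50 ng of the 5407-bp digested vector, ~1851-bp insert
vector_ng = 50.0
insert_ng = insert_mass_ng(vector_ng, 5407, 1851, molar_ratio=3.0)
ratio = pmol_dsdna(insert_ng, 1851) / pmol_dsdna(vector_ng, 5407)
print(f"insert needed: {insert_ng:.1f} ng, molar ratio check: {ratio:.2f}")
```

Given measured concentrations of the gel-purified fragments, the volumes actually used could be converted back into a molar ratio with `pmol_dsdna`.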

Transformation of Escherichia coli BL21 (DE3) competent cells with pET-EgDSI2
A tube of competent E. coli BL21 (DE3) cells was taken from −80 °C storage and thawed on ice for 5 min. The ligation mixture was briefly centrifuged, and 5 µL of it and 50 µL of competent cells were transferred aseptically into a chilled 1.5 mL microcentrifuge tube. The tube was flicked five times to mix the competent cells and the ligation mixture. The tube was incubated on ice for 30 min, heat-shocked at 42 °C for 90 s, and then returned to ice for 5 min. Then, 300 µL of LB broth was added.

In silico analysis of nucleotide and amino acid sequences of EgDSI2
The bioinformatics programs used for in silico analysis are listed in Table 2. The EgDSI2 nucleotide sequence was analyzed in silico to determine its homology and GC content. The deduced amino acid sequence, EgDSI2, was analyzed in silico to predict its size and molecular mass, classify its family, detect its domains, determine its amino acid composition, predict its secondary structure composition, and build a homology model of its 3D structure.

Growing DSI thermophilic bacterial isolates
Thermophilic bacterial isolates DSI2 and DSI5 grew in broth and on agar media after overnight incubation. The DSI3 isolate grew in broth medium on the second day and on agar medium on the fourth day. Meanwhile, the DSI1 and DSI4 isolates did not show any growth by the seventh day.

Ability of DSI bacterial isolates in producing Eg
The Congo red plate assay revealed that isolates DSI2, DSI3, and DSI5 were able to produce Eg, as indicated by clear zones around the colonies. The formation of clear zones indicated that Eg was secreted by the bacteria (Sheng et al. 2012). The three isolates formed clear zones of different diameters, indicating that they differed in their capability to degrade the CMC substrate and thus in their Eg activities. The cellulolytic indexes of the three isolates are summarized in Table 3. The DSI2 isolate had the highest cellulolytic index, about twice those of the other two isolates (Table 3). According to Jang and Chen (2003), the diameter of the clear zone is generally proportional to Eg activity. Therefore, the DSI2 isolate was selected for further analysis.

Identity of DSI2 isolate based on 16S rRNA gene and phylogenetic analyses
Phylogenetic analysis was performed to determine the relationship of the DSI2 isolate to other bacterial species based on 16S rRNA gene sequence similarity. The contig sequence obtained from sequencing the 16S rRNA gene of the DSI2 isolate was 1340 bp long (GenBank accession no. MN726487). … (Slepecky and Hemphill 2006). The species B. safensis was first isolated from spacecraft and assembly-facility (SAF) surfaces at the Jet Propulsion Laboratory, USA; the species name safensis was derived from the abbreviation SAF (Satomi et al. 2006). This species lives in a variety of habitats, such as spacecraft and related environments, deserts, industrial waste, oil-polluted environments, compost, pond water, marine sediments, the rhizosphere, insect guts, plants, fermented foods, human and animal feces, and soil (Lateef et al. 2015). Several studies have reported Eg production by B. safensis. Khianngam et al. (2014) isolated the Eg-producing B. safensis PJ124S from a palm oil by-product, and Kanchanadumkerng et al. (2017) isolated the Eg-producing B. safensis M3 from freshwater swamp forest soil.

Isolation of the EgDSI2 gene
EgDSI2 was isolated by PCR. The PCR yielded a single DNA band of around 1.8 kb (Figure 2), which corresponded to the expected size of the gene (1851 bp).

Cloning of EgDSI2
The double-digested EgDSI2 gave a single DNA band of around 1.8 kb, and the double-digested pET-32b vector gave a single DNA band of around 5.5 kb. The intact (undigested) pET-32b vector is 5899 bp long. Double digestion of pET-32b excises a 492-bp fragment from the backbone, leaving a linear DNA of 5407 bp. The single band of the double-digested pET-32b therefore had the expected size of more than 5.0 kb.
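The fragment arithmetic above (5899 bp minus a 492-bp excised fragment leaves 5407 bp) generalizes to any set of cuts in a circular plasmid. In the sketch below the cut coordinates are hypothetical, chosen only so that they lie 492 bp apart, because the NdeI and BamHI positions on pET-32b are not given in the text; sticky-end overhangs are ignored. Adding the ~1851-bp insert to the 5407-bp backbone gives roughly 7.26 kb, in line with the ~7 kb band of BamHI-linearized pET-EgDSI2.

```python
def circular_fragments(plasmid_bp: int, cuts: list[int]) -> list[int]:
    """Fragment lengths after cutting a circular plasmid at the given positions.

    Positions are top-strand cut coordinates; sticky-end overhangs are ignored.
    """
    cuts = sorted(set(cuts))
    if not cuts:
        return [plasmid_bp]  # uncut circle
    frags = [b - a for a, b in zip(cuts, cuts[1:])]
    frags.append(plasmid_bp - cuts[-1] + cuts[0])  # segment spanning the origin
    return sorted(frags)


# Hypothetical NdeI/BamHI cut coordinates, 492 bp apart
print(circular_fragments(5899, [1000, 1492]))  # [492, 5407]

backbone = 5407
recombinant = backbone + 1851  # approximate: ignores bases added by the primers
print(recombinant)             # 7258
print(circular_fragments(recombinant, [42]))   # one BamHI cut linearizes: [7258]
```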
The double-digested products were ligated, and the ligation product (pET-EgDSI2) was used to transform E. coli BL21 (DE3) competent cells. Plasmid DNA was isolated from the transformants, and the presence of the pET-EgDSI2 construct was confirmed by restriction digestion and PCR. Digestion with BamHI gave a single DNA band of around 7 kb (Figure 3a), which corresponded to the expected size of pET-EgDSI2. PCR with the same primers used to isolate EgDSI2 gave a single DNA band of around 1.8 kb, the size of the target gene (Figure 3b).
The EgDSI2 sequence was obtained using the T7 promoter and T7 terminator primers, whose binding sites flank the inserted EgDSI2 in pET-EgDSI2. The gene was 1851 bp long, with the ATG start codon and the TAA stop codon at its ends (GenBank accession no. MN709889). BLASTN alignment showed that EgDSI2 had 99.19% identity with the Eg gene from Bacillus sp. WP8 (CP010075.1) and 97.73% identity with the Eg genes from Bacillus safensis strain U171 (CP015611.1) and B. safensis strain U41 (CP015610.1). These results confirmed that the DNA fragment isolated from B. safensis DSI2 was an Eg gene. EgDSI2 had a low GC content (44%). According to Grosjean and Oshima (2007), mRNAs from thermophilic organisms can have a low GC content: a GC content that is too high may reduce the informational coding potential of mRNAs and may favor the formation of highly stable stem-loops in the mRNA molecule, which can hinder translation. Thus, the GC content of coding sequences does not correlate with the optimal growth temperature of the cell.
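The GC content reported above is simply the fraction of G and C among the bases of the gene. A minimal sketch of that calculation follows, with a sliding-window variant that could be used to look for local GC-rich stretches of the kind associated with stable stem-loops. The window and step sizes are arbitrary illustrative defaults, and the MN709889 sequence itself would have to be supplied.

```python
def gc_content(seq: str) -> float:
    """Percentage of G + C among unambiguous bases (A, C, G, T); other codes are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt


def windowed_gc(seq: str, window: int = 100, step: int = 50) -> list[float]:
    """GC content in sliding windows along the sequence."""
    return [gc_content(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


print(gc_content("ATGGCATCTTAA"))
print(windowed_gc("GGGGCCCCAAAATTTT", window=8, step=4))
```

Applied to the full 1851-bp gene, `gc_content` should return a value close to the 44% reported above.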
The three-dimensional (3D) structure of the thermostable EgDSI2 was predicted using SWISS-MODEL (https://swissmodel.expasy.org/) (Waterhouse et al. 2018) (Figure 5). The model was built on the 3D structure of endo/exocellulase E4 (glycoside hydrolase family 9) from T. fusca (PDB ID 1JS4) as the template. The EgDSI2 sequence had 54.87% identity with this template, the highest among the proteins in the PDB database. The Global Model Quality Estimation (GMQE) score of the model was 0.8 (maximum 1), indicating a fairly high level of reliability. The active-site residues of EgDSI2 were predicted with ScanProsite (https://prosite.expasy.org/scanprosite/) (De Castro et al. 2006) to be Asp58, Asp414, and Glu423. The active sites could not be displayed on the 3D model in SWISS-MODEL, but the analysis showed that all three active-site residues were located in the catalytic domain.
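The claim that all three predicted active-site residues fall in the catalytic domain can be checked directly against the domain boundaries reported from InterProScan in the following subsection (GH9 at residues 6–434, CBM3 at residues 465–546). The sketch below maps residue labels such as "Asp58" to domains and, given the protein sequence, verifies that the expected residue is present at that position. The helper names are illustrative, and the EgDSI2 sequence itself is not included.

```python
import re
from typing import Optional

# Domain boundaries (1-based, inclusive) and active-site labels reported for EgDSI2
DOMAINS = {"GH9": (6, 434), "CBM3": (465, 546)}
ACTIVE_SITES = ["Asp58", "Asp414", "Glu423"]

THREE_TO_ONE = {"Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
                "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
                "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
                "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V"}


def parse_residue(label: str) -> tuple[str, int]:
    """'Asp58' -> ('D', 58)."""
    m = re.fullmatch(r"([A-Z][a-z]{2})(\d+)", label)
    if m is None:
        raise ValueError(f"unrecognized residue label: {label}")
    return THREE_TO_ONE[m.group(1)], int(m.group(2))


def domain_of(position: int) -> Optional[str]:
    """Name of the domain containing a 1-based position, or None (linker/terminus)."""
    for name, (start, end) in DOMAINS.items():
        if start <= position <= end:
            return name
    return None


def residue_matches(protein: str, label: str) -> bool:
    """True if the sequence carries the expected residue at that 1-based position."""
    aa, pos = parse_residue(label)
    return protein[pos - 1] == aa


for label in ACTIVE_SITES:
    print(label, domain_of(parse_residue(label)[1]))
```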

Amino acid composition of EgDSI2
The amino acid sequence of EgDSI2 was obtained by translating its gene sequence with EMBOSS Transeq (https://www.ebi.ac.uk/Tools/st/emboss_transeq/) (Rice et al. 2000). EgDSI2 consisted of 616 amino acids. BLASTP alignment showed that EgDSI2 had 99.35% identity with the Eg from B. safensis strain SCAL1 (KMK71141.1).
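The step from 1851 bp to 616 amino acids follows from reading the gene in frame 1: 1851 / 3 = 617 codons, the last of which is the TAA stop. The sketch below performs that translation with the standard genetic code (NCBI table 1); it is a minimal stand-in for what Transeq does in frame 1, not the EMBOSS implementation, and the toy CDS is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
               for i, b1 in enumerate(BASES)
               for j, b2 in enumerate(BASES)
               for k, b3 in enumerate(BASES)}


def translate(dna: str) -> str:
    """Translate frame 1; stops are '*', codons with ambiguous bases become 'X'."""
    dna = dna.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def protein_from_cds(cds: str) -> str:
    """Translate a complete CDS and drop the terminal stop codon."""
    protein = translate(cds)
    if not protein.endswith("*"):
        raise ValueError("CDS does not end with a stop codon")
    if "*" in protein[:-1]:
        raise ValueError("internal stop codon")
    return protein[:-1]


print(protein_from_cds("ATGAAATTTTAA"))  # MKF
print(1851 // 3 - 1)                     # residues encoded by an 1851-bp CDS: 616
```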
Results from BLASTP and InterProScan (http://www.ebi.ac.uk/interpro/search/sequence-search) (Jones et al. 2014) showed that EgDSI2 had a catalytic domain belonging to glycoside hydrolase family 9 (GH9) and a type 3 cellulose-binding domain (CBM3). Glycoside hydrolases (EC 3.2.1.-) are a widespread group of enzymes that hydrolyze glycosidic bonds between two or more carbohydrates or between a carbohydrate and a non-carbohydrate moiety (Henrissat et al. 1995). The catalytic domain of EgDSI2 was located at residues 6–434, and the CBM3 domain at residues 465–546, toward the C-terminus of the protein. CBM3 is composed of nine β-strands that form a compact domain, arranged in two antiparallel β-sheets that stack together to form a β-sandwich (Tormo et al. 1996; Shimon et al. 2000). The domain architecture of EgDSI2 is shown in Figure 6. The molecular weight of EgDSI2 was predicted with the Compute pI/Mw tool (https://web.expasy.org/compute_pi/) (Gasteiger et al. 2005) to be 69.41 kDa. According to Kanchanadumkerng et al. (2017), Egs from Bacillus sp. have molecular weights in the range of 30–97 kDa. In modular structure and in protein size (about 70 kDa), EgDSI2 resembles the Egs from Bacillus sp. AC1 isolated from Mollusca (Li et al. 2006), endophytic B. pumilus strain CL16 (Lima et al. 2005), and EG IV from Bacillus sp. KSM-522 isolated from soil (Hitomi et al. 1997). Enzymes with a GH9-CBM3 modular organization share common properties, such as activity over a wide pH range, a high optimum temperature, and thermostability (Kanchanadumkerng et al. 2017).
FIGURE 5 Endoglucanase EgDSI2 3D structure predicted by SWISS-MODEL. The protein is composed of two domains: the GH9 domain, shown on the left, is rich in α-helices; the CBM3 domain, shown on the right, is formed by β-strands arranged in a β-sandwich fashion.
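A molecular weight like the 69.41 kDa above is obtained by summing the masses of the residues and adding one water for the free termini. The sketch below does that with commonly tabulated average residue masses; these values are an assumption (they are not taken from the Expasy tool), so small differences from Compute pI/Mw output are expected, and post-translational modifications are ignored.

```python
# Approximate average residue masses in Da (amino acid minus H2O)
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # added once for the N-terminal H and C-terminal OH


def protein_mw(protein: str) -> float:
    """Average molecular weight (Da) of an unmodified polypeptide."""
    protein = protein.upper().rstrip("*")  # tolerate a trailing stop symbol
    return sum(AVG_RESIDUE_MASS[aa] for aa in protein) + WATER


print(round(protein_mw("G"), 2))   # glycine, ~75.07 Da
print(round(protein_mw("GG"), 2))  # glycylglycine, ~132.12 Da
```

For a 616-residue protein, the result divided by 1000 gives the mass in kDa for comparison with the predicted value.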
The amino acid composition of EgDSI2 was determined with ProtParam (https://web.expasy.org/protparam/) (Gasteiger et al. 2005) and is displayed as a pie chart (Figure 7). The amino acid composition of the thermostable EgDSI2 was compared with those of the thermostable Eg from Bacillus sp. strain KSM-635 (PDB ID 1G0C), the thermolabile Eg from the mesophilic bacterium B. agaradhaerens (PDB ID 7A3H), and the thermolabile Eg from the mesophilic bacterium B. licheniformis (PDB ID 2JEN). The comparison was performed to identify amino acids that are more abundant in the thermostable Egs. The amino acid compositions of the four Egs are compared in Table 4.
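The comparisons that follow group residues into charged (Arg, Lys, Asp, Glu, His), polar (Gln, Asn, Ser, Thr, Tyr, Cys), and hydrophobic (Ala, Gly, Ile, Leu, Met, Phe, Val, Pro, Trp) classes. The sketch below computes per-residue and per-group percentages with exactly that grouping, plus percentage-point differences for selected residues between two sequences. It is not ProtParam itself, and the sequences of EgDSI2, 1G0C, 7A3H, and 2JEN would need to be supplied.

```python
from collections import Counter

# Residue groupings as used in the composition comparison (Gly counted as hydrophobic)
GROUPS = {
    "charged": set("RKDEH"),
    "polar": set("QNSTYC"),
    "hydrophobic": set("AGILMFVPW"),
}
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")


def composition(protein: str) -> tuple[dict[str, float], dict[str, float]]:
    """Per-residue and per-group percentages over the 20 standard residues."""
    counts = Counter(aa for aa in protein.upper() if aa in STANDARD)
    n = sum(counts.values())
    per_residue = {aa: 100.0 * counts[aa] / n for aa in sorted(STANDARD)}
    per_group = {g: 100.0 * sum(counts[aa] for aa in members) / n
                 for g, members in GROUPS.items()}
    return per_residue, per_group


def difference(a: str, b: str, residues: str = "APGW") -> dict[str, float]:
    """Percentage-point difference (a minus b) for selected residues."""
    ra, _ = composition(a)
    rb, _ = composition(b)
    return {aa: ra[aa] - rb[aa] for aa in residues}


per_res, per_grp = composition("AAPGDEST*")  # toy sequence; '*' is ignored
print(per_grp)
print(difference("AAPG", "AGGG", "AG"))
```

A positive Pro value and a negative Gly value from `difference(thermostable, thermolabile)` correspond to the "more Pro, fewer Gly" pattern discussed below.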
The thermostable EgDSI2 had a higher percentage of charged amino acids (Arg, Lys, Asp, Glu, His) than the thermolabile Eg 2JEN, but a lower percentage than the thermolabile Eg 7A3H. EgDSI2 also had a lower percentage of charged amino acids than the thermostable Eg 1G0C, which in turn had a lower percentage than Eg 7A3H. In general, thermostable proteins have more charged amino acids capable of forming ion pairs, which act as weak bonds that maintain the molecular conformation of the protein (Kumar et al. 2007). However, the role of ion pairs in protein thermostability is controversial. Most experiments and comparisons between thermostable and thermolabile proteins have shown that ion pairs increase thermostability. On the other hand, some studies have indicated that ion pairs (salt bridges) usually destabilize the native conformation of proteins, because forming an ion pair in the protein core incurs a desolvation penalty that is energetically unfavorable and is not fully compensated (Sadeghi et al. 2006). Ion pairs and their networks do not account for all the charged amino acids in a protein (Kajander et al. 2000). The high number of ion pairs in thermostable proteins is not due to a high number of charged residues, but to a high number of positively charged residues bound to negatively charged residues (Sadeghi et al. 2006). The lower content of charged amino acids in the thermostable EgDSI2 than in the thermolabile Eg 7A3H could be an indirect consequence of its high percentage of hydrophobic amino acids. The Glu percentage in the thermostable EgDSI2 was lower than in the thermolabile Eg 7A3H but higher than in the thermolabile Eg 2JEN, whereas the Glu percentage in the thermostable Eg 1G0C was higher than in both 7A3H and 2JEN. Nevertheless, Sadeghi et al. (2006) found that both thermostable proteins and thermolabile proteins from mesophilic organisms have a high Glu content, which argues against a role for Glu in increasing protein thermostability.
The percentage of polar amino acids (Gln, Asn, Ser, Thr, Tyr, Cys) in the thermostable EgDSI2 was higher than in the thermostable Eg 1G0C. It was also higher than in the thermolabile Eg 7A3H but lower than in the thermolabile Eg 2JEN. Because of this inconsistency, no conclusion could be drawn from this comparison. However, within EgDSI2 itself, polar amino acids were less frequent than hydrophobic amino acids. This low content of polar amino acids agrees with the study by Sadeghi et al. (2006), and may serve to avoid deamidation and backbone cleavage in proteins (Sadeghi et al. 2006). On the other hand, polar amino acids can form hydrogen bonds, so a high percentage of polar amino acids would be expected to increase hydrogen-bonding capability. A hydrogen bond involves three atoms, i.e. one H atom and two electronegative atoms (often N or O) (Jeong et al. 2003). Hydrogen bonds are important for stabilizing protein structure, and thus retaining activity, at extreme temperatures (Ishak et al. 2019). The number and types of hydrogen bonds in a protein are related to its thermostability (Kar and Scheiner 2004; Ragone 2001). Dalhus et al. (2002) found that malate dehydrogenase (MDH) from thermophilic bacteria had the highest number of hydrogen bonds compared with MDH from moderately thermophilic and mesophilic bacteria. Similarly, Sadeghi et al. (2006) showed that thermostable proteins had more hydrogen bonds than thermolabile proteins from mesophilic organisms.
The percentage of hydrophobic amino acids (Ala, Gly, Ile, Leu, Met, Phe, Val, Pro, Trp) in the thermostable EgDSI2 and the thermostable Eg 1G0C was higher than in the thermolabile Eg 2JEN but lower than in the thermolabile Eg 7A3H. Sadeghi et al. (2006) showed that thermostable proteins contain a high percentage of hydrophobic amino acids. Ala was more abundant in the thermostable EgDSI2 and the thermostable Eg 1G0C than in the thermolabile Egs 7A3H and 2JEN. This is consistent with Kumwenda et al. (2013), who reported that proteins from the extreme thermophile Thermus thermophilus HB27 had a higher frequency of Ala than proteins from the thermotolerant T. scotoductus SA01. The percentage of Pro in the thermostable EgDSI2 and the thermostable Eg 1G0C was higher than in both 7A3H and 2JEN, in accordance with other studies showing that Pro is more frequent in thermostable than in thermolabile proteins (Watanabe et al. 1991; Sadeghi et al. 2006). Pro has been used in mutational studies to improve protein stability (Veltman et al. 1996; Van den Burg et al. 1998). Pro is the only amino acid whose side chain is covalently bonded to both the α-carbon and the backbone nitrogen atom, forming a ring structure (Koenig et al. 2018). This ring restricts rotation about the N-Cα bond (Yu et al. 2015). Pro also restricts the conformation of the preceding residue in the protein sequence (Bajaj et al. 2007). Pro is known to rigidify flexible regions and enhance thermostability (Yu et al. 2015). These results suggest that the thermostable EgDSI2 is a more rigid enzyme than the thermolabile Egs 7A3H and 2JEN.
However, the percentages of Trp and Gly in the thermostable EgDSI2 and the thermostable Eg 1G0C were lower than in the two thermolabile Egs. Sadeghi et al. (2006) also reported that thermolabile proteins had a high Trp content. Together, these findings indicate that Trp does not account for thermostability. Gly has the simplest side chain, a single hydrogen atom, and greater conformational freedom, so it can provide flexibility to adjacent residues (Yan and Sun 1997). Reducing the Gly content of thermostable proteins reduces flexibility and thus enhances rigidity, preventing the protein from unfolding at high temperature (Ladenstein and Antranikian 1998). The lower Gly percentage in the thermostable EgDSI2 suggests that it is more rigid than the thermolabile Egs 7A3H and 2JEN.

Conclusions
This study described the cloning and molecular characterization of an endoglucanase (Eg) from B. safensis DSI2. Amino acid composition analysis showed that EgDSI2 has the characteristics of thermostable proteins, i.e. a high percentage of hydrophobic residues, a high Pro content, and a low Gly content. Its GH9-CBM3 modular organization suggests that EgDSI2 is active over a wide pH range, has a high optimum temperature, and is thermostable. These properties would be advantageous for industrial applications.