Cloning and characterization of bgl6111 gene encoding β‐glucosidase from bagasse metagenome

β‐Glucosidase (BGL) is an essential enzyme for cellulose hydrolysis in industrial processes, but natural BGL enzymes remain poorly explored. Metagenomics is a robust bioprospecting tool for finding novel enzymes in the genomic DNA of whole environmental communities. The metagenomic approach simplifies the search for new BGL enzymes: DNA is extracted directly from the environment, and its gene content is retrieved through a series of bioinformatic analyses. In this study, we report the cloning of the bgl6111 gene (accession number MW221260), its heterologous expression in Pichia pastoris KM71, and the biochemical characterization of the recombinant enzyme. The bgl6111 sequence is 2,520 bp long and encodes 839 amino acids with a predicted molecular mass of 89.4 kDa. The deduced amino acid sequence showed 67.61% similarity to a BGL from an uncultured bacterium (ABB51613.1). The recombinant BGL reached its highest activity on the third day of induction (1.210 U/mL), which we classify as low production. Ultrafiltration increased the activity by 539.8%, to 7.742 U/mL. Our findings suggest that bgl6111 from the bagasse metagenome could be an alternative candidate for future industrial applications.


Introduction
β-Glucosidases (BGL; EC 3.2.1.21) belong to the cellulase enzyme complex and are classified among the glycoside hydrolases (GH) (Ahmed et al. 2017; Rouyi et al. 2014). These enzymes hydrolyze glycosidic bonds to release non-reducing terminal glucosyl residues (Rouyi et al. 2014). BGL enzymes are vital contributors to cellulose hydrolysis in various industrial biotechnology processes, including composting (Zang et al. 2018), breakdown of naringin in grapefruit juice (Prakash et al. 2002), and ethanol production (Tang et al. 2013). BGL enzymes are also used to produce aromatic compounds for the flour industry, in wine production, in the hydrolysis of anthocyanin products, to enhance the organoleptic qualities of fruits and juices (Singh et al. 2016), and as animal feed additives that improve digestibility (Singhania et al. 2016). Despite these wide-ranging applications, the exploration of natural BGL enzymes remains relatively limited.
Nature offers a diverse array of BGL enzymes with different types and characteristics that hold potential for development in industrial processes.
Methods for cloning and expressing recombinant proteins are necessary for enzyme exploration (Zhao et al. 2013). Researchers have explored various bgl genes encoding BGL enzymes produced by microorganisms. Chamoli et al. (2016) isolated a bgl gene from Bacillus subtilis and expressed it in Escherichia coli, producing a recombinant enzyme with a specific activity of 54.04 U/mg. Yang et al. (2015) expressed a bgl gene from Thermoanaerobacterium aotearoense in E. coli with a specific activity of 740.5 U/mg. Besides being isolated from cultured microorganisms, bgl genes can be obtained directly from environmental DNA (Mercedes et al. 2016; Matsuzawa et al. 2017). Genes extracted directly from the natural environment via a metagenomic approach are potential sources of unexplored enzymes, allowing researchers to explore genetic resources that have not yet been revealed (Prayogo et al. 2020).
Demand for this enzyme has grown in recent years because β-glucosidases (BGL) serve multiple purposes as nutraceuticals and pharmaceuticals owing to their recognition ability, roles in signaling processes, and antibiotic properties (Bhatia et al. 2002). Novel BGLs are abundant in natural environments, and the potential of bgl genes can be explored based on environmental characteristics, because the composition of an environment shapes the types of genes present in it. For example, sugarcane bagasse piles are a promising environment for exploring bgl genes: they represent a unique ecological niche rich in lignocellulose. Microbial communities in this environment provide valuable functional gene resources for discovering lignocellulolytic enzymes (Mhuantong et al. 2015). Exploring bgl genes in the sugarcane bagasse environment may therefore lead to the discovery of a novel BGL as an alternative candidate enzyme for industrial purposes.
This research aimed to clone a sequence from among several lignocellulolytic enzyme genes previously identified through activity-based screening of the bagasse metagenomic library (Kanokratana et al. 2015; Mhuantong et al. 2015). The recombinant construct was built in the pPICZαA plasmid, propagated in E. coli DH5α, and expressed in Pichia pastoris KM71. Sequence analysis revealed unique and conserved biomass-degrading enzymes in this metagenomic library, including bgl6111, a novel β-glucosidase (BGL) similar to the BGL from an uncultured bacterium (ABB51613.1). We also provide a predicted structural model of bgl6111, which indicates that the recombinant protein is closely related to GH3, and the amino acid sequence gives a predicted molecular weight of 89.4 kDa. Finally, we present a simple ultrafiltration method that increased the measured enzyme activity by 539.8% compared with the untreated supernatant.

Microbial strain, plasmid, and culture
The pPICZαA vector (Invitrogen) was used for gene construction. The bgl6111 gene from the bagasse metagenomic library, collected from a bagasse pile at Phu Khieo BioEnergy, Chaiyaphum Province, Thailand, was used as the DNA insert (Mhuantong et al. 2015). Recombinant DNA was introduced into E. coli DH5α by electroporation (Bio-Rad) at 15 kV/cm, 100 Ω, and 25 μF. E. coli DH5α was used as the propagation host, and P. pastoris KM71 as the expression host (Haniyya et al. 2021).
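The electroporation setting is given as a field strength, so the voltage dialed on the pulser depends on the cuvette gap, and the resistor and capacitor settings fix a nominal time constant. The sketch below shows that arithmetic, assuming a 0.1 cm cuvette (the gap is not stated in the Methods). The RC value is a nominal upper bound; the time constant a pulser reports is usually somewhat shorter because the sample's own resistance acts alongside the selected resistor.

```python
def pulse_voltage_kv(field_kv_per_cm: float, gap_cm: float) -> float:
    """Voltage to set on the pulser for a target field strength and cuvette gap."""
    return field_kv_per_cm * gap_cm


def nominal_time_constant_ms(resistance_ohm: float, capacitance_uf: float) -> float:
    """Nominal RC time constant: ohms x farads gives seconds; converted to ms."""
    return resistance_ohm * (capacitance_uf * 1e-6) * 1e3


GAP_CM = 0.1  # ASSUMPTION: 0.1 cm cuvette; the gap is not stated in the Methods

print(f"Voltage setting: {pulse_voltage_kv(15.0, GAP_CM):.2f} kV")
print(f"Nominal time constant: {nominal_time_constant_ms(100.0, 25.0):.2f} ms")
```

With a 0.2 cm cuvette the same 15 kV/cm field would instead require 3.0 kV, which is why the gap matters when reproducing the protocol.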

Construction of cloned bgl expression vector
A full-length sequence of the bgl gene (contig no. 6111) was identified in the metagenomic library constructed from the bagasse pile. The gene was amplified using the gene-specific primers BGL6111/EcoRI/F (5′-GAATTCATGGCATGCGTGCTCGCAGCCTTT-3′) and BGL6111/XbaI/R (5′-TCTAGATCATCCCGTGCACGGAAGGGTGCC-3′). Each 50 µL PCR reaction contained 50 ng of DNA template, 1× Phusion GC buffer, 200 µM of each dNTP, 0.5 µM of each primer, and 1 unit of Phusion DNA polymerase. Amplification consisted of initial denaturation at 95 °C for 5 min; 25 cycles of denaturation at 95 °C for 30 s, annealing at 55 °C for 30 s, and elongation at 72 °C for 3 min (the extension rate was taken as 30 s/kb); and a final extension at 72 °C for 10 min.
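The primer design and elongation step can be sanity-checked in a few lines. This sketch, which is an illustration rather than the authors' workflow, confirms that each primer starts with its restriction site (EcoRI GAATTC, XbaI TCTAGA), that the forward primer's gene-specific part begins with the ATG start codon, and that 3 min comfortably exceeds the minimum elongation time at 30 s/kb for the 2,535 bp amplicon reported in the Results. The helper names are hypothetical.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence (G^AATTC)
XBAI_SITE = "TCTAGA"   # XbaI recognition sequence (T^CTAGA)

FORWARD = "GAATTCATGGCATGCGTGCTCGCAGCCTTT"  # BGL6111/EcoRI/F, 5'->3'
REVERSE = "TCTAGATCATCCCGTGCACGGAAGGGTGCC"  # BGL6111/XbaI/R, 5'->3'


def gene_specific_part(primer: str, site: str) -> str:
    """Return the primer sequence that follows a 5'-terminal restriction site."""
    if not primer.startswith(site):
        raise ValueError(f"primer does not start with {site}")
    return primer[len(site):]


def min_extension_seconds(amplicon_bp: int, seconds_per_kb: float = 30.0) -> float:
    """Minimum elongation time per cycle at a given polymerase extension rate."""
    return amplicon_bp / 1000.0 * seconds_per_kb


fwd_gene = gene_specific_part(FORWARD, ECORI_SITE)
rev_gene = gene_specific_part(REVERSE, XBAI_SITE)
print(len(FORWARD), len(fwd_gene), fwd_gene[:3])  # 30 24 ATG
print(len(REVERSE), len(rev_gene))                # 30 24

needed = min_extension_seconds(2535)  # amplicon size reported in the Results
print(f"{needed:.0f} s needed vs 180 s programmed")
```

The 3 min step therefore gives more than twice the minimum time, which is a safe margin for a GC-rich template.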
The PCR product was purified using the GeneJET Gel Extraction Kit (Thermo Scientific). The purified blunt-ended PCR product was digested with EcoRI and XbaI to generate sticky ends. The digestion reaction contained 2× Tango buffer, 1 μg of PCR product, and 5 units each of EcoRI and XbaI, and was incubated at 37 °C for 2 h. The digested fragment was then ligated into the pPICZαA plasmid. The ligation mixture contained 50 ng of insert DNA, 50 ng of plasmid, 1× T4 ligase buffer, and 5 units of T4 DNA ligase, and was incubated at 22 °C for 16–18 h. The ligation mixture was then transformed into E. coli DH5α by heat shock. Transformants harboring the correct recombinant plasmid were selected on LB agar containing 25 µg/mL Zeocin and screened by colony PCR using the 5′AOX1 forward and 3′AOX1 reverse primers. The gene sequence in the plasmid was subsequently confirmed by conventional sequencing (Macrogen).
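Because the ligation uses equal masses (50 ng) of insert and vector, the molar ratio follows from their lengths. The sketch below uses the digested sizes reported in the Results (2,529 bp insert, 3,530 bp vector) and the common approximation of 650 g/mol per base pair; it is an illustrative calculation, not part of the original protocol.

```python
BP_MASS_G_PER_MOL = 650.0  # approximate average mass of one dsDNA base pair


def dsdna_fmol(mass_ng: float, length_bp: int) -> float:
    """Convert a dsDNA mass to femtomoles: ng / (bp * 650 g/mol), scaled to fmol."""
    return mass_ng * 1e6 / (length_bp * BP_MASS_G_PER_MOL)


INSERT_BP = 2529  # digested insert size reported in the Results
VECTOR_BP = 3530  # digested pPICZαA backbone size reported in the Results

insert_fmol = dsdna_fmol(50.0, INSERT_BP)
vector_fmol = dsdna_fmol(50.0, VECTOR_BP)
print(f"insert {insert_fmol:.1f} fmol, vector {vector_fmol:.1f} fmol")
print(f"insert:vector molar ratio = {insert_fmol / vector_fmol:.2f}:1")
print(f"expected recombinant plasmid = {INSERT_BP + VECTOR_BP} bp")
```

This gives about 30.4 fmol of insert and 21.8 fmol of vector, a ratio of roughly 1.4:1, and a 6,059 bp product matching the recombinant plasmid size in the Results.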

Screening and expression of P. pastoris KM71 transformants
The recombinant plasmid pPICZαA-bgl6111 was isolated from E. coli DH5α using the GeneJET Plasmid Miniprep Kit. The plasmid was then linearized with PmeI to facilitate integration of pPICZαA-bgl6111 at the AOX1 locus of P. pastoris KM71. The linearized plasmid was transformed into P. pastoris KM71 by electroporation, and the cells were plated on YPD containing 100 μg/mL Zeocin and incubated at 30 °C for 3 days. A total of 300 transformant colonies were spotted onto fresh YPD containing 100 μg/mL Zeocin. Positive clones were confirmed by colony PCR with the 5′AOX1 forward and 3′AOX1 reverse primers and with the OUTAOX_F and REVα965R primers. PCR was carried out with 2× GoTaq Green buffer (Promega) in a total volume of 25 μL. Amplification consisted of 95 °C for 5 min, followed by 25 cycles of 95 °C for 30 s, 55 °C for 30 s, and 72 °C for 3 min, and a final extension at 72 °C for 10 min.
Positive transformant colonies were randomly selected from the master plate, cultured in 50 mL of BMGY, and incubated at 30 °C until the OD600 reached 2–6. The culture was centrifuged at 3,000 × g for 5 min, and the cell pellet was resuspended in 10 mL of BMMY medium and incubated at 30 °C for 3 days. Induction was maintained for 3 days by adding methanol daily to keep the methanol concentration at 0.5%. Enzyme activity was measured daily for 3 days (Day 1 = D1; Day 2 = D2; Day 3 = D3).
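The daily methanol feed can be expressed as a volume per culture size. This sketch assumes that methanol from the previous day has been consumed or evaporated, so each feed restores 0.5% (v/v) from roughly zero; that assumption is not stated in the Methods, and in practice residual methanol would call for a smaller addition. It also ignores the tiny volume increase caused by the addition itself.

```python
def methanol_feed_ul(culture_ml: float, target_percent_v_v: float) -> float:
    """Volume of neat methanol (uL) that brings a methanol-free culture to the target % (v/v).

    ASSUMPTION: no residual methanol from the previous feed; the small volume
    increase from the addition itself is ignored.
    """
    return culture_ml * target_percent_v_v / 100.0 * 1000.0


for volume_ml in (10.0, 40.0):  # BMMY induction volumes given in the Methods
    print(f"{volume_ml:.0f} mL culture: add {methanol_feed_ul(volume_ml, 0.5):.0f} uL methanol per day")
```

Under these assumptions the 10 mL induction needs about 50 µL per day and the scaled-up 40 mL induction about 200 µL per day.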
Ultrafiltration can increase enzyme concentration (Nor et al. 2018). For this purpose, the BMGY culture was scaled up to 200 mL and incubated at 30 °C until the OD600 reached 2–6. The culture was centrifuged at 3,000 × g for 5 min, and the cells were cultivated in 40 mL of BMMY at 30 °C for 3 days. The culture was then centrifuged at 3,000 × g for 10 min, and the supernatant was concentrated by ultrafiltration (30 kDa cutoff centrifugal filter, Amicon).

Enzyme assay and SDS-PAGE analysis
BGL activity was measured using 1 mM p-nitrophenyl-β-D-glucopyranoside (pNPG) as the substrate, dissolved in 50 mM citrate buffer (pH 5.0). Blank and sample tubes, each containing 80 μL of 1 mM pNPG, were preincubated at 50 °C for 5 min. Then, 20 μL of BGL enzyme was added to the sample tube, and the reaction was incubated at 50 °C for 10 min. The reaction was stopped by adding 25 μL of 1 M Na2CO3 to both the blank and sample tubes, after which 20 μL of BGL enzyme was added to the blank. The solutions were transferred to a 96-well plate, and the released p-nitrophenol was quantified spectrophotometrically at 405 nm (Gao et al. 2016). One unit (U) of BGL activity was defined as the release of 1 nmol of p-nitrophenol per minute under standard assay conditions, and activities are reported per milliliter of enzyme (U/mL).

Sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) was performed to confirm the protein product and determine its size. Exactly 10 µL of the expression sample was mixed with 1× loading dye in a microtube. The mixture was heated at 100 °C for 5 min and centrifuged at 2,000 rpm for 5 s at 4 °C. Approximately 20 μL of each sample and 10 μL of marker were loaded into the wells. Markers (14.4–120 kDa) were used to estimate the molecular weight of the protein. SDS-PAGE was run for 80 min at an electric current of 30 mA, and the protein bands were visualized by Coomassie blue staining.
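The unit definition in the pNPG assay above (1 U = 1 nmol p-nitrophenol per minute, reported per mL of enzyme) turns an absorbance reading into activity once the absorbance is converted to nanomoles. The Methods do not give that conversion, so this sketch assumes a linear p-nitrophenol standard curve; the slope and the absorbance readings below are hypothetical values chosen only to illustrate the arithmetic. The 10 min reaction time and 20 µL enzyme volume come from the protocol.

```python
def bgl_activity_u_per_ml(
    a405_sample: float,
    a405_blank: float,
    slope_a405_per_nmol: float,
    reaction_min: float = 10.0,
    enzyme_ml: float = 0.020,
) -> float:
    """Volumetric BGL activity in U/mL (1 U = 1 nmol p-nitrophenol per minute).

    slope_a405_per_nmol is ASSUMED to come from a p-nitrophenol standard curve
    prepared in the same final well volume (80 + 20 + 25 = 125 uL) and reader.
    """
    pnp_nmol = (a405_sample - a405_blank) / slope_a405_per_nmol
    return pnp_nmol / reaction_min / enzyme_ml


# HYPOTHETICAL readings and standard-curve slope, for illustration only.
activity = bgl_activity_u_per_ml(a405_sample=0.55, a405_blank=0.05, slope_a405_per_nmol=0.02)
print(f"{activity:.1f} U/mL")
```

Here a blank-corrected ΔA405 of 0.5 corresponds to 25 nmol of p-nitrophenol, or 2.5 nmol/min from 0.020 mL of enzyme, i.e. 125 U/mL. Subtracting the blank, which received enzyme only after the stop solution, removes the enzyme preparation's own absorbance.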

Bioinformatic analysis
BLAST was used for molecular analysis of the bgl6111 gene (Zhang et al. 2017). The ExPASy Translate tool (https://www.expasy.org/) was used to translate the bgl6111 DNA sequence into an amino acid sequence and predict its molecular weight. Secondary structure was predicted with the Chou & Fasman method (Ashok Kumar 2013). A 3D structural model of the bgl6111 protein was generated with Phyre2 (Kelley et al. 2015). Multiple sequence alignment was performed with Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/) (Lu et al. 2013; Chamoli et al. 2016). A phylogenetic tree was constructed in MEGA 7 using the neighbor-joining method with 1,000 bootstrap replicates (Dodda et al. 2018). The gene sequence has been deposited in the NCBI database under accession number MW221260.
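The translation step performed with ExPASy Translate can be illustrated with a standard-library sketch of the standard genetic code (NCBI table 1); this is not the ExPASy implementation. The full bgl6111 sequence is not reproduced here, so the example translates the gene-specific parts of the two cloning primers. Treating them as the ORF's first and last codons is an inference from the primer design (EcoRI site followed by ATG; reverse complement of the XbaI primer ending in TGA).

```python
BASES = "TCAG"
# NCBI standard genetic code (table 1); codons ordered by BASES at positions 1, 2, 3.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, to_stop: bool = True) -> str:
    """Translate the first reading frame; '*' marks a stop codon, 'X' an ambiguous codon."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for pos in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE.get(dna[pos:pos + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


# Gene-specific part of BGL6111/EcoRI/F (EcoRI site removed): presumed ORF start.
print(translate("ATGGCATGCGTGCTCGCAGCCTTT"))  # MACVLAAF
# Reverse complement of the gene-specific part of BGL6111/XbaI/R: presumed ORF end.
print(translate("GGCACCCTTCCGTGCACGGGATGA"))  # GTLPCTG (stops at TGA)
```

Run on the full 2,520 bp sequence, the same function would return the 839-residue protein if the ORF is uninterrupted.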

Cloning of bgl6111 gene on pPICZαA
Cloning of the bgl6111 gene began with PCR amplification, which also added an EcoRI restriction site at the 5′ end and an XbaI site at the 3′ end. Amplification with the specific primer pair bgl6111/EcoRI_F and bgl6111/XbaI_R produced a DNA fragment of 2,535 bp. Figure 1 shows the band confirming the presence of the bgl6111 gene.
The DNA vector (pPICZαA) and the DNA insert (bgl6111 gene) were digested with the same enzymes to make their ends compatible. After digestion, the insert was 2,529 bp and the vector fragment was 3,530 bp. Ligation of the insert and vector produced a 6,059 bp recombinant plasmid, which was introduced into E. coli DH5α by heat shock. Colonies grew on medium containing 25 µg/mL Zeocin and were verified by colony PCR using the bgl6111/EcoRI_F and bgl6111/XbaI_R primers. The verification result indicated that the bgl6111 gene had been successfully transformed into E. coli DH5α (data not shown). The recombinant plasmid was then sequenced by the Sanger method to confirm the nucleotide sequence of the bgl6111 gene.

Expression of recombinant plasmid pPICZαA-bgl6111
The recombinant plasmid pPICZαA-bgl6111 was isolated from E. coli DH5α using the GeneJET Plasmid Miniprep Kit (Thermo Scientific) and linearized by PmeI digestion. The linearized plasmid was then transformed into P. pastoris KM71 by electroporation, and colonies grew on medium containing 100 μg/mL Zeocin. The 5′AOX1 forward and 3′AOX1 reverse primers were used to confirm the presence of the expression cassette (3,056 bp). In addition, the OUTAOX_F and REVα965R primers were used to confirm integration across the AOX1 promoter and part of the α-factor secretion signal (approximately 1,300 bp). The results showed that the bgl6111 gene was successfully integrated into the P. pastoris genome, indicating a successful transformation.
Selected colonies were cultured to produce the recombinant protein by induction in fed-batch fermentation mode (Looser et al. 2015). Methanol in BMMY induced gene expression via the AOX1 promoter. Protein production was checked by SDS-PAGE, and enzyme activity was measured with the pNPG substrate daily for 3 days (D1, D2, and D3 for the first, second, and third days, respectively).

Confirmation of the presence of BGL
The enzyme product was incubated with the pNPG substrate, and absorbance was measured at 405 nm. The enzyme was expressed as a soluble protein with activities of 0.775 U/mL at D1, 1.173 U/mL at D2, and 1.210 U/mL at D3. Ultrafiltration was then used to concentrate the enzyme, which raised the activity of the recombinant protein to 7.742 U/mL (Figure 2), an increase of 539.8% relative to the D3 activity.
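The 539.8% figure is a percentage increase, not a fold change, which is easy to misread. The short calculation below reproduces it from the D3 and post-ultrafiltration activities reported in this section and shows the equivalent fold change.

```python
def percent_increase(before: float, after: float) -> float:
    """Relative increase of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


d3_u_per_ml = 1.210  # highest activity before ultrafiltration (D3)
h1_u_per_ml = 7.742  # activity after ultrafiltration (H1)

print(f"Fold change: {h1_u_per_ml / d3_u_per_ml:.2f}x")
print(f"Increase: {percent_increase(d3_u_per_ml, h1_u_per_ml):.1f}%")
```

An increase of 539.8% therefore corresponds to roughly 6.4 times the D3 activity.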
SDS-PAGE was used to confirm the presence of the recombinant BGL. The SDS-PAGE results (Figure 3) showed a protein of 89.4 kDa in the H1 lane, matching the predicted size, whereas the control showed no band. This result confirms that the bgl6111 gene derived from the sugarcane bagasse metagenome can be expressed in P. pastoris KM71.

Sequence analysis of bgl6111 gene
The bgl6111 gene is 2,520 bp long, encodes an 839-amino-acid protein, and has a GC content of 67%. BLAST analysis showed that the bgl6111 gene had 67.61% similarity to a BGL from an uncultured bacterium (ABB51613.1). The recombinant protein encoded by bgl6111 was predicted to have a molecular weight of 89.4 kDa. The theoretical isoelectric point and instability index were 5.41 and 40.44, respectively. Because a protein with an instability index above 40 is classified as unstable (Shrestha et al. 2017), bgl6111 is predicted to be unstable, falling just above the threshold.
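The sequence statistics above are related by simple arithmetic. This sketch shows how GC content is computed, how a 2,520 bp ORF yields 839 residues when the stop codon is counted in the gene length (an assumption consistent with the reported numbers), and how the instability index is classified against the cut-off of 40. The instability index itself requires a dipeptide weight table and is not recomputed here.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def protein_length(orf_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of the given length."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


def is_unstable(instability_index: float, threshold: float = 40.0) -> bool:
    """Instability index above the threshold -> predicted unstable."""
    return instability_index > threshold


print(protein_length(2520))  # 840 codons including the stop codon -> 839 residues
print(is_unstable(40.44))    # True, but only 0.44 above the cut-off
# GC content of the first 24 nt of the ORF (from the forward primer); the full
# bgl6111 sequence would be passed the same way.
print(f"{gc_percent('ATGGCATGCGTGCTCGCAGCCTTT'):.1f}% GC")
```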
Phyre2 predicted the tertiary structure of the bgl6111 protein with 100% confidence and 49% identity based on the PDB template c3f93D, a glycosyl hydrolase family 3 (GH3) enzyme from Pseudoalteromonas sp. BB1. Figure 4a shows the ligand-binding site predicted by 3DLigandSite, highlighted in green; Figure 4b shows the catalytic residues Asp320 (nucleophile) and Glu520 (proton donor) (Wass et al. 2010; Kelley et al. 2015). Pfam analysis detected three conserved domains in the bgl6111 sequence: the glycosyl hydrolase family 3 N-terminal domain, the GH3 C-terminal domain, and a galactose-binding domain-like region. These results indicate that the recombinant protein is closely related to GH3. Secondary structure prediction with the Chou & Fasman method indicated that α-helix was the main structural element of the recombinant BGL, accounting for 66.6% of the sequence (Ashok Kumar 2013).
In this study, we used sequence information from the bagasse metagenomic library described by Mhuantong et al. (2015). That report identified several genes contributing to lignocellulose degradation, and the bgl6111 gene was selected from among them for heterologous expression in this work. The aim of the cloning and expression was to explore new biomass-degrading enzymes from the bagasse pile environment.
Gene cloning was successful, yielding positive E. coli DH5α transformants. E. coli DH5α served as the propagation host; this strain has been used in many studies because it propagates plasmid DNA (pDNA) well (Borja et al. 2012; Trivedi et al. 2014). The identified bgl6111 gene is 2,520 bp long. Other researchers have found BGL-encoding genes of varying sizes in the environment. Del Pozo et al. (2012) found a 2,361 bp BGL gene (SRF2g14) from bovine rumen microorganisms, and Gomes-Pepe et al. (2016) found a 2,300 bp BGL gene (Bgl10) from soil microorganisms. These differences in gene size reflect the diversity of genes in the environment: genes from various microorganisms can encode the same orthologous function (Pearson 2013).
The full-length bgl6111 gene was obtained by PCR using specific primers designed to add cloning sites compatible with the vector (pPICZαA) (Hoseini and Sauer 2015). The recombinant enzyme was successfully produced and secreted. Chen et al. (2011) cloned a BGL-encoding gene in the pPICZαA plasmid, and Wang et al. (2017) likewise cloned the bgl2 gene in pPICZαA. Both studies took advantage of the plasmid's α-factor feature, which functions as a secretion signal in P. pastoris.
In the current work, the recombinant plasmid pPICZαA-bgl6111 was successfully transformed into P. pastoris KM71, with the insert integrating into the genome at the AOX1 locus by homologous recombination (Vogl et al. 2018). The enzyme products were verified by activity measurement and SDS-PAGE analysis. The enzyme activities for D1, D2, D3, and H1 are shown in Figure 2. SDS-PAGE also indicated a low level of gene expression, reflected in the faint band (Choi and Geletu 2018). This outcome is consistent with previous research relating low protein activity to faint bands in SDS-PAGE.
Protein expression is a complex process, and various factors can affect the yield of a recombinant enzyme. These factors may be genetic or related to the culture process. Genetic factors include the promoter, gene dosage, gene sequence, and post-translational modification of the protein (Wang et al. 2017; Yu et al. 2017). Yu et al. (2014) stated that culture conditions, such as temperature, induction duration, and culture volume, can significantly influence the yield of recombinant protein in P. pastoris. The present study did not analyze these factors; it only measured the codon adaptation index (CAI), which may be one cause of low gene expression (Chuck et al. 2009).
According to Behura and Severson (2013), codon bias is the more frequent use of specific codons than of their synonymous codons. Codon bias often affects heterologous gene expression. Quax et al. (2015) stated that codon bias can be assessed through the CAI value. Therefore, CAI analysis was carried out in this work with a web-based tool from GenScript (www.genscript.com) (Farshadpour et al. 2015). The CAI value of the bgl6111 gene sequence was 0.47. CAI values from 0.8 to 1.0 are generally considered favorable for expression, and a low CAI value indicates a low level of gene expression. Hence, codon bias may be one cause of the low activity of the BGL product in this work.
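The CAI follows a standard definition: each codon gets a relative adaptiveness w, its count in a reference set of highly expressed host genes divided by the count of the most used synonymous codon, and the CAI of a gene is the geometric mean of w over its codons. The sketch below illustrates that definition; it is not GenScript's implementation, and the reference counts are hypothetical, not real P. pastoris codon usage.

```python
import math


def relative_adaptiveness(codon_counts, synonymous_groups):
    """w = count of a codon / count of the most used synonym for the same amino acid."""
    w = {}
    for codons in synonymous_groups.values():
        top = max(codon_counts.get(c, 0) for c in codons)
        if top == 0:
            continue
        for c in codons:
            w[c] = codon_counts.get(c, 0) / top
    return w


def codon_adaptation_index(gene_codons, w):
    """Geometric mean of w over codons with a positive weight.

    Codons with w = 0 are skipped here; published implementations often
    substitute a small pseudocount instead.
    """
    logs = [math.log(w[c]) for c in gene_codons if w.get(c, 0) > 0]
    if not logs:
        raise ValueError("no scorable codons")
    return math.exp(sum(logs) / len(logs))


# HYPOTHETICAL reference counts (not real P. pastoris data) for two amino acids.
groups = {"K": ["AAA", "AAG"], "F": ["TTT", "TTC"]}
counts = {"AAA": 40, "AAG": 10, "TTT": 30, "TTC": 30}
w = relative_adaptiveness(counts, groups)
print(w)  # {'AAA': 1.0, 'AAG': 0.25, 'TTT': 1.0, 'TTC': 1.0}
print(round(codon_adaptation_index(["AAA", "AAG"], w), 3))  # 0.5
```

Because the mean is geometric, a minority of rare codons (low w) pulls the CAI down sharply, which is how a gene can score 0.47 even if many of its codons are acceptable. Single-codon amino acids such as Met and Trp always have w = 1 and are usually excluded from the calculation.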
Another important factor behind the low expression is the use of metagenomic material. Metagenomic samples consist of environmental DNA (eDNA) extracted directly from the environment (Lewin et al. 2017). Many types of microorganisms, including uncultured microbes, can be identified through a metagenomic approach. Wooley and Ye (2009) explained that although metagenome samples help uncover microbial diversity, genes from them are difficult to clone and express in a host. Because the source organisms are unknown and little information is available about these genes, they are difficult to study, particularly with respect to cloning and expression.
The new bgl6111 gene from the metagenomic library was identified and heterologously expressed. The gene is 2,520 bp long with a GC content of 67%. BLAST analysis showed 67.61% similarity to a BGL from an uncultured bacterium (ABB51613.1). The bgl6111 gene encodes 839 amino acids, with a predicted molecular weight of 89.4 kDa. Phylogenetic analysis revealed that bgl6111 is closely related to BGLs from prokaryotic organisms.
The bgl6111 gene was cloned into the pPICZαA plasmid and successfully expressed in P. pastoris KM71. Although the band was faint, SDS-PAGE confirmed the BGL protein product with a band of 89.4 kDa. The BGL product had its highest activity at D3 (1.210 U/mL).
These results indicate that bgl6111 can be defined as a β-glucosidase with low activity. Although the activity was low, ultrafiltration increased it by 539.8% (to 7.742 U/mL). This study may provide useful information on novel β-glucosidases from nature that could serve as alternative enzymes for industrial purposes. Beyond describing the enzyme's ability to hydrolyze the substrate, the results suggest that the activity of enzymes obtained through the metagenomic approach can be boosted further.

Conclusions
The bgl6111 gene encodes a novel β-glucosidase obtained from a metagenomic library from sugarcane bagasse. The gene was successfully cloned and expressed in Pichia pastoris KM71, producing the β-glucosidase enzyme. This enzyme may be useful for future industrial production but needs further modification to achieve high activity. Moreover, as shown here, heterologous expression of functional enzymes in genetically modified host organisms can be a practical tool for developing enzymes with desired properties.

FIGURE 1 Agarose gel electrophoresis of the bgl6111 PCR product, showing a band of 2,535 bp. M = 100 bp Plus DNA ladder (Thermo Scientific); S = bgl6111 gene sample.

FIGURE 2 Recombinant β-glucosidase activity, with standard deviations shown as error bars (D1, D2, D3 = first, second, and third days of fermentation; H1 = optimized sample after ultrafiltration).

FIGURE 4 Structural model of the bgl6111 protein generated by Phyre2. (a) The ligand site is highlighted in green; (b) the catalytic site is highlighted in red.