In silico characterization and comparison of the fruit ripening related beta-amylase (BAM) gene family in banana genome A and B

Banana is one of the most important commodities for maintaining global food security. Primary metabolic processes during banana ripening, particularly starch metabolism, greatly affect post-harvest quality. The beta-amylase (BAM) gene family is a group of genes that plays an important role in regulating starch metabolism. In this study, we characterized and compared the BAM gene family in the DH Pahang and Pisang Klutuk Wulung (PKW) varieties, which represent the AA and BB genomes, respectively. The BAM gene family sequences were retrieved from the Musa acuminata 'DH Pahang' and Musa balbisiana 'PKW' genome databases, then structural and functional characterization was performed, followed by identification of cis-acting elements in the BAM promoter regions. The results showed that the BAM gene family structure was relatively conserved in both genomes, and a putative BAM11 gene was found, the function of which has not been studied in other plants. Cis-acting element analysis showed that the promoters differed in the copy number and types of elements responsive to various phytohormones. This study suggests that the BAM genes involved in ripening are spatiotemporally regulated. However, further functional genomic analysis is required to describe the specific role and regulation of BAM genes during banana ripening.


Introduction
Banana is one of the most important crops in the world, playing a key role in maintaining global food security and as a source of income for banana-producing countries (FAO 2020). As a climacteric fruit, banana is harvested when the fruit is physiologically mature, and it then ripens after being picked from the plant (Dwivany et al. 2016). In the common ripening treatment, mature green bananas are treated with ethylene to accelerate ripening by inducing changes in primary metabolism (Pathak et al. 2018). Pulp softening and sweetening are very important in determining the quality of banana fruit, and both are mainly determined by starch degradation. Unfortunately, starch degradation in banana, which has a high starch content, is still poorly understood, even though this process is also responsible for providing energy for other metabolic processes during ripening (Cordenunsi-Lysenko et al. 2019).
Degradation of starch granules in amyloplasts during ripening involves various hydrolase enzymes. According to Nascimento et al. (2006), β-amylase (BAM) is vital for total starch degradation because it acts at the final stage of starch degradation and cleaves starch chains at a specific position to produce the final product, maltose. Furthermore, increased expression of BAM genes is known to correlate with decreased starch content during banana ripening, and vice versa, as supported by several studies (Nascimento et al. 2006; Jourda et al. 2016; Miao et al. 2016; Xiao et al. 2018; Cordenunsi-Lysenko et al. 2019). Studies on the BAM genes in bananas currently focus on transcriptome profiling and in silico genomic analysis to identify members of the BAM gene family associated with starch mobilization that are regulated by ethylene during fruit ripening. Jourda et al. (2016) identified 13 MaBAM genes through in silico analysis of the Musa acuminata genome, and Xiao et al. (2018) re-identified 11 MaBAM genes from the GenBank and Banana Genome Hub databases.
Currently, banana genomic sequences are available for the Musa acuminata 'DH Pahang' (D'hont et al. 2012) and Musa balbisiana 'Pisang Klutuk Wulung' (Davey et al. 2013) varieties. Starchy bananas have a higher starch content than dessert bananas, which is considered one of the distinctive genome-directed phenotypes. Dessert bananas generally belong to the M. acuminata genome (A genome), while the starchy ones are generally characterized by the presence of both the M. acuminata and M. balbisiana genomes (A and B genomes) in their genomic background (OECD 2010).
Therefore, based on this information, the aim of this study is to identify and compare beta-amylase (BAM) genes in M. acuminata 'DH Pahang' and M. balbisiana 'Pisang Klutuk Wulung' (PKW), based on the structure of the genes and proteins, as well as to predict cis-acting elements in the gene promoter regions. The results provide a foundation for future research on improving carbohydrate composition in bananas in relation to nutrition or palatability.

Retrieval of the BAM gene family sequences from DH Pahang and PKW database
MaBAM gene sequences were obtained from the M. acuminata 'DH Pahang' v.1 database on the Banana Genome Hub website (https://banana-genome-hub.southgreen.fr/; Droc et al. 2013), including gene accessions, coding sequences (CDS), and annotated proteins.
MaBAM nucleotide sequences were then used to identify the MbBAM genes through BLASTN searches against the M. balbisiana PKW v.0 database (https://banana-genome-hub.southgreen.fr/blast; Droc et al. 2013). The parameter used was a cutoff value of 10 (10), and the search was performed on the pseudochromosome database. The selected MbBAM gene accessions were the top hits, chosen by considering their % identity score, chromosome number, and structural completeness. The MaBAM genes used in this study follow Xiao et al. (2018), who identified 11 MaBAM genes from the GenBank and Banana Genome Hub databases, based on BLASTP searches against the Arabidopsis thaliana and Oryza sativa databases available in NCBI GenBank (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastp&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome; Clark et al. 2016). Further information regarding the dataset is listed in Suppl. Tables 1–4.
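The top-hit selection described above can be illustrated with a minimal Python sketch. It assumes the BLASTN results are exported in the standard 12-column tabular format (query, subject, % identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score). The function names, the example hits, and the E-value threshold are hypothetical and are not taken from this study; the checks on chromosome number and structural completeness are left as a manual step.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float   # percent identity
    length: int     # alignment length
    evalue: float
    bitscore: float


def parse_tabular(text: str) -> list[Hit]:
    """Parse 12-column BLAST tabular output; skip blank and comment lines."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]),
                        float(f[10]), float(f[11])))
    return hits


def top_hits(hits: list[Hit], max_evalue: float) -> dict[str, Hit]:
    """Keep, for each query, the hit with the highest % identity.

    Ties are broken by alignment length. max_evalue is a hypothetical
    filter chosen by the user, not a value reported in the study.
    """
    best: dict[str, Hit] = {}
    for h in hits:
        if h.evalue > max_evalue:
            continue
        cur = best.get(h.query)
        if cur is None or (h.pident, h.length) > (cur.pident, cur.length):
            best[h.query] = h
    return best


# Hypothetical example rows (not real results).
example = (
    "MaBAM1\tBchr03\t95.2\t1800\t60\t5\t1\t1800\t1000\t2800\t0.0\t2900\n"
    "MaBAM1\tBchr07\t81.0\t600\t90\t9\t1\t600\t500\t1100\t1e-50\t400\n"
)
selected = top_hits(parse_tabular(example), max_evalue=1e-5)
print(selected["MaBAM1"].subject)
```

Ranking on % identity first mirrors the first criterion named in the text; a different ordering (for example, bit score first) would be an equally defensible choice.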

Prediction of the structure of the BAM genes and the protein sequences
The structural features of the MbBAM genes were predicted and annotated using the FGENESH+ program (http://www.softberry.com/berry.phtml?topic=fgenes_plus&group=programs&subgroup=gfs; Solovyev et al. 2006) with the organism-specific gene-finding parameter set to M. acuminata or Dwarf Banana. This program combines ab initio and similarity-based approaches to improve its accuracy by using protein sequences that are highly similar to the targeted nucleotide sequences (Xiong 2006). The MbBAM gene sequences were obtained based on their similarity to the MaBAM gene sequences; therefore, the FGENESH+ approach can be applied to predict the structure of the MbBAM genes. The reference protein sequences used were MaBAM proteins obtained from the Banana Genome Hub website. The structural features of all BAM genes were then visualized using the Gene Structure Display Server 2.0 (GSDS) program (http://gsds.cbi.pku.edu.cn; Hu et al. 2015) with the genomic sequences and CDS of the BAM genes as queries. The putative product sequences of the MbBAM genes were also obtained using FGENESH+, together with the gene structure predictions.

Similarity analysis of BAM nucleotide and protein sequences between A and B genome
MaBAM nucleotide and protein sequences were compared with MbBAM using the Pairwise Sequence Alignment method on the EMBOSS Needle website (https://www.ebi.ac.uk/Tools/psa/emboss_needle/; Madeira et al. 2019) with standard parameters, in order to obtain the similarity percentage between the nucleotide and protein sequence pairs of the two genomes (e.g. MaBAM1 to MbBAM1, and so on). EMBOSS Needle uses the Needleman-Wunsch algorithm to perform the alignment so that the two sequences in each comparison have the same length. Each BAM sequence was aligned and analyzed over its entire length (Xiong 2006).
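To make the global alignment step concrete, the following is a minimal Python sketch of the Needleman-Wunsch algorithm together with a percent-identity calculation over the full alignment length. It is a simplification: it uses a single match/mismatch score and a linear gap penalty, whereas EMBOSS Needle uses substitution matrices and affine gap penalties by default. The scoring values and the toy sequences are assumptions for illustration only.

```python
def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -2) -> tuple[str, str]:
    """Globally align a and b; return the two gapped strings (equal length)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(al_a: str, al_b: str) -> float:
    """Identical (non-gap) columns divided by the total alignment length."""
    matches = sum(x == y and x != "-" for x, y in zip(al_a, al_b))
    return 100.0 * matches / len(al_a)


# Toy sequences, not BAM sequences.
al_a, al_b = needleman_wunsch("GATTACA", "GATACA")
print(al_a)
print(al_b)
print(round(percent_identity(al_a, al_b), 2))
```

Because the identity is divided by the full alignment length, large length differences between two sequences lower the percentage even when the shared region matches well.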

Functional annotations of the BAM gene family
Putative MbBAM proteins were annotated using the BLASTP program on NCBI GenBank (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastp&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome; Clark et al. 2016) against the Reference Proteins database (refseq_protein), followed by domain and motif analysis of the BAM proteins from both genomes. The BAM protein domains were analyzed using the NCBI Conserved Domain Search program (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi; Lu et al. 2020) against the CDD (Conserved Domain Database) v.3. The results of the domain analysis were visualized using IBS (Illustrator for Biological Sequence) software. The BAM protein motifs were analyzed and visualized using the MEME program (http://meme-suite.org; Bailey et al. 2015) with the 'Site Distribution' parameter set to Zero or One Occurrence Per Sequence (zoops) and the 'Number of Motif' parameter set to 15 motifs per search. The results of the motif analysis were annotated using the InterProScan program (http://www.ebi.ac.uk/interpro/search/sequence/; Mitchell et al. 2019).

Phylogenetic tree construction of the BAM protein family
Phylogenetic analysis was performed on the BAM protein sequences from M. acuminata 'DH Pahang' and M. balbisiana 'PKW', as well as BAM proteins in A. thaliana and O. sativa from the research dataset of Xiao et al. (2018). First, Multiple Sequence Alignment was performed using the MUSCLE program (Edgar 2004), and then manual trimming was performed on the alignment results using the BioEdit v.7.2.5 program (Hall 1999) to remove ends with long gaps. The phylogenetic tree was then constructed using the MEGA X program (Kumar et al. 2018), with the Maximum-Likelihood method and a bootstrap value of 1000 replicates.
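The trimming in this study was done manually in BioEdit. As an illustration only, the idea of removing alignment ends with long gaps can be sketched in Python as an automated stand-in: leading and trailing columns are dropped while more than a chosen fraction of the sequences carry a gap there. The function name, the 0.5 threshold, and the toy alignment are hypothetical.

```python
def trim_gappy_ends(aligned: dict[str, str],
                    max_gap_fraction: float = 0.5) -> dict[str, str]:
    """Drop leading/trailing alignment columns that are mostly gaps.

    aligned maps sequence names to gapped sequences of equal length.
    Internal columns are never removed, only the two ends.
    """
    if not aligned:
        return {}
    seqs = list(aligned.values())
    width = len(seqs[0])
    if any(len(s) != width for s in seqs):
        raise ValueError("all aligned sequences must have the same length")

    def gappy(col: int) -> bool:
        gaps = sum(s[col] == "-" for s in seqs)
        return gaps / len(seqs) > max_gap_fraction

    start = 0
    while start < width and gappy(start):
        start += 1
    end = width
    while end > start and gappy(end - 1):
        end -= 1
    return {name: s[start:end] for name, s in aligned.items()}


# Toy alignment, not real BAM proteins.
toy = {"seqA": "--MKVLA--", "seqB": "-AMKVLAG-", "seqC": "--MKV-A--"}
for name, seq in trim_gappy_ends(toy).items():
    print(name, seq)
```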

Prediction of Cis-acting elements in the BAM promoter regions
The 2000 bp upstream genomic sequences representing the promoter regions of MaBAM and MbBAM genes were retrieved from the genome sequences of M. acuminata 'DH Pahang' and M. balbisiana 'PKW' using SnapGene v.5.1 software, then putative cis-acting element prediction was performed on these sequences using the PlantCARE program (http://bioinformatics.psb.ugent.be/webtools/plantcare/html; Lescot et al. 2002).
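The promoter sequences in this study were extracted with SnapGene. The same operation can be sketched programmatically; the Python example below is a minimal illustration that assumes 1-based inclusive gene coordinates and a known strand, and takes the reverse complement for genes on the minus strand so that the returned promoter always reads 5' to 3' toward the gene. The function names and the toy sequence are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(chrom_seq: str, gene_start: int, gene_end: int,
                    strand: str, length: int = 2000) -> str:
    """Return up to `length` bp upstream of a gene.

    gene_start and gene_end are 1-based inclusive, with
    gene_start <= gene_end regardless of strand. The region is
    truncated if the gene lies near the end of the sequence.
    """
    if strand == "+":
        lo = max(0, gene_start - 1 - length)
        return chrom_seq[lo:gene_start - 1]
    if strand == "-":
        hi = min(len(chrom_seq), gene_end + length)
        return reverse_complement(chrom_seq[gene_end:hi])
    raise ValueError("strand must be '+' or '-'")


# Toy chromosome with a gene at positions 9-17 (ATGCCCTAA on the plus strand).
chrom = "TTGACAGGATGCCCTAAGTCAA"
print(upstream_region(chrom, 9, 17, "+", length=6))
print(upstream_region(chrom, 9, 17, "-", length=5))
```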

Characteristics of the BAM gene family with similarity analysis of nucleotide sequences between A and B genome
The structure of the BAM genes was visualized using the GSDS 2.0 program (http://gsds.cbi.pku.edu.cn; Hu et al. 2015), which produced a diagram displaying the exon, intron, 5' UTR and 3' UTR structures of 11 BAM gene pairs (Figure 1). When compared between the two genomes, almost all of the BAM gene pairs have similar exon-intron compositions, differing by at most one exon. However, the BAM2, BAM10, and BAM11 gene pairs have less similar exon structures than the other pairs. This result is reflected in the similarity of nucleotide sequences between DH Pahang and Pisang Klutuk Wulung (PKW) (Table 1). These data show that a high percentage of similarity was obtained even when the sequences were compared over their entire length. Therefore, it is suggested that most of the BAM gene pairs have conserved nucleotide sequences between the two genomes. BAM2, BAM10, and BAM11 are exceptions, being less similar in both structure and nucleotide sequence. Sequence variation in the BAM genes may indicate a genetic evolutionary event between the two genomes.

Prediction of the BAM protein domain and motif with similarity analysis of protein sequences between A and B genome
Domain analysis was performed on the putative BAM proteins, then the results were visualized to obtain the diagram shown in Suppl. Figure 1. (Reinhold et al. 2011).
The β-amylase enzymes typically act as a monomer composed of an (α/β)8 barrel with a deep catalytic cleft where two molecules of maltose can bind, and which contains two conserved glutamic acid residues that act as acid and base during hydrolysis (Mikami et al. 1994; Monroe et al. 2018). In contrast to the general β-amylases with a monomeric active form, AtBAM2 acts as a tetramer, with a dimer-of-dimers structure and a putative secondary binding site (SBS) for starch. Each dimer consists of two BAM subunits, creating a "starch-binding groove" lined by conserved residues identified as the SBSs, where a starch chain is expected to bind to the deep groove.
As for the individual BZR1 protein, several studies have been conducted in M. acuminata; for example, Shan et al. (2019) stated that the MaBZR1/2 protein acts as a repressor of fruit ripening genes that respond to BR signaling. Therefore, based on these two studies, it is suggested that the BAM1 and BAM3 protein pairs in DH Pahang and PKW also act as transcription factors for banana ripening genes, but this assumption still has to be proven through wet bench experiments.
The BAM11 protein pair also has a second domain, namely the MRL1 domain, but research on the MRL1 protein itself is still limited. The MRL1 protein is mainly known to play a role in processing and stabilizing the mRNA of the rbcL gene, thereby increasing Rubisco biogenesis and ultimately increasing atmospheric CO2 fixation (Johnson et al. 2010). With regard to banana ripening, it is known that high CO2 levels can inhibit the ethylene-dependent ripening process (Larotonda et al. 2008). However, the reason why the MRL1 domain was found in the BAM11 protein cannot yet be inferred, because no connection between the GH14 and MRL1 domains has been found.
BAM protein motif prediction was then performed, in which 15 conserved motifs were found in DH Pahang and PKW (Figure 2). Of these, 8 motifs were annotated as Glycoside Hydrolase family 14 (GH14) domains and 2 motifs as Glycoside Hydrolase Superfamily, while the remaining 5 motifs were not found in the InterPro database. The InterProScan search also returned Gene Ontology (GO) annotations for the GH14 domain motifs, so this in silico analysis suggests that all of these putative proteins are involved in polysaccharide catabolism and beta-amylase activity. In addition, when compared between the two genomes, all of the BAM protein pairs have the same motif composition, except for the BAM10 protein pair.

Alignment of BAM proteins in DH Pahang and PKW showed that all BAM isoforms share a large conserved region around amino acids (aa) 231–676. Moreover, the two proteins of each pair have very similar amino acid residues over their whole sequences (Suppl. Figure 2). These results are reflected in the similarity data of the BAM protein sequences between DH Pahang and PKW (Suppl. Table 5), which showed that almost all protein sequence pairs have high similarity (> 90%). All these results may indicate that the structure of the BAM proteins is preserved both within the protein family and between the two genomes (Miao et al. 2016). However, the BAM10 and BAM11 protein pairs are exceptions, with lower similarity than the other pairs (range 60–70%). This is thought to be related to the large difference in protein length and, for the BAM10 pair, also to the less conserved motif composition. These results are consistent with the nucleotide analysis discussed earlier. Therefore, it can be concluded that the BAM10 and BAM11 genes are less conserved between the two genomes than the other genes when assessed by structure, although this assumption also has to be proven through wet bench experiments.

Phylogenetic analysis of the BAM protein family in M. acuminata, M. balbisiana, A. thaliana, and O. sativa
When we compared all BAM proteins from DH Pahang and PKW, we obtained four groups with the same motif composition (Suppl. Table 6). These data correspond to the phylogenetic tree containing the BAM protein family members in M. acuminata, M. balbisiana, A. thaliana, and O. sativa (Figure 3). In the phylogenetic tree, we obtained four main clades which correspond to the BAM protein groupings mentioned earlier. Therefore, it is suggested that BAM protein pairs with similar sequences join together in the same clade, and considering that amino acid sequences greatly affect protein function, it can be assumed that each of these clades also contains BAM isoforms among M. acuminata, M. balbisiana, O. sativa, and A. thaliana. In particular, the BAM1 and BAM3 protein pairs in M. acuminata and M. balbisiana appear in the same clade as their putative isoforms, namely AtBAM7 and AtBAM8. This phylogenetic analysis also allows homology to be assessed within the BAM10 and BAM11 protein pairs. It is evident that the members of each of the BAM10 and BAM11 protein pairs cluster together in the closest clade, and this result is sufficient to show that the members of each pair are homologous to each other, although their sequence similarity is indeed lower than that of the other protein pairs.
In climacteric fruits, such as bananas, ethylene is the main regulatory factor that controls the fruit ripening process, although there are ethylene-independent processes as well (Pathak et al. 2018). However, the activities of other phytohormones can also interconnect and form signaling networks that coordinate fruit ripening, although the mechanisms are still less explored in the genus Musa.
A prior study of cis-acting elements in BAM genes revealed that the regulation of BAM gene expression in banana fruit may involve more than one phytohormone: Miao et al. (2016) identified cis-acting elements presumably responsive to multiple hormones, such as auxin, abscisic acid, and methyl jasmonate, in 16 MaBAM genes. Among these phytohormones, ABA and methyl jasmonate play a role in inducing fruit ripening, whereas auxin, gibberellin, and salicylic acid inhibit fruit ripening (Cordenunsi-Lysenko et al. 2019). In this study, we identified elements responsive to these hormones; therefore, it is suggested that the BAM genes in DH Pahang and PKW possibly involve regulatory networks of various phytohormones during banana ripening.
In addition, abundance data of cis-acting elements grouped by their function were also collected and then processed into a graph of total cis-acting elements in the A and B genomes (Figure 4) and a graph of cis-acting element numbers in each BAM gene (Figure 5).
Based on Figure 4, it is evident that ethylene- and auxin-responsive elements have exactly the same abundance in the two genomes, whereas gibberellin- and salicylic acid-responsive elements have similar numbers. This result may indicate that the hormone-responsive elements in the BAM gene family also tend to be conserved between the two genomes. Figure 5 shows that the cis-acting elements tend to be dominated by MeJA (methyl jasmonate)- and ABA (abscisic acid)-responsive elements; therefore, it is suggested that these two element types are more conserved than the others. In addition, there is a slight difference in the copy number and type of cis-acting elements within each gene pair, which means that the phytohormones involved may differ between the two genomes. Therefore, these results may indicate differences in the regulatory networks of the BAM genes between the A and B genomes. Considering that the BAM genes are inducible genes, the differences in cis-acting element composition may also indicate a difference in the timing (e.g. at different maturation stages) or location (e.g. in different tissues) of BAM gene transcription between the two genomes (Biłas et al. 2016).
However, despite these slight variations, all gene pairs still have similar cis-acting element compositions; therefore, the promoter of each gene pair can still be considered conserved between the two genomes. This cis-acting element study can provide an initial overview of the regulatory network of the BAM gene family in DH Pahang and PKW during banana fruit ripening. For endogenous genes, however, many endogenous factors still need to be considered, such as the presence of transcription factors and post-transcriptional regulation. Therefore, the cis-acting element data obtained from this in silico analysis may not fully correlate with gene expression data from experiments.
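The grouping and counting step behind such graphs can be sketched in Python. The example below is a minimal illustration that assumes the PlantCARE results have already been reduced to (gene, element name) pairs. The element-to-hormone mapping follows functional annotations commonly reported for these PlantCARE motifs and is an assumption, not necessarily the grouping used in this study; the function names and the example records are hypothetical.

```python
from collections import Counter, defaultdict

# Assumed mapping from PlantCARE element names to hormone classes.
HORMONE_BY_ELEMENT = {
    "ABRE": "ABA",
    "CGTCA-motif": "MeJA",
    "TGACG-motif": "MeJA",
    "ERE": "ethylene",
    "TGA-element": "auxin",
    "AuxRR-core": "auxin",
    "GARE-motif": "gibberellin",
    "P-box": "gibberellin",
    "TATC-box": "gibberellin",
    "TCA-element": "salicylic acid",
}


def tally_by_gene(records):
    """Count hormone-responsive elements per gene.

    records is an iterable of (gene, element_name) pairs; elements
    that are not in the mapping (e.g. core promoter motifs) are ignored.
    """
    per_gene = defaultdict(Counter)
    for gene, element in records:
        hormone = HORMONE_BY_ELEMENT.get(element)
        if hormone is not None:
            per_gene[gene][hormone] += 1
    return dict(per_gene)


def genome_totals(per_gene, prefix):
    """Sum the per-gene counts for genes whose name starts with prefix."""
    total = Counter()
    for gene, counts in per_gene.items():
        if gene.startswith(prefix):
            total.update(counts)
    return total


# Hypothetical records, not results from the study.
records = [
    ("MaBAM1", "ABRE"), ("MaBAM1", "CGTCA-motif"), ("MaBAM1", "TATA-box"),
    ("MaBAM2", "TGACG-motif"),
    ("MbBAM1", "ABRE"), ("MbBAM1", "ERE"),
]
per_gene = tally_by_gene(records)
print(genome_totals(per_gene, "Ma"))
print(genome_totals(per_gene, "Mb"))
```

The per-gene counters correspond to a per-gene graph, and the prefix-based totals to a per-genome graph; the "Ma" and "Mb" prefixes follow the gene naming used in the text.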

Conclusions
In conclusion, BAM proteins in M. acuminata 'DH Pahang' (A genome) and M. balbisiana 'Pisang Klutuk Wulung' (B genome) contain well-conserved structures, as characterized by the presence of the Glycoside Hydrolase family 14 (GH14) domain. It is also suggested that there is a putative BAM11 gene in bananas whose role has not been studied in other plants. Lastly, there is a slight difference in the composition of the cis-acting elements, indicating differences in the regulatory networks of the BAM genes of the two genomes. All results in this study were obtained entirely from in silico analysis; hence, validation through functional genomic analysis is suggested for these genes.