Whole genome sequencing of Indonesian dengue virus isolates using next-generation sequencing

Indonesia is a tropical country and hyperendemic for dengue. The disease prevalently affected Indonesian and it caused high morbidity and substantial economic burden. This vector-borne viral disease is caused by infection of dengue viruses (DENVs), which are the member of Flaviviridae family. While most of dengue studies in Indonesia focused on the epidemiology, the clinical aspects, the vectors, and to certain extent the virology, there were still gaps in the DENVs genomic aspects. Considering their high mutation rate, the DENVs were known for their high genetic diversity and it might affect the characteristics of the viruses. Comprehensive DENV genomic data were thus important for many aspects of diseasemanagement, including virus surveillance, pathogenesis, diagnostics, antiviral drug design, and vaccine development. We established in this study amethod for DENVwhole genome sequencing using the advancedNext-Generation Sequencing (NGS) andNextera XT DNA library preparation kit, coupled with simplified bioinformatic analysis methods. The Indonesian DENVs from four serotypes were isolated from patients’ sera, while library was prepared from enriched templates and sequenced using Illumina NGS. Our study highlighted the potential of a robust NGS method in producing whole genome sequence of DENVs, which would be important for future dengue studies.


Introduction
Dengue disease is endemic in Indonesia and currently remains the most important arboviral disease in the country.Indonesia continually reports the highest number of dengue cases in the WHO Southeast Asia Region (WHO 2012).Dengue fever, an acute febrile disease, is caused by dengue virus (DENV) infection.Four DENV serotypes (DENV-1, -2, -3, and -4) circulate in tropical and subtropical regions of the world, including Indonesia.DENV is transmitted through the aid of Aedes mosquitoes as the vector, mainly Aedes aegypti and Ae.albopictus (Lambrechts et al. 2010).The bite of infected mosquito to a susceptible human host may lead to dengue clinical manifestations, ranging from asymptomatic or a mild flu-like syndrome known as classic Dengue fever (DF), to the more severe form known as Dengue hemorrhagic fever (DHF) and the potentially lethal Dengue shock syndrome (DSS) (Martina et al. 2009).The DENV genome is a ∼10.7 kb single-stranded positive-sense RNA encoding 3 structural (C, prM/M, E) and 7 non-structural (NS1, NS2A, NS2B, NS3, NS4A, NS4B, NS5) proteins (Guzman et al. 2010).DENV has very diverse genetic characteristics, as shown by the presence of four serotypes.The diversity is continued to the level of within serotype, in which there are several clusters of variants termed as genotypes (Holmes 2009).
Despite the endemicity of the disease, genomic data of DENV in Indonesia is scarce.The number of DENV complete genome sequences from Indonesia searched on Gen-Bank public database revealed approximately only 2.7% of the total number of DENV genomes available from all over the world (119/4,297 genomes, accessed mid 2018), suggesting that Indonesia is under-sampled for DENV genomic studies.Genomic data can provide many benefits such as powerful insights into disease transmission and dy- Yohan et al. Indonesian Journal of Biotechnology 23(2), 2018, 74-83 namics, especially for rapidly evolving RNA viruses such as DENV (Costa et al. 2012).Comparative analysis of DENV genomes has been used widely to study the genetic diversity of the dengue viruses (Holmes and Burch 2000) and previous studies have also suggested a correlation between viral pathogenicity with its genetic structures (Leitmeyer et al. 1999).The in-depth analysis of DENV genome is also important to supplement epidemiological data with information that can be used to reconstruct the history of epidemics in time and space (Yohan et al. 2018).
Given the endemicity of dengue in Indonesia and the large gap on DENV genomic data that is currently present, efforts to generate DENV genome data from all regions in Indonesia should be sought.Using the conventional Sanger micro-capillary sequencing, we have been able to sequence the complete genomes of 80 DENV isolates from Makassar (Sasmono et al. 2015).This genomic study was the first and only report on the Indonesia DENV genomes in which the sequencing was performed solely in Indonesia using the available sequencing facility in our Institute.
The advent of new technologies for rapid genome sequencing, in which many millions of nucleotides can be obtained in a single run, has greatly enhanced the endeavors in genomic studies (Capobianchi et al. 2013;van Dijk et al. 2014).The next generation sequencing (NGS) technologies allow rapid and cost-effective acquisition of fulllength genomes and enable significant contributions to multiple areas in virology, including virus discovery and metagenomics (Quiñones-Mateu et al. 2014).
In this study, we utilize the NGS technology to sequence DENV genomes.We presented NGS method that is relatively simple and can be used by scientists with limited resources and knowledge in bioinformatics.DENV isolated from archived serum samples were sequenced, and phylogenetic analyses were conducted to understand their genetic characteristics.

Sample preparation and DENV isolation in tissue culture
The study utilized archived specimens collected from dengue surveillance studies in Jambi (Haryanto et al. 2016) and East Kalimantan (manuscript submitted) in 2014 -2016.The representatives of each DENV serotype were tested for this method and are listed in Table 1.Ethical clearance for using archived specimens was granted from Eijkman Institute Research Ethics Commission document No. 113.Serum samples were separated from 3-5 mL of blood using a blood-clot activator Vacutainer (BD Biosciences, NJ, USA) and centrifugation at 3000 × g.The collected serum was stored in aliquots at -80°C.
The DENV-positive serum sample was proceed to virus inoculation attempt, using protocol described elsewhere (Fahri et al. 2013).In brief, an amount of 200 µL of serum was prepared in RPMI medium supplemented with 2% of Fetal Bovine Serum (FBS) (Gibco-Thermo Fisher Scientific, Carlsbad, CA) and inoculated into a monolayer of C6/36 (Aedes albopictus, midgut) cells in T-25 flask.The inoculated cells were allowed for virus adsorption for 1 h at 28°C incubator, followed by medium change and incubation up to 14 d.Culture supernatant was harvested when cytopathic effect (CPE) detected or at the end of incubation period and tested for the presence of DENV.

RNA extraction, DENV detection, and serotyping
RNA was extracted from 140 µL of serum sample or tissue culture supernatant using QIAamp Viral RNA mini kit (Qiagen, Hilden, Germany), according to protocol described by the manufacturer without the addition of carrier RNA.The extracted RNA was eluted in a total volume of 60 µL and stored at -80°C until use.The presence of DENV RNA and the corresponding serotype was determined by simultaneous real-time RT-PCR for DENV detection and serotyping assay using Simplexa Dengue kit (DiaSorin, Sallugia, Italy), as described elsewhere (Sasmono et al. 2014).

DENV cDNA preparation and PCR amplification of DENV whole genome fragments
The extracted RNA was subjected to cDNA preparation and whole-genome PCR amplification according to protocols previously described (Ong et al. 2008;Christenbury et al. 2010;Sasmono et al. 2015).The DENV cDNA was generated using extracted RNA as template and Superscript III Reverse Transcriptase (RT) enzyme kit (Invitrogen-Thermo Fisher Scientific), according to protocol recommended by the manufacturer.The RNAse OUT recombinant ribonuclease inhibitor (Invitrogen) was added to maintain the integrity of initial RNA and a set of DENV serotype-specific antisense primers covering the whole genome of DENV was used.The RT reaction was allowed to run at 50°C for 1 h and products were stored at -20°C until use.Five overlapping 2-3 kb fragments were generated using primer sets for each DENV serotype (Ong et al. 2008;Christenbury et al. 2010;Sasmono et al. 2015) using Pfu Turbo DNA Polymerase enzyme kit (Agilent-Thermo Fisher Scientific) and generic PCR cycling condition of pre-denaturing at 95°C for 2 min; 40 cycles of denaturing at 95°C for 30 s, annealing at 55°C for 1 min, and extension at 72°C for 4 min 30 s; followed by final extension for 10 min and storage at 4°C.Primers used in RT-PCR reactions are listed in Table 2.The expected size of PCR fragments was sliced and purified from 0.8% agarose gel using QIAquick Gel Extraction kit (Qiagen), according to protocol from the manufacturer.The purified PCR amplicons' concentration were measured using Qubit DNA BR Assay kit (Thermo Fisher Scientific) and adjusted to 0.2 ng/µL using EB buffer (Qiagen).All five normalized PCR amplicons were pooled for each sample and stored at -20°C until use.The schematic representation of the steps performed is outlined in Figure 1.

NGS library preparation and deep sequencing
The PCR amplicons were processed for a deep sequencing approach from an enriched template, as modified from protocol described elsewhere Aw et al. (2014).The NGS library preparation was prepared using Nextera XT DNA Library Prep Kit kit (Illumina, San Diego, CA) with the amounts of starting material per sample adjusted to 1 ng of pooled amplicons.Strict protocol was observed as recommended by the manufacturer.The Nextera transposome was used to tagment DNA, in which the DNA was fragmented and then tagged with adapter sequences in a single step.Tagmented DNA was then subjected to an amplification step using a limitedcycle PCR program and Nextera XT index kit (24 indexes).The PCR reactions added the Index 1 (i7), Index 2 (i5), and full adapter sequences to the tagmented DNA from the previous step.The library DNA was purified and short library fragments were removed using Agencourt AMPure XP beads (Beckman Coulter, Indianapolis, IN), utilizing the DNA-binding and magnetic properties of the beads.
Purified library was then measured for its concentration using Qubit high sensitivity dsDNA kit (Thermo-Fisher Scientific) and further quantified and validated using KAPA Library Quantification kit (KAPA Biosystem-Roche), according to protocol described by the manufacturer.The normalized library yield after amplification was expected to be 10-15 nM, or higher.The library from each sample was then pooled together, further diluted and loaded into the 500-cycle MiSeq Reagent Nano Kit v2 cartridge and run on Illumina MiSeq instrument.

Sequencing reads processing and bioinformatic analysis
The sequence reads were retrieved as Fastq files.Individual sequence reads file quality was checked using FastQC software (https://www.bioinformatics-.babraham.ac.uk/projects/fastqc) and multiple quality control check for the generation of aggregated quality control reports were done using MultiQC software (Ewels et al. 2016), both available in the Galaxy public server at https://usegalaxy.org (Afgan et al. 2016).The reads paired-end processing and mapping to reference sequence were done using Geneious v.11.1.5software and default settings (Kearse et al. 2012), allowing robust bioinformatics analyses in a single software format and graphical interface.The paired-end reads were joined and assembled using the map to reference algorithm according to each DENV-serotype reference sequence (Table 2).The Geneious mapper with highest sensitivity was used to assemble and map reads to the reference sequence with GenBank accession number NC_001477 (GCF_000862125.1),NC_001474 (GCF_000871845.1),NC_001475 (GCF_000866625.1),and NC_002640 (GCF_000865065.1)for DENV-1, -2, -3, and -4, respectively.Sequence coverage and depth were visualized as an interactive desktop display in the assembly report.

DENV genetic distance and phylogenetic analyses
Contigs were generated and manual inspections were performed to ensure the correctness of final genome sequences.Sequence contigs were then queried to online BLASTn platform available at https://blast.ncbi.nlm-.nih.gov(Zhang et al. 2000) for sequence identification and similarity analysis to all DENV genomes in public database using nucleotide collection database (nt).A total of 100 taxa hits was selected as the basis of phylogenetic tree reconstruction based on genetic distance and Neigh-

NGS run of enriched template using Illumina MiSeq
The amplification approach yielded five overlapping DNA fragments covering the whole genome of DENV (Figure 1, shown as an example for DENV-2, please refer to Table 2 for fragment sizes of other serotypes).Using the starting amount of 1 ng per sample pool, the enriched template was loaded into the MiSeq NGS instrumentation.The generated sequence reads were sorted by their respective library adapters and compiled into the respective Fastq files.Quality controls on reads as aggregated using MultiQC showed optimum quality for all four DENV isolates, as shown by Phred quality value score of higher than 30 across each base position (green lines in green zone of the histogram) (Figure 2a).The mapped reads to DENV reference sequences as computed using Geneious mapper also showed a very high genome coverage (reaching 100%) and optimum high depth for each isolate (   and Table 1).Lower depth was observed for 5' and 3'ends of the genome.As a comparative analysis, we also performed reads alignment using BWA-MEM alignment and SPAdes assembler within the Linux-based environment.The results from Geneious mapper were comparable to the results generated in Linux (data not shown).

DENV genetic distance and phylogenetic analyses
Using the generated nearly-complete whole-genome sequences, BLASTn queries were employed to determine DENV sequence identity and similarity with other DENV complete genome sequences available in GenBank, according to its serotype.One hundred hits were downloaded as sequences with the highest sequence identity and maximum score with lowest expect (E) value in BLASTn.
The Neighbor Joining phylogenetic analysis using these 100 reference sequences revealed the close-relatedness of Indonesia isolates with other strains from Indonesia and from the surrounding regions.The DENV-1 Indonesia isolate was closely related to sequences from Singapore (Figure 3 ).Analysis of generated whole-genome sequences of DENV-2 Indonesia isolate revealed the close genetic distance with strains from China and Singapore (Figure 4).The close genetic distance was observed between Indonesia DENV-3 isolate and strains from Jakarta and Makassar, Indonesia; Singapore; and Cairns, Australia.The Indonesia DENV-4 isolate shared close genetic relationship with strains from China and The Philippines (Figure 5).

Discussion
Indonesia is hyperendemic for dengue in which all four DENV serotypes circulate.Although a common disease in Indonesia, unfortunately until now there has not been much DENV genome data available from Indonesia.This is one of the challenges that are faced by scientists in Indonesia and around the world in obtaining DENV genetic information to be used for both basic and applied research.
A large gap on the genome data between Indonesia and neighboring countries needs to be filled.Complete understanding of the evolution and epidemiology of DENV, particularly with respect to the etiology of severe disease, will require DENV genomic studies and the comparative analysis of complete genome sequences (Holmes and Twiddy 2003).DENV genome data can be linked with epidemiological, clinical, and immunological data.The data can also be used to track the transmission of virulent DENV and eventually predict possible epidemics.Furthermore, DENV genomic data may also benefit the antiviral drug development, protein/antigen immunogenicity  studies, dengue vaccine development and vaccine introduction.In the vaccine field, vaccine efficacy could depends on the similarity of the dengue virus genotypes used in vaccine production and those circulating in the area of vaccine introduction (Usme-Ciro et al. 2014).In the other hand, dengue diagnostics sensitivity was shown to be influenced by the geographical regions (Guzman et al. 2010;Aryati et al. 2013).This has been linked with the distinct distribution of DENV in different regions, one of the characteristics of DENV genotypes.Thus, information on the DENV genetic diversity and identity in a particular region will be important to dengue diagnostics development and application.
Because of its importance, there is now a concerted effort to obtain the complete genome sequences of DENV isolates from all over the world, an effort directed by the Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard University (Holmes 2009).In Indonesia, efforts to obtain DENV genome data have been initiated.Using the conventional Sanger capillary sequencing technique, we sequenced the whole genomes of Makassar DENV to understand their genetic diversity.Using genomic data, we determined the DENV genotypes and revealed the Makassar DENV genotype fitness which may underlie the better virus transmission in the population (Sasmono et al. 2015).Using the same approach, the comparative analysis of Surabaya DENV-1 genomes revealed the virus evolutionary selection patterns in which the majority of the DENV-1 codons were under strong purifying selection, while seven codon sites were under positive selection (Yohan et al. 2018).
Among the available methods for NGS, the Nextera XT library preparation kit was designed to sequence small genomes including PCR amplicons greater than 300 bp, plasmids, microbial genomes, concatenated amplicons, and double-stranded cDNA, and thus suitable for DENV genome sequencing.The other advantage of this kit is the low DNA quantity requirement, as low as 1 ng for each sample.In regards to quality control of results, we observed that Phred scores of the four isolates' sequencing reads per base pair position were above 30, shown by the green zone in the Figure 2a.The coverage and depth of the reads were also optimal to ensure a reliable contigs generation.However, the use of transposons enzymatic cleavage of double-stranded DNA may reduce the effectiveness of the 5' and 3' end region sequencing of the genome, as shown by a relatively low sequence depth on those regions (Figure 2b).To generate complete genome sequences, additional conventional capillary sequencing may solve the problem of low reads in those genome parts.Nevertheless, the NGS method has proven to be more efficient in time and laboratory cost compared to Sanger sequencing in term of the multiplication effect where in one MiSeq run, 24 different isolates can be sequenced simultaneously with optimum depth.The NGS run using the enriched amplicon template and NexTera library preparation kit was proven to yield good result as shown by good quality control value and up to 100% of nearly complete genome coverage.
The nearly complete genomes of Indonesia DENV were submitted to GenBank and made accesible to public.Using fast online algorithm at the BLASTn web page, the sequence identity was confirmed to be DENV according to its own serotype.One of the benefit of the genetic distance tree analysis was to observe the genetic close-relatedness of our sequence compared to all available genomes in public database.From this point, we can proceed to the more in-depth bioinformatic analyses, such as selection pressure and virus evolutionary analyses (Yohan et al. 2018).The Indonesian DENV isolates were classified into the Genotype I, Cosmopolitan Genotype, Genotype I, and Genotype II for DENV-1, -2, -3, and -4, respectively, as observed from the genotyping approach using Envelope gene (Haryanto et al. 2016).Using the nearly complete genome, the DENV genomes sequenced in this study showed close-relatedness with strains from local Indonesia, neighboring countries, and countries in the region.The DENV-1 isolate showed close-relatedness to sequence from Singapore isolated in 2016 (Koo et al. 2018).The DENV-2 isolate was closely related to strains from Singapore and China and furthermore located in the basal position of a clade consisted of strains from Singapore and Malaysia (isolated in Johor in 2014, unpublished).The DENV-3 isolate also showed close genetic relationship with strains from cities in Indonesia, Singapore, and Australia.Meanwhile, the DENV-4 isolate showed close relatedness with strains from China and Philippines.Together, the findings confirms our previous data of complete genomes from Makassar (Sasmono et al. 2015) where Indonesian isolates showed close genetic relationships with strains from other cities in Indonesia, Singapore, and other regions, such as China and Australia.Moreover, results from this study may confirm our previous findings that highlighted the endemicity of Indonesian DENV isolates with the possibility of becoming the potential source of DENV importation in other countries regionally (Lestari et al. 2017;Megawati et al. 2017;Masyeni et al. 2018).These findings may support the use of rapid analysis using online tools in preliminary analysis of DENV whole genome sequences.
One of the bottlenecks of NGS often encountered by researcher is the downstream processing of the generated sequence reads which requires many computation and knowledge in bioinformatics.Here, we present a simplified method to analyze NGS output using Windows-based or online tools/softwares offering relatively user-friendly graphical interface.With this approach, scientists can perform basic and simple analyses of NGS output before continuing to the more in-depth bioinformatic processing.

Conclusions
The NGS method using enriched template approach is proven to be powerful and robust to generate DENV whole-genome sequences from Indonesian isolates and aided by the use of simplified bioinformatic analysis methods.The methods presented in this study will be useful to generate DENV genomes and build the genomic database needed to study and combat this virus and contribute to the better understanding of the dengue disease in Indonesia.

FIGURE 1
FIGURE 1 Schematic figure of steps performed in DENV wholegenome sequencing method using an enriched template and nextgeneration sequencing (NGS) platform.
Figure 2b, shown only for DENV-1 as figure representative

FIGURE 2
FIGURE 2 Snapshots of quality control parameters assessed on the generated sequence reads for 4 DENV serotypes' representatives.(a) Sequence quality histogram depicted as the mean quality value across each base position in the read.(b) Sequence reads coverage and mean depth view snapshot generated using Geneious map-toreference algorithm.

FIGURE 3
FIGURE 3Rapid DENV-1 phylogenetic tree based on genetic distance as inferred using Neighbor joining (NJ) algorithm and implemented in BLASTn phylogenetic tree online module.The Indonesian DENV-1 isolate is written in red font and classified into the Genotype I of DENV-1.

FIGURE 4
FIGURE 4Rapid DENV-2 phylogenetic tree based on genetic distance as inferred using Neighbor joining (NJ) algorithm and implemented in BLASTn phylogenetic tree online module.The Indonesia DENV-2 isolate is written in red font and classified into the Cosmopolitan Genotype of DENV-2.

FIGURE 5
FIGURE 5Rapid DENV-3 phylogenetic tree based on genetic distance as inferred using Neighbor joining (NJ) algorithm and implemented in BLASTn phylogenetic tree online module.The Indonesian DENV-3 isolate is written in red font and classified into the Genotype I of DENV-3.

FIGURE 6
FIGURE 6Rapid DENV-4 phylogenetic tree based on genetic distance as inferred using Neighbor joining (NJ) algorithm and implemented in BLASTn phylogenetic tree online module.The Indonesian DENV-4 isolate is written in red font and classified into the Genotype II of DENV-4.

TABLE 1
Representative DENV isolates used in NGS method development.
*Additional flanking sequences added to primer sequences..