Isolation and Molecular Marker Detection of Badh2 Gene from Aromatic Rice Germplasm Resources in Southern Henan

The aroma of aromatic rice arises from the loss of function of the Badh2 gene on chromosome 8, which increases the precursors of 2-acetyl-1-pyrroline (2-AP); the resulting accumulation of 2-AP gives the rice its aroma. In this study, the Badh2 gene was isolated and cloned from 18 representative aromatic rice cultivars in Southern Henan, and a bioinformatics analysis of the gene was carried out. In addition, seven functional molecular markers developed from the Badh2 gene were used to detect and analyze the gene in the 18 varieties. The results showed that the coding region of the Badh2 gene was 1509 bp in length. The gene contained 15 exons and 14 introns and encoded 503 amino acids. The Badh2 gene showed many types of variation among the 18 aromatic rice varieties. According to this variation, the tested varieties could be divided into two groups: Xinxianggeng 1, Xiangnuo 25, Heixiangdao 193 and Xiangbao 2 fell into group I, while the other 14 varieties fell into group II. The seven functional molecular markers detected different mutation types in exon 2, exons 4~5, exon 7 and exon 13 of the Badh2 gene, whereas no variety with a different mutation type was found for the promoter region, exon 12 or exon 14. Our results therefore provide important information for understanding the genetic basis of fragrance genes in aromatic rice germplasm resources in Southern Henan and for breeding new high-quality aromatic rice varieties by molecular marker-assisted selection.


Introduction
Rice (Oryza sativa L.) is one of the most important food crops: it is the staple food for more than 3 billion people and provides about 25% of their energy [1,2]. For most of the population in Southeast Asia, rice provides more than 35% of dietary energy [3,4]. The world population is expected to grow by 25% over the next 30 years and reach 10 billion [5]. As living standards improve, the demand for high-quality rice is increasing [6]. Rice quality is a complex trait that includes nutritional quality, appearance quality, and cooking and eating quality, and consumers tend to pay the most attention to cooking and eating quality [7][8][9]. Aromatic rice, one type of cultivated rice, is favored by consumers at home and abroad because of its unique aroma [10]. Over the past decade, the market share of aromatic rice has gradually increased, and aromatic rice has commanded a higher price [11,12]. In-depth research on aromatic rice germplasm resources, the breeding of new high-quality, high-yielding varieties, and their application in production therefore have important economic value and broad prospects.
More than 200 volatile substances have been isolated and identified in rice [10,13]. 2-Acetyl-1-pyrroline (2-AP) is considered one of the main volatile substances in aromatic rice [13,14], and it can be detected at low concentrations. Previous studies have found that the aroma gene is located on chromosome 8 of rice, and that rice aroma is caused by mutations in exons 2 and 7 of the Badh2 gene. Exon 7 of the betaine aldehyde dehydrogenase 2 gene (Badh2) carries an 8 bp deletion and three single nucleotide polymorphisms (SNPs), so the enzyme encoded by the gene cannot perform its normal function and the rice becomes fragrant [11][15][16][17][18]. Other scholars argue that a 7 bp deletion in exon 2 of the Badh2 gene may be the main cause of rice fragrance [19][20][21][22]. In different aromatic rice varieties, in addition to the 7 bp deletion in exon 2, there may also be an 803 bp deletion in exons 4 and 5 of the Badh2 gene [18,23,24], and mutation sites exist in exons 1, 10, 13 and 14 [23][24][25][26]. Further studies revealed insertions, deletions or single nucleotide mutations in exon 1 and intron 1 of the Badh2 gene, in the promoter region and in the 5'-UTR region [18,27,28]. Aroma differences in rice are largely determined by allelic variation in the Badh2 gene. Aromatic rice varieties often carry either the 8 bp deletion and 3 SNPs in exon 7 or the 7 bp deletion in exon 2 [18,23]. Therefore, when the Badh2 gene mutates in its coding or regulatory region, it can produce a betaine aldehyde dehydrogenase without biological activity, which results in rice aroma; this provides an important theoretical basis for breeding new aromatic rice varieties.
The genetic basis of rice fragrance is complex. Rice aroma has been reported to be controlled by a single pair of genes, or by 2~4 or more pairs. This may be because: (1) different aroma types (such as popcorn, jasmine, violet and pecan) have different genetic bases; (2) fragrance genes interact with various environmental factors (illumination, temperature, soil fertility, etc.); and (3) the aroma components of rice are diverse, and the methods and techniques for identifying different types of aroma are still imperfect [29,30]. Southern Henan has abundant aromatic rice germplasm resources, and these differ in aroma. We therefore speculated that the Badh2 gene may vary among these aromatic rice varieties. In this study, the Badh2 gene was isolated and cloned from 18 representative fragrant rice varieties in Southern Henan, its sequence was analyzed, and seven functional molecular markers of the gene were used for detection and analysis. The results will further reveal the genetic basis of fragrance genes in Southern Henan aromatic rice and provide important information for breeding new varieties of high-quality aromatic rice.

Materials for Testing
All the materials in this study are japonica rice varieties. The 18 aromatic rice resources from Southern Henan are widely representative and comprise 14 conventional aromatic rice varieties and 4 waxy (glutinous) aromatic rice varieties (Table 1). All varieties were sown in the same experimental field of Xinyang Academy of Agricultural Sciences in 2016. Each variety was planted in 2 rows of 12 plants, with a spacing of 16.5 × 26.4 cm. Routine field cultivation and management were carried out from sowing to seed maturity.

Isolation and Cloning of Badh2 Gene
Genomic DNA was extracted from the leaves of all aromatic rice varieties by the CTAB method [4,31]. The PCR primers are listed in Table 2. The PCR reaction system contained 2 μl of DNA template, 2.5 μl of 10 × Buffer, 1.5 μl of MgCl2 (25 mmol·L⁻¹), 2.0 μl of dNTP (2.5 mmol·L⁻¹), 1.0 μl of forward and reverse primers (12.5 mmol·L⁻¹) and 0.4 μl of Taq enzyme (5 U·μl⁻¹), and was made up to 20 μl with sterile water. The reagents were purchased from Shanghai Biotechnology Co., Ltd. The PCR conditions were as follows: pre-denaturation at 94 ℃ for 5 min; denaturation at 94 ℃ for 30 s and annealing at 55~61 ℃ for 30~90 s for 32 cycles, followed by extension at 72 ℃ for 10 min; and storage at 4 ℃. After the target fragment was recovered with an Axygen gel recovery kit, it was ligated into the pMD19-T vector (purchased from Takara Company). The ligation system contained 5 µL of Solution I, 1 µL of pMD19-T, X µL of gel recovery product (up to 4 µL) and (4 - X) µL of DDW, and ligation was carried out at 16 ℃ overnight. The ligation products were transformed into DH5α (purchased from Solarbio Company) and grown overnight at 37 ℃. Some of the single colonies on the plate were picked and grown in liquid LB medium containing ampicillin. After culture on a shaker at 37 ℃ and 180 rpm for 5~6 h, positive clones were identified by PCR and sent to Anhui General Company for sequencing. Finally, the reads were assembled into the complete sequence.
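
The volume of sterile water needed to bring the PCR system to 20 μl follows directly from the listed component volumes. The short Python sketch below works this out and scales the recipe into a master mix. It assumes that the stated 1.0 μl applies to each of the two primers, which the text does not say explicitly; the function names and the 10% pipetting excess are illustrative choices, not part of the original protocol.

```python
# Component volumes (µl) per reaction as listed in the text.
# Assumption: 1.0 µl is added for EACH primer (forward and reverse).
COMPONENTS_UL = {
    "DNA template": 2.0,
    "10x Buffer": 2.5,
    "MgCl2": 1.5,
    "dNTP": 2.0,
    "forward primer": 1.0,
    "reverse primer": 1.0,
    "Taq enzyme": 0.4,
}
FINAL_VOLUME_UL = 20.0


def water_volume(components, final_volume):
    """Return the water volume (µl) that tops the reaction up to final_volume."""
    used = sum(components.values())
    if used > final_volume:
        raise ValueError("components exceed the final reaction volume")
    return round(final_volume - used, 2)


def master_mix(components, final_volume, n_reactions, excess=0.10):
    """Scale every component except the template for n reactions plus an excess."""
    factor = n_reactions * (1.0 + excess)
    mix = {name: round(vol * factor, 2)
           for name, vol in components.items() if name != "DNA template"}
    mix["sterile water"] = round(water_volume(components, final_volume) * factor, 2)
    return mix


if __name__ == "__main__":
    print("water per reaction (µl):", water_volume(COMPONENTS_UL, FINAL_VOLUME_UL))
    for name, vol in master_mix(COMPONENTS_UL, FINAL_VOLUME_UL, n_reactions=18).items():
        print(f"{name}: {vol} µl")
```

Under the one-microlitre-per-primer assumption the components sum to 10.4 μl, so 9.6 μl of water is added per reaction; if 1.0 μl were the combined primer volume, the water volume would be 10.6 μl instead.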

Bioinformatics Analysis Software and Related Web Sites
The nucleotide sequence and structure of the Badh2 gene were analyzed, the amino acid sequence encoded by Badh2 was deduced, and the properties and functional domains of the protein were analyzed and predicted by bioinformatics methods. The tools and web sites used for Badh2 sequence analysis are shown in Table 3.

Table 3. The main tools used in bioinformatics analysis of the structure and function of Badh2 gene in rice

Detection of Gene Functional Markers
The Badh2 gene has many allelic forms in natural populations, and different alleles are associated with different aromas. Seven functional markers of the Badh2 gene have been developed for insertion, deletion or single nucleotide mutation sites in the 5'-UTR region and in exons 2, 4, 5, 7, 12, 13 and 14 of the gene [11,28,32]. Among them, the functional marker FME14 is a cleaved amplified polymorphic sequence (CAPS) marker, and the rest are ordinary PCR markers. Detailed information on the amplified products, primer sequences and annealing temperature of each functional marker is shown in Table 4. All primers were synthesized by Nanjing Kingsley Biotechnology Co., Ltd., and preliminary tests showed that the seven functional markers were stable in the reaction system and gave clear and reliable results.

Isolation and Cloning of Badh2 Gene from Aromatic Rice in South Henan
Using leaf DNA of aromatic rice from Southern Henan as template, the expected DNA fragments were amplified with the primers in Table 2 (Fig. 1). The target fragments were recovered with a gel recovery kit and ligated into the cloning vector pMD19-T. The constructs were transformed into DH5α and grown overnight at 37 ℃. After single colonies were screened, the positive bacterial cultures were sent to Anhui General Company for sequencing. In this way, the Badh2 gene sequences of all 18 aromatic rice germplasm resources in Southern Henan (Table 1) were obtained. The Badh2 gene contains 15 exons and 14 introns (Fig. 2) and encodes about 503 amino acids. The gene shows many types of variation among the 18 varieties, which may explain the differences in aroma among aromatic rice varieties in Southern Henan.

Amino Acid Sequence Coded by Badh2 Gene and Analysis of its Physicochemical Properties
The Badh2 allele of Xiangfeng 916, an aromatic rice variety widely used in production, was analyzed by bioinformatics. The full length of this allele was 7111 bp, and it encoded 503 amino acids. The isoelectric point of the protein was 5.36, its molecular weight was 54.68 kDa, and its instability coefficient was 72.32. Amino acid composition analysis showed that alanine was the most abundant residue in the Badh2 protein (11.1%), followed by glycine and glutamate, at 8.5%. Methionine and histidine accounted for 1.6% and 0.8%, respectively. The Badh2 protein sequence contained 66 negatively charged residues (aspartic acid, Asp, and glutamic acid, Glu), accounting for 13.1% of all amino acid residues, and 57 positively charged residues (arginine, Arg, and lysine, Lys), accounting for 11.3%. The aliphatic index and instability index of the Badh2 protein were 86.92 and 35.87, respectively; an instability index below 40 is generally taken to indicate a stable protein, so the Badh2 protein was relatively stable.
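
The charged-residue percentages above can be reproduced from the residue counts and the protein length. The following Python sketch shows the arithmetic and a small helper that tallies charged residues in any one-letter amino acid sequence. The helper names and the short demonstration peptide are hypothetical; the Badh2 sequence itself is not reproduced here.

```python
from collections import Counter

NEGATIVE = ("D", "E")  # aspartic acid, glutamic acid
POSITIVE = ("R", "K")  # arginine, lysine


def percent(count, total):
    """Percentage of `count` in `total`, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


def charged_residues(sequence):
    """Count negatively and positively charged residues in a protein sequence."""
    counts = Counter(sequence.upper())
    negative = sum(counts[aa] for aa in NEGATIVE)
    positive = sum(counts[aa] for aa in POSITIVE)
    return negative, positive


if __name__ == "__main__":
    # Counts reported in the text for the 503-residue Badh2 protein.
    print(percent(66, 503))  # 13.1
    print(percent(57, 503))  # 11.3

    # Hypothetical peptide used only to show the helper.
    neg, pos = charged_residues("MDEKRAGL")
    print(neg, pos)
```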
The hydrophobicity of the rice Badh2 protein was predicted with the ProtScale online tool, in which a lower score indicates stronger hydrophilicity and a higher score indicates stronger hydrophobicity. The most hydrophilic position of the Badh2 protein was at an arginine (Arg) residue, with a score of -2.722, and the most hydrophobic position was at a glycine (Gly) residue, with a score of 2.067. Overall, the Badh2 protein contained significantly more hydrophilic than hydrophobic amino acids (Fig. 3). Therefore, the Badh2 protein of aromatic rice is a relatively hydrophilic protein.

Fig. 3. The testing of hydrophilicity and hydrophobicity of Badh2 amino acids
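
Hydropathy profiles of this kind are usually sliding-window averages of per-residue scale values. The Python sketch below illustrates the idea with the Kyte-Doolittle scale and a window of 9 residues. Both choices are assumptions, since the text does not state which scale or window was used, and the demonstration peptide is hypothetical rather than the Badh2 sequence.

```python
# Kyte-Doolittle hydropathy values; higher values are more hydrophobic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Return the mean hydropathy of every full window along the sequence."""
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [round(sum(values[i:i + window]) / window, 3)
            for i in range(len(values) - window + 1)]


def extremes(profile, window=9):
    """Return (position, score) for the lowest and highest windows.

    Positions are 1-based and refer to the central residue of each window.
    """
    offset = window // 2 + 1
    low = min(range(len(profile)), key=profile.__getitem__)
    high = max(range(len(profile)), key=profile.__getitem__)
    return (low + offset, profile[low]), (high + offset, profile[high])


if __name__ == "__main__":
    demo = "MKRDEELLIVVAGGSKR"  # hypothetical peptide, not Badh2
    profile = hydropathy_profile(demo, window=5)
    print(profile)
    print(extremes(profile, window=5))
```

Because each score is a window average, the reported minimum and maximum describe the neighbourhood centred on the named residue rather than that single residue alone.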

Prediction of Transmembrane Structure and Signal Peptide Analysis of Badh2 Gene Encoding Protein
The presence of transmembrane domains in the Badh2 protein of aromatic rice was analyzed with TMHMM (www.cbs.dtu.dk/services/TMHMM). The results showed that the Badh2 protein has no transmembrane domain (Fig. 4).
The SignalP 4.0 online tool (http://www.cbs.dtu.dk/services/SignalP) was used to predict signal peptides in the protein encoded by the Badh2 gene. As shown in Fig. 5, the alanine residue at position 35 of the Badh2 protein may be a signal peptide cleavage site, but the highest prediction score and the highest signal peptide score were only 0.110 and 0.153, respectively. Both values are low, so the Badh2 protein probably has no signal peptide sequence.

Structural Analysis of Badh2 Protein in Aromatic Rice
The secondary structure of the rice Badh2 protein was analyzed with an online protein analysis website (https://npsa-prabi.ibcp.fr/cgi-bin). The results showed that 190 amino acids of the Badh2 protein formed α-helices, accounting for 37.77% of the total sequence; 74 amino acids formed extended strands, accounting for 14.71%; and 239 amino acids formed random coils, accounting for 47.51% (Fig. 6). The tertiary structure of the Badh2 protein was predicted by homology modeling with the SWISS-MODEL online software (http://swissmodel.expasy.org/repository). The results showed that random coil and α-helix occupied most of the structure (Fig. 7), which was consistent with the secondary structure of the Badh2 protein.
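
Secondary structure servers typically return one state letter per residue, and the percentages above are simple tallies of those letters. The Python sketch below reproduces the reported percentages from the three residue counts. The one-letter state codes (h, e, c) and the function name are assumptions made for illustration; the state string is built from the counts, not taken from an actual prediction.

```python
from collections import Counter


def state_percentages(states):
    """Return the percentage of each per-residue state, to two decimals."""
    total = len(states)
    if total == 0:
        raise ValueError("empty state string")
    return {state: round(100.0 * n / total, 2)
            for state, n in Counter(states).items()}


# Residue counts reported in the text; the letters are illustrative codes
# for alpha helix (h), extended strand (e) and random coil (c).
REPORTED_COUNTS = {"h": 190, "e": 74, "c": 239}
STATES = "".join(letter * n for letter, n in REPORTED_COUNTS.items())

if __name__ == "__main__":
    print(len(STATES))                # total residues covered by the counts
    print(state_percentages(STATES))  # percentage per state
```

The three counts add up to 503 residues, the full length of the protein, so every residue is assigned to exactly one of the three states.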

Cluster Analysis of Badh2 Gene in Southern Henan Aromatic Rice
The Badh2 DNA sequences of all 18 aromatic rice germplasm resources from Southern Henan were aligned with MEGA7, and a cluster map of the Badh2 gene was drawn with the same software. As shown in Fig. 8, Xinxianggeng 1, Xiangnuo 25, Heixiangdao 193 and Xiangbao 2 were clustered in group I, while the remaining 14 aromatic rice varieties were clustered in group II. Within group II, Nongxianggeng, Huaibinxiangxiangdao, Xiangnuo 1862, Heixiangnuo 1926 and Xiangxiang 1 formed subgroup A; Zhengxianggeng, Xinxiangnuo 933, Exiang 2, Xianggeng 805, Changxianggeng 101, Xiangfeng 916, Exiang 1 and Xiangbao 1 formed subgroup B; and the Badh2 sequence of Nongxianggeng 4 differed from those of both subgroups (Fig. 8).
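
Clustering of aligned sequences starts from a matrix of pairwise distances. The Python sketch below computes the simplest such measure, the proportion of differing sites (p-distance), for a set of aligned sequences. It is only an illustration: the text does not state which distance model or tree-building method was chosen in MEGA7, the sequences shown are hypothetical, and gapped sites are skipped here, so insertions and deletions do not contribute to the distance.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') are skipped (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def distance_matrix(alignment):
    """Return {(name_1, name_2): p-distance} for every pair in the alignment."""
    return {(n1, n2): p_distance(alignment[n1], alignment[n2])
            for n1, n2 in combinations(sorted(alignment), 2)}


if __name__ == "__main__":
    # Hypothetical aligned fragments, not real Badh2 sequences.
    toy_alignment = {
        "variety_1": "ATGGCCACGGCG",
        "variety_2": "ATGGCCACGGCA",
        "variety_3": "ATGG-CACTGCA",
    }
    for pair, dist in distance_matrix(toy_alignment).items():
        print(pair, round(dist, 3))
```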

Molecular Marker Detection and Analysis of Badh2 Gene in Aromatic Rice in South Henan
The genomic DNA of the 18 aromatic rice varieties from Southern Henan was used as template. Targeting known variation loci, the Badh2 functional markers (Table 4) were used for PCR amplification, and the products were detected by non-denaturing polyacrylamide gel electrophoresis. With the functional marker FME2, seven varieties (Xianggeng 805, Changxianggeng 101, Nongxianggeng, Xiangbao 1, Exiang 1, Nongxianggeng 4 and Xinxianggeng 1) amplified a 100 bp fragment, indicating that all seven carry a 7 bp deletion in exon 2 of the Badh2 gene and belong to the same variation type of the aroma gene. A 107 bp fragment was amplified from the other 11 varieties, indicating that they do not belong to the exon 2 mutation type (Table 5). With the functional marker FME4-5, Xianggeng 805 and Nongxianggeng 4 amplified a 321 bp fragment, whereas the remaining 16 varieties all amplified a 1123 bp fragment. This indicates that only these two varieties belong to the exon 4~5 mutation type of the Badh2 gene. With the functional marker FME7, Zhengxianggeng 11, Xinxiangnuo 933, Xiangfeng 916, Exiang 2, Huaibinxiangxiangdao, Xiangnuo 1862, Heixiangnuo 1926 and Xiangxiang 1 amplified two fragments of 583 and 169 bp. This indicates that these eight varieties carry the 8 bp deletion and 3 SNPs in exon 7 of the Badh2 gene and belong to the same mutation type of the fragrance gene. With the functional marker FME13, Xiangbao 2, Xiangnuo 25 and Heixiangdao 193 amplified a 195 bp fragment, while the other 15 varieties amplified only a 192 bp fragment. The functional markers FMU1-2, FME12 and FME14 detected no polymorphic loci among the 18 varieties (Table 5). Therefore, the Badh2 gene of the 18 aromatic rice varieties in Southern Henan may not be mutated in the 5'-UTR region, exon 12 or exon 14.
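
Scoring the gels amounts to matching each observed fragment size to the allele expected for that marker. The Python sketch below encodes the fragment sizes reported above for FME2, FME4-5 and FME13 and calls an allele from an estimated band size. The allele labels, the 1 bp tolerance and the function name are illustrative assumptions; in particular, labelling the 195 bp FME13 band as the variant allele is inferred from its presence in only three varieties.

```python
# Expected fragment sizes (bp) per marker, taken from the results above.
# The text labels are descriptive assumptions, not names used by the authors.
MARKER_ALLELES = {
    "FME2": {100: "7 bp deletion in exon 2", 107: "no exon 2 deletion"},
    "FME4-5": {321: "exon 4~5 deletion", 1123: "no exon 4~5 deletion"},
    "FME13": {195: "exon 13 variant (assumed)", 192: "common exon 13 allele"},
}


def call_allele(marker, observed_bp, tolerance_bp=1):
    """Return the allele whose expected size is closest to the observed band.

    Bands further than tolerance_bp from every expected size are 'unscored'.
    """
    expected = MARKER_ALLELES[marker]
    nearest = min(expected, key=lambda size: abs(size - observed_bp))
    if abs(nearest - observed_bp) > tolerance_bp:
        return "unscored"
    return expected[nearest]


if __name__ == "__main__":
    # Band sizes reported in the text for two varieties.
    print("Xianggeng 805, FME2:", call_allele("FME2", 100))
    print("Xianggeng 805, FME4-5:", call_allele("FME4-5", 321))
    print("Xiangbao 2, FME13:", call_allele("FME13", 195))
```

The tolerance has to stay below half the smallest size difference between alleles of a marker; for FME13 the two bands differ by only 3 bp, which is why such markers are resolved on polyacrylamide rather than agarose gels.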

Biological Function of Badh2 Gene
Previous studies have shown that mutation of the Badh2 gene on chromosome 8 causes rice to produce fragrance, whereas the Badh2 protein encoded in non-aromatic rice has betaine aldehyde dehydrogenase activity. In the mutant type, the loss of this activity and the resulting accumulation of 2-AP produce the rice aroma [11,22,24]. Based on sequencing and analysis of rice fragrance genes, most researchers believe that deletions in exon 2 or exon 7 of the Badh2 gene lead to 2-AP accumulation [24] and aroma production. However, some rice varieties have high levels of 2-AP but carry no known fragrant Badh2 allele, which suggests that other, non-Badh2 genes may also contribute to fragrance [26]. Most researchers focus only on exon differences between aromatic and non-aromatic rice, but in eukaryotes introns also play an important role in gene expression and regulation [33,34]. Our results showed that the Badh2 gene of the 18 representative aromatic rice varieties in Southern Henan contained 15 exons and 14 introns (Fig. 2), encoded about 503 amino acids, and carried many types of mutations. This may be the main reason for the differences in aroma among the aromatic rice germplasm resources of Southern Henan. Further analysis of the Badh2 gene sequences showed that the aromatic rice varieties in Southern Henan could be divided into two groups: Xinxianggeng 1, Xiangnuo 25, Heixiangdao 193 and Xiangbao 2 were clustered in group I, while the other 14 varieties were clustered in group II. These results provide important information for the future development and utilization of the Badh2 gene in Southern Henan.

Regulation Mechanism of Badh2 Gene
So far, although it has been confirmed that loss of Badh2 gene function leads to the production of the rice aroma substance 2-AP, how the Badh2 protein regulates the biosynthesis of 2-AP remains unclear. Betaine aldehyde dehydrogenase plays an important role in the biosynthesis of 2-acetyl-1-pyrroline. However, how it acts in this synthesis and whether other substances are involved in the regulation are still unclear and need further study. Meanwhile, gamma-aminobutyraldehyde, as a precursor of 2-acetyl-1-pyrroline, may also play a key regulatory role in its biosynthesis. The loss of Badh2 protein function in aromatic rice may lead to the accumulation of gamma-aminobutyraldehyde, which is then converted to 1-pyrroline and eventually to a large amount of 2-acetyl-1-pyrroline (2-AP) [22]. In non-aromatic rice, the Badh2 protein has betaine aldehyde dehydrogenase activity and may convert gamma-aminobutyraldehyde to gamma-aminobutyric acid, which inhibits the synthesis of the 2-AP precursor 1-pyrroline and ultimately prevents the synthesis of 2-AP. Nevertheless, the biosynthetic pathway of 2-AP and the related regulatory networks and mechanisms are still unclear. In this study, the Badh2 allele of Xiangfeng 916, which is widely used in production, was taken as an example for bioinformatics analysis. The results showed that the Badh2 protein was a relatively hydrophilic protein with good stability, and no signal peptide sequence or transmembrane structure was found (Fig. 3~5). Prediction of the higher-order structure of the Badh2 protein showed that random coil and α-helix occupy a large proportion of the structure (Fig. 7). The structure of a protein plays a decisive role in its function. The structural prediction of the Badh2 protein in this study provides important clues for in-depth analysis of the 2-AP biosynthesis pathway and the related regulatory networks and mechanisms.

Application of Badh2 Gene in Molecular Breeding of Aromatic Rice
Since the Badh2 gene was isolated and cloned in japonica rice, multiple variation sites have been found in the gene [11,18,24,35], and a series of molecular markers have been designed for identifying aroma genes, screening aromatic rice varieties and breeding new aromatic rice varieties. At present, at least 17 mutation loci are known in the Badh2 gene, distributed in the 5'-UTR region, the 1st exon, the junction between the 1st exon and the 1st intron, the 2nd exon, the region between the 4th and 5th exons, and the 7th, 8th, 10th, 12th, 13th and 14th exons [11]. For these mutation sites, multiple pairs of molecular markers have been designed at the junction of the 1st exon and the 1st intron, the 2nd exon, the 4th and 5th exons, and the 7th, 12th, 13th and 14th exons to identify the mutation type of the Badh2 gene [11,12,15,24,27,28,36]. On the basis of these markers, a total of 7 functional markers of rice aroma genes were further developed (Table 4) and verified in segregating populations [11,24,27,36]. Among them, the functional marker FMU1-2 can detect not only the 8 bp deletion of the Badh2 gene in the 7th exon but also an 8 bp deletion in the 5'-UTR region of the gene. The functional markers FME2-7, FME7, FME12-3, FME13 and FME14 are used to identify the variation types of aroma genes in exons 2, 7, 12, 13 and 14 of Badh2, respectively. The functional marker FME14 is a CAPS (cleaved amplified polymorphic sequence) marker, so its PCR products must be digested with the restriction enzyme Bsl before detection.
In this study, seven functional molecular markers were used to detect the mutation types of the Badh2 gene in Southern Henan aromatic rice. The results showed that mutations were present in exon 2, exons 4~5, exon 7 and exon 13 of the Badh2 gene among the 18 varieties. Interestingly, no variation was detected in the 5'-UTR region, the 12th exon or the 14th exon of the Badh2 gene (Table 5). The marker results were in good agreement with the results of the Badh2 cluster analysis (Fig. 8). Therefore, the detection and analysis of these functional molecular markers of aroma genes provide an accurate, rapid and effective means for molecular breeding of aromatic rice.

Conclusion
According to the variation of the Badh2 gene in 18 aromatic rice varieties in Southern Henan, the tested varieties can be divided into two groups. Xinxianggeng 1, Xiangnuo 25, Heixiangdao 193 and Xiangbao 2 are clustered in group I, while the remaining 14 fragrant rice varieties are clustered in group II. Seven functional molecular markers of the Badh2 gene detected different mutation types in exon 2, exons 4~5, exon 7 and exon 13 of the gene. No aromatic rice varieties with different mutation types were found at the other tested locations of the Badh2 gene. Therefore, the results of this study provide an important basis for using molecular marker-assisted selection to breed new high-quality aromatic rice varieties.