Phylogeny of certain members of Hyrcanus group (Diptera: Culicidae) in China based on mitochondrial genome fragments

Background Species of the Anopheles hyrcanus group are widely distributed in Palearctic and Oriental regions and some of them are important malaria vectors. The cryptic species of An. hyrcanus group was almost impossible to identify based only on their morphology. The phylogenetic relationship of An. hyrcanus group was also not clear. Methods Five members of An. hyrcanus group were identified by rDNA ITS2 sequencing as An. yatsushiroensis, An. belenrae, An. kleini, An. lesteri and An. sineroides. The mitochondrial genome fragments were sequenced and annotated using the mitochondrial genome of An. sinensis as reference. Based on the four segments and Joint Data sequences of these species, and other four anopheline species downloaded from GenBank, intraspecific as well as interspecific genetic distances were calculated and the phylogenetic trees were reconstructed by the methods of neighbor joining, maximum parsimony, minimum evolution and maximum likelihood. Findings Four parts of mitochondrial genomes, which were partial fragments COI + tRNA + COII (F5), ATP6 + COIII(F7 + F8), ND1(F19) and lrRNA (F21), were obtained. All fragments were connected as one sequence (referred as Joint Data), which had a total length of 3393 bp. All fragment sequences were highly conservative within species, with the maximum p distance (0.026) calculated by F19 of An. belenrae. The pairwise interspecific p distance calculated by each fragment showed minor or even no difference among An. sinensis, An. kleini and An. belenrae. However, interspecific p distances calculated by the Joint Data sequence ranged from 0.004 (An. belenrae vs An. kleini) to 0.089 (An. sineroides vs An. minimus), and the p distances of the six members of An. hyrcanus group were all less than 0.029. The phylogenetic tree showed two major clades: all subgenus Anopheles species (including six members of An. hyrcanus group, An. atroparvus and An. quadrimaculatus A) and subgenus Cellia (including An. dirus and An. minimus). The An. hyrcanus group was divided into two clusters as ((An. lesteri, An. sineroides) An. yatsushiroensis) and ((An. belenrae, An. sinensis) An. kleini)). Conclusions The An. hyrcanus group in this study could be divided into two clusters, in one of which An. belenrae, An. sinensis and An. kleini were most closely related. More molecular markers would make greater contribution to phylogenetic analysis.


Multilingual abstracts
Please see Additional file 1 for translations of the abstract into the five official working languages of the United Nations.

Background
Anopheles hyrcanus group belongs to the subgenus Anopheles, genus Anopheles. It is widely distributed in Palearctic and Oriental regions, including 25 species with valid reported mosquito species [1]. There were 22 species of An. hyrcanus group distributed in China, including three unnamed ones [2]. Some cryptic species of An. hyrcanus group have similar morphological characteristics, making it difficult to identify them based only on their morphology [3]. Moreover, quite a few hybridized individuals were found in field [4,5] and reestablished phylogenetic trees of An. hyrcanus group showed disparity according to various molecular markers [3,6,7]. These facts illustrated the complex phylogenetic relationships within An. hyrcanus group.
Mitochondrial genome strictly followed maternal inheritance in structure as well as in evolution, with abundant information for genetic and phylogenetic population studies. The mitochondrial genome of Anopheles mosquitoes consisted of 13 protein-coding genes, 22 transfer RNA (tRNA) genes, two ribosomal RNA (rRNA) genes and an AT-rich control region [8][9][10]. At present, certain genes of mitochondrial genome were employed to analyze the interspecific or intraspecific differences. For instance, COI sequence was used as DNA barcoding to distinguish mosquito species [11,12], while genes such as COI [13], COII [14], ND5 [15] and control region [16] were utilized to detect the genetic population structure of Hyrcanus group members.
So far, we have reported the complete mitochondrial genome of An. sinensis in Hyrcanus group [9]. In this study, we sequenced and analyzed mitochondrial genome fragments of An. hyrcanus group members in China in order to reestablish the phylogenetic relationships and determine the taxonomic status of cryptic species.
Moreover, we discussed the contribution of mitochondrial genome fragments in phylogenetic study.

Ethics statement
No permits were required for the described field studies. Adult mosquito collection in chicken farms and livestock pens was agreed by the owners at each location.

Mosquito collection and species identification
With the consent of the owners, mini light traps (MYFS-HJY-1, Houji Shenzhen, China) were set up in chicken farms and livestock pens between 6:30 pm and 7:30 am. Then the captured Anopheles specimens were collected by the mini light traps and manually by entomological aspirators in the evening and killed by freezing before being individually transferred to laboratory in centrifuge tubes for further analysis (Table 1). In the light of the taxonomic key by Lu et al. [17], the samples were morphologically identified as members of An. hyrcanus group.
The member species of An. hyrcanus group was further identified by molecular markers with rDNA ITS2 sequences. Single mosquito genomic DNA was extracted using DNAzol (Life Technologies, USA) following the manufacturer's instructions. DNA pellet was dissolved in 80 μl H 2 O. The rDNA ITS2 fragment was amplified according to the method by Lin et al. and Ma et al. [18,19]. An ABI 3730 (Boshang Biotech Co., Ltd. Shanghai, China) was applied to purify and sequence the PCR products. Finally, the sequences were Blast aligned in Genebank on the NCBI website to determine the species [18,19].

Amplification and sequencing of mitochondrial genome fragments
Some fragments of the mitochondrial genome were amplified referring to the universal primers designed for mitochondrial genome of Diptera [20] (Table 2, Additional file 7: Figure S1). The length range of the

Sequence characters of mitochondrial genome fragments
Five mitochondrial genome fragments out of the five members in An. hyrcanus group were obtained. Fragment F5 (893 bp in length) comprised segmental COI, full-length tRNA-Leu plus segmental COII (MK690504-MK690508). Due to a partial overlapping found in Fragment F7 and F8, they were connected (denoted as F7 + F8) for further analysis. Its length was 1407 bp and included segmental ATP6 and segmental COIII (MK825734-MK825738). F19, 583 bp in length, was segmental ND1 (MK825739-MK825748). F21 was segmental 16 lrRNA (MK825744-MK825748) with 510 bp in length. As the combined sequence of all fragments, JD had an aggregate length of 3393 bp.

Intraspecific differences
The five fragments' sequences of mitochondrial genome were aligned among the five members of An. hyrcanus group in this study. The averages of the nucleotide composition as well as the numbers of conserved and variable bases are shown in Additional file 2: Table S1. All fragments' sequences were highly conservative within species, with the maximum p distance (0.026) calculated by F19 of An. belenrae (Table 3). The reported intraspecific differences of mitochondrial genomes in An. hyrcanus group were as follows: An. lesteri (COII: h = 0.000-0.005; Cyt B: h = 0.000-0.005) [14], An. sinensis  [13,16,26]. The maximum intraspecific distance of COI in 17 members of Hyrcanus group was 0.008 (range: 0.002-0.017) [7]. The results indicated varied intraspecific differences among the fragments, suggesting that prudence would be required in the analysis of interspecies relationships within An. hyrcanus group. The differences between COI sequences increased in higher taxonomic categories [27], while the COI barcoding gap was usually 2% within species [28]. High divergence of intraspecific distance was probably caused by recent geographic isolation, indicating the presence of cryptic species [27].

Interspecific differences
The four different mitochondrial genome fragments and the JD sequences of the five members of An. hyrcanus group in this study, together with the five anopheline species downloaded from the GenBank were analyzed using MEGA 7.0. The ranges of the variable and parsimony information bases were from 46 (F21) to 636 (JD) and 18 (F21) to 363 (JD), respectively, and the averages of GC content varied from 24.6 to 27.4% (Table 4). The pairwise interspecific p distance based on the four fragments (F5, F7 + F8, F19, F21) of mitochondrial genome showed that F21 sequence was completely conserved between An. kleini and An. belenrae, suggesting that F21 fragment had no interspecific resolution. Moreover, other fragments showed minor difference among An. sinensis, An. kleini and An. belenrae. Meanwhile, the p distance between subgenus Cellia and Anopheles species was greater than that among the species within the same subgenus (Additional file 3: Table S2, Additional file 4: Table S3, Additional file 5: Table S4 and Additional file 6: Table S5).
The interspecific p distances calculated by the JD sequence ranged from 0.004 (An. belenrae vs An. kleini) to 0.089 (An. sineroides vs An. minimus), and the p distances of the 6 members of An. hyrcanus group were all less than 0.029 (Table 5).

Phylogenetic analysis
The phylogenetic trees were reconstructed using four methods based on the three independent fragments (except F21). All topological structures of the phylogenetic trees based on F7 + F8 and F19 were consistent and split up into two clades. One clade consisted of An. dirus and An. hyrcanus group six species, while the  other included the rest three species (Additional file 8: Figure S2). The topology based on F5 and JD was also consistent. There were two major clades, one of which included all nine species (An. hyrcanus group six species, An. atroparvus and An. quadrimaculatus A) that belonged to subgenus Anopheles, and the other contained An. dirus and An. minimus that belonged to subgenus Cellia (Fig. 1). The 6 species of An. hyrcanus group were divided into two clusters: ((An. lesteri, An. sineroides) An. yatsushiroensis)) and ((An. belenrae, An. sinensis) An. kleini)). The optimal nucleotide substitution models for F5 and JD were GTR + G and GTR + G + I, respectively. The test result of congruence length for JD showed there was not a congruence length data (P = 0.01). The bootstrap values of ML tree were almost above 47% (Fig. 1).
In An. hyrcanus group, An. sinensis had a very close genetic relationship with An. belenrae and An. kleini. Their adults shared such similar morphology that there were not enough taxonomic characteristics to distinguish them. Anopheles belenrae and An. kleini were reported in 2005 [36], and natural hybridization between An. sinensis and An. kleini were found later [4,5], indicating possible gene introgression in sympatric population and incomplete reproductive isolation between these two species as well as ongoing speciation. It was known that rDNA ITS2 sequences were the ideal molecular marker to distinguish cryptic species. However, the length of rDNA ITS2 in different mosquito species varied greatly, which led to alignment difficultly. Therefore, it was impossible to reconstruct different mosquito species simultaneously. Partial sequences of mitochondrial genome fragments were not able to provide adequate resolution for cryptic species. However, if mitochondrial genome sequences could be sufficiently informative, such as the employment of joint data connection, the phylogenetic tree of cryptic and distant genetic species could be reconstructed simultaneously.

Conclusions
Anopheles hyrcanus group is widely distributed in Palearctic and Oriental regions and some of them are important local malaria vectors. The cryptic species of An. hyrcanus group was almost impossible to identify based only on their morphology. In this study, the phylogenetic tree was established by creating mitochondrial genome fragments sequences (JD). It had two major clades, one of which included all Subgenus  Fig. 1 The phylogenetic maximum likelihood tree based on the Joint Data of mitochondrial genome fragments. The codes are the same as those in Table 5, and numbers on the clades denote bootstrap values