Pattern analysis of schistosomiasis prevalence by exploring predictive modeling in Jiangling County, Hubei Province, P.R. China
© The Author(s). 2017
Received: 22 February 2017
Accepted: 13 April 2017
Published: 26 April 2017
The prevalence of schistosomiasis remains a key public health issue in China. Jiangling County in Hubei Province is a typical lake and marshland endemic area. Pattern analysis of schistosomiasis prevalence in Jiangling County is therefore important for promoting schistosomiasis surveillance and control in similar endemic areas.
The dataset was constructed from the annual schistosomiasis surveillance data as well as socio-economic data in Jiangling County covering the years 2009 to 2013. A village clustering method modified from the K-means algorithm was used to identify different types of endemic villages. For the identified village clusters, a matrix-based predictive model was developed using a one-step backward temporal correlation inference algorithm to estimate the predictive correlations of schistosomiasis prevalence among different years. Field sampling of faeces from domestic animals, as an indicator of potential schistosomiasis prevalence, was carried out, and the results were used to validate the proposed models and methods.
The prevalence of schistosomiasis in Jiangling County declined year by year. The 198 endemic villages in Jiangling County can be divided into four clusters with reference to the 5-year occurrence of schistosomiasis in human, cattle and snail populations. For each identified village cluster, a predictive matrix was generated to characterize the relationships of schistosomiasis prevalence with the historic infection level as well as the associated impact factors. Furthermore, the results of the field faeces sampling agreed with the identified clusters of endemic villages.
The village clusters and the predictive matrix can be regarded as a basis for conducting targeted measures for schistosomiasis surveillance and control. Furthermore, the proposed models and methods can be modified to investigate schistosomiasis prevalence in other regions, as well as to investigate other parasitic diseases.
Keywords: Schistosomiasis, Clustering, Predictive modelling
Please see Additional file 1 for translation of the abstract into the five working languages of the United Nations.
Schistosomiasis causes serious harm to residents’ health and impedes economic development in endemic areas in China [1–4]. Since the implementation of the National Middle- and Long-term Plan of Schistosomiasis Prevention and Control, remarkable progress has been made, with an overall downward trend in endemicity and prevalence in terms of schistosomiasis patients, infected animals and snails. As of 2015, among the 12 schistosomiasis-endemic provinces in China, five (Shanghai, Zhejiang, Fujian, Guangdong and Guangxi) had reached the stage of transmission interruption, while the other seven (Hunan, Hubei, Jiangxi, Anhui, Jiangsu, Sichuan and Yunnan) had reached the stage of transmission control. Although great achievements have been made in the past several decades, the risk of schistosomiasis still exists, especially in the lake and marshland areas, owing to an environment suitable for the development of intermediate host snails and to frequent human and livestock activities [6–10]. Therefore, it is urgent to promote advanced studies for schistosomiasis surveillance and control, especially in the lake and marshland areas with low prevalence of schistosomiasis.
Data mining methods, together with computational modelling, have played an important role in studies of schistosomiasis and have been widely applied in guiding field practice and designing epidemiological surveys. They are particularly useful to health planners and decision makers. A linear regression model found that the prevalence of schistosomiasis had a significant linear relationship with ecological and environmental factors, including the riparian water table, annual rainfall, yearly evaporation and altitude, in the endemic areas following the Three Gorges construction. In a further step, multivariate regression found that eliminating water contact in the month of July would reduce the prevalence of schistosomiasis in the population. However, it is difficult to assess risk factors of schistosomiasis that are believed to be non-linear using conventional statistical methods. An artificial neural network, applied together with a logistic model, was found to be more suitable for illustrating the complex and non-linear relationships among risk rankings in schistosomiasis prevalence. The main risk factors for human infection with Schistosoma japonicum were being aged ≤15, having a lower level of education, living in a village with a higher infection rate, belonging to a poor family and belonging to a population in which infections occurred often.
Recently, spatial-temporal cluster analysis has been widely used for schistosomiasis risk surveillance and timely response, and to help prioritize intervention strategies and implementation targets. However, few reports have integrated a matrix model with spatial-temporal cluster analysis for schistosomiasis surveillance. In this study, the pattern of schistosomiasis prevalence in the selected areas, which are located in lake and marshland regions, was analysed using cluster analysis methods and a matrix-based prediction model. The aim was not only to further develop appropriate surveillance strategies for the sources of schistosomiasis infection and their related factors, but also to provide a feasible scientific basis for the interruption of schistosomiasis transmission.
Study site and data collection
Village clustering analysis
where $\mu_i$ is the mean of the points in $S_i$.
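As a point of reference for the clustering step, the standard K-means objective seeks a partition $S = \{S_1, \dots, S_k\}$ of the observations that minimises $\sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2$. The following is a minimal sketch of plain Lloyd's algorithm for that objective. It is not the modified method used in this study; the function names, the toy feature vectors and the choice of k are assumptions made purely for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm. Returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct observations (an assumption;
    # other initialisation schemes are equally valid).
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for i in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == i]
            if members:  # keep the old centroid if a cluster empties
                centroids[i] = mean_point(members)
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels, centroids


def within_cluster_ss(points, labels, centroids):
    """Value of the K-means objective for a given assignment."""
    return sum(squared_distance(p, centroids[lab])
               for p, lab in zip(points, labels))


if __name__ == "__main__":
    # Hypothetical two-feature vectors, not data from the study.
    toy = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    labels, centroids = kmeans(toy, k=2)
    print(labels, centroids, within_cluster_ss(toy, labels, centroids))
```

In practice each row would hold one village's feature vector (for example, yearly prevalence values and coordinates), scaled so that no single feature dominates the distance.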
Temporal predictive analysis
Validation by sampling faeces from domestic animals
In order to validate the results of spatial clustering and temporal prediction, this study carried out a faeces survey programme covering the different clusters of endemic villages. The faeces samples were tested for schistosome miracidia using the nylon hatching method. The samples were observed for at least 2 min at several times during incubation: at the first, third and fifth hours for bovine and sheep faeces, and at the fifth and eighth hours for pig faeces. Positive faeces samples were also subjected to quantitative detection, with the results recorded according to the presence and number of hatched miracidia observed and interpreted using a single-blind method. The local infection rates based on domestic animals’ faeces were then projected onto the identified village clusters to validate the results of spatial clustering and temporal prediction.
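The projection of faeces results onto village clusters amounts to pooling sample outcomes by cluster and computing a positivity rate. A minimal sketch of that arithmetic follows; the village names, cluster labels and sample records are hypothetical, and the function name is an assumption rather than anything defined in the study.

```python
from collections import defaultdict


def positivity_by_cluster(samples, village_cluster):
    """Pool faeces sample results by village cluster.

    samples: iterable of (village, is_positive) pairs.
    village_cluster: dict mapping village -> cluster label.
    Returns {cluster: (positives, total, rate)}.
    """
    counts = defaultdict(lambda: [0, 0])  # cluster -> [positives, total]
    for village, is_positive in samples:
        cluster = village_cluster[village]
        counts[cluster][0] += int(is_positive)
        counts[cluster][1] += 1
    return {c: (p, n, p / n) for c, (p, n) in counts.items()}


if __name__ == "__main__":
    # Hypothetical records, not survey data from the study.
    clusters = {"A": "I", "B": "II", "C": "I"}
    records = [("A", True), ("A", False), ("B", False), ("C", True)]
    print(positivity_by_cluster(records, clusters))
```

Rates pooled this way can then be compared against the endemicity level that the clustering assigns to each group of villages.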
Surveyed rates of wild faeces infection in each selected sample village
The prevalence of schistosomiasis in China has been identified as a high-priority public health concern. Decades of effort have led to remarkable progress in the control and prevention of this disease [9, 24]. However, in the current stage of low infection rates, the potential risk of direct infection still exists in certain regions, especially in the marshland and lake regions. The main source of infection is livestock, whose infection follows the same distribution as that found in humans, and human infection can rebound as the number of infected cattle increases. In addition, snail host populations have increased significantly in the marshland and lake regions of the endemic areas. It is therefore important to identify the types of risk in different regions, so as to improve the capacity for schistosomiasis surveillance and control. Hubei Province is known as a hotspot of schistosomiasis, which can be attributed to the varying geographic landscapes of the region and the interplay among human, cattle and snail populations, which are important components of the schistosomiasis surveillance framework. The selected study site, Jiangling County, located in the middle reaches of the Yangtze River in Hubei Province, is one of the typical marshland and lake endemic areas of schistosomiasis [24, 27]. The National Schistosomiasis Surveillance Programme has been carried out in Jiangling County for several decades, and thus provides a data foundation of temporal and spatial records of infection cases that facilitates understanding of the operational situation in a low-prevalence endemic area.
In this study, two data mining methods, namely village clustering and prevalence prediction, were proposed to investigate the hidden patterns of schistosomiasis prevalence in such a marshland and lake endemic county. Existing spatial analysis methods, such as spatial autocorrelation analysis or hotspot analysis, can identify the geographical attributes of disease occurrence in a given year, but they cannot determine how disease occurrence differs between years as a result of transmission. This could be achieved by modifying the K-means algorithm to identify village clusters from the historical records of schistosomiasis prevalence and the geographical location of each village. The results provide a solution that divides these villages into different categories with reference to their temporal and spatial patterns of schistosomiasis prevalence.
In general, schistosomiasis prevalence in the study area declined year by year, but the trend differed when each village cluster was investigated separately: the villages in Clusters I and II showed relatively more severe endemicity than those in the other clusters. These analytical results agreed well with the real-world observations of the national schistosomiasis epidemic sampling survey [29, 30]. Based on the identified village clusters, regression methods were further used to explore the correlations between the prevalence of schistosomiasis in human, cattle and snail populations and its associated impact factors. As an extension of the conventional regression method, a predictive matrix was applied to take into account the complicated interplay between humans, cattle and snails. In this way, schistosomiasis prevalence could be interpreted in terms of the previous year's occurrences in the three populations, after the weight of each impact factor was adjusted by the socio-economic factors, including the areas of water with and without infected snails, the number of cattle herds, and the geographical area of each village.
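The idea of a one-step predictive matrix can be illustrated with a simplified linear model in which next year's (human, cattle, snail) prevalence is approximated as a matrix $A$ applied to this year's values, with $A$ fitted by ordinary least squares over the villages of one cluster. This is only a sketch under that assumption: it omits the socio-economic weighting described above, and the function names and numbers are hypothetical rather than taken from the study.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a @ x = b (a is n x n, b is n x m) by Gauss-Jordan
    elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system; observations too similar")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_predictive_matrix(prev_year, next_year):
    """Least-squares A with next ≈ A @ prev for every village.

    prev_year, next_year: lists of [human, cattle, snail] rows,
    one row per village, in the same village order.
    """
    # In row form Y ≈ X B with B = A^T; solve (X^T X) B = X^T Y.
    xt = transpose(prev_year)
    b = solve(matmul(xt, prev_year), matmul(xt, next_year))
    return transpose(b)


def predict(a, state):
    """Apply the fitted matrix to one [human, cattle, snail] vector."""
    return [sum(w * s for w, s in zip(row, state)) for row in a]


def mean_abs_error(a, prev_year, next_year):
    """Average absolute gap between predicted and observed values."""
    errors = [abs(p - o)
              for x, y in zip(prev_year, next_year)
              for p, o in zip(predict(a, x), y)]
    return sum(errors) / len(errors)
```

Fitting one matrix per village cluster and comparing `mean_abs_error` on held-out years is one simple way to check whether such a model tracks the observed prevalence.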
The predictive matrix generated for each village cluster can be used to characterize the differences in schistosomiasis prevalence between regions, in which the effect of each impact factor differs. These results agree with, and also provide a solid foundation for, integrated schistosomiasis control according to the specific situation in each endemic region. The reliability of the predictive matrix method was validated both by a computational approach and by real-world survey-based validation: when the simulated and the real schistosomiasis prevalence were compared, the prediction errors were within an acceptable level. Furthermore, the results of the animal faeces investigations agreed with the potential schistosomiasis risks found for each village cluster.
Owing to continuous efforts in control and prevention, schistosomiasis prevalence in Jiangling County is at a relatively low level. In this study, two research questions were investigated: (1) how to differentiate the 198 endemic villages in Jiangling County with reference to their temporal and spatial patterns of schistosomiasis prevalence; and (2) how to interpret and explain the identified differences. To answer these two questions, the methods of village clustering and prevalence prediction were proposed and applied to the dataset collected from the annual schistosomiasis surveillance in Jiangling County from 2009 to 2013. The results of the spatial clustering analysis show that these endemic villages can be categorized into four types of village cluster with reference to the temporal and spatial patterns of schistosomiasis prevalence. For each identified village cluster, the generated prediction matrix can be used to estimate the next year's schistosomiasis prevalence based on the current level of infection as well as the associated impact factors. The village clusters and the prediction matrix can be regarded as a basis for conducting targeted measures for schistosomiasis surveillance and control. Furthermore, the proposed models and methods can be modified to investigate schistosomiasis prevalence in other regions, and even be used for other types of parasitic diseases.
We would like to thank Jiangling Institute of Schistosomiasis Control and Prevention for their excellent comments and suggestions.
This work was supported by the National Natural Science Foundation of China (No. 81101280), the National Special Science and Technology Project for Major Infectious Diseases of China (Grant Nos. 2012ZX10004-220, 2016ZX10004222-004), the China UK Global Health Support Programme (GHSP-CS-OP101), the Fourth Round of Three-Year Public Health Action Plan of Shanghai, China (No. 15GWZK0101, GWIV-29), the High Resolution Remote Sensing Monitoring Project (No. 10-Y30B11-9001-14/16), and the open project from the Key Laboratory of Parasite and Vector Biology, Ministry of Health. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Availability of data and materials
The sharing of data on schistosomiasis prevalence needs to be approved by both the National Institute of Parasitic Diseases, China CDC, and the Jiangling Institute of Schistosomiasis Control and Prevention; we will not share the original dataset without official permission. We are willing to share the statistical results of this study. Anyone who needs these data may contact the corresponding author for a soft copy.
SL, SX and JX conceived and designed the framework of this paper. XNZ, XZ, HH and YZ contributed reagents, materials, and analysis tools. SZL, SX, JBX and XZ wrote the paper. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
1. Chen MG, Mott KE. Progress in assessment of morbidity due to Schistosoma japonicum infection. Trop Dis Bull. 1988;85:1–45.
2. Zhou XN, Wang TP, Wang LY, Guo JG, Yu Q, Xu J, Wang RB, Chen Z, Jia TW. The current status of schistosomiasis epidemics in China. Chinese Journal of Epidemiology. 2004;25(7):555–8 (in Chinese).
3. Utzinger J, Zhou XN, Chen MG, Bergquist R. Conquering schistosomiasis in China: the long march. Acta Trop. 2005;96(2-3):69–96.
4. Hao Y, Zheng H, Zhu R, Guo JG, Wu XH, Wang LY, Chen Z, Zhou XN. Schistosomiasis status in People’s Republic of China in 2008. Chinese Journal of Schistosomiasis Control. 2009;23(6):451–6 (in Chinese).
5. Lei ZL, Zhang LJ, Xu ZM, Dang H, Xu J, Lv S, Cao CL, Li SZ, Zhou XN. Endemic status of schistosomiasis in People’s Republic of China in 2014. Zhongguo Xue Xi Chong Bing Fang Zhi Za Zhi (Chinese Journal of Schistosomiasis Control). 2015;27(6):563–9 (in Chinese).
6. Zhou XN, Guo JG, Wu XH, Jiang QW, Zheng J, Dang H, Wang XH, Xu J, Zhu HQ, Wu GL. Epidemiology of schistosomiasis in the People’s Republic of China, 2004. Emerg Infect Dis. 2007;13(10):1470–6.
7. Zhu R, Lin DD, Wu XH, Wang QZ, Lv SB, Yang GJ, Han YQ, Xiao Y, Zhang Y, Chen W. Retrospective investigation on national endemic situation of schistosomiasis. II. Analysis of changes of endemic situation in transmission-controlled counties. Zhongguo Xue Xi Chong Bing Fang Zhi Za Zhi. 2011;23(2):114–20 (in Chinese).
8. Xu J, Lin DD, Wu XH, Zhu R, Wang QZ, Lv SB, Yang GJ, Han YQ, Xiao Y, Zhang Y. Retrospective investigation on national endemic situation of schistosomiasis. III. Changes of endemic situation in endemic rebounded counties after transmission of schistosomiasis under control or interruption. Chinese Journal of Schistosomiasis Control. 2011;23(4):350–7 (in Chinese).
9. Li SZ, Zheng H, Abe EM, Yang K, Bergquist R, Qian YJ, Zhang LJ, Xu ZM, Xu J, Guo JG. Reduction patterns of acute schistosomiasis in the People’s Republic of China. PLoS Neglected Tropical Diseases. 2014;8(5):141–50.
10. Lin DD, Lv SB, Gu XN, Ying H, Zeng JF, Zu ZF, Chen HG. Retrospective investigation on changes of endemic situation before and after reaching criteria of schistosomiasis transmission controlled or interrupted in hilly endemic areas of Jiangxi province. Chinese Journal of Schistosomiasis Control. 2013;25(5):462–6 (in Chinese).
11. Zhang LJ, Li SZ, Wen LY, Lin DD, Abe EM, Zhu R, Du Y, Lv S, Xu J, Webster BL. Chapter Five – The Establishment and Function of Schistosomiasis Surveillance System Towards Elimination in The People’s Republic of China. Adv Parasitol. 2016;92:117.
12. You-jie H. The characteristics of infectant wild feces in schistosomiasis epidemic area of benland. Practical Parasitology. 1998;2:86.
13. Zhang Y, Feng XG, Xiong MT, Sun JY, Song J. Investigation of wild feces pollution in schistosomiasis endemic areas in Yunnan Province. Zhongguo Xue Xi Chong Bing Fang Zhi Za Zhi. 2014;26(4):428–30 (in Chinese).
14. Zhao Jia LJ-s, Wang S-w, et al. The investigation and analysis of infectant wild feces by schistosomiasis in snail farming area in Dali. Parasitoses Infect Dis. 2013;11(3):155–7.
15. Shi L, Li W, Wu F, Zhang JF, Yang K, Zhou XN. Chapter Four – Epidemiological Features and Control Progress of Schistosomiasis in Waterway-Network Region in The People’s Republic of China. Adv Parasitol. 2016;92:97.
16. Jain AK. Data clustering: 50 years beyond K-means. Pattern Recogn Lett. 2010;31(8):651–66.
17. Steinley D. K-means clustering: a half-century synthesis. Br J Math Stat Psychol. 2006;59(1):1–34.
18. Lloyd S. Least squares quantization in PCM. IEEE Trans Inf Theory. 1982;28(2):129–37.
19. Diekmann O. On the definition and the computation of the basic reproduction ratio R0 in models for infectious diseases in heterogeneous populations. J Math Biol. 1990;28(4):365–82.
20. Heesterbeek JA. A brief history of R0 and a recipe for its calculation. Acta Biotheor. 2002;50(3):189–204.
21. Van den Driessche P, Watmough J. Reproduction numbers and sub-threshold endemic equilibria for compartmental models of disease transmission. Math Biosci. 2002;180(1-2):29–48.
22. Diekmann O, Heesterbeek JAP, Roberts MG. The construction of next-generation matrices for compartmental epidemic models. J R Soc Interface. 2009;7(47):873–85.
23. Murphy KP. Machine learning: a probabilistic perspective. MIT Press; 2012.
24. Li SZ, Qian YJ, Yang K, Qiang W, Zhang HM, Liu J, Chen MH, Huang XB, Xu YL, Bergquist R. Successful outcome of an integrated strategy for the reduction of schistosomiasis transmission in an endemically complex area. Geospat Health. 2012;6(2):215–20.
25. Xu X, Yang X, Dai Y, Yu G, Chen L, Su Z. Impact of environmental change and schistosomiasis transmission in the middle reaches of the Yangtze River following the Three Gorges construction project. 1999.
26. Lei ZL, Zhou XN. [Eradication of schistosomiasis: a new target and a new task for the National Schistosomiasis Control Programme in the People’s Republic of China]. Chinese Journal of Schistosomiasis Control. 2015;27(1):1–4 (in Chinese).
27. Wang LD, Chen HG, Guo JG, Zeng XJ, Hong XL, Xiong JJ, Wu XH, Wang XH, Wang LY, Xia G. A strategy to control transmission of Schistosoma japonicum in China. N Engl J Med. 2009;360(2):121–8.
28. Wang JT, Chen C, Wang E, Kawazoe Y. Approaches to study disease clustering in space. Disease Surveillance. 2010;4(10):4339.
29. Zhang X, Gao FH, Zhang HM, Zhu H, Yu Q, Li SZ, Cao CL. Spatial-time cluster analysis of distribution of schistosomiasis in Jiangling County. Chinese Journal of Schistosomiasis Control. 2014;26(4):367–9, 381 (in Chinese).
30. Xue Jingbo XS, Zhang Xia, et al. Pattern analysis of tempo-spatial distribution of schistosomiasis in marshland epidemic areas in stage of transmission control. Chinese Journal of Schistosomiasis Control. 2016;28(6):1–7 (in Chinese).