A potential transition from a concentrated to a generalized HIV epidemic: the case of Madagascar

Background HIV expansion is controlled by a range of interrelated factors, including the natural history of HIV infection and socio-economical and structural factors. However, how they dynamically interact in particular contexts to drive a transition from concentrated HIV epidemics in vulnerable groups to generalized epidemics is poorly understood. We aim to explore these mechanisms, using Madagascar as a case-study. Methods We developed a compartmental dynamic model using available data from Madagascar, a country with a contrasting concentrated epidemic, to explore the interaction between these factors with special consideration of commercial and transactional sex as HIV-infection drivers. Results The model predicts sigmoidal-like prevalence curves with turning points within years 2020–2022, and prevalence reaching stabilization by 2033 within 9 to 24% in the studied (10 out of 11) cities, similar to high-prevalence regions in Southern Africa. The late/slow introduction of HIV and circumcision, a widespread traditional practice in Madagascar, could have slowed down HIV propagation, but, given the key interplay between risky behaviors associated to young women and acute infections prevalence, mediated by transactional sex, the protective effect of circumcision is currently insufficient to contain the expansion of the disease in Madagascar. Conclusions These results suggest that Madagascar may be experiencing a silent transition from a concentrated to a generalized HIV epidemic. This case-study model could help to understand how this HIV epidemic transition occurs. Graphical abstract Supplementary Information The online version contains supplementary material available at 10.1186/s40249-023-01164-2.


A. Model description
A. The Susceptible-Infectious-Chronic-AIDS phase (SICA) Transmission Model.We represent the temporal dynamics of disease spread by a set of ordinary differential equations, where, as in any other sexually transmitted infection, disease progression (from the infected to the acute or AIDS phase) depends on sexual contacts between males and females (2).The female-male coupled system is organized into the male and the female submodels.These separate submodels are coupled through the force of infection or transmission rate, that is, the per capita rate at which male (or female) susceptible individuals acquire the infection through sexual contacts from female (or male) infectious individuals.This transmission rate takes into account the number of sexual contacts per year of an average male (or female) individuals, βY (or βX ), the distinct probability of transmission from infectious females to males (or from infectious males to females) (3), pY X (or pXY ), and the effective population fractions of infectious females (or males), x (or y) (4).Notice also that basal female contact rate, βX , is corrected by a factor η > 1 for sexual workers to indicate the higher contact rates characterizing this activity.
The male subsystem reads: where the total effective fraction of infectious females is a weighted average of the effective infectious fractions including young, x0, and adult women, x1: where f0 is the fraction of total sexual encounters a male has with young females.Effective infectious fractions of females should account for differential transmission of the disease from females either in the highly infectious group or the chronic group: where fW if the fraction of male sexual contacts with women who are sexual workers, and N f is the total female population.
The female subsystem accounts for the progression of the disease in the four female groups, which all follow the same four-equation scheme.Equations for young females that are not sex workers are given by: The four basic equations for old/adult non-sex-working females are: S − δ X (1) S − δ X (1) C − δ X (1) A + α X (0) A − σ 1 X (1) Likewise, the four equations for young, female sex workers read: Finally, the equations for old/adult female sex workers are represented by: S − δ W (1) S − δ W (1) C − δ W (1) As you see in Eqs (SA5)-(SA8), the force of infection of females includes a factor, the total effective fraction of infectious males, y, which is given by: where Nm is the total male population.Total populations, Nm and N f , also change dynamically.They are written in terms of sums over the different disease stages and groups: where Xa = X (0) a + X (1)  a + W (0) a + W (1)   a a ∈ {S, I, C, A} [SA11] Total effective infectious fractions, x an y, are the link between female and male disease dynamics.These dynamically changing variables are not true fractions as they are not constrained between 0 and 1 because they include the factor χ > 1, which measures the increase in transmission through sexual encounters with infectious individuals in the acute phase compared to the chronic phase (they would be only proper fractions if χ were 1).In any case, these normalized variables, x (or y), times pXY (or pY X ) determine the probability rate of acquiring the infection per sexual encounter.
The full list of model paramaters for the SICA model is given in Table SA1 along the references used to establish plausible parameter ranges.

of 38 David Alonso and Xavier Vallès
Further simplification is reached if we sum up over female equations.In that case, female and male adult populations are just tracked by the simple system: The stationary state, which is also the disease-free equilibrium, is given by: The expanded demographic model.The system given by Eqs.(SA12) involves decoupled dynamics of females and males.The first Eq. in (SA12) has been already solved for equilibrium.The last four Eqs in (SA12) describe adult female dynamics and are linear ODEs.Therefore, the following stationary equilibrium can be easily calculated.We write here the proportion of females in each of the groups.We use lower-case letters to note fractions or proportions with respect to total female population at equilibrium (FX /δX ).First, the young female proportions at equilibrium are: By looking at the equations for the temporal evolution of older females, we see that their equilibrium fractions can be easily written in terms of x (0)⋆ and w (0)⋆ : 3. The full disease-transmission model.The full system given by Eqs (SA1)-(SA8) is not linear in the dynamical variables.
However, given a constant infectious effective fraction for females x , male equations are very simple to solve (see Eqs. (SA1): Notice that the stationary state calculated above depends on x.Likewise, given a constant infectious effective fraction for males y (see Eqs. (SA5)-(SA8)), the 16 different female population types have an equilibrium that depends on y.This steady state can be readily calculated by equating Eqs.(SA5)-(SA8) to zero, which gives rise to a linear system of 16 equations and 16 unknowns (see code in the paper github site for a full specification of this solution).We can thus calculate both the expected effective fraction of infectious males y for a given constant effective fraction of infectious females x, and the expected effective fraction of infectious females x for a given constant effective fraction of infectious males y.The intersection of these two curves, x and y, defines the stationary effective fractions (x ⋆ , y ⋆ ) (Fig SA1 ), which then can be used to identify the complete stationary state through Eqs.(SA17) and the analogous equations for females (12).As expected, see Fig SA2, the temporal evolution of total prevalence tends to an asymptotic stationary state (horizontal broken orange line).Under the assumption of constant recruitment of new susceptible individuals into the adult populations, sex-age structured compartmental mathematical models for sexual-transmitted infections with constant disease-induced mortality can be shown to be well-poised and feasible (13).

B. R0 through the next-generation matrix
Diekmann et al. (14) introduced a general method to calculate the basic reproduction number R0 for compartmental models of disease transmission, i.e. epidemic models defined as a system of ordinary differential equations (ODEs).The method starts by identifying the so-called infection subsystem, i.e. the system of equations affecting only infected individuals.The linearization of this subsystem around the disease-free state gives us information about the growth rate of the infected population when the disease starts with a few infected people in an otherwise healthy population.These authors showed that R0 is the leading

Fig. SA2
. Temporal evolution of model HIV total prevalence when using constant parameters.The two vertical broken black lines define the three different phases as explained in the main text.The vertical broken red line defines the turning point.Since parameters are kept constant, the system reaches an asymptotic stationary state (horizontal broken orange line).The initial condition corresponds to a population with 10 infected sexual workers in a total population of sexual workers of 1000 within a total adult population (men and women) of 100000.
eigenvalue of the so-called "next generation matrix" (NGM), which can be built from the linearized infection subsystem.Here we outline the derivation of NGM and the calculation of R0 as the leading eigenvalue of this matrix.
Using the main ODE system Eqs.(SA1)-(SA8), we can extract the equations affecting only infected individuals.We obtain two equation for males: where x is the female infectious effective fraction (see Eq (SA2)).We also obtain 8 equations for the different types of female infected individuals: where y is the male infectious effective fraction (see Eq (SA9)).
We notice now that, in the absence of disease, the full system collapses into the simple demographic model described by (Eqs SA12).The disease-free equilibrium therefore can be simply characterized by: The linearization of the infection subsystem around this disease-free state can be represented with the help of two matrices, the transmission matrix T, and the transition matrix Σ.First, we define the state vector ⃗ x as L , W (1) L

[SB7]
One can check that the system given by Eqs (SB1)-(SB5), around the disease-free state, can be readily written as: where the matrices T and Σ are defined in a proper way.If further notation for compound parameters is adopted: 8 of 38 David Alonso and Xavier Vallès and the average residence times in each of the infectious stages: T X (0) T X (1) T W (0) T W (1) then the matrices T and Σ are written as follows: David Alonso and Xavier Vallès matrices as follows: 213 where E is an auxiliary matrix built from the transmission matrix T, consisting of unit column vectors ei for all i such that ith 214 row of T is not identically zero: 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Interestingly, in our case it is not necessary to explicitly calculate the inverse of the Σ matrix because the elements − Σ −1 ij 217 have a clear epidemiological interpretation: the expected time that individuals currently in state j will spend in state i before 218 dying or transitioning into a non-infectious stage.This interpretation calculate allows a straightforward calculation of the 219 whole −Σ −1 matrix which results into Eq.(SB27).

220
We now calculate the matrix product, KL = T(−Σ −1 ): where the 18 non-null elements kij are patiently and carefully defined in order-see Eqs (SB28)-(SB43): 223 10 of 38 David Alonso and Xavier Vallès David Alonso and Xavier Vallès Finally, after performing the matrix products prescribed in Eq (SB24), it can be shown that the NGM depends only on 8 elements kij and reads: The characteristic equation thus reads: which leads to a final closed expression for R0: We use this equation to estimate the value of R0 for different cities in Madagascar in 2000, when the disease was probably beginning to slowly expand among key population groups embedded in a fully susceptible population (Table SB1 and SB2 SB2 and     SB2).Time-dependent parameters FX , FY , δX and δY were set to their values in 2000.These figures complement Fig. 3 in the main text.The estimated average female-to-male transmission probabilities (pY X ) and the sexual encounter rates (βY ) are represented by a little circle defining the initial coordinates of the arrow.The tip of the arrow represents the potential reduction in R0 caused by a 60% reduction in the transmission probability from infectious females to males as a consequence of circumcision (as reported in ( 15)).

C. Parameter estimation from demographic data
In order to add realism to our projections, we considered real Madagascar demographic data from governmental and institutional sources (US government, the World Health Organization, and Institut National de la Statistique, Antananarivo (16-18)) For each year, between 2000 and 2016, four parameters were first estimated directly from data: annual per capita mortality rates for males and females (δX and δY ), and total number of males (FY ) and females (FX ) reaching sexual maturity every year.Their values are chosen to capture the temporal evolution of urban adult populations in the most important cities of Madagascar (see also Table F1).Since we only had demographic data at the national level, no further differences between cities were considered.Age-dependent per capita average rates were assumed the same across the eleven cities under study.
In this section, we first give further details on the estimation of annual mortality rates, δX and δY , and recruitment rates, FX and FY .Once these four demographic parameters are fully estimated for every city, we use the available information of the distribution of females in the different groups to obtain parameter values for the five rates (σ 0 , σ 0 r , σ 1 , and σ 1 r , and α), which are responsible for the distribution of females into young and adult, non-sexual and sexual worker groups.The few available data that exist show that this distribution was different in every city and changed considerably from 2014 to 2017 (see Table F1).
A. Fecundity and mortality rates.
This is a non-age-structured model tracking the temporal evolution of female and male total adult populations.However, both fertility-which determines recruitment rates FX and FY -and mortality are age-specific processes.Despite this simplified demographic assumption, we can reasonably estimate non-age-specific model rates from age-specific data, and thus capture year-to-year realistic population variability and average trends.
A.1.Average mortality rates.Human mortality is clearly an age-dependent process.However, the SICA model assumes that mortality rates are constant, rather than age-dependent.Namely, it is assumed that survival curves are negative exponentials, which means that individual expected life-span can be represented by the inverse of a mortality rate 1/δ.A reasonable way to take into account year-to-year variability and a slight sex-dependency in adult mortality rates is then using the life-expectancy at the age of entrance into adult life LA.As an estimate of a time-dependent 1/δ parameter, we used compiled life-table data to extract the life expectancy for every year of the study period (2000-2016) of the age group 15-19.Specifically, we consider: where entering a fully active sexual life per unit time.In general, these rates have year-to-year variability.Let aY and aX be the starting age for active sexual life for males and females, respectively.If we knew the age distribution every year, the number of males and females passing from age aX − 1 to aX , and from aY − 1 to aY , respectively, in year t would be directly the empirical time-dependent parameters FX (t) and FY (t) that the SICA model requires.However, such a degree of detail is very difficult to gather in demographic studies.In the absence of this detailed demographic data, FY and FX at time t can instead be estimated from the total number of males and women born in year t − aY and t − aX , respectively, along with sex-and age-specific mortality rates.Fortunately, such data have been regularly compiled and made publicly available by the WHO and other institutions at least on a quinquennial basis for every country in the world.
Let BY (t) and BX (t) be the total number of males and females born in year t.As we show below, these two numbers can be in turn estimated from average per capita total fertility and total populations in a given year t.Since the two cohorts BY (t − aY ) and BX (t − aX ) will suffer from age-and sex-dependent mortality up to year t, the parameters FY (t) and FX (t) David Alonso and Xavier Vallès can be estimated by calculating the number of survivors until year t for those boys and girls born in years t − aY and t − aX , respectively.Therefore, the parameters of interest can be estimated as: where s (a) X (t) is the survival probability during year t of female individuals of age a.For instance, s (0) X (t) is survival probability of new-born females during year t; those who will on average have their one-year birthdays in year t + 1.
Eqs. (SC3) show that the estimation of the recruitment rates FY and FX requires estimating, first, survival probabilities and, second, the number of new-born individuals for a given year.Let us proceed in order.

Survival Probabilities
In typical life tables, age-dependent mortality rates are associated to finite time intervals of a given number of years n.They are defined as: where where δa is an age-specific instantaneous mortality rate assumed constant for all individuals within the same age group, which means, individuals between age a and age a + n − 1 experience the same probability of death.In addition, ⟨N n a (t)⟩ can be approximated by an average population in that age group: By introducing Eqs (SC5) and (SC6) into Eq.(SC4), the age-dependent mortality rate of age group a at year t, can be rewritten as: By inverting the last equation, instantaneous time-dependent mortality rates, δa, corresponding to each age group a and year t can be estimated from the empirical age-dependent mortality rate values, m n a (t), which are typically given in demographic studies.The inversion of Eq (SC7) can be written as: Once instantaneous death rates are estimated for each sex, year, and age group, all survival probabilities appearing in Eq (SC3) can be also estimated.Notice that a survival probability for a one-year time interval is defined as: which, using Eq.(SC5), leads to: where the year-to-year dependency of δa(t) is here explicitly written (see Eq. (SC8)).

Number of Newborns at Year t
All that is now left to estimate are the quantities BY (t) and BX (t), the number of male and female new-borns in the population at year t.Again detailed demographic population monitoring would ideally register these two quantities every year.In the absence of this direct information, we can estimate these values from total fertility rates, total female population, and sex ratio at birth: where f is the ratio of males to females (sex ratio at birth), Φ(t) is an average per capita fertility rate (per year), i.e. the number of children an average female has per unit time, and X(t) is the total number of females in the population at time t.
For instance, Madagascar sex ratio at birth is about 1.02, while adult sex ratio is about 1.0.The average fertility rate Φ(t) can be estimated from the total fertility rate b(t), i.e. the average number of children born to an average female during her whole reproductive life if she follows the birthing pattern being experienced by the overall population at year t.
Assuming that female reproductive age spans from age a (0) X up to a (1) X , both rates can be estimated from total annual births stratified by age and female population age distribution according to the following definitions: where X − a (0) X ).However, the relation between these two rates is not quite evident.Since total fertility b(t) is the one usually provided by typical demographic surveys, we estimate the annual per capita fertility rate Φ(t) from the per capita total fertility b(t) in the following way: where r(t) is a time dependent parameter that could be estimated from data by using the previously given definitions in Eqs (SC12) and (SC13) to empirically fulfill Eq (SC14).For our purposes, we have made the simplifying assumption that the parameter r(t) is roughly constant through time.
In summary, the SICA model parameters, FX and FY , have been estimated from demographic table data by the following equations: where N (t) is the adult population at year t, f is the sex ratio at birth, and r is a constant free parameter.Since we know the sex ratio is balanced in Madagascar, for our estimates we calculated female population X(t), by dividing the total adult population N (t) by two.
In theory, according to Eqs. (SC15)-(SC15), r is a correcting factor to account for the effective female population contributing to local births every year.In practice, r should be regarded as a free model parameter used to scale the population temporal evolution to a realistic average level, and approximate the population trajectory for every city in an optimal fashion.This is achieved through a simple fitting procedure.Heuristically, the r parameter could potentially account for some extra population growth due to immigration from rural areas.
In sum, recruitment rates FX and FY should be interpreted as two functions of time t, accounting for the reproductive output of the population at some previous year t − aX , weighted by the total survival probability until present time -see Eqs (SC15)-(SC15).This introduces a time delay in our dynamical equations.WE have shown that we can get around the complexity of delayed ODEs, and make the simple ODE system given in Eqs.(SC1) best match observed population trajectories.
In order to do that, we have approximated female reproductive population by a fraction of total observed population at any given year.In theory, since sex ratio is balanced, female population is simply 0.5 times total adult population.We would expect this r fraction to be related to the relative importance of non-local births to the local recruitment.More precisely, total female recruitment is a compound quantity including both locally born F (0) X , and immigrated individuals F (i) X per year: Although our fitting method is not devised to estimate these two quantities independently, r values encode information about a non-negligible effect of immigrated women on the local growth of the cities.Our empirical estimates of total FX and FY (with r > 0.5) suggest this effect.In Fig SC1 , we present the output of the demographic model and total population data from Antananarivo for comparison.In this case, we found r = 0.615.http://www.worldometers.info/world-population/madagascar-population/.These data were used for model parameter estimation.

of 38
David Alonso and Xavier Vallès B. The female population distribution.Unfortunately, time series data on the actual female distribution across the four groups considered in every city were not available.We only had the sex worker population (per city) in 2014 and 2017 (see Table F1).
Since the relative distribution across groups is determined by the relation of in-rates to out-rates, both very high and very low rate values (or anything in between) can be adjusted to yield the same fraction of females in each category at stationarity.We therefore had to make an extra, but reasonable, assumption: we assumed that these rates should be rather slow (see adopted range values in Table SA1), which means that sexual habits tend to change at the scale of years rather than weeks or months.
For instance, if a young female decides to increase her sexual contact rate for transactional sex reasons, she will remain in these highly active sexual phase for at least a couple of years.As we explained in the main text, throughout this work we considered the sex-working stage sensu lato, although sex working population surveys are probably not able to uncover transactional sex, which means that our predictions might be quite conservative.Females in this sexually highly active stage are characterized by an up to 10-fold relative increase in their sexual contact rates (see parameter η in Table SA1 and Tables SB2 and SB1).In practice, these higher contacts mainly result from casual transactional sex rather than from professional sex work.
Average ages at which males and females are considered fully sexually active were chosen to be at 15 and 17 years of age, respectively.The aging rate α is not a typical demographic parameter, but rather defines the average time females are considered young by males in terms of assortative sexual contacts or sexual preference.In our projections and parameter searches, this parameter was considered to be between 8 and 12 year −1 (1/α ≈ 0.1, see Table SA1 and Tables SB2 and SB1).
That is, females are young as long as they are about between 15 and 23 to 27 (average 25) years old.There is an undetermined variance around all these parameter values, with no exception, which reflects individual heterogeneity.Although probably important, the consideration of this extra complexity is beyond the scope of the modeling approach presented here.
Parameter estimation relies on the definition of a likelihood function addressing the question: what is the probability of observing a given time series of certain population and epidemiological variables if one assumes that disease transmission dynamics is generated by our model structure, as defined by an ODE system, with a given set of parameter values, this is, a given parametric configuration?Further details on parameter estimation strategies are given in the following section.

D. Model Validation
This process was done in three different phases: the assessment and validation of the simple demographic model (Eqs SA13), the expanded demographic model (Eqs SA12), and finally, the disease transmission model (Eqs (SA1)-( SA8)).We define first a generic likelihood function, and then the three particular likelihood functions employed, and report further details on these three steps in order.
A. The likelihood function.Let X (o) be a matrix of observed data for t = t0 to t = tn.The matrix is built by row vectors, each of them corresponding to a time series of observed variables, from ⃗ X N : Observed variables can be related to model variables via output variables, i.e. variables that are a given mathematical function of system state variables.For instance, total prevalence within the SW population is one of these output variables.
For each observed variable we will thus have a predicted output variable from model numerical integration.In this situation, we can ask: "What is the probability of observing this data matrix under the assumption that our model and its specifying parameters are true?".This is: This likelihood function relies on the error functions, or probability of obtaining certain deviations from predicted values,

Ei(tj) ≡ X (o)
i (tj) − Xi(tj) given our model structure and parameters values: where θ should be regarded as a full model parameter set (a model parametric configuration), θr are parameters controlling the error function, e.g.variances, and ϵ is an additional parameter to accurately calculate probabilities from their corresponding continuous density functions:

David Alonso and Xavier Vallès
Although we have tried several error choices (results not shown), throughout this work, our results rely on Gaussian errors with varying variances, this is: where σ is the error standard deviation which may depend on tj.Parameter distributions were obtained by minimizing the negative loglikelihood (see Eq (SD1)), and filtering those that were only 2 points apart from the optimal one in their negative loglikelihood values.This is a typical specification on likelihood ratio tests.Parametric configurations producing loglikelihoods more than 2 units farther apart from the optimal one make data significantly less likely (0.01 times less likely) than the likelihood of the optimal one.
B. The simple demographic model.We can summarize FX and FY in Eqs (SC15)-(SC15), mathematically, as: where r is a free parameters and D is the full set of demographic data, stored as yearly demographic tables, namely sex ratio f , ages a (0) X and a (1) X , male and female life expectancy, age-dependent mortality rates, total fertility rates, and observed total populations for each year t.Although we write these functions for convenience here with an explicit time-dependence, this is implicit in the year-to-year variability of the different parameters, appearing in Eqs.(SC15)-(SC15), and directly estimated from annual demographic tables.
For each city, we know the real population trajectory No(t), from t = t0 to t1.Note the subscript o stands for observed.
Through simple numerical integration of the ODE system (SC1), we can generate a model prediction for that trajectory and a given value of r and N (t, r) (from t0 to t1), where N (o) (t0) is taken as the initial condition.The best estimate of the free parameter r is then determined using a Gaussian error function, in particular as a measure of the difference between empirical and model predicted population values.Assuming independent errors, and a constant variance, a likelihood function describes the probability of observing certain reported time series D = (N (o) (t0), . . ., No(t1)), on total population increase in Madagascar: where θe are the parameters controlling the distribution of errors, for instance, the variance for Gaussian errors.In addition, an extra parameter ϵ was used (with value of 2.0) to be able to deal with continuous random variables, governed by continuous density functions.Each factor, i.e. the probability of observing N (o) (t) individuals at time t, should be properly written as As usual, the optimization procedure minimizes the negative loglikelihood function.An error function with non-timedependent variance implies the minimization of the sum of squared deviations from deterministic predicted values obtained through the numerical integration of the dynamical system.We have instead presented the full likelihood function here because we made use of other error functions and explored an assumption about the error variance.In particular, we made the assumption that the error variance of our population measures is proportional to the expected value N (t, r).In any case, our resulting projections were robust to different assumptions on the error function of choice.
In our analysis, we included reliable estimates of urban populations for every city in 2013, data on the temporal evolution of total urban population in Madagascar, and life table data.The sum of the populations of the 11 cities represented only a fraction of the estimated total urban population.We assumed that this fraction remained roughly constant over the whole period.We then calculated the fraction of the population in 2013 for each and every of the 11 cities.We assumed that this share among the cities remained also constant over the whole period.Finally, we had reliable estimates of the fraction of children (between 0 and 15 years) in the total Madagascar population, in 1975 (44.4 %), 2017 (39.87 %), and 2018 (39.55 %) (19).We used this data to interpolate adult fractions over the period of interest (2000-2016).These assumptions combined allowed to estimate, yearly and empirically, adult populations for every city from 2000 to 2016.Given the high quality of life table data, these estimates are very reliable.By using the likelihood function above, we compared model predictions for temporal trajectories of the adult population in every city with the obtained empirical estimates of those from demographic data.
In sum, demographic data from Madagascar were accurate enough to successfully estimate model demographic parameters (recruitment FX , FY , and mortality rates, δX and δY ) directly from demographic tables and fecundity data.In addition, we required to fit an extra constant parameter r.This parameter can be regarded as a correction factor on recruitment rates to exactly match observed population increase in every city.It can be also interpreted as a measure of the influence of immigration from rural areas into the city on the number of females and males entering active sexual life per year.If this number is greater than 0.5, there is a positive immigration.If it is lower than 0.5, it means that there is a loss of children that were born in the city but emigrated out from the it before reaching sexual maturity.We obtained an r value of 0.615.These parameter estimates, both the time-dependent FX , FY , δX , δY , and the constant r parameters, produced model trajectories for the temporal evolution of the adult population for each of the cities in extremely good agreement with empirical data (see Fig SD1 and SE3).C. The expanded demographic model.In a second step, we used the data of the sex-working population per city (see data from Table F1) to obtain estimates of the parameters that control the temporal evolution of the distribution of females in the four classes considered (sex, non-sex worker, and young and old groups).Note that we only had data of the sex-working population for each city in 2014 and 2017 (see Table F1).These two values showed large error bars and, for some cities, were not fully consistent with each other.This may be a consequence of the inherent difficulties associated to reliably censusing these groups.
We should admit that these limitations introduce uncertainty in our model predictions.
However, in spite of the scarcity of temporal data points, in order to find plausible parameter combinations, model parameters were searched to fit a trajectory for the number of sex workers that could have been observed from 2000 to 2016 (see Fig SD1 ,     panel B).These pseudo-data were generated under two reasonable hypotheses, namely, the constant-fraction and the sigmoidal hypotheses.The first one assumes that the different values observed in 2014 and 2017 represent natural fluctuations of the fractions of sex workers with respect to the total women adult population around an average level.This hypothesis assumes this average fraction is constant over the whole period of interest (2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015)(2016).The second hypothesis takes into consideration Madagascar economic crises (2009-2013) and makes the assumption that the fraction of sex workers within the adult female population could have increased.We modeled this growth as a sigmoidal curve.This involves assuming lower fractions of sex workers in 2000, and fractions of SW approximately matching the truly observed number by the end of the period.This requirement fixes one of the the three parameter of the sigmoidal curves, which was chosen to be A0 (see Eqs. (SD5)-(SD6)).
Therefore, in order to fully specify this hypothesis, two extra quantitative assumptions are still required: the smoothness of the sigmoidal jump (L0) and the threshold value (T0) in Eqs.(SD5)-(SD6)).We characterized our curves by a smoothness parameter of about 0.1 (in a range from 0.01 to 1.0), and a threshold value around 2011 (in a range from 2009 to 2013).We explore a total of 10 parametric combinations (see Table SD1).
In sum, both hypotheses assumed lower absolute numbers of SW at the beginning of the period, which would have increased either only due to population expansion (constant-fraction hypothesis), or in addition, sigmoidally as a consequence of the 2009-2013 economic crisis (sigmoidal hypothesis).Therefore, in our parameter searches, we had then three observed variables in every city: • Total adult male population • Total adult female population • Total sex worker population (as created according to either the constant-fraction or the sigmoidal hypothesis).
By specifying the likelihood function (see (Eq SD1,) we obtained parametric configurations for each of the hypotheses.
While under the constant-fraction hypothesis, σ0, σ1, σ r 0 and σ r 1 were all considered constant in time, under the sigmoidal hypothesis, we consistently let σ0, and σ1 parameters evolve in time following also a sigmoidal curve (see Fig SD1 , panel D), but we still kept σ r 0 and σ r 1 constant.Sigmoidal curves are flexible enough to represent a sudden increase in a quantity after a certain threshold is crossed.In our case, before the threshold year, sex workers were assumed to be in lower fractions.After that year, more and more women might have used transactional sex to cope with harsh economic conditions.In particular, we consider the following sigmoidal curves: In any case, α and the four σ parameters, and the three governing the two sigmoidals (see Eqs. (SD5)-(SD6) above), under the second hypothesis, were all estimated to fit plausible time series for the number of sex workers within the adult female population and the total adult female and male city populations observed during the studied period (2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015)(2016).We did this for the 11 eleven cities in Table SF1.In  D. The full disease-transmission model.Finally, we use data of 2016 on HIV prevalence across cities within the sex-working (SW) population (see Table SF1) to search for parameter combinations able to produce trajectories compatible with the data at hand.Predicted trajectories result from numerical integration of the full ODE system.We left out the observed data on HIV prevalence of previous years.Although prevalence data (within the sexual working, SW, population) for some cities does exist for 2005, 2007, and 2010, and 2012 (see Table SF2), years before 2016 all reveal very low levels (less than 2.5%).It is not until 2016 that disease prevalence remarkably increases in most cities reaching values over 10% in some of them.Therefore, we left out previous years for visual validation purposes (see Fig SE4 , SE5, and SE6).
Here we assumed that HIV has been slowly expanding for the last decades in Madagascar until reaching observed levels in 2016.This is in full agreement with public health information and sparse, but valuable data from HIV-AIDS in Madagascar (20).The initial phase of an infectious disease is always exponential in a very good approximation.This is a common feature of the initial expansion of any infectious disease (2).Therefore, this hypothesis involves an exponential expansion of disease prevalence from very low values around 2000 to the observed 2016 levels.Unlike the previous constant-fraction vs sigmoidal hypotheses, this exponential expansion can be better regarded as an empirical fact than as a true hypothesis given that all information about AIDS in Madagascar points to a slow but steady expansion of the disease in the island for the last decades.
In any case, the full characterization of this expansion involves an extra quantitative assumption: we need to prescribe SW disease prevalence in 2000.We considered only 1% of that observed in 2016.Given prevalence values and population numbers for SW, this percentage implies that only a few SW individuals were already infected (lower than 10 individuals) at that initial year.Then disease prevalence within SWs would have increased exponentially from that initial low level, at the beginning of the studied period, up to 2016 levels.
We could have let the initial condition be defined by parameters to be searched as well.However, the beginning of the studied period is supposed to be quite well defined by very low levels of disease incidence in key exposed groups (SW) and neglishable levels within the general population (20).Just in case, we explored different prevalence levels at the beginning of the studied period (1%, 5%, 25%, and 50% of the prevalence levels observed by the end the period), and different introduction  SB1 and SB2.

E. Projected Trajectories
In order to project trajectories, the numerical integration of the full system requires annual time-dependent parameters, namely future recruitment and mortality rates (FX , FY , δX , and δY ) from 2016 to 2033.These values were extrapolated from the same demographic life tables under the assumption that mortality and fertility rates maintain observed trends between 2000 and 2016 (see Fig SE1).
Because we did not directly use raw data to compare with model output, but generated first time series of pseudo-data compatible with the observations at hand, it is very important to clarify in which terms, and under which conditions, the calculation of projected trajectories was conducted.Table SE1 summarizes the hypotheses under which each ensemble of parametric configurations was obtained.In Figs.SE2 and SE3, we show the model projections up to year 2033 calculated (panels b) for a number of parameter combinations that provide a good fit to data for the period 2000-2016 (see panels a).
Similar projections are obtained for other cities (Fig. SE6).Each parameter combination produces a single deterministic trajectory.The ensemble of trajectories is used to calculate percentile values.Upper and lower broken lines represent the 90% and 10% percentile values, respectively.The thicker line in the middle represents 50% percentile values.Data for the period between 2000 and 2016 are represented by circles.They have been generated under the sigmoidal hypothesis (H5, see Table SD1) to be compatible with the demographic expansion in Madagascar in this period.True data are highlighted with error bars (orange) representing confidence intervals (see years 2014 and 2017 in middle panels in both figures).
We also explored the impact of circumcision practices in Madagascar since it is known that this practice, strongly embedded in Madagascar culture, reduces the probability for males of acquiring the infection from infectious females.This was done by constraining parameter searches within the possible parameter range for pY X , which controls this infection probability once a sexual contact has occurred."Low pY X " means that searches were conducted in the range [0, 4 10 −4 ], while "high pY X " involve that searches were conducted in the range [0, 10 −3 ] for this particular parameter (see Table SE1 and Fig SE4).When we compare results, under the same overall hypothesis (H5), for "high pY X " vs "low pY X ", we observed that the effect of circumcision on the transmission probability from infectious women to healthy men (pY X ) is not very important.For the rest of model parameters, parametric configurations were searched within boundaries, as indicated in Table SA1.Model projections Table SE1.A summary of the explored hypotheses under which pseudo-data were generated and parameter searches (between 2000 and 2016) were then conducted.In grey, we show the combination that produced the greatest consistency levels between model predictions, data at hand, and the hypotheses.Although different hypothesis led to similar conclusions, the best likelihoods were obtained under the sigmoidal hypothesis, for high pY X , and fraction of sex workers (f = W/X) similar to those in 2017 (combination represented by the shaded cell in Table SE1).It means that this combination of hypotheses maximized the consistency between available data at hand and model specification.Results in the main text correspond precisely to these hypotheses (Sigmoidal H5-H6, see Table SD1).

Constant Fraction Sigmoidal
Parameter searches were conducted by minimizing the negative loglikelihood (see Eq (SD1)).Up to 10 8 different parameter combinations were sampled at random within ranges given by Table SA1.Each initial random combination was then used to seed a simplex algorithm (21) constrained within the prescribed parameter ranges (see Table SA1), which led to different optimal parametric configurations.The ensemble of optimal configurations obtained in this way was further filtered to keep only those that were only 2 points apart from the best one in their negative loglikelihood values.
Some parameter distributions within this filtered ensemble of optimal configurations were quite constrained by the information at hand, while other were less.Optimal distributions appeared to be similar across the different hypothesis and led to the

F. Data Sources and Code
Here we show compiled disease and population data (see Tab. SF1) for populations sizes, total sexual worker population, and prevalence of the disease within this group in the main cities of Madagascar after the Institut National de la Statistique, Antananarivo.Other data were compiled from several public sources (16)(17)(18)22).C Code to perform pseudo-data generation under the different hypotheses, run model numerical integrations, conduct parameter estimation from demographic tables, and all theses simplex-based (23), parallel random parameter searches is publicly available in this github site (1).SF2).The five lines represent 5%, 25%, 50%, 75%, and 95% percentiles from the lowest to the highest values, respectively.
, see also github site for details on code implementation).In Fig SB1 and Fig SB2, we explore the parameter space in two dimensions (βY , pY X ).The other parameters values are constant and correspond to the different cities (see Tables

Fig. SB1 .
Fig. SB1.Here we show how R0 (see Eq. SB46) changes as a function of β Y and p Y X .Isoclines of R0 = 1.0 (thick broken lines), R0 = 2.5, R0 = 5.0, and R0 = 10 are also represented.The rest of model parameters, but β Y and p Y X , are kept constant.

Fig. SB2 .
Fig. SB2.Here we show how R0 (see Eq. SB46) changes as a function of β Y and p Y X .Isoclines of R0 = 1.0 (thick broken lines), R0 = 2.5, R0 = 5.0, and R0 = 10 are also represented.The rest of model parameters, but β Y and p Y X , are kept constant.

Fig. SC1 .
Fig. SC1.Demographic trends in Antananarivo.The numerical integration of the demographic model in Eqs (SC1), taking the population in 2000 as the initial condition, and time-dependent parameters F Y , F X , δ X , δ Y , all estimated from data from 2000 to 2016, produces the solid green line.Squares represent population data from source: Fig SD1 we show only results for the sigmoidal hypothesis in Antsiranana.In bottom right panel, the sigmoidal time dependence in one of the σ parameters is shown across the parameter distribution that best fitted the data .David Alonso and Xavier Vallès

Fig. SD1 .
Fig. SD1.Demographic trends in Antsiranana.The numerical integration of the demographic model in Eqs (SC1), taking the population in 2000 as the initial condition, and the four time-dependent parameters F Y , F X , δ X , δ Y , and a extra time-constant parameter r estimated from data from 2000 to 2016, predicts the solid green line representing the temporal evolution of total adult population (Panel A).Circles represent yearly interpolated values from demographic empirical data.Population data from source: http://www.worldometers.info/world-population/madagascar-population/.The temporal evolution of the total number and the number of infected individuals of the sexual working female population is plotted in panels B and C, respectively, for a number of parameter combinations each of them predicting a similar time trajectory (different green lines).In panel B, single circles correspond to data generated under the hypothesis of a sex-working population increasing as a sigmoidal curve with a threshold parameter at year 2011, smoothness parameter 0.1, and 2014 SW levels.In panel D, we also show the temporal evolution of σ0 associated to the same ensemble of parameter configurations producing good fits.
years (from 2000 to 2010).Consistently, we obtained always better fits when the initial year was before 2005 (see Fig SD2), and prevalence levels in 2000 were rather low (1% to 5%).Importantly, our approach allows comparing the likelihood of different introduction years from 2000 on.Until approximately 2005, this likelihood is roughly the same, and then continuously drops down until the present (see Fig D2).In sum, these two hypotheses, (1) sigmoidal vs constant-fraction, and (2) exponential expansion, combined provided empirically-based trajectories for, first, the number of sexual workers (SW), and second, the prevalence of the disease within the total SW population.These two series of empirically generated data, along with accurate adult population time series data per city, were then compared with model-generated trajectories to search for disease model parameters able to produce comparable results.Here we only show results for the sigmoidal-exponential hypothesis.Fig SD1(and showsa good agreement between model-generated trajectories (green shading), and the yearly data (in circles), generated according to the plausible assumptions underlying this hypothesis.Average parameter values for these ensemble trajectories are given in Tables

Fig. SD2 .Fig. SE1 .
Fig. SD2.Negative Loglikelihood values per degree of freedom.According to this, the estimation procedure is better if the introduction year happens to be before 2005.We found consistent results for the other cities as well.
obtained from parametric configurations under the two situations resulted very similar.Here, results for Toamasina (SE4) and Antananarivo (SE5) are shown.A slight effect can be seen in disease prevalence in men.Other cities showed a similar pattern.
same qualitative conclusions.In Fig SE7,we compare box plots across a number of hypotheses for the whole set of model parameters.The first subplot corresponds to a dummy parameter multiplying recruitment rates FX and FY , which was added as a consistency check.If recruitment rates were correctly estimated to match the temporal evolution of the adult population in every city, as was explained in previous sections A.2 and B, then the optimization procedure for the full disease transmission model, which includes true parameters plus this dummy factor, should only retain parameter configurations where such a factor is equal to 1, as it was the case.

Fig. SE2 .Fig. SE3 .
Fig. SE2.Model fit and projected trends in Antananarivo.N , adult population; X SW , female sexual worker population; I SW , infected individuals within the sexual worker population.

Table SA1 .
Model Parameters for the SICA model.V 0 and V 1 define

reasonable parameter ranges for each parameter value. Subscripts X and Y stand for female and male, respectively. YF, young female; AF, adult female; SW, sexual worker. All rates are given in year −1 . Mortality and recruitment rates are directly estimated from life table information. Transmission probabilities are given per coital act and represent basal values. The χ factor controls how much these are incremented when a healthy individual has a sexual encounter with an individual in the acute infection
phase (I).The inverse of transition rates between stages (for instance, 1/γ and 1/µ) can

Table SB1 . Average model parameter values, and their corresponding values for R 0 , and their standard deviations. Average values are calculated over an ensemble of parametric configurations that provide the best model consistency with data under the sigmoidal hypothesis for 5 cities in Madagascar (see Supp Mat for details on
R 0 calculation and R 0 values in the other 5 cities) David Alonso and Xavier Vallès

Table SB2 . Average model parameter values, and their corresponding values for R 0 , and their standard deviations.
Here we outline the strategy for δX , δY , FX , and FY parameter estimation over time.FX and FY depend on female overall fertility from previous years, which in turn, depends on the local reproductive population at that point in the past.This would require the use of delayed ODEs models.The SICA model gets around this complexity by estimating FX and FY directly from demographic table data.In this sense, FX , FY , δX , and δY become data-derived parameters with empirical trends and year-to-year variability.By summing up equations for males and females in the full system, it is easy to show that in the absence of disease population dynamics in SICA model depend only on two sex-specific recruitment rates, FX and FY , and two mortality rates, δX and δY .Demography therefore can be simply be represented in a two-equation system: and L X A (t) are the life table life-expectancy values in the 15-19 age group for males and females, respectively, at year t (from 2000 to 2016).The FY and FX parameters of the SICA model represent the number of males and females, respectively, A.2. Recruitment Rates.
is the number of deaths in the age group between age a and age a + n − 1 during year t, and ⟨N n a (t)⟩ is the mid-year population in that age group at year t.Survival probabilities can be estimated from this statistic under the assumption of constant mortality within each age group.The survivors at the end of a year are related to the initial number of individuals at the beginning of that year for each age group through an exponential decay: is the annual number of births from females at specific age j.We can realistically assume that the age, aX , of entrance to active sexual life matches a X .We can express b(t) as a per year rate by dividing it by (a