USC 3001: Complexity
AY 05/06 Semester 2 Project Report
Around the world in eighty flights: Modeling the world-wide airport network
By: Tan Kai Xin, Grace and Tan Li Yuan
Instructor: Dr Rajesh Parwani
Contents
1. Introduction
2. Concept of low degree and high betweenness-centrality
3. Degree-betweenness anomalies and multi-community networks
4. Model A: Based on preferential attachment and geographical distance constraints
5. Validating Model A
6. Model B: Inclusion of geo-political constraints
7. Validating Model B
8. Critical Analysis
9. Conclusion
1. Introduction
The world-wide airport network does more than provide convenience to travelers. Like other critical infrastructures, the air transportation network has an enormous impact on local, national, and international economies. It also plays an indirect role in the propagation of infectious diseases such as influenza and SARS.
The world-wide air transportation network moves millions of people every day. Often, particular states within a continent are designated to handle a high volume of daily flights, sometimes more than they can handle. In dire weather conditions, this results in delays and flight cancellations across the country, leading to large economic losses. These failures and inefficiencies prompt various questions: What has led the system to this point? Why has a more efficient system not been developed? To answer these questions, it is important to characterize the structure and the evolutionary mechanisms of the world-wide air transportation network.
In principle, the structure of the air transportation network is mainly determined by the airline companies, which aim to maximize their immediate profit. However, it is also the result of geographical and political factors.
It was observed that the world-wide air transportation network is a small-world network for which (i) the number of direct connections k to a given city (degree), and (ii) the number of shortest paths b going through a given city (betweenness centrality) have distributions that are scale-free. However, in contrast to the prediction of scale-free network models, it was observed that the most connected cities (largest degree) are not necessarily the most central cities (largest betweenness centrality), both at the world-wide level and in regional airport networks. This was an important finding, as it has been shown that nodes with high betweenness tend to play a more important role than those with high degree in keeping networks connected, and might therefore play a key role in the propagation of diseases.
In "Modeling the world-wide airport network", Guimera and Amaral aimed to identify the mechanism by which central nodes that are not hubs can emerge. They showed that current models, which consider only preferential attachment and geographical distance constraints, cannot reproduce the observed combination of low degree and high betweenness centrality in airport networks, although such models do reproduce the tendency of airports to be connected with other airports that are geographically close. They then went a step further and accounted for the large-betweenness, small-degree occurrence by introducing a new mechanism that encompasses geo-political constraints.
In this paper, we shall introduce the concept of low degree and high betweenness in relation to the airport network.
2. Concept of low degree and high betweenness-centrality
The degree of a node, also known as its connectivity, provides information on its importance; however, it is certainly not the only property that determines the significance of a node in the network.
Fig 1: Node v has low degree, but all the shortest paths from region C1 to C2 have to go through v, implying a very large centrality.
Indeed, node v in Fig 1 has a small connectivity (it links only 2 neighbors). However, the effect of its removal is determined not by its connectivity but by the fact that it links together different parts of the network. A good measure of the centrality of a node thus has to incorporate more global information, such as the role the node plays in the existence of paths between any two given nodes in the network. One is thus naturally led to the definition of the betweenness centrality (BC), which counts the fraction of shortest paths going through a given node. More precisely, the BC of a node v is given by:
$g(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$ (1)
where $\sigma_{st}$ is the total number of shortest paths from node s to node t and $\sigma_{st}(v)$ is the number of shortest paths from s to t going through v.
A pair dependency, describing the relationship between node v and the shortest paths from node s to node t, is defined as:
$\mu_{st}(v) = \frac{\sigma_{st}(v)}{\sigma_{st}}$
Any two nodes that reside within the same region (e.g. s and t both from region C1) will have a zero value of $\mu_{st}(v)$, because the nodes that form the shortest path between them do not include node v, so $\sigma_{st}(v)$ is 0. Therefore, only pairs in which s and t reside in two separate regions need to be considered.
Consider the nodes v1, v and v2 in Fig 1.
Number of shortest paths from node v1 to node v2: $\sigma_{st} = 1$ (v1 → v → v2)
Number of shortest paths from node v1 to node v2 through v: $\sigma_{st}(v) = 1$ (v1 → v → v2)
Hence $\mu_{st}(v) = 1$. A pair dependency of 1 implies that all the shortest paths linking the 2 nodes have to pass through v, which shows the importance of node v in linking the 2 nodes.
Fig 1a: A modification to Fig 1 (shows the shortest paths from node v1 to v2)
To further illustrate the importance of a node via the pair dependency, consider now Fig 1a.
Number of shortest paths from node v1 to node v2: $\sigma_{st} = 2$ (v1 → v → v2 or v1 → v' → v2)
Number of shortest paths from node v1 to node v2 through v: $\sigma_{st}(v) = 1$ (v1 → v → v2)
Thus $\mu_{st}(v)$ now takes on the value of 1/2. A smaller value of $\mu_{st}(v)$ implies that node v now plays a less important role in linking the 2 nodes: Fig 1a shows that even without node v, an alternative shortest path exists between v1 and v2.
BC can also be rewritten as:
$g(v) = \sum_{s \neq v \neq t} \mu_{st}(v) = \sum_{i \neq j} \sum_{s \in C_i,\, t \in C_j} \mu_{st}(v)$
where i and j denote different regions and nodes s and t reside in regions $C_i$ and $C_j$ respectively. For Fig 1, $\mu_{st}(v) = 1$ for all $s \in C_1$, $t \in C_2$, since all shortest paths between the two regions have to go through node v, as in the case of v1 and v2. Each pair is counted once in each direction, which gives the factor of 2. Therefore, the BC of node v in Fig 1 is given by:
$g(v) = 2 \sum_{s \in C_1,\, t \in C_2} \frac{\sigma_{st}(v)}{\sigma_{st}} = 2 \sum_{s \in C_1,\, t \in C_2} \mu_{st}(v) = 2 \sum_{s \in C_1,\, t \in C_2} 1 = 2 N_1 N_2$
where $N_1$ and $N_2$ are the numbers of nodes in regions C1 and C2. This result shows that although v has small connectivity, its BC as defined by (1) is large. $2 N_1 N_2$ is therefore the largest BC value such a node can attain, reached when all the shortest paths between any 2 nodes from separate regions pass through that node. Thus BC can be large regardless of the degree of a particular node. In Fig 1, nodes v1, v and v2 all have a large BC despite having different connectivity.
High values of the centrality indicate that a node can reach the others on short paths or that it lies on many short paths. Removing a node with large centrality lengthens the paths between many pairs of nodes. The extreme case is when the node is a cut-vertex (e.g. node v in Fig 1): its removal breaks the network into two disconnected components.
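The result $g(v) = 2 N_1 N_2$ can be checked numerically. The following is a minimal sketch, not taken from the paper: it assumes that each region in Fig 1 is a fully connected cluster (the figure itself does not specify this), and the names `betweenness` and `bridged_cliques` are hypothetical helpers introduced for illustration. The BC of definition (1) is accumulated over ordered pairs (s, t) using breadth-first search.

```python
from collections import deque
from itertools import combinations


def betweenness(adj):
    """g(v) from definition (1): sum over ordered pairs (s, t), s != v != t,
    of sigma_st(v) / sigma_st, for an unweighted undirected graph."""
    g = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s: distances and shortest-path counts.
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Accumulate pair dependencies, starting from the farthest nodes.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1.0 + delta[w])
            if w != s:
                g[w] += delta[w]
    return g


def bridged_cliques(n1, n2):
    """Assumed stand-in for Fig 1: two cliques C1 and C2 joined only via 'v'."""
    c1 = [f"a{i}" for i in range(n1)]
    c2 = [f"b{i}" for i in range(n2)]
    adj = {node: set() for node in c1 + c2 + ["v"]}
    for group in (c1, c2):
        for x, y in combinations(group, 2):
            adj[x].add(y)
            adj[y].add(x)
    for gateway in (c1[0], c2[0]):  # these play the roles of v1 and v2
        adj["v"].add(gateway)
        adj[gateway].add("v")
    return adj


n1, n2 = 4, 5
fig1 = bridged_cliques(n1, n2)
bc = betweenness(fig1)
print(len(fig1["v"]), bc["v"], 2 * n1 * n2)  # degree of v, g(v), 2*N1*N2

# Fig 1a analogue: a second bridge node v' halves each pair dependency.
fig1a = bridged_cliques(n1, n2)
fig1a["v'"] = {"a0", "b0"}
fig1a["a0"].add("v'")
fig1a["b0"].add("v'")
print(betweenness(fig1a)["v"])
```

With regions of 4 and 5 nodes, node v has degree 2 yet attains the maximal value $2 N_1 N_2$; adding the alternative bridge v' reduces every pair dependency to 1/2 and therefore halves g(v).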
This highlights that central nodes with low degree may play a significant role in the evolution of the structure, as well as in the efficiency, of the network. Thus, it is not sufficient to focus our attention on hubs. Instead, it is crucial to examine the underlying causes which result in central airports that are not hubs, in order to design a better system for the world-wide airport network. We shall illustrate how such a phenomenon may occur in the real-world network in the following section.
3. Degree-betweenness anomalies and multi-community networks
In most complex networks, nodes with small degree and large betweenness centrality are not a significant feature. In particular, the degree and betweenness centrality of a node are highly correlated in random networks. In other words, a node with small degree has a high probability of having a small betweenness centrality, and vice versa. Hence, the presence of central airports which are not hubs can be considered an anomaly. A region is defined to be a cluster of densely connected nodes.
More important, however, is the mechanism that drives the formation of such scale-free networks with the observed anomalous distribution of betweenness centralities. To address this, we shall consider Alaska, which is a sparsely populated, isolated community. Despite its low population density, it has a disproportionately large number of airports. However, it was observed that only a few Alaskan airports, including Anchorage and Fairbanks, are connected to the continental US, while most Alaskan airports only have connections to other airports within Alaska. Furthermore, although Alaska is nearer to Canada than to the continental US, it was observed that there are no connections from Alaskan airports to airports in Canada's Northern Territories.
(See Fig 3.) The main reason for this observation is that the Alaskan population has to be connected to its political centers, which are situated in the continental US. Thus, geopolitical constraints are the main factor behind this observation.
Figure 3: Geographical location of Alaska and Canada
In a simplified model of the airport network, the following scenarios could result:
Figure 4a: A simplified model displaying low degree and large centrality of node v (representing Anchorage). Node s is the hub of the Alaskan community.
In the first scenario (Fig 4a), Anchorage has low degree and high centrality, since all flights from within Alaska to the continental US must pass through Anchorage. In this case, all the Alaskan airports are connected to node s (the hub) before connecting to Anchorage. This is possibly due to the effects of preferential attachment as well as geographical constraints: the other Alaskan airports could be located much further from Anchorage than from node s, and hence would preferably connect to an airport that is geographically close by (represented by node s) before connecting to Anchorage.
However, Anchorage could also serve as the hub within the community (as depicted in Fig 4b) if the other Alaskan airports are located geographically close to it. Anchorage may then have high degree within Alaska, but it is certainly not a hub for all the nodes in the network. Nonetheless, it still serves as the main link to the continental US.
Thus, cities like Anchorage provide the major connection to the outside world for the other cities in their communities, which explains their large betweenness centrality. Indeed, the existence of nodes with anomalous centrality is related to the existence of regions with a high density of airports but few connections to the outside. The degree-betweenness anomaly is therefore ultimately related to the existence of "communities" in the network.
Having given a generalized description of the real-world airport network, we now describe how the authors derived their models in an attempt to support the hypothesis that geopolitical constraints play a part in shaping the structure of the airport network.
4. Model A: Based on preferential attachment and geographical distance constraints
Aim: Construct a simple model which takes into account preferential attachment and geographical distance constraints.
At each time step, one of the following events takes place:
(i) A new link between two existing nodes is established, with probability p.
(ii) A new node is added and connected to m existing nodes, with probability (1 - p).
When event (i) occurs, a new link is created between existing nodes i and j according to:
$\Pi_{ij} \propto \frac{k_i k_j}{F(d_{ij})}$
When event (ii) occurs, a link is created between the new node i and an existing node j, with j selected according to:
$\Pi_{ij} \propto \frac{k_j}{F(d_{ij})}$
Here $k_i$ is the degree of node i and $d_{ij}$ is the distance between nodes i and j.
Two different forms for the function F(d) are investigated:
(a) $F(d) = d^r$
(b) $F(d) = \exp(d / d_x)$, where $d_x$ is the characteristic distance
Preferential attachment leads to a power-law degree distribution. F(d) leads to the truncation of the power-law decay and, when F(d) increases very rapidly, the power-law decay regime may disappear completely.
Notes:
1. Nodes are created in locations which correspond to actual airport locations.
2. The size of the model network is the same as the size of the real network.
3. Locations of new nodes are chosen in random order.
5. Validating Model A
Case (a): $F(d) = d^r$
Figure 5: Cumulative distributions of scaled degree and betweenness for $F(d) = d^r$
Fix p = 0.65 so that the exponent of the degree distribution agrees with the observed data. Also fix m = 1 so that the average degree is as close as possible to the average degree of the world-wide airport network. The results for r = 1, r = 2 and r = 3 are presented in Figure 5.
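The growth rules of Model A can be sketched in a few lines of code. This is only an illustration under stated assumptions, not the authors' implementation: the seed network of two linked nodes, the use of great-circle distance in units of the Earth's radius, and all function names (`grow_model_a`, `power_law`, `exponential`, `great_circle`) are assumptions made here, and the coordinates in the usage lines are random placeholders rather than real airport locations.

```python
import math
import random


def great_circle(a, b):
    """Distance between two (lat, lon) points given in degrees, in Earth radii."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * math.asin(min(1.0, math.sqrt(h)))


def power_law(r):
    """Case (a): F(d) = d^r."""
    return lambda d: d ** r


def exponential(dx):
    """Case (b): F(d) = exp(d / d_x), with d_x in Earth radii."""
    return lambda d: math.exp(d / dx)


def grow_model_a(locations, F, p=0.65, m=1, seed=0):
    """Grow a network over the given locations following events (i) and (ii)."""
    rng = random.Random(seed)
    order = list(range(len(locations)))
    rng.shuffle(order)  # locations of new nodes are chosen in random order
    # Assumed initial condition: the first two nodes start out linked.
    first, second = order[0], order[1]
    nodes = [first, second]
    edges = {frozenset((first, second))}
    degree = {first: 1, second: 1}
    pending = order[2:]

    def weight(i, j, k_factor):
        d = great_circle(locations[i], locations[j])
        return k_factor / max(F(d), 1e-12)  # guard against coincident points

    def link(i, j):
        edges.add(frozenset((i, j)))
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1

    while pending:
        if rng.random() < p:
            # Event (i): link two existing nodes, weight k_i * k_j / F(d_ij).
            pairs = [(i, j) for a, i in enumerate(nodes) for j in nodes[a + 1:]
                     if frozenset((i, j)) not in edges]
            if pairs:
                weights = [weight(i, j, degree[i] * degree[j]) for i, j in pairs]
                link(*rng.choices(pairs, weights)[0])
        else:
            # Event (ii): add a node with m links, weight k_j / F(d_ij).
            new = pending.pop()
            candidates = list(nodes)
            for _ in range(min(m, len(candidates))):
                weights = [weight(new, j, degree[j]) for j in candidates]
                j = rng.choices(candidates, weights)[0]
                candidates.remove(j)
                link(new, j)
            nodes.append(new)
    return edges, degree


# Usage with placeholder coordinates (random points, NOT real airports).
placeholder_rng = random.Random(1)
toy_locations = [(placeholder_rng.uniform(-60, 70), placeholder_rng.uniform(-180, 180))
                 for _ in range(60)]
edges, degree = grow_model_a(toy_locations, power_law(1.0))
print(len(degree), len(edges), max(degree.values()))
```

The sketch recomputes all candidate pairs at every step, which is adequate for toy sizes only. Passing `exponential(0.2)` instead of `power_law(1.0)` switches to case (b).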
From Figure 5, it can be observed that the model is able to reproduce the observed degree distributions and the observed betweenness distributions.
Next, plot betweenness against degree, b(k), to check whether the simulation produces data which corresponds to the real-world data, in which there exist airports that have low degree but high betweenness. If the simulation results contain some nodes which have low degree but high betweenness, then the model can be used to represent the world-wide airport network.
Figure 6a: Betweenness of the nodes as a function of their degree for a model world-wide airport network
The points in Figure 6a correspond to the simulations of the model, while the shaded regions represent the 95% confidence intervals for random networks which have the same degree distributions as the model networks. The confidence intervals for random networks are used here because there exists a high correlation between betweenness and degree in random networks.
Since most of the simulation data fall within or close to the shaded regions, there exists a high correlation between betweenness and degree in the model network, especially where small degrees occur. In the world-wide airport network, however, a low correlation between degree and betweenness is observed, especially where small degrees occur. Hence the model is unable to account for the presence of central airports which are not hubs.
Figure 6b: Betweenness of the nodes as a function of their degree for a model North American airport network
The model gives rise to similar results for the North American airport network (Figure 6b), in which there exists a high correlation between betweenness and degree in the model network, especially where small degrees occur.
Although the model is unable to account for the presence of central airports which are not hubs, it can be observed that the model is fairly consistent, regardless of the size of the network.
Case (b): $F(d) = \exp(d / d_x)$
Figure 7: Cumulative distributions of scaled degree and betweenness for $F(d) = \exp(d / d_x)$
As for case (a), fix p = 0.65 so that the exponent of the degree distribution agrees with the observed data. Also fix m = 1 so that the average degree is as close as possible to the average degree of the world-wide airport network. The results for $d_x = 1.0 R_T$ and $d_x = 0.2 R_T$ are presented in Figure 7, where $R_T$ is the radius of the Earth.
It can be observed from Figure 7 that the model is able to reproduce the observed degree distributions and the observed betweenness distributions. Besides that, it was observed that F(d) only affects the structure of the network when considering regions much larger than $d_x$. Hence, one may foresee some problems when applying the model to a regional airport network.
Similar to case (a), plot betweenness against degree, b(k), to check whether the simulation produces data which corresponds to the real-world data.
Figure 8a: Betweenness of the nodes as a function of their degree for a model world-wide airport network
As before, the points in Figure 8a correspond to the simulations of the model, while the shaded regions represent the 95% confidence intervals for random networks which have the same degree distributions as the model networks. There are some fluctuations of b(k) in the model world-wide airport network. The model produces nodes with relatively small degree and high betweenness, which is in agreement with the real-world case. However, the model North American airport network does not produce results consistent with those of the model world-wide airport network.
Figure 8b: Betweenness of the nodes as a function of their degree for a model North American airport network
As observed in Figure 8b, unlike in the model world-wide airport network, there exists a high correlation between betweenness and degree in the model North American network, and it does not produce simulation data with small degree and high betweenness. Hence, it can be deduced that the introduction of a characteristic length ($d_x$) poses some problems. The inconsistency between the results for the world-wide network and the North American network supports the earlier observation that F(d) only affects the structure of the network when considering regions much larger than $d_x$. Thus, case (a), $F(d) = d^r$, provides a better model than case (b), $F(d) = \exp(d / d_x)$.
6. Model B: Inclusion of geo-political constraints
From the analysis of Model A, which takes into account the influence of preferential attachment and distance constraints, it can be observed that these two factors appear to explain the degree and betweenness distributions, but fail to account for the presence of nodes with small degree but high betweenness. Hence, there must be other factors which affect the formation and evolution of the airport network.
Suppose that there is an additional constraint which is a consequence of geo-political considerations. In other words, only a few airports in each country are connected to airports in other countries, while the other airports are only allowed to connect to airports within the same country, regardless of the geographical distance between the airports. This has been illustrated earlier in the paper by the case of Alaskan airports being connected to the continental US for political reasons, instead of being connected to nearer airports in Canada's Northern Territories.
In order to account for the effect of geo-political constraints, the following model modifies Model A such that most nodes are only allowed to establish connections with other nodes within the same country and only a few are allowed to establish international connections.
The first 10% of nodes are added exactly as in Model A, with $F(d) = d^r$ and r = 1 (which gives the best fit to the degree and betweenness distributions of the real-world data).
Next, the remaining nodes are added according to the rules in Model A, but with the additional rule that connections may only be formed within the same country.
7. Validating Model B
Figure 9: Betweenness of the nodes as a function of their degree for cases with geo-political constraints
Model B generates central nodes with small degree (as shown in Fig 9), which is consistent with the observations in the real airport network, for both the world-wide airport network and the North American airport network. Hence, it can be concluded that the new model is able to explain the existence of large-betweenness, small-degree nodes at both the global and regional level.
8. Critical Analysis
The aim of the models was to determine the main factor which gives rise to the existence of airports which have low degree but high betweenness centrality. It was observed that Model A (based on preferential attachment and geographical constraints) was unable to produce simulation results which show the presence of nodes with small degree and high betweenness centrality. Hence the authors modified it to introduce geopolitical constraints, in Model B. The simulation results of Model B were able to account for the low-degree, high-betweenness-centrality phenomenon.
However, several assumptions were made in the model. The authors fixed p = 0.65 and m = 1 such that the cumulative distributions of degree and betweenness agree with the real-world data, as shown in Fig 5 and Fig 7.
This was a reasonable assumption, since the main aim of the models was to identify the factors that result in nodes with low degree and high betweenness centrality. Hence, it was necessary to ensure that the cumulative distributions matched the real-world data before proceeding with the analysis of the correlation between degree and centrality.
In Model B, the first 10% of nodes were allowed to establish international connections. However, we feel that this value should be varied so that we may better understand the significance of geopolitical constraints. For example, if varying this percentage describes the correlation between the degree and centrality of the nodes in the network more accurately, we can approximate the extent of the effect of geopolitical constraints on the evolution of the world-wide airport network.
Besides that, F(d) was taken to follow a power law ($F(d) = d^r$) in Model B. We feel that this assumption is reasonable, since the validation of Model A already showed that $F(d) = d^r$ provides a better approximation than $F(d) = \exp(d / d_x)$.
Overall, the assumptions were reasonable. Based on them, the model produced simulation data which fits the real-world data well and accounted for the effects of geo-political constraints.
9. Conclusion
In this project, we have shown that the authors have created a good model that accounts for the constraints and pressures governing the evolution of the airport transportation network. This explains why central nodes that are not hubs can emerge. However, one other factor they could possibly consider is population density: within a region that is sparsely populated, it is natural that there will be fewer links extending out of that region. Hence, this could also influence the structure of the airport transportation network.
One benefit of understanding the airport network structure is the ability to identify central nodes, which play an important role in the propagation of infectious diseases. Temporarily breaking the links through these nodes could curb the further spread of a disease, so that it remains contained within the region. Hence, knowledge of the topology of the airport transportation network allows quicker identification of such nodes, and consequently more effective control of the transmission of diseases across regions.
References
Barthelemy, M. "Betweenness centrality in large complex networks." arXiv:cond-mat/0309436 v2 (13 May 2004).
Guimera, R. and Amaral, L.A.N. "Modeling the world-wide airport network." The European Physical Journal B 38 (2004): 381-385.
Guimera, R. et al. "The world-wide air transportation network: Anomalous centrality, community structure, and cities' global roles." arXiv:cond-mat/0312535 v2 (12 Jul 2005).