Long-Run Integration in Social Networks∗
                   Sergio Currarini†             Matthew O. Jackson‡                 Paolo Pin§

                                        This Draft: January 12, 2011



                                                     Abstract

             We study network formation where nodes are born sequentially and form links with
        previously born nodes. Connections are formed through a combination of random meetings and
        through search, as in Jackson and Rogers (2007). A newborn’s random meetings of existing
        nodes are type-dependent and the newborn’s search is then by meeting the neighbors of the
        randomly met nodes. We study “long-run integration,” which requires that as a node ages
        sufficiently, the type distribution of the nodes connected to it approaches the overall
        type-distribution of the population. We show that long-run integration occurs if and only if the
        search part of the network formation process is unbiased, and that eventually the search process
        dominates in terms of the new links that an older node obtains. Integration, however, only
        occurs for sufficiently old nodes, and the aggregate type-distribution of connections in the network
        still reflects the bias of the random process. We illustrate the model with data on scientific
        citations in physics journals.


1       Introduction
Homophily patterns in networks have important implications. For example, citation patterns across
literatures can affect whether important ideas developed in one literature eventually diffuse into
another. Homophily also affects a variety of behaviors and the welfare of individuals connected in
social networks.1 In this paper we analyze a model that provides new insight into patterns and the
emergence of homophily, and illustrate its findings with an application to a network of scientific
citations.
    ∗ This supersedes “Overlapping Network Formation”, Currarini, Jackson and Pin (2006), which also appeared as
a chapter in Pin’s dissertation in (2007). This version contains some new theoretical results and strengthening of the
existing ones, and adds an empirical analysis of citations.
    † Università di Venezia. Email: s.currarini@unive.it
    ‡ Department of Economics, Stanford University and the Santa Fe Institute. Email: jacksonm@stanford.edu,
http://www.stanford.edu/~jacksonm/
    § Dipartimento di Economia Politica, Università degli Studi di Siena (Italy). Email: pin3@unisi.it
    1 See McPherson, Smith-Lovin, Cook (2001) and Jackson (2007, 2008) for more background and discussion.



     The primary issue that we investigate is how homophily patterns change over time. Do nodes
become more integrated as they age? How does integration relate to the link formation process?
For instance, does the network end up more integrated if new connections are found through the
existing network or if nodes always meet anonymously? Intuition would suggest that the extent to
which the existing connections influence new ones will seriously affect the long run behavior of the
system.
     To answer these questions, we study a stochastic model of network formation in which nodes
come in different types, and in which the formation of links is sensitive to such types. More
specifically, we extend the model of Jackson and Rogers (2007) so that a new node is born at each
time period and has a given “type”, and forms (directed) links with the nodes born in previous
periods. A newborn node selects older nodes to connect to in two ways. First, a set of older nodes
are met and linked to according to a random, but potentially type-biased process. As an example,
a given scientific paper written in a given field (its type) has relations to a set of existing papers,
and possibly with greater frequency within its own field. These form a set of citations, or directed
links from the new paper to older papers. Second, the newborn node then meets and links to
some neighbors of the nodes to which it has already formed links. This is referred to as the search
process. Again, this part of the process might be type biased, but we also consider a case where
it is not. As an example with regard to citations, an author finds some references by examining
the reference lists of the papers that he or she has already located and cited. This search part of
the process may have a different bias than the random part of the process, since these papers were
cited by the papers that he or she has already chosen to cite and thus are more likely to be related.
We examine the limiting, long-run properties of this process.
     One possible interpretation for the biases in the process is as a reduced form for agents’ pref-
erences over the types of their neighbors and/or of biased meeting opportunities that agents face
in connecting to each other. So, in one direction we enrich a growing network model by allowing
for types and biases in connections, and in another direction we still bypass explicit strategic con-
siderations by studying a process with exogenous behavioral rules. Since search goes through the
out-neighbors only, strategic considerations are to an extent already limited, since a node cannot
increase the probability of being found by choosing its out-neighborhood. While this may not be a
good assumption in certain instances of social networks, such as friendships or job contacts, where
search presumably goes both ways on a link, it is appropriate in other contexts, such as scientific
citations where the time order of publications strictly determines the direction of search.
     Our results concern the dynamics of link formation among different types, and the extent
to which biases in the process of link formation translate into biases in the long-run patterns of
connections. In particular, we are interested in the conditions under which the system tends to
“integrate” in the long run. We consider two definitions of integration. The first, weak integration,
requires that older nodes have a higher probability than younger nodes of being linked to by
newborn nodes, independently of their types. For example, this requires that an older, established
paper in one field have a higher probability of being cited by a newborn paper than some very
young paper, regardless of the fields of the older, younger, and newborn papers. In this weak sense,
age overcomes the bias in the link formation process. Effectively, this notion of integration requires
that old enough nodes become sufficiently “authoritative” to be found by a newborn node with
a relatively large probability even if the newborn is of a different type. The second and more demanding
definition of integration is what we call long-run integration. It requires that as a node ages, the
distribution of types of nodes that have linked to that node eventually approaches the distribution
of types in the population. This requires that as a node ages any bias in the distribution of nodes
who have connected to it disappears.
    Our main theoretical results are as follows.
    Weak integration is satisfied whenever the probability that a given node is found increases with
that node’s in-degree. This holds in any version of our model where at least some links are formed
through the search part of the process and there is some possibility of connecting across types.2
    In contrast, long-run integration is significantly more difficult to satisfy. It is satisfied if and
only if the search part of the link formation process is unbiased (type-blind), so that every node
that is linked to by one of the nodes located through the random attachment part of the process
has an equal probability of being linked to under search. Thus, this requires that any bias in the
network formation process occur only in the random part of the process.
    In addition, we discuss how, under mild conditions on the biases, the process moves monotonically
towards the long-run behavior. In particular, the aging of nodes has the effect of weakening
the bias in their in-degree, and when search is unbiased, the in-degree composition tends to the
frequencies of types in the population.
    We also discuss some subtle aspects of the relation between the biases in the random meeting
process, the short run composition of in-degrees and the total number of connections agents receive.
For the case of unbiased search with two types, we show that the more homophilous type3 ends up
accumulating more total connections from both types, and that it attracts a more balanced mix of
types in the short run.
    To understand the long-run integration result, note that if both parts of the process are biased
then long-run integration cannot hold. Thus, let us examine why long-run integration holds when
the random part of the process is biased, but search is unbiased. As nodes age, the relative fraction
of in-links that they obtain through the search part of the process begins to dominate, since the
number of neighbors through whom they can be found grows and also since the probability that
any given node is found via the random process decreases because there are more nodes.
   2 As will become clear, weak integration would also hold in a variety of other models that also exhibit the property
of having linking probabilities increase sufficiently with in-degree.
   3 The term “homophily” refers to the probability of meeting same type agents in excess of this type’s population
share. See also footnote 13.

    Note that even though search begins to dominate and is unbiased, it is still not obvious that
long-run integration will hold. That is because the likelihood of finding various neighbors is still
biased in the random part of the process. So let us examine this in more detail, and for simplicity
with just two types, say purple and green, as the logic extends easily. A given purple node can
be found by a newborn green node via search in two ways: one is that the
green newborn finds a neighbor of the purple node that is green, and the other is that the newborn
finds a neighbor of the purple node that is purple. It is relatively easier for the green node to find
other green nodes given the bias in the random part of the process, but the purple node tends
to have more purple neighbors early in the process. The critical fact that it can happen either way
means that the bias in the links arriving through search is lower than the current bias in the purple
node’s neighborhood, and thus tends to lower the bias overall. As the purple node’s neighborhood
becomes less and less biased over time, the links it attracts through search become even less biased,
and the bias in the process vanishes over time.
    It must be noted that long-run integration coexists with a contrasting feature: the fraction
of links formed between agents of different types is never uniform across types. This persistent
asymmetry across types reflects the fact that although each node will eventually end up attracting
an unbiased spectrum of links, in the meantime younger nodes still experience biases, and so,
integrating across the full population, links can still be biased. This has to be the case since we
know that the links formed randomly are always biased, and so there is at least a given fraction of
links that are formed that are biased.
    In addition to the theoretical analysis, we also illustrate the model using data on scientific
citations in journals of the American Institute of Physics (AIP) published between 1977 and 2007.
We find that the proportion of citations that a paper obtains from other papers in its own field
decreases as the paper becomes more cited. An interpretation of the observed citation patterns
suggests biases in both the random and search parts of the process, but with a smaller bias in the
search part of the process.
    In using this specific application we are motivated by two factors. First, patterns of scientific
citations have important welfare consequences as they can affect the diffusion of knowledge, and
the contamination of different research fields.4 Previous research, such as that by Palacios-Huerta
and Volij (2004) and Kóczy, Nichifor and Strobel (2010), which generalizes popular concepts such as the
recursive impact factor, stresses that the importance of a citation relies on the paths that it allows
in the network of citations. We extend this argument by considering the conditions under which citations
are likely to bridge scientific production across different communities.5 Second, scientific citations
possess all the features of the network formation process that we study: nodes (papers) appear in
chronological order and never die, they only link to previously born nodes, they have types (scientific
classifications), and they find citations both directly and through search among the citations of other
papers.6
   4 See, for instance, Breschi and Lissoni (2006) and Jaffe and Trajtenberg (1996).
   5 Rinia et al. (2001) study cross-field citations in the scientific production of the 1990s, for three different datasets.

   Our analysis is independent of work by Bramoullé and Rogers (2009), who examine a similar
model, but with some differences in the questions asked and application.
   The paper is organized as follows. Section 2 describes the model. Section 3 contains our
definitions of integration and a mean-field analysis. Section 4 illustrates the model using citation
data. Section 5 concludes the paper. Finally, an appendix contains some additional results on
Markov matrices, the proofs of the propositions, a more detailed description of a possible matching
process, and some examples.


2       The Model
Time is indexed by t = 1, 2, ... . In each period a new node is born. We index nodes by their birth
dates, so that node t was born in period t.
    Nodes have “types,” with a generic type denoted θ belonging to a finite set Θ (with cardinality
H). A newborn’s type is random and drawn according to the time invariant probability distribution
p (so that types are i.i.d. across time).
    A newborn node sends (directed) links to n > 1 nodes that were born in previous periods. Of
these n nodes, a fraction mr is selected according to a type-dependent random process (where mr n
is an integer in the true process, but allowed to be arbitrary in the mean-field continuous-time
approximation). In particular, p(θ, θ′) denotes the probability that a link sent by a node of type θ
reaches a node of type θ′. Among nodes of type θ′, each node has an equal probability of getting one
of the mr n links, so there is no further discrimination in this part of the process. The remaining
fraction ms = 1 − mr of the n links are determined according to a search process: each new node
looks at the neighbors of the n mr nodes found in the first step and, among these, selects n ms
nodes at random.7
    If the random meeting process were uniform, the probability p(θ, θ′) would equal the share p(θ′)
of θ′ agents in the system. We consider more general meeting processes that can be biased.
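The two-step link formation just described can be sketched in code. The following is only an illustrative simulation under simplifying assumptions that are not in the paper: the function name `simulate_network` and all parameter values are hypothetical, the search pool ignores multiplicity (a node reachable through several intermediaries is counted once), and when the search pool is too small the newborn simply forms fewer links instead of redrawing its random nodes.

```python
import random

def simulate_network(T, n, m_r, p, A, seed=0):
    """Grow a directed network up to T nodes (illustrative sketch only).

    p[h]    : probability that a newborn has type h
    A[h][k] : p(theta, theta'), probability that a random link sent by
              a node of type h reaches a node of type k
    Returns (types, out_links); out_links[t] is the set of nodes t links to.
    """
    rng = random.Random(seed)
    H = len(p)
    n_rand = round(m_r * n)   # links formed through random meetings
    n_search = n - n_rand     # links formed through search
    n0 = n * n                # seed nodes, each linked to all predecessors
    types = [rng.choices(range(H), weights=p)[0] for _ in range(n0)]
    out_links = [set(range(i)) for i in range(n0)]
    by_type = [[i for i in range(n0) if types[i] == h] for h in range(H)]

    for t in range(n0, T):
        theta = rng.choices(range(H), weights=p)[0]
        # Random part: draw a target type from p(theta, .), then a node of
        # that type uniformly; duplicates are redrawn (bounded retries).
        found = set()
        for _ in range(100 * n):
            if len(found) == n_rand:
                break
            k = rng.choices(range(H), weights=A[theta])[0]
            if by_type[k]:
                found.add(rng.choice(by_type[k]))
        # Search part (unbiased): sample uniformly among the out-neighbors
        # of the randomly found nodes, skipping nodes already linked to.
        pool = sorted(set().union(*(out_links[i] for i in found)) - found)
        picked = rng.sample(pool, min(n_search, len(pool)))
        types.append(theta)
        out_links.append(found | set(picked))
        by_type[theta].append(t)
    return types, out_links
```

For instance, `simulate_network(2000, 4, 0.5, [0.6, 0.4], [[0.8, 0.2], [0.3, 0.7]])` grows a two-type network in which both types meet their own type more often than their population share in the random step.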
    6 These longitudinal aspects of citation networks have motivated the use of growing network models in previous
papers including the seminal work on citation networks by Price (1965, 1976). Börner, Maru and Goldstone (2004)
and Simkin and Roychowdhury (2007), among others, find that citations in PNAS over a 20-year interval show
some aspects of a bias towards recently published papers, while Redner (1998) and Newman (2009), correcting for
cohort size and idiosyncratic popularity, find an age effect (first mover advantage) and a frequency distribution of
in-citations that are consistent with a growing network model such as the one that we develop here. Finally, Shi, Tseng
and Adamic (2009) find a positive correlation between homophily of out-citations and the number of in-citations,
but this effect is valid only for low numbers of in-citations.
    7 In the process, if some node is found to which the newborn is already connected, then the node is redrawn.
If there are too few new nodes in the neighborhoods of the nodes found in the first part of the process, then the
random nodes are redrawn. To ensure that the process is well-defined, we begin with a set of n^2 nodes in a sequence,
each connected to all predecessors.

    This can be interpreted in different ways. One is that the bias is a reduced form for preferences
that nodes have over the type of connections they form. The case of “homophilistic” preferences
for type θ is then captured by a situation where p(θ, θ) > p(θ). Of course, the search part of the
process can also be (directly) biased. We describe that more fully below.
    In the Appendix we formulate a detailed process of link formation, which generates biased
probability of matchings, and which is based on the possibility of agents “rejecting” connections
according to their type. For convenience, we work throughout the paper with the reduced form
introduced here.8
    The case in which mr = 1 is referred to as the “purely random” model, while the case of
0 < mr < 1 is referred to as the “random-search” model.
    P_j^t(θt, θj) is the probability that a node born in period j of type θj receives a link from a node
of type θt born at time t > j.

2.1    Purely random model
In the purely random model, the probability that node j gets linked from a θ-type node born at
time t + 1 is simply given by the probability that node t + 1 is of type θ, p(θ), times the
probability that it finds j among all the other nodes of type θj who are in the network at time t + 1.
Under a mean-field approximation, the number of nodes of type θj at time t is tp(θj). Thus, a
mean-field approximation is that

\[
P_j^{t+1}(\theta, \theta_j) = n \left[ p(\theta) \, \frac{p(\theta, \theta_j)}{t \, p(\theta_j)} \right]. \tag{1}
\]

In the formula above, the term in brackets is multiplied by n, the number of links formed by node
t + 1.
    It is useful to express the terms of the above formula in a compact way. For all θ, θ′ we write

\[
B_r(\theta, \theta') \equiv p(\theta) \, \frac{p(\theta, \theta')}{p(\theta')}.
\]

Note that the ratio p(θ, θ′)/p(θ′) in the above expression is a measure of the bias that type θ applies
to type θ′, so that when this ratio is 1 there is no bias, while when it is greater (less) than 1
there is a positive (negative) bias of type θ towards type θ′. In case of no bias, Br(θ, θ′) is simply
the probability of birth of a type θ node, and P_j^{t+1}(θ, θj) is n times the joint probability that
the newborn node is of type θ and that node j is found by drawing uniformly at random from a
population of t nodes. In the Appendix we discuss the properties of such a bias in more detail.
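As a small numerical illustration of the definition above, the sketch below builds Br and the bias ratio for a hypothetical two-type population. The type shares `p`, the meeting probabilities `A`, and the helper name `random_bias_matrix` are assumptions chosen for the example, not values from the paper.

```python
import numpy as np

def random_bias_matrix(p, A):
    """B_r(theta, theta') = p(theta) * p(theta, theta') / p(theta')."""
    p = np.asarray(p, dtype=float)
    A = np.asarray(A, dtype=float)
    return p[:, None] * A / p[None, :]

p = np.array([0.6, 0.4])                 # hypothetical type shares p(theta)
A = np.array([[0.8, 0.2], [0.3, 0.7]])   # hypothetical p(theta, theta')

ratio = A / p[None, :]                   # bias ratio p(theta, theta') / p(theta')
B_r = random_bias_matrix(p, A)

# Uniform meetings: p(theta, theta') = p(theta') for every sender type theta.
A_uniform = np.tile(p, (len(p), 1))
B_uniform = random_bias_matrix(p, A_uniform)   # row theta is constant at p(theta)
```

In this example both diagonal ratios exceed 1, so both types are biased towards their own type, while in the uniform case every entry in row θ of the matrix reduces to p(θ).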
   8 See Currarini, Jackson and Pin (2009, 2010) for more details on other such models that can justify this reduced
form.

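      Let Br denote the |Θ| × |Θ| matrix containing the terms Br(θ, θ′):

\[
B_r \equiv
\begin{pmatrix}
B_r(1,1) & B_r(1,2) & \cdots & B_r(1,H) \\
B_r(2,1) & B_r(2,2) & \cdots & B_r(2,H) \\
\vdots & \vdots & \ddots & \vdots \\
B_r(H,1) & B_r(H,2) & \cdots & B_r(H,H)
\end{pmatrix}.
\]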

We can decompose the matrix Br as the product of two matrices A and Q, where A may be seen
as a transition matrix of a Markov process (a Markov matrix),9 and Q is a diagonal matrix where
the diagonal is a probability vector:

\[
B_r = Q A Q^{-1},
\]

with

\[
A_{\theta\theta'} = p(\theta, \theta')
\quad \text{and} \quad
Q =
\begin{pmatrix}
p(1) & \cdots & 0 \\
\vdots & \ddots & \vdots \\
0 & \cdots & p(H)
\end{pmatrix}.
\]
    We can now rewrite (1) in compact form, to express the probability that at time t + 1 a
generic θ-type node links to a generic θ′-type node born at time t0 < t + 1:

\[
P_{t_0}^{t+1} = \frac{n}{t} B_r. \tag{2}
\]
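The decomposition and equation (2) can be checked numerically. The sketch below uses hypothetical values for p, A, n and t (none taken from the paper) and a helper name, `link_prob_random`, introduced only for illustration.

```python
import numpy as np

p = np.array([0.6, 0.4])                 # hypothetical type shares
A = np.array([[0.8, 0.2], [0.3, 0.7]])   # hypothetical Markov matrix p(theta, theta')

Q = np.diag(p)
B_r = Q @ A @ np.linalg.inv(Q)           # B_r = Q A Q^{-1}

def link_prob_random(B_r, n, t):
    """Equation (2): linking probabilities in the purely random model."""
    return (n / t) * B_r

P = link_prob_random(B_r, n=10, t=1000)

# Entry (theta, theta') of P: probability that an existing node of type theta'
# is linked to by a newborn of type theta.
# Summing over sender types gives the total per-period linking probability.
total_in = P.sum(axis=0)
```

Because Br and A are similar matrices, they share eigenvalues, which is one reason the Markov structure of A is convenient when studying powers of Br.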

2.2     Random-Search Model with Unbiased Search
We now introduce the search part of the network formation process. Before describing the full
model with biases in both search and random link formation, we begin with the case where the
random link formation may be biased but where the search process is unbiased.
    When nodes search among the neighbors found via the random links, the probability that node
j is found by newborn node t + 1 depends on the shape of the network that has formed up to time
t. In particular, it depends on the type-profile of neighbors of j at time t, and on the bias of the
newborn node to randomly link to such types. If search is not type-biased, each link that agent
t + 1 forms through search is drawn from a uniform distribution over the set of all neighbors of all
nodes that agent t + 1 has found at random. For this unbiased search case the following expression
is a mean-field approximation of the overall linking probability:
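\[
P_j^{t+1}(\theta, \theta_j) = \frac{n m_r}{t} B_r(\theta, \theta_j)
 + n m_s \sum_{\theta' \in \Theta} p(\theta) \, p(\theta, \theta') \,
 \frac{\sum_{\lambda=j}^{t} P_j^{\lambda}(\theta', \theta_j)}{t \, p(\theta')} \, \frac{1}{n} \tag{3}
\]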

   The first term on the right-hand side of (3) differs from (1) only because mr < 1. The second
term is the probability of node j being found through search. It is given by the number of search
links (n ms) formed by the node born at t + 1, times the sum, over all possible types θ′, of the
probabilities that j is found through a node of type θ′. For each possible type θ′, this probability is
given by the joint probability of the following events (corresponding to the four terms in the
summation over types): i) the newborn node is of type θ; ii) it forms a link with a θ′-type node;
iii) the θ′-type node has linked to j since j was born;10 iv) among the n neighbors of this
θ′-type node, exactly j is found.
   9 In Appendix A we derive some general results on Markov matrices that will be useful in Appendix B, where we
prove our propositions.
    We can again express (3) for all possible types in a compact way using the matrix Br:

\[
P_{t_0}^{t+1} = \frac{n m_r}{t} B_r + \frac{m_s}{t} B_r \sum_{\lambda=t_0}^{t} P_{t_0}^{\lambda}, \tag{4}
\]

where $\sum_{\lambda=t_0}^{t} P_{t_0}^{\lambda}$ expresses the expected in-degree, type by type, at time t
for a node born at time t0.
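Equation (4) is a recursion that can be iterated directly. The sketch below does so for a hypothetical two-type example and reports, for a node of each type, the share of its expected in-links coming from each sender type. The function names and parameter values are assumptions made for illustration, and the sum is started from an empty in-degree at t0.

```python
import numpy as np

def expected_in_degree(B_r, n, m_r, t0, T):
    """Iterate equation (4) from t0 up to T; returns the cumulated sum of P."""
    m_s = 1.0 - m_r
    Pi = np.zeros_like(B_r, dtype=float)   # no in-links at birth
    for t in range(t0, T):
        P_next = (n * m_r / t) * B_r + (m_s / t) * (B_r @ Pi)
        Pi = Pi + P_next
    return Pi

def in_link_shares(Pi):
    """Column-normalize: share of each sender type among a node's in-links."""
    return Pi / Pi.sum(axis=0, keepdims=True)

p = np.array([0.6, 0.4])                 # hypothetical type shares
A = np.array([[0.8, 0.2], [0.3, 0.7]])   # hypothetical p(theta, theta')
B_r = p[:, None] * A / p[None, :]

early = in_link_shares(expected_in_degree(B_r, n=10, m_r=0.5, t0=10, T=100))
late = in_link_shares(expected_in_degree(B_r, n=10, m_r=0.5, t0=10, T=10000))
```

In this example the own-type share of in-links of a type-1 node (`early[0, 0]` versus `late[0, 0]`) moves towards the population share p(1) as the node ages, which is the pattern that long-run integration formalizes for unbiased search.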

2.3     Random-Search Models with Type-Biased Search
We now describe the full model in which search can also be type-biased.

2.3.1    Type-Bias in Search

In this variant of the model, the bias in search affects the choice of randomly found nodes that are
to be used to find the additional n ms search connections. Here, each new node directs a fraction
of the n ms search links to each given type of node found through the n mr random links. The
bias in how these nodes are selected can differ from the bias in the random process. This bias is
described via an H × H matrix where each element is positive and of the form Bs(θ, θ′).11 A value
of 1 indicates no bias, while a value greater (less) than 1 indicates a positive (negative) bias of type
θ towards type θ′. This is the distortion in the relative probabilities that θ searches its randomly
found neighbors. The mean-field approximation of the process is described by

\[
P_j^{t+1}(\theta, \theta_j) = \frac{n m_r}{t} B_r(\theta, \theta_j)
 + \frac{m_s}{t} \sum_{\theta'=1}^{H} B_r(\theta, \theta') B_s(\theta, \theta')
 \sum_{\lambda=j}^{t} P_j^{\lambda}(\theta', \theta_j). \tag{5}
\]

    The product Br(θ, θ′)Bs(θ, θ′) in (5) describes the probability applied by type θ to the selection
of random nodes in search of type θj. Note that the bias is independent of both time variables j
and t, and of the type θj of the target. We discuss this in more detail, for the case of two types, in
the Appendix.
  10 Note that this ratio has the total (expected) number of links received by agent j from θ′ agents up to time t as
numerator, and the total number of θ′ nodes in the system at time t as denominator.
  11 There are constraints on this bias matrix to have the resulting output be well-defined probabilities, but much
can be deduced for general forms of the matrix, and so we only specify the (obvious) constraints as they become
necessary.

   In matrix form, the system becomes:
\[
P_{t_0}^{t+1} = \frac{n m_r}{t} B_r + \frac{m_s}{t} (B_s \circ B_r) \sum_{\lambda=t_0}^{t} P_{t_0}^{\lambda}, \tag{6}
\]

where ∘ is the Hadamard product: (Bs ∘ Br)(ij) ≡ Bs(ij) · Br(ij). Note that, from the decomposition
Br = Q A Q^{-1}, it follows that

\[
B_s \circ B_r = B_s \circ (Q A Q^{-1}) = Q (B_s \circ A) Q^{-1},
\]

where the biases are such that Bs ∘ A is still a Markov matrix.
   We refer to this variation of the model as RSB, for random-search with biased search.
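The Hadamard identity used above holds because Q is diagonal, and it is easy to confirm numerically. In the sketch below the search-bias matrix `B_s` is a hypothetical example, constructed so that the rows of Bs ∘ A still sum to one; none of the numbers come from the paper.

```python
import numpy as np

p = np.array([0.6, 0.4])                 # hypothetical type shares
A = np.array([[0.8, 0.2], [0.3, 0.7]])   # hypothetical p(theta, theta')
Q = np.diag(p)
Q_inv = np.diag(1.0 / p)
B_r = Q @ A @ Q_inv

# Hypothetical search bias: pick the first column freely, then solve for the
# second column so that each row of B_s * A sums to 1 (Markov constraint).
b = np.array([1.1, 0.5])
B_s = np.column_stack([b, (1.0 - A[:, 0] * b) / A[:, 1]])

lhs = B_s * B_r                  # Hadamard product of B_s and B_r
rhs = Q @ (B_s * A) @ Q_inv      # Q (B_s Hadamard A) Q^{-1}
row_sums = (B_s * A).sum(axis=1)
```

Here `lhs` and `rhs` agree entry by entry, and `row_sums` equals one in each row, so the biased search matrix keeps the Markov structure that the decomposition relies on.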

2.3.2   Type-bias on targeted nodes

While we focus on the above-specified formulation of search-bias in what follows, we remark that
search could also be biased in other ways. In particular, a newborn node has linked to a set of
nodes via the random part of the process and then searches those nodes’ neighborhoods to find
new nodes with whom to link. It can bias this process in two ways: it might be biased in terms of
which of its randomly found neighbors’ neighborhoods it searches through, and it might be biased
in terms of whether it evenly samples neighborhoods or biases the sampling process. Above we
have focused on the first form of bias. For completeness, we note the formulation for the other
form of bias, which is a variation on that above.
    In this variant of the model, the bias in the search process affects directly the probabilities that
nodes of the various types are found out of the pool of out-neighbors of nodes found through search.
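The system of equations (3) is modified to allow for such bias as follows:

\[
P_j^{t+1}(\theta, \theta_j) = \frac{n m_r}{t} B_r(\theta, \theta_j)
 + \frac{m_s}{t} \sum_{\theta'=1}^{H} B_r(\theta, \theta')
 \sum_{\lambda=j}^{t} P_j^{\lambda}(\theta', \theta_j) \, B_s(\theta, \theta_j). \tag{7}
\]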

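The search-bias parameter of type θ towards type θj is Bs(θ, θj, θ′, λ, t), which in (7) is taken to
depend only on θ and θj. Rewriting:

\[
P_{t_0}^{t+1} = \frac{n m_r}{t} B_r + \frac{m_s}{t} B_s \circ \left( B_r \sum_{\lambda=t_0}^{t} P_{t_0}^{\lambda} \right). \tag{8}
\]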

When Bs(θ, θj) > 1 the expected number of links that a given θj node receives from a newborn
node of type θ is larger than what it would receive if search were unbiased. Since this applies to all
nodes of type θj, it implies that a newborn node of type θ will form a fraction of its search links
with nodes of type θj that exceeds the share of type θj nodes in its distance-2 neighborhood
after the random part of our process. Similar but opposite considerations apply to the case in which
Bs(θ, θj) < 1.
    The justification for this formulation of search bias is discussed in more detail in the Appendix.
We refer to this variation of the model as RSBT.
2.4   A Mean-Field Continuous Approximation
We study a continuous time approximation of the model, using the techniques of mean-field theory.
This provides approximations and limiting expressions of the process that ignore starting conditions
and other short-term fluctuations that can be important in shaping finite versions of the model,
and so the results must be viewed with the standard cautions that accompany such approximations
and limit analyses. We consider the expected change in the discrete stochastic process as the
deterministic differential of a continuous time process. We define
\[
\Pi_{t_0}^{t} \equiv \sum_{\lambda=t_0}^{t} P_{t_0}^{\lambda}.
\]

With a continuous approximation:

\[
\frac{\partial}{\partial t} \Pi_{t_0}^{t} = P_{t_0}^{t+1}.
\]

We study equations (2), (4) and (6) in terms of ordinary differential equations in matrix form:
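\[
  \frac{\partial}{\partial t}\,\Pi_{t_0}^{t} = \frac{n}{t}\,B_r ; \tag{9}
\]
\[
  \frac{\partial}{\partial t}\,\Pi_{t_0}^{t} = \frac{n m_r}{t}\,B_r + \frac{m_s}{t}\,B_r\,\Pi_{t_0}^{t} , \tag{10}
\]
\[
  \frac{\partial}{\partial t}\,\Pi_{t_0}^{t} = \frac{n m_r}{t}\,B_r + \frac{m_s}{t}\,(B_s \circ B_r)\,\Pi_{t_0}^{t} \tag{11}
\]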

with the initial condition
\[
  \Pi_{t_0}^{t_0} = 0.
\]

Summarizing, (9) refers to the purely random model (R), (10) refers to the random–search model
with unbiased search (RSU ), and (11) refers to the random–search model with search biased on
intermediaries (RSB ).
    If Br is invertible (so that the specification of types is not redundant) the unique solutions to
these differential equations are, respectively, the following:
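\[
  \Pi_{t_0}^{t} = n B_r \log\left(\frac{t}{t_0}\right) ; \tag{12}
\]
\[
  \Pi_{t_0}^{t} = n\,\frac{m_r}{m_s}\left[\left(\frac{t}{t_0}\right)^{m_s B_r} - I\right] , \tag{13}
\]
\[
  \Pi_{t_0}^{t} = n\,\frac{m_r}{m_s}\,(B_s \circ B_r)^{-1} B_r \left[\left(\frac{t}{t_0}\right)^{m_s (B_s \circ B_r)} - I\right] . \tag{14}
\]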

where a constant to the power of a matrix is defined as follows:
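\[
  \left(\frac{t}{t_0}\right)^{m_s B_r}
  = e^{\left(m_s \log\left(\frac{t}{t_0}\right)\right) B_r}
  = \sum_{\mu=0}^{\infty} \frac{\left(m_s \log\left(\frac{t}{t_0}\right) B_r\right)^{\mu}}{\mu!} . \tag{15}
\]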
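As a numerical sanity check on (13) and (15), the following sketch evaluates the matrix power (t/t0)^(ms Br) by truncating the series in (15), and then verifies by finite differences that (13) satisfies the differential equation (10). The parameter values (π = 0.75, n = 2, mr = ms = 1/2, t0 = 1), the truncation length, and all function names are illustrative assumptions made for this check, not part of the model.

```python
import math

def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def identity(size):
    return [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]

def scalar_to_matrix_power(x, m, terms=80):
    """Approximate x**M = exp(log(x) * M) by truncating the series in (15)."""
    size = len(m)
    scaled = [[math.log(x) * v for v in row] for row in m]
    total = identity(size)   # the mu = 0 term
    term = identity(size)
    for mu in range(1, terms):
        # term becomes (log(x) * M)**mu / mu!
        term = [[v / mu for v in row] for row in matmul(term, scaled)]
        total = [[total[i][j] + term[i][j] for j in range(size)] for i in range(size)]
    return total

# Illustrative parameters (assumed for this check only).
PI, N, MR, MS, T0 = 0.75, 2, 0.5, 0.5, 1.0
BR = [[PI, 1 - PI], [1 - PI, PI]]

def in_degree_rsu(t):
    """Equation (13): n (mr/ms) [ (t/t0)**(ms Br) - I ]."""
    power = scalar_to_matrix_power(t / T0, [[MS * v for v in row] for row in BR])
    eye = identity(2)
    return [[N * MR / MS * (power[i][j] - eye[i][j]) for j in range(2)] for i in range(2)]

def rhs_of_ode(t):
    """Right-hand side of (10): (n mr / t) Br + (ms / t) Br Pi."""
    br_pi = matmul(BR, in_degree_rsu(t))
    return [[N * MR / t * BR[i][j] + MS / t * br_pi[i][j] for j in range(2)]
            for i in range(2)]

# Central finite difference of (13) compared with the right-hand side of (10).
t, h = 50.0, 1e-4
upper, lower = in_degree_rsu(t + h), in_degree_rsu(t - h)
derivative = [[(upper[i][j] - lower[i][j]) / (2 * h) for j in range(2)] for i in range(2)]
rhs = rhs_of_ode(t)
max_gap = max(abs(derivative[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(max_gap)  # close to zero when (13) solves (10)
```

The series is used here only because it mirrors definition (15) directly; for larger matrices or very large t/t0 an eigendecomposition would be the more stable choice.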



Example 1 The 2–types symmetric random model (R).

Consider a symmetric setting in which n = 2, p(1) = p(2) = 1/2, mr = 1, and
\[
  B_r = \begin{pmatrix} \pi & 1-\pi \\ 1-\pi & \pi \end{pmatrix}.
\]
This is a purely random model R. We implement equation (12) to find the expected in–degree of a
node born at t0 : its in–links from same–type nodes follow 2π log(t/t0 ), while its in–links
from different–type nodes follow 2(1 − π) log(t/t0 ). Note that there is a rapid fall-off in
the growth of the in-degree of a given node over time, as it only grows logarithmically due to
competition from other nodes and fixed out-degree of newborns.

Example 2 The 2–types symmetric random–search model with unbiased search (RSU).

Let us contrast the previous example with a random–search model with unbiased search RSU, with
mr = ms = 1/2 (see footnote 12). In this case the outcome of equation (13) is that for a node born
at time t0 the in–links from same–type nodes follow
\[
  \sqrt{\frac{t}{t_0}} + \left(\frac{t}{t_0}\right)^{\pi-\frac{1}{2}} - 2 ,
\]
while those from different–type nodes follow
\[
  \sqrt{\frac{t}{t_0}} - \left(\frac{t}{t_0}\right)^{\pi-\frac{1}{2}} .
\]
Here we see that the growth of in-degree falls off more slowly than in the purely random model, as
in-degree grows proportionally to the square root of t rather than log of t. This reflects the fact
that as a node acquires more in-links, it becomes easier to find, which counter-acts some of the
competition from younger nodes.

Example 3 The 2–types symmetric random–search model with biased search (RSB).

Consider a symmetric RSB model with the same parameters as in the previous examples, and
matrix of biases
\[
  B_s = \begin{pmatrix} \sigma & \frac{1-\sigma\pi}{1-\pi} \\[4pt] \frac{1-\sigma\pi}{1-\pi} & \sigma \end{pmatrix}
\]
(see footnote 13). In this case the explicit outcome of equation (14) is
that for a node born at time t0 the in–links from same–type nodes follow (we write t
instead of t/t0 , normalizing t0 to 1)
\[
  \sqrt{t} + t^{\pi-\frac{1}{2}}\,\frac{2\pi-1}{2\sigma\pi-1} - 2\,\frac{\sigma\pi+\pi-1}{2\sigma\pi-1} ,
\]
while those from different–type nodes follow
\[
  \sqrt{t} - t^{\pi-\frac{1}{2}}\,\frac{2\pi-1}{2\sigma\pi-1} - 2\,\frac{\sigma\pi-\pi}{2\sigma\pi-1} .
\]
The same differences in comparison with the purely random model hold also for this model.
 12
      A more general 2 × 2 case is treated in Appendix D.2.
 13
      In Appendix D.3 we discuss why Bs must depend on Br and a more general 2 × 2 case.


3         Integration
In all the network formation processes described in the previous sections, links are formed with
some degree of type-bias. As we said, this can be thought of as a reduced form of agents’ preferences
on potential partners and/or of the biases they face in their meeting opportunities. From a welfare
perspective it is then important to understand how these biases in linking behavior translate into
biases in the realized connections of agents. In particular, in this section we study under which
conditions and to what extent the type-biases in the in-degree of agents tend to vanish in the long
run. To this aim we propose various definitions of integration, that reflect varying degrees to which
the dynamic process weakens the biases in the way agents form links.

3.1         Weak integration
The first notion that we consider requires that there exist a large enough “age” difference such that
an older node receives a link from a newborn node with higher probability than a younger node,
independently of the types of the nodes involved.

Definition 1 The network formation process satisfies the weak integration property if for every
t0 , there exists t̄ > t0 such that, for all t ≥ t̄ and for all θ ∈ Θ, the node born at time t has a lower
probability than node t0 of receiving a link from a node of type θ born at time t + 1.

   Under this condition an old enough node of type θ′ ends up receiving a link from a newborn
node of type θ with a higher probability than a young enough node of the same type θ as the
newborn.
   It is clear that the basic random model R does not satisfy this property. We show instead that
the random-search models satisfy this property, even with a bias in search.

Proposition 1 If mr < 1, both versions of the random-search model, with or without biased search
(RSU, RSB), satisfy the weak integration property.14

    The proof (which appears in the appendix) shows that Property 1 is not specific to the ho-
mogeneous search model. Indeed, various models in which the in-degree of a node determines the
probability of being found by a newborn node in a sufficiently increasing manner would give the
same result. This can be directly checked in both models of type-biased search of Section 2.3–2.4.
Moreover, search is not needed for this type of dependence to take place. In the conclusions we
discuss another model with “type-biased” preferential attachment in which the probability of receiving
a link is positively correlated with a node’s in-degree, and which exhibits the same weak
integration property.
    14
         In fact, this also holds for RSBT, as we show in the appendix.



3.2    Long-run Integration
A stronger notion of integration requires that eventually nodes attract connections according to
the population shares of types. Definition 2 introduces this requirement for nodes of sufficiently
old age. In other words, in the long run any surviving difference in the proportion of links received
by old nodes from different types is due only to the distribution of types in the population, and
not to the biases in preferences or to the type of the receiving node.

Definition 2 The network formation process satisfies the long-run integration property if for
every node t0 the ratio of each type θ in the in-degree of t0 converges to θ’s population share as
node t0 ages.

   Long run integration is not satisfied by the basic random model R. The next proposition shows
that this property is satisfied by the random-search model with unbiased search RSU.

Proposition 2 If mr < 1, the random–search model with unbiased search (RSU) satisfies the long–
run integration property, while the random–search model with nontrivially biased search (RSB) does
not.15

    The intuition behind this result is as follows. First, it is clear that if the search part of the
process is biased then long-run integration cannot hold, since as nodes age, the relative fraction of
in-links that they obtain through the search part of the process begins to dominate. This happens,
since the number of neighbors through whom they can be found grows and so it is relatively easier
to find an older node via search compared to a younger node, while there is no advantage to age
in the random part of the process.16 So let us examine why it is that as search begins to dominate
and is unbiased, then long-run integration holds. A given node can be found by a newborn node of
a different type via search in different ways: one is that the newborn finds a neighbor of the given
node that is of the same type as the newborn, and the other is that the newborn finds a neighbor
of the given node that is of the same type as the given node. The first way is relatively more likely
given the bias in the random part of the search, but the fact that this can also occur via the second
route then leads this process to be less biased than the original random part of the search process.
So, the neighborhood of an older node then tends to start being less biased than the original search
process. Once it is less biased, this makes it even easier to be found by nodes of the opposite type,
and so the neighborhood becomes even less biased, and this continues with the limit reflecting an
  15
     “Does not” means that for generic specifications of the bias process, long run integration fails. However, there
can be specific (nongeneric) cases where integration holds, even with non-trivial bias. See the discussion of Examples 3
and 4.
  16
     A node faces increased competition from other nodes over time in both parts of the process and indeed the
growth of in-degree slows down over time, but a given node gains a greater relative advantage in the search part of
the process over time which then comes to dominate.



unbiased process. As mentioned in the introduction, as a node ages it becomes more of a “hub”
node (very loosely speaking), with many in-links. As its incoming bias becomes lower, then it leads
to even less of a bias as it becomes increasingly easy to find by other type nodes, and eventually
the bias in the process disappears.
    Let us analyze whether long-run integration occurs in Examples 1–3 from Section 2.4. In the
purely random model of Example 1 the ratio of same type in-links to different type in-links is
π/(1 − π) and it remains constant over time, implying that long-run integration does not occur
unless there is no bias (π = 1/2). In the RSU model of Example 2, the ratio between in–links from
same-type nodes and in–links from different type nodes converges to 1 as t grows relative to t0 ,
reflecting the ratio of population shares and indicating that long run integration occurs. In the RSB
model of Example 3, long-run integration holds due to the symmetry of the matrix Br (and all of
the other symmetry in the example). As the next example makes clear, Example 3 is “non-generic” and
more generally long-run integration fails if there is biased search that does not happen to exactly
balance across types.
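The contrast between Examples 1 and 2 can be made concrete by evaluating their closed-form expressions at increasing values of t/t0. The sketch below assumes π = 0.75 (an illustrative value) and uses the Example 2 expressions with the leading term read as the square root of t/t0, in line with the square-root growth noted in that example; the function names are hypothetical.

```python
import math

def example1_in_links(pi, x):
    """Example 1 (model R): same- and different-type in-links at x = t/t0."""
    return 2 * pi * math.log(x), 2 * (1 - pi) * math.log(x)

def example2_in_links(pi, x):
    """Example 2 (model RSU with mr = ms = 1/2): same- and different-type in-links."""
    same = math.sqrt(x) + x ** (pi - 0.5) - 2
    different = math.sqrt(x) - x ** (pi - 0.5)
    return same, different

PI = 0.75  # illustrative bias in the random part (assumed)

ratios_r, ratios_rsu = [], []
for exponent in (2, 4, 8, 16):
    x = 10.0 ** exponent
    same_r, diff_r = example1_in_links(PI, x)
    same_s, diff_s = example2_in_links(PI, x)
    ratios_r.append(same_r / diff_r)
    ratios_rsu.append(same_s / diff_s)
    print(f"t/t0 = 1e{exponent}: R ratio = {ratios_r[-1]:.4f}, "
          f"RSU ratio = {ratios_rsu[-1]:.4f}")
```

In model R the ratio stays at π/(1 − π) for every age, while in model RSU it falls toward 1, although only slowly, since the gap closes at the rate of a power of t/t0.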

Example 4 An asymmetric 2–types random–search model with biased search (RSB).

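Consider an RSB model with n = 2 types that are equally likely, p(1) = p(2) = 1/2. Suppose that
there is no bias in the random part: from the specification in (1) we have that
\[
  B_r = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{pmatrix}.
\]
Finally, let us assume that the search part is such that mr = ms = 1/2 and
\[
  B_s = \begin{pmatrix} 1 & 1 \\ 2-\sigma & \sigma \end{pmatrix},
\]
with 1 < σ < 2, which means that only the second type has a self–bias in the search part. In this
case the outcome of equation (14) is that (we write t instead of t/t0 , normalizing t0 to 1)
\[
  \Pi_{1}^{t} =
  \begin{pmatrix}
    \sqrt{t}\,\frac{2-\sigma}{3-\sigma} + t^{\frac{\sigma-1}{4}}\,\frac{1}{3-\sigma} - 1 &
    \sqrt{t}\,\frac{1}{3-\sigma} - t^{\frac{\sigma-1}{4}}\,\frac{1}{3-\sigma} \\[6pt]
    \sqrt{t}\,\frac{2-\sigma}{3-\sigma} - t^{\frac{\sigma-1}{4}}\,\frac{2-\sigma}{3-\sigma} &
    \sqrt{t}\,\frac{1}{3-\sigma} + t^{\frac{\sigma-1}{4}}\,\frac{2-\sigma}{3-\sigma} - 1
  \end{pmatrix}.
\]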
In the long–run the terms with √t dominate: a type 1 node ends up receiving same-type links with
a ratio 2 − σ compared to different-type links; whereas a type 2 node ends up receiving same-type
links with a ratio 1/(2 − σ). As types are equally represented in the population, long–run integration
would require that both those ratios be 1, which holds only in the case of no bias σ = 1.
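The long-run ratios of Example 4 can be checked by evaluating the closed-form matrix at large t. The sketch below reads rows as the receiving type and columns as the sending type, which is the reading consistent with the ratios just stated; σ = 1.5 and the function name are illustrative assumptions.

```python
import math

def example4_in_links(sigma, t):
    """Closed-form matrix of Example 4 (t0 normalized to 1).

    Rows index the receiving type and columns the sending type.
    """
    root = math.sqrt(t)
    slow = t ** ((sigma - 1) / 4)     # the slowly growing power of t
    a = (2 - sigma) / (3 - sigma)     # weight on links sent by type 1
    b = 1 / (3 - sigma)               # weight on links sent by type 2
    return [[root * a + slow * b - 1, root * b - slow * b],
            [root * a - slow * a, root * b + slow * a - 1]]

SIGMA = 1.5  # illustrative search bias (assumed), with 1 < sigma < 2

for exponent in (4, 8, 12):
    pi_t = example4_in_links(SIGMA, 10.0 ** exponent)
    ratio_type1 = pi_t[0][0] / pi_t[0][1]   # same-type over different-type, type 1
    ratio_type2 = pi_t[1][1] / pi_t[1][0]   # same-type over different-type, type 2
    print(f"t = 1e{exponent}: type 1 ratio = {ratio_type1:.4f}, "
          f"type 2 ratio = {ratio_type2:.4f}")
```

As t grows the two ratios approach 2 − σ and 1/(2 − σ) respectively, and setting σ = 1 in the same function makes both ratios equal to 1 at every t.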
    Note from Examples 3 and 4 that long-run integration holds in the RSB model when the
Hadamard product of the matrices Bs and Br is either symmetric or equals the matrix Br . So, in
the 2 types case, the bias in search must either equal the bias in the random part (that is, the RSU
model), or it must be symmetric. In these two (non-generic) cases, the random bias disappears in
the long run once the search part of the process takes over. For the 2 types case this is immediately
implied by equation (i) in the Proof of Proposition 2, in which the proportions of in-connections
from the various types are derived explicitly.

3.3    Partial Integration
Even when full long-run integration does not occur, it is still of interest to understand whether
time has the effect of integrating the system at least to a partial extent. More precisely, we can ask
whether the in-degree of an agent becomes more and more “homogeneous” as time elapses, leading,
as we know, to full integration in the limit when search is unbiased. This type of “monotonicity”
property may hold even under biased search, although, as we have seen, long run integration does
not occur in this case.

Definition 3 The network formation process satisfies the partial integration property if for
every node t0 the fraction of each type θ in the in-degree of t0 is weakly closer to θ’s population
share at time t′ than at time t, for t′ > t, and strictly closer for some types.

    In particular we will consider the matrix A, as defined in the beginning of Section 2, and its
biased analogue Bs ∘ A for the general RSB model. These Markov matrices represent the biases
of the random parts cleaned from the effect of the size of the different populations.
    Consider a Markov matrix M. As formally stated in Appendix A, if we call $\bar{M} \equiv \lim_{\mu\to\infty} M^{\mu}$,
we say that M satisfies a monotone convergence property if, for every pair i, j ∈ {1, . . . , H}, and
for every µ ∈ N, the element $M^{\mu}_{ij}$ satisfies:
   1. if $M_{ij} > \bar{M}_{ij}$, then $M_{ij} \ge M^{\mu}_{ij} \ge M^{\mu+1}_{ij} \ge \bar{M}_{ij}$;
   2. if $M_{ij} < \bar{M}_{ij}$, then $M_{ij} \le M^{\mu}_{ij} \le M^{\mu+1}_{ij} \le \bar{M}_{ij}$.
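The two conditions above can be checked numerically for a given Markov matrix. The sketch below approximates the limit by a high matrix power and tests the chain of inequalities over a finite horizon, so it can refute the property but only support it up to that horizon. The two test matrices, the horizon, the tolerance and the function names are illustrative assumptions: the first is a symmetric two-type matrix with π = 0.75, the second a three-state chain that mostly cycles through its states.

```python
def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def satisfies_monotone_convergence(m, horizon=60, limit_power=2000, tol=1e-12):
    """Check conditions 1 and 2 for mu = 1, ..., horizon.

    The limit of M**mu is approximated by a high power of M.
    """
    size = len(m)
    limit = m
    for _ in range(limit_power):
        limit = matmul(limit, m)
    current = m                        # M**mu, starting at mu = 1
    for _ in range(horizon):
        following = matmul(current, m)  # M**(mu + 1)
        for i in range(size):
            for j in range(size):
                chain = [m[i][j], current[i][j], following[i][j], limit[i][j]]
                pairs = list(zip(chain, chain[1:]))
                if m[i][j] > limit[i][j] + tol:
                    ordered = all(x + tol >= y for x, y in pairs)   # condition 1
                elif m[i][j] < limit[i][j] - tol:
                    ordered = all(x <= y + tol for x, y in pairs)   # condition 2
                else:
                    ordered = True     # entry already at its limit
                if not ordered:
                    return False
        current = following
    return True

symmetric = [[0.75, 0.25], [0.25, 0.75]]
cyclic = [[0.05, 0.90, 0.05],
          [0.05, 0.05, 0.90],
          [0.90, 0.05, 0.05]]

print(satisfies_monotone_convergence(symmetric))
print(satisfies_monotone_convergence(cyclic))
```

In the cyclic matrix the two-step probability of moving from state 1 to state 2 drops well below its long-run value of 1/3, so the entries overshoot their limit and the property fails.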

    The monotone convergence property captures the idea that transition probabilities are monotone
over time. Even with a strictly positive transition matrix, this condition does impose additional
restrictions.17 It is beyond the scope of this paper to find general or even necessary conditions
for monotone convergence of Markov matrices. This monotone convergence property is a sufficient
condition for us to prove the following result.

Proposition 3 In a random-search model RSB with mr < 1 and such that Bs ∘ A satisfies the
monotone convergence property, the partial integration property holds.

    Note that the RSU model is a particular case of the RSB model where Bs is a matrix of all 1’s.
In this way the hypothesis of the monotone convergence property that must hold for Bs ∘ A in the
statement above can be applied directly to A in the RSU model. It is easy to check directly that
the partial integration property holds for both Example 2 of the RSU model, and Examples 3 and
4 of the RSB model.
  17
    Just as a simple illustrative example, consider a Markov process with three states where it is most likely to go
from state 1 to state 2, and 2 to 3, and 3 to 1, but with small probabilities of transitioning directly to other states
too. Then in one period going from 1 to 2 is likely, but then it is unlikely to occur in two periods or three periods, but
more likely in four periods, and so forth. Things eventually converge to equal likelihood on all states, but convergence
is not monotone. One can also find such examples that are more complicated where homophily is present.


3.4      Homophily bias and linking patterns in the RSU model
Having established some general results on the various notions of integration, we now use the
framework of unbiased search to study the patterns of homophily that stem from biases in favor of
same-type links in the matrix Br . Working with a simple example with two types, we examine the
effect of homophily on both the number of in-connections and on their composition, in the short
and long run. Let both types 1 and 2 come with equal population shares p(1) = p(2) = 1/2. Let
type 1 have a strong homophilous bias, and let type 2 have no bias at all, with a particular matrix
of linking probabilities p(θ, θ′):
\[
  \begin{pmatrix} 5/6 & 1/6 \\ 1/2 & 1/2 \end{pmatrix}.
\]
Finally, let n = 10, mr = .5 and ms = .5.
    Figure 1 illustrates the evolution of the in–degree of a representative node born at time t0 =
1000, decomposed by the type of the originating node. It is clear from the figure that homophilous
nodes (type 1) receive more links from both types than the non–homophilous nodes. Intuitively,
type 1 nodes face a higher probability than type 2 nodes of being found at random by type 1 nodes,
and equal probability of being found at random by type 2 nodes, and end up attracting more links
in total.
    Note that this is still consistent with long-run integration. Even though nodes’ in-neighborhoods
homogenize over time, which then means that they face relatively even probabilities of being found
eventually by different sorts of nodes, there is an initial advantage to the homophilous types.
Homophilous types have a higher probability of being found at random, and thus accumulate in-
links at a faster rate in the early part of their lives. Thus, they have a larger in-degree at an
earlier stage in their lives than the unbiased types. The fact that the probability of gaining new
links through search ends up being proportional to in-degree, means that they then maintain this
gap. Thus they grow faster over time, since growth is proportional to in-degree and they gain an
initial advantage which then persists over time, even though the composition of their in-degrees
homogenizes.
    Note that this relation between homophily and total connections is consistent with the dis-
tributions found by Shi, Tseng and Adamic (2009) in the context of scientific citations between
different fields in computer science.18 There, for an average scientific paper, being homophilous
in out-citations is positively correlated with the total number of in–citations. This was explained
by the authors in terms of the positive effect of a homophilous attitude of a given paper, in citing
other papers, on its success in terms of in-citations. In our model the same correlation is explained
instead as the effect of the positive correlation between the homophilous behavior of a paper of
type θ and the behavior of the future papers of the same type. Being part of a homophilous group
favors the process of accumulation of in-connections.
 18
      See Section 4 for an analysis of the application of our model to scientific citations.


Figure 1: In–links by type of receiver (left panel for type 1, right panel for type 2) and of sender (dotted
for type 1)


    Turning now to the composition of in-connections, we note that nodes of the homophilous type
display a relatively lower degree of homophily (measured as the share of same type in-connections
on total connections) than nodes of the unbiased type. This is clear from Figure 2, in which it is
also shown that the difference in the in-degree composition of the two types integrates in the long
run, consistently with Proposition 2. The reason for this somewhat subtle pattern has to do with
the effect of homophily bias on total connections. Homophilous nodes attract more links from both
types, but with a larger proportion from the opposite type at earlier times, while unbiased nodes
end up attracting fewer links, mostly from same-type nodes in the earlier periods.
Figure 2: Left panel: total in–links for a type–1 node (dotted line) and for a type–2 node. Right panel:
share of in–links that are from own type (dotted line for type 1)


    In general, one can show that in the two type case the more homophilous type always receives
more connections from both types in the long run. To see this, we need to use the proof of Proposition 2
(contained in the appendix), where it is shown that the expected number of connections that
a node of type i receives in the long run from a given type j is proportional to the i-th element
of the unit eigenvector of the matrix A containing the meeting probabilities p(θ, θ′). Letting i = 1
be the homophilous type with probability p(θ1 , θ1 ) = π > 1/2, the first element of this eigenvector
is equal to 1/(3 − 2π), while the second element is equal to (2 − 2π)/(3 − 2π). It can easily be
checked that the first element is larger than the second for π > 1/2, and that the difference is
increasing in π.
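The eigenvector elements quoted above can be reproduced numerically. The sketch below assumes that A has rows (π, 1 − π) and (1/2, 1/2), as in the two-type example of this section with an unbiased type 2, and reads "unit eigenvector" as the left eigenvector for eigenvalue 1 with entries summing to one; both readings, and the function names, are assumptions made for illustration.

```python
def unit_left_eigenvector(a, iterations=200):
    """Left eigenvector v = v A with entries summing to 1, by power iteration."""
    size = len(a)
    v = [1.0 / size] * size
    for _ in range(iterations):
        v = [sum(v[i] * a[i][j] for i in range(size)) for j in range(size)]
        total = sum(v)
        v = [x / total for x in v]
    return v

def closed_form(pi):
    """Elements stated in the text: 1/(3 - 2 pi) and (2 - 2 pi)/(3 - 2 pi)."""
    return [1 / (3 - 2 * pi), (2 - 2 * pi) / (3 - 2 * pi)]

for pi in (0.5, 2 / 3, 5 / 6):
    a = [[pi, 1 - pi], [0.5, 0.5]]   # type 1 homophilous, type 2 unbiased (assumed)
    numeric = unit_left_eigenvector(a)
    exact = closed_form(pi)
    print(f"pi = {pi:.3f}: numeric = ({numeric[0]:.4f}, {numeric[1]:.4f}), "
          f"closed form = ({exact[0]:.4f}, {exact[1]:.4f})")
```

Under these assumptions the gap between the two elements is (2π − 1)/(3 − 2π), which is zero at π = 1/2 and grows with π.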

3.5   Aggregate long-run integration
The long-run integration properties described in the previous sections apply to individual nodes,
who eventually homogenize their in-degree. A different question concerns the long-run overall
relation among different types: what is the long-run average in-degree of a given type of nodes
from any other given type? For a formal answer we need an additional definition.

Definition 4 The network formation process satisfies the aggregate long–run integration prop-
erty if the average fraction of in-degree from nodes of various types of all the nodes of the network
of any given type converges to the actual ratios of the overall population.

    Definition 4 applies when the overall populations of different types integrate on average in the
long–run. It is clear that in the simple random model R proportions are fixed and are described by
the matrix Br , so that the aggregate long–run integration property coincides with the (individual)
long–run integration property, and they both hold only under a specific and non–generic case.
    The (individual) long–run integration (Definition 2) is different from the aggregate long–run
integration (Definition 4). Thus, the question is how quickly the aggregation happens, since the
aggregate property requires that long-run integration must occur for most nodes. We now show
that the unbiased random–search model does not satisfy aggregate long–run integration.

Proposition 4 The random-search model with a bias in the random part of the process but unbiased
search (RSU) does not satisfy the aggregate long–run integration property.

    The intuition behind this result is straightforward. Although in the long-run any given node
eventually becomes integrated, there are many relatively young nodes in the system for which their
in-degree is still mostly formed via the random part of the process. In fact, we can see this also
from the out-degree which is always biased for at least the mr fraction of the links formed directly
at random. Even if the search overcomes the other part of the bias, a given fraction of links are
formed in a biased manner, and so integrating over all nodes, in-degree will still be biased over
time.

3.6   On the Dynamics of Out-degrees
So far we have focused on the dynamics of agents’ in-degree. It is of interest to also look at the
composition of the out-degrees, and how this evolves in time. This is for two reasons. First, out-links
may affect welfare and their composition may therefore be relevant. Second, there is a relation
between the evolution of the out-degree of nodes and the tendency of in-degrees to integrate (either
partially or totally).
    We first look at the steady state composition of the out-degree and we focus on the RSU model.
Let us denote by dij,t the proportion of total links that originate from a node of type i born at time
t that are directed towards nodes of type j. The evolution of these proportions in the RSU model
is given by:

\[
  d_{ij,t+1} = (1 - m_s)\,B_r(i,j) + m_s \sum_{h=1}^{H} B_r(i,h)\,\frac{\sum_{\tau=1}^{t} d_{hj,\tau}}{t} \tag{16}
\]

   The out–degree depends on the random part (first term) and on the search part (second term)
through the average out–degree of existing nodes. In matrix form, the steady state relation is
written as follows:

\[
  D_{t+1} = (1 - m_s)\,B_r + m_s\,B_r\,\frac{\sum_{\tau=1}^{t} D_{\tau}}{t} . \tag{17}
\]
    To get a feeling for the limit of this process, it is useful to examine the steady state Ds of this
system. The steady-state is such that the out-degree of each type remains unchanged in time:

                                         Ds = (1 − ms )Br + ms Br Ds ,                                      (18)

         yielding

                                        Ds = (1 − ms ) (I − ms Br )−1 Br .                                  (19)

         Using the algebraic identity
\[
  (I - m_s B_r)^{-1} = \sum_{\mu=0}^{\infty} (m_s B_r)^{\mu} ,
\]
         we obtain the following expression (see footnote 19):
\[
  D_s = B \left[ \frac{1 - m_s}{m_s} \sum_{\mu=1}^{\infty} (m_s A)^{\mu} \right] B^{-1} .
\]
    In the above expression, the matrix in brackets is such that, as ms → 1, the elements of each
column homogenize (see Lemma B of Appendix A). However, full homogeneity only occurs at the
limit ms → 1.
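As a rough numerical check of equations (17) to (19), the following sketch iterates the recursion (17) for a hypothetical two-type symmetric matrix $B_r$ and compares the result with the closed form (19). The values `PI = 0.8` and `M_S = 0.5`, the starting point $D_1 = B_r$, and all function names are assumptions made for illustration; none of them come from the paper's data.

```python
# Minimal sketch (pure Python, 2x2 matrices only); all parameter values are illustrative.

def mat_mul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def steady_state(b_r, m_s):
    """Closed form (19): D_s = (1 - m_s) (I - m_s B_r)^{-1} B_r."""
    a = [[(1.0 if i == j else 0.0) - m_s * b_r[i][j] for j in range(2)]
         for i in range(2)]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    inv = [[a[1][1] / det, -a[0][1] / det],
           [-a[1][0] / det, a[0][0] / det]]
    prod = mat_mul(inv, b_r)
    return [[(1.0 - m_s) * prod[i][j] for j in range(2)] for i in range(2)]

def iterate_recursion(b_r, m_s, steps):
    """Recursion (17): D_{t+1} = (1 - m_s) B_r + m_s B_r (mean of D_1..D_t)."""
    d = [row[:] for row in b_r]          # assumed initial condition D_1 = B_r
    running_sum = [[0.0, 0.0], [0.0, 0.0]]
    for t in range(1, steps + 1):
        running_sum = [[running_sum[i][j] + d[i][j] for j in range(2)]
                       for i in range(2)]
        mean = [[running_sum[i][j] / t for j in range(2)] for i in range(2)]
        search = mat_mul(b_r, mean)
        d = [[(1.0 - m_s) * b_r[i][j] + m_s * search[i][j] for j in range(2)]
             for i in range(2)]
    return d

PI, M_S = 0.8, 0.5                       # hypothetical self-bias and search share
B_R = [[PI, 1.0 - PI], [1.0 - PI, PI]]
D_S = steady_state(B_R, M_S)             # closed-form steady state
D_LATE = iterate_recursion(B_R, M_S, 5000)
GAP = max(abs(D_LATE[i][j] - D_S[i][j]) for i in range(2) for j in range(2))
```

Under these assumptions the iterated matrix approaches the closed form only slowly (the gap shrinks like a power of $t$, not exponentially), and the steady state keeps a self-bias that is smaller than that of $B_r$ but does not vanish.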
    19 Note that the matrix we obtain coincides with the matrix $\bar{D}$, defined in the proof of Proposition 4, which deals with
the aggregate in-degree of types.


    To obtain some insight into the time evolution of the out-degree, let us express equation (17) as
a differential equation, and solve it explicitly (as we have done in (10) and (13) for the in-degree).
    The system is

$$\frac{\partial}{\partial t} \Delta_t = (1 - m_s) B_r + m_s B_r \frac{\Delta_t}{t}, \tag{20}$$

with solution

$$\Delta_t = \bar{D} t + C t^{m_s B_r},$$

where $C$ is a constant matrix.
    For a given initial condition $D_1$ (that we can identify with the matrix $A$ of biases), the solution
for $D_t$ can be written as

$$D_t = \frac{\partial}{\partial t} \Delta_t = \bar{D} + \frac{1}{t} (D_1 - \bar{D}) t^{m_s B_r}, \tag{21}$$

where $\bar{D}$ is a constant term. For $m_s < 1$, the second term approaches the null matrix as $t \to \infty$. As
long as the matrix $D_1$ is more biased than the steady state $\bar{D}$ (which is true for $D_1 = A$), the bias
in excess of the steady state decreases with time, vanishing in the long run (see also Appendix
A).
    This means that the biases in the out-links formed by agents decrease over time, consistent with
the homogenization of the search process and of the in-degree of older nodes, which dominate
the search part of the process. However, unlike the case of the in-degree of old nodes, full homog-
enization does not occur even in the limit, since the random part of the out-degree formation does
not vanish as time grows.

Example 5 Out–degree dynamics for the 2–types symmetric case.

Consider the symmetric RSU case discussed in Example 2, with a self-bias $\frac{1}{2} < \pi < 1$. In that
case, if we define the matrix

$$\bar{D} = \begin{pmatrix} \frac{1}{3-2\pi} & \frac{2(1-\pi)}{3-2\pi} \\ \frac{2(1-\pi)}{3-2\pi} & \frac{1}{3-2\pi} \end{pmatrix},$$

we obtain that the matrix of out-degrees takes the following functional form in matrix notation:

$$D_t = \bar{D} + t^{-\frac{3}{2}+\pi} \begin{pmatrix} \pi - \frac{1}{3-2\pi} & \frac{1}{3-2\pi} - \pi \\ \frac{1}{3-2\pi} - \pi & \pi - \frac{1}{3-2\pi} \end{pmatrix}.$$

As $\frac{1}{2} < \frac{1}{3-2\pi} < \pi$ for every $\frac{1}{2} < \pi < 1$, there are two things to notice:
(1) Matrix $\bar{D}$ is a self-biased stochastic matrix (i.e. it has greater elements on the diagonal), but
it is less biased than $B_r$.
(2) $D_t$ is also a stochastic matrix. The second matrix on the right has positive elements on the
diagonal, while the other two are negative, so that $D_t$ is more self-biased than $\bar{D}$. However, this
additional bias vanishes to 0 with $t^{-\frac{3}{2}+\pi}$. Finally, $D_1 = B_r$ and $D_t$ is less biased than $B_r$ for $t > 1$,
because $t^{-\frac{3}{2}+\pi}$ is 1 for $t = 1$ and then it decreases to 0.
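The two observations in Example 5 can be checked numerically. The sketch below evaluates the closed form for $D_t$ at a few dates; the value `PI_EX = 0.8`, the chosen dates and the function names are illustrative assumptions, not values used in the paper.

```python
# Minimal sketch of the closed form in Example 5; PI_EX = 0.8 is an arbitrary choice.

def d_bar(pi_):
    """Long-run matrix: diagonal 1/(3 - 2*pi), off-diagonal 2*(1 - pi)/(3 - 2*pi)."""
    c = 1.0 / (3.0 - 2.0 * pi_)
    off = 2.0 * (1.0 - pi_) * c
    return [[c, off], [off, c]]

def d_t(pi_, t):
    """Out-degree matrix at time t >= 1: D_bar plus an excess bias decaying as t^(pi - 3/2)."""
    c = 1.0 / (3.0 - 2.0 * pi_)
    decay = t ** (pi_ - 1.5)
    excess = [[pi_ - c, c - pi_], [c - pi_, pi_ - c]]
    base = d_bar(pi_)
    return [[base[i][j] + decay * excess[i][j] for j in range(2)]
            for i in range(2)]

PI_EX = 0.8
PATH = {t: d_t(PI_EX, t) for t in (1, 10, 100, 10**6)}
for t, matrix in PATH.items():
    # First row of D_t: same-type share, then cross-type share.
    print(t, [round(v, 4) for v in matrix[0]])
```

At $t = 1$ the matrix coincides with $B_r$, every row sums to one at all dates, and the diagonal entry falls monotonically towards $\frac{1}{3-2\pi}$ without ever reaching the unbiased value $\frac{1}{2}$.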

4    An illustration using scientific citations
In this section we use our random-search model to study the patterns of cross-field scientific citations
in physics.
    The use of scientific citations is motivated by several factors. First, there is a large body of
literature that shows how key aspects of the time evolution of citations can be captured by models
in which some sort of preferential attachment mechanism is at work. The existence of a cumulative
effect of time was found by Price (1976), and then by Redner (1998) for ISI papers and by Newman
(2009), showing that older papers effectively enjoy a first mover advantage in receiving citations,
independently of the intrinsic quality of the paper. Although some bias in favor of recent papers
seems to allow for a better fit of certain datasets (see Börner, Maru and Goldstone (2004) and
Simkin and Roychowdhury (2007)), the evidence of a rich-gets-richer mechanism seems sound. In
addition, Simkin and Roychowdhury (2005) have shown that this evidence is best accounted for
when preferential attachment is generated by a random-search mechanism such as the one we use in this
paper, where in looking for a citation authors first randomly select papers, and then look at these
papers’ reference lists to randomly pick additional citations.
    There is less evidence on the patterns of citations across disciplines or across other types of
categories in which research may be organized. Among these, several works have shown that
geographical distance and country boundaries are important determinants of citation patterns,
while Lehmann, Lautrup, and Jackson (2003) have shown that citation patterns are quite uniform
across sub-fields in the high energy physics dataset (SPIRES). Also, Shi, Tseng and Adamic (2009)
find a relation between the homophily in citing other papers and the total citations received by
computer science papers (we have discussed this in Section 3.4).
    Summing up, the generative process of citations possesses all the basic aspects of the network
formation process studied in this paper. First, it is a growing network process, since new papers are
written in chronological order, and old papers do not vanish or die. Second, citations are directional,
and only citations from newer to older nodes are possible. Third, citations never disappear, and
accumulate over time. In addition, and specifically to our model, nodes have “types”, which we
identify with the scientific classification of a paper (see below for details). Finally, a key element
of our process is that links are formed both at random and by search through established links. In
the case of citations, these two channels of search are present, since one can distinguish between
citations that come from direct knowledge of a paper and citations that originate from the list of
references of other papers that one has read. So, all the key elements of our formal analysis are
present, and this illustration can be used to test our integration results, and to learn more about
the generative process of citations in general.
    We use the American Institute of Physics (AIP) citations dataset, which reports all the papers
published in journals of the AIP between 1977 and 2007. There is a total of 241749 papers and
1982689 citations (8 citations on average). Around 10 per cent of the papers are never cited, while
the most cited one receives around 3700 citations.
   Types are defined by the first digit of the PACS classification code:

       00:   General;
       10:   The Physics of Elementary Particles and Fields;
       20:   Nuclear Physics;
       30:   Atomic and Molecular Physics;
       40:   Electromagnetism, Optics, Acoustics, Heat Transfer, Classical Mechanics, Fluid Dynamics;
       50:   Physics of Gases, Plasmas, and Electric Discharges;
       60:   Condensed Matter: Structural, Mechanical, and Thermal Properties;
       70:   Condensed Matter: Electronic Structure, Electrical, Magnetic, and Optical Properties;
       80:   Interdisciplinary Physics and Related Areas of Science and Technology;
       90:   Geophysics, Astronomy, and Astrophysics.

    We first note that the time profiles of types’ population shares, measured, for each type and
for each year, as the proportion of the total papers published during that year that are of that
given type, are roughly stationary during the whole period (see Figure 3).20 The approximate
stationarity of most categories is roughly in line with our assumption in the theoretical model that
probabilities of birth of various types are time invariant.
    In order to identify the various elements of our theoretical model, we need to distinguish citations
that originate from a direct random draw from the pool of all existing papers (“random” citations)
from those that originate from a search process that goes through the references contained in one’s
random citations (“search” citations). To do this, we proceed as follows. We identify a citation
from paper A to paper C as a “search” citation if there exists some paper B with the following
properties: 1) B is published after C and before A, 2) A cites B, and 3) B cites C.
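The classification rule can be sketched in a few lines. The snippet below takes A as the citing (most recent) paper and C as the cited (oldest) paper, so that the intermediate paper B falls strictly between them in time. The data structures, the toy papers and the use of strict date comparisons are assumptions made for illustration; the paper does not describe its implementation.

```python
# Minimal sketch of the "search" vs "random" labelling rule; the toy data are hypothetical.

def classify_citations(pub_date, citations):
    """Label each (citing, cited) pair as 'search' or 'random'.

    pub_date:  dict mapping paper id -> publication date (any comparable number)
    citations: iterable of (citing, cited) pairs
    """
    cites = {}
    for citing, cited in citations:
        cites.setdefault(citing, set()).add(cited)

    labels = {}
    for a, refs in cites.items():
        for c in refs:
            # Look for an intermediate paper B that A cites, that cites C,
            # and that was published after C and before A.
            is_search = any(
                c in cites.get(b, ())
                and pub_date[c] < pub_date[b] < pub_date[a]
                for b in refs if b != c
            )
            labels[(a, c)] = "search" if is_search else "random"
    return labels

PUB_DATE = {"A": 2005, "B": 2000, "C": 1995, "D": 1990}
CITATIONS = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D")]
LABELS = classify_citations(PUB_DATE, CITATIONS)
SEARCH_SHARE = sum(v == "search" for v in LABELS.values()) / len(LABELS)
```

In this toy example only the citation from A to C is labelled as search, because B is cited by A, cites C, and was published in between. Ties in publication dates are treated as failing the ordering condition, which is one of several possible conventions.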
    This method obviously has some degree of arbitrariness and will not perfectly identify how the
authors found the papers they cite. The direction of the resulting bias is, however, not clear. On the one
hand, the method overstates the weight of “search” in the citation process, since A may well cite C because
C is an important paper in the field, which is also why B cites C, without A having learned
about C through B. On the other hand, it could be that the authors of paper A know about
paper C only because they came across paper B, which cites C: they could decide to cite only C
because it contains an older version of the same idea. It could also be that some papers are found
through the search process, without the authors ever citing the intermediate paper, and so some
  20
    The only two sharp changes in the time profiles are around 1990 for type 10 (Physics of Elementary Particles
and Fields) and type 70 (Condensed Matter: Electronic Structure, Electrical, Magnetic, and Optical Properties).
These changes are explained by looking at more detailed classification of types. The increase of type 70 is driven
by the sharp increase in the subcategory 74 “Superconductivity”, to be put in relation with the fast development of
the computer industry; the sharp decrease of type 10 is mainly driven by a decrease in the subcategory 11 “General
theory of fields and particles”.


[Figure 3: Shares of types’ proportions in time. x-axis: year (1970 to 2010); y-axis: population share (0 to .4); one series per type, type_0 to type_9.]


citations are coded as random even though they were found through search. We stick with the
strict interpretation of the model, given that we have no other way of identifying the actual process
that the authors followed.
    Using this method we identify 59 percent of total citations as “search” citations. We then
classify the remaining 41 percent of citations as “random” citations, being the complement of the
“search” citations.

4.1   Homophily Bias in Random Out-Citations
In order to identify the bias in the random part of the process, we compare the share of “random”
out-citations that are of the same type as the citing paper with the population share of the type
of the citing paper. The first share (qout in Table 1) is obtained by averaging the share of random
same-type out-citations of all papers of a given type during the whole time period. The second
share (w in Table 1) is obtained as the share of papers of a given type over all papers in the sample
for the whole time period.
    The difference between these two shares is positive for all types, with a maximum value of about
0.8 for type 20, a minimum value of 0.33 for type 80 (Interdisciplinary Physics), and an average value of
0.63. Normalizing, for each type, this difference by the maximal potential difference, given by
one minus the population share of the type, we obtain the Coleman (1958) homophily index of each
type (ih in Table 1).21 This index turns out not to be correlated with types’ population shares.

              type   00      10       20     30     40       50      60      70     80      90
              qout   0.67    0.85     0.87   0.72   0.64     0.77    0.64    0.86   0.35    0.67
               w     0.11    0.11     0.08   0.08   0.06     0.016   0.14    0.35   0.02    0.03
               ih    0.63    0.83     0.86   0.70   0.62     0.76    0.58    0.79   0.33    0.66

                               Table 1: Same-type bias in the overall citations.
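The normalization can be written as $ih = (q_{out} - w)/(1 - w)$. The following sketch recomputes the index from the rounded values of qout and w reported in Table 1. Because the inputs are rounded to two or three decimals, the recomputed values may differ from the published ih row in the last digit; the dictionary names and the helper function are illustrative assumptions.

```python
# Minimal sketch: Coleman homophily index from the rounded entries of Table 1.

Q_OUT = {"00": 0.67, "10": 0.85, "20": 0.87, "30": 0.72, "40": 0.64,
         "50": 0.77, "60": 0.64, "70": 0.86, "80": 0.35, "90": 0.67}
W = {"00": 0.11, "10": 0.11, "20": 0.08, "30": 0.08, "40": 0.06,
     "50": 0.016, "60": 0.14, "70": 0.35, "80": 0.02, "90": 0.03}
IH_TABLE = {"00": 0.63, "10": 0.83, "20": 0.86, "30": 0.70, "40": 0.62,
            "50": 0.76, "60": 0.58, "70": 0.79, "80": 0.33, "90": 0.66}

def coleman_index(q_same, share):
    """Excess same-type share, normalized by its maximal possible value 1 - share."""
    return (q_same - share) / (1.0 - share)

IH = {k: coleman_index(Q_OUT[k], W[k]) for k in Q_OUT}

for k in sorted(IH):
    # Recomputed index next to the published one, for comparison.
    print(k, round(IH[k], 3), IH_TABLE[k])
```

The index equals 0 when a type cites itself in proportion to its population share and 1 when it only cites itself, which is what makes groups of different sizes comparable.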

4.2    Search Bias, Long-run integration, and Partial Integration
One challenge with an empirical investigation of the various concepts of integration is that certain
papers happen to be intrinsically more cited than others, simply because they are more fundamental
or important than others for their discipline. This type of “fitness” is independent of the age of
the paper, and is not modeled in our analysis. More importantly, it could potentially outweigh the
effect of time, and of the large in-degree that older nodes accumulate in time, which is one of the
forces behind the long-run integration property.
    We deal with this problem by looking at the type-composition of the τ –th citation of each
paper, thereby replacing time with citation order. This allows us to normalize the time–scale of
each single paper, as if they all had the same fitness. In this new context, the hypothesis we
are testing is whether the homophily of the in-degrees of a paper decreases with the order of its
in-citations, getting close to the relative size of that paper’s type as this order gets large. This is
meant to capture the main force that leads to partial integration: the growth of nodes’ in-degree
is to a large extent composed of in-citations of the “search” type, which are, in the case of unbiased
search, less biased towards one’s own type than in-citations of the “random” kind.
    In Figure 4 we illustrate the share of same type in-citations ordered by types’ population shares.
Each dot measures on the x–axis the population share of a given type (measured as the average
over the whole time period), and on the y–axis the average value (taken over all papers of that
given type) of the share of same type in-citations out of the first τ in-citations.
    The key feature of Figure 4 is that shares of same-type in-citations decrease uniformly with
the in-degree of nodes for all types in the sample. Since the absolute levels of these shares are well
above the levels of population shares for small in-degrees, this suggests that the citation process
becomes less and less biased towards own type as in-degrees become large.
    Thus, what we observe is consistent with partial integration. In particular, this trend is con-
sistent with our theoretical analysis of the more prevalent role of search over time, provided the
  21
    This normalization has the purpose of allowing for meaningful comparison of groups of different sizes, by taking
into account the maximal potential amount of homophily that each group has. See Currarini, Jackson and Pin (2009)
for more discussion.



[Figure 4: Shares of same-type in-citations by order of citation. x-axis: population share (0 to .4); y-axis: share of same-type in-citations (0 to .6); series: first 5, 10, 15, 20, 25 and 30 in-citations, and the 45 degree line.]

[Figure 5: Shares of same-type in-citations by order of citation: marginal. x-axis: population share (0 to .15); y-axis: share of same-type in-citations (0 to .6); series: first 5, 10, 15, 20, 25 and 30 in-citations, and the 45 degree line.]
search process is less biased than the random process. In the limit, if search were unbiased, we
should observe long-run integration, that is, the share of same-type in-citations coinciding with
the 45 degree line. This trend is not found in Figure 4, where same-type shares are significantly
flatter than the 45 degree line, and become flatter for larger degrees. Interestingly, this behavior
seems to be driven by a single observation (type 70: “Condensed Matter: Electronic Structure,
Electrical, Magnetic, and Optical Properties”), which refers to the
largest group in the sample. If we omit this single type, we obtain the trend in Figure 5, where
the regressed patterns of same-type shares uniformly approach the 45 degree line for larger and
larger in-degrees.


5    Concluding Remarks
Our interest in this paper has been the extent to which biases in the way agents link to each other
(that is, biases in the process of network formation) translate into biases in the patterns of the actual
network (that is, in the outcome of the process). Our analysis provides one basic insight: when
some of the connections are formed through a network-based search process (friends of friends),
the type-composition of agents’ neighborhoods homogenizes in the long run, and in particular, full
integration of types occurs when the search part of the formation process is unbiased.
    As we have pointed out, the mechanism at work is intuitive: as nodes age, they accumulate
more and more links through the rich-gets-richer dynamics of the search part of the process, ending
up attracting links from all types (even from those types by which they are discriminated in the
random part). Through these connections, they are found through search by even more nodes of
all types, and the mechanism reinforces itself becoming less biased over time. As time elapses, old
nodes are found by all types at rates that mirror population shares.
    Two things account for the integration in the RSU model: over time the probability of being
found at random vanishes compared to the probability of being found through search; and the in-
degree of old nodes becomes less biased, which then further reduces the biases in the probabilities
that older nodes are found by newborn nodes of various types. Both conditions are made possible
by the passing of time, which increases the total population and the in-degree of old nodes on one
hand, and homogenizes in- (and out-) degree by mixing the meeting biases through the cumulative
mechanism described in the proofs of the main propositions.
    We remark that it is not enough simply for the search part of the process to dominate, but
one also needs the gradual homogenization of that process over time, as the more an old node gets
found by other types, the easier it becomes for it to be found by other types in the future. The
distinction between these is clear if we examine the limit as $m_r \to 0$ in the RSU model; looking at
the proof of Proposition 2, we note that the low powers of the matrix $A$ of biases, which are not
homogeneous, still maintain weight in the average defining the in-degree as long as the age of the
node, given by the ratio $t/t_0$, stays “small”.
    If we examine a different model, then one could obtain the immediate integration of all nodes
as $m_r \to 0$. The difference would be that instead of having biased-random and unbiased-search,
one could have biased-random and unbiased-preferential attachment (as a variant of Albert and
Barabási (1999)). This would parallel our model, but uncouple the search part of the process from
the bias in the randomly selected nodes whose neighborhoods are searched. To parallel the RSU
model, one can also assume that the preferential attachment part is unbiased, in the sense that
the probability of a node being found is directly proportional to its relative in-degree in the whole
population, irrespective of its type.
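   Using the same notation as in the previous sections, we can express the probability that node $j$
obtains a link from a $\theta$ node at $t + 1$ as follows:

$$P_j^{t+1}(\theta, \theta_j) = \frac{n m_r}{t} B_r(\theta, \theta_j) + n m_p\, p(\theta) \frac{\sum_{\theta' \in \Theta} \Pi_j^t(\theta', \theta_j)}{n t}. \tag{22}$$

Using a mean-field approximation, we express the change in the in-degree of node $j$ as:

$$\frac{\partial}{\partial t} \Pi_j^t(\theta, \theta_j) = \frac{n m_r}{t} B_r(\theta, \theta_j) + \frac{m_p\, p(\theta)}{t} \sum_{\theta' \in \Theta} \Pi_j^t(\theta', \theta_j). \tag{23}$$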

     When t grows large, the random part of the process vanishes and long-run integration occurs.
Also, as mr → 0, nodes are found almost exclusively in a type-blind manner, and then all nodes integrate.
Note also that for large enough values of time, this would happen for nodes that have a large in-
degree irrespective of their age (due, for instance, to some “fitness” node-specific parameter).
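To see how the mean-field equation (23) produces long-run integration, one can integrate it numerically for a single node. The sketch below uses the change of variable $s = \ln(t/t_0)$, under which (23) becomes a linear system with constant coefficients, and applies a simple Euler scheme. The two-type setting, the values of `B_COL`, `P`, `N_M_R` and `M_P`, the zero in-degree at birth and the step size are all illustrative assumptions.

```python
# Minimal sketch: Euler integration of the mean-field equation (23) in log-age.
# Index 0 is the node's own type; all parameter values are hypothetical.

def same_type_share_path(b_col, p, n_m_r, m_p, log_age_max, steps):
    """Return the same-type share of in-links along s = ln(t / t0).

    b_col[k]: random-meeting bias B_r(k, own type) of type k towards the node
    p[k]:     population share of type k
    """
    ds = log_age_max / steps
    in_deg = [0.0] * len(p)              # assumed: no in-links at birth
    shares = []
    for _ in range(steps):
        total = sum(in_deg)
        # d(in_deg_k)/ds = n*m_r*B_r(k, own) + m_p*p(k)*(total in-degree)
        in_deg = [in_deg[k] + ds * (n_m_r * b_col[k] + m_p * p[k] * total)
                  for k in range(len(p))]
        shares.append(in_deg[0] / sum(in_deg))
    return shares

B_COL = [0.8, 0.2]      # own type is over-represented in random meetings
P = [0.5, 0.5]          # equal population shares
N_M_R, M_P = 1.0, 0.5
SHARES = same_type_share_path(B_COL, P, N_M_R, M_P, log_age_max=10.0, steps=10000)
```

Under these assumptions the same-type share starts at the random-meeting bias (0.8 here) and declines towards the population share (0.5) as the node ages, because the unbiased preferential-attachment term grows like a power of $t$ while the biased random term only grows logarithmically.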
     More research is needed to incorporate strategic elements into the link formation process. As
it is, the model represents situations in which the meeting biases come from exogenous constraints
(institutional, geographical, organizational barriers or underlying preferences), and agents cannot
affect the induced probabilities. Interesting considerations are likely to arise when such options are
allowed, and when agents anticipate the outcome of link formation on the type mix of their in-degree
and on their welfare (this is also suggested by an example in Bramoullé and Rogers (2009)).
We believe that these issues, and more general analyses of the homophily and dynamic integration
of network formation processes, lie at the heart of the research agenda in the field, and will be the
object of future research.


References
 [1] Albert R. and A.–L. Barabási (1999), “Emergence of Scaling in Random Networks,” Science
     286, 509–512.

 [2] Börner K., J.T. Maru and R.L. Goldstone (2004) “The simultaneous evolution of author and
     paper networks,” PNAS 101, 5266–5273.

 [3] Bramoullé P. and B. Rogers (2009) “Diversity and Popularity in Social Networks”, mimeo.


 [4] Breschi, S. and F. Lissoni (2006): “Mobility of inventors and the geography of knowledge
     spillovers. New evidence on US data,” CESPRI WP n. 184.

 [5] Coleman, J. (1958) “Relational analysis: the study of social organizations with survey meth-
     ods,” Human Organization, 17, 28–36.

 [6] Currarini, S., M.O. Jackson, and P. Pin (2006) “Overlapping Network Formation,” mimeo:
     Stanford University.

 [7] Currarini, S., M.O. Jackson, and P. Pin (2009) “An Economic Model of Friendship: Homophily,
     Minorities and Segregation,” Econometrica 77 (4), 1003–1045.

 [8] Currarini, S., M.O. Jackson, and P. Pin (2010) “Identifying the Roles of Choice and Chance in
     Network Formation: Racial Biases in High School Friendships”, Proceedings of the National
     Academy of Sciences, 107, 4857–4861.

 [9] Koczyy, L. S., A. Nichiforz and M. Strobel (2010) “Intellectual Influence: Quality versus
     Quantity”, Mimeo.

[10] Jaffe, A. B. and M. Trajtenberg (1996): “Flows of knowledge from universities and federal
     laboratories: Modeling the flow of patent citations over time and across institutional and
     geographic boundaries,” PNAS 93: 12671–12677.

[11] Jackson, M.O., (2007) “Social Structure, Segregation, and Economic Behavior,” pre-
     sented as the Nancy Schwartz Memorial Lecture, 2007; SSRN working paper 1530885,
     http://ssrn.com/abstract=1530885.

[12] Jackson, M.O. (2008) Social and Economic Networks, Princeton University Press.

[13] Jackson, M.O. and B. Rogers (2007): “Meeting strangers and friends of friends : How random
     are social networks?” American Economic Review 97 (3), 890–915.

[14] Lehmann, S., B. Lautrup and A.D. Jackson (2003): “Citation networks in high energy physics,”
     Phys. Rev. E 68, 026113.

[15] McPherson, M., L. Smith-Lovin and J. M. Cook (2001): “Birds of a Feather: Homophily in
     Social Networks,” Annual Review of Sociology 27, 415–44.

[16] Newman, M. E. J. (2009): “First-mover advantage in scientific publication,” Europhys. Lett.
     86, 68001.

[17] Palacios–Huerta, I., and O. Volij (2004) “The Measurement of Intellectual Influence,” Econo-
     metrica 72 (3), 963–977.


[18] Pin, P. (2007) Four multi-agents economic models: From evolutionary competition to social
     interaction, PhD Thesis, University of Venice.

[19] Price, D.J.S., (1965) “Networks of scientific papers.” Science 149, 510 - 515.

[20] Price, D.J.S., (1976) “A general theory of bibliometric and other cumulative advantage pro-
     cesses.” J. Am. Soc. Inf. Sci 27, 292 - 306.

[21] Redner, S. (1998) “How popular is your paper? An empirical study of the citation distribu-
     tion,” Eur. Phys. J. B 4: 131–134.

[22] Rinia, E. J., T. N. van Leeuwen, E. E. W. Bruins, H. G. van Vuren, and A. F. J. van Raan
     (2001) “Citation delay in interdisciplinary knowledge exchange,” Scientometrics 51 (1), 293–
     309.

[23] Shi, X., B. Tseng and L. Adamic (2009) “Information Diffusion in Computer Science Citation
     Networks,” Proceedings of the Third International ICWSM Conference.

[24] Simkin M.V. and V.P. Roychowdhury (2007) “A mathematical theory of citing,” Journal of the
     American Society for Information Science and Technology, 58(11): 1661–1673.


Appendix A             Some results on Markov Matrices
This first Section of the Appendix provides some results that are necessary for the proofs of our
results. Take an H × H Markov matrix M with all positive elements, i.e. a positive Markov matrix.

Lemma A For every $x > 0$ the $H \times H$ matrix

$$M(x) \equiv (e^x - 1)^{-1} \sum_{\mu=1}^{\infty} \frac{x^\mu}{\mu!} M^\mu = \frac{\exp(Mx) - I}{\exp(x) - 1}$$

is a Markov matrix.

Proof: For every $\mu \in \mathbb{N}$, $M^\mu$ is a Markov matrix. To show that $M(x)$ is a Markov matrix, we need
to prove that for every $i, j \in \{1, \dots, H\}$ we have that $0 < [M(x)]_{ij} < 1$, and that $\sum_{k=1}^{H} [M(x)]_{kj} = 1$.
The first condition comes from the fact that $[M(x)]_{ij}$ is a convex combination of (an infinite number
of) probabilities.
The second condition comes from the fact that

$$\sum_{k=1}^{H} [M(x)]_{kj} = (e^x - 1)^{-1} \sum_{\mu=1}^{\infty} \frac{x^\mu}{\mu!} \sum_{k=1}^{H} [M^\mu]_{kj} = (e^x - 1)^{-1} \sum_{\mu=1}^{\infty} \frac{x^\mu}{\mu!} = 1.$$




  $M(x)$ can be seen as a weighted average of the infinite elements of $\{M^\mu\}_{\mu \in \mathbb{N}}$.
We know that

$$\lim_{\mu \to \infty} M^\mu = \begin{pmatrix} v(M) \\ \vdots \\ v(M) \end{pmatrix}, \tag{a}$$

where the row-vector $v(M)$ is the unique eigenvector associated with eigenvalue 1 of matrix $M$ (up
to a normalization that its elements sum to one, by the Perron-Frobenius Theorem). We define
this matrix at the limit, with all equal elements on each column, as $\bar{M}$. Now we prove a relation
between the limit of $M(x)$ and $\bar{M}$.

Lemma B For every positive Markov matrix $M$, and for every couple $i, j \in \{1, \dots, H\}$, we have
that

$$\lim_{x \to \infty} [M(x)]_{ij} = \lim_{\mu \to \infty} [M^\mu]_{ij} = [\bar{M}]_{ij}.$$

Proof: By definition of $\bar{M}$, for every $\varepsilon > 0$ there is a number $\bar{k} \in \mathbb{N}$ such that, for every $\mu > \bar{k}$,
we have $|[M^\mu]_{ij} - [\bar{M}]_{ij}| < \varepsilon$. By driving $x \to \infty$ we can drive to 0 the weight $\frac{x^\nu}{(e^x - 1)\nu!}$ of every
$\nu < \bar{k}$. In this way $[M(x)]_{ij}$ becomes a weighted average of almost only elements from $\{M^\mu\}_{\mu \in \mathbb{N}}$
with $\mu > \bar{k}$. As for all of them we have $|[M^\mu]_{ij} - [\bar{M}]_{ij}| < \varepsilon$, we have the result.
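Lemmas A and B can be illustrated numerically by truncating the series that defines $M(x)$. The sketch below uses a symmetric 2×2 positive Markov matrix, so that row sums and column sums coincide and the limit matrix $\bar{M}$ has all entries equal to $\frac{1}{2}$. The example matrix, the truncation at 200 terms and the function names are assumptions made for illustration.

```python
# Minimal sketch: truncated-series evaluation of M(x) for a small Markov matrix.
import math

def mat_mul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def m_of_x(m, x, terms=200):
    """Approximate M(x) = (e^x - 1)^{-1} * sum_{mu>=1} x^mu / mu! * M^mu."""
    n = len(m)
    acc = [[0.0] * n for _ in range(n)]
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # M^0
    weight = 1.0                                                            # x^0 / 0!
    for mu in range(1, terms + 1):
        power = mat_mul(power, m)        # now M^mu
        weight *= x / mu                 # now x^mu / mu!
        for i in range(n):
            for j in range(n):
                acc[i][j] += weight * power[i][j]
    norm = math.expm1(x)                 # e^x - 1
    return [[acc[i][j] / norm for j in range(n)] for i in range(n)]

M_EXAMPLE = [[0.8, 0.2], [0.2, 0.8]]     # hypothetical symmetric Markov matrix
M_AT = {x: m_of_x(M_EXAMPLE, x) for x in (0.5, 5.0, 30.0)}
```

For each value of $x$ the rows of the computed matrix sum to one (up to floating-point error), and the diagonal entry moves from a value close to that of $M$ for small $x$ towards $\frac{1}{2}$ for large $x$. The truncation is adequate only for moderate $x$; much larger values would need more terms or a different evaluation method.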


Definition 5 $M$ satisfies the monotone convergence property if, for every couple $i, j \in \{1, \dots, H\}$,
and for every $\mu \in \mathbb{N}$, the element $M^\mu_{ij}$ has the following properties:

  1. if $M_{ij} > \bar{M}_{ij}$, then $M_{ij} \geq M^\mu_{ij} \geq M^{\mu+1}_{ij} \geq \bar{M}_{ij}$;

  2. if $M_{ij} < \bar{M}_{ij}$, then $M_{ij} \leq M^\mu_{ij} \leq M^{\mu+1}_{ij} \leq \bar{M}_{ij}$.

    What comes out directly from the definition is that, if $M_{ij} > \bar{M}_{ij}$, then there is at least one $\mu$
for which the inequality is strict, i.e. $M^\mu_{ij} > M^{\mu+1}_{ij}$.

Lemma C For every couple i, j ∈ {1, . . . , H}, and for every x > 0, if M satisfies the monotone
convergence property, then

  1. if $M_{ij} > \bar{M}_{ij}$, then $\frac{\partial}{\partial x}[M(x)]_{ij} < 0$;

  2. if $M_{ij} < \bar{M}_{ij}$, then $\frac{\partial}{\partial x}[M(x)]_{ij} > 0$.

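Proof: We focus on case 1, as the other is proven by reversing inequalities.
  First, note that the function
\[
\frac{\mu}{x}\,(e^{x}-1) - e^{x}
\]
is negative if and only if
\[
\mu < \frac{x e^{x}}{e^{x}-1} .
\]
Let us call ν(x) the minimum integer strictly above $\frac{x e^{x}}{e^{x}-1}$, i.e. $\nu(x) \equiv \left\lceil \frac{x e^{x}}{e^{x}-1} \right\rceil$.
   Now we can show that
\begin{align*}
\frac{\partial}{\partial x}[M(x)]_{ij}
 &= \frac{1}{(e^{x}-1)^{2}} \sum_{\mu=1}^{\infty} \frac{x^{\mu}}{\mu!}\left[\frac{\mu}{x}\,(e^{x}-1) - e^{x}\right] [M^{\mu}]_{ij} \\
 &< \frac{1}{(e^{x}-1)^{2}} \sum_{\mu=1}^{\infty} \frac{x^{\mu}}{\mu!}\left[\frac{\mu}{x}\,(e^{x}-1) - e^{x}\right] \left[M^{\nu(x)}\right]_{ij} . \tag{b}
\end{align*}
It is a matter of calculus to check that
\[
\sum_{\mu=1}^{\infty} \frac{x^{\mu}}{\mu!}\left[\frac{\mu}{x}\,(e^{x}-1) - e^{x}\right] = 0 ,
\]
and then the derivative in (b) is strictly negative.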
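
The calculus step can be spelled out using only the power series of the exponential. The short derivation below is added as a sketch and is not part of the original proof text:

```latex
\begin{align*}
\sum_{\mu=1}^{\infty}\frac{x^{\mu}}{\mu!}\left[\frac{\mu}{x}\,(e^{x}-1)-e^{x}\right]
 &= (e^{x}-1)\sum_{\mu=1}^{\infty}\frac{x^{\mu-1}}{(\mu-1)!}
    \;-\; e^{x}\sum_{\mu=1}^{\infty}\frac{x^{\mu}}{\mu!} \\
 &= (e^{x}-1)\,e^{x} \;-\; e^{x}\,(e^{x}-1) \;=\; 0 .
\end{align*}
```

Because the common factor $[M^{\nu(x)}]_{ij}$ does not depend on µ, it can be taken out of the sum on the right-hand side of the inequality, so that bound is exactly zero.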


Appendix B            Proofs
Proof of Proposition 1: We provide the details of the proof for the RSU case, which easily
extends to the other cases.
Note first that the node born at time t in Definition 1 has, at the beginning of time t + 1 (before
node t + 1 sends its links), an in-degree of 0. This directly implies that the probability that node t
receives a link at time t + 1 from a node of type θ', given that such a node is born, is equal to the
probability of being found at random among the t nodes in the network. This probability is equal
to:
\[
\frac{n m_r}{t}\, p(\theta', \theta(t)) . \tag{c}
\]
On the other hand, the probability that node t0 is found at time t + 1 is the sum of the
probabilities of being found at random and of being found through search. In the model with
homogeneous search, this is:
\[
\frac{n m_r}{t}\, p(\theta', \theta(t_0)) + n m_s \sum_{\theta\in\Theta} p(\theta', \theta)\, \frac{\Pi^{t}_{t_0}(\theta, \theta(t_0))}{t\, p(\theta(t_0))}\, \frac{1}{n} . \tag{d}
\]
Note that in (d) the terms in the vector $\Pi^{t}_{t_0}(\theta, \theta(t_0))$ grow without bound as t tends to infinity,
while the first term in (d) and the term in (c) are constant once t is eliminated from the denominator of
both expressions. It follows that we can always choose a t large enough for (d) to be larger than
(c).

Proof of Proposition 2: We want to see how the matrix $\Pi^{t}_{t_0}$ of type–by–type links for a node
born at time t0 develops. To do this we compare its behavior with the behavior of the type–blind
process, where the in–links evolve according to²²
\[
\pi_{t_0}(t) = n\,\frac{m_r}{m_s}\left[\left(\frac{t}{t_0}\right)^{m_s} - 1\right] .
\]
To make this comparison in the long run we study
\[
\lim_{t\to\infty} \frac{\Pi^{t}_{t_0}}{\pi_{t_0}(t)} . \tag{e}
\]



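   Consider the solution to the RSU model, as described by equation (13), with the decomposition
$B_r = Q A Q^{-1}$.
   We rewrite (13) as:
\[
\Pi^{t}_{t_0} = n\,\frac{m_r}{m_s}\left[\left(\frac{t}{t_0}\right)^{m_s Q A Q^{-1}} - I\right] .
\]
By (2.1), and the facts that $I = Q I Q^{-1}$ and $(Q A Q^{-1})^{\mu} = Q A^{\mu} Q^{-1}$, we obtain:
\begin{align*}
\Pi^{t}_{t_0}
 &= n\,\frac{m_r}{m_s}\, Q \left[\sum_{\mu=0}^{\infty} \frac{\left(m_s \log\frac{t}{t_0}\, A\right)^{\mu}}{\mu!} - I\right] Q^{-1} \\
 &= n\,\frac{m_r}{m_s}\left[\left(\frac{t}{t_0}\right)^{m_s} - 1\right]
    Q \left[\left(\left(\frac{t}{t_0}\right)^{m_s} - 1\right)^{-1} \sum_{\mu=1}^{\infty} \frac{\left(m_s \log\frac{t}{t_0}\right)^{\mu}}{\mu!}\, A^{\mu}\right] Q^{-1} . \tag{f}
\end{align*}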


      Limit (e) then becomes (we use Lemma B from Appendix A)
\begin{align*}
\lim_{t\to\infty} \frac{\Pi^{t}_{t_0}}{\pi_{t_0}(t)}
 &= \lim_{t\to\infty} Q \left[\left(\left(\frac{t}{t_0}\right)^{m_s} - 1\right)^{-1} \sum_{\mu=1}^{\infty} \frac{\left(m_s \log\frac{t}{t_0}\right)^{\mu}}{\mu!}\, A^{\mu}\right] Q^{-1} \\
 &= Q \left[\lim_{\mu\to\infty} A^{\mu}\right] Q^{-1} \\
 &= Q \begin{pmatrix} v(A) \\ \vdots \\ v(A) \end{pmatrix} Q^{-1} , \tag{g}
\end{align*}
where the row–vector v(A) is the unique eigenvector associated with eigenvalue 1 of matrix A
(normalized to sum to 1). In this way, in the long run a node of type i born at time t0 receives a
fraction of in–links from nodes of type j which is approximately a ratio
\[
p(j)\,\frac{v(A)_i}{p(i)}
\]
of the overall links that it would receive in a type–blind process. This proportion is the product
of p(j) times a term that is constant for type i.

 22
      This process reduces to the 1–type case studied in Jackson and Rogers (2007).

   Consider now the solution to the RSB model, as described by equation (15). Since $B_s A$ is still a
Markov matrix (see Appendix C), we can follow the same procedure as above and obtain
\[
\lim_{t\to\infty} \frac{\Pi^{t}_{t_0}}{\pi_{t_0}(t)} = (B_s B_r)^{-1} B_r\, Q \begin{pmatrix} v(B_s A) \\ \vdots \\ v(B_s A) \end{pmatrix} Q^{-1} . \tag{h}
\]

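   In the long run a node of type i born at time t0 will receive a number of in–links from nodes of
type j which is a fraction
\begin{align*}
\left[\lim_{t\to\infty} \frac{\Pi^{t}_{t_0}}{\pi_{t_0}(t)}\right]_{ij}
 &= \sum_{h=1}^{H} \left[(B_s B_r)^{-1} B_r\right]_{jh}\, p(h)\, \frac{v(B_s A)_i}{p(i)} \\
 &= \sum_{h=1}^{H}\sum_{k=1}^{H} \left[(B_s B_r)^{-1}\right]_{jk} [B_r]_{kh}\, p(h)\, \frac{v(B_s A)_i}{p(i)} \\
 &= \sum_{h=1}^{H}\sum_{k=1}^{H} \left[(B_s B_r)^{-1}\right]_{jk}\, p(k)\, \frac{p(k,h)}{p(h)}\, p(h)\, \frac{v(B_s A)_i}{p(i)} \\
 &= \sum_{h=1}^{H}\sum_{k=1}^{H} \left[(B_s B_r)^{-1}\right]_{jk}\, p(k)\, p(k,h)\, \frac{v(B_s A)_i}{p(i)} \\
 &= \sum_{k=1}^{H} \left[(B_s B_r)^{-1}\right]_{jk}\, p(k)\, p(j)\, \frac{v(B_s A)_i}{p(i)} \tag{i}
\end{align*}
of the overall links that it would receive in a type–blind process, where the last line comes from
the fact that $\sum_{h=1}^{H} p(k, h) = 1$. The second term is still a constant for type i, but the first term is
generically not proportional to p(j).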

Proof of Proposition 3: Let us start from the RSU model (which is a particular case of the RSB
model such that Bs is a matrix of all 1’s). The result comes from the expression of matrix $\Pi^{t}_{t_0}$ as
defined in equation (f), in the Proof of Proposition 2:
\[
\frac{\Pi^{t}_{t_0}}{\pi_{t_0}(t)} = Q \left[\left(\left(\frac{t}{t_0}\right)^{m_s} - 1\right)^{-1} \sum_{\mu=1}^{\infty} \frac{\left(m_s \log\frac{t}{t_0}\right)^{\mu}}{\mu!}\, A^{\mu}\right] Q^{-1} .
\]
Here $\left(\left(\frac{t}{t_0}\right)^{m_s} - 1\right)^{-1}$ is just a rescaling term so that the matrix in brackets is again a Markov
matrix (see Lemma A in Appendix A). From the Proof of Proposition 2 we know that it converges
to the distribution of the population shares. As A satisfies the monotone convergence property, we
can apply Lemma C from Appendix A to prove that this convergence is monotonic.

   The general RSB case is analogous, with the same distinctions discussed above in the Proof of
Proposition 2. In this case
\[
\frac{\Pi^{t}_{t_0}}{\pi_{t_0}(t)} = (B_s B_r)^{-1} B_r\, Q \left[\left(\left(\frac{t}{t_0}\right)^{m_s} - 1\right)^{-1} \sum_{\mu=1}^{\infty} \frac{\left(m_s \log\frac{t}{t_0}\right)^{\mu}}{\mu!}\, (B_s A)^{\mu}\right] Q^{-1} .
\]
As Bs A satisfies the monotone convergence property, we can apply Lemma C from Appendix A
and the same reasoning applies.

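Proof of Proposition 4: From (13) we can compute the aggregate in–degree of all the nodes in
the network. To do this we simply integrate the expression over time, using the fact that
$\int_0^t \left(\log\frac{t}{\tau}\right)^{\mu} d\tau = t\,\mu!$ (substitute $u = \log(t/\tau)$ and recall that $\int_0^{\infty} u^{\mu} e^{-u}\, du = \mu!$):
\begin{align*}
\int_0^t \Pi^{t}_{\tau}\, d\tau
 &= \int_0^t \frac{m_r}{m_s}\, n \sum_{\mu=1}^{\infty} \frac{\left(\log\frac{t}{\tau}\, m_s B_r\right)^{\mu}}{\mu!}\, d\tau \\
 &= n\,\frac{m_r}{m_s} \sum_{\mu=1}^{\infty} \frac{\left[\int_0^t \left(\log\frac{t}{\tau}\right)^{\mu} d\tau\right] (m_s B_r)^{\mu}}{\mu!} \\
 &= n\,\frac{m_r}{m_s} \sum_{\mu=1}^{\infty} \frac{t \cdot \mu! \cdot (m_s B_r)^{\mu}}{\mu!} \\
 &= t \cdot n\,\frac{m_r}{m_s} \sum_{\mu=1}^{\infty} (m_s B_r)^{\mu} . \tag{j}
\end{align*}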

Column j of this matrix represents the aggregate in–degrees received by the t · p(θj ) nodes of type
j born up to time t. To get the average in–degree for each type we can divide every column by
t · p(θj ), that is right–multiply the whole matrix by $(tB)^{-1}$, obtaining (remember that mr = 1 − ms ):
\[
n \bar{D} Q^{-1} \equiv n \left[\frac{1-m_s}{m_s} \sum_{\mu=1}^{\infty} (m_s B_r)^{\mu}\right] Q^{-1} . \tag{k}
\]
We have defined the matrix
\[
\bar{D} = \frac{1-m_s}{m_s} \sum_{\mu=1}^{\infty} (m_s B_r)^{\mu} = Q \left[\frac{1-m_s}{m_s} \sum_{\mu=1}^{\infty} (m_s A)^{\mu}\right] Q^{-1} .
\]
It is a constant quantity of the system. The term in brackets, $\frac{1-m_s}{m_s}\sum_{\mu=1}^{\infty}(m_s A)^{\mu}$, is a Markov
matrix (Lemma A in Appendix A). It approaches $\bar{A}$ as ms → 1 (Lemma B in Appendix A).
This means that the more the search process dominates the random one, the more the steady-state
distribution among types approaches homogeneity. Nevertheless, for any fixed positive value of
ms < 1, these proportions are constant and unequal, so that complete homogeneity among
types never happens.
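
To make the role of ms concrete, the following minimal sketch evaluates the bracketed matrix $\frac{1-m_s}{m_s}\sum_{\mu\ge 1}(m_s A)^{\mu}$ for a hypothetical 2 × 2 positive Markov matrix A and several values of ms. The matrix, the truncation length and the helper names are assumptions made for illustration only; they do not come from the paper or its data.

```python
def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def geometric_mixture(a, ms, terms=5000):
    """Truncated (1 - ms)/ms * sum_{mu=1}^{terms} (ms A)^mu."""
    n = len(a)
    total = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in a]      # current power A^mu, starting at mu = 1
    weight = 1.0 - ms                  # (1 - ms)/ms * ms^1
    for _ in range(terms):
        for i in range(n):
            for j in range(n):
                total[i][j] += weight * power[i][j]
        power = mat_mul(power, a)      # advance to A^(mu+1)
        weight *= ms                   # advance to (1 - ms)/ms * ms^(mu+1)
    return total

# Hypothetical positive Markov matrix and its limit (identical rows v(A)).
A = [[0.9, 0.1],
     [0.3, 0.7]]
A_BAR = [[0.75, 0.25],
         [0.75, 0.25]]

def gap(ms):
    """Largest entrywise distance between the mixture and A_BAR."""
    mix = geometric_mixture(A, ms)
    return max(abs(mix[i][j] - A_BAR[i][j]) for i in range(2) for j in range(2))

for ms in (0.2, 0.6, 0.9, 0.99):
    print(ms, gap(ms))
```

For this example the gap shrinks as ms approaches 1 but stays strictly positive for every ms < 1, in line with the last two sentences of the proof. The weights (1 − ms) ms^(µ−1) sum to one, which is why each row of the mixture still sums to one.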



Appendix C             Introducing formal biases in a discrete stochastic process
In this appendix we discuss the constraints that should be satisfied by a linear distortion of a
probability distribution over a finite sample so that the composition is still a probability distribution.
We do this by proposing a simple stochastic mechanism, and by showing under which circumstances
a bias can be identified to be equivalent to this mechanism.
    Start with some probability distribution over a finite discrete set Θ with cardinality H, denoted
by p = (p1 , p2 , . . . , pH ).
A bias is a multiplicative factor for each probability, such that the new probability distribution
maintains unity measure, i.e. a vector of biases b = (b1 , b2 , . . . , bH ), such that
\[
b \cdot p = \sum_{i=1}^{H} b_i p_i = 1 . \tag{l}
\]

An implication of this is that if the original probability pi is 0 for some event i, there is no bias
that can make i happen with positive probability.

     An intuitive way to imagine and even implement such a bias is the following. Suppose that we
are extracting the elements of Θ from an urn, with probabilities given by p. Imagine that whenever
we extract an element of type i, after observing it, we disregard it with a probability ri specific to
i, and make a new extraction with replacement. More formally we have a new vector r that gives
probabilities of refusing an extraction. In this way the probability to extract and accept an element
i is given by the probability of doing it in the first extraction, plus the probability to do it in the
second extraction, and so on. In formulas, if we denote this new biased probability by $b_i p_i$, it follows
that
\begin{align*}
b_i p_i &= p_i (1 - r_i) + \left(\sum_{h=1}^{H} p_h r_h\right) p_i (1 - r_i) + \left(\sum_{h=1}^{H} p_h r_h\right)^{2} p_i (1 - r_i) + \dots \\
        &= p_i (1 - r_i) \sum_{t=0}^{\infty} (p \cdot r)^{t} \\
        &= p_i\, \frac{1 - r_i}{1 - p \cdot r} . \tag{m}
\end{align*}
This formulation is well defined if there is at least one element ri of r, with pi > 0, such that ri < 1, so
that p · r < 1. It is immediate to call $b_i = \frac{1 - r_i}{1 - p \cdot r}$ the bias for event i, and to check that property (l) holds.

    It is also easy to see that whenever two vectors p and b are specified, we can always find a vector
r that satisfies condition (m). The solution solves a system of H linear equations and it may not
be unique. As an example, it is possible to check that the vector r = (k, k, . . . , k), with all elements
equal to k ∈ [0, 1), always gives the same (neutral) bias b = (1, 1, . . . , 1).
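
The refusal mechanism can be sketched in a few lines. The first function below applies formula (m) directly. The second is one convenient way to go back from b to r, obtained by substituting r_i = 1 − s·b_i into (m); the scale s is a free parameter introduced here for illustration (it is not notation from the paper), and varying it shows why r need not be unique. The numerical values of p and r are hypothetical.

```python
def bias_from_refusals(p, r):
    """Bias b_i = (1 - r_i) / (1 - p.r) implied by refusal probabilities r, as in (m)."""
    pr = sum(pi * ri for pi, ri in zip(p, r))
    if pr >= 1.0:
        raise ValueError("every draw would be refused: need p.r < 1")
    return [(1.0 - ri) / (1.0 - pr) for ri in r]

def refusals_from_bias(b, s):
    """One solution r_i = 1 - s * b_i of condition (m), for a scale 0 < s <= 1 / max(b)."""
    if not 0.0 < s <= 1.0 / max(b):
        raise ValueError("scale s must lie in (0, 1 / max(b)]")
    return [1.0 - s * bi for bi in b]

p = [0.5, 0.3, 0.2]            # hypothetical type shares
r = [0.0, 0.4, 0.8]            # hypothetical refusal probabilities
b = bias_from_refusals(p, r)

print("bias:", b)
print("b . p:", sum(bi * pi for bi, pi in zip(b, p)))           # property (l)
print("uniform refusals:", bias_from_refusals(p, [0.3] * 3))    # neutral bias

# Two different refusal vectors that induce the same bias b.
for s in (0.5, 0.6):
    r_alt = refusals_from_bias(b, s)
    print(s, r_alt, bias_from_refusals(p, r_alt))
```

The check works because, when b · p = 1, the choice r_i = 1 − s·b_i gives p · r = 1 − s, so (1 − r_i)/(1 − p · r) = b_i for every admissible s.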


Appendix D              Examples
Let us consider how the mechanism described in Appendix C can be applied to the growing networks
that we define in Section 2. We restrict attention to the case where there are only two types in the
population, type 1 forming a fraction p(1) = p of the population, and type 2 a fraction p(2) = 1 − p.
We call $(r_{1,1}, r_{2,1})$ the vector of refusal probabilities of type 1, and $(r_{1,2}, r_{2,2})$ the vector of type 2. In
particular, let types be homophilous with refusal probabilities $\mathbf{r}_1 = (0, r_1)$, with r1 ∈ [0, 1), and
similarly $\mathbf{r}_2 = (r_2, 0)$, with r2 ∈ [0, 1).

D.1     Purely random model (R)
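The system of equations that characterizes this system is
\[
\begin{cases}
P^{t+1}_j(1,1) = \dfrac{n}{t}\, p\, \dfrac{1}{1-(1-p)r_1} \\[8pt]
P^{t+1}_j(1,2) = \dfrac{n}{t}\, (1-p)\, \dfrac{1-r_1}{1-(1-p)r_1} \\[8pt]
P^{t+1}_j(2,1) = \dfrac{n}{t}\, p\, \dfrac{1-r_2}{1-p r_2} \\[8pt]
P^{t+1}_j(2,2) = \dfrac{n}{t}\, (1-p)\, \dfrac{1}{1-p r_2}
\end{cases} \tag{n}
\]
so that we obtain a result with an explicit matrix of biases
\[
B_r = \begin{pmatrix}
p\, \dfrac{1}{1-(1-p)r_1} & (1-p)\, \dfrac{1-r_1}{1-(1-p)r_1} \\[8pt]
p\, \dfrac{1-r_2}{1-p r_2} & (1-p)\, \dfrac{1}{1-p r_2}
\end{pmatrix} .
\]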
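
As a cross-check, the entries of Br can be recovered from the refusal mechanism of Appendix C. The sketch below builds the matrix twice, once from the closed-form entries above and once by applying formula (m) row by row with refusal vectors (0, r1) and (r2, 0). The function names and the parameter values are hypothetical.

```python
def bias_from_refusals(p, r):
    """Bias b_i = (1 - r_i) / (1 - p.r) from Appendix C, formula (m)."""
    pr = sum(pi * ri for pi, ri in zip(p, r))
    return [(1.0 - ri) / (1.0 - pr) for ri in r]

def bias_matrix_closed_form(p, r1, r2):
    """Entries of B_r as written in Section D.1."""
    d1 = 1.0 - (1.0 - p) * r1
    d2 = 1.0 - p * r2
    return [[p / d1, (1.0 - p) * (1.0 - r1) / d1],
            [p * (1.0 - r2) / d2, (1.0 - p) / d2]]

def bias_matrix_from_mechanism(p, r1, r2):
    """Same matrix, built as (b_k * p_k) for each type's refusal vector."""
    shares = [p, 1.0 - p]
    rows = []
    for refusals in ([0.0, r1], [r2, 0.0]):   # type 1 refuses type 2, and vice versa
        b = bias_from_refusals(shares, refusals)
        rows.append([b[k] * shares[k] for k in range(2)])
    return rows

closed = bias_matrix_closed_form(0.3, 0.5, 0.2)
built = bias_matrix_from_mechanism(0.3, 0.5, 0.2)
for row_closed, row_built in zip(closed, built):
    print(row_closed, row_built, sum(row_closed))
```

Each row sums to one, so Br is a Markov matrix, and setting r1 = r2 = 0 returns the unbiased rows (p, 1 − p).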

D.2     Random Unbiased Search model (RSU)
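This is the case analyzed by Bramoullé and Rogers (2010). The system of equations that
characterizes our system is now
\[
\begin{cases}
P^{t+1}_j(1,1) = \dfrac{n m_r}{t}\, p\, \dfrac{1}{1-(1-p)r_1}
  + n m_s \left[ p\, \dfrac{1}{1-(1-p)r_1}\, \dfrac{\sum_{\lambda=j}^{t} P^{\lambda}_j(1,1)}{t p}
  + (1-p)\, \dfrac{1-r_1}{1-(1-p)r_1}\, \dfrac{\sum_{\lambda=j}^{t} P^{\lambda}_j(2,1)}{t(1-p)} \right] \dfrac{1}{n} \\[10pt]
P^{t+1}_j(1,2) = \dfrac{n m_r}{t}\, (1-p)\, \dfrac{1-r_1}{1-(1-p)r_1}
  + n m_s \left[ p\, \dfrac{1}{1-(1-p)r_1}\, \dfrac{\sum_{\lambda=j}^{t} P^{\lambda}_j(1,2)}{t p}
  + (1-p)\, \dfrac{1-r_1}{1-(1-p)r_1}\, \dfrac{\sum_{\lambda=j}^{t} P^{\lambda}_j(2,2)}{t(1-p)} \right] \dfrac{1}{n} \\[10pt]
P^{t+1}_j(2,1) = \dfrac{n m_r}{t}\, p\, \dfrac{1-r_2}{1-p r_2}
  + n m_s \left[ p\, \dfrac{1-r_2}{1-p r_2}\, \dfrac{\sum_{\lambda=j}^{t} P^{\lambda}_j(1,1)}{t p}
  + (1-p)\, \dfrac{1}{1-p r_2}\, \dfrac{\sum_{\lambda=j}^{t} P^{\lambda}_j(2,1)}{t(1-p)} \right] \dfrac{1}{n} \\[10pt]
P^{t+1}_j(2,2) = \dfrac{n m_r}{t}\, (1-p)\, \dfrac{1}{1-p r_2}
  + n m_s \left[ p\, \dfrac{1-r_2}{1-p r_2}\, \dfrac{\sum_{\lambda=j}^{t} P^{\lambda}_j(1,2)}{t p}
  + (1-p)\, \dfrac{1}{1-p r_2}\, \dfrac{\sum_{\lambda=j}^{t} P^{\lambda}_j(2,2)}{t(1-p)} \right] \dfrac{1}{n}
\end{cases} \tag{o}
\]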

Br is the same as above, and we again have the solution from Section 2.4, which we can write out
explicitly as the matrix is just 2 × 2:
\[
\Pi^{t}_{t_0} = n\,\frac{m_r}{m_s}\left[\sum_{\mu=0}^{\infty} \frac{1}{\mu!}\left(m_s \log\frac{t}{t_0}
\begin{pmatrix}
p\, \dfrac{1}{1-(1-p)r_1} & (1-p)\, \dfrac{1-r_1}{1-(1-p)r_1} \\[8pt]
p\, \dfrac{1-r_2}{1-p r_2} & (1-p)\, \dfrac{1}{1-p r_2}
\end{pmatrix}\right)^{\mu} - I\right] . \tag{p}
\]

The explicit solution is
                                                                    „                          «                                                            „                          «
                                                                        r2 −1                                                                                   r2 −1
8                           00                                                                                                         1                                                     1
                                                             «m p              +      1            +m                                               «−m p              +      1
                                                               s                                                                                       s
                                                         „                                                                                 „
                                                                        pr2 −1   −pr1 +r1 −1                                                                    pr2 −1   −pr1 +r1 −1
>
>                                                   t                                                                                           t
                           B @p(r2 −1)((p−1)r1 +1) t0                                                   +(p−1)(r1 −1)(pr2 −1)A
>                          BB                                                                                                C                                                                C
                                                                                                                                               t0
>
>                                                                                                                                                                                             C
>
>
   t
                      n mr B
>                          B                                                                                                                                                                  C
> Πt0 (1, 1)
>
>
>              =        ms B                                                    p(r2 (2(p−1)r1 −p+2)−pr1 )+r1 −1
                                                                                                                                                                                           − 1C
                                                                                                                                                                                              C
>
>                          B                                                                                                                                                                  C
>
>                          @                                                                                                                                                                  A
>
>
>
>
>
>                                                                „                    «                 „                    «
                                                             «m p r2 −1 +                          «−m p r2 −1 +
                            0                        0                                      1                                                                      1
\[
\begin{aligned}
\Pi_{t_0}^{t}(1,2) &= \frac{n m_r}{m_s}\left(\frac{(p-1)(r_1-1)(p r_2-1)\left[\left(\frac{t}{t_0}\right)^{m_s p\left(\frac{r_2-1}{p r_2-1}+\frac{1}{-p r_1+r_1-1}\right)+m_s}-1\right]\left(\frac{t}{t_0}\right)^{-m_s p\left(\frac{r_2-1}{p r_2-1}+\frac{1}{-p r_1+r_1-1}\right)}}{p\left(r_2\left(2(p-1)r_1-p+2\right)-p r_1\right)+r_1-1}\right), \\[2ex]
\Pi_{t_0}^{t}(2,1) &= \frac{n m_r}{m_s}\left(\frac{p(r_2-1)\left((p-1)r_1+1\right)\left[\left(\frac{t}{t_0}\right)^{m_s p\left(\frac{r_2-1}{p r_2-1}+\frac{1}{-p r_1+r_1-1}\right)+m_s}-1\right]\left(\frac{t}{t_0}\right)^{-m_s p\left(\frac{r_2-1}{p r_2-1}+\frac{1}{-p r_1+r_1-1}\right)}}{p\left(r_2\left(2(p-1)r_1-p+2\right)-p r_1\right)+r_1-1}\right), \\[2ex]
\Pi_{t_0}^{t}(2,2) &= \frac{n m_r}{m_s}\left(\frac{p(r_2-1)\left((p-1)r_1+1\right)\left[\left(\frac{t}{t_0}\right)^{m_s p\left(\frac{r_2-1}{p r_2-1}+\frac{1}{-p r_1+r_1-1}\right)+m_s}-1\right]\left(\frac{t}{t_0}\right)^{-m_s p\left(\frac{r_2-1}{p r_2-1}+\frac{1}{-p r_1+r_1-1}\right)}}{p\left(r_2\left(2(p-1)r_1-p+2\right)-p r_1\right)+r_1-1}-1\right).
\end{aligned}
\tag{q}
\]

With parameters $p = 1/2$, $n = 10$, $m_r = 0.5$, $m_s = 0.5$, $r_1 = 0.8$, $r_2 = 0$ and $t_0 = 1000$, we obtain exactly the example discussed in Section 3.4.

D.3       Random-Search with Search bias (RSB)
Now we have to consider a new bias matrix $B = \begin{pmatrix} b_{1,1} & b_{1,2} \\ b_{2,1} & b_{2,2} \end{pmatrix}$ (which will be the $B_s$ defined in the model), which can be derived from a homophilous matrix of additional refusal probabilities $S = \begin{pmatrix} 0 & s_1 \\ s_2 & 0 \end{pmatrix}$.
The system of equations that characterizes the model is now
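\[
\begin{cases}
P_j^{t+1}(1,1) = \dfrac{n m_r}{t}\,p\,\dfrac{1}{1-(1-p)r_1} + n m_s\left( b_{1,1}\,p\,\dfrac{1}{1-(1-p)r_1}\,\dfrac{\sum_{\lambda=j}^{t} P_j^{\lambda}(1,1)}{tp} + b_{1,2}\,(1-p)\,\dfrac{1-r_1}{1-(1-p)r_1}\,\dfrac{\sum_{\lambda=j}^{t} P_j^{\lambda}(2,1)}{t(1-p)} \right)\dfrac{1}{n} \\[3ex]
P_j^{t+1}(1,2) = \dfrac{n m_r}{t}\,(1-p)\,\dfrac{1-r_1}{1-(1-p)r_1} + n m_s\left( b_{1,1}\,p\,\dfrac{1}{1-(1-p)r_1}\,\dfrac{\sum_{\lambda=j}^{t} P_j^{\lambda}(1,2)}{tp} + b_{1,2}\,(1-p)\,\dfrac{1-r_1}{1-(1-p)r_1}\,\dfrac{\sum_{\lambda=j}^{t} P_j^{\lambda}(2,2)}{t(1-p)} \right)\dfrac{1}{n} \\[3ex]
P_j^{t+1}(2,1) = \dfrac{n m_r}{t}\,p\,\dfrac{1-r_2}{1-p r_2} + n m_s\left( b_{2,1}\,p\,\dfrac{1-r_2}{1-p r_2}\,\dfrac{\sum_{\lambda=j}^{t} P_j^{\lambda}(1,1)}{tp} + b_{2,2}\,(1-p)\,\dfrac{1}{1-p r_2}\,\dfrac{\sum_{\lambda=j}^{t} P_j^{\lambda}(2,1)}{t(1-p)} \right)\dfrac{1}{n} \\[3ex]
P_j^{t+1}(2,2) = \dfrac{n m_r}{t}\,(1-p)\,\dfrac{1}{1-p r_2} + n m_s\left( b_{2,1}\,p\,\dfrac{1-r_2}{1-p r_2}\,\dfrac{\sum_{\lambda=j}^{t} P_j^{\lambda}(1,2)}{tp} + b_{2,2}\,(1-p)\,\dfrac{1}{1-p r_2}\,\dfrac{\sum_{\lambda=j}^{t} P_j^{\lambda}(2,2)}{t(1-p)} \right)\dfrac{1}{n}
\end{cases}
\tag{r}
\]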
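One way to get a feel for system (r) is to iterate it numerically for a single node $j$. The following is a minimal sketch, not the authors' code: the function name `iterate_rsb` and its argument names are hypothetical, the bias entries $b_{i,k}$ are taken as given inputs, and the cumulative sums are assumed to start from zero at $t = j$ (the node has no incoming links at birth).

```python
def iterate_rsb(p, n, m_r, m_s, r1, r2, b, j, T):
    """Iterate system (r) for a node born at time j, up to time T.

    b is the 2x2 search-bias matrix [[b11, b12], [b21, b22]] (given as input).
    Returns a dict mapping (i, k) to the cumulative sum of P_j^lambda(i, k).
    Assumption: all cumulative sums are zero at t = j, and j >= 1.
    """
    a1 = 1.0 / (1.0 - (1.0 - p) * r1)  # 1 / (1 - (1 - p) r1)
    a2 = 1.0 / (1.0 - p * r2)          # 1 / (1 - p r2)
    # Type-dependent random-meeting weights appearing in system (r):
    # w[(i, k)] is the weight with which a type-i newborn meets type k.
    w = {
        (1, 1): p * a1,
        (1, 2): (1.0 - p) * (1.0 - r1) * a1,
        (2, 1): p * (1.0 - r2) * a2,
        (2, 2): (1.0 - p) * a2,
    }
    sums = {key: 0.0 for key in w}
    for t in range(j, T):
        new = {}
        for i in (1, 2):
            for k in (1, 2):
                random_part = n * m_r / t * w[(i, k)]
                # Search goes through a type-1 or a type-2 intermediary.
                search_part = n * m_s * (
                    b[i - 1][0] * w[(i, 1)] * sums[(1, k)] / (t * p)
                    + b[i - 1][1] * w[(i, 2)] * sums[(2, k)] / (t * (1.0 - p))
                ) / n
                new[(i, k)] = random_part + search_part
        # Update the sums only after all four equations are evaluated.
        for key, value in new.items():
            sums[key] += value
    return sums


if __name__ == "__main__":
    # Parameter values quoted in the text; setting every b to 1 is an
    # assumption meaning "no additional search bias".
    result = iterate_rsb(p=0.5, n=10, m_r=0.5, m_s=0.5, r1=0.8, r2=0.0,
                         b=[[1.0, 1.0], [1.0, 1.0]], j=1000, T=20000)
    print(result)
```

All four equations are evaluated from the sums accumulated up to $t$ before any sum is updated, which mirrors the fact that every right-hand side of (r) depends only on $\sum_{\lambda=j}^{t} P_j^{\lambda}$.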



The biases are applied to the (already biased) probabilities of matrix $B_r$. Essentially, we now have a new bias matrix in the search part, which we defined as $B_s B_r$ in Section 2.3. This matrix has the form
                           1                             1−r
                                                          1
                                                                             
                      p 1−(1−p)r               (1−p) 1−(1−p)r (1−s2 )
                             1                                1
                            1−r1                              1−r1                               p               (1−p) (1−s2 )
             1−s1 (1−p) 1−(1−p)r               1−s1 (1−p) 1−(1−p)r                     1−s1 (1−p)(1−r1 )        1−s1 (1−p)
                   1−r2
                                 1                                  1        =                                                          .             (s)
                p 1−pr (1−s2 )                              1
                                                    (1−p) 1−pr                              p(1−s2 )               (1−p)
                        2                                       2                              1−s2 p            1−s2 p(1−r2 )
                          1−r2                             1−r2
                  1−s2 p 1−pr                      1−s2 p 1−pr
                              2                                 2


We can replace $B_s B_r$ with (s) in the solution (14). An explicit solution can then be obtained analogously to the one in (q) for the RSU case.

D.4     Type bias on search bias on targeted nodes (RSBT)
In this case, the bias is still derived from a homophilous matrix $S$.
The system of equations that characterizes the model is similar to that of the RSB case. However, because the biases now act on the target, there are two bias matrices:
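\[
\begin{cases}
P_j^{t+1}(1,1) = \dfrac{n m_r}{t}\,p\,\dfrac{1}{1-(1-p)r_1} + n m_s\left( b^{1}_{1,1}\,p\,\dfrac{1}{1-(1-p)r_1}\,\dfrac{\sum_{\lambda=j}^{t} P_j^{\lambda}(1,1)}{tp} + b^{2}_{1,1}\,(1-p)\,\dfrac{1-r_1}{1-(1-p)r_1}\,\dfrac{\sum_{\lambda=j}^{t} P_j^{\lambda}(2,1)}{t(1-p)} \right)\dfrac{1}{n} \\[3ex]
P_j^{t+1}(1,2) = \dfrac{n m_r}{t}\,(1-p)\,\dfrac{1-r_1}{1-(1-p)r_1} + n m_s\left( b^{1}_{1,2}\,p\,\dfrac{1}{1-(1-p)r_1}\,\dfrac{\sum_{\lambda=j}^{t} P_j^{\lambda}(1,2)}{tp} + b^{2}_{1,2}\,(1-p)\,\dfrac{1-r_1}{1-(1-p)r_1}\,\dfrac{\sum_{\lambda=j}^{t} P_j^{\lambda}(2,2)}{t(1-p)} \right)\dfrac{1}{n} \\[3ex]
P_j^{t+1}(2,1) = \dfrac{n m_r}{t}\,p\,\dfrac{1-r_2}{1-p r_2} + n m_s\left( b^{1}_{2,1}\,p\,\dfrac{1-r_2}{1-p r_2}\,\dfrac{\sum_{\lambda=j}^{t} P_j^{\lambda}(1,1)}{tp} + b^{2}_{2,1}\,(1-p)\,\dfrac{1}{1-p r_2}\,\dfrac{\sum_{\lambda=j}^{t} P_j^{\lambda}(2,1)}{t(1-p)} \right)\dfrac{1}{n} \\[3ex]
P_j^{t+1}(2,2) = \dfrac{n m_r}{t}\,(1-p)\,\dfrac{1}{1-p r_2} + n m_s\left( b^{1}_{2,2}\,p\,\dfrac{1-r_2}{1-p r_2}\,\dfrac{\sum_{\lambda=j}^{t} P_j^{\lambda}(1,2)}{tp} + b^{2}_{2,2}\,(1-p)\,\dfrac{1}{1-p r_2}\,\dfrac{\sum_{\lambda=j}^{t} P_j^{\lambda}(2,2)}{t(1-p)} \right)\dfrac{1}{n}
\end{cases}
\tag{t}
\]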



The biases are on the probabilities of finding a target of that particular type, and these probabilities
differ according to the intermediary (superscript on the b’s). We obtain
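\[
B^{1} = \begin{pmatrix}
\dfrac{tp}{tp - s_1\sum_{\lambda=j}^{t} P_j^{\lambda}(1,2)} & \dfrac{(1-s_1)\,tp}{tp - s_1\sum_{\lambda=j}^{t} P_j^{\lambda}(1,2)} \\[3ex]
\dfrac{(1-s_2)\,tp}{tp - s_2\sum_{\lambda=j}^{t} P_j^{\lambda}(1,1)} & \dfrac{tp}{tp - s_2\sum_{\lambda=j}^{t} P_j^{\lambda}(1,1)}
\end{pmatrix}
\quad\text{and}\quad
B^{2} = \begin{pmatrix}
\dfrac{t(1-p)}{t(1-p) - s_1\sum_{\lambda=j}^{t} P_j^{\lambda}(2,2)} & \dfrac{(1-s_1)\,t(1-p)}{t(1-p) - s_1\sum_{\lambda=j}^{t} P_j^{\lambda}(2,2)} \\[3ex]
\dfrac{(1-s_2)\,t(1-p)}{t(1-p) - s_2\sum_{\lambda=j}^{t} P_j^{\lambda}(2,1)} & \dfrac{t(1-p)}{t(1-p) - s_2\sum_{\lambda=j}^{t} P_j^{\lambda}(2,1)}
\end{pmatrix}.
\]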



This makes the biases depend on every element inside the brackets that characterize the search part of system (t). They can be taken out, as a rough approximation, only if, in the limit $t \gg j$, $B^1$ and $B^2$ converge to a unique matrix $\bar{B}$ of biases.
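To see concretely how the two bias matrices depend on the accumulated link probabilities, the sketch below evaluates $B^1$ and $B^2$ for given cumulative sums. It is only an illustration under stated assumptions: the function name `target_bias_matrices` and the dictionary layout of `sums` are hypothetical, and the entries follow the expressions for $B^1$ and $B^2$ as read from the text.

```python
def target_bias_matrices(t, p, s1, s2, sums):
    """Evaluate the target-bias matrices B^1 and B^2 at time t.

    sums[(i, k)] is assumed to hold sum_{lambda=j}^{t} P_j^lambda(i, k).
    Returns (B1, B2) as nested lists, rows indexed by the newborn's type.
    """
    n1 = t * p          # expected number of type-1 nodes at time t
    n2 = t * (1.0 - p)  # expected number of type-2 nodes at time t

    # B^1: the intermediary is of type 1.
    d1_row1 = n1 - s1 * sums[(1, 2)]
    d1_row2 = n1 - s2 * sums[(1, 1)]
    B1 = [
        [n1 / d1_row1, (1.0 - s1) * n1 / d1_row1],
        [(1.0 - s2) * n1 / d1_row2, n1 / d1_row2],
    ]

    # B^2: the intermediary is of type 2.
    d2_row1 = n2 - s1 * sums[(2, 2)]
    d2_row2 = n2 - s2 * sums[(2, 1)]
    B2 = [
        [n2 / d2_row1, (1.0 - s1) * n2 / d2_row1],
        [(1.0 - s2) * n2 / d2_row2, n2 / d2_row2],
    ]
    return B1, B2
```

With $s_1 = s_2 = 0$ every entry equals 1, so the search part carries no additional bias. For positive refusal probabilities the entries move with the cumulative sums, which is why the biases cannot simply be factored out of the brackets.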


   Taking out $B_s$ as a single constant $\bar{B}$, as we do in Section 2.3, is a big simplification. Even so, that case is not easily solvable, as it has an additional bias compared to the RSU model. This is the case of the RSBT model analyzed here.



