Approximation Algorithms for the Metric Median Problem

Document Sample
Approximation Algorithms for the Metric Median Problem Powered By Docstoc
					Approximation Algorithms for the Metric k -Median
                    Problem
                                                           Michael Shindler

                                                   √
  Paper type: Research [ ] Critical survey [           ]             A. Variations of k-median
  Main area(s): Theory of Computation                                  In addition to k-median, as described above, there are some
   Abstract—A short overview of recent results in clustering
                                                                     naturally similar problems that arise, either due to the manner
related algorithms. Exact algorithms and approximations for k-
median of various quality are discussed, as are variations of this   by which the data was collected or the manner by which the
problem for different input and space/time requirements. Natural     data will be used. Variations of k-median include:
alternative formulations of k-median are covered as are related        • Online k-median. In online algorithms, some part of the
problems. Also discussed is the application of such solutions to          input is unknown in advance. In the case of online k-
various information-centric uses. This paper concludes with a
discussion about clustering metrics as a whole.                           median, the full metric space is known, but k is not. The
   Index Terms—clustering, survey paper, k-median, streaming              goal is to determine a permutation of the points such that
algorithm, sampling algorithm, approximation algorithm, linear            the first k serve as a good solution to k-median for that
programming, data mining, operations research                             k, for all k. This can be thought of as asking for k in
                                                                          an online fashion. In related formulations, demand points
                      I. I NTRODUCTION                                    arrive online and k can be increased for a cost. The latter
   The past decade has seen significant advances in approx-                model is more common for facility location.
imating several prominent NP-Hard problems from the field               • k-median with diameter, in which the maximum distance
of operations research. At the center of this is the k-median             between two points in N is ∆ and the minimum distance
problem, in which an appropriate partitioning of the data set             between two is δ. This effectively limits the amount by
into k representative clusters is sought. This is one of several          which a point can affect the optimal cost.
metrics available to quantify the quality of a clustering.             • Prize-collecting k-median, in which some points may be
   In this survey, I give an overview of many of the applications         excluded from the solution cost, either up to a limit or by
of k-median and related problems, as well as several forms                paying an arbitrary cost to not assign these to any of the
of algorithmic solutions to natural formulations of the task.             k medians. This is sometimes known as k-median with
In section 2, I provide a formal definition of the k-median                penalties. This puts an upper bound on the cost by which
problem as well as many variations of it. In section 3, I explain         any point can affect the optimum cost.
applications of k-median: why solutions to the problem are             • k-median with outliers, in which some fraction η of
useful in practice. In section 4, I overview more than a                  the points don’t count towards the cost of the solution;
decade’s worth of results related to the problem. In section 5,           this prevents small, distant clusters from significantly
k-median under alternate forms of input is explored. Section 6            affecting the approximation guarantee.
discusses alternate forms of the problem. Section 7 discusses          • Capacitated k-median, in which each median can have
problems that are similar to k-median. I conclude with a                  at most C data points in its cluster. This is sometimes
discussion of alternate notions of clustering.                            known as robust k-median.
                  II. P ROBLEM S TATEMENT                            B. Similar problems to k-median
   Given N , a set of points in some metric space, and some            Similar problems to k-median include k-means, k-center,
integer k, our goal is to select k points, K ⊆ N . Once the          and facility location. Approaches from these can be used to
points are selected, the solution quality is the sum of all N ’s     assist results in k-median and vice versa, and as such, are
points’ distances to their nearest element of K; a higher quality    worth considering when studying k-median.
solution corresponds to a lower cost. The set of data points           • In k-center, the goal is to select k points so as to minimize
that have a given m ∈ K as their closest is known as m’s                  the maximal distance from any given point to its cluster
cluster. Point m is the cluster’s center, or median.                      center. This can be seen as setting up Wireless Access
   The k-median problem is NP-Hard and can be approximated                Points and minimizing the broadcast radius for all served
both by solution cost and by the number of medians produced               points.
by the solution. In general, an [α, β]-approximation for k-            • In k-means, the objective function to minimize is the sum
median guarantees a cost of at most αOPT and uses at most                 of squares for the distance from each point to its center.
βk medians. If an approximation factor is listed as a single              In Euclidean space, the 1-mean corresponds to the center
expression, the algorithm uses exactly k medians and the                  of mass, although the 1-median does not have such a
expression is the cost ratio to OPT.                                      closed form [21] in two or more dimensional space.
  •   Facility Location is similar in goal to k-median, but          answer database queries for which traditional relational models
      without an explicit bound on the number of medians that        lack the expressive power to ask in the first place or for which
      may be selected. Instead, each point that isn’t a median       it is difficult for a user to pose a specific query. Fraud detection
      pays a cost to connect to its nearest median, as before,       and event cataloging fall into this category. It can also be used
      and any point may be designated as a median by paying          to alleviate the so-called “write-only memory” phenomenon
      a given cost. Facility Location is used as a subroutine in     that occurs when large amounts of data are gathered, making
      many k-median solutions.                                       it difficult for humans - and most algorithms - to read and
      The total cost of a facility location solution is split into   understand [8].
      the service cost paid to connect points to medians and the
      facility cost, incurred by opening new medians (facilities).                    IV. R ESULTS ON k-M EDIAN
      It is worth comparing, briefly, the similarities of k-median       The k-median problem has been studied in many forms. In
      and facility location with uniform facility costs. An α-       1992, the first polynomial-time approximation algorithm with
      approximation for k-median is itself an α-approximation        provable performance guarantees was discovered. Nine years
      for that version of facility location, as a search can be      later, the first constant approximation was published.
      done for the best value of k. However, even an exact
      algorithm for that version of facility location does not       A. Hardness Results
      give a solution for k-median. In a sense, this makes k-           The non-metric k-median problem is NP-Hard and is as
      median the harder problem.                                     hard to approximate as Set Cover, and as such, no approxima-
                                                                     tion better than O(log n) may exist unless P = NP. The same
              III. A PPLICATIONS OF k-M EDIAN                        is true for non-metric facility location [25].
   The need for efficiently locating sources of supply causes            The metric k-median problem is NP-Hard and Max-SNP
demand for accurate solutions to the k-median problem and its        Hard. Approximating k-median within ε and maintaining
variants. Efficient location permits quick and cheap transport-       exactly k medians is also NP-hard.
ation to customers from warehouses and improved customer                Online solutions in the model of k arriving online cannot
relations due to speed of delivery [28] [34]. The solution is                                       2
                                                                     achieve better than a 2 - n−1 -competitive ratio due to the
of concern to the firm(s) in question, as it is both a guide          star graph with unit-cost edges. The center must be chosen as
for maximizing profits and is instructional for infrastructural       the first point, otherwise 1-median would be n-competitive on
investment [43]. It also can be used by analysts both interior       that graph, and the optimal solution for n − 1-median involves
and exterior as a measure of the efficiency of the firm or             selecting everything except the center piece [35].
of the industry as a whole [43]. Minimizing the distance to             The k-median problem in general graphs is (1+ 2 ) ≈ 1.736-
                                                                                                                        e
facilities from serving locations can increase the speed with        hard to approximate [37]. Facility location over the same is
which inventory can be replaced at sales locations, and thus         1.463-hard to approximate [20].
drive up sales [7].
   Facility Location is similarly used, as it models the fall        B. Optimal Solutions on Special Metrics
in shipping cost associated with increased warehousing while            Tamir [44] gives an O(n2 k) dynamic programming algo-
balancing total cost [18]. This is convenient as most geograph-      rithm to solve k-median on tree metric. The algorithm pro-
ical systems can be represented as these two basic elements,         ceeds from the leaves of the tree up to the root. He observes
and it is quite apparent when a situation is described in            that it gains the same time bound when applied to forest
economical terms [41].                                               metrics. He cites a previous result showing an O(kn) time
   For some input data, such as customer transaction records, a      algorithm for instances in which the input graph is a path. In
k-median solution can be interpreted as k “typical” customers’       each of these three cases, the solution is optimal and solvable
information. This is great for targeted advertising (by discov-      in pseudo-polynomial time.
ering market segments) or a stockholder summary. Similar                The usefulness for tree metric solutions isn’t limited to cases
applications exist for large data sets, such as web pages or         where the metric is given as a tree. A common application
phone records. Phone record clustering is useful in detecting        of this result is that it is used to form approximations for
telephone fraud [4]. Capacitated k-median can be used to             general metrics by embedding those metrics into trees and
determine locations for setting up proxy servers on the web.         then solving the problem optimally on the tree or on the
   Problems in the form of k-center are particularly important       probability distribution over trees produced. This results in an
for cases when each distance from a distribution center, rather      approximation factor of exactly the distortion produced by the
than total distance, causes additional expenses or hardships.        embedding. In 1996, Bartal [5] provided a method to embed
Perishable goods, such as fresh milk, require refrigeration to       any metric space into a probability distribution over trees in
transport over distance [43].                                        such a way that the expected distortion is O(log2 n). That
   Clustering is also useful in machine learning and classifi-        is, if d(x, y) is the distance between x and y in the original
cation as similar cases can be grouped either into an epitomical     metric, the distance between them in the newly formed tree
sample or into groupings of the same, permitting quick class-        metric is expected to be O(log2 n)d(x, y). The trees formed
ification of future items [26]. Clustering can also be used to        are k-hierarchically well-structured trees (“k-HSTs”). They are
rooted, weighted trees formed in such a way that, for all nodes,    D. Bicriteria Approximations
the weight of the edge to any given child is the same and are
a decrease of at least a factor of k from the edge that extends        The first polynomial time approximation algorithm with
from its parent to it.                                              a provable performance guarantee is due to Lin and Vitter
                                                                    [31], [32]. Their algorithm provides a [1 + ε, (1+ 1 )(ln n + 1)]
                                                                                                                       ε
   In 1998, Bartal [6] improved this to O(log n log log n)
                                                                    approximation on k-median. They also have a [2(1 + ε), 1 + 1 ] ε
distortion and also provided a deterministic version of the
                                                                    approximation. The latter requires additional input in the form
result. This was done by instead computing hierarchically
                                                                    of a bound on the optimal cost. While this approach is based
partitionable metrics (“HPMs”), on which the graph is given
                                                                    on linear program relaxation as a starting point, it does not
lengths such that, on a cut in the graph, the lengths crossing
                                                                    employ traditional rounding ; for example, some variables may
the cut are at least as much as the diameter and then each
                                                                    be set to zero in the relaxed linear program solution but be
partition formed by the cut obeys this rule as well, with a
                                                                    set to 1 by the algorithm before the solution settles. They do
(naturally) smaller diameter. The HPMs are then translated to
                                                                    this by filtering the results, guaranteeing that each vertex is
HSTs.
                                                                    assigned to a center in its approximate neighborhood.
   There exist metrics for which any tree embedding must have
                                                                       Korupolu et al [29] provide an analysis of k-median solved
Ω(log n) distortion, and for a few years, the gap remained
                                                                    by a local search heuristic. Local search is a technique that
open in general graphs. In 2003, Fakcharoenphol, Rao, and
                                                                    starts with some plausible solution, perhaps obtained by an-
Talwar [17] closed the gap by providing embeddings of
                                                                    other approximation algorithm or by an arbitrary starting point,
general metrics into trees with O(log n) distortion. Rather than
                                                                    and incrementally improves the solution until no immediate
constructing the set of trees directly, as Bartal had done, they
                                                                    improvement can be found. In the case of k-Median, any
decompose the graph by growing clusters’ diameters, cutting
                                                                    subset of N of size k is a possible solution, although it is
crossed edges with probability based on the length of the
                                                                    not necessarily a good one. It is possible to improve the
edge relative to the diameter. Once the graph is decomposed
                                                                    solution by taking some p medians (1 ≤ p ≤ k) and p non-
in this manner, the tree distribution can be grown from the
                                                                    medians and “swapping” them, making the medians into non-
components formed.
                                                                    medians and vice versa. All n data points are then reassigned
   Before the gap was closed, Charikar et al [9] gave an            to their nearest of the new set of medians and the solution
approximation by derandomized the process that produced the         cost is updated. If any of the possible improvements leads to
tree distribution. They did this by solving the linear program      an improved solution cost, that becomes the new temporary
relaxation on the original problem and building a set of trees      solution; in the event that multiple lead to improvement, the
such that the fractional solution is exact on the distribution of   best improvement is chosen. If none lead to improvement, the
trees produced. The rounded solution from here produces an          solution whose cost could not be improved upon is the final
O(log n log log n) approximation, which can be improved to          result. Note that this finds the local optimum, which is not
O(log k log log k) with an idea from [31]. As this was achieved     necessarily the global optimum. The approximation guarantee
prior to the O(log n) embedding of [17], there is no explicit       is based on the worst-possible local optimum as compared to
claim to whether or not O(log k) is possible through this           the global optimum. This is known as the locality gap.
approach. I believe it is possible, although I leave it unproven
                                                                       Korupolu et al consider local search with one swap and
as solutions to far better factors than O(log k) exist for k-
                                                                    provide a guarantee of user’s choice of [1 + ε, 10 + 17 ] or
                                                                                                                               ε
median.
                                                                    [Θ(ε3 ), 5 + 1 ]. The guarantee is formed by showing that for
                                                                                 ε
                                                                    any solution state sufficiently more expensive than the optimal,
C. Arbitrary Precision within Special Metrics                       there is a state one swap away that has a lower cost. This
                                                                    essentially proves a bound on the locality gap, thus showing
   Arora et al [1] give the first arbitrary precision approxima-     that upon termination, local search is within that bound of
tion for planar k-median in the form of a PTAS. Their solution      optimal. This also helps to show that it takes polynomial time
                   1
takes time nO(1+ ε ) and has an expected approximation ratio        to run, as there are a polynomial number of swaps possible
of 1 + ε with constant probability. Their algorithm works           at each stage and polynomially-many that can happen before
by dividing the plane into a point-quadtree and using a             convergence. The chosen bound β on the number of medians
dynamic programming approach on that result, limiting how           determines the starting position of the local search: any set of
assignments may be made outside of their individual sectors         βk points.
within the quadtree. This is a generalization of the approach          Indyk [24] gives an approximation that produces a [(1 +
Arora used to solve Euclidean TSP to arbitrary precision.           γ)3(2+α), 2β] approximation with probability Ω(γ), for some
   In the event of a small value of k, a slightly different         confidence parameter γ > 0 and some [α, β] approximation
approach may be used. Instead of restricting association out                                         ˜
                                                                    for k-median that runs in time O(n2 ), such as any mentioned
of a segment of the quadtree, the median locations themselves       in this paper with such a guarantee. The error probability
                                                                                                              1
may be approximated. This results in a 1 + ε-approximation          can be reduced to a constant with O( γ ) repetitions of the
with runtime n(O(log n ))2k , a notable improvement when k
                        ε                                           algorithm, taking the smallest cost solution produced. The
is small.                                                                                               ˜ 2
                                                                    runtime of a single iteration is O( kn ). It works by taking
                                                                                                          δ
                               √
                             ˜
a single, sufficiently large (O( nk)) sample of points in N             Note that each time we call facility location, we have essen-
and running the provided [α, β] approximation algorithm on          tially relaxed the requirement that exactly k medians be opened
the sample. From the first sample, the fraction with the largest     by moving the requirement to the objective function as a cost.
values for distance to their medians are separated and black-       This is the essence of Lagrangian Relaxation, introduced by
box algorithm is run again on these. The solution is the output     Jain and Vazirani [25], and applied to produce an approximate
of both runs of the black-box algorithm, for a total of 2βk         solution. The realization for this came by observing that the
medians.                                                            k-median algorithm of Lin and Vitter uses ideas from constant-
                                                                    factor approximations for facility location. The Lagrangian
E. Constant-factor Approximations                                   Relaxation technique formalizes the relationship between these
                                                                    two problems.
   The first constant factor approximation that is exact on the         A problem arises in this, however, because facility location
number of medians used is due to Charikar et al [11] in 2001.       isn’t solved exactly by our subroutine: if we were going to
The algorithm produces a 6 2 -approximation by linear program
                              3                                     spend the time to solve it exactly, we could just solve k-median
relaxation. From the linear program solution, a collection of       itself directly. Furthermore, it isn’t immediately obvious that
trees are built and the problem is then solved optimally on the     an approximate solution to facility location on the new cost
newly formed forest metric.                                         metric is a bounded approximation on k-median. We need
   More specifically, they begin by consolidating nearby data        to be more careful. In this case, if a solution with exactly
points, modifying the demands so as to not change the linear        k medians isn’t found, either due to precision or a deliberate
program solution. Without changing the cost of the fractional       decision to limit the number of iterations, we will still have two
solution, they are able to form a semi-sparse data set, with        surrounding cases, one with more medians and one with fewer.
all positive demand points apart from one another. They then        Accepting a small loss in the approximation factor, the two
modify the linear program solution on these to consolidate          can be combined via a convex combination and randomized
nearby fractional centers. The rounding, viewed as a solution       rounding. In general, the Lagrangian Relaxation technique, as
over a forest metric, produces the approximate solution.            described by Jain and Vazirani, is to take any hard constraint,
   Jain and Vazirani [25] present a 6-approximation. This, on       such as the number of medians, and move it from the linear
its own, is not the key important contribution of their paper, as   programming constraints to the objective function, adding it
Arya et al gave a 3 + ε-approximation the same year. Jain and       (times some multiplier) to any objective function already there.
Vazirani, however, introduce the use of primal-dual schema to       The problem is then solved as a subroutine, either explicitly or
incorporate into the algorithm, instead of relying on solving a     using a known result for the relaxed problem, and the solution
linear program explicitly and forming an integral solution from     is interpreted as a solution to the original.
there. In addition to giving an improved runtime, this technique       Lagrangian Relaxation can also be applied to solve prob-
also permits improvements based on the structure of the             lems where two related problems are joined by a k-median
problem - an advantage absent in traditional linear program-        type constant. Jain and Vazirani [25] give the example of
based solutions. This leads to both more efficient solutions and     needing two types of facilities (schools and hospitals) and
solutions that are strictly combinatoric - eliminating the linear   clustering by connecting points to one of each of these, but
program-based portions of the algorithm. This technique can         only being able to build k total facilities. The Lagrangian
apply to other problems as well and is thus relevant outside        Relaxation of this problem becomes two facility location
of k-median.                                                        problems.
   Another technique may well be to use facility location as a         Charikar and Guha [10] improve the result of [25] to
subroutine. For illustration purposes, let us first consider how                                   ˜
                                                                    give a 4-approximation in O(n3 ) time. After the small and
this would work if we were to attempt to solve the problem          large solutions are found, an additional augmentation step is
exactly. Our goal is to find k medians, and facility location        performed by moving medians from the large solution to the
does find medians, but it does not allow us to specify an            small. This is made possible by their observation that the dual
explicit limit on the number of medians. Instead, it allows         variables are continuous and small changes in the facility cost
us to specify the cost per facility. We can use a solution to       are bounded in their effect on the dual variables.
facility location as a black box and do a binary search on             A better approximation was found later the same year by
the cost per facility input in order to find a cost that yields      Arya et al [3] using local search, improving over the previous
exactly k facilities. For any given attempt at cost per facility,   local search heuristic of [29]. Arya et al show that if p swaps
if fewer than k facilities are opened, we can conclude that         are allowed at each stage, the approximation guarantee is 3 +
                                                                    2
the facilities were overpriced and lower the cost for the next      p . With only one swap permitted at each stage, this is a 5-
iteration (using a classic principle from economics: to sell        approximation, with additional swaps taking additional time
more of something, lower the price). Alternately, if facility       but providing an improvement on the solution quality. Any
location reports too many medians, then the price of facilities     amount of swaps per stage yields an improved solution quality
                                                                                2
is too low. Eventually, assuming facilities may have any real       over the 6 3 -approximation of [11].
cost, we will converge on a solution in which k facilities are         The proof of the single swap case revolves around the
opened; this is our solution.                                       idea of “capturing” an optimal median. An optimal median is
captured by some element in some solution if more than half          solution along the way as requested. In addition to being useful
of the points assigned to the optimal median in the optimal          for information that may literally be streaming in real time,
solution are being serviced by the capturing point. Note that        these algorithms are also useful for cases where random access
the capturing point is bad for us. We are fortunate in that          to data is expensive or non-existent or where multiple reads
optimal facilities can be swapped in exactly once and any point      are prohibitively expensive. It also applies to database-related
that captures exactly one optimal point will be swapped out          cases when the query to be executed references more data than
for the optimal point by local search. Furthermore, if a point       can fit in main memory [4].
captures two or more optimal points, it will not be swapped             The streaming algorithm of Guha et al [20] reads the
out by single swap - it is these points that cause the locality      data m points at a time, clustering them using some [α, β]
gap. Using these facts, an upper bound on the cost increase          bicriteria approximation algorithm, and repeats, remembering
by swaps made or not made can be proven. Combining this              the weighted medians as “level-1 medians.” When m weighted
with the limit on how many swaps may occur during a run of           medians are stored at level-i, the weighted points are clustered
local search gives us the locality gap.                              into βk level-i+1 medians. When the program is done reading
   For the case of multiple swaps, the notion of capture is          all of the input - or we wish to cluster based on what has been
extended to any subset of the current solution state. A subset       seen so far, all medians seen so far are clustered into k final
captures a set of optimal points: namely it captures any optimal     medians. The result is a constant approximation.
point for which the subset covers half or more of the points that       The paper also gives a one-pass O(nk log n) time and
the optimal point covers in the optimal solution. Any subset         O(nε ) space algorithm. It uses the local-search algorithm
making a capture is termed bad, as before. Note that for any         of Arya et al [3] as a subroutine and produces a constant
subset of size one, the definitions of capture are identical.         approximation ratio for small values of k.
Because each iteration of the local search is able to remove            Charikar et al [13] improve on Guha’s streaming result.
a bad median from the current solution state, and the number         Their streaming algorithm is O(kpoly log n) space and is a
of times a median can be swapped in or out continues to be           constant approximation with high probability. It uses online
                                                2                    facility location, due to [36] and described later in this paper,
bounded, the approximation guarantee of 3 + p can be shown.
                                                                     as a subroutine. By bounding facility cost and the quantity
              V. A LTERNATE F ORMS OF I NPUT                         of medians (facilities) produced by the online facility location
   As k-median is important to large-scale data mining, it has       algorithm, they are able to get performance guarantees for
also been studied as an online problem, as streaming, and as         their streaming algorithm. To guarantee a constant-factor ap-
sampling, all of which are necessary due to the exponential          proximation on cost from the algorithm, O(k log n) medians
growth in available data.                                            are selected instead, as they cannot shuffle the input to get
                                                                     a constant approximation. When the data stream has been
A. Online solutions                                                  processed in its entirety, the O(k log n) weighted medians are
   Mettu and Plaxton [35] provided a constant approximation          consolidated into exactly k. The algorithm takes one pass over
for the online k-median problem. The algorithm runs in time          the data, provides a constant approximation with probability
                                                                              1
O(n2 +nl), where l is the number of bits necessary to represent      1 - poly(n) , and uses only the space necessary to represent
the distance between two points in the metric space. They            O(k log2 n) points from the stream.
show that a traditional greedy algorithm, selecting the next            Charikar et al also describe a solution to the asymmetric
median so as to minimize the objective cost given the previous       k-median problem, in which it is not guaranteed that d(x, y)
pieces, has an unbounded competitive ratio. Theirs, instead, is      = d(y, x), although reflexivity and directed triangle inequality
hierarchical greedy: it picks a point greedily, but rather than      continue to apply. Their non-streaming solution is essentially
settle on that point, it confines its next search to that point’s     to grow radii among points to cover additional uncovered
neighborhood. It repeats this process until there is only one        points at the cheapest cost. This is similar to a greedy
point left to pick; this takes O(log ∆ ) repetitions. While the
                                         δ                           approach used for k-center and works solely if ∆ is known
                                                                                                                         δ
constant-competitiveness of this algorithm is useful in its own      in order to give a size by which to iterate. This provides an
right, it is also of particular use in that it can be run once and   [O(1), 2k log ∆ ]-approximation.
                                                                              ε      δ
then queried for multiple values of k without reprocessing the          Their streaming result for the same is done by showing
input.                                                               that the non-streaming can be modified to take less memory
                                                                     and proceed over O(log ∆ ) passes. The new algorithm uses
                                                                                                 δ
B. Streaming-based solutions                                         O(k 2 log ∆ ) space.
                                                                                 δ
   In many cases, it is not practical to store all the data and         Streaming solutions are also applied to cases where old
read it repeatedly. For instance, if customer data, such as          data needs to expire and is no longer considered relevant.
purchase information, were desired to be clustered and the           Babcock et al [4] present an algorithm for k-median over
then-current k medians output nightly, reading all the previous      sliding windows, in which only the R most recent points are
data before clustering becomes prohibitively expensive and           relevant, can be chosen as medians, and affect the optimal
redundant after a short period of time. Streaming algorithms         solution. Any data points before the R are deemed to have
maintain an approximate solution state and can output a partial      expired. Furthermore, the memory requirements of streaming
remain in effect. Their initial solution takes O( τk4 R2τ log R)      assumptions. First, the assumption must be made that data
                                 1
memory and produces a [2O( τ ) , 2]-solution for τ < 1 , chosen
                                                          2
                                                                      points can be selected uniformly at random without needing
by the user for trade-off purposes. This can be extended to           to read the entire input in the process; if this assumption does
produce exactly k medians with the same approximation factor          not hold, the algorithm might as well be a streaming algorithm,
and memory requirement. As in the algorithm of Guha et al,            as the reading the data on its own will take linear time. Also, if
medians at intermediate levels are maintained, as are buckets         the sample is not obtained by randomness, an adversary could
for relevance of oldest to newest, for expiration purposes. Each      deliver a misleading sample that ignores even big clusters. It
bucket is split into the different levels, as per Guha’s algorithm.   is also necessary to assume that the data is either clustered by
An overestimate of the clustering cost of the bucket is also          OPT to be reasonably large (Ω( n ) in size) or that the problem
                                                                                                        k
maintained. Each median must also track approximately how             being solved is actually k-median with outliers. Alternately,
many active points are in its cluster. When groups of medians         the diameter of the problem could be small. If these are not
are clustered, they are weighted according to the number of           the case, then a small cluster of outliers could hide from the
active points each contains.                                          sampling and a linear scan of the data would need to detect
   For the case of streaming data that arrives in an online fash-     this, again defeating the benefits of sampling. This assumption,
ion, this can be extended with the same memory requirement            in any form, essentially prohibits a small subset of the points
      ˜
and O(k) time to update at each additional arrival point.             from unduly affecting the optimal solution. Finally, in any
                                                                      sampling-based solution, the k medians themselves must be
C. Sampling-based solutions
                                                                      an acceptable output to the program, as outputting an explicit
   With the rapid growth of stored data, sometimes even linear-       mapping N → K would take O(n) time, even if the solution
time algorithms are impractical. As such, it becomes important        were magically produced.
to condense data for consumption and analysis. Sampling-                 Mishra et al [38] were the first to apply this technique to get
based algorithms solve this by guaranteeing an approximation          an approximation algorithm for k-median that runs in sublinear
ratio in time sublinear in the size of the data. As this prohibits    time. Specifically, they show that, given an α-approximation
even a full reading of the data, assumptions about it must            to k-median (such as, say, any algorithm from Section IV-E
be made, such as those about the quantity and importance of           of this paper), they can produce a 2α + ε-approximation with
outliers as well as the distribution of the relevant data.            probability arbitrarily close to one. If the metric is Euclidean
   For any sampling solution, the following result is useful:         space ( d ), the approximation factor is α + ε.
          Let S ⊆ N and K ⊆ S be the optimal k-median                    The algorithms they present assume also that the diameter
      solution on S. K is a 2-approximation on what the               of the space is known, although this can be estimated if it
      k-median solution on S would be if any point from               is not known a priori. Because any sample of sufficient size
      N , rather than only the points in S, could be selected         will behave like the full set for some classes of functions, and
      as medians when solving k-median on S.                          because they are able to show that computing the diameter is
   A version of this fact is cited in [37] and in [38]. The fact is   such a function, computing the diameter of a sufficiently large
useful in any sampling solution to k-median for two reasons:          sample provides an estimate, if such is needed.
   First, if we consider N to be a subset of an infinite metric           To prove the approximation bounds, they show that the ratio
space - which it is - then any solution on N is only a factor         of the average distance of a point to the closest approximate
of two away from an unconstrained solution and has a much             median in the sample converges quickly to the same ratio over
more confined search space. For example, if we are trying to           the full set.
find a set of k customers that are indicative of our customer             In order to make these guarantees, however, several trade-
data set, we are not significantly worse by insisting that our         offs for runtime, accuracy, and probability of guarantee are
output be real customers rather than a fictional customer that         necessary. A larger diameter necessitates a larger sample, to
is somehow more average.                                              prevent outliers from hiding. Increased accuracy or probability
   Also, if we do a sampling-based algorithm for applications         of success also require more elements to be sampled. The
where reading the entire input is infeasible or intractable, we       running time is directly dependent on the size of the sample,
can restrict our attention to the sample for that portion of the      so increases in these directly lead to increased runtime, just
program, rather than the entire input, which would defeat the         as additional input size does.
point of sampling in the first place.                                     Czumaj and Sohler [15] improve on this result by removing
   Furthermore, statistics research has shown that sufficiently        the dependence on the input size and reducing the dependence
large samples of data converge to behaving the same as the            on the diameter in the sampling. Using a smaller sample,
full data set under some characteristic conditions. In essence,       they are able to provide a 2(α + ε) approximation to the
this permits that the sample studied need only represent the          problem in a significantly improved runtime. They also extend
behavior of the entire data set rather than represent the entire      the improvement to the Euclidean space metric, providing the
data set [38]. This makes selecting and demonstrating the             same α + ε guarantee from the original.
validity of the sample much easier.                                      Meyerson et al [37] present a sampling-based algorithm to
   It is worth noting that any sampling-based (or other               solve k-median, within a constant factor, with high probability.
sub-linear) algorithm to solve k-median must make several             They will first generate several sample sets of size Ω( k log k),
                                                                                                                               ε
uniformly at random, from the data and will approximate the          slowly, local search tracks the optimal value to the end. At
solution to each sample with the algorithm of Arya et al [3],        each stage of penalty value, the previous solution is used as a
choosing the best sample to output. Of course, the decision          starting point.
for best will also be approximate: determine the actual cost            Charikar et al [13] present a one-pass solution for k-median
will take Ω(n) time. Instead, they approximate this within a         with outliers that is exact on the number of medians. To
constant factor by checking each set in several more random          produce this result, they draw a sufficiently large sample,
samples, showing that the “best” k-medians chosen by this are        run an [α, β]-approximation on that solution, and interpret
a constant approximation of the true best from the choices.          the solution to the sample as a solution to the problem. This
This works well on data with large clustering, and if the            is based on the observation that a sufficiently large random
algorithm of Charikar et al [12] is used as a subroutine instead     sample of N will cluster about the same way that N does. It
of the Arya algorithm, the result will work for k-median with        uses O(k log n) space and, with high probability, it produces
outliers. They also show the trade-offs between sample size          an approximation on N without a significant increase in cost,
and minimum cluster size; if larger samples are allowed, then        the number of medians, or the number of outliers over the
the required minimum cluster size in OPT may grow as well.           bounds produced by the subroutine.
       VI. A LTERNATE VERSIONS OF THE PROBLEM                        B. Prize-Collecting k-median
A. k-Median with Outliers                                               Charikar et al [12] are able to produce a 4-approximation to
   The k-median problem with outliers is a natural formulation       prize-collecting k-median through Lagrangian Relaxation by
for solving k-median in situations where a small fraction of         using their algorithm for facility location with penalties and
the data is errors of some sort, noise or otherwise corrupted        adding centers from the larger solution to the smaller solution
data, and these errors can disproportionately affect the optimal     through various selection rules. This is similar to the approach
solution cost. If the solution identifies the outliers as well,       used by many algorithms described earlier in the paper.
they can be subject to manual intervention: removal if the              They also note that PTAS from the Euclidean Version can
data point is noise, investigation if it is an anomaly, et cetera.   be extended for Euclidean Prize-Collecting k-median, as per
The problem is also worthy of theoretical study as an issue of       the results of [1].
when k-median problems have multiple global constraints.
                                                                     C. Capacitated k-median
   Charikar et al [12] provide an algorithm for k-median
with outliers, giving a (1+ε)-approximation on the number of            The result of Korupolu et al [29] can be applied to ca-
outliers and a 4(1+ 1 )-approximation on cost. Their algorithm       pacitated k-median with splittable demands, providing bicrite-
                     ε
is based on the use of their 4-approximation to prize-collecting     ria approximations of [1 + ε, 12 + 17 ] and [Θ(ε3 ), 5 + 1 ].
                                                                                                           ε                     ε
k-median. They are able to approximate the optimal cost, C ∗         In both cases the capacity requirement is not broken. As
to within a factor of ε. By setting the penalty from prize           with uncapacitated k-median, the desired bound on medians
                            ∗
collecting k-median to C , they are able to get a solution           determines the starting point of the search as well as the
                           εη
to k-median with outliers with the given bounds.                     approximation guarantee.
   Chen [14] presents the first polynomial time constant ap-             The result of Arora et al [1] can be extended to Euclidean
proximation for k-median with outliers. Chen’s algorithm uses        capacitated k-median by adding a capacities dimension to
Lagrangian Relaxation, as described in [25]. In this case,           the dynamic programming table. Through this approach, Eu-
the Lagrangian relaxation of the problem is facility location        clidean capacitated k-median may be approximated to 1 + ε
                                                                                n 2
with outliers. Two solutions will be found in the Lagrangian         in nO(log ε ) time.
relaxation phase: one with too few centers and another with             Pal et al [40] give a local search solution to facility
too many. If the one with too many has at least k + 2 centers,       location with nonuniform hard capacities and achieve a 9-
he cannot use [25]’s merge step and must use his own greedy          approximation. The hard capacity limit means that the capacity
algorithm.                                                           cannot be exceeded by any amount - approximations on
   If exactly k + 1 centers are used by the larger solution, then    that factor are not acceptable. The local search is similar to
the smaller solution no longer has a provable cost bound, so         that used for k-median in starting point, however there are
a solution starting from the larger solution must be found.          additional steps allowed between states. In addition to a swap
Lagrangian relaxation can be applied here again. Consider            of facility, adding or removing a facility is permitted. The
k-median with outliers and the ability to exclude more than          algorithm can still be shown to terminate in polynomial time.
the pre-specified number of outliers by paying some penalty.
                                                                                      VII. R ELATED P ROBLEMS
This is the same problem if the penalty is infinite but is a
Lagrangian relaxation otherwise. Computing a local search as         A. k-Center
a subroutine, starting from the larger solution, and gradually         In k-center, the goal is to select k points so as to minimize
increasing the penalty for additional medians will yield the         the maximal distance from any given point to its cluster center.
desired approximation. The idea of their local search is to find      A good way to think of this is as setting up Wireless Access
a set of successor states in such a way as to have a constant        Points and minimizing the broadcast radius for all served
approximation among them. Because the penalty is increased           points. It is NP-Hard to approximate k-center to a constant
factor better than two [22], and thus this result is tight. For     possible states) or on solution quality (as there are examples
the optimization version, minimizing the maximum radius is          on which it can be arbitrarily bad) [2]. Initial attempts by the
NP-Hard [38].                                                       Theory community to resolve this included exact algorithms
   Dyer and Frieze [16] give a simple heuristic for k-center.       [23] and arbitrarily accurate approximations [21], although
They take the point with the heaviest weight as the first            k-means remained in regular use.
center and each successive center is chosen greedily as the            Recently, two independent results attempted to solve this
point whose distance to its closest center is maximal. This         impasse by small modifications to k-means in order to
achieves an approximation ratio of 3, and if the ratio of the       provide it with provable solution quality guarantees and faster
largest weight to the smallest weight is less than two, the         runtimes, both provably and in practice.
approximation factor is one plus the weight ratio. Note that this      The first is due to Ostrovsky et al [39]. They show that
algorithm also works if k-center is cast as an online problem       k-means does well when a good clustering exists for the
with k arriving online, as with online k-median [35]                parameters given: that is, the data naturally has a clustering
   Hochbaum and Shmoys [22] provide the best possible               for the given value of k. They give a method for seeding the
constant-factor approximation to the metric k-Center problem.       Lloyd algorithm that is similar to the k-center solution of [16]
Theirs is a 2-approximation on the radius, uses an exact            and show that only one iteration of local search is necessary
number of centers, and is based on an observation from linear       to provide a constant bound for solution quality on such data.
programming duality. The observation is that a dominating set       This is naturally faster than Lloyd’s algorithm as no additional
of size k in the square of the graph corresponds to a k-center      iterations are necessary.
solution in the original. This, too, is NP-Hard to compute,            The second, an algorithm named k-means++, is due to
but a feasible dominating set can be computed from a strong         Arthur and Vassilvitskii [2]. In this, the starting solution state is
stable set, which can be approximated in a greedy fashion.          chosen carefully, rather than randomly as it was in k-means,
   Charikar et al [13] present a one-pass solution for k-           before using traditional k-means for the rest of the algorithm.
Center with outliers that, with high probability, produces an       The first center is chosen uniformly at random and each
α-approximation on k-center with outliers, approximating the        successive center is chosen with probability proportional to
number of outliers by a factor of (1 + ε)2 . It is based on the     the square of its proximity to its nearest center. This is known
same ideas as their solution to k-median with outliers (see         as D2 weighting. Choosing D2 weighting alone - without
above). It requires, as input, an algorithm that can solve k-       the local search step - guarantees an approximation factor of
center with outliers to approximate within a factor of α and        Θ(log k), although the practical solution quality is improved
be exact on the number of outliers; such an algorithm is given      by the local search. It can be shown that if this is run 2k
in [12] and is described above. The advantage to using this         times, it is likely for the best solution found to be within a
algorithm instead of the one being used as input is that this is    constant factor of the optimal. Their empirical results on both
faster with a loss in approximation factor only to the quantity     synthetic and natural datasets showed k-means++ is a good
of centers used.                                                    improvement in both running time and solution quality over
   Another paper of Charikar et al [12] gives an O(n3 ) time 3-     k-means.
approximation to k-Center with outliers. The k-center problem
with outliers cannot be approximated to better than 3 unless        C. Facility Location
P=NP.
                                                                       Facility Location is similar in goal to k-median, but without
B. k-means                                                          an explicit bound on the number of medians that may be
   The k-means problem is very similar to k-median, except          selected. Instead, each point that isn’t a median pays a cost to
the objective function to minimize is the sum of squares for        connect to its nearest median, as before, and any point may be
the distance from each point to its center.                         designated as a median by paying a given cost. This cost may
   By far, the most common technique used in practice to solve      or may not be uniform across all points. This can be thought
k-means is due to Lloyd in 1982 [30]. The technique is similar      of as warehouses for distributing goods to a chain of stores.
in idea to local search in that k means are selected randomly          Mettu and Plaxton [35] give an offline facility location
and the appropriate clustering for the state is computed. Rather    solution using the model of hierarchically greedy that was
than search all p possible swaps, however, the next solution        used in their solution to online k-median. Theirs is a 3-
state is the center of mass of the k clusters, taking advantage     approximation and runs in time O(n2 ).
of the closed form nature of the k-means problem. Those                Charikar and Guha [10] provide a local-search based solu-
centers are then the basis for a re-clustering, and the process                                                               ˜ 2
                                                                    tion to approximate facility location within 2.414 + ε in O( n )
                                                                                                                                  ε
                                                                                                                                   2
repeats itself until the solution stabilizes. The solution could,   time or to get an approximation tradeoff of (1 + γ, 1 + γ )
in principle, take a great while to converge, but in practice it    on facility cost versus service cost. This is near optimal for
runs very quickly.                                                  tradeoffs, as no algorithm can achieve a better tradeoff than
                                                                                1
   It is perhaps confusing that Lloyd’s algorithm - referred to     (1+γ, 1+ γ ). Combining [25] and another previous result and
generally as just k-means - remains in practice despite no          adding their own observation of scaling yields an approxima-
provable guarantees on either the running time (beyond the k n      tion ratio of 1.728. The combination is important, because no
algorithm based solely on the primal dual technique of [25]            Recently, Kannan et al [27] proposed a new bicriteria
can do better than 3-ε.                                             formulation to determine the quality of a clustering solution.
   The current best-known approximation for facility location       Now, the input becomes a similarity graph for the data, with
is a 1.519 factor due to Mahdian et al [33]. The approach used      higher edge weights corresponding to vertices that are more
is an integration of many approaches used for both k-median         similar. A k-clustering partitions the graph into k components,
and facility location.                                              ideally in such a way that the most similar points are in the
   Meyerson [36] provides online algorithms for facility lo-        same connected components. In this, we can judge the quality
cation under different models. If the facilities have uniform       of a clustering by the cuts it creates; we want to measure the
costs, then the arrival of each point prompts a decision: shall     size of the cut relative to the sizes of the pieces it creates. The
we build a facility here? He proposes randomized decision           quality of a clustering is the minimum quality of the clusters
making for this case. If it would be more expensive to connect      it creates.
this point to any currently open facility, then we open one            This causes a problem in that all vertices are equally im-
here. Otherwise, we open one here with probability equal to         portant. We would rather give importance to vertices that have
the ratio of the cost to connect to the nearest facility to the     many group similarities. As such, we measure the conductance
cost to open a facility here. If the points arrive in a random      of the cut as the ratio of the cost of cut to the similarity that
order, this is constant competitive. However, if the adversary      the members on one side of the cut share with the graph.
may choose the order in which the points arrive, the algorithm         As with k-median, a few outliers can cause problems.
is O(log n)-competitive.                                            Instead of explicitly choosing to exclude a fraction a priori,
   If facilities can have non-uniform costs, the above algorithm    we will judge a clustering by a second criterion: the fraction
will no longer work, as the current location may be arbitrarily     of the total weight not covered by the clusters. This allows
expensive. Instead, the metric space and associated facility        us to exclude outliers by paying a cost, with a smaller cost
costs are given in advance, and when we choose to open              corresponding to data that is farther from our desired clusters,
a facility, we may do so anywhere. We must still remain             and thus more of an outlier.
competitive on the points mentioned so far, so we cannot               The problem statement is then to be given the similarity
simply solve the offline version and call it an online solution.     graph and one of the two parameters: either minimize outlier
By rounding the facility costs to powers of two and considering     cost while retaining the desired quality or maximize the quality
multiple choices of where to open the next facility, including      without excluding more than the given budget for outliers.
not doing so, he gets a constant competitive algorithm for
randomized arrival order and O(log n) for the adversarial case.                             ACKNOWLEDGMENTS
   For both cases, it can be shown that no online algorithm
                                                                       I am grateful to Adam Meyerson for providing the starting
can be constant-competitive for adversarial order.
                                                                    seeds of papers to read. From these, I was able to find some
   The online solution is also useful as a solution to the offline
                                                                    initial results and additional related papers. I am also grateful
problem. On its own, it can provide a constant-approximation
                                                                    for his feedback on successive drafts of this paper.
in O(n2 ) time by shuffling the input points and then processing
each point in the new order as though it were an online
                                                                                                  R EFERENCES
arrival. While the approximation factor delivered is worse
than the guarantees made by other algorithms, this is much          [1] Arora, S., Raghavan, P., and Rao, S. Approximation schemes for Eu-
faster. Furthermore, it can be used as a subroutine to local            clidean k-median and related problems. In Proceedings of the 30th Annual
                                                                        ACM Symposium on Theory of Computing, pages 106-113, 1998.
search-based solutions in order to start the algorithm at a good    [2] Arthur, D. and Vassilvitskii, S. k-means++: The Advantages of Careful
solution. When applied to [10], described above, the same
     √                                                                  Seeding. Proceedings of the 15th Annual ACM-SIAM Symposium on
                                               2
1 + 2 + ε guarantee can be made in O( n ) time.
                                              ε
                                                                        Discrete Algorithms. 2007.
                                                                    [3] Arya, V., Garg, N., Khandekar, R., Meyerson, A., Munagala, K., and
   Fotakis [19] expanded on this result to show that no random-         Pandit, V. Local Search Heuristics for k-Median and Facility Location
                                          log n
ized algorithm can do better than Ω( log log n ) on any metric          Problems. In Proceedings of the 30th Annual ACM Symposium on Theory
space. He also gave a deterministic version of Meyerson’s               of Computing. 2001.
                                                                    [4] Babcock, B., Datar, M., Motwani, R., and O’Callaghan, L. Maintaining
algorithm and showed that it achieves a competitive ratio of            Variance and k-Medians Over Data Stream Windows, 2003 PODS.
      log n
O( log log n ).                                                     [5] Bartal, Y. Probabilistic approximations of metric spaces and its algo-
                                                                        rithmic applications. In IEEE Symposium on Foundations of Computer
          VIII. OTHER N OTIONS OF C LUSTERING                           Science, pages 184-193, 1996.
                                                                    [6] Bartal, Y. On approximating arbitrary metrics by tree metrics. Proceedings
   All of the algorithmic approaches discussed thus far use a           of the 30th Annual ACM Symposium on Theory of Computing. 1998.
mathematical formula to evaluate the quality of the clustering      [7] Baumol, W.J. and Wolfe, P. A warehouse location problem. Operations
solutions produced. These work well in general and for many             Reserach. 1958.
                                                                    [8] Bradley, P.S., Fayyad, U.M., and Mangasarian, O.L. Mathematical Pro-
types of data, although each has examples for which the cost            gramming for Data Mining: Formulations and Challenges. INFORMS:
metric is easily fooled; that is, there is a natural and correct        Journal of Computing, special issue on Data Mining. 1998.
way to group the example data - that is the point of clustering     [9] Charikar, M., Chekuri, C., Goel, A., Guha, S. Rounding via Trees:
                                                                        Deterministic Approximation Algorithms for Group Steiner Trees and
after all - which the problems, even their exact solution, fail         k-median. Proceedings of the 30th Annual ACM Symposium on Theory
to find.                                                                 of Computing. 1998.
[10] Charikar, M. and Guha, S. Improved Combinatorial Algorithms for            [37] Meyerson, A., O’Callaghan, L. and Plotkin, S. A k-median Algorithm
    the Facility Location and k-Median Problems. Proceedings of the 40th            with Running Time Independent of Data Size. Journal of Machine
    Annual IEEE Conference on Foundations of Computer Science. 1999.                Learning, Special Issue on Theoretical Advances in Data Clustering
                                       ´
[11] Charikar, M., Guha S., Tardos, E., and Shmoys, D. A Constant Factor            (MLJ). 2004.
    Approximation Algorithm for the k-Median Problem. Journal of Com-           [38] Mishra, N., Oblinger, D., and Pitt, L. Sublinear Time Approximate
    puter and System Sciences. 2002. Originally appeared in Proceedings of          Clustering. Proceedings of the 9th Annual ACM-SIAM Symposium on
    the 31st Annual ACM Symposium on Theory of Computing. 1999.                     Discrete Algorithms. 2001.
[12] Charikar, M., Khuller, S., Mount, D., and Narasimhan, G. Algorithms        [39] Ostrovsky, R., Rabani, Y., Schulman, L., and Swamy, C. The effective-
    for Facility Location Problems with Outliers. In Proceedings of the 12th        ness of Lloyd-tpye methods for the k-Means problem. Proceedings of the
    Annual ACM-SIAM Symposium on Discrete Algorithms. 2001.                         47th Annual Symposium on Foundations of Computer Science. 2006.
[13] Charikar, M., O’Callaghan, L., and Panigraphy, R. Better Streaming                                 ´
                                                                                [40] Pal, M., Tardos, E., Wexler, T. Facility location with nonuniform hard
    Algorithms for Clustering Problems. Proceedings of the 35th Annual ACM          capacities. Proceedings of the 42nd Annual Symposium on Foundations
    Symposium on Theory of Computing. 2003.                                         of Computer Science. 2001.
[14] Chen, K. A constant factor approximation algorithm for k-median            [41] Scott, A.J. Operational analysis of nodal hierarchies in network systems.
    clustering with outliers. Proceedings of the 19th Annual ACM-SIAM               Operations Research Quarterly. 1971.
    Symposium on Discrete Algorithms. 2008.                                                                  ´
                                                                                [42] Shmoys, D., Tardos, E., and Aardal, K. Approximation algorithms
[15] Czumaj, A. and Sohler, C. Sublinear-Time Approximation Algorithms              for facility location problems. Proceedings of the 29th Annual ACM
    for Clustering via Random Sampling. Random Structures and Algorithms.           Symposium on Theory of Computing. 1997.
    2007.                                                                       [43] Stollsteimer, J.F. A working model for plant numbers and locations.
[16] Dyer, M. and Frieze, A.M. A simple heuristic for the p-center problem.         Journal of Farm Economics. 1963.
    Operations Research Letters. 1985.                                          [44] Tamir, A. An O(pn2 ) algorithm for the p-median and related problems
[17] Fakcharoenphol, J., Rao, S., and Talwar, K. A tight bound on approxi-          on tree graphs. Operations Research Letters. 1996.
    mating arbitrary metrics by tree metrics. Proceedings of the 35th Annual
    ACM Symposium on Theory of Computing. 2003.
[18] Feldman, E., Lehrer, F.A., and Ray, T. L. Warehouse location under
    continuous economies of scale. Management Science. 1966.
[19] Fotakis, D. On the Competitive Ratio for Online Facility Location.
    International Colloquium on Automata, Languages and Programming.
    2003.
[20] Guha, S., Mishra, N., Motwani, R., and O’Callaghan, L. Clustering
    Data Streams. Proceedings of the 41st Annual IEEE Conference on
    Foundations of Computer Science. 2000.
[21] Har-Peled, S. and Mazumdar, S. Coresets for k-means and k-median
    clustering and their applications. Proceedings of the 36th Annual ACM
    Symposium on Theory of Computing. 2004.
[22] Hochbaum, D. S., and Shmoys, D. B. A Best Possible Heuristic for the
    k-Center Problem. Mathematics of Operations Research. 1985.
[23] Inaba, M., Katoh, N., and Imai, H. Applications of weighted voronoi
    diagrams and randomization to variance-based k-clustering. SCG ’94:
    Proceedings of the tenth annual symposium on Computational geometry.
    1994.
[24] Indyk, P. Sublinear Time Algorithms for Metric Space Problems. Pro-
    ceedings of the 31st Annual ACM Symposium on Theory of Computing.
    1999.
[25] Jain, K. and Vazirani, V. Approximation Algorithms for Metric Facility
    Location and k-Median Problems Using the Primal-Dual Schema and
    Lagrangian Relaxation. Journal of the ACM. 2001.
[26] Jain, K. and Vazirani, V. Primal-dual approximation algorithms for
    metric facility location and k-median problems. Proceedings of the 40th
    Annual Symposium on Foundation of Computer Science. 1999.
[27] Kannan, R., Vempala, S., and Vetta, A. On Clusterings: Good, Bad,
    and Spectral. Proceedings of the 41st Annual IEEE Conference on
    Foundations of Computer Science. 2000.
[28] Keuhn, A. A. and M.J. Hamburger. A heuristic program for locating
    warehouses. Management Science. 1963.
[29] Korupolu, M., Plaxton, C., and Rajaraman, R.. Analysis of a local search
    heuristic for facility location problems. Proceedings of the 9th Annual
    ACM-SIAM Symposium on Discrete Algorithms. 1998.
[30] Lloyd, S.P. Least squares quantization in pcm. IEEE Transactions on
    Information Theory. 1982.
[31] Lin, J. and Vitter, J.S. ε-Approximations with Minimum Packing Con-
    straint Violation. Proceedings of the 24th Annual ACM Symposium on
    Theory of Computing. 1992.
[32] Lin, J. and Vitter, J.S. Approximation Algorithms for Geometric Median
    Problems. IPL. 1992.
[33] Mahdian, M., Ye, Y., and Zhang, J. Approximation Algorithms for
    Metric Facility Location Problems. SIAM Journal on Computing. 2006.
[34] Manne, A.S. Plant Location under economies-of-scale-decentralization
    and computation. Management Science. 1964.
[35] Mettu, R.R. and Plaxton, C.G. The Online Median Problem. Proceedings
    of the 41st Annual Symposium on Foundations of Computer Science. 2000.
[36] Meyerson, A. Online Facility Location. IEEE Symposium on Founda-
    tions of Computer Science. 2001.