In graph theory, the metric $k$-center problem is a combinatorial optimization problem studied in theoretical computer science. Given $n$ cities with specified pairwise distances, one wants to build $k$ warehouses in different cities so as to minimize the maximum distance from any city to its nearest warehouse. In graph-theoretic terms, this means finding a set of $k$ vertices for which the largest distance from any vertex to its closest vertex in the $k$-set is minimum. The vertices must lie in a metric space, which yields a complete graph whose edge lengths satisfy the triangle inequality.

Formal definition

Let $({\mathcal {X}},d)$ be a metric space, where ${\mathcal {X}}$ is a set and $d$ is a metric.
A set $\mathbf {V} \subseteq {\mathcal {X}}$ is provided together with a parameter $k$. The goal is to find a subset ${\mathcal {C}}\subseteq \mathbf {V}$ with $|{\mathcal {C}}|=k$ such that the maximum distance of a point in $\mathbf {V}$ to the closest point in ${\mathcal {C}}$ is minimized. The problem can be formally defined as follows:
For a metric space $({\mathcal {X}},d)$,

Input: a set $\mathbf {V} \subseteq {\mathcal {X}}$ and a parameter $k$.
Output: a set ${\mathcal {C}}\subseteq \mathbf {V}$ of $k$ points.
Goal: Minimize the cost $r^{\mathcal {C}}(\mathbf {V} )=\max _{v\in \mathbf {V} }d(v,{\mathcal {C}})$, where $d(v,{\mathcal {C}})=\min _{c\in {\mathcal {C}}}d(v,c)$.

That is, every point in a cluster is within distance at most $r^{\mathcal {C}}(\mathbf {V} )$ of its respective center. ^[1]

The k-Center Clustering problem can also be defined on a complete undirected graph G = (V, E) as follows:
Given a complete undirected graph G = (V, E) with distances d(v_i, v_j) ∈ N satisfying the triangle inequality, find a subset C ⊆ V with |C| = k while minimizing:

$\max _{v\in V}\min _{c\in C}d(v,c)$
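As a minimal sketch of the objective, the following Python code evaluates the cost of a candidate center set and finds an optimal set by exhaustive search. The function names (`kcenter_cost`, `brute_force_kcenter`) and the small instance of points on a line are illustrative assumptions, not part of the cited sources, and the exhaustive search is only practical for tiny inputs.

```python
from itertools import combinations


def kcenter_cost(points, centers, dist):
    """Largest distance from any point to its nearest center."""
    return max(min(dist(v, c) for c in centers) for v in points)


def brute_force_kcenter(points, k, dist):
    """Try every k-subset of the points (exponential time, tiny inputs only)."""
    return min(combinations(points, k),
               key=lambda centers: kcenter_cost(points, centers, dist))


def line_dist(a, b):
    """Hypothetical metric: absolute difference of points on a line."""
    return abs(a - b)


# Hypothetical instance: two well-separated groups of three points.
points = [0, 1, 2, 10, 11, 12]
best = brute_force_kcenter(points, 2, line_dist)
best_cost = kcenter_cost(points, best, line_dist)
print(best, best_cost)  # (1, 11) 1
```

On this instance the middle point of each group is the only choice that keeps every point within distance 1 of a center.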

Computational complexity

In a complete undirected graph G = (V, E), sort the edges in non-decreasing order of distance, d(e₁) ≤ d(e₂) ≤ ... ≤ d(e_m), and let G_i = (V, E_i), where E_i = {e₁, e₂, ..., e_i}. The k-center problem is then equivalent to finding the smallest index i such that G_i has a dominating set of size at most k. ^[2]
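The equivalence can be illustrated with a short Python sketch that adds edges in sorted order and stops at the first graph G_i that admits a dominating set of size at most k. The dominating-set check here is a brute-force search, so this is only a demonstration for very small inputs; the function names and the distance-matrix representation `D` are assumptions made for the example, and k is assumed to be at least 1.

```python
from itertools import combinations


def has_dominating_set(n, adj, k):
    """Brute force: do at most k vertices dominate all n vertices?"""
    for size in range(1, k + 1):
        for cand in combinations(range(n), size):
            covered = set(cand)
            for c in cand:
                covered |= adj[c]
            if len(covered) == n:
                return True
    return False


def kcenter_via_thresholds(D, k):
    """Optimal k-center radius for a symmetric distance matrix D."""
    n = len(D)
    edges = sorted((D[u][v], u, v) for u in range(n) for v in range(u + 1, n))
    adj = [set() for _ in range(n)]
    if has_dominating_set(n, adj, k):  # only possible when k >= n
        return 0
    for w, u, v in edges:  # G_i is obtained by adding edge e_i
        adj[u].add(v)
        adj[v].add(u)
        if has_dominating_set(n, adj, k):
            return w  # d(e_i) for the smallest feasible index i
    return None


# Hypothetical instance: points on a line, distance = absolute difference.
pts = [0, 1, 2, 10, 11, 12]
D = [[abs(a - b) for b in pts] for a in pts]
radius_k2 = kcenter_via_thresholds(D, 2)
radius_k1 = kcenter_via_thresholds(D, 1)
```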

The decision version of Dominating Set is NP-complete, and the k-center problem is NP-hard. The optimality of a given feasible solution for the k-center problem can be checked through the Dominating Set formulation only if the optimal value is already known (i.e. the smallest index i such that G_i has a dominating set of size at most k), and finding this value is precisely the hard core of the problem. A Turing reduction gets around this issue by trying all values of i.

Approximations

A simple greedy algorithm

A simple greedy approximation algorithm that achieves an approximation factor of 2 builds ${\mathcal {C}}$ using a farthest-first traversal in k iterations. In each iteration, the algorithm chooses the point farthest from the current set of centers as the new center. It can be described as follows:

Pick an arbitrary point ${\bar {c}}_{1}$ and set $C_{1}=\{{\bar {c}}_{1}\}$.
For every point $v\in \mathbf {V}$, compute the distance $d_{1}[v]$ from $v$ to ${\bar {c}}_{1}$.
Pick the point ${\bar {c}}_{2}$ with the largest distance from ${\bar {c}}_{1}$.
Add it to the set of centers and denote the expanded set as $C_{2}$. Continue in the same way, each time adding the point farthest from the current set of centers, until k centers are found.
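The steps above can be sketched in Python as follows. This is an illustrative implementation under stated assumptions rather than a reference one: the name `greedy_kcenter`, the `first` parameter selecting the arbitrary starting point, and the example metric are all hypothetical, and k is assumed to be between 1 and the number of points.

```python
def greedy_kcenter(points, k, dist, first=0):
    """Farthest-first traversal; returns (centers, clustering radius)."""
    centers = [points[first]]
    # d[j] holds the distance from points[j] to its nearest chosen center.
    d = [dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # The next center is the point farthest from the current centers.
        far = max(range(len(points)), key=d.__getitem__)
        centers.append(points[far])
        # One O(n) pass refreshes the nearest-center distances.
        d = [min(d[j], dist(points[j], points[far]))
             for j in range(len(points))]
    return centers, max(d)


def line_dist(a, b):
    """Hypothetical metric: absolute difference of points on a line."""
    return abs(a - b)


points = [0, 1, 2, 10, 11, 12]
centers, radius = greedy_kcenter(points, 2, line_dist)
print(centers, radius)  # [0, 12] 2
```

Keeping the array of nearest-center distances up to date means each iteration needs a single pass over the points. On this instance the greedy radius is 2, while the optimal radius (centers 1 and 11) is 1, so the factor of 2 is attained.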

Running time

The $i$-th iteration, which chooses the $i$-th center, takes ${\mathcal {O}}(n)$ time.
There are k such iterations.
Thus, overall the algorithm takes ${\mathcal {O}}(nk)$ time.^[3]

Proving the approximation factor

The solution obtained using the simple greedy algorithm is a 2-approximation to the optimal solution. This section focuses on proving this approximation factor.

Given a set of n points $\mathbf {V} \subseteq {\mathcal {X}}$ belonging to a metric space $({\mathcal {X}},d)$, the greedy k-center algorithm computes a set $\mathbf {K}$ of k centers such that $\mathbf {K}$ is a 2-approximation to the optimal k-center clustering of $\mathbf {V}$,

i.e. $r^{\mathbf {K} }(\mathbf {V} )\leq 2r^{opt}(\mathbf {V} ,k)$ ^[1]

The theorem can be proven by considering two cases, where $\Pi ({\mathcal {C}}_{opt},{\bar {c}})$ denotes the cluster of the center ${\bar {c}}$ in the optimal clustering ${\mathcal {C}}_{opt}$.

Case 1: Every cluster of ${\mathcal {C}}_{opt}$ contains exactly one point of $\mathbf {K}$

Consider a point $v\in \mathbf {V}$.
Let ${\bar {c}}$ be the center it belongs to in ${\mathcal {C}}_{opt}$.
Let ${\bar {k}}$ be the center of $\mathbf {K}$ that is in $\Pi ({\mathcal {C}}_{opt},{\bar {c}})$.
$d(v,{\bar {c}})=d(v,{\mathcal {C}}_{opt})\leq r^{opt}(\mathbf {V} ,k)$
Similarly, $d({\bar {k}},{\bar {c}})=d({\bar {k}},{\mathcal {C}}_{opt})\leq r^{opt}$
By the triangle inequality: $d(v,{\bar {k}})\leq d(v,{\bar {c}})+d({\bar {c}},{\bar {k}})\leq 2r^{opt}$

Case 2: There are two centers ${\bar {k}}$ and ${\bar {u}}$ of $\mathbf {K}$ that are both in $\Pi ({\mathcal {C}}_{opt},{\bar {c}})$, for some ${\bar {c}}\in {\mathcal {C}}_{opt}$ (by the pigeonhole principle, this is the only other possibility)

Assume, without loss of generality, that ${\bar {u}}$ was added later to the center set $\mathbf {K}$ by the greedy algorithm, say in the $i$-th iteration.
Since the greedy algorithm always chooses the point farthest from the current set of centers, we have ${\bar {k}}\in {\mathcal {C}}_{i-1}$ and

${\begin{aligned}r^{\mathbf {K} }(\mathbf {V} )\leq r^{{\mathcal {C}}_{i-1}}(\mathbf {V} )&=d({\bar {u}},{\mathcal {C}}_{i-1})\\&\leq d({\bar {u}},{\bar {k}})\\&\leq d({\bar {u}},{\bar {c}})+d({\bar {c}},{\bar {k}})\\&\leq 2r^{opt}\end{aligned}}$ ^[1]

Another 2-factor approximation algorithm

Another algorithm with the same approximation factor takes advantage of the fact that the k-center problem is equivalent to finding the smallest index i such that G_i has a dominating set of size at most k. It computes a maximal independent set of the square of G_i (the graph in which two vertices are adjacent if they are joined by a path of at most two edges in G_i), looking for the smallest index i for which this maximal independent set has size at most k. ^[4]

It is not possible to find an approximation algorithm with an approximation factor of 2 − ε for any ε > 0, unless P = NP. ^[5] Furthermore, the distances of all edges in G must satisfy the triangle inequality if the k-center problem is to be approximated within any constant factor, unless P = NP. ^[6]
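A minimal Python sketch of a threshold-based algorithm of this kind is given below, following the textbook formulation that works on the square of each threshold graph: for each candidate distance t, a maximal independent set is built greedily in the square of the graph containing all edges of length at most t, and the first such set of size at most k is returned. The function names, the distance-matrix input `D`, and the choice to iterate over distinct distances rather than individual edges are assumptions made for this example; k is assumed to be at least 1.

```python
def squared_adjacent(D, u, v, t):
    """True if u and v are joined by at most two edges of length <= t."""
    if D[u][v] <= t:
        return True
    return any(D[u][w] <= t and D[w][v] <= t for w in range(len(D)))


def threshold_kcenter(D, k):
    """Return (center indices, threshold t) for a symmetric matrix D."""
    n = len(D)
    thresholds = sorted({D[u][v] for u in range(n) for v in range(u + 1, n)})
    for t in [0] + thresholds:
        # Greedy maximal independent set in the square of the threshold graph.
        chosen = []
        for v in range(n):
            if not any(squared_adjacent(D, v, c, t) for c in chosen):
                chosen.append(v)
        if len(chosen) <= k:
            return chosen, t
    raise ValueError("k must be at least 1")


# Hypothetical instance: points on a line, distance = absolute difference.
pts = [0, 1, 2, 10, 11, 12]
D = [[abs(a - b) for b in pts] for a in pts]
centers, t = threshold_kcenter(D, 2)
radius = max(min(D[v][c] for c in centers) for v in range(len(D)))
```

Every vertex is within two edges of length at most t of some chosen center, so by the triangle inequality the resulting radius is at most 2t. In this sketch each threshold costs cubic time, which is acceptable for illustration but not tuned for large inputs.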

Parameterized approximations

It can be shown that the k-Center problem is W[2]-hard to approximate within a factor of 2 − ε for any ε > 0, when using k as the parameter.^[7] This is also true when parameterizing by the doubling dimension (in fact the dimension of a Manhattan metric), unless P=NP.^[8] When considering the combined parameter given by k and the doubling dimension, k-Center is still W[1]-hard but it is possible to obtain a parameterized approximation scheme.^[9] This is even possible for the variant with vertex capacities, which bound how many vertices can be assigned to an opened center of the solution.^[10]

References

1. Har-Peled, Sariel (2011). Geometric Approximation Algorithms. Boston, MA, USA: American Mathematical Society. ISBN 978-0821849118.
2. Vazirani, Vijay V. (2003), Approximation Algorithms, Berlin: Springer, pp. 47–48, ISBN 3-540-65367-8
3. Gonzalez, Teofilo F. (1985), "Clustering to minimize the maximum intercluster distance", Theoretical Computer Science, vol. 38, Elsevier Science B.V., pp. 293–306, doi:10.1016/0304-3975(85)90224-5
4. Hochbaum, Dorit S.; Shmoys, David B. (1986), "A unified approach to approximation algorithms for bottleneck problems", Journal of the ACM, vol. 33, pp. 533–550, doi:10.1145/5925.5933, ISSN 0004-5411, S2CID 17975253
5. Hochbaum, Dorit S. (1997), Approximation Algorithms for NP-Hard problems, Boston: PWS Publishing Company, pp. 346–398, ISBN 0-534-94968-1
6. Crescenzi, Pierluigi; Kann, Viggo; Halldórsson, Magnús; Karpinski, Marek; Woeginger, Gerhard (2000), "Minimum k-center", A Compendium of NP Optimization Problems
7. Feldmann, Andreas Emil (2019-03-01). "Fixed-Parameter Approximations for k-Center Problems in Low Highway Dimension Graphs" (PDF). Algorithmica. 81 (3): 1031–1052. doi:10.1007/s00453-018-0455-0. ISSN 1432-0541. S2CID 46886829.
8. Feder, Tomás; Greene, Daniel (1988-01-01). "Optimal algorithms for approximate clustering". Proceedings of the twentieth annual ACM symposium on Theory of computing - STOC '88. New York, NY, USA: Association for Computing Machinery. pp. 434–444. doi:10.1145/62212.62255. ISBN 978-0-89791-264-8. S2CID 658151.
9. Feldmann, Andreas Emil; Marx, Dániel (2020-07-01). "The Parameterized Hardness of the k-Center Problem in Transportation Networks" (PDF). Algorithmica. 82 (7): 1989–2005. doi:10.1007/s00453-020-00683-w. ISSN 1432-0541. S2CID 3532236.
10. Feldmann, Andreas Emil; Vu, Tung Anh (2022). "Generalized $k$-Center: Distinguishing Doubling and Highway Dimension". In Bekos, Michael A.; Kaufmann, Michael (eds.). Graph-Theoretic Concepts in Computer Science. Lecture Notes in Computer Science. Vol. 13453. Cham: Springer International Publishing. pp. 215–229. arXiv:2209.00675. doi:10.1007/978-3-031-15914-5_16. ISBN 978-3-031-15914-5.
