Proving the Height of a Red-Black Tree is Less Than or Equal to \(2log_2(n+1)\)
Published:
With this theorem holding, we can ensure the time complexity of the search, insertion, and deletion operations within a red-black tree is \(O(log(n))\). We’re going to prove this theorem by induction, starting with a leaf node as the base case.
Question
- if \(T\) is a perfect binary tree, \(n = 2^{h+1}-1\)
- if \(T\) is a complete binary tree, \(n \ge 2^h\)
Can we develop a similar theorem for red-black tree?
Theorem
For a red-black tree with \(n\) number of internal nodes, \(h\) is less than or equal to \(2log_2(n+1)\), where \(h\) is its height.
Notice that it’s more practical to restrict the target of the counting to internal nodes, as the external nodes of the red-black tree don’t hold actual keys.
Idea of the Proof
\(h \le 2log_2(n+1) \iff n \ge 2^{\frac{h}{2}} - 1\), so we’re going to prove that given \(h\), \(n\) is greater than or equal to \(2^{\frac{h}{2}}-1\). The reason is that, in the context of tree data structures, we usually regulate the height to have certain properties. This, consequently, provides some height-relevant axioms, which can facilitate the proof.
Preliminary
- a tree data structure \(T\)’s height, \(height(T.root)\), is the number of edges contained in the longest path from the root to a leaf
- the black height of \(T\), \(bh(T.root)\), is the number of black nodes, excluding the root, on any path from the root to a leaf
- the color of the root and the leaf node of a red-black tree is always black
Proposition
\(bh(root) \ge \frac{h}{2}\)
Proof of the Proposition
\(\begin{align*} bh(root) &= h - \text{number of red nodes on the path with length } h\\ &\ge h - \underbrace{\lfloor \frac{h}{2} \rfloor}_{\text{maximum number of red nodes}} \\ &\ge h - \frac{h}{2} \\ &= \frac{h}{2} \end{align*}\)
Note:
Considering the longest path from the root to a leaf, the number of the nodes on this path, excluding the root, is \(h\), and since there are no consecutive red nodes and the leaf node is always black, the maximum number of the red nodes is \(\lfloor \frac{h}{2} \rfloor\).
Lemma
Consider a node \(x\), and the total number of internal nodes in the subtree rooted at \(x\) is \(n\). Then, \(n \ge 2^{ bh(x) } -1\).
Proof of the Lemma
When the black height is equal to 0, which is a leaf node, \(n^{leaf} = 1\). The inequality holds for leaf: \(n^{leaf} = 1 \ge 2^{bh(leaf)}- 1 = 2^0-1 = 0\).
Suppose for any pair of node \(x\) and it’s child \(y_i, i \in \{1, 2\}\), representing the left and right child respectively, the inequality holds for \(y_i\): \(n^{y_i} \ge 2^{bh(y_i)} - 1\).
Notice that if \(y_i\) is red, \(bh(y_i) = bh(x)\), and if \(y_i\) is black, \(bh(y_i) = bh(x) - 1\). Hence, \(bh(y_i) \ge bh(x) - 1\).
Finally, \(n^x = n^{y_1} + n^{y_2} +1 \ge 2^{bh(y_1)} -1 + 2^{bh(y_2)} - 1 + 1 \ge 2 * 2^{bh(x)-1} - 1 = 2^{bh(x)} -1\).
Note: “+1” term in \(n^{y_1} + n^{y_2} +1\) means considering \(x\) itself.
Proved by induction, for all node \(x\), the total number of nodes in subtree rooted at \(x\) is greater than or equal to \(2^{bh(x)} - 1\).
Proof of the Theorem
Combine the propostion with the lemma, \(n = n^{root} \ge 2^{bh(root)} - 1 \ge 2^{\frac{h}{2}} - 1, \Rightarrow h \le 2log(n+1)\)