The Tree Data Structures Every Developer Should Know

Trees are everywhere in software: file systems, databases, compilers, network routers. Yet many developers only have a vague understanding of the different tree variants and their tradeoffs. Let’s fix that.

Binary Search Trees: The Foundation

A binary search tree (BST) maintains the invariant that for every node, all values in the left subtree are smaller and all values in the right subtree are larger.

class BSTNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def search(node, target):
    if node is None:
        return None
    if target < node.value:
        return search(node.left, target)
    elif target > node.value:
        return search(node.right, target)
    return node

The beauty of BSTs is that search, insert, and delete are all O(log n)—if the tree is balanced. An unbalanced tree degrades to a linked list.

When to use: When you need ordered data with fast search, insertion, and deletion. Range queries (“find all values between 10 and 20”) are efficient.

When to avoid: When you don’t control insertion order. Inserting sorted data creates a completely unbalanced tree.

Self-Balancing Trees: Guaranteed Performance

To avoid degeneration, self-balancing trees automatically restructure during insertions and deletions.

Red-Black Trees

Each node is colored red or black, with rules that ensure no path from root to leaf is more than twice as long as any other.

Red-black trees are the workhorse of standard libraries. Java’s TreeMap, C++’s std::map, and Linux’s process scheduler all use them.

Tradeoff: Slightly slower insertions than unbalanced BSTs (due to rebalancing), but guaranteed O(log n) worst case.

AVL Trees

Stricter balancing than red-black trees: the heights of left and right subtrees differ by at most one. This means faster lookups but more rotations during modifications.

When to prefer AVL: Read-heavy workloads where lookup speed matters more than insertion speed.

B-Trees

Unlike binary trees, B-trees have many children per node (often hundreds). This minimizes disk seeks, making them ideal for databases and file systems.

When you query a database index, you’re almost certainly traversing a B-tree or its variant, the B+ tree.

When to use: Any time data lives on disk. The branching factor is tuned to match disk block sizes.

Heaps: Priority Queues Made Simple

A heap is a complete binary tree where each parent is smaller (min-heap) or larger (max-heap) than its children. Unlike BSTs, there’s no left-right ordering between siblings.

import heapq

tasks = []
heapq.heappush(tasks, (1, "urgent"))
heapq.heappush(tasks, (3, "low priority"))
heapq.heappush(tasks, (2, "normal"))

while tasks:
    priority, task = heapq.heappop(tasks)
    print(f"Processing: {task}")

The magic of heaps is that the minimum (or maximum) element is always at the root, accessible in O(1). Insertion and removal are O(log n).

When to use: Priority queues, finding k largest/smallest elements, heap sort, Dijkstra’s algorithm.

Implementation note: Heaps are typically stored as arrays, not linked nodes. The children of index i are at 2i+1 and 2i+2. This gives excellent cache performance.

Tries: Prefix Trees for Strings

A trie stores strings character by character, with shared prefixes sharing nodes.

        root
       / | \
      c  d  t
      |  |  |
      a  o  o
     /|  |  |
    r t  g  p
    |
    s

This trie contains: “car”, “cars”, “cat”, “dog”, “top”.

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end = True

    def starts_with(self, prefix):
        node = self.root
        for char in prefix:
            if char not in node.children:
                return False
            node = node.children[char]
        return True

When to use: Autocomplete, spell checkers, IP routing tables, dictionary implementations. Any time you need prefix matching.

Space consideration: Tries can use significant memory. Consider compressed tries (radix trees) for long strings with few shared prefixes.

Segment Trees: Range Query Specialists

When you need to answer range queries (“sum of elements from index 3 to 7”) and also update elements, segment trees shine.

Each node represents a range. The root covers the entire array; leaves cover individual elements. Queries and updates are both O(log n).

When to use: Range sum queries, range minimum/maximum queries, computational geometry. Common in competitive programming.

Choosing the Right Tree

Problem	Best Tree
Sorted key-value store	Red-Black Tree
Database index	B+ Tree
Priority queue	Heap
Autocomplete	Trie
Range queries with updates	Segment Tree
Read-heavy ordered data	AVL Tree

The Meta-Lesson

Trees are about tradeoffs. Stricter balancing means faster reads but slower writes. Higher branching factors mean shallower trees but more complex nodes. Dense arrays mean better cache performance but fixed sizing.

Understanding these tradeoffs—not just memorizing implementations—is what separates developers who choose the right data structure from those who default to whatever’s convenient.

Contents