This is an overview of core algorithms and analysis techniques likely seen in CS: sorting, graphs, unionfind and MST, greedy vs dynamic programming, string matching, and an intro to NP.
Asymptotic Analysis & Techniques
- Big O, Big Theta, Big Omega
Describe how runtime or memory grows with input size
n. For algorithms we care about the worst case O(…). - Common runtimes
- O(1) -> constant
- O(log n) -> binary search
- O(n) -> linear scan
- O(n log n) -> good comparison sorts, many divide & conquer algorithms
- O(n²) -> simple nested loops (naive DP, insertion sort worst case)
- O(2ⁿ), O(n!) -> exponential / factorial (brute force subset / permutation search)
- Recurrences
Divide & conquer algorithms are often expressed as recurrences
T(n) = 2T(n/2) + O(n)for mergesort, solved using the Master theorem. - Amortized analysis
Average cost per operation over a sequence, even if single operations may be expensive (example: dynamic array resize, unionfind path compression).
Sorting Algorithms
Insertion Sort
- Idea
Build the sorted array one element at a time by inserting each element into the already sorted prefix.
- Complexity
- Best: O(n) (already sorted)
- Average/Worst: O(n²)
- In place, stable
- C++ example
void insertion_sort(std::vector<int>& v) {
int n = (int)v.size();
for (int i = 1; i < n; ++i) {
int key = v[i];
int j = i - 1;
while (j >= 0 && v[j] > key) {
v[j + 1] = v[j];
--j;
}
v[j + 1] = key;
}
}
Mergesort
- Idea
Divide array into halves, sort each half recursively, then merge two sorted halves.
- Complexity
- Always O(n log n)
- Stable, not in place (needs O(n) extra space)
void merge(std::vector<int>& a, int l, int m, int r) {
std::vector<int> tmp;
int i = l, j = m + 1;
while (i <= m && j <= r) {
if (a[i] <= a[j]) tmp.push_back(a[i++]);
else tmp.push_back(a[j++]);
}
while (i <= m) tmp.push_back(a[i++]);
while (j <= r) tmp.push_back(a[j++]);
for (int k = 0; k < (int)tmp.size(); ++k) {
a[l + k] = tmp[k];
}
}
void mergesort(std::vector<int>& a, int l, int r) {
if (l >= r) return;
int m = l + (r - l) / 2;
mergesort(a, l, m);
mergesort(a, m + 1, r);
merge(a, l, m, r);
}
Quicksort
- Idea
Choose a pivot, partition array into < pivot and >= pivot, recursively sort partitions.
- Complexity
- Average: O(n log n)
- Worst: O(n²) (bad pivot choices, e.g. sorted input with naive pivot)
- In place, not stable; great constants -> used in practice
int partition(std::vector<int>& a, int l, int r) {
int pivot = a[r];
int i = l;
for (int j = l; j < r; ++j) {
if (a[j] < pivot) {
std::swap(a[i], a[j]);
++i;
}
}
std::swap(a[i], a[r]);
return i;
}
void quicksort(std::vector<int>& a, int l, int r) {
if (l >= r) return;
int p = partition(a, l, r);
quicksort(a, l, p - 1);
quicksort(a, p + 1, r);
}
Heapsort
- Idea
Build a max heap from the array, repeatedly extract the maximum to the end.
- Complexity
- O(n log n) worst case
- In place, not stable
void heapify(std::vector<int>& a, int n, int i) {
int largest = i;
int L = 2 * i + 1, R = 2 * i + 2;
if (L < n && a[L] > a[largest]) largest = L;
if (R < n && a[R] > a[largest]) largest = R;
if (largest != i) {
std::swap(a[i], a[largest]);
heapify(a, n, largest);
}
}
void heapsort(std::vector<int>& a) {
int n = (int)a.size();
for (int i = n / 2 - 1; i >= 0; --i)
heapify(a, n, i);
for (int i = n - 1; i > 0; --i) {
std::swap(a[0], a[i]);
heapify(a, i, 0);
}
}
Graphs & Traversals
- Representation
- Adjacency list:
vector<vector<int>>-> good for sparse graphs. - Adjacency matrix:
vector<vector<bool>>-> simple, but O(n²) memory.
- Adjacency list:
- BFS (Breadth First Search)
Level by level traversal using a queue. Finds shortest path in unweighted graphs.
- DFS (Depth First Search)
Goes as deep as possible along each branch using recursion or a stack. Used for cycle detection, topological sort, connected components.
BFS Example (Shortest Path in Unweighted Graph)
std::vector<int> bfs_shortest_paths(
const std::vector<std::vector<int>>& adj, int start) {
int n = (int)adj.size();
std::vector<int> dist(n, -1);
std::queue<int> q;
dist[start] = 0;
q.push(start);
while (!q.empty()) {
int u = q.front(); q.pop();
for (int v : adj[u]) {
if (dist[v] == -1) {
dist[v] = dist[u] + 1;
q.push(v);
}
}
}
return dist;
}
Dijkstra's Algorithm (Shortest Path)
- Use case
Single source shortest paths on a weighted graph with non negative edge weights.
- Complexity
- Using binary heap (priority queue): O((V + E) log V)
- Key idea
Greedily expand the node with the smallest current distance, once popped from the priority queue, its distance is final.
using Edge = std::pair<int,int>; // (neighbor, weight)
std::vector<int> dijkstra(
const std::vector<std::vector<Edge>>& adj, int src) {
const int INF = 1e9;
int n = (int)adj.size();
std::vector<int> dist(n, INF);
dist[src] = 0;
using State = std::pair<int,int>; // (dist, node)
std::priority_queue<State, std::vector<State>,
std::greater<State>> pq;
pq.push({0, src});
while (!pq.empty()) {
auto [d, u] = pq.top(); pq.pop();
if (d != dist[u]) continue; // stale
for (auto [v, w] : adj[u]) {
if (dist[v] > dist[u] + w) {
dist[v] = dist[u] + w;
pq.push({dist[v], v});
}
}
}
return dist;
}
unionfind (Disjoint Set Union, DSU)
- Purpose
Maintain a collection of disjoint sets with two main operations:
find(x)(which set?) andunion(x,y)(merge sets). Used heavily in Kruskal's MST algorithm. - Optimizations
- Path compression in
find() - Union by rank/size
- Amortized almost O(1) per operation
- Path compression in
struct DSU {
std::vector<int> parent, sz;
DSU(int n) : parent(n), sz(n, 1) {
std::iota(parent.begin(), parent.end(), 0);
}
int find(int x) {
if (parent[x] != x)
parent[x] = find(parent[x]); // path compression
return parent[x];
}
bool unite(int a, int b) {
a = find(a); b = find(b);
if (a == b) return false;
if (sz[a] < sz[b]) std::swap(a, b);
parent[b] = a;
sz[a] += sz[b];
return true;
}
};
Minimum Spanning Trees (MST)
For a connected, weighted, undirected graph, an MST is a subset of edges connecting all vertices with minimum total weight and no cycles.
Kruskal's Algorithm (with unionfind)
- Idea
- Sort all edges by weight.
- Process edges from smallest to largest.
- Add an edge if it connects two different components (no cycle), using unionfind.
- Complexity
O(E log E) due to sorting, unionfind operations are almost O(1).
struct Edge {
int u, v, w;
};
int kruskal(int n, std::vector<Edge> edges) {
std::sort(edges.begin(), edges.end(),
[](const Edge& a, const Edge& b) {
return a.w < b.w;
});
DSU dsu(n);
int total = 0;
for (auto &e : edges) {
if (dsu.unite(e.u, e.v)) {
total += e.w;
}
}
return total;
}
Prim's Algorithm (Alternative MST)
- Idea
Start from any node, always add the cheapest edge that connects the current tree to a new vertex (greedy, like Dijkstra but with edge weights only).
- Complexity
O((V + E) log V) with a heap.
Topological Sort
- Use case
Ordering of vertices in a directed acyclic graph (DAG) so every edge u->v goes from earlier to later in the order.
- Applications
Task scheduling with dependencies, resolving build order, etc.
- Algorithms
- DFS based: push node to list after exploring all neighbors, then reverse list.
- Kahn’s algorithm: repeatedly remove nodes with in degree 0.
// Kahn's algorithm
std::vector<int> topo_sort(const std::vector<std::vector<int>>& adj) {
int n = (int)adj.size();
std::vector<int> indeg(n, 0);
for (int u = 0; u < n; ++u)
for (int v : adj[u])
++indeg[v];
std::queue<int> q;
for (int i = 0; i < n; ++i)
if (indeg[i] == 0) q.push(i);
std::vector<int> order;
while (!q.empty()) {
int u = q.front(); q.pop();
order.push_back(u);
for (int v : adj[u]) {
if (--indeg[v] == 0)
q.push(v);
}
}
// If order.size() < n, graph had a cycle
return order;
}
Greedy Algorithms
- Key idea
At each step, pick the locally optimal choice and hope it leads to a global optimum. Correctness usually proven using "greedy choice" and "optimal substructure".
- Examples
- Interval scheduling (select maximum number of non overlapping intervals).
- Activity selection, Huffman coding, MST (Kruskal, Prim), Dijkstra.
Greedy Example: Interval Scheduling
Choose maximum number of non overlapping intervals by sorting by end time.
struct Interval {
int start, end;
};
int max_non_overlapping(std::vector<Interval> intervals) {
std::sort(intervals.begin(), intervals.end(),
[](const Interval& a, const Interval& b) {
return a.end < b.end;
});
int count = 0;
int last_end = std::numeric_limits<int>::min();
for (auto &iv : intervals) {
if (iv.start >= last_end) {
++count;
last_end = iv.end;
}
}
return count;
}
Dynamic Programming (DP)
- Key ideas
- Optimal substructure: optimal solution is composed of optimal sub solutions.
- Overlapping subproblems: same subproblems appear many times.
- Patterns
- 1D DP (e.g., Fibonacci, LIS length).
- 2D DP (e.g., edit distance, knapsack, DP on grid).
- Approaches
- Top down (memoization)
- Bottom up (tabulation)
DP Example: 0/1 Knapsack (Bottom Up)
int knapsack(const std::vector<int>& w,
const std::vector<int>& v,
int W) {
int n = (int)w.size();
std::vector<std::vector<int>> dp(n + 1, std::vector<int>(W + 1, 0));
for (int i = 1; i <= n; ++i) {
for (int cap = 0; cap <= W; ++cap) {
dp[i][cap] = dp[i - 1][cap]; // skip item
if (cap - w[i - 1] >= 0) {
dp[i][cap] = std::max(dp[i][cap],
dp[i - 1][cap - w[i - 1]] + v[i - 1]);
}
}
}
return dp[n][W];
}
String Matching
- Problem
Given text T and pattern P, find all occurrences of P in T.
- Naive algorithm
Try aligning P at each position in T -> O(nm) worst case.
- Better algorithms
- KMP (Knuth Morris Pratt): builds prefix function, O(n + m).
- Boyer Moore / Boyer Moore Horspool: skips ahead using character heuristics, good in practice.
- Aho Corasick: multi pattern matching, builds automaton (trie + failure links).
Naive String Matching
std::vector<int> find_naive(
const std::string& text, const std::string& pat) {
std::vector<int> pos;
int n = (int)text.size();
int m = (int)pat.size();
for (int i = 0; i + m <= n; ++i) {
int j = 0;
while (j < m && text[i + j] == pat[j]) {
++j;
}
if (j == m) pos.push_back(i);
}
return pos;
}
KMP Prefix Function
std::vector<int> build_lps(const std::string& pat) {
int m = (int)pat.size();
std::vector<int> lps(m, 0);
int len = 0;
for (int i = 1; i < m; ) {
if (pat[i] == pat[len]) {
lps[i++] = ++len;
} else if (len > 0) {
len = lps[len - 1];
} else {
lps[i++] = 0;
}
}
return lps;
}
std::vector<int> kmp_search(
const std::string& text, const std::string& pat) {
std::vector<int> pos;
if (pat.empty()) return pos;
auto lps = build_lps(pat);
int i = 0, j = 0;
while (i < (int)text.size()) {
if (text[i] == pat[j]) {
++i; ++j;
if (j == (int)pat.size()) {
pos.push_back(i - j);
j = lps[j - 1];
}
} else if (j > 0) {
j = lps[j - 1];
} else {
++i;
}
}
return pos;
}
Intro to NP, NP Complete, NP Hard
- P
Class of decision problems solvable in polynomial time by a deterministic machine.
- NP
Class of decision problems whose “yes” solutions can be verified in polynomial time given a certificate (or solvable in poly time by a nondeterministic machine).
- NP Complete
Hardest problems in NP: every NP problem can be reduced to them in polynomial time, and they are themselves in NP (SAT, 3-SAT, Clique, Hamiltonian path).
- NP Hard
At least as hard as NP complete, but not necessarily in NP (may not even have efficiently verifiable certificates).
- P vs NP
Open problem: is P = NP or P ≠ NP? We don’t know.
- Reductions
Show problem A is at least as hard as B by transforming instances of B into instances of A in polynomial time. Used to prove NP completeness.
Takeaways
- Sorting: Know insertion sort (concept), mergesort, quicksort, heapsort and their time/space trade offs.
- Graphs: Comfort with BFS/DFS, Dijkstra, topological sort, and basic representations.
- MST + unionfind: Kruskal uses unionfind, Prim is an alternative based on greedy expansion.
- Greedy vs DP: Greedy = local choices, DP = explicit subproblem states & transitions.
- String Matching: Naive is simple, KMP and BM/BMH/AC matter when performance matters.
- Complexity & NP: Understand Big O, recurrences, and the high level meaning of P, NP, NP complete.