Linear Algebra

Linear Independence

The redundancy test: does any vector carry no new information?

01 · First principles
The question: is anything redundant?

You have a set of vectors {v₁, …, vₖ}. The question linear independence answers is blunt: can any one of them be built from the others? If yes, that vector adds no new direction: throw it away and the span does not shrink. If no vector is buildable from the rest, the set is independent: every member earns its place.

Without this concept we cannot answer: how many directions do these vectors actually cover? Is this basis really a basis? Do my features measure k different things or fewer? Independence is the test that separates apparent count from true count.

Slogan: independent = no redundancy. Dependent = at least one vector is a remix of the others.

02 · The definition
Why the textbook phrasing is the same test

The textbook says: v₁, …, vₖ are linearly independent when the only way to combine them into zero is the trivial way.

c₁v₁ + c₂v₂ + … + cₖvₖ = 0  ⇒  c₁ = c₂ = … = cₖ = 0
the only recipe for zero is "use nothing"

This is the redundancy test in disguise. Suppose some nontrivial combination gives zero with, say, c₁ ≠ 0. Divide through by c₁ and rearrange:

v₁ = −(c₂/c₁)v₂ − … − (cₖ/c₁)vₖ

A nontrivial recipe for zero is exactly a recipe for one vector in terms of the others. The two statements are one statement. (The zero-combination form is preferred only because it treats all vectors symmetrically — it does not need to nominate a culprit.)

Geometrically: two vectors are dependent when they lie on one line through the origin; three are dependent when they lie in one plane through the origin. In general, k dependent vectors span fewer than k dimensions: the set fails to escape a lower-dimensional flat.
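The equivalence argued in this section can be checked numerically. The sketch below is only an illustration: the vectors v1, v2, v3 are made up for the example, and the SVD is used as one convenient way to find a nontrivial recipe for zero, not as the method the definition prescribes. It then divides by the largest coefficient and rebuilds the corresponding vector from the others.

```python
import numpy as np

# Hypothetical example vectors: v3 is built as 2*v1 - v2, so the set is dependent.
v1 = np.array([1.0, 0.0, 2.0])
v2 = np.array([0.0, 1.0, 1.0])
v3 = 2.0 * v1 - v2

A = np.column_stack([v1, v2, v3])

# The right singular vector belonging to the smallest singular value is a
# nontrivial recipe for zero: A @ c is (numerically) the zero vector.
_, s, Vt = np.linalg.svd(A)
c = Vt[-1]
residual = np.linalg.norm(A @ c)

# Divide by the coefficient of largest magnitude and rearrange
# c1*v1 + c2*v2 + c3*v3 = 0 into v_j = -(sum of the other terms) / c_j.
j = int(np.argmax(np.abs(c)))
others = [i for i in range(A.shape[1]) if i != j]
rebuilt = -sum(c[i] * A[:, i] for i in others) / c[j]

print("smallest singular value:", s[-1])
print("||A c|| =", residual)
print("column", j, "rebuilt from the others:", np.allclose(rebuilt, A[:, j]))
```

Dividing by the largest coefficient rather than always by c₁ is a numerical convenience; any nonzero coefficient would do, which is the symmetry the zero-combination form is meant to capture.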

03 · The payoff
Unique representation

Why insist on independence? Because it buys uniqueness. If b is in the span of an independent set, there is exactly one recipe for it. Proof in one line: subtracting two distinct recipes for b gives a recipe for zero whose coefficients are not all zero, which independence forbids.

This is what makes coordinates meaningful. A basis is an independent set that spans the space, and "the coordinates of x in this basis" is a well-posed phrase only because independence guarantees a single answer. With a dependent spanning set, every vector has infinitely many representations and the word "coordinate" stops meaning anything.

Spanning gives existence of a recipe; independence gives uniqueness. A basis is the set that gives you both with not one vector to spare.
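A small numerical sketch of existence versus uniqueness follows. The basis B, the target b, and the redundant third column are all invented for illustration. With the independent pair there is one coordinate vector; once a redundant column is added, two different recipes reach the same b.

```python
import numpy as np

# Hypothetical basis of R^2 (independent columns) and a target vector b.
B = np.column_stack([[1.0, 1.0], [1.0, -1.0]])
b = np.array([3.0, 1.0])

# Independent and spanning: exactly one coordinate vector solves B @ x = b.
coords = np.linalg.solve(B, b)

# Add a redundant third column (the sum of the first two). The set still
# spans R^2, but recipes for b are no longer unique.
D = np.column_stack([B, B[:, 0] + B[:, 1]])
recipe_1 = np.append(coords, 0.0)        # ignore the extra column
null_dir = np.array([1.0, 1.0, -1.0])    # D @ null_dir = 0 by construction
recipe_2 = recipe_1 + 2.5 * null_dir     # any multiple gives another recipe

print("unique coordinates:", coords)
print("both recipes hit b:",
      np.allclose(D @ recipe_1, b), np.allclose(D @ recipe_2, b))
```

The factor 2.5 is arbitrary: every multiple of null_dir yields yet another valid recipe, which is the "infinitely many representations" of a dependent spanning set.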

04 · The test in practice
How you actually check

Stack the vectors as columns of a matrix A. Independence of the columns is a statement about that matrix, which connects this note to its siblings:

Statement about columns  | Same fact about A                              | Where it lives
-------------------------|------------------------------------------------|---------------------
Columns independent      | rank(A) = k (full column rank)                 | Rank and Span
Columns independent      | Ax = 0 only for x = 0 (trivial null space)     | Null Space
Square case, independent | det(A) ≠ 0, A invertible                       | Determinant, Inverse
Columns dependent        | Some singular value is 0 (or ≈ 0 numerically)  | Singular Matrices

Numerically nobody tests "exactly dependent"; floating point makes exact zeros rare. The honest tool is the smallest singular value of A: near zero, relative to the largest singular value, means nearly dependent, which in practice causes the same trouble as dependent. One case needs no computation: more than n vectors in ℝⁿ are always dependent, because you cannot fit n + 1 genuinely new directions into n dimensions.
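One way the singular-value check might look in NumPy is sketched below. The helper name column_dependence_report and the threshold rel_tol are assumptions made for this example, not standard names or a universal constant, and the helper assumes A is not the zero matrix.

```python
import numpy as np

def column_dependence_report(A, rel_tol=1e-10):
    """Summarize how close the columns of A are to dependent.

    rel_tol is an illustrative threshold; assumes A is not all zeros.
    """
    s = np.linalg.svd(A, compute_uv=False)   # singular values, descending
    k = A.shape[1]
    sigma_max = s[0]
    # With more columns than rows there are fewer than k singular values;
    # the missing ones are exactly zero.
    sigma_min = s[-1] if len(s) == k else 0.0
    numerical_rank = int(np.sum(s > rel_tol * sigma_max))
    return {
        "sigma_min": sigma_min,
        "ratio": sigma_min / sigma_max,
        "numerical_rank": numerical_rank,
        "independent": numerical_rank == k,
    }

independent = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
nearly_dependent = np.array([[1.0, 1.0], [1.0, 1.0 + 1e-13], [1.0, 1.0]])
too_many = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])  # 3 vectors in R^2

for name, M in [("independent", independent),
                ("nearly dependent", nearly_dependent),
                ("too many", too_many)]:
    print(name, column_dependence_report(M))
```

The comparison is relative (rel_tol times the largest singular value) so that rescaling all the vectors does not change the verdict. Where to set the threshold depends on the noise level of the data.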

05 · Why ML cares
Multicollinearity: dependent features

In linear regression, the columns of the design matrix X are features. A feature that is (nearly) a linear combination of others — temperature in °C and in °F, or "total" alongside its parts — carries no new information, and the damage is concrete:

  1. Weights become unidentifiable. If x₃ = x₁ + x₂, then weights (w₁, w₂, w₃) and (w₁ + c, w₂ + c, w₃ − c) produce identical predictions. The data cannot distinguish them: infinitely many solutions, exactly the non-uniqueness of section 03.
  2. XᵀX becomes singular or ill-conditioned. The normal equations (XᵀX)w = Xᵀy involve inverting XᵀX, and dependent columns make it singular. Nearly dependent columns make the solution wildly sensitive: huge weights of opposite sign that cancel.
  3. Interpretation dies first. Predictions can remain fine while individual coefficients (and their reported significance) become meaningless.
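The first two items can be reproduced on synthetic data. The sketch below uses made-up random features with x₃ = x₁ + x₂. The seed, sample size, weights, and perturbation scale are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Hypothetical design matrix: the third feature is exactly x1 + x2.
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
X = np.column_stack([x1, x2, x1 + x2])

# Item 1: shifting weights along (c, c, -c) leaves every prediction unchanged.
w = np.array([1.0, -2.0, 0.5])
c = 7.0
w_shifted = w + c * np.array([1.0, 1.0, -1.0])
same_predictions = np.allclose(X @ w, X @ w_shifted)

# Item 2: X^T X has a (numerically) zero eigenvalue, so it is singular.
eigvals = np.linalg.eigvalsh(X.T @ X)   # ascending order
print("same predictions:", same_predictions)
print("smallest eigenvalue of X^T X:", eigvals[0])

# Near-dependence: perturb the third feature slightly and inspect the
# condition number of the Gram matrix.
X_near = np.column_stack([x1, x2, x1 + x2 + 1e-6 * rng.normal(size=50)])
cond_near = np.linalg.cond(X_near.T @ X_near)
print("cond(X_near^T X_near):", cond_near)
```

A very large condition number is the numerical signature of the sensitivity described in item 2: small changes in y can move the solved weights by a large amount.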

The standard fix is ridge regression: solve (XᵀX + λI)w = Xᵀy with λ > 0. Because XᵀX has no negative eigenvalues, adding λI raises every eigenvalue to at least λ, which restores a unique, stable answer and buys identifiability at the price of a little shrinkage. The same near-dependence story, viewed through eigenvalues, is the subject of Singular Matrices.
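A minimal sketch of the ridge solve, under stated assumptions: the helper name ridge_weights, the synthetic data, and the value of λ are all illustrative. In practice λ would usually be chosen by validation, and features are often standardized first.

```python
import numpy as np

def ridge_weights(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y; assumes lam > 0."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
X = np.column_stack([x1, x2, x1 + x2])   # exactly dependent columns
y = 2.0 * x1 - 1.0 * x2 + 0.1 * rng.normal(size=100)

lam = 1e-2  # illustrative value, not a recommendation
w = ridge_weights(X, y, lam)

# Every eigenvalue of X^T X + lam*I is at least lam, so the system is solvable
# even though X^T X alone is singular.
eig_min = np.linalg.eigvalsh(X.T @ X + lam * np.eye(3))[0]
print("ridge weights:", w)
print("smallest eigenvalue after the shift:", eig_min)
```

Solving the shifted system with np.linalg.solve avoids forming an explicit inverse; the design matrix here is exactly dependent, so the unregularized normal equations would have no unique solution.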

Mental Model