Linear Algebra

Linear Independence

The redundancy test: does any vector carry no new information?

01 · First principles
The question: is anything redundant?

You have a set of vectors {v₁, …, v_k}. The question linear independence answers is blunt: can any one of them be built from the others? If yes, that vector adds no new direction — throw it away and the span does not shrink. If no vector is buildable from the rest, the set is independent: every member earns its place.
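The redundancy test can be run literally: drop each vector in turn and check whether the span's dimension shrinks. Below is a minimal sketch in plain Python. It assumes the vectors are given as a list of tuples of integers or fractions. `rank` and `redundant_vectors` are illustrative names, not a library API. Exact `Fraction` arithmetic is used so that "zero" really means zero; floating point is a separate story.

```python
from fractions import Fraction


def rank(vectors):
    """Dimension of the span of `vectors`, by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(ncols):
        # Look for a row at or below r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Clear this column from every row below the pivot.
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def redundant_vectors(vectors):
    """Indices of vectors that can be removed without shrinking the span."""
    full = rank(vectors)
    return [i for i in range(len(vectors))
            if rank(vectors[:i] + vectors[i + 1:]) == full]
```

Two examples show the behaviour:

- `redundant_vectors([(1, 0, 0), (0, 1, 0), (1, 1, 0)])` returns `[0, 1, 2]`. Each vector is buildable from the other two.
- `redundant_vectors([(1, 0, 0), (0, 1, 0)])` returns `[]`. The set is independent, so every member earns its place.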

Without this concept we cannot answer: how many directions do these vectors actually cover? Is this basis really a basis? Do my features measure k different things or fewer? Independence is the test that separates apparent count from true count.

Slogan: independent = no redundancy. Dependent = at least one vector is a remix of the others.

02 · The definition
Why the textbook phrasing is the same test

The textbook says: v₁, …, v_k are linearly independent when the only way to combine them into zero is the trivial way.

c₁v₁ + c₂v₂ + … + c_kv_k = 0 ⇒ c₁ = c₂ = … = c_k = 0

the only recipe for zero is "use nothing"

This is the redundancy test in disguise. Suppose some nontrivial combination gives zero with, say, c₁ ≠ 0. Divide through by c₁ and rearrange:

v₁ = −(c₂/c₁)v₂ − … − (c_k/c₁)v_k

A nontrivial recipe for zero is exactly a recipe for one vector in terms of the others. The two statements are one statement. (The zero-combination form is preferred only because it treats all vectors symmetrically — it does not need to nominate a culprit.)
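The rearrangement is mechanical enough to code directly. This is a small sketch with these assumptions:

- The caller supplies coefficients that really do combine the vectors to zero. The function does not check this.
- The first coefficient is nonzero.
- `solve_for_first` is a hypothetical helper name.

```python
from fractions import Fraction


def solve_for_first(coeffs, vectors):
    """Given c1*v1 + ... + ck*vk = 0 with c1 != 0, rebuild v1 from v2..vk.

    Assumes the combination really is zero; this is not checked here.
    """
    c1 = Fraction(coeffs[0])
    if c1 == 0:
        raise ValueError("c1 must be nonzero to solve for v1")
    rebuilt = [Fraction(0)] * len(vectors[0])
    # v1 = -(c2/c1) v2 - ... - (ck/c1) vk
    for c, v in zip(coeffs[1:], vectors[1:]):
        for j, vj in enumerate(v):
            rebuilt[j] -= Fraction(c) / c1 * vj
    return rebuilt
```

Take v₁ = (1, 2, 3), v₂ = (0, 1, 1) and v₃ = (1, 3, 4). The recipe 1·v₁ + 1·v₂ − 1·v₃ = 0 gives v₁ = −v₂ + v₃. Accordingly, `solve_for_first([1, 1, -1], [(1, 2, 3), (0, 1, 1), (1, 3, 4)])` returns a list equal to `[1, 2, 3]`.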

Geometrically: two vectors are dependent when they lie on one line through the origin; three are dependent when they lie in one plane through the origin. Dependence means the set fails to escape a lower-dimensional flat.
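For two vectors in ℝ² or three in ℝ³, the geometric picture reduces to a one-number test:

- In ℝ², the signed area (a 2 × 2 determinant) is zero exactly when the two vectors collapse onto a line through the origin.
- In ℝ³, the signed volume (the scalar triple product) is zero exactly when the three vectors collapse onto a plane through the origin.

The sketch below uses hypothetical function names and assumes integer inputs, so that the `== 0` comparison is exact.

```python
def dependent_2d(u, v):
    """u, v in R^2 are dependent iff the signed area of their parallelogram
    (the 2x2 determinant) is zero."""
    return u[0] * v[1] - u[1] * v[0] == 0


def dependent_3d(u, v, w):
    """u, v, w in R^3 are dependent iff the signed volume u . (v x w)
    (the scalar triple product) is zero."""
    cross = (v[1] * w[2] - v[2] * w[1],
             v[2] * w[0] - v[0] * w[2],
             v[0] * w[1] - v[1] * w[0])
    return sum(a * b for a, b in zip(u, cross)) == 0
```

The pair (1, 1), (2, 1) is a useful contrast. Both points lie on the line y = 1, but that line does not pass through the origin, and the determinant is −1, so the pair is independent.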

03 · The payoff
Unique representation

Why insist on independence? Because it buys uniqueness. If b is in the span of an independent set, there is exactly one recipe for it. Proof in one line: two different recipes for b subtract to a nontrivial recipe for zero, which independence forbids.
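The one-line proof can be replayed on a concrete dependent set. In this example, chosen for illustration, v₃ = v₁ + v₂. So b = (2, 3) has at least two recipes, and their difference is a nontrivial recipe for zero. `combine` is an illustrative helper name.

```python
def combine(coeffs, vectors):
    """Evaluate the linear combination coeffs[0]*vectors[0] + coeffs[1]*vectors[1] + ..."""
    return tuple(sum(c * v[j] for c, v in zip(coeffs, vectors))
                 for j in range(len(vectors[0])))


# A dependent set: v3 = v1 + v2.
vectors = [(1, 0), (0, 1), (1, 1)]
b = (2, 3)

recipe_a = (2, 3, 0)  # 2*v1 + 3*v2
recipe_b = (1, 2, 1)  # 1*v1 + 2*v2 + 1*v3
assert combine(recipe_a, vectors) == b
assert combine(recipe_b, vectors) == b

# Subtracting the two recipes gives a nontrivial recipe for zero.
diff = tuple(x - y for x, y in zip(recipe_a, recipe_b))  # (1, 1, -1)
assert diff != (0, 0, 0)
assert combine(diff, vectors) == (0, 0)
```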

This is what makes coordinates meaningful. A basis is an independent set that spans the space, and "the coordinates of x in this basis" is a well-posed phrase only because independence guarantees a single answer. With a dependent spanning set, every vector has infinitely many representations and the word "coordinate" stops meaning anything.
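In ℝ², the coordinates with respect to a basis {e₁, e₂} can be written out by Cramer's rule. The determinant in the denominator is nonzero exactly when e₁ and e₂ are independent. This is a minimal sketch with these assumptions:

- `coordinates_2d` is an illustrative name.
- The inputs are integers or fractions.
- For larger systems, a general linear solver would replace Cramer's rule.

```python
from fractions import Fraction


def coordinates_2d(basis, x):
    """Coordinates (a, b) with a*e1 + b*e2 = x, for a basis {e1, e2} of R^2.

    Cramer's rule: the determinant is nonzero exactly when e1 and e2 are
    independent, which is what makes the answer unique.
    """
    (p, q), (r, s) = basis  # e1 = (p, q), e2 = (r, s), used as columns
    det = p * s - r * q
    if det == 0:
        raise ValueError("dependent vectors: coordinates are not well defined")
    a = Fraction(x[0] * s - r * x[1], det)
    b = Fraction(p * x[1] - x[0] * q, det)
    return a, b
```

For the basis e₁ = (1, 1), e₂ = (1, −1) and x = (3, 1), the function returns a = 2 and b = 1. Check: 2·(1, 1) + 1·(1, −1) = (3, 1). The dependent pair (1, 2), (2, 4) makes it raise an error rather than pick an answer.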

Spanning gives existence of a recipe; independence gives uniqueness. A basis is the set that gives you both with not one vector to spare.

04 · The test in practice
How you actually check

Stack the vectors as columns of a matrix A. Independence of the columns is a statement about that matrix, which connects this note to its siblings:

Statement about columns	Same fact about A	Where it lives
Columns independent	rank(A) = k (full column rank)	Rank and Span
Columns independent	Ax = 0 only for x = 0 (trivial null space)	Null Space
Square case, independent	det(A) ≠ 0, A invertible	Determinant, Inverse
Columns dependent (k ≤ n)	Smallest singular value is 0 (or ≈ 0 numerically)	Singular Matrices
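The first two rows of the table are one computation: row-reduce A and look for free columns. The sketch below works in exact arithmetic; the name `null_vector` is hypothetical. It returns a nonzero x with Ax = 0 when the columns are dependent, and `None` when rank(A) = k.

```python
from fractions import Fraction


def null_vector(columns):
    """Return a nonzero x with A x = 0, where A has the given columns,
    or None if only x = 0 works (the columns are independent, rank(A) = k)."""
    k, n = len(columns), len(columns[0])
    # A is n rows by k columns, in exact arithmetic.
    A = [[Fraction(columns[j][i]) for j in range(k)] for i in range(n)]
    pivot_cols = []  # pivot_cols[row] = column of that row's leading 1
    r = 0
    for col in range(k):
        pivot = next((i for i in range(r, n) if A[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot: this column's variable is free
        A[r], A[pivot] = A[pivot], A[r]
        p = A[r][col]
        A[r] = [a / p for a in A[r]]  # make the leading entry 1
        # Clear this column from every other row (reduced row echelon form).
        for i in range(n):
            if i != r and A[i][col] != 0:
                factor = A[i][col]
                A[i] = [a - factor * b for a, b in zip(A[i], A[r])]
        pivot_cols.append(col)
        r += 1
    free = [c for c in range(k) if c not in pivot_cols]
    if not free:
        return None  # rank(A) = k: the null space is trivial
    # Set the first free variable to 1 and the others to 0,
    # then read each pivot variable off its row.
    x = [Fraction(0)] * k
    x[free[0]] = Fraction(1)
    for row, pc in enumerate(pivot_cols):
        x[pc] = -A[row][free[0]]
    return x
```

Consider the columns (1, 2), (3, 4) and (5, 6), which are three vectors in ℝ². The function returns x = (1, −2, 1), meaning (1, 2) − 2·(3, 4) + (5, 6) = 0.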

Numerically, nobody tests for exact dependence; floating point makes exact zeros rare. The honest tool is the smallest singular value of A (for k ≤ n), judged relative to the largest. A ratio near zero means the columns are nearly dependent, which in practice causes the same trouble as exact dependence. More than n vectors in ℝⁿ are always dependent: you cannot fit n + 1 genuinely new directions into n dimensions.
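For a matrix with two columns, the singular values can be computed without a library. They are the square roots of the eigenvalues of the 2 × 2 Gram matrix AᵀA. The sketch below is limited to that case, and `singular_values_2col` is a hypothetical name. Forming AᵀA squares the condition number, so in practice one would call an SVD routine such as NumPy's `numpy.linalg.svd(A, compute_uv=False)`.

```python
import math


def singular_values_2col(u, v):
    """Singular values (largest, smallest) of the n-by-2 matrix A = [u v],
    as square roots of the eigenvalues of the 2x2 Gram matrix A^T A."""
    a = sum(x * x for x in u)             # u . u
    b = sum(x * y for x, y in zip(u, v))  # u . v
    c = sum(y * y for y in v)             # v . v
    mean = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b * b)
    lam_max = mean + radius
    lam_min = max(mean - radius, 0.0)     # clamp tiny negative rounding error
    return math.sqrt(lam_max), math.sqrt(lam_min)
```

Two examples:

- The exactly dependent columns (1, 2) and (2, 4) give a smallest singular value of 0.0.
- The nearly dependent pair (1, 1) and (1, 1.001) gives a ratio σ_min/σ_max of roughly 2.5 × 10⁻⁴.

What counts as "near zero" is a tolerance the caller has to choose.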

05 · Why ML cares
Multicollinearity: dependent features

In linear regression, the columns of the design matrix X are features. A feature that is (nearly) a linear combination of others — temperature in °C and in °F, or "total" alongside its parts — carries no new information, and the damage is concrete:

Weights become unidentifiable. If x₃ = x₁ + x₂, then weights (w₁, w₂, w₃) and (w₁ + c, w₂ + c, w₃ − c) produce identical predictions. The data cannot distinguish them: infinitely many solutions, exactly the non-uniqueness of section 03.
XᵀX becomes singular or ill-conditioned. Solving the normal equations (XᵀX)w = Xᵀy in closed form means inverting XᵀX, and dependent columns make it singular. Nearly dependent columns make the solution wildly sensitive to small changes in the data: huge weights of opposite sign that nearly cancel.
Interpretation dies first. Predictions can remain fine while individual coefficients (and their reported significance) become meaningless.
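Both failures can be reproduced on toy data. In this sketch, all names and numbers are made up for illustration.

- Part (1) checks that the shifted weights predict identically when x₃ = x₁ + x₂.
- Part (2) solves the two-feature normal equations exactly, by Cramer's rule, for columns that differ by 0.001 in each entry. It then nudges each target value by 0.05.

```python
from fractions import Fraction as F


def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def normal_equations_2(x1, x2, y):
    """Solve (X^T X) w = X^T y for two feature columns x1, x2 by Cramer's rule."""
    g11, g12, g22 = dot(x1, x1), dot(x1, x2), dot(x2, x2)
    r1, r2 = dot(x1, y), dot(x2, y)
    det = g11 * g22 - g12 * g12  # det(X^T X); zero iff x1, x2 are dependent
    if det == 0:
        raise ZeroDivisionError("X^T X is singular: the columns are dependent")
    return ((g22 * r1 - g12 * r2) / det, (g11 * r2 - g12 * r1) / det)


# (1) Unidentifiable weights: x3 = x1 + x2 in every row.
rows = [(1, 2, 3), (4, -1, 3), (0, 5, 5)]
w = (1, -2, 3)
c = 7
w_shifted = (w[0] + c, w[1] + c, w[2] - c)
# The shift adds c * (x1 + x2 - x3) = 0 to every prediction.
assert all(dot(w, x) == dot(w_shifted, x) for x in rows)

# (2) Near dependence: x2 = x1 + delta * p, with a tiny delta.
delta = F(1, 1000)
p = (1, -1, 1, -1)
x1 = [F(v) for v in (1, 2, 3, 4)]
x2 = [a + delta * s for a, s in zip(x1, p)]
y_clean = [a + b for a, b in zip(x1, x2)]                  # true weights (1, 1)
y_noisy = [v + F(5, 100) * s for v, s in zip(y_clean, p)]  # nudge each entry by 0.05

assert normal_equations_2(x1, x2, y_clean) == (1, 1)
assert normal_equations_2(x1, x2, y_noisy) == (-49, 51)
```

The clean targets recover the weights (1, 1). The nudged targets move them to (−49, 51), a change a thousand times the size of the perturbation, with the two weights pulling in opposite directions. The arithmetic is exact, so rounding is not the cause: the sensitivity is in the problem itself.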

The standard fix is ridge regression: solve (XᵀX + λI)w = Xᵀy. For any λ > 0, adding λI raises every eigenvalue of XᵀX by λ, lifting them off zero and restoring a unique, stable answer; the price of this identifiability is a little shrinkage. The same near-dependence story, viewed through eigenvalues, is the subject of Singular Matrices.
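The fix can be seen on the extreme case of two identical columns, where XᵀX itself is singular. This sketch uses hypothetical names and toy data, and applies Cramer's rule to the 2 × 2 system only. In the general case, libraries such as scikit-learn's `sklearn.linear_model.Ridge` solve the problem, with `alpha` playing the role of λ.

```python
from fractions import Fraction as F


def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def ridge_2(x1, x2, y, lam):
    """Solve (X^T X + lam*I) w = X^T y for two feature columns x1, x2.

    Cramer's rule on the 2x2 system; for lam > 0 the determinant is positive,
    even when x1 and x2 are exactly dependent.
    """
    g11 = dot(x1, x1) + lam
    g12 = dot(x1, x2)
    g22 = dot(x2, x2) + lam
    r1, r2 = dot(x1, y), dot(x2, y)
    det = g11 * g22 - g12 * g12
    return ((g22 * r1 - g12 * r2) / det, (g11 * r2 - g12 * r1) / det)


# Two identical feature columns: X^T X alone is singular.
x = (1, 2, 3)
y = (2, 4, 6)  # y = 2 * x, so any w1 + w2 = 2 fits perfectly
w1, w2 = ridge_2(x, x, y, F(1))
assert (w1, w2) == (F(28, 29), F(28, 29))
```

Plain least squares accepts any pair with w₁ + w₂ = 2. With λ = 1, the ridge solution is unique: w₁ = w₂ = 28/29. Its sum, 56/29 ≈ 1.93, falls short of 2, and that gap is the shrinkage.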

Mental Model

Independence is a redundancy test: no vector in the set can be built from the others.
"Only the trivial combination gives zero" is the same test, phrased without naming a culprit.
Independence buys uniqueness of representation; spanning buys existence; a basis is both.
Columns independent ⇔ full rank ⇔ trivial null space ⇔ (square case) invertible — one fact, four notes.
In ML, dependent features = multicollinearity: unidentifiable weights, singular XᵀX, and ridge's λI as the cure.