Linear Transformations and Matrices
Section 5.1: Linear Transformations
Core Idea: Generalizing Structure-Preserving Maps
Having established the general notion of a Vector Space in Chapter 4, Section 5.1 aims to define the most important type of function between vector spaces: those that respect the underlying vector space operations (addition and scalar multiplication). These are called Linear Transformations.
Rationale for the Definition (Definition 5.1.1):
- What properties should a "natural" or "structure-preserving" map $T : V \to W$ have? Since vector spaces are defined by their addition and scalar multiplication, a natural map should interact nicely with these.
- Preserving Addition: If we add two vectors in $V$ and then apply $T$, we should get the same result as if we first apply $T$ to each vector individually and then add the results in $W$. That is, $T(v_1 + v_2) = T(v_1) + T(v_2)$.
- Preserving Scalar Multiplication: If we scale a vector in $V$ by $c$ and then apply $T$, we should get the same result as if we first apply $T$ to the vector and then scale the result in $W$ by $c$. That is, $T(cv) = cT(v)$.
- The Definition: These two essential properties become the definition of a linear transformation. It's a function between vector spaces that "plays nice" with the operations that define those spaces.
- Broad Applicability: This definition is abstract enough to apply not just to functions between $\mathbb{R}^n$ and $\mathbb{R}^m$, but also to functions involving spaces of functions, matrices, etc., as seen in the examples (differentiation, integration). This abstraction allows for unifying concepts across different mathematical domains.
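For instance (an illustrative map, not one of the text's numbered examples), the function $T : \mathbb{R}^2 \to \mathbb{R}^2$ given by $T(x, y) = (2x + y,\, 3y)$ satisfies both properties:

$$
T\big((x_1, y_1) + (x_2, y_2)\big) = \big(2(x_1 + x_2) + (y_1 + y_2),\, 3(y_1 + y_2)\big) = T(x_1, y_1) + T(x_2, y_2), \qquad T\big(c(x, y)\big) = (2cx + cy,\, 3cy) = c\,T(x, y).
$$

By contrast, $S(x, y) = (x + 1, y)$ is not linear: $S(0, 0) = (1, 0) \neq (0, 0)$, and a linear transformation must send the zero vector to the zero vector.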
Rationale for Key Theorems (Uniqueness, Existence, Representation):
- Linearity Means Determined by Basis Action (Theorem 5.1.5): If we know what a linear transformation $T : V \to W$ does to a set of vectors that span the domain $V$ (in particular, a basis), then we know what it does to every vector in $V$.
- Rationale: Any $v \in V$ can be written as a linear combination of the spanning vectors, say $v = c_1 u_1 + \cdots + c_k u_k$. Because $T$ preserves addition and scalar multiplication, we have $T(v) = c_1 T(u_1) + \cdots + c_k T(u_k)$. So, the value of $T(v)$ is completely determined by the values $T(u_1), \ldots, T(u_k)$. This means linear transformations are highly structured; their behavior isn't arbitrary but fixed by their action on a relatively small set.
- Freedom to Define on a Basis (Theorem 5.1.6): We can define a linear transformation $T : V \to W$ simply by choosing where each basis vector of $V$ should map to in $W$. For any choice of target vectors $w_1, \ldots, w_n \in W$, there exists a unique linear transformation $T$ such that $T(v_i) = w_i$ for a basis $v_1, \ldots, v_n$ of $V$.
- Rationale: This theorem guarantees existence and flexibility. It tells us that bases provide complete freedom to construct linear transformations with desired properties. We don't need a "formula" for $T$; specifying its action on a basis is enough to uniquely determine a valid linear transformation $T$.
- Matrix Representation (Definition 5.1.7): Since $T$ is determined by $T(v_1), \ldots, T(v_n)$ (where $\beta = (v_1, \ldots, v_n)$ is a basis for $V$), and each $T(v_j)$ can be uniquely represented by its coordinates relative to a basis $\gamma$ of $W$, we can encode the entire transformation as a matrix whose columns are these coordinate vectors $[T(v_j)]_\gamma$ (a small worked example follows this list).
- Rationale: This provides a concrete way to represent potentially abstract linear transformations using a grid of numbers (a matrix). It bridges the abstract theory of linear transformations with the computational tools of matrix algebra. The choice of bases $\beta$ and $\gamma$ acts like choosing coordinate systems or "languages" to describe the transformation.
- Matrix Operations Mirror Transformation Operations (Prop 5.1.15, 5.1.18): The definitions for adding matrices, multiplying a matrix by a scalar, and multiplying matrices are specifically chosen so that they correspond precisely to adding linear transformations, scaling linear transformations, and composing linear transformations, respectively, when viewed through their matrix representations relative to fixed bases.
- Rationale: This ensures that matrix algebra is a faithful computational model for the algebra of linear transformations. We can perform manipulations on matrices (which are often easier to compute with) and know that the results accurately reflect operations on the underlying functions. The complex-looking definition of matrix multiplication, for example, is exactly what's needed to make "the matrix of a composition is the product of the matrices" ($[S \circ T] = [S]\,[T]$) work out.
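As a small worked illustration of the matrix representation (the map, the notation $[D]_\beta$, and the space $P_2$ of polynomials of degree at most 2 are chosen here for illustration): let $D : P_2 \to P_2$ be differentiation, $D(p) = p'$, and use the basis $\beta = (1, x, x^2)$ for both domain and codomain. Since $D(1) = 0$, $D(x) = 1$, and $D(x^2) = 2x$, the columns of the matrix are the coordinate vectors of these outputs:

$$
[D]_\beta = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix}.
$$

As a check, $p(x) = 3 + 5x - x^2$ has coordinate vector $(3, 5, -1)$, and $[D]_\beta (3, 5, -1)^T = (5, -2, 0)^T$, which are exactly the coordinates of $p'(x) = 5 - 2x$.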
In summary, Section 5.1 generalizes the idea of "structure-preserving maps" between vector spaces. The rationale is to define transformations based on the core vector space operations (addition and scalar multiplication), allowing the theory to apply broadly. Key results establish that these transformations are determined by their action on a basis and can be represented concretely by matrices once bases are chosen, with matrix operations directly mirroring the operations on the transformations themselves.
Let $A$ be an $m \times n$ matrix, and define $T : \mathbb{R}^n \to \mathbb{R}^m$ by $T(v) = Av$.
We then have that $T$ is a linear transformation.
Let $T : V \to W$ be a linear transformation. We then have that $T(0_V) = 0_W$ (where $0_V$ is the zero vector of $V$, and $0_W$ is the zero vector of $W$), that $T(-v) = -T(v)$ for all $v \in V$, and that $T(cv + w) = cT(v) + T(w)$ for all $v, w \in V$ and all scalars $c$.
Let $V$ and $W$ be vector spaces, and let $T, S : V \to W$ be linear transformations.
- The function $T + S$ is a linear transformation.
- For all scalars $c$, the function $cT$ is a linear transformation.
Section 5.2: The Range and Null Space of a Linear Transformation
Core Idea: Understanding the Input-Output Behavior of Linear Transformations
Given a linear transformation $T : V \to W$, two natural questions capture its input-output behavior:
- Range (Where do the outputs land?): What vectors in the codomain $W$ are actually "hit" by the transformation $T$?
- Null Space (What inputs get "lost"?): Which vectors in the domain $V$ get mapped to the zero vector in the codomain?
Rationale for Range(T):
- Definition: $\operatorname{range}(T) = \{T(v) : v \in V\}$. It's simply the set of all achievable outputs.
- Why Study It? The range tells us the "reach" of the transformation. Knowing the range is equivalent to knowing which equations of the form $T(v) = w$ have solutions. It directly relates to whether $T$ is surjective (onto): $T$ is surjective if and only if $\operatorname{range}(T) = W$.
- Subspace Property (Prop 5.2.2): The range isn't just any subset of $W$; it's a subspace.
- Rationale: This means the set of outputs inherits the algebraic structure of $W$. Linear combinations of achievable outputs are themselves achievable outputs, because $T$ is linear. This structure makes the range easier to analyze (e.g., finding a basis for it).
- Connection to Columns (Prop 5.2.3): For $T : \mathbb{R}^n \to \mathbb{R}^m$ with standard matrix $A$, the range is the span of the columns of $A$: $\operatorname{range}(T) = \operatorname{Span}(a_1, \ldots, a_n)$, where $a_1, \ldots, a_n$ are the columns of $A$.
- Rationale: Any output $T(x) = Ax$ is, by definition of matrix-vector multiplication, equal to $x_1 a_1 + \cdots + x_n a_n$. Thus, the set of all outputs ($\operatorname{range}(T)$) is precisely the set of all linear combinations of the columns ($\operatorname{Span}(a_1, \ldots, a_n)$). This gives a very concrete way to compute and understand the range for standard matrix transformations.
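As a running illustration (a matrix chosen here for demonstration, not an example from the text), let $A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{pmatrix}$ and $T(x) = Ax$. Every column of $A$ is a multiple of $(1, 2)^T$, so

$$
\operatorname{range}(T) = \operatorname{Span}\left\{ \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 2 \\ 4 \end{pmatrix}, \begin{pmatrix} 3 \\ 6 \end{pmatrix} \right\} = \operatorname{Span}\left\{ \begin{pmatrix} 1 \\ 2 \end{pmatrix} \right\},
$$

a line in $\mathbb{R}^2$; in particular, $T$ is not surjective.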
Rationale for Null Space (Kernel) Null(T) (Definition 5.2.1):
- Definition: $\operatorname{Null}(T) = \{v \in V : T(v) = 0\}$. It's the set of all inputs mapped to the zero vector.
- Why Study It? The null space captures what the transformation "collapses" or "nullifies". If $\operatorname{Null}(T)$ contains only $0$, then $T$ distinguishes every non-zero vector from $0$. If $\operatorname{Null}(T)$ contains non-zero vectors, it means multiple distinct input vectors are being mapped to the same output vector $0$. As we'll see, this relates directly to whether $T$ is injective (one-to-one).
- Subspace Property (Prop 5.2.2): The null space is a subspace of the domain $V$.
- Rationale: This confirms that the set of inputs mapping to zero has a stable algebraic structure. If $v$ and $w$ map to $0$, so do $v + w$ and $cv$, because $T$ is linear.
- Connection to Homogeneous Systems: For $T : \mathbb{R}^n \to \mathbb{R}^m$ with standard matrix $A$, $\operatorname{Null}(T)$ is exactly the solution set of the homogeneous system $Ax = 0$. This provides a direct computational method (Gaussian elimination) for finding the null space.
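Continuing the illustrative matrix $A$ above: solving $Ax = 0$ by row reduction leaves the single equation $x_1 + 2x_2 + 3x_3 = 0$ with $x_2$ and $x_3$ free, so

$$
\operatorname{Null}(T) = \left\{ \begin{pmatrix} -2x_2 - 3x_3 \\ x_2 \\ x_3 \end{pmatrix} : x_2, x_3 \in \mathbb{R} \right\} = \operatorname{Span}\left\{ \begin{pmatrix} -2 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} -3 \\ 0 \\ 1 \end{pmatrix} \right\},
$$

a plane in $\mathbb{R}^3$.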
Rationale for Rank, Nullity, and the Rank-Nullity Theorem:
- Rank and Nullity (Def 5.2.8): Since the range and null space are subspaces, they have dimensions. The rank $\operatorname{rank}(T) = \dim(\operatorname{range}(T))$ measures the "dimension of the output", while the nullity $\operatorname{nullity}(T) = \dim(\operatorname{Null}(T))$ measures the "dimension of the inputs lost".
- Rank-Nullity Theorem (Thm 5.2.10): For $T : V \to W$ ($V$ finite-dimensional), $\operatorname{rank}(T) + \operatorname{nullity}(T) = \dim(V)$.
- Rationale: This theorem expresses a fundamental conservation principle. The dimensions available in the domain $V$ are perfectly accounted for: some dimension is "preserved" and shows up in the dimension of the range (rank), while the rest is "collapsed" into the dimension of the null space (nullity). The sum is always the original dimension of the input space. This provides a crucial link between the "size" of the domain, range, and null space.
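The illustrative matrix $A$ above confirms this: $\operatorname{rank}(T) + \operatorname{nullity}(T) = 1 + 2 = 3 = \dim(\mathbb{R}^3)$, the dimension of the domain.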
Rationale for Connecting Null Space/Range to Injectivity/Surjectivity/Solutions:
- Injectivity (Prop 5.2.11): $T$ is injective if and only if $\operatorname{Null}(T) = \{0\}$.
- Rationale: If $v \in \operatorname{Null}(T)$, then $T(v) = 0 = T(0)$. For $T$ to be injective, this should only happen when $v = 0$. This means the only vector that can map to $0$ is $0$ itself. The nullity directly quantifies the "failure" of injectivity.
- Surjectivity: $T$ is surjective if and only if $\operatorname{range}(T) = W$.
- Rationale: This is essentially the definition. Surjectivity means every vector in $W$ is an output. The rank quantifies the "success" of surjectivity; $T$ is surjective iff $\operatorname{rank}(T) = \dim(W)$.
- Structure of Solutions to $T(v) = w$ (Cor 5.2.7): If $v_p$ is one particular solution ($T(v_p) = w$), then the set of all solutions is $\{v_p + u : u \in \operatorname{Null}(T)\}$.
- Rationale: Any two solutions $v_1, v_2$ to $T(v) = w$ must satisfy $T(v_1 - v_2) = w - w = 0$. So, their difference lies in the null space. This means any solution can be written as $v_p + u$, where $u \in \operatorname{Null}(T)$. Geometrically, the solution set is a "shifted" version of the null space.
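With the illustrative matrix $A$ above, for instance, $Ax = (1, 2)^T$ has the particular solution $x_p = (1, 0, 0)^T$, so the full solution set is

$$
\left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + s \begin{pmatrix} -2 \\ 1 \\ 0 \end{pmatrix} + t \begin{pmatrix} -3 \\ 0 \\ 1 \end{pmatrix} : s, t \in \mathbb{R} \right\},
$$

a copy of $\operatorname{Null}(T)$ shifted away from the origin.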
In summary, Section 5.2 defines and analyzes the range and null space because they are fundamental subspaces that reveal key aspects of a linear transformation's behavior – its outputs, what it collapses, its injectivity, and its surjectivity. The Rank-Nullity Theorem provides a crucial link between the dimensions of these spaces and the domain. These concepts are also essential for understanding the structure of solutions to linear systems.
Let $T : V \to W$ be a linear transformation. We then have that $\operatorname{Null}(T)$ is a subspace of $V$ and $\operatorname{range}(T)$ is a subspace of $W$.
Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation with standard matrix $A$. For every $x \in \mathbb{R}^n$, the output $T(x) = Ax$ is a linear combination of the columns of $A$.
Thus, $\operatorname{range}(T)$ is the span of the columns of $A$.
Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation with standard matrix $A$. The following are equivalent:
- $T$ is surjective
- Every row of an echelon form of $A$ has a leading entry
Let $V$ and $W$ be finite-dimensional vector spaces, and let $T : V \to W$ be a linear transformation.
- If $T$ is injective, then $\dim(V) \leq \dim(W)$
- If $T$ is surjective, then $\dim(V) \geq \dim(W)$
- If $T$ is bijective, then $\dim(V) = \dim(W)$
Section 5.3: Determinants
Core Idea: Generalizing Signed Area/Volume and Linking it to Matrix Properties
Remember back in Section 3.4 we defined the determinant for $2 \times 2$ matrices. Section 5.3 extends the idea to $n \times n$ matrices, with two goals in mind:
- Geometric Intuition: To define a number associated with $n$ vectors in $\mathbb{R}^n$ (or an $n \times n$ matrix) that represents the signed $n$-dimensional volume of the parallelepiped they form. The sign should indicate orientation (like right-hand vs. left-hand rule in $\mathbb{R}^3$).
- Algebraic Properties: To find a function that behaves predictably and has useful algebraic properties, especially concerning matrix operations and invertibility.
The Rationale - Defining the Determinant Axiomatically (Definition 5.3.1):
- Challenge: Defining "volume" and "orientation" geometrically becomes very difficult and non-intuitive for dimensions $n > 3$.
- Solution: Instead of a direct geometric formula, the determinant is defined axiomatically. We list the essential properties that a signed $n$-dimensional volume function should satisfy:
  - Normalization: The volume of the standard unit "hypercube" formed by $e_1, \ldots, e_n$ should be 1 ($\det(e_1, \ldots, e_n) = 1$, i.e., $\det(I) = 1$).
  - Degeneracy: If you repeat a vector (i.e., $v_i = v_j$ for some $i \neq j$), the parallelepiped is "flat" in at least one direction and should have zero $n$-dimensional volume ($\det(v_1, \ldots, v_n) = 0$).
  - Linearity (Scaling): Scaling one of the vectors by $c$ should scale the signed volume by $c$ ($\det(v_1, \ldots, cv_i, \ldots, v_n) = c \det(v_1, \ldots, v_i, \ldots, v_n)$). This handles both magnitude scaling and orientation reversal (if $c < 0$).
  - Linearity (Addition): This property ($\det(v_1, \ldots, v_i + w, \ldots, v_n) = \det(v_1, \ldots, v_i, \ldots, v_n) + \det(v_1, \ldots, w, \ldots, v_n)$) is less geometrically obvious but is crucial for algebraic manipulation and ensures compatibility with vector addition.
- Uniqueness (Theorem 5.3.2): A fundamental (and non-trivial) result is that for any $n$, there exists exactly one function satisfying these axioms. This unique function is the determinant, $\det$. This justifies defining the determinant via these properties.
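To see the axioms at work in the smallest interesting case (a sketch recovering the familiar $2 \times 2$ formula), write the rows of $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ as $r_1 = a e_1 + b e_2$ and $r_2 = c e_1 + d e_2$. Expanding by linearity in each row and dropping the terms with a repeated vector (degeneracy) gives

$$
\det(r_1, r_2) = ad\,\det(e_1, e_2) + bc\,\det(e_2, e_1) = ad - bc,
$$

using $\det(e_1, e_2) = 1$ (normalization) and $\det(e_2, e_1) = -1$ (a swap flips the sign, as noted below in Prop 5.3.3).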
The Rationale - Connecting Determinants to Row Operations and Properties:
Defining the determinant of a matrix $A$ as a function of its rows means the axioms immediately tell us how the elementary row operations affect $\det(A)$:
- Swapping Rows (Prop 5.3.3): Swapping two rows multiplies the determinant by $-1$.
- Rationale: Follows algebraically from the axioms. Geometrically corresponds to changing the orientation.
- Scaling a Row (Axiom 3): Multiplying a row by $c$ multiplies the determinant by $c$.
- Rationale: Directly from the scaling property of the determinant function.
- Adding a Multiple of One Row to Another (Prop 5.3.4): This operation does not change the determinant.
- Rationale: Follows algebraically from linearity and the degeneracy property ($\det = 0$ when two rows are equal). Geometrically, this corresponds to a "shear" transformation of the parallelepiped, which preserves volume.
The Rationale - Computational Methods and Key Theorems:
- Computation via Row Reduction: The properties above provide a practical method to compute determinants. Use row operations (mostly row combinations, which don't change the determinant, and swaps, which just flip the sign) to reduce the matrix to an upper triangular form. The determinant of a triangular matrix is just the product of the diagonal entries (Prop 5.3.10). Keep track of the sign changes from swaps; a worked example follows this list.
- Rationale: This leverages the efficient Gaussian elimination process and the simple determinant calculation for triangular matrices.
- Cofactor Expansion (Theorem 5.3.14): Provides a recursive formula to compute determinants.
- Rationale: This formula arises naturally when fully expanding the determinant definition using multilinearity. It connects the determinant of an $n \times n$ matrix to determinants of smaller $(n-1) \times (n-1)$ matrices. While often slower than row reduction for large matrices, it's important theoretically and useful for smaller cases (like $2 \times 2$ or $3 \times 3$).
- Determinant and Invertibility (Corollary 5.3.11): $A$ is invertible $\iff \det(A) \neq 0$.
- Rationale: This is arguably the most important property algebraically. If $\det(A) = 0$, it means the rows are linearly dependent (Prop 5.3.5, Prop 5.3.8), so the matrix can't be row reduced to the identity, hence it's not invertible. Conversely, if $A$ is invertible, its RREF is $I$, and $\det(I) = 1$. Since row operations only multiply the determinant by non-zero numbers, $\det(A)$ must have been non-zero. Geometrically, invertibility requires the transformation not to collapse space into a lower dimension, meaning the volume factor $\det(A)$ must be non-zero.
- Determinant of a Product (Theorem 5.3.15): $\det(AB) = \det(A)\det(B)$.
- Rationale: This connects determinants with matrix multiplication (and thus function composition). The volume scaling factor of a composition of transformations is the product of the individual scaling factors.
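Here is a small worked instance of the row-reduction method (the matrix is chosen for illustration):

$$
\begin{pmatrix} 0 & 2 & 1 \\ 1 & 1 & 1 \\ 2 & 0 & 3 \end{pmatrix}
\xrightarrow{R_1 \leftrightarrow R_2}
\begin{pmatrix} 1 & 1 & 1 \\ 0 & 2 & 1 \\ 2 & 0 & 3 \end{pmatrix}
\xrightarrow{R_3 - 2R_1}
\begin{pmatrix} 1 & 1 & 1 \\ 0 & 2 & 1 \\ 0 & -2 & 1 \end{pmatrix}
\xrightarrow{R_3 + R_2}
\begin{pmatrix} 1 & 1 & 1 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix}.
$$

The triangular matrix has determinant $1 \cdot 2 \cdot 2 = 4$; the two row combinations changed nothing, and undoing the single swap gives a determinant of $-4$. Since this is non-zero, the matrix is invertible.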
In essence, Section 5.3 defines the determinant as a function capturing signed volume and satisfying key linearity properties. This function provides a powerful tool linking the geometry of transformations (volume scaling, orientation) with the algebra of matrices (invertibility, row operations) and provides methods for computation.
Section 5.4: Eigenvalues and Eigenvectors
Core Idea: Finding the "Natural Axes" of a Linear Transformation
When a linear transformation maps a vector space to itself ($T : V \to V$), we can ask whether there are special directions that $T$ merely scales rather than moves around; identifying those directions gives the simplest possible description of $T$.
Rationale for Eigenvalues and Eigenvectors (Definition 5.4.1 / 5.4.3):
- Geometric Motivation: Imagine applying a linear transformation $T : V \to V$. Most vectors will be moved and rotated to point in a different direction from where they started. However, some special vectors $v$ might just get stretched or shrunk, so $T(v)$ is parallel to $v$. This means $T(v) = \lambda v$ for some scalar $\lambda$. These vectors (which must be non-zero by definition) point along the "natural axes" or fundamental directions of the transformation $T$. The scalar $\lambda$ tells us the scaling factor along that direction.
- Why $V \to V$? The equation $T(v) = \lambda v$ only makes sense if the input $v$ and the output $T(v)$ live in the same vector space $V$.
- Simplifying Transformations: If we can find a basis consisting entirely of eigenvectors, then the action of $T$ becomes very simple when described relative to that basis: it's just scaling along the basis directions. This was the motivation hinted at in Section 3.2 when a change of basis made the matrix diagonal.
Rationale for Eigenspace (Proposition 5.4.2):
- Definition: For a given eigenvalue $\lambda$, the eigenspace is the set of all vectors $v$ (including $0$) such that $T(v) = \lambda v$.
- Why Study It? It groups together all vectors that are scaled by the same factor $\lambda$.
- Subspace Property: Eigenspaces are subspaces of $V$.
- Rationale: This is crucial because it means the set of vectors behaving in this simple way (scaled by $\lambda$) has structure. If $v$ and $w$ are scaled by $\lambda$, so is their sum $v + w$ and any scalar multiple $cv$. This allows us to find a basis for the eigenspace, summarizing all eigenvectors for $\lambda$ efficiently.
Rationale for the Computational Approach (Matrices, Null Spaces, Determinants):
How do we actually find these special vectors and scalars for a given transformation, say $T(x) = Ax$ for an $n \times n$ matrix $A$?
- Connecting to Null Space (Prop 5.4.4): The core algebraic trick is rewriting the eigenvalue equation: $Av = \lambda v \iff Av - \lambda v = 0 \iff (A - \lambda I)v = 0$.
- Rationale: This converts the eigenvalue problem into finding non-zero vectors in the null space of a different matrix, $A - \lambda I$. We already know how to find null spaces using Gaussian elimination.
- Finding Eigenvalues (Corollary 5.4.5 & Definition 5.4.6):
  - A scalar $\lambda$ is an eigenvalue $\iff$ there exists a non-zero $v$ such that $Av = \lambda v$.
  - This means $\lambda$ is an eigenvalue $\iff$ $\operatorname{Null}(A - \lambda I) \neq \{0\}$.
  - For a square matrix $B$, $\operatorname{Null}(B) \neq \{0\}$ $\iff$ $B$ is not invertible $\iff$ $\det(B) = 0$.
  - Therefore, $\lambda$ is an eigenvalue $\iff$ $\det(A - \lambda I) = 0$.
  - Rationale: This gives us a computational method! The expression $\det(A - \lambda I)$ is a polynomial in $\lambda$ (the characteristic polynomial). Its roots are precisely the eigenvalues of $A$. Finding eigenvalues is reduced to finding roots of a polynomial.
- Finding Eigenvectors/Eigenspaces: Once an eigenvalue $\lambda$ is found (by solving $\det(A - \lambda I) = 0$), the corresponding eigenvectors are simply the non-zero vectors in $\operatorname{Null}(A - \lambda I)$. The eigenspace is the entire null space $\operatorname{Null}(A - \lambda I)$.
- Rationale: This links back to the null space calculation method from Sections 4.2 and 5.2 (solving the homogeneous system $(A - \lambda I)v = 0$).
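A short worked instance of this procedure (an illustrative matrix, not one of the text's examples): for $A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$,

$$
\det(A - \lambda I) = \det\begin{pmatrix} 2 - \lambda & 1 \\ 1 & 2 - \lambda \end{pmatrix} = (2 - \lambda)^2 - 1 = (\lambda - 1)(\lambda - 3),
$$

so the eigenvalues are $\lambda = 1$ and $\lambda = 3$. Solving $(A - I)v = 0$ gives the eigenspace $\operatorname{Span}\{(1, -1)^T\}$, and solving $(A - 3I)v = 0$ gives $\operatorname{Span}\{(1, 1)^T\}$.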
Rationale for Diagonalization (Definition 5.4.9 & Corollary 5.4.11):
- Goal: To find a basis $\beta$ consisting entirely of eigenvectors.
- Why? If such a basis exists, the matrix $[T]_\beta$ becomes diagonal, with the eigenvalues on the diagonal (Prop 5.4.10).
- Rationale: Relative to this basis, the transformation $T$ is just simple scaling along the basis directions. This makes $T$'s action easy to understand geometrically and computationally (e.g., for calculating powers $A^k$).
- Definition: A transformation $T$ (or its matrix $A$) is diagonalizable if such an eigenbasis exists.
- Condition: $T$ is diagonalizable if and only if we can find enough linearly independent eigenvectors to form a basis for the entire vector space $V$. (Having $\dim(V)$ distinct eigenvalues is sufficient, but not necessary.)
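Continuing the illustrative $2 \times 2$ example above: the eigenvectors $(1, -1)^T$ and $(1, 1)^T$ are linearly independent, so they form an eigenbasis and $A$ is diagonalizable. Equivalently, collecting the eigenvectors as the columns of $P$,

$$
A = P D P^{-1}, \qquad P = \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}, \quad D = \begin{pmatrix} 1 & 0 \\ 0 & 3 \end{pmatrix}, \qquad \text{so} \quad A^k = P \begin{pmatrix} 1 & 0 \\ 0 & 3^k \end{pmatrix} P^{-1},
$$

which is what makes powers (and many other computations) easy.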
In summary, Section 5.4 introduces eigenvalues and eigenvectors as the scaling factors and invariant directions that simplify the understanding of a linear transformation $T : V \to V$. Finding them reduces to familiar computations (determinants for the eigenvalues, null spaces for the eigenspaces), and when a basis of eigenvectors exists, $T$ is diagonalizable and its action is just scaling along those directions.
Let $V$ be a finite-dimensional vector space, let $T : V \to V$ be a linear transformation, and let $\beta = (v_1, \ldots, v_n)$ be a basis of $V$. Then $[T]_\beta$ is a diagonal matrix if and only if $v_1, \ldots, v_n$ are all eigenvectors of $T$.
Furthermore, in this case, the diagonal entries of $[T]_\beta$ are the eigenvalues corresponding to $v_1, \ldots, v_n$.