1 SPECIAL RELATIVITY AND FLAT SPACETIME

vectors, known as the dimension of the space. (For a tangent space associated with a point in Minkowski space, the dimension is of course four.)

Let us imagine that at each tangent space we set up a basis of four vectors $\hat{e}_{(\mu)}$, with $\mu \in \{0, 1, 2, 3\}$ as usual. In fact let us say that each basis is adapted to the coordinates $x^\mu$; that is, the basis vector $\hat{e}_{(1)}$ is what we would normally think of as pointing along the $x$-axis, etc. It is by no means necessary that we choose a basis which is adapted to any coordinate system at all, although it is often convenient. (We really could be more precise here, but later on we will repeat the discussion at an excruciating level of precision, so some sloppiness now is forgivable.) Then any abstract vector $A$ can be written as a linear combination of basis vectors:

$$A = A^\mu \hat{e}_{(\mu)}\ . \qquad (1.23)$$

The coefficients $A^\mu$ are the components of the vector $A$. More often than not we will forget the basis entirely and refer somewhat loosely to "the vector $A^\mu$", but keep in mind that this is shorthand. The real vector is an abstract geometrical entity, while the components are just the coefficients of the basis vectors in some convenient basis. (Since we will usually suppress the explicit basis vectors, the indices will usually label components of vectors and tensors. This is why there are parentheses around the indices on the basis vectors, to remind us that this is a collection of vectors, not components of a single vector.)

A standard example of a vector in spacetime is the tangent vector to a curve. A parameterized curve or path through spacetime is specified by the coordinates as a function of the parameter, e.g. $x^\mu(\lambda)$. The tangent vector $V(\lambda)$ has components

$$V^\mu = \frac{dx^\mu}{d\lambda}\ . \qquad (1.24)$$

The entire vector is thus $V = V^\mu \hat{e}_{(\mu)}$. Under a Lorentz transformation the coordinates $x^\mu$ change according to (1.11), while the parameterization $\lambda$ is unaltered; we can therefore deduce that the components of the tangent vector must change as

$$V^\mu \rightarrow V^{\mu'} = \Lambda^{\mu'}{}_\nu V^\nu\ . \qquad (1.25)$$

However, the vector itself (as opposed to its components in some coordinate system) is invariant under Lorentz transformations. We can use this fact to derive the transformation properties of the basis vectors. Let us refer to the set of basis vectors in the transformed coordinate system as $\hat{e}_{(\nu')}$. Since the vector is invariant, we have

$$V = V^\mu \hat{e}_{(\mu)} = V^{\nu'} \hat{e}_{(\nu')} = \Lambda^{\nu'}{}_\mu V^\mu \hat{e}_{(\nu')}\ . \qquad (1.26)$$

But this relation must hold no matter what the numerical values of the components $V^\mu$ are. Therefore we can say

$$\hat{e}_{(\mu)} = \Lambda^{\nu'}{}_\mu \hat{e}_{(\nu')}\ . \qquad (1.27)$$
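As a quick numerical sanity check (not part of the notes; the boost speed and the components chosen are arbitrary), the following sketch verifies that transforming components with $\Lambda^{\mu'}{}_\nu$ as in (1.25) while transforming basis vectors with the inverse matrix as in (1.30) leaves the full vector $V = V^\mu \hat{e}_{(\mu)}$ unchanged:

```python
import math

# A boost along the x-axis with velocity v (c = 1); indices run over (t, x, y, z).
v = 0.6
gamma = 1.0 / math.sqrt(1.0 - v * v)

# Lambda^{mu'}_nu: acts on vector components, eq. (1.25).
Lam = [[gamma, -gamma * v, 0.0, 0.0],
       [-gamma * v, gamma, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]

# Lambda_{nu'}^mu, the inverse matrix: acts on basis vectors, eq. (1.30).
# For a boost this is the same matrix with v -> -v.
LamInv = [[gamma, gamma * v, 0.0, 0.0],
          [gamma * v, gamma, 0.0, 0.0],
          [0.0, 0.0, 1.0, 0.0],
          [0.0, 0.0, 0.0, 1.0]]

def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(4)) for i in range(4)]

V = [1.0, 0.3, -0.2, 0.5]  # components V^mu in the unprimed frame
# Represent each basis vector e_(mu) by its components in one fixed frame.
e = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

Vp = matvec(Lam, V)  # V^{mu'} = Lambda^{mu'}_nu V^nu
# e_(nu') = Lambda_{nu'}^mu e_(mu): new basis vectors as combinations of old.
ep = [[sum(LamInv[n][m] * e[m][i] for m in range(4)) for i in range(4)]
      for n in range(4)]

# The abstract vector, reassembled from components and basis in both frames:
old = [sum(V[m] * e[m][i] for m in range(4)) for i in range(4)]
new = [sum(Vp[n] * ep[n][i] for n in range(4)) for i in range(4)]
assert all(abs(a - b) < 1e-9 for a, b in zip(old, new))
```

The cancellation works because of the second relation in (1.29): the inverse matrix on the basis undoes the matrix on the components.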
To get the new basis $\hat{e}_{(\nu')}$ in terms of the old one $\hat{e}_{(\mu)}$ we should multiply by the inverse of the Lorentz transformation $\Lambda^{\nu'}{}_\mu$. But the inverse of a Lorentz transformation from the unprimed to the primed coordinates is also a Lorentz transformation, this time from the primed to the unprimed systems. We will therefore introduce a somewhat subtle notation, using the same symbol for both matrices, just with primed and unprimed indices adjusted. That is,

$$(\Lambda^{-1})^{\nu'}{}_\mu = \Lambda_{\nu'}{}^\mu\ , \qquad (1.28)$$

or

$$\Lambda_{\nu'}{}^\mu \Lambda^{\sigma'}{}_\mu = \delta^{\sigma'}_{\nu'}\ , \qquad \Lambda_{\nu'}{}^\mu \Lambda^{\nu'}{}_\rho = \delta^\mu_\rho\ , \qquad (1.29)$$

where $\delta^\mu_\rho$ is the traditional Kronecker delta symbol in four dimensions. (Note that Schutz uses a different convention, always arranging the two indices northwest/southeast; the important thing is where the primes go.) From (1.27) we then obtain the transformation rule for basis vectors:

$$\hat{e}_{(\nu')} = \Lambda_{\nu'}{}^\mu \hat{e}_{(\mu)}\ . \qquad (1.30)$$

Therefore the set of basis vectors transforms via the inverse Lorentz transformation of the coordinates or vector components.

It is worth pausing a moment to take all this in. We introduced coordinates labeled by upper indices, which transformed in a certain way under Lorentz transformations. We then considered vector components which also were written with upper indices, which made sense since they transformed in the same way as the coordinate functions. (In a fixed coordinate system, each of the four coordinates $x^\mu$ can be thought of as a function on spacetime, as can each of the four components of a vector field.) The basis vectors associated with the coordinate system transformed via the inverse matrix, and were labeled by a lower index. This notation ensured that the invariant object constructed by summing over the components and basis vectors was left unchanged by the transformation, just as we would wish. It's probably not giving too much away to say that this will continue to be the case for more complicated objects with multiple indices (tensors).

Once we have set up a vector space, there is an associated vector space (of equal dimension) which we can immediately define, known as the dual vector space. The dual space is usually denoted by an asterisk, so that the dual space to the tangent space $T_p$ is called the cotangent space and denoted $T^*_p$. The dual space is the space of all linear maps from the original vector space to the real numbers; in math lingo, if $\omega \in T^*_p$ is a dual vector, then it acts as a map such that

$$\omega(aV + bW) = a\,\omega(V) + b\,\omega(W) \in \mathbf{R}\ , \qquad (1.31)$$

where $V$, $W$ are vectors and $a$, $b$ are real numbers. The nice thing about these maps is that they form a vector space themselves; thus, if $\omega$ and $\eta$ are dual vectors, we have

$$(a\omega + b\eta)(V) = a\,\omega(V) + b\,\eta(V)\ . \qquad (1.32)$$
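Since a dual vector is defined entirely by linearity, a minimal sketch (the component values here are arbitrary illustrations, not from the notes) can verify (1.31) and (1.32) by representing each dual vector through its components and letting it act by contraction:

```python
# A dual vector acts on a vector by contracting components: omega_mu V^mu.
def act(omega, V):
    return sum(w * v for w, v in zip(omega, V))

omega = (1.0, -2.0, 0.5, 3.0)
eta = (0.0, 1.0, 1.0, -1.0)
V = (2.0, 1.0, 0.0, 1.0)
W = (1.0, -1.0, 4.0, 0.0)
a, b = 2.0, -3.0

# Linearity of the map, eq. (1.31): omega(aV + bW) = a omega(V) + b omega(W).
lhs = act(omega, [a * v + b * w for v, w in zip(V, W)])
rhs = a * act(omega, V) + b * act(omega, W)
assert abs(lhs - rhs) < 1e-9

# Dual vectors themselves form a vector space, eq. (1.32).
comb = [a * wo + b * et for wo, et in zip(omega, eta)]
assert abs(act(comb, V) - (a * act(omega, V) + b * act(eta, V))) < 1e-9
```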
To make this construction somewhat more concrete, we can introduce a set of basis dual vectors $\hat{\theta}^{(\nu)}$ by demanding

$$\hat{\theta}^{(\nu)}(\hat{e}_{(\mu)}) = \delta^\nu_\mu\ . \qquad (1.33)$$

Then every dual vector can be written in terms of its components, which we label with lower indices:

$$\omega = \omega_\mu \hat{\theta}^{(\mu)}\ . \qquad (1.34)$$

In perfect analogy with vectors, we will usually simply write $\omega_\mu$ to stand for the entire dual vector. In fact, you will sometimes see elements of $T_p$ (what we have called vectors) referred to as contravariant vectors, and elements of $T^*_p$ (what we have called dual vectors) referred to as covariant vectors. Actually, if you just refer to ordinary vectors as vectors with upper indices and dual vectors as vectors with lower indices, nobody should be offended. Another name for dual vectors is one-forms, a somewhat mysterious designation which will become clearer soon.

The component notation leads to a simple way of writing the action of a dual vector on a vector:

$$\omega(V) = \omega_\mu V^\nu \hat{\theta}^{(\mu)}(\hat{e}_{(\nu)}) = \omega_\mu V^\nu \delta^\mu_\nu = \omega_\mu V^\mu \in \mathbf{R}\ . \qquad (1.35)$$

This is why it is rarely necessary to write the basis vectors (and dual vectors) explicitly; the components do all of the work. The form of (1.35) also suggests that we can think of vectors as linear maps on dual vectors, by defining

$$V(\omega) \equiv \omega(V) = \omega_\mu V^\mu\ . \qquad (1.36)$$

Therefore, the dual space to the dual vector space is the original vector space itself.

Of course in spacetime we will be interested not in a single vector space, but in fields of vectors and dual vectors. (The set of all cotangent spaces over $M$ is the cotangent bundle, $T^*(M)$.) In that case the action of a dual vector field on a vector field is not a single number, but a scalar (or just "function") on spacetime. A scalar is a quantity without indices, which is unchanged under Lorentz transformations.

We can use the same arguments that we earlier used for vectors to derive the transformation properties of dual vectors. The answers are, for the components,

$$\omega_{\mu'} = \Lambda_{\mu'}{}^\nu \omega_\nu\ , \qquad (1.37)$$

and for basis dual vectors,

$$\hat{\theta}^{(\rho')} = \Lambda^{\rho'}{}_\sigma \hat{\theta}^{(\sigma)}\ . \qquad (1.38)$$
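A quick numerical check (again with arbitrary illustrative numbers) shows why these rules fit together: transforming vector components with $\Lambda^{\mu'}{}_\nu$ as in (1.25) and dual components with the inverse matrix as in (1.37) leaves the scalar $\omega_\mu V^\mu$ of (1.35) unchanged:

```python
import math

# A boost along x with v = 0.6 (c = 1).
v = 0.6
gamma = 1.0 / math.sqrt(1.0 - v * v)
Lam = [[gamma, -gamma * v, 0.0, 0.0],    # Lambda^{mu'}_nu, for V^mu
       [-gamma * v, gamma, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]
LamInv = [[gamma, gamma * v, 0.0, 0.0],  # Lambda_{mu'}^nu, for omega_mu
          [gamma * v, gamma, 0.0, 0.0],
          [0.0, 0.0, 1.0, 0.0],
          [0.0, 0.0, 0.0, 1.0]]

def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(4)) for i in range(4)]

V = [2.0, -1.0, 0.5, 3.0]      # V^mu
omega = [1.0, 4.0, -2.0, 0.0]  # omega_mu

Vp = matvec(Lam, V)            # eq. (1.25)
omegap = matvec(LamInv, omega) # eq. (1.37)

s = sum(w * u for w, u in zip(omega, V))     # omega_mu V^mu
sp = sum(w * u for w, u in zip(omegap, Vp))  # omega_{mu'} V^{mu'}
assert abs(s - sp) < 1e-9                    # the scalar is invariant
```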
This is just what we would expect from index placement; the components of a dual vector transform under the inverse transformation of those of a vector. Note that this ensures that the scalar (1.35) is invariant under Lorentz transformations, just as it should be.

Let's consider some examples of dual vectors, first in other contexts and then in Minkowski space. Imagine the space of $n$-component column vectors, for some integer $n$. Then the dual space is that of $n$-component row vectors, and the action is ordinary matrix multiplication:

$$V = \begin{pmatrix} V^1 \\ V^2 \\ \vdots \\ V^n \end{pmatrix}\ , \qquad \omega = (\omega_1\ \omega_2\ \cdots\ \omega_n)\ , \qquad \omega(V) = (\omega_1\ \omega_2\ \cdots\ \omega_n) \begin{pmatrix} V^1 \\ V^2 \\ \vdots \\ V^n \end{pmatrix} = \omega_i V^i\ . \qquad (1.39)$$

Another familiar example occurs in quantum mechanics, where vectors in the Hilbert space are represented by kets, $|\psi\rangle$. In this case the dual space is the space of bras, $\langle\phi|$, and the action gives the number $\langle\phi|\psi\rangle$. (This is a complex number in quantum mechanics, but the idea is precisely the same.)

In spacetime the simplest example of a dual vector is the gradient of a scalar function, the set of partial derivatives with respect to the spacetime coordinates, which we denote by "d":

$$\mathrm{d}\phi = \frac{\partial \phi}{\partial x^\mu}\, \hat{\theta}^{(\mu)}\ . \qquad (1.40)$$

The conventional chain rule used to transform partial derivatives amounts in this case to the transformation rule of components of dual vectors:

$$\frac{\partial \phi}{\partial x^{\mu'}} = \frac{\partial x^\mu}{\partial x^{\mu'}} \frac{\partial \phi}{\partial x^\mu} = \Lambda_{\mu'}{}^\mu \frac{\partial \phi}{\partial x^\mu}\ , \qquad (1.41)$$

where we have used (1.11) and (1.28) to relate the Lorentz transformation to the coordinates. The fact that the gradient is a dual vector leads to the following shorthand notations for partial derivatives:

$$\frac{\partial \phi}{\partial x^\mu} = \partial_\mu \phi = \phi_{,\mu}\ . \qquad (1.42)$$
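The chain-rule transformation (1.41) can be checked numerically. In this sketch (the scalar field, the boost speed, and the sample point are all illustrative choices, not from the notes), the gradient of $\phi$ computed in boosted coordinates is compared against the inverse-matrix transformation of the gradient in the original coordinates:

```python
import math

v = 0.6
gamma = 1.0 / math.sqrt(1.0 - v * v)
# Lambda^{mu'}_nu (forward boost) and its inverse (Lambda^{-1})^mu_{nu'},
# whose entries are Lambda_{nu'}^mu by eq. (1.28).
Lam = [[gamma, -gamma * v, 0.0, 0.0],
       [-gamma * v, gamma, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]
LamInv = [[gamma, gamma * v, 0.0, 0.0],
          [gamma * v, gamma, 0.0, 0.0],
          [0.0, 0.0, 1.0, 0.0],
          [0.0, 0.0, 0.0, 1.0]]

def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(4)) for i in range(4)]

def phi(x):  # an illustrative scalar field phi(t, x, y, z)
    t, xx, y, z = x
    return t * t - xx * y + z

p = [1.0, 0.5, -0.3, 2.0]  # a sample point, in unprimed coordinates
pp = matvec(Lam, p)        # the same point in primed coordinates

def phi_primed(xp):        # phi expressed as a function of x^{mu'}
    return phi(matvec(LamInv, xp))

def grad(f, x, h=1e-6):    # centered finite-difference gradient
    out = []
    for mu in range(4):
        xp_, xm = list(x), list(x)
        xp_[mu] += h
        xm[mu] -= h
        out.append((f(xp_) - f(xm)) / (2 * h))
    return out

lhs = grad(phi_primed, pp)  # d phi / d x^{mu'}, computed directly
gu = grad(phi, p)           # d phi / d x^mu
# rhs_{mu'} = Lambda_{mu'}^mu d_mu phi, with Lambda_{mu'}^mu = LamInv[mu][mu'].
rhs = [sum(LamInv[m][mp] * gu[m] for m in range(4)) for mp in range(4)]
assert all(abs(a - b) < 1e-4 for a, b in zip(lhs, rhs))
```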
(Very roughly speaking, "$x^\mu$ has an upper index, but when it is in the denominator of a derivative it implies a lower index on the resulting object.") I'm not a big fan of the comma notation, but we will use $\partial_\mu$ all the time. Note that the gradient does in fact act in a natural way on the example we gave above of a vector, the tangent vector to a curve. The result is the ordinary derivative of the function along the curve:

$$\partial_\mu \phi\, \frac{dx^\mu}{d\lambda} = \frac{d\phi}{d\lambda}\ . \qquad (1.43)$$

As a final note on dual vectors, there is a way to represent them as pictures which is consistent with the picture of vectors as arrows. See the discussion in Schutz, or in MTW (where it is taken to dizzying extremes).

A straightforward generalization of vectors and dual vectors is the notion of a tensor. Just as a dual vector is a linear map from vectors to $\mathbf{R}$, a tensor $T$ of type (or rank) $(k, l)$ is a multilinear map from a collection of dual vectors and vectors to $\mathbf{R}$:

$$T : \underbrace{T^*_p \times \cdots \times T^*_p}_{k\ \mathrm{times}} \times \underbrace{T_p \times \cdots \times T_p}_{l\ \mathrm{times}} \rightarrow \mathbf{R}\ . \qquad (1.44)$$

Here, "$\times$" denotes the Cartesian product, so that for example $T_p \times T_p$ is the space of ordered pairs of vectors. Multilinearity means that the tensor acts linearly in each of its arguments; for instance, for a tensor of type $(1, 1)$, we have

$$T(a\omega + b\eta, cV + dW) = ac\,T(\omega, V) + ad\,T(\omega, W) + bc\,T(\eta, V) + bd\,T(\eta, W)\ . \qquad (1.45)$$

From this point of view, a scalar is a type $(0, 0)$ tensor, a vector is a type $(1, 0)$ tensor, and a dual vector is a type $(0, 1)$ tensor.

The space of all tensors of a fixed type $(k, l)$ forms a vector space; they can be added together and multiplied by real numbers. To construct a basis for this space, we need to define a new operation known as the tensor product, denoted by $\otimes$. If $T$ is a $(k, l)$ tensor and $S$ is an $(m, n)$ tensor, we define a $(k + m, l + n)$ tensor $T \otimes S$ by

$$T \otimes S(\omega^{(1)}, \ldots, \omega^{(k)}, \ldots, \omega^{(k+m)}, V^{(1)}, \ldots, V^{(l)}, \ldots, V^{(l+n)}) = T(\omega^{(1)}, \ldots, \omega^{(k)}, V^{(1)}, \ldots, V^{(l)})\, S(\omega^{(k+1)}, \ldots, \omega^{(k+m)}, V^{(l+1)}, \ldots, V^{(l+n)})\ . \qquad (1.46)$$

(Note that the $\omega^{(i)}$ and $V^{(i)}$ are distinct dual vectors and vectors, not components thereof.) In other words, first act $T$ on the appropriate set of dual vectors and vectors, then act $S$ on the remainder, and then multiply the answers. Note that, in general, $T \otimes S \neq S \otimes T$.

It is now straightforward to construct a basis for the space of all $(k, l)$ tensors, by taking tensor products of basis vectors and dual vectors; this basis will consist of all tensors of the form

$$\hat{e}_{(\mu_1)} \otimes \cdots \otimes \hat{e}_{(\mu_k)} \otimes \hat{\theta}^{(\nu_1)} \otimes \cdots \otimes \hat{\theta}^{(\nu_l)}\ . \qquad (1.47)$$
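The tensor product of (1.46) is easy to sketch for the simplest nontrivial case, two $(0,1)$ tensors (the component values below are arbitrary illustrations): $(\omega \otimes \eta)(V, W) = \omega(V)\,\eta(W)$, and swapping the factors generally gives a different map.

```python
# A dual vector as a (0,1) tensor: the map V -> omega_mu V^mu.
def dual(components):
    return lambda V: sum(c * v for c, v in zip(components, V))

# Tensor product of two (0,1) tensors, per eq. (1.46): act the first factor
# on the first argument, the second factor on the second, and multiply.
def tensor_product(T, S):
    return lambda V, W: T(V) * S(W)

omega = dual((1.0, 0.0, 2.0, -1.0))
eta = dual((0.0, 3.0, 1.0, 1.0))
V = (1.0, 1.0, 0.0, 2.0)
W = (2.0, 0.0, 1.0, 0.0)

TS = tensor_product(omega, eta)  # omega (x) eta, a (0,2) tensor
ST = tensor_product(eta, omega)  # eta (x) omega

# In general T (x) S != S (x) T:
assert TS(V, W) != ST(V, W)
```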