*Behavioral and Brain Sciences*, 2001, **24** (4), 665-668

David H. Foster

Department of Optometry and Neuroscience, University of Manchester Institute of Science and Technology

Manchester

M60 1QD

UK

d.h.foster@umist.ac.uk

http://www.op.umist.ac.uk/dhf.html

Shepard's analysis of how shape, motion, and color are perceptually represented
can be generalized. Apparent motion and shape may be associated with a group
of spatial transformations, accounting for rigid and plastic motion, and
perceived object color may be associated with a group of illuminant transformations,
accounting for the discriminability of surface-reflectance changes and illuminant
changes beyond daylight. The phenomenological and mathematical parallels
between these perceptual domains may indicate common organizational rules,
rather than specific ecological adaptations.

**[BARLOW; HECHT; KUBOVY & EPSTEIN; SCHWARTZ; SHEPARD; TODOROVIČ]**

For the biologically relevant properties of objects such as their position,
motion, shape, and color, what sorts of representational spaces offer the
possibility of yielding invariant psychological principles? The aim here
is to show that the analysis **SHEPARD** uses to address
this problem can be generalized. Thus, the phenomenon of rigid apparent motion
between sequentially presented objects is cast as a special case of more general
kinds of apparent motion, and surface-color perception under daylight is cast
as an invariant of more general illuminant transformations. Supporting experimental
data are cited for each. As a side-effect of this generalization, it may
be more difficult to maintain the notion that the rules governing these phenomena
are specific adaptations to properties of the world, although they remain
illuminating (**SCHWARTZ**, this issue). As with **SHEPARD**'s
approach, the present analysis depends critically on choosing appropriate
perceptual representations, here based on the natural group structures of
the spaces involved.

Figure 1a, b shows two possible apparent-motion paths between two sequentially
presented bars placed at an angle to each other (adapted from Foster 1978).
Of all the possible paths, what determines the one actually perceived? As
proposed in Foster (1975b), one way to tackle this problem is to imagine
that each path, in some suitable space, has a certain cost or energy associated
with it, and, in accord with Maupertuis, the path chosen is the one with
least energy. As shown later, energy can be defined in two natural ways: (1)
with reference to the space in which the object appears to transform; and
(2) with reference to the space of transformations acting on the object.
Neither is a subcase of the other (cf. **KUBOVY & EPSTEIN**; **TODOROVIČ**, this issue).

Figure 1. Three possible apparent-motion paths in the plane (adapted from Foster, 1978).

How should apparent-motion paths be described? Assume that a stimulus object
*A* and some transformed version of it *T*(*A*) are each defined
on a region *S* of some 2- or 3-dimensional smooth manifold constituting
visible space. The spatial transformation *T*, which describes the point-to-point
relationship between *A* and *T*(*A*), should be distinguished
from any dynamical process that instantiates this relationship. Depending
on the type of apparent motion (rigid or plastic; see Kolers 1972), an object
may change its position, its shape, or both. For the sake of generality,
therefore, assume that the transformations *T* are drawn from a set
**T** that is sufficiently large to allow all such possibilities (Foster
1978). For technical reasons, assume also that the space *S* is compact
and connected, and that **T** is a group, with neutral element the identity
transformation Id, taking *A* into itself. Although **T** is a large
group, including nonlinear transformations, it is not necessarily assumed
to coincide with the entire group of diffeomorphisms of *S*.

Apparent motion between *A* and *T*(*A*) can then be represented
as the generation by the visual system of a time-parameterized family *c*(*t*),
0 ≤ *t* ≤ 1, of transformations defining a path in **T** starting
at Id and ending at *T*; that is, *c*(0) = Id and *c*(1) =
*T*. (The actual time scale has been set to unity.)

As shown later, the group **T** can be given the structure of a Riemannian
manifold, so that at each point *T* of **T** there is an inner product
〈 , 〉 defined on the tangent space at *T* (the tangent space at a point
is simply the collection of all tangent vectors to all possible curves passing
through that point). The length ||*v*|| of a tangent vector *v*
is given by 〈*v*, *v*〉^{1/2}. The length of a path and
its (kinetic) energy can then be defined straightforwardly.

For each path *c* in the group **T** of transformations connecting
Id to *T*, its arclength *L*(*c*) is given by ∫ ||*c'*(*t*)||
*dt*, where *c'*(*t*) is the vector tangent to *c*
at *t* (i.e. the velocity at *c*(*t*); see Fig. 2) and the
integral is taken over the interval 0 ≤ *t* ≤ 1. If *c*, parameterized
by arclength, is not longer than any other path with the same start and endpoints,
then *c* is called a geodesic.

Figure 2. Some paths between Id and *T* in the transformation group **T**.

The energy *E*(*c*) of *c* is given by ∫ ||*c'*(*t*)||^{2}
*dt*, where the integral is again taken over the interval 0 ≤ *t*
≤ 1. It can be shown that the energy *E*(*c*) as a function of
*c* takes its minimum precisely on those paths between Id and *T*
that are geodesics. How, then, should the Riemannian metric || || be defined?
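The minimizing property just stated follows from the Cauchy-Schwarz inequality. The following is a brief sketch rather than a full proof; it uses only the definitions of *L*(*c*) and *E*(*c*) given above, with every path parameterized over 0 ≤ *t* ≤ 1, and it introduces the symbol *d* (an assumption of this sketch) for the least length of any path from Id to *T*.

```latex
\begin{align*}
L(c)^2
  &= \left( \int_0^1 \| c'(t) \| \cdot 1 \, dt \right)^{2}
   \le \left( \int_0^1 \| c'(t) \|^{2} \, dt \right)
       \left( \int_0^1 1^{2} \, dt \right)
   = E(c), \\
E(c) &\ge L(c)^2 \ge d^{2}.
\end{align*}
```

Equality holds in the first line exactly when the speed ||*c'*(*t*)|| is constant, and in the second exactly when *c* is also of least length. The energy therefore attains its minimum *d*^{2} on the length-minimizing paths traversed at constant speed.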

Assume that apparent motion is determined by the properties of the manifold
*S* in which the object appears to transform. As a subset of 2- or 3-dimensional
Euclidean space, *S* inherits the Euclidean metric | |. For a given
object *A* in *S*, the induced Riemannian metric || || = || ||_{1}
on **T** is defined thus. Let *c'*(*t*) be the vector tangent
to a path *c* in **T** at time *t* (remember that any tangent
vector can be represented in this way). As *c*(*t*) is a transformation
acting on *S*, it follows that, for each point *p* in *A*,
the vector (*c'*(*t*))(*p*) is tangent to the path (*c*(*s*))(*p*),
0 ≤ *s* ≤ 1, at *s* = *t*. The energy of *A* at time
*t* is simply the integral of |(*c'*(*t*))(*p*)|^{2}
over all *p* in *A*. Define ||*c'*(*t*)||_{1}
to be the integral of |(*c'*(*t*))(*p*)| over all *p*
in *A*.

If **T** is the group of rigid transformations (isometries) of *S*,
the geodesics produce the types of motion shown in Fig. 1a, where the rotating
motion of the bar takes place about its center of mass and the latter moves
in a straight line. A matrix formulation is given in Foster (1975b). This
is the motion of a free body in space. Yet, as Foster (1975b) and **SHEPARD**
point out, it is not the apparent motion that is most likely
to be observed.
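The free-body motion described above can be sketched numerically. The following is a minimal illustration, not the matrix formulation of Foster (1975b): it assumes a planar object sampled as a finite set of points of equal weight, so that the center of mass is the mean of the points, and an end transformation of the form *p* → *R*(θ)*p* + shift. The function names `rot` and `free_body_path` are hypothetical.

```python
import numpy as np

def rot(theta):
    """2x2 rotation matrix for angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def free_body_path(points, theta, shift, t):
    """Positions of `points` at time t (0 <= t <= 1) on a free-body path.

    The end transformation is p -> rot(theta) @ p + shift.
    Assumption: the object is a set of equally weighted points, so its
    center of mass is their mean. The center of mass moves along a
    straight line at constant speed while the object rotates at a
    constant rate about it.
    """
    points = np.asarray(points, dtype=float)
    g0 = points.mean(axis=0)                  # center of mass at t = 0
    g1 = rot(theta) @ g0 + shift              # center of mass at t = 1
    g_t = (1.0 - t) * g0 + t * g1             # straight-line interpolation
    # Rotate each point about the center of mass, then place it at g_t.
    return (points - g0) @ rot(t * theta).T + g_t
```

Evaluating `free_body_path` at a series of values of `t` for the two endpoints of a bar traces the kind of motion shown in Fig. 1a.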

Assume instead that apparent motion is determined by the properties of
the group **T** in which the path is described: the emphasis is thus on
transformations rather than on transforms. Because **T** is a group,
it has a natural Riemannian metric || || = || ||_{2}, compatible
with its group structure, obtained by translating an inner product on the
tangent space to **T** at Id. With respect to || ||_{2}, the geodesics
*c* that pass through Id are (segments of) 1-parameter subgroups of
**T**; that is, *c*(*s* + *t*) = *c*(*s*)*c*(*t*),
wherever they are defined.

If **T** is the group of rigid transformations of *S*, the geodesics
produce the types of motion shown in Fig. 1b, where the rotating motion
of the bar and the movement of the center of mass both take place about
the same point. A matrix formulation is given in Foster (1975b). When the
perceived paths are estimated by a probe or windowing technique, they are
found to fall closer to these "group" geodesics than to those associated
with object space, namely the free-body motions (Foster 1975b; McBeath &
Shepard 1989; Hecht & Proffitt 1991).
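For planar rigid transformations, the 1-parameter subgroup through a given *T* can be written down directly. The sketch below assumes the end transformation has the form *p* → *R*(θ)*p* + shift with |θ| < 2π, and represents it as a 3×3 homogeneous matrix; the names `rigid` and `group_path` are hypothetical, and this is an illustration rather than the formulation in Foster (1975b). When θ ≠ 0 the whole motion is a uniform rotation about the single fixed point of *T*.

```python
import numpy as np

def rigid(theta, shift):
    """Homogeneous 3x3 matrix of the planar map p -> rot(theta) @ p + shift."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, shift[0]],
                     [s,  c, shift[1]],
                     [0.0, 0.0, 1.0]])

def group_path(theta, shift, t):
    """Element c(t) of the 1-parameter subgroup with c(1) = rigid(theta, shift).

    Assumes |theta| < 2*pi. For theta == 0 the subgroup is a uniform
    translation; otherwise it is a uniform rotation about the fixed
    point q of the end transformation, found from q = R q + shift.
    """
    shift = np.asarray(shift, dtype=float)
    if np.isclose(theta, 0.0):
        return rigid(0.0, t * shift)
    R = rigid(theta, (0.0, 0.0))[:2, :2]
    q = np.linalg.solve(np.eye(2) - R, shift)     # fixed point of T
    R_t = rigid(t * theta, (0.0, 0.0))[:2, :2]
    # Rotation by t*theta about q: p -> R_t p + (q - R_t q).
    return rigid(t * theta, q - R_t @ q)
```

Because every `group_path(theta, shift, t)` is a rotation about the same point `q`, matrix products satisfy the subgroup property *c*(*s* + *t*) = *c*(*s*)*c*(*t*), and both the rotation of the bar and the movement of its center of mass take place about `q`, as in Fig. 1b.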

**SHEPARD**'s argument for the simplicity of geodesics
concentrates on their representation as rotations or screw displacements
in the group of rigid transformations of 3-dimensional space. In fact, their
simplicity has a more general basis (Carlton & Shepard 1990; Foster 1975b),
which, notwithstanding **TODOROVIČ**,
extends to the nonlinear motion shown in Fig. 1c between a straight bar and
a curved bar, and to non-smooth motion between smooth and non-smooth objects
(Foster 1978; Kolers 1972). To enumerate: (1) group geodesics minimize energy
with respect to the natural metric on the group **T**; (2) they coincide
with the 1-parameter subgroups of **T**, and are therefore computationally
economic in that each may be generated by its tangent vector at the identity
Id (Shepard's uniformity principle; see Carlton & Shepard 1990); and
(3) as a 1-parameter subgroup, each geodesic naturally generates a vector field
on *S* (an assignment of a tangent vector at each point of *S*
varying smoothly from point to point). This assignment does not vary with
time; that is, the vector field is stationary. Conversely, a stationary vector
field generates a unique 1-parameter subgroup of transformations.

A moving fluid provides a useful example of the significance of stationarity. Its streamlines defined by the velocity vector field usually vary with time, but, if the vector field is stationary, then the streamlines are steady and represent the actual paths of the fluid particles.

In general, the geodesics derived from the natural metric of object space (free-body motions) do not generate stationary vector fields.
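The contrast between the two kinds of geodesic can be made concrete for planar rigid motion. The sketch below is illustrative only and rests on stated assumptions: the group geodesic is taken to be a uniform rotation at rate `omega` about a fixed point `q`, and the free-body motion a uniform rotation at rate `omega` about a center of mass that starts at `g0` and moves with constant velocity `v`. All function and parameter names are hypothetical.

```python
import numpy as np

# Generator of planar rotations: J @ x is x turned through 90 degrees.
J = np.array([[0.0, -1.0], [1.0, 0.0]])

def group_field(p, omega, q):
    """Velocity at point p for uniform rotation (rate omega) about fixed point q.

    There is no time argument: the field is the same at every instant.
    """
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return omega * (J @ (p - q))

def free_body_field(p, t, omega, g0, v):
    """Velocity at point p at time t for a free-body motion.

    The center of mass is at g0 + t*v, and the body rotates at rate
    omega about that moving point, so the field depends on t whenever
    both omega and v are non-zero.
    """
    p, g0, v = (np.asarray(a, dtype=float) for a in (p, g0, v))
    g_t = g0 + t * v
    return v + omega * (J @ (p - g_t))
```

Evaluating `free_body_field` at one fixed point `p` for two different times gives different vectors, whereas `group_field` cannot change with time by construction.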

The stationarity of vector fields may be relevant to the question of whether
kinematic geometry internalizes specific properties of the world (**SHEPARD**,
this issue). Thus certain vector fields may reflect J. J. Gibson's "ambient
optic array" (**SHEPARD**), but they may also relate
directly to observers' actions. Some kinds of mental activity, including
preparation for movement (Richter et al. 2000) and mental rotation (Deutsch
et al. 1988), are associated with neuronal activity in the motor cortex and
related areas. Each of the vectors constituting a (stationary) vector field
could offer the most efficient template for elementary neural activity to
take object *A* into its transform *T*(*A*) (see comments
by **BARLOW**, this issue). In this sense, apparent motion
might be an internalization not of the ways in which objects move freely
in space (cf. **HECHT**, this issue) but of the ways in
which observers manipulate or interact with them. Such hypotheses are testable
(**BARLOW**; cf. **KUBOVY & EPSTEIN**,
this issue).

The foregoing analysis assumed that the energy of apparent motion is minimized
(Foster 1975b; 1978). **SHEPARD**'s approach assumes an
affine connection (Carlton & Shepard 1990). The result, however, is
the same.

A connection on any manifold *M*, not necessarily Riemannian, is a
rule ∇ that uses one vector field *X* to transform another vector field
*Y* into a new vector field ∇_{*X*}(*Y*).

A connection provides a sensible notion of parallelism with respect to
a path *c* in *M*. Let *Y*(*t*), 0 ≤ *t* ≤ 1, be
a parameterized family of vectors such that *Y*(*t*) is in the
tangent space to *M* at *c*(*t*). Then *Y* is said to
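be parallel with respect to *c* if ∇_{*c'*}(*Y*) = 0. A path *c* is then a geodesic of the connection if its own tangent field *c'* is parallel with respect to *c*, that is, ∇_{*c'*}(*c'*) = 0.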

Now suppose that the manifold *M* has a Riemannian metric. A connection
∇ on *M* is compatible with the Riemannian metric if parallel translation
preserves inner products; that is, for any path *c* and any pair *X*,
*Y* of parallel vector fields along *c*, the inner product 〈*X*,
*Y*〉 is constant. According to the fundamental theorem of Riemannian
geometry, there is one and only one symmetric connection that is compatible
with its metric: the Levi-Civita connection.

The geodesics defined as length-minimizing paths in the group **T**
of transformations are therefore precisely the same as the geodesics defined
with respect to the Levi-Civita connection on **T**. The premiss adopted
by **SHEPARD** and Carlton and Shepard (1990) is therefore
formally equivalent to that in Foster (1975b).

A problem with geodesic-based schemes for apparent motion—whether based
on metrics or connections—is how to cost the degree to which object structure
is preserved. As Kolers (1972) and others have noted, if the rigid transformation
*T* relating two objects is sufficiently large, then the apparent motion
may become non-rigid or plastic, even if *T* has not reached a cut point
on the geodesic (e.g. an antipodal point on the sphere).

One way to accommodate this failure is to introduce an additional energy
function *E*_{1} that represents the cost of preserving metric
structure over a path. Such a notion is not implausible. In shape-recognition
experiments with stimulus displays too brief to involve useful eye movements
or mental rotation, performance is known still to depend strongly on planar
rotation angle. Thus, for a rigid transformation *T* far from Id,
the total energy of the geodesic *c* connecting Id and *T* would
be ∫ ||*c'*(*t*)||^{2} *dt* + *E*_{1}(*c*),
which could exceed the energy ∫ ||*b'*(*t*)||^{2} *dt*
+ *E*_{2}(*b*) of some other, longer path *b* connecting
Id and *T*, preserving a weaker non-metric structure with smaller
energy function *E*_{2}.

If this is true, there ought to be a close relationship between apparent motion and visual shape recognition.

The existence of rigid apparent motion between two objects implies that
a visual isometry can be established. In a shape-recognition experiment,
therefore, the two objects should be recognizable as each other. This hypothesis
has been confirmed for rotated random-dot patterns (Foster 1973). But how
should one deal with structures other than metric ones? In practice, one needs
a definition of structure that can be interpreted operationally in terms
of the transformations (isomorphisms) preserving that structure (Foster 1975a;
Van Gool et al. 1994). For (1) metric, (2) affine, (3) projective, and (4)
topological structures, their groups of isomorphisms form a nested sequence,
**T**_{1} ⊂ **T**_{2} ⊂ **T**_{3} ⊂ **T**_{4}.
Accordingly, for one of these more general structures *i*, suppose
that transformation *T* is drawn from **T**_{*i*} and
that sequentially presenting object

The remainder of this commentary is concerned with perceived surface color, the analysis of which has parallels with the analysis of apparent motion.

The illumination on surfaces varies naturally, and the spectrum of the
light reaching the eye depends both on the reflectance function of the surface
and on the illuminant spectrum. **SHEPARD** suggests that
the intrinsically 3-dimensional nature of daylight is intimately linked to
how observers compensate for illuminant variations.

Yet the degree to which observers are color constant is limited, with levels in the unadapted eye rarely exceeding 0.6–0.7, where on a 0–1 scale 1 would be perfect constancy (for review, see Foster et al. 2001). In contrast, observers can rapidly, effortlessly, and reliably discriminate illuminant changes on a scene from simultaneous changes in the reflecting properties of its surfaces (Craven & Foster 1992). The sequential presentation of the stimuli generates a strong temporal cue: illuminant changes give a "wash" over the scene and reflectance changes a "pop-out" effect (Foster et al. 2001). The former is analogous to apparent motion between an object and its smooth transform, and the latter to split apparent motion between an object and its discontinuous transform.

If perceived surface color is not always preserved under illuminant changes, then what is invariant in discriminations of illuminant and material changes?

One possibility is that observers assess whether the perceived relations between the colors of surfaces are preserved, that is, whether relational color constancy holds. Relational color constancy is similar to color constancy but refers to the invariant perception of the relations between the colors of surfaces under illuminant changes. It has a physical substrate in the almost-invariant spatial ratios of cone excitations generated in response to light, including illuminants with random spectra, reflected from different illuminated surfaces (Foster & Nascimento 1994). There is strong evidence that observers use this ratio cue, even when it may not be reliable (Nascimento & Foster 1997).
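The ratio cue can be illustrated with a simple linear model of cone excitation. This is a sketch under stated assumptions, not the procedure of Foster & Nascimento (1994): spectra are sampled at a finite set of wavelengths, each cone excitation is the sensitivity-weighted sum of reflectance times illuminant, and the function names are hypothetical. Under this model the ratios are exactly invariant only in the limiting case of sensors that each respond to a single wavelength; for broad, overlapping sensitivities the invariance is approximate.

```python
import numpy as np

def cone_excitations(reflectance, illuminant, sensitivities):
    """Excitations of each cone class produced by one surface.

    reflectance, illuminant: arrays of shape (n_wavelengths,)
    sensitivities: array of shape (n_cones, n_wavelengths)
    Returns an array of shape (n_cones,).
    """
    return sensitivities @ (reflectance * illuminant)

def excitation_ratios(refl_a, refl_b, illuminant, sensitivities):
    """Ratio, within each cone class, of the excitations from surfaces a and b
    viewed under the same illuminant."""
    return (cone_excitations(refl_a, illuminant, sensitivities)
            / cone_excitations(refl_b, illuminant, sensitivities))
```

Comparing `excitation_ratios` for the same pair of surfaces under two illuminants gives a direct measure of how far the ratio cue departs from invariance for a chosen set of sensitivities.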

In the language of geometric-invariance theory, relational color constancy is a relative invariant with respect to illuminant changes, and, in that sense, is a weaker notion than color constancy (Maloney 1999). But relational color constancy can be used to produce color-constant percepts. Again, the argument depends on group properties.

The set **T** of all illuminant transformations *T* is a one-to-one
copy of the multiplicative group of (everywhere-positive) functions defined
on the visible spectrum, and it accordingly inherits the group structure
of the latter. The group **T** induces (Foster & Nascimento 1994) a
canonical equivalence relation on the space **C** of all color signals
(each signal consisting of the reflected spectrum at each point in the image).
That is, *C*_{1} and *C*_{2} in **C** are related
if and only if *T*(*C*_{1}) = *C*_{2} for
some *T* in **T**.
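The group structure and the equivalence relation it induces can be sketched for discretized spectra. The assumptions here are that the visible spectrum is sampled at a finite number of wavelengths, that an illuminant transformation is a strictly positive vector acting by wavelength-wise multiplication at every image point, and that color signals have no zero entries; all function names are hypothetical.

```python
import numpy as np

def apply_illuminant(T, C):
    """Apply illuminant transformation T to color signal C.

    T: shape (n_wavelengths,), all entries > 0.
    C: shape (n_points, n_wavelengths), one reflected spectrum per image point.
    """
    return C * T

def compose(T1, T2):
    """Group product: applying T2 and then T1 (the order does not matter)."""
    return T1 * T2

def inverse(T):
    """Group inverse; the identity element is the all-ones vector."""
    return 1.0 / T

def related(C1, C2, rtol=1e-9):
    """True if C2 = T(C1) for some everywhere-positive T.

    Assumes C1 has no zero entries. The candidate T is read off the first
    image point and must then hold at every other point.
    """
    ratio = C2 / C1
    T = ratio[0]
    return bool(np.all(T > 0) and np.allclose(ratio, T, rtol=rtol))
```

Because every `T` has an inverse and products of transformations are again transformations, `related` is reflexive, symmetric, and transitive, which is what makes it an equivalence relation on the color signals.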

The assumption of color constancy is that it is possible to find some *f*
that associates with each *C* in **C** a percept *f*(*C*)
that is invariant under illuminant transformations. Because **T** is
a group, there is a one-to-one correspondence between color-constant percepts
*f*(*C*) and equivalence classes [*C*] of illuminant-related
color signals. This formal equivalence between color constancy and relational
color constancy can be exploited in practical measurements (e.g. Foster et
al. 2001).
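One way to see the correspondence is to construct an explicit invariant. The function below is a hypothetical choice of *f*, not one specified in the text: it assumes discretized, strictly positive color signals and divides each wavelength band by its geometric mean over image points, which cancels any positive wavelength-wise factor.

```python
import numpy as np

def percept(C):
    """Illuminant-invariant descriptor of a color signal C (hypothetical f).

    C: shape (n_points, n_wavelengths), strictly positive entries.
    Each wavelength band is divided by its geometric mean over image
    points, so multiplying C by any positive per-wavelength factor T
    leaves the result unchanged.
    """
    log_C = np.log(C)
    return np.exp(log_C - log_C.mean(axis=0))
```

Under these assumptions two signals yield the same `percept` exactly when they differ by a positive wavelength-wise factor, that is, when they lie in the same equivalence class [*C*], so the descriptor labels each class once.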

As **SHEPARD** points out, although we may not perceive
everything that could be perceived about each surface, we at least perceive
each surface as the same under all naturally occurring conditions of illumination,
and, as argued here, sometimes even under unnatural illuminants.

The representations of apparent motion and perceived shape and object color
are intimately associated with groups of spatial transformations. In **SHEPARD**'s
analysis, the geodesics for apparent motion are attributed to an affine connection,
but the same geodesics can be derived as the natural energy-minimizing paths
of a transformation group, which allows an additional energy function to
be introduced to accommodate rigid-motion breakdown, and more generalized
kinds of shape recognition. In **SHEPARD**'s analysis
of perceived object color, daylight illuminants have a special role, but
the same perceptual invariants may be obtained with a group of illuminant
transformations taking illuminants beyond the daylight locus.

What of the evidence? For rigid transformations in 2- and 3-dimensional space there is a clear bias towards motions following the natural transformation-group metric. There is also evidence that rigid apparent motion does not occur at angles of rotation where shape recognition does not occur, consistent with the proposed link between the two phenomena. Finally, there is evidence that observers can exploit violations of invariance of spatial color relations under illuminant transformations in a predictable way.

The phenomenological and mathematical parallels between these various
perceptual domains may not be consequences of **SHEPARD**'s
notion of adaptation to specific properties of the world. They do, however,
suggest an application of common organizational rules.

This work was supported by the Engineering and Physical Sciences Research Council. I thank E. Pauwels for helpful discussions and E. K. Oxtoby for critically reading the manuscript.

Carlton, E. H., Shepard, R. N. (1990). Psychologically simple motions as
geodesic paths. I. Asymmetric objects. *Journal of Mathematical Psychology*,
*34*, 127-188.

Craven, B. J., Foster, D. H. (1992). An operational approach to colour
constancy. *Vision Research*, *32*, 1359-1366.

Deutsch, G., Bourbon, W. T., Papanicolaou, A. C., Eisenberg, H. M. (1988).
Visuospatial tasks compared via activation of regional cerebral blood-flow.
*Neuropsychologia*, *26*, 445-452.

Foster, D. H. (1973). An experimental examination of a hypothesis connecting
visual pattern recognition and apparent motion. *Kybernetik*, *14*,
63-70.

Foster, D. H. (1975a). An approach to the analysis of the underlying structure
of visual space using a generalized notion of visual pattern recognition.
*Biological Cybernetics*, *17*, 77-79.

Foster, D. H. (1975b). Visual apparent motion and some preferred paths
in the rotation group *SO(3)*. *Biological Cybernetics*, *18*,
81-89.

Foster, D. H. (1978). Visual apparent motion and the calculus of variations.
In E. L. J. Leeuwenberg & H. F. J. M. Buffart (Eds.), *Formal Theories
of Visual Perception* (pp. 67-82). Chichester: Wiley.

Foster, D. H., Amano, K., Nascimento, S. M. C. (2001). How temporal cues
can aid colour constancy. *Color Research and Application*, *26 (suppl.)*,
S180-S185.

Foster, D. H., Nascimento, S. M. C. (1994). Relational colour constancy
from invariant cone-excitation ratios. *Proceedings of the Royal Society
of London, Series B*, *257*, 115-121.

Indow, T. (1999). Global structure of visual space as a united entity.
*Mathematical Social Sciences*, *38*, 377-392.

Kolers, P. A. (1972). *Aspects of motion perception*. Oxford: Pergamon.

Maloney, L. T. (1999). Physics-based approaches to modeling surface color
perception. In K. R. Gegenfurtner & L. T. Sharpe (Eds.), *Color Vision:
From Genes to Perception* (pp. 387-416). Cambridge: Cambridge University
Press.

McBeath, M. K., Shepard, R. N. (1989). Apparent motion between shapes
differing in location and orientation: A window technique for estimating
path curvature. *Perception & Psychophysics*, *46*, 333-337.

Nascimento, S. M. C., Foster, D. H. (1997). Detecting natural changes of
cone-excitation ratios in simple and complex coloured images. *Proceedings
of the Royal Society of London, Series B*, *264*, 1395-1402.

Richter, W., Somorjai, R., Summers, R., Jarmasz, M., Menon, R. S., Gati,
J. S., Georgopoulos, A. P., Tegeler, C., Ugurbil, K., Kim, S.-G. (2000).
Motor area activity during mental rotation studied by time-resolved single-trial
fMRI. *Journal of Cognitive Neuroscience*, *12*, 310-320.

Van Gool, L. J., Moons, T., Pauwels, E., Wagemans, J. (1994). Invariance
from the Euclidean geometer's perspective. *Perception*, *23*,
547-561.