Tour Into the Picture: Using a Spidery Mesh Interface to Make Animation from a Single Image

Youichi Horry*‡   Ken-ichi Anjyo†   Kiyoshi Arai*

Hitachi, Ltd.
ABSTRACT

A new method called TIP (Tour Into the Picture) is presented for easily making animations from one 2D picture or photograph of a scene. In TIP, animation is created from the viewpoint of a camera which can be three-dimensionally "walked or flown-through" the 2D picture or photograph. In making such animation, conventional computer vision techniques cannot be applied to the 3D modeling process for the scene, because only a single 2D image is available. Instead, a spidery mesh is employed in our method to obtain a simple scene model from the 2D image of the scene using a graphical user interface. Animation is thus easily generated without the need of multiple 2D images.

Unlike existing methods, our method is not intended to construct a precise 3D scene model. The scene model is rather simple, and not fully 3D-structured. The modeling process starts by specifying the vanishing point in the 2D image. The background in the scene model then consists of at most five rectangles, whereas hierarchical polygons are used as a model for each foreground object. Furthermore, a virtual camera is moved around the 3D scene model, with the viewing angle being freely controlled. This process is easily and effectively performed using the spidery mesh interface. We have obtained a wide variety of animated scenes which demonstrate the efficiency of TIP.

CR Categories and Subject Descriptors: I.3.3 [Computer Graphics]: Picture/Image Generation - viewing algorithms; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism - Animation

Additional Keywords: graphical user interface, image-based modeling/rendering, vanishing point, field-of-view angle

1 INTRODUCTION

Making animation from one picture, painting, or photograph is not a new idea.
* Central Research Laboratory, 1-280 Higashi-Koigakubo, Kokubunji, Tokyo 185. {horry, arai}@crl.hitachi.co.jp
† Visualware Planning Department, 4-6 Kanda-Surugadai, Chiyoda, Tokyo 101. anjyo@cm.head.hitachi.co.jp
‡ Currently visiting INRIA Rocquencourt, Domaine de Voluceau - Rocquencourt, 78153 Le Chesnay Cedex, France. horry@bora.inria.fr

Such animations have been mainly used for art and entertainment purposes, often with striking visual effects. For instance, 2D animations are commonly seen where 2D figures of persons or animals in the original image move around, with the 2D background fixed. In relatively simple cases, these animations may be created using traditional cel animation techniques. If the animations are computer-generated, then 2D digital effects, such as warping and affine transformations, can also be employed.

However, it is still hard and tedious for a skilled animator to make computer animations from a single 2D image of a 3D scene without knowing its 3D structure, even if established digital techniques are fully available. When the input image is given in advance, the animator first has to make the 3D scene model by trial and error until the projected image of the model fits well with the input image of the scene. At the very beginning of this process, the virtual camera position in 3D space must also be known as one of the conditions for the input image to be regenerated from the scene model. This poses the question: how is the camera position known from a single image? Unfortunately, existing approaches to creating models directly from photographs, such as image-based techniques, require multiple input photographs, and the cases discussed in this paper are outside their scope. If animating a painting is desired, making the animation may become more difficult, because a painting does not give as precise information for creating the 3D scene model as a photograph does.
The best possible approach currently available for making animation from a single image therefore depends largely on the skill, sense, and eye of the animators, though this may place an excessive and tedious workload on them. They can then develop the scene structure freely, using the vague and incomplete information included in the input to animate the scene to their liking. The scene structure, however, may still be incomplete. A more straightforward method is thus desired for creating the scene animation, in which the 3D modeling process of the scene is simplified or skipped.

In this paper we propose a simple method, which we call TIP (Tour Into the Picture), for making animations from one 2D picture or photograph of a scene. This method provides a simple scene model, which is extracted from the animator's mind. Thus the scene model is not exactly 3D structured, but is geometrically just a collection of "billboards" and several 3D polygons. Therefore, the animations obtained with our method are not strictly three-dimensional. However, as we show, the proposed method allows easy creation of various animations, such as "walk-through" or "fly-through", while visually giving convincing 3D quality.

1.1 Related work

If a photograph is used as the 2D input image, then image-based methods, including [2, 4, 7], may be used effectively. In [2], a panoramic image is made from overlapping photographs taken by a regular camera to represent a virtual environment, so that real-time walk-through animations can be made with the viewpoint fixed. The method in [7] provides animations, with many closely spaced images being required as input, and its theoretical background largely relies on computer vision techniques. This work can also be considered to belong to the category of techniques for light field representation [6], which gives a new framework for rendering new views using large arrays of both rendered and digitized images.
Similarly, in [4] a "sparse" set of photographs is used for existing architectural scenes to be animated. Though the input condition is improved due to architectural use, multiple input images are still required.

Despite successful results with these image-based approaches, we need a new methodology, especially for dealing with the situations where the input is a single photograph.

For paintings or illustrations, there are relatively fewer research reports on their animation. A new rendering technique was presented in [8] for making painterly animations. Assuming that the 3D geometric models of the objects in a scene are known in advance, animations in a painterly style are then made by the method using 3D particles and 2D brush strokes.

Morphing techniques including [1] provide 3D effects visually, requiring at least two images as input, although actually
only 2D image transformations are used. For example, the view interpolation technique [3] is an efficient application of morphing, which generates intermediate images from images prestored at nearby viewpoints. View morphing [9] also gives a strong sense of 3D metamorphosis in the transition between images of the objects. We note that most of these techniques require no knowledge of 3D shape in morphing.

Existing methods cited above work effectively when multiple input images are available, or when the 3D geometric structure of a scene to be animated is known in advance. Our approach treats the cases where one input image of a scene is given without any knowledge of the 3D shapes in the scene. Theoretically it is impossible to create an animation from a single view of the scene. Instead, our approach actually gives a new type of visual effect for making various animations, rather than constructing a rigid 3D model and animation of the scene.

1.2 Main Idea

If we consider traditional paintings or landscape photographs, their perspective views give a strong impression that the scenes depicted are 3D. It is hard for us to find an exact position for the vanishing point of the scene in the picture. In particular, for paintings or drawings, the vanishing point is not precisely prescribed, being largely dependent on the artist's imagination. Therefore, rigid approaches, such as computer vision techniques, are not valid for the purpose of exactly finding the vanishing point. However, it is relatively easy for us to roughly specify the vanishing point, by manually drawing guide lines for perspective viewing. Then we can expect that the "visually 3D" geometry of the scene's background is defined as a simple model (with polygons, for instance) centering around the user-specified vanishing point. Similarly, in many cases, we can easily tell the foreground objects from the background through our own eyes.
A simple and intuitive model of the foreground object can then be like a "billboard" that stands on a polygon of the background model.

The main idea of the proposed method is simply to provide a user interface which allows the user to easily and interactively perform the following operations.

(1) Adding "virtual" vanishing points for the scene - The specification of the vanishing point should be done by the user, not automatically, as mentioned above.

(2) Distinguishing foreground objects from background - The decision as to whether an object in the scene is near the viewer should be made by the user, since no 3D geometry of the scene is known. In other words, this means that the user can freely position the foreground object, with the camera parameters being arranged.

(3) Constructing the background scene and the foreground objects by simple polygons - In order to approximate the geometry of the background scene, several polygons should be generated to represent the background. This model is then a polyhedron-like form with the vanishing point being on its base. The "billboard"-like representation and its variation are used for foreground objects.

These three operations are closely related to each other, so that the interactive user interface should be able to provide their easy and simultaneous performance. A spidery mesh is the key to fulfilling this requirement.

The proposed method is outlined as follows. Fig. 1 shows the process flow. After an input image is digitized (Fig. 1 (a)), the 2D image of the background and the 2D mask image of the foreground objects are made (Figs. 1 (b), (c)). TIP uses a spidery mesh to prescribe a few perspective conditions, including the specification of a vanishing point (Fig. 1 (d)). In the current implementation of TIP, we can specify one vanishing point for a scene.
This is not restrictive because many paintings, illustrations, or photographs can actually be considered one-point perspective, and because, as demonstrated later, the one-point perspective representation using the spidery mesh works very well even for cases where it is hard to tell whether the input is one-point perspective or not.

Figure 1. Process flow diagram: (a) input image; (b) background image; (c) foreground mask; (d) fitting perspective projection; (e) modeling the background; (f) modeling foreground objects; (g) camera positioning; (h) rendered image.

Next, the background is modeled with at most five 3D rectangles (Fig. 1 (e)), and simple polygonal models for the foreground objects are also constructed (Fig. 1 (f)). Finally, by
changing the virtual camera parameters (Fig. 1 (g)), images at different views are rendered (Fig. 1 (h)), so that the desired animation is obtained.

In section 2 the modeling process of the 3D scene (Figs. 1 (a)-(f)) in TIP is described. In section 3, after the rendering technique (Figs. 1 (g), (h)) is briefly mentioned, several animation examples are shown, which demonstrate the efficiency and usefulness of the proposed method. Conclusions and future research directions of the method are summarized in section 4.

2 SCENE MODELING FROM A SINGLE IMAGE

In our method we use one picture or photograph of a 3D scene as input, from which we wish to make a computer animation. Then we specify one "virtual" (i.e., user-specified) vanishing point for the scene. As described later, this does not always mean that the input image must be one-point perspective. For convenience, the line that goes through the vanishing point and the viewpoint is perpendicular to the view plane. As for camera positioning, default values of camera position, view-plane normal, and view angle (field-of-view angle) are assigned in advance (see [5] for technical terms). These parameters are changed later, using our GUI (Graphical User Interface) described in 3.1, for making animations. For simplicity, the input images used are taken by the virtual camera without tilting (though actually this condition can easily be eliminated). This means that the view up vector, which is parallel to the view plane in this paper, is perpendicular to the ground of the 3D scene to be modeled.

2.1 Foreground Mask and Background Image

In the modeling process we first derive two types of image information from the input 2D image: the foreground mask and the background image. Let F1, F2, ..., Fp be subimages of the input image I, each of which is supposed to correspond to a foreground object in the 3D scene and is relatively close to the virtual camera.
In practice the subimages {Fi} (1 ≤ i ≤ p) are specified by a user and are modeled as polygonal objects in the corresponding 3D scene (see 2.3). The foreground mask is then defined as the 2D image consisting of {αi} (1 ≤ i ≤ p), where αi is a grey-scaled masking value (α-value) of Fi. The background image is the 2D image which is made from I by retouching the traces of {Fi} after the subimages {Fi} are removed from I. The retouching process consists of occluding the traces of these subimages using color information from the neighborhood of each point (pixel) in Fi.

There is commercially available software, such as 2D paint tools, that enables us to easily make the 2D images for the foreground mask and the background from an input image. Fig. 1 presents an example. Fig. 1 (a) shows the input image (a photograph). The background image in Fig. 1 (b), as well as the foreground mask in Fig. 1 (c), were obtained using a standard 2D paint tool. To get the foreground mask in Fig. 1 (c), a street lamp and two persons were selected by the user as the subimages {Fi} mentioned above.

2.2 Specifying the Vanishing Point and Inner Rectangle

In order to model the 3D scene from the input image, we use our software called TIP, starting with the specification of the vanishing point of this image. TIP employs a unique GUI with a spidery mesh, which plays an essential role not only in the specification process but also in the processes thereafter.

Fig. 2 (a) shows the initial state of the spidery mesh applied to the input image in Fig. 1 (a). In general, as illustrated in Fig. 2 (a), the spidery mesh is defined as the 2D figure consisting of: a vanishing point; an inner rectangle, which intuitively means the window out of which we look at infinity; radial lines that radiate from the vanishing point; and an outer rectangle, which corresponds to the outer frame of the input image. Each side of the inner rectangle is made to be parallel to a side of the outer rectangle.
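As a concrete aside, the per-pixel role of the α-values defined in 2.1 can be sketched in a few lines. This is illustrative only (the paper relies on off-the-shelf 2D paint tools rather than custom code, and the list-of-lists image representation here is a stand-in for real image arrays):

```python
def alpha_blend(bg, fg, alpha):
    """Blend one foreground subimage Fi over the background image,
    weighting each pixel by its grey-scale mask value alpha in [0, 1]:
    out = alpha * fg + (1 - alpha) * bg (standard 'over' compositing).
    Images are nested lists of intensities of equal size."""
    h, w = len(bg), len(bg[0])
    return [[alpha[i][j] * fg[i][j] + (1.0 - alpha[i][j]) * bg[i][j]
             for j in range(w)]
            for i in range(h)]

# 2x2 example: the mask is opaque (1) in the left column, transparent (0) in the right,
# so the foreground shows through only on the left.
bg = [[10.0, 10.0], [10.0, 10.0]]
fg = [[200.0, 200.0], [200.0, 200.0]]
alpha = [[1.0, 0.0], [1.0, 0.0]]
print(alpha_blend(bg, fg, alpha))  # [[200.0, 10.0], [200.0, 10.0]]
```

A grey (fractional) mask value yields a proportional mix of the two images, which is what makes the mask "grey-scaled" rather than binary.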
In TIP, the inner rectangle is specified along with the vanishing point. It should be noted that, as described later, the inner rectangle is also used to specify the rear window in 3D space (see 2.3 and 2.4). The rear window is a border that the virtual camera, which will be used in making an animation, cannot go through. The inner rectangle is consequently defined as the 2D projection of this window onto the 2D image space (i.e., the projection plane). In practice the 3D window is considered to be so distant from the current (initial) position of the virtual camera that the camera does not zoom in beyond this window from the current position. We now describe how to use the spidery mesh to specify the vanishing point and, along with it, to position the inner rectangle. First we consider the typical cases where the vanishing point is located in the input image (i.e., within the outer rectangle of the spidery mesh); Fig. 3 (a) is such a case. Then, using a pointing device (a mouse in the current implementation), we can control the geometry of the spidery mesh with the following functions.

[a] Deformation of the inner rectangle - If the right-bottom corner of the inner rectangle is dragged with the pointing device, the left-top corner of the rectangle stays fixed, and the right-bottom corner moves according to the dragging (see Fig. 3 (a)).

[b] Translation of the inner rectangle - If we drag a point on one of the sides of the rectangle (except the point at the right-bottom corner), the rectangle is moved by the dragging distance (Fig. 3 (b)).

[c] Translation of the vanishing point - If the vanishing point is dragged, it is translated. The four radial lines, which are drawn boldly in Fig. 3, are moved under the condition that these radial lines always go through the four corners of the inner rectangle, respectively (Fig. 3 (c)).
If the cursor is dragged out of the inner rectangle, the vanishing point is moved in the direction, and by the distance, of the dragging. Conversely, if one of these bold radial lines is translated by moving its edge on the outer rectangle, the vanishing point is moved according to a rule that we call servility of the vanishing point to the four (bold) radial lines.

Figure 2. Spidery mesh on the 2D image: (a) initial state; (b) specification result (inner rectangle, vanishing point, outer rectangle).

This means, for example, that if we drag the edge of radial line L1 in Fig. 3 (d) along the outer rectangle, then radial line L2 is fixed and the vanishing point
is moved along L2. The dotted lines in Fig. 3 (d) show the new positions of the bold radial lines; their common source point is the resulting vanishing point. Using these functions in our GUI, we can specify the vanishing point and the inner rectangle. In practice the radial lines are very helpful in the specification process. For example, a user can specify the vanishing point while controlling the radial lines so that they run along the borderlines between buildings and roads (see Fig. 2 (b)). The servility of the vanishing point in [c] is then useful in controlling the radial lines. It should also be noted that the concept of the spidery mesh is entirely 2D, which ensures ease of use and real-time feedback in the specification process. For the cases where the vanishing point is outside the input image (the outer rectangle), functions similar to those described above can be applied, so that the inner rectangle is still specified within the outer rectangle.

2.3 Modeling the 3D Background

The next thing we do is to model the 3D background of the scene using very few polygons. Let us suppose that the vanishing point and the inner rectangle are specified as shown in Fig. 4 (a). We can then decompose the outer rectangle into five smaller regions, each of which is a 2D polygon in the outer rectangle. As illustrated in Fig. 4 (b), five 2D rectangles may be deduced from these regions; the rectangles are tentatively called the floor, right wall, left wall, rear wall, and ceiling, respectively (the rear wall is actually the inner rectangle). We define the textures of these 2D rectangles to be taken from the background image. Suppose that these rectangles are the projections of 3D rectangles. We name each of these 3D rectangles the same as its corresponding 2D projection.
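The five 2D regions deduced above are trapezoids in the image, so taking a rectangular texture from the background image amounts to a projective warp of each region. The paper leaves this step to the implementation; below is a minimal sketch of one standard approach, fitting a 4-point homography with numpy (the function names and solver are our own assumptions, not part of TIP):

```python
import numpy as np

def homography(src, dst):
    """Fit the 3x3 projective map H with dst ~ H @ src (homogeneous),
    from four 2D point correspondences, e.g. the corners of a
    trapezoidal floor region -> the corners of a rectangular texture.

    src, dst: (4, 2) arrays of 2D points (no three collinear).
    """
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h1 x + h2 y + h3) / (h7 x + h8 y + 1), similarly for v,
        # linearized with h9 fixed to 1
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(H, p):
    """Apply the homography H to a single 2D point."""
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]
```

In practice every pixel of the texture rectangle would be resampled through the inverse of H; only the point mapping is shown here.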
We then define the 3D background model in 3D space as being these five 3D rectangles, assuming that the following conditions hold:

[A-1] Every adjacent pair of the 3D rectangles mentioned above is mutually orthogonal.
[A-2] The 3D rear wall is parallel to the view plane.
[A-3] The 3D floor is orthogonal to the view up vector.
[A-4] The textures of the 3D rectangles are inherited from those of the corresponding 2D rectangles.

The vertices of these 3D rectangles are therefore easily estimated. For simplicity, we set the coordinate system of the 3D space so that the view up vector = (0, 1, 0) and the 3D floor is on the plane y = 0. Then the vertices of the 3D rectangles, numbered as shown in Fig. 4 (c), are calculated as follows (see also the calculation flow in Fig. 4 (c)). First we note that the 3D coordinate values of a point are easily obtained if we know that it lies on a certain (known) plane and that its view-plane coordinate values are known. Since we see the 2D positions of vertices 1 - 4 in Fig. 4 (c), we get the 3D positions of these four points, considering that they are on the plane y = 0. Similarly we get the values of vertices 5 and 6. Next we consider the plane on which the 3D rear wall lies. The equation of this plane is known, because it is perpendicular to the plane y = 0 and contains the known vertices 1 and 2. Since vertices 7 and 8 are on this known plane, we can get the values of these two vertices. Then we estimate the "height" of the 3D ceiling. Since the 3D ceiling lies on a plane parallel to the plane y = 0, we may assume that the 3D ceiling is on the plane y = H, for some H. If calculation of

Figure 3. Controlling the spidery mesh: (a) deformation of the inner rectangle; (b) translation of the inner rectangle; (c) translation of the vanishing point; (d) servility of the vanishing point (L1 dragged, L2 fixed).

Figure 4. Modeling the 3D background: (a) specified spidery mesh; (b) deduced 2D polygons (floor, right wall, left wall, rear wall, ceiling); (c) estimating the vertices of the 3D rectangles (vertices 1 - 12, with the floor on the plane y = 0 and the ceiling on y = H); (d) 3D background model obtained.
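The vertex calculation in 2.3 rests on the stated fact that a 3D point can be recovered from its view-plane position once the plane it lies on is known. That step is a ray-plane intersection; here is a minimal sketch under a conventional setup with the camera at the origin and the view plane at z = -d (the symbols and setup are our own assumptions, not the paper's):

```python
import numpy as np

def unproject(p_view, d, plane_n, plane_d):
    """Recover the 3D point on a known plane from its view-plane position.

    The eye is at the origin and the view plane is z = -d, so the image
    point (x, y) corresponds to the ray through (x, y, -d).
    The known plane is {q : dot(plane_n, q) = plane_d}.
    """
    ray = np.array([p_view[0], p_view[1], -d], float)
    denom = np.dot(plane_n, ray)
    if abs(denom) < 1e-12:
        raise ValueError("ray is parallel to the plane")
    t = plane_d / denom
    return t * ray  # the recovered 3D vertex

# Example: a floor vertex seen at image point (0.2, -0.5), focal
# distance 1, camera 1 unit above the floor -> the floor is y = -1.
floor_n, floor_d = np.array([0.0, 1.0, 0.0]), -1.0
v = unproject((0.2, -0.5), 1.0, floor_n, floor_d)
# v == (0.4, -1.0, -2.0): on the floor, 2 units in front of the camera
```

Vertices 1 - 6 come from the floor plane this way; vertices 7 - 12 reuse the same routine with the rear-wall plane and the ceiling plane y = H once those planes are known.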