Mesh-Guided Optimized Retexturing for Image and Video

Yanwen Guo, Hanqiu Sun, Member, IEEE, Qunsheng Peng, and Zhongding Jiang, Member, IEEE

Abstract—This paper presents a novel approach for replacing the textures of specified regions in an input image or video using stretch-based mesh optimization. The retexturing results exhibit distortion and shading effects consistent with the unknown underlying geometry and lighting conditions. For replacing textures in a single image, two important steps are developed: a stretch-based mesh parameterization incorporating the recovered normal information imitates the perspective distortion of the region of interest, and a Poisson-based refinement process accounts for texture distortion at fine scale. The luminance of the input image is preserved through color transfer in YCbCr color space. Our approach is independent of the replaced textures: once the input image is processed, any new texture can be applied to efficiently generate retexturing results. For video retexturing, we propose key-frame-based texture replacement, extended and generalized from the image retexturing. Our approach repeatedly propagates the replacement results of key frames to the remaining frames. We develop a local motion optimization scheme to deal with the inaccuracies and errors of robust optical flow when tracking moving objects. Visibility shifting and texture drifting are effectively alleviated using a graphcut segmentation algorithm and a global optimization that smooths the trajectories of tracked points over the temporal domain. Our experimental results show that the proposed approach generates visually pleasing results for retextured images and video.

Index Terms—Texture replacement, parameterization, Poisson equation, graphcut segmentation.

Y. Guo is with the National Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, P.R. China. E-mail: ywguo@nju.edu.cn.
H. Sun is with the Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong. E-mail: hanqiu@cse.cuhk.edu.hk.
Q. Peng is with the State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, P.R. China. E-mail: peng@cad.zju.edu.cn.
Z. Jiang is with the Software School, Fudan University, Shanghai 201203, P.R. China. E-mail: zdjiang@fudan.edu.cn.

Manuscript received 26 Dec. 2006; revised 11 July 2007; accepted 27 Aug. 2007; published online 17 Sept. 2007. Recommended for acceptance by J. Dorsey. For information on obtaining reprints of this article, please send e-mail to: tvcg@computer.org, and reference IEEECS Log Number TVCG-0231-1206. Digital Object Identifier no. 10.1109/TVCG.2007.70438.

1 INTRODUCTION

Editing the content of photos and footage by changing the appearance of some regions with new textures is a common task for creating visual effects. This process is commonly referred to as retexturing or texture replacement. The key issue of texture replacement is how to preserve the original shading effect and texture distortion without knowing the underlying geometry and lighting conditions. Retexturing objects in images and video clips has wide applications in digital entertainment, virtual exhibition, art, and industrial design.

For retexturing an image, two fundamental issues must be addressed: how to deform the new texture to conform to the scene geometry, and how to keep the shading effect encoded in the original image for consistent lighting conditions. One possible solution to the first issue is recovering the 3D surface geometry using shape-from-shading techniques and then establishing a parameterization between the surface and the texture. Unfortunately, shape-from-shading techniques for a single image cannot accurately recover the 3D geometry with high efficiency. Even with multiple images, full recovery of 3D geometry is still an open problem in the computer vision community. For the second issue, relighting techniques can be adopted to change the intensities of pixels of the new texture when the properties of light sources and surface appearances are known beforehand. However, accurate recovery of these properties from a real-world image is more difficult than geometry recovery. Hence, relighting techniques are impractical for texture replacement.

For generating plausible visual effects, full recovery of 3D geometry and lighting conditions can be relaxed in practice.
Fang and Hart proposed a normal-guided texture synthesis method that produces compelling replacement effects [1]. This method works well when a 3D surface is untextured, nearly diffuse, and illuminated by a single directional light source. One limitation of this texture synthesis approach is that the synthesis process must be repeated whenever a new texture is applied. For regular/near-regular textures, which are common in the real world, Liu et al. suggested an interactive scheme to extract the deformation field of a texture image with respect to the original sample [2]. The extraction of lighting information can also benefit from this restriction through a Markov process [3]. Nevertheless, this approach usually needs tedious, highly accurate user interaction.

A video clip is an image sequence in the time domain, which usually contains dynamic objects and lighting changes. Retexturing video is more challenging than retexturing an image due to these dynamic phenomena. In particular, keeping the texture coherent over time is more demanding: temporal coherence requires that the new texture appear perceptually fixed on the 3D surface when an object or the camera moves. For achieving temporal coherence, the key-frame-based methods [4], [5] consist of two steps.
First, a few video frames are selected as key frames, on which texture replacements are conducted [4], [5]. Second, the generated results are propagated to the remaining frames. These methods either need cumbersome interaction to locate the region of interest (ROI) frame by frame [4] or require a special color-coded pattern as input [5].

In this paper, we propose a novel approach for optimized retexturing of images and video. For image texture replacement, we formulate texture distortion as a stretch-based parameterization. The ROI is represented as a feature mesh coupled with a normal field. The corresponding mesh of the feature mesh is computed in texture space during parameterization. For simulating the effect of texture deformation at fine scale, a Poisson-based refinement process is developed. Based on our image-retexturing scheme, we design a key-frame-based video retexturing approach similar to RotoTexture [4]. After replacing the textures of the specified regions of key frames, the generated effects are iteratively transferred to the remaining frames. For achieving temporal coherence, the mesh points of a key frame serve as features that are tracked and further optimized using motion analysis over the whole image sequence through temporal smoothing. Graphcut segmentation is adopted for handling object occlusions by extracting newly appearing parts in each frame.

Our optimized retexturing approach has the following new features:

- A two-step mesh-guided process for image texture replacement. Coupled with the recovered normal field, a visually pleasing deformation effect of the replaced texture is produced by performing stretch-based mesh parameterization. Furthermore, a Poisson-based refinement process is used to improve the effect and enhance the efficiency.
- Creation of special retexturing effects. Based on mesh parameterization, we can easily generate a replacement effect with progressively varying texton scales. In addition, texture discontinuities can be realistically simulated in self-occlusion regions, which are usually difficult to produce for most previous approaches.
- An optimized framework of video retexturing. We extend and generalize our image retexturing approach to video. Rather than presegmenting the ROI throughout the whole image sequence, our approach only needs a few selected key frames. The generated results are optimally propagated to the remaining frames. Texture drifting and visibility shifting are also tackled effectively.

The rest of the paper is organized as follows: Related work is described in Section 2. Our optimized retexturing approach for images is presented in Section 3. In Section 4, the image retexturing approach is extended and generalized to video. Experimental results are presented in Section 5. Finally, we draw conclusions and point out future work.

2 RELATED WORK

This paper is made possible by many inspirations from previous work on image and video texture replacement. Since tracking objects throughout a video sequence is one important step of our approach, it is briefly reviewed as well.

2.1 Image Texture Replacement

The pioneering work on texture replacement dealt with extracting a lighting map from a given image [3]. Based on certain lighting distribution models, Tsin et al. introduced a Bayesian framework for near-regular textures, which relies on the color observation at each pixel [3]. Oh et al. assumed that large-scale luminance variations are due to geometry and lighting [6] and presented an algorithm for decoupling texture luminance from the image by applying an improved bilateral filter.
Currently, accurate recovery of lighting from a natural image is still a challenging problem.

Assuming that object appearance satisfies the Lambertian reflectance model, Textureshop [1] recovered the normal field of a specified region using a simplified shape-from-shading algorithm [7]. A propagation rule for adjacent texture coordinates is deduced to guide a normal-related synthesis process. The limitation of employing texture synthesis is that the synthesis process must be re-executed when a new texture is applied. The work developed by Zelinka et al. [8] is an improvement over Textureshop. It reduces the user interaction of object specification by employing an efficient object cutout algorithm [9]. In addition, jump map-based synthesis [10] is adopted to speed up the computation process. Instead of texture synthesis, our method belongs to the texture mapping approach: texture replacement can be carried out efficiently after the mesh parameterization is completed.

The method presented in [4] warps an entire texture onto a photographed surface. It minimizes an energy function of a spring network with a known, evenly distributed rectilinear grid in texture space. In most cases, however, the specified region of the image is an irregular grid. Hence, it is difficult for this approach to accurately control the mapping position of the replaced texture.

For extracting deformation fields of textures in natural images, Liu et al. introduced a user-assisted adjustment scheme on the regular lattice of a real texture [2]. A bijective mapping between the regular lattice and its deformed shape in the surface image is obtained. Any new texture can thus be mapped onto the source image by applying the corresponding mapping. Since this method often requires elaborate user interaction, it is more suitable for regular/near-regular textures.

Besides image texture replacement, recent research demonstrated that material properties of objects can be changed in image space [11]. Exploiting the fact that human vision is surprisingly tolerant of certain physical inaccuracies, Khan et al. reconstructed the depth map of the concerned object along with other environment parameters [11] and realized compelling material editing effects using complex relighting techniques.

2.2 Video Texture Replacement

RotoTexture [4] generalized the method of Textureshop [1] to video. It provides two means of texturing a raw video sequence, namely, texture mapping and texture synthesis. The texture mapping method uses a nonlinear optimization of a spring model to control the behavior of the texture
image, which is deformed to match the evolution of the normal field throughout the video. For the synthesis method, a minimum advection tree is constructed to deal with the visibility issues caused by the dynamic motions of moving objects. This tree determines the initial frame for each image cluster and the advection of clusters among frames. The main challenge of video texture replacement is how to stably track the moving objects and their interior regions. At present, accurately tracking moving objects in dynamic video is an open problem, and the replaced textures drift in the experimental results of [4].

For stably tracking moving objects and their interior parts, Scholz and Magnor presented a video texture replacement system [5] using color-coded patterns. The deformation of the texture throughout the video clip can be accurately extracted, and since the deformation is accurate, compelling results can be achieved. However, because videos are usually captured by off-the-shelf cameras without color-coded patterns, the system is not applicable to them. Our approach is designed for videos in which such special patterns are unavailable.

Recently, White and Forsyth proposed another video retexturing method [12]. At coarse scale, the old texture is replaced with a new one by tracking the deforming surface in 2D. At fine scale, local irradiance is estimated to preserve the structure information in the real lighting environment. Since local irradiance estimation is difficult and unreliable, the approach is limited to screen printing with a finite number of colors. Our method can be applied to video sequences with rich color details.

2.3 Object Tracking

Object tracking is the process of locating a moving object throughout the whole image sequence taken by a video camera. For general object motion, nonparametric algorithms such as optical flow [13] can be applied. When the motion can be described using simple models, methods based on feature points and parametric models are preferable [14]. For instance, Jin et al. presented a combined model of geometry and photometry to track features and detect outliers in video [15]. Contour tracking can be more effective for nonrigid objects than isolated point tracking. Agarwala et al. [16] and Wang et al. [17] introduced frameworks for tracking contours of moving objects in video sequences, which are based on spatiotemporal optimization and user assistance. Chuang et al. described a method of accurately tracking a specified trimap for video matting [18]. A trimap is a labeling image in which 0 stands for background, 1 stands for foreground, and the rest is the unknown region to be labeled. Stable tracking of the trimap is carried out based on a robust optical flow algorithm [19].

3 IMAGE RETEXTURING

The key issue of image texture replacement is how to preserve the distortion effect of the texture, as well as the shading effect encoded in the original image. Texture distortion is mainly caused by the undulation of the underlying surface of the object in the given image. We assume that the surface on which texture replacement is performed is nearly developable; otherwise, the surface can be divided into several nearly developable parts, each of which is handled separately. Based on this assumption, the basic idea is to convert reconstruction of the underlying 3D surface of the ROI into computation of its corresponding mesh in texture space. Using projective geometry, we further formulate the retexturing task as a stretch-based mesh parameterization problem.
After the parameterization is completed, the result is further refined with a Poisson-based refinement process.

3.1 Mesh Generation

We first generate an initial mesh on the concerned region and make its shape consistent with the underlying geometry of this region. Mesh generation for images was addressed in motion compensation for video compression [20], where a content-based mesh was computed by extracting a set of feature points followed by Delaunay triangulation. Our algorithm shares the same idea as [20]. First, the concerned region is specified interactively by outlining the boundary of the ROI using snakes. For reducing user intervention, our approach supports extracting the ROI using up-to-date segmentation techniques [21], [9]. Second, we employ an edge detection operator, for example, the Canny operator, to extract feature points inside the ROI. For keeping the uniformity of the points, some auxiliary ones are usually added. Finally, the constrained Delaunay triangulation algorithm is adopted to generate a feature-consistent mesh. Fig. 1 shows an example of mesh generation.

Fig. 1. Mesh generation. (a) The input image. (b) The generated mesh. The yellow dots are detected by the Canny operator, whereas the green ones are added automatically with a distance threshold to maintain mesh uniformity.
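As a rough illustration of this step, the sketch below builds such a mesh with OpenCV and SciPy. It is a minimal sketch under stated assumptions: `roi_mask` is assumed to come from the snakes/segmentation step, the Canny thresholds and `spacing` value are arbitrary, and SciPy's unconstrained Delaunay triangulation stands in for the constrained Delaunay triangulation used in the paper.

```python
import cv2
import numpy as np
from scipy.spatial import Delaunay

def generate_mesh(image, roi_mask, spacing=20):
    """Feature-consistent mesh over the ROI (sketch of Section 3.1).

    Canny feature points are extracted inside the ROI; auxiliary grid
    points are added wherever no feature point lies within `spacing`
    pixels (the distance threshold of Fig. 1); the point set is then
    triangulated. Plain Delaunay is a stand-in for the paper's
    constrained Delaunay triangulation.
    """
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    ys, xs = np.nonzero((edges > 0) & (roi_mask > 0))
    feature_pts = np.column_stack([xs, ys]).astype(float)[::5]  # thin out

    # Candidate auxiliary points on a regular grid inside the ROI.
    gy, gx = np.mgrid[0:image.shape[0]:spacing, 0:image.shape[1]:spacing]
    grid = np.column_stack([gx.ravel(), gy.ravel()]).astype(float)
    inside = roi_mask[grid[:, 1].astype(int), grid[:, 0].astype(int)] > 0
    grid = grid[inside]
    if len(feature_pts):
        dist = np.linalg.norm(grid[:, None] - feature_pts[None], axis=2).min(axis=1)
        grid = grid[dist > spacing]  # keep only points that fill gaps

    pts = np.vstack([feature_pts, grid])
    return pts, Delaunay(pts).simplices
```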
3.2 Mesh Parameterization

Let M denote the generated mesh of the input image. M is a 2D mesh that represents the 2D projection of the 3D surface of the ROI. If the normal vector of every mesh point of M is recovered, the normal field of M encodes the geometric shape of the underlying surface. For obtaining the distortion effect of the new texture, it is feasible to first parameterize M and then map the new texture onto the ROI. Since M is a 2D mesh, parameterizing M onto the texture space is a 2D-to-2D problem, which can be computed using the geometric information of M.

Let M′ be the parameterized mesh in texture space. Theoretically, M′ can be completely determined by the lengths of all edges and the topology of M. For avoiding artifacts, the topology of M′ should be the same as that of M. The length of each edge in M′ should ideally be equal to the 3D length of its corresponding edge in M; the 3D length reflects the edge length on the underlying surface. In the following sections, we first introduce our method of computing the 3D length of each edge in M, then present the stretch-based parameterization scheme for computing M′ in texture space.

3.2.1 Computing the Length of a 3D Edge

Since the normal field encodes the 3D geometric information of M, we first recover a rough normal field of M using the approach in [1]. The shape-from-shading algorithm in [1] has linear complexity when recovering the normal field of a diffuse surface. It is easy to implement and quite effective. For more details, please refer to [1] and [8].

With the recovered normal vectors, the 3D length of each edge in M can be calculated, as illustrated in Fig. 2.

Fig. 2. Calculation of the 3D length $\|FD\|$ for the observed edge $e(AB)$. $V$: view direction; $N$: the projection of edge $e(FD)$'s normal vector on the plane $OAB$; $CD \parallel AB$ and $CD \perp CE$.

Suppose the observed length of edge $e(AB)$ in the input image is $d$. Its corresponding 3D length is the distance between $F$ and $D$ on the underlying surface, denoted as $\|FD\|$. Let $N$ be the projection of the normal vector of edge $e(FD)$ onto the plane $OAB$. It is calculated by first averaging the recovered normal vectors of $A$ and $B$, then projecting the average vector onto the plane $OAB$. From the geometric relationship in Fig. 2, we derive the length $\|ED\|$:

$$\|ED\| = \|CD\| / (V \cdot N) = d' / (V \cdot N), \qquad (1)$$

where $V$ is the view direction of the camera, and both $V$ and $N$ have been normalized. With the approximation $\|FD\| \approx \|ED\|$, the 3D length of edge $e(AB)$ can be expressed as follows:

$$l = \|FD\| \approx d' / (V \cdot N). \qquad (2)$$

In the above equation, $d'$ is determined by the observed length $d$ of $e(AB)$ and the scene depth of $e(FD)$. In most cases, the object is relatively far from the camera, so it is reasonable to assume that the scene depths of all edges in M are close, with small variation. Consequently, the homogeneous scale factor of every edge of M can be eliminated, and the surface length of edge $e(AB)$ can be approximated as

$$l = \|FD\| \approx \|AB\| / (V \cdot N) = d / (V \cdot N). \qquad (3)$$

The 3D length of each edge in M can be computed using the above equation. Although some approximations are made, our experimental results show that the method generates visually pleasing effects.
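The foreshortening correction of (3) is simple to compute per edge. The following is a minimal sketch under two stated simplifications not made in the paper: an orthographic camera looking along $+z$ (so $V = (0, 0, 1)$ and $V \cdot N$ reduces to the normal's $z$-component), and the per-edge normal $N$ approximated by the normalized average of the endpoint normals, skipping the projection onto the plane $OAB$.

```python
import numpy as np

def edge_lengths_3d(points, edges, normals, eps=1e-3):
    """Approximate 3D (surface) length of each mesh edge via Eq. (3):
    l = d / (V . N), with d the observed 2D length.

    Assumptions of this sketch: orthographic view along +z, normals
    face the camera (N_z > 0), and N is the normalized average of the
    two endpoint normals.
    """
    lengths = {}
    for i, j in edges:
        d = np.linalg.norm(points[i] - points[j])  # observed 2D length
        n = normals[i] + normals[j]
        n = n / np.linalg.norm(n)
        cos_theta = max(abs(n[2]), eps)            # V . N with V = (0, 0, 1)
        lengths[(i, j)] = d / cos_theta            # foreshortening-corrected
    return lengths
```

Edges seen nearly edge-on ($V \cdot N \to 0$) would blow up the estimate, hence the `eps` clamp in the sketch.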
3.2.2 The Stretch-Based Parameterization

With the calculated edge lengths, M′ can be obtained by a stretch-based parameterization method. Mesh parameterization has been extensively studied in computer graphics [22], [23], [24], [25], [26], [27], [28], [29]. These methods take different metric criteria as the objective of an energy minimization process. Among them, stretch-based methods work well at reducing the global mesh distortion [22], [25], [29]. Our concern is computing a 2D-to-2D parameterization, which is different from the aforementioned 3D-to-2D process (Fig. 3).

Fig. 3. By virtue of the normal field, simulating texture distortion is converted into a 2D-to-2D parameterization, after which texture mapping is applied.

We first describe some notation. Let $\{P_i = (x_i, y_i) \mid i = 1, \ldots, n\}$ denote the nodes of M, and let $\{Q_i = (u_i, v_i) \mid i = 1, \ldots, n\}$ represent the corresponding nodes of M′ to be solved in texture space. M′ has the same topology as M, $Q_i$ corresponds to $P_i$, and edge $e(Q_iQ_j)$ corresponds to $e(P_iP_j)$. The 3D length $l_{ij}$ of $e(Q_iQ_j)$ is obtained using the aforementioned method.

M′ is completely characterized by the lengths of its edges. As the length of each edge of M′ has been computed, M′ can be obtained by minimizing the following energy function:

$$E_l = \sum_{(i,j) \in \mathrm{edges}} \left( \|Q_i - Q_j\|^2 - l_{ij}^2 \right)^2 / \, l_{ij}^2. \qquad (4)$$

Exploiting the symmetry of the above equation, the energy gradients at point $Q_i$ are

$$\frac{\partial E_l}{\partial u_i} = 8 \sum_{(i,j) \in \mathrm{edges}} \left( \|Q_i - Q_j\|^2 - l_{ij}^2 \right) (u_i - u_j) / \, l_{ij}^2, \qquad (5)$$

$$\frac{\partial E_l}{\partial v_i} = 8 \sum_{(i,j) \in \mathrm{edges}} \left( \|Q_i - Q_j\|^2 - l_{ij}^2 \right) (v_i - v_j) / \, l_{ij}^2. \qquad (6)$$

When M is dense, directly solving the above equations may cause adjacent triangles of M′ to flip over, leading to an invalid topology. This is caused by inverting the orientation of the three points of a triangle. To tackle this issue, we revise the energy function by penalizing the orientations of triangles using the sgn function [30].

Assume that the adjacent triangles incident upon edge $e(Q_iQ_j)$ are $T_{Q1} = T(Q_iQ_{k1}Q_j)$ and $T_{Q2} = T(Q_iQ_jQ_{k2})$, and that their corresponding triangles in the input image are $T_{P1} = T(P_iP_{k1}P_j)$ and $T_{P2} = T(P_iP_jP_{k2})$ (Fig. 4).

Fig. 4. The orientation of the triangles (a) in image space should keep consistent with that of their corresponding ones (b) in texture space.

For each pair of corresponding triangles, the orientations of the points should be equal. To achieve this, we define

$$w_{ij} = \operatorname{sgn}\Big( \min\big( \det(\overrightarrow{Q_iQ_{k1}}, \overrightarrow{Q_jQ_{k1}}) \cdot \det(\overrightarrow{P_iP_{k1}}, \overrightarrow{P_jP_{k1}}),\ \det(\overrightarrow{Q_iQ_{k2}}, \overrightarrow{Q_jQ_{k2}}) \cdot \det(\overrightarrow{P_iP_{k2}}, \overrightarrow{P_jP_{k2}}) \big) \Big). \qquad (7)$$

The energy function is then transformed into

$$E_l = \sum_{(i,j) \in \mathrm{edges}} w_{ij} \left( \|Q_i - Q_j\|^2 - l_{ij}^2 \right)^2 / \, l_{ij}^2, \qquad (8)$$

where the coefficient $w_{ij}$ penalizes a triangle in M′ whose point orientation flips over with respect to its corresponding triangle in M: in that case, $w_{ij}$ becomes $-1$; otherwise, it is $+1$. With this energy function, a valid mesh in texture space is obtained.

The minimal value of (8) is computed by the multidimensional Newton's method. In each iteration of Newton's method, a multigrid solver is adopted for solving the sparse linear equations. In practice, it converges to the final solution within several seconds.

Once M′ is obtained from the parameterization process, a new texture can be mapped onto the ROI of the input image. Since the parameterization takes into account the underlying geometry of the ROI, the new texture deforms naturally with respect to the underlying surface. Our experimental results demonstrate that the distortion effects of the new textures are visually pleasing.
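To make the optimization concrete, here is a compact sketch of minimizing the stretch energy (4), reusing the per-edge lengths from the previous sketch. Two substitutions relative to the paper are made deliberately: SciPy's L-BFGS-B replaces the multidimensional Newton iteration with multigrid inner solves, and the flip-penalty weights $w_{ij}$ of (7)-(8) are omitted, so triangle fold-overs are not guarded against on dense meshes.

```python
import numpy as np
from scipy.optimize import minimize

def parameterize(points, edges, lengths3d):
    """Solve for the texture-space mesh M' by minimizing Eq. (4).

    Sketch: L-BFGS-B instead of the paper's Newton/multigrid scheme;
    the orientation penalty w_ij of Eqs. (7)-(8) is omitted.  The
    gradient below matches Eqs. (5)-(6) up to a constant factor.
    """
    l2 = np.array([lengths3d[e] ** 2 for e in edges])
    ei = np.array([e[0] for e in edges])
    ej = np.array([e[1] for e in edges])

    def energy_grad(q_flat):
        q = q_flat.reshape(-1, 2)
        diff = q[ei] - q[ej]                  # Q_i - Q_j per edge
        r = (diff ** 2).sum(axis=1) - l2      # ||Q_i - Q_j||^2 - l_ij^2
        e_val = (r ** 2 / l2).sum()           # Eq. (4)
        grad = np.zeros_like(q)
        coef = (4.0 * r / l2)[:, None] * diff
        np.add.at(grad, ei, coef)             # contribution to Q_i
        np.add.at(grad, ej, -coef)            # symmetric contribution to Q_j
        return e_val, grad.ravel()

    # The observed 2D mesh is a natural starting point for M'.
    res = minimize(energy_grad, points.ravel(), jac=True, method="L-BFGS-B")
    return res.x.reshape(-1, 2)
```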
3.3 Poisson-Based Refinement

After the stretch-based parameterization, the texture coordinates of the nodes of M have been obtained. The texture coordinates of the interior pixels of the triangles of M could be computed by interpolating the obtained ones using barycentric coordinates or radial basis functions (RBFs). However, such interpolation techniques cannot reflect the distortion of the new texture in the interior of each triangle. For obtaining a more natural and smoother distortion, we design a Poisson-based refinement process instead.

The Poisson equation originates from Isaac Newton's law of gravitation [31]. It has been widely used in computer graphics, including seamless image editing [32], digital photomontage [33], gradient field mesh manipulation [34], and mesh metamorphosis [35]. The main principle of the Poisson equation lies in computing the interior values of a function given a known Dirichlet boundary condition and a guidance vector field. Due to its sparse linear structure, the Poisson equation can be solved efficiently using the conjugate gradient method or a multigrid method [34].

Our Poisson-based algorithm adopts the normal-guided propagation rules for texture synthesis developed in Textureshop [1]. These rules describe the offsets of texture coordinates among adjacent pixels. We rewrite them as follows:

$$u(x+1, y) - u(x, y) = f_{ux}(N(x, y)), \qquad (9)$$
$$v(x+1, y) - v(x, y) = f_{uv}(N(x, y)), \qquad (10)$$
$$u(x, y+1) - u(x, y) = f_{uv}(N(x, y)), \qquad (11)$$
$$v(x, y+1) - v(x, y) = f_{vy}(N(x, y)), \qquad (12)$$

where $(u(x, y), v(x, y))$ is the texture coordinate of the pixel $(x, y)$ in the concerned region, $N(x, y) = (N_x, N_y, N_z)$ is the normal vector at pixel $(x, y)$, and

$$f_{ux}(N(x, y)) = \left(1 + N_z - N_y^2\right) / \left((1 + N_z) N_z\right), \qquad (13)$$
$$f_{uv}(N(x, y)) = \left(-N_x N_y\right) / \left((1 + N_z) N_z\right), \qquad (14)$$
$$f_{vy}(N(x, y)) = \left(1 + N_z - N_x^2\right) / \left((1 + N_z) N_z\right). \qquad (15)$$

Since the texture coordinates of the nodes of M are available, (9)-(12) could be used directly to calculate the texture coordinates of the interior pixels of the triangles of M. In practice, this may result in a weird mapping. To avoid it, the texture coordinates are instead obtained by solving the energy minimization problem below with respect to the $u$ component; the $v$ component is computed in the same way:

$$\min_{u(x,y)} \int_M \left| \nabla u(x, y) - D_u(x, y) \right|^2, \qquad (16)$$

where $\nabla u(x, y) = (u(x+1, y) - u(x, y),\ u(x, y+1) - u(x, y))$ and $D_u(x, y) = (f_{ux}(N(x, y)),\ f_{uv}(N(x, y)))$.

Minimizing (16) yields a set of Poisson equations:

$$\Delta u(x, y) = \operatorname{div} D_u(x, y), \qquad (17)$$

where $\Delta$ and $\operatorname{div}$ represent the Laplacian and divergence operators, respectively. We adopt a multigrid solver to obtain the solution with high efficiency.

Unlike the widely used Dirichlet boundary conditions [32], [34] of the generic Poisson process, the constraints in our Poisson equations are imposed by the discrete texture coordinates of the nodes of M.
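The discrete form of (17) is a standard five-point Laplacian system. The sketch below sets it up for the $u$ component, treating the mesh-node coordinates as hard interior constraints as just described. It assumes per-pixel normals facing the camera ($N_z > 0$), uses a direct sparse solve instead of the paper's multigrid solver, and loops over pixels for clarity rather than speed; `refine_u` and its arguments are illustrative names, not the paper's API.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def f_ux(n): return (1 + n[2] - n[1] ** 2) / ((1 + n[2]) * n[2])   # Eq. (13)
def f_uv(n): return (-n[0] * n[1]) / ((1 + n[2]) * n[2])           # Eq. (14)

def refine_u(normals, known):
    """Poisson refinement of the u coordinate, Eq. (17):
    Laplacian(u) = div D_u, with D_u = (f_ux(N), f_uv(N)).

    `normals` is (H, W, 3) with N_z > 0; `known` maps (y, x) -> u at
    mesh nodes and acts as hard constraints in place of a Dirichlet
    boundary.  Direct sparse solve stands in for multigrid.
    """
    H, W = normals.shape[:2]
    idx = lambda y, x: y * W + x
    A, b = sp.lil_matrix((H * W, H * W)), np.zeros(H * W)
    for y in range(H):
        for x in range(W):
            k = idx(y, x)
            if (y, x) in known:                 # constraint row at a mesh node
                A[k, k], b[k] = 1.0, known[(y, x)]
                continue
            A[k, k] = -4.0                      # five-point Laplacian stencil
            # Divergence of the guidance field via backward differences.
            b[k] = (f_ux(normals[y, x]) - f_ux(normals[y, max(x - 1, 0)])
                    + f_uv(normals[y, x]) - f_uv(normals[max(y - 1, 0), x]))
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < H and 0 <= nx < W:
                    A[k, idx(ny, nx)] = 1.0
                else:
                    A[k, k] += 1.0              # mirror (Neumann) at the border
    return spla.spsolve(A.tocsr(), b).reshape(H, W)
```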
3.4 Lighting Effect Transfer

Using the texture coordinates deduced by the stretch-based parameterization and the Poisson-based refinement process, a new texture is mapped onto the ROI of the input image, overwriting the old one. Without simulating the lighting effect exhibited in the ROI, the mapping result looks flat. For a realistic appearance, transferring the lighting effect must be considered.
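The abstract notes that the luminance of the input image is preserved through color transfer in YCbCr space. As a rough sketch of that idea only (the paper's actual blending scheme is detailed later and may differ), the fragment below keeps the original luminance channel inside the ROI while adopting the mapped texture's chrominance; the function name and the hard channel swap are assumptions of this sketch.

```python
import cv2

def transfer_lighting(original, retextured, roi_mask):
    """Keep the input image's shading while adopting the new texture's
    color, via color transfer in YCbCr space (sketch of Section 3.4).

    Inside the ROI, luminance (Y) comes from the original image and
    chrominance (Cb, Cr) from the mapped texture.  OpenCV stores the
    channels in YCrCb order; Y is channel 0 either way.
    """
    orig_ycc = cv2.cvtColor(original, cv2.COLOR_BGR2YCrCb)
    retx_ycc = cv2.cvtColor(retextured, cv2.COLOR_BGR2YCrCb)
    out = retx_ycc.copy()
    m = roi_mask > 0
    out[..., 0][m] = orig_ycc[..., 0][m]   # preserve original luminance
    return cv2.cvtColor(out, cv2.COLOR_YCrCb2BGR)
```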