Computers & Graphics 38 (2014) 174–182
Contents lists available at ScienceDirect
Journal homepage: www.elsevier.com/locate/cag
http://dx.doi.org/10.1016/j.cag.2013.10.038
0097-8493/$ - see front matter © 2013 Elsevier Ltd. All rights reserved.

CAD/Graphics 2013

Efficient view manipulation for cuboid-structured images

Yanwen Guo (a,*), Guiping Zhang (a), Zili Lan (a), Wenping Wang (b)
(a) State Key Lab for Novel Software Technology, Nanjing University, PR China
(b) Department of Computer Science, The University of Hong Kong, Hong Kong
(*) Corresponding author. Tel.: +86 1391 3028 596; fax: +86 25 896 86596. E-mail addresses: ywguo.nju@gmail.com, ywguo@nju.edu.cn (Y. Guo).

Article history: Received 31 July 2013; received in revised form 27 October 2013; accepted 28 October 2013; available online 13 November 2013.

Keywords: View manipulation; Mesh deformation; Optimization

Abstract

We present an efficient algorithm for manipulating the viewpoints of cuboid-structured images with moderate user interaction. Such images are very common, and we first recover an approximate geometric model using prior knowledge of the latent cuboid. Although this approximate cuboid structure does not provide an accurate scene reconstruction, we demonstrate that it is sufficient to re-render the images realistically under new viewpoints in a nearly geometrically accurate manner. The new image is generated with high visual quality by deforming the remaining image region in accordance with the re-projected cuboid structure, via a triangular mesh deformation scheme. The energy function is designed to be quadratic so that it can be minimized efficiently by solving a sparse linear system. We verify the effectiveness of our technique on images with standard and non-standard cuboid structures, and demonstrate two applications: upright adjustment of photographs, and a user interface that lets the user view the scene interactively under new viewpoints on a viewing sphere.

© 2013 Elsevier Ltd. All rights reserved.

1. Introduction

Advances in imaging technology and improvements in digital camera hardware have led to continuous improvements in image quality. People can now take high-quality, high-resolution photos more easily than before, without always suffering from the noise, low contrast, and blur that degrade photo quality. However, photos taken by amateur photographers often have poor viewpoints, for instance slanted man-made structures and unbalanced compositions, which make scenes look dull and less vivid. On the other hand, when looking at a photo shared by friends or downloaded from Flickr or Photobucket, people may naturally imagine what the scene would look like if it were taken from a new viewpoint. Automatic optimization of the viewpoint of a given photograph is thus desirable.

For rendering the image from a novel viewpoint, direct re-projection of the latent 3D scene remains elusive, since accurate reconstruction of the whole scene is still challenging. Recent efforts have been made to optimize perspective or to imitate re-projection by means of image transformation. Manipulation of photographic perspective is enabled in [1] by combining recent image warping techniques with constraints from projective geometry. Heavy user assistance, based on an understanding of the basic principles of perspective construction, is often required to accurately mark the image with a number of image-space constraints. Stereoscopic devices and content relying on stereopsis are now widely available, and the problem of manipulating perspective in stereoscopic pairs is addressed in [2]. Assuming that depth variations of the scene are small relative to its distance from the camera, slanted man-made structures can be straightened by an improved homography model [3].

We do not intend to study the aesthetics of whether a photograph looks visually pleasing under its current viewpoint. Instead, our goal is to enable the generation of novel images with new viewpoints given only a single image as input, with moderate user assistance. To this end, our primary observation is that many images of man-made scenes exhibit cuboid-dominated three-dimensional structures, in which the projections of two perpendicular planes dominate the latent three-dimensional geometry and occupy the major part of the image. A pair of projected parallel lines can be found in the image for each plane. Such an image either has a cuboid structure itself or depicts a scene dominated by a cuboid-like object. As shown in Fig. 1, cuboid-structured images are very common, for example photos of buildings (upper left and lower left), indoor scenes (upper right), apartments, and buses. Some photos exhibit latent cuboid structures. For example, the lower right photo of Fig. 1 is such an image, since we can easily construct two perpendicular planes by specifying auxiliary lines, shown as dotted lines in the photo, even though one of the two planes containing the auxiliary lines does not physically exist. Such a cuboid structure is the major visual cue for depicting a three-dimensional scene and conveying perspective. By manipulating the cuboid structure reconstructed by acceptable
optimization, we are able to simulate novel viewpoints and to render new images with high visual quality by letting the remaining image region deform in accordance with the transformation of the cuboid structure.

Although high-quality 3D reconstruction from a single image remains difficult, we can easily recover an approximation of the three-dimensional cuboid structure with only a few user-specified auxiliary lines on the image. Simplistic models of geometry can be recovered by leveraging small amounts of annotation [4] or by scene analysis assisted by user annotation [5,6] for augmented reality applications. Nevertheless, fully automatic recovery of the geometry without any user intervention is still challenging, especially for images with what we call non-standard cuboid structures, in which some cuboid edges do not physically exist. We therefore let users specify a few auxiliary lines, which takes little effort. More importantly, we show that the re-projection of this approximate cuboid structure is sufficient to meet the accuracy requirement of viewpoint changes. Given a new viewpoint, the image is rendered by optimizing a quadratic image warping energy. The energy function incorporates the hard constraint of the cuboid transformation, together with constraints on shape and straight lines. Without significant manual effort, the newly rendered perspective image is nearly geometrically accurate and visually pleasing.

Applications: First, our technique can be used to correct the slanted structures in photos taken by casual photographers. Second, unlike previous methods driven by image deformation, we can generate novel images under key viewpoints around the viewpoint of the input image, with given viewing angles. This enables us to design an interface through which the user can view the scene by changing viewpoints smoothly on a viewing sphere, mimicking a 3D browsing experience. The images under key viewpoints are interpolated to produce intermediate results.

To summarize, our main contributions are as follows:

• Present an algorithm for manipulating views of cuboid-structured images with very little user effort.
• Show that re-projection of the approximate cuboid structure is sufficient to meet the requirement of viewpoint change.
• Provide an interface that allows users to view the scene interactively under new viewpoints on a viewing sphere.

2. Related work

Manipulation of the perspective in a photograph for touring into the picture is made possible by Horry et al. [7]. A spidery mesh is employed to obtain a simple scene model from the central perspective image through a graphical interface. Animators use this incomplete scene information to create animations from the input pictures. Instead of attempting to recover precise geometry, a rough 3D environment is constructed from a single image by applying a statistical framework [8]. The model is constructed directly from geometric labels learned on the image: ground, vertical, and sky. None of the above methods aims to generate a new image with high visual quality as if it were captured from a novel viewpoint. In contrast, we only need to partially recover a cuboid-dominated 3D representation of the image with moderate user interaction; the whole image is then re-rendered by making the remaining image region deform in accordance with the re-projection of the cuboid structure.

Recently, advances in shape deformation [9,10] and retargeting techniques [11–15] have made it possible to manipulate perspective by means of image deformation [1]. A 2D image warp is computed by optimizing an energy function such that the entire warp is as shape-preserving as possible while satisfying constraints originating from projective geometry. The user first annotates the image by marking, with pixel accuracy, a number of image-space constraints that are then used to manipulate its perspective. In total, eight different types of constraints, which may directly oppose each other, are incorporated into the energy function. Handling these constraints carefully for efficient optimization is challenging for amateur users. The problem of manipulating perspective in stereoscopic pairs is addressed in [2]. Given a new perspective, correspondence constraints between the stereoscopic image pair are determined, and a warp is computed for each image that preserves salient image features and guarantees proper stereopsis relative to the new camera.

Perspective projections are limited to fairly narrow view angles. Correction of the image distortions incurred by projecting wide fields of view onto a flat 2D display surface is addressed in [16,17].

Our work is also inspired by recent efforts on photo composition assessment and enhancement [18–20]. Most methods build their measures of visual aesthetics on the rule of thirds, which states that an image should be imagined as divided into nine equal parts by two equally spaced horizontal lines and two equally spaced vertical lines, and that important compositional elements should be placed along these lines or at their intersections. Bhattacharya et al. [18] learn a support vector regression model for capturing aesthetics. Image quality is improved by recomposing the salient object onto the inpainted background or by using a visual weight balancing technique. Liu et al. [20] modify image composition using a compound crop-and-retarget operator and seek the solution by particle swarm optimization.

3. View manipulation of the cuboid structure

Our view manipulation method is specifically designed to optimize the viewpoints of images that show cuboid-dominated three-dimensional structures. Extracting a 3D representation from a single-view image depicting a 3D object has been a longstanding goal of computer vision. It has been shown recently that 3D cuboids in single-view images can be automatically localized by using a discriminative parts-based detector [21]. We let users interactively specify the projected lines of the latent cuboid structure on the image, from which we estimate an approximation of the cuboid geometry. The Hough transform and the Canny edge detector are used to assist users and to reduce interaction errors in this process.

Fig. 1. Images with cuboid structures.

We show that, given a new viewpoint, the re-projection of this
approximate cuboid structure is sufficient to meet the accuracy requirement of viewpoint change.

3.1. A standard case

3.1.1. Cuboid reconstruction

For ease of exposition, we initially assume that we can find, on the image plane, the projections of a vertical edge and of two pairs of parallel edges that have two intersection points on the vertical edge of a standard cuboid structure, as shown in Fig. 2. Without loss of generality, we assume that the imaging plane coincides with the XY plane of the world coordinate system, with its center at the world origin. The camera is placed on the Z-axis with center of projection $F(0, 0, f)$, where $f$ is the focal length of the camera. Let $\{P_0, \ldots, P_5\}$ and $\{p_0, \ldots, p_5\}$ denote six key vertices on the latent cuboid facing the camera and their projections on the image, respectively. $\{p_0, \ldots, p_5\}$ are specified by the user on the image. $P_0$ and $P_1$ are two corners of the cuboid shared by two perpendicular planes $\mathcal{P}_1(P_1, P_0, P_4, P_5)$ and $\mathcal{P}_2(P_0, P_1, P_2, P_3)$ on the three-dimensional cuboid.

Since $P_0P_3 \parallel P_1P_2$ in 3D space, the extensions of $p_0p_3$ and $p_1p_2$ meet at a vanishing point $c_1$, except in the special case $p_0p_3 \parallel p_1p_2$. Similarly, the extensions of $p_1p_5$ and $p_0p_4$ meet at another vanishing point $c_2$. Imagine that $P_0P_3$ and $P_1P_2$ meet at a point at infinity whose projection on the imaging plane is $c_1$. The extension of $Fc_1$ meets $P_0P_3$ and $P_1P_2$ at this point as well. We thus have $Fc_1 \parallel P_0P_3$, $Fc_1 \parallel P_1P_2$, and hence $Fc_1 \parallel \mathcal{P}_2$, the plane containing these two edges. Similarly, $Fc_2$ is parallel to $\mathcal{P}_1$, and $Fc_1$ is perpendicular to $Fc_2$, from which $f$ can be obtained easily.

It is still impossible to recover the exact geometry and position of the 3D cuboid structure without additional prior knowledge. By making reasonable assumptions, we aim to generate an approximation of the structure that is identical to the true one up to a scale factor.
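As a sketch of how $f$ follows from the two vanishing points: with image coordinates centered at the world origin, as assumed in this section, $c_1 = (x_1, y_1, 0)$, $c_2 = (x_2, y_2, 0)$ and $F = (0, 0, f)$, so the condition $Fc_1 \perp Fc_2$ reads $x_1x_2 + y_1y_2 + f^2 = 0$, that is, $f = \sqrt{-(x_1x_2 + y_1y_2)}$. The Python below illustrates this under that assumption; the helper names `line_intersection` and `focal_length` are hypothetical and are not taken from the authors' implementation.

```python
import math


def line_intersection(a, b, c, d, eps=1e-12):
    """Intersect the 2D line through a, b with the 2D line through c, d.

    Returns (x, y), or None when the lines are numerically parallel,
    which corresponds to the special case p0p3 // p1p2 in the text.
    """
    # Homogeneous line through two points: l = (x1, y1, 1) x (x2, y2, 1).
    l1 = (a[1] - b[1], b[0] - a[0], a[0] * b[1] - a[1] * b[0])
    l2 = (c[1] - d[1], d[0] - c[0], c[0] * d[1] - c[1] * d[0])
    # Homogeneous intersection point: l1 x l2.
    x = l1[1] * l2[2] - l1[2] * l2[1]
    y = l1[2] * l2[0] - l1[0] * l2[2]
    w = l1[0] * l2[1] - l1[1] * l2[0]
    if abs(w) < eps:
        return None
    return (x / w, y / w)


def focal_length(c1, c2):
    """Focal length from two vanishing points of perpendicular directions.

    Assumes the principal point is the image origin, so that
    Fc1 . Fc2 = x1*x2 + y1*y2 + f^2 = 0.
    """
    dot = c1[0] * c2[0] + c1[1] * c2[1]
    if dot >= 0:
        raise ValueError("vanishing points are inconsistent with perpendicular edges")
    return math.sqrt(-dot)


# Synthetic example: user-marked projections p0..p5 of a cuboid seen by a
# camera with f = 2 (values chosen by hand for illustration only).
p0, p1 = (0.0, 0.0), (0.0, 1.0)
p3, p2 = (2 / 3, 0.0), (2 / 3, 2 / 3)
p4, p5 = (-2 / 3, 0.0), (-2 / 3, 2 / 3)

c1 = line_intersection(p0, p3, p1, p2)  # vanishing point of P0P3 // P1P2
c2 = line_intersection(p1, p5, p0, p4)  # vanishing point of P0P4 // P5P1
f = focal_length(c1, c2)
```

When either pair of projected edges is parallel in the image, `line_intersection` returns `None` and the focal length cannot be estimated this way; the text treats that situation as a special case.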
Given any new viewpoint, we show that the re-projection of this approximate cuboid structure is sufficient to meet the accuracy requirement of viewpoint changes.

Since $|Fp_0|$ is proportional to $|FP_0|$, the coordinates of $P_0$ can be obtained by setting the ratio of $|Fp_0|$ to $|FP_0|$ to a constant. We set it to 1 in our experiments. Each $P_i$ can be represented parametrically in terms of $F$ and $p_i$. We then compute $\{P_1, \ldots, P_5\}$ by exploiting the geometric relationships

$$P_0P_1 \perp \{P_0P_3,\ P_0P_4,\ P_1P_2,\ P_1P_5\}, \quad P_0P_4 \parallel P_5P_1, \quad P_0P_3 \parallel P_1P_2. \tag{1}$$

3.1.2. Analysis of accuracy

Given a new viewpoint, we assume that the focal length $f$ of the camera remains fixed, since zooming in and out can easily be imitated by upsampling and downsampling the image. Without loss of generality, the center of the scene is placed at $(0, 0, z_{P_0})$, with $z_{P_0}$ the z-coordinate of $P_0$. Recall that we set the ratio of $|Fp_0|$ to $|FP_0|$ to 1. Therefore, the center of the scene is $(0, 0, 0)$, which coincides with the world center $O$. Let us denote the new viewpoint by $O'(f \sin\varphi \cos\theta,\ f \sin\varphi \sin\theta,\ f \cos\varphi)$, where $\theta$ and $\varphi$ are the azimuthal and polar angles in the spherical coordinate system. The new imaging plane passing through the world center $O$ is $I'$, whose normal vector is $\overrightarrow{OO'}$. Let $\{Q_0, \ldots, Q_5\}$ denote the true coordinates of the 3D cuboid corresponding to the projected points $\{p_0, \ldots, p_5\}$ on the input image. We prove that the 2D projections of $\{P_0, \ldots, P_5\}$ on the new imaging plane $I'$ are identical to the projections of $\{Q_0, \ldots, Q_5\}$ after eliminating the translation. To this end, we only need to prove

$$\overrightarrow{p'_0p'_1} = \overrightarrow{q'_0q'_1}, \tag{2}$$

where $\overrightarrow{p'_0p'_1}$ is the projection of $\overrightarrow{P_0P_1}$ onto $I'$, and $\overrightarrow{q'_0q'_1}$ is the projection of $\overrightarrow{Q_0Q_1}$ onto $I'_q$, the imaging plane with focal length $f$ that would be used if $\{Q_0, \ldots, Q_5\}$ were given (see Fig. 3).

If $\{Q_0, \ldots, Q_5\}$ were known, we would rotate the camera around $O_Q(0, 0, z_{Q_0})$. In this case, the new viewpoint is $O'_Q$, and we have $|O_QF| = |O_QO'_Q|$.
Let $I'_Q$, passing through $O_Q$, be an auxiliary plane whose normal is $\overrightarrow{O_QO'_Q}$. Let $\overrightarrow{p'_{Q0}p'_{Q1}}$ be the projected vector of $\overrightarrow{Q_0Q_1}$ on $I'_Q$. We get

$$\overrightarrow{q'_0q'_1} = \frac{f}{|O_QF|}\, \overrightarrow{p'_{Q0}p'_{Q1}}. \tag{3}$$

It is obvious that $\triangle(OO'p'_1)$ and $\triangle(O_QO'_Qp'_{Q1})$ are similar triangles, as are $\triangle(p'_0O'p'_1)$ and $\triangle(p'_{Q0}O'_Qp'_{Q1})$. We get

$$\frac{\overrightarrow{p'_0p'_1}}{\overrightarrow{p'_{Q0}p'_{Q1}}} = \frac{\overrightarrow{O'p'_1}}{\overrightarrow{O'_Qp'_{Q1}}} = \frac{\overrightarrow{OO'}}{\overrightarrow{O_QO'_Q}} = \frac{f}{|O_QF|}. \tag{4}$$

Combining (3) and (4), we get $\overrightarrow{p'_0p'_1} = \overrightarrow{q'_0q'_1}$.

3.2. A more complex case

In some images we cannot find, on the projected vertical edge, the two intersection points of the projected parallel edges. Fig. 4 shows such an example. To tackle this issue, we first compute the vanishing points of the projected parallel line segments. Two auxiliary lines, shown as dotted segments in Fig. 4, which pass through the corresponding vanishing points, can then be drawn. This case is thereby converted to the standard case described previously.

Fig. 2. Left: a standard case of the projection of a cuboid structure. Note that $|p_0p_4| = |p_1p_5|$ and $|p_0p_3| = |p_1p_2|$ are not required. $p_4p_5$ and $p_2p_3$ are not necessarily the projections of two vertical edges of the cuboid. Right: the 3D cuboid structure.

Fig. 3. Projection of $P_0P_1$ on the new imaging plane $I'$ is identical to the projected $Q_0Q_1$ on $I'_q$ up to a translation.

We then compute equations of the two perpendicular
planes, from which the spatial coordinates of the eight endpoints of the originally specified line segments can be easily obtained.

4. Cuboid-guided image warp

Given a new viewpoint, the cuboid structure is projected onto the new imaging plane. The new image is rendered by making the rest of the image deform in accordance with the transformation of the cuboid structure. We use a mesh representation to realize image deformation, as shown in Fig. 5. Generating a mesh for an image for the tasks of image resizing and manipulation has been discussed in [13,14,1,22,23]. Unlike the quad mesh employed by most previous methods, we use a triangular mesh to represent the input image. An advantage of the triangular mesh over the quad mesh is that a cuboid-structured region of interest (ROI), which may have slanted and irregular borders, can be represented compactly by a mesh of moderate density. Furthermore, slanted edge structures can be approximated accurately by triangle edges, which makes them easy to preserve during image warp.

We use constrained Delaunay triangulation to create a content-aware mesh representation. Points are first evenly sampled from the image borders, the cuboid structure, and strong edges detected using the Hough transform, and their connectivity serves as the constraints for triangulation. To keep the point density uniform, we detect some corners and, if necessary, add further auxiliary points, since nearly uniform point density normally facilitates mesh processing. We represent the triangular mesh as $M = (V, E, T)$ with vertices $V$, edges $E$, and triangles $T$. $V = [v_0, v_1, \ldots, v_n]$ with $v_i = (x_i, y_i) \in \mathbb{R}^2$ denotes the initial vertex positions.

For a new viewpoint, the cuboid structure is re-projected using the approximated geometry. With this re-projected cuboid structure, we render the new image with high visual quality by making the important image structures consistent with the projection.
Since the geometry of the whole image is inaccessible, we seek to reduce possible visual artifacts caused by inconsistency between the re-projected cuboid structure and other important visual features. This is formulated as a mesh deformation problem which seeks a target mesh $M' = (V', E', T')$ that has the same topology as $M$. $M'$ is solved for by optimizing an energy function that integrates mesh deformation and other constraints.

Cuboid constraint: Given a novel viewpoint, the target positions of the mesh vertices $\{v_{c1}, \ldots, v_{ck}\}$ on the cuboid structure are determined by perspective projection of this structure onto the new imaging plane. This is treated as the cuboid constraint $F_C(v'_{c1}, \ldots, v'_{ck}) = 0$. It is a hard constraint in our system. That is to say, when constructing the mesh deformation function, the coordinates of those mesh vertices on the cuboid structure are computed by the projection in advance. They remain unchanged during optimization.

Shape constraint: To ensure a globally smooth warp, we formulate the shape deformation energy of the mesh in terms of conformality. Producing conformal maps in the least squares sense for automatic texture atlas generation has been discussed in [24]. A similar shape preservation term for warping a quad mesh is used in [1]. We consider the map $\mathcal{M}: (x, y) \mapsto (x', y')$. $\mathcal{M}$ is conformal if it satisfies the Cauchy–Riemann equations,

$$\frac{\partial \mathcal{M}}{\partial x} + i\,\frac{\partial \mathcal{M}}{\partial y} = 0, \tag{5}$$

where $\mathcal{M}$ is rewritten using complex numbers, i.e. $\mathcal{M} = x' + iy'$. Separating real and imaginary parts gives $\partial x'/\partial x = \partial y'/\partial y$ and $\partial x'/\partial y = -\partial y'/\partial x$, so the Jacobian matrix should be of the form

$$J = \begin{pmatrix} a & b \\ -b & a \end{pmatrix}. \tag{6}$$

We now consider the restriction of $\mathcal{M}$ to a source triangle $T(v_i, v_j, v_k)$ in the input image. Its counterpart to be solved is denoted by $T'(v'_i, v'_j, v'_k)$. In general, the affine transformation from $T$ to $T'$ can be expressed as

$$A = T'T^{-1}. \tag{7}$$

We rewrite the above equation using homogeneous coordinates, with the vertex coordinates stored as the columns of $T$ and $T'$, and let $T^{-1}$ be

$$T^{-1} = \begin{bmatrix} a_1 & b_1 & d_1 \\ a_2 & b_2 & d_2 \\ a_3 & b_3 & d_3 \end{bmatrix}. \tag{8}$$

From (6) and (8), the map is conformal if the following equations hold:

$$\begin{aligned} E_{TJ_1} &= a_1x'_i + a_2x'_j + a_3x'_k - (b_1y'_i + b_2y'_j + b_3y'_k) = 0, \\ E_{TJ_2} &= b_1x'_i + b_2x'_j + b_3x'_k + (a_1y'_i + a_2y'_j + a_3y'_k) = 0. \end{aligned} \tag{9}$$

With these signs an undeformed triangle ($T' = T$, $A = I$) satisfies both equations, as required. We define the total conformality energy by summing up the individual energy terms on each $T$,

$$E_S = \sum_T \left( E_{TJ_1}^2 + E_{TJ_2}^2 \right). \tag{10}$$

Fig. 4. A more complex case. User-specified solid line segments which represent the projections of two pairs of parallel edges do not intersect on the projected vertical edge. To handle this case, we draw auxiliary lines and convert it to the standard case.

Fig. 5. Workflow of cuboid-guided image warp. The triangular mesh $M$ is shown in the 2nd image, where the blue and green line segments denote the cuboid structure and lines detected or specified by the user. The $M'$ shown in the 3rd image is computed by solving a cuboid-guided mesh deformation problem. The final result with regular borders is generated by cropping the result with non-regular boundaries. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this article.)
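A minimal Python sketch of the per-triangle conformality residuals follows. It assumes that the homogeneous vertex coordinates are stored as the columns of $T$ and $T'$, computes the $a_m$ and $b_m$ entries of $T^{-1}$ in closed form, and uses the sign convention implied by the Cauchy–Riemann equations (5), under which an undeformed triangle has zero energy. The function names are illustrative only.

```python
def gradient_coeffs(tri):
    """Return (a, b): the first two columns of T^{-1} for a source triangle.

    tri is ((xi, yi), (xj, yj), (xk, yk)); T has the homogeneous vertices as columns.
    """
    (xi, yi), (xj, yj), (xk, yk) = tri
    det = xi * (yj - yk) + xj * (yk - yi) + xk * (yi - yj)
    if abs(det) < 1e-12:
        raise ValueError("degenerate source triangle")
    a = ((yj - yk) / det, (yk - yi) / det, (yi - yj) / det)
    b = ((xk - xj) / det, (xi - xk) / det, (xj - xi) / det)
    return a, b

def conformal_residuals(src, dst):
    """Residuals of the Cauchy-Riemann conditions for one triangle pair."""
    a, b = gradient_coeffs(src)
    xs = [p[0] for p in dst]
    ys = [p[1] for p in dst]
    a11 = sum(am * x for am, x in zip(a, xs))  # d x' / d x
    a12 = sum(bm * x for bm, x in zip(b, xs))  # d x' / d y
    a21 = sum(am * y for am, y in zip(a, ys))  # d y' / d x
    a22 = sum(bm * y for bm, y in zip(b, ys))  # d y' / d y
    return a11 - a22, a12 + a21

def shape_energy(pairs):
    """Sum of squared residuals over (source, target) triangle pairs."""
    total = 0.0
    for src, dst in pairs:
        e1, e2 = conformal_residuals(src, dst)
        total += e1 * e1 + e2 * e2
    return total

src = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
rotated_scaled = ((0.0, 0.0), (0.0, 2.0), (-2.0, 0.0))  # 90 degree rotation, scale 2
stretched = ((0.0, 0.0), (2.0, 0.0), (0.0, 1.0))        # non-uniform stretch in x
energy_similarity = shape_energy([(src, rotated_scaled)])
energy_stretch = shape_energy([(src, stretched)])
```

Because both residuals are linear in the target coordinates, the summed energy stays quadratic in $V'$; a similarity transform leaves it at zero, while a non-uniform stretch is penalized.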
Line constraint: Strong edges such as straight lines are important visual features. They are vital clues for understanding image content, and should be kept as rigid as possible. We detect those line segments using the Hough transform. Users can also specify some additional curved edges. Points sampled from these edges, together with the connectivity induced by the edges, are fed into the triangulation process beforehand. Let $\langle v_i, v_j, v_k \rangle$ denote a triplet of vertices on a straight line. To preserve the shape of strong edges, we preserve the length ratio $r_j$ of $v_jv_k$ to $v_iv_j$, and the angle $\theta_j$ formed by $v_iv_j$ and $v_jv_k$, in each triplet locally [15]. We express the energy term regarding the line constraint as

$$E_L = \sum_{\langle v_i, v_j, v_k \rangle} \left\| (v'_k - v'_j) - r_j R_j (v'_j - v'_i) \right\|^2, \tag{11}$$

with

$$R_j = \begin{pmatrix} \cos\theta_j & -\sin\theta_j \\ \sin\theta_j & \cos\theta_j \end{pmatrix}. \tag{12}$$

Besides the lines detected by the Hough transform or specified by the user, we use the line constraint to preserve the shapes of salient objects that lie across two different faces of the latent cuboid, or across the cuboid and the rest of the image. A line segment is specified on each such object and is fed into $E_L$ to avoid heavy distortion.

Vertical and horizontal line constraint: Photos taken by amateur photographers often contain slanted vertical or horizontal lines due to improper camera rotations, for instance slanted buildings, windows, and picture frames. This may cause visual discomfort when we look at such photos. Our system supports automatic correction of slanted line segments when the viewpoint is changed and the image is warped. For the slanted cuboid structure, the idea is to re-project it properly. The new viewpoint is computed automatically by requiring the projected edges to be horizontal or vertical. The remaining slanted line segments are handled by the vertical and horizontal line constraint. Let $l$ denote a slanted line segment, which can be detected automatically or specified by the user, and let $\{v_{l_1}, \ldots, v_{l_m}\}$ represent the vertices on $l$.
The vertical line constraint is expressed as

$$E_V = \sum_{l} \sum_{i=1}^{l_m} \left( x'_{l_i} - x'_{l_1} \right)^2. \tag{13}$$

The horizontal line constraint is defined similarly.

Border constraint: Physically, each side of the image border should be constrained to remain straight. The energy term $E_B$ of this constraint is defined similarly to the line constraint.

Total energy: In summary, combining all the above energy terms, we wish to minimize the following energy function:

$$\arg\min_{V'}\ \lambda_S E_S + \lambda_L E_L + \lambda_V E_V + \lambda_B E_B, \qquad \text{s.t. } F_C(v'_{c1}, \ldots, v'_{ck}) = 0, \tag{14}$$

where $\lambda_S$, $\lambda_L$, $\lambda_V$, and $\lambda_B$ are the coefficients weighting the different energy terms. Straight lines and important curved lines, as visually prominent features, should be kept. To mimic hard constraints, $\lambda_L$, the weight of the straight line constraint, is often set to a larger value than the weight of the shape constraint. $\lambda_V$, the weight of the vertical and horizontal line constraint, can be set by the user according to the image content. $E_V$ can be enforced as a hard constraint by using a larger $\lambda_V$. This is useful for straightening slanted man-made structures in an input image to improve its perceptual quality. In practice, to deal with possible conflicts between the border constraint and the constraint on the cuboid structure near the image border, $E_B$ often acts as a soft constraint, with $\lambda_B$ set to a small value. For all the results in this paper we use weights of $\lambda_S = 1$ and $\lambda_L = 100$. $\lambda_V$ and $\lambda_B$ are set to 100 as well if $E_V$ and $E_B$ are taken as hard constraints, and are set to 10 for soft constraints.

Note that we do not impose a constraint for avoiding mesh flip-over in the above energy function. In all our experiments, mesh flipping is seldom encountered. We check for it after the deformed mesh is obtained, and once flipping is detected, we correct it locally.

The energy function is a quadratic function of $V'$. The solution can be obtained efficiently by solving a sparse linear system.

5. Experiments

We have implemented our view manipulation algorithm on a PC with an Intel Core i3-2100 CPU at 3.1 GHz, and experimented with our technique on a variety of images. Some representative results are shown in Figs. 6–10 and 12.

Figs. 6 and 7 demonstrate the results on several images of man-made buildings. The first and third rows are the input images.

Fig. 6. Viewpoint adjustment for the images with standard cuboid structures. The 1st row shows the inputs, with cuboid structures marked by blue segments and line constraints by green ones. The 2nd row shows the corresponding results with constrained lines, and the results without constraints are given in the 3rd row. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this article.)
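Returning to the energy of Section 4: because every term is a squared linear expression in the unknown vertex coordinates, minimizing it with some vertices held fixed reduces to a linear system. The Python sketch below illustrates this for the line term (11) only. It builds dense normal equations and solves them by Gaussian elimination, which is an assumption made for brevity (a real mesh would call for a sparse solver), and all function names are hypothetical.

```python
import math

def line_rows(i, j, k, r, theta, n):
    """Two linear residual rows of Eq. (11) for the triplet (i, j, k).

    Unknowns are ordered [x0, y0, x1, y1, ...] for n vertices; the residual is
    (v_k - v_j) - r * R(theta) * (v_j - v_i).
    """
    c, s = r * math.cos(theta), r * math.sin(theta)
    row_x = [0.0] * (2 * n)
    row_y = [0.0] * (2 * n)
    # x-component: x_k - (1 + c) x_j + c x_i + s y_j - s y_i
    row_x[2 * k] += 1.0
    row_x[2 * j] -= 1.0 + c
    row_x[2 * i] += c
    row_x[2 * j + 1] += s
    row_x[2 * i + 1] -= s
    # y-component: y_k - (1 + c) y_j + c y_i - s x_j + s x_i
    row_y[2 * k + 1] += 1.0
    row_y[2 * j + 1] -= 1.0 + c
    row_y[2 * i + 1] += c
    row_y[2 * j] -= s
    row_y[2 * i] += s
    return [row_x, row_y]

def gauss_solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[idx]] for idx, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("singular system: more constraints are needed")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            fac = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= fac * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def minimize(rows, weights, fixed, n):
    """Minimize sum of w * (row . u)^2 with the vertices in `fixed` held in place."""
    known = {}
    for v, (x, y) in fixed.items():
        known[2 * v], known[2 * v + 1] = x, y
    free = [d for d in range(2 * n) if d not in known]
    N = [[0.0] * len(free) for _ in free]
    rhs = [0.0] * len(free)
    for row, w in zip(rows, weights):
        const = sum(row[d] * val for d, val in known.items())  # fixed-vertex part
        for p, dp in enumerate(free):
            rhs[p] -= w * row[dp] * const
            for q, dq in enumerate(free):
                N[p][q] += w * row[dp] * row[dq]
    u = dict(known)
    u.update(zip(free, gauss_solve(N, rhs)))
    return [(u[2 * v], u[2 * v + 1]) for v in range(n)]

# A collinear triplet (r = 1, theta = 0) whose end vertices are pinned, in the
# manner of the hard cuboid constraint; the middle vertex is the only unknown.
rows = line_rows(0, 1, 2, r=1.0, theta=0.0, n=3)
result = minimize(rows, [100.0, 100.0], {0: (0.0, 0.0), 2: (0.0, 4.0)}, 3)
```

Pinned vertices are eliminated from the unknowns and moved to the right-hand side, which mirrors how the cuboid constraint in (14) fixes projected vertices before optimization.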