
2019, 1(4): 341-385. Published Date: 2019-8-20

DOI: 10.1016/j.vrih.2019.01.001
1 Introduction
2 Proposal and development of 3D viewpoint theory
2.1 The proposal of canonical views
2.2 Methodological overview of the viewpoints
2.3 Classification of viewpoint research methods
3 Algorithm classification and description
3.1 Methods based on geometric information
3.1.1 "General Position" theory
3.1.2 Methods based on aesthetic information
(1) Viewpoint research based on artistic composition
(2) Viewpoint scoring method based on projected area
3.1.3 View selection method based on octree
3.1.4 Viewpoint quality evaluation based on information entropy
(1) Method based on viewpoint entropy
(2) Method based on curvature entropy
(3) Method based on depth entropy and silhouette entropy
(4) Other information entropy methods
3.1.5 Method based on Kullback-Leibler distances
3.2 Methods based on visual features
3.2.1 Viewpoint evaluation based on visual perception theory
(1) Visual attention model
(2) Visual perception model based on visual preference
3.2.2 Viewpoint quality evaluation based on curvature
3.2.3 Viewpoint quality evaluation based on viewpoint complexity
(1) View complexity calculation based on histogram
(2) Viewpoint complexity calculation based on information theory
3.2.4 Viewpoint selection method based on mesh saliency
(1) Review of mesh saliency methods
(2) Mesh saliency computation
(3) The implicit relationship between visual attention and saliency
(4) Mesh saliency calculation based on Jensen-Shannon divergence
(5) Viewpoint score based on saliency segmentation
(6) Saliency detection method based on self-diffusion function
(7) Saliency detection of viewpoint correlation
(8) Random walk algorithm
(9) Mesh saliency method based on deep learning
(10) Visual saliency study based on solid objects
3.2.5 Viewpoint selection method based on viewpoint mutual information
3.2.6 Viewpoint mutual information method based on polygon information and mesh saliency
3.3 Semantic-based viewpoint selection method
3.3.1 Semantic-based mesh segmentation
(1) Segmentation method based on fitting primitive body
(2) Segmentation based on Reeb graph
(3) Semantic method based on morphological features
3.3.2 Viewpoint selection based on semantics feature
3.3.3 Semantic metrics based on aesthetic characteristics
4 Application of viewpoint evaluation
4.1 Radiosity calculation and global illumination optimization based on viewpoint evaluation
4.1.1 Radiosity sampling based on Monte Carlo method
4.1.2 Improvement of Monte Carlo radiosity based on viewpoint complexity
4.1.3 Raytrace optimization based on viewpoint quality analysis
4.2 Modeling and rendering based on view quality analysis
4.3 Application of viewpoint complexity in molecular visualization
4.4 Model simplification based on mesh saliency
4.4.1 Mesh simplification based on visual saliency
4.4.2 Mesh segmentation based on mesh saliency
5 Selection and application of goodness viewpoints set
5.1 Virtual camera guidance based on goodness viewpoints set
5.1.1 Camera guidance based on trajectory ball control
5.1.2 Camera guidance method based on magnets
5.1.3 Camera path planning based on greedy algorithm
5.2 Viewpoint selection based on deep learning
5.2.1 Viewpoint selection based on region of interest
1) The region of interest observed by the viewpoint set should reach at least 60% of the total interesting area of the mesh
2) No new viewpoint p can enlarge the region of interest
5.2.2 Estimation of camera viewpoint position based on CNN
5.2.3 Large-scale scene reconstruction based on multiple viewpoints
6 Summary and prospect


Research on 3D scene viewpoints has long been a frontier problem in computer graphics and virtual reality. Viewpoint analysis has been used extensively in virtual scene understanding, image-based modeling, and visualization computing. With the development of computer graphics and human-computer interaction, viewpoint evaluation has become increasingly important for the comprehensive understanding of complex scenes. High-quality viewpoints can guide observers to regions of interest, help them discover hidden relations in hierarchical structures, and improve the efficiency of virtual exploration. These studies later contributed to research areas such as robot vision, dynamic scene planning, virtual driving, and artificial intelligence navigation. The introduction of visual perception has inspired new directions in viewpoint research, and its combination with machine learning has brought significant progress in viewpoint selection. Viewpoint research has also been important in global illumination optimization, visualization computing, supervised 3D rendering, and virtual scene reconstruction. In addition, it has great potential in emerging fields such as 3D model retrieval, virtual tactile analysis, human visual perception research, salient point calculation, ray tracing optimization, molecular visualization, and intelligent scene computing.


1 Introduction
Research on 3D viewpoints has made great progress along with the development of graphics technology. In recent years, many researchers have used viewpoint complexity analysis to improve efficiency, especially in virtual scene understanding, intelligent roaming, global illumination, and radiosity optimization. 3D viewpoint research has also advanced visual applications such as 3D rendering, scene modeling, and virtual camera control.
In virtual scene understanding, 3D viewpoint research can help observers analyze a virtual scene intelligently and generate good viewpoints automatically, which improves the efficiency of 3D model retrieval. 3D viewpoint selection also has an important application in virtual camera control, where the camera path can be planned intelligently from good viewpoints to direct participants' attention to optimal positions. Research on 3D viewpoints has potential value in many fields, which we describe comprehensively below.
2 Proposal and development of 3D viewpoint theory
2.1 The proposal of canonical views
The concept of a "good viewpoint" has been mentioned repeatedly in computer graphics and virtual reality, but it still has no unified definition. Palmer first proposed the concept of good viewpoints[1]. Koenderink regarded viewpoint quality as key information for scene understanding[2,3].
Building on Palmer's work, Blanz introduced the concept of canonical viewpoints[4]. He pointed out that the characteristics of canonical viewpoints mainly cover legibility, aesthetics, and the application field. Blanz proposed three factors in the aesthetic quality of a viewpoint: the salience of features to the observer, the stability of the view when the viewpoint changes slightly, and the occlusion of characteristic features.
2.2 Methodological overview of the viewpoints
In the early stage, researchers usually evaluated viewpoints with geometric methods. Kamada and Kawai analyzed the number of degenerate faces in orthogonal projection and sought to minimize it[5]. Roberts and Marshall obtained good viewpoints with a morphological graph method[6]. Plemenos and Benayada improved Kamada's method and proposed a viewpoint quality measure based on projected area[7]. Fleishman proposed a method for placing a virtual camera based on scene visibility and viewpoint quality[8]. Barral designed an algorithm that guides a camera to explore a 3D scene automatically based on optimal viewpoints[9].
These methods capture scene geometry adequately, but they require continuous sampling and heavy computation. The introduction of information entropy and visual perception brought new advantages, and several influential methods were established, such as the viewpoint entropy method proposed by Vázquez[10], the silhouette stability method proposed by Gooch[11], the viewpoint mutual information theory proposed by Feixas[12], the information-theoretic shape analysis algorithm proposed by Page[13], and the graph-based image segmentation method proposed by Felzenszwalb[14], which was applied to viewpoint research.
With the development of information theory and visual perception theory, many new methods emerged in 2005. Sbert proposed a relative entropy method[15], Katz and Leifman proposed a mesh segmentation algorithm based on feature point extraction[16], Lee proposed the mesh saliency method[17], and Polonsky proposed an inductive method for viewpoint selection[18].
Mesh saliency and mesh segmentation techniques also strongly affected viewpoint computation. Attene conducted a comparative study of mesh segmentation algorithms[19], Shilane analyzed distinctive regions of 3D surfaces[20], Fu studied the position and pose of 3D objects in relation to viewpoints[21], Vieira realized 3D scene understanding through intelligent viewpoint learning[22], Laga adopted a semantics-driven method to automatically select the best viewpoint of a 3D object[23], Biedl proposed an efficient viewpoint selection method using the silhouettes of convex polyhedra[24], and Serin proposed a Kullback-Leibler (vSKL) distance method to measure viewpoint significance[25].
After 2011, mesh saliency and regions of interest gradually attracted academic attention. Leifman proposed a viewpoint selection method based on regions of interest[26], Serin used representative images to select optimal viewpoints[27], and Han studied a viewpoint selection method based on saliency segmentation[28]. Xing proposed a heuristic surface reconstruction algorithm based on visual perception[29]; such methods opened a new direction for viewpoint research.
With the development of deep learning after 2014, viewpoint research also turned to neural networks. Su used 3D models to train convolutional neural networks (CNNs) for viewpoint estimation[30,31], and Francisco Massa designed a multi-task CNN for viewpoint estimation[32]. Meanwhile, regions of interest on meshes also attracted attention: George Leifman selected viewpoint sets based on the visually interesting regions of a mesh[33], and Jeong used semi-regular meshes to detect salient regions[34]. Multiple viewpoints attracted attention as well. Torsten used multiple pairs of virtual cameras to achieve recursive estimation in 3D scene reconstruction[35], and Rahul Sawhney designed a method for correlation, retrieval, and reconstruction in large-scale 3D scenes based on multiple viewpoints[36].
2.3 Classification of viewpoint research methods
Secord et al. reviewed viewpoint theory up to 2012 and clarified the relationship between viewpoints and views[37]. Following the theory of canonical viewpoints, Secord divided common view selection methods into five types: area-based[7,10,25], silhouette-based[22,38], depth-based[4,39], surface-curvature-based[13,17,18,26,40], and semantics-based[4,19,21,23].
3 Algorithm classification and description
3.1 Methods based on geometric information
These methods use geometric area, projected area, silhouette, curvature, and other measures to evaluate viewpoints. They are simple and efficient, but they usually ignore the structural information of the 3D model and do not conform to human visual perception habits.
3.1.1 "General Position" theory
Kamada et al. proposed the concept of "general position", which characterizes a good viewpoint[5]. From this standpoint, the "canonical view" problem is converted into an extremal problem over projection direction vectors, with the evaluation function:
$$G(\mathbf{p}) = \min_{\mathbf{t}\in T} |\mathbf{p}\cdot\mathbf{t}|$$
Here $L$ denotes the set of all line segments of the 3D object, $T$ is the set of unit normal vectors of the planes of $L$, $\mathbf{t}\in T$, and $\mathbf{p}$ is the unit viewing direction. $|\mathbf{p}\cdot\mathbf{t}|$ is the cosine of the angle between $\mathbf{p}$ and $\mathbf{t}$, so $G(\mathbf{p})$ is the cosine of the largest such angle. The "general position" problem is to find the direction $\mathbf{p}$ that maximizes $G(\mathbf{p})$[5] (Figure 1):
$$\max_{|\mathbf{p}|=1}\ \min_{\mathbf{t}\in T} |\mathbf{p}\cdot\mathbf{t}|$$
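As a rough illustration of the max-min criterion, the sketch below samples candidate unit directions p on a sphere and keeps the one with the largest G(p). The Fibonacci-sphere sampling and the function names are our own assumptions; this brute-force search is only a stand-in for whatever optimization is used in [5].

```python
import numpy as np

def fibonacci_sphere(n):
    """Return n roughly uniform unit vectors (golden-ratio spiral)."""
    i = np.arange(n) + 0.5
    phi = np.arccos(1.0 - 2.0 * i / n)      # polar angle, uniform in cos(phi)
    theta = np.pi * (1.0 + 5 ** 0.5) * i    # golden-ratio step in azimuth
    return np.stack([np.cos(theta) * np.sin(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(phi)], axis=1)

def general_position(normals, n_samples=20000):
    """Approximate argmax over |p| = 1 of G(p) = min_t |p . t| by sampling."""
    T = np.asarray(normals, dtype=float)
    T = T / np.linalg.norm(T, axis=1, keepdims=True)   # make every t a unit vector
    P = fibonacci_sphere(n_samples)
    G = np.abs(P @ T.T).min(axis=1)                     # G(p) for every candidate p
    best = int(np.argmax(G))
    return P[best], float(G[best])
```

For a cube, whose face normals lie along x, y and z, the best sampled direction approaches a body diagonal, where G = 1/√3 ≈ 0.577 and no face is seen edge-on.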
Kamada applied this method to the visualization of the cyclohexane (C6H12) molecule and obtained its "general viewpoint"[5]; it is a classic method in molecular visualization (Figure 2). Another representative work is Gooch's method, which uses silhouettes to produce an aesthetic layout[11].
3.1.2 Methods based on aesthetic information
(1) Viewpoint research based on artistic composition
Sander studied the golden section in detail and established its theoretical basis[41]. Feiner used art composition rules for image production[42]. Kawai proposed an automated lighting layout method based on artistic composition[5]. Karp proposed an animation sequence generation method based on artistic composition[43]. Kowalski proposed a method to guide users in composing creative works[44].
Gooch proposed a method based on artistic composition. The layout is determined by orthogonal projection, and the silhouette is computed with a brute-force algorithm (Figure 3). The composition template treats gray pixels as "magnet" features; these gray points drive the layout function, which is optimized with the downhill simplex method. This work inspired later composition-based methods.
(2) Viewpoint scoring method based on projected area
Low-level view quality evaluation methods: low-level geometric data include basic elements such as lines and planes. Kamada and Kawai's method from the 1980s is the earliest work of this kind[5]. Plemenos and Benayada proposed a method based on the projected area of visible surfaces[7], and Barral et al. modified the coefficients of Kamada's method to handle perspective projection[9]. Plemenos and Benayada evaluated view quality by combining the number and the projected area of visible patches[7]:
$$I(p) = \frac{\sum_{i=1}^{n}\left\lceil \frac{P_i(p)}{P_i(p)+1} \right\rceil}{n} + \frac{\sum_{i=1}^{n} P_i(p)}{r}$$
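Here $I(p)$ is the quality of viewpoint $p$, $P_i(p)$ is the projected area (number of pixels) of polygon $i$ seen from $p$, $r$ is the image resolution, $n$ is the total number of polygons, and $\lceil\cdot\rceil$ is the ceiling function. The first term is therefore the fraction of visible polygons and the second the fraction of the image they cover.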
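A direct transcription of the formula, assuming the per-polygon pixel counts P_i(p) have already been obtained, for example from an item buffer in which every polygon is drawn with a unique ID; the function name is ours.

```python
import math

def plemenos_quality(pixel_counts, resolution):
    """I(p) for one viewpoint.

    pixel_counts[i] is P_i(p), the number of image pixels covered by polygon i;
    resolution is r, the total number of pixels in the image.
    """
    n = len(pixel_counts)
    # ceil(P / (P + 1)) is 1 for any visible polygon and 0 for a hidden one
    visible = sum(math.ceil(c / (c + 1)) for c in pixel_counts)
    return visible / n + sum(pixel_counts) / resolution
```

With four polygons of which two cover 10 and 5 pixels of a 100-pixel image, half the polygons are visible and they cover 15% of the image, giving I(p) = 0.65.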
High-level viewpoint quality evaluation methods: low-level methods are not suited to large scenes. Sokolov proposed a high-level evaluation method in 2006[45]. It uses the bounding box of each object to build an importance function $q(\omega)$, where $V_\omega$ is the vertex set of object $\omega$:
$$q(\omega) = \max_{u,v\in V_\omega}(u_x - v_x) + \max_{u,v\in V_\omega}(u_y - v_y) + \max_{u,v\in V_\omega}(u_z - v_z)$$
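Sokolov then introduced a parameter describing the predictability of an object, $\rho: \Omega \to \mathbb{R}^+$, with value $\rho_\omega$ for object $\omega$, and, with $\theta(p,\omega)$ denoting the visible part of object $\omega$ from viewpoint $p$, computed viewpoint quality as follows: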
$$I(p) = \sum_{\omega\in\Omega} q(\omega)\,\frac{\rho_\omega + 1}{\rho_\omega + \theta(p,\omega)}\,\theta(p,\omega)$$
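A minimal sketch of the two formulas above, assuming each object is given by its vertex coordinates together with ρ_ω and the visible fraction θ(p, ω) ∈ [0, 1], which is taken as already computed; the helper names are hypothetical.

```python
import numpy as np

def importance(verts):
    """q(w): sum of the object's bounding-box extents along x, y and z."""
    verts = np.asarray(verts, dtype=float)
    return float(np.sum(verts.max(axis=0) - verts.min(axis=0)))

def recognition_term(theta, rho):
    """(rho + 1) * theta / (rho + theta): 0 when nothing is visible, 1 when all is."""
    if theta == 0:
        return 0.0
    return (rho + 1.0) * theta / (rho + theta)

def sokolov_quality(objects):
    """I(p) = sum over objects of q(w) * recognition_term(theta(p, w), rho_w).

    objects: iterable of (vertices, rho, theta) triples, where theta is the
    visible fraction of the object from p (assumed to be computed elsewhere).
    """
    return sum(importance(v) * recognition_term(t, r) for v, r, t in objects)
```

A small ρ_ω makes the term saturate quickly, so a small visible part already counts almost as much as the whole object; a large ρ_ω makes the contribution grow nearly linearly with θ.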
Subsequently, Sokolov and Plemenos proposed a high-level viewpoint evaluation method for sets of views[46]. A camera view is determined by the view angle $\alpha$ and a pair $(s, d) \in S \oplus D$, where $s$ is the camera position and $d$ the viewing direction. A vertex $v \in V$ that is visible from camera position $s$ (written $(s, v) \in E$) belongs to the set $V_\alpha(\{s, d\})$ of vertices seen from this view if the vector $v - s$ lies inside the view cone with axis $d$ and angle $\alpha$:

$$V_\alpha(\{s, d\}) = \left\{ v \in V : (s, v) \in E,\ \cos\frac{\alpha}{2} \le \frac{d \cdot (v - s)}{\|d\|\,\|v - s\|} \right\}$$
Let $X$ be a set of camera views selected from $S \oplus D$, the set of all possible camera positions and viewing directions. For an object $\omega$ of the scene, its visible part under the views in $X$ is denoted $\theta_{\alpha,X,\omega}$, and the curvature of a vertex $v \in V$ is denoted $C(v)$.
The total curvature of a vertex subset $V_1 \subseteq V$ is $C(V_1) = \sum_{v\in V_1} C(v)$. The visibility $\theta_{\alpha,X,\omega}$ of object $\omega$ equals the curvature of its visible part divided by its total curvature:
$$\theta_{\alpha,X,\omega} = \frac{C\left(V_\alpha(X) \cap \omega\right)}{C(\omega)}, \qquad V_\alpha(X) = \bigcup_{x\in X} V_\alpha(x)$$
The quality of the view set $X$ is then evaluated as:

$$I_\alpha(X \subseteq S\oplus D) = \sum_{\omega\in\Omega} q(\omega)\,\frac{\rho_\omega + 1}{\rho_\omega + \theta_{\alpha,X,\omega}}\,\theta_{\alpha,X,\omega}$$

This method is suitable for complex scenes.
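The view-set measure can be sketched as follows. The cone test follows the definition of V_α({s, d}); the occlusion relation (s, v) ∈ E is assumed to be supplied as a precomputed boolean mask per view (for instance from ray casting), and all function and parameter names are hypothetical.

```python
import numpy as np

def visible_in_view(s, d, verts, alpha, unoccluded):
    """Boolean mask of V_alpha({s, d}).

    unoccluded[k] is True when (s, v_k) is in E, i.e. vertex k is not hidden
    from camera position s (assumed precomputed, e.g. by ray casting).
    """
    rel = verts - s
    cos_angle = rel @ d / (np.linalg.norm(d) * np.linalg.norm(rel, axis=1))
    return unoccluded & (cos_angle >= np.cos(alpha / 2))

def view_set_quality(views, verts, curvature, objects, alpha, rho, q):
    """I_alpha(X) for a view set X given as a list of (s, d, unoccluded) tuples.

    objects maps an object id to the indices of its vertices; rho and q map
    object ids to rho_w and q(w); curvature[k] is C(v_k).
    """
    seen = np.zeros(len(verts), dtype=bool)
    for s, d, unoccluded in views:
        seen |= visible_in_view(s, d, verts, alpha, unoccluded)  # V_alpha(X) is a union
    total = 0.0
    for w, idx in objects.items():
        c_total = curvature[idx].sum()
        theta = curvature[idx][seen[idx]].sum() / c_total if c_total > 0 else 0.0
        if theta > 0:
            total += q[w] * (rho[w] + 1) / (rho[w] + theta) * theta
    return total
```

Because V_α(X) is a union, adding a view can only keep or raise θ_{α,X,ω}, which is what makes a greedy choice of additional views natural for this measure.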
3.1.3 View selection method based on octree
Colin proposed a viewpoint selection method based on an octree[47], and Plemenos built on Colin's method to propose an iterative algorithm for automatic viewpoint selection[7]. The scene is placed at the center of a sphere whose surface contains all candidate viewpoints. Plemenos divided the sphere into eight spherical triangles (Figure 4a), selected the best spherical triangle, which determines the best viewing orientation, and then recursively subdivided the selected triangle until it converged to a good viewpoint (Figure 4b).
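A minimal sketch of the recursive search, assuming a user-supplied `quality(direction)` function and scoring each spherical triangle by the direction through its centroid. The 4-way midpoint subdivision and the centroid scoring are our assumptions; the original may evaluate triangle vertices or use another split.

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def centroid_dir(tri):
    """Direction through the centroid of a spherical triangle."""
    return normalize(tri[0] + tri[1] + tri[2])

def octant_triangles():
    """The 8 spherical triangles cut out by the three coordinate planes."""
    return [(np.array([sx, 0.0, 0.0]), np.array([0.0, sy, 0.0]), np.array([0.0, 0.0, sz]))
            for sx in (1.0, -1.0) for sy in (1.0, -1.0) for sz in (1.0, -1.0)]

def subdivide(tri):
    """Split a spherical triangle into 4 using normalized edge midpoints."""
    a, b, c = tri
    ab, bc, ca = normalize(a + b), normalize(b + c), normalize(c + a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]

def best_view_direction(quality, levels=10):
    """Greedy coarse-to-fine search: keep the best triangle, subdivide, repeat."""
    candidates = octant_triangles()
    for _ in range(levels):
        best = max(candidates, key=lambda t: quality(centroid_dir(t)))
        candidates = subdivide(best)
    return centroid_dir(max(candidates, key=lambda t: quality(centroid_dir(t))))
```

Each level costs only four quality evaluations, so the search is cheap, but like any greedy refinement it can settle on a local optimum when the quality function has several peaks.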
Building on Plemenos' method, Sokolov introduced a point-to-region approach for visibility calculation and used the projected area as an important parameter:
$$I(p) = C(F(p)) \sum_{f\in F(p)} P(f)$$
Here $F(p)$ is the set of surfaces visible from viewpoint $p$, $C(F(p))$ is the total curvature of the patches visible from $p$, and $P(f)$ is the projected area of surface $f$. The method keeps its topological invariance when the scene changes, but its computation is heavy.
(1) Proxy information method based on projected area
Gao's method extracts proxy information from orthogonal views, called principal views (PVs), and uses PCA to build the bounding box of the model from which this proxy information is obtained[48] (Figure 5).
It generates proxy information from the six orientations of the bounding box and uses the PVs to evaluate viewpoints. To simplify the calculation, Gao built an evaluation function $E(v)$ for a viewpoint $v$:
$$\mathrm{Inf}_i = (1-\lambda)A_i + \lambda N_i$$
$$E(v) = \sum_{i=1}^{6} w(v, n_i)\,\mathrm{Inf}_i$$
Here $A_i$ is the ratio of the model's orthogonal projected area along the $i$-th principal viewing direction, $N_i$ is the ratio of the number of patches visible from the $i$-th PV to the total number of patches, and $\lambda \in (0, 1)$ weights $A_i$ against $N_i$; typically $\lambda = 0.5$. The weight $w(v, n_i)$ is the dot product of the vectors $n_i$ and $v$, set to 0 when the dot product is negative. This generates proxy information approximately while avoiding degradation of the observation quality.
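A minimal sketch of E(v), assuming the six PV directions n_i are unit vectors and that A_i and N_i have already been measured for each PV; the candidate direction v is normalized before taking dot products. The function name is hypothetical.

```python
import numpy as np

def gao_score(v, normals, A, N, lam=0.5):
    """E(v) = sum_i w(v, n_i) * Inf_i, with Inf_i = (1 - lam) A_i + lam N_i.

    normals: (6, 3) array of unit PV directions n_i; A, N: per-PV ratios.
    """
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)
    inf = (1 - lam) * np.asarray(A, dtype=float) + lam * np.asarray(N, dtype=float)
    w = np.clip(normals @ v, 0.0, None)   # negative dot products get weight 0
    return float(w @ inf)
```

A candidate viewpoint halfway between two principal views receives a blend of their information terms, which is how the six proxies cover the whole sphere of directions.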
3.1.4 Viewpoint quality evaluation based on information entropy
(1) Method based on viewpoint entropy
Vázquez and Feixas proposed a method based on Shannon entropy, called viewpoint entropy[10]. The relative projected areas of the patches are used as a probability distribution, whose entropy measures the information seen from a viewpoint:
$$H_p(X) = -\sum_{i=0}^{N_f} \frac{A_i}{A_t}\log\frac{A_i}{A_t}$$
Here $N_f$ is the number of patches in the scene, $A_i$ is the projected area of patch $i$, $A_t$ is the total area of the viewing sphere, and $A_0$ is the projected area of the background in an open scene; in a closed scene $A_0 = 0$. The ratio $A_i/A_t$ serves as the probability of patch $i$.
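A minimal sketch of viewpoint entropy, assuming the projected areas (with the background area A_0 first) are already available for the viewpoint. The base-2 logarithm is our choice, since the formula leaves the base unspecified; zero-area patches are skipped because 0 log 0 is taken as 0.

```python
import numpy as np

def viewpoint_entropy(proj_areas, total_area):
    """H_p(X) = -sum_i (A_i / A_t) log2(A_i / A_t); proj_areas[0] is A_0."""
    p = np.asarray(proj_areas, dtype=float) / total_area
    p = p[p > 0]                        # 0 * log 0 is taken as 0
    return float(-np.sum(p * np.log2(p)))
```

The entropy is maximal when all visible patches project to equal areas, so a high value indicates a view that shows many faces in a balanced way. For four equal patches filling a closed scene, the value is 2 bits.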
(2) Method based on curvature entropy
Page improved the viewpoint entropy method using silhouettes and curvature[13]. Curvature entropy depends on depth information and observed detail. However, it can produce undesirable results when the model contains many symmetric structures: the view in Figure 6a has high curvature entropy, yet it looks worse to a human observer.
(3) Method based on depth entropy and silhouette entropy
Vázquez computed depth entropy and silhouette entropy from 320 viewpoints; higher-entropy regions are shown in warmer colors (Figure 7).
Vázquez optimized the view direction with silhouette entropy and the viewpoint position with depth entropy, which yields more reasonable results (Figure 8).
Viewpoint quality is computed as:
$$\mathrm{TI}(H_p) = \mathrm{NDepthH}_p \times (1-\alpha) + \mathrm{NSilH}_p \times \alpha$$
For viewpoint $p$, $\mathrm{NDepthH}_p$ is the normalized depth entropy, $\mathrm{NSilH}_p$ is the normalized silhouette entropy, and $\alpha \in (0,1)$. The method is independent of tessellation, which makes it well suited to mesh simplification.
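The combination step can be sketched over a set of candidate viewpoints. The source only says the entropies are "normalized"; min-max normalization across the candidates is our assumption, and the per-view depth and silhouette entropies are taken as inputs.

```python
import numpy as np

def combined_quality(depth_entropy, silhouette_entropy, alpha=0.5):
    """TI(H_p) = (1 - alpha) * NDepthH_p + alpha * NSilH_p for every candidate p.

    Min-max normalization across candidates is an assumption of this sketch.
    """
    def norm(x):
        x = np.asarray(x, dtype=float)
        span = x.max() - x.min()
        return np.zeros_like(x) if span == 0 else (x - x.min()) / span
    return (1 - alpha) * norm(depth_entropy) + alpha * norm(silhouette_entropy)
```

Choosing α below 0.5 favors depth entropy; the best viewpoint is simply the argmax of the returned array.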
(4) Other information entropy methods
Because Shannon entropy can lead to hard minimization problems, Vázquez proposed a multi-scale method based on the Gull and Skilling entropy:
$$H_w(X) = -\sum_{l=1}^{L}\sum_{k=0}^{N_l-1}\sum_{i=1}^{n} p\left(w_{(l,k)i}\right)\log p\left(w_{(l,k)i}\right)$$
Here $w_{l,k}$ are the wavelet transform coefficients and $L$ is the number of scale levels. This method can produce distorted results when the tessellation is not uniform (Figure 9).
3.1.5 Method based on Kullback-Leibler distances
Sbert proposed a viewpoint selection algorithm based on the Kullback-Leibler (K-L) distance to find the most representative viewpoints[15]. The viewpoint Kullback-Leibler distance (VKL) of a viewpoint $v$ is defined as:
$$D_{KL}\left(p(Z|v), a(Z)\right) = \sum_{z\in Z} p(z|v)\log\frac{p(z|v)}{a(z)}$$
Here $a(z)$ is the area of polygon $z$ divided by the total area, and $p(z|v)$ is the conditional probability given by the normalized projected area of polygon $z$ seen from viewpoint $v$. High-quality views minimize VKL. Although the method is efficient, its results can be incorrect when the object's tessellation is changed without changing its topology.
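A minimal sketch of VKL for a single viewpoint, assuming the projected areas seen from v and the polygon areas are given as arrays in the same polygon order. The base-2 logarithm is our choice, and polygons with p(z|v) = 0 contribute nothing to the sum.

```python
import numpy as np

def vkl(proj_areas, areas):
    """D_KL(p(Z|v) || a(Z)) for one viewpoint v.

    proj_areas[z]: projected area of polygon z seen from v;
    areas[z]: area of polygon z on the model.
    """
    p = np.asarray(proj_areas, dtype=float)
    a = np.asarray(areas, dtype=float)
    p, a = p / p.sum(), a / a.sum()
    mask = p > 0                        # terms with p(z|v) = 0 vanish
    return float(np.sum(p[mask] * np.log2(p[mask] / a[mask])))
```

A view whose projected-area distribution matches the true area distribution has VKL = 0, which is why the most representative views are those with minimal VKL.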
3.2 Methods based on visual features
Methods based on visual features mainly rely on visual attributes such as silhouette, curvature, mesh saliency, and volume features. They measure visual characteristics efficiently, but ignore important geometric information of the scene.
Feixas proposed a novel theory of object recognition[12] that regards semantics as more important than geometry in visual perception. The theory inspired many other algorithms, such as Mortara's semantic segmentation method[19], Shilane's semantic viewpoint method based on salient regions[20], and Fu's semantic method based on the vertical direction[21]. Later algorithms include Leifman's semantic method based on regions of interest[26], Sokolov's viewpoint selection method based on speculation[45], and Chen's method based on Schelling points[49].
3.2.1 Viewpoint evaluation based on visual perception theory
Loken considered that human visual perception observes a scene as a whole rather than as isolated components[50]. Kucerova studied the influence of human intuition in virtual tours and combined Gestalt psychology with visual perception theory to evaluate viewpoint quality[51].
Goldstein described three theories of the visual perception mechanism: phenomenalism, indirect realism, and direct realism[52]. Building on this, Coe proposed visual organization principles based on Gestalt psychology[53].
(1) Visual attention model
Based on studies of the visual behavior and neuronal architecture of early primates, Itti proposed a well-known model of visual attention built on the "center-surround" mechanism[54], implemented as a linear center-surround operation[55]. Kucerova used Itti's method to compute mesh saliency in Matlab and applied the watershed transform to segment the saliency map[51] (Figure 10).
(2) Visual perception model based on visual preference
Polonsky proposed the concept of view descriptors, i.e., standardized measures of viewpoint attributes, which can be combined into a unified measurement function[18]. Vieira trained support vector machines on view descriptors to select more suitable viewpoints[22]. Secord defined several descriptors of view quality and combined them into a global measurement function[37], grouping the attributes into five categories: region descriptors, silhouette descriptors, depth descriptors, curvature descriptors, and semantic descriptors.
3.2.2 Viewpoint quality evaluation based on curvature
Barral and Noser analyzed the characteristics of visible surfaces using histograms[9,55]. Building on Barral's method, Sokolov proposed a viewpoint quality estimation method based on total curvature[45]:
$$I(p) = \sum_{v\in V(p)}\left|2\pi - \sum_{\alpha_i\in\alpha(v)}\alpha_i\right| \cdot \sum_{f\in F(p)} P(f)$$
Here $F(p)$ is the set of patches visible from $p$, $P(f)$ is the projected area of surface $f$, $V(p)$ is the set of vertices visible from $p$, and $\alpha(v)$ is the set of face angles adjacent to vertex $v$, so the first sum is the total angle-deficit curvature of the visible vertices. The method uses a "point-to-point" rather than a "point-to-region" approach, which improves computational efficiency (Figure 11).
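A minimal sketch of the total-curvature measure, assuming a triangle mesh given as vertex coordinates and index triples, and assuming the visible vertex indices and projected face areas for the viewpoint come from a separate visibility pass. Function names are hypothetical.

```python
import numpy as np

def angle_deficits(verts, faces):
    """|2*pi - sum of incident face angles| for every vertex (discrete curvature)."""
    verts = np.asarray(verts, dtype=float)
    angle_sum = np.zeros(len(verts))
    for tri in faces:
        for k in range(3):
            i, j, l = tri[k], tri[(k + 1) % 3], tri[(k + 2) % 3]
            e1, e2 = verts[j] - verts[i], verts[l] - verts[i]
            cos_angle = e1 @ e2 / (np.linalg.norm(e1) * np.linalg.norm(e2))
            angle_sum[i] += np.arccos(np.clip(cos_angle, -1.0, 1.0))
    return np.abs(2 * np.pi - angle_sum)

def total_curvature_quality(visible_vertices, visible_face_areas, deficits):
    """I(p) = (curvature of visible vertices) * (projected area of visible faces)."""
    return float(deficits[list(visible_vertices)].sum() * np.sum(visible_face_areas))
```

On a closed genus-0 mesh the deficits sum to 4π (discrete Gauss-Bonnet), which gives a quick sanity check: each vertex of a regular tetrahedron has three 60° angles and hence a deficit of π.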
3.2.3 Viewpoint quality evaluation based on viewpoint complexity
Plemenos proposed a method for computing viewpoint complexity that depends on the number, area, orientation, and distance of visible patches[7].
(1) View complexity calculation based on histogram
Plemenos used a histogram to compute viewpoint complexity: each patch is assigned a distinct color, and the histogram of the rendered image gives the proportion of each color in the view (Figure 12).
The complexity of a given viewpoint can then be computed with the projected-area formula of Plemenos and Benayada in Section 3.1.2. In the same way, Sokolov computed viewpoint quality by coloring patches with distinct color IDs; the histogram gives the number of displayed colors and the proportion of each color:
$$I(p) = \frac{\#\,\text{visible colors}}{\#\,\text{used colors}} + \frac{\text{highlighted area}}{\text{screen resolution}}$$
The first term is the fraction of colored polygons that are visible, and the second is the fraction of the screen area they cover.
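A minimal sketch of the histogram measure, assuming the view has already been rendered into an item buffer: a 2D array in which each pixel holds the color ID of the patch drawn there and 0 marks the background. The function name and the background convention are ours.

```python
import numpy as np

def histogram_complexity(item_buffer, used_colors):
    """I(p) = #visible colors / #used colors + covered pixels / screen resolution.

    item_buffer: 2D array of patch color IDs, with 0 reserved for background.
    """
    buf = np.asarray(item_buffer)
    visible = np.count_nonzero(np.unique(buf))   # distinct non-background IDs
    covered = np.count_nonzero(buf)              # pixels showing the object
    return visible / used_colors + covered / buf.size
```

Rendering with unique IDs turns visibility into counting, so the measure costs one render plus one pass over the pixels, independent of scene complexity.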
(2) Viewpoint complexity calculation based on information theory
These methods obtain viewpoint complexity by computing viewpoint entropy:
$$H(S, p) = -\sum_{i=0}^{N_f}\frac{A_i}{A_t}\log\frac{A_i}{A_t}$$
Here $p$ is the current viewpoint, $N_f$ is the number of patches in scene $S$, $A_i$ is the projected area of patch $i$, and $A_t$ is the total area covered by a virtual sphere with adequately distributed viewpoints.
3.2.4 Viewpoint selection method based on mesh saliency
Human visual attention is strongly shaped by subconscious processes, which leads to different attention behaviors. Koch and Ullman argued that salient regions attract visual attention[56]. In 1998, Itti proposed an influential method for computing saliency based on a center-surround mechanism[54].
Early saliency calculation methods were mainly applied to 2D images. Tsotsos selected visually salient regions through attention analysis[57], and Milanese used a non-linear relaxation method to find visual attention cues[58]. Privitera proposed an automatic pre-recognition method to compute regions of interest[59], and Chen proposed a visual perception model to adjust the visual attention area[49]. Suh proposed a method for automatic thumbnail cropping based on saliency[60]. DeCarlo proposed a photo stylization method that extracts regions of interest[61]. Santella simplified meshes and generated non-photorealistic rendering images based on salient regions[62].
Influential achievements in mesh saliency include the following. Lee proposed a classic saliency calculation method based on the center-surround mechanism[17]. Yamauchi used silhouette features to find stable viewpoints[63]. Sbert computed mesh saliency and performed geometric simplification based on viewpoint mutual information[15]. Han proposed a viewpoint selection method based on saliency segmentation that requires no manual intervention[28].
(1) Review of mesh saliency methods
Early methods computed saliency on 2D projections using 3D saliency characteristics. Guy proposed a method to compute edge saliency maps in 2D images[64]. Shashua detected the globally salient regions of 2D images from 3D structure[65]. Ullman analyzed 2D structures based on edge saliency information[56]. Sparse and noisy 3D data are well known to be hard to handle, and several researchers proposed solutions. Medioni used 3D curves, junctions, and surface saliency to compute mesh saliency[64]. Guy and Medioni inferred global perceptual contours from local saliency features[64]. Lee computed 2D projected saliency maps of dynamic 3D scenes based on Itti's algorithm[17]. Frintrop used depth and intensity information from 2D images to detect silhouettes[66]. Kim and Varshney used an eye tracker to record human eye movements for computing salient regions[67].
Curvature entropy can also be used to compute mesh saliency. Yamauchi used the curvature along the principal directions of a 3D object to obtain local saliency[63]. Hisada detected convex ridges and concave valleys by computing the skeleton and finding the non-manifold points of the surface; the method is suitable for saliency calculation at different scales[68].
(2) Mesh saliency computation
Itti proposed a decomposition method based on visual attention[54]. Building on it, Lee developed a mesh saliency model[17]. Kim and Varshney extended Lee's method and studied the correlation between saliency and human eye fixations; Kim's work also confirmed the objectivity of saliency (Figure 13)[67].
Lee's mesh saliency method is based on the "center-surround" operation proposed by Itti (Figure 14, Figure 15)[54]. It computes the mean curvature of each vertex and smooths it with Gaussian filters: for a filter with standard deviation σ, the Gaussian-weighted average of the mean curvature within radius 2σ is computed, and saliency is obtained at different scales by varying σ. The final saliency is the sum of the non-linearly normalized saliency maps over all scale levels.
Lee defined $G(C(v), \sigma)$ as the Gaussian-weighted average of the mean curvature, where $C(x)$ is the mean curvature of vertex $x$ and $N(v, 2\sigma)$ is the set of vertices within distance $2\sigma$ of $v$:
$$G(C(v),\sigma) = \frac{\sum_{x\in N(v,2\sigma)} C(x)\exp\left(-\|x-v\|^2/(2\sigma^2)\right)}{\sum_{x\in N(v,2\sigma)} \exp\left(-\|x-v\|^2/(2\sigma^2)\right)}$$
The support of each Gaussian filter is thus $2\sigma$. If $\sigma_i$ is the standard deviation of the Gaussian filter at scale level $i$, the absolute difference $|G(C(v), \sigma_i) - G(C(v), 2\sigma_i)|$ between the Gaussian-weighted mean curvatures at the two scales is the saliency of vertex $v$ at level $i$, and the combined saliency of $v$ is denoted $\zeta(v)$. Let $P(p)$ be the set of vertices visible from viewpoint $p$; the saliency observed from $p$ is then:
$$U(p) = \sum_{v\in P(p)} \zeta(v)$$
Lee used the scales $\sigma_i \in \{2\varepsilon, 3\varepsilon, 4\varepsilon, 5\varepsilon, 6\varepsilon\}$, where $\varepsilon$ is 0.3% of the length of the bounding-box diagonal. Lee applied the nonlinear suppression operator proposed by Itti to highlight the saliency of unique regions (Figure 16a) and suppress repeated similar peaks[17] (Figure 16b).
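The multi-scale computation can be sketched on a small point set with given mean curvatures. Assumptions of this sketch: neighborhoods use Euclidean distance via an O(n²) distance matrix (fine only for small meshes); the Itti-style suppression is simplified to scaling each level to [0, 1] and multiplying by (1 - m̄)², with m̄ the mean of the positive local maxima other than the global one; and `eps_frac` is exposed so a toy example can use coarser scales than Lee's 0.3%.

```python
import numpy as np

def gaussian_mean(curv, dist, sigma):
    """G(C(v), sigma): Gaussian-weighted mean curvature over N(v, 2*sigma)."""
    w = np.exp(-dist ** 2 / (2 * sigma ** 2)) * (dist < 2 * sigma)
    return (w @ curv) / w.sum(axis=1)       # each vertex is its own neighbor, so no 0/0

def suppress(s, dist, radius):
    """Simplified Itti normalization: scale to [0, 1], then weight by (1 - mbar)^2."""
    if s.max() > 0:
        s = s / s.max()
    near = dist < radius
    is_max = np.array([s[i] >= s[near[i]].max() for i in range(len(s))])
    peaks = s[is_max]
    others = peaks[(peaks > 0) & (peaks < 1.0)]   # local maxima except the global one
    mbar = others.mean() if others.size else 0.0
    return s * (1.0 - mbar) ** 2

def mesh_saliency(verts, curv, eps_frac=0.003, scales=(2, 3, 4, 5, 6)):
    """Sum over scales of suppressed |G(C, sigma_i) - G(C, 2 sigma_i)|."""
    dist = np.linalg.norm(verts[:, None, :] - verts[None, :, :], axis=2)
    eps = eps_frac * np.linalg.norm(verts.max(axis=0) - verts.min(axis=0))  # bbox diagonal
    total = np.zeros(len(verts))
    for k in scales:
        sigma = k * eps
        zeta = np.abs(gaussian_mean(curv, dist, sigma) - gaussian_mean(curv, dist, 2 * sigma))
        total += suppress(zeta, dist, 2 * sigma)
    return total
```

In a flat 9 × 9 grid whose only curved vertex is the center, the center receives the highest saliency; summing the per-vertex values over the vertices visible from p then gives U(p).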
The nonlinear suppression operator reduces the saliency of repeated similar regions, but the computation is heavy, so Lee used gradient descent to speed up the search for the best view. The optimization variables are $(\theta, \varphi)$, the longitude and latitude of the view direction, and the objective function $U(\theta, \varphi)$ is the saliency visible from that direction. The algorithm starts from a random view direction and iterates gradient steps until it reaches a local maximum (Figure 17).
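A minimal sketch of the search over (θ, φ), written as gradient ascent on U (equivalently, descent on -U). The central-difference gradient, fixed step size, iteration count and latitude clamping are our assumptions; `U` is any user-supplied function of longitude and latitude, for example the visible saliency sum.

```python
import numpy as np

def view_direction(theta, phi):
    """Unit vector for longitude theta and latitude phi (radians)."""
    return np.array([np.cos(phi) * np.cos(theta),
                     np.cos(phi) * np.sin(theta),
                     np.sin(phi)])

def refine_view(U, theta, phi, step=0.1, h=1e-4, iters=500):
    """Climb U(theta, phi) with central-difference gradients from a start view."""
    for _ in range(iters):
        g_theta = (U(theta + h, phi) - U(theta - h, phi)) / (2 * h)
        g_phi = (U(theta, phi + h) - U(theta, phi - h)) / (2 * h)
        theta = theta + step * g_theta
        # keep the latitude inside its valid range
        phi = float(np.clip(phi + step * g_phi, -np.pi / 2, np.pi / 2))
    return theta, phi
```

Like the original procedure, this finds only a local maximum; restarting from several random directions and keeping the best result is a common way to reduce that risk.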
Notably, in some special cases the mesh saliency method performs even worse than the curvature method. Some researchers have argued that incorporating appearance attributes such as color, texture, and even reflectance might improve it.
(3) The implicit relationship between visual attention and saliency
Lee's mesh saliency model, built on the center-surround mechanism [54], is widely regarded as a classical method. Kim and Varshney studied the correlation among Lee's mesh saliency model, Parkhurst's purely random model, and the distribution of human eye fixations [67]. Kim's work confirmed that Lee's saliency model correlates more closely with human visual attention than the random model does (Figure 18).
Heckbert and Garland held that the quadric error metric is directly related to surface curvature; however, curvature alone cannot correctly identify salient regions that agree with human visual perception [69].
(4) Mesh saliency calculation based on Jensen-Shannon divergence
Sbert defined the saliency of a polygon as the average dissimilarity between the polygon and its neighboring polygons [15], where the dissimilarity between polygons z_i and z_j is defined as:
$$D(z_i,z_j)=JS\left(\frac{p(z_i)}{p(\hat{z})},\frac{p(z_j)}{p(\hat{z})};\,p(V\mid z_i),\,p(V\mid z_j)\right)$$
Here, p(V|z_i) denotes the conditional probability distribution over the viewpoints v on the viewpoint sphere given polygon z_i, derived from the normalized projected area of z_i seen from each viewpoint. JS(·) denotes the Jensen-Shannon divergence between p(V|z_i) and p(V|z_j), weighted by p(z_i)/p(ẑ) and p(z_j)/p(ẑ), respectively, where ẑ is the union of z_i and z_j, so that p(ẑ) = p(z_i) + p(z_j). When the JS divergence between two polygons is small, the two polygons are considered similar. The saliency of polygon z_i is defined as:
$$S(z_i)=\frac{1}{N_0}\sum_{j=1}^{N_0}D(z_i,z_j)\geq 0$$
where z_j is a neighbor of z_i and N_0 is the number of neighbors of z_i. The higher the mean JS divergence between z_i and its neighbors, the higher the saliency of z_i.
Sbert used the conditional probabilities of the reverse channel to project the polygon saliency onto the viewpoint sphere. The saliency of viewpoint v is defined as:
$$S(v)=\sum_{z\in Z}S(z)\,p(v\mid z)$$
Figure 20 shows the most and least salient views computed by Sbert's method.
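The Jensen-Shannon dissimilarity and the two saliency definitions above can be sketched on a small viewpoint channel. The sketch assumes that p(v) and the matrix p(z|v) (rows are viewpoints, entries are normalized projected areas) are already known, that every polygon is seen by at least one viewpoint so p(z) > 0, and that polygon adjacency lists are given; logarithms are base 2. The function names are hypothetical.

```python
import math

def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def js_divergence(w1, w2, d1, d2):
    """Weighted Jensen-Shannon divergence JS(w1, w2; d1, d2)."""
    mix = [w1 * a + w2 * b for a, b in zip(d1, d2)]
    return entropy(mix) - w1 * entropy(d1) - w2 * entropy(d2)

def reverse_channel(p_v, p_z_given_v):
    """Bayes' rule: p(z) and p(v|z), the latter indexed [z][v]."""
    n_v, n_z = len(p_v), len(p_z_given_v[0])
    p_z = [sum(p_v[v] * p_z_given_v[v][z] for v in range(n_v)) for z in range(n_z)]
    p_v_given_z = [[p_v[v] * p_z_given_v[v][z] / p_z[z] for v in range(n_v)]
                   for z in range(n_z)]
    return p_z, p_v_given_z

def polygon_saliency(p_v, p_z_given_v, neighbors):
    """S(z_i): mean of D(z_i, z_j) over the neighbors z_j of z_i."""
    p_z, p_vz = reverse_channel(p_v, p_z_given_v)
    sal = []
    for i, nbrs in enumerate(neighbors):
        total = 0.0
        for j in nbrs:
            p_hat = p_z[i] + p_z[j]              # p(z_hat) for the merged pair
            total += js_divergence(p_z[i] / p_hat, p_z[j] / p_hat, p_vz[i], p_vz[j])
        sal.append(total / len(nbrs))
    return sal

def viewpoint_saliency(p_v, p_z_given_v, sal):
    """S(v) = sum_z S(z) p(v|z)."""
    _, p_vz = reverse_channel(p_v, p_z_given_v)
    return [sum(sal[z] * p_vz[z][v] for z in range(len(sal))) for v in range(len(p_v))]

# Two equally likely viewpoints and a strip of three polygons 0-1-2.
p_v = [0.5, 0.5]
p_z_given_v = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]
s_z = polygon_saliency(p_v, p_z_given_v, [[1], [0, 2], [1]])
s_v = viewpoint_saliency(p_v, p_z_given_v, s_z)
```

By construction, the divergence is zero for two polygons that are seen identically from all viewpoints, so only polygons whose visibility pattern differs from that of their neighbors become salient.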
(5) Viewpoint score based on saliency segmentation
Mortara introduced a semantics-driven viewpoint scoring formula [64]. It uses semantically segmented parts as computing units and assigns the following score to a viewpoint w:
$$Score(w)=N_{vf}(w)\sum_{c}V(c,w)\,R(c)\,W(c)$$
Here, N_vf(w) is the total number of patches visible from the current viewpoint, c denotes a visible segment, R(c) is the ratio of the area of this patch to that of the whole mesh, and W(c) is the weight assigned to the patch. V(c,w) is the visibility, computed as the ratio of the projected area to the total surface (Figure 21).
In this method, the influence of low-saliency patches is negligible. Because R is the ratio of a patch to the whole mesh, it is not suitable for weighting semantic importance; moreover, the covered area is already accounted for in the mesh saliency. Han therefore simplified the score as follows, where S(c) is the mesh saliency of segment c:
$$Score(w)=\sum_{c}V(c,w)\,S(c)$$
Han combined mesh saliency with high-level global structural features to segment the mesh [28]. The result is more consistent with human perception (Figure 22), but the segmentation depends too heavily on mesh saliency.
(6) Saliency detection method based on self-diffusion function
Gębal et al. proposed the Auto Diffusion Function (ADF) and applied it successfully to 3D shape retrieval [70]. It captures salient points well. The ADF is defined as a weighted sum of the squared eigenfunctions of the Laplace-Beltrami operator (LBO), as shown in equation (26).
$$ADF_t(x)=K\left(x,x,\frac{t}{\lambda_2}\right)=\sum_{n=0}^{\infty}\exp\left(-\frac{t\lambda_n}{\lambda_2}\right)h_n^{2}(x)$$
The ADF depends only on the eigenvalues and eigenfunctions of the LBO. The local extrema of the ADF have been shown to coincide with natural interest points (NIPs) noticed by human observers, namely protruding points (Figure 23).
It uses the 3D Harris detector to extract points of maximal Gaussian and principal curvature, and filters them by clustering and thresholding to compute the salient regions.
Nassima proposed a 3D mesh watermarking method based on mesh saliency [71], which uses a 3D salient point detector built on the auto diffusion function. Nassima simulated heat diffusion with the heat equation to find salient regions; this is called the heat kernel (HK) method. The heat kernel can be obtained by spectral decomposition of the Laplace-Beltrami operator, using its eigenvalues λ_n and eigenfunctions h_n:
$$K_t(x,y)=\sum_{n=0}^{\infty}\exp(-\lambda_n t)\,h_n(x)\,h_n(y)$$
Here, K_t(x,y) denotes the probability of transfer from vertex x to vertex y in time t. The heat kernel is invariant under isometric deformations. Nassima computed the geodesic distance on the manifold as follows:
$$T_0(x,y)=\min_{\gamma}\int_{0}^{P}\sqrt{\gamma'(t)^{T}H(\gamma(t))\,\gamma'(t)}\,dt$$
T_0(x,y) is the length of the shortest continuous path between the endpoints x and y, where γ(0) = x, γ(P) = y, and γ'(t) is the derivative of the parametric curve γ(t). H(·) denotes the metric, which is generally taken as H(·) = 1. The surface is then partitioned into a geodesic Voronoi diagram (GVD). Figure 24 shows examples of geodesic Voronoi diagrams for several models: the salient points are shown as small red dots, and the extracted regions are colored with warm colors.
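Both the heat kernel and the ADF follow directly from an eigendecomposition of the Laplacian. The sketch below assumes a small graph whose combinatorial Laplacian L = D − W stands in for the Laplace-Beltrami operator (a real mesh would normally use a cotangent discretization), and it computes the eigenpairs with a plain cyclic Jacobi rotation method so that only the standard library is needed. Normalizing by λ_2 follows the ADF equation above with eigenvalues indexed from λ_0 = 0; all names are hypothetical.

```python
import math

def jacobi_eigh(a, tol=1e-20, max_sweeps=100):
    """Eigenpairs of a symmetric matrix by cyclic Jacobi rotations.
    Returns (eigenvalues, eigenvectors) sorted ascending; vecs[n] is h_n."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda i: a[i][i])
    return [a[i][i] for i in order], [[v[k][i] for k in range(n)] for i in order]

def laplacian(n, edges):
    """Combinatorial Laplacian L = D - W of an undirected unit-weight graph."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][j] -= 1.0
        L[j][i] -= 1.0
        L[i][i] += 1.0
        L[j][j] += 1.0
    return L

def heat_kernel(lams, vecs, t, x, y):
    """K_t(x, y) = sum_n exp(-lambda_n t) h_n(x) h_n(y)."""
    return sum(math.exp(-lam * t) * h[x] * h[y] for lam, h in zip(lams, vecs))

def adf(lams, vecs, t, x):
    """ADF_t(x) = K(x, x, t / lambda_2), eigenvalues indexed from lambda_0 = 0."""
    return heat_kernel(lams, vecs, t / lams[2], x, x)

# A 5-vertex path: the two endpoints play the role of protrusions.
lams, vecs = jacobi_eigh(laplacian(5, [(0, 1), (1, 2), (2, 3), (3, 4)]))
values = [adf(lams, vecs, 0.5, x) for x in range(5)]
```

On this toy path, heat escapes more slowly from the endpoints than from the middle vertex, so the endpoints get larger ADF values, which mirrors the observation that ADF extrema sit on protruding points.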
(7) Saliency detection of viewpoint correlation
In many interactive applications, 3D models must be visualized dynamically as the user's viewpoint changes. In 2017, Jeong and Sim proposed a view-dependent saliency detection method for semi-regular meshes that dynamically finds salient patches [72], which addresses this problem effectively.
(8) Random walk algorithm
Jeong adopted the random walk (RW) algorithm to build a graph-based model of random motion [73]. With N and E denoting the node set and the edge set, respectively, a weighted graph G(N, E) is constructed according to the viewing direction. A weight w_ij is assigned to each edge e_ij ∈ E, and a |N|×|N| transition matrix P of a Markov chain is constructed to simulate the random walk, where P(i,j) is the probability that a random walker moves from the j-th node to the i-th node:
$$P(i,j)=\begin{cases}w_{ij}/W_j, & \text{if } e_{ij}\in E\\ 0, & \text{otherwise}\end{cases}$$
where W_j = Σ_k w_kj. For a connected graph, the Markov chain has a unique stationary distribution π satisfying π = Pπ:
$$\pi=[\pi(1),\pi(2),\ldots,\pi(|N|)]^{T}$$
π(i) is the probability that the walker visits the i-th node; it can be obtained by repeatedly multiplying by the transition matrix P.
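The stationary distribution can be computed by power iteration, exactly as the last sentence describes. The sketch assumes a small dense weight matrix `w` in which `w[i][j]` is the weight of the edge from node j to node i (zero meaning no edge) and every node has at least one incident edge, so that no column sum W_j is zero. The names and weights are hypothetical.

```python
def transition_matrix(w):
    """P(i, j) = w[i][j] / W_j with W_j = sum_k w[k][j] (column-stochastic)."""
    n = len(w)
    col_sum = [sum(w[k][j] for k in range(n)) for j in range(n)]
    return [[w[i][j] / col_sum[j] if w[i][j] > 0 else 0.0 for j in range(n)]
            for i in range(n)]

def stationary_distribution(P, tol=1e-12, max_iter=10000):
    """Power iteration pi <- P pi from the uniform distribution until pi = P pi."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(P[i][j] * pi[j] for j in range(n)) for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi

# Hypothetical symmetric weights among three patches.
w = [[0.0, 1.0, 2.0],
     [1.0, 0.0, 3.0],
     [2.0, 3.0, 0.0]]
pi = stationary_distribution(transition_matrix(w))
```

For symmetric weights, the result is proportional to each node's total edge weight (here 3 : 4 : 5), which makes a quick sanity check.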
View-dependent saliency calculation:
Jeong defined the visibility η^L_{i,v} of each patch f^L_i from viewpoint v using the angle φ(f^L_i, v) between the direction from the patch center p(f^L_i) to viewpoint v and the patch normal ρ(f^L_i):
$$\phi(f_i^L,v)=\arccos\frac{(v-p(f_i^L))\cdot\rho(f_i^L)}{\|v-p(f_i^L)\|\,\|\rho(f_i^L)\|}$$
If φ(f^L_i, v) is greater than 90°, patch f^L_i is invisible from viewpoint v. The visibility η^L_{i,v} of a patch f^L_i at the finest level L is defined as:
$$\eta_{i,v}^{L}=\begin{cases}\cos\phi(f_i^L,v), & \text{if } 0\le\phi(f_i^L,v)\le 90^{\circ}\\ 0, & \text{otherwise}\end{cases}$$
At a coarser level l, the visibility of a face f^l_i is estimated as the mean visibility of its four sub-faces:
$$\eta_{i,v}^{l}=\frac{1}{4}\sum_{k=1}^{4}\eta_{i,k,v}^{l+1}$$
where η^{l+1}_{i,k,v} denotes the visibility of sub-face f^{l+1}_{i,k}. As a result, salient regions are highlighted for a specific viewpoint, while back-facing regions that are invisible from that viewpoint are effectively suppressed (Figure 25).
Jeong and Sim constructed a fully connected directed graph G^l_v for the random walk, whose nodes are the patches with non-zero visibility. Simulating the RW on this graph yields the stationary distribution π^l_v [74]:
$$\pi_v^l=P_v^l\,\pi_v^l$$
Here, the transition matrix P^l_v is constructed from the edge weights. π^l_v is then normalized to π̄^l_v, which represents the saliency distribution observed from viewpoint v.
Using the multi-scale saliency rule (Figure 26), the view-dependent saliency distribution at the base level, S^0_{face,v}, is obtained from the normalized stationary distribution π̄^0_v. At each finer level l, a distribution b^l_v is propagated from the saliency at level l−1:
$$b_v^l(i)=\frac{\sum_{f_k^l\in\Phi_v(f_i^l)}S_{face,v}^{l-1}\left(u(f_k^l)\right)\exp\left(-\frac{d(f_i^l,f_k^l)^2}{k_3\delta_l^2}\right)}{\sum_{f_k^l\in\Phi_v(f_i^l)}\exp\left(-\frac{d(f_i^l,f_k^l)^2}{k_3\delta_l^2}\right)}$$
Here, Φ_v(f^l_i) is the subset of the neighborhood Φ(f^l_i) that is visible from viewpoint v, u(f^l_k) denotes the parent face of f^l_k at level l−1, d(·,·) is the distance between two faces, and k_3 and δ_l control the width of the Gaussian. Similarly, b^l_v is normalized to b̄^l_v, and the saliency distribution S^l_{face,v} is obtained as:
$$S_{face,v}^{l}(i)=\max\left(\bar{\pi}_v^l(i),\,\bar{b}_v^l(i)\right)$$
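The coarse-to-fine step can be sketched as follows. The sketch assumes that the caller provides the coarse-level saliency, a parent index u(f) for every fine face, a face-distance function d, and the visible neighbor lists Φ_v(f_i) (each including f_i itself). It also assumes that "normalize" means scaling to sum 1, which the text does not specify. All names and values are hypothetical.

```python
import math

def propagate_from_coarse(parent_sal, parent_of, dist, visible_nbrs, k3, delta):
    """b_v^l(i): Gaussian-weighted average of the coarse saliency
    S^{l-1}(u(f_k)) over the visible neighbors f_k of face i."""
    b = []
    for i, nbrs in enumerate(visible_nbrs):
        num = den = 0.0
        for k in nbrs:
            w = math.exp(-dist(i, k) ** 2 / (k3 * delta ** 2))
            num += parent_sal[parent_of[k]] * w
            den += w
        b.append(num / den if den > 0 else 0.0)
    return b

def normalize(xs):
    """Assumed normalization: scale so the values sum to 1."""
    s = sum(xs)
    return [x / s for x in xs] if s > 0 else list(xs)

def fuse_levels(pi_l, b_l):
    """S_face^l(i) = max(normalized pi(i), normalized b(i))."""
    return [max(a, c) for a, c in zip(normalize(pi_l), normalize(b_l))]

pos = [0.0, 1.0, 2.0, 3.0]                 # hypothetical 1D face positions
b = propagate_from_coarse(
    parent_sal=[0.8, 0.2], parent_of=[0, 0, 1, 1],
    dist=lambda i, k: abs(pos[i] - pos[k]),
    visible_nbrs=[[0, 1], [0, 1, 2], [1, 2, 3], [2, 3]],
    k3=1.0, delta=1.0)
fused = fuse_levels([0.1, 0.1, 0.1, 0.7], b)
```

Taking the element-wise maximum lets a face stay salient if either the current-level random walk or the inherited coarse saliency marks it as such.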
(9) Mesh saliency method based on deep learning
Lau adopted a learning-to-rank algorithm to compute mesh saliency [75,76]. Lau considered that a 3D shape can be represented by depth images rendered from multiple viewpoints. Patches are extracted from the depth images, saliency is assigned to each patch center, and the patches are ranked by a neural network according to their saliency scores.
Collecting mesh saliency data: Lau collected saliency data from human subjects [75]. Images rendered from the 3D meshes were generated first, and subjects were asked to annotate them through Amazon Mechanical Turk HITs (Figure 27a-c). The study involved 118 subjects and 4200 data samples.
Multi-view deep ranking and network design: Manfred adopted a learning-to-rank formulation and solved the saliency ranking problem with the backpropagation algorithm. The network structure is shown in Figure 28, where x denotes a patch sampled twice from a depth image and y denotes the saliency value of the patch center. Each depth image is 300×300 pixels. Manfred computed the saliency of each vertex by combining the results from multiple viewpoints and optimized the network by gradient descent (Figure 28).
Deep ranking formulation and backpropagation:
Manfred learned W and b by minimizing the following loss function:
$$L(W,b)=\frac{1}{2}\|W\|_2^2+\frac{C_{param}}{|I_{train}|}\sum_{(x_A,x_B)\in I_{train}}l_1(y_A-y_B)+\frac{C_{param}}{|\varepsilon_{train}|}\sum_{(x_C,x_D)\in\varepsilon_{train}}l_2(y_C-y_D)$$
Here, ||W||²₂ is the L2 regularization term that prevents overfitting, C_param is a hyperparameter, I_train and ε_train are the training sets, and i denotes the index of the viewpoint. |I_train| and |ε_train| are the numbers of elements in I_train and ε_train, respectively. l_1(t) and l_2(t) are loss functions, and y_A = h_{W,b}(x_A) is the network output. With l_1(t) = max(0, 1 − t)² and l_2(t) = t², the l_1 term penalizes pairs (x_A, x_B) in I_train whose score difference falls below the margin of 1 (x_A should rank above x_B), and the l_2 term penalizes score differences between the pairs in ε_train. Manfred minimized L(W, b) with batch gradient descent (BGD) and backpropagation. For each (x_A, x_B) ∈ I_train, y_A and y_B are obtained by forward propagation with the current (W, b); the same forward propagation is applied to each (x_C, x_D) ∈ ε_train, giving four network copies. Each copy is then processed by backpropagation, starting with δ for the output layer:
$$\delta_i^{(n_l)}=y(1-y)$$
$$\delta_i^{(l)}=\left(\sum_{k=1}^{s_{l+1}}\delta_k^{(l+1)}w_{ki}^{(l)}\right)\left(1-\left(a_i^{(l)}\right)^2\right)$$
Gradient descent: the partial derivatives of the loss with respect to the weights are:
$$\frac{\partial L}{\partial w_{ij}^{(l)}}=w_{ij}^{(l)}+\frac{2C_{param}}{|I_{train}|}\sum_{(A,B)\in I_{train}}\max(0,1-y_A+y_B)\,chk(y_A-y_B)\,\delta_{A,i}^{(l+1)}a_{A,j}^{(l)}-\frac{2C_{param}}{|I_{train}|}\sum_{(A,B)\in I_{train}}\max(0,1-y_A+y_B)\,chk(y_A-y_B)\,\delta_{B,i}^{(l+1)}a_{B,j}^{(l)}+\frac{2C_{param}}{|\varepsilon_{train}|}\sum_{(C,D)\in\varepsilon_{train}}(y_C-y_D)\left(\delta_{C,i}^{(l+1)}a_{C,j}^{(l)}-\delta_{D,i}^{(l+1)}a_{D,j}^{(l)}\right)$$
where chk(t) = 0 when t ≥ 1 and chk(t) = −1 otherwise. Before backpropagation, chk(y_A − y_B) is checked for each pair (x_A, x_B); if chk(y_A − y_B) = 0, the pair contributes no gradient and backpropagation is skipped for it. The partial derivatives with respect to b are computed in the same way.
Batch gradient descent starts by initializing W and b and then updates them with learning rate α:
$$w_{ij}^{(l)}=w_{ij}^{(l)}-\alpha\frac{\partial L}{\partial w_{ij}^{(l)}}$$
$$b_i^{(l)}=b_i^{(l)}-\alpha\frac{\partial L}{\partial b_i^{(l)}}$$
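To make the loss and the chk(·) gate concrete, the sketch below swaps Manfred's CNN for a hypothetical linear scorer y = w·x + b, so that ∂y/∂w = x plays the role of δ^{(l+1)}a^{(l)}. It follows the loss and update rules above with l_1(t) = max(0, 1 − t)² and l_2(t) = t²; the training pairs, C_param, α, and the number of epochs are illustrative assumptions. For a linear scorer, the bias cancels in every score difference, so ∂L/∂b = 0 and b is left unchanged.

```python
def chk(t):
    """0 when the pair already satisfies the margin (t >= 1), else -1."""
    return 0.0 if t >= 1 else -1.0

def l1(t):
    return max(0.0, 1.0 - t) ** 2

def l2(t):
    return t * t

def score(w, b, x):
    """Hypothetical linear stand-in for the network output h_{W,b}(x)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def loss(w, b, I, E, C):
    reg = 0.5 * sum(wi * wi for wi in w)
    rank = C / len(I) * sum(l1(score(w, b, xa) - score(w, b, xb)) for xa, xb in I)
    same = C / len(E) * sum(l2(score(w, b, xc) - score(w, b, xd)) for xc, xd in E)
    return reg + rank + same

def grad_w(w, b, I, E, C):
    g = list(w)                                    # from 0.5 * ||w||^2
    for xa, xb in I:
        t = score(w, b, xa) - score(w, b, xb)
        if chk(t) == 0.0:                          # margin met: no gradient
            continue
        coef = 2 * C / len(I) * max(0.0, 1 - t) * chk(t)
        g = [gk + coef * (a - c) for gk, a, c in zip(g, xa, xb)]
    for xc, xd in E:
        t = score(w, b, xc) - score(w, b, xd)
        coef = 2 * C / len(E) * t
        g = [gk + coef * (a - c) for gk, a, c in zip(g, xc, xd)]
    return g

def train(I, E, dim, C=10.0, alpha=0.01, epochs=2000):
    """Batch gradient descent w <- w - alpha * dL/dw (b cancels, see above)."""
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        g = grad_w(w, b, I, E, C)
        w = [wi - alpha * gi for wi, gi in zip(w, g)]
    return w, b

# Toy pairs: feature 0 drives saliency, feature 1 should not matter.
I_train = [((1.0, 0.0), (0.0, 0.0)), ((2.0, 0.0), (1.0, 0.0))]
E_train = [((0.0, 1.0), (0.0, 0.0))]
w, b = train(I_train, E_train, dim=2)
```

With these toy pairs, the optimum balances the regularizer against the squared hinge, giving w_0 = 20/21 rather than the full margin of 1, while the similar-saliency pair drives w_1 to zero.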
After W and b are learned, the mesh saliency is computed as follows. For each vertex v_i, a set of viewpoints view_j from which v_i is visible is selected. After the patch x_i(view_j) is obtained by the second sampling, the learned W and b are used to compute the saliency value h_{W,b}(x_i(view_j)), and the mean over all these viewpoints is taken as the saliency of v_i (Table 1).
Table 1 Comparison of learning-to-rank results obtained by the RankSVM method and by the CNN-based deep learning-to-rank method proposed by Manfred

| Model | No. of | RankSVM (% error) | Deep ranking (% error) |
|---|---|---|---|
| Mug | 114 | 10.5 | 1.8 |
| Cooking Pan | 181 | 9.4 | 3.3 |
| Screwdriver | 64 | 7.8 | 1.6 |
| Shovel | 88 | 26.1 | 2.3 |
| Cell Phone | 76 | 27.6 | 2.6 |
| Laptop | 23 | 4.3 | 4.3 |
| Alarm Clock | 48 | 12.5 | 2.1 |
| Game Controller | 262 | 3.4 | 1.5 |
| Statue of Dog | 95 | 3.2 | 1.1 |
| Statue of Human | 49 | 10.2 | 4.1 |
Lau evaluated the results with NDCG (normalized discounted cumulative gain). The saliency ranking obtained by the CNN method agreed well with the subjects' saliency rankings, with NDCG scores above 0.92. These results suggest that Lau's CNN method can improve user experience in human-computer interaction and that the saliency information can be used to render the more interesting regions of 3D models (Figures 29 and 30).
This method obtains the saliency distribution without manual annotation and agrees well with subjects' visual perception, but it does not work effectively on unfamiliar shapes.
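NDCG, the metric used in Lau's evaluation, compares a predicted ranking against graded relevance. The sketch uses the common linear-gain form DCG = Σ rel_i / log₂(i + 1). The source does not state which gain or cutoff Lau used, so that choice is an assumption, as are the function names.

```python
import math

def dcg(relevances):
    """DCG = sum_i rel_i / log2(i + 1) with 1-based rank i."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    """DCG of the given order divided by DCG of the ideal (descending) order."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

def ndcg_of_prediction(predicted, human):
    """Rank items by predicted saliency, then score the order with human grades."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
    return ndcg([human[i] for i in order])
```

A score of 1.0 means the predicted order matches the human grades exactly, and errors near the top of the ranking cost more than errors further down because of the logarithmic discount.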
(10) Visual saliency study based on solid objects
Wang et al. studied people's viewing behavior on real 3D-printed objects [77]. They used an eye tracker with a camera to record eye movements, together with a dedicated algorithm that maps the eye-tracking data onto the surface of the solid object. The setup records the viewing behavior and fixation regions of the subjects (Figures 31 and 32).
Wang collected calibrated pupil positions of subject n at time t_k and denoted this relationship as g_n^i(t_k): N → ℙ². If g_n^i(t_k) stays within a radius ρ for at least τ milliseconds, the small region is considered a fixation, and g_n^i(t_k) is mapped to the fixation f_n^i. The parameters ρ, τ, and t control the mapping from a pupil position g_n^i(t_k) to a fixation position f_n^i (Figure 33). The mapping from a point x^i ∈ ℙ³ to a point w_n^i ∈ ℙ³ is:
$$w_n^i=R_n^i x^i+t_n^i$$
The inverse mapping from a position w_n ∈ ℙ³ to a pupil position p_n can be expressed as:
$$s\begin{pmatrix}p_n\\1\end{pmatrix}=Q_n\begin{pmatrix}w_n\\1\end{pmatrix},\quad Q_n\in\mathbb{R}^{3\times 4}$$
where s is a scale factor and Q_n is determined from a set of correspondences {p_i, w_i} obtained during calibration. Q_n is a projective transformation that can be decomposed into an intrinsic camera matrix A_n^Q and a rigid transformation T_n^Q = (R_n^Q, t_n^Q) ∈ ℙ^{3×4}, where R_n^Q and t_n^Q denote the rotation and the translation, respectively. The relationship is:
$$f_n^i=A_n^Q R_n^Q r_n^i$$
where f_n^i is a fixation point and r_n^i is a ray cast from the world camera toward the object. Wang defined the mapping between a vertex v_a of mesh M and the fixation position f_n^i closest to the pupil coordinate as:
$$\hat{p}_a=Q_n\left(R_n^i v_a+t_n^i\right)$$
The set of vertices v_a of M that fall within an angular aperture c around the fixation is:
$$\Gamma_c(f_n^i)=\left\{v_a\in M\;\middle|\;\frac{(\hat{f}_n^i)^{T}M_n^Q\,\hat{p}_a}{\left((\hat{f}_n^i)^{T}M_n^Q\hat{f}_n^i\right)^{1/2}\left(\hat{p}_a^{T}M_n^Q\hat{p}_a\right)^{1/2}}>\cos c\right\}$$
where f̂_n^i = (f_n^i, 1), and Γ_c(f_n^i) is computed with M_n^Q = (A_n^Q (A_n^Q)^T)^{-1} (Figure 34).
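The aperture test is a cosine between viewing rays: with M = (A Aᵀ)⁻¹, the product f̂ᵀ M p̂ equals the dot product of the back-projected rays A⁻¹f̂ and A⁻¹p̂. The sketch assumes that the projected vertices have already been dehomogenized to 2D image points in front of the camera and that A is an invertible 3×3 intrinsic matrix. The intrinsic values and points in the example are hypothetical.

```python
import math

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(r) for r in zip(*a)]

def inverse3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]

def quad(u, m, v):
    """Bilinear form u^T M v for 3-vectors."""
    return sum(u[i] * m[i][j] * v[j] for i in range(3) for j in range(3))

def aperture_set(A, fixation, projected, c):
    """Indices of projected points whose ray lies within angle c of the fixation ray."""
    M = inverse3(mat_mul(A, transpose(A)))      # M = (A A^T)^-1
    f_hat = (fixation[0], fixation[1], 1.0)
    ff = quad(f_hat, M, f_hat)
    keep = []
    for idx, p in enumerate(projected):
        p_hat = (p[0], p[1], 1.0)
        cos_angle = quad(f_hat, M, p_hat) / math.sqrt(ff * quad(p_hat, M, p_hat))
        if cos_angle > math.cos(c):
            keep.append(idx)
    return keep

A = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]   # hypothetical intrinsics
inside = aperture_set(A, (320.0, 240.0),
                      [(320.0, 240.0), (820.0, 240.0), (370.0, 240.0)],
                      math.radians(10))
```

Working in the metric M avoids back-projecting every vertex explicitly: the angle is measured between camera rays even though the inputs are pixel coordinates.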
Wang's findings challenge mesh saliency theory: Wang argued that Lee's method may be inappropriate for predicting saliency and questioned Kim's conclusions [67]. In Wang's experiment, however, the 3D printing materials or the physical lighting may have affected the eye-tracking results, so some issues remain controversial.
3.2.5 Viewpoint selection method based on viewpoint mutual information
Vázquez proposed a Kullback-Leibler distance method based on projected area [10]. Feixas proposed the viewpoint mutual information (VMI) method to address occlusion [15]. Vázquez's method is the more typical of the two. Vázquez built a viewpoint information channel and introduced a viewpoint evaluation function. The VMI method obtains the most representative viewpoints and then merges them into a good viewpoint set with minimum viewpoint mutual information.
The method selects the two viewpoints v_1 and v_2 whose distributions are the most similar. p(Z|v_1) and p(Z|v_2) denote the distributions of the polygons Z observed from viewpoints v_1 and v_2, and v̂ denotes the cluster formed by v_1 and v_2, which yields the mixed distribution:
$$\frac{p(v_1)}{p(\hat{v})}p(Z\mid v_1)+\frac{p(v_2)}{p(\hat{v})}p(Z\mid v_2)$$
where p(v̂) = p(v_1) + p(v_2). Repeating this merging step, the cluster v̂ grows to contain the viewpoints v_1, …, v_n and produces the new mixed distribution:
$$\frac{p(v_1)}{p(\hat{v})}p(Z\mid v_1)+\frac{p(v_2)}{p(\hat{v})}p(Z\mid v_2)+\cdots+\frac{p(v_n)}{p(\hat{v})}p(Z\mid v_n)$$
where p(v̂) = p(v_1) + p(v_2) + ⋯ + p(v_n). Merging continues until the ratio I(V̂; Z)/I(V; Z) falls below a given threshold, where I(V; Z) denotes the viewpoint mutual information. Figure 35 shows the six best views obtained with the VMI algorithm.
Because the method analyzes viewpoints using only geometric information, it does not match human visual cognitive habits. This weakness might be addressed by introducing visual perception parameters.
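The greedy merging described above can be sketched as follows. At each step, the sketch merges the pair of clusters whose mixture keeps I(V̂; Z) highest (that is, loses the least information), using the mixture rule above, and it stops before a merge would push I(V̂; Z)/I(V; Z) below the threshold. The inputs p(v) and p(Z|v), this stopping convention, and all names are assumptions for illustration; I(V; Z) must be positive.

```python
import math

def mutual_information(clusters, p_z):
    """I(V;Z) = sum_v p(v) sum_z p(z|v) log2(p(z|v) / p(z))."""
    return sum(p * sum(q * math.log2(q / pz) for q, pz in zip(dist, p_z) if q > 0)
               for _, p, dist in clusters)

def merge(c1, c2):
    """Mixture rule: p(v_hat) = p1 + p2, p(Z|v_hat) = (p1 d1 + p2 d2) / p(v_hat)."""
    (m1, p1, d1), (m2, p2, d2) = c1, c2
    p = p1 + p2
    return (m1 + m2, p, [(p1 * a + p2 * b) / p for a, b in zip(d1, d2)])

def cluster_viewpoints(p_v, p_z_given_v, threshold):
    p_z = [sum(pv * row[z] for pv, row in zip(p_v, p_z_given_v))
           for z in range(len(p_z_given_v[0]))]
    clusters = [([i], pv, list(row)) for i, (pv, row) in enumerate(zip(p_v, p_z_given_v))]
    full = mutual_information(clusters, p_z)       # I(V;Z), assumed > 0
    while len(clusters) > 1:
        best_mi, best = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                trial = [c for k, c in enumerate(clusters) if k not in (a, b)]
                trial.append(merge(clusters[a], clusters[b]))
                mi = mutual_information(trial, p_z)
                if mi > best_mi:
                    best_mi, best = mi, trial
        if best_mi / full < threshold:             # stop before crossing the threshold
            break
        clusters = best
    return [sorted(members) for members, _, _ in clusters]

# Hypothetical channel: viewpoints 0 and 1 see the same polygons, as do 2 and 3.
groups = cluster_viewpoints([0.25] * 4,
                            [[0.5, 0.5, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0],
                             [0.0, 0.0, 0.5, 0.5], [0.0, 0.0, 0.5, 0.5]],
                            threshold=0.9)
```

Redundant viewpoints merge without losing any information, whereas merging the two remaining groups would destroy it entirely, so the loop stops with two clusters.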
3.2.6 Viewpoint mutual information method based on polygon information and mesh saliency
Based on Bayes' rule, Sbert defined the viewpoint mutual information (VMI) as:
$$I(Z;V)=\sum_{z\in Z}p(z)\sum_{v\in V}p(v\mid z)\log\frac{p(v\mid z)}{p(v)}=\sum_{z\in Z}p(z)\,I(z;V)$$
where I(z; V) is the polygon mutual information of polygon z:
$$I(z;V)=\sum_{v\in V}p(v\mid z)\log\frac{p(v\mid z)}{p(v)}$$
The polygon mutual information (PMI) measures the correlation between polygon z and the viewpoint set V. A low PMI value corresponds to a polygon seen by most viewpoints, and a high value to a polygon seen by few viewpoints (Figure 36) [12].
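The PMI of each polygon and the VMI follow from the same channel via Bayes' rule. The sketch assumes that p(v) and p(z|v) are given (rows are viewpoints), that every polygon is visible from at least one viewpoint, and that logarithms are base 2; the names are hypothetical. In the example, polygon 1, which both viewpoints see, gets the lowest PMI.

```python
import math

def pmi_and_vmi(p_v, p_z_given_v):
    """Return [I(z;V) for each polygon z] and I(Z;V) = sum_z p(z) I(z;V)."""
    n_v, n_z = len(p_v), len(p_z_given_v[0])
    p_z = [sum(p_v[v] * p_z_given_v[v][z] for v in range(n_v)) for z in range(n_z)]
    pmi = []
    for z in range(n_z):
        total = 0.0
        for v in range(n_v):
            p_v_given_z = p_v[v] * p_z_given_v[v][z] / p_z[z]   # Bayes' rule
            if p_v_given_z > 0:
                total += p_v_given_z * math.log2(p_v_given_z / p_v[v])
        pmi.append(total)
    return pmi, sum(pz * i for pz, i in zip(p_z, pmi))

pmi, vmi = pmi_and_vmi([0.5, 0.5], [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]])
```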
3.3 Semantic-based viewpoint selection method
The semantics-driven best view selection method evaluates viewpoints by means of semantic segmentation. The evaluated properties include the semantic components of the scene, manual labels, and so on. Best viewpoints obtained this way usually have better semantic characteristics, but automatic segmentation is especially difficult to achieve and often requires manual intervention.
Attene proposed an effective annotation method for virtual 3D objects that employs a hybrid segmentation algorithm to support user-driven selection [19]. Takahashi proposed a feature-driven method to locate the best viewpoints and introduced semantic information into an evaluation function [78]. Mortara computed mesh saliency with semantic characteristics in a given orientation [79].
3.3.1 Semantic-based mesh segmentation
Polonsky computed the best view by identifying a semantic part of the model and maximizing its visible projected area [18]. On this basis, Mortara proposed a semantic segmentation method based on aesthetic characteristics to obtain good viewpoints automatically [79].
(1) Segmentation method based on fitting primitive body
Attene first proposed a hierarchical mesh segmentation method based on fitting primitives [19]. The method applies a clustering algorithm to triangular faces, extending the polygonal surface clustering algorithm of Heckbert and Garland [69]. Attene's method is well suited to anthropomorphic models.
(2) Segmentation based on reeb graph
Morse theory, introduced by Marston Morse and presented systematically by Milnor [80], is of great significance for the study of topological structure. According to Morse theory, the topology of an object M can be encoded in the Reeb graph of a mapping function f, which reflects how the shape of the object evolves. The nodes of the Reeb graph correspond to the critical points of f. This method naturally decomposes shapes into different feature regions.
(3) Semantic method based on morphological features
Mortara proposed the "Tailor" method, which segments meshes at different scales by evaluating the geometric and topological properties of each region [81]. It computes the convexity of vertex neighborhoods and clusters adjacent vertices to segment the mesh. At a higher level, Mortara proposed the "Plumber" method, which decomposes a 3D model into tubular features and body (trunk) parts [81]. The decomposition is determined by the geometric and topological properties of neighboring regions. It is suitable for articulated objects but not for uniform shapes (Figure 37).
3.3.2 Viewpoint selection based on semantics feature
Mortara used a set of discrete viewpoints uniformly distributed around the object and introduced an evaluation function [81] that jointly considers saliency and the semantic characteristics of the scene (Figure 38).
The visibility V of a patch s from a viewpoint w is defined as the percentage of its vertices that are visible from w:
$$V(s,w)=N_v(s,w)/N_s$$
where N_v(s,w) is the number of vertices of s visible from w, and N_s is the total number of vertices of s. Hence 0 ≤ V(s,w) ≤ 1, and s can be observed from w if and only if V(s,w) > 0. For a mesh with N vertices, the computational complexity is O(N log N).
Mortara introduced a feature relevance R to measure the contribution of s to the total surface area:
$$R(s)=\frac{Area(s)}{Area(M)}$$
where M represents the entire model.
Segments should be weighted according to their feature type, because a tiny part may carry important semantics. Mortara proposed the following viewpoint scoring function:
$$Score(w)=N_{vf}(w)\sum_{i}V(s_i,w)\,R(s_i)\,W_t(s_i)$$
where N_vf(w) is the number of features visible from w and W_t(s_i) denotes the weight assigned to the feature type of s_i. This score function makes the identified salient regions more reasonable (Figure 39, Figure 40).
The stability of views obtained this way is strongly affected by the number of segments, and the shape-based annotation process is cumbersome. Optimizing the annotation process with machine learning might improve efficiency.
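Mortara's score and Han's simplified variant from the mesh-saliency section can be compared on a toy segmentation. The sketch assumes that each segment is described by its vertex set, area, and feature type, that visibility is a precomputed set of visible vertex indices, and that N_vf(w) counts the segments with V(s, w) > 0. The type weights and saliency values are hypothetical.

```python
def visibility(visible, seg_verts):
    """V(s, w) = N_v(s, w) / N_s."""
    return len(visible & seg_verts) / len(seg_verts)

def mortara_score(segments, visible, total_area, type_weight):
    """Score(w) = N_vf(w) * sum_i V(s_i, w) R(s_i) W_t(s_i)."""
    vis = [visibility(visible, s["verts"]) for s in segments]
    n_vf = sum(1 for v in vis if v > 0)                 # visible features
    return n_vf * sum(v * s["area"] / total_area * type_weight[s["type"]]
                      for v, s in zip(vis, segments))

def han_score(segments, visible, saliency):
    """Han's variant: sum_c V(c, w) S(c) with per-segment saliency S(c)."""
    return sum(visibility(visible, s["verts"]) * sal
               for s, sal in zip(segments, saliency))

segments = [
    {"verts": {0, 1, 2, 3}, "area": 1.0, "type": "head"},
    {"verts": set(range(4, 12)), "area": 6.0, "type": "body"},
]
weights = {"head": 3.0, "body": 1.0}                    # hypothetical type weights
view_a = set(range(0, 8))                               # head plus half the body
view_b = set(range(4, 12))                              # body only
```

In this toy case, the small but heavily weighted head segment lets view_a win under Mortara's score even though view_b sees more surface area.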
3.3.3 Semantic metrics based on aesthetic characteristics
Blanz observed that people tend to choose viewpoints slightly above the horizon [4]. Gooch proposed a viewpoint selection method based on aesthetic templates [11]. Secord considered that semantics reflects aesthetic features and visual preference [37] and proposed a new algorithm in which, for the i-th of M observed image pairs, n_i denotes the number of subjects and k_i the number of times viewpoint v_i^0 was selected. Secord computed the log-likelihoods of a naïve model and an oracle model and then computed the fitness as follows:
$$L^*[P(v)]=c+\sum_{i=1}^{M}\left[k_i\ln P(v)+(n_i-k_i)\ln\left(1-P(v)\right)\right]$$
$$F[P]=\frac{L^*[P]-L^*_{naive}}{L^*_{oracle}-L^*_{naive}}$$
Secord denoted the goodness of the individual viewpoints as G_0 ≡ G(v_0) and G_1 ≡ G(v_1). Following the Bradley-Terry model (1952), the probability p that a user chooses v_0 over v_1 is:
$$P(G_0,G_1)=\frac{1}{1+e^{-\sigma(G_0-G_1)}}$$
Secord used linear-K models to compute the goodness value, combining k attributes to improve on single-attribute models:
$$G(v)=\sum_{j\in S}w_j a_j$$
where v denotes a viewpoint, S is the set of attribute indices used in the model, a_j is the value of attribute j, and w_j is its weight. Experimental results showed that the method predicts the goodness of viewpoints effectively.
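The preference model, the log-likelihood, and the fitness score above can be sketched together. The sketch takes the naïve model to predict P = 0.5 for every pair and the oracle to predict the observed frequency k_i/n_i, and it sets c = 0. These choices, as well as the attribute names, weights, σ, and counts, are assumptions for illustration, not Secord's data.

```python
import math

def goodness(weights, attributes):
    """Linear model G(v) = sum_{j in S} w_j a_j; S = keys of weights."""
    return sum(w_j * attributes[j] for j, w_j in weights.items())

def p_prefer(g0, g1, sigma=1.0):
    """Bradley-Terry (logistic) probability that v0 is chosen over v1."""
    return 1.0 / (1.0 + math.exp(-sigma * (g0 - g1)))

def log_likelihood(probs, k, n, c=0.0):
    """L* = c + sum_i k_i ln P_i + (n_i - k_i) ln(1 - P_i); zero counts skipped."""
    total = c
    for p, k_i, n_i in zip(probs, k, n):
        if k_i > 0:
            total += k_i * math.log(p)
        if n_i - k_i > 0:
            total += (n_i - k_i) * math.log(1.0 - p)
    return total

def fitness(l_model, l_naive, l_oracle):
    """F = (L*_model - L*_naive) / (L*_oracle - L*_naive)."""
    return (l_model - l_naive) / (l_oracle - l_naive)

# Hypothetical data: two image pairs, each judged by 10 subjects.
w = {"area": 1.0, "silhouette": 0.5}
pairs = [({"area": 0.8, "silhouette": 0.6}, {"area": 0.2, "silhouette": 0.4}),
         ({"area": 0.3, "silhouette": 0.2}, {"area": 0.6, "silhouette": 0.5})]
k, n = [8, 3], [10, 10]
model = [p_prefer(goodness(w, a0), goodness(w, a1), sigma=2.0) for a0, a1 in pairs]
naive = [0.5] * len(k)
oracle = [k_i / n_i for k_i, n_i in zip(k, n)]
F = fitness(log_likelihood(model, k, n), log_likelihood(naive, k, n),
            log_likelihood(oracle, k, n))
```

Because the oracle maximizes the likelihood of the observed choices, F cannot exceed 1, and F = 0 means the model explains the data no better than chance.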
4 Application of viewpoint evaluation
4.1 Radiosity calculation and global illumination optimization based on viewpoint evaluation
Viewpoint evaluation can be widely used in global illumination and rendering. Plemenos used viewpoint complexity information from a given direction to optimize radiosity computation [7].
4.1.1 Radiosity sampling based on the Monte Carlo method
Traditional radiosity is usually computed with the Monte Carlo method, but Monte Carlo sampling may produce noise. To obtain a better light distribution, the noise must be eliminated effectively and the viewpoint complexity of the scene must be computed.
4.1.2 Improvement of Monte Carlo radiosity based on viewpoint complexity
Plemenos and Benayada proposed a light distribution method based on viewpoint complexity [7]. It takes a specific patch as the viewpoint position, iteratively estimates the viewpoint complexity of the adjacent regions, and then distributes radiosity accordingly in the 3D scene.
4.1.3 Raytrace optimization based on viewpoint quality analysis
Dense sampling incurs a tremendous computational cost, and uniform regions require fewer rays than geometrically discontinuous areas. Plemenos adopted a sampling method driven by viewpoint complexity and used viewpoint entropy as the complexity measure (Figure 41).
4.2 Modeling and rendering based on view quality analysis
Plemenos held that the core of viewpoint complexity is computing the minimum set of good viewpoints that can represent the most important characteristics of the scene [15] (Figure 42). The method works as follows:
(1) The viewpoint complexity of a region is defined as the average complexity of the spherical triangle that represents the region.
(2) Each viewpoint is treated as a candidate, and the viewpoint evaluation function is used to eliminate viewpoints from the list iteratively according to their contribution.
(3) If no viewpoint can be eliminated at the end of a step, the process terminates, and the current list contains the minimum set of viewpoints.
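The elimination loop in steps (2) and (3) can be sketched as follows. The sketch makes two assumptions that the text leaves open: a viewpoint counts as removable when every scene element it sees is still seen by some other remaining viewpoint, and among the removable viewpoints the one with the lowest evaluation score is dropped first. Inputs and names are hypothetical.

```python
def minimal_viewpoint_set(visible, quality):
    """Drop redundant viewpoints one at a time until none can be removed.
    visible: view -> set of scene elements it sees.
    quality: view -> evaluation score (lowest removable view goes first)."""
    kept = set(visible)
    while True:
        redundant = [v for v in kept
                     if visible[v] <= set().union(*(visible[u] for u in kept if u != v))]
        if not redundant:          # step (3): nothing else can be eliminated
            return kept
        kept.remove(min(redundant, key=lambda v: quality[v]))   # step (2)

views = {"a": {1, 2}, "b": {2, 3}, "c": {1, 2, 3}, "d": {4}}
scores = {"a": 0.2, "b": 0.3, "c": 0.9, "d": 0.5}
result = minimal_viewpoint_set(views, scores)
```

Removing one view per round and re-checking redundancy matters: views a, b, and c are all redundant at first, but once a and b are gone, c becomes essential.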
4.3 Application of viewpoint complexity in molecular visualization
Plemenos's method has been applied to molecular visualization [7,15]. He proposed that the arrangement of molecules can be observed clearly from a low-complexity viewpoint, which helps infer physical properties from that arrangement, whereas a high-complexity viewpoint shows the arrangement of atoms within molecules, from which chemical properties can be inferred (Figure 43).
Vázquez improved this method, applied it to orthogonal views, and achieved automatic evaluation of molecular structures [10].
4.4 Model simplification based on mesh saliency
4.4.1 Mesh simplification based on visual saliency
Hoppe proposed a method based on geometric distance to improve mesh simplification [82]. Heckbert and Garland introduced a surface simplification method based on the quadric error metric [69]. Cohen proposed an appearance-preserving mesh simplification method [83]. Such methods are suitable for scanned models and simplify them efficiently.
Image-based mesh simplification methods usually create simplified meshes whose appearance is similar to that of the original mesh. Karni and Gotsman proposed a data compression method based on simplified meshes [84]. Kim and Varshney developed a saliency-guided visualization enhancement method for volume rendering [67]. Lindstrom and Turk proposed an image-driven simplification method [85]. Luebke and Hallen put forward a perceptually driven simplification algorithm for interactive rendering [86]. Zhang and Turk proposed a visibility-guided mesh simplification algorithm [87]. Compared with geometric methods, these algorithms carry a much higher computational burden but give better visual results. They are mainly used in fields with high demands on visual quality, such as game animation and virtual military exercises.
4.4.2 Mesh segmentation based on mesh saliency
Felzenszwalb proposed an efficient graph-based image segmentation method [14]. Inspired by this work, Lee proposed a mesh segmentation method based on mesh saliency that captures visually important regions in accordance with the visual perception mechanism [17].
Han replaced the pixel-value difference with the absolute saliency difference between adjacent patches and computed the saliency of each patch as the average saliency of its three vertices [28]. Mesh-saliency-based segmentation is better suited to model segmentation than curvature-based and other geometric methods (Figure 44).
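Han's substitution plugs directly into Felzenszwalb's merging rule. The sketch below is a standard Felzenszwalb-Huttenlocher pass over the face adjacency graph: edges are sorted by |S(f_i) − S(f_j)|, and two components merge when the edge weight does not exceed min(Int(C_1) + k/|C_1|, Int(C_2) + k/|C_2|). The face list, the adjacency pairs, and the constant k are hypothetical inputs supplied by the caller.

```python
def face_saliency(faces, vertex_saliency):
    """Han: a triangle's saliency is the mean of its three vertex saliencies."""
    return [sum(vertex_saliency[v] for v in face) / 3.0 for face in faces]

def segment_by_saliency(face_sal, adjacency, k):
    """Felzenszwalb-Huttenlocher merging with edge weight |S(f_i) - S(f_j)|.
    Returns a component label for every face."""
    parent = list(range(len(face_sal)))
    size = [1] * len(face_sal)
    internal = [0.0] * len(face_sal)   # Int(C): largest edge merged into C

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    edges = sorted((abs(face_sal[i] - face_sal[j]), i, j) for i, j in adjacency)
    for weight, i, j in edges:
        a, b = find(i), find(j)
        if a == b:
            continue
        if weight <= min(internal[a] + k / size[a], internal[b] + k / size[b]):
            if size[a] < size[b]:
                a, b = b, a
            parent[b] = a
            size[a] += size[b]
            internal[a] = weight       # edges arrive in increasing order
    return [find(i) for i in range(len(face_sal))]

# Hypothetical strip of six faces: a low-saliency half and a high-saliency half.
labels = segment_by_saliency([0.1, 0.12, 0.11, 0.9, 0.88, 0.91],
                             [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)], k=0.1)
```

The constant k sets the scale of observation: larger values favor larger segments, because the k/|C| term tolerates bigger saliency jumps between small components.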
5 Selection and application of goodness viewpoints set
5.1 Virtual camera guidance based on goodness viewpoints set
Automatic camera guidance is of great significance in many fields, such as virtual reality, visual servoing, robot control, and graphics rendering. Erdem and Sclaroff combined omnidirectional and directional cameras in a mixed configuration and established a global optimization method to improve the layout [88,89]. Alam and Goodwin used virtual omnidirectional cameras to solve the viewpoint coverage problem heuristically [90]. Becker and Bove proposed a camera configuration method that selects the best camera position using a voting strategy and heuristics [91]. By fixing the viewpoint height and optimizing the camera layout, Yao et al. transformed the 3D problem into a 2D discrete coverage problem [92]. Amriki and Atrey proposed the SmarkMax algorithm to solve the camera association problem quickly, achieving maximum coverage with the minimum number of cameras [93]. The research of Sokolov and Plemenos has been influential in virtual camera guidance. Sokolov proposed an automatic camera roaming method based on a set of good viewpoints and optimized it using semantic distance [45].
Methods of virtual camera control based on good viewpoint sets have also been applied in many patented products. Kahng and Yoon's patent describes a multi-viewpoint virtual reality system for scene understanding [94]. Piemonte's patent describes a method of observing 3D maps with a multi-directional virtual camera [95]. Vandrotti's patent uses a multi-camera virtual reality system to adjust camera roaming automatically [96]. Molina implemented a container-based intelligent control method to optimize virtual camera control [97]. Jeon's patent optimizes the configuration of a virtual camera motion module, including the field of view, line of sight, and motion trajectory [98].
5.1.1 Camera guidance based on trackball control
Barral proposed an automatic camera motion algorithm to support visual understanding [99-101]. Colin researched camera roaming based on polygonal information [47]. Dorme's doctoral thesis systematically studied key technologies of 3D scene understanding and described a variety of camera control methods in detail [102]. Jaubert studied a camera control technique for offline scene exploration [103]. Lu's patent addresses virtual camera control in multidimensional space to obtain the best viewpoints [104]. Scord introduced the classical Trackball Grab method to realize camera guidance based on good viewpoints [45]. During navigation, the camera guides users toward good viewpoints and away from bad ones (Figure 45).
5.1.2 Camera guidance method based on magnets
Sokolov proposed a global real-time camera guidance method (Figure 46). This method adopted a incremental build method to identify a unexplored area,then created the “Magnets” from the preferred viewpoint area[45]. The camera could be attracted by magnets, and the magnetic force of undetected area would attract the camera revert to a guidance path[46]. Sokolov introduced a evaluation function:
$$m(p) = \frac{|N(p \mid P_0^k)|}{|V(p)|}\, I(p)$$
where $P_0^k$ denotes the camera trajectory so far and $N(p \mid P_0^k)$ is the set of new vertices observed from viewpoint p, i.e., the vertices visible from p that have not already been observed along the trajectory. $N(p \mid P_0^k)$ is computed as:
$$N(p \mid P_0^k) = V(p) \setminus \big(V(p) \cap V(P_0^k)\big)$$
$I(p)$ is the driving force of the navigation operator: it pulls the camera toward a trajectory made of good viewpoints. $I(p)$ is calculated as:
$$I(p) = \sum_{v \in V(p)} \Big(2\pi - \sum_{\alpha_i \in \alpha(v)} \alpha_i\Big) \cdot \sum_{f \in F(p)} P(f)$$
where F(p) is the set of faces visible from viewpoint p, P(f) is the projected area of face f, V(p) is the set of vertices visible from p, and α(v) is the set of face angles around vertex v, so that $2\pi - \sum_{\alpha_i \in \alpha(v)} \alpha_i$ is the angle deficit (a discrete measure of Gaussian curvature) at v.
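The scoring above can be sketched in a few lines of Python. This is a minimal illustration, not Sokolov's implementation: it assumes the ratio reading m(p) = |N(p|P_0^k)| / |V(p)| · I(p), and assumes visibility (V(p), F(p)), per-vertex angle sums and projected face areas have already been computed by a renderer. The function names and inputs are hypothetical.

```python
import math


def new_vertices(visible, seen_so_far):
    """N(p | P_0^k) = V(p) \\ (V(p) ∩ V(P_0^k)): vertices not yet seen on the trajectory."""
    return visible - (visible & seen_so_far)


def curvature_quality(visible, angle_sum, face_areas):
    """I(p): total angle deficit of visible vertices times total projected area of visible faces.

    angle_sum[v] is the sum of face angles around vertex v (assumed precomputed);
    face_areas are the projected areas P(f) of the faces in F(p).
    """
    deficit = sum(2 * math.pi - angle_sum[v] for v in visible)
    return deficit * sum(face_areas)


def exploration_score(visible, seen_so_far, angle_sum, face_areas):
    """m(p) = |N(p | P_0^k)| / |V(p)| * I(p)  (ratio form assumed)."""
    if not visible:
        return 0.0
    n_new = len(new_vertices(visible, seen_so_far))
    return n_new / len(visible) * curvature_quality(visible, angle_sum, face_areas)
```

For four visible cube corners (angle sum 3π/2 each, deficit π/2), two of them already seen and projected face areas 1.0 and 0.5, the score is 2/4 · (4 · π/2) · 1.5 = 1.5π: half of the curvature-weighted quality counts as new information.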
Other researchers have also contributed to virtual camera control. Marchand and Courty proposed an image-based camera motion strategy using an incremental construction method[105]. Abner proposed a new camera relocation method based on multi-output learning[106]. These methods have been applied successfully in robot positioning and 3D reconstruction.
5.1.3 Camera path planning based on greedy algorithm
Sokolov used a greedy algorithm to obtain a good (not necessarily optimal) viewpoint set for dynamically planning the camera path. The observation quality of a camera view set $M_k$ is denoted $I_\alpha(M_k)$, and the observation quality of all possible camera views is $I_\alpha(S \times D)$, where S and D are the sets of camera positions and view directions. Initially $M_0 = \emptyset$; at each step the algorithm adds the best view $(s_i, d_i)$, so that $M_i = M_{i-1} \cup \{(s_i, d_i)\}$, where:
$$I_\alpha\big(M_{i-1} \cup \{(s_i, d_i)\}\big) = \max_{x \in S \times D} I_\alpha\big(M_{i-1} \cup \{x\}\big)$$
At each step the camera finds the best view of the still-unexplored regions (Figure 47), so the method achieves intelligent exploration of the virtual scene with low algorithmic complexity.
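The greedy construction $M_i = M_{i-1} \cup \{(s_i, d_i)\}$ can be sketched generically. The quality function is passed in as a parameter; the coverage-style quality in the usage note (number of distinct scene elements seen by a view set) is a hypothetical stand-in for Sokolov's $I_\alpha$, and the candidate labels stand for (position, direction) pairs.

```python
from typing import Callable, Hashable, List, Sequence


def greedy_views(candidates: Sequence[Hashable],
                 quality: Callable[[List[Hashable]], float],
                 budget: int) -> List[Hashable]:
    """Greedily build M_1, ..., M_budget from candidate views x in S x D.

    quality(M) plays the role of I_alpha(M) for a list of views M.
    """
    chosen: List[Hashable] = []            # M_0 = empty set
    for _ in range(min(budget, len(candidates))):
        remaining = [x for x in candidates if x not in chosen]
        # argmax over x of I_alpha(M_{i-1} ∪ {x})
        best = max(remaining, key=lambda x: quality(chosen + [x]))
        chosen.append(best)                # M_i = M_{i-1} ∪ {(s_i, d_i)}
    return chosen
```

For example, if views A, B, C and D see the element sets {1,2,3}, {3,4}, {4,5,6,7} and {1,7}, a coverage quality with a budget of 2 first picks C (4 new elements) and then A (3 more), so each step takes the view that reveals the most still-unexplored content.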
5.2 Viewpoint selection based on deep learning
Cretu used neural gas networks to detect regions of interest on mesh surfaces in 2016[107]. Martinetz realized multi-dimensional vector clustering with a neural gas network[108]. Monette developed a neural-network-based mesh feature selection method and applied it to 3D densification modeling[109]. These methods proved novel and effective.
Cretu's method is representative of this work[107]. Its input vectors $x = [X\ Y\ Z]$ are drawn from a limited training data set. When a new input vector x is presented to the network, an index list $(j_0, j_1, \ldots, j_{N-1})$ is created that orders the nodes by neighborhood, i.e., by the distance of their weight vectors to x. During learning, the network adapts the nodes according to their rank in this list; the best-matching neuron s(x) is the node with the minimum Euclidean distance to x. At time t, the weights are updated as:
$$w_j(t+1) = w_j(t) + \alpha(t)\, h_\lambda\big(k_j(x, w_j)\big)\,\big(x(t) - w_j(t)\big)$$
where $\alpha(t) \in [0, 1]$ is the learning rate, $k_j(x, w_j)$ is the rank of node j (0 for the best-matching node) when the nodes are sorted by the distance of their weight vectors $w_j$ to x, and $h_\lambda$ is the neighborhood function:
$$h_\lambda\big(k_j(x, w_j)\big) = \exp\big(-k_j(x, w_j)/\lambda(t)\big)$$
The network minimizes the cost function:
$$E(w, \lambda) = \frac{1}{2C(\lambda)} \sum_{j=1}^{N} \int P(x)\, h_\lambda\big(k_j(x, w)\big)\, (x - w_j)^2 \, dx$$
where P(x) is the distribution of the input vectors x, and:
$$C(\lambda) = \sum_{k=0}^{N-1} h_\lambda(k)$$
The parameter λ can be regarded as a temperature. At low temperature, only the winning node contributes to the cost; at high temperature, many nodes contribute to the cost of the network. The learning rate α(t) and the neighborhood range λ(t) are time-dependent, and during learning both must decrease slowly to ensure convergence:
$$\alpha(t) = \alpha_0 \left(\frac{\alpha_T}{\alpha_0}\right)^{t/T}, \qquad \lambda(t) = \lambda_0 \left(\frac{\lambda_T}{\lambda_0}\right)^{t/T}$$
The neural gas network converges quickly and has a small distortion error after convergence. Compared with other self-organizing networks, it captures details better. Its shortcoming is that the number of nodes must be fixed before adaptation, and this choice strongly affects the accuracy of the results (Figure 48, Figure 49).
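The rank-based update and the exponential schedules above can be sketched with NumPy. This is a minimal illustration of Martinetz-style neural gas training, not Cretu's code: initializing the nodes from random training points and the default schedule values ($\alpha_0$, $\alpha_T$, $\lambda_0 = N/2$, $\lambda_T$) are assumptions chosen for the example.

```python
import numpy as np


def neural_gas(data, n_nodes=20, T=5000, alpha0=0.5, alphaT=0.005,
               lam0=None, lamT=0.01, seed=0):
    """Train a neural gas on an (M, 3) array of points x = [X Y Z].

    Returns an (n_nodes, 3) array of node weight vectors w_j.
    """
    rng = np.random.default_rng(seed)
    if lam0 is None:
        lam0 = n_nodes / 2.0
    # Initialise the node weights with randomly chosen training points (assumption).
    w = data[rng.choice(len(data), n_nodes, replace=False)].astype(float)
    for t in range(T):
        frac = t / T
        alpha = alpha0 * (alphaT / alpha0) ** frac   # alpha(t) schedule
        lam = lam0 * (lamT / lam0) ** frac           # lambda(t) schedule
        x = data[rng.integers(len(data))]            # random input vector
        dist = np.linalg.norm(w - x, axis=1)
        # k_j(x, w_j): rank of node j by distance to x (0 = best-matching node).
        ranks = np.argsort(np.argsort(dist))
        h = np.exp(-ranks / lam)                     # h_lambda(k_j)
        w += alpha * h[:, None] * (x - w)            # rank-weighted update
    return w
```

Because every update moves a node part of the way toward a data point (the step factor αh never exceeds α ≤ 1), the nodes stay inside the convex hull of the training data; the number of nodes still has to be chosen up front, which is the limitation noted above.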
5.2.1 Viewpoint selection based on region of interest
In 2016, George Leifman proposed a "Regions of Interest (ROI)" algorithm and used it for viewpoint selection[33]. The algorithm takes into account the tendency of human vision to focus on interesting regions. Let $I(v_i)$ denote the interestingness of vertex $v_i$. It is defined as the larger of the vertex's extreme value and the average of its uniqueness and correlation degree, which integrates the results of the different stages:
$$I(v_i) = \max\left(\frac{D(v_i) + A(v_i)}{2},\; E(v_i)\right)$$
where $D(v_i)$ is the uniqueness of vertex $v_i$, $A(v_i)$ is its correlation degree, and $E(v_i)$ is its extreme value. A vertex v is an extremum vertex if, for every neighbor $v_n \in N(v)$, the following holds:
$$\sum_{v_j \in S} \mathrm{GeodDist}(v, v_j) > \sum_{v_j \in S} \mathrm{GeodDist}(v_n, v_j)$$
For each vertex $v_i$, the geodesic distance $\mathrm{GeodExt}(v_i)$ to its nearest extremum vertex is computed and normalized to [0,1]; the extreme value of $v_i$ is then:
$$E(v_i) = e^{-\frac{\mathrm{GeodExt}^2(v_i)}{2\sigma^2}}$$
The correlation degree $A(v_i)$ of vertex $v_i$ is:
$$A(v_i) = D_{\mathrm{foci}}(v_i)\, e^{-\frac{\mathrm{GeodFoci}^2(v_i)}{2\sigma^2}}$$
where σ takes the empirical value σ = 0.05, $D_{\mathrm{foci}}(v_i)$ is the uniqueness value of the focus nearest to $v_i$, and $\mathrm{GeodFoci}(v_i)$ is the geodesic distance from $v_i$ to that focus. The uniqueness of vertex $v_i$ is defined as:
$$D(v_i) = 1 - \exp\left(-\frac{1}{K}\sum_{k=1}^{K} d(v_i, v_k)\right)$$
where $d(v_i, v_k)$ is the difference between vertices $v_i$ and $v_k$, and the sum runs over the K vertices most similar to $v_i$. Leifman set K to 5% of the number of vertices, an empirical value chosen after rigorous experimental verification; a larger K increases the computation time but has little effect on the final result. The difference between $v_i$ and $v_j$ is defined as:
$$d(v_i, v_j) = \frac{D\big(h(v_i), h(v_j)\big)}{1 + c \cdot \mathrm{GeodDist}(v_i, v_j)}$$
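where c takes the empirical value c = 3, $D\big(h(v_i), h(v_j)\big)$ is the distance between the descriptors $h(v_i)$ and $h(v_j)$ of the two vertices, and $\mathrm{GeodDist}(v_i, v_j)$ is the geodesic distance between $v_i$ and $v_j$. The multi-scale difference is the average of the differences over all scales (Figure 50, Figure 51).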
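The uniqueness and interestingness formulas can be sketched with NumPy. Assumptions, all labeled in the code: per-vertex descriptors h(v) are given as rows of a matrix, the descriptor distance D(·,·) is Euclidean (Leifman's actual descriptor and distance may differ), and the geodesic distances, normalized distances to the nearest extremum and to the nearest focus, and focus uniqueness values are precomputed. The function names are hypothetical.

```python
import numpy as np


def uniqueness(desc, geod, c=3.0, k_frac=0.05):
    """D(v_i) for every vertex.

    desc: (n, d) per-vertex descriptors h(v) (assumed given).
    geod: (n, n) normalised geodesic distances GeodDist(v_i, v_j).
    """
    n = len(desc)
    K = max(1, int(round(k_frac * n)))               # K = 5% of the vertices
    # D(h(v_i), h(v_j)): Euclidean descriptor distance (assumption).
    dh = np.linalg.norm(desc[:, None, :] - desc[None, :, :], axis=2)
    d = dh / (1.0 + c * geod)                        # d(v_i, v_j)
    np.fill_diagonal(d, np.inf)                      # exclude v_i itself
    nearest = np.sort(d, axis=1)[:, :K]              # K most similar vertices
    return 1.0 - np.exp(-nearest.mean(axis=1))


def gauss(x, sigma=0.05):
    return np.exp(-x ** 2 / (2 * sigma ** 2))


def interestingness(D, geod_ext, geod_foci, D_foci, sigma=0.05):
    """I(v_i) = max((D + A) / 2, E) with E and A as defined above."""
    E = gauss(geod_ext, sigma)                       # extreme value E(v_i)
    A = D_foci * gauss(geod_foci, sigma)             # correlation degree A(v_i)
    return np.maximum((D + A) / 2.0, E)
```

The `np.inf` on the diagonal keeps a vertex from counting as its own most similar neighbor, so a vertex whose descriptor differs strongly from all others gets a uniqueness close to 1, while vertices with many near-identical neighbors get a uniqueness close to 0.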
Leifman's method automatically selects a set of the most interesting viewpoints for a given mesh surface; the steps of the algorithm are shown in Table 2.
Leifman's viewpoint selection algorithm
1. Generate candidate viewpoints
2. Compute the viewed ROI from each candidate
Iterate on:
3. Select an information viewpoint
4. Improve the selection in the local neighborhood
This method consists of four steps:
(1) Generate a set of candidate viewpoints $P_s$ by sampling uniformly around the object;
(2) Evaluate the quality of each viewpoint $p_i \in P_s$ according to the interesting regions it observes:
$$\bar{I}(p_i) = \sum_{v_j \in V} I(v_j)\, w_i(v_j)$$
where V is the vertex set of mesh S, $I(v_j)$ is the interestingness of vertex $v_j$, and $w_i(v_j)$ is the weight of $v_j$ as seen from $p_i$. Let $\beta_{ij}$ be the angle between the surface normal at $v_j$ and the direction $\overline{p_i - v_j}$. If $v_j$ is visible from viewpoint $p_i$, then $w_i(v_j) = \cos\beta_{ij}$; otherwise $w_i(v_j) = 0$.
(3) Select a representative set of viewpoints according to the distribution of the interesting regions.
The first viewpoint is the one that maximizes the interestingness $\bar{I}(p_i)$. New viewpoints are then added iteratively so that, together with the previously selected viewpoints, they maximize the observed interesting area. Let $w_{\max}(v_j)$ be the maximum weight assigned to $v_j$ by the viewpoints selected so far; the contribution $\delta(p_i)$ of viewpoint $p_i$ to the total interesting area is:
$$\delta(p_i) = \sum_{v_j \in V} I(v_j)\, \max\big(w_i(v_j) - w_{\max}(v_j),\, 0\big)$$
The candidate viewpoint that maximizes δ is added to the current set until the following conditions are satisfied:
1) The viewpoint set observes more than 60% of the total interestingness of the mesh:
$$0.6 \sum_{v \in V} I(v) < \bar{I}(p_0) + \sum_{p_i \in \mathrm{set}} \delta(p_i)$$
2) No new viewpoint p enlarges the observed region of interest by more than 10% of the total interestingness:
$$\delta(p) \le 0.1 \sum_{v_j \in V} I(v_j)$$
(4) Each newly selected viewpoint is then improved within its local neighborhood so that it maximizes the weight w of the interesting regions and the contribution δ to the interest degree. A reflective-symmetry detection algorithm is used to avoid selecting views that are exact symmetric copies of each other.
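The greedy part of step (3) can be sketched with NumPy. Assumptions: the weights $w_i(v_j)$ are precomputed into an m × n matrix W (one row per candidate viewpoint), and the stopping rule is read as requiring both conditions 1) and 2), since the text does not say whether either one suffices. A guard also stops when no candidate adds anything. The local refinement of step (4) and the symmetry detection are omitted.

```python
import numpy as np


def select_viewpoints(W, I, cover=0.6, min_gain=0.1):
    """Greedy multi-view selection.

    W: (m, n) weights w_i(v_j) (cos(beta_ij) if v_j is visible from p_i, else 0).
    I: (n,) vertex interestingness I(v_j).
    Returns the indices of the selected candidate viewpoints.
    """
    total = I.sum()
    first = int(np.argmax(W @ I))              # maximises I_bar(p_i)
    chosen = [first]
    w_max = W[first].copy()                    # w_max(v_j) over chosen views
    covered = W[first] @ I                     # I_bar(p_0)
    while len(chosen) < len(W):
        delta = np.maximum(W - w_max, 0.0) @ I     # delta(p_i) for every candidate
        delta[chosen] = -np.inf                    # never re-select a view
        best = int(np.argmax(delta))
        enough_cover = covered > cover * total     # condition 1)
        small_gain = delta[best] <= min_gain * total  # condition 2)
        if (enough_cover and small_gain) or delta[best] <= 0:
            break
        chosen.append(best)
        covered += delta[best]                 # keeps covered == I @ w_max
        w_max = np.maximum(w_max, W[best])
    return chosen
```

Because `covered` is updated by exactly δ of the added view, it always equals $\sum_j I(v_j)\, w_{\max}(v_j)$, the left-hand side tested in condition 1).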
Leifman compared this method with four classical multi-viewpoint selection algorithms (Figure 52): Lee's Mesh Saliency method (2005)[17], Mortara's Semantics-driven Best View of 3D Shapes (2009)[79], Yamauchi's Towards Stable and Salient Multi-view Representation of 3D Shapes[63], and Secord's Perceptual Models of Viewpoint Preference[37].
The algorithm depends heavily on interesting regions and details, so it can produce wrong results in some cases. As shown in Figure 53, it selects the top view as the best viewpoint because that view gathers the most details and interesting areas, whereas human observers tend to choose the side view as the natural best view.
5.2.2 Estimation of camera viewpoint position based on CNN
In recent years, machine learning has gradually been introduced into 3D viewpoint estimation. Krizhevsky used convolutional neural networks for image classification[110]. Peng trained convolutional neural networks on rendered 3D models[111]. Subsequently, neural network methods were applied to frontier problems such as pose estimation, virtual scene understanding and multi-view classification. Su adopted dense multi-view deep learning for viewpoint classification[112]. Stark used machine learning on CAD data for object recognition[113]. Payet and Todorovic used silhouettes to estimate 3D pose by deep learning[114]. Fidler proposed a deformable 3D cuboid model for learning 3D objects[115]. Cabral's patent uses a virtual panoramic camera to learn multiple viewpoints and 360° panoramas intelligently[116,117].
Convolutional neural networks (CNNs) have achieved great success in image classification, detection, pose estimation, segmentation and other computer vision tasks. Mottaghi used a convolutional neural network to estimate the pose of 3D objects[118]. Su trained a convolutional neural network on rendered views of 3D models to estimate viewpoints[119].
The main obstacle for CNN-based methods is obtaining a large number of viewpoint-annotated training images without manual annotation. Wang designed a rendering pipeline that generates realistic RGB images with viewpoint annotations for CNN training by placing 3D models in panoramic scenes[120] (Figure 54).
Wang collected and classified 1780 high-quality 3D CAD models, then rendered a synthetic RGB image data set by placing these models in realistic panoramic scenes (Figure 55).
The pose of a 3D object is represented by the tuple $\Theta = (\theta_a, \theta_e, \theta_c)$, where $\theta_a$, $\theta_e$ and $\theta_c$ are the azimuth, elevation and clockwise (in-plane) rotation angles, respectively. The 1780 models were collected from TurboSquid and CGTrader. The CNN architecture is shown in Figure 56.
The input image is scaled to a fixed size of 256×256 pixels, and the output is the estimated viewpoint Θ. The CNN uses four convolutional layers to extract features and two fully connected layers to estimate the viewpoint. Each convolutional layer uses 5×5 kernels with a stride of 2 pixels and is followed by a max-pooling operation. To reduce overfitting during training, the method uses dropout regularization, and a linear activation function in the output layer predicts the viewpoint parameters Θ. The training set is $D = \{(I_n, \Theta_n) \mid n \in [1, N]\}$, where $I_n$ is an input image, $\Theta_n$ is its viewpoint parameters, and N is the number of training samples. The network is trained by minimizing the loss function L(w):
$$L(w) = \frac{1}{N} \sum_{n=1}^{N} f(I_n; w) + \lambda\, r(w), \qquad w^* = \arg\min_w L(w)$$
where r(w) is a regularizer that penalizes large weights and λ is the weight-decay coefficient. With $\psi(I_n; w)$ denoting the network output, $f(I_n; w)$ is the Euclidean loss of sample $I_n$ under the weights w:
$$f(I_n; w) = \frac{1}{2}\big\|\psi(I_n; w) - \Theta_n\big\|_2^2$$
The model is trained with stochastic gradient descent (SGD), updating the momentum and applying weight decay:
$$V_t = r_t V_{t-1} - \alpha \nabla L(w_t), \qquad r_t = \begin{cases} \frac{t}{T} r_e + \left(1 - \frac{t}{T}\right) r_s, & t < T \\ r_e, & t \ge T \end{cases}$$
where $w_t$ and $V_t$ are the weights and the momentum at time t, α is the learning rate, $r_t$ is the momentum coefficient, $r_s$ and $r_e$ are the momentum at the beginning and at the end, and T is the threshold that controls how the momentum varies across epochs. Wang used polar histograms to illustrate the distribution of the viewpoints (Figure 57); the distribution is biased, with different tendencies.
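The loss and the momentum schedule can be sketched with NumPy. This is an illustration of the optimization only, under stated assumptions: a linear model ψ(I; w) = I·w stands in for the CNN, r(w) = ½‖w‖² is assumed as the regularizer, and the weight step w_{t+1} = w_t + V_t (the usual momentum update, not shown in the equations) is assumed. Full-batch gradients keep the example deterministic; real SGD would use a mini-batch gradient.

```python
import numpy as np


def momentum_coeff(t, T, r_s, r_e):
    """r_t: linear ramp from r_s to r_e over the first T steps, then constant."""
    if t < T:
        return (t / T) * r_e + (1 - t / T) * r_s
    return r_e


def train(X, Theta, lam=0.01, alpha=0.1, r_s=0.5, r_e=0.9, T=50, steps=500):
    """Minimise L(w) = 1/N sum_n 1/2 ||psi(I_n; w) - Theta_n||^2 + lam * r(w).

    Toy assumptions: psi(I; w) = I @ w and r(w) = 1/2 ||w||^2.
    X: (N, d) inputs, Theta: (N, 3) viewpoint targets.
    """
    N = X.shape[0]
    w = np.zeros((X.shape[1], Theta.shape[1]))
    V = np.zeros_like(w)
    for t in range(steps):
        grad = X.T @ (X @ w - Theta) / N + lam * w   # gradient of L(w)
        r_t = momentum_coeff(t, T, r_s, r_e)
        V = r_t * V - alpha * grad                    # V_t = r_t V_{t-1} - alpha grad
        w = w + V                                     # momentum weight step (assumed)
    return w
```

With these toy assumptions the minimizer of L(w) is the ridge-regression solution $(X^TX/N + \lambda I)^{-1} X^T\Theta/N$, which gives an easy check that the schedule converges; the low starting momentum $r_s$ damps the early steps before $r_t$ ramps up to $r_e$.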
The accuracy $Acc_\delta$ is a function of the longitudinal overlap ratio and the tolerance δ. Table 3 compares Wang's method with Tulsiani's TNet trained on ImageNet and Su's ShapeNet-based method built on AlexNet. Wang's method achieves the highest mean accuracy (0.75) and the best results on Aero and Boat, although it is lower than both other methods on Car.
Performance comparison: accuracy of Tulsiani et al., Su et al. and Wang's method
Method Aero Boat Car Mean
Acc 30° (Tulsiani et al.) 0.78 0.49 0.90 0.72
Acc 30° (Su et al.) 0.74 0.52 0.88 0.71
Acc 30° (Ours) 0.79 0.64 0.82 0.75
5.2.3 Large-scale scene reconstruction based on multiple viewpoints
Multi-view research has been widely applied in robotics to explore the implicit correlations in 3D scenes, which is useful for camera relocalization and 3D reconstruction[121]. Papon proposed a connectivity-based segmentation method for voxel clouds[122]. Mur-Artal proposed a highly accurate monocular SLAM system[123]. Li proposed an RGB-D relocalization method based on geometric structure[124]. Satkin proposed a viewpoint-invariant 3D geometric matching method for scene understanding[125]. Valentin exploited uncertainty in regression forests to achieve accurate camera relocalization[126]. Walch proposed a localization method using LSTMs[127]. All these approaches apply advanced sensor-based relocalization methods to scene learning.
Several patented products have also appeared, such as Yano et al.'s multi-viewpoint virtual camera[128], Handa's multi-view camera array[129], Katsumasa's intelligent multi-camera system for 3D games[130], Park's camera navigation system[131], and Kim's viewpoint generation system[132].
Sawhney proposed a method to evaluate the geometric similarity and correlation between 3D point sets[133]. On this basis, Sawhney established a determinantal point process (DPP) model to capture the relationships between retrieved views. The method clusters the viewpoint set $V_X$, embeds a Fisher vector into each resulting set (Figure 58), and computes the normalized gradient of the log-likelihood function. A similarity kernel is introduced based on Fisher kernel theory and a Gaussian mixture model (GMM). Given a trained GMM $P_\Theta$ with parameters $\Theta = \{p_g, v_g, \Lambda_g\}_{g=1}^{G}$, the Fisher vector embedding of the set $F_X$ is denoted $\varphi(F_X)$:
$$\varphi(F_X) = L_\Theta \nabla_\Theta \log P_\Theta(F_X)$$
where $L_\Theta$ is the Cholesky factor of the inverse Fisher information matrix, $\nabla_\Theta \log P_\Theta(F_X)$ is the score function, and $\Lambda_g$ is a diagonal covariance matrix. The embedding of a sample set can be estimated as:
$$\varphi(F_X) = \left[m_1^0, {m_1^1}^T, {m_1^2}^T, \ldots, m_g^0, {m_g^1}^T, {m_g^2}^T, \ldots, m_G^0, {m_G^1}^T, {m_G^2}^T\right]^T$$
The dimension of $\varphi(F_X)$ is $d_\varphi = (2d_F + 1)G$, where G is the number of mixture components and $d_F$ is the dimension of the geometric feature space. The soft assignments $\pi_{i,j,g}$ and the terms $m_g^0$, $m_g^1$ and $m_g^2$ are:
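$$\pi_{i,j,g} = \frac{\exp\left(-\frac{1}{2}(f_{x_i x_j} - v_g)^T \Lambda_g^{-1} (f_{x_i x_j} - v_g)\right)}{\sum_{g'=1}^{G} \exp\left(-\frac{1}{2}(f_{x_i x_j} - v_{g'})^T \Lambda_{g'}^{-1} (f_{x_i x_j} - v_{g'})\right)}$$
$$m_g^0 = \frac{1}{c_X c_{x_i} \sqrt{p_g}} \sum_{i=1}^{c_X} \sum_{j=1}^{c_{x_i}} \left(\pi_{i,j,g} - p_g\right)$$
$$m_g^1 = \frac{1}{c_X c_{x_i} \sqrt{p_g}} \sum_{i=1}^{c_X} \sum_{j=1}^{c_{x_i}} \pi_{i,j,g}\, \Lambda_g^{-1/2} (f_{x_i x_j} - v_g)$$
$$m_g^2 = \frac{1}{c_X c_{x_i} \sqrt{2 p_g}} \sum_{i=1}^{c_X} \sum_{j=1}^{c_{x_i}} \pi_{i,j,g} \left[\Lambda_g^{-1} (f_{x_i x_j} - v_g)(f_{x_i x_j} - v_g)^T - I\right] \mathbf{1}$$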
where $\mathbf{1}$ denotes the all-ones vector. The feature vector of $V_X$ is denoted $\phi(X)$:
$$\phi(X) = \left[\varphi(F_1^X)^T, \ldots, \varphi(F_h^X)^T, \ldots, \varphi(F_H^X)^T\right]^T$$
The similarity between two views $V_X$ and $V_Y$ is:
$$s(X, Y) = -\frac{1}{25GH}\left\|\phi(X) - \phi(Y)\right\|_1$$
The retrieved set is denoted $R = \{V_X\}_{X=1}^{C_R}$. Given a query view $V_Q$ and a view signature database D, a set of related views is retrieved by nearest-neighbor search (Table 4, Figures 59 and 60).
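The per-set embedding can be sketched with NumPy. This follows the standard Fisher-vector encoding of one feature set with a diagonal-covariance GMM: the posteriors include the mixture weights $p_g$ and Gaussian normalizers, the second-order term uses element-wise squares, and the sums are normalized by the number of features T. It is therefore a simplification of, not identical to, the equations above. The GMM parameters are assumed to be already trained, and the function name is hypothetical.

```python
import numpy as np


def fisher_vector(F, p, v, lam):
    """Standard Fisher-vector encoding of one feature set.

    F: (T, d) features of the set; p: (G,) mixture weights (sum to 1);
    v: (G, d) means; lam: (G, d) diagonal variances.
    Returns a vector of length (2d + 1) G ordered as [m_g^0, m_g^1, m_g^2] per g.
    """
    T = len(F)
    z = (F[:, None, :] - v[None, :, :]) / np.sqrt(lam)[None]     # (T, G, d)
    # log p_g + log N(x | v_g, diag(lam_g)), up to a constant shared by all g
    log_w = np.log(p) - 0.5 * (z ** 2).sum(-1) - 0.5 * np.log(lam).sum(-1)
    log_w -= log_w.max(axis=1, keepdims=True)                    # numerical stability
    pi = np.exp(log_w)
    pi /= pi.sum(axis=1, keepdims=True)                          # (T, G) posteriors
    m0 = (pi - p).sum(0) / (T * np.sqrt(p))
    m1 = (pi[..., None] * z).sum(0) / (T * np.sqrt(p))[:, None]
    m2 = (pi[..., None] * (z ** 2 - 1)).sum(0) / (T * np.sqrt(2 * p))[:, None]
    return np.concatenate(
        [np.concatenate([[m0[g]], m1[g], m2[g]]) for g in range(len(p))])
```

Each component contributes one zeroth-order, $d$ first-order and $d$ second-order terms, which gives the length $(2d_F + 1)G$ stated above; concatenating the vectors of the H subsets yields $\phi(X)$.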
Comparison of relocalization accuracy on the seven standard scenes of the 7-Scenes data set
Data Appearance reliant (RGB or RGB-D) Depth-only
Approach Reconstruction Truth Needed for Relocalization Retrieval
Method Spr[40] C[5] DSc[4] [40] [16] [46] [5] D[40] R VDR
Chess 70.7 94.9 97.4 92.6 96 99.4 99.6 82.7 97.3 99.5
Fire 49.9 73.5 74.3 82.9 90 94.6 94.0 44.7 92.3 97.8
Heads 67.6 48.1 71.7 49.4 56 95.9 89.3 27.0 93.5 98.9
Office 36.6 53.2 71.2 74.9 92 97.0 93.4 65.5 89.7 98.4
Pumpkin 21.3 54.5 53.6 73.7 80 85.1 77.6 15.1 78.3 82.8
Kitchen 29.8 42.2 51.2 71.8 86 89.3 91.1 61.3 87.9 93.7
Stairs 9.2 20.1 4.5 27.8 55 63.4 71.7 13.6 54.8 61.0
Average 40.7 55.2 60.1 67.6 79.3 89.2 88.1 44.3 84.8 90.3
Combine 38.6 55.2 62.5 - - - - 84.8 90.3
The DPP model, normalized over the subsets A of R of size k, is defined as:
$$P_L(C; R) = \frac{\det(L_C)}{\sum_{A \subseteq R,\ |A| = k} \det(L_A)}$$
where L is a symmetric positive semidefinite similarity matrix indexed by the elements of R:
$$L_{X,Y} = \rho_X \rho_Y\, \kappa\, e^{s(X,Y)/\sigma}, \qquad 1 \le X, Y \le C_R$$
where $\rho_X = e^{\frac{1}{2} s(X,Q)/\omega}$ measures the similarity between view $V_X$ and the query view $V_Q$, and $s(X,Y)$ is the similarity between two given views $V_X$ and $V_Y$. The parameters σ, ω and κ balance the diversity of the retrieved views against their similarity to the query view $V_Q$. This method uses macro-scale features and deep learning and achieves good results, but it is complex.
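The similarity, the kernel matrix L and the size-k DPP probability can be sketched with NumPy. The view signatures ϕ are plain arrays here (random placeholders in the check), and σ, ω and κ default to 1 purely for illustration. Instead of summing over all subsets, the normalizer uses the identity $\sum_{|A|=k} \det(L_A) = e_k(\lambda_1, \ldots, \lambda_{C_R})$, the k-th elementary symmetric polynomial of the eigenvalues of L; a brute-force version is included for comparison. Function names are hypothetical.

```python
import itertools

import numpy as np


def similarity(phi_x, phi_y, G, H):
    """s(X, Y) = -(1 / (25 G H)) * ||phi(X) - phi(Y)||_1."""
    return -np.abs(phi_x - phi_y).sum() / (25 * G * H)


def kernel(phis, phi_q, G, H, sigma=1.0, omega=1.0, kappa=1.0):
    """L_{X,Y} = rho_X rho_Y kappa exp(s(X, Y) / sigma), rho_X = exp(s(X, Q) / (2 omega))."""
    n = len(phis)
    rho = np.array([np.exp(0.5 * similarity(p, phi_q, G, H) / omega) for p in phis])
    S = np.array([[similarity(phis[a], phis[b], G, H) for b in range(n)]
                  for a in range(n)])
    return np.outer(rho, rho) * kappa * np.exp(S / sigma)


def k_dpp_prob(L, C):
    """P_L(C; R) for a subset C (list of indices) of size k."""
    k = len(C)
    eig = np.linalg.eigvalsh(L)
    e_k = (-1) ** k * np.poly(eig)[k]      # elementary symmetric polynomial e_k
    return np.linalg.det(L[np.ix_(C, C)]) / e_k


def brute_force_normalizer(L, k):
    """Sum of det(L_A) over all subsets A of size k (definition, exponential cost)."""
    return sum(np.linalg.det(L[np.ix_(list(A), list(A))])
               for A in itertools.combinations(range(len(L)), k))
```

`np.poly` returns the coefficients of $\prod_i (x - \lambda_i)$, whose k-th coefficient is $(-1)^k e_k$, so the normalizer costs one eigendecomposition instead of a sum over all $\binom{C_R}{k}$ subsets. Sets of near-duplicate views give nearly singular submatrices $L_C$ and hence low probability, which is how the model favors diverse results.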
6 Summary and prospect
After 30 years of development, the theoretical system of 3D viewpoint research has matured considerably. Early studies focused mostly on geometric features, materials, textures and lighting information. Many researchers proposed influential methods and established the theoretical basis of the field.
As viewpoint research matured, researchers often used geometric silhouettes, curvature, mesh saliency and volume features to describe viewpoint quality. These methods broadened the scope of viewpoint research.
In recent years, many researchers have applied visual perception methods to viewpoint research and combined semantic segmentation with viewpoint selection. Several theories introduced visual saliency as a measure. The evaluation results of such methods, for example the saliency segmentation method and the semantic network method, agree more closely with the responses of the human visual system. These methods are more advanced, but they require cumbersome semantic annotation and manual intervention. Automatic semantic segmentation and more complete visual perception models remain open problems.
Viewpoint quality research has great potential in many fields, such as computer graphics and virtual vision. Viewpoint selection methods can play an important role in intelligent virtual tours, rendering optimization, 3D scene design, 3D model retrieval and related tasks such as image-based modeling. Viewpoint evaluation theory can also be applied to advanced problems such as scene understanding, molecular visualization, ray-tracing optimization, scientific visualization and visual servoing. In practice, it can improve the efficiency of virtual scene analysis and optimize the visual effect of 3D games. At present, machine learning has not yet been deeply explored in viewpoint research; combining viewpoint research with artificial intelligence is likely to bring new possibilities and better performance.



Palmer S, Rosch E, Chase P. Canonical perspective and the perception of objects. Attention & Performance, 1981, 9: 135–151


Koenderink J J. An internal representation for solid shape based on the topological properties of the apparent contour. Ablex Publishing Corp, 1987: 257–285


Biederman I. Recognition-by-components: A theory of human image understanding. Psychological Review, 1987, 94(2): 115–147 DOI:10.1037/0033-295x.94.2.115


Blanz V, Tarr M J, Bülthoff H H. What object attributes determine canonical views? Perception, 1999, 28(5): 575–599 DOI:10.1068/p2897


Kamada T, Kawai S. A simple method for computing general position in displaying three-dimensional objects. Computer Vision, Graphics, and Image Processing, 1988, 41(1): 43–56 DOI:10.1016/0734-189x(88)90116-8


Roberts D R, Marshall A D. Viewpoint selection for complete surface coverage of three dimensional objects. In: Proc. of the British Machine Vision Conference, 1998, 2


Plemenos D, Benayada M. Intelligent Display Techniques in Scene Modelling. New Techniques to Automatically Compute Good Views. In: International Conference Graphi. St Petersburg, Russia: 1996


Fleishman S, Cohen-Or D, Lischinski D. Automatic camera placement for image-based modeling. In: Proceedings of Seventh Pacific Conference on Computer Graphics and Applications. Seoul, South Korea: 1999, 12–20 DOI:10.1109/PCCGA.1999.803344


Barral P, Dorme G, Plemenos D. Visual understanding of a scene by automatic movement of a camera, 1999.


Vázquez P P, Feixas M, Sbert M, Heidrich W. Viewpoint Selection Using Viewpoint Entropy. In: Vision Modeling & Visualization Conference. Stuttgart, Germany: 2001, 273–280 DOI:10.1109/MCG.1984.276095


Gooch B, Reinhard E, Moulding C, Shirley P. Artistic Composition for Image Creation. Eurographics. Vienna: Springer Vienna, 2001: 83–88 DOI:10.1007/978-3-7091-6242-2_8


Feixas M. An information theory framework for the study of the complexity of visibility and radiosity in a scene. University of Catalonia, 2002


Page D L, Koschan A F, Sukumar S R, Roui-Abidi B, Abidi M A. Shape analysis algorithm based on information theory. Proceedings 2003 International Conference. Barcelona, Spain: 2003, 1–229 DOI:10.1109/ICIP.2003.1246940


Felzenszwalb P F, Huttenlocher D P. Efficient graph-based image segmentation. International Journal of Computer Vision, 2004, 59(2): 167–181 DOI:10.1023/b:visi.0000022288.19776.77


Plemenos D, Sbert M, Feixas M, Gonzalez F. Viewpoint Quality: Measures and Applications. Girona, Spain: 2005, 1–8


Katz S, Leifman G, Tal A. Mesh segmentation using feature point and core extraction. The Visual Computer, 2005, 21(8/9/10): 649–658 DOI:10.1007/s00371-005-0344-9


Lee C H, Varshney A, Jacobs D W. Mesh saliency. Los Angeles, California: ACM, 2005: 659–666 DOI:10.1145/1186822.1073244


Polonsky O, Patané G, Biasotti S, Gotsman C, Spagnuolo M. What’s in an image? The Visual Computer, 2005, 21(8–10): 840–847 DOI:10.1007/s00371-005-0326-y


Attene M, Katz S, Mortara M, Patane G, Spagnuolo M, Tal A. Mesh segmentation―A comparative study. In: IEEE International Conference on Shape Modeling and Applications, Matsushima, Japan: 2006, 7 DOI:10.1109/SMI.2006.24


Shilane P, Funkhouser T. Distinctive regions of 3D surfaces. ACM Transactions on Graphics, 2007, 26(2): 7 DOI:10.1145/1243980.1243981


Fu H, Cohen-Or D, Dror G, Sheffer A. Upright orientation of man-made objects. New York, NY, USA: ACM, 2008, 1–7


Vieira T, Bordignon A, Peixoto A, Tavares G, Lopes H, Velho L, Lewiner T. Learning good views through intelligent galleries. Computer Graphics Forum, 2009, 28(2): 717–726 DOI:10.1111/j.1467-8659.2009.01412.x


Laga H. Semantics-driven approach for automatic selection of best views of 3D shapes. In: Proceedings of the 3rd Eurographics Conference on 3d Object Retrieval. Norrköping, Sweden: 2010, 15–22


Biedl T C, Hasan M, López-Ortiz A. Efficient view point selection for silhouettes of convex polyhedra. Computational Geometry, 2011, 44(8): 399–408 DOI:10.1016/j.comgeo.2011.04.001


Serin E, Doger C, Balcisoy S. 3D Object Exploration Using Viewpoint and Mesh Saliency Entropies. In: Computer and Information Sciences II. London: Springer London, 2011: 299–305 DOI:10.1007/978-1-4471-2155-8_38


Leifman G, Shtrom E, Tal A. Surface regions of interest for viewpoint selection. In: IEEE Conference on Computer Vision and Pattern Recognition. Providence, RI, USA: IEEE, 2012, 414–421 DOI:10.1109/CVPR.2012.6247703


Serin E, Sumengen S, Balcisoy S. Representational image generation for 3D objects. The Visual Computer, 2013, 29(6–8): 675–684 DOI:10.1007/s00371-013-0805-5


Han H L, Li J, Wang W C, Zhao H W, Hua M. View selection of 3D objects based on saliency segmentation. 2014 International Conference on Virtual Reality and Visualization. Shenyang, China: 2014, 214–219 DOI:10.1109/ICVRV.2014.12


Xing L, Zhang X, Wang C C L, Hui K. Highly Parallel Algorithms for Visual-Perception-Guided Surface Remeshing. IEEE Computer Graphics and Applications, 2014, 34(1): 52–64 DOI:10.1109/MCG.2013.84


Su H, Qi C R, Li Y, Guibas L J. Render for CNN: Viewpoint estimation in images using CNNs trained with rendered 3d model views. In: Proceedings of the IEEE International Conference on Computer Vision, 2015, 2686–2694 DOI:10.1109/ICCV.2015.308


Shi N, Tao T. CNNs based Viewpoint Estimation for Volume Visualization. ACM Transactions on Intelligent Systems and Technology (TIST), 2016, 1(1): 1


Massa F, Marlet R, Aubry M. Crafting a multi-task CNN for viewpoint estimation. British Machine Vision Conference. York, United Kingdom, British Machine Vision Association, 2016


Leifman G, Shtrom E, Tal A. Surface regions of interest for viewpoint selection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(12): 2544–2556 DOI:10.1109/tpami.2016.2522437


Jeong S W, Sim J Y. Saliency detection for 3D surface geometry using semi-regular meshes. IEEE Transactions on Multimedia, 2017, 19(12): 2692–2705 DOI:10.1109/tmm.2017.2710802


Engler T, Wuensche H J. Recursive 3D scene estimation with multiple camera pairs. In: Seventh International Conference on Image Processing Theory, Tools and Applications (IPTA). Montreal, QC, Canada, 2017, 1–6 DOI:10.1109/IPTA.2017.8310129


Sawhney R, Li F, Christensen H I, Isbell C L. Purely Geometric Scene Association and Retrieval-A Case for Macro Scale 3D Geometry. arXiv preprint arXiv: 1808. 01343, 2018


Secord A, Lu J W, Finkelstein A, Singh M, Nealen A. Perceptual models of viewpoint preference. ACM Transactions on Graphics, 2011, 30(5): 1–12 DOI:10.1145/2019627.2019628


Feldman J, Singh M. Information along contours and object boundaries. Psychological Review, 2005, 112(1): 243–252 DOI:10.1037/0033-295x.112.1.243


Stoev S L, Strasser W. A case study on automatic camera placement and motion for visualizing historical data. In: IEEE Visualization. Boston, MA, USA: IEEE, 2002, 545–548 DOI:10.1109/VISUAL.2002.1183826


Yang Y L, Shen C H. Multi-scale salient features for analyzing 3D shapes. Journal of Computer Science and Technology, 2012, 27(6): 1092–1099 DOI:10.1007/s11390-012-1287-z


Sander F, Krueger F. Gestaltpsychologie und Kunsttheorie: ein Beitrag zur Psychologie architektonischer Gestalten. Beck, 1932


Seligmann D D, Feiner S. Automated generation of intent-based 3D Illustrations. ACM SIGGRAPH Computer Graphics, 1991, 25(4): 123–132 DOI:10.1145/127719.122732


Karp P, Feiner S. Issues in the automated generation of animated presentations. Graphics Interface, 1990, 90: 39–48


Kowalski M A, Hughes J F, Rubin C B, Ohya J. User-guided composition effects for art-based rendering. In: ACM Symposium on Interactive 3D Graphic. New York, NY, USA, ACM, 2001, 99–102 DOI:10.1145/364338.364374


Sokolov D, Plemenos D, Tamine K. Methods and data structures for virtual world exploration. The Visual Computer, 2006, 22(7): 506–516 DOI:10.1007/s00371-006-0025-3


Sokolov D, Plemenos D, Herder J. High level methods for scene exploration. Journal of Virtual Reality and Broadcasting, 2006, 3(12): 1860–2037


Colin C. A System for Exploring the Universe of Polyhedral Shapes. In: Eurographics. Nice, France: 1988


Gao T H, Wang W C, Han H L. Efficient view selection by measuring proxy information. Computer Animation and Virtual Worlds, 2016, 27(3/4): 351–357 DOI:10.1002/cav.1698


Chen X B, Saparov A, Pang B, Funkhouser T. Schelling points on 3D surface meshes. ACM Transactions on Graphics, 2012, 31(4): 1–12 DOI:10.1145/2185520.2185525


Loken B. Perspectives on persuasion and visual information processing. ACR North American Advances, 1984


Kucerova J, Varhanikova I, Cernekova Z. Best view methods suitability for different types of objects. In: Proceedings of the 28th Spring Conference on Computer Graphics. Budmerice, Slovakia: ACM, 2012: 55–61 DOI:10.1145/2448531.2448538


Goldstein E B. Encyclopedia of Perception. SAGE Publications, 2010


Coe M. Human Factors for technical communicators. Wiley Technical Communication Library, 2010


Itti L, Koch C, Niebur E. A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998, 20(11): 1254–1259 DOI:10.1109/34.730558


Noser H, Renault O, Thalmann D, Thalmann N M. Navigation for digital actors based on synthetic vision, memory, and learning. Computers & Graphics, 1995, 19(1): 7–19 DOI:10.1016/0097-8493(94)00117-h


Koch C, Ullman S. Shifts in Selective Visual Attention: Towards the Underlying Neural Circuitry. In: Matters of Intelligence. Dordrecht: Springer Netherlands, 1987: 115–141 DOI:10.1007/978-94-009-3833-5_5


Tsotsos J K, Culhane S M, Kei Wai W Y, Lai Y Z, Davis N, Nuflo F. Modeling visual attention via selective tuning. Artificial Intelligence, 1995, 78(1/2): 507–545 DOI:10.1016/0004-3702(95)00025-9


Milanese R, Wechsler H, Gil S, Bost J M, Pun T. Integration of bottom-up and top-down cues for visual attention using non-linear relaxation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA: 1994, 781–785 DOI:10.1109/CVPR.1994.323898


Privitera C M, Stark L W. Focused JPEG encoding based upon automatic preidentified regions of interest. SPIE, 1999, 3644 DOI:10.1117/12.348474


Suh B, Ling H, Bederson B B, Jacobs D W. Automatic thumbnail cropping and its effectiveness. In: Proceedings of the 16th annual ACM symposium on User interface software and technology. Vancouver, Canada, ACM, 2003: 95–104 DOI:10.1145/964696.964707


DeCarlo D, Santella A. Stylization and abstraction of photographs. ACM Transactions on Graphics, 2002, 21(3): 769–776 DOI:10.1145/566654.566650


Santella A, DeCarlo D. Visual interest and NPR: an evaluation and manifesto. In: Proceedings of the 3rd international symposium on Non-photorealistic animation and rendering. ACM, 2004: 71–150 DOI:10.1145/987657.987669


Yamauchi H, Saleem W, Yoshizawa S, Karni Z, Belyaev A, Seidel H P. Towards stable and salient multi-view representation of 3D shapes. IEEE International Conference on Shape Modeling and Applications. Matsushima, Japan, 2006, 40 DOI:10.1109/SMI.2006.42


Guy G, Medioni G. Inferring global perceptual contours from local features. International Journal of Computer Vision, 1996, 20(1/2): 113–133 DOI:10.1007/bf00144119


Sawhney R, Li F, Christensen H I, Isbell C L. Purely geometric scene association and retrieval–A case for macro-scale 3D geometry. arXiv. org, 2018


Frintrop S, Nüchter A, Surmann H. Visual Attention for Object Recognition in Spatial 3D Data. In: Lecture Notes in Computer Science. Berlin, Heidelberg: Springer Berlin Heidelberg, 2005, 168–182 DOI:10.1007/978-3-540-30572-9_13


Kim Y, Varshney A. Saliency-guided enhancement for volume visualization. IEEE Transactions on Visualization and Computer Graphics, 2006, 12(5): 925–932 DOI:10.1109/tvcg.2006.174


Hisada M, Belyaev A G, Kunii T L. A skeleton-based approach for detection of perceptually salient features on polygonal surfaces. Computer Graphics Forum, 2002, 21(4): 689–700 DOI:10.1111/1467-8659.00627


Heckbert P S, Garland M. Optimal triangulation and quadric-based surface simplification. Computational Geometry, 1999, 14(1–3): 49–65 DOI:10.1016/s0925-7721(99)00030-9


Gębal K, Baerentzen J A, Aanaes H, Larsen R. Shape analysis using the auto diffusion function. Computer Graphics Forum, 2009, 28(5): 1405–1413 DOI:10.1111/j.1467-8659.2009.01517.x


Medimegh N, Belaid S, Atri M, Werghi N. 3D mesh watermarking using salient points. Multimedia Tools and Applications, 2018, 77(24): 32287–32309 DOI:10.1007/s11042-018-6252-6


Jeong S, Sim J. Saliency Detection for 3D Surface Geometry Using Semi-regular Meshes. IEEE Transactions on Multimedia, 2017, 19(12): 2692–2705 DOI:10.1109/TMM.2017.2710802


Costa L D F. Visual saliency and attention as random walks on complex networks. arXiv preprint physics/0603025, 2006


Norris J R. Markov Chains. Cambridge: Cambridge University Press, 1997 DOI:10.1017/cbo9780511810633


Lau M, Dev K, Shi W, Dorsey J, Rushmeier H. Tactile mesh saliency. ACM Transactions on Graphics, 2016, 35(4): 1–11 DOI:10.1145/2897824.2925927


Gingold Y, Shamir A, Cohen-Or D. Micro perceptual human computation for visual tasks. ACM Transactions on Graphics, 2012, 31(5): 1–12 DOI:10.1145/2231816.2231817


Wang X, Lindlbauer D, Lessig C, Maertens M, Alexa M. Measuring the visual salience of 3D printed objects. IEEE Computer Graphics and Applications, 2016, 36(4): 46–55 DOI:10.1109/mcg.2016.47


Takahashi S, Fujishiro I, Takeshima Y, Nishita T. A feature-driven approach to locating optimal viewpoints for volume visualization. In: Proceedings of IEEE Visualization. 2005: 495–502 DOI:10.1109/VIS.2005.4


Mortara M, Spagnuolo M. Semantics-driven best view of 3D shapes. Computers & Graphics, 2009, 33(3): 280–290 DOI:10.1016/j.cag.2009.03.003


Simms D J, Milnor J. Morse Theory, Annals of Mathematics Studies 51. In: Proceedings of the Edinburgh Mathematical Society, 1964, 14(1): 84


Mortara M, Patané G, Spagnuolo M, Falcidieno B, Rossignac J. Blowing bubbles for multi-scale analysis and decomposition of triangle meshes. Algorithmica, 2004, 38(1): 227–248 DOI:10.1007/s00453-003-1051-4


Hoppe H. Progressive meshes. In: Proceedings of the 23rd annual conference on Computer graphics and interactive techniques. New Orleans, Louisiana, ACM, 1996, 99–108


Cohen J, Olano M, Manocha D. Appearance-preserving simplification. In: Proceedings of the 25th annual conference on Computer graphics and interactive techniques. ACM, 1998: 115–122 DOI:10.1145/280814.280832


Karni Z, Gotsman C. Spectral compression of mesh geometry. In: Proceedings of the 27th annual conference on Computer graphics and interactive techniques. New York, NY, USA, ACM, 2000: 279–286 DOI:10.1145/344779.344924


Lindstrom P, Turk G. Image-driven simplification. ACM Transactions on Graphics, 2000, 19(3): 204–241 DOI:10.1145/353981.353995


Luebke D, Hallen B. Perceptually Driven Simplification for Interactive Rendering. In: Eurographics. Vienna: Springer Vienna, 2001: 223–234 DOI:10.1007/978-3-7091-6242-2_21


Zhang E, Turk G. Visibility-guided simplification. In: IEEE Visualization. Boston, MA, USA, IEEE, 2002: 267–274 DOI:10.1109/VISUAL.2002.1183784


Erdem U M, Sclaroff S. Automated camera layout to satisfy task-specific and floor plan-specific coverage requirements. Computer Vision and Image Understanding, 2006, 103(3): 156–169 DOI:10.1016/j.cviu.2006.06.005


Yabuta K, Kitazawa H. Optimum camera placement considering camera specification for security monitoring. In: 2008 IEEE International Symposium on Circuits and Systems. Seattle, WA, USA, IEEE, 2008, 2114–2117 DOI:10.1109/ISCAS.2008.4541867


Alam M S, Goodwin S D. Control of Constraint Weights for a 2D Autonomous Camera. In: Advances in Artificial Intelligence. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009: 121–132 DOI:10.1007/978-3-642-01818-3_14


Becker S C, Bove V M. Semiautomatic 3D-model extraction from uncalibrated 2D-camera views. SPIE, 1995, 2410 DOI:10.1117/12.205979


Yao Y, Chen C H, Koschan A, Abidi M. Adaptive online camera coordination for multi-camera multi-target surveillance. Computer Vision and Image Understanding, 2010, 114(4): 463–474 DOI:10.1016/j.cviu.2010.01.003


Amriki K A, Atrey P K. Bus surveillance: how many and where cameras should be placed. Multimedia Tools and Applications, 2014, 71(3): 1051–1085 DOI:10.1007/s11042-012-1247-1


Kahng J H, Yoon S H. System for Providing Multiple Virtual Reality Views. US Patent: WO2018KR03334, 2017-06-27


Piemonte P S, Pahwa A, Moore C D, Howard J A. Virtual camera for 3D maps. EPO Patent: AU20180264015, 2018-11-13


Vandrotti B, Veldandi M, Lehtiniemi A, Vaquero D A. Auto Scene Adjustments For Multi Camera Virtual Reality Streaming. US Patent: US201715602356, 2017-05-23


Molina H, Anthony M, Srinivasan V, Perez C G, Handa A, Marshall C B. Container-Based Virtual Camera Rotation. US Patent: US201715636359, 2017-06-28


Jeon J W. Method And System For Controlling Virtual Camera In Three-Dimensional Virtual Space, and Computer-Readable Recording Medium. Japan Patent: JP20180103400, 2018-05-30


Barral P, Dorme G, Plemenos D. Visual understanding of a scene by automatic movement of a camera. In: International conference 3IA. Moscow, Russia: 1999, 3–4


Barral P, Dorme G, Plemenos D. Intelligent scene exploration with a camera. In: International Conference 3IA. Limoges, France: 2000


Barral P, Dorme G, Plemenos D. Scene understanding techniques using a virtual camera. In: Eurographics. Interlaken, Switzerland, 2000


Dorme G. Study and implementation of 3D scene understanding techniques. France: University of Limoges, 2001.


Jaubert B, Tamin K, Plemenos D. Techniques for off-line scene exploration using a virtual camera. In: International Conference 3IA. Limoges, France, 2006


Lu M H, Newcomb S E, Maissey B R, De Andrad A J L. Manipulating Virtual Camera Dolly in Multi-Dimensional Space to Produce Visual Effect. US Patent: US201815927823, 2018-03-21


Marchand E, Courty N. Image-based virtual camera motion strategies. In: Fels S, Poulin P, eds. Graphics Interface Conference, Morgan Kaufmann, Montreal, Quebec, 2000, 69–76


Guzman-Rivera A, Kohli P, Glocker B, Shotton J, Sharp T, Fitzgibbon A, Izadi S. Multi-output learning for camera relocalization. 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, OH, USA, IEEE, 2014: 1114–1121 DOI:10.1109/CVPR.2014.146


Cretu A M, Chagnon-Forget M, Payeur P. Selectively densified 3D object modeling based on regions of interest detection using neural gas networks. Soft Computing, 2017, 21(18): 5443–5455 DOI:10.1007/s00500-016-2132-z


Martinetz T M, Berkovich S G, Schulten K J. "Neural-gas" network for vector quantization and its application to time-series prediction. IEEE Transactions on Neural Networks, 1993, 4(4): 558–569 DOI:10.1109/72.238311


Monette-Thériault H, Cretu A M, Payeur P. 3D object modeling with neural gas based selective densification of surface meshes. 2014 IEEE International Conference on Systems, Man, and Cybernetics (SMC). San Diego, CA, USA, IEEE, 2014: 1354–1359 DOI:10.1109/SMC.2014.6974103


Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 2017, 60(6): 84–90 DOI:10.1145/3065386


Peng X, Sun B, Ali K, Saenko K. Exploring invariances in deep convolutional neural networks using synthetic images. arXiv preprint, 2014, 1278–1286


Su H, Sun M, Li F F, Savarese S. Learning a dense multi-view representation for detection, viewpoint classification and synthesis of object categories. In: IEEE 12th International Conference on Computer Vision. Kyoto, Japan, IEEE, 2009, 213–220 DOI:10.1109/ICCV.2009.5459168


Stark M, Goesele M, Schiele B. Back to the future: learning shape models from 3D CAD data. In: BMVC. 2010, 2: 5


Payet N, Todorovic S. From contours to 3D object detection and pose estimation. In: International Conference on Computer Vision. Barcelona, Spain, 2011, 983–990 DOI:10.1109/ICCV.2011.6126342


Fidler S, Dickinson S, Urtasun R. 3D object detection and viewpoint estimation with a deformable 3D cuboid model. In: Advances in Neural Information Processing Systems. 2012, 611–619


Cabral B K, Hsu J, Coward A H. Panoramic virtual reality camera assembly. US Patent: US201629569877F, 2016-06-30


Cabral B K, Briggs F S, Hsu J, Pozo A P, Coward A. Three-Dimensional, 360-Degree Virtual Reality Camera System. US Patent: CA20173019786, 2017-01-31


Mottaghi R, Xiang Y, Savarese S. A coarse-to-fine model for 3D pose estimation and sub-category recognition. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, MA, USA, IEEE, 2015, 418–426 DOI:10.1109/CVPR.2015.7298639


Su H, Qi C R, Li Y Y, Guibas L J. Render for CNN: Viewpoint estimation in images using CNNs trained with rendered 3D model views. In: IEEE International Conference on Computer Vision. Santiago, Chile, IEEE, 2015, 2686–2694 DOI:10.1109/ICCV.2015.308


Wang Y M, Li S Y, Jia M Y, Liang W. Viewpoint estimation for objects with convolutional neural network trained on synthetic images. In: Lecture Notes in Computer Science. Cham, Springer International Publishing, 2016, 169–179 DOI:10.1007/978-3-319-48896-7_17


Lowry S, Sunderhauf N, Newman P, Leonard J J, Cox D, Corke P, Milford M J. Visual place recognition: A survey. IEEE Transactions on Robotics, 2016, 32(1): 1–19 DOI:10.1109/tro.2015.2496823


Papon J, Abramov A, Schoeler M, Wörgötter F. Voxel cloud connectivity segmentation–supervoxels for point clouds. In: IEEE Conference on Computer Vision and Pattern Recognition. Portland, OR, USA, IEEE, 2013, 2027–2034 DOI:10.1109/CVPR.2013.264


Mur-Artal R, Montiel J M M, Tardos J D. ORB-SLAM: A versatile and accurate monocular SLAM system. IEEE Transactions on Robotics, 2015, 31(5): 1147–1163 DOI:10.1109/tro.2015.2463671


Li S D, Calway A. RGBD relocalisation using pairwise geometry and concise key point sets. In: IEEE International Conference on Robotics and Automation (ICRA). Seattle, WA, USA, IEEE, 2015, 6374–6379 DOI:10.1109/ICRA.2015.7140094


Satkin S, Hebert M. 3DNN: viewpoint invariant 3D geometry matching for scene understanding. In: IEEE International Conference on Computer Vision. Sydney, NSW, Australia, IEEE, 2013, 1873–1880 DOI:10.1109/ICCV.2013.235


Valentin J, Nießner M, Shotton J, Fitzgibbon A, Izadi S, Torr P. Exploiting uncertainty in regression forests for accurate camera relocalization. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, MA, USA, IEEE, 2015, 4400–4408 DOI:10.1109/CVPR.2015.7299069


Walch F, Hazirbas C, Leal-Taixé L, Sattler T, Hilsenbeck S, Cremers D. Image-based localization using LSTMs for structured feature correlation. In: IEEE International Conference on Computer Vision (ICCV). Venice, Italy, IEEE, 2017: 627–637 DOI:10.1109/ICCV.2017.75


Yano T, Handa M, Aizawa M, Mizuno S, Tanaka K, Matsushita A, Morisawa K, Komiyama M, Fujii K, Date A. Method and apparatus for generating a virtual image from a viewpoint selected by the user, from a camera array with transmission of foreground and background images at different frame rates. US Patent: AU20170270402, 2016-05-25


Handa M, Aizawa M, Mizuno S, Tanaka K, Matsushita A, Morisawa K, Yano T, Komiyama M, Fujii K, Date A. Method and apparatus for generating a virtual image from a viewpoint selected by the user, from a camera array with daisy chain connection. EPO Patent: AU20170270403, 2016-05-25


Tanaka K, Handa M, Aizawa M, Mizuno S, Matsushita A, Morisawa K, Yano T, Komiyama M, Fujii K, Date A. Method and apparatus for generating a virtual image from a viewpoint selected by the user, from a camera array with default parameters associated to the selected type of sport event. EPO Patent: AU20170270400, 2017-05-22


Kim Y G. Apparatus and method for generating image at any point-view based on virtual camera. Korea Patent: KR20170074440, 2017-06-13


Park J, Jang J. A method and apparatus for setting virtual camera based on branching region and structures for exploring 3D treelike objects. Korea Patent: KR20170095242, 2017-07-27


Sawhney R, Li F, Christensen H I, Isbell C L. Purely geometric scene association and retrieval–A case for macro-scale 3D geometry. arXiv.org, 2018