
2020,  2 (3):   175 - 212   Published Date:2020-6-20

DOI: 10.1016/j.vrih.2020.05.003
1 Introduction 2 MLS system 2.1 System design 2.2 GNSS and IMU 2.2.1   GNSS/IMU integrated navigation 2.3 Laser scanner 2.3.1   SLAM-based navigation 3 Processing of point cloud data 3.1 Point clouds completion 3.1.1   Geometry-based approaches 3.1.2   Data-driven approaches 3.2 Feature extraction (line, plane, and supervoxel) 3.3 Matching and registration 3.4 Semantic labeling and segmentation 3.4.1   Feature-based methods 3.4.2   Deep learning methods 3.5 Object/instance extraction 4 Typical urban modeling applications based on MLS 4.1 Building facet modeling 4.2 High-definition (HD) map 4.2.1   Road surface 4.2.2   Road boundary 4.2.3   Road markings 4.2.4   Traffic signs 4.3 Building information models (BIM) 4.3.1   Linear-primitive 4.3.2   Planar-primitive 4.3.3   Volumetric-primitive 4.3.4   Door and window detection 4.4 Traffic visibility evaluation 5 Future work 6 Conclusion


Mobile laser scanning (MLS) systems mainly comprise laser scanners and mobile mapping platforms. Typical MLS systems can acquire three-dimensional point clouds with 1-10cm point spacings at a normal driving or walking speed in streets or indoor environments. The efficiency and stability of these systems make them extremely useful for application in three-dimensional urban modeling. This paper reviews the latest advances of the LiDAR-based mobile mapping system (MMS) point cloud in the field of 3D modeling, including LiDAR simultaneous localization and mapping, point cloud registration, feature extraction, object extraction, semantic segmentation, and processing using deep learning. Furthermore, typical urban modeling applications based on MMS are also discussed.


1 Introduction
Urban 3D modeling is used to establish a 2.5D or 3D digital representation of the earth’s surface and the objects present on it, such as buildings, roads, vegetation, and other manmade attributes in urban areas. There are three major categories of this approach: (1) conventional geodetical mapping techniques, (2) approaches based on 2D image photogrammetry, and (3) approaches based on 3D measurements, such as laser scanning. Although the data acquired are dense, and precision is high, conventional geodetical mapping techniques are time-consuming and show poor mobility. Therefore, this method is not suitable for large-scale mobile mapping tasks. 2D image photogrammetry methods are easy to set up and low-cost, and various deep learning methods can be conveniently integrated with these methods to extract and visualize semantic information. However, these methods are overly sensitive to environmental changes, such as ambient light, weather, and darkness. Moreover, the 3D model built by only using images cannot be directly used for navigation. The modeling methods based on LiDAR are of high precision, exhibit high reliability and are not easily affected by changes in the environment. Unlike the 3D models built based on images, the 3D model built using LiDAR has applications in the field of autonomous driving. Therefore, the methods discussed in this paper are mainly based on LiDAR or other 3D measurement equipment.
The task of large urban area 3D modeling demands high efficiency in data acquisition. The MLS system comprises an MMS equipped with laser scanners, and MMS technology facilitates efficient 3D modeling. Mobile mapping is a system technology in which photogrammetry sensors are installed on mobile platforms with high-precision, high-efficiency georeferencing capabilities. MMSs can efficiently collect georeferenced three-dimensional measurements of the environment while the platform is in motion. Successful MMSs include the VISAT[1] system from the University of Calgary, Canada, the GPSVan[2] developed by the Ohio State University, and the LD2000[3] developed by Wuhan University. At present, a typical MLS can collect 1 million points per second, which means that it can cover a road and its surrounding surface with a point density of 2000 points per square meter and 1-10cm point spacings at a moving speed of 10-110km/h.
MLS point clouds are large-volume and have heavy redundancy and irregular distributions[4]. In addition, the quality of the point cloud is degraded if noise and occlusion are present. Consequently, MLS point cloud processing is a challenging task in urban 3D modeling. Standard point cloud processing involves aspects such as feature point extraction, matching, and registration, object detection, semantic segmentation, and simultaneous localization and mapping (SLAM).
This paper presents a review of the MLS solutions in urban 3D modeling, as depicted in Figure 1. The rest of the paper is organized as follows. Section 2 reviews the MLS technology. Section 3 provides the discussion on the processing of MLS point cloud, and Section 4 presents the typical urban modeling applications based on MLS.
2 MLS system
In this section, we first introduce the system design and the important sensors of the MLS. Among the MLS sensors, the global navigation satellite system (GNSS) and inertial measurement unit (IMU) are the key components of MLS for navigation. However, LiDAR plays a significant role in GNSS-denied environments.
2.1 System design
The MLS system is an MMS equipped with laser scanners. As shown in Figure 2, MLS systems usually consist of GNSS receivers, laser scanners, digital cameras, IMUs, and other devices. The data from these sensors are synchronized to a common time reference via precise timestamping[5]. Methods for calculating the ground coordinates of objects from the laser scanning system have been well reported[6]. One such method combines the measurements obtained from the integrated GNSS/INS navigation system and the laser scanner with the sensor calibration parameters.
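The combination of navigation, scanner, and calibration data described above is often written as a direct georeferencing equation: a scanner-frame point is rotated by the boresight matrix, shifted by the lever arm, rotated by the platform attitude, and added to the platform position. The following is a minimal sketch of that idea, not the method of any cited system. The Z-Y-X (yaw-pitch-roll) rotation order, the function names, and the calibration values used in the usage note are all assumptions made for illustration.

```python
import numpy as np

def rotation_zyx(roll, pitch, yaw):
    """Body-to-mapping rotation from roll, pitch, yaw in radians (assumed Z-Y-X order)."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    rx = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])
    ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])
    return rz @ ry @ rx

def georeference(p_scanner, platform_pos, roll, pitch, yaw, boresight, lever_arm):
    """Map one scanner-frame point into the mapping frame.

    boresight: 3x3 rotation from scanner frame to body (IMU) frame (calibration).
    lever_arm: scanner origin expressed in the body frame (calibration).
    platform_pos, roll, pitch, yaw: GNSS/INS solution at the point's timestamp.
    """
    p_body = boresight @ p_scanner + lever_arm
    return platform_pos + rotation_zyx(roll, pitch, yaw) @ p_body
```

For example, with an identity boresight, a hypothetical lever arm of (0, 0, 2) m, and a platform at (100, 200, 10) m heading at a yaw of 90 degrees, a return at (5, 0, 0) m in the scanner frame maps to (100, 205, 12) m.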
2.2 GNSS and IMU
MLS systems perform surveys from ground vehicles. In MLS, the navigation system, which includes a global navigation satellite system (GNSS) receiver and an inertial measurement unit (IMU), provides the vehicle's trajectory and attitude for generating the georeferenced 3D point clouds (Figure 3)[4]. The relative precision of the points can reach the subcentimeter level, whereas their absolute accuracy depends on the GNSS/IMU integrated navigation solution.
2.2.1   GNSS/IMU integrated navigation
GNSS provides geographical position and velocity data of a GNSS receiver antenna by employing a constellation of orbiting satellites. The most popular GNSS systems include the global positioning system (GPS) (United States), global navigation satellite system (GLONASS) (Russia), COMPASS/BeiDou navigation system (BDS) (China), and Galileo (European Union).
The position is computed by trilateration from the signals of the satellites within a clear view of the receiver antenna. Generally, at least four satellites must be visible for a positional fix (three position coordinates plus the receiver clock offset), as shown in Figure 4, and the accuracy of the GNSS ideally increases as more satellites become available. However, common error sources, for example, receiver noise, atmospheric delays, multipath, and satellite clock timing, usually limit GNSS receivers to a positioning accuracy of 1-2m. Obstructions such as buildings or trees can block the satellite signal, which results in unreliable navigation. Methods such as post-processing, precise point positioning (PPP), and real-time kinematic (RTK)[7] have been proposed to improve the accuracy of GNSS.
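The need for four satellites can be illustrated with a small solver: each pseudorange constrains the receiver position and the receiver clock offset together, so there are four unknowns. The sketch below is a textbook Gauss-Newton iteration on noise-free pseudoranges, offered only as an illustration. The function name, the satellite coordinates in the usage note, and the choice to express the clock offset in meters are assumptions, and the error sources listed above are not modeled.

```python
import numpy as np

def solve_position(sat_positions, pseudoranges, iterations=15):
    """Gauss-Newton estimate of receiver position (m) and clock offset (in m).

    Model (assumed, noise-free): pseudorange_i = ||x - s_i|| + b
    """
    est = np.zeros(4)  # x, y, z, clock offset; start at the Earth's centre
    for _ in range(iterations):
        diff = est[:3] - sat_positions            # (N, 3) receiver minus satellite
        ranges = np.linalg.norm(diff, axis=1)     # geometric ranges at the estimate
        residual = pseudoranges - (ranges + est[3])
        # Jacobian of the model: unit line-of-sight vectors and a column of ones
        jac = np.hstack([diff / ranges[:, None], np.ones((len(ranges), 1))])
        delta, *_ = np.linalg.lstsq(jac, residual, rcond=None)
        est = est + delta
    return est[:3], est[3]
```

A usage example with four hypothetical satellites:

```python
sats = np.array([[26560e3, 0.0, 0.0],
                 [20000e3, 17000e3, 0.0],
                 [20000e3, -10000e3, 14000e3],
                 [20000e3, -5000e3, -16000e3]])
receiver = np.array([6371e3, 0.0, 0.0])
rho = np.linalg.norm(sats - receiver, axis=1) + 3000.0  # 3000 m clock offset
position, clock_offset = solve_position(sats, rho)
```

With fewer than four satellites the Jacobian has fewer rows than unknowns, so the four quantities cannot be determined uniquely.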
An inertial navigation system (INS) computes the relative position of an object over time using rotation and acceleration measurements from an IMU, which can measure the relative movement in 3D space. An IMU contains six complementary sensors, which are arrayed on three orthogonal axes. An accelerometer and a gyroscope, which measure linear acceleration and angular rate, respectively, are coupled on each of the three axes. Based on the linear acceleration and angular rate measurements, the INS can calculate the position and velocity for all three axes. In addition, the IMU can provide an angular solution, which can be translated into a local attitude (roll, pitch, and azimuth) solution in the INS[8].
When using an IMU for navigating in a 3D space, hundreds/thousands of samples are acquired per second, and consequently, many errors get accumulated. Thus, without an external reference, an uncorrected INS system can quickly drift from the true position. The INS can estimate the error of the IMU measurements using a mathematical filter if an external reference is provided by the GNSS. The GNSS provides an absolute set of coordinates that are used as the starting points and continuous positions and velocities for updating the INS filter estimates. The integration of the GNSS and INS therefore enhances the overall performance in providing a more powerful navigation solution. For example, the INS system can be effectively used for navigating for longer periods when the GNSS is unreliable due to signal obstructions.
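The drift and its correction can be illustrated with a one-dimensional toy model. The sketch below integrates a biased accelerometer by itself (dead reckoning) and, in parallel, runs a two-state Kalman filter that is corrected by a GNSS position once per second. It is a simplified loosely coupled example, not the filter of any cited system. The bias, noise levels, update rates, and the noise-free GNSS measurement are assumptions chosen to keep the example deterministic.

```python
import numpy as np

def simulate(duration=60.0, dt=0.01, accel_bias=0.05, gnss_period=1.0):
    """Return (dead-reckoning error, GNSS-aided error) in metres after `duration` s."""
    f = np.array([[1.0, dt], [0.0, 1.0]])        # position/velocity transition
    b = np.array([0.5 * dt ** 2, dt])            # effect of acceleration over dt
    q = 1.0 ** 2 * np.outer(b, b)                # assumed accel noise std: 1 m/s^2
    h = np.array([[1.0, 0.0]])                   # GNSS observes position only
    r = np.array([[1.0]])                        # assumed GNSS variance: 1 m^2

    true_p, true_v = 0.0, 10.0                   # vehicle moves at constant 10 m/s
    dead_reckoning = np.array([0.0, 10.0])
    x, p = np.array([0.0, 10.0]), np.eye(2)
    steps = int(round(duration / dt))
    every = int(round(gnss_period / dt))

    for k in range(1, steps + 1):
        true_p += true_v * dt
        a_meas = 0.0 + accel_bias                # true acceleration is zero
        dead_reckoning = f @ dead_reckoning + b * a_meas
        x = f @ x + b * a_meas                   # INS prediction
        p = f @ p @ f.T + q
        if k % every == 0:                       # GNSS correction
            z = np.array([true_p])
            gain = p @ h.T @ np.linalg.inv(h @ p @ h.T + r)
            x = x + gain @ (z - h @ x)
            p = (np.eye(2) - gain @ h) @ p
    return abs(dead_reckoning[0] - true_p), abs(x[0] - true_p)
```

With these assumed values, the uncorrected solution drifts by roughly 0.5 × 0.05 × 60² = 90 m after one minute, whereas the GNSS-aided estimate stays close to the true position. A production filter would typically also estimate the accelerometer bias itself, which this sketch deliberately omits.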
2.3 Laser scanner
In the MLS system, the point cloud is generated by laser scanners, also known as LiDAR, which estimate the distance to an object by emitting laser light and measuring the time required for the light to return to the sensor. 3D LiDARs can be used for plane mapping, obstacle avoidance during navigation, and urban area modeling. LiDARs are mainly used in outdoor environments, especially in fields such as geodesy, meteorology, geology, and military.
Usually, the optical pulse or the wave can only be used to measure the distance in a specific direction. LiDARs normally include oscillating mirrors, which can perform scanning in multiple directions. According to the specific oscillation mechanism, LiDARs can scan the surrounding environment in both 2D and 3D.
A rotating LiDAR has a 360° view. With each rotation, it can scan points along a cone originating from the sensor, thereby resulting in a single circular scan line. This cone angle is varied by a predefined amount after each full rotation, with a maximum absolute angle such that the sensor is unable to scan the area directly above or under it.
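The measurement principle and scan geometry described above can be sketched in a few lines: the range follows from half the round-trip time of the pulse, and each return is converted from range, azimuth, and elevation into Cartesian sensor coordinates. This is an illustrative sketch only. The axis convention (x forward at zero azimuth, z up) and the function names are assumptions, and real sensors apply their own conventions and per-beam calibration.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s

def range_from_time_of_flight(round_trip_seconds):
    """Distance to the target: the pulse travels there and back."""
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0

def polar_to_cartesian(distance, azimuth_deg, elevation_deg):
    """Convert one return to (x, y, z) in the sensor frame (assumed axis convention)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    horizontal = distance * math.cos(el)   # projection onto the horizontal plane
    return (horizontal * math.cos(az), horizontal * math.sin(az), distance * math.sin(el))
```

For a fixed elevation angle, sweeping the azimuth through 360 degrees traces the cone described above; a multi-beam sensor repeats this for each of its fixed elevation angles.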
The Velodyne VLP-16 and HDL-32 are among the most affordable commercial multi-beam sensors, and their main specifications are provided in Table 1. The VLP-16 is more compact and lightweight than the HDL-32, which costs more and offers better scanning performance.
Table 1 Manufacturer specifications (VLP-16 and HDL-32 sensors)[9]
Parameter | VLP-16 | HDL-32
Laser/detector pairs | 16 | 32
Range | 1m to 100m | 1m to 70m
Accuracy | ±3cm | ±2cm
Data | Distance/calibrated reflectivity | Distance/calibrated reflectivity
Data rate | 300000 points/s | 700000 points/s
Vertical FOV | 30°: [-15°, +15°] | 41.3°: [-30.67°, +10.67°]
Vertical resolution | 2.0° | 1.33°
Horizontal FOV | 360° | 360°
Horizontal resolution | 0.1° to 0.4° (programmable) | 0.08° to 0.35° (programmable)
Size | 103mm×72mm | 85.3mm×149.9mm
Weight | 0.83kg | 1.3kg
The main advantages of LiDAR are as follows:
(1) Different types of LiDARs can provide different measurement ranges, from a few centimeters to more than 100 meters. Therefore, it can be used in both indoor and outdoor environments.
(2) The horizontal aperture of LiDAR is usually between 90 and 360 degrees.
(3) The angle resolution of LiDAR is usually less than 1 degree.
(4) The measurement error of LiDAR is low and is usually constant (for short distances) or linear in the measured distance.
(5) LiDAR can provide both medium and high sampling rates, which is crucial for application in a dynamic environment; the sampling rate is usually adjustable from 10Hz to 20Hz.
The main disadvantage of LiDAR is that it is expensive. In addition, LiDAR’s power consumption is high (more than ten times that of a camera), and its scanning performance declines in the presence of fog, heavy rain, or dust.
2.3.1   SLAM-based navigation
GNSS positioning achieves high accuracy mainly when the ground receiver has a clear line of sight to the satellites. However, GNSS signals are vulnerable to external interference, and positioning can fail when the platform is in a complex environment, such as among high-rise buildings, on steep slopes, or indoors. The positioning accuracy is consequently degraded. Therefore, alternative techniques must be developed to address this issue. SLAM, arguably one of the most important algorithms in robotics and 3D vision, is applicable in GNSS-denied environments.
LiDAR has been an important sensor in robot navigation for obstacle avoidance and path planning. Meanwhile, LiDAR-based SLAM methods, such as feature-based registration method[10], iterative closest point[11] (ICP), and normal distribution transform[12] (NDT), have been proposed for estimating the transformation between two sets of overlapped point clouds.
A feature-based registration method is commonly used for the initial transformation estimation between two point clouds. This type of method first finds the key features in the two point clouds. Next, it computes descriptors for these key features and matches them. Finally, it calculates the transformation matrix from the corresponding key features.
ICP converges to a local minimum by iteratively minimizing a squared error. Depending on the error metric, it can be categorized as point-to-point, point-to-plane, or plane-to-plane ICP. For point-to-point ICP, the correspondence pairs are built by pairing each point in the first point cloud with the closest point in the second point cloud. Subsequently, the transformation between the two point clouds is computed by minimizing the sum of the squared distances over all correspondence pairs, and the two steps are repeated until convergence.
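The two alternating steps of point-to-point ICP can be sketched compactly. The example below uses a brute-force nearest-neighbor search and the standard SVD-based closed-form solution for the rigid transformation; it is a minimal sketch for small point sets, not the implementation used in any cited work. The function names and the fixed iteration count are assumptions, and practical implementations add a spatial index, outlier rejection, and a convergence test.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rotation r and translation t such that r @ src_i + t ~ dst_i."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    h = (src - src_c).T @ (dst - dst_c)           # 3x3 cross-covariance
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))        # guard against reflections
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return r, dst_c - r @ src_c

def icp_point_to_point(src, dst, iterations=30):
    """Align src (N, 3) to dst (M, 3); returns the accumulated rotation and translation."""
    r_total, t_total = np.eye(3), np.zeros(3)
    cur = src.copy()
    for _ in range(iterations):
        # Step 1: pair every current point with its closest point in dst
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        nearest = dst[d2.argmin(axis=1)]
        # Step 2: minimise the summed squared distance over all pairs
        r, t = best_rigid_transform(cur, nearest)
        cur = cur @ r.T + t
        r_total, t_total = r @ r_total, r @ t_total + t
    return r_total, t_total
```

The pairwise distance matrix in step 1 grows with the product of the two cloud sizes, which is why large MLS clouds require a k-d tree or a similar structure. Because only the closest point is used, the result depends on the initial alignment, which is the reason a feature-based or odometry-based initial guess is usually supplied.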
NDT employs statistical models of the points to estimate the possible alignment between the two point clouds.
LiDAR SLAM may fail when the LiDAR point clouds are sparse. Integrating a camera with the LiDAR can improve the performance[10,13]. Camera-based visual odometry can provide an initial estimate for ICP and correct the motion distortion of the point clouds caused by the different receiving times of the points. Scherer et al. estimated the ego-motion of the system by integrating images and IMU data, and then refined the ego-motion estimate using LiDAR data[14]. Droeschel et al. developed a 3D multi-resolution map for robot navigation by fusing LiDAR data with a 3D map[15].
As the sensor scans its surroundings, the platform may move and rotate. Consider an extreme example: if the platform counter-rotates at the same angular velocity as its rotating scanner, all the points will be located on the same vertical plane in the world frame. The full scan cannot be accurately mapped from the sensor frame to the world frame with a single affine transform, because each point is taken at a different moment in time and thus has its own frame in relation to the world frame. Precise robot poses at various times during the scanning process allow the distortion to be corrected by associating a different affine mapping from the sensor frame to the world frame with each group of points acquired.
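This per-point correction is often called deskewing. The sketch below shows the idea in 2D: the platform pose is interpolated at each point's timestamp and that pose, rather than a single pose for the whole sweep, is used to map the point into the world frame. It is a simplified illustration under stated assumptions: poses are (x, y, yaw) triples, only the poses at the start and end of the sweep are known, and linear interpolation is adequate, which holds only for small rotations without angle wrap-around.

```python
import numpy as np

def deskew(points, stamps, pose_start, pose_end):
    """Map sensor-frame points (N, 2) to the world frame using per-point poses.

    stamps: capture time of each point as a fraction of the sweep, in [0, 1].
    pose_start, pose_end: (x, y, yaw) of the platform at the sweep boundaries.
    """
    pose_start = np.asarray(pose_start, dtype=float)
    pose_end = np.asarray(pose_end, dtype=float)
    out = np.empty((len(points), 2))
    for i, (pt, s) in enumerate(zip(points, stamps)):
        x, y, yaw = (1.0 - s) * pose_start + s * pose_end   # interpolated pose
        c, sn = np.cos(yaw), np.sin(yaw)
        out[i] = [x + c * pt[0] - sn * pt[1], y + sn * pt[0] + c * pt[1]]
    return out
```

As a usage example, suppose a static landmark sits at (10, 0) in the world frame while the platform drives from x = 0 to x = 2 during one sweep. The sensor observes the landmark at (10, 0), (9, 0), and (8, 0) at the start, middle, and end of the sweep. Mapping all three with the start pose would smear the landmark over 2 m, whereas `deskew` places all three observations at (10, 0).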
Monocular visual odometry (VO) has been well explored for several years, and robust and mature solutions exist, such as MonoSLAM[16], ORB-SLAM[17], and SVO[18]. Graph-based optimization and loop closure can also be applied in visual SLAM methods, as in RTAB-Map[19,20]. However, visual SLAM methods are limited under changing weather and insufficient lighting conditions. To improve accuracy and robustness under such conditions, some studies have proposed adding further cameras to the system. Some visual solutions have also been integrated with an inertial measurement unit (IMU), such as Geneva's work[21], AbolDeepIO[22], VINS[23], and its advanced version, VINS-Fusion[24]. VINS-Fusion fuses local states (camera, IMU, LiDAR, etc.) with global sensors (GPS, magnetometer, barometer, etc.) and achieves globally drift-free and locally accurate pose estimation. The fusion of local estimations from the existing VO/VIO approaches and global sensors is depicted in Figure 5.
Owing to the demand for high-accuracy maps in autonomous driving, the need for robustness in dynamic environments, and the availability of dense point cloud data, LiDAR-based SLAM has been widely investigated in the field of autonomous driving. Furthermore, the price of multi-beam LiDARs has dropped significantly in recent years. Several researchers have also investigated the integration of LiDARs with other sensors (for example, cameras and IMUs) (Figure 6).
A straightforward solution for integrating lasers with cameras is to use the VO result as an initial guess for the ICP or GICP[25] pipeline, as demonstrated in the work of Pandey[26] and Zhang[13]. Zhang combined visual odometry and LiDAR odometry for mapping tasks. Some methods treat color information as the fourth channel of a 3D point in the subsequent ICP pipeline[27,28,29]. Another way to fuse the information from cameras and lasers is to use the LiDAR information to enhance the visual features. Graeter et al. proposed LIMO, which can track camera features and estimate camera motion based on the LiDAR point clouds[30].
Furthermore, some studies have focused on LiDAR-IMU fusion, a topic that has not yet been sufficiently investigated[10,31]. Ye et al. introduced a tightly coupled LiDAR-IMU fusion method that jointly minimizes the cost derived from the LiDAR and IMU measurements[32]. Geneva et al. presented LIPS, which uses a singularity-free plane factor based on the closest-point plane representation and fuses it with IMU measurements in a graph-based optimization framework[21].
3 Processing of point cloud data
Several studies have focused on the processing of point cloud data. There are five categories of these methods, namely, feature extraction, registration, completion, semantic segmentation, and object/instance extraction. We will discuss these methods in detail in this section.
3.1 Point clouds completion
With the increasing popularity of data acquisition devices, such as laser scanners and RGB-D cameras, even complicated objects can be digitized with impressive accuracy. Regardless of the digitizing technology, however, limitations pertaining to environmental conditions, inter-object occlusion, and sensor capabilities still constrain how completely a mobile laser scanner captures scene depth. Incomplete data introduce uncertainty into subsequent processing. To avoid this, a complete version of the data is needed. In simple cases, we can re-scan to obtain new data. However, obtaining a complete version of the 3D data by re-scanning is sometimes challenging because of occlusion by objects or because the scanning device cannot access the area to be observed; therefore, the data must be completed manually or automatically.
This need has given rise to the research area of completing the missing 3D information in MLS data and other forms of 3D data. Existing methods for 3D data completion are categorized into geometry-based, data-driven, and learning-based approaches.
3.1.1   Geometry-based approaches
Geometry-based approaches estimate shapes using geometric cues from the input, where the missing regions are inferred from the observation areas. These approaches are effective in completing small holes and regular shapes within a reasonable time cost.
Many previous works on surface reconstruction have reported the generation of smooth interpolations to fill holes in locally incomplete scans. The performance of such surface reconstruction depends on the type of environment that the MLS data represent. The most common scenario is the traffic scene, in which the road surface is relatively easy to reconstruct.
A road surface reconstruction method was proposed to process the raw data and produce a 3D model while ensuring that the details are preserved[33]. Another method[34] recognized curbs and reconstructed the curb information missing due to occlusion; it also reconstructed road surfaces and pavements with centimeter precision. For indoor mapping, an incremental surface growing-based method[35] was proposed to create a triangular mesh and fill the holes in large, noisy LiDAR data from an indoor environment.
Some other methods were proposed to reconstruct the surface in various ways; some reconstructed the operators for surface approximation[36,37]; others provided algorithms to fill the holes on the surface[38,39,40]. Road surface reconstruction approaches, however, fail when the surface of the object is severely damaged due to occlusion.
Symmetry is a common characteristic of real-world objects such as buildings, and it is commonly exploited to analyze and process the computational representations of 3D objects from the real world. Symmetry-based methods identify repeating structures and symmetry axes in order to copy existing parts into incomplete regions.
Some studies have focused on small objects, such as household objects[41,42,43], and others on large objects, such as buildings[44]. Most of these objects are not symmetrical as a whole; only parts of them are. For these kinds of 3D objects, some methods[45,46,47] have been proposed to implement symmetry-based completion of the entire object. Thrun et al. described a technique for segmenting objects into parts characterized by different symmetries and used these parts to map the partial 3D shape model into an occluded space[45]. Another general approach[46] was proposed to efficiently extract a compact representation of the Euclidean symmetries of the object to capture essential and high-level information about the object and, in turn, enable further processing, including shape symmetrization and segmentation.
Regularity-based completion approaches are widely used for completing 3D building models[48,49,50], as buildings are among the most regular objects in the real world. These methods complete data using various regularity principles, such as performing Fourier analysis of each scanline to fill the holes and generate meshes[49], or exploiting the large-scale repetitions found in building scans and subsequently using them to remodel the input[48].
3.1.2   Data-driven approaches
Because generating perfectly precise and complete data can be challenging, data-driven approaches complete shapes by matching the incomplete object with the template models in a shape database. The main rationale of this type of approach is to retrieve the 3D model that is most similar to the input query; this is feasible for single objects, such as vehicles and furniture, but not for large objects such as buildings.
Two methods[51,52], along with datasets of thousands of models, are provided for 3D shape retrieval, where the defective scanned data are replaced with the retrieved model. In 3D indoor reconstruction, such replacement is performed by classifying each object in the scene and replacing each incomplete object with a complete one from the dataset[53].
Some studies also deem that simply replacing the incomplete 3D object with a complete one can lead to inaccuracies in the final reconstructed model, and they suggest that the 3D shape be completed by retrieving and assembling all the object parts[51,54,55,56].
3.2 Feature extraction (line, plane, and supervoxel)
Efficiently processing massive and complex point cloud data is a challenge. There are two main approaches for this purpose. The first projects the high-density point cloud data into 2D images and then applies image processing techniques[57,58,59]. The second processes the point cloud data in the feature space. Line and plane features contain abundant geometric information of point clouds, especially in artificial environments. These features are generally parallel, orthogonal, or coplanar, and they can effectively reduce the complexity of point clouds without losing their main geometric information. Therefore, line and plane extraction are widely used in target recognition[60], point cloud registration[61], reconstruction[10,13], and so on.
Line extraction can be classified into two categories. In the first category, the real-world object is projected into 2D images, then LSD[62] or EDLines[63] is used to extract lines from these images, and finally these lines are back-projected into 3D space to obtain the 3D lines. Jain et al. extracted the straight lines of a scene from multi-view images of that scene, returned these lines to 3D space based on the visual information, and thereby obtained the 3D straight lines[64]. Lin et al. proposed a line-half-planes (LHP) model to extract 2D lines by projecting 3D point clouds onto multi-view images and then obtaining 3D lines by projecting the 2D lines back into 3D space[58]. The advantage of projecting point clouds into images is that the existing 2D line extraction algorithms can be fully utilized. The disadvantage is that large-scale point clouds take a considerable amount of time to process. In the second category, line features are extracted directly from the point clouds. Daniels et al. used a robust moving least-squares method to fit the surface locally and then calculated a set of smooth curves aligned along the edges to identify the line features in the point cloud; finally, they were able to produce a set of complete smooth feature curves[65]. Kim et al. used a moving least-squares approximation to estimate the local curvatures and their derivatives at a point using an approximating surface[66]. Lin et al. presented a facet segmentation-based line segmentation method, which can be directly used on the point cloud[67]. This method can extract more complete and more precise line segments than the projection-based method[58].
Several different algorithms have been proposed for plane extraction from 3D point clouds. Traditional plane extraction techniques can be generally categorized into region-growing[68,69,70], Hough transform[71,72], and model-fitting methods[73,74,75]. However, these methods do not fully employ the geometric constraints of the point clouds. Lin et al. proposed a method based on energy minimization to reconstruct the planes, leveraging a constraint model that requires minimal prior knowledge to implicitly establish relationships among the planes[76]. To balance accuracy and efficiency, El-Sayed et al. proposed a plane detection method based on octree-balanced density down-sampling and adaptive plane extraction[77]. Nguyen et al. utilized scan profile patterns and the planarity values between different neighboring scan profiles to detect and segment planar features in sparse and heterogeneous MLS point clouds[78]. Kwon et al. proposed a plane extraction algorithm with decomposition, expansion, and merging stages; because the expansion stage is inserted between the conventional decomposition and merging stages, the algorithm works effectively even on low-density point clouds[79].
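As an illustration of the model-fitting category, the following sketch fits a single plane with RANSAC: it repeatedly samples three points, forms the plane through them, and keeps the plane supported by the most points. This is a generic textbook version, not the algorithm of any cited study. The distance threshold, iteration count, and fixed random seed are assumptions, and extracting several planes would require removing the inliers and repeating the search.

```python
import numpy as np

def ransac_plane(points, threshold=0.05, iterations=200, seed=0):
    """Fit one plane n . p + d = 0 to (N, 3) points; returns ((n, d), inlier mask)."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    best_plane = None
    for _ in range(iterations):
        a, b, c = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(b - a, c - a)
        norm = np.linalg.norm(normal)
        if norm < 1e-12:              # the three samples are collinear, skip them
            continue
        normal = normal / norm
        d = -normal @ a
        inliers = np.abs(points @ normal + d) < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_plane = inliers, (normal, d)
    return best_plane, best_inliers
```

The threshold should reflect the ranging noise of the scanner; the 5cm default used here is only a placeholder. Points flagged as inliers can be refined afterwards with a least-squares fit.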
Line and plane extraction are based on point-wise processing. To process point clouds faster, supervoxels were proposed. Supervoxels, the 3D analog of superpixels, are a promising alternative that markedly reduces the redundancy in the information, thereby enabling computationally efficient, fully automatic operation with minimal loss of information. Using supervoxels, a point cloud is divided into several patches and then processed in a patch-wise manner rather than a point-wise manner. Voxel cloud connectivity segmentation (VCCS) is a commonly used supervoxel generation method[80,81]. Lin et al. formalized the supervoxel segmentation problem as a subset selection problem that is optimized efficiently by a heuristic method utilizing local information for each point[82]. Zai et al.[83] proposed an improved supervoxel algorithm, inspired by the point cloud segmentation method[67], to generate supervoxels with adaptive sizes. Wang et al. proposed an efficient 3D object detection method by integrating the supervoxel algorithm with a Hough forest framework[84].
3.3 Matching and registration
3D point cloud registration, a key issue in 3D data processing and urban 3D reconstruction, is usually treated as a rigid registration problem, which can be solved by estimating transformation parameters with six degrees of freedom (6DoF). To this end, numerous methods have been proposed for a variety of applications.
The ICP[11] algorithm alternates between estimating the point correspondences and estimating the transformation matrix (Figure 7). Many variations of this method[85,86,87,88] have been proposed in the literature. However, ICP suffers from certain limitations, such as (1) explicit estimation of closest-point correspondences, which leads to a complexity that scales quadratically with the number of points, (2) sensitivity to initialization, and (3) difficulty of integration with deep learning frameworks owing to the issue of differentiability. These methods cannot guarantee the global optimality of the solutions. Therefore, several researchers have focused on optimization algorithms to estimate relative transformations[89,90,91].
Pioneering studies on handcrafted 3D feature descriptors were mostly inspired by their 2D counterparts. Many approaches, including SHOT[92], RoPS[93], TOLDI[94], FPFH[95], and ACOV[96], estimate a unique local reference frame (LRF), which is not robust to noise. Therefore, they are not well suited to large-scale MLS point clouds. With the development of deep learning methods for the geometric representation of 3D data, learning-based 3D local feature descriptors are often applied to point cloud registration. Some studies[97,98,99,100,101,102,103,104] have focused on learning robust local features and then extracting matching correspondences using strategies such as RANSAC; finally, the extracted correspondences are used to estimate the transformation matrix. Other studies[105,106,107,108] have focused on constructing end-to-end, network-based local feature learning methods to achieve point cloud registration. Still other studies have proposed using global information to regress rotation matrices and translation vectors[109,110].
Following RANSAC, Aiger et al. proposed a randomized alignment approach, which uses planar congruent sets to compute optimal global rigid transformation[111,112]. However, these RANSAC-like methods are point-level operations, which may easily be sub-optimal when computing transformation.
3.4 Semantic labeling and segmentation
Semantic labeling and segmentation of point clouds entails understanding and recognizing the meaningful entities in a scene by assigning each point to an entity. Examples of entities in an urban scene include sky, buildings, facades, roads, windows, doors, poles, and pedestrians. In this section, we review the classification and semantic segmentation methods that focus on terrestrial laser scanning (TLS) and MLS point clouds. Notably, a comprehensive review of terrestrial mobile laser scan processing, covering semantic segmentation, feature extraction, and object recognition, can be found in Che’s work[113].
3.4.1   Feature-based methods
Feature-based methods label each point in the point cloud by extracting features and joining them into a feature vector. A trained classifier is then employed to perform the labeling. Hackel et al. reduced the computation time and addressed the strongly varying point densities of TLS point clouds[114]. TLS and MLS point clouds consist of millions of points, and as such, labeling each point is computationally intensive. Weinmann et al. improved classification results by using five different definitions of the neighborhood when selecting the optimized features in the feature extraction process[115]. Hu et al. used gridded segmentation to address the computational challenges and achieved good segmentation results without relying on computationally expensive representations of the scene[116]. Segmentation and classification were conducted simultaneously in Zhao’s work[117], where each segment was classified using its geometric properties and the homogeneity of each segment was evaluated using the object class. Spatial smoothing of neighboring elements can improve the segmentation results. Probabilistic models, for example, the Markov random field (MRF) and conditional random field (CRF), are used for this purpose. Lu et al. assigned semantic labels to each point by calculating the node potentials and edge potentials using the distance between the points, with the contextual relationships between the points given by the MRF[118]. Another method[119] employed CRFs to propagate contextual information between neighboring entities; it performed discrete, multi-label classification by learning high-dimensional parameters of CRFs, and the higher-order models were found to be robust in preserving salient labels.
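A common family of handcrafted point features is derived from the eigenvalues of the covariance matrix of a point's local neighborhood. The sketch below computes three such features (linearity, planarity, and scattering) for one neighborhood; the values would then be joined into the feature vector passed to a classifier. This is an illustrative example of the feature-vector idea, and it should not be read as the exact feature set of any study cited above. The function name and the small constant that guards against division by zero are assumptions.

```python
import numpy as np

def eigen_features(neighborhood):
    """Dimensionality features of an (N, 3) neighborhood from covariance eigenvalues."""
    cov = np.cov(neighborhood.T)                 # 3x3 covariance of the coordinates
    l3, l2, l1 = np.linalg.eigvalsh(cov)         # ascending order, so l1 >= l2 >= l3
    l1 = max(l1, 1e-12)                          # avoid division by zero
    return {
        "linearity": (l1 - l2) / l1,             # near 1 for pole-like structures
        "planarity": (l2 - l3) / l1,             # near 1 for facades and road surface
        "scattering": l3 / l1,                   # near 1 for volumetric clutter
    }
```

For points sampled along a straight line the linearity is 1, and for points spread evenly over a flat square patch the planarity is 1. The neighborhood definition (radius or number of neighbors) strongly influences these values, which is why the choice of neighborhood matters for classification.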
Previously, handcrafted features were primarily used for visualization tasks. Handcrafted features are designed to be invariant to certain transformations; however, they are usually geared toward a specific task and require a considerable amount of human intervention. The feature-based methods in this category rely heavily on handcrafted features, which have since been outperformed by semantic features.
3.4.2   Deep learning methods
Deep learning techniques learn features that can be applied to multiple tasks, and the learning happens in an end-to-end manner, thereby requiring little human intervention. Convolutional neural networks (CNNs) have proven effective on data with a regular structure, such as the grid of pixels in a 2D image. However, point clouds are irregular, so deploying CNNs directly on them is challenging and remains an active research area. The segmentation of point clouds has therefore taken the following directions.
In general, deep learning methods in 3D can be categorized into Volumetric CNN, Multiview CNN, and Point-based methods corresponding to the popular 3D data representations of Volumetric, Multiview images, and Point Cloud, respectively.
Volumetric CNNs operate on volumetric data, which is often represented as a 3D binary voxel grid. 3D-ShapeNets[120] represents a 3D shape as a probability distribution of binary variables on a 3D voxel grid. The voxel grid makes it possible to apply 3D convolution operations. In Charles’s work[121], two models were proposed: one predicts objects from partial sub-volumes and addresses overfitting using auxiliary training tasks, and the other convolves the 3D shapes with an anisotropic probing kernel. In addition, VoxelNet[122] was used as a 3D CNN on voxels for real-time object recognition. VoxelNet incorporates normal vectors of the object surfaces to the voxels to improve the discrimination capability. Although techniques based on volumetric CNNs have reliable performances, they suffer from limitations such as the introduction of quantization artifacts, high memory consumption, and high computational cost owing to the sparsity of the occupancy grid.
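As a minimal illustration of the binary voxel grid on which volumetric CNNs operate, the following Python sketch converts a list of points into a sparse set of occupied voxel indices. The function name `voxelize` and the 0.5 m voxel size are illustrative assumptions and are not taken from any of the cited works.

```python
import math
from typing import Iterable, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxelize(points: Iterable[Point], voxel_size: float) -> Set[Voxel]:
    """Return the indices of occupied voxels (a sparse binary occupancy grid)."""
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    occupied = set()
    for x, y, z in points:
        # floor() maps each coordinate to the index of the cell that contains it,
        # including for negative coordinates.
        occupied.add((
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        ))
    return occupied


# Four example points (hypothetical values, in meters); the first two share a voxel.
example_points = [(0.05, 0.02, 0.01), (0.07, 0.03, 0.04), (0.95, 0.10, 0.20), (-0.10, 0.0, 0.0)]
occupied_voxels = voxelize(example_points, voxel_size=0.5)
```

Storing only the occupied indices sidesteps part of the memory cost noted above; a dense 3D tensor for a CNN would instead allocate every cell, most of which are empty in an urban scan.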
Projecting the 3D point cloud onto a 2D grid makes it possible to leverage the high performance of 2D segmentation algorithms. These techniques are based on traditional CNNs that operate on 2D images, and they can map a 3D object into a collection of 2D images taken from different angles. Compared to their volumetric counterparts, multi-view CNNs perform better because multi-view images contain richer information than 3D voxels. Su et al. conducted the first study on multiview CNNs for object recognition and achieved state-of-the-art accuracy[123]. Leng et al. proposed a stacked local convolutional autoencoder (SLCAE) for the 3D object retrieval task[124]. In Tosteberg’s work[125], 3D point clouds were projected onto a 2D image, and the image was semantically segmented using a 2D semantic classifier. This operation leads to a loss of valuable information in the transformation from 3D to 2D because the 3D data are richer in content (for example, depth information). In Wu’s work[126], a spherical projection was used to project the point clouds onto a 2D grid in a pipeline containing 2D CNNs and CRFs. The CNN of the pipeline performs the segmentation and the CRF refines it. “Auto-labeling” is an approach of transferring high-quality image-based semantic segmentation from reference cameras to point clouds[127]. A fully convolutional network (FCN) was used for pixel-wise semantic segmentation of roads from top-view images of the point cloud[128]. Lawin et al. employed a similar approach and went further to investigate the significance of the surface normal, depth, and color for the architecture[129]. The main drawback of the aforementioned methods lies in the information loss that occurs during the 3D-to-2D projection process.
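The spherical projection mentioned above can be sketched as follows. This is a generic formulation under stated assumptions, not the implementation of the cited work: the image size (360 × 64) and the vertical field of view (±16°) are hypothetical sensor parameters, and the names `spherical_pixel` and `range_image` are illustrative.

```python
import math


def spherical_pixel(x, y, z, width, height, fov_up_deg, fov_down_deg):
    """Map one 3D point to (row, col, range), or None if it lies outside the vertical FOV."""
    rng = math.sqrt(x * x + y * y + z * z)
    if rng == 0.0:
        return None
    azimuth = math.atan2(y, x)        # horizontal angle in [-pi, pi]
    elevation = math.asin(z / rng)    # vertical angle in [-pi/2, pi/2]
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    if not fov_down <= elevation <= fov_up:
        return None
    # Columns cover the full 360 degrees; rows run from the top of the FOV downward.
    col = int((azimuth + math.pi) / (2.0 * math.pi) * width) % width
    row = min(int((fov_up - elevation) / (fov_up - fov_down) * height), height - 1)
    return row, col, rng


def range_image(points, width=360, height=64, fov_up_deg=16.0, fov_down_deg=-16.0):
    """Build a sparse range image {(row, col): range}, keeping the nearest point per pixel."""
    image = {}
    for x, y, z in points:
        pixel = spherical_pixel(x, y, z, width, height, fov_up_deg, fov_down_deg)
        if pixel is None:
            continue
        row, col, rng = pixel
        if rng < image.get((row, col), math.inf):
            image[(row, col)] = rng
    return image


# Two points on the same ray and one point directly overhead (outside the FOV).
sparse_image = range_image([(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 0.0, 1.0)])
```

The example also makes the information loss concrete: the two points on the same ray fall into one pixel, and only the nearer one survives the projection.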
Direct processing[130,131] of the 3D point cloud is also very popular. Point-based methods were pioneered by PointNet[132]. Because a point cloud is unstructured, irregular, and unordered, it is often converted into volumetric shapes and multi-view images, which are then processed using volumetric CNNs and multi-view CNNs, respectively. However, many methods can be applied directly to the point cloud in an end-to-end manner using a combination of symmetric functions: a multilayer perceptron (MLP) shared by all the input points computes per-point features, and the global feature is extracted using max pooling, which is a symmetric function. PointNet++[133] extended PointNet to include local dependency by applying PointNet hierarchically on local regions. Several other methods[134,135,136] were introduced to improve the local dependency computations. PointCNN[135] applies X-transformations on local regions before applying PointNet-like MLPs. VoxelNet[134] processed the point clouds directly to achieve object detection by dividing the provided input into voxels and using the points in each voxel to compute the feature vectors for the voxels; this process is applied hierarchically in stacked voxel feature encoding layers. Notably, region proposals are also used for object detection. DGCNN[136] represented a point cloud as a directed graph in which each point is a node connected to its neighboring points; a convolution-like operation, EdgeConv, is applied to neighboring pairs of points to exploit the local geometry. Huang et al. proposed a multi-scale feature extraction method that embeds local features into a low-dimensional and robust subspace[137]. SEGCloud[138] transformed the point cloud to voxels because the latter have a regular structure on which CNNs can be deployed. The architecture combines a 3D-FCN, trilinear interpolation, and a CRF to label the 3D point clouds. The processing of urban-scale voxels is computationally intensive. Semantic3D.net[139] is a large-scale benchmark of labeled TLS points that is essential in urban-scale classification and segmentation tasks. OctNet[140] trained a network on different resolutions of voxels to address resolution and computation challenges, and segmented the 3D-colored point clouds in the RueMonge2014 dataset[141] of Haussmannian-style facades into window, wall, balcony, door, roof, sky, and shop classes. Engelmann et al.[142] built their framework upon PointNet[132] by enlarging its receptive field to cater for urban-scale scenes. Landrieu et al. presented an architecture that directly addresses the challenge of semantic segmentation of urban-scale scenes by encoding contextual relationships between object parts in the 3D point cloud[143]. The network first partitions the point cloud into simple shapes called “superpoints” that are then embedded using PointNet[132] for onward segmentation. The superpoints enable the segmentation of large-scale scenes. Xu et al. presented a supervised classification method for LiDAR point cloud semantic labeling[144].
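The role of the symmetric function in PointNet-style networks can be illustrated with a small sketch. The weights below are fixed, hypothetical values standing in for a learned shared MLP (here a single linear layer with ReLU), so the code demonstrates only the order invariance of the global feature and not a trained model.

```python
from typing import List, Sequence

# Hypothetical fixed weights (3 inputs -> 4 features); a real network would learn these.
WEIGHTS = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [-1.0, -1.0, -1.0],
]
BIAS = [0.0, 0.0, 0.0, 1.0]


def shared_layer(point: Sequence[float]) -> List[float]:
    """Apply the same linear layer followed by ReLU to a single point."""
    return [
        max(0.0, sum(w * c for w, c in zip(row, point)) + b)
        for row, b in zip(WEIGHTS, BIAS)
    ]


def global_feature(points: Sequence[Sequence[float]]) -> List[float]:
    """Channel-wise max pooling over the per-point features."""
    per_point = [shared_layer(p) for p in points]
    # max() is symmetric: permuting the points does not change the result.
    return [max(channel) for channel in zip(*per_point)]


cloud = [(0.2, 0.1, 0.0), (0.9, 0.4, 0.3), (0.1, 0.8, 0.5)]
feature = global_feature(cloud)
feature_reordered = global_feature(list(reversed(cloud)))
```

Because every point passes through the same layer and the pooling ignores order, `feature` and `feature_reordered` are identical, which is the property that lets such networks consume unordered point sets.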
There are few annotated large-scale datasets because manual point-wise labeling is time-consuming and demands great effort. This is a major challenge in large-scale classification and semantic segmentation of point clouds because these tasks are mostly supervised. The labeling and segmentation of urban scenes is an active research area, especially with the advent of deep learning technologies. Its major challenges are scaling the existing algorithms or designing novel pipelines to cater to large-scale scenes, and the lack of detailed annotated datasets to serve as benchmarks for classification and segmentation tasks. Currently, deep learning techniques on point clouds are becoming increasingly popular owing to the growing popularity of laser scanners and the fact that they require less preprocessing than both multi-view and volumetric CNNs. Point-based 3D deep learning methods, and other deep learning methods applicable to unstructured data such as social networks, are becoming increasingly popular under the term 'geometric deep learning' introduced in LeCun's work[145].
3.5 Object/instance extraction
3D object detection is crucial for several real-world applications, such as robotics, autonomous driving, and augmented/virtual reality. It locates and recognizes objects in 3D scenes by estimating oriented 3D bounding boxes and semantic labels of the objects from their point clouds.
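To make the detection output concrete, the sketch below computes the eight corners of an oriented 3D bounding box. It assumes a common parameterization (center, size as length/width/height, and a yaw angle about the vertical axis); individual detectors may use other conventions, and the function name `box_corners` is illustrative.

```python
import math
from typing import List, Tuple


def box_corners(center: Tuple[float, float, float],
                size: Tuple[float, float, float],
                yaw: float) -> List[Tuple[float, float, float]]:
    """Return the 8 corners of a box rotated by `yaw` (radians) about the z axis."""
    cx, cy, cz = center
    length, width, height = size
    cos_yaw, sin_yaw = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (-length / 2, length / 2):
        for dy in (-width / 2, width / 2):
            for dz in (-height / 2, height / 2):
                # Rotate the horizontal offset, then translate to the box center.
                corners.append((
                    cx + cos_yaw * dx - sin_yaw * dy,
                    cy + sin_yaw * dx + cos_yaw * dy,
                    cz + dz,
                ))
    return corners


# A car-sized box (hypothetical dimensions, in meters), axis-aligned and rotated by 90 degrees.
axis_aligned = box_corners((0.0, 0.0, 0.0), (4.0, 2.0, 1.5), yaw=0.0)
rotated = box_corners((0.0, 0.0, 0.0), (4.0, 2.0, 1.5), yaw=math.pi / 2)
```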
Range scans involve the use of the spatial coordinates of the 3D point cloud, and thus, they have an advantage over camera images in locating the detected objects. Furthermore, point clouds are robust to changes in illumination. In addition, compared with image detection, object detection in point cloud naturally locates an object in 3D and provides crucial information for use in subsequent tasks, such as in navigation. However, unlike images, 3D point clouds are sparse and have inconsistent point densities owing to the non-uniform sampling in 3D space, limited sensor ranges, and presence of occlusions. Thus, detecting objects from their point clouds continues to be a huge challenge.
Existing object detection methods for point clouds are mainly divided into three categories as follows: (1) Projection-based methods, which project the point clouds into multiple perspective views and then apply image-based object detection methods. (2) Voxelization-based methods, which rasterize the point clouds into a 3D voxel grid and then transform them into regular tensors. (3) Direct methods, which operate on the raw point clouds and predict the bounding boxes directly without further processing.
Projection-based methods project point clouds into perspective views and apply image-based techniques, which may sacrifice critical geometric details[146]. Alejandro et al. developed a multi-cue, multimodal, and multi-view framework for pedestrian detection with handcrafted features and a random forest classifier, which increases the accuracy by a comparatively large margin[147]. Li et al. represented 3D point clouds as a 2D point map and then used a fully convolutional network to simultaneously predict the confidence of the detected objects and their bounding boxes[148]. Chen et al. formulated object detection as the minimization of an energy function that encodes an object size prior, the ground plane, and several depth-informed features, such as point cloud density and distance to the ground[149]. Yang et al. proposed a proposal-free, single-stage 3D object detector, called PIXOR, that estimates the oriented 3D objects from pixel-wise neural network predictions on point clouds[150].
Voxelization-based methods convert irregular point clouds into 3D voxel grids and then apply 3D CNNs for object detection. These methods often fail to leverage data sparsity and suffer from high computation time due to the 3D convolution operations. Dominic et al. proposed an efficient and effective framework to apply the sliding window approach on a 3D point cloud for object detection[151]. They demonstrated that exhaustive window searching in 3D can be made efficient by exploiting the sparsity of the problem, and they proved the mathematical equivalence between sparse convolution and voting. Martin et al. detected 3D objects in point clouds using CNNs constructed from sparse convolutional layers[152]. Chen et al. proposed multi-view 3D networks (MV3D) by using both LiDAR point clouds and images to predict oriented 3D bounding boxes[153]. Li et al. proposed a 3D fully convolutional network for object detection in a point cloud[154]. Zhou et al. proposed a 3D detection network, called VoxelNet, by integrating feature extraction and bounding box prediction into an end-to-end deep network[134]. Daniel et al. presented a method for detecting small and potentially obscured obstacles in vegetated terrain[155]. The novelty of this method is the coupling of a volumetric occupancy map with a 3D CNN, which allows for the training of an efficient and highly accurate framework for detection tasks from raw occupancy data.
Recently, many approaches have been designed to operate on raw point clouds and predict bounding boxes directly without other processing. Shi et al. proposed PointRCNN for 3D object detection from a point cloud by using bottom-up 3D proposal generation and refinement in canonical coordinates[156]. Charles et al. introduced VoteNet, which “votes” for object centroids directly from point clouds and aggregates the votes to generate high-quality object proposals by local geometry[157]. Alex et al. proposed PointPillars, a method for object detection in 3D that enables end-to-end learning with only 2D convolutional layers[158]. PointPillars uses a novel encoder that learns features on vertical columns (pillars) of the point cloud to predict 3D-oriented boxes for objects.
In summary, with the evolution of deep learning architectures suited for point clouds, 3D object detection plays a key role in point cloud processing. However, direct detection of 3D objects in the raw point cloud is still a problematic issue and worthy of future research.
4 Typical urban modeling applications based on MLS
MLS technology has greatly facilitated urban 3D modeling of both indoor and outdoor environments. Nowadays, more and more applications based on MLS have been proposed. In this section, we introduce four major applications based on MLS: (1) Building facet modeling, (2) high-definition (HD) map, (3) building information models, and (4) traffic visibility evaluation.
4.1 Building facet modeling
Recently, 3D modeling of large-scale urban buildings and reconstruction of indoor scenes have attracted increasing attention. Urban buildings are usually composed of complex primitives that may be difficult to model, as shown in Figure 8.
The rapid developments in LiDAR technology, however, have greatly facilitated the acquisition of 3D model data for indoor and large-scale urban scenes. The captured point cloud is inherently capable of representing the physical geometry of real scenes, which facilitates modeling. However, in city scenes, there is a large number of urban objects with a great variety of shapes, and thus, it is difficult and time-consuming to carry out manual modeling of urban buildings from raw point clouds.
Automatic reconstruction of refined 3D models of large-scale urban buildings from raw point clouds is still a big challenge for researchers. The main difficulty is the data quality of raw point clouds from urban buildings. LiDAR point clouds are often contaminated by noise and outliers; they may also be influenced by point density, coverage, and occlusions.
Zhou et al. proposed a novel building segmentation and damage detection method to realize automated component-level damage assessment of major building envelope elements, including walls, roofs, balconies, columns, and handrails[160]. Goebbels et al. used airborne LiDAR point clouds and true orthophotos to obtain better building model edges[161]. Zhang et al. constructed a Delaunay triangulated irregular network (TIN) model and used an edge-length-ratio-based trace algorithm to refine the building’s boundary; they then used clusters from the same plane point set to determine the roof structures[162]. Chen et al. integrated the LiDAR point cloud and a large-scale vector map to model buildings[163]. Their approach obtains the building models in three steps: preprocessing of the LiDAR point cloud and vector map, roof analysis, and building reconstruction. Yi et al. used the divide-and-conquer strategy to decompose the entire point cloud into a number of individual building subsets and then extracted the primitive elements through a novel algorithm called spectral residual clustering[159]. The final 3D building model was generated by applying Boolean union operations over the block models.
Xiong et al. analyzed the topology graphs of building model surfaces and found the three basic primitives of roof topology graphs[164]. Wang et al. combined the advantages of point clouds and optical images to accurately define building facade features[165]. Zhang et al. proposed a novel framework for urban point cloud classification and reconstruction. They used neural networks with the rectified linear unit (ReLU) activation function (ReLu-NNs) to speed up convergence[166]. Díaz et al. detected doors and analyzed the visibility issue of indoor environments[167]. Stambler et al. introduced room-, floor-, and building-level reasoning, and built highly accurate models by performing modeling and recognition simultaneously over the entire building[168].
Javanmardi et al. proposed an automatic and accurate 3D building model reconstruction technique that integrates an airborne LiDAR point cloud with a 2D boundary map[169]. Zhang et al. proposed a deep neural network that integrates a 3D convolution, a deep Q-network, and a residual recurrent neural network to acquire semantic labels for large-scale point cloud data[170]. They then used classification results and an edge-aware resampling algorithm to generate urban building models. López et al. utilized historical and bibliographical data to obtain graphic and semantic information of the point cloud, and used BIM software to create a library of parametric elements[171]. Ochmann et al. developed a parametric building model that incorporates contextual information such as global wall connectivity[172]. Xiong et al. proposed a parameter-free algorithm to robustly and precisely construct roof structures and building models[173].
Hojebri et al. proposed a method based on the fusion of a structure’s point cloud and image to obtain accurate modeling results[174]. Hron et al. reviewed the automatic generation of 3D building models from point clouds[175]. Based on the concept of data reuse, Chen et al. proposed a building modeling method in which models whose geometric shapes are similar to a user-specified point cloud query are retrieved and reused for data extraction and building modeling[176]. Zhang et al. used the Canny operator and the Hough transform to extract the edges of the building, and the E3De3 software to obtain the 3D building model[177]. Chen et al. introduced a novel encoding scheme based on low-frequency spherical harmonic basis functions for 3D building model retrieval[178].
In contrast to previous studies, Demir et al. proposed an approach that can operate directly on the raw point cloud[179]. Their approach consists of semi-automatic segmentation, a consensus-based voting schema, a pattern extraction algorithm, and an interactive editing tool. Wu et al. proposed a fast and easy plane segmentation algorithm based on cross-line element growth (CLEG) for 3D building modeling[180]. Chen et al. integrated point cloud and large-scale vector maps to get the 3D building model[163]. Wang et al. proposed a novel building modeling method, based on a semantic line framework, for point clouds acquired by a backpack system[181]. The proposed method can effectively perform line framework extraction and output results for building modeling.
Table 2 presents a quantitative evaluation of various building models[166].
Quantitative evaluation of building models[166]

| Dataset | BN¹ | #Pts² | #Model vertices³ | #Model faces⁴ | % completeness⁵ | ∑⁶ |
|---|---|---|---|---|---|---|
| [166] | a | 16559 | 1972 | 3631 | 99.5 | 0.02 |
| | | (16559) | (2102) | (3875) | (98.2) | (0.06) |
| | b | 1819 | 198 | 318 | 95.8 | 0.03 |
| | | (1819) | (217) | (391) | (94.1) | (0.04) |
| | c | 10194 | 1179 | 2199 | 97.3 | 0.02 |
| | | (10194) | (1294) | (2236) | (96.2) | (0.03) |
| | d | 9198 | 2097 | 3938 | 95.9 | 0.02 |
| | | (9198) | (2097) | (4182) | (93.8) | (0.03) |
| | e | 8111 | 1782 | 3298 | 100 | 0.06 |
| | | (8111) | (1882) | (3421) | (98.4) | (0.08) |
| | f | 33537 | 4356 | 7794 | 94.2 | 0.15 |
| | | (33537) | (4492) | (7925) | (93.5) | (0.17) |
| | Mean | | | | 97.1 | 0.05 |
| | | | | | (95.7) | (0.07) |

Notes: 1: Building identifier; 2: Number of airborne laser scanning (ALS) points in the building; 3: Number of vertices in the building; 4: Number of triangles in the building; 5: Completeness of the model; 6: ∑ represents the standard deviations of the distance of ALS points to their corresponding faces (in meters)[182].
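The ∑ column can be read as a fitting-error measure. The sketch below shows one plausible way to compute such a value for a single planar face; it assumes unsigned point-to-plane distances and the population standard deviation, neither of which is specified in the notes above, and the function names are illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def point_plane_distance(point: Point, normal: Point, offset: float) -> float:
    """Unsigned distance from a point to the plane normal . p + offset = 0."""
    nx, ny, nz = normal
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    x, y, z = point
    return abs(nx * x + ny * y + nz * z + offset) / norm


def distance_std(points: Sequence[Point], normal: Point, offset: float) -> float:
    """Population standard deviation of the point-to-plane distances."""
    distances = [point_plane_distance(p, normal, offset) for p in points]
    mean = sum(distances) / len(distances)
    return math.sqrt(sum((d - mean) ** 2 for d in distances) / len(distances))


# Three hypothetical ALS points scattered above a horizontal roof face at z = 0.
roof_points = [(0.0, 0.0, 0.00), (1.0, 0.0, 0.02), (0.0, 1.0, 0.04)]
sigma = distance_std(roof_points, normal=(0.0, 0.0, 1.0), offset=0.0)
```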
4.2 High-definition (HD) map
HD map is a crucial technology for autonomous driving[183], especially for vehicle localization[184] and motion planning for cars[185].
There are plenty of related works for constructing HD maps. Zhang et al. built an HD map system and described the components of an HD map according to embodiments[186]. Siam et al. proposed a semantic segmentation method to construct HD maps from images[187]. Barsi et al. created HD maps using the TLS system[188].
HD maps can be classified into two types: dense semantic point cloud maps and landmark-based maps. Dense semantic point cloud maps are constructed by using a laser scanning point cloud with semantics. Such HD maps include the road surface, road markings, road boundaries, traffic signs, etc. Many technology-oriented companies use this type of map owing to its highly accurate and integrated road information.
However, it is difficult to build HD maps directly from the collected LiDAR point cloud. Generally, the collected point cloud consists of buildings, roads, parking lots, vegetation, and other points that are not relevant for generating the map. Therefore, there are many studies focused on how to extract each component of HD maps separately.
4.2.1   Road surface
The road surface is one of the primary components of the HD map, and also one of the essential parts of a road structure. In general, the raw data collected from the laser scanning system contain many irrelevant points and noise; separating the on-road points and off-road points from raw data is a key step for HD map component extraction. Many methods have been proposed for road surface detection and extraction from the point cloud and these are mainly categorized into two basic methods: (1) 3D-based methods, and (2) georeferenced feature (GRF)-based methods[189].
To decrease computation complexity, trajectory information was used in road surface extraction. Wu et al. vertically partitioned raw point clouds using trajectory information; they then used the random sample consensus (RANSAC) method to extract ground points by calculating the average height of ground points[190]. Based on point cloud features, Hata et al. extracted ground surfaces by using different filters, including differential filters and regression filters, on the point cloud[191]. There are some curb-based road surface extraction methods, for example, Guan et al. assumed that road curbstones can represent the boundaries of the pavement and extracted the road surface by separating pavement surfaces from roadsides[192].
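As an illustration of how RANSAC can separate ground points from off-ground points, the following sketch fits a single plane to a small synthetic scene. It is a generic RANSAC plane fit, not the trajectory-partitioned procedure of the cited work, and the distance threshold, iteration count, and synthetic data are assumptions.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def plane_from_points(p1: Point, p2: Point, p3: Point) -> Optional[Tuple[Point, float]]:
    """Return (unit normal, offset) of the plane through three points, or None if collinear."""
    ux, uy, uz = (p2[i] - p1[i] for i in range(3))
    vx, vy, vz = (p3[i] - p1[i] for i in range(3))
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    if norm < 1e-12:
        return None
    normal = (nx / norm, ny / norm, nz / norm)
    offset = -(normal[0] * p1[0] + normal[1] * p1[1] + normal[2] * p1[2])
    return normal, offset


def ransac_plane(points: Sequence[Point], threshold: float = 0.05,
                 iterations: int = 200, seed: int = 0) -> List[int]:
    """Return the indices of the inliers of the best plane found."""
    rng = random.Random(seed)
    best_inliers: List[int] = []
    for _ in range(iterations):
        model = plane_from_points(*rng.sample(list(points), 3))
        if model is None:
            continue  # degenerate (collinear) sample
        (nx, ny, nz), offset = model
        inliers = [i for i, (x, y, z) in enumerate(points)
                   if abs(nx * x + ny * y + nz * z + offset) <= threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers


# Synthetic scene: a flat 10 x 10 ground grid followed by a vertical pole.
ground = [(float(x), float(y), 0.0) for x in range(10) for y in range(10)]
pole = [(5.5, 5.5, 0.5 + 0.3 * k) for k in range(10)]
ground_indices = ransac_plane(ground + pole)
```

On real MLS data the ground is rarely one plane, which is one reason to first partition the cloud along the trajectory and fit locally.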
Many methods convert the point cloud into a 2D GRF map; road surfaces are then efficiently detected and extracted based on existing computer vision technologies. To minimize computation complexity, Riveiro et al. projected the point cloud onto a 2D space and then detected the road by using principal component analysis (PCA)[193]. Yang et al. extracted road surfaces by generating GRF images to filter out off-ground objects[194].
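The conversion of a point cloud into a 2D georeferenced image can be sketched as a simple rasterization. The example stores the maximum height per cell; the cell size, the choice of feature, and the 1 m off-ground threshold are illustrative assumptions rather than the settings of any cited method.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float, float]
Cell = Tuple[int, int]


def rasterize_max_height(points: Sequence[Point], cell_size: float) -> Dict[Cell, float]:
    """Project points onto the x-y plane and keep the maximum z per (row, col) cell."""
    min_x = min(p[0] for p in points)
    min_y = min(p[1] for p in points)
    image: Dict[Cell, float] = {}
    for x, y, z in points:
        cell = (math.floor((y - min_y) / cell_size), math.floor((x - min_x) / cell_size))
        image[cell] = max(z, image.get(cell, -math.inf))
    return image


# Four hypothetical points (in meters); the last one belongs to an off-ground object.
scene = [(0.0, 0.0, 0.1), (0.2, 0.1, 0.3), (1.1, 0.0, 0.0), (1.4, 1.2, 2.5)]
height_image = rasterize_max_height(scene, cell_size=1.0)

# Cells whose maximum height exceeds an assumed 1 m threshold are treated as off-ground.
off_ground_cells = sorted(cell for cell, h in height_image.items() if h > 1.0)
```

Once the data are in image form, standard 2D operations such as thresholding or filtering can be applied, at the cost of discarding the vertical structure inside each cell.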
4.2.2   Road boundary
Road boundary, also called road edge, road curb, or curbstone, is an essential part of HD maps. It is mostly extracted by determining the height variance between sidewalks and driveways. To reduce computational complexity, some trajectory-based methods have been proposed. Wang et al. first divided the point cloud into several parts along the trajectory[195]; the road boundary was then extracted and refined from each part. Wang et al. extracted the road boundary from the point cloud by assuming that the altitude of the road boundary points varies considerably from the altitude of road surface points[196]. Zai et al. detected the rough road boundary via super voxels and the alpha-shape algorithm, and then extracted the curb by applying graph cuts on the trajectory and rough boundary[83]. Since road edge extraction can be regarded as a classification problem, Rachmadi et al. detected the road edge from the 3D point cloud using an encoder-decoder convolutional network[197]. Based on the 3D local feature, Yang et al. proposed a new binary kernel descriptor (BKD) to detect road curbs and markings[194].
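The idea of locating curbs through the height difference between sidewalks and driveways can be illustrated on a single cross-section perpendicular to the trajectory. The 0.05 to 0.30 m range used for a curb-like step is an assumed, illustrative range, and the profile values are synthetic.

```python
from typing import List, Sequence, Tuple


def find_curb_candidates(profile: Sequence[Tuple[float, float]],
                         min_step: float = 0.05,
                         max_step: float = 0.30) -> List[float]:
    """Return lateral offsets where the height jump between neighbors looks like a curb.

    `profile` is a list of (lateral_offset, height) pairs sorted by lateral offset.
    """
    candidates = []
    for (o1, h1), (o2, h2) in zip(profile, profile[1:]):
        step = abs(h2 - h1)
        if min_step <= step <= max_step:
            candidates.append((o1 + o2) / 2)
    return candidates


# Synthetic cross-section: sidewalks 0.15 m above a driveway between -2.5 m and 2.5 m.
cross_section = [(-4.0, 0.15), (-3.0, 0.15), (-2.0, 0.0), (-1.0, 0.0), (0.0, 0.0),
                 (1.0, 0.0), (2.0, 0.0), (3.0, 0.15), (4.0, 0.15)]
curb_offsets = find_curb_candidates(cross_section)
```

Bounding the step from above keeps taller discontinuities, such as walls or vehicles, from being reported as curbs.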
4.2.3   Road markings
Road markings are an important part of the HD map for self-driving; they form a key component to achieve accurate navigation. Many related studies extract road markings from the laser scanning point cloud. Road markings consist of different marking types, such as lane lines, zebra crossings, arrows, and texts. Therefore, many studies also address the classification of road markings. The related studies can be mainly divided into two categories: (1) 3D-based methods, and (2) methods based on GRF projection.
3D-based methods extract road markings directly from the road surface based on the distinct intensity difference between markings and other points. The trajectory of a vehicle can be used to locate the positions of road markings. Hence, Chen et al. proposed a profile-based intensity analysis by partitioning the point cloud into slices along the trajectory of the vehicle and then extracted the road markings via analyzing the peak value of intensity in each scan line[198]. Yu et al. extracted road markings by using a multi-segment thresholding strategy and spatial density filtering from the point cloud; they then extracted and classified small-sized road markings via deep Boltzmann machine (DBM)-based neural networks[199].
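A much simplified sketch of intensity-based marking extraction along the trajectory is given below. It assumes that the first coordinate of each point is the distance along the trajectory, and it selects points by comparing their intensity with the peak of their slice; the slice length and both thresholds are hypothetical, and the cited methods are considerably more elaborate.

```python
import math
from collections import defaultdict
from typing import List, Sequence, Tuple

# (distance along trajectory, lateral offset, intensity)
RoadPoint = Tuple[float, float, float]


def marking_candidates(points: Sequence[RoadPoint], slice_length: float = 1.0,
                       peak_ratio: float = 0.8, min_peak: float = 50.0) -> List[RoadPoint]:
    """Select high-intensity points per slice as road marking candidates."""
    slices = defaultdict(list)
    for p in points:
        slices[math.floor(p[0] / slice_length)].append(p)
    selected: List[RoadPoint] = []
    for key in sorted(slices):
        peak = max(p[2] for p in slices[key])
        if peak < min_peak:
            continue  # no sufficiently reflective point: assume the slice has no marking
        selected.extend(p for p in slices[key] if p[2] >= peak_ratio * peak)
    return selected


# Slice 0 contains one bright point (a marking); slice 1 contains only asphalt.
road_points = [(0.2, -1.0, 10.0), (0.5, 0.0, 80.0), (0.8, 1.0, 12.0),
               (1.2, -1.0, 11.0), (1.5, 0.0, 9.0), (1.7, 1.0, 10.0)]
markings = marking_candidates(road_points)
```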
Among the GRF-based methods, Jung et al. rasterized the point cloud into the x-y plane; the lane markings were then extracted by intensity contrast[200]. As road marking extraction can be treated as semantic segmentation, neural networks can be applied to the extraction problem. With the emergence of many image classification networks, road markings can be classified efficiently by such networks. Wen et al. proposed a deep learning framework to extract, classify, and complete road markings. A modified U-net was first used to extract road markings from the projected intensity image, and then a multi-scale clustering algorithm and a CNN classifier were applied to classify them[201]. Finally, they completed the classified markings using a conditional generative adversarial network and a context-based method.
4.2.4   Traffic signs
Traffic signs are also an important part of HD maps; they provide critical information about roads for traffic safety in navigation during autonomous driving. Generally, traffic signs are part of pole-like objects in a raw point cloud. Therefore, in most studies, pole-like object extraction was performed first. Then, the pole-like objects were classified into different categories that contain different types of traffic signs. Most studies analyzed the position, continuity, verticality, shape, size, and intensity of the pole-like objects[113].
Based on the size and intensity difference, Wen et al. set a minimum threshold in clusters to remove small objects and filtered out non-sign objects[201]. By using the traffic sign attributes, Arcos-García et al. developed height and planar filters to eliminate small parts and non-planar parts[202]. Huang et al. first detected traffic signs from the point cloud based on high intensity and position; then performed occlusion detection to analyze traffic sign occlusion by observing the relationship between viewpoint and traffic signs[203].
The detected traffic signs are usually classified into different types by analyzing the features of the point clouds and images. Wen et al. extracted integral features consisting of a histogram of oriented gradients (HOG) and color descriptors; they used the support vector machine (SVM) to train the classification model[201]. Some deep learning-based methods have also been proposed for traffic sign recognition[202]. Yu et al. projected the point cloud into a 2D image and applied the Gaussian-Bernoulli deep Boltzmann machine model for traffic sign recognition (TSR)[204].
The main issue in building an HD map using the LiDAR point cloud is how to accurately extract each component of the HD map from raw data. However, as of now, there is no method to extract all the HD components simultaneously. With the development of point cloud semantic segmentation, this goal may be achieved in the future.
4.3 Building information models (BIM)
The indoor building model is the data source of BIM, which plays a vital role in building maintenance, disaster rescue, and building renewal planning. However, generating indoor three-dimensional models manually is time-consuming and labor-intensive. To generate three-dimensional models more efficiently, many studies build indoor models from the original point clouds automatically. For example, Previtali et al. proposed a method based on optimization to detect the indoor characteristics of buildings[205]. Wang et al. proposed a new method of realizing line frame-based semantic indoor modeling. Tran et al. proposed a novel shape grammar method which can effectively generate three-dimensional models[206]. Shi et al. presented a method capable of automatically reconstructing 3D building models with semantic information from the unstructured 3D point cloud of indoor scenes[207]. Xiao et al. presented a framework that recovers missing points and estimates connectivity relations between planar and non-planar surfaces to obtain complete and high-quality 3D models[208].
Existing approaches to realize indoor modeling can be classified as linear-primitive, planar-primitive, and volumetric-primitive types[209].
4.3.1   Linear-primitive
The linear-primitive indoor modeling method assumes that walls are planar and perpendicular to the ground, and the indoor model is built based on a 2D plane map. Oesau et al. presented a graph-cut-based indoor reconstruction method that solves an inside/outside labeling of a space partitioning based on the raw point cloud[210]. Ochmann et al. proposed a parametric modeling method for reconstructing parametric three-dimensional building models from indoor point clouds and automatically reconstructing structural models containing multiple indoor scenes[172]. Ochmann et al. also presented a novel method of tackling the indoor building reconstruction problem from point clouds using integer linear programming[211]. Li et al. presented a segmentation method for the reconstruction of 3D indoor interiors[212]. This method overcomes the over-segmentation of graph-cut operations for long corridors and removes shared surfaces to reconstruct connected areas across multiple floors. The linear-primitive indoor modeling method deals with indoor point clouds from a two-dimensional perspective, which is usually applicable only under ground-independent and clutter-free conditions.
4.3.2   Planar-primitive
The planar-primitive methods mainly involve two steps. First, the plane is extracted by classification, and then, the plane model is built based on the classification result. Sanchez et al. used random sample consensus (RANSAC) for plane fitting and the alpha shape for calculating their ranges and extracting large-scale plane structures such as the ground, ceiling, and wall from indoor point cloud data[213]. Similarly, Budroni et al. used plane scanning to extract ceilings, floors, and walls[214]. These methods can extract the plane very well, but they do not consider occlusion. Moreover, these methods typically use "context-based" reasoning to distinguish building elements before plane fitting and intersection. These methods are not suitable for complex indoor scenes or serious data-missing situations. Wang et al. proposed a method of first semantically classifying the 3D point clouds into different categories and then extracting the line structures from the labeled points separately[181].
4.3.3   Volumetric-primitive
The volumetric-primitive methods have strong regularity. These methods generally satisfy the Manhattan world hypothesis that only vertical and horizontal environments can be included. Furukawa et al. proposed an inverse solid geometry algorithm that detects walls in 2D and then combines them into cubes[215]. Khoshelham et al. proposed a grammar-based approach to reconstruct the indoor space that satisfies the Manhattan world hypothesis by iteratively placing, connecting, and merging the cubes[216]. Previtali et al. transformed the indoor reconstruction problem into a labeling problem of result units in a two-dimensional plane under the condition that the Manhattan world hypothesis is satisfied[217]. Kim et al. presented a geometry and camera pose reconstruction algorithm from image sequences for indoor Manhattan scenes[218].
4.3.4   Door and window detection
Indoor building models generally include the main structures of a building, such as ceilings, floors, walls, doors, windows, and other immovable objects, and exclude furniture and other movable objects. The detection of doors and windows is therefore a necessary part of indoor building modeling. Michailidis et al. extracted the structures of doors and windows by detecting holes in the wall[219]. However, this method operates only on a single wall and cannot be applied directly to a complete indoor point cloud. Wang et al. determined the outermost boundary line of the wall from the point clouds on the ground and ceiling, and then retained only the internal line structure when extracting the wall line structure, which is used to detect the locations of doors and windows[181]. Jung et al. first divided the point cloud into several separate rooms and then modeled the walls; finally, they projected the wall points onto an inverse binary image and detected the doors and windows[220]. Quintana et al. proposed a method of detecting doors and windows in three-dimensional color point clouds[221]. The method detects open doors based on rectangular data on the wall, and detects closed doors by identifying, in subsequent processing, the rectangular areas that do not correspond to the actual wall area. Previtali et al. proposed a voxel-based marking method based on visibility analysis[222]. Díaz-Vilariño et al. applied the generalized Hough transform to wall orthophoto images generated from color point clouds to detect closed doors[223]. Previtali et al. detected occluded doors and windows by implementing a ray-tracing algorithm after extracting the wall[222]. Díaz-Vilariño et al. also modeled doors and windows as parametrized rectangular shapes obtained from images using the generalized Hough transform[224]. Nikoohemat et al. presented several algorithms for the interpretation of the interior space using MLS point clouds, in combination with the trajectory of the acquisition system[225].
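The idea of finding openings as holes in a wall can be sketched with a simple occupancy grid. The code below is a hedged illustration under stated assumptions, not the algorithm of any cited work: wall points are assumed to be already projected into wall-plane coordinates `(u, v)` in metres, and the function name `wall_openings`, the 10 cm cell size, and the minimum region size are hypothetical.

```python
import math
from collections import deque


def wall_openings(points_uv, width, height, cell=0.1, min_cells=4):
    """Return bounding boxes (u_min, v_min, u_max, v_max) of empty regions.

    points_uv: wall points projected onto the wall plane, in metres.
    width, height: extent of the wall rectangle, in metres.
    """
    cols = int(round(width / cell))
    rows = int(round(height / cell))
    occupied = [[False] * cols for _ in range(rows)]
    for u, v in points_uv:
        c, r = math.floor(u / cell), math.floor(v / cell)
        if 0 <= c < cols and 0 <= r < rows:
            occupied[r][c] = True

    seen = [[False] * cols for _ in range(rows)]
    openings = []
    for r0 in range(rows):
        for c0 in range(cols):
            if occupied[r0][c0] or seen[r0][c0]:
                continue
            # Breadth-first flood fill over 4-connected empty cells.
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            cells = []
            while queue:
                r, c = queue.popleft()
                cells.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < rows and 0 <= cc < cols
                            and not occupied[rr][cc] and not seen[rr][cc]):
                        seen[rr][cc] = True
                        queue.append((rr, cc))
            if len(cells) >= min_cells:
                rs = [r for r, _ in cells]
                cs = [c for _, c in cells]
                openings.append((min(cs) * cell, min(rs) * cell,
                                 (max(cs) + 1) * cell, (max(rs) + 1) * cell))
    return openings
```

A sketch of this kind cannot tell an opening from an area hidden by furniture, which is why the approaches above add color, ray tracing, or the scanner trajectory to handle occlusion and closed doors.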
4.4 Traffic visibility evaluation
Maintaining high visibility of traffic signs is crucial for traffic safety. Research on the visibility of traffic signs has produced the following categories of methods: simulation-based, image-based, naturalistic driving experimentation-based, and point cloud-based methods.
Simulation-based methods gather statistics based on visual or cognitive information collected from volunteers, and output evaluation results by performing simulation. Some researchers use the simulation platform to investigate the cognition time[226], driver behavior associated with visual distractions[227], and cognitive workload[228]. Motamedi et al. analyzed traffic-sign visibility in BIM-enabled virtual reality (VR) environments[229]. Eye-tracker equipment[230,231] has also been used to determine the visual cognition of traffic signs under simulated driving conditions. Simulation-based methods cannot provide a quantitative evaluation of visibility and visual or cognitive information for real roads.
Image-based methods compute the visibility of a traffic sign based on different contrast ratios and numbers of pixels in the occluded area of an image[232,233,234,235]. These methods cannot continuously evaluate visibility over an entire road surface because of viewpoint-position limitations. Meanwhile, image-based methods are not robust to lighting conditions and do not consider the current geometric properties of the road and traffic signs.
Naturalistic driving experimentation-based methods recognize driving modes by observing a driver's behavior over a prolonged period under natural conditions[236,237,238]. Because humans require time for cognition, a driver has to stop to obtain visibility from a given viewpoint when driving in natural conditions. This drawback renders obtaining the visibility distribution of traffic signs difficult.
Point cloud-based methods study the visibility of traffic signs using point clouds[203,239,240]. Mobile laser scanning (MLS) systems provide an efficient 3D measurement over large-scale traffic environments. Zhang et al. proposed the concept of the visibility field of a traffic sign and considered the geometric, occlusion, and sightline-deviation factors to build a model for evaluating the visibility distribution of traffic signs[241]. Their algorithm is, to date, the only automated algorithm that can test the visibility field on real roads on a large scale. The experimental results are presented in Figure 9.
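As a rough illustration of the quantities a point cloud-based visibility evaluation works with, the sketch below computes a viewing distance, an incidence angle on the sign, a sightline deviation from the driving direction, and a simple occlusion test for one viewpoint. It is not the visibility-field model of Zhang et al.; all function names, the 10 cm occlusion radius, and the treatment of obstacles as individual points are assumptions made for the example.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _norm(a):
    return math.sqrt(_dot(a, a))


def angle_deg(a, b):
    """Angle between two non-zero 3D vectors, in degrees."""
    c = _dot(a, b) / (_norm(a) * _norm(b))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))


def is_occluded(viewpoint, target, obstacles, radius=0.1):
    """True if any obstacle point lies within `radius` of the sightline."""
    seg = _sub(target, viewpoint)
    seg_len2 = _dot(seg, seg)
    if seg_len2 == 0.0:
        return False
    for p in obstacles:
        # Parameter of the closest point on the segment to p.
        t = _dot(_sub(p, viewpoint), seg) / seg_len2
        if 0.0 < t < 1.0:
            closest = (viewpoint[0] + t * seg[0],
                       viewpoint[1] + t * seg[1],
                       viewpoint[2] + t * seg[2])
            if _norm(_sub(p, closest)) <= radius:
                return True
    return False


def visibility_factors(viewpoint, heading, sign_center, sign_normal, obstacles):
    """Collect per-viewpoint quantities (hypothetical helper)."""
    to_sign = _sub(sign_center, viewpoint)
    to_viewer = _sub(viewpoint, sign_center)
    return {
        "distance_m": _norm(to_sign),
        "incidence_deg": angle_deg(sign_normal, to_viewer),
        "sightline_deviation_deg": angle_deg(heading, to_sign),
        "occluded": is_occluded(viewpoint, sign_center, obstacles),
    }
```

Evaluating such quantities at viewpoints sampled along every lane of an MLS-surveyed road gives a spatial distribution rather than a single value, which is what distinguishes this family of methods from image-based evaluation at fixed camera positions.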
5 Future work
Despite its success, 3D modeling based on MLS still faces many challenges. First, the next-generation MLS must have a new FOG-based IMU and a multi-GNSS constellation receiver to provide more reliable positioning. It will also integrate a smaller laser scan head to achieve a higher scanning frequency and easier operation. At the same time, owing to the rapid development of hardware technology, the cost of MLS systems is expected to continue to fall, which will lead to more widespread use. More research and applications are required to explore the full potential of MLS in the future, including the combination of LiDAR data and UAV images. For the collected data, automatic algorithms for terrain extraction, urban 3D modeling, and vegetation analysis require further development, as does semi-automatic change-detection mapping.
At present, deep learning on point clouds is in its early stages. Current work should focus not only on improving accuracy and performance on a given dataset but also on ensuring the robustness and portability of the methods. More sophisticated deep-learning architectures need to be developed to handle the uneven distribution and possibly insufficient sampling of real-world point clouds. Few datasets capture the complexity of real-world urban scenes, and comprehensive semantic understanding of complex urban streets remains a challenge for artificial intelligence (AI). The rapidly growing volume of urban MLS point cloud data will give rise to a new category of geo-big data and will provide more opportunities to develop better AI on point clouds. Finally, downtown buildings, roads, and vegetation change continuously, as does the dynamic scenery of traffic and pedestrians. Most methods focus only on how an accurate 3D city model can be developed by scanning the city once, so the resulting models lack dynamic information about the real world. Dynamic 3D modeling, combining the point cloud with other sensors such as cameras, is more challenging and worth studying.
In the future, MLS systems will play an important role in detection and modeling tasks across civil fields such as transportation, civil engineering, forestry and agriculture, and in-process monitoring, as well as in sciences such as archaeology and geosciences.
6 Conclusion
Large-area urban 3D modeling has evolved rapidly in the past few years. The current development of MLS-based urban 3D modeling includes two parts: development of the hardware MLS system and the processing of point clouds, including LiDAR SLAM, point-cloud registration, feature extraction, object extraction, semantic segmentation, and deep point cloud processing. The current development of MLS brings together various levels of innovation from deep point cloud processing to high-level applications such as BIM, HD map, and traffic monitoring. In this paper, we reviewed the research that has been conducted on mobile mapping and urban 3D modeling using laser scanning.



El-Sheimy N. The development of VISAT: a mobile survey system for GIS applications. 1996


Thompson J, Sorvig K. Sustainable landscape construction: a guide to green building outdoors, second edition. 2008


Li D R. Mobile mapping technology and its applications. Geospatial Information, 2006, 4(4): 1–5 (in Chinese)


Kukko A, Kaartinen H, Hyyppä J, Chen Y W. Multiplatform mobile laser scanning: usability and performance. Sensors, 2012, 12(9): 11712–11733 DOI:10.3390/s120911712


Olsen M J. Guidelines for the use of mobile LIDAR in transportation applications. Transportation Research Board, 2013


Glennie C. Rigorous 3D error analysis of kinematic scanning LIDAR systems. Journal of Applied Geodesy, 2007, 1(3): 147–157 DOI:10.1515/jag.2007.017


Feng Y M, Gu S F, Shi C, Rizos C. A reference station-based GNSS computing mode to support unified precise point positioning and real-time kinematic services. Journal of Geodesy, 2013, 87(10/11/12): 945–960 DOI:10.1007/s00190-013-0659-7


Jeffrey C. An introduction to GNSS: GPS, GLONASS, Galileo and other global navigation satellite systems. NovAtel, 2010


Martinsanz G P. State-of-the-Art Sensors Technology in Spain 2017. MDPI, 2018


Zhang J, Singh S. Laser-visual-inertial odometry and mapping with high robustness and low drift. Journal of Field Robotics, 2018, 35(8): 1242–1264 DOI:10.1002/rob.21809


Besl P, McKay N. Method for registration of 3-D shapes. SPIE, 1992


Biber P, Strasser W. The normal distributions transform: a new approach to laser scan matching. In: Proceedings 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003) (Cat No03CH37453). 2003, 2743–2748 DOI:10.1109/iros.2003.1249285


Zhang J, Singh S. Visual-lidar odometry and mapping: low-drift, robust, and fast. In: 2015 IEEE International Conference on Robotics and Automation (ICRA). Seattle, WA, USA, IEEE, 2015, 2174–2181 DOI:10.1109/icra.2015.7139486


Fang Z, Scherer S. Experimental study of odometry estimation methods using RGB-D cameras. In: 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems. Chicago, IL, USA, IEEE, 2014 DOI:10.1109/iros.2014.6942632


Maurelli F, Droeschel D, Wisspeintner T, May S, Surmann H. A 3D laser scanner system for autonomous vehicle navigation. In: 2009 International Conference on Advanced Robotics. 2009, 1–6


Davison A J, Reid I D, Molton N D, Stasse O. MonoSLAM: real-time single camera SLAM. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007, 29(6): 1052–1067 DOI:10.1109/tpami.2007.1049


Mur-Artal R, Montiel J M M, Tardos J D. ORB-SLAM: a versatile and accurate monocular SLAM system. IEEE Transactions on Robotics, 2015, 31(5): 1147–1163 DOI:10.1109/tro.2015.2463671


Forster C, Pizzoli M, Scaramuzza D. SVO: Fast semi-direct monocular visual odometry. In: 2014 IEEE International Conference on Robotics and Automation (ICRA). Hong Kong, China, IEEE, 2014, 15–22 DOI:10.1109/icra.2014.6906584


Labbe M, Michaud F. Appearance-based loop closure detection for online large-scale and long-term operation. IEEE Transactions on Robotics, 2013, 29(3): 734–745 DOI:10.1109/tro.2013.2242375


Labbe M, Michaud F. Online global loop closure detection for large-scale multi-session graph-based SLAM. In: 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems. Chicago, IL, USA, IEEE, 2014, 2661–2666 DOI:10.1109/iros.2014.6942926


Geneva P, Eckenhoff K, Yang Y L, Huang G Q. Lips: lidar-inertial 3D plane slam. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Madrid, IEEE, 2018, 123–130 DOI:10.1109/iros.2018.8594463


Abolfazli Esfahani M, Wang H, Wu K Y, Yuan S H. AbolDeepIO: a novel deep inertial odometry network for autonomous vehicles. IEEE Transactions on Intelligent Transportation Systems, 2020, 21(5): 1941–1950 DOI:10.1109/tits.2019.2909064


Qin T, Li P L, Shen S J. VINS-mono: a robust and versatile monocular visual-inertial state estimator. IEEE Transactions on Robotics, 2018, 34(4): 1004–1020 DOI:10.1109/tro.2018.2853729


Qin T, Cao S Z, Pan J, Shen S J. A general optimization-based framework for global pose estimation with multiple sensors. 2019


Segal A, Haehnel D, Thrun S. Generalized-ICP. In: Robotics: Science and Systems V, Robotics: Science and Systems Foundation, 2009, 2(4): 435 DOI:10.15607/rss.2009.v.021


Pandey G, Savarese S, McBride J R, Eustice R M. Visually bootstrapped generalized ICP. In: 2011 IEEE International Conference on Robotics and Automation. Shanghai, China, IEEE, 2011, 2660–2667 DOI:10.1109/icra.2011.5980322


Andreasson H, Triebel R, Burgard W. Improving plane extraction from 3D data by fusing laser data and vision. In: 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems. Edmonton, Alta, Canada, IEEE, 2005, 2656–2661 DOI:10.1109/iros.2005.1545157


Joung J H, An K H, Kang J W, Chung M J, Yu W. 3D environment reconstruction using modified color ICP algorithm by fusion of a camera and a 3D laser range finder. In: 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems. St. Louis, MO, USA, IEEE, 2009, 3082–3088 DOI:10.1109/iros.2009.5354500


Men H, Gebre B, Pochiraju K. Color point cloud registration with 4D ICP algorithm. In: 2011 IEEE International Conference on Robotics and Automation. Shanghai, China, IEEE, 2011, 1511–1516 DOI:10.1109/icra.2011.5980407


Graeter J, Wilczynski A, Lauer M. LIMO: lidar-monocular visual odometry. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Madrid, IEEE, 2018, 7872–7879 DOI:10.1109/iros.2018.8594394


Ye H Y, Chen Y Y, Liu M. Tightly coupled 3D lidar inertial odometry and mapping. In: 2019 International Conference on Robotics and Automation (ICRA). Montreal, QC, Canada, IEEE, 2019, 3144–3150 DOI:10.1109/icra.2019.8793511


Kuindersma S, Deits R, Fallon M, Valenzuela A, Dai H K, Permenter F, Koolen T, Marion P, Tedrake R. Optimization-based locomotion planning, estimation, and control design for the atlas humanoid robot. Autonomous Robots, 2016, 40(3): 429–455 DOI:10.1007/s10514-015-9479-3


Yu S J, Sukumar S R, Koschan A F, Page D L, Abidi M A. 3D reconstruction of road surfaces using an integrated multi-sensory approach. Optics and Lasers in Engineering, 2007, 45(7): 808–818 DOI:10.1016/j.optlaseng.2006.12.007


Hervieu A, Soheilian B. Semi-automatic road/pavement modeling using mobile laser scanning. ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, 2013, II-3/W3: 31–36 DOI:10.5194/isprsannals-ii-3-w3-31-2013


Marton Z C, Rusu R B, Beetz M. On fast surface reconstruction methods for large and noisy point clouds. In: 2009 IEEE International Conference on Robotics and Automation. Kobe, IEEE, 2009, 3218–3223 DOI:10.1109/robot.2009.5152628


Lipman Y, Cohen-Or D, Levin D, Tal-Ezer H. Parameterization-free projection for geometry reconstruction. ACM Transactions on Graphics, 2007, 26(3): 22 DOI:10.1145/1276377.1276405


Nealen A, Igarashi T, Sorkine O, Alexa M. Laplacian mesh optimization. In: Proceedings of the 4th international conference on Computer graphics and interactive techniques in Australasia and Southeast Asia. Kuala Lumpur, Malaysia, New York, USA, ACM Press, 2006, 381–389 DOI:10.1145/1174429.1174494


Sarkar K, Varanasi K, Stricker D. Learning quadrangulated patches for 3D shape parameterization and completion. In: 2017 International Conference on 3D Vision (3DV). Qingdao, IEEE, 2017, 383–392 DOI:10.1109/3dv.2017.00051


Zhao W, Gao S M, Lin H W. A robust hole-filling algorithm for triangular mesh. The Visual Computer, 2007, 23(12): 987–997 DOI:10.1007/s00371-007-0167-y


Davis J, Marschner S R, Garr M, Levoy M. Filling holes in complex surfaces using volumetric diffusion. In: Proceedings. First International Symposium on 3D Data Processing Visualization and Transmission, Padova, Italy, IEEE Comput. Soc, 2002, 428–441 DOI:10.1109/tdpvt.2002.1024098


Kroemer O, Ben Amor H, Ewerton M, Peters J. Point cloud completion using extrusions. In: 2012 12th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2012). Osaka, Japan, IEEE, 2012, 680–685 DOI:10.1109/humanoids.2012.6651593


Figueiredo R, Moreno P, Bernardino A. Automatic object shape completion from 3D point clouds for object manipulation. In: Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications. Porto, Portugal. SCITEPRESS-Science and Technology Publications, 2017, 565–570 DOI:10.5220/0006170005650570


Sipiran I, Gregor R, Schreck T. Approximate symmetry detection in partial 3D meshes. Computer Graphics Forum, 2014, 33(7): 131–140 DOI:10.1111/cgf.12481


Wolf D, Howard A, Sukhatme G S. Towards geometric 3D mapping of outdoor environments using mobile robots. In: 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems. Edmonton, Alta, Canada, IEEE, 2005, 1507–1512 DOI:10.1109/iros.2005.1545152


Thrun S, Wegbreit B. Shape from symmetry. In: Tenth IEEE International Conference on Computer Vision (ICCV'05). Beijing, China, IEEE, 2005, 1824–1831 DOI:10.1109/iccv.2005.221


Mitra N J, Guibas L J, Pauly M. Partial and approximate symmetry detection for 3D geometry. ACM Transactions on Graphics, 2006, 25(3): 560–568 DOI:10.1145/1141911.1141924


Xu K, Zhang H, Tagliasacchi A, Liu L G, Li G, Meng M, Xiong Y S. Partial intrinsic reflectional symmetry of 3D shapes. ACM Transactions on Graphics, 2009, 28(5): 1–10 DOI:10.1145/1618452.1618484


Zheng Q, Sharf A, Wan G, Li Y, Mitra N J, Cohen-Or D, Chen B. Non-local scan consolidation for 3D urban scenes. 2010, 29(4):94:91–94:99 DOI:10.1145/1778765.1778831


Friedman S, Stamos I. Online facade reconstruction from dominant frequencies in structured point clouds. In: 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. Providence, RI, USA, IEEE, 2012, 1–8 DOI:10.1109/cvprw.2012.6238908


Pauly M, Mitra N J, Wallner J, Pottmann H, Guibas L J. Discovering structural regularity in 3D geometry. In: ACM SIGGRAPH 2008 papers on-SIGGRAPH '08. Los Angeles, California, New York, USA, ACM Press, 2008, 1–11 DOI:10.1145/1399504.1360642


Li Y Y, Dai A, Guibas L, Nießner M. Database-assisted object retrieval for real-time 3D reconstruction. Computer Graphics Forum, 2015, 34(2): 435–446 DOI:10.1111/cgf.12573


Pauly M, Mitra N J, Giesen J, Gross M, Guibas L J. Example-based 3D scan completion. In: Proceedings of the third Eurographics symposium on Geometry processing. Vienna, Austria, Eurographics Association, 2005, 23


Nan L L, Xie K, Sharf A. A search-classify approach for cluttered indoor scene understanding. ACM Transactions on Graphics, 2012, 31(6): 1–10 DOI:10.1145/2366145.2366156


Kalogerakis E, Chaudhuri S, Koller D, Koltun V. A probabilistic model for component-based shape synthesis. ACM Transactions on Graphics, 2012, 31(4): 1–11 DOI:10.1145/2185520.2185551


Girdhar R, Fouhey D F, Rodriguez M, Gupta A. Learning a predictable and generative vector representation for objects//Computer Vision–ECCV 2016. Cham: Springer International Publishing, 2016, 484–499 DOI:10.1007/978-3-319-46466-4_29


Guan H Y, Li J, Yu Y T, Chapman M, Wang H Y, Wang C, Zhai R F. Iterative tensor voting for pavement crack extraction using mobile laser scanning data. IEEE Transactions on Geoscience and Remote Sensing, 2015, 53(3): 1527–1537 DOI:10.1109/tgrs.2014.2344714


Lin Y B, Wang C, Cheng J, Chen B L, Jia F K, Chen Z G, Li J. Line segment extraction for large scale unorganized point clouds. ISPRS Journal of Photogrammetry and Remote Sensing, 2015, 102: 172–183 DOI:10.1016/j.isprsjprs.2014.12.027


Zheng G, Moskal L M, Kim S H. Retrieval of effective leaf area index in heterogeneous forests with terrestrial laser scanning. IEEE Transactions on Geoscience and Remote Sensing, 2013, 51(2): 777–786 DOI:10.1109/tgrs.2012.2205003


Wang Z, Zhang L Q, Fang T, Mathiopoulos P T, Tong X H, Qu H M, Xiao Z Q, Li F, Chen D. A multiscale and hierarchical feature extraction method for terrestrial laser scanning point cloud classification. IEEE Transactions on Geoscience and Remote Sensing, 2015, 53(5): 2409–2425 DOI:10.1109/tgrs.2014.2359951


Pathak K, Birk A, Vaškevičius N, Poppinga J. Fast registration based on noisy planes with unknown correspondences for 3-D mapping. IEEE Transactions on Robotics, 2010, 26(3): 424–441 DOI:10.1109/tro.2010.2042989


von Gioi R G, Jakubowicz J, Morel J M, Randall G. LSD: a fast line segment detector with a false detection control. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 32(4): 722–732 DOI:10.1109/tpami.2008.300


Akinlar C, Topal C. EDLines: a real-time line segment detector with a false detection control. Pattern Recognition Letters, 2011, 32(13): 1633–1642 DOI:10.1016/j.patrec.2011.06.001


Jain A, Kurz C, Thormahlen T, Seidel H P. Exploiting global connectivity constraints for reconstruction of 3D line segments from images. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, CA, USA, IEEE, 2010, 1586–1593 DOI:10.1109/cvpr.2010.5539781


Daniels J, Ha L K, Ochotta T, Silva C T. Robust smooth feature extraction from point clouds. In: IEEE International Conference on Shape Modeling and Applications 2007 (SMI '07). Minneapolis, MN, USA, IEEE, 2007, 123–136 DOI:10.1109/smi.2007.32


Kim S K. Extraction of ridge and valley lines from unorganized points. Multimedia Tools and Applications, 2013, 63(1): 265–279 DOI:10.1007/s11042-012-0999-y


Lin Y B, Wang C, Chen B L, Zai D W, Li J. Facet segmentation-based line segment extraction for large-scale point clouds. IEEE Transactions on Geoscience and Remote Sensing, 2017, 55(9): 4839–4854 DOI:10.1109/tgrs.2016.2639025


Besl P J, Jain R C. Segmentation through variable-order surface fitting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1988, 10(2): 167–192 DOI:10.1109/34.3881


Pu S, Vosselman G. Automatic extraction of building features from terrestrial laser scanning. International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, 2006, 36(5): 25–27


Masuta H, Makino S, Lim H O. 3D plane detection for robot perception applying particle swarm optimization. In: 2014 World Automation Congress (WAC). Waikoloa, HI, IEEE, 2014, 549–554 DOI:10.1109/wac.2014.6936041


Duda R O, Hart P E. Use of the Hough transformation to detect lines and curves in pictures. Communications of the ACM, 1972, 15(1): 11–15 DOI:10.1145/361237.361242


Xu L, Oja E, Kultanen P. A new curve detection method: Randomized Hough transform (RHT). Pattern Recognition Letters, 1990, 11(5): 331–338 DOI:10.1016/0167-8655(90)90042-z


Fischler M A, Bolles R C. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 1981, 24(6): 381–395 DOI:10.1145/358669.358692


Awwad T M, Zhu Q, Du Z Q, Zhang Y T. An improved segmentation approach for planar surfaces from unstructured 3D point clouds. The Photogrammetric Record, 2010, 25(129): 5–23 DOI:10.1111/j.1477-9730.2009.00564.x


Schnabel R, Wahl R, Klein R. Efficient RANSAC for point-cloud shape detection. Computer Graphics Forum, 2007, 26(2): 214–226 DOI:10.1111/j.1467-8659.2007.01016.x


Lin Y, Li J, Wang C, Chen Z, Wang Z, Li J. Fast Regularity-Constrained Plane Reconstruction. 2019


El-Sayed E, Abdel-Kader R F, Nashaat H, Marei M. Plane detection in 3D point cloud using octree-balanced density down-sampling and iterative adaptive plane extraction. IET Image Processing, 2018, 12(9): 1595–1605 DOI:10.1049/iet-ipr.2017.1076


Nguyen H L, Belton D, Helmholz P. Planar surface detection for sparse and heterogeneous mobile laser scanning point clouds. ISPRS Journal of Photogrammetry and Remote Sensing, 2019, 151: 141–161 DOI:10.1016/j.isprsjprs.2019.03.006


Kwon H, Kim M, Lee J, Kim J, Doh N L, You B J. Robust plane extraction using supplementary expansion for low-density point cloud data. In: 2018 15th International Conference on Ubiquitous Robots (UR). Honolulu, HI, IEEE, 2018, 501–505 DOI:10.1109/urai.2018.8441776


Papon J, Abramov A, Schoeler M, Worgotter F. Voxel cloud connectivity segmentation-supervoxels for point clouds. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland, OR, USA, IEEE, 2013, 2027–2034 DOI:10.1109/cvpr.2013.264


Babahajiani P, Fan L X, Kamarainen J, Gabbouj M. Automated super-voxel based features classification of urban environments by integrating 3D point cloud and image content. In: 2015 IEEE International Conference on Signal and Image Processing Applications (ICSIPA). Kuala Lumpur, Malaysia, IEEE, 2015, 372–377 DOI:10.1109/icsipa.2015.7412219


Lin Y B, Wang C, Zhai D W, Li W, Li J. Toward better boundary preserved supervoxel segmentation for 3D point clouds. ISPRS Journal of Photogrammetry and Remote Sensing, 2018, 143: 39–47 DOI:10.1016/j.isprsjprs.2018.05.004


Zai D W, Li J, Guo Y L, Cheng M, Lin Y B, Luo H, Wang C. 3-D road boundary extraction from mobile laser scanning data via supervoxels and graph cuts. IEEE Transactions on Intelligent Transportation Systems, 2018, 19(3): 802–813 DOI:10.1109/tits.2017.2701403


Wang H Y, Wang C, Luo H, Li P, Chen Y P, Li J. 3-D point cloud object detection based on supervoxel neighborhood with hough forest framework. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2015, 8(4): 1570–1581 DOI:10.1109/jstars.2015.2394803


Besl P J, McKay N D. A method for registration of 3-D shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1992, 14(2): 239–256 DOI:10.1109/34.121791


Bae K H, Lichti D D. A method for automated registration of unorganised point clouds. ISPRS Journal of Photogrammetry and Remote Sensing, 2008, 63(1): 36–54 DOI:10.1016/j.isprsjprs.2007.05.012


Gressin A, Mallet C, Demantké J, David N. Towards 3D lidar point cloud registration improvement using optimal neighborhood knowledge. ISPRS Journal of Photogrammetry and Remote Sensing, 2013, 79: 240–251 DOI:10.1016/j.isprsjprs.2013.02.019


Stechschulte J, Heckman C. Hidden Markov random field iterative closest point. 2017


Yang J L, Li H D, Campbell D, Jia Y D. Go-ICP: a globally optimal solution to 3D ICP point-set registration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(11): 2241–2254 DOI:10.1109/tpami.2015.2513405


Campbell D, Petersson L. GOGMA: globally-optimal Gaussian mixture alignment. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA, IEEE, 2016, 5685–5694 DOI:10.1109/cvpr.2016.613


Straub J, Campbell T, How J P, Fisher J W. Efficient global point cloud alignment using Bayesian nonparametric mixtures. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, HI, IEEE, 2017, 2403–2412 DOI:10.1109/cvpr.2017.258


Tombari F, Salti S, di Stefano L. Unique signatures of histograms for local surface description//Computer Vision–ECCV 2010. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010, 356–369 DOI:10.1007/978-3-642-15558-1_26


Guo Y L, Sohel F, Bennamoun M, Lu M, Wan J W. Rotational projection statistics for 3D local surface description and object recognition. International Journal of Computer Vision, 2013, 105(1): 63–86 DOI:10.1007/s11263-013-0627-y


Yang J Q, Zhang Q, Xiao Y, Cao Z G. TOLDI: an effective and robust approach for 3D local shape description. Pattern Recognition, 2017, 65: 175–187 DOI:10.1016/j.patcog.2016.11.019


Rusu R B, Blodow N, Marton Z C, Beetz M. Aligning point cloud views using persistent feature histograms. In: 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems. Nice, IEEE, 2008, 3384–3391 DOI:10.1109/iros.2008.4650967


Zai D W, Li J, Guo Y L, Cheng M, Huang P D, Cao X F, Wang C. Pairwise registration of TLS point clouds using covariance descriptors and a non-cooperative game. ISPRS Journal of Photogrammetry and Remote Sensing, 2017, 134: 15–29 DOI:10.1016/j.isprsjprs.2017.10.001


Zeng A, Song S R, Niessner M, Fisher M, Xiao J X, Funkhouser T. 3DMatch: learning local geometric descriptors from RGB-D reconstructions. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, HI, IEEE, 2017, 199–208 DOI:10.1109/cvpr.2017.29


Huang H B, Kalogerakis E, Chaudhuri S, Ceylan D, Kim V G, Yumer E. Learning local shape descriptors from part correspondences with multiview convolutional networks. ACM Transactions on Graphics, 2018, 37(1): 1–14 DOI:10.1145/3137609


Elbaz G, Avraham T, Fischer A. 3D point cloud registration for localization using a deep neural network auto-encoder. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, HI, IEEE, 2017, 2472–2481 DOI:10.1109/cvpr.2017.265


Khoury M, Zhou Q-Y, Koltun V. Learning compact geometric features. In: Proceedings of the IEEE International Conference on Computer Vision. 2017, 153–161


Deng H W, Birdal T, Ilic S. PPF-FoldNet: unsupervised learning of rotation invariant 3D local descriptors//Computer Vision–ECCV 2018. Cham: Springer International Publishing, 2018, 620–638 DOI:10.1007/978-3-030-01228-1_37


Gojcic Z, Zhou C F, Wegner J D, Wieser A. The perfect match: 3D point cloud matching with smoothed densities. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, CA, USA, IEEE, 2019, 5545–5554 DOI:10.1109/cvpr.2019.00569


Xu Y S, Boerner R, Yao W, Hoegner L, Stilla U. Pairwise coarse registration of point clouds in urban scenes using voxel-based 4-planes congruent sets. ISPRS Journal of Photogrammetry and Remote Sensing, 2019, 151: 106–123 DOI:10.1016/j.isprsjprs.2019.02.015


Shi X J, Liu T, Han X. Improved Iterative Closest Point (ICP) 3D point cloud registration algorithm based on point cloud filtering and adaptive fireworks for coarse registration. International Journal of Remote Sensing, 2020, 41(8): 3197–3220 DOI:10.1080/01431161.2019.1701211


Deng H W, Birdal T, Ilic S. PPFNet: global context aware local features for robust 3D point matching. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, IEEE, 2018, 195–205 DOI:10.1109/cvpr.2018.00028


Georgakis G, Karanam S, Wu Z Y, Ernst J, Kosecka J. End-to-end learning of keypoint detector and descriptor for pose invariant 3D matching. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, IEEE, 2018, 1965–1973 DOI:10.1109/cvpr.2018.00210


Yew Z J, Lee G H. 3DFeat-net: weakly supervised local 3D features for point cloud registration//Computer Vision–ECCV 2018. Cham: Springer International Publishing, 2018, 630–646 DOI:10.1007/978-3-030-01267-0_37


Deng H W, Birdal T, Ilic S. 3D local features for direct pairwise registration. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, CA, USA, IEEE, 2019, 3244–3253 DOI:10.1109/cvpr.2019.00336


Aoki Y, Goforth H, Srivatsan R A, Lucey S. PointNetLK: robust & efficient point cloud registration using PointNet. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, CA, USA, IEEE, 2019, 7163–7172 DOI:10.1109/cvpr.2019.00733


Sarode V, Li X Q, Goforth H, Aoki Y, Choset H. PCRNet: point cloud registration network using PointNet encoding. 2019


Aiger D, Mitra N J, Cohen-Or D. 4-points congruent sets for robust pairwise surface registration. In: ACM SIGGRAPH 2008 papers on-SIGGRAPH '08. Los Angeles, California, New York, USA, ACM Press, 2008, 1–10 DOI:10.1145/1399504.1360684


Mellado N, Aiger D, Mitra N J. Super 4PCS fast global pointcloud registration via smart indexing. Computer Graphics Forum, 2014, 33(5): 205–215 DOI:10.1111/cgf.12446


Che E Z, Jung J, Olsen M. Object recognition, segmentation, and classification of mobile laser scanning point clouds: a state of the art review. Sensors, 2019, 19(4): 810 DOI:10.3390/s19040810


Hackel T, Wegner J D, Schindler K. Fast semantic segmentation of 3d point clouds with strongly varying density. ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, 2016, III-3: 177–184 DOI:10.5194/isprs-annals-iii-3-177-2016


Weinmann M, Jutzi B, Mallet C. Semantic 3D scene interpretation: a framework combining optimal neighborhood size selection with relevant features. ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, 2014, II-3: 181–188 DOI:10.5194/isprsannals-ii-3-181-2014


Hu H Z, Munoz D, Bagnell J A, Hebert M. Efficient 3-D scene analysis from streaming data. In: 2013 IEEE International Conference on Robotics and Automation. Karlsruhe, Germany, IEEE, 2013, 2297–2304 DOI:10.1109/icra.2013.6630888


Zhao H J, Liu Y M, Zhu X L, Zhao Y P, Zha H B. Scene understanding in a large dynamic environment through a laser-based sensing. In: 2010 IEEE International Conference on Robotics and Automation. Anchorage, AK, IEEE, 2010, 127–133 DOI:10.1109/robot.2010.5509169


Lu Y, Rasmussen C. Simplified Markov random fields for efficient semantic labeling of 3D point clouds. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. Vilamoura-Algarve, Portugal, IEEE, 2012, 2690–2697 DOI:10.1109/iros.2012.6386039


Munoz D, Bagnell J A, Vandapel N, Hebert M. Contextual classification with functional max-margin Markov networks. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, FL, IEEE, 2009, 975–982 DOI:10.1109/cvpr.2009.5206590


Wu Z R, Song S R, Khosla A, Yu F, Zhang L G, Tang X O, Xiao J X. 3D ShapeNets: a deep representation for volumetric shapes. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA, IEEE, 2015, 1912–1920 DOI:10.1109/cvpr.2015.7298801


Qi C R, Su H, Nießner M, Dai A, Yan M Y, Guibas L J. Volumetric and multi-view CNNs for object classification on 3D data. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA, IEEE, 2016, 5648–5656 DOI:10.1109/cvpr.2016.609


Maturana D, Scherer S. VoxNet: a 3D convolutional neural network for real-time object recognition. In: 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Hamburg, Germany, IEEE, 2015, 922–928 DOI:10.1109/iros.2015.7353481


Su H, Maji S, Kalogerakis E, Learned-Miller E. Multi-view convolutional neural networks for 3D shape recognition. In: 2015 IEEE International Conference on Computer Vision. Santiago, Chile, IEEE, 2015, 945–953 DOI:10.1109/iccv.2015.114


Leng B, Guo S, Zhang X Y, Xiong Z. 3D object retrieval with stacked local convolutional autoencoder. Signal Processing, 2015, 112: 119–128 DOI:10.1016/j.sigpro.2014.09.005


Tosteberg P. Semantic segmentation of point clouds using deep learning. 2017


Wu B C, Wan A, Yue X Y, Keutzer K. SqueezeSeg: convolutional neural nets with recurrent CRF for real-time road-object segmentation from 3D LiDAR point cloud. In: 2018 IEEE International Conference on Robotics and Automation (ICRA). Brisbane, QLD, IEEE, 2018, 1887–1893 DOI:10.1109/icra.2018.8462926


Piewak F, Pinggera P, Schäfer M, Peter D, Schwarz B, Schneider N, Enzweiler M, Pfeiffer D, Zöllner M. Boosting LiDAR-based semantic labeling by cross-modal training data generation//Lecture Notes in Computer Science. Cham: Springer International Publishing, 2019, 497–513 DOI:10.1007/978-3-030-11024-6_39


Caltagirone L, Scheidegger S, Svensson L, Wahde M. Fast LIDAR-based road detection using fully convolutional neural networks. In: 2017 IEEE Intelligent Vehicles Symposium (IV). Los Angeles, CA, USA, IEEE, 2017, 1019–1024 DOI:10.1109/ivs.2017.7995848


Lawin F J, Danelljan M, Tosteberg P, Bhat G, Khan F S, Felsberg M. Deep projective 3D semantic segmentation//Computer Analysis of Images and Patterns. Cham: Springer International Publishing, 2017, 95–107 DOI:10.1007/978-3-319-64689-3_8


Wang X, Chan T O, Liu K, Pan J, Luo M, Li W, Wei C. A robust segmentation framework for closely packed buildings from airborne LiDAR point clouds. International Journal of Remote Sensing, 2020, 41(14): 5147–5165 DOI:10.1080/01431161.2020.1727053


Guo Z, Feng C C. Using multi-scale and hierarchical deep convolutional features for 3D semantic classification of TLS point clouds. International Journal of Geographical Information Science, 2020, 34(4): 661–680 DOI:10.1080/13658816.2018.1552790


Qi C R, Su H, Mo K C, Guibas L J. PointNet: deep learning on point sets for 3D classification and segmentation. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, IEEE, 2017, 652–660 DOI:10.1109/cvpr.2017.16


Qi C R, Yi L, Su H, Guibas L J. PointNet++: deep hierarchical feature learning on point sets in a metric space. In: Advances in Neural Information Processing Systems. 2017, 5099–5108


Zhou Y, Tuzel O. VoxelNet: end-to-end learning for point cloud based 3D object detection. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA, IEEE, 2018, 4490–4499 DOI:10.1109/cvpr.2018.00472


Li Y, Bu R, Sun M, Wu W, Di X, Chen B. PointCNN: convolution on X-transformed points. In: Advances in Neural Information Processing Systems. 2018, 820–830


Wang Y, Sun Y B, Liu Z W, Sarma S E, Bronstein M M, Solomon J M. Dynamic graph CNN for learning on point clouds. ACM Transactions on Graphics, 2019, 38(5): 1–12 DOI:10.1145/3326362


Huang R, Hong D F, Xu Y S, Yao W, Stilla U. Multi-scale local context embedding for LiDAR point cloud classification. IEEE Geoscience and Remote Sensing Letters, 2020, 17(4): 721–725 DOI:10.1109/lgrs.2019.2927779


Tchapmi L, Choy C, Armeni I, Gwak J, Savarese S. SEGCloud: semantic segmentation of 3D point clouds. In: 2017 International Conference on 3D Vision (3DV). Qingdao, IEEE, 2017, 537–547 DOI:10.1109/3dv.2017.00067


Hackel T, Savinov N, Ladicky L, Wegner J D, Schindler K, Pollefeys M. Semantic3D.net: a new large-scale point cloud classification benchmark. ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, 2017, IV-1/W1: 91–98 DOI:10.5194/isprs-annals-iv-1-w1-91-2017


Riegler G, Ulusoy A O, Geiger A. OctNet: learning deep 3D representations at high resolutions. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, IEEE, 2017, 3577–3586 DOI:10.1109/cvpr.2017.701


Riemenschneider H, Bódis-Szomorú A, Weissenberg J, van Gool L. Learning where to classify in multi-view semantic segmentation//Computer Vision–ECCV 2014. Cham: Springer International Publishing, 2014, 516–532 DOI:10.1007/978-3-319-10602-1_34


Engelmann F, Kontogianni T, Hermans A, Leibe B. Exploring spatial context for 3D semantic segmentation of point clouds. In: 2017 IEEE International Conference on Computer Vision Workshops. Venice, IEEE, 2017, 716–724 DOI:10.1109/iccvw.2017.90


Landrieu L, Simonovsky M. Large-scale point cloud semantic segmentation with superpoint graphs. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, IEEE, 2018, 4558–4567 DOI:10.1109/cvpr.2018.00479


Xu Y S, Ye Z, Yao W, Huang R, Tong X H, Hoegner L, Stilla U. Classification of LiDAR point clouds using supervoxel-based detrended feature and perception-weighted graphical model. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2020, 13: 72–88 DOI:10.1109/jstars.2019.2951293


Bronstein M M, Bruna J, LeCun Y, Szlam A, Vandergheynst P. Geometric deep learning: going beyond euclidean data. IEEE Signal Processing Magazine, 2017, 34(4): 18–42 DOI:10.1109/msp.2017.2693418


Premebida C, Carreira J, Batista J, Nunes U. Pedestrian detection combining RGB and dense LIDAR data. In: 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems. Chicago, IL, USA, IEEE, 2014, 4112–4117 DOI:10.1109/iros.2014.6943141


Gonzalez A, Villalonga G, Xu J L, Vazquez D, Amores J, Lopez A M. Multiview random forest of local experts combining RGB and LIDAR data for pedestrian detection. In: 2015 IEEE Intelligent Vehicles Symposium (IV). Seoul, South Korea, IEEE, 2015, 356–361 DOI:10.1109/ivs.2015.7225711


Li B, Zhang T, Xia T. Vehicle detection from 3D LiDAR using fully convolutional network. 2016 DOI:10.15607/rss.2016.xii.042


Chen X, Kundu K, Zhu Y, Berneshawi A G, Ma H, Fidler S, Urtasun R. 3D object proposals for accurate object class detection. In: Advances in Neural Information Processing Systems. 2015, 424–432


Yang B, Luo W J, Urtasun R. PIXOR: real-time 3D object detection from point clouds. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA, IEEE, 2018, 7652–7660 DOI:10.1109/cvpr.2018.00798


Wang D Z, Posner I. Voting for voting in online point cloud object detection. In: Robotics: Science and Systems XI. Robotics: Science and Systems Foundation, 2015 DOI:10.15607/rss.2015.xi.035


Engelcke M, Rao D, Wang D Z, Tong C H, Posner I. Vote3Deep: fast object detection in 3D point clouds using efficient convolutional neural networks. In: 2017 IEEE International Conference on Robotics and Automation (ICRA). Singapore, IEEE, 2017, 1355–1361 DOI:10.1109/icra.2017.7989161


Chen X Z, Ma H M, Wan J, Li B, Xia T. Multi-view 3D object detection network for autonomous driving. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, IEEE, 2017, 1907–1915 DOI:10.1109/cvpr.2017.691


Li B. 3D fully convolutional network for vehicle detection in point cloud. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Vancouver, BC, IEEE, 2017, 1513–1518 DOI:10.1109/iros.2017.8205955


Maturana D, Scherer S. 3D convolutional neural networks for landing zone detection from LiDAR. In: 2015 IEEE International Conference on Robotics and Automation (ICRA). Seattle, WA, USA, IEEE, 2015, 3471–3478 DOI:10.1109/icra.2015.7139679


Shi S S, Wang X G, Li H S. PointRCNN: 3D object proposal generation and detection from point cloud. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, CA, USA, IEEE, 2019, 770–779 DOI:10.1109/cvpr.2019.00086


Qi C R, Litany O, He K M, Guibas L. Deep hough voting for 3D object detection in point clouds. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul, Korea (South), IEEE, 2019, 9277–9286 DOI:10.1109/iccv.2019.00937


Lang A H, Vora S, Caesar H, Zhou L B, Yang J, Beijbom O. PointPillars: fast encoders for object detection from point clouds. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, CA, USA, IEEE, 2019, 12697–12705 DOI:10.1109/cvpr.2019.01298


Yi C, Zhang Y, Wu Q Y, Xu Y B, Remil O, Wei M Q, Wang J. Urban building reconstruction from raw LiDAR point data. Computer-Aided Design, 2017, 93: 1–14 DOI:10.1016/j.cad.2017.07.005


Zhou Z X, Gong J. Automated analysis of mobile LiDAR data for component-level damage assessment of building structures during large coastal storm events. Computer-Aided Civil and Infrastructure Engineering, 2018, 33(5): 373–392 DOI:10.1111/mice.12345


Goebbels S, Pohle-Fröhlich R. Quality enhancement techniques for building models derived from sparse point clouds. In: Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications. 2017, 93–104 DOI:10.5220/0006103300930104


Zhang D, Du P. 3D building reconstruction from lidar data based on Delaunay TIN approach. SPIE, 2011


Chen L C, Teo T A, Kuo C Y, Rau J Y. Shaping polyhedral buildings by the fusion of vector maps and lidar point clouds. Photogrammetric Engineering & Remote Sensing, 2007, 73(9): 1147–1157 DOI:10.14358/pers.73.9.1147


Xiong B, Jancosek M, Oude Elberink S, Vosselman G. Flexible building primitives for 3D building modeling. ISPRS Journal of Photogrammetry and Remote Sensing, 2015, 101: 275–290 DOI:10.1016/j.isprsjprs.2015.01.002


Wang Y Z, Ma Y Q, Zhu A X, Zhao H, Liao L X. Accurate facade feature extraction method for buildings from three-dimensional point cloud data considering structural information. ISPRS Journal of Photogrammetry and Remote Sensing, 2018, 139: 146–153 DOI:10.1016/j.isprsjprs.2017.11.015


Zhang L Q, Li Z Q, Li A J, Liu F Y. Large-scale urban point cloud labeling and reconstruction. ISPRS Journal of Photogrammetry and Remote Sensing, 2018, 138: 86–100 DOI:10.1016/j.isprsjprs.2018.02.008


Díaz-Vilariño L, Khoshelham K, Martínez-Sánchez J, Arias P. 3D modeling of building indoor spaces and closed doors from imagery and point clouds. Sensors, 2015, 15(2): 3491–3512 DOI:10.3390/s150203491


Stambler A, Huber D. Building modeling through enclosure reasoning. In: 2014 2nd International Conference on 3D Vision. Tokyo, IEEE, 2014, 118–125 DOI:10.1109/3dv.2014.65


Javanmardi M, Gu Y L, Javanmardi E, Hsu L T, Kamijo S. 3D building map reconstruction in dense urban areas by integrating airborne laser point cloud with 2D boundary map. In: 2015 IEEE International Conference on Vehicular Electronics and Safety (ICVES). Yokohama, Japan, IEEE, 2015, 126–131 DOI:10.1109/icves.2015.7396906


Zhang L Q, Zhang L. Deep learning-based classification and reconstruction of residential scenes from large-scale point clouds. IEEE Transactions on Geoscience and Remote Sensing, 2018, 56(4): 1887–1897 DOI:10.1109/tgrs.2017.2769120


López F J, Lerones P M, Llamas J, Gómez-García-Bermejo J, Zalama E. A framework for using point cloud data of heritage buildings toward geometry modeling in a BIM context: a case study on Santa Maria la Real de Mave church. International Journal of Architectural Heritage, 2017, 1–22 DOI:10.1080/15583058.2017.1325541


Ochmann S, Vock R, Wessel R, Klein R. Automatic reconstruction of parametric building models from indoor point clouds. Computers & Graphics, 2016, 54: 94–103 DOI:10.1016/j.cag.2015.07.008


Xiong B, Oude Elberink S, Vosselman G. Building modeling from noisy photogrammetric point clouds. 2014, 2(3): 197


Hojebri B, Samadzadegan F, Arefi H. Building reconstruction based on the data fusion of lidar point cloud and aerial imagery. 2014, 103–121


Hron V, Halounová L. Automatic generation of 3D building models from point clouds//Lecture Notes in Geoinformation and Cartography. Cham: Springer International Publishing, 2014, 109–119 DOI:10.1007/978-3-319-11463-7_8


Chen Y C, Lin B Y, Lin C H. Consistent roof geometry encoding for 3D building model retrieval using airborne LiDAR point clouds. ISPRS International Journal of Geo-Information, 2017, 6(9): 269 DOI:10.3390/ijgi6090269


Zhang Y J, Li X H, Wang Q, Liu J, Liang X, Li D, Ni C D, Liu Y. LIDAR point cloud data extraction and establishment of 3D modeling of buildings. IOP Conference Series: Materials Science and Engineering, 2018, 301: 012037 DOI:10.1088/1757-899x/301/1/012037


Chen J Y, Lin C H, Hsu P C, Chen C H. Point cloud encoding for 3D building model retrieval. IEEE Transactions on Multimedia, 2014, 16(2): 337–345 DOI:10.1109/tmm.2013.2286580


Demir I, Aliaga D G, Benes B. Procedural editing of 3D building point clouds. In: 2015 IEEE International Conference on Computer Vision. Santiago, IEEE, 2015, 2147–2155 DOI:10.1109/iccv.2015.248


Wu T, Hu X Y, Ye L Z. Fast and accurate plane segmentation of airborne LiDAR point cloud using cross-line elements. Remote Sensing, 2016, 8(5): 383 DOI:10.3390/rs8050383


Wang C, Hou S W, Wen C L, Gong Z, Li Q, Sun X T, Li J. Semantic line framework-based indoor building modeling using backpacked laser scanning point cloud. ISPRS Journal of Photogrammetry and Remote Sensing, 2018, 143: 150–166 DOI:10.1016/j.isprsjprs.2018.03.025


Chen D, Zhang L Q, Mathiopoulos P T, Huang X F. A methodology for automated segmentation and reconstruction of urban 3-D buildings from ALS point clouds. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2014, 7(10): 4199–4217 DOI:10.1109/jstars.2014.2349003


Seif H G, Hu X L. Autonomous driving in the iCity: HD maps as a key challenge of the automotive industry. Engineering, 2016, 2(2): 159–162 DOI:10.1016/j.eng.2016.02.010


Bauer S, Alkhorshid Y, Wanielik G. Using high-definition maps for precise urban vehicle localization. In: 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC). Rio de Janeiro, Brazil, IEEE, 2016, 492–497 DOI:10.1109/itsc.2016.7795600


Zeng W Y, Luo W J, Suo S, Sadat A, Yang B, Casas S, Urtasun R. End-to-end interpretable neural motion planner. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, CA, USA, IEEE, 2019, 8660–8669 DOI:10.1109/cvpr.2019.00886


Zhang R, Chen C, Di Z, Wheeler M D. Visual odometry and pairwise alignment for high definition map creation. 2019


Siam M, Elkerdawy S, Jagersand M, Yogamani S. Deep semantic segmentation for automated driving: taxonomy, roadmap and challenges. In: 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC). Yokohama, IEEE, 2017, 1–8 DOI:10.1109/itsc.2017.8317714


Barsi A, Poto V, Somogyi A, Lovas T, Tihanyi V, Szalay Z. Supporting autonomous vehicles by creating HD maps. Production Engineering Archives, 2017, 16: 43–46 DOI:10.30657/pea.2017.16.09


Ma L F, Li Y, Li J, Wang C, Wang R S, Chapman M. Mobile laser scanned point-clouds for road object detection and extraction: a review. Remote Sensing, 2018, 10(10): 1531 DOI:10.3390/rs10101531


Wu F, Wen C L, Guo Y L, Wang J J, Yu Y T, Wang C, Li J. Rapid localization and extraction of street light poles in mobile LiDAR point clouds: a supervoxel-based approach. IEEE Transactions on Intelligent Transportation Systems, 2017, 18(2): 292–305 DOI:10.1109/tits.2016.2565698


Hata A Y, Osorio F S, Wolf D F. Robust curb detection and vehicle localization in urban environments. In: 2014 IEEE Intelligent Vehicles Symposium Proceedings. MI, USA, IEEE, 2014, 1257–1262 DOI:10.1109/ivs.2014.6856405


Guan H Y, Li J, Yu Y T, Wang C, Chapman M, Yang B S. Using mobile laser scanning data for automated extraction of road markings. ISPRS Journal of Photogrammetry and Remote Sensing, 2014, 87: 93–107 DOI:10.1016/j.isprsjprs.2013.11.005


Riveiro B, González-Jorge H, Martínez-Sánchez J, Díaz-Vilariño L, Arias P. Automatic detection of zebra crossings from mobile LiDAR data. Optics & Laser Technology, 2015, 70: 63–70 DOI:10.1016/j.optlastec.2015.01.011


Yang B S, Dong Z, Liu Y, Liang F X, Wang Y J. Computing multiple aggregation levels and contextual features for road facilities recognition using mobile laser scanning data. ISPRS Journal of Photogrammetry and Remote Sensing, 2017, 126: 180–194 DOI:10.1016/j.isprsjprs.2017.02.014


Wang H Y, Luo H, Wen C L, Cheng J, Li P, Chen Y P, Wang C, Li J. Road boundaries detection based on local normal saliency from mobile laser scanning data. IEEE Geoscience and Remote Sensing Letters, 2015, 12(10): 2085–2089 DOI:10.1109/lgrs.2015.2449074


Wang H Y, Cai Z P, Luo H, Wang C, Li P, Yang W T, Ren S P, Li J. Automatic road extraction from mobile laser scanning data. In: 2012 International Conference on Computer Vision in Remote Sensing. Xiamen, China, IEEE, 2012, 136–139 DOI:10.1109/cvrs.2012.6421248


Rachmadi R F, Uchimura K, Koutaki G, Ogata K. Road edge detection on 3D point cloud data using encoder-decoder convolutional network. In: 2017 International Electronics Symposium on Knowledge Creation and Intelligent Computing (IES-KCIC). Surabaya, IEEE, 2017, 95–100 DOI:10.1109/kcic.2017.8228570


Chen X, Kohlmeyer B, Stroila M, Alwar N, Wang R, Bach J. Next generation map making: geo-referenced ground-level LIDAR point clouds for automatic retro-reflective road feature extraction. In: Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. Seattle, Washington, Association for Computing Machinery, 2009, 488–491 DOI:10.1145/1653771.1653851


Yu Y T, Li J, Guan H Y, Jia F K, Wang C. Learning hierarchical features for automated extraction of road markings from 3-D mobile LiDAR point clouds. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2015, 8(2): 709–726 DOI:10.1109/jstars.2014.2347276


Jung J, Che E Z, Olsen M J, Parrish C. Efficient and robust lane marking extraction from mobile lidar point clouds. ISPRS Journal of Photogrammetry and Remote Sensing, 2019, 147: 1–18 DOI:10.1016/j.isprsjprs.2018.11.012


Wen C L, Li J, Luo H, Yu Y T, Cai Z P, Wang H Y, Wang C. Spatial-related traffic sign inspection for inventory purposes using mobile laser scanning data. IEEE Transactions on Intelligent Transportation Systems, 2016, 17(1): 27–37 DOI:10.1109/tits.2015.2418214


Arcos-García Á, Soilán M, Álvarez-García J A, Riveiro B. Exploiting synergies of mobile mapping sensors and deep learning for traffic sign recognition systems. Expert Systems With Applications, 2017, 89: 286–295 DOI:10.1016/j.eswa.2017.07.042


Huang P D, Cheng M, Chen Y P, Luo H, Wang C, Li J. Traffic sign occlusion detection using mobile laser scanning point clouds. IEEE Transactions on Intelligent Transportation Systems, 2017, 18(9): 2364–2376 DOI:10.1109/tits.2016.2639582


Yu Y T, Li J, Wen C L, Guan H Y, Luo H, Wang C. Bag-of-visual-phrases and hierarchical deep models for traffic sign detection and recognition in mobile laser scanning data. ISPRS Journal of Photogrammetry and Remote Sensing, 2016, 113: 106–123 DOI:10.1016/j.isprsjprs.2016.01.005


Previtali M, Díaz-Vilariño L, Scaioni M. Towards automatic reconstruction of indoor scenes from incomplete point clouds: door and window detection and regularization. ISPRS-International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2018, XLII-4: 507–514 DOI:10.5194/isprs-archives-xlii-4-507-2018


Tran H, Khoshelham K, Kealy A, Díaz-Vilariño L. Shape grammar approach to 3D modeling of indoor environments using point clouds. Journal of Computing in Civil Engineering, 2019, 33(1): 04018055 DOI:10.1061/(asce)cp.1943-5487.0000800


Shi W Z, Ahmed W, Li N, Fan W Z, Xiang H D, Wang M Y. Semantic geometric modelling of unstructured indoor point cloud. ISPRS International Journal of Geo-Information, 2018, 8(1): 9 DOI:10.3390/ijgi8010009


Xiao Y, Taguchi Y, Kamat V R. Coupling point cloud completion and surface connectivity relation inference for 3D modeling of indoor building environments. Journal of Computing in Civil Engineering, 2018, 32(5): 04018033 DOI:10.1061/(asce)cp.1943-5487.0000776


Díaz-Vilariño L, Verbree E, Zlatanova S, Diakité A. Indoor modelling from SLAM-based laser scanner: door detection to envelope reconstruction. ISPRS-International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2017, XLII-2/W7: 345–352 DOI:10.5194/isprs-archives-xlii-2-w7-345-2017


Oesau S, Lafarge F, Alliez P. Indoor scene reconstruction using feature sensitive primitive extraction and graph-cut. ISPRS Journal of Photogrammetry and Remote Sensing, 2014, 90: 68–82 DOI:10.1016/j.isprsjprs.2014.02.004


Ochmann S, Vock R, Klein R. Automatic reconstruction of fully volumetric 3D building models from oriented point clouds. ISPRS Journal of Photogrammetry and Remote Sensing, 2019, 151: 251–262 DOI:10.1016/j.isprsjprs.2019.03.017


Li L, Su F, Yang F, Zhu H H, Li D L, Zuo X K, Li F, Liu Y, Ying S. Reconstruction of three-dimensional (3D) indoor interiors with multiple stories via comprehensive segmentation. Remote Sensing, 2018, 10(8): 1281 DOI:10.3390/rs10081281


Sanchez V, Zakhor A. Planar 3D modeling of building interiors from point cloud data. In: 2012 19th IEEE International Conference on Image Processing. Orlando, FL, USA, IEEE, 2012, 1777–1780 DOI:10.1109/icip.2012.6467225


Budroni A, Boehm J. Automated 3D reconstruction of interiors from point clouds. International Journal of Architectural Computing, 2010, 8(1): 55–73 DOI:10.1260/1478-0771.8.1.55


Furukawa Y, Curless B, Seitz S M, Szeliski R. Reconstructing building interiors from images. In: 2009 IEEE 12th International Conference on Computer Vision. Kyoto, IEEE, 2009, 80–87 DOI:10.1109/iccv.2009.5459145


Khoshelham K, Díaz-Vilariño L. 3D modelling of interior spaces: learning the language of indoor architecture. ISPRS-International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2014, XL-5: 321–326 DOI:10.5194/isprsarchives-xl-5-321-2014


Previtali M, Díaz-Vilariño L, Scaioni M. Indoor building reconstruction from occluded point clouds using graph-cut and ray-tracing. Applied Sciences, 2018, 8(9): 1529 DOI:10.3390/app8091529


Kim S, Manduchi R, Qin S Y. Multi-planar monocular reconstruction of Manhattan indoor scenes. In: 2018 International Conference on 3D Vision. Verona, IEEE, 2018, 30–33 DOI:10.1109/3dv.2018.00076


Michailidis G T, Pajarola R. Bayesian graph-cut optimization for wall surfaces reconstruction in indoor environments. The Visual Computer, 2017, 33(10): 1347–1355 DOI:10.1007/s00371-016-1230-3


Jung J, Stachniss C, Ju S, Heo J. Automated 3D volumetric reconstruction of multiple-room building interiors for as-built BIM. Advanced Engineering Informatics, 2018, 38: 811–825 DOI:10.1016/j.aei.2018.10.007


Quintana B, Prieto S A, Adán A, Bosché F. Door detection in 3D coloured point clouds of indoor environments. Automation in Construction, 2018, 85: 146–166 DOI:10.1016/j.autcon.2017.10.016


Previtali M, Barazzetti L, Brumana R, Scaioni M. Towards automatic indoor reconstruction of cluttered building rooms from point clouds. ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, 2014, II-5: 281–288 DOI:10.5194/isprsannals-ii-5-281-2014


Díaz-Vilariño L, Boguslawski P, Khoshelham K, Lorenzo H. Obstacle-aware indoor pathfinding using point clouds. ISPRS International Journal of Geo-Information, 2019, 8(5): 233 DOI:10.3390/ijgi8050233


Nikoohemat S, Peter M, Oude Elberink S, Vosselman G. Exploiting indoor mobile laser scanner trajectories for semantic interpretation of point clouds. ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, 2017, IV-2/W4: 355–362 DOI:10.5194/isprs-annals-iv-2-w4-355-2017


Sun L S, Yao L Y, Rong J, Lu J Y, Liu B H, Wang S W. Simulation analysis on driving behavior during traffic sign recognition. International Journal of Computational Intelligence Systems, 2011, 4(3): 353–360 DOI:10.2991/ijcis.2011.4.3.9


Li N X, Busso C. Predicting perceived visual and cognitive distractions of drivers with multimodal features. IEEE Transactions on Intelligent Transportation Systems, 2015, 16(1): 51–65 DOI:10.1109/tits.2014.2324414


Lyu N C, Xie L, Wu C Z, Fu Q, Deng C. Driver's cognitive workload and driving performance under traffic sign information exposure in complex environments: a case study of the highways in China. International Journal of Environmental Research and Public Health, 2017, 14(2): 203 DOI:10.3390/ijerph14020203


Motamedi A, Wang Z, Yabuki N, Fukuda T, Michikawa T. Signage visibility analysis and optimization system using BIM-enabled virtual reality (VR) environments. Advanced Engineering Informatics, 2017, 32: 248–262 DOI:10.1016/j.aei.2017.03.005


Li L D, Zhang Q N. Research on visual cognition about sharp turn sign based on driver's eye movement characteristic. International Journal of Pattern Recognition and Artificial Intelligence, 2017, 31(7): 1759012 DOI:10.1142/s0218001417590121


Liu B H, Sun L S, Rong J. Driver's visual cognition behaviors of traffic signs based on eye movement parameters. Journal of Transportation Systems Engineering and Information Technology, 2011, 11(4): 22–27 DOI:10.1016/s1570-6672(10)60129-8


Belaroussi R, Gruyer D. Impact of reduced visibility from fog on traffic sign detection. In: 2014 IEEE Intelligent Vehicles Symposium Proceedings. MI, USA, IEEE, 2014, 1302–1306 DOI:10.1109/ivs.2014.6856535


Doman K, Deguchi D, Takahashi T, Mekada Y, Ide I, Murase H, Sakai U. Estimation of traffic sign visibility considering local and global features in a driving environment. In: 2014 IEEE Intelligent Vehicles Symposium Proceedings. MI, USA, IEEE, 2014, 202–207 DOI:10.1109/ivs.2014.6856474


Doman K, Deguchi D, Takahashi T, Mekada Y, Ide I, Murase H, Tamatsu Y. Estimation of traffic sign visibility toward smart driver assistance. In: 2010 IEEE Intelligent Vehicles Symposium. La Jolla, CA, USA, IEEE, 2010, 45–50 DOI:10.1109/ivs.2010.5548137


Doman K, Deguchi D, Takahashi T, Mekada Y, Ide I, Murase H, Tamatsu Y. Estimation of traffic sign visibility considering temporal environmental changes for smart driver assistance. In: 2011 IEEE Intelligent Vehicles Symposium (IV). Baden-Baden, Germany, IEEE, 2011, 667–672 DOI:10.1109/ivs.2011.5940467


Balsa-Barreiro J, Valero-Mora P M, Berné-Valero J L, Varela-García F A. GIS mapping of driving behavior based on naturalistic driving data. ISPRS International Journal of Geo-Information, 2019, 8(5): 226 DOI:10.3390/ijgi8050226


Balsa-Barreiro J, Sánchez García M, Valero-Mora P M, Pareja Montoro I. Geo-referencing naturalistic driving data using a novel method based on vehicle speed. IET Intelligent Transport Systems, 2013, 7(2): 190–197 DOI:10.1049/iet-its.2012.0152


Lee J, Yang J H. Analysis of driver's EEG given take-over alarm in SAE level 3 automated driving in a simulated environment. International Journal of Automotive Technology, 2020, 21(3): 719–728 DOI:10.1007/s12239-020-0070-3


Katz S, Tal A, Basri R. Direct visibility of point sets. In: ACM SIGGRAPH 2007 Papers (SIGGRAPH '07). San Diego, California, New York, USA, ACM Press, 2007 DOI:10.1145/1275808.1276407


Katz S, Tal A. Improving the visual comprehension of point sets. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland, OR, USA, IEEE, 2013, 121–128 DOI:10.1109/cvpr.2013.23


Zhang S X, Wang C, Lin L L, Wen C L, Yang C H, Zhang Z M, Li J. Automated visual recognizability evaluation of traffic sign based on 3D LiDAR point clouds. Remote Sensing, 2019, 11(12): 1453 DOI:10.3390/rs11121453