

2022,  4 (3):   223 - 246

Published Date:2022-6-20 DOI: 10.1016/j.vrih.2022.03.004


Recent advances in Virtual Reality (VR) and Augmented Reality (AR) have had a substantial impact on modern technology by digitizing nearly every aspect of human life, and they have opened the door to the next generation of Software Technology (Soft Tech). VR and AR provide immersive content with the help of high-quality stitched panoramic content and 360° imagery, which is widely used in the education, gaming, entertainment, and production sectors. The immersive quality of VR and AR content depends greatly on the perceptual quality of panoramic or 360° images; even a minor visual distortion can significantly degrade the overall quality. Thus, to ensure the quality of panoramic content constructed for VR and AR applications, numerous Stitched Image Quality Assessment (SIQA) methods have been proposed to assess panoramic content before it is used in VR and AR. In this survey, we provide a detailed overview of the SIQA literature and focus exclusively on the objective SIQA methods presented to date. For clarity, the objective SIQA methods are divided into two classes, namely Full-Reference SIQA and No-Reference SIQA approaches. Each class is further divided into traditional and deep learning-based methods, and their performance on the SIQA task is examined. Further, we list the publicly available benchmark SIQA datasets and the evaluation metrics used for quality assessment of panoramic content. Finally, we highlight the current challenges in this area based on the existing SIQA methods and suggest future research directions that need to be targeted for further improvement in the SIQA domain.


1 Introduction
Recently, considerable advances have been reported in the fields of Virtual Reality (VR)[1] and Augmented Reality (AR)[2], enabling applications in various technology-enabled sectors including education[3,4], health[5,6], sports[7,8], and production[9,10]. The aim of VR and AR in these sectors is to provide end users with an immersive and realistic experience using wide field-of-view content or panoramic images on state-of-the-art head-mounted display devices such as the Samsung Gear VR headset and Oculus Rift. Typically, these panoramic images cover a 360°×180° field of view and are constructed by stitching a series of images (captured with optimized overlap by multiple cameras in a specially designed camera setup) using image stitching algorithms[11,12]. As an integral part of VR/AR, the quality of panoramic content plays an important role in the user experience, and a minor artifact in the panoramic imagery can greatly reduce the overall quality of VR/AR content. Given this significance, the perceptual quality of panoramic/immersive content must be validated with an Image Quality Assessment (IQA) algorithm before it is used in VR and AR applications. Such timely validation greatly improves the VR/AR experience by retaining high-quality stitched images and discarding erroneous ones.
To date, several IQA approaches have been proposed to evaluate the perceptual quality of an image. These IQA methods include similarity-based metrics (Structural Similarity Index Measure (SSIM)[13], Feature Similarity Index Measure (FSIM)[14], Multi-Scale Structural Similarity Index Measure (MS-SSIM)[15], Gradient Magnitude Similarity (GMS)[16], and Spectral Angle Mapper (SAM)[17]), difference-based metrics (Picture-wise Just Noticeable Difference (JND)[18]), distortion-based metrics (Distortion Identification-based Image Verity and Integrity Evaluation (DIIVINE)[19]), entropy-based metrics (Spatial and Spectral Entropies Quality evaluation method (SSEQ)[20]), and natural scene statistics-based approaches (Natural Image Quality Evaluator (NIQE)[21], Blind Image Quality Index (BIQI)[22], and Blind Image Integrity Notator using DCT Statistics (BLIINDS-II)[23]). These IQA approaches have advanced the field of image quality assessment by introducing computationally efficient, precise, and effective methods. Despite their strong performance on the image quality assessment task, these approaches are designed for conventional 2D images and are unable to estimate the perceptual quality of stitched images. There are two main reasons for this limitation. First, the artifacts in stitched images are mostly caused by ghosting and misalignment errors (local errors). Second, optimizing the optical adjustment of images to achieve better stitching results often introduces global distortions in the constructed panoramic images, including chromatic aberration and parallax distortion. Given the inability of 2D IQA methods to handle panoramic stitched images, an efficient quality assessment method is needed that can estimate or predict the quality of a stitched image.
Focusing specifically on the quality assessment of stitched images, the literature reports numerous Stitched Image Quality Assessment (SIQA) methods for assessing the perceptual quality of panoramic images. Among these methods, a few studies follow subjective assessment of panoramic images, while the rest propose objective computational techniques to evaluate the quality of stitched images. On this basis, the SIQA literature is divided into two main classes, namely Subjective SIQA and Objective SIQA, as shown in Figure 1. Generally, Subjective SIQA methods involve human observation (the visual perception of individuals regarding the quality of given panoramic images or videos) based on subjective tests in a specific experimental environment. During subjective assessment, each subject assesses the quality of the given panoramic content based on their personal perception and knowledge. After the subjective assessment session is complete, the experts collect the quality scores that the set of subjects gave to a particular panoramic image or video and then estimate the Mean Opinion Score (MOS) by averaging the quality scores for that image. Subjective SIQA methods are generally more accurate than objective SIQA methods; however, a few concerns make them difficult to adopt in practical applications. For instance, subjective SIQA methods require a large amount of human effort to obtain the quality score of a single panoramic image. They are also time-consuming, because the same image or video must be forwarded to multiple subjects for rating, which limits their use in practical environments.
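The MOS computation described above can be sketched as follows. This is a minimal illustration, not a protocol taken from any surveyed paper: the 1-5 rating scale, the image names, and the ratings are all hypothetical, and the helper name `mean_opinion_score` is introduced here only for the example.

```python
from statistics import mean

def mean_opinion_score(ratings):
    """Return the MOS of one image: the plain average of its subjective ratings."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return mean(ratings)

# Hypothetical ratings (assumed 1-5 scale) given by five subjects to two panoramas.
ratings_by_image = {
    "pano_a": [4, 5, 4, 3, 4],
    "pano_b": [2, 1, 2, 3, 2],
}

# One MOS per image, obtained by averaging over the subjects.
mos = {name: mean_opinion_score(r) for name, r in ratings_by_image.items()}
for name, score in mos.items():
    print(f"{name}: MOS = {score:.2f}")
```

Real subjective studies typically add steps that this sketch omits, such as screening out unreliable subjects before averaging.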
To alleviate the shortcomings of subjective SIQA methods, the best alternative is objective SIQA methods, which estimate the perceptual quality of a given panoramic image using computational algorithms without any human observation. Objective SIQA approaches take an RGB panoramic image as input and extract multi-scale features (local and global features); the extracted features are then forwarded to a mathematical model or machine learning algorithm that regresses the quality of the given image. Some of these objective SIQA methods need a reference image when estimating the quality of a given panoramic image, whereas others estimate the perceptual quality without any reference image. Based on this technical difference, objective SIQA methods can be classified into Full-Reference SIQA and No-Reference SIQA methods. Full-Reference SIQA approaches take a pair of panoramic images as input, where one is the image whose quality is being assessed and the other is a reference image that provides additional information for perceptual comparison. No-Reference SIQA methods, on the other hand, do not require a reference image; instead, they use spatial characteristics of the given panoramic image, including structural consistency, histogram statistics, chrominance, and the visibility of content at the edges. The workflows of typical Full-Reference SIQA and No-Reference SIQA systems are depicted in Figure 2.
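The difference between the two input conventions can be illustrated with a small sketch. It is not an SIQA method from the literature: PSNR stands in for a full-reference comparison, and two toy statistics (mean horizontal gradient and intensity variance) stand in for no-reference features. The function names and the tiny grayscale arrays are assumptions made for this example.

```python
import math

def full_reference_score(test_img, ref_img, peak=255.0):
    """FR setting: PSNR in dB between a test image and its reference."""
    sq_errors = [(t - r) ** 2
                 for t_row, r_row in zip(test_img, ref_img)
                 for t, r in zip(t_row, r_row)]
    mse = sum(sq_errors) / len(sq_errors)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

def no_reference_features(img):
    """NR setting: toy features computed from the image alone."""
    pixels = [p for row in img for p in row]
    mu = sum(pixels) / len(pixels)
    variance = sum((p - mu) ** 2 for p in pixels) / len(pixels)
    # Mean absolute horizontal gradient as a crude proxy for edge visibility.
    grads = [abs(row[x + 1] - row[x]) for row in img for x in range(len(row) - 1)]
    return {"mean_gradient": sum(grads) / len(grads), "variance": variance}

# Hypothetical 2x4 grayscale images (lists of rows).
ref = [[10, 10, 200, 200], [10, 10, 200, 200]]
test = [[10, 12, 198, 200], [10, 10, 200, 200]]

fr_score = full_reference_score(test, ref)   # needs both images
nr_feats = no_reference_features(test)       # needs only the test image
print(f"FR (PSNR): {fr_score:.2f} dB")
print(f"NR features: {nr_feats}")
```

In an actual No-Reference pipeline, features like these would be passed to a trained regressor to obtain a quality score, a step this sketch leaves out.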
To provide the research community with a detailed account of the SIQA literature, this survey discusses the existing objective SIQA methods in detail and groups them into two main categories, namely Full-Reference SIQA and No-Reference SIQA. We investigate the existing methods and discuss their strengths and weaknesses in specific quality assessment scenarios. In addition, we report a detailed quantitative analysis of the existing Full-Reference and No-Reference SIQA methods in terms of the Spearman rank-order correlation coefficient (SROCC), the Pearson linear correlation coefficient (PLCC), and the root mean square error (RMSE), presented in Table 1 and Table 2. The SROCC, PLCC, and RMSE values in these tables are taken from the original papers and indicate each method's performance on a specific dataset for the stitched image quality assessment task. Further, we list and discuss the publicly available SIQA datasets, followed by the evaluation metrics commonly used for quality assessment of panoramic content. Finally, we discuss the current challenges in the SIQA domain that need to be addressed and provide future directions to help researchers resolve these challenges. More precisely, the key contributions of this survey are as follows:
Table 1 Summarized comparative analysis of both traditional and deep learning-based Full-Reference SIQA methods based on the evaluation dataset and commonly used SIQA evaluation metrics including SROCC, PLCC, and RMSE

| Method | Year | Method Description | Dataset | SROCC | PLCC | RMSE |
| --- | --- | --- | --- | --- | --- | --- |
| Yang et al.[24] | 2017 | Investigating optical flow features and structural characteristics for panoramic image quality assessment | SIQA[24] | - | - | 0.2374 |
| Zhou et al.[25] | 2017 | Focusing on scale-invariant features, i.e., SIFT and BRIEF descriptors, for the quality assessment of stitched images | Not Given | - | - | - |
| Xu et al.[26] | 2017 | Examining the consistency analysis of viewing points of the user towards panoramic video quality assessment | VQA-ODV[61] | - | - | - |
| Zhang et al.[27] | 2017 | Evaluating the quality of panoramic videos using single stimulus continuous quality scale (SSCQS) and subjective assessment of multimedia panoramic video quality (SAMPVIQ) | Panoramic video dataset[27] | 0.7745 | 0.5859 | 13.6107 |
| Yang et al.[28] | 2018 | Exploiting the spatial difference between the panoramic video frames using 3D CNN towards the quality assessment task | VRQ-TJU[28] | 0.8940 | 0.9008 | 8.1985 |
| Guo et al.[30] | 2018 | Perceptual quality assessment of immersive contents using spatial peripheral vision | SUN360[62] | - | - | - |
| Chen et al.[31] | 2018 | Omnidirectional video quality assessment using structural similarity (SSIM) in the spherical domain | Omnidirectional video quality assessment[31] | 0.8211 | 0.8635 | 0.4428 |
| Zhang et al.[32] | 2018 | Providing a generic database for subjective panoramic video quality assessment | Subjective quality database for panoramic videos[32] | 0.8166 | 0.8058 | - |
| Lim et al.[33] | 2018 | Investigating deep adversarial learning and human perception guider for virtual reality image quality assessment | SUN360[62] | 0.8721 | 0.8522 | 8.8048 |
| Li et al.[34] | 2019 | Exploiting perceptual hash, histogram similarity, sparse reconstruction, global color difference, and size of the blind zone for omnidirectional 360° image quality assessment | CROSS[34] | 0.7370 | 0.7370 | 1.3890 |
| Yu et al.[35] | 2019 | Providing cross-reference omnidirectional images dataset for immersive contents quality assessment | CROSS[34] | - | - | - |
| Li et al.[36] | 2019 | Proposing viewports via saliency-driven CNN architecture towards 360° video quality assessment | VQA-ODV[61] | 0.8962 | 0.8740 | 5.7551 |
| Wu et al.[37] | 2019 | Analyzing the perceptual quality of virtual reality videos using 3D CNN | Panoramic video dataset[27] | 0.9601 | 0.9414 | 1.1265 |
| Kim et al.[38] | 2019 | Examining the perceptual quality of virtual reality omnidirectional images using human perception guider | SUN360[62] | 0.8823 | 0.8877 | 6.3837 |
| Azevedo et al.[29] | 2020 | Estimating quality of 360° videos using multi-metric fusion approach | VQA-ODV[61] | 0.9171 | 0.9257 | 4.9954 |
| Yan et al.[39] | 2020 | Focusing on the quality assessment of stereoscopic stitched images using ghost, color, shape, and disparity distortion analysis | VQA-ODV[61] | - | 0.8253 | - |
| Chen et al.[41] | 2020 | Analyzing the quality of stereoscopic omnidirectional images using predictive coding theory based on human perception | OIQA[63] | 0.9020 | 0.9060 | - |
| | | | CVIQD2018[64] | 0.9000 | 0.9070 | |
| Yang et al.[42] | 2020 | Examining the quality of panoramic videos using spherical CNN and non-local properties of the immersive contents | VRQ-TJU[28] | 0.9240 | 0.9390 | - |
| Wang et al.[43] | 2021 | Focusing on the quality assessment of stitched images using bi-directional color matching | ISIQA[51] | 0.3340 | 0.3608 | - |
| | | | CCSID[43] | 0.7071 | 0.7380 | 8.6715 |
Table 2 Summarized comparative analysis of both traditional and deep learning-based No-Reference SIQA methods based on the evaluation dataset and commonly used SIQA evaluation metrics including SROCC, PLCC, and RMSE

| Method | Year | Method Description | Dataset | SROCC | PLCC | RMSE |
| --- | --- | --- | --- | --- | --- | --- |
| Leorin et al.[44] | 2005 | Focusing on the panoramic video quality assessment using low-level and high-level vision factors | Not Given | - | - | - |
| Xu et al.[45] | 2010 | Evaluating video stitching approaches using color correction in multi-view frame data | Not Given | - | - | - |
| Yang et al.[46] | 2017 | Investigating the perceptual quality of panoramic images using error-activation-guided metric | SIQA[24] | - | - | - |
| Ling et al.[47] | 2018 | Estimating the visual quality of stitched images using trained sparse convolutional kernels and feature selection | SIQA[24] | 0.7295 | 0.8574 | 0.3161 |
| Gandhe et al.[48] | 2019 | Investigating the quality of stitched images using hybrid warping approach | Not Given | - | - | - |
| Xia et al.[49] | 2019 | Following the asymmetric processing pipeline of human brain towards panoramic image quality assessment | OIQA[63] | 0.7150 | 0.7408 | 1.4264 |
| Yu et al.[50] | 2019 | Presenting no-reference quality assessment of stitched images using structural properties and saliency features | Not Given | - | - | - |
| Madhusudana et al.[51] | 2019 | Examining the quality of virtual reality stitched contents using color correction and bandpass analysis | ISIQA[51] | 0.7820 | 0.8030 | - |
| Li et al.[52] | 2019 | Investigating the quality of 360° omnidirectional contents using deep low-resolution deformation and high-level recurrence | CROSS[34] | 0.7420 | 0.7420 | 2.0670 |
| Sun et al.[55] | 2019 | Evaluating the quality of 360° images using multi-channel CNN architecture | CVIQ[55] | 0.9187 | 0.9247 | 4.6247 |
| Hou et al.[53] | 2020 | Multi-task learning for blind panoramic contents quality assessment | ISIQA[51] | 0.7593 | 0.8022 | - |
| Ullah et al.[54] | 2020 | Analyzing the quality of stitched image using stitching-specific distortion segmentation | SUN360[62] | 0.8591 | 0.9367 | 0.2194 |
| Zheng et al.[40] | 2020 | Focusing on the quality evaluation of omnidirectional images by segmenting the distortion specific regions and spherical projection analysis | CVIQD2018[64] | 0.8614 | 0.9077 | 6.1178 |
| et al.[56] | 2020 | Presenting graph convolutional neural network for viewport prediction towards omnidirectional image quality assessment | OIQA[63] | 0.9050 | 0.9241 | 5.4616 |
| | | | CVIQD2018[64] | 0.7832 | 0.7911 | 1.2934 |
| Poreddy et al.[57] | 2021 | Investigating the quality of panoramic contents using scene statistics and univariate generalized Gaussian distribution | | - | - | - |
| et al.[58] | 2021 | Utilizing the adjacent pixels correlation technique towards the quality assessment of panoramic image | OIQA[63] | 0.9394 | 0.9466 | 0.7142 |
| | | | CVIQD2018[64] | 0.9322 | 0.9496 | 4.3690 |
| et al.[59] | 2021 | Exploiting local-global naturalness and multifrequency analysis for 360° image quality evaluation | OIQA[63] | 0.9614 | 0.9695 | 0.5146 |
| | | | CVIQD2018[64] | 0.9670 | 0.9751 | 3.1036 |
| et al.[60] | 2021 | Utilizing spatial domain features extraction and temporal pooling for panoramic video quality assessment | Subjective quality database for panoramic videos[32] | 0.7754 | 0.8121 | 0.4499 |
| Sendjasni et al.[66] | 2021 | Exploiting perceptually-weighted CNN architecture followed by visual scan-path for the quality assessment of 360° images | CVIQD2018[64] | 0.9280 | 0.9490 | - |
| et al.[67] | 2021 | Investigating global statistical characteristics and local measurement errors for stitched image quality evaluation | ISIQA[51] | 0.8406 | 0.8532 | 6.7551 |
| | | | CCSID[43] | 0.7632 | 0.7776 | 8.3911 |
| et al.[68] | 2021 | Proposing multi-stream CNN network with distortion discrimination for omnidirectional contents quality assessment | OIQA[63] | 0.9230 | 0.8990 | 6.3960 |
| | | | CVIQD2018[64] | 0.9280 | 0.9490 | - |
1. To the best of our knowledge, this is the first attempt to present a brief yet informative survey that covers all existing SIQA methods presented to date. This survey explores the overall literature of the SIQA domain and focuses exclusively on objective SIQA approaches, the publicly available datasets, and evaluation metrics.
2. We provide an organized analysis of SIQA methods for interested researchers, in which we classify the SIQA approaches into two classes, namely Full-Reference SIQA and No-Reference SIQA methods. The main objective of this classification is to describe the typical workflow of both types of methods and to draw the attention of the research community to the SIQA domain.
3. This survey reports the current challenges of the objective SIQA domain in a single overview and provides future directions to encourage interested researchers to further explore this research domain.
The rest of the article is structured as follows: Section 2 provides a detailed discussion of objective SIQA methods, including both Full-Reference SIQA and No-Reference SIQA. The publicly available SIQA datasets are discussed in Section 3. Section 4 details the evaluation metrics commonly used for the SIQA task. Section 5 discusses the key challenges in the SIQA domain and suggests future directions for further research. Finally, Section 6 concludes this survey with its findings.
2 Background and related works
In this section, we review the literature on objective SIQA methods published from 2005 to 2021, in chronological order. Based on the difference in their workflow for assessing stitched image quality, we categorize the objective SIQA methods into Full-Reference SIQA and No-Reference SIQA methods and discuss them in separate sections. The summarized details (including publication year, method description, the dataset used for evaluation, and the obtained quality assessment performance) of Full-Reference SIQA and No-Reference SIQA methods are given in Table 1 and Table 2, respectively. A visual overview of the classification of objective SIQA methods into Full-Reference SIQA and No-Reference SIQA methods is depicted in Figure 3. Finally, the year-wise citations of both Full-Reference SIQA and No-Reference SIQA methods are depicted in Figure 4, which indicates the research interest and progress of researchers in both sub-domains (FR-SIQA and NR-SIQA) of SIQA.
2.1 Full-reference SIQA methods
Numerous Full-Reference SIQA approaches have been proposed to automate the quality assessment of panoramic content, and this section discusses them in detail. For instance, Yang et al. proposed a new SIQA metric that combines a perceptual geometric error metric with a local structure-guided IQA metric[24]. The two metrics are combined in a content-adaptive manner, where the amount of image structure is first estimated from the originally captured viewpoint images. They conducted three groups of experiments on the SIQA dataset and computed the local variance of the optical flow field energy between the distorted and reference images to measure geometric errors. They attained their highest accuracy of 94.3% with the fused metrics and adopted a minimum-spanning-tree technique for saliency detection. Zhou et al. proposed an approach for evaluating quality between the spliced and source images[25]. In their experiments, they applied a mosaic algorithm to multiple images cut into parts with overlapping sections. Because the mosaic algorithm distorts the geometric structure, the authors simulated this distortion using the color and brightness factors of optical images. To measure the effectiveness of the mosaic algorithm, they used the SAM and BRF approach. Furthermore, to assess the quality loss of panoramic videos, Xu et al. proposed a novel approach named VQA. Their work considers different aspects of panoramic videos, including viewing direction data, and produces different mean scores[26]. They created a new database of 360°×180° FoV panoramic videos, with 40 subjects and a total of 48 panoramic video sequences. Another group of researchers, Zhang et al., conducted a subjective and objective quality assessment of panoramic videos[27]. 
To create their panoramic video database, they initially processed 16 standard panoramic videos using different encoders and added noise to them. Their database contains a total of 384 panoramic videos. Using several subjective and objective approaches, they explored the effects of bitrate, encoder, and noise on the panoramic videos in their own dataset. Yang et al. used the openly accessible VRQ-TJU dataset for full-reference VR quality assessment[28]. They used an end-to-end 3D CNN to predict VR quality. This approach can extract spatio-temporal features without hand-crafted features. They fed pre-processed video patches to the 3D CNN to obtain a score for every patch. For performance evaluation, they applied quality score fusion. Azevedo et al. proposed a viewport-based multi-metric fusion technique for assessing the visual quality of 360° videos as well as 2D videos. They first worked on viewports extracted from the 360° videos[29] and then trained their model to combine the extracted features into a metric that matches the subjective quality scores. Guo et al. explored the visual perception of natural images on VR devices[30]. They determined image quality in terms of eccentricity by randomly selecting images and extracting their features. They used independent cross-validation and compared the results with state-of-the-art approaches.
Continuing this line of research, Chen et al. presented an objective omnidirectional video quality assessment approach based on structural similarity in the spherical domain[31]. They analyzed the relationship between structural similarity in the 2D plane and in the 360° spherical domain, which allows the projection between the two domains to be handled, and proposed an SSIM-based VQA algorithm for omnidirectional video. The proposed metric was verified on a subjective omnidirectional video quality assessment database and compared with state-of-the-art objective quality evaluation metrics. Zhang et al. presented a subjective quality assessment approach for panoramic videos aimed at video coding applications[32]. Rather than coding directly, they first resampled each video sequence. With the optimal display resolution, the maximum range per pixel at the center of the video can be guaranteed, which makes the assessment more efficient and reliable. They established a subjective quality database of 50 distorted sequences generated from 10 different raw panoramic videos. For performance evaluation, they used JVET. Lim et al. proposed an approach for VR image quality assessment that uses adversarial learning on omnidirectional images[33]. Considering the particular characteristics of omnidirectional images, they proposed a deep network with a human perception guider and a novel quality predictor. Their approach automatically predicts the quality score from spatial features. For performance evaluation, they conducted extensive subjective experiments on omnidirectional data samples. Their approach outperforms existing full-reference approaches. Li et al. proposed a quality assessment approach and created an omnidirectional dataset[34]. 
They used stitched images as well as fisheye images. They evaluated image quality using pairs of images, with one pair at 0° and 180° and a second pair at 90° and 270°. After creating the pairs, they built a cross-reference that provides the ground truth for the stitching regions. Based on this dataset, they proposed omnidirectional stitching quality assessment metrics that explore the relationship between a stitched image and its cross-reference images. They evaluated these metrics through qualitative and quantitative experiments. Yu et al. generated a new dataset of stitched and fisheye images captured at four angles: 0°, 90°, 180°, and 270°[35]. They created two pairs of images, (a) 0° and 180° and (b) 90° and 270°, and kept one pair as the reference while evaluating the other. Li et al. proposed a quality assessment approach for 360° videos that takes into account the auxiliary tasks of viewport prediction[36]. This approach consists of two main stages: viewport proposal and video quality assessment. A viewport proposal network is applied first, followed by a viewport-based stage that produces the video quality score.
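Several methods in this paragraph assess quality in the spherical domain rather than directly on the flat projection. One common way to account for the sphere on an equirectangular frame is to weight each row by the cosine of its latitude, as done in WS-PSNR-style metrics. The sketch below shows that weighting applied to a mean squared error. It is not the SSIM-based formulation of Chen et al.[31], and the function names and tiny test images are assumptions made for illustration.

```python
import math

def latitude_weights(height):
    """cos(latitude) weight for each row of an equirectangular image.

    Rows near the poles (top and bottom) get weights close to 0 because
    they are stretched by the projection; rows near the equator get
    weights close to 1.
    """
    return [math.cos((j + 0.5 - height / 2.0) * math.pi / height)
            for j in range(height)]

def weighted_mse(test_img, ref_img):
    """Latitude-weighted mean squared error between two grayscale images."""
    weights = latitude_weights(len(test_img))
    num = 0.0
    den = 0.0
    for w, t_row, r_row in zip(weights, test_img, ref_img):
        for t, r in zip(t_row, r_row):
            num += w * (t - r) ** 2
            den += w
    return num / den

# Hypothetical 4x2 images: the same error placed near a pole or near the equator.
ref = [[0, 0], [0, 0], [0, 0], [0, 0]]
err_at_pole = [[10, 10], [0, 0], [0, 0], [0, 0]]
err_at_equator = [[0, 0], [10, 10], [0, 0], [0, 0]]

print(f"error near pole:    {weighted_mse(err_at_pole, ref):.3f}")
print(f"error near equator: {weighted_mse(err_at_equator, ref):.3f}")
```

With this weighting, the same pixel error counts for less near the poles than near the equator, reflecting the smaller area those pixels cover on the sphere.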
Following these methods, Wu et al. presented their own dataset for efficient video quality assessment[37]. They first generated a subjective score database and added the projection format to it. They then presented a 3D CNN to predict VR video quality without VR video references. For performance evaluation, they applied different quality scores to assess their approach. Kim et al. proposed a deep learning-based approach for VR image quality assessment that automatically predicts the visual quality of an omnidirectional image[38]. To assess the visual quality perceived when viewing the omnidirectional image, their VR quality score predictor learns positional and visual characteristics by encoding the positional and visual features of patches of the omnidirectional image. Their human perception guider evaluates the predicted quality score against human subjective scores using adversarial learning. For performance evaluation, they conducted comprehensive subjective experiments. Yan et al. considered the different distortion types produced by stitching methods and proposed a perceptual objective quality assessment for stitched images that covers color, ghost, shape, and structure distortion[39]. They designed the quality evaluation metrics using the color difference coefficient, point distance, and information loss. For evaluation, they used a subjective quality assessment database for stitched images. Zheng et al. proposed a segmented spherical projection (SSP) image quality assessment approach[40]. They converted the equirectangular projection (ERP) format into SSP to resolve the stretching distortion in the bipolar regions. They extracted features from the bipolar and equatorial regions to predict the quality of distorted omnidirectional images (OIs). They used two datasets, CVIQD2018 and MVAQD. Chen et al. 
proposed a stereoscopic omnidirectional image quality evaluator for 3D 360° images[41]. It involves two modules: a predictive coding theory-based module and a multi-view fusion-based module. The authors introduced predictive coding theory to simulate the competition between high-level and rivalry dominance features in order to obtain the quality score of viewport images. Furthermore, the authors used both content and location weights when computing the quality score from the viewport images. Yang et al. proposed an end-to-end neural network-based model[42]. They used a CNN model to evaluate the quality of panoramic videos from two datasets, VRQ-TJU and VR-VQA48, and extracted complex spatio-temporal features by combining the CNN with a non-local network. Wang et al. proposed a bi-directional quality assessment approach for stitched images and evaluated it on their own dataset[43]. They established dense correspondence between the test images and the benchmark stitched image database, i.e., the ISIQA database, and extracted color, geometric, and structural features. For performance evaluation, they used the SVR algorithm to predict the quality score and attained the best results in terms of blind quality metrics and quality metrics. Detailed information on the above-mentioned SIQA methods (including the publication year, method description, evaluation dataset, and quality assessment performance) is presented in Table 1. Although these Full-Reference SIQA methods perform well on the stitched image quality assessment task, they have several limitations, including the need for a reference panoramic image or 360° video, time-consuming pairwise comparison of reference and target panoramic images, and inefficiency in real-time environments. 
Obtaining reference data for panoramic imagery is sometimes impossible or very difficult, and requires time-consuming effort and expertise for data collection and annotation. These limitations make Full-Reference SIQA methods unsuitable for real-time stitched image quality assessment in time-constrained environments.
2.2 No-reference SIQA methods
In addition to the Full-Reference SIQA methods, several No-Reference SIQA approaches have been proposed to automate the quality assessment of panoramic content. In this section, we discuss the No-Reference SIQA approaches presented so far, along with their strengths and limitations, and analyze them based on their performance. Generally, these No-Reference SIQA methods assess the perceptual quality of panoramic content or stitched images without any reference image or prior information. For instance, Leorin et al. presented a panoramic video quality assessment approach for videoconferencing applications[44]. Their method estimates the perceptual quality of panoramic video using the motion saliency in overlapping regions, the calibration variance of contiguous cameras, and the non-uniformity of the scene in stitched regions. Xu et al. evaluated the performance of image and video stitching methods using nine different color correction algorithms[45]. They used 40 synthetically generated image pairs and 30 original stitched image pairs in this evaluation. To localize stitching artifacts (such as ghosting and shape inconsistency), Yang et al. proposed a CNN-assisted stitched IQA metric that focuses on stitching distortions in the stitched image[46]. Their method first localizes distortions using a fine-tuned CNN architecture and then refines the localized regions by mapping the error activations obtained from the network. When estimating the quality of the stitched image, they weight each distortion by its size and misrepresentation level. Similarly, Ling et al. presented a Convolutional Sparse Coding (CSC) driven approach to assess the perceptual quality of panoramic images[47]. 
Their proposed CSC strategy first used a set of convolutional filters to detect the distortion-specific regions in a given stitched image and then used trained kernels to estimate the compound effect of different type of stitching distortions in a single region. Focusing on the quality assessment of stitched images, Gandhe et al. proposed a hybrid wrapping approach that optimize the homography by merging two global and one local warps that helps to rectify the distortion and quantity the structural irregularities in the stitched images[48]. Xia et al. suggested a blind panoramic contents quality assessment method that predict the quality of panoramic image following asymmetric approach that mimic human brain[49]. Their method extracted panoramic-weighted Local Binary Pattern (LBP) and computed relative variation of the panoramic stitched images. The extracted features are then regressed by Support Vector Regressor (SVR) to obtain predicted quality score.
Continuing research in this direction, Yu et al. presented a no-reference stitched image quality assessment metric that focuses on quality in the overlapping regions[50]. They first eliminated the outer points and isolated points using spherical projection and then used bounding boxes to locate the overlapping regions in the stitched image and estimate their quality. Focusing on SIQA applications in virtual reality, Madhusudana et al. suggested a no-reference approach named Stitched Image Quality Evaluator (SIQE), which uses a Gaussian mixture technique to capture the bivariate statistics of neighboring coefficients of steerable pyramid decompositions and validates their spatial correlation for the quality assessment task[51]. Li et al. presented a deep learning-based stitched image quality assessment approach with two distinct modules, namely a low-resolution module and a high-resolution module[52]. The low-resolution module learns the rules of deformation from the parsing process of dual fisheye to omni-directional images, whereas the high-resolution module focuses on the enhanced resolution of stitched images. Similarly, Hou et al. proposed a deep learning-based method for blind quality assessment of panoramic stitched images[53]. They used a Siamese-like network that compares the quality of two images of the same scene and predicts the quality score for the given pair of images. Following the deep learning strategy, Ullah et al. suggested a learning-based approach in which stitching distortion segmentation is followed by a quality estimation module[54]. Their method first segments the stitching errors in a panoramic image using Mask R-CNN and then forwards the segmented distorted regions to the quality estimation module. The latter module estimates the quality by aggregating the distorted pixels over error-free pixels. Subsequently, Sun et al. presented a multichannel CNN-assisted framework for blind quality evaluation of 360° images[55]. They first divided the 360° image into six viewports and forwarded them as input to the CNN model. Their CNN model first extracts spatial CNN features and then regresses the extracted features using an image quality regressor.
On the other hand, Zheng et al. proposed a segmentation-based approach named Segmented Spherical Projection (SSP) to assess the quality of omni-directional images[40]. Their approach first transforms the equirectangular projection (ERP) image into a spherical image and then extracts perceptual features from the transformed image with a fan-shaped window. Lastly, they pooled the extracted features to predict the quality of the distorted omni-directional image. Xu et al. presented a blind SIQA approach named Viewport oriented Graph Convolution Network (VGCN) for omnidirectional image quality assessment[56]. They constructed a spatial viewport graph whose nodes are the viewports with the highest probability of being viewed. They then predicted the quality of the panoramic image by reasoning on the constructed graph with a GCN. Afterward, Poreddy et al. suggested a supervised no-reference quality assessment approach for the quality evaluation of 3D virtual reality images[57]. They computed the Univariate Generalized Gaussian Distribution (UGGD) parameters and a multi-orientation steerable sub-band decomposition. Lastly, the spatial BRISQUE score and predicted saliency are pooled to predict the final quality score. Ding et al. proposed a blind SIQA approach for panoramic image quality assessment by utilizing Adjacent Pixels Correlation (APC) and statistical features[58]. They estimated the probability distribution of neighboring pixels and the difference map using Markov chains to detect variation in the statistical properties of panoramic images. Finally, they fed the extracted statistical features into a Support Vector Regression (SVR) algorithm to predict the quality score. Inspired by Human Visual System (HVS) characteristics and frequency-dependent properties, Zhou et al. first parsed the equirectangular projection map into wavelet sub-bands and then exploited the entropy values of the low-frequency and high-frequency sub-bands to compute multifrequency information of the omni-directional image[59]. Finally, they regressed the multifrequency information and naturalness computation using the SVR algorithm to predict the visual quality of the panoramic image. Zhang et al. presented a no-reference method for panoramic video quality assessment[60]. Their method extracts spatial and temporal features from the panoramic video. The extracted spatio-temporal features are then pooled to estimate the final quality score. Zhou et al. proposed a multi-stream CNN-assisted distortion discrimination approach for omni-directional image quality assessment[59]. Their CNN architecture is based on viewport images generated via retinal information received from the human VR perceiving experience. Similarly, Sendjasni et al. proposed a CNN-based no-reference SIQA approach to predict the perceptual quality of 360° images[66]. Instead of forwarding the entire ERP image to the CNN, they fed only visually important viewports to their CNN model and extracted visual information for quality score estimation. Tian et al. presented a novel SIQA approach for stitched image quality assessment by exploiting local measurement errors and global statistical characteristics of the stitched image[67]. They focused on stitched image properties that include misalignment, structural distortion, geometric error, ghosting, blur, and chromatic errors. The extracted local and global features are aggregated into a quality score using a regression algorithm. A detailed overview of the aforementioned SIQA approaches (including the publication year, method description, evaluation dataset, and quality assessment performance) is given in Table 2.
Despite performing better than No-Reference SIQA methods, these Full-Reference SIQA methods still have numerous drawbacks, including computationally complex architectures, the unavailability of end-to-end deep learning architectures, and the need for high computational resources for deployment. Deep learning architectures should offer a balanced tradeoff between model complexity and accuracy so that they can be deployed on resource-constrained devices while still delivering reasonable performance. Therefore, given the current limitations of existing SIQA approaches, there is a clear demand for computationally efficient yet accurate deep learning architectures that enable researchers and computer vision practitioners to deploy resource-efficient AI algorithms on smart devices with reasonable accuracy.
3 Datasets
To date, several datasets have been contributed to the SIQA domain, providing both high-quality and distorted panoramic image and video data. Among these, the publicly accessible datasets are discussed in this section in detail (including the number of images in each dataset and the image resolution). The most commonly used publicly accessible panoramic stitched image quality assessment datasets are: the Stitched Image Quality Assessment (SIQA) dataset[24], the IISc Stitched Image Quality Assessment (ISIQA) dataset[51], the Cross-Reference Omnidirectional Stitching (CROSS) dataset[34], the Stereoscopic Stitched Image Database (SSID)[39], the Omnidirectional Image Quality Assessment (OIQA) dataset[63], the Color Correction-based Stitched Image Database (CCSID)[43], and the Compressed VR Image Quality Database (CVIQD2018)[64]. Each dataset is reviewed in a separate subsection below.
3.1 Stitched image quality assessment (SIQA) dataset[24]
The SIQA dataset is based on synthesized virtual scenes, with images rendered from each virtual scene using the 3D modeling tool Unreal Engine. A synthesized 12-head panoramic camera is placed at different locations in each scene; it covers the surrounding 360 degrees, and each camera has a 90-degree Field of View (FOV). All 12 cameras capture images at the same location simultaneously. Every camera view serves as a complete reference for the stitched view of the cameras to its left and right. The SIQA dataset uses 12 different 3D scenes, covering wild landscapes and structured scenes, and the Nuke stitching tool is used with different parameters to obtain two collections of stitched images. In total, 816 stitched image samples are collected at high resolutions of 3k and 2k. The best stitched images were decided on the basis of annotations from 28 viewers, and for the ground truth, 100000 or more decisions were aggregated into a Mean Opinion Score (MOS). This is the first stitched image dataset that considers perspective variations and is properly constructed in terms of scale and formation.
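The statement that each camera view can act as a complete reference for the stitched view of its two neighbours can be checked with simple angular arithmetic. The sketch below is only illustrative: it assumes the 12 cameras are evenly spaced around the rig (the text gives only the camera count and the 90-degree FOV), and the names `coverage`, `NUM_CAMERAS`, and `FOV_DEG` are hypothetical.

```python
# Illustrative check, assuming 12 evenly spaced cameras (an assumption,
# not stated in the dataset description) with a 90-degree horizontal FOV.
NUM_CAMERAS = 12
FOV_DEG = 90.0

# Angular spacing between adjacent camera centres under the even-spacing assumption.
spacing = 360.0 / NUM_CAMERAS


def coverage(center_deg, fov_deg=FOV_DEG):
    """Return the (start, end) horizontal interval, in degrees, seen by a camera."""
    half = fov_deg / 2.0
    return (center_deg - half, center_deg + half)


reference = coverage(0.0)          # camera used as the reference view
left = coverage(0.0 - spacing)     # neighbour on the left
right = coverage(0.0 + spacing)    # neighbour on the right

# The two neighbours overlap each other, so stitching them gives one
# continuous interval from the left edge of `left` to the right edge of `right`.
neighbours_overlap = left[1] >= right[0]
stitched = (left[0], right[1])

# The reference view is a complete reference if it lies inside that interval.
reference_is_covered = (
    neighbours_overlap
    and stitched[0] <= reference[0]
    and reference[1] <= stitched[1]
)

print(spacing, reference, stitched, reference_is_covered)
```

Under these assumptions the stitched view of the two neighbours spans a wider interval than the reference camera, which is consistent with using the middle camera as a full reference for that stitched region.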
3.2 Cross-reference omnidirectional stitching (CROSS) dataset[34]
The data for this dataset were collected with the Samsung Gear 360° fisheye camera to enhance the quality and robustness of the captured data under different conditions. The dataset comprises 10 different scenes, which are divided into two groups, i.e., indoor and outdoor. The indoor environments include a classroom, meeting room, underground park, dance room, stairs, and reading room, while the outdoor environments consist of a wild area, basketball court, streets, and residential area. The dataset consists of 292 quaternions (groups of four) of fisheye samples, while the remaining images are the stitching results of 7 different techniques (Samsung Gear, Open Source, WeiMethod, Stereoscopic Vision Projection, ManMethod, Isometric Projection, and Equidistant Projection). The dataset contains a variety of indoor and outdoor images under both natural and non-natural lighting. The real fisheye images are taken at the highest resolution (5792×2896) with the Samsung Gear cameras and are then stitched together using image stitching algorithms. For the collection of omni-directional images, every group comprises 4 images taken at orthogonal orientations from the same camera position. For the evaluation of stitching quality, the fisheye images taken at orthogonal degrees serve as cross-references for the quality evaluation of the stitched image at a given degree.
3.3 Stereoscopic stitched image database (SSID)[39]
In this dataset, the input stereoscopic images are taken with a FUJIFILM REAL 3D camera using a variety of complex camera motions, not only simple rotation and planar motion. The dataset comprises three stitched results for every sample, generated using three exemplary image stitching techniques: APAP[69], homography, and Yan's[70]. The dataset is constructed from 30 samples, and every sample is arranged in a file that contains the input images, the stitched stereoscopic images from the 3 stitching techniques, and the anaglyph stereoscopic samples, namely input-R1, input-L1, input-R2, input-L2, H-S, H-R, H-L, APAP-S, APAP-R, APAP-L, YAN-S, YAN-R, and YAN-L. To verify the effectiveness of the proposed technique, a user study was conducted with 30 participants who had normal stereoscopic vision. The participants were graduate and undergraduate students aged 18 to 28. Each participant was instructed about the different types of stitching distortion before starting the test. In the first phase, the participants were asked to rate the three 2D stitched left images and three 2D stitched right images, presented in random order, from 1 to 5 (very bad to very good). In the second phase, participants were asked to rate the anaglyph stereoscopic stitched samples from 1 to 5 (very uncomfortable to very comfortable). The total number of image ratings was 270, with each participant rating every sample 9 times.
3.4 Color correction-based stitched image database (CCSID)[43]
The authors developed this dataset to address the problem of color aberrations in stitched images. The dataset is collected from 10 sources, each of which includes pairs of panoramic images. The authors investigated the impact of color aberration on stitched images by artificially adjusting the contrast, brightness, and saturation. The color aberration is simulated in the IrfanView software between two samples, one of which is artificially adjusted, resulting in a set of images with inconsistent colors. Five levels of color difference are defined, with contrast values of 0, 20, -60, 0, and -30, brightness values of -99, -50, 0, 50, and -30, and saturation values of 50, -50, 50, 100, and -40 for the five levels, respectively. Furthermore, color correction is performed to remove the color aberration using different color correction algorithms. The influence of the color correction algorithms on stitched images is analyzed using an image stitching algorithm[71] for stitched images generated from the pairs of panoramic images.
3.5 IISc stitched image quality assessment (ISIQA) dataset[51]
This dataset comprises 264 stitched images, which are produced by several stitching algorithms from 26 miscellaneous scenes. The database offers a wide range of perceptual quality and includes a variety of stitching errors such as blur, color, ghosting, and geometric distortions. Every scene comprises several images with overlapping points of view, produced by moving the camera horizontally. The image for each viewpoint was captured at a high resolution of 4032×2268 with a smartphone camera (Samsung Galaxy S7 edge). To evaluate the performance of stitching algorithms on static scenes, care was taken during image acquisition to avoid object motion between consecutive images. Each scene comprises 4-5 pictures taken from various angles, resulting in stitched images with a horizontal field of view of 180-270 degrees and a horizontal resolution of 8000-10000 pixels. The 26 scenes include a wide range of locations, including indoor and outdoor spaces, buildings, gardens, and public spaces. To create the dataset of stitched images, each set of overlapping viewpoint images belonging to a scene is stitched using different stitching methods. The dataset images are annotated with 6600 human ratings collected from viewers who experienced the images in a Virtual Reality (VR) Head Mounted Display (HMD).
3.6 Omnidirectional image quality assessment (OIQA) dataset[63]
This dataset comprises 320 distorted images and 16 raw images, for a total of 336 images. The image resolution ranges from 11332×5666 to 13320×6660, and the images are in equirectangular format. All raw images were taken by professionals and are available under Creative Commons (CC) licenses. The raw images were carefully checked to avoid artifacts, and all of them have similar perceptual quality and resolution. This process reduces the influence of the original content quality and avoids intrinsic artifacts. Two of the most common compression methods, JPEG and JPEG2000, are used to simulate compression artifacts. Five compression levels are set manually to cover a wide range of perceptual quality. Since much omnidirectional content is produced and stored in equirectangular format, followed by compression and transmission, all distorted images are also compressed directly in equirectangular format. White Gaussian noise and Gaussian blur are also among the most common distortion types; they are rarely produced in transmission and are therefore mainly considered at the capturing stage. As these omnidirectional images are captured by camera arrays and then stitched, this type of distortion is hard to find. The source image is therefore split into an array of 15 small blocks, which represent the scenes captured by each camera in the camera array. Since distortions are produced at each camera sensor, white Gaussian noise and Gaussian blur are added to each of the 15 blocks, which are then stitched again into one omni-directional image.
3.7 Compressed VR image quality database (CVIQD2018)[64]
The CVIQD2018 dataset includes sixteen source images, of which four are extracted from JVET test videos and 12 are captured with an Insta360 4K Spherical VR camera. Each sample in the dataset has a resolution of 4096×2048 and depicts diverse scenes such as landscapes, persons, objects, and towns. After collecting the dataset, the authors encoded the samples with three encoding techniques: JPEG[72], Advanced Video Coding (AVC) or H.264[73], and High Efficiency Video Coding (HEVC) or H.265[74]. JPEG is a common technique for compressing digital images, while the other two techniques are used to compress videos. Specifically, JPEG compression is applied with quality factors from 50 down to 0 at an interval of 5, while AVC and HEVC are applied with factors from 30 to 50 at an interval of 2. The dataset thus includes sixteen source images and a total of 528 compressed images.
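The reported total of 528 compressed images is consistent with the stated compression settings, as the short arithmetic sketch below shows. It assumes that both endpoints of each range are included (50 and 0 for JPEG, 30 and 50 for AVC and HEVC) and that every source image is encoded at every level with all three codecs; these are reading assumptions rather than details confirmed in the text, and the variable names are illustrative.

```python
# Arithmetic check of the CVIQD2018 image count, assuming inclusive ranges
# and that each source image is encoded at every level by every codec.
NUM_SOURCE_IMAGES = 16

# JPEG quality factors from 50 down to 0 in steps of 5: 50, 45, ..., 0.
jpeg_levels = list(range(50, -1, -5))

# AVC and HEVC factors from 30 to 50 in steps of 2: 30, 32, ..., 50.
avc_levels = list(range(30, 51, 2))
hevc_levels = list(range(30, 51, 2))

levels_per_image = len(jpeg_levels) + len(avc_levels) + len(hevc_levels)
total_compressed = NUM_SOURCE_IMAGES * levels_per_image

print(len(jpeg_levels), len(avc_levels), len(hevc_levels))
print(levels_per_image, total_compressed)
```

Each codec contributes 11 levels under these assumptions, giving 33 compressed versions per source image and 16 × 33 = 528 compressed images overall.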
4 Evaluation metrics
The predictions of a machine learning or deep learning-based stitched image quality assessment algorithm cannot be assumed to be fully reliable or precise. To establish how well an SIQA algorithm performs on the quality assessment task, it must be evaluated using performance evaluation metrics. In the SIQA domain, several evaluation metrics can be used to analyze the performance of an SIQA algorithm; each of them is reviewed in the following subsections:
4.1 Pearson linear correlation coefficient (PLCC)[75]
The Pearson Linear Correlation Coefficient (PLCC) is one of the most widely used evaluation metrics for validating the performance of image quality assessment methods. It quantifies the linear correlation (relationship) between two variables. From the image quality assessment perspective, it can be used to compare the performance of panoramic image quality assessment metrics and methods. The PLCC value can be either positive or negative: a positive value indicates a positive correlation (variable A increases as variable B increases), whereas a negative value denotes a negative correlation (variable A increases as variable B decreases). A correlation value of 0 means there is no linear relationship between A and B. Mathematically, the PLCC metric can be formulated as follows:
$$\mathrm{PLCC}(x, y) = \frac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \tag{1}$$
Here cov is the covariance of the input samples x and y, σx and σy are the standard deviations of x and y, and x̄ and ȳ are the sample means of x and y, respectively.
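As a minimal sketch of the PLCC formula, the Python function below computes the coefficient directly from the sums in the definition, using only the standard library. The function name `plcc` and the example scores are illustrative assumptions, not values taken from any SIQA dataset.

```python
import math


def plcc(x, y):
    """Pearson linear correlation between two equally long score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of products of deviations from the means.
    numerator = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator: product of the square roots of the summed squared deviations.
    denom_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x))
    denom_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y))
    if denom_x == 0 or denom_y == 0:
        raise ValueError("PLCC is undefined when one input is constant")
    return numerator / (denom_x * denom_y)


# Hypothetical predicted quality scores and subjective scores.
predicted = [1.0, 2.0, 3.0, 4.0, 5.0]
subjective = [2.0, 4.0, 6.0, 8.0, 10.0]

increasing = plcc(predicted, subjective)                   # perfectly linear, rising
decreasing = plcc(predicted, list(reversed(subjective)))   # perfectly linear, falling
print(round(increasing, 6), round(decreasing, 6))
```

In this toy example the two lists are exact linear functions of each other, so the coefficient is +1 for the rising case and -1 for the falling case, up to floating-point rounding.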
4.2 Spearman's rank order correlation coefficient (SROCC)[76]
The Spearman's Rank Order Correlation Coefficient (SROCC) is also commonly used for comparing the performance of image quality assessment metrics/methods and has characteristics similar to those of the PLCC metric. However, it first computes the ranks of the two sets of data points, then estimates the correlation between the computed ranks, and expresses the strength of the correlation as a scalar value in the range from −1 to +1. An output value of +1 indicates a perfect positive relationship between the two rankings (R(xi) increases as R(yi) increases), whereas an output value of −1 indicates a perfect negative relationship (R(xi) increases as R(yi) decreases). An output value of 0 indicates no relationship between the two rankings. Mathematically, the SROCC metric can be expressed as follows:
$$\mathrm{SROCC} = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} \tag{2}$$
Here di² is the squared difference between the ranks of the i-th pair of observations, i.e., di = R(xi) − R(yi), where R(xi) and R(yi) are the ranks of data points xi and yi, respectively. The variable n indicates the total number of observations.
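The rank-based computation can be sketched as follows. This version applies the formula above as written and therefore assumes there are no tied values in either list; with ties, the simplified formula is no longer exact and an average-rank variant would be needed. The helper names `ranks` and `srocc` and the example scores are hypothetical.

```python
def ranks(values):
    """Return 1-based ranks of the values (smallest value gets rank 1).

    Assumes all values are distinct, as required by the simplified formula.
    """
    if len(set(values)) != len(values):
        raise ValueError("tied values are not supported by this sketch")
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def srocc(x, y):
    """Spearman rank order correlation via 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rank_x = ranks(x)
    rank_y = ranks(y)
    # d_i is the rank difference of the i-th pair of observations.
    sum_sq_diff = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1.0 - (6.0 * sum_sq_diff) / (n * (n ** 2 - 1))


# Hypothetical predicted and subjective quality scores.
predicted = [3.1, 1.2, 4.8, 2.5, 5.0]
subjective = [2.9, 1.0, 4.1, 3.5, 4.9]

rank_pred = ranks(predicted)
rank_subj = ranks(subjective)
score = srocc(predicted, subjective)
print(rank_pred, rank_subj, round(score, 6))
```

Here only two of the five items swap places in the ranking, so the summed squared rank difference is 2 and the coefficient stays close to +1.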
4.3 Root mean square error (RMSE)[77]
The Root Mean Square Error (RMSE), also known as the Root Mean Square Deviation (RMSD), is commonly used for performance evaluation of image quality assessment metrics/methods. It provides a pair-wise comparison by computing the difference between the model-predicted value and the actual ground-truth value. The smaller the RMSE value, the better the model's performance, and vice versa. It yields a single scalar value by taking the square root of the mean of the squared differences. Although this metric is considered useful for model performance evaluation, a few very large errors can substantially increase the RMSE value, making the model appear to perform worse. The mathematical formulation of RMSE is given below:
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}{n}} \tag{3}$$
Here in equation (3), ŷᵢ is the predicted value, yᵢ is the actual ground-truth value, and n is the total number of data points. Further, (ŷᵢ − yᵢ)² is the squared difference between the predicted value and the ground-truth value.
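A direct transcription of equation (3) into Python might look like the sketch below. The function name `rmse` and the example scores are illustrative assumptions chosen so that the result is easy to verify by hand.

```python
import math


def rmse(predicted, ground_truth):
    """Root mean square error between predicted and ground-truth scores."""
    n = len(predicted)
    if n != len(ground_truth) or n == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Sum of squared differences between prediction and ground truth.
    squared_sum = sum((p - g) ** 2 for p, g in zip(predicted, ground_truth))
    return math.sqrt(squared_sum / n)


# Hypothetical scores: every prediction is off by exactly 2 quality units.
predicted_scores = [3.0, 5.0, 2.0, 4.0]
true_scores = [1.0, 3.0, 4.0, 2.0]

error = rmse(predicted_scores, true_scores)
print(error)
```

Because each squared difference is 4, the mean is 4 and its square root is 2, so the RMSE equals the common error magnitude in this example.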
4.4 Mean absolute error (MAE)[77]
The Mean Absolute Error (MAE) is considered one of the reliable metrics for computing the pair-wise difference between two sets of data points and is widely used for evaluating the performance of image quality assessment metrics/methods. More precisely, it is the arithmetic mean of the absolute errors between the two sets of data points and provides a single numeric value that indicates the model's performance. From the image quality assessment perspective, the two sets of data points being compared are the feature points of the predicted and ground-truth images. The MAE metric computes the sum of the absolute differences between the feature points of the predicted and ground-truth images and then divides this sum by the number of samples. The mathematical representation of MAE is as follows:
$$\mathrm{MAE} = \frac{\sum_{i=1}^{n} \lvert y_i - x_i \rvert}{n} \tag{4}$$
Here yi is the predicted value, xi is the ground-truth value, and n is the total number of data samples.
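The sketch below implements MAE as defined above and, for comparison, recomputes RMSE on the same data to illustrate the sensitivity of RMSE to a few large errors noted in Section 4.3. The function names and the two hypothetical prediction sets are assumptions made for illustration only.

```python
import math


def mae(y, x):
    """Mean absolute error; y holds predictions and x holds ground-truth values."""
    n = len(y)
    if n != len(x) or n == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(yi - xi) for yi, xi in zip(y, x)) / n


def rmse(y, x):
    """Root mean square error on the same inputs, for comparison."""
    n = len(y)
    if n != len(x) or n == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((yi - xi) ** 2 for yi, xi in zip(y, x)) / n)


ground_truth = [2.0, 3.0, 4.0, 5.0]
# Hypothetical model A: every prediction is off by 1.
even_errors = [3.0, 4.0, 5.0, 6.0]
# Hypothetical model B: three exact predictions and one error of 4.
one_outlier = [2.0, 3.0, 4.0, 9.0]

mae_even = mae(even_errors, ground_truth)
mae_outlier = mae(one_outlier, ground_truth)
rmse_even = rmse(even_errors, ground_truth)
rmse_outlier = rmse(one_outlier, ground_truth)
print(mae_even, mae_outlier, rmse_even, rmse_outlier)
```

Both hypothetical models have the same total absolute error and therefore the same MAE, but the single large error doubles the RMSE of the second model, which is why the two metrics are often reported together.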
5 Panoramic/stitched contents quality assessment: challenges and recommendations
In this survey, we provide a detailed overview of the research efforts devoted by the research community to the SIQA domain to date. The reported studies developed both traditional approaches (handcrafted features and machine learning) and deep learning algorithms to obtain convincing results on the panoramic image quality assessment task. Despite the extensive efforts and contributions of the research community to the SIQA domain, a few challenges still make panoramic stitched image quality assessment difficult. Considering these challenges, we also recommend future research directions to move a step closer to efficient and robust SIQA systems.
5.1 Current challenges
In this section, we review the major challenges found in the literature on panoramic stitched image quality assessment that significantly limit the performance of SIQA methods. The key challenges in the SIQA domain are discussed below.
(1) Unavailability of benchmark panoramic image dataset
The SIQA literature provides several SIQA datasets[24,34,39,43,51,63,64]; however, the data consistency among these datasets is very poor, and they cannot be used to reproduce experiments. Researchers in the SIQA domain are working to provide a generic, large-scale dataset that meets the requirements of a benchmark dataset, including dataset size, data diversity, and class balance (number of samples per class) throughout the dataset. However, there is still no publicly available generic benchmark dataset that provides distorted panoramic data covering most of the distortion types that usually occur during the image stitching process. The scarcity of such a benchmark dataset significantly restricts the performance of deep learning algorithms and makes it difficult to reproduce panoramic image quality assessment experiments.
(2) Unavailability of efficient end-to-end deep SIQA models for real-time environment
In the current SIQA literature, there are several deep learning-based panoramic stitched image quality assessment methods. However, most of them use multiscale and computationally complex CNN architectures that significantly increase the computational cost and running time of the overall SIQA framework. Such complex deep learning architectures mostly fail to perform in real-time environments due to their high computational requirements. The best way to address this issue is to propose an end-to-end deep learning architecture with limited computational requirements. End-to-end architectures are mostly used in real-time environments, where they receive input at one end of the network and provide processed output at the other. Currently, there is no deep learning-based SIQA method that provides end-to-end learning capability by extracting low-level and high-level features, localizing the distortion-specific regions, and estimating the overall quality at once as a final output. In the absence of such an end-to-end CNN-assisted SIQA algorithm, it is very difficult to analyze the quality of panoramic contents in real time, which limits practical application in the immersive media industry.
(3) Scarcity of distortion localization-based SIQA methods
The existing deep learning-based SIQA approaches have brought considerable improvements to the panoramic image quality assessment field by introducing more robust and efficient algorithms. However, these CNN-assisted methods are still mostly unable to localize or segment stitching distortions in a given panoramic image. Although a few approaches[24,46,54] can localize or segment distortion-specific regions, they focus on different types of stitching distortions and cannot be used for localizing the same distorted regions. The unavailability of efficient localization-based SIQA approaches makes it impossible to segment, localize, or track stitching distortions in panoramic images or 360° videos.
5.2 Recommendation and future research directions
Bearing the key challenges in mind, we briefly discuss the important aspects of panoramic stitched image quality assessment that deserve further research towards advanced and efficient SIQA. The area of panoramic stitched image quality assessment can be further improved and matured by focusing on the following key future research directions.
(1) Availability of generic dataset for panoramic stitched image quality assessment
Despite the availability of SIQA datasets[24,34,39,43,51,63,64] for the panoramic stitched image quality assessment task, there is still a need for a generic large-scale 360° video dataset that covers almost all types of stitching distortion (including parallax, ghosting, motion sickness, cyber sickness, and color blending distortion). Such a dataset would provide a benchmark data repository with a variety of distorted panoramic data for the stitched image quality assessment task. The availability of a large-scale benchmark SIQA dataset would not only improve the performance of SIQA methods but also motivate researchers to develop more efficient deep CNN architectures that meet the requirements of the benchmark dataset. Thus, it is strongly recommended to establish a large-scale benchmark SIQA dataset and make it publicly available to the research community for future research in this domain.
(2) Active learning capabilities for SIQA
A deep learning-based SIQA algorithm usually fails when tested on data containing additional unseen information. For instance, a deep learning model trained on one type of distorted panoramic data may not be applicable to another type of distorted stitched image data due to spatial variation in the image data. Similarly, a trained SIQA deep learning algorithm cannot be expected to handle newly introduced stitching errors or data compression distortions. Such issues can be overcome with the help of an active learning or online learning approach that updates the already trained model with new panoramic image data. In recent years, researchers around the world have actively evaluated active learning in various computer vision fields, including image classification[78] and object detection[79]. In active learning, a pretrained deep learning model iteratively adjusts its weights on newly provided data and acquires the additional knowledge without much training effort. Therefore, active learning can bring significant improvements to the SIQA domain in terms of scalability as well as accuracy.
(3) Efficient deep SIQA for practical applications
Notwithstanding the existence of several No-Reference SIQA methods, there is no algorithm that can be used in practical applications (such as a standard tool for panoramic content quality assessment). The recently presented deep learning-based No-Reference SIQA algorithms have shown better performance than the traditional SIQA algorithms. However, there is still considerable room for improvement in deep learning-based SIQA algorithms in terms of robustness, precision, and real-time applicability. Most of the existing AI-driven SIQA algorithms are computationally complex, which makes them unsuitable for real-time applications in the SIQA domain. While designing a deep learning algorithm for the stitched image quality assessment task, several factors need to be considered, including model complexity (the number of parameters to be trained on the given data), real-time applicability, and the execution environment for model deployment. The better the tradeoff between model complexity and model performance, the better the results will be in the targeted execution environment. Considering that the existing AI-assisted SIQA algorithms do not adequately address these factors, there is a need for an efficient yet accurate SIQA algorithm that can be used in practical applications, providing real-time processing capabilities with reasonable accuracy.
6 Conclusion
Considering the recent developments in the domain of stitched image quality assessment and the deficiencies of the existing SIQA methods, we presented a brief yet informative survey of SIQA methods. In this survey, we exclusively focused on objective SIQA methods and covered the literature on objective SIQA from 2005 to 2021 by discussing their workflows for stitched image quality assessment together with their strengths and weaknesses. To draw the attention of researchers to stitched image quality assessment, we first provided a detailed discussion of objective SIQA methods by dividing the existing methods into Full-Reference SIQA and No-Reference SIQA methods. Next, we reviewed the publicly available benchmark SIQA datasets used for the panoramic image quality assessment task and their technical details (number of images, type of images, type of distortions, etc.). Following this, we listed the commonly used evaluation metrics for panoramic image quality assessment and provided the mathematical formulation of each metric. Further, we highlighted the key challenges, such as the unavailability of generic public SIQA datasets (datasets that consider almost all types of stitching distortion, including parallax, blending seams, ghosting artifacts, blurriness, and motion distortion) and of fully end-to-end deep learning-based No-Reference SIQA methods that can completely automate the SIQA task. Finally, we recommended research directions for future work in the panoramic image quality assessment area, which include the development of fully end-to-end trainable deep learning-based No-Reference SIQA methods and the exploration of recently introduced attention-oriented CNN architectures[80,81] and Vision Transformer models[82,83] for stitching distortion localization. Such exploration and advancement in this domain can significantly improve the maturity level and applicability of SIQA methods for high-level vision tasks in VR and AR technology.
As a final note, researchers in the SIQA domain are actively working to develop more efficient and more precise algorithms for panoramic content quality assessment. However, they continue to struggle to reach the desired performance, despite examining both traditional approaches and CNN architectures under different settings. We firmly believe that the research directions recommended in this survey will help researchers working in the SIQA domain, and that the detailed overview of SIQA methods will help newcomers to the field.



Li Y, Huang J, Tian F, Wang H A, Dai G Z. Gesture interaction in virtual reality. Virtual Reality & Intelligent Hardware, 2019, 1(1): 84–112 DOI:10.3724/sp.j.2096-5796.2018.0006


Zheng L Y, Liu X, An Z W, Li S F, Zhang R J. A smart assistance system for cable assembly by combining wearable augmented reality with portable visual inspection. Virtual Reality & Intelligent Hardware, 2020, 2(1): 12–27 DOI:10.1016/j.vrih.2019.12.002


Zhang H X, Zhang J, Yin X, Zhou K, Pan Z G. Cloud-to-end rendering and storage management for virtual reality in experimental education. Virtual Reality & Intelligent Hardware, 2020, 2(4): 368–380 DOI:10.1016/j.vrih.2020.07.001


Qian J C, Ma Y C, Pan Z G, Yang X B. Effects of virtual-real fusion on immersion, presence, and learning performance in laboratory education. Virtual Reality & Intelligent Hardware, 2020, 2(6): 569–584 DOI:10.1016/j.vrih.2020.07.010


Tai Y H, Shi J S, Pan J J, Hao A M, Chang V. Augmented reality-based visual-haptic modeling for thoracoscopic surgery training systems. Virtual Reality & Intelligent Hardware, 2021, 3(4): 274–286 DOI:10.1016/j.vrih.2021.08.002


Wang H Y, Wu J H. A virtual reality based surgical skills training simulator for catheter ablation with real-time and robust interaction. Virtual Reality & Intelligent Hardware, 2021, 3(4): 302–314 DOI:10.1016/j.vrih.2021.08.004


Farley O R L, Spencer K, Baudinet L. Virtual reality in sports coaching, skill acquisition and application to surfing: a review. Journal of Human Sport and Exercise, 2019, 15(3): 535–548 DOI:10.14198/jhse.2020.153.06


Soltani P, Morice A H P. Augmented reality tools for sports education and training. Computers & Education, 2020, 155: 103923 DOI:10.1016/j.compedu.2020.103923


Syamimi A, Gong Y W, Liew R. VR industrial applications―A Singapore perspective. Virtual Reality & Intelligent Hardware, 2020, 2(5): 409–420 DOI:10.1016/j.vrih.2020.06.001


Zhu W M, Fan X M, Zhang Y X. Applications and research trends of digital human models in the manufacturing industry. Virtual Reality & Intelligent Hardware, 2019, 1(6): 558–579 DOI:10.1016/j.vrih.2019.09.005


Lyu W, Zhou Z, Chen L, Zhou Y. A survey on image and video stitching. Virtual Reality & Intelligent Hardware, 2019, 1(1): 55–83 DOI:10.3724/sp.j.2096-5796.2018.0008


Lee K Y, Sim J Y. Warping residual based image stitching for large parallax. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, WA, USA, IEEE, 2020, 8195–8203 DOI:10.1109/cvpr42600.2020.00822


Brunet D, Vrscay E R, Wang Z. On the mathematical properties of the structural similarity index. IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society, 2012, 21(4): 1488–1499 DOI:10.1109/tip.2011.2173206


Kong Y Q, Cui L, Hou R. Full-reference IPTV image quality assessment by deeply learning structural cues. Signal Processing: Image Communication, 2020, 83: 115779 DOI:10.1016/j.image.2020.115779


Wang Z, Simoncelli E P, Bovik A C. Multiscale structural similarity for image quality assessment. In: The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003. Pacific Grove, CA, USA, IEEE, 2003, 1398–1402 DOI:10.1109/acssc.2003.1292216


Xue W F, Zhang L, Mou X Q, Bovik A C. Gradient magnitude similarity deviation: a highly efficient perceptual image quality index. IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society, 2014, 23(2): 684–695 DOI:10.1109/tip.2013.2293423


Huang X, Wen D W, Xie J F, Zhang L P. Quality assessment of panchromatic and multispectral image fusion for the ZY-3 satellite: from an information extraction perspective. IEEE Geoscience and Remote Sensing Letters, 2014, 11(4): 753–757 DOI:10.1109/lgrs.2013.2278551


Liu H, Zhang Y, Zhang H, Fan C, Kwong S, Kuo C J, Fan X. Deep learning based picture-wise just noticeable distortion prediction model for image compression. IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society, 2019 DOI:10.1109/tip.2019.2933743


Moorthy A K, Bovik A C. Blind image quality assessment: from natural scene statistics to perceptual quality. IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society, 2011, 20(12): 3350–3364 DOI:10.1109/tip.2011.2147325


Liu L X, Liu B, Huang H, Bovik A C. No-reference image quality assessment based on spatial and spectral entropies. Signal Processing: Image Communication, 2014, 29(8): 856–863 DOI:10.1016/j.image.2014.06.006


Mittal A, Soundararajan R, Bovik A C. Making a “completely blind” image quality analyzer. IEEE Signal Processing Letters, 2013, 20(3): 209–212 DOI:10.1109/lsp.2012.2227726


Moorthy A K, Bovik A C. A two-step framework for constructing blind image quality indices. IEEE Signal Processing Letters, 2010, 17(5): 513–516 DOI:10.1109/lsp.2010.2043888


Saad M A, Bovik A C, Charrier C. Blind image quality assessment: a natural scene statistics approach in the DCT domain. IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society, 2012, 21(8): 3339–3352 DOI:10.1109/tip.2012.2191563


Yang L Y, Cheung G, Tan Z G, Huang Z. A content-aware metric for stitched panoramic image quality assessment. In: 2017 IEEE International Conference on Computer Vision Workshops. Venice, Italy, IEEE, 2017, 2487–2494 DOI:10.1109/iccvw.2017.293


Zhou X S, Zhang H Y, Wang Y J. A multi-image stitching method and quality evaluation. In: 2017 4th International Conference on Information Science and Control Engineering (ICISCE). Changsha, China, IEEE, 2017, 46–50 DOI:10.1109/icisce.2017.20


Xu M, Li C, Liu Y F, Deng X, Lu J X. A subjective visual quality assessment method of panoramic videos. In: 2017 IEEE International Conference on Multimedia and Expo. Hong Kong, China, IEEE, 2017, 517–522


Zhang B, Zhao J Z, Yang S, Zhang Y, Wang J, Fei Z S. Subjective and objective quality assessment of panoramic videos in virtual reality environments. In: 2017 IEEE International Conference on Multimedia & Expo Workshops. Hong Kong, China, IEEE, 2017, 163–168 DOI:10.1109/icmew.2017.8026226


Yang J C, Liu T L, Jiang B, Song H B, Lu W. 3D panoramic virtual reality video quality assessment based on 3D convolutional neural networks. IEEE Access, 2018, 6: 38669–38682 DOI:10.1109/access.2018.2854922


de A Azevedo R G, Birkbeck N, Janatra I, Adsumilli B, Frossard P. A viewport-driven multi-metric fusion approach for 360-degree video quality assessment. In: 2020 IEEE International Conference on Multimedia and Expo. London, UK, IEEE, 2020, 1–6 DOI:10.1109/icme46284.2020.9102936


Guo P, Shen Q, Ma Z, Brady D J, Wang Y. Perceptual Quality Assessment of Immersive Images Considering Peripheral Vision Impact. 2018


Chen S J, Zhang Y X, Li Y M, Chen Z Z, Wang Z. Spherical structural similarity index for objective omnidirectional video quality assessment. In: 2018 IEEE International Conference on Multimedia and Expo. San Diego, CA, USA, IEEE, 2018, 1–6 DOI:10.1109/icme.2018.8486584


Zhang Y X, Wang Y B, Liu F Y, Liu Z Z, Li Y M, Yang D Q, Chen Z Z. Subjective panoramic video quality assessment database for coding applications. IEEE Transactions on Broadcasting, 2018, 64(2): 461–473 DOI:10.1109/tbc.2018.2811627


Lim H T, Kim H G, Ro Y M. VR IQA NET: deep virtual reality image quality assessment using adversarial learning. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing. Calgary, AB, Canada, IEEE, 2018, 6737–6741 DOI:10.1109/icassp.2018.8461317


Li J, Yu K W, Zhao Y F, Zhang Y, Xu L. Cross-reference stitching quality assessment for 360° omnidirectional images. MM '19: Proceedings of the 27th ACM International Conference on Multimedia. 2019, 2360–2368 DOI:10.1145/3343031.3350973


Yu K W, Li J, Zhang Y, Zhao Y F, Xu L. Image quality assessment for omnidirectional cross-reference stitching. 2019


Li C, Xu M, Jiang L, Zhang S Y, Tao X M. Viewport proposal CNN for 360° video quality assessment. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, CA, USA, IEEE, 2019, 10169–10178 DOI:10.1109/cvpr.2019.01042


Wu P, Ding W X, You Z X, An P. Virtual reality video quality assessment based on 3D convolutional neural networks. In: 2019 IEEE International Conference on Image Processing. Taipei, Taiwan, China, IEEE, 2019, 3187–3191 DOI:10.1109/icip.2019.8803023


Kim H G, Lim H T, Ro Y M. Deep virtual reality image quality assessment with human perception guider for omnidirectional image. IEEE Transactions on Circuits and Systems for Video Technology, 2020, 30(4): 917–928 DOI:10.1109/tcsvt.2019.2898732


Yan W Q, Yue G H, Fang Y M, Chen H, Tang C, Jiang G Y. Perceptual objective quality assessment of stereoscopic stitched images. Signal Processing, 2020, 172: 107541 DOI:10.1016/j.sigpro.2020.107541


Zheng X L, Jiang G Y, Yu M, Jiang H. Segmented spherical projection-based blind omnidirectional image quality assessment. IEEE Access, 2020, 8: 31647–31659 DOI:10.1109/access.2020.2972158


Chen Z B, Xu J H, Lin C Y, Zhou W. Stereoscopic omnidirectional image quality assessment based on predictive coding theory. IEEE Journal of Selected Topics in Signal Processing, 2020, 14(1): 103–117 DOI:10.1109/jstsp.2020.2968182


Yang J C, Liu T L, Jiang B, Lu W, Meng Q G. Panoramic video quality assessment based on non-local spherical CNN. IEEE Transactions on Multimedia, 2021, 23: 797–809 DOI:10.1109/tmm.2020.2990075


Wang X J, Chai X L, Shao F. Quality assessment for color correction-based stitched images via bi-directional matching. Journal of Visual Communication and Image Representation, 2021, 75: 103051 DOI:10.1016/j.jvcir.2021.103051


Leorin S, Lucchese L, Cutler R G. Quality assessment of panorama video for videoconferencing applications. In: 2005 IEEE 7th Workshop on Multimedia Signal Processing. Shanghai, China, IEEE, 2005, 1–4


Xu W, Mulligan J. Performance evaluation of color correction approaches for automatic multi-view image and video stitching. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, CA, USA, IEEE, 2010, 263–270 DOI:10.1109/cvpr.2010.5540202


Yang L Y, Liu J, Gao C Q. An error-activation-guided blind metric for stitched panoramic image quality assessment. Computer Vision, 2017, 256–268 DOI:10.1007/978-981-10-7302-1_22


Ling S Y, Cheung G, le Callet P. No-reference quality assessment for stitched panoramic images using convolutional sparse coding and compound feature selection. In: 2018 IEEE International Conference on Multimedia and Expo. San Diego, CA, USA, IEEE, 2018, 1–6 DOI:10.1109/icme.2018.8486545


Gandhe S T, Omkar S. Blind image quality evaluation of stitched image using novel hybrid warping technique. International Journal of Advanced Computer Science and Applications, 2019, 10(6): 384–389 DOI:10.14569/ijacsa.2019.0100649


Xia Y M, Wang Y F, Peng Y. Blind panoramic image quality assessment via the asymmetric mechanism of human brain. In: 2019 IEEE Visual Communications and Image Processing. Sydney, NSW, Australia, IEEE, 2019, 1–4 DOI:10.1109/vcip47243.2019.8965887


Yu S J, Li T S, Xu X Y, Tao H, Yu L, Wang Y X. NRQQA: a no-reference quantitative quality assessment method for stitched images. MMAsia '19: Proceedings of the ACM Multimedia Asia. 2019, 1–6 DOI:10.1145/3338533.3366563


Madhusudana P C, Soundararajan R. Subjective and objective quality assessment of stitched images for virtual reality. IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society, 2019, 28(11): 5620–5635 DOI:10.1109/tip.2019.2921858


Li J, Zhao Y F, Ye W H, Yu K W, Ge S M. Attentive deep stitching and quality assessment for 360° omnidirectional images. IEEE Journal of Selected Topics in Signal Processing, 2020, 14(1): 209–221 DOI:10.1109/jstsp.2019.2953950


Hou J W, Lin W S, Zhao B Q. Content-dependency reduction with multi-task learning in blind stitched panoramic image quality assessment. In: 2020 IEEE International Conference on Image Processing. Abu Dhabi, United Arab Emirates, IEEE, 2020, 3463–3467 DOI:10.1109/icip40778.2020.9191241


Ullah H, Irfan M, Han K, Lee J W. DLNR-SIQA: deep learning-based no-reference stitched image quality assessment. Sensors, 2020, 20(22): 6457 DOI:10.3390/s20226457


Sun W, Min X K, Zhai G T, Gu K, Duan H Y, Ma S W. MC360IQA: a multi-channel CNN for blind 360-degree image quality assessment. IEEE Journal of Selected Topics in Signal Processing, 2020, 14(1): 64–77 DOI:10.1109/jstsp.2019.2955024


Xu J H, Zhou W, Chen Z B. Blind omnidirectional image quality assessment with viewport oriented graph convolutional networks. IEEE Transactions on Circuits and Systems for Video Technology, 2021, 31(5): 1724–1737 DOI:10.1109/tcsvt.2020.3015186


Poreddy A K R, Kara P A, Appina B, Simon A. A no-reference 3D virtual reality image quality assessment algorithm based on saliency statistics. In: Optics and Photonics for Information Processing XV. San Diego, USA, SPIE, 2021 DOI:10.1117/12.2597327


Ding W X, An P, Liu X, Yang C, Huang X P. No-reference panoramic image quality assessment based on adjacent pixels correlation. In: 2021 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting. Chengdu, China, IEEE, 2021, 1–5 DOI:10.1109/bmsb53066.2021.9547132


Zhou W, Xu J H, Jiang Q P, Chen Z B. No-reference quality assessment for 360-degree images by analysis of multifrequency information and local-global naturalness. IEEE Transactions on Circuits and Systems for Video Technology, 2021, PP(99): 1 DOI:10.1109/tcsvt.2021.3081182


Zhang Y X, Liu Z Z, Chen Z Z, Xu X Z, Liu S. No-reference quality assessment of panoramic video based on spherical-domain features. In: 2021 Picture Coding Symposium (PCS). Bristol, United Kingdom, IEEE, 2021, 1–5 DOI:10.1109/pcs50896.2021.9477498


Li C, Xu M, Du X Z, Wang Z L. Bridge the gap between VQA and human behavior on omnidirectional video: a large-scale dataset and a deep learning model. MM '18: Proceedings of the 26th ACM International Conference on Multimedia. 2018, 932–940 DOI:10.1145/3240508.3240581


Xiao J X, Ehinger K A, Oliva A, Torralba A. Recognizing scene viewpoint using panoramic place representation. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, RI, USA, IEEE, 2012, 2695–2702 DOI:10.1109/cvpr.2012.6247991


Duan H Y, Zhai G T, Min X K, Zhu Y C, Fang Y, Yang X K. Perceptual quality assessment of omnidirectional images. In: 2018 IEEE International Symposium on Circuits and Systems. Florence, Italy, IEEE, 2018, 1–5 DOI:10.1109/iscas.2018.8351786


Sun W, Gu K, Ma S W, Zhu W H, Liu N, Zhai G T. A large-scale compressed 360-degree spherical image database: from subjective quality evaluation to objective model comparison. In: 2018 IEEE 20th International Workshop on Multimedia Signal Processing. Vancouver, BC, Canada, IEEE, 2018, 1–6 DOI:10.1109/mmsp.2018.8547102


Chen M X, Jin Y Z, Goodall T, Yu X X, Bovik A C. Study of 3D virtual reality picture quality. IEEE Journal of Selected Topics in Signal Processing, 2020, 14(1): 89–102 DOI:10.1109/jstsp.2019.2956408


Sendjasni A, Larabi M C, Cheikh F A. Perceptually-weighted CNN for 360-degree image quality assessment using visual scan-path and JND. In: 2021 IEEE International Conference on Image Processing. Anchorage, AK, USA, IEEE, 2021, 1439–1443 DOI:10.1109/icip42928.2021.9506044


Tian C Z, Chai X L, Shao F. Stitched image quality assessment based on local measurement errors and global statistical properties. Journal of Visual Communication and Image Representation, 2021, 81: 103324 DOI:10.1016/j.jvcir.2021.103324


Zhou Y, Sun Y J, Li L D, Gu K, Fang Y M. Omnidirectional image quality assessment by distortion discrimination assisted multi-stream network. IEEE Transactions on Circuits and Systems for Video Technology, 2021, PP(99): 1 DOI:10.1109/tcsvt.2021.3081162


Zaragoza J, Chin T J, Brown M S, Suter D. As-projective-as-possible image stitching with moving DLT. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland, OR, USA, IEEE, 2013, 2339–2346 DOI:10.1109/cvpr.2013.303


Yan W Q, Hou C P, Lei J J, Fang Y M, Gu Z Y, Ling N. Stereoscopic image stitching based on a hybrid warping model. IEEE Transactions on Circuits and Systems for Video Technology, 2017, 27(9): 1934–1946 DOI:10.1109/tcsvt.2016.2564838


Chang C H, Sato Y, Chuang Y Y. Shape-preserving half-projective warps for image stitching. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, OH, USA, IEEE, 2014, 3254–3261 DOI:10.1109/cvpr.2014.422


Wallace G K. The JPEG still picture compression standard. IEEE Transactions on Consumer Electronics, 1992, 38(1): xviii–xxxiv DOI:10.1109/30.125072


Wiegand T, Sullivan G J, Bjontegaard G, Luthra A. Overview of the H.264/AVC video coding standard. IEEE Transactions on Circuits and Systems for Video Technology, 2003, 13(7): 560–576 DOI:10.1109/tcsvt.2003.815165


Sullivan G J, Ohm J R, Han W J, Wiegand T. Overview of the high efficiency video coding (HEVC) standard. IEEE Transactions on Circuits and Systems for Video Technology, 2012, 22(12): 1649–1668 DOI:10.1109/tcsvt.2012.2221191


Benesty J, Chen J, Huang Y, Cohen I. Noise Reduction in Speech Processing. 2009


Sheskin D. Spearman's rank-order correlation coefficient. Handbook of Parametric and Nonparametric Statistical Procedures. 2007, 1353–1370


Brassington G. Mean absolute error and root mean square error: which is the better metric for assessing model performance? EGU General Assembly Conference Abstracts, 2017, 3574


Huang Y, Liu Z W, Jiang M H, Yu X, Ding X H. Cost-effective vehicle type recognition in surveillance images with deep active learning and web data. IEEE Transactions on Intelligent Transportation Systems, 2020, 21(1): 79–86 DOI:10.1109/tits.2018.2888698


Feng C, Liu M Y, Kao C C, Lee T Y. Deep active learning for civil infrastructure defect detection and classification. In: ASCE International Workshop on Computing in Civil Engineering 2017. Seattle, Washington, Reston, VA, USA: American Society of Civil Engineers, 2017, 298–306 DOI:10.1061/9780784480823.036


Ji Y Z, Zhang H J, Jonathan Wu Q M. Salient object detection via multi-scale attention CNN. Neurocomputing, 2018, 322: 130–140 DOI:10.1016/j.neucom.2018.09.061


Xu Q, Xiao Y, Wang D Y, Luo B. CSA-MSO3DCNN: multiscale octave 3D CNN with channel and spatial attention for hyperspectral image classification. Remote Sensing, 2020, 12(1): 188 DOI:10.3390/rs12010188


Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Houlsby N. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. 2020


Zhou D, Kang B, Jin X, Yang L, Lian X, Hou Q, Feng J. DeepViT: Towards Deeper Vision Transformer. 2021