
2021,  3 (1):   33 - 42   Published Date:2021-2-20

DOI: 10.1016/j.vrih.2020.10.002


In anticipation of its great potential application to natural human-computer interaction and health monitoring, heart-rate (HR) estimation based on remote photoplethysmography has recently attracted increasing research attention. Whereas the recent deep-learning-based HR estimation methods have achieved promising performance, their computational costs remain high, particularly in mobile-computing scenarios.
We propose a neural architecture search (NAS) approach for HR estimation that automatically searches for a lightweight network, which can achieve even higher accuracy than a complex network while reducing the computational cost. First, we define the regions of interest (ROIs) based on face landmarks and then extract the raw temporal pulse signals from the R, G, and B channels in each ROI. Then, pulse-related signals are extracted using the plane-orthogonal-to-skin (POS) algorithm and combined with the R and G channel signals to create a spatial-temporal map. Finally, a differentiable architecture search approach is used for the network-structure search.
Compared with the state-of-the-art methods on the public-domain VIPL-HR and PURE databases, our method achieves better HR estimation performance in terms of several evaluation metrics while requiring a much lower computational cost.


1 Introduction
In natural human-computer interaction, the machine requires a deep understanding of human behavior and intention to respond promptly and accurately. The heart-rate (HR) value can reflect the psychological and physical status of a person and is thus one of the important cues for achieving a deep understanding of behavior and intention. However, traditional HR measurement methods such as electrocardiography and photoplethysmography (PPG) require contact with the participant's skin and are inconvenient for long-term monitoring. Recently, non-contact HR measurement based on remote PPG (rPPG) has attracted considerable research interest[1-7]. The principle behind rPPG-based HR estimation is that the change in the blood volume caused by the heartbeat introduces variation in the light absorption of blood vessels, and thus, the periodic changes in the skin brightness can reflect the HR. In addition, the face is the least covered skin area on the human body. Therefore, the HR can be estimated based on the color change in the facial skin.
Early approaches in HR estimation usually leveraged hand-crafted features to represent the HR signal. Furthermore, these HR estimation methods used blind source separation (BSS) or optimal space transformation[8-12]. Wang et al. analyzed the core principles of these methods and proposed the plane-orthogonal-to-skin (POS) algorithm, which considers the pertinent optical and physiological properties of skin reflections[13]. POS demonstrates good robustness under controlled conditions. However, many challenges are encountered in practice, including varying lighting conditions, expression changes, and head movement. The traditional hand-crafted algorithms cannot consider all the conditions that affect the HR estimation and can therefore only be applied under specific conditions.
Following its considerable success in computer vision, deep learning (DL) has been used to improve robustness and accuracy in HR estimation. With the increasing number of publicly available datasets, researchers have shifted their attention from data augmentation to efficient network design for HR estimation[14,15]. However, most DL models are manually designed based on the researcher's experience, which may not be optimum for the HR estimation task and may require extra time and computing resources. Thus, learning a better network structure for automatic HR estimation is necessary.
In the present paper, we propose an approach called neural architecture search-HR (NAS-HR) to perform efficient HR estimation. First, pulse signals are extracted from facial regions of interest (ROIs). Then, these signals are preprocessed using POS to further enhance the periodic pulse signals. We then combine the processed signals into a spatial-temporal map (STMap) for deep feature learning. Differentiable architecture search (DARTS) is used to automatically search an efficient network structure during the deep feature learning[16]. We search the network structure within a lightweight search space to reduce the complexity of the model. In summary, the contributions of this work are described as follows.
(1) We propose an efficient approach for HR estimation, which leverages NAS to obtain an optimized lightweight model with better HR estimation accuracy and reduced computational cost.
(2) We propose a new representation for pulse information, i.e., an STMap built from POS signals, which leads to more efficient learning using the backbone convolutional neural network (CNN).
(3) The proposed NAS-HR outperforms a number of state-of-the-art methods in terms of both HR measurement accuracy and computational cost.
2 Related work
This section briefly reviews the existing methods on rPPG-based HR estimation and NAS.
2.1 RPPG-based HR estimation
Many signal-analysis-based algorithms have been proposed to estimate HR from a face video. Poh et al. proposed an HR estimation algorithm based on independent component analysis to extract periodic pulse signals[8]. Balakrishnan et al. employed principal component analysis to separate the pulse signal from raw temporal signals for HR estimation[9]. De Haan and Jeanne discussed the limitations of BSS and constructed two orthogonal chrominance signals to suppress the noise in the time series[10]. De Haan and van Leest studied the signature of rPPG signals at different wavelengths and then proposed the PBV method based on this signature to extract the pulse-induced signal[11]. Wang et al. proposed spatial subspace rotation (2SR) to perform projection according to the hue correlation of three color channels to improve the robustness of HR estimation[12]. Wang et al. mathematically analyzed the principles of the early approaches and proposed a new projection method called POS to extract the pulse signal[13]. Most of these methods used signal decomposition and reconstruction or space transformation to obtain a better pulse-signal representation; they demonstrated good performance except in unconstrained scenarios.
In recent years, some studies have tried to use DL for HR estimation. Chen et al. proposed DeepPhys to extract physiological signals using two consecutive video frames and then obtained the HR from the spectrum of the pulse signal[14]. A two-stream method using a low-rank constraint loss was proposed to obtain reliable features for HR estimation[15]. Yu et al. discussed the effect of image compression on HR estimation and attempted to estimate the HR from highly compressed images[17]. Using the video directly as the input of a DL method introduces considerable redundancy. Therefore, many researchers have tried to design more effective and concise input representations for HR estimation. A time-frequency representation was proposed as an input to a CNN to improve the accuracy of HR estimation[18]. In addition, STMap, which is an efficient representation of the physiological signals extracted from videos, was designed as the input of a network[19,20] and achieved state-of-the-art results.
2.2 NAS
Without requiring manual design and expert experience, NAS can search for a suitable architecture for a particular task. Baker et al. proposed MetaQNN to model the network architecture search as a Markov decision process and used reinforcement learning (specifically, the Q-learning algorithm) to determine the CNN architecture[21]. Zoph et al. used a recurrent neural network as a controller to sample and generate the string that described the network structure and then used the REINFORCE algorithm to learn the controller parameters to generate a network structure with higher accuracy[22]. Real et al. introduced an evolutionary algorithm to solve the NAS problem and proposed a variant of tournament selection and aging evolution[23]. NAS methods based on reinforcement learning and evolutionary algorithms consume substantial computational resources. DARTS-based algorithms make the architecture search differentiable, which can dramatically improve the search speed and model accuracy[16,24,25]. These neural network search methods are mostly used for image classification or object detection tasks. However, their effectiveness in HR estimation is not known.
3 Proposed method
NAS-HR aims to learn an efficient and lightweight network structure for HR estimation from a face video. The framework of NAS-HR is shown in Figure 1, which includes the ROI extraction, POS-STMap computation, and NAS.
3.1 ROI detection from the face
Environment changes due to light and head movement can reduce the HR signal extraction accuracy. We design an ROI extraction method to reduce these noise effects such that the pulse signal can be better extracted with high PSNR. In the first step, we employ SeetaFace for face landmark detection. We then take the areas in the red boxes in Figure 2 as our ROIs, which are determined based on the face landmarks. They cover the cheek and forehead areas, which are usually less affected by facial hair or background.
3.2 POS-STMap for pulse representation
After the ROI extraction, we can extract $N \times C$ pulse temporal signals ($S \in \mathbb{R}^{N \times C \times L}$) with length $L$ by averaging the pixel values inside each ROI, where $N$ and $C$ represent the number of ROIs and the number of color channels of the images, respectively. However, these raw signals contain not only the HR information but also several types of noise, such as head movements and light changes. According to early studies[10,12,13], projection transformation can suppress the noise for better pulse-signal extraction.
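The averaging step can be sketched as follows. This is a minimal illustration using only the standard library; the rectangular `Box` format, the nested-list frame layout, and the function names are assumptions made for the example (the paper also uses triangular ROIs, which are not covered here).

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]   # (R, G, B)
Frame = List[List[Pixel]]            # frame[y][x]
Box = Tuple[int, int, int, int]      # hypothetical ROI format: (x0, y0, x1, y1), end-exclusive


def roi_mean(frame: Frame, box: Box) -> List[float]:
    """Average each colour channel over the pixels inside one rectangular ROI."""
    x0, y0, x1, y1 = box
    pixels = [frame[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return [sum(p[c] for p in pixels) / len(pixels) for c in range(3)]


def extract_signals(frames: Sequence[Frame], boxes: Sequence[Box]) -> List[List[List[float]]]:
    """Return S with shape (N, C, L): N ROIs, C = 3 colour channels, L frames."""
    signals = [[[0.0] * len(frames) for _ in range(3)] for _ in boxes]
    for t, frame in enumerate(frames):
        for n, box in enumerate(boxes):
            for c, value in enumerate(roi_mean(frame, box)):
                signals[n][c][t] = value
    return signals
```

In a real pipeline the boxes would follow the detected landmarks from frame to frame; here they are fixed for brevity.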

Algorithm 1: POS

Input: Physiological signals $S \in \mathbb{R}^{N \times C \times L}$
1: $l = 48$ ← Set the size of the sliding window to 48
2: for $n = 1, 2, \ldots, N$ do ← Deal with N ROIs
3:   for $i = 1, 2, \ldots, L$ do ← Move the sliding window with a stride of one
4:     if $j = i - l + 1 > 0$ then ← Use only full sliding windows (no padding)
5:       $V^{(n)}_{i} = S^{(n)}_{j \to i} / \mu(S^{(n)}_{j \to i})$ ← Normalize the signals
6:       $Q^{(n)} = \begin{pmatrix} 0 & 1 & -1 \\ -2 & 1 & 1 \end{pmatrix} V^{(n)}_{i}$ ← Perform projection transformation
7:       $p^{(n)} = Q^{(n)}_{1} + \frac{\sigma(Q^{(n)}_{1})}{\sigma(Q^{(n)}_{2})} Q^{(n)}_{2}$ ← Enhance the pulse signals
8:       $P^{(n)}_{j \to i} = P^{(n)}_{j \to i} + (p^{(n)} - \mu(p^{(n)}))$ ← Cumulative sum
Output: Processed pulse signal $P \in \mathbb{R}^{N \times L}$

Considering robustness and simplicity, we use POS[13] as the raw temporal signal-preprocessing method. We denote $P$, where $P \in \mathbb{R}^{N \times L}$, as the signal processed by POS. Then, this procedure can be described as Algorithm 1.
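A minimal sketch of Algorithm 1 in plain Python may help make the sliding-window steps concrete. It assumes each ROI provides three equal-length traces in (R, G, B) order with strictly positive values; the function names and the use of the population standard deviation are choices made for this illustration, not details taken from the paper.

```python
from statistics import fmean, pstdev
from typing import List

# Rows of the POS projection matrix applied to the normalised (R, G, B) traces.
PROJECTION = ((0.0, 1.0, -1.0), (-2.0, 1.0, 1.0))


def pos_one_roi(rgb: List[List[float]], window: int = 48) -> List[float]:
    """Apply the POS steps to one ROI; rgb holds the R, G and B traces."""
    length = len(rgb[0])
    pulse = [0.0] * length
    for i in range(length):
        j = i - window + 1            # zero-based start of the window ending at i
        if j < 0:
            continue                  # window not yet full, nothing is added
        # Temporal normalisation: divide each channel by its mean over the window.
        segment = [channel[j:i + 1] for channel in rgb]
        normalised = [[v / fmean(ch) for v in ch] for ch in segment]
        # Projection onto the two axes of the plane orthogonal to the skin tone.
        q = [[sum(w * normalised[c][t] for c, w in enumerate(row))
              for t in range(window)] for row in PROJECTION]
        # Combine both projections, weighting the second by the ratio of deviations.
        sigma2 = pstdev(q[1])
        alpha = pstdev(q[0]) / sigma2 if sigma2 > 0 else 0.0
        p = [a + alpha * b for a, b in zip(q[0], q[1])]
        mean_p = fmean(p)
        # Overlap-add the zero-mean window into the output trace.
        for t in range(window):
            pulse[j + t] += p[t] - mean_p
    return pulse


def pos(signals: List[List[List[float]]], window: int = 48) -> List[List[float]]:
    """Map S of shape (N, 3, L) to P of shape (N, L)."""
    return [pos_one_roi(rgb, window) for rgb in signals]
```

Because every sample is covered by several overlapping windows, the output amplitude is larger in the middle of the trace than at its ends; only the periodicity matters for HR estimation.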
The more robust signals, i.e., $P$, are extracted using POS. However, this process also loses some other information that is conducive to the HR estimation. Among the red, green, and blue channels, the HR information component of the green channel is the highest, whereas the noise information component of the red channel is the highest[10,26,27]. The accuracy of the DL model can be improved by using this additional information. Therefore, the more robust pulse signals, namely, $P$, are arranged into an $N \times L$ map. The same process is implemented for the green and red channel signals obtained from the ROIs. Then, the three maps are combined to form the POS-STMap. The POS-STMap used in the present study is different from the STMap[19,20]. We use triangular and rectangular regions that only contain the facial skin as our ROIs instead of using a uniformly divided face grid. This process can lead to better local consistency in each ROI. We use the improved POS algorithm in our preprocessing step before generating the STMap, which is used as the input of the succeeding NAS-HR network.
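As an illustration of how the three maps might be combined, the sketch below stacks the POS map with the green and red maps into a channel-first array of shape (3, N, L). The channel order (POS, G, R), the (R, G, B) ordering of the raw traces, and the function name are assumptions made for this example; the paper does not specify them.

```python
from typing import List

Trace = List[float]


def build_pos_stmap(pos_signals: List[Trace],
                    raw_signals: List[List[Trace]]) -> List[List[Trace]]:
    """Stack three N x L maps into a (3, N, L) POS-STMap.

    pos_signals: P with shape (N, L), the POS output.
    raw_signals: S with shape (N, 3, L), per-ROI traces in (R, G, B) order.
    The output channel order (POS, G, R) is an assumption of this sketch.
    """
    if len(pos_signals) != len(raw_signals):
        raise ValueError("P and S must cover the same ROIs")
    pos_map = [list(row) for row in pos_signals]
    green_map = [list(roi[1]) for roi in raw_signals]
    red_map = [list(roi[0]) for roi in raw_signals]
    return [pos_map, green_map, red_map]
```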
3.3 NAS
Network structure design. We use DARTS[16] to automatically search for a lightweight and optimum network architecture for HR estimation. Figure 1c shows that our network includes two layers of convolution block, $M$ cells whose structure needs to be optimized, and an AvgPool2d and linear block. Further, because rPPG-based remote HR estimation often requires a real-time response, in addition to optimizing the network architecture for better HR estimation accuracy, we expect that NAS can help develop a lightweight model. Therefore, we choose a new operation search space ($\mathcal{O}$), which uses depthwise separable convolution instead of the traditional convolution with different convolution kernel sizes, strides, and padding. Depthwise separable convolution first performs spatial feature learning within each channel and then performs feature learning across channels. This process requires fewer parameters and fewer computations. Therefore, we can obtain a smaller and faster model. At the same time, the reduction in the parameters does not result in a reduction in the accuracy. Further, our search space contains skip connection, maxpool, avgpool, and zero operation. The cell structures of the network are divided into reduction cell (stride = 2) and normal cell (stride = 1). Readers are referred to paper[16] for details.
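The parameter saving of depthwise separable convolution can be checked with simple arithmetic. The sketch below counts weights (ignoring biases) for a standard convolution and for a depthwise convolution followed by a 1 × 1 pointwise convolution; the channel counts and kernel size are illustrative values, not settings from the paper.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k spatial filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes the channels
    return depthwise + pointwise


standard = conv_params(64, 64, 3)              # 36864
separable = separable_conv_params(64, 64, 3)   # 4672
```

For 64 input and output channels with a 3 × 3 kernel, the separable form needs roughly one eighth of the weights, which is the kind of saving the lightweight search space relies on.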
Training strategy. DARTS can determine the optimal receptive field size, operation, and cell structure, which can help obtain the best scale and depth characteristics for HR estimation. The core objective of DARTS is to search for a suitable cell structure, which is achieved by learning the weight of the operations ($\eta_o^{(i,j)}$) between the i-th and j-th nodes of the cell. L1 losses ($L_{hr}$) between the estimated and ground-truth HRs are used to search the network structure. $L_{train}$ and $L_{val}$ denote the training and validation losses in the searching stage, respectively.

$$\min_{\alpha} \; L_{val}(\Phi^{*}(\alpha), \alpha)$$

$$\text{s.t.} \quad \Phi^{*}(\alpha) = \arg\min_{\Phi} L_{train}(\Phi, \alpha)$$

$$\eta_o^{(i,j)} = \frac{\exp(\alpha_o^{(i,j)})}{\sum_{o' \in \mathcal{O}} \exp(\alpha_{o'}^{(i,j)})}$$

Here, $\alpha$ is an intermediate variable employed for a better search result. After the searching stage, the network structure is fine-tuned on the training set and evaluated on the test set. Please refer to paper[16] for more detailed algorithm steps.
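To illustrate how the operation weights act during the search, the sketch below computes a softmax over the architecture parameters of a single edge, forms the weighted sum of the candidate operations, and then picks the strongest non-zero operation once the search is finished. It follows the general DARTS formulation[16]; the toy operations, dictionary layout, and function names are hypothetical and chosen only for the example.

```python
import math
from typing import Callable, Dict, List

Op = Callable[[List[float]], List[float]]


def operation_weights(alpha: Dict[str, float]) -> Dict[str, float]:
    """Softmax over the architecture parameters of one edge (i, j)."""
    peak = max(alpha.values())  # subtract the maximum for numerical stability
    exps = {name: math.exp(a - peak) for name, a in alpha.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def mixed_output(x: List[float], ops: Dict[str, Op], alpha: Dict[str, float]) -> List[float]:
    """Weighted sum of all candidate operations applied to the same input."""
    eta = operation_weights(alpha)
    out = [0.0] * len(x)
    for name, op in ops.items():
        for t, value in enumerate(op(x)):
            out[t] += eta[name] * value
    return out


def select_operation(alpha: Dict[str, float]) -> str:
    """After the search, keep the strongest operation, excluding the zero operation."""
    candidates = [name for name in alpha if name != "zero"]
    return max(candidates, key=lambda name: alpha[name])


# Toy candidate set (hypothetical): identity and zero.
ops = {"skip": lambda x: list(x), "zero": lambda x: [0.0] * len(x)}
```

With equal parameters for both operations, each receives a weight of 0.5, so the mixed output is half of the input; as training changes the parameters, the mixture moves toward the preferred operation.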
4 Results
4.1 Datasets and protocols
VIPL-HR is an unconstrained face database for rPPG-based HR estimation[28]. It contains 2378 RGB and 752 near-infrared videos of 107 subjects (28 females and 79 males) captured under nine different conditions. The head movements, different lighting conditions, and different acquisition equipment in the VIPL-HR cause many difficulties in the HR estimation. In addition, an inevitable challenge occurs in that the frame rate of the videos is unstable. We follow the same routine in previous works[18,19,29], and we use a subject-exclusive fivefold cross-validation protocol on the shuffled database.
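A subject-exclusive split keeps all videos of a given subject in the same fold, so that no person appears in both the training and test sets. The following sketch shows one way to build such folds; the mapping from video name to subject ID, the seed, and the function name are assumptions for the illustration and do not reproduce the exact split used in the experiments.

```python
import random
from typing import Dict, List, Tuple


def subject_exclusive_folds(video_subjects: Dict[str, int], k: int = 5,
                            seed: int = 0) -> List[Tuple[List[str], List[str]]]:
    """Return k (train_videos, test_videos) pairs with no subject shared between them."""
    subjects = sorted(set(video_subjects.values()))
    random.Random(seed).shuffle(subjects)          # shuffle subjects, not videos
    folds = [subjects[i::k] for i in range(k)]     # near-equal groups of subjects
    splits = []
    for held_out in folds:
        held = set(held_out)
        train = sorted(v for v, s in video_subjects.items() if s not in held)
        test = sorted(v for v, s in video_subjects.items() if s in held)
        splits.append((train, test))
    return splits
```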
PURE contains 60 videos of 10 subjects with six different activities (sitting still, talking, and four variations of rotating and head movement)[30]. Each video is captured using an RGB camera with a 30 Hz frame rate. In contrast to VIPL-HR, PURE has limited data for training, which introduces additional challenges for learning-based HR estimation methods. For fair comparison with the previous work[29], we use 36 videos of six subjects as the training set and 24 videos of four subjects as the test set.
4.2 Implementation details
Data augmentation: To improve the generalization ability and robustness of our algorithm, we utilize two data-augmentation strategies. (1) We refer to paper[20] and perform temporal downsampling and upsampling on the original face video at sampling rates from 0.67 to 1.5 to enrich the HR distribution of the training dataset. (2) For every epoch during the training, we use a window with N × L size to randomly crop the POS-STMap, which reduces the effect of the phase of the input videos.
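Both augmentation strategies can be sketched on one-dimensional traces. The example below resamples a trace by linear interpolation and crops a fixed-length window at a random offset. Applying the resampling to extracted traces rather than to the video itself, rescaling the HR label by the inverse of the rate, and all function names are simplifying assumptions for this illustration.

```python
import random
from typing import List


def resample(trace: List[float], rate: float) -> List[float]:
    """Linearly resample a trace; rate > 1 lengthens it, rate < 1 shortens it."""
    new_len = max(2, round(len(trace) * rate))
    step = (len(trace) - 1) / (new_len - 1)
    out = []
    for k in range(new_len):
        pos = k * step
        lo = min(int(pos), len(trace) - 2)   # left neighbour, clamped at the end
        frac = pos - lo
        out.append(trace[lo] * (1 - frac) + trace[lo + 1] * frac)
    return out


def scaled_hr(hr_bpm: float, rate: float) -> float:
    """Assumed label update: stretching the signal by `rate` lowers the apparent HR."""
    return hr_bpm / rate


def random_crop(rows: List[List[float]], length: int, rng: random.Random) -> List[List[float]]:
    """Crop the same random window of `length` samples from every row of a map."""
    total = len(rows[0])
    if total < length:
        raise ValueError("map is shorter than the crop length")
    start = rng.randint(0, total - length)
    return [row[start:start + length] for row in rows]
```

Cropping at a random offset changes the phase at which the pulse wave enters the network input, which is the effect the second strategy aims to reduce.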
Parameter settings: Our NAS-HR is implemented using PyTorch. We use the Adam solver as an optimizer with an initial learning rate of 0.001 and batch size of 64. Hyperparameters N, L, and C are empirically set to 15, 256, and 3, respectively. The search and training epochs are set to 10 and 20, respectively. All our experiments are performed using the NVIDIA GTX 1080Ti GPU.
4.3 Baseline methods and measures
We compare the proposed approach with several state-of-the-art HR estimation methods, including NLMS-AF[4], CHROM[10], 2SR[12], POS[13], DeepPhys[14], RhythmNet[19], HR-CNN[29], I3D[31], SynRhythm[32], SAMC[33], and AutoHR[34].
We follow the existing methods[17-19] and use several different measurements to demonstrate the performance, i.e., the mean error (ME), standard deviation (Std), root mean square error (RMSE), mean absolute HR error (MAE), mean of error-rate percentage (MER), and Pearson's correlation coefficient (r).
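The six measures can be computed as in the sketch below. The sign convention (predicted minus ground truth), the use of the population standard deviation, and the function name are assumptions for this illustration; the exact conventions of the cited works may differ.

```python
import math
from statistics import fmean, pstdev
from typing import Dict, List


def hr_metrics(predicted: List[float], reference: List[float]) -> Dict[str, float]:
    """Compute ME, Std, MAE, RMSE, MER (in percent) and Pearson's r for HR estimates in bpm."""
    errors = [p - g for p, g in zip(predicted, reference)]
    mean_p, mean_g = fmean(predicted), fmean(reference)
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_g = sum((g - mean_g) ** 2 for g in reference)
    return {
        "ME": fmean(errors),
        "Std": pstdev(errors),
        "MAE": fmean([abs(e) for e in errors]),
        "RMSE": math.sqrt(fmean([e * e for e in errors])),
        "MER": 100.0 * fmean([abs(e) / g for e, g in zip(errors, reference)]),
        "r": cov / math.sqrt(var_p * var_g),  # undefined if either series is constant
    }
```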
4.4 Results on VIPL-HR
We compare our approach with several DL methods (I3D[31], DeepPhys[14], SynRhythm[32], and RhythmNet[19]) and several hand-crafted methods (SAMC[33], POS[13], and CHROM[10]). Our test protocol completely follows papers[19,32]. Thus, the results are directly taken from the original papers[19,32] as shown in Table 1.
Table 1 Results of the proposed approach and several state-of-the-art methods on VIPL-HR
Method          ME (bpm)   Std (bpm)   MAE (bpm)   RMSE (bpm)   MER (%)   r
SAMC[33]        10.8       18.0        15.9        21           26.7      0.11
POS[13]         7.87       15.3        11.5        17.2         18.5      0.30
CHROM[10]       7.63       15.1        11.4        16.9         17.8      0.28
I3D[31]         1.37       15.9        12.0        15.9         15.6      0.07
DeepPhys[14]    -2.60      13.6        11.0        13.8         13.6      0.11
SynRhythm[32]   1.02       8.88        5.79        8.94         7.38      0.73
RhythmNet[19]   0.73       8.11        5.30        8.14         6.71      0.76
AutoHR[34]      -          8.48        5.68        8.68         -         0.72
NAS-HR          0.25       8.10        5.12        8.01         6.43      0.79
From Table 1, we can observe that the three hand-crafted methods (SAMC, POS, and CHROM) perform better than I3D and DeepPhys, which indicates that the DL methods do not always perform very well without suitable preprocessing. Moreover, POS performs the best among these traditional hand-crafted algorithms. Among all methods, we can observe that NAS-HR performs the best in all performance metrics. Furthermore, AutoHR, which is also based on NAS, performs worse than our proposed algorithm. Our method differs from AutoHR in two respects. (1) We use the preprocessed POS-STMap as the input of the network instead of the video, which improves the stability of the model. (2) Our algorithm, as an end-to-end algorithm, regresses the HR value instead of learning the physiological signals to calculate the HR value. The result indicates that NAS-HR is effective for remote HR estimation under complex conditions.
4.5 Results on PURE
Because NAS-HR is a DL-based method, its performance on a relatively small dataset such as PURE may be a concern. We chose two DL methods (HR-CNN[29] and SynRhythm[32]) and three traditional methods (2SR[12], CHROM[10], and NLMS-AF[4]) for comparison.
From Table 2, we can see that most of the methods perform considerably better on PURE than on VIPL-HR. A possible reason for this gap is that the HR distribution of PURE is more concentrated than that of VIPL-HR. NAS-HR again outperforms these state-of-the-art methods, which means that the proposed approach works well even with a small dataset for training.
Table 2 Results of the proposed approach and several state-of-the-art methods on PURE
Method          MAE (bpm)   RMSE (bpm)   r
2SR[12]         2.44        3.06         0.98
CHROM[10]       2.07        2.5          0.99
HR-CNN[29]      1.84        2.37         0.98
SynRhythm[32]   1.88        2.45         0.98
NAS-HR          1.65        2.02         0.99
4.6 Further analysis
The cell number is an important variable that affects the search cost, parameter size (Params), and number of floating-point operations (FLOPs) of the model. Therefore, we conduct a further study on VIPL-HR with the cell number ranging from five to eight.
Table 3 shows that NAS-HR with 6 cells achieves the best evaluation performance in terms of MAE, RMSE, and r. Hence, the number of cells M is set to 6 in the other experiments. The computational cost of our model with a cell number of 6 is approximately 0.262 GFLOPs, which is much smaller than that of the state-of-the-art method, i.e., RhythmNet (6.702 GFLOPs). According to paper[35], our model can run on ARM devices (Snapdragon 810) at approximately nine times per second. In addition, we only need 28 h to search the network using 6 cells on one GPU. When the number of cells is 6, we set the first and fourth cells as normal cells and the remaining cells as reduction cells. The cell structures of the searched network where the number of cells is 6 are shown in Figures 3 and 4.
Table 3 Search cost of NAS-HR with different cell numbers profiled on VIPL-HR using NVIDIA GTX 1080Ti GPU
Method          MAE (bpm)   RMSE (bpm)   r      Params (M)   FLOPs (G)   Search cost (GPU hours)
RhythmNet[19]   5.79        8.94         0.73   11.178       6.702       -
NAS-HR-5        5.65        8.49         0.75   4.483        0.262       23
NAS-HR-6        5.12        8.01         0.79   4.528        0.308       28
NAS-HR-7        5.20        8.12         0.79   7.349        0.337       45
NAS-HR-8        5.32        8.32         0.76   7.416        0.376       60
5 Conclusions
HR estimation based on rPPG remains a challenging task because of factors such as light changes, head movements, and expression changes. Whereas the DL-based approaches for HR estimation have achieved good performance, the deep neural network design still requires professional experience and may not be optimum in terms of both accuracy and computational cost. In the present paper, we propose a NAS-HR method to build an efficient and lightweight neural network for HR estimation. NAS-HR can achieve improved HR estimation performance with significantly reduced network parameter size and computational cost. NAS-HR outperforms many state-of-the-art methods, including both the non-learning- and learning-based methods. In the future, we will improve the HR estimation accuracy with respect to three aspects: (1) finding new block designs for HR estimation to enrich the search space, (2) establishing a more efficient and suitable network-search strategy, and (3) designing facial representation that is illumination-insensitive[36] and person-independent[37] to improve the HR estimation robustness under unconstrained scenarios.



References
[1] Chen X, Cheng J, Song R C, Liu Y, Ward R, Wang Z J. Video-based heart rate measurement: recent advances and future prospects. IEEE Transactions on Instrumentation and Measurement, 2019, 68(10): 3600–3615 DOI:10.1109/tim.2018.2879706
[2] Wang C, Pun T, Chanel G. A comparative survey of methods for remote heart rate detection from frontal face videos. Frontiers in Bioengineering and Biotechnology, 2018, 6: 33
[3] Li X, Han H, Lu H, Niu X, Yu Z, Dantcheva A, Zhao G, Shan S. The 1st challenge on remote physiological signal sensing (RePSS). In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. IEEE, 2020
[4] Li X, Chen J, Zhao G, Pietikainen M. Remote heart rate measurement from face videos under realistic situations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 2014
[5] Qiu Y, Liu Y, Arteaga-Falconi J, Dong H W, Saddik A E. EVM-CNN: real-time contactless heart rate estimation from facial video. IEEE Transactions on Multimedia, 2019, 21(7): 1778–1787
[6] Kopeliovich M. On indirect assessment of heart rate in video. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. IEEE, 2020
[7] Mironenko Y. Remote photoplethysmography: rarely considered factors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. IEEE, 2020
[8] Poh M Z, McDuff D J, Picard R W. Non-contact, automated cardiac pulse measurements using video imaging and blind source separation. Optics Express, 2010, 18(10): 10762–10774
[9] Balakrishnan G, Durand F, Guttag J. Detecting pulse from head motions in video. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 2013
[10] de Haan G, Jeanne V. Robust pulse rate from chrominance-based rPPG. IEEE Transactions on Biomedical Engineering, 2013, 60(10): 2878–2886
[11] de Haan G, van Leest A. Improved motion robustness of remote-PPG by using the blood volume pulse signature. Physiological Measurement, 2014, 35(9): 1913–1926
[12] Wang W J, Stuijk S, de Haan G. A novel algorithm for remote photoplethysmography: spatial subspace rotation. IEEE Transactions on Biomedical Engineering, 2016, 63(9): 1974–1984
[13] Wang W J, den Brinker A C, Stuijk S, de Haan G. Algorithmic principles of remote PPG. IEEE Transactions on Biomedical Engineering, 2017, 64(7): 1479–1491
[14] Chen W X, McDuff D. DeepPhys: video-based physiological measurement using convolutional attention networks. In: Computer Vision – ECCV 2018. Springer International Publishing, 2018, 356–373
[15] Wang Z K, Kao Y, Hsu C T. Vision-based heart rate estimation via a two-stream CNN. In: IEEE International Conference on Image Processing. IEEE, 2019
[16] Liu H X, Simonyan K, Yang Y M. DARTS: differentiable architecture search. 2018
[17] Yu Z, Li X, Zhao G. Remote photoplethysmograph signal measurement from facial videos using spatio-temporal networks. In: British Machine Vision Conference. 2019
[18] Hsu G S, Ambikapathi A M, Chen M S. Deep learning with time-frequency representation for pulse estimation from facial videos. In: IEEE International Joint Conference on Biometrics. IEEE, 2017
[19] Niu X S, Shan S G, Han H, Chen X L. RhythmNet: end-to-end heart rate estimation from face via spatial-temporal representation. IEEE Transactions on Image Processing, 2020, 29: 2409–2423
[20] Niu X, Zhao X, Han H, Das A, Chen X. Robust remote heart rate estimation from face utilizing spatial-temporal attention. In: IEEE International Conference on Automatic Face & Gesture Recognition. IEEE, 2019
[21] Baker B, Gupta O, Naik N, Raskar R. Designing neural network architectures using reinforcement learning. 2016
[22] Zoph B, Le Q V. Neural architecture search with reinforcement learning. 2016
[23] Real E. Large-scale evolution of image classifiers. In: Proceedings of the 34th International Conference on Machine Learning. JMLR, 2017
[24] Chen X, Xie L X, Wu J, Tian Q. Progressive differentiable architecture search: bridging the depth gap between search and evaluation. 2019
[25] Xu Y, Xie L, Zhang X, Chen X, Qi G, Tian Q, Xiong H. PC-DARTS: partial channel connections for memory-efficient architecture search. In: International Conference on Learning Representations. New Orleans, USA, 2019
[26] Verkruysse W, Svaasand L O, Stuart Nelson J. Remote plethysmographic imaging using ambient light. Optics Express, 2008, 16(26): 21434–21445
[27] Wang W J, den Brinker A C, de Haan G. Single-element remote-PPG. IEEE Transactions on Biomedical Engineering, 2019, 66(7): 2032–2043
[28] Niu X S, Han H, Shan S G, Chen X L. VIPL-HR: a multi-modal database for pulse estimation from less-constrained face video. In: Computer Vision – ACCV 2018. Springer International Publishing, 2019, 562–576
[29] Špetlík R, Franc V, Matas J. Visual heart rate estimation with convolutional neural network. In: Proceedings of British Machine Vision Conference. 2018
[30] Stricker R, Müller S, Gross H M. Non-contact video-based pulse rate measurement on a mobile service robot. In: The 23rd IEEE International Symposium on Robot and Human Interactive Communication. 2014
[31] Carreira J, Zisserman A. Quo vadis, action recognition? A new model and the kinetics dataset. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 2017
[32] Niu X, Han H, Shan S, Chen X. SynRhythm: learning a deep heart rate estimator from general to specific. In: International Conference on Pattern Recognition. 2018
[33] Tulyakov S, Alameda-Pineda X, Ricci E, Yin L, Cohn J F, Sebe N. Self-adaptive matrix completion for heart rate estimation from face videos under realistic conditions. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 2016
[34] Yu Z T, Li X B, Niu X S, Shi J G, Zhao G Y. AutoHR: a strong end-to-end baseline for remote heart rate measurement with neural searching. IEEE Signal Processing Letters, 2020, 27: 1245–1249
[35] Ma N N, Zhang X Y, Zheng H T, Sun J. ShuffleNet V2: practical guidelines for efficient CNN architecture design. In: Computer Vision – ECCV 2018. Springer International Publishing, 2018, 122–138
[36] Han H, Shan S G, Chen X L, Lao S H, Gao W. Separability oriented preprocessing for illumination-insensitive face recognition. In: Computer Vision – ECCV 2012. Berlin, Heidelberg, Springer Berlin Heidelberg, 2012, 307–320
[37] Niu X, Han H, Yang S, Huang Y, Shan S. Local relationship learning with person-specific shape regularization for facial action unit detection. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2019