Multi-Wavelength PPG and Skin Tone Validation: What Fair Wearable Accuracy Requires

Fair wearable PPG accuracy requires more than a low average error. It requires optical designs that account for skin pigmentation, validation cohorts that represent the intended population, and reporting that separates performance by skin tone, sex, motion, and signal quality. Multi-wavelength PPG can help because different wavelengths interact differently with melanin, hemoglobin, tissue depth, and movement. It does not solve fairness by itself. The validation design decides whether the measurement is trustworthy.

Photoplethysmography (PPG) estimates cardiovascular timing and pulse features from changes in detected light after light enters tissue and returns to a photodetector. The waveform is shaped by blood-volume pulsation, tissue scattering, melanin absorption, contact pressure, sensor geometry, and motion before any algorithm sees it [2,3]. A model can look accurate in aggregate while failing in subgroups.

The 2026 Physiological Measurement study by Ray, Collins, and Ponnapalli makes this problem explicit. It evaluates wrist-worn multi-wavelength PPG heart-rate estimation across skin tones, sexes, and motion conditions, and pairs subgroup reporting with calibrated uncertainty [1]. That is the direction fair wearable validation has to move: from pooled accuracy to subgroup reliability.

Why skin tone matters in PPG physics

PPG is an optical measurement. Light must travel through skin, interact with vascular tissue, and return with enough pulsatile information to estimate a physiological signal. Human skin is not optically uniform. It contains layered structures, chromophores, blood, water, collagen, and scattering boundaries [4,5].

Melanin is one relevant chromophore. It absorbs light strongly at shorter visible wavelengths and less strongly as wavelength increases toward the near-infrared range [4,5]. Green PPG often produces strong superficial pulsatile signals, but it also sits in a spectral region where melanin absorption can reduce returned light. Red and infrared wavelengths penetrate differently and can sample deeper vascular beds, with different signal and noise tradeoffs [2,3,20].

Algorithms estimate from the detected waveform, not from the skin directly. If optical absorption lowers the pulsatile component, reduces signal-to-noise ratio, or changes waveform morphology, the model receives weaker input. The failure can appear as higher error, greater missingness, higher uncertainty, or silent exclusion of difficult windows.
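One simple way to quantify how much pulsatile information a window carries is the ratio of its pulsatile (AC) swing to its baseline (DC) level. The sketch below is a minimal illustration, not a production metric: the function names, the synthetic sine-wave signal, and the use of raw peak-to-peak amplitude as the AC term are all assumptions made for the example.

```python
import math

def pulsatile_ratio(samples):
    """Return the AC/DC ratio of one PPG window.

    DC is taken as the window mean and AC as the peak-to-peak swing.
    Both are simplifying assumptions; real pipelines usually band-pass
    filter first so that drift and motion do not inflate the AC term.
    """
    if not samples:
        raise ValueError("empty window")
    dc = sum(samples) / len(samples)
    if dc <= 0:
        raise ValueError("DC level must be positive")
    ac = max(samples) - min(samples)
    return ac / dc

def synthetic_window(dc, ac_amplitude, heart_rate_bpm=60.0, fs_hz=50.0, seconds=4.0):
    """Hypothetical test signal: a sine-wave pulse riding on a constant DC level."""
    n = int(fs_hz * seconds)
    f = heart_rate_bpm / 60.0
    return [dc + ac_amplitude * math.sin(2 * math.pi * f * i / fs_hz) for i in range(n)]

# Two illustrative channels with made-up amplitudes.
strong = synthetic_window(dc=1000.0, ac_amplitude=10.0)
weak = synthetic_window(dc=1000.0, ac_amplitude=2.0)

print(f"strong channel AC/DC: {pulsatile_ratio(strong):.4f}")
print(f"weak channel AC/DC:   {pulsatile_ratio(weak):.4f}")
```

A window with a lower ratio gives the downstream model less pulsatile contrast to work with, which is why reporting a per-window quality measure alongside the estimate is useful.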

Pulse oximetry exposed the clinical version of this issue. Controlled and observational studies show that pigmentation can influence pulse oximeter accuracy, especially at low saturation or clinically important thresholds [6,7,9]. Later studies associated oximetry discrepancies with delayed identification of treatment eligibility in some patient groups [10]. PPG heart-rate estimation is not pulse oximetry. The targets differ. But both depend on light-tissue interaction, and both can hide subgroup error when validation reports only a pooled metric.

Wearable optical heart-rate studies show the same measurement principle in another setting. Research has identified skin tone, activity, sensor placement, and device mechanics as sources of inaccuracy [11-13]. A wearable PPG validation study that does not measure and report skin tone has an avoidable blind spot.

The practical burden is higher for research platforms than for casual tracking. A biased or poorly calibrated signal can contaminate cohort analysis, model training, adherence monitoring, and longitudinal physiology studies. The error may not be obvious at the dashboard layer. It may appear later as unexplained variance, reduced generalizability, or a model that works for the enrollment cohort but fails when deployed to a broader population.

That is why subgroup validation belongs upstream. It should be part of the measurement specification, not a post-market correction. PPG studies need enough representation to estimate uncertainty within groups, not just enough subjects to support a pooled average.

What multi-wavelength PPG adds

Single-wavelength PPG gives one optical view of tissue. Multi-wavelength PPG gives several. That can improve measurement design in three ways.

First, wavelengths provide redundancy. If one channel has low returned intensity or poor pulsatile contrast, another may retain usable signal. This does not mean longer wavelengths are always better. It means multiple channels make failure less binary.

Second, wavelength diversity can help separate physiology from optical confounding. Different wavelengths have different absorption and scattering profiles in skin and vascular tissue [4,20,21]. A model can use that structure to infer whether a waveform change reflects blood-volume pulsation or a change in optical coupling.

Third, multi-wavelength signals can support quality and uncertainty estimates. If channels disagree, lose pulsatile morphology, or diverge under motion, the system can mark the estimate as unreliable rather than returning a falsely precise value. The uncertainty-aware framing in Ray and colleagues' paper is important for exactly this reason [1].
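The redundancy and disagreement ideas above can be sketched in a few lines. This is a hypothetical fusion rule, not the method used in any cited study: the function name, the channel labels, and the 5 bpm tolerance are illustrative assumptions, and the per-channel heart-rate estimates are taken as given.

```python
from statistics import median

def fuse_channels(hr_by_channel, max_spread_bpm=5.0):
    """Combine per-wavelength heart-rate estimates for one window.

    hr_by_channel maps a channel name to its estimate in bpm, or None if
    that channel produced no usable estimate. The 5 bpm tolerance is an
    illustrative assumption, not a validated threshold.
    """
    usable = {ch: hr for ch, hr in hr_by_channel.items() if hr is not None}
    if not usable:
        return {"hr_bpm": None, "reliable": False, "channels_used": []}
    values = list(usable.values())
    spread = max(values) - min(values)
    return {
        "hr_bpm": median(values),
        # Require at least two agreeing channels before trusting the value.
        "reliable": len(values) >= 2 and spread <= max_spread_bpm,
        "channels_used": sorted(usable),
    }

# Made-up estimates for three scenarios.
agree = fuse_channels({"green": 72.0, "red": 71.0, "infrared": 73.0})
disagree = fuse_channels({"green": 72.0, "red": 95.0, "infrared": 118.0})
green_lost = fuse_channels({"green": None, "red": 71.0, "infrared": 73.0})

for name, result in [("agree", agree), ("disagree", disagree), ("green_lost", green_lost)]:
    print(name, result)
```

In the third scenario the green channel returns nothing, yet the window still yields a usable estimate from the remaining channels. In the second, the channels diverge and the window is flagged rather than reported as if it were precise.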

Multi-wavelength PPG is an active engineering direction. Studies describe depth-dependent optical modules, multi-parametric PPG characterization systems, SPAD-based multi-wavelength monitoring, and pressure-sensitive multi-wavelength behavior [20-23]. Simulation work has examined how pigmentation and oxygen saturation affect reflectance PPG signals [26-28]. These papers do not establish one universal correction. They define a design space.

That design space has constraints. Contact pressure can change morphology and signal quality [23-25]. Motion can dominate the waveform during activity [15-19,35,36]. Sensor geometry changes source-detector separation and tissue volume sampled [2,3,20]. Multi-wavelength PPG helps only if the validation study includes these conditions.

Fair accuracy is a validation problem

A fair PPG system should not be judged by mean absolute error alone. Aggregate error can hide subgroup failure. A dataset with excellent performance in one group and poor performance in another can still produce a tolerable pooled result if the cohort is imbalanced.
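A small worked example makes the arithmetic concrete. The group names, window counts, and error values below are invented purely for illustration; they do not come from any study cited here.

```python
def pooled_mae(groups):
    """Window-weighted pooled MAE from per-group (n_windows, mae_bpm) pairs."""
    total = sum(n for n, _ in groups.values())
    if total == 0:
        raise ValueError("no windows in any group")
    return sum(n * mae for n, mae in groups.values()) / total

# Hypothetical imbalanced cohort: 90% of windows come from group_a.
groups = {
    "group_a": (900, 2.0),   # (number of windows, MAE in bpm)
    "group_b": (100, 10.0),
}

pooled = pooled_mae(groups)  # (900 * 2.0 + 100 * 10.0) / 1000 = 2.8
worst = max(mae for _, mae in groups.values())

print(f"pooled MAE: {pooled:.1f} bpm")
print(f"worst subgroup MAE: {worst:.1f} bpm")
```

The pooled figure of 2.8 bpm looks acceptable, while the smaller group carries an error of 10 bpm. Reporting only the pooled number would hide a fivefold gap between the two groups.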

Fair validation starts with cohort design. Skin tone should be measured directly or documented with a reproducible scale. Sex, age, body habitus, perfusion conditions, and activity states should be reported. Motion conditions should be protocolized. Reference measurement should match the target variable. For heart rate and beat timing, ECG is often the appropriate timing reference [12,32-34].

The output metric also matters. Heart-rate mean absolute error answers one question. Beat-to-beat validity answers another. Pulse rate variability is not identical to heart rate variability, especially under non-stationary conditions [33,34]. A validation study must state whether it evaluates average heart rate, beat detection, interbeat intervals, pulse rate variability, oxygen saturation, respiratory features, or waveform morphology.
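To see why average heart rate and beat-to-beat validity are separate claims, consider two interbeat-interval series with the same mean but different variability. The sketch below uses RMSSD (root mean square of successive differences) as the variability measure; the interval values are made up, and the series labels are hypothetical.

```python
import math

def mean_hr_bpm(ibi_ms):
    """Average heart rate implied by a list of interbeat intervals in ms."""
    return 60000.0 / (sum(ibi_ms) / len(ibi_ms))

def rmssd_ms(ibi_ms):
    """Root mean square of successive interbeat-interval differences."""
    if len(ibi_ms) < 2:
        raise ValueError("need at least two intervals")
    diffs = [b - a for a, b in zip(ibi_ms, ibi_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Made-up intervals: a reference series and a smoothed device series.
reference_ibi = [800.0, 820.0, 780.0, 810.0, 790.0]
device_ibi = [800.0, 800.0, 800.0, 800.0, 800.0]

print(f"mean HR: reference {mean_hr_bpm(reference_ibi):.1f} bpm, "
      f"device {mean_hr_bpm(device_ibi):.1f} bpm")
print(f"RMSSD:   reference {rmssd_ms(reference_ibi):.1f} ms, "
      f"device {rmssd_ms(device_ibi):.1f} ms")
```

Both series give a mean heart rate of 75 bpm, so a heart-rate error metric would score the device as perfect. The variability measure differs completely, which is why a heart-rate validation cannot stand in for a beat-to-beat one.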

Missingness must be reported. A system can look accurate if it silently drops difficult windows. If darker skin tones, high-motion states, low perfusion, or loose contact create more discarded segments, accuracy on retained windows is incomplete. Fair validation reports both error and coverage.
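The effect of silent exclusion is easy to demonstrate. In this minimal sketch, each window carries an absolute error and a quality index; the function name, the 0.5 quality threshold, and all values are hypothetical.

```python
def gated_report(windows, min_quality):
    """Summarize error and coverage after quality gating.

    windows is a list of (abs_error_bpm, quality_index) pairs. Windows
    whose quality index falls below min_quality are treated as dropped.
    """
    if not windows:
        raise ValueError("no windows supplied")
    kept = [err for err, q in windows if q >= min_quality]
    coverage = len(kept) / len(windows)
    mae = sum(kept) / len(kept) if kept else None
    return {"coverage": coverage, "mae_bpm": mae}

# Made-up windows: low-quality windows carry the large errors.
windows = [
    (1.0, 0.90), (2.0, 0.80), (1.0, 0.95), (2.0, 0.85), (12.0, 0.30),
    (15.0, 0.20), (1.0, 0.90), (2.0, 0.70), (14.0, 0.25), (10.0, 0.40),
]

print("gated:  ", gated_report(windows, min_quality=0.5))
print("ungated:", gated_report(windows, min_quality=0.0))
```

With gating, the retained windows show an MAE of 1.5 bpm at 60% coverage. Without gating, the MAE over all windows is 6.0 bpm. Either number alone is misleading; the pair tells the reader what was measured and what was dropped. Computing the same pair per subgroup shows whether the dropped windows are concentrated in one group.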

Uncertainty calibration is the next layer. Neural models often return confident predictions when the input is out of distribution. Calibration research shows that confidence scores must be tested, not assumed [42,43]. In wearable PPG, calibrated uncertainty can tell a researcher when motion, contact, or optical quality has degraded the signal [1].

What the Ray et al. study contributes

Ray and colleagues' 2026 paper is useful because it combines three requirements that are often separated: multi-wavelength input, subgroup-aware evaluation, and uncertainty-aware modeling [1]. The objective is not only lower heart-rate error. It is reduced performance disparity across skin tones and sexes, with a clearer signal of when a prediction should not be trusted.

That distinction matters. In clinical and research infrastructure, a system that knows when not to answer is safer than a system that returns clean-looking numbers from poor signal. This is especially true when data feeds downstream models, dashboards, or longitudinal studies.

The paper also reinforces a reporting standard. Validation should include group-level performance by skin tone, sex, and motion condition. It should include uncertainty calibration. It should describe the reference standard and collection conditions. These details are the evidence base for whether the signal can be used responsibly.

The larger literature supports this shift. Wearable optical heart-rate studies show that performance varies across users and activities [11-13,44]. Motion-artifact studies show that algorithmic performance depends heavily on activity context [15-19,35,36]. Skin optics and Monte Carlo simulation studies explain why pigmentation, tissue structure, and source-detector geometry change optical return [4,5,26-28]. Fairness literature adds the methodological warning: performance claims detached from subgroup context are incomplete [39-41].

Evidence summary

| Evidence area | What the literature shows | Why it matters for fair PPG validation | Citations |
|---|---|---|---|
| Skin optics | Melanin, hemoglobin, scattering, and tissue layers shape optical return. | Skin tone can affect signal amplitude and morphology before algorithmic processing. | [4,5] |
| Pulse oximetry disparities | Pigmentation can influence oximetry accuracy and clinically relevant discrepancies. | Optical biosensors need subgroup-specific validation, not only pooled accuracy. | [6-10] |
| Wearable optical HR error | Skin tone, activity, and sensor factors can influence wearable optical heart-rate performance. | Validation must stratify by skin tone and motion state. | [11-13,44] |
| Multi-wavelength PPG | Multiple wavelengths support depth-dependent sensing, redundancy, and richer signal characterization. | Correction strategies require wavelength-specific and cohort-specific testing. | [20-23] |
| Contact and motion | Pressure and motion can change PPG signal quality and waveform morphology. | Fairness testing must include realistic wear states. | [15-19,23-25,35,36] |
| Uncertainty and fairness | Models can be miscalibrated and subgroup performance can be hidden by aggregate reporting. | Prediction confidence and subgroup error should be reported together. | [1,39-43] |

What validation should report

A credible PPG skin-tone validation protocol should report at least seven elements.

First, define the target. Heart rate, pulse rate variability, SpO2, respiratory rate, and waveform morphology are different measurement problems. They cannot share one generic accuracy claim.

Second, document the optical system. Report wavelengths, emitter-detector geometry, sampling rate, channel selection, and whether raw waveforms were available. Without these details, another team cannot interpret failure modes.

Third, describe skin tone measurement. Self-reported race is not a substitute for optical phenotype. It may be relevant in health equity research, but it does not directly measure melanin, erythema, tissue optical properties, or sensor-skin coupling. Objective skin-tone tools and wearable spectroscopy work are emerging because visible categories are too imprecise for optical engineering [29-31].

Fourth, stratify performance. Report error, coverage, and uncertainty by skin tone group, sex, activity, and signal-quality tier. Stratification should be pre-specified, not added only after a problem appears.

Fifth, include motion and contact variation. Resting accuracy is necessary but insufficient. Motion and contact pressure are common in real deployments and can change waveform quality [15-19,23-25].

Sixth, report missingness. A low error after excluding low-quality windows does not prove equal performance. It may prove that the model fails quietly in harder groups or harder conditions.

Seventh, report calibration. If uncertainty scores are used, show whether predicted uncertainty matches observed error. A confidence metric that is not calibrated can create false trust [42,43].
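One basic calibration check compares the nominal coverage of a predicted interval with the fraction of windows whose error actually falls inside it. The sketch below assumes the model emits a per-window standard deviation and that errors are roughly Gaussian, so a +/-1.96 sigma interval should contain about 95% of errors. Both assumptions, the function name, and all values are illustrative.

```python
def interval_coverage(errors_bpm, sigmas_bpm, z=1.96):
    """Fraction of windows whose signed error lies within +/- z * sigma."""
    if len(errors_bpm) != len(sigmas_bpm) or not errors_bpm:
        raise ValueError("inputs must be non-empty and the same length")
    inside = sum(1 for e, s in zip(errors_bpm, sigmas_bpm) if abs(e) <= z * s)
    return inside / len(errors_bpm)

# Made-up signed errors (estimate minus reference) for ten windows.
errors = [1.0, -2.0, 0.5, 3.0, -8.0, 1.5, -1.0, 6.0, 0.2, -0.7]

# A model that reports the same small sigma for every window.
flat_sigmas = [1.5] * 10
# A model that reports larger sigma on the windows it finds difficult.
adaptive_sigmas = [1.5, 1.5, 1.5, 2.0, 5.0, 1.5, 1.5, 4.0, 1.5, 1.5]

print(f"flat sigma coverage:     {interval_coverage(errors, flat_sigmas):.2f}")
print(f"adaptive sigma coverage: {interval_coverage(errors, adaptive_sigmas):.2f}")
```

The flat-sigma model covers only 70% of errors against a nominal 95%, which marks it as overconfident. Ten windows give a coarse estimate, so a real check needs far more data and should be repeated within each skin-tone, sex, and motion subgroup.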

What this means for research-grade PPG infrastructure

Research-grade PPG infrastructure has to preserve the signal pathway. Raw waveform access matters because subgroup error often originates before the derived metric. If only summary values are available, researchers cannot inspect whether the problem came from optical coupling, motion artifact, beat detection, filtering, or model inference.

This is why Sensor Bio frames PPG as data infrastructure, not as a device feature. Continuous physiological data needs a transparent pipeline: raw optical channels, signal-quality indices, derived metrics, timestamps, and exportable validation context.
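As a rough illustration of what such a pipeline record could carry, the sketch below defines one exported analysis window. The class name, field names, and audit rule are hypothetical and are not a description of any actual Sensor Bio export format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class PPGWindowRecord:
    """Hypothetical schema for one exported analysis window."""
    start_time_utc: str                     # ISO 8601 timestamp
    sampling_rate_hz: float
    raw_channels: Dict[str, List[float]]    # keyed by wavelength label
    signal_quality: Dict[str, float]        # per-channel quality index
    heart_rate_bpm: Optional[float] = None  # derived metric, if produced
    heart_rate_sigma_bpm: Optional[float] = None
    validation_context: Dict[str, str] = field(default_factory=dict)

    def is_auditable(self) -> bool:
        """True when every channel has raw samples and a quality index."""
        return bool(self.raw_channels) and all(
            len(samples) > 0 and ch in self.signal_quality
            for ch, samples in self.raw_channels.items()
        )

# Made-up record with two channels and a short run of samples.
record = PPGWindowRecord(
    start_time_utc="2026-01-01T00:00:00Z",
    sampling_rate_hz=50.0,
    raw_channels={"green": [1001.0, 1003.0, 1002.0], "infrared": [2100.0, 2104.0, 2101.0]},
    signal_quality={"green": 0.82, "infrared": 0.91},
    heart_rate_bpm=72.0,
    heart_rate_sigma_bpm=1.8,
    validation_context={"activity": "rest", "reference": "ECG"},
)

print("auditable:", record.is_auditable())
```

Keeping raw channels and quality indices next to the derived value lets a reviewer trace a suspicious heart-rate estimate back to the optical signal that produced it.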

The validation standard should match the deployment context. A study designed for seated resting heart rate does not validate exercise conditions. A study designed around average heart rate does not validate beat-to-beat HRV. A study with narrow skin-tone representation does not validate population-level fairness.

Fair accuracy is therefore not a marketing label. It is a measurement claim with requirements. It requires optical design, dataset design, subgroup reporting, uncertainty calibration, and access to enough raw signal to audit the result.

Practical framework for evaluating PPG fairness claims

When evaluating a PPG validation paper or platform, ask five questions.

  1. Did the study measure skin tone directly, or did it rely on broad demographic categories?
  2. Did it report subgroup performance, missingness, and uncertainty, or only pooled error?
  3. Did it test motion and contact variation, or only resting conditions?
  4. Did it validate the exact output being claimed, such as average heart rate, interbeat interval, HRV, or SpO2?
  5. Did it provide enough raw-signal or channel-level information to audit failure modes?

If the answer to any of these questions is no, the claim may still be useful. It is just narrower than it sounds. Multi-wavelength PPG is a strong technical direction because it gives the system more optical information. But fair wearable accuracy comes from the entire pipeline: light source, skin optics, detector, model, validation cohort, and reporting standard.

FAQ

Does darker skin tone make PPG inaccurate?

Darker skin tone does not make PPG inherently inaccurate. It can change the optical conditions under which PPG is measured. Melanin absorbs more light at shorter visible wavelengths, which can reduce returned signal and alter signal-to-noise ratio in some designs [4,5]. Accuracy then depends on wavelength selection, emitter power, detector sensitivity, contact mechanics, motion handling, and the algorithm. The correct question is whether the system was validated and reported across the skin-tone range it will measure.

Why does multi-wavelength PPG help with skin tone validation?

Multi-wavelength PPG helps because each wavelength samples tissue differently. Green, red, and infrared light have different absorption and scattering behavior in skin and vascular tissue [2,4,20]. Multiple wavelengths can provide redundancy, help detect optical failure, and support models that account for channel-specific quality. It is not a universal correction. A multi-wavelength system still needs subgroup validation, missingness reporting, and calibrated uncertainty before it can support a fairness claim.

What is the difference between accuracy and fairness in wearable PPG?

Accuracy usually reports how close estimates are to a reference measurement. Fairness asks whether that accuracy holds across relevant groups and conditions. A PPG model can have low pooled error while performing worse for a subgroup if the dataset is imbalanced. Fair validation reports error, coverage, and uncertainty by skin tone, sex, motion state, and signal-quality tier [1,39-41]. It also states what output was validated, such as heart rate, beat timing, HRV, or oxygen saturation.

Should validation use race or measured skin tone?

Measured skin tone is more relevant for optical engineering. Race and ethnicity can matter in health equity analysis, but they are not direct measurements of melanin, erythema, tissue optical properties, or sensor-skin coupling. For PPG validation, the strongest design measures skin tone or pigmentation directly, then reports demographic context separately where appropriate. Emerging wearable optical spectroscopy methods reflect this need for more objective skin characterization [29-31].

Can uncertainty-aware models improve PPG safety?

Uncertainty-aware models can improve trust when they are calibrated. In PPG, uncertainty estimates can flag windows where motion, contact, low perfusion, or optical absorption makes the prediction unreliable. That is useful because a withheld or qualified estimate is often better than a confident wrong value. But uncertainty is only useful if predicted confidence matches observed error. Calibration has to be tested explicitly [42,43].

Is pulse rate variability from PPG the same as HRV from ECG?

No. PPG-derived pulse rate variability can approximate heart rate variability in some conditions, but it is not identical to ECG-derived HRV. PPG measures pulse arrival at the periphery, while ECG measures cardiac electrical timing. Vascular transit time, motion, respiration, and non-stationary conditions can create divergence [32-34]. Any validation claim should specify whether it evaluates average heart rate, beat intervals, pulse rate variability, or ECG-derived HRV.

References

  1. Ray D, Collins T, Ponnapalli PVS. Physiological Measurement. 2026. DOI: https://doi.org/10.1088/1361-6579/ae56ae
  2. Allen J. Physiological Measurement. 2007. DOI: https://doi.org/10.1088/0967-3334/28/3/r01
  3. Tamura T, Maeda Y, Sekine M, Yoshida M. Electronics. 2014. DOI: https://doi.org/10.3390/electronics3020282
  4. Jacques SL. Physics in Medicine and Biology. 2013. DOI: https://doi.org/10.1088/0031-9155/58/11/r37
  5. Anderson RR, Parrish JA. Journal of Investigative Dermatology. 1981. DOI: https://doi.org/10.1111/1523-1747.ep12479191
  6. Bickler PE, Feiner JR, Severinghaus JW. Anesthesiology. 2005. DOI: https://doi.org/10.1097/00000542-200504000-00004
  7. Shi C, Goodall M, Dumville J, et al. Sensors. 2022. DOI: https://doi.org/10.3390/s22093402
  8. Sjoding MW, Dickson RP, Iwashyna TJ, Gay SE, Valley TS. New England Journal of Medicine. 2020. DOI: https://doi.org/10.1056/nejmc2029240
  9. Fawzy A, Wu TD, Wang K, et al. JAMA Internal Medicine. 2022. DOI: https://doi.org/10.1001/jamainternmed.2022.1906
  10. Lee J, Matsumura K, Yamakoshi T, et al. Sensors. 2019. DOI: https://doi.org/10.3390/s19245441
  11. Scholkmann F, et al. Sensors. 2023. DOI: https://doi.org/10.3390/s23146628
  12. Schäfer A, Vagedes J. International Journal of Cardiology. 2013. DOI: https://doi.org/10.1016/j.ijcard.2012.03.119
  13. Gil E, Orini M, Bailón R, Vergara JM, Mainardi L, Laguna P. Physiological Measurement. 2010. DOI: https://doi.org/10.1088/0967-3334/31/9/015
  14. Guo C, Pleiss G, Sun Y, Weinberger KQ. arXiv. 2017. DOI: https://doi.org/10.48550/arXiv.1706.04599
  15. Kendall A, Gal Y. arXiv. 2017. DOI: https://doi.org/10.48550/arXiv.1703.04977
