Journal of the College of Physicians and Surgeons Pakistan
ISSN: 1022-386X (PRINT)
ISSN: 1681-7168 (ONLINE)
doi: 10.29271/jcpsp.2025.11.1380

ABSTRACT
Objective: To compare the face and content validity of two robotic surgery simulators—the DaVinci Skills Simulator (dVSS) and the CMR Versius Simulator—among surgeons with varying levels of experience, to guide simulation-based training programmes.
Study Design: Descriptive analytical study.
Place and Duration of the Study: Department of General Surgery, Dr. Ruth K.M. Pfau Civil Hospital, Karachi, Pakistan, from March 2024 to February 2025.
Methodology: A cross-over study was conducted involving 26 surgical faculty members. Participants performed standardised tasks on both simulators and completed a 5-point Likert-scale questionnaire assessing face and content validity. Statistical analyses were performed using the Wilcoxon signed-rank test for non-normal data and Cronbach’s α for reliability.
Results: The 26 faculty members comprised 16 males and 10 females; 80.8% were novices and 11.5% experts. The dVSS showed significantly higher face [21 (19-22) vs. 17 (16-19); p <0.001] and content validity scores [22 (19-24) vs. 19 (16-24); p = 0.001] compared with the CMR simulator. The assessment tool demonstrated good reliability (α = 0.835). Most participants (65.4%) had no prior exposure to either simulator.
Conclusion: While the dVSS remained superior in realism and skill assessment, the CMR simulator demonstrated acceptable validity for training, particularly for novices. Future studies should incorporate longitudinal performance metrics and larger expert cohorts to further evaluate the CMR’s role in robotic surgical education.
Key Words: Robotic surgery simulation, DaVinci, CMR Versius, Face validity, Content validity, Surgical training.
INTRODUCTION
Robot-assisted surgery (RAS) has grown significantly in popularity over the past few decades, particularly in urology, gynaecology, and general surgery.1,2 Its advantages over traditional laparoscopy, such as improved ergonomics, enhanced precision, and a potentially shorter learning curve, have driven its adoption.3,4 However, the lack of standardised training programmes for RAS remains a major challenge.5,6 Simulation-based training has emerged as a critical solution to ensure surgeon competency and patient safety, yet the effectiveness of different robotic platform simulators needs further evaluation.7,8
Currently, two major robotic surgical systems are available in Pakistan: the DaVinci Surgical System (Intuitive Surgical) and the CMR Versius Robotic System. Both platforms come with dedicated simulators: the DaVinci Skills Simulator (dVSS) and DaVinci Trainer (dVT) for the DaVinci, and the CMR Versius Simulator for the CMR. While previous studies have assessed these simulators independently, there is limited research directly comparing their face and content validity.2,4,9
The findings of this study will help establish which simulator offers superior training effectiveness, guiding the development of standardised RAS training programmes. Additionally, the results may inform hospitals and training centres when selecting robotic platforms, ensuring optimal skill acquisition for surgeons. By addressing the current gap in comparative simulator validation, this research also contributes to improving robotic surgical education and, ultimately, patient outcomes. This study aimed to compare the face and content validity of the dVSS and CMR robotic simulators among surgeons with varying levels of experience, to determine which simulator provides a more realistic training experience and better prepares surgeons for real-world robotic surgery.
METHODOLOGY
The sample size was determined based on prior simulator validation studies. After obtaining IRB approval, all available faculty members of the Department of General Surgery, Dr. Ruth K.M. Pfau Civil Hospital, Karachi, Pakistan, were invited to participate between March 2024 and February 2025. Twenty-six members consented and were included. All completed the exercises, and none were excluded. Written consent was obtained from all participants. Two operating rooms were prepared, one equipped with the dVSS (Model SM3000) and the other with the CMR surgical simulator. Participants were divided into two equal groups (n = 13 each), with each group starting in one of the two operating rooms. A standardised briefing was provided in each theatre, explaining the simulator’s operation and the exercises to be performed. Each participant was allotted 20 minutes on the simulator, completing four 5-minute exercises. After finishing the first session, the groups switched theatres and repeated the process on the other simulator. Following completion of both sessions, participants filled out a structured proforma assessing face and content validity using a 5-point Likert scale (1 = lowest to 5 = highest).
Face validity items evaluated user-friendliness and ease of operation, realism of graphics, ability to replicate the surgical environment, adequacy of depth perception, realism of force feedback compared with actual laparoscopic surgery, and the naturalness and responsiveness of instrument movement. Content validity items assessed the accuracy of robotic task proficiency assessment, the clinical relevance of simulated tasks, the degree to which task difficulty mirrored real surgical scenarios, usefulness for performance assessment in training, and the role of reality-based simulation in robotic surgery training. Comparative feedback questions asked which simulator appeared more realistic, better differentiated skill levels, and offered better ergonomics and comfort.
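To illustrate how the proforma responses translate into the validity totals reported below, the following minimal Python sketch sums the 5-point Likert ratings per domain. It assumes the face and content validity totals are simple sums of the individual item ratings (consistent with the score ranges in the Results); the item keys are paraphrased labels, not the exact wording of the study proforma.

```python
# Illustrative scoring sketch; item names are paraphrased, not the study's exact wording.
FACE_ITEMS = ["ease_of_use", "graphics_realism", "environment_replication",
              "depth_perception", "force_feedback", "instrument_movement"]
CONTENT_ITEMS = ["proficiency_assessment", "clinical_relevance",
                 "difficulty_realism", "training_usefulness", "simulation_role"]

def summed_score(responses: dict, items: list) -> int:
    """Sum the 1-5 Likert ratings for the listed items."""
    return sum(responses[item] for item in items)

# Example: one participant's ratings for a single simulator (all items rated 4)
ratings = {item: 4 for item in FACE_ITEMS + CONTENT_ITEMS}
print(summed_score(ratings, FACE_ITEMS))     # face validity total (possible range 6-30)
print(summed_score(ratings, CONTENT_ITEMS))  # content validity total (possible range 5-25)
```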
Table I: Participants’ demographics and experience levels.
| Characteristics | Values (n = 26) |
| Mean age (±SD) | 37.19 ± 9.92 years |
| Gender (Male/Female) | 16 (61.5%) / 10 (38.5%) |
| Prior dVSS experience | 9 (34.6%) |
| Prior CMR simulator experience | 0 (0%) |
| Novices (0–10 procedures) | 21 (80.8%) |
| Intermediates (10–49 procedures) | 2 (7.7%) |
| Experts (>50 procedures) | 3 (11.5%) |
Data were analysed using SPSS (IBM Corp., Version 19). Normality was assessed using the Shapiro-Wilk test, and mean or median values were reported as appropriate. Descriptive statistics were used to summarise demographic data. The Wilcoxon signed-rank test compared face and content validity scores between simulators, with p <0.05 considered statistically significant. Subgroup analysis of face and content validity scores across user experience levels was performed using the Kruskal-Wallis test. Cronbach’s alpha was used to evaluate the internal consistency (reliability) of the assessment scale.
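The analysis above was run in SPSS; for readers who prefer an open-source workflow, the following Python/SciPy sketch mirrors the same steps. The file name, column names, and grouping variable are hypothetical placeholders, not the study’s actual dataset.

```python
# Hypothetical re-analysis sketch (the study itself used SPSS v19).
# File and column names are illustrative, not taken from the study dataset.
import pandas as pd
from scipy import stats

df = pd.read_csv("validity_scores.csv")  # one row per participant

# Normality check for each summed validity score
for col in ["face_dvss", "face_cmr", "content_dvss", "content_cmr"]:
    w, p = stats.shapiro(df[col])
    print(f"Shapiro-Wilk {col}: W = {w:.3f}, p = {p:.3f}")

# Paired, non-parametric comparison of the two simulators
face = stats.wilcoxon(df["face_dvss"], df["face_cmr"])
content = stats.wilcoxon(df["content_dvss"], df["content_cmr"])
print("Face validity:", face)
print("Content validity:", content)

# Subgroup comparison across experience levels (novice / intermediate / expert)
for col in ["face_dvss", "face_cmr", "content_dvss", "content_cmr"]:
    groups = [g[col].values for _, g in df.groupby("experience")]
    h, p_kw = stats.kruskal(*groups)
    print(f"Kruskal-Wallis {col}: H = {h:.2f}, p = {p_kw:.3f}")
```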
RESULTS
A total of 26 faculty members participated in the study, with a mean age of 37.19 ± 9.92 (range 28–59) years. The cohort comprised 16 males (61.5%) and 10 females (38.5%). Based on prior robotic surgery experience, most participants (80.8%, n = 21) were classified as novices (0–10 procedures), while 7.7% (n = 2) were intermediates (10–49 procedures) and 11.5% (n = 3) were experts (>50 procedures; Table I). Notably, 34.6% (n = 9) had previously used the dVSS, whereas none had prior exposure to the CMR simulator. The total median face validity score was significantly higher for the dVSS [21 (19-22)] than for the CMR simulator [17 (16-19)]. Similarly, the median content validity score was higher for the dVSS [22 (19-24)] than for the CMR [19 (16-24)]. Data distribution, assessed via the Shapiro-Wilk test, was skewed for all scores except the face validity of the CMR. Consequently, Wilcoxon signed-rank tests were performed, revealing significant differences between the dVSS and the CMR in both face validity (z = −3.893; p <0.001) and content validity (z = −3.357; p = 0.001), as shown in Table II. The internal consistency of the validity questionnaire was good, with a Cronbach’s α of 0.835. The majority of respondents (88.5%, n = 23) perceived the dVSS as more realistic than the CMR platform, while all participants (100%, n = 26) endorsed its superior ergonomic design. Only 38.5% (n = 10) of users explicitly recognised the enhanced ability of the dVSS to differentiate skill levels, while 61.5% (n = 16) remained neutral. No statistically significant differences were found among novices, intermediates, and experts in face or content validity scores for either the dVSS or the CMR simulator (Table III), suggesting that the perceived validity of both simulators does not vary meaningfully with experience level.
Table II: Comparison of face and content validity scores between dVSS and CMR simulators.

| Validity types | dVSS | CMR simulator | Statistical test | p-values |
| Face validity | 21 (19-22) | 17 (16-19) | Wilcoxon (z = −3.893) | <0.001 |
| Content validity | 22 (19-24) | 19 (16-24) | Wilcoxon (z = −3.357) | 0.001 |
*Wilcoxon signed-rank test; values are median (IQR).
Table III: Subgroup analysis of validity scores according to experience level.

| Validity types | Groups | dVSS Median (IQR) | p-value* | CMR simulator Median (IQR) | p-value* |
| Face validity | Novice | 21 (19-21) | 0.57 | 17 (16-19) | 0.46 |
| | Intermediate | | | | |
| | Expert | | | | |
| Content validity | Novice | 17 (19-22) | 0.71 | 19 (16-24) | 0.17 |
| | Intermediate | | | | |
| | Expert | | | | |
*Kruskal-Wallis test.
DISCUSSION
This study evaluated the face and content validity of two robotic surgery simulators, the dVSS and the CMR simulator, among surgical faculty. The results demonstrated that the dVSS significantly outperformed the CMR simulator in both validity metrics, although the CMR simulator still showed acceptable training potential.
In the assessment of face validity, the dVSS scored higher [21 (19-22)] than the CMR [17 (16-19)], with a statistically significant difference (p <0.001). Prior studies have consistently reported high face validity for the dVSS on similar Likert-scale assessments.2,4,10 This aligns with the present results and reflects its established ergonomic design and realistic haptic feedback.11 The lower scores for the CMR (17.23) may stem from the lack of prior exposure; no participants had used the CMR before, which may have affected initial comfort. The CMR simulator’s compact design, compared with the dVSS console, might feel less immersive to first-time users. While the DaVinci’s superiority is expected given its clinical dominance, the CMR is a viable option for early-stage training.12
Regarding content validity, the dVSS again scored higher [22 (19-24)] than the CMR [19 (16-24)], with a significant difference (p = 0.001). The DaVinci’s high content validity is well-documented, with studies showing its effectiveness in discriminating between novice and expert performance.10,13 The present results are consistent with these findings, likely reflecting its validated exercise library (e.g., suture sponge tasks).14 In contrast, the CMR simulator’s lower score (19.42) aligns with the limited published data on its validity. Along with other emerging simulators (e.g., Simbionix RobotiX Mentor), it meets basic requirements for skill acquisition.12 With tailored exercises, it could further bridge existing gaps in basic robotic training.
In this study, 34.6% of participants had used the dVSS before, while none had prior CMR exposure. Most participants were novices (80.8%), with few experts (11.5%). Familiarity bias is a known confounder in simulator studies; for example, Lyons et al. found that prior dVSS users scored significantly higher than intermediates and novices on face, content, and construct validity.15 User experience level (novice, intermediate, or expert) did not significantly affect face or content validity ratings for either platform (DaVinci or CMR). The novice-heavy cohort may have skewed results, as experts often prioritise advanced metrics. Future studies should control for prior experience and include more experts to assess high-level validity.
The reliability of the assessment tool was evaluated with Cronbach’s α, which was 0.835, indicating good consistency. Most simulator validation studies report α >0.8 for structured questionnaires,16 confirming the tool’s robustness.
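For reference, Cronbach’s alpha for a scale of k items is conventionally computed as

\[
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^{2}}{\sigma_T^{2}}\right),
\]

where \(\sigma_i^{2}\) is the variance of the i-th item and \(\sigma_T^{2}\) is the variance of the total score; by common convention, values above 0.8 are taken to indicate good internal consistency.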
This study’s analysis revealed no statistically significant association between age and face or content validity scores for either the DaVinci or the CMR. This suggests that subjective assessments of simulator validity were independent of participant demographics in this cohort. These findings align with a study by Ackerman et al., which concluded that age had no significant impact on surgical performance in simulator-based training; instead, spatial visualisation ability and eye–hand coordination were more strongly correlated with better performance.17 A 2022 multicentre trial of the dVSS found no correlation between age and face validity (ρ = 0.08; p = 0.42).18 In contrast, a study assessing skill transfer in robotic versus laparoscopic simulation found that participants aged 25 and above scored approximately 10 points lower than their younger counterparts, suggesting that younger age may be associated with better performance on robotic simulators.19 Some studies have noted that older surgeons rate simulators lower for ease of use,20 possibly due to less familiarity with digital interfaces. The narrow age range in this cohort (28–59 years) may have obscured such effects.
The impact of gender on simulator performance appeared to be subtle. In this study, there was no statistically significant difference between genders in the mean face and content validity scores for the two simulators. Turkay et al. reported no statistically significant performance differences between male and female participants overall.21 However, female participants who played video games performed better than those who did not, suggesting that video game experience may mitigate gender differences in performance. Another study using the RobotiX simulator found that male participants outperformed females in 5 out of 24 performance metrics;22 however, this may be influenced by the higher prevalence of video game experience among the male participants.
The overwhelming preference for the dVSS in realism (88.5%) and ergonomics (100%) reflects its superior face validity scores (21 vs. 17; p <0.001). This consistency reinforces the dVSS’s reputation as a gold-standard training tool, with the literature citing its refined haptic feedback and console design as key contributors to user satisfaction.11,18 Notably, only 38.5% perceived the dVSS as better at skill differentiation, despite its higher content validity score (22 vs. 19; p = 0.001). This discrepancy may reflect the predominantly novice cohort, as less experienced users may lack the expertise to judge skill assessment fidelity.13 Nearly two-thirds of participants did not fault the CMR simulator’s ability to discriminate skill levels, suggesting that its content validity is adequate for early training.23
The small expert subgroup (n = 3) limits generalisability to experienced surgeons. The single-session design may not capture the CMR simulator’s learning curve, and the study did not collect objective performance metrics (e.g., task time) to correlate with the subjective ratings.
CONCLUSION
This study demonstrated a clear user preference for the dVSS simulator, endorsing its superior ergonomics compared with CMR. While dVSS remains the gold standard, CMR demonstrates competitive validity as a novel platform. Its lower cost and portability could make it a practical adjunct in resource-limited settings. Further research should explore its skill-transfer efficacy compared with dVSS.
ETHICAL APPROVAL:
Ethical approval was obtained from the Institutional Review Board of Dow University of Health Sciences, Karachi, Pakistan (Ref. No. IRB-3402/DUHS/Approval/2024/101).
PATIENTS’ CONSENT:
Written consent was obtained from all the participants.
COMPETING INTEREST:
The authors declared no conflict of interest.
AUTHORS’ CONTRIBUTION:
SG: Conception, analysis, interpretation of data, and drafting of the work.
YS: Data collection, analysis, drafting, and revision of the manuscript.
EK, MFI: Drafting and revision.
RF: Drafting, analysis, and revision.
AAL: Analysis and interpretation of the data.
All authors approved the final version of the manuscript to be published.
REFERENCES