Address correspondence to: Benjamin N. Breyer, M.D., M.A.S., F.A.C.S., University of California San Francisco, Zuckerberg San Francisco General Hospital and Trauma Center, 1001 Potrero Suite 3A, San Francisco, CA 94110. Phone: +1 415 206 8805; Fax: +1 415 206 4499.
Department of Urology, University of California San Francisco, San Francisco, CA; Department of Biostatistics and Epidemiology, University of California San Francisco, San Francisco, CA
Objective
To identify the current formats of standardized letters of recommendation (SLORs) used in urology residency applications and to evaluate their characteristics, the distribution of applicants’ ratings, the correlation between SLOR domain ratings and conventional application metrics, and potential biases.
Methods
We evaluated all applications submitted to our residency program for the 2020-2021 urology match. Two main formats of SLOR were identified. We extracted application characteristics and SLOR domain ratings.
Results
Ninety SLORs from 82 applicants were reviewed. Most applicants were rated in the top tiers in both formats. Some domain ratings correlated with application metrics such as Step 1 and Step 2 Clinical Knowledge scores and the percentage of Honors in core clinical clerkships. No statistically significant differences in domain ratings were found between female and male applicants. Alpha Omega Alpha members received higher ratings in the “urology resident potential,” “academic urologist potential,” and “performance as a sub-intern” domains. Applicants from top 40 US medical schools were rated higher for performance as sub-interns and were more likely to be ranked higher at the letter writer's institution. Letters from home institutions were associated with higher ratings in several domains. Ratings after in-person and virtual interactions were similar except in the “communication” domain.
Conclusion
Although it is promising to see this many SLORs submitted for the first time in urology, the current formats could benefit from further refinement of their structure and domains to distinguish more efficiently between highly qualified urology applicants. Given the transition of Step 1 score reporting to a pass/fail outcome, a reliable urology-specific SLOR will be critical.
The United States urology residency match continues to be highly competitive. In the 2021 match, only 357 vacancies were listed for 528 applicants.
Each year, residency programs evaluate applicants on a number of objective metrics, including United States Medical Licensing Examination (USMLE) Step 1 score, grades, scholarly output, and awards, to decide whom to interview and how to rank applicants for the match.
Letters of recommendation (LORs) are another important component of applications. They provide valuable subjective information about an applicant's skill set, qualifications, and potential weaknesses beyond the available metrics.
Historically, urology letter writers have used Narrative Letters of Recommendation (NLORs), which lack a uniform structure, to introduce applicants. Previous studies have found commonly used NLORs to be highly flattering and ambiguous, to contain gender bias, and to have very low interobserver reliability in their interpretation.
Additionally, objective metrics available in Electronic Residency Application Service (ERAS) applications (such as transcripts, USMLE scores, etc.) may not be consistent with superlatives used in NLORs.
To overcome variability and bias in NLORs, Standardized Letters of Recommendation (SLORs) have been introduced for residency applications in several specialties including emergency medicine, otolaryngology, and orthopedic surgery.
When (almost) everyone is above average: a critical analysis of American Orthopaedic Association Committee of Residency Directors standardized letters of recommendation.
SLORs usually contain several domains pertaining to different aspects of an applicant's capabilities and characteristics, as well as a global assessment of the applicant. Compared with NLORs, they have the potential to differentiate applicants more objectively and effectively, reduce gender bias, and offer higher inter-reviewer reliability.
SLORs are also more efficient to write and read than NLORs.
Although the AUA has not officially endorsed a uniform SLOR for candidates seeking urology residency, we observed a substantial increase in the number of SLORs submitted to our residency program in the 2020-2021 match cycle. We aimed to identify the SLOR formats used in urology and to characterize their domain ratings. We also evaluated the correlation between SLOR domain ratings and several components of applicants’ backgrounds, as well as the presence of biases in the current SLOR formats. We hypothesized that a significant ceiling effect would be present among SLOR domains, meaning that letter writers would rate most applicants in the top tiers.
METHODS
Study Design
We retrospectively evaluated all LORs submitted to the corresponding author's residency program during the 2020-2021 match cycle. Two main types of SLOR were identified (Fig. 1A,B). Format 1 was originally proposed by Dr. David Penson, and format 2 was distributed by the Society of Academic Urologists. Letters consisting of either a standalone SLOR or a SLOR integrated into an accompanying narrative letter (hybrid SLOR) were included. This study was reviewed and approved by our institutional review board.
Figure 1. Domains of the 2 SLORs used in urology residency applications (A and B). SLORs, standardized letters of recommendation.
For each SLOR, type and domain ratings were identified and entered into REDCap. Data regarding the type of rotation (home vs away, and in-person vs virtual) were also extracted. The sex and highest academic ranks of letter writers were obtained from the SLORs themselves or letter writers’ institutional websites.
Associated ERAS applications were reviewed to gather information on age, gender, race/ethnicity, USMLE Step 1 and Step 2 Clinical Knowledge (CK) scores, applicant type (ie, MD, DO, international medical graduate [IMG]), and honor society memberships (eg, Alpha Omega Alpha [AOA], Gold Humanism Honor Society [GHHS], etc.). Medical school transcripts and medical student performance evaluations were reviewed to calculate the percentage of Honors in core clinical clerkships. Core clerkships varied slightly among medical schools but mostly included internal medicine, general surgery, pediatrics, obstetrics/gynecology, neurology, psychiatry, and family medicine. Grades were excluded if a pass/fail reporting system was used. Grades affected by the coronavirus disease 2019 (COVID-19) pandemic (ie, grading system switched to pass/fail for rotations held during the pandemic) were excluded as well and percentages were calculated based on the unaffected grades. Medical school rank was obtained from the 2021 US News and World Report according to the research category.
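The Honors calculation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' actual procedure: the `ClerkshipGrade` record and its fields are hypothetical, each grade is assumed to arrive already flagged as pass/fail (whether by school policy or a COVID-19-era switch), and applicants with no tiered grades are assumed to be left out of the calculation rather than counted as 0%.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ClerkshipGrade:
    """One core clerkship grade (hypothetical record layout, for illustration only)."""
    clerkship: str
    grade: str        # e.g. "Honors", "High Pass", "Pass"
    pass_fail: bool   # True if this rotation was reported on a pass/fail scale


def honors_percentage(grades: List[ClerkshipGrade]) -> Optional[float]:
    """Percentage of Honors among tiered (non-pass/fail) core clerkship grades.

    Pass/fail rotations, including those switched to pass/fail during the
    COVID-19 pandemic, are dropped before the percentage is computed.
    Returns None if no tiered grades remain (assumed: applicant excluded).
    """
    tiered = [g for g in grades if not g.pass_fail]
    if not tiered:
        return None
    honors = sum(1 for g in tiered if g.grade == "Honors")
    return 100.0 * honors / len(tiered)


# Hypothetical transcript in which two rotations were switched to pass/fail.
transcript = [
    ClerkshipGrade("Internal medicine", "Honors", False),
    ClerkshipGrade("General surgery", "Honors", False),
    ClerkshipGrade("Pediatrics", "High Pass", False),
    ClerkshipGrade("Obstetrics/gynecology", "Honors", False),
    ClerkshipGrade("Neurology", "Pass", True),
    ClerkshipGrade("Psychiatry", "Pass", True),
]
print(honors_percentage(transcript))  # 75.0
```

Dropping pass/fail rotations from the denominator, rather than counting them as non-Honors, avoids deflating the percentage for students whose schools changed their grading mid-year.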
The number of peer-reviewed publications, poster presentations, oral presentations, and book chapters were recorded for each applicant.
Data Analysis
Descriptive statistics were used to report applicant and SLOR characteristics and SLOR domain ratings. Domain rating scales were converted into ordinal numbers for statistical testing. Normality of continuous data was assessed using the Shapiro–Wilk test.
Pearson's chi-squared test or Fisher's exact test was used for categorical variables. The two-sample t-test, Kruskal–Wallis test, or Wilcoxon rank-sum test was used to compare continuous variables, as appropriate. Statistical significance was set at an alpha level of 0.05. Kendall's rank correlation coefficient tau-b (τb) was calculated to assess the correlation between SLOR domain ratings and application metrics (eg, percentage of Honors grades in core clinical clerkships, USMLE scores, number of publications). We tested for equality of Step 1 scores when comparing domain ratings between subgroups of applicants. All statistical tests were performed using STATA, version 14.1. The STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) statement was followed for the design and reporting of this study.
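Because domain ratings are ordinal and contain many ties, tau-b, which adjusts its denominator for ties in both variables, is well suited to these correlations. A minimal pure-Python sketch of the coefficient follows; it computes only τb, not the P values the authors obtained in STATA, and the example ratings and scores are hypothetical. As a cross-check, SciPy's `scipy.stats.kendalltau` computes tau-b by default.

```python
import math
from itertools import combinations


def _sign(v):
    """Return 1, 0, or -1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau_b(x, y):
    """Kendall's tau-b for paired observations, with ties handled in both variables.

    tau_b = (concordant - discordant) / sqrt(pairs untied in x * pairs untied in y)
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    s = 0          # concordant minus discordant pairs
    untied_x = 0   # pairs whose x values differ
    untied_y = 0   # pairs whose y values differ
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = _sign(x2 - x1), _sign(y2 - y1)
        s += dx * dy
        untied_x += dx != 0
        untied_y += dy != 0
    if untied_x == 0 or untied_y == 0:
        return float("nan")  # undefined when one variable is constant
    return s / math.sqrt(untied_x * untied_y)


# Hypothetical ordinal domain ratings (coded 1 = lowest tier) and Step 1 scores.
ratings = [4, 3, 4, 2, 3]
step1 = [250, 238, 245, 230, 241]
print(round(kendall_tau_b(ratings, step1), 3))  # 0.894
```

The all-pairs loop is quadratic in the number of letters, which is negligible at the scale of this study (90 SLORs).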
A total of 311 applicants applied to our urology residency program for the 2020-2021 match cycle. Among them, 90 SLORs from 82 applicants were identified and included for further review. We excluded two other applicants whose format 2 SLORs used a 5-point scale instead of the 4-point scale to rate the domains. Forty-nine (59.8%) applicants were male, and the mean age was 27 (standard deviation [SD] 1.9) years (Table 1). The mean Step 1 score was 242.5 (SD 13.7). Of 82 applicants, 57 (69.5%) reported a Step 2 CK score, with a mean of 251.8 (SD 11.8). Seventy-six (92.7%) applicants attended an allopathic US medical school. The mean percentage of Honors in core clinical clerkships was 53.8% (SD 34.0%), based on the transcripts and medical student performance evaluations of 56 applicants. In 19/56 (33.9%) applicants, grading for clerkships completed during the COVID-19 pandemic was switched to pass/fail, and those clerkships were not included in the calculation of their Honors percentages. Twenty-one (25.6%) applicants were members of AOA, and 10 (12.2%) were members of GHHS. The median (interquartile range [IQR]) numbers of peer-reviewed publications, poster presentations, and oral presentations were 2 (1-3), 3.5 (2-6), and 1 (0-4), respectively. Table 1 shows applicant characteristics by SLOR format.
Table 1. Urology residency applicant (n = 82) and letter (n = 90) characteristics defined by format of the SLOR

| Characteristic | Format 1 | Format 2 | Total | P value |
|---|---|---|---|---|
| Applicants, n (%) | 60 (73.2) | 22 (26.8) | 82 (100.0) | |
| Gender, n (%) | | | | .15 |
| Female | 27 (45.0) | 6 (27.3) | 33 (40.2) | |
| Male | 33 (55.0) | 16 (72.7) | 49 (59.8) | |
| Age, mean (SD) | 27.1 (2.0) | 26.8 (1.5) | 27.0 (1.9) | .97 |
| Race, n (%) | | | | .47 |
| White | 31 (51.7) | 10 (45.5) | 41 (50.0) | |
| Asian | 12 (20.0) | 4 (18.2) | 16 (19.5) | |
| Black | 6 (10.0) | 4 (18.2) | 10 (12.2) | |
| Hispanic | 9 (15.0) | 1 (4.6) | 10 (12.2) | |
| Other | 1 (1.7) | 1 (4.6) | 2 (2.4) | |
| Unknown | 1 (1.7) | 2 (9.1) | 3 (3.7) | |
| USMLE Step 1 score, mean (SD) | 243.3 (13.9) | 240.3 (13.2) | 242.5 (13.7) | .39 |
| USMLE Step 2 CK score, mean (SD) | 251.7 (12.6) | 252.1 (10.1) | 251.8 (11.8) | .90 |
| Applicant type, n (%) | | | | 1.00 |
| MD | 55 (91.7) | 21 (95.5) | 76 (92.7) | |
| DO | 3 (5.0) | 1 (4.6) | 4 (4.9) | |
| IMG | 2 (3.3) | 0 (0) | 2 (2.4) | |
| Percentage of Honors in core clinical clerkships, mean (SD) | 56.0 (33.5) | 48.4 (35.7) | 53.8 (34.0) | .46 |
| Report of grades affected by COVID-19 pandemic, n (% in those with reported grades) | 16 (40.0) | 3 (18.8) | 19 (33.9) | .13 |
| Honor society membership(s), n (%) | | | | |
| Alpha Omega Alpha | 17 (28.3) | 4 (18.2) | 21 (25.6) | .41 |
| Gold Humanism Honor Society | 7 (11.7) | 3 (13.6) | 10 (12.2) | 1.00 |
| Other | 12 (20.0) | 3 (13.6) | 15 (18.3) | .75 |
| Medical school rank, n (%) | | | | .04* |
| Top 40 | 32 (53.3) | 6 (27.3) | 38 (46.3) | |
| Non-top 40 | 28 (46.7) | 16 (72.7) | 44 (53.7) | |
| Publications, median (IQR) | | | | |
| Peer-reviewed article | 2 (1-3.5) | 2 (1-3) | 2 (1-3) | .97 |
| Poster presentation | 4 (3-8) | 3 (2-4) | 3.5 (2-6) | .11 |
| Oral presentation | 2 (0-4) | 1 (1-2) | 1 (0-4) | .53 |
| Book chapter | 0 (0-0) | 0 (0-0) | 0 (0-0) | .50 |
| SLORs, n (%) | 62 (68.9) | 28 (31.1) | 90 (100.0) | |
| Type of letter, n (%) | | | | <.001* |
| Hybrid SLOR | 59 (95.2) | 13 (46.4) | 72 (80.0) | |
| SLOR alone | 3 (4.8) | 15 (53.6) | 18 (20.0) | |
| SLOR is from, n (%) | | | | <.001* |
| Home institution | 53 (85.5) | 11 (39.3) | 64 (71.1) | |
| Away institution | 9 (14.5) | 17 (60.7) | 26 (28.9) | |
| SLOR is the result of a/an, n (%) | | | | <.001* |
| In-person interaction | 62 (100.0) | 19 (67.9) | 81 (90.0) | |
| Virtual interaction | 0 (0) | 9 (32.1) | 9 (10.0) | |

CK, clinical knowledge; IMG, international medical graduate; IQR, interquartile range; SD, standard deviation; SLOR, standardized letter of recommendation; USMLE, United States medical licensing examination.
Statistically significant P values (P < .05) are marked with an asterisk.
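To illustrate how the categorical comparisons in Table 1 can be computed, the sketch below applies Pearson's chi-squared test without continuity correction to two of its 2 × 2 comparisons, using the counts reported in the table. That it reproduces the reported P values of .15 (gender) and .04 (medical school rank) is consistent with, but does not prove, the uncorrected test having been used; when expected cell counts are small, Fisher's exact test, also listed in the Methods, would be preferred. The function name `pearson_chi2_2x2` is our own.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, P value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 df, the chi-squared upper-tail probability equals erfc(sqrt(stat / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Gender by SLOR format: rows female/male, columns format 1/format 2.
stat, p = pearson_chi2_2x2(27, 6, 33, 16)
print(f"Gender: chi2 = {stat:.2f}, P = {p:.2f}")  # Gender: chi2 = 2.10, P = 0.15

# Medical school rank by SLOR format: rows top 40/non-top 40.
stat_rank, p_rank = pearson_chi2_2x2(32, 6, 28, 16)
print(f"School rank: chi2 = {stat_rank:.2f}, P = {p_rank:.2f}")  # School rank: chi2 = 4.40, P = 0.04
```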
Of 90 SLORs, 62 (68.9%) were in format 1 and 28 (31.1%) were in format 2. Most format 1 letters (59, 95.2%) were hybrid SLORs, whereas about half (15, 53.6%) of format 2 letters were SLORs alone (P <.001). Fifty-three (85.5%) format 1 letters and 11 (39.3%) format 2 letters were from home rotations (P <.001). All format 1 SLORs resulted from in-person rotations, whereas 9 (32.1%) format 2 SLORs came from virtual rotations performed during the COVID-19 pandemic (P <.001). The vast majority of applicants in both formats fell in the highest tiers for almost all domains (5/6 in format 1 and 9/9 in format 2), the exception being the “likely rank position at letter writer's institution” domain of format 1, whose median equals the middle tier. Figure 2 shows the distribution of domain ratings for both SLOR formats as box plots.
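A ceiling effect like this can be quantified per domain by reporting, alongside the median, the share of letters that place the applicant in the highest tier. The sketch below does this for hypothetical ratings on a 4-point scale coded 1 (lowest) to 4 (highest); neither the ratings nor the `ceiling_summary` helper come from the study.

```python
from statistics import median


def ceiling_summary(ratings, top_tier):
    """Return (median rating, share of ratings equal to the top tier)."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    share_top = sum(r == top_tier for r in ratings) / len(ratings)
    return median(ratings), share_top


# Hypothetical ratings for two domains on a 4-point scale (4 = highest tier).
hypothetical = {
    "Communication": [4, 4, 3, 4, 4, 3, 4, 2, 4, 4],
    "Knowledge base": [3, 3, 4, 2, 3, 3, 4, 3, 2, 3],
}
for domain, domain_ratings in hypothetical.items():
    med, share = ceiling_summary(domain_ratings, top_tier=4)
    print(f"{domain}: median {med}, {share:.0%} in top tier")
# Communication: median 4.0, 70% in top tier
# Knowledge base: median 3.0, 20% in top tier
```

When most medians already sit in the highest tiers, the top-tier share still separates domains that a median alone cannot.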
Figure 2. Box-and-whisker plots showing the distribution of SLOR domain ratings for (A) format 1 and (B) format 2. Box lines represent the 25th, 50th, and 75th percentiles. Whisker lines represent the 5th to 95th percentile range, and dots represent outliers outside this range. SLOR, standardized letters of recommendation. (Color version available online.)
Correlations Between Letter Domains and Application Metrics
Among format 1 domains, “academic potential” was significantly correlated with the number of peer-reviewed papers (τb = 0.32, P <.01) and poster presentations (τb = 0.27, P = .01). Among format 2 domains, “knowledge base” was significantly correlated with Step 1 score (τb = 0.32, P = .04) and percentage of Honors (τb = 0.37, P = .04), and “technical aptitude” was significantly correlated with percentage of Honors (τb = 0.59, P <.01). Other tested correlations are presented in Supplementary Table 1.
Impact of Applicant and Letter Characteristics on Domain Ratings
Female applicants had higher mean ratings than male applicants in all format 1 domains except “performance relative to other sub-interns”; however, these differences were not statistically significant. Among format 2 domains, male applicants received higher ratings than their female counterparts in more domains, but again these differences were not statistically significant (Table 2). In format 1, applicants self-identifying as white received higher ratings than all other applicants in the “potential as a urology resident” (4.5 vs 3.9, P = .01) and “performance as a sub-intern” (4.4 vs 3.7, P = .01) domains. Of note, in this subgroup analysis, white applicants with format 1 SLORs also had significantly higher Step 1 scores than applicants of other races. Differences by race in format 2 domains were not statistically significant (Table 2).
Table 2. Comparison of domain ratings based on gender and race of applicants

| Domain | Female | Male | P value | White | Other | P value |
|---|---|---|---|---|---|---|
| Format 1 | | | | | | |
| Potential as a urology resident | 4.3 | 4.1 | .69 | 4.5 | 3.9 | .01* |
| Potential as an academic urology attending | 3.9 | 3.7 | .68 | 4.0 | 3.4 | .09 |
| Performance as a sub-intern | 4.2 | 4.0 | .46 | 4.4 | 3.7 | .01* |
| Urologic knowledge base | 2.6 | 2.2 | .16 | 2.5 | 2.3 | .35 |
| Performance relative to other sub-interns | 3.3 | 3.4 | .64 | 3.6 | 3.1 | .13 |
| Likely rank position | 3.4 | 2.9 | .21 | 3.3 | 3.1 | .58 |
| Format 2 | | | | | | |
| Communication | 3.4 | 3.2 | .31 | 3.3 | 3.2 | .68 |
| Professionalism | 3.2 | 3.5 | .46 | 3.4 | 3.4 | .86 |
| Team player | 3.1 | 3.3 | .50 | 3.3 | 3.2 | .76 |
| Teachability/response to feedback | 3.2 | 3.3 | .89 | 3.5 | 3.1 | .21 |
| Technical aptitude | 2.9 | 2.9 | .87 | 3.1 | 2.8 | .27 |
| Leadership potential | 3.0 | 2.9 | .68 | 3.2 | 2.8 | .26 |
| Knowledge base | 2.8 | 3.2 | .31 | 3.4 | 2.8 | .11 |
| Other stakeholder assessments | 3.0 | 3.0 | 1.00 | 3.1 | 2.8 | .41 |
| Overall rank of candidate | 3.3 | 2.9 | .23 | 3.2 | 2.9 | .35 |

Values are mean domain ratings. Of note, white applicants in format 1 had significantly higher Step 1 scores than applicants of other races in format 1.
Statistically significant P values (P < .05) are marked with an asterisk.
Among format 1 domains, AOA members and nonmembers differed significantly in urology resident potential (4.7 vs 4.0, P <.01), academic urologist potential (4.3 vs 3.6, P = .04), and performance as a sub-intern (4.5 vs 3.9, P = .03). GHHS members were rated significantly higher only in the “overall rank of candidate” domain (3.8 vs 2.9, P = .02). Applicants from the top 40 US medical schools performed better as sub-interns (4.3 vs 3.8, P = .04) and were more likely to be ranked higher at the letter writer's institution (3.6 vs 2.8, P = .02).
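Because domain ratings are ordinal, subgroup comparisons such as AOA members vs nonmembers can be made with the Wilcoxon rank-sum test listed in the Methods. The article does not state which test produced each P value above, so the following is only a sketch of the rank-sum test (two-sided, normal approximation with a tie correction, no continuity correction) applied to hypothetical ratings; `rank_sum_test`, `average_ranks`, and the data are illustrative, and statistical packages may differ in their defaults.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test; returns (rank sum of x, z, P)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one observation")
    n = n1 + n2
    pooled = list(x) + list(y)
    w = sum(average_ranks(pooled)[:n1])
    mean_w = n1 * (n + 1) / 2
    # Tie-corrected variance: each group of t tied values contributes t**3 - t.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w <= 0:
        raise ValueError("all observations are tied; the test is undefined")
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    return w, z, p


# Hypothetical "performance as a sub-intern" ratings for two applicant groups.
members = [5, 4, 5, 4, 5]
nonmembers = [3, 4, 3, 2, 4]
w, z, p = rank_sum_test(members, nonmembers)
print(f"W = {w}, z = {z:.2f}, P = {p:.3f}")  # W = 38.0, z = 2.30, P = 0.021
```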
Letters from applicants’ home institutions were rated higher than letters from away institutions in the “potential as an academic urology attending” (3.9 vs 2.9, P = .02), “performance as a sub-intern” (4.2 vs 3.4, P = .01), “performance relative to other sub-interns” (3.7 vs 2.1, P <.01), and “likely rank position” (3.3 vs 2.0, P = .03) domains of format 1 (Supplementary Table 2). No significant differences were observed among format 2 domains. Additionally, applicants who interacted with letter writers in person received significantly higher ratings in the communication domain (3.4 vs 2.9, P = .04).
Domain ratings in both formats remained stable with increasing depth of interaction between the applicant and letter writer (Supplementary Table 3). The academic rank of letter writers was not associated with domain ratings where the number of observations was sufficient to run statistical tests (Supplementary Table 4). Letters from female writers had higher means in all domains of format 1 and 6 out of 9 domains of format 2 SLORs; however, these differences were not statistically significant (Supplementary Table 4).
DISCUSSION
To our knowledge, this is the first study to investigate the utility of SLORs in urology residency applications. Although SLOR use increased substantially this match cycle, 74% of applicants in our cohort had NLORs only. We also found a marked ceiling effect, with most applicants rated in the top tiers in both formats of urology SLOR (Fig. 2). This tight distribution of domain ratings may make it difficult for programs to rely on SLORs to distinguish between applicants. Similar findings have been reported for SLORs in emergency medicine, orthopedic surgery, and otolaryngology, suggesting that the ceiling effect may be a pervasive drawback of SLORs, as it is of NLORs.
This phenomenon may have several explanations. First, given the competitiveness of the urology match, applicants are highly self-selected and typically outstanding among their peers. Second, medical students tend to obtain LORs from attendings who are known for writing strong letters or who view their candidacy favorably. Third, Likert-like scales, which both current urology SLOR formats use, may reduce the observed variability of ratings compared with wider scales (eg, 0-100 quantitative scales). Lastly, the features that distinguish urology applicants may not be captured by the current SLOR domains or may be difficult to rate on a Likert-like scale.
Therefore, urology SLORs could be improved by adding domains that cover other aspects of an applicant. Commitment to urology, initiative and drive, and research capabilities are examples of domains that could enhance existing urology SLOR templates. Moreover, we suggest using quantitative scales with a wider range to increase the variability of ratings in future urology SLORs. Letter writers should also be instructed to base their ratings on a comparison with the pool of urology applicants rather than with all fourth-year medical students.
This study found few meaningful correlations between ERAS application metrics and corresponding SLOR domain ratings; those found mainly involved USMLE scores, percentage of Honors grades in core clinical clerkships, and number of publications/poster presentations (Supplementary Table 1). Although some correlations approached our predefined significance level, we found no correlation in some expected areas (eg, urologic knowledge base vs Step 1/Step 2 CK scores; performance as a sub-intern vs % of Honors; overall rank vs Step 1 score). This could be due to the relatively small number of observations in some of the tested groups. Notably, in about a third of this year's transcripts, the grading system for core clinical clerkships was switched to pass/fail because of the COVID-19 pandemic, and some medical schools routinely report grades as pass/fail. These factors may partly explain the relatively weak correlations between Honors grades and domain ratings in our study.
A recent linguistic analysis of urology NLORs showed that letters written for match-successful applicants contained more power words, as did NLORs written for male applicants; the authors found an implicit gender bias in urology NLORs.
Because they rate specific applicant characteristics on predefined scales, SLORs have been found in prior studies in other specialties to reduce gender bias.
This is of utmost importance in a field like urology, where gender imbalance is a real concern. Consistent with those studies, we found no significant gender differences in any domain of either SLOR format, and the means of most domain ratings were slightly higher for female applicants than for their male peers. We also looked for other potential biases in the current SLOR formats. Domain ratings did not differ significantly by depth of interaction between letter writers and applicants or by gender of letter writers. By race, white applicants received higher ratings in two format 1 domains, although they also had significantly higher Step 1 scores in that subgroup, and no differences by race were found in format 2. To our knowledge, racial bias has not previously been evaluated in urology NLORs; therefore, we cannot yet determine whether SLORs reduce this important bias among urology applicants. We also found that format 1 SLORs from home institutions had significantly higher ratings in several domains. With an increasing number of virtual sub-internships this year, there was growing concern about whether that type of interaction would affect letter writers’ perceptions of applicants. In our data, domain ratings did not differ significantly between in-person and virtual interactions, except in the “communication” domain, which is understandable in the virtual context.
In February 2020, the USMLE parent organizations announced a transition in Step 1 score reporting from a 3-digit numeric score to pass/fail.
This decision was made after several months of debate, and the change will take effect no sooner than January 2022. Previous studies have emphasized the role of the Step 1 score in the urology match; more than 80% of urology programs used a Step 1 score cutoff to screen applicants for interviews.
It was also one of the predictors of a successful match into urology to the extent that the odds of matching multiplied by 1.5 for every 10-point increase in the Step 1 score.
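Because the reported effect is an odds ratio per 10 points, it compounds multiplicatively and changes odds rather than probabilities. The sketch below assumes a logistic model in which Step 1 score enters linearly (an assumption; the cited model's exact specification is not given here) and uses a hypothetical 50% baseline probability of matching. The function names are our own.

```python
def odds_multiplier(score_increase, or_per_10_points=1.5):
    """Multiplicative change in the odds of matching for a given score increase,
    assuming a logistic model that is linear in Step 1 score."""
    return or_per_10_points ** (score_increase / 10)


def updated_probability(baseline_probability, score_increase, or_per_10_points=1.5):
    """Convert a baseline probability to odds, apply the odds ratio, and convert back."""
    if not 0 < baseline_probability < 1:
        raise ValueError("baseline_probability must be strictly between 0 and 1")
    odds = baseline_probability / (1 - baseline_probability)
    new_odds = odds * odds_multiplier(score_increase, or_per_10_points)
    return new_odds / (1 + new_odds)


print(odds_multiplier(20))                     # 2.25
print(round(updated_probability(0.5, 20), 3))  # 0.692
```

Under these assumptions, a 20-point higher score multiplies the odds by 2.25, which moves a hypothetical 50% chance of matching to about 69%, not to 112.5%.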
The Step 2 CK score could serve as a surrogate, given the more clinical nature of this examination. However, many applicants have not taken Step 2 CK when they submit their residency applications, and this proportion is likely higher among urology applicants because urology has an early match. In our study, only 69.5% of applicants reported a Step 2 CK score. With the upcoming transition of Step 1 to pass/fail reporting, there is a growing call for another equitable and effective screening tool. A recent study indicated that most urology program directors (84.6%) believe this change will make it more difficult to compare applicants objectively.
Taken together, these findings highlight the importance of introducing and widely adopting other quantitative measures, such as SLORs, in urology residency applications. A cultural shift toward more reliable, honest, and fair evaluation of applicants is also crucial to the success of this new assessment tool.
We acknowledge several limitations of our study. First, because urology attendings still use SLORs sparingly, the number of observations in some subgroups was relatively small. Nevertheless, our findings could pave the way for further research and for the development of more effective urology-specific SLORs, and could promote the widespread use of such letters by urology attendings.
Second, we determined whether a rotation was in person or virtual from the letter writers' descriptions, because the current formats have no domain for this aspect of the rotation. Therefore, some SLORs from virtual rotations may have been inadvertently classified as in person. Third, we compared honor society members with nonmembers; this comparison would have been more meaningful had we been able to exclude applicants from medical schools without honor society chapters, but some honor societies hold elections during the senior year, which complicated such exclusions. Finally, the SLORs of applicants to our program may not be generalizable to other programs.
CONCLUSION
The use of SLORs in urology residency applications is still limited compared with other specialties. The lack of variability among domain ratings in the current SLOR formats makes it hard to differentiate between highly qualified urology applicants. This study noted some correlations between conventional application metrics and SLOR domain ratings, mostly in format 1 SLORs. Notably, no significant differences were found between female and male applicants. However, format 1 SLORs from home institutions were rated higher than those from away institutions in several domains. Given the transition of USMLE Step 1 score reporting to a pass/fail outcome, a reliable urology-specific SLOR will be critical for the field to select the most qualified applicants to enter this competitive surgical subspecialty.