Conditions associated with rare genetic diseases are largely underrepresented [26, 27] in commonly used clinical terminologies such as ICD-9 and ICD-10. The problem persists in the latest version of the International Classification of Diseases (ICD-11) [28], where conditions associated with genetic diseases are categorized in counterintuitive ways, generalized too broadly, or not defined at all [29]. In this work, we have demonstrated the feasibility of using a computational phenotype across multiple institutions to identify patients who satisfy Cleveland Clinic criteria and who may therefore benefit from PTEN genetic analysis. The positive predictive value of this approach exceeded 80% at each of the three sites, suggesting that an informatics approach may be able to work around the absence of an explicit code for “PTEN hamartoma tumor syndrome” in the ICD-9/10 coding systems.
We also evaluated the percentage of patients who had PHTS out of all patients identified by this informatics approach as satisfying Cleveland Clinic criteria. Although this percentage was low across the three sites (between 0 and 3.5%), several factors account for it. First, it may reflect the very low prevalence of PHTS, which according to one estimate is 1:200,000 [30]. Second, the denominator includes patients who did not undergo any genetic testing in the first place. Third, not every patient who underwent genetic testing had a test that included PTEN sequencing.
These percentages are higher (15.6%, 30.2%) when the denominator is further limited to patients whose genetic testing would have captured PTEN variants. It is worthwhile to compare this higher range (i.e., PTEN molecular diagnosis among those identified by the informatics approach whose genetic testing included detection of PTEN variants) with the diagnostic yield of PTEN testing reported in prior studies of different cohorts. For example, in the original data serving as the basis for the Cleveland Clinic pediatric PTEN criteria, 92 pediatric patients met relaxed International Cowden Consortium operational criteria for CS [23], of whom 28 (30.4%) had a PTEN mutation. In a retrospective study of the percentage of patients with a confirmed PTEN mutation among different pediatric cohorts, PHTS was found in 2/14 (14.3%) of those with ASD and macrocephaly, 3/13 (23.1%) of those with ASD, developmental delay/ID, and macrocephaly, and 6/32 (18.8%) of those with developmental delay/ID and macrocephaly [31]. Hence, the informatics approach used in our study not only shows promise in identifying those who may meet Cleveland Clinic PTEN criteria but also underscores that many patients who may have benefited from genetic testing did not undergo it. This is evident from the large percentage of patients in our study who were identified by the informatics approach as having met Cleveland Clinic PTEN criteria but who either did not have genetic testing or had genetic testing that did not include analysis of PTEN variants.
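The comparison above rests on simple proportions whose value depends on the denominator used. As a minimal sketch, the yields from the prior studies can be recomputed from the counts quoted in the text. The function and variable names below are illustrative assumptions, not part of the study's pipeline, and no site-level counts from the present study are used.

```python
# Counts as quoted in the text: (patients with a PTEN mutation, cohort size).
# The labels are shorthand for the cohorts described in [23] and [31].
PRIOR_COHORTS = {
    "Relaxed ICC criteria, pediatric [23]": (28, 92),
    "ASD + macrocephaly [31]": (2, 14),
    "ASD + DD/ID + macrocephaly [31]": (3, 13),
    "DD/ID + macrocephaly [31]": (6, 32),
}


def yield_percent(positives, denominator):
    """Return positives/denominator as a percentage, rounded to one decimal."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    if not 0 <= positives <= denominator:
        raise ValueError("positives must lie between 0 and the denominator")
    return round(100 * positives / denominator, 1)


def summarize(cohorts):
    """Map each cohort label to its diagnostic yield in percent."""
    return {label: yield_percent(p, n) for label, (p, n) in cohorts.items()}


if __name__ == "__main__":
    for label, pct in summarize(PRIOR_COHORTS).items():
        print(f"{label}: {pct}%")
```

The same function applies to either denominator discussed above (all patients flagged by the phenotype, or only those whose testing covered PTEN), which makes explicit why restricting the denominator raises the reported percentage.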
The approach taken here across three academic research centers could be applied at other institutions around the country to identify patients who would benefit from PTEN sequencing. Furthermore, similar computational phenotypes can be developed and tested for other rare genetic disorders. For example, if a clinician is evaluating a patient for whom only electronic health records are available, a computational phenotype could help delineate a phenotype caused by a particular gene defect.
Limitations
The limitations of this informatics approach for detecting patients who met Cleveland Clinic criteria for PTEN testing are evident in the false positives, that is, patients who met Cleveland Clinic criteria by the informatics approach but who, on review of the medical records, did not actually meet them. A large contributing factor is that billing codes may not accurately or completely capture the clinical phenotype. Billing codes may also simply be inaccurate. For instance, in some cases, providers coded patients as having developmental delay when the clinical documentation specifically mentioned “normal development.” There can also be a mismatch between the actual clinical information and the intention behind a billed ICD code. For example, one patient lost speech postoperatively but later regained it; the provider coded this as expressive language disorder, perhaps because a more suitable billing code could not be identified.
Coding systems such as ICD-10 and ICD-9 were developed primarily for administrative purposes [32]. Given the lack of precise clinical codes for genetic diseases and their symptoms, errors in coding can be difficult to avoid [33]. Studies have revealed widespread inconsistencies in the precision of billing codes in capturing clinical symptoms [34, 35]. In other words, though it is feasible to use billing codes to ascertain Cleveland Clinic criteria, there is a need for improved precision of clinical codes in capturing clinical phenotype diversity to address this limitation. Deep phenotyping [36, 37], using finer-grained representations of disease phenotypes as defined in terminologies such as the Human Phenotype Ontology (HPO) [38] and SNOMED CT [39], is essential for precise characterization and phenome-based diagnosis of rare diseases such as PHTS.
There were several additional limitations. First, we did not analyze whether patients identified as having met Cleveland Clinic criteria, and whose charts were reviewed and confirmed to meet those criteria, had another clinical reason to suspect a diagnosis other than PHTS. Second, we did not ascertain whether macrocephaly was truly present, owing to inconsistent availability and documentation of head circumference. This may help account for the low fraction of individuals fulfilling Cleveland Clinic criteria who have pathogenic PTEN variants. For example, at the BCH site, we identified one patient with PHTS, macrocephaly, and related dermatological findings who would have fulfilled Cleveland Clinic criteria, but macrocephaly was not billed as a diagnosis. Third, we did not limit EMR data to data recorded before the diagnosis (given that a patient's diagnosis would influence which clinical features are referenced in the notes), since it was not straightforward to ascertain age at diagnosis (though report date is one possibility, patient and provider knowledge of the diagnosis may lag). Finally, we did not have the data to evaluate race, ethnicity, or social vulnerability index. On review of data from the BCH site, nearly 60% of the patients identified as having met Cleveland Clinic criteria using the informatics approach were white, suggesting that minorities were underrepresented, which limits generalizability. This point underscores the continued need for attention to inclusion and diversity in ongoing research efforts, especially to the question of why minorities are underrepresented in research databases and clinical encounters.