Modeling familial predictors of proband outcomes in neurogenetic disorders: initial application in XYY syndrome

Background: Disorders of gene dosage can significantly increase risk for psychopathology, but outcomes vary greatly amongst carriers of any given chromosomal aneuploidy or sub-chromosomal copy number variation (CNV). One potential path to advance precision medicine for neurogenetic disorders is modeling penetrance in probands relative to observed phenotypes in their non-carrier relatives. Here, we seek to advance this general analytic framework by developing new methods in application to XYY syndrome, a sex chromosome aneuploidy that is known to increase risk for psychopathology.
Methods: We analyzed a range of cognitive and behavioral domains in XYY probands and their non-carrier family members (n = 58 families), including general cognitive ability (FSIQ), as well as continuous measures of traits related to autism spectrum disorder (ASD) and attention deficit hyperactivity disorder (ADHD). Proband and relative scores were compared using covariance, regression and cluster analysis. Comparisons were made both within and across traits.
Results: Proband scores were shifted away from family scores with effect sizes varying between 0.9 and 2.4 across traits. Only FSIQ and vocabulary scores showed a significant positive correlation between probands and their non-carrier relatives across families (R² ~ 0.4). Variability in family FSIQ also cross-predicted variability in proband ASD trait severity. Cluster analysis across all trait-relative pairings revealed that variability in parental psychopathology was more weakly coupled to their XYY versus their euploid offspring.
Conclusions: We present a suite of generalizable methods for modeling variable penetrance in aneuploidy and CNV carriers using family data. These methods update estimates of phenotypic penetrance for XYY and suggest that the predictive utility of family data is likely to vary for different traits and different gene dosage disorders.
Trial registrations: ClinicalTrials.gov NCT00001246, “89-M-0006: Brain Imaging of Childhood Onset Psychiatric Disorders, Endocrine Disorders and Healthy Controls.” Date of registry: 01 October 1989.
Supplementary Information: The online version contains supplementary material available at 10.1186/s11689-021-09360-7.


Background
Disorders of gene dosage, ranging from aneuploidies to copy number variations (CNVs), are increasingly recognized as high-impact genetic risk factors for neuropsychiatric disease [1]. Recurrent pathogenic gene dosage disorders have been associated with increases in risk for several neuropsychiatric phenotypes, including autism spectrum disorder (ASD), bipolar disorder, and schizophrenia [2]. However, there is also strikingly high variability within carriers of any given aneuploidy [XYY [3]] or CNV [16p11.2 deletion [4], 22q11.2 deletion [5]], especially when carriers are identified through population-based versus clinical sampling frameworks [6,7]. In particular, clinical sampling, as compared to population-based sampling, is influenced by referral and/or ascertainment biases that can result in an enrichment for greater phenotypic severity. The broad range of outcomes within aneuploidy and CNV disorders poses complex questions regarding the sources of phenotypic variation, and also frustrates personalized medicine approaches by making it difficult to predict outcomes in new cases. There has been growing interest in addressing this challenge by using family-based study designs to improve the prediction of penetrance for individual carriers, and to better understand sources of outcome variance across carriers.
Improved prediction of penetrance for individual carriers of rare genetic disorders is of great importance because information about the penetrance of a given genetic disorder (for example as used in a genetic counseling context) is typically based on reported phenotypic averages in clinical groups. However, given that there is typically significant phenotypic variability amongst individuals with any specific genetic disorder [4], a critical question for both families and clinicians relates to the likely phenotypic outcome for a given affected individual, which will reflect consequences of the genetic disorder as well as the genetic and environmental background upon which the disorder is occurring.
A recent study used family data to model outcomes in probands carrying a 16p11.2 deletion [4]. The authors reported significant moderate intraclass correlation between parent and proband scores for IQ, autism-related traits measured using the Social Responsiveness Scale (SRS [8]), and motor dexterity. This work demonstrates the potential for family-based study designs to improve outcome prediction in aneuploidy and CNV carriers. Important open questions remain, however, regarding: (i) whether the utility of family data in predicting proband outcomes might vary for different traits and different genetic disorders, (ii) whether there is cross-trait correspondence between family traits and proband outcomes, such as familial cognitive ability predicting proband psychopathology, (iii) whether one can boost prediction by adding data on early perinatal health to family trait measures, and (iv) whether trait correspondence between parents and offspring is itself modified by aneuploidy or CNV carriage.
Our current study develops and implements new analytic tools to address these open questions using data from XYY syndrome as a proof-of-principle first application, although the approaches presented are explicitly designed to be generalizable to any aneuploidy or CNV disorder. XYY syndrome arises due to carriage of an extra Y chromosome in males and is associated with increased height, decreased non-verbal and verbal IQ, and elevated risk for learning difficulties and social impairment [3,9]. As a genetically defined condition that is associated with highly variable outcomes across individuals, XYY syndrome can be considered a paradigmatic example of a genomic dosage disorder that can impact human development. Our study design seeks to examine neurobehavioral trait correspondence between XYY probands and their non-carrier first-degree relatives (parents and siblings) using four complementary analytic strategies.
First, we apply the standard approach of estimating Pearson correlation coefficients between XYY probands and their first-degree relatives for 12 different traits spanning cognitive and neuropsychiatric domains (see "Methods" section). This analysis gives useful trait-specific information, but also provides the broadest view to date regarding potential differences across neurobehavioral traits in observed covariation between carrier probands and their unaffected first-degree relatives. Second, we complement standard correlation analysis by also using a regression framework [10] to model proband-family interrelationships for each trait. Third, by extending this regression framework to a multivariate model, we test if knowledge of socioeconomic and perinatal variables can provide added prediction of variation in proband traits above and beyond knowledge of the trait value in unaffected family members. Given the well-established relationships between interindividual variation in IQ and psychopathology [11,12], we also use multiple linear regression to test if family FSIQ is significantly associated with proband variation in non-FSIQ traits. Finally, we systematically map all cross-trait correlations across parent-proband, parent-sibling, and proband-sibling pairings to (i) directly capture differences in trait coherence between parent-proband and parent-sibling pairs, and (ii) find groupings of traits that show shared covariation in families. Inclusion of unaffected siblings provides a natural comparison for observed trait correspondence between parents and XYY probands in order to assess whether carriage of an extra Y chromosome may not only alter phenotypes relative to family background, but also modify the concordance of trait variability across parents and their offspring.
The analytic framework presented is not only applicable to better understanding the sources of variation and predicting child-family outcomes in XYY syndrome, but it can also be used to determine the magnitude and variability of phenotypes associated with other aneuploidy and CNV disorders. As discussed below, our findings carry implications for the use of family data to best predict outcomes in offspring carrying high-impact genetic risks for atypical development.

Participants
Our study population consisted of 58 families all defined through an index singleton XYY proband. Participants were recruited through the Association for X and Y Chromosome Variations [13] and the NIH Clinical Center Office of Patient Recruitment. The primary inclusion criterion was presence of a cytogenetically confirmed non-mosaic XYY karyotype in the proband and proband age between 5 and 25 years. Probands with very low birth weight were excluded given that this is not a recognized phenotypic association of XYY syndrome [14] and represents an alternative source of proband impairment that is not central to our analysis. We sought to include any biological parents and male siblings that accompanied the proband at enrollment. We restricted recruitment to male siblings to exclude sex as a source of outcome discordance between carrier probands and their siblings. Phenotypic data were available for unaffected full biological relatives as follows: proband mothers in 57 families, proband fathers in 34 families and proband siblings in 24 families (Table 1, Additional file 1). Informed consent (or assent where appropriate) was obtained from all study participants. All study procedures were approved by an NIH Institutional Review Board.

Testing and questionnaires
Group sizes ranged from 23 to 58 individuals per domain of measurement (see Additional file 2 for details). We included the following cognitive traits as measured using age-appropriate Wechsler scales: Full-Scale Intelligence Quotient (FSIQ; values scaled to have a population mean of 100 and a standard deviation (s.d.) of 15), and vocabulary and matrix reasoning scores (scaled mean of 10 and a s.d. of 3). ASD-related traits were measured using the Social Responsiveness Scale Second Edition questionnaire (SRS-2 [15]). For this study, we analyzed the overall SRS-2 total score, scores for each of five treatment subscales (Social Awareness, Social Cognition, Social Communication, Social Motivation, and Restricted Interests and Repetitive Behavior), and one DSM-5 compatible subscale score (Social Communication and Interaction). SRS-2 T-scores have a scaled mean value of 50 (s.d. 10), with higher scores indicating greater impairment. ADHD-related traits were measured using two linked scales that together bridge the wide age range of our study participants. Specifically, DSM scales for ADHD inattentive and ADHD hyperactive-impulsive symptoms were drawn from the Conners 3-Parent for participants under 18 years of age, and from the Conners' Adult ADHD Rating Scales-Self-Report: Long Version (CAARS-S:L) for participants 18 years of age and older [16,17]. Results are reported as T-scores (scaled mean of 50 and s.d. of 10), with higher scores indicating greater impairment. We used a standardized clinical interview to gather data in probands on key perinatal variables that have previously been linked to neurodevelopmental outcomes: birth weight, gestation time, and maternal age [18-20]. We also included the sociodemographic questionnaire developed by the MacArthur Network, as socioeconomic status (SES) has also been linked to neurodevelopmental outcomes in the general population [21].

Statistical analysis
To allow family-proband trait comparisons using all available families for all traits, we derived a summary "family score" for each trait by averaging scores from all available unaffected relatives (i.e., mother, father, sibling). For each of 12 traits, we first calculated proband-family trait correspondence across families using Pearson correlation coefficients. Next, we estimated trait-specific linear regression models to estimate an offset and slope for prediction of proband scores as a function of family scores (Fig. 1).
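The family-score construction and the first-pass correlation can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the scores are invented, and the names `families`, `family_score`, and `pearson_r` are hypothetical.

```python
from math import sqrt
from statistics import mean

# Hypothetical scores for one trait (not study data).
# None marks a relative for whom no data were collected.
families = [
    {"proband": 80, "mother": 100, "father": 104, "sibling": None},
    {"proband": 95, "mother": 112, "father": None, "sibling": 110},
    {"proband": 70, "mother": 92, "father": 90, "sibling": 97},
    {"proband": 102, "mother": 118, "father": None, "sibling": None},
]

def family_score(fam):
    """Average the trait over whichever unaffected relatives have data."""
    scores = [fam[k] for k in ("mother", "father", "sibling") if fam[k] is not None]
    return mean(scores)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

family_scores = [family_score(f) for f in families]
proband_scores = [f["proband"] for f in families]
r = pearson_r(family_scores, proband_scores)
print(family_scores, round(r, 3))
```

Because the family score averages over however many relatives participated, families contribute scores built from one, two, or three relatives.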

Fig. 1
Annotated sample plot for simple linear regression. The univariate regression framework predicts proband outcome as a function of the family score for that measure. The model provides an offset (the difference between average family and average proband scores) and compares the regression slope to a horizontal line (m = 0) and an identity line (m = 1). This plot is for visualization purposes only and does not include any data from the study.

For each trait, models were run after centering proband and family scores at the mean family score so that model intercepts would estimate the average score offset in probands relative to their unaffected family members (i.e., the penetrance of XYY for the trait in question). The regression framework also quantified trait-specific regression slopes, given by the β1 coefficient on the family score, which were compared against two reference values: 1 (the slope if proband scores scale linearly with family scores and show a stable offset across the range of observed family scores), and 0 (the absence of any significant linear relationship between proband and family scores). Because these regressions were calculated without prior standardization of proband and family scores, the observed slopes are not identical to the Pearson correlation between proband and family scores. Moreover, unlike correlation analyses, regression models (i) quantify the expected shift in proband scores for a unit shift in family scores in the unit of trait measurement, (ii) provide a means of testing if this regression coefficient differs from a pre-specified value of interest (Fig. 1), and (iii) provide a model for estimating the predicted proband score at a family score of interest (by definition, predicted proband offsets vary by family scores when the observed regression slope differs from 1).
Given the small sample size of our study, and expectations from prior work that effect sizes for offsets would be significantly larger than those for the β1 coefficient, we considered the β1 coefficient tests to be exploratory in nature and in particular need of replication in larger samples. Furthermore, because we examine these slopes in a mass univariate fashion, our analyses do not explicitly test for differences in slopes for different traits.
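The centered regression described above can be sketched in a few lines. The example below is an illustration under stated assumptions rather than the study's implementation: it fits ordinary least squares of proband scores on family scores, with both shifted by the mean family score, and returns the offset, the slope, and t-statistics for the slope against the two reference values (0 and 1). The function name `centered_regression` and the data are hypothetical. Converting the t-statistics to p-values needs a t distribution with n − 2 degrees of freedom, which is left to a statistics library.

```python
from math import sqrt
from statistics import mean

def centered_regression(family, proband):
    """OLS of proband on family scores, both centered at the mean family score.

    With this centering the intercept estimates the average proband offset
    relative to unaffected family members.
    """
    n = len(family)
    centre = mean(family)
    x = [f - centre for f in family]   # centered family scores (mean zero)
    y = [p - centre for p in proband]  # proband scores on the same origin
    sxx = sum(v * v for v in x)
    slope = sum(a * b for a, b in zip(x, y)) / sxx
    intercept = mean(y)                # valid because x has mean zero
    resid = [b - (intercept + slope * a) for a, b in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - 2)  # residual variance
    se_slope = sqrt(s2 / sxx)
    return {
        "offset": intercept,
        "slope": slope,
        "t_vs_0": slope / se_slope,        # H0: no linear relationship
        "t_vs_1": (slope - 1) / se_slope,  # H0: identity slope, stable offset
        "df": n - 2,
    }

# Hypothetical FSIQ-like scores for six families (not study data).
family = [95, 100, 104, 108, 112, 117]
proband = [78, 85, 84, 92, 93, 102]
fit = centered_regression(family, proband)
print(fit)
```

In this toy example the offset is −17 points and the slope is close to 1, so the predicted proband deficit is roughly constant across the range of family scores.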
We next extended these simple univariate regression models to determine if prediction of variation in a given trait across probands could be improved, beyond consideration of just that trait in family members, by incorporating additional knowledge of (i) family FSIQ ("Model 1" below) and (ii) a set of additional variables capturing socioeconomic and perinatal variables ("Model 2" below). Again, given the small size of our sample, these extended tests should be considered exploratory in nature.
Model 1 added family FSIQ (coefficient β2) to the univariate model for each proband trait, and Model 2 further added the socioeconomic and perinatal variables. The β2 coefficient from Model 1 was compared to 0, to test for statistical evidence that variation in family FSIQ could explain significant variation in each proband trait beyond knowledge gained from the same trait in family members. Models 1 and 2 were compared using an ANOVA to first test for evidence that combined consideration of all specified socioeconomic and perinatal variables could predict additional variance in proband outcomes. A statistically significant increase in explained variance was required before consideration of individual β coefficients from the socioeconomic and perinatal variables in Model 2. As the multiple linear regression models 1 and 2 could only be run with participants who had complete information on all predicting variables for each proband outcome, we also re-ran all univariate regression models with these reduced participant sets to allow direct comparison of the variance in proband outcomes that could be explained by univariate versus multiple linear regression models.
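The two extended models are described here only in prose, so the following is one plausible way to write them down, not the authors' exact specification. All symbols are assumptions introduced for illustration: P_i is the proband score for the trait in family i, F_i the family score for the same trait, FSIQ^fam_i the family FSIQ score, and Z_ik the K socioeconomic and perinatal covariates (SES, birth weight, gestation time, maternal age). Model 2 is assumed to nest Model 1, as the ANOVA comparison implies.

```latex
\begin{align}
\text{Model 1:}\quad P_i &= \beta_0 + \beta_1 F_i + \beta_2\,\mathrm{FSIQ}^{\mathrm{fam}}_i + \varepsilon_i \\
\text{Model 2:}\quad P_i &= \beta_0 + \beta_1 F_i + \beta_2\,\mathrm{FSIQ}^{\mathrm{fam}}_i
                          + \sum_{k=1}^{K} \gamma_k Z_{ik} + \varepsilon_i \\
\text{Nested comparison:}\quad
F &= \frac{(\mathrm{RSS}_1 - \mathrm{RSS}_2)/K}{\mathrm{RSS}_2/(n - K - 3)}
\end{align}
```

Under this reading, the F statistic has K and n − K − 3 degrees of freedom, where n is the number of probands with complete data and RSS_1 and RSS_2 are the residual sums of squares of the two models fitted to that same complete-case set.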
Finally, we extended our analysis to a simultaneous exploration of all possible pairwise trait interrelationships across and within the probands, siblings, and parents represented in our cohort. Specifically, we generated a square (36 × 36) correlation matrix of all pairwise Pearson correlation coefficients between the 12 measures in each of probands, siblings, and parents. For this analysis, a single parent score was calculated for each measure by averaging scores from the mother and father in each family, or using one parent's data if only one parent participated. All measures were z-scored and FSIQ, vocabulary, and matrix reasoning were inverted, such that score polarity matched that for the other scales (i.e., lower scores indexing less impairment). The full correlation matrix was clustered using hierarchical agglomerative clustering, and selection of the final number of clusters (k) was determined using the elbow method [22].
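A compact sketch of this pipeline (z-score, invert IQ-type measures, build the correlation matrix, cluster) is given below. Several choices are assumptions made for illustration because the text does not specify them: average linkage, a distance of 1 − r, and a cluster count passed in directly rather than chosen by the elbow method. The data and all function names are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev

def zscore(values, invert=False):
    """Standardize a measure; invert flips polarity (used for IQ-type scores)."""
    m, s = mean(values), pstdev(values)
    sign = -1.0 if invert else 1.0
    return [sign * (v - m) / s for v in values]

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def agglomerate(names, corr, k):
    """Average-linkage agglomerative clustering on distance 1 - r, stopped at k clusters."""
    clusters = [[i] for i in range(len(names))]

    def dist(a, b):
        return mean([1 - corr[i][j] for i in a for j in b])

    while len(clusters) > k:
        pairs = [(dist(clusters[p], clusters[q]), p, q)
                 for p in range(len(clusters))
                 for q in range(p + 1, len(clusters))]
        _, p, q = min(pairs)          # closest pair of clusters
        merged = clusters.pop(q)      # q > p, so index p is unaffected
        clusters[p].extend(merged)
    return [sorted(names[i] for i in c) for c in clusters]

# Hypothetical measures across six families (not study data).
measures = {
    "parent_FSIQ": zscore([95, 100, 104, 108, 112, 117], invert=True),
    "proband_FSIQ": zscore([78, 85, 84, 92, 93, 102], invert=True),
    "parent_SRS": zscore([45, 60, 48, 58, 44, 57]),
    "sibling_SRS": zscore([47, 62, 50, 55, 46, 59]),
}
names = list(measures)
corr = [[pearson_r(measures[a], measures[b]) for b in names] for a in names]
clusters = agglomerate(names, corr, k=2)
print(clusters)
```

Z-scoring does not change a Pearson correlation, but the inversion does: it flips the sign of every correlation between an IQ-type measure and a symptom scale, which is what lets all 36 measures be read with a common polarity.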

Participant characteristics
Descriptive statistics for trait scores in probands and non-carrier family members are provided in Table 1, with complete demographic information presented in Additional file 1. Mean FSIQ in XYY probands was 86.65 ± 13.96, and proband FSIQ ranged from 53 to 112. Total mean vocabulary and matrix reasoning scaled scores in probands were 7.11 ± 2.91 and 8.97 ± 3.05, respectively. In line with previous reports, mean scores for FSIQ, SRS-2, and Conners were all shifted relative to reference norms (decreased for FSIQ and increased for SRS-2 and Conners) in XYY probands, and we observed considerable variability in each trait across probands [32-34]. For example, proband SRS-2 total score ranged from 42 to 100 and proband ADHD hyperactive-impulsive symptoms ranged from 40 to 90. Trait scores in non-carrier family members were in the average range relative to population norms. For the 12 trait measures of interest, there were no significant differences between proband scores as a function of the availability of family data (i.e., the mean proband FSIQ score was not significantly different for the probands who had one relative participate versus those who had two or three relatives participate).

Relationship between proband and family scores for individual traits: primary analysis of offset
Results from correlation and regression analysis of proband-family interrelationships for each trait examined are summarized in Table 2 and are graphically represented for selected traits in Fig. 2. The remainder of the trait-specific scatterplots are presented in Additional file 3. Observed correlations between probands and family scores varied greatly across traits. Statistically significant positive correlations between proband and family traits were only observed for FSIQ (r = 0.63, p < 0.0001) and vocabulary score (r = 0.57, p < 0.0001). Weaker (r < 0.3) and statistically non-significant positive proband-parent correlations were seen for matrix reasoning and ADHD hyperactive-impulsive symptoms. Proband-family correlation coefficients for SRS-2 total scores and subtests ranged from r = −0.26 to r = 0.088; only the SRS-2 social awareness subscale reached statistical significance (r = −0.26, p = 0.045).
Regression analyses revealed a statistically significant offset in proband scores relative to family scores for all traits. Observed offsets (i.e., the penetrance of XYY syndrome) varied in magnitude across traits, with the greatest offsets observed for ADHD-related traits (Cohen's d > 2 score elevation), followed by SRS-2 measures of ASD-related traits (median Cohen's d across scales ~1.7 score elevation) and FSIQ measures (Cohen's d ~1.3 score decrement for FSIQ and vocabulary, and ~0.9 for matrix reasoning).

Relationship between proband and family scores for individual traits: preliminary analysis of slopes
Analysis of model slope terms indicated that FSIQ and vocabulary showed by far the closest correspondence between variation in proband and family scores as compared to other scales. The observed slopes for these traits were approximately 0.9 and both statistically indistinguishable from 1, indicating that across the full range of IQ scores observed in our cohort, there was an approximately 1:1 relationship between score differences across probands and score differences across their unaffected relatives (Table 2, Fig. 2a,b). The equivalent slope term for regression analysis of matrix reasoning was statistically indistinguishable from 0 (Table 2, Fig. 2c). Examination of adjusted R² estimated from regression analyses indicated that variability in family scores explained approximately 39% of the variance in proband FSIQ, 31% of variance in proband vocabulary scores, and 3% of variance in proband matrix reasoning. Besides FSIQ and vocabulary scores, only one other trait examined had an estimated slope from regression analysis that was statistically significantly different from 0: the SRS-2 social awareness subscale. However, the observed slope for prediction of variation in proband social awareness scores from those in unaffected relatives was negative (−0.6, Table 2, Fig. 2d), suggesting that greater degrees of impaired social awareness in XYY probands by parent rating tend to be associated with better social awareness in first-degree relatives.
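As a rough consistency check, the adjusted R² of a single-predictor regression can be recovered from the Pearson correlation and the sample size. The worked arithmetic below assumes n = 58 (every family contributing to the trait), which is an assumption made for illustration; the actual n may be lower for some traits.

```latex
\begin{align}
R^2_{\mathrm{adj}} &= 1 - (1 - r^2)\,\frac{n - 1}{n - 2} \\
\text{FSIQ } (r = 0.63):\quad
R^2_{\mathrm{adj}} &= 1 - (1 - 0.3969)\,\frac{57}{56} \approx 0.39 \\
\text{Vocabulary } (r = 0.57):\quad
R^2_{\mathrm{adj}} &= 1 - (1 - 0.3249)\,\frac{57}{56} \approx 0.31
\end{align}
```

Under that assumption, both values agree with the reported 39% and 31% of variance explained.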

Exploring cross-trait influences of family IQ and perinatal variables
Several measures of proband ASD traits from the SRS-2, including total score, social communication, and DSM-5 social communication and interaction scores, were significantly associated with family IQ in models that also accounted for family scores on the corresponding ASD trait (Additional file 4). For all of these traits, greater family FSIQ scores were associated with reduced SRS-2 scores in probands. These extended models including family IQ explained approximately four times greater variance in the named proband SRS-2 traits than models which only included the corresponding SRS-2 trait in family members' measures. However, the absolute proportion of variance that could be predicted for proband scores remained low (average ~5%, Additional file 4). Further extending multiple linear regression models to include family SES and a set of perinatal factors (see "Methods" section) did not yield significantly greater prediction for any proband traits measured in this study (Additional file 4).

Exploratory cluster analysis of all pairwise trait-relative relationships
The pairwise correlation matrix across all measures for all family members is shown in Fig. 3. Ordering trait measures by their family member of origin created a 3 × 3 quadrant structure, highlighting three distinct classes of intra-familial trait comparisons: parent-proband, parent-proband's sibling and proband-sibling (Fig. 3a).
Qualitatively, this correlation matrix shows that (i) inter-trait correlations tend to be higher within an individual than between individuals, and (ii) trait correlations tend to be lower between parents and XYY probands than between parents and unaffected siblings of XYY probands. Unsupervised clustering of this full pairwise relative-trait correlation matrix according to an empirically selected four-cluster solution (Fig. 3b) separated out the following subsets of traits across family members: (i) a "Family IQ and Proband ADHD-trait" cluster, which includes all IQ measures (FSIQ, vocabulary, and matrix reasoning) for all family members, and proband ADHD inattentive and hyperactive-impulsive symptoms, (ii) a "Parent Psychopathology and Sibling ADHD-trait" cluster of all parent SRS-2 measures, parent ADHD inattentive and hyperactive-impulsive symptoms, and sibling ADHD inattentive and hyperactive-impulsive symptoms, (iii) a "Sibling Psychopathology" cluster including all sibling SRS-2 measures, and (iv) a "Proband Psychopathology" cluster of all proband SRS-2 measures (Fig. 3c). The hierarchical relationship between these four clusters (Fig. 3b) supports the qualitative impression from Fig. 3a by revealing a higher-order set of two clusters: Proband Psychopathology and Family IQ vs. Parent Psychopathology and Sibling Psychopathology.

Discussion
Our study develops and applies a set of complementary methods for prediction of outcome variability in aneuploidy and CNV carriers based on family data. By using these methods to model data from families with male probands carrying an extra Y chromosome, we (i) refine insights into the impact of XYY on several aspects of early cognitive and behavioral development, and (ii) show the general utility of family-based analyses for resolving interdependencies between neurobehavioral outcomes in aneuploidy or CNV carriers, and neurobehavioral profiles in their first-degree, non-carrier relatives. We address the implications of our results below and consider important steps for future work.
The correlation analyses in this study indicate that the degree of coherence between trait variability across aneuploidy-carrying probands and their unaffected family members can vary greatly across traits. The strongest positive proband-family trait correlations were seen for FSIQ (r = 0.63). Similar magnitudes of proband-family FSIQ correlations have also been reported in several other aneuploidies and CNVs [22q11.2 deletion syndrome: [10], 16p11.2 deletion syndrome: [4], Down syndrome: [35], and Klinefelter syndrome [36]], as well as in the general population [10,37]. This suggests that although most neuropsychiatric aneuploidies and CNVs significantly impact FSIQ, many do so without disrupting the other sources of variance that underpin the well-documented familiality of these traits in the general population [38]. In other words, the causal pathways that mediate the negative impact of many aneuploidies and CNVs on FSIQ may be partly distinct from those that explain intrafamilial FSIQ correlations. This principle may not hold equally for vocabulary and matrix reasoning, however. In our present study of XYY syndrome, and in past studies of XXY syndrome and 16p11.2 deletion disorder, the correlation between probands and first-degree relatives was stronger for verbal than non-verbal subcomponents of general cognitive ability [4,36], although formal tests of this notion must await direct comparisons, in larger samples than ours, of family-proband correlations for different traits.
Further, in contrast to the robustly positive proband-family correlations for FSIQ, we observe near-zero or negative proband-family correlations for several SRS-2 traits in XYY syndrome (0.088 > r > −0.26). This finding is surprising given that previous reports have found strong and positive proband-family correlations for social impairment in 16p11.2 deletion syndrome [4], and in the general population (e.g., parent-child r ~0.35 [37]). One potential explanation for these findings is a differential rater effect across studies or syndromes. For example, variation in parental social functioning may influence how parents rate the social behavior of affected offspring (e.g., parents with greater social awareness are more sensitive to social impairments), and these sorts of effects may vary by the nature of social dysfunction seen in different neurogenetic disorders. Another potential explanation is that the uncoupling of child from parent SRS-2 scores is driven in part by aspects of the underlying biology of XYY syndrome, such that carriage of an extra Y chromosome introduces independent sources of variance in social functioning that disrupt or overwhelm the influence of familial factors. For example, given that psychopathology in XYY syndrome has been shown to affect the specificity of the SRS-2 screening form [3], the specificity of SRS-2 ratings may differ between genetic disorders with dissimilar profiles of psychopathology, thereby leading to differing proband-parent correlations in SRS-2 scores. The variability of the relationship between family and XYY probands across different traits, and the differences between findings in XYY and gene dosage disorders like 16p11.2 deletion syndrome, suggest that the accuracy of family data in predicting proband outcomes is likely to be both trait- and disorder-specific.
Taken together, our findings suggest that knowing family scores for certain traits, such as FSIQ and vocabulary scores, is useful for predicting proband trait variation across aneuploidy and CNV disorders, but that other traits, such as social impairment, can show highly variable proband-family correlations in different aneuploidy and CNV disorders.
Our findings also emphasize the added value of modeling proband-family trait interrelationships within a regression framework [4,10,36]. Specifically, by using regression to estimate proband offsets relative to family members versus the general population, we derived more refined estimates of the penetrance of XYY syndrome. This refinement is achieved because, (i) recruitment biases may enrich probands for particular background genetic and environmental factors that influence traits that are also impacted by XYY syndrome, and (ii) family-based offsets estimate the penetrance of XYY syndrome while controlling for the background genetic and environmental factors that probands share with their family members. For some traits, such as IQ and ADHD, our family-based models estimated offsets that differ from prior studies where trait scores in XYY syndrome were compared to those from recruited controls or standardized instrument distributions [33,39]. We observed the largest offsets for ADHD-related traits, followed by ASD-related traits and FSIQ/vocabulary scores.
In addition to allowing offset estimation, regression approaches also provide a quantitative framework for estimation of proband scores given known family scores. However, the magnitudes of these slopes were not as well estimated as the proband offsets, and given the small sample size of our cohort, results from these regression slope analyses must be considered provisional in nature. By analyzing these slopes, our study reveals that the average magnitude of FSIQ reduction in XYY probands relative to their first-degree relatives is expected to be the same (~1.3 Cohen's d effect size) across different levels of proband and family IQ. This finding indicates that any potential ascertainment biases that might enrich for recruitment of families with unusually high or low FSIQs should not bias estimation of the penetrance of XYY for FSIQ reduction (although may still bias estimation for other neurobehavioral measures).
Regression analysis also facilitates multivariate modeling of proband outcomes from family data, which we harnessed to test for potential moderating effects of family FSIQ and perinatal variables on other proband-family trait interrelationships. These analyses revealed that there is a negative association between family IQ and proband SRS-2 scores: a higher family IQ score provides a protective effect for some aspects of social responsiveness. This finding demonstrates the need to examine cross-measure, family-to-proband relationships to better predict proband outcomes in the future. Multiple linear regression models also suggested that variability in XYY proband outcomes is not significantly related to variability in familial SES and perinatal variables. Further expansion of such multiple linear regression methods would help to more systematically determine the limits of our capacity to predict variation in proband outcomes from family phenotypic data.
Finally, our framework applies clustering analysis to resolve the effect of an additional Y chromosome on coherence within and between family members across several cognitive and behavioral measures. In this context, clustering provides an analytically efficient means of describing the architecture of trait variation within families. Observing cluster composition can indicate the extent to which clustering of traits within a family is governed by carrier status (e.g., clusters separate carriers from others regardless of trait) as compared to the phenotype being considered (e.g., clusters separate IQ from other traits regardless of the family member being measured). In application to XYY probands and their family members, we observe co-clustering of traits into four broad groups: (i) Family IQ and Proband ADHD-traits, (ii) Parent Psychopathology and Sibling ADHD-traits, (iii) Sibling Psychopathology, and (iv) Proband Psychopathology. All of the SRS-2 measures for a given family member cluster together, which is expected due to the high internal consistency of the SRS-2 [40]. The clustering of ADHD-traits in parents and unaffected siblings was also found in previous reports of children with ADHD [41]. Notably, proband ADHD-traits clustered with family IQ measures, suggesting that the sources of ADHD-trait variation may differ between unaffected siblings and the XYY probands. While overall clustering patterns suggest a greater coherence between parent and sibling psychopathology compared to parent and proband psychopathology, all of the family IQ measures cluster together, suggesting that carriage of an extra Y chromosome may disrupt coherence between probands and family members for psychopathology, but not for IQ measures.
We anticipate that similar applications of clustering to family-based phenotypic data may help to reveal traits in relatives that are most closely coupled to trait variation in carriers, and to specify the effects of applying different dimension reduction techniques to familial phenotypic data.
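As an illustration of how trait-relative pairings might be clustered, the following sketch applies average-linkage hierarchical clustering to a correlation-based distance between variables. The choice of linkage method and distance, the variable names and the simulated scores are all assumptions made for this example and are not taken from the study.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def cluster_traits(scores, n_clusters):
    """Cluster the columns (trait-relative pairings) of a families x variables array.

    Hypothetical helper for illustration only. Distance is 1 - Pearson r, so
    negatively correlated variables are treated as far apart.
    """
    corr = np.corrcoef(scores, rowvar=False)
    dist = 1.0 - corr
    dist = (dist + dist.T) / 2.0      # remove floating-point asymmetry
    np.fill_diagonal(dist, 0.0)       # a variable is at distance 0 from itself
    tree = linkage(squareform(dist, checks=False), method="average")
    # Cut the tree so that at most n_clusters groups remain
    return fcluster(tree, t=n_clusters, criterion="maxclust")

# Simulated (not real) scores for 58 families: three "IQ-like" variables share
# one latent factor and three "psychopathology-like" variables share another.
rng = np.random.default_rng(1)
n = 58
factor_a, factor_b = rng.normal(size=(2, n))
names = ["mother_IQ", "father_IQ", "sibling_IQ",
         "mother_SRS", "father_SRS", "sibling_SRS"]  # hypothetical labels
scores = np.column_stack(
    [factor_a + rng.normal(0, 0.3, n) for _ in range(3)]
    + [factor_b + rng.normal(0, 0.3, n) for _ in range(3)]
)

labels = cluster_traits(scores, n_clusters=2)
for name, label in zip(names, labels):
    print(name, label)
```

Inspecting which variables share a label indicates whether groupings follow the family member being measured or the trait being measured, which is the contrast drawn in the text above.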
Our findings should be considered in light of several limitations and caveats. First, because the families in our cohort were not identified through a population-based sampling frame, they may not represent the full range of outcomes and background factors seen in XYY probands and their first-degree relatives. Second, our study is cross-sectional in design and therefore cannot resolve potentially age-varying proband-family interrelationships. Third, observed correlations between rating scale scores can be influenced by methodological factors that our study design does not directly model, such as parent versus child [42], or mother versus father [43] rater effects. Relatedly, our study design is not able to disambiguate the many potential sources of observed correlations between parent and child traits, which could include highly contrasting mechanisms (i.e., shared genetic determinants of IQ variation between probands and parents versus high parental psychopathology being driven by high caregiver strain, which in turn relates to proband psychopathology). In a similar vein, the variable contribution of unaffected relatives to estimated trait scores across different families means that the observed trait correlations between probands and families reflect a composite of different degrees of genetic relatedness. However, our findings from naturalistic estimation of trait correspondence between probands and their family members provide a valuable reference point when contemplating how such approaches might be used in practice to improve prediction of proband outcomes from family data. Finally, our modest sample size necessarily limits our power to confidently detect certain phenomena, such as those modelled by the slope terms in family-proband regression analyses. Future analyses in larger cohorts will help to address this limitation and also to directly test the reproducibility of our findings.

Conclusions
Given the extensive phenotypic variation observed in individuals with aneuploidies and pathogenic CNVs, there is a need to build models that allow better prediction of outcomes in individual carriers. Our study expands the suite of tools for modeling trait interrelationships between aneuploidy/CNV carriers and their unaffected family members. The utility of these tools is demonstrated in application to XYY syndrome, where we find that (i) as reported for several other gene dosage disorders, FSIQ and vocabulary scores show strong correlation between XYY probands and family members, (ii) the proband-family coupling in ASD-related traits that has been reported in CNV disorders appears to be lost in XYY syndrome, (iii) family FSIQ shows a statistically significant "cross-trait" relationship with the severity of ASD-related symptoms in XYY probands, and (iv) carriage of an extra Y chromosome may weaken correlations that are otherwise seen between parent and offspring psychopathology, but does not weaken this correlation for IQ. While the initial application of this framework was in XYY syndrome, it can also be used to model outcomes in probands with other chromosomal aneuploidies and copy number variants.