Probing a neural unreliability account of auditory sensory processing atypicalities in Rett Syndrome

Background: In the search for objective tools to quantify neural function in Rett Syndrome (RTT), which are crucial for evaluating therapeutic efficacy in clinical trials, recordings of sensory-perceptual functioning using event-related potential (ERP) approaches have emerged as potentially powerful tools. Considerable work points to highly anomalous auditory evoked potentials (AEPs) in RTT. However, the typical signal-averaging method used to derive these measures assumes "stationarity" of the underlying responses, i.e., that neural responses to each input are highly stereotyped. An alternative possibility is that responses to repeated stimuli are highly variable in RTT. If so, this would significantly impact the validity of assumptions about underlying neural dysfunction and would likely lead to overestimation of the underlying neuropathology. Assessing this possibility requires analyses at the single-trial level of signal-to-noise ratios (SNR), inter-trial variability (ITV) and inter-trial phase coherence (ITPC).
Methods: AEPs were recorded to simple 1 kHz, 100-ms tones from 18 participants with RTT and 27 age-matched typically developing (TD) controls (ages: 6–22 years). We applied standard AEP averaging, as well as measures of neuronal reliability at the single-trial level (i.e., SNR, ITV, ITPC). To separate signal-carrying components from non-neural noise sources, we also applied a denoising source separation (DSS) algorithm and then repeated the reliability measures.
Results: Substantially increased ITV, lower SNRs, and reduced ITPC were observed in auditory responses of RTT participants, supporting a "neural unreliability" account. Application of the DSS technique made it clear that non-neural noise sources contribute to overestimation of the extent of processing deficits in RTT. Post-DSS, ITV measures were substantially reduced, so much so that pre-DSS ITV differences between the RTT and TD populations were no longer detected. In the case of SNR and ITPC, DSS substantially improved these estimates in the RTT population, but robust differences between RTT and TD remained evident.
Conclusions: To accurately represent the degree of neural dysfunction in RTT using the ERP technique, consideration of response reliability at the single-trial level is strongly advised. Non-neural sources of noise lead to overestimation of the degree of pathological processing in RTT, and denoising source separation techniques during signal processing substantially ameliorate this issue.
Supplementary Information: The online version contains supplementary material available at 10.1186/s11689-024-09544-x.


Introduction
Rett Syndrome (RTT), an X-linked monogenic disorder caused by de novo mutations in the methyl-CpG-binding protein 2 gene (MECP2), is associated with severe intellectual disability in female children [1,48]. Classical RTT begins with early-onset neurodevelopmental regression, typically detected between 6 and 18 months of age, that results in progressive loss of previously acquired speech and motor skills [27]. The inability to verbalize, a feature in the vast majority of these children, substantially impedes objective clinical assessment of their perceptual and cognitive functioning, since conventional cognitive evaluations rely heavily on overt verbal or gestural responses [6]. As such, primary outcome measures in RTT are generally based on clinical judgement. As a consequence, there is limited knowledge about the perceptual and cognitive capabilities of these individuals across the progressive clinical stages of RTT [18,52]. The lack of objective assessment tools adversely impacts both clinical evaluation and the measurement of therapeutic efficacy during intervention trials. It is therefore imperative for the field to identify quantitative measures of neural function that can be objectively measured and longitudinally monitored to capture more subtle changes in neurological function [18,58], ideally without the need for active task participation on the part of these individuals, given the typical severity of the phenotype. Developing such measures would provide surrogate biomarkers of disease severity and potentially provide precise measurement of target engagement and longitudinal evaluation of treatment effects during clinical trials [18,58].
To this end, a number of research groups have now deployed electroencephalographic (EEG) recordings as a means to directly measure brain function in neurodevelopmental disorders [4,8,13,22,35,51,55,64]. EEG provides an easy-to-deploy method to assay neurodevelopmental regression in the absence of overt behavioral responses from participants (e.g., [4,18,51,61]). The millisecond precision of this tool is ideal for assessing dynamic brain function and can be used to determine the processing level at which information flow is breaking down [21,47,61]. This is achieved by assaying the latencies and amplitudes of well-characterized event-related potential (ERP) components, which have stereotypical topography and temporal dynamics in neurotypical populations and have been described in thousands of papers over the past 60 years [14,39,44,54,62,65,66]. High test-retest reliability is also a feature of this method, making it well suited to longitudinal monitoring in intervention trials [5,34,42]. However, a central assumption of this methodology is stationarity of response: when a stimulus is presented repeatedly to a participant, the neural response on each iteration (or trial) is assumed to be essentially identical, so that simple signal averaging across trials will reveal this stationary canonical response, because temporally random background activity (noise) is attenuated by the averaging procedure [30,39,54]. While this is perhaps not an unreasonable assumption in studies of neurotypical individuals, it may not be justified to assume near-perfect stationarity of sensory-perceptual processing in neurodevelopmental and neuropsychiatric conditions. For example, a number of researchers have proposed that the neural response in autism spectrum disorder (ASD) may be more variable, or unreliable, on a trial-to-trial basis [28,29,46] (but see [13,19,36]). That is, an evoked response might be produced to each stimulus iteration in these conditions, but the latencies and amplitudes of its canonical components might vary from trial to trial. In such a situation, signal averaging will affect these "signals" much as it affects the background noise: they will tend to be smeared out and attenuated toward zero.
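To make the attenuation argument concrete, the following sketch simulates single trials that each contain an identical, full-amplitude deflection whose latency either stays fixed or jitters from trial to trial. It is illustrative only: the Gaussian component shape, its 20-ms width, the 300-ms peak latency and the 40-ms jitter are hypothetical values, not parameters from this study.

```python
import numpy as np

FS = 1000                    # hypothetical sampling rate (Hz), 1-ms resolution
t = np.arange(FS) / FS       # 0 to 999 ms time axis (s)
WIDTH = 0.020                # hypothetical component width (SD of the Gaussian, s)


def component(latency):
    """Unit-amplitude, Gaussian-shaped evoked deflection peaking at `latency` (s)."""
    return np.exp(-((t - latency) ** 2) / (2 * WIDTH ** 2))


def simulate(n_trials=500, jitter_sd=0.0, seed=0):
    """Single trials whose peak latency varies around 300 ms with SD `jitter_sd` (s)."""
    rng = np.random.default_rng(seed)
    latencies = 0.300 + rng.normal(0.0, jitter_sd, n_trials)
    return np.stack([component(lat) for lat in latencies])


stationary = simulate(jitter_sd=0.0)   # identical response on every trial
jittered = simulate(jitter_sd=0.040)   # same response, latency varies trial to trial

print("stationary average peak:", stationary.mean(axis=0).max())
print("jittered average peak:  ", jittered.mean(axis=0).max())
print("smallest single-trial peak (jittered):", jittered.max(axis=1).min())
```

Every jittered single trial still peaks at essentially full amplitude, yet their average peaks far lower (for Gaussian jitter the expected ratio is 20/√(20² + 40²), about 0.45). From the average alone, this pattern is indistinguishable from a uniformly weak response.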
To date, most ERP studies in RTT have shown highly disordered sensory responses, both in audition [3,12,20,33,53,57,67] and vision [2,31,38,45,60,63], but to our knowledge, all of these previous studies, including those from our own research group, rely on the standard signal-averaging approach. Here, we are interested in determining whether a higher degree of variability, both at the individual participant and at the group level, might be a factor in the reduced and delayed ERP responses typically reported in RTT. This is important because it has significant implications for the use of the standard ERP as a neuromarker in clinical trials, and it is plausible that functionality at the individual participant level is being obscured by the averaging technique. The same concern applies at the group level, where idiosyncratic ERP morphologies and timings at the individual level, when averaged together across the group, could give the impression that the group has a much greater overall deficit than is actually the case.
We set out to test what we have termed the "unreliability account" by measuring the coherence of the auditory evoked potential (AEP) at the single-trial level in both RTT and age-matched typically developing (TD) controls, with an eye to more deeply characterizing potential auditory functionality in RTT. We recorded the AEP from a 72-channel montage in response to simple 1 kHz pure tones at three different stimulus onset asynchronies (SOAs: 450, 900 and 1800 ms; see [12], in which standard group analyses of the averaged ERP, focused on the mismatch negativity (MMN) response, are reported for the same dataset). The relatively large number of trials presented to each participant (850 per condition; see Table S2) allowed for in-depth analysis at the single-trial level, which is necessary for measures of inter-trial (i.e., intra-participant) variability with high statistical power. To measure inter-trial reliability of the auditory responses, we applied a number of relevant approaches, calculating inter-trial variability (ITV), signal-to-noise ratios (SNR), and inter-trial phase coherence (ITPC) at the individual participant level. We additionally sought to better understand the inter-participant variability that may derive from combining participants across various stages of disease severity by comparing the homogeneity of the AEP between the groups.
Another consideration when making EEG/ERP recordings in clinical populations is baseline differences in non-neural sources of noise, such as those produced by muscle or movement artifacts [25], which can also reduce the reliability of estimates of neural activity and potentially lead to overestimation of inter-group differences. To this end, we applied denoising source separation (DSS) to separate temporally coupled, signal-carrying components from temporally decoupled activity [16,26,59], and compared all of the above measures before and after DSS.

Participants
Data were analyzed from 25 females with confirmed MECP2 mutations and 30 typically developing (TD) controls (20 females and 10 males). Participants with RTT were recruited through the Rett Syndrome Center of Montefiore Children's Hospital in the Bronx, NY, while TD participants were recruited from the local community. Seven datasets from the RTT group and three from the TD group were excluded from further analysis due to noisy EEG data that resulted in fewer than 20% accepted trials per condition. The final sample contained 17 females with RTT (mean age: 12.6 ± 4.8 years, range 6-22) and 24 TDs (15 females and 9 males; mean age: 12.45 ± 4.9 years, range 6-26). There was no significant difference in age between the RTT and TD groups (t(41) = 0.12, p = 0.9).
All participants with RTT underwent genetic testing and phenotypic assessment, accompanied by detailed medical history questionnaires completed by their caregivers. Symptom severity in RTT was measured using the Rett Syndrome Severity Scale (RSSS), which is the primary scale used by the Rett Syndrome Center of Montefiore Children's Hospital [32,49]. This clinician-rated scale represents an aggregate measure of the severity of clinical symptoms, including motor function, seizures, autonomic function, ambulation, eye contact, and communication [49]. The RSSS score in the current RTT group ranged between 5 and 15 (mean ± SD = 10.94 ± 2.8), with higher scores indicating more severe disease. For reference, composite scores in the range of 0-7 are considered to correspond to a mild phenotype, 8-14 to a moderate phenotype, and 15-21 to severe features [32].
TDs were excluded if they had a family history of a neurodevelopmental disorder or any neurological/psychiatric disorder. All individuals in the TD group passed a hearing screen on the day of EEG testing. A limitation of the current study is that hearing acuity could not be similarly assessed in participants with RTT. However, in all cases, parents reported that the children with RTT could hear, and this was confirmed by clinical observation. Furthermore, participants with RTT were excluded if they had evidence of ear infection on the day of EEG acquisition. Tympanometry was performed on all participants to rule out middle-ear involvement, and Type-A tympanograms were observed in all cases. Clinical demographic information, including RSSS severity scores, ages of onset and regression, and medications of all participants, is listed in the supplementary materials (Table S1; Clinical Demographics). There were no differences in age range or RSSS scores between the seven excluded RTT datasets and those included in the final analysis.
All aspects of the research conformed to the tenets of the Declaration of Helsinki. The institutional review boards of the University of Rochester and the Albert Einstein College of Medicine approved this study. Written informed consent was obtained from parents or legal guardians, and where possible, informed assent from the participants was obtained. Participants were compensated at a rate of $15/hour for their time.

Experimental design, procedure and stimuli
Experimental design, procedures and stimuli were identical to those described in an earlier report from this dataset [12] and have been purposefully deployed in a number of other rare disease populations to allow for comparisons across phenotypes [11,22,23,24] (see Fig. S1 for a schematic of the paradigm). We presented a simple auditory mismatch negativity (MMN) paradigm while recording high-density EEG (72 channels). All participants sat in a sound-attenuated and electrically shielded booth (Industrial Acoustics Company, Bronx, New York) on a caregiver's lap or in a chair/wheelchair. They watched a muted movie of their choice on a laptop (Dell Latitude E640) while passively listening to auditory stimuli presented at an intensity of 75 dB SPL through a pair of Etymotic insert earphones (Etymotic Research, Inc., Elk Grove Village, IL, USA). The MMN paradigm consisted of frequently occurring (85%) standard tones randomly interspersed with deviant tones (15%), with the constraint that two deviant tones never occurred in succession. All tones had a frequency of 1000 Hz with 10-ms rise and fall times. Standard tones were 100 ms in duration, while deviant tones were 180 ms in duration. The responses to the deviant tones were reported in our earlier paper, which concentrated on the MMN response [12], and are not discussed or analyzed further here. The tones were presented in three separate conditions with SOAs of 450, 900 or 1800 ms (corresponding to presentation rates of 2.2, 1.1 and 0.55 Hz, respectively). These SOA conditions were presented in separate blocks, with each block consisting of 500, 250 or 125 trials, respectively (Fig. S1A). Participants were presented with 14 blocks altogether (2 × 450 ms, 4 × 900 ms and 8 × 1800 ms), resulting in 1000 trials per condition. Only the responses to the standard 100-ms tones (N = 850 per condition) are analyzed here.
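The trial bookkeeping above, and the constraint that two deviants never occur in succession, can be sketched as follows. This is a hypothetical reconstruction, not the stimulus-delivery code used in the study. In particular, drawing all 150 deviants for a condition at once and then splitting the sequence into blocks is an assumption, made so that each condition contains exactly 850 standards.

```python
import numpy as np

# SOA (ms) -> (number of blocks, trials per block), as described in the text
BLOCKS = {450: (2, 500), 900: (4, 250), 1800: (8, 125)}


def oddball_sequence(n_trials, p_deviant=0.15, rng=None):
    """Return a boolean array (True = deviant) with no two deviants in a row.

    Each deviant is assigned to a distinct gap around the standards (before the
    first, between two, or after the last), so any two deviants are separated
    by at least one standard.
    """
    rng = np.random.default_rng(rng)
    n_dev = int(round(n_trials * p_deviant))
    n_std = n_trials - n_dev
    if n_dev > n_std + 1:
        raise ValueError("too many deviants to avoid consecutive deviants")
    gaps = np.sort(rng.choice(n_std + 1, size=n_dev, replace=False))
    # The j-th deviant follows gaps[j] standards and j earlier deviants.
    is_deviant = np.zeros(n_trials, dtype=bool)
    is_deviant[gaps + np.arange(n_dev)] = True
    return is_deviant


for soa_ms, (n_blocks, n_per_block) in BLOCKS.items():
    seq = oddball_sequence(n_blocks * n_per_block, rng=soa_ms)  # whole condition at once
    blocks = seq.reshape(n_blocks, n_per_block)                 # split into recorded blocks
    print(f"SOA {soa_ms} ms: {1000 / soa_ms:.3f} Hz, {blocks.size} trials, "
          f"{int((~seq).sum())} standards")
```

Placing each deviant into a distinct gap between standards satisfies the no-repeat constraint by construction, so no rejection sampling or reshuffling is needed.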

EEG acquisition
A BioSemi ActiveTwo (BioSemi B.V., Amsterdam, Netherlands) 72-electrode array was used to record continuous EEG signals. The setup includes an analog-to-digital converter and fiber-optic pass-through to a dedicated acquisition computer (digitized at 512 Hz; DC-to-150 Hz pass-band). EEG data were referenced to an active common mode sense (CMS) electrode and a passive driven right leg (DRL) electrode.

Data processing
EEG data were processed and analyzed offline using custom scripts that included functions from the EEGLAB Toolbox for MATLAB (The MathWorks, Natick, MA, USA) [17] and the FieldTrip Toolbox for MATLAB [50]. EEG data were initially band-pass filtered between 1 and 40 Hz using Chebyshev Type II filters with the following parameters. High-pass filter: stopband edge at 0.1 Hz, passband edge at 1 Hz, stopband attenuation 65 dB. Low-pass filter: passband edge at 35 Hz, stopband edge at 40 Hz, stopband attenuation 65 dB. Continuous EEG data were subjected to a channel rejection algorithm, which identified bad channels using measures of standard deviation and covariance with neighboring channels. Rejected channels were interpolated using EEGLAB spherical interpolation. For all statistical analyses, data were epoched into 2-s segments, from 1 s before to 1 s after tone onset. Trials with amplitudes exceeding ±150 µV were excluded from further analysis. For the remaining trials, a second threshold was set at two standard deviations above the mean of the per-epoch maximum values, to exclude any remaining artifact-contaminated trials. The number of accepted trials for each SOA condition and group is presented in Table S2. To maximize AEP amplitudes at the fronto-central scalp sites where analyses were carried out, data were re-referenced to TP7 (or TP8 if TP7 was noisy), a temporo-parietal site below the Sylvian fissure where the auditory response tends to invert in polarity relative to fronto-central sites.
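A minimal SciPy sketch of the filtering and two-stage trial rejection described above is given below. The band edges and the 65-dB stopband attenuation come from the text. The 1-dB passband loss, forward-backward (zero-phase) filtering, and taking each epoch's peak absolute value over all channels and samples are assumptions. The original pipeline was implemented in MATLAB, not Python.

```python
import numpy as np
from scipy import signal

FS = 512.0  # sampling rate reported in the EEG acquisition section (Hz)


def cheby2_sos(wp, ws, btype, gstop=65.0, gpass=1.0, fs=FS):
    """Chebyshev Type II design from band edges, as second-order sections.

    gstop (65 dB) follows the text; gpass (maximum passband loss, dB) is not
    reported, so 1 dB is an assumption.
    """
    order, wn = signal.cheb2ord(wp, ws, gpass, gstop, fs=fs)
    return signal.cheby2(order, gstop, wn, btype=btype, output="sos", fs=fs)


SOS_HP = cheby2_sos(wp=1.0, ws=0.1, btype="highpass")   # passband 1 Hz, stopband 0.1 Hz
SOS_LP = cheby2_sos(wp=35.0, ws=40.0, btype="lowpass")  # passband 35 Hz, stopband 40 Hz


def bandpass_1_40(x):
    """Apply both filters along the last (time) axis; zero-phase is an assumption."""
    return signal.sosfiltfilt(SOS_LP, signal.sosfiltfilt(SOS_HP, x, axis=-1), axis=-1)


def reject_epochs(epochs_uv, abs_limit=150.0, n_sd=2.0):
    """Two-stage trial rejection on epochs shaped (trials, channels, times), in µV.

    Stage 1 drops any epoch whose absolute peak exceeds abs_limit.
    Stage 2 drops survivors whose peak exceeds mean + n_sd * SD of the survivors' peaks.
    Returns a boolean mask of accepted trials.
    """
    peaks = np.abs(epochs_uv).max(axis=(1, 2))
    keep = peaks <= abs_limit
    if keep.any():
        survivors = peaks[keep]
        keep &= peaks <= survivors.mean() + n_sd * survivors.std()
    return keep
```

Second-order sections are used because they keep the steep 35-40 Hz transition numerically stable, which a single transfer-function (b, a) representation of a high-order filter may not.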

Data analysis
All analyses were performed on data averaged across three electrodes over fronto-central scalp (FC3, FCz and FC4). A multipronged approach was taken to analyzing the data. 1) In accord with conventional ERP analyses, we tested for group-level differences in AEP amplitude using standard analyses of variance for two time windows corresponding to the two major deflections in the AEP: mean amplitudes were calculated for each participant and each SOA for the P1 (50-100 ms) and N2 (200-300 ms) windows.
2) Another set of analyses focused on measuring within-subject variability and comparing it across groups. For this, linear mixed-effects models were applied to both regular and denoised data (as described below), with analyses on data from the N2 (200-300 ms) window, where response amplitude was greatest. Three metrics of within-subject variability were tested:

Signal-to-Noise Ratio (SNR)
SNR was measured across trials for each individual in each group and for each SOA condition using a polarity-alternation ("plus-minus") method: Signal was calculated as the mean amplitude of the trial average in the 200-300 ms time window, and Noise was defined as the mean amplitude in the same window of an average in which every other trial was flipped in polarity (i.e., multiplied by -1), which cancels the stationary response (i.e., the evoked potential).
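A sketch of the polarity-alternation SNR for one participant and condition follows, assuming `trials` is a (trials × samples) array already averaged over the analysis electrodes. One deliberate deviation: the text defines signal and noise by mean amplitude, whereas this version uses the root-mean-square over the window for both terms (an assumption), because the signed mean of residual noise can approach zero by chance and inflate the ratio.

```python
import numpy as np


def plus_minus_snr(trials, mask):
    """SNR of the averaged response in a window, with noise from a plus-minus average.

    trials: (n_trials, n_times) single-trial data for one participant/condition.
    mask: boolean time mask selecting the analysis window (e.g., 200-300 ms).
    Both terms use RMS over the window (an assumption; the text uses mean amplitude).
    """
    n = trials.shape[0] - trials.shape[0] % 2       # even count so flips cancel exactly
    x = trials[:n]
    signs = np.where(np.arange(n) % 2 == 0, 1.0, -1.0)
    evoked = x.mean(axis=0)                         # standard average: response + residual noise
    plus_minus = (signs[:, None] * x).mean(axis=0)  # alternate trials flipped: response cancels

    def rms(v):
        return np.sqrt(np.mean(v[mask] ** 2))

    return rms(evoked) / rms(plus_minus)
```

Because the plus-minus average keeps the noise level of the standard average while removing any stationary response, the ratio rises with the strength and consistency of the evoked component.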

Inter-trial Variability (ITV)
ITV was calculated in the 200-300 ms window as the mean deviation of the individual trials from the average AEP, expressed as the standard deviation across trials.
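The ITV measure can be sketched for one participant and condition as follows, assuming `trials` is a (trials × samples) array already averaged over FC3, FCz and FC4. Reading "the mean of the deviations ... (standard deviations)" as the across-trial standard deviation at each sample, averaged over the 200-300 ms window, is an interpretation of the text rather than a confirmed detail.

```python
import numpy as np


def inter_trial_variability(trials, mask):
    """ITV: SD across trials at each sample, averaged over the analysis window.

    trials: (n_trials, n_times); mask: boolean time mask (e.g., 200-300 ms).
    """
    sd_per_sample = trials.std(axis=0, ddof=1)  # spread of trials around the average AEP
    return float(sd_per_sample[mask].mean())
```

Because the standard deviation is taken around the trial average, a perfectly stationary response contributes nothing to ITV; only trial-to-trial departures, neural or non-neural, do.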

Inter-trial phase coherence (ITPC)
To quantify the consistency of phase of the auditory response across trials, ITPC was calculated as the circular coherence of phases across trials for 2-s epochs centered on stimulus onset, for each individual participant and SOA condition.
ITPC was calculated as follows [41]:

$$\mathrm{ITPC}_{tf} = \left| \frac{1}{N} \sum_{n=1}^{N} e^{i\theta_{ntf}} \right|$$

where $\theta_{ntf}$ is the phase at time bin $t$ and frequency bin $f$ in trial $n$, and $N$ is the number of trials. Output values range from 0 (no phase coherence) to 1 (perfect phase coherence). Morlet wavelet convolution was applied to the 2-s epochs, using wavelets with Gaussian envelopes spanning 3 to 5 cycles. For visualization, ITPC is averaged across participants and presented for each group and condition, pre- and post-DSS. The parameter used for statistical analysis is the maximal ITPC value across frequencies, calculated in the 200-300 ms window after stimulus onset.
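The ITPC formula above can be implemented with FFT-based complex Morlet convolution, as in the following NumPy sketch. The 1-25 Hz frequency grid follows the range shown in the Results. The linear spacing of 3 to 5 cycles across frequencies, the ±3 SD truncation of the Gaussian envelope, and the order of operations in the summary statistic (averaging within 200-300 ms, then taking the maximum across frequencies) are assumptions.

```python
import numpy as np

FREQS = np.arange(1, 26)                      # 1-25 Hz
N_CYCLES = np.linspace(3, 5, FREQS.size)      # assumed linear spacing of 3-5 cycles


def morlet_itpc(trials, fs, freqs=FREQS, n_cycles=N_CYCLES):
    """ITPC_tf = |mean_n exp(i * theta_ntf)| via complex Morlet wavelet convolution.

    trials: (n_trials, n_times). Returns an array shaped (n_freqs, n_times).
    """
    n_trials, n_times = trials.shape
    itpc = np.empty((len(freqs), n_times))
    for k, (f, c) in enumerate(zip(freqs, n_cycles)):
        sigma = c / (2 * np.pi * f)                    # SD of the Gaussian envelope (s)
        half = int(np.ceil(3 * sigma * fs))            # truncate envelope at +/- 3 SD
        tw = np.arange(-half, half + 1) / fs
        wavelet = np.exp(2j * np.pi * f * tw) * np.exp(-tw ** 2 / (2 * sigma ** 2))
        n_conv = n_times + wavelet.size - 1            # linear (not circular) convolution
        spec = np.fft.ifft(np.fft.fft(trials, n_conv, axis=1)
                           * np.fft.fft(wavelet, n_conv), axis=1)
        analytic = spec[:, half:half + n_times]        # trim to the epoch length
        phase = np.angle(analytic)                     # theta_ntf at this frequency
        itpc[k] = np.abs(np.exp(1j * phase).mean(axis=0))
    return itpc


def peak_itpc(itpc, times, t0=0.200, t1=0.300):
    """Mean ITPC within [t0, t1] per frequency, then the maximum across frequencies."""
    mask = (times >= t0) & (times <= t1)
    return float(itpc[:, mask].mean(axis=1).max())
```

Only the phase of each trial's wavelet coefficient enters the measure, so ITPC is insensitive to amplitude differences between trials, which complements the amplitude-based SNR and ITV measures.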

Denoising Source Separation (DSS)
Recordings of EEG signals inherently contain both stimulus-driven responses and stimulus-irrelevant activity (noise) [15,16]. To extract components that are directly related to auditory stimulus-evoked activity, we employed dimensionality reduction through the Denoising Source Separation (DSS) algorithm [16,59]. DSS decomposes multi-channel EEG recordings to extract neural response components that are consistent across trials, and has been demonstrated to be effective in denoising auditory evoked activity [16]. This denoising technique is based on blind source separation that separates stimulus-unrelated from stimulus-related components through spatial filters. These spatial filters are linear combinations of the sensors, designed to partition the data into signal-carrying components of interest and non-signal-carrying components [16]. In this study, DSS was performed on the 2-s epochs for each subject and each of the three conditions independently (the data prior to this step are referred to here as Pre-DSS signals). After data from all channels were normalized, they were submitted to principal component analysis (PCA). This yielded a matrix of component time series, ordered by decreasing bias score, that is partitioned into signal and noise components. Based on SNR calculations, the first two DSS components were determined to contribute significantly and to be optimal. These components were retained and projected back to sensor space to obtain the denoised EEG data (referred to hereafter as Post-DSS signals), which denote denoised auditory responses throughout this paper.
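The DSS steps above can be sketched in NumPy as follows, using the trial average as the bias function, which is the standard choice for evoked responses. This is a simplified illustration, not the implementation used in the study. Per-channel scaling as the normalization step, eigenvalue-based rank truncation, and least-squares back-projection are assumptions. The default `n_keep=2` mirrors the two retained components.

```python
import numpy as np


def dss_denoise(epochs, n_keep=2, rel_tol=1e-10):
    """Evoked-biased denoising source separation.

    epochs: (n_trials, n_channels, n_times). Returns (denoised epochs, bias scores).
    Steps: normalize channels, whiten with a first PCA, run a second PCA on the
    whitened trial-average covariance (components ordered by bias score), keep
    n_keep components, project back to sensor space, undo the normalization.
    """
    n_trials, n_ch, n_t = epochs.shape
    x = epochs - epochs.mean(axis=2, keepdims=True)        # remove per-epoch channel offsets
    scale = x.std(axis=(0, 2))                             # per-channel normalization
    x = x / scale[None, :, None]

    c0 = np.einsum("kct,kdt->cd", x, x) / (n_trials * n_t)  # total covariance
    avg = x.mean(axis=0)
    c1 = avg @ avg.T / n_t                                 # covariance of the evoked average (bias)

    evals, evecs = np.linalg.eigh(c0)                      # first PCA: whitening
    ok = evals > rel_tol * evals.max()
    whiten = evecs[:, ok] / np.sqrt(evals[ok])

    bias, rot = np.linalg.eigh(whiten.T @ c1 @ whiten)     # second PCA on the biased data
    order = np.argsort(bias)[::-1]
    bias, rot = bias[order], rot[:, order]

    unmix = whiten @ rot[:, :n_keep]                       # spatial filters (n_ch, n_keep)
    comps = np.einsum("cr,kct->krt", unmix, x)             # component time series
    mix = c0 @ unmix                                       # back-projection; unmix.T @ c0 @ unmix = I
    denoised = np.einsum("cr,krt->kct", mix, comps)
    return denoised * scale[None, :, None], bias
```

The returned bias scores lie between 0 and 1 and give each component's ratio of evoked (trial-averaged) power to total power, which is one way to inspect how many components to retain.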

Analysis of Variance (ANOVA)
In line with the standard approach for analyzing ERP components [12,67], we employed a repeated-measures ANOVA with SOA (450, 900, and 1800 ms) as a within-participant factor and Group (RTT vs. TD) as a between-participant factor. This analysis was conducted on data from fronto-central electrodes (FC3, FCz and FC4) for the aforementioned time windows corresponding to the major deflections of the AEP (P1 and N2).
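The window measures entering these ANOVAs can be computed as below. This sketch assumes epochs sampled at 512 Hz and spanning -1 to +1 s, as described above. The function name and the positions of FC3, FCz and FC4 in the montage (passed as `ch_idx`) are hypothetical.

```python
import numpy as np

FS = 512                                             # sampling rate (Hz)
TIMES = np.arange(-FS, FS) / FS                      # 2-s epoch: -1 s to +1 s around onset
WINDOWS = {"P1": (0.050, 0.100), "N2": (0.200, 0.300)}  # analysis windows (s)


def window_means(epochs, ch_idx):
    """Mean AEP amplitude per window for one participant and SOA condition.

    epochs: (n_trials, n_channels, n_times); ch_idx: indices of FC3, FCz, FC4.
    """
    aep = epochs[:, ch_idx, :].mean(axis=(0, 1))     # average over trials and the 3 channels
    out = {}
    for name, (t0, t1) in WINDOWS.items():
        mask = (TIMES >= t0) & (TIMES <= t1)
        out[name] = float(aep[mask].mean())
    return out
```

One such pair of values per participant and SOA condition gives the Group × SOA design described above.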

Linear mixed effects models (LME)
For subsequent analyses, to account for random and fixed effects, including differences in neuronal variability between participants, we implemented LME models on the dependent measures from the different analyses, using the MATLAB fitlme function. Advantages over the standard ANOVA approach have been previously detailed [37,40,68]. Mixed-effects models account for multiple comparisons and interactions. Condition and Group were used across all models as fixed effects. Participants were treated as random factors according to the following linear-model expression: and for analyses involving within-subject analyses we used where EEG stands for the AEP amplitude values, SOA condition corresponds to the three stimulus presentation intervals (450, 900, and 1800 ms), and Group corresponds to the control and RTT cohorts.

Wilcoxon rank test
We used non-parametric testing to assess the presence of group effects for the following measures: AEP, SNR and ITV, in both the pre-DSS and post-DSS data, separately. This was done to obtain a first estimate of the differences between RTT and controls in each of these measures, prior to applying the LME models.
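For the Wilcoxon rank-sum comparisons, the statistic and p-value can be obtained with SciPy as sketched below. Returning the sum of ranks of the first group as the statistic is an assumption about how the values reported in the Results were defined. The p-value comes from the normal approximation in `scipy.stats.ranksums`, which does not correct for ties; `scipy.stats.mannwhitneyu` offers tie handling and a continuity correction.

```python
import numpy as np
from scipy import stats


def rank_sum_test(group_a, group_b):
    """Wilcoxon rank-sum test for two independent samples.

    Returns (W, p): W is the sum of the ranks of group_a in the pooled sample,
    p is the two-sided p-value from scipy.stats.ranksums.
    """
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    ranks = stats.rankdata(np.concatenate([a, b]))  # tied values receive average ranks
    w = float(ranks[: a.size].sum())
    p = float(stats.ranksums(a, b).pvalue)
    return w, p
```

For example, `rank_sum_test([1, 2, 3], [4, 5, 6])` gives W = 6, the smallest possible rank sum for three observations, with p just under 0.05.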

Cluster-based permutation
Cluster-based permutation statistics were used to assess significant modulations across groups and conditions and were computed as a function of channels × time [43,50]. These univariate tests were performed by means of dependent-samples t-tests (p < 0.05, two-sided) and cluster-based permutation tests (based on a minimum of 2 channels) to control for multiple comparisons. The significance of the observed cluster-level statistic (based on the t values within the cluster) was assessed by comparison with the distribution of all permutation-based cluster-level statistics. The final cluster p value reported in all figures was computed as the proportion of 2000 Monte Carlo iterations in which the cluster-level statistic was exceeded. Cluster significance was indicated by p values below 0.025 (two-sided cluster significance threshold).
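The logic of the cluster-based permutation test can be illustrated with a simplified, time-only version for paired data, in which condition labels are permuted by randomly flipping the sign of each participant's difference wave. This is a sketch, not FieldTrip's implementation. It ignores channel adjacency (so the 2-channel minimum is not modeled) and uses the sum of t values within a cluster as the cluster-level statistic. Its function names are hypothetical.

```python
import numpy as np
from scipy import stats


def runs(mask):
    """(start, stop) index pairs of contiguous True runs in a 1-D boolean array."""
    edges = np.diff(np.concatenate(([0], mask.astype(int), [0])))
    return list(zip(np.flatnonzero(edges == 1), np.flatnonzero(edges == -1)))


def cluster_masses(t_vals, thresh):
    """Positive and negative supra-threshold clusters as (start, stop, |sum of t|)."""
    out = []
    for sign in (1.0, -1.0):
        for a, b in runs(sign * t_vals > thresh):
            out.append((int(a), int(b), float(abs(t_vals[a:b].sum()))))
    return out


def paired_t(d):
    """One-sample t statistic of paired differences at each time point."""
    return d.mean(axis=0) / (d.std(axis=0, ddof=1) / np.sqrt(d.shape[0]))


def cluster_permutation_1d(diff, alpha=0.05, n_perm=2000, seed=0):
    """diff: (n_subjects, n_times) paired differences (condition A minus B).

    Returns [(start, stop, p_cluster), ...] for every observed cluster.
    """
    n = diff.shape[0]
    thresh = stats.t.ppf(1 - alpha / 2, df=n - 1)     # two-sided cluster-forming threshold
    observed = cluster_masses(paired_t(diff), thresh)
    rng = np.random.default_rng(seed)
    null_max = np.zeros(n_perm)
    for i in range(n_perm):
        flips = rng.choice([-1.0, 1.0], size=(n, 1))   # swap condition labels per subject
        masses = [m for _, _, m in cluster_masses(paired_t(diff * flips), thresh)]
        null_max[i] = max(masses, default=0.0)
    # Monte Carlo p: share of permutations whose largest cluster is at least as large
    return [(a, b, (np.sum(null_max >= m) + 1) / (n_perm + 1)) for a, b, m in observed]
```

Comparing each observed cluster against the distribution of the largest permuted cluster is what controls the family-wise error across time points; clusters with p below 0.025 would be declared significant under the two-sided criterion used here.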

Standard AEP analysis
In Fig. 1, the standard "stationary" AEP is plotted for each individual in both the TD (left column) and RTT (right column) groups, and the mean amplitude of the response to the standard tones for the three SOA conditions is shown in the three rows (Panels A to C) [56]. Panel D shows the group-averaged waveforms overplotted for each of the SOAs. Note that the displayed AEPs represent an average of activity from the fronto-central electrode chain (FC3, FCz and FC4), as illustrated in Panel E. In Panel A (the 450 ms SOA), one can readily appreciate the general morphology of the AEP and the relative consistency across individuals in the TD group (left column), with a clear P1 in the initial response (at 50-100 ms; blue shaded timeframe), followed by a second, smaller positive deflection (P2, ~150 ms) and then by a longer-latency negativity (200-300 ms: N2). The dark green trace at the bottom of Panel A shows the group-averaged TD waveform, with the standard error of the mean also indicated. Despite the general consistency of the individual participant waveforms in Panel A, it can also be appreciated that even in this TD cohort, there is a high degree of inter-participant variability. This is undoubtedly enhanced by the wide age range of our cohort, since the morphology of the AEP changes over the course of development. In the right column of Panel A, the same overplotting has been conducted for participants in the RTT cohort. Here, the individual traces are considerably more divergent from each other, and this is reflected in the substantially reduced amplitude of the group-averaged RTT waveform (red trace, bottom right of Panel A), where only a highly reduced P1 component is evident. Nonetheless, there are individuals in the RTT cohort who produce waveforms with large-amplitude positive and negative deflections that may reflect preserved auditory processing, albeit with different temporal dynamics from those seen in TD participants. Similar patterns are also evident in Panels B and C at the two slower presentation rates (900 ms SOA, Panel B; 1800 ms SOA, Panel C). As the rate of presentation is slowed, the P2 positivity emerges more clearly in the TD cohort, whereas this is not the case in the RTT cohort. At both of these slower rates, AEP responses in the TD cohort are clearly more consistent across individuals than those in the RTT cohort.
Figure 2 shows data from four randomly selected, age-matched participants from each cohort to illustrate one of the central points of the current work. Panels A through D show data from children in four different age brackets (6-7, 8-9, 10-12 and 14-16 years, respectively). Each of the four control participants produces highly replicable AEPs across all SOA conditions: there is a high degree of within-participant consistency across conditions, but also a high degree of between-participant consistency in terms of component timing and morphology, despite the relatively wide span of ages represented. In most cases, a positive deflection at about 100 ms (the P1) is followed by a broad negative deflection at about 200 ms (here referred to as N2); see grey and pink shading for the windows used for analyses of the P1 (50-100 ms) and N2 (200-300 ms), respectively. In the data from participants with RTT, AEP responses, albeit noisier (see the standard error of the mean (SEM) shading around the waveform traces), are also highly replicable across conditions within participants, whereas the timing and morphology of the responses are evidently not as consistent across RTT participants as they are across TDs.

Denoising Source Separation (DSS)
We applied the DSS technique to enhance signal components carrying evoked activity that is reproducible across trials, with an eye to enhancing the signal-to-noise ratio of the AEP in the RTT data. DSS achieves this by accentuating signals that are consistent across trials while suppressing noise-like components that are independent of stimulus timing [16,59].
Figure 3 displays ERP trials pre- and post-DSS for each group and SOA condition. Using the DSS-denoised data, we replicated the initial AEP analyses over the same fixed N2 time window (200-300 ms), averaged across the same channels (FC3, FCz, FC4), to explore any differences that might be revealed using DSS (i.e., Post-DSS data). AEP values for N2 pre- and post-DSS are shown in Fig. 4. Pre-DSS, TD individuals had significantly greater AEP N2 amplitudes compared to RTT (RTT: Mean ± sem = -0.17 […] overall reduction in ITPC for the RTT compared to the TD group, and ITPC was higher overall for the data post-DSS. Figure 6A shows the ITPC dynamics along the 2-s epochs for each time point and frequency sample (1-25 Hz). As seen in Fig. 6A, lower ITPC for RTT is seen across conditions around the time of stimulus onset. To quantify ITPC, we calculated it in the N2 window for each participant (see Fig. 6B, C). Pre-DSS, higher ITPC was observed for TD (Mean ± sem = 0.14 ± 0.002) than RTT (Mean ± sem = 0.103 ± 0.002; Wilcoxon rank test statistic: 5207; p = 1.37e-04). Post-DSS, this ITPC difference became stronger (TD: Mean ± sem = 0.26 ± 0.15; RTT: Mean ± sem = 0.15 ± 0.03; Wilcoxon rank test statistic = 5506; p = 8.82e-08) (see Fig. 6B and C for a group comparison and a DSS comparison, respectively).

Fig. 6 Inter-trial phase coherence (ITPC) for each group and condition, pre- and post-DSS. A Time-frequency ITPC plots show a reduction of coherence values in the RTT group across conditions following stimulus onset. The average ITPC across frequencies is overlaid in white on top of each plot. B Raincloud plots of ITPC peak values derived from each participant (as seen in white in A), for each condition, pre- and post-DSS. C The same ITPC values shown in B, now plotted within group for each condition and compared between pre- and post-DSS.

Discussion
Event-related potential recordings provide a simple, highly portable and relatively inexpensive means of directly and objectively recording neural processing outcomes from human subjects, even in patient populations where task compliance or following instructions is compromised or infeasible. Considerable work has now shown that auditory evoked potentials (AEPs) are highly disordered in Rett syndrome [3,12,20,33,53,57], and that these measures are correlated with clinical measures of disease severity [67]. As direct measures of neural function, therefore, these AEP measures hold much promise as neuromarkers against which the effectiveness of therapeutic interventions could be measured. It is a reasonable proposition that such measures are much closer to the site of action of pharmacological or gene therapy interventions and might be expected to show treatment-related changes much sooner than clinical observational measures that rely on changes in symptomatology or behavioral outcomes. These latter changes would be expected to emerge over relatively long timeframes, secondary to improvements in neural functioning. However, standard AEP/ERP signal-processing techniques, which typically involve averaging multiple responses across trials, introduce a significant risk of obscuring variability across individual neural responses, and could lead to overestimation of the extent of the processing deficit that is actually present in a given individual. Here, the fact that there were three AEP conditions (i.e., three different SOAs used in separate experimental blocks) allowed for a within-subject comparison across these conditions. What becomes clear upon simple visual inspection is that highly anomalous component morphology is common in RTT (see Figs. 1 and 2), such that if an experimenter were to observe just one of these averaged AEPs for a given RTT participant, the presence of a response might be questionable. However, it is also clear from visual inspection that such anomalous morphologies are generally consistent across all three conditions; that is, similar-appearing AEP responses are evident across conditions in most RTT participants. This is in contrast to the TD participants, where for the most part typical AEP morphology is observed. Put another way, in TD participants there is a "central tendency", whereas in RTT participants there is a tendency towards highly individualized responses. As mentioned in the Introduction, this has a significant impact on group-averaged comparisons. Whereas the central tendency of the AEP in TDs leads to a robust group-averaged waveform, the highly variable individual morphologies expressed in RTT will, by definition, lead to a weak group-averaged estimate. The highly idiosyncratic processing within the RTT group likely reflects disruption of typical processing along the auditory cortical processing hierarchy that does not manifest across patients in a stereotyped way.
Here, we set out to better understand this potential response variability by deploying a set of single-trial signal-processing tools, including denoising source separation, inter-trial variability estimation, inter-trial phase coherence measures, and estimates of signal-to-noise ratio. First, using standard canonical component-based ERP analysis, the current work replicated previous findings of strikingly atypical group-level AEPs in individuals with RTT, evident at each of the three stimulation rates used (Figs. 1 and 2) [12] and manifesting as substantially reduced AEP component amplitudes in RTT versus TD [57,67]. Next, taking advantage of the large number of trials recorded per condition in this study, we assessed inter-trial response variability to the auditory stimuli at the individual participant level. Across all metrics (signal-to-noise ratio (SNR), inter-trial variability (ITV; see Fig. 5), and inter-trial phase coherence (ITPC; Fig. 6)), response variability was significantly higher in RTT than in TD participants.
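The three single-trial reliability metrics can be sketched on simulated data as follows. The definitions are assumptions for illustration and may differ from those used in the study: ITPC is computed in the frequency domain as the length of the mean unit phase vector across trials, ITV as the standard deviation across trials of the single-trial mean amplitude in a peak window, and SNR as the RMS of the averaged response divided by the RMS of its pre-stimulus baseline. The "jittered" condition adds random latency jitter to an otherwise identical response, a hypothetical stand-in for neural unreliability; all function names, windows and amplitudes are illustrative.

```python
import numpy as np

def itpc_spectrum(trials, fs):
    """Frequency-domain ITPC: length of the mean unit phase vector across trials.

    trials: (n_trials, n_samples). Returns (freqs, itpc); 0 = random phase, 1 = perfectly locked.
    """
    spectra = np.fft.rfft(trials, axis=1)
    unit_vectors = spectra / (np.abs(spectra) + 1e-12)   # keep phase, discard amplitude
    freqs = np.fft.rfftfreq(trials.shape[1], d=1.0 / fs)
    return freqs, np.abs(unit_vectors.mean(axis=0))

def inter_trial_variability(trials, times, window):
    """SD (µV) across trials of each trial's mean amplitude inside `window` (seconds)."""
    mask = (times >= window[0]) & (times < window[1])
    return trials[:, mask].mean(axis=1).std(ddof=1)

def snr_rms(trials, times, baseline, response):
    """RMS of the trial-averaged response window divided by RMS of its baseline window."""
    erp = trials.mean(axis=0)

    def rms(window):
        mask = (times >= window[0]) & (times < window[1])
        return np.sqrt(np.mean(erp[mask] ** 2))

    return rms(response) / rms(baseline)

def simulate(rng, n_trials, jitter_s, times):
    """Hypothetical trials: a 5 µV Gaussian peak at 150 ms, uniform latency jitter
    of +/- jitter_s seconds, plus white noise with a 5 µV standard deviation."""
    latencies = 0.15 + rng.uniform(-jitter_s, jitter_s, size=n_trials)
    signal = 5.0 * np.exp(-0.5 * ((times[None, :] - latencies[:, None]) / 0.02) ** 2)
    return signal + rng.normal(0.0, 5.0, size=(n_trials, times.size))

fs = 500
times = np.arange(-100, 400) / fs        # -200 ms to 798 ms
post = times >= 0                        # post-stimulus samples used for ITPC
rng = np.random.default_rng(1)

results = {}
for label, jitter in [("reliable", 0.0), ("jittered", 0.1)]:
    trials = simulate(rng, 200, jitter, times)
    freqs, itpc = itpc_spectrum(trials[:, post], fs)
    band = (freqs >= 2) & (freqs <= 12)
    results[label] = {
        "snr": snr_rms(trials, times, baseline=(-0.2, 0.0), response=(0.0, 0.4)),
        "itv": inter_trial_variability(trials, times, window=(0.13, 0.17)),
        "itpc": itpc[band].mean(),
    }
    r = results[label]
    print(f"{label}: SNR={r['snr']:.2f}, ITV={r['itv']:.2f} µV, ITPC(2-12 Hz)={r['itpc']:.2f}")
```

In this toy setting, latency jitter alone lowers ITPC and SNR and raises ITV, even though the response amplitude on each individual trial is unchanged.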
Another possible confounding factor when comparing RTT, or indeed any clinical population, with TD controls using EEG measures is that neural responses may be obscured by excessive non-neural noise (e.g., movement artifacts, muscle tension or flexion noise, excessive eye movements or blinking, bruxism). To mitigate such influences, we also applied the denoising source separation (DSS) algorithm, a joint-decorrelation technique that suppresses the most prominent non-neural noise sources while preserving the activity of interest [15]. The previous set of analyses was then repeated on these DSS-enhanced signals. Clear improvements in the signal were observed post-DSS for both groups, but especially for the RTT group, such that the large between-group differences in N2 amplitude and in inter-trial variability (ITV) observed before denoising were no longer statistically detectable afterwards. However, post-DSS SNR remained significantly lower in the RTT group, and differences between RTT and TD in ITPC were even more robust following DSS. Taken together, these analyses make clear that non-neural noise very likely contributes to overestimation of the extent of AEP deficits in RTT, but that clear deficits remain detectable after denoising minimizes the contribution of non-neural noise to response estimates. This suggests that DSS should be part of any signal-processing pipeline designed to test neural information processing in individuals with rare diseases such as RTT.
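A minimal sketch of DSS as joint decorrelation is shown below. The data are whitened with respect to the covariance of all trials, then rotated so that components are ordered by how much of their power survives trial averaging (the "bias" toward evoked activity). Keeping only the top components and projecting back to sensor space suppresses sources that are not reproducible across trials. The function name `dss_denoise`, the simulated eight-channel data, the three strong artifact sources and the choice `n_keep=1` are hypothetical; this is a generic NumPy illustration of the technique, not the study's implementation.

```python
import numpy as np

def dss_denoise(trials, n_keep):
    """Minimal DSS (joint decorrelation) sketch, biased toward trial-locked activity.

    trials: array of shape (n_trials, n_channels, n_samples).
    Returns (denoised trials, unmixing matrix, per-component scores), where each
    score is the fraction of a component's power that survives trial averaging.
    """
    x = trials - trials.mean(axis=2, keepdims=True)          # remove per-trial channel offsets
    n_trials, _, n_t = x.shape
    c0 = np.einsum("kct,kdt->cd", x, x) / (n_trials * n_t)   # covariance of all data
    avg = x.mean(axis=0)
    c1 = avg @ avg.T / n_t                                   # covariance of the average (the bias)
    # Whiten with respect to c0, dropping numerically empty dimensions.
    evals0, evecs0 = np.linalg.eigh(c0)
    keep = evals0 > evals0.max() * 1e-9
    whiten = evecs0[:, keep] / np.sqrt(evals0[keep])
    # Rotate the whitened space to diagonalize c1, largest evoked ratio first.
    evals1, evecs1 = np.linalg.eigh(whiten.T @ c1 @ whiten)
    order = np.argsort(evals1)[::-1]
    unmix = whiten @ evecs1[:, order]
    scores = evals1[order]
    # unmix.T @ c0 @ unmix = I, so c0 @ unmix maps components back to sensors.
    mixing = c0 @ unmix
    sources = np.einsum("cm,kct->kmt", unmix[:, :n_keep], x)
    denoised = np.einsum("cm,kmt->kct", mixing[:, :n_keep], sources)
    return denoised, unmix, scores

# Hypothetical data: one evoked source plus three strong trial-varying artifact sources.
rng = np.random.default_rng(2)
n_trials, n_ch, n_t = 120, 8, 300
t = np.arange(n_t) / 300.0                                   # 1 s at 300 Hz
evoked = np.exp(-0.5 * ((t - 0.3) / 0.03) ** 2)
topo = rng.normal(size=n_ch)
noise_topos = rng.normal(size=(n_ch, 3))
trials = np.empty((n_trials, n_ch, n_t))
for k in range(n_trials):
    artifacts = noise_topos @ (5.0 * rng.normal(size=(3, n_t)))
    trials[k] = np.outer(topo, evoked) + artifacts + 0.5 * rng.normal(size=(n_ch, n_t))

denoised, unmix, scores = dss_denoise(trials, n_keep=1)
component_average = unmix[:, 0] @ trials.mean(axis=0)
r = np.corrcoef(component_average, evoked)[0, 1]
print("scores:", np.round(scores, 3))
print(f"|r| between top component average and true evoked waveform: {abs(r):.3f}")
```

How many components to retain is a judgment call, often guided by the score spectrum; in this simulation a single component has a far higher evoked-to-total power ratio than the rest.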
The pre- versus post-denoising SNR calculations are highly instructive in this regard, showing clear and rather dramatic effects of applying DSS. In the RTT group, SNR across conditions increased from 7.8 to 30.1, a roughly 3.9-fold increase. SNR also increased in the TD group, but more modestly (from 26.1 to 41.4, a roughly 1.6-fold increase). While SNR in RTT remained significantly lower than in TD, the pre-denoising difference was clearly overestimated and suggested a much greater deficit than is likely the case. Noise unrelated to the evoked response is therefore a major source of potentially confounding variance in group comparisons involving RTT individuals, and should be considered in all studies assessing differences between rare-disease clinical groups and neurotypical control populations. Similarly, AEP peak-voltage variability, assessed by ITV, improved post-DSS in both groups, but more so in RTT (a roughly 4.4-fold decrease, from 7.76 to 1.75 µV) than in TD (a roughly 2.3-fold decrease, from 3.50 to 1.54 µV). In this case, group differences in ITV were no longer statistically detectable post-DSS. Finally, DSS also significantly improved ITPC estimates in both populations. In the TD group, ITPC rose from 0.14 pre-DSS to 0.26 following denoising, whereas the improvement in the RTT group was more modest (0.10 to 0.15). Both pre- and post-DSS, the difference between TD and RTT participants was statistically robust.
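The fold changes discussed above follow directly from the reported pre- and post-DSS group means. The short script below recomputes them as the ratio of the larger to the smaller value; only the numbers come from the text, while the dictionary layout and the `fold_change` helper are illustrative.

```python
# Pre- and post-DSS group means as reported in the text: (pre, post).
reported = {
    ("SNR", "RTT"): (7.8, 30.1),
    ("SNR", "TD"): (26.1, 41.4),
    ("ITV (µV)", "RTT"): (7.76, 1.75),
    ("ITV (µV)", "TD"): (3.50, 1.54),
    ("ITPC", "RTT"): (0.10, 0.15),
    ("ITPC", "TD"): (0.14, 0.26),
}

def fold_change(pre, post):
    """Return (direction, factor), with factor = larger value / smaller value (>= 1)."""
    if post >= pre:
        return "increase", post / pre
    return "decrease", pre / post

folds = {key: fold_change(*values) for key, values in reported.items()}
for (metric, group), (direction, factor) in folds.items():
    print(f"{metric:9s} {group:3s}: {factor:.1f}-fold {direction}")
```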
Thus, while denoising substantially improves SNR in RTT and lowers estimates of inter-trial variability in both response amplitude and phase, substantial deficits remained in the RTT group for the SNR and ITPC measures, but not for ITV. In summary, while previous results showed robust evoked-response atypicalities using the standard component-based ERP approach, the present work suggests that adding measures of response variability can yield significant insight into putative dysfunction, and may well provide more sensitive biomarkers for assessing treatment effects on neural function. That ITPC is significantly lower in RTT, even post-DSS, provides at least partial support for a neural unreliability account of auditory processing deficits in this population. However, the lower SNR estimates and the idiosyncratic temporal evolution of the AEP also suggest that sensory processing is both attenuated and temporally disrupted, and that the differences between RTT and TD are not wholly accounted for by "unreliable" responsivity.

Study limitations
Auditory responses continue to mature with typical development [7,9,10], so our relatively wide participant age range (7 to 22 years) is a limitation. Furthermore, the number of usable RTT data sets was reduced from 25 to 17 because of excessively noisy EEG data and an insufficient number of accepted trials per condition. Developing better methods for capturing adequate EEG data in these difficult-to-test populations will be key, as a 68% success rate will not suffice if such measures are to be fully useful as outcomes in clinical trials. The limited RTT sample also lacked the statistical power needed to meaningfully examine mutation subtype in this cohort, nor could we consider potential differences between classic and atypical Rett phenotypes. Both distinctions will be of great interest as this work progresses. A further limitation, as can be seen in Table S2, is that fewer trials were accepted for Rett participants than for TD participants, likely because of non-neural noise such as oculomotor and muscle activity. While the origin of such noise cannot be determined from the EEG signal alone, future work using an integrated video-EEG monitoring system to directly assess the relationship between overt movement and EEG activity could address this limitation.

Conclusions
This study used in-depth analysis of auditory evoked response variability to assess the contribution of response variability (unreliability) to altered auditory processing in RTT. Using standard component-based ERP analysis, we replicated previous findings of atypical AEP morphologies and significantly reduced AEP amplitudes in Rett syndrome. Using metrics that specifically measure neuronal variability, we observed substantially increased inter-trial variability, lower signal-to-noise ratios, and reduced inter-trial phase coherence in the auditory responses of RTT participants, providing strong support for a "neural unreliability" account in this population. However, applying denoising source separation (DSS) painted a somewhat different picture, making clear that non-neural noise is a likely contributor to overestimation of the extent of auditory processing deficits in this population. Post-DSS, ITV was substantially reduced, so much so that the pre-DSS ITV differences between the RTT and TD populations were no longer detected. For SNR and ITPC, DSS improved these estimates in the RTT population, but robust differences between RTT and TD remained evident. This work strongly suggests that DSS techniques provide much better estimates of veridical sensory-perceptual processing abilities in rare-disease populations such as RTT, and in any other population in which a high degree of non-neural noise and high inter-individual variability are expected to be major contributors.

Fig. 1 Standard mean AEP (1 s epochs) for TD (left) and RTT (right) participants over fronto-central scalp (averaged over electrodes FC3, FCz, FC4). Panels (A-C) show colored traces representing, for each participant, the average of all trials in response to standard tones, together with the grand-average AEP (green for TD and red for RTT; black traces show the standard deviation) for each SOA condition. TD participants produced classic AEP waveforms, whereas the RTT group exhibited atypical responses with reduced AEP amplitude across SOAs. A clear initial peak (P1) within the 50 to 100 ms window (blue shaded panels) was present for all SOAs in both groups. The distributions of mean standard-tone amplitude, with quartiles, during this initial-peak window (50-100 ms) are plotted at the far right of panels (A-C) for TD (green) and RTT (red) across SOAs. Significant differences between groups are marked by an asterisk (450 ms SOA, p = 0.05; 900 ms SOA, p = 0.80; 1800 ms SOA, p = 0.04). Panel (D) shows the change in AEP morphology as a function of SOA in the control and RTT groups. Panel (E) illustrates the locations of the averaged fronto-central scalp electrodes that yielded the AEPs.