Author response: Temporally delayed linear modelling (TDLM) measures replay in both animals and humans
2021 · Open Access · DOI: https://doi.org/10.7554/elife.66917.sa2
Abstract

There are rich structures in off-task neural activity, which are hypothesized to reflect fundamental computations across a broad spectrum of cognitive functions. Here, we develop an analysis toolkit – temporal delayed linear modelling (TDLM) – for analysing such activity. TDLM is a domain-general method for finding neural sequences that respect a pre-specified transition graph. It combines nonlinear classification and linear temporal modelling to test for statistical regularities in sequences of task-related reactivations. TDLM is developed for non-invasive neuroimaging data and is designed to control for confounds and maximize sequence detection ability. Notably, as a linear framework, TDLM can be easily extended, without loss of generality, to capture rodent replay in electrophysiology, including in continuous spaces, and to address second-order inference questions, for example, temporally and spatially varying patterns of replay. We hope TDLM will advance a deeper understanding of neural computation and promote a richer convergence between animal and human neuroscience.

Introduction

Human neuroscience has made remarkable progress in detailing the relationship between the representations of different stimuli during task performance (Haxby et al., 2014; Kriegeskorte et al., 2008; Barron et al., 2016). At the same time, it is increasingly clear that resting, off-task brain activity is structurally rich (Smith et al., 2009; Tavor et al., 2016). An ability to study spontaneous activity with respect to task-related representations is important for understanding cognitive processes that extend beyond current sensation (Higgins et al., 2021). However, unlike the case for task-based activity, little attention has been given to techniques that can measure the representational content of resting brain activity in humans.

In contrast, the representational content of resting activity has been studied extensively in animal neuroscience. One seminal example is 'hippocampal replay' (Wilson and McNaughton, 1994; Skaggs and McNaughton, 1996; Louie and Wilson, 2001; Lee and Wilson, 2002): during sleep and quiet wakefulness, place cells in the hippocampus (which signal self-location during periods of activity) spontaneously recapitulate old, and explore new, trajectories through an environment. These internally generated sequences are hypothesized to reflect a fundamental feature of neural computation across tasks (Foster, 2017; Ólafsdóttir et al., 2018; Pfeiffer, 2020; Carr et al., 2011; Lisman et al., 2017).

Numerous methods have been proposed to analyse hippocampal replay (Davidson et al., 2009; Grosmark and Buzsáki, 2016; Maboudi et al., 2018). However, they are not domain general: each is tailored to specific needs, such as a particular task design, data modality, or research question (van der Meer et al., 2020; Tingley and Peyrache, 2020). Most commonly, these methods apply to invasive electrophysiological signals and aim to detect sequences on a linear track during spatial navigation tasks (Tingley and Peyrache, 2020). As a result, they cannot be directly adapted for analysing human resting activity collected using non-invasive neuroimaging techniques.
Furthermore, in rodent neuroscience, it is non-trivial to adapt these algorithms to even small changes in task (such as 2D foraging). This may be a limiting factor in taking replay analyses to more interesting and complex tasks, such as complex mazes (Rosenberg et al., 2021).

Here, we introduce temporal delayed linear modelling (TDLM), a domain-general analysis toolkit for characterizing the temporal structure of internally generated neural representations in rodent electrophysiology as well as human neuroimaging data. TDLM is inspired by existing replay detection methods (Skaggs and McNaughton, 1996; Davidson et al., 2009; Grosmark and Buzsáki, 2016), especially those that analyse populations of replay events (Grosmark and Buzsáki, 2016). It is built on the general linear modelling (GLM) framework and can therefore easily accommodate 'second-order' statistical questions (van der Meer et al., 2020), such as whether there is more forward than reverse replay, whether replay strength changes over time, or whether it differs between behavioural conditions. This type of question is ubiquitous in cognitive studies but is typically addressed ad hoc in other replay detection methods (van der Meer et al., 2020). In TDLM, such questions are treated naturally as linear contrasts of effects in a GLM.

Here, we show that TDLM is suited to measuring the average amount of replay across many events (i.e. replay strength) within a linear modelling framework. This makes it applicable to both rodent electrophysiology and human neuroimaging. Applying TDLM to non-invasive neuroimaging data in humans, we, and others, have shown that it is possible to measure the average sequenceness (propensity for replay) in spontaneous neural representations (Wimmer et al., 2020; Nour et al., 2021; Liu et al., 2019; Liu et al., 2021a). The results resemble key characteristics of rodent hippocampal replay and inform key computational principles of human cognition (Liu et al., 2019).

In the following sections, we first introduce the logic and mechanics of TDLM in detail, followed by a careful treatment of its statistical inference procedure. We test TDLM on both simulated data (see section 'Simulating MEG data') and real human MEG/EEG data (see section 'Human replay dataset'). We then turn to rodent electrophysiology, compare TDLM to existing rodent replay methods, and extend TDLM to work on a continuous state space. Lastly, we use our approach to re-analyse rodent electrophysiology data from Ólafsdóttir et al., 2016 (see section 'Rodent replay dataset') and show what TDLM uniquely offers compared to existing rodent replay methods.

To summarize, TDLM is a general and flexible tool for measuring neural sequences. It facilitates cross-species investigations by linking large-scale measurements in humans to single-neuron measurements in non-human species. It provides a powerful tool for revealing abstract cognitive processes that extend beyond sensory representation, potentially opening doors for new avenues of research in cognitive science.

Results

Temporal delayed linear modelling

Overview of TDLM

Our primary goal is to test for temporal structure of neural representations in humans. However, to facilitate cross-species investigation (Barron et al., 2021), we also want to extend this method to enable measurement of sequences in other species (e.g. rodents). Consequently, the sequence detection method has to be domain general. We chose to measure sequences in a decoded state space (e.g. posterior estimated locations in rodents [Grosmark and Buzsáki, 2016] or time courses of task-related reactivations in humans [Liu et al., 2019]), as this makes results from different data types comparable.
Ideally, a general sequence detection method should (1) uncover structural regularities in the reactivation of neural activity, (2) control for confounds that are not of interest, and (3) test whether this regularity conforms to a hypothesized structure. To achieve these goals, we developed the method under a GLM framework, and henceforth refer to it as temporal delayed linear modelling, that is, TDLM. Although TDLM operates on a decoded state space, it still needs to account for confounds inherent in the data from which the state space is decoded. Addressing these confounds is a main focus of TDLM.

The starting point of TDLM is a set of n time series, each corresponding to a decoded neural representation of a task variable of interest. This is what we call the state space, X, with dimensions of time by states. These time series could themselves be obtained in several ways, described in detail in a later section ('Getting the states'). The aim of TDLM is to identify task-related regularities in sequences of these representations.

Consider, for example, a task in which participants have been trained such that n = 4 distinct sensory objects (A, B, C, and D) appear in a consistent order: A→B→C→D (Figure 1a, b). If we are interested in replay of this sequence during subsequent resting periods (Figure 1c, d), we might want to ask statistical questions of the following form: 'Does the existence of a neural representation of A, at time T, predict the occurrence of a representation of B at time T + ∆t?', and similarly for B→C and C→D.

Figure 1 (with 1 supplement). Task design and illustration of temporal delayed linear modelling (TDLM). (a) Task design in both simulated and real MEG data. We assume there is one sequence, A→B→C→D, indicated by the four objects at the top. During the task, participants are shown the objects and asked to figure out the correct sequence for these objects while undergoing MEG scanning. A snapshot of MEG data is shown below: a matrix with dimensions of sensors by time. (b) The transitions of interest, with red and blue entries indicating transitions in the forward and backward direction, respectively. (c) The first step of TDLM is to construct decoding models of the states from task data, and (d) then transform the data (e.g. resting-state) from sensor space to the state space. TDLM works on the decoded state space throughout. (e) The second step of TDLM is to quantify the temporal structure of the decoded states using multiple linear regression. The first-level general linear model (GLM) results in a state × state regression coefficient matrix (the empirical transition matrix), β, at each time lag. (f) In the second-level GLM, this coefficient matrix is projected onto the hypothesized transition matrix (black entries) to give a single measure of sequenceness. Repeating this process over the time lags of interest generates sequenceness as a function of time lag (right panel). (g) The statistical significance of sequenceness is tested using a non-parametric state permutation test, by randomly shuffling the transition matrix of interest (in grey). To control for multiple comparisons, the permutation threshold is defined as the 95th percentile, across all shuffles, of the maximum value over all tested time lags.
(h) The second-level regressors Tauto, Tconst, TF, and TB are shown, together with two examples of the permuted transitions of interest, Tpermute (used to construct the permutation test).

In TDLM, we ask such questions using a two-step process. First, for each of the n² possible pairs of variables Xi and Xj, we find the linear relation between the Xi time series and the ∆t-shifted Xj time series. These n² relations comprise an empirical transition matrix, describing how likely each variable is to be succeeded, at a lag of ∆t, by each other variable (Figure 1e). Second, we linearly relate this empirical transition matrix to a task-related transition matrix of interest (Figure 1f). This produces a single number that characterizes the extent to which the neural data follow the transition matrix of interest, which we call 'sequenceness'. Finally, we repeat this entire process for all ∆t of interest, yielding a measure of sequenceness at each possible lag between variables, and submit this for statistical inference (Figure 1g).

Note that, for now, this approach decomposes a sequence (such as A→B→C→D) into its constituent transitions and sums the evidence for each transition. Therefore, it does not require that the transitions themselves are sequential: A→B and B→C could occur at unrelated times, so long as the within-pair time lag is the same. For interested readers, we address how to strengthen the inference by looking explicitly for longer sequences in Appendix 1: Multi-step sequences.

Constructing the empirical transition matrix

In order to find evidence for state-to-state transitions at some time lag ∆t, we could regress a time-lagged copy of one state, Xj, onto another, Xi (omitting the residual term ε in all linear equations):

(1) $X_j(t+\Delta t) = X_i(t)\,\beta_{ij}$

Instead, TDLM includes all states in the same regression model, for important reasons detailed in the section 'Moving to multiple linear regression':

(2) $X_j(t+\Delta t) = \sum_{k=1}^{n} X_k(t)\,\beta_{kj}$

In this equation, the values of all states Xk at time t are used in a single multilinear model to predict the value of the single state Xj at time t + ∆t. The regression described in Equation 2 is performed once for each Xj, and these equations can be arranged in matrix form as follows:

(3) $X_{\Delta t} = X\beta$

Each row of X is a time point and each of the n columns is a state. X∆t is the same matrix as X, but with the rows shifted forwards in time by ∆t. βij is an estimate of the influence of Xi(t) on Xj(t + ∆t), and β is an n × n matrix of weights, which we call the empirical transition matrix. To obtain β, we invert Equation 3 by ordinary least squares regression:

(4) $\beta = (X^{T}X)^{-1}X^{T}X_{\Delta t}$

This inversion can be repeated for each possible time lag (∆t = 1, 2, 3, …), resulting in a separate empirical transition matrix β at every time lag. We call this step the first-level sequence analysis.
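For concreteness, the first-level analysis of Equations 2–4 can be sketched in a few lines of Python. This is a minimal sketch rather than the published TDLM toolbox: the function name, the use of a pseudo-inverse, and the added constant column are illustrative choices.

```python
import numpy as np

def first_level_betas(X, max_lag):
    """Estimate an empirical transition matrix (Equation 4) at each time lag.

    X       : array of shape (n_timepoints, n_states), decoded state time courses.
    max_lag : largest time lag (in samples) to evaluate.
    Returns : array of shape (max_lag, n_states, n_states); entry [dt-1, i, j] is the
              estimated influence of state i at time t on state j at time t + dt.
    """
    n_time, n_states = X.shape
    betas = np.zeros((max_lag, n_states, n_states))
    for dt in range(1, max_lag + 1):
        design = np.column_stack([X[:-dt], np.ones(n_time - dt)])  # states at t, plus a constant
        outcome = X[dt:]                                            # states at t + dt
        # Ordinary least squares: beta = (X'X)^-1 X' X_dt (pinv for numerical stability).
        coef = np.linalg.pinv(design) @ outcome
        betas[dt - 1] = coef[:n_states]                             # drop the constant row
    return betas
```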
Testing the hypothesized transitions

The first-level sequence analysis assesses evidence for all possible state-to-state transitions. The next step in TDLM is to test for the strength of a particular hypothesized sequence, specified as a transition matrix, T. Therefore, we construct another GLM that relates T to the empirical transition matrix, β. We call this step the second-level sequence analysis:

(5) $\beta = \sum_{r=1}^{n_r} Z(r)\,T_r$

As noted above, β is the empirical transition matrix obtained from the first-level GLM. It has dimensions of n by n, where n is the number of states, and each entry of β reflects the unique contribution of state i to state j at a given time lag. Effectively, Equation 5 models this empirical transition matrix β as a weighted sum of prespecified template matrices, Tr. Here, n_r is the number of regressors included in the second-level GLM, and each scalar-valued Z(r) is the weight assigned to the r-th template matrix. In other words, the Tr constitute the regressors of the design matrix, each with a prespecified template structure, for example, Tauto, Tconst, TF, and TB (Figure 1h). TF and TB are transposes of each other (e.g. red and blue entries in Figure 1b), indicating transitions of interest in the forward and backward directions, respectively. In 1D physical space, TF and TB would be shifted diagonal matrices, with ones on the first upper and lower off-diagonals. Tconst is a constant matrix that models away the average of all transitions, ensuring that any weight on TF and TB reflects their unique contributions. Tauto is the identity matrix; it models self-transitions to control for autocorrelation (equivalently, we could simply omit the diagonal elements from the regression). Z is the vector of second-level regression weights, with dimensions of 1 by n_r. Each entry of Z reflects the strength of the hypothesized transitions in the empirical ones, that is, sequenceness.

Repeating the regression of Equation 5 at each time lag (Δt = 1, 2, 3, …) results in a time course of sequenceness as a function of time lag (e.g. the solid black line in Figure 1f). ZF and ZB are the forward and backward sequenceness, respectively (e.g. red and blue lines in Figure 1g). In many cases, ZF and ZB will be the final outputs of a TDLM analysis. However, it may sometimes also be useful to consider the quantity:

(6) $D = Z_F - Z_B$

D contrasts forward and backward sequences to give a measure that is positive if sequences occur mainly in the forward direction and negative if sequences occur mainly in the backward direction. This may be advantageous if, for example, ZF and ZB are correlated across subjects (due to factors such as subject engagement and measurement sensitivity). In this case, D may have lower cross-subject variance than either ZF or ZB, as the subtraction removes common variance.

Finally, to test for statistical significance, TDLM relies on a non-parametric permutation-based method. The null distribution is constructed by randomly shuffling the identities of the n states many times and re-calculating the second-level analysis for each shuffle (Figure 1g). This approach allows us to reject the null hypothesis that there is no relationship between the empirical transition matrix and the task-defined transitions of interest. Note that there are many incorrect ways to perform permutations, which permute factors that are not exchangeable under the null hypothesis and therefore lead to false positives. We examine some of these later with simulations and real data. In some cases, it may be desirable to test slightly different hypotheses by using a different set of permutations; this is discussed later.

If the time lag Δt at which neural sequences exist is not known a priori, then we must correct for multiple comparisons over all tested lags. This can be achieved by using the maximum ZF across all tested lags as the test statistic (see details in section 'Correcting for multiple comparisons'). If we choose this test statistic, then any values of ZF exceeding the 95th percentile of the null distribution can be treated as significant at α = 0.05 (e.g. the grey dotted line in Figure 1g).
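Continuing the sketch above, the second-level projection of Equation 5 and the state-identity permutation with a max-over-lags threshold could look roughly like this. The function names, the use of all state relabellings as the null set, and the random betas in the usage example are illustrative assumptions, not the toolbox's actual API.

```python
import numpy as np
from itertools import permutations

def second_level_sequenceness(betas, T_f):
    """Project empirical transition matrices onto template matrices (Equation 5).

    betas : (max_lag, n, n) first-level coefficient matrices.
    T_f   : (n, n) hypothesized forward transition matrix (1 for each transition of interest).
    Returns forward (Z_F) and backward (Z_B) sequenceness at every time lag.
    """
    n = T_f.shape[0]
    # Template regressors: forward, backward, self-transitions, and a constant.
    templates = np.stack([T_f, T_f.T, np.eye(n), np.ones((n, n))])
    design = templates.reshape(4, -1).T                       # (n*n, 4) design matrix
    Z = np.linalg.pinv(design) @ betas.reshape(len(betas), -1).T
    return Z[0], Z[1]                                          # Z_F and Z_B per lag

def state_permutation_threshold(betas, T_f, alpha=0.05):
    """Significance threshold from shuffling state identities (max statistic over lags)."""
    n = T_f.shape[0]
    null_max = []
    for perm in permutations(range(n)):                        # every relabelling of the states
        if perm == tuple(range(n)):
            continue                                           # exclude the unshuffled matrix
        T_perm = T_f[np.ix_(perm, perm)]
        z_f, _ = second_level_sequenceness(betas, T_perm)
        null_max.append(z_f.max())                             # max Z_F across all tested lags
    return np.quantile(null_max, 1 - alpha)

# Example usage for the A->B->C->D task (states indexed 0..3):
T_f = np.zeros((4, 4))
T_f[0, 1] = T_f[1, 2] = T_f[2, 3] = 1.0
betas = np.random.randn(60, 4, 4)          # stand-in for first-level output over 60 lags
Z_F, Z_B = second_level_sequenceness(betas, T_f)
threshold = state_permutation_threshold(betas, T_f)
```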
TDLM steps in detail

Getting the states

As described above, the input to TDLM is a set of time series of decoded neural representations, or states. Here, we provide different examples of specific state spaces (X, with dimensions of time by states) that we have worked with using TDLM.

States as sensory stimuli

The simplest case, perhaps, is to define a state in terms of a neural representation of a sensory stimulus, for example, a face or a house. To obtain the associated neural representations, we present these stimuli in a randomized order at the start of a task and record whole-brain neural activity using a non-invasive neuroimaging method, for example, magnetoencephalography (MEG) or electroencephalography (EEG). We then train a model to map the pattern of recorded neural activity to the presented image (Figure 1—figure supplement 1). This could be any of the multitude of available decoding models; for simplicity, we used a logistic regression model throughout. The states are thus defined in terms of neural activity, and the spatial patterns of stimulus decoding in the MEG signal were consistent with previous reports for the corresponding stimulus categories.

In MEG/EEG, neural activity is recorded by multiple sensors on the scalp, which capture whole-brain activity at high temporal resolution. To train a decoding model, we choose whole-brain sensor activity at a single time point (i.e. a spatial pattern) as the input data. Ideally, we would train at the time point where the neural activity can be decoded most reliably. If the state is defined by the sensory features of stimuli, this time point can be chosen as the one that maximizes the ability of a classifier to generalize to held-out data of the same type (see Appendix 2 for details); in effect, this asks whether a classifier trained on a sensory feature can be used to decode held-out data of the same stimuli (Figure 2b).

Figure 2 (with 1 supplement). Training decoding models for different state definitions. (a) Two abstract codes, each with two different sensory codes (left panel); the MEG/EEG data corresponding to each stimulus contain a representation of both its sensory and its abstract code (right panel). (b) Decoding models for sensory states, with the training time point chosen by cross-validated decoding of the neural activity. (c) Decoding models for abstract states, trained on the neural activity evoked by one sensory instance and tested on the other instance sharing the same abstract code; if neural activity contains both a sensory and an abstract code, the information that generalizes is the common abstract code. (d) States can also be defined by position in a sequence.

Having chosen the training time point, we train the decoding models on the sensor data at that time. Let the training data have dimensions of trials by sensors, and the labels dimensions of trials by states; the aim is to obtain a set of decoding weights mapping sensor patterns to states (via the logistic model), with regularization applied when estimating the weights (detailed in a later section). At testing time (e.g. during rest), we then project the data, with dimensions of time by sensors, from sensor space into the decoded state space, yielding X, the decoded state space, with dimensions of time by states.
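As an illustration of this decoding step, one could train a one-vs-rest logistic regression classifier per state on the stimulus-evoked sensor data and then apply it to resting data, roughly as below (using scikit-learn). The specific training time point, the L1 penalty, and the regularization strength C are placeholder choices for the sketch, not prescriptions from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_state_decoders(evoked, labels, C=0.01):
    """One-vs-rest logistic regression decoders, one per state.

    evoked : (n_trials, n_sensors) sensor data taken at the chosen training time point.
    labels : (n_trials,) integer state label for each trial.
    """
    decoders = []
    for state in np.unique(labels):
        clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
        clf.fit(evoked, (labels == state).astype(int))   # this state vs. everything else
        decoders.append(clf)
    return decoders

def decode_state_space(decoders, rest_data):
    """Project resting sensor data (time x sensors) into the state space X (time x states)."""
    return np.column_stack([clf.predict_proba(rest_data)[:, 1] for clf in decoders])
```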
States as abstractions

As well as sequences of sensory stimuli, it is possible to test for replay of more abstract neural representations. Such abstractions might be associated with the presented image (e.g. its category), in which case the analysis can proceed as above (Wimmer et al., 2020). A more interesting example is where the abstraction relates to the structure of the sequence or task itself. In physical space, for example, spatially tuned cells encode position in a way that generalizes over the sensory particulars of any one environment and can therefore be reused across environments. In human studies, abstract representations have been reported for the position an item occupies in a sequence (Liu et al., 2019); for example, different sequences can share a representation for their second item (Figure 2a), and these representations also replay (Liu et al., 2019). However, to measure this replay we need to train decoding models for the abstract representations themselves. This is challenging, as it is not possible to present an abstract code in isolation from a sensory exemplar; the key is to ensure that the trained classifiers are sensitive to the abstract rather than the sensory representations (see Appendix 2). Strategies for choosing the training time point for abstract representations include requiring classifiers to generalize across sensory exemplars and ensuring they are insensitive to the sensory representations themselves (Figure 2—figure supplement 1; details in Liu et al., 2019). One reassurance that sensory representations are not driving the result is if the structural representations can be shown to sequence before the subjects have experienced their sensory correlates (Liu et al., 2019).

TDLM can also be used to ask questions about the ordering of different types of replay events. This provides a powerful means of characterizing the temporal organization of replay, such as the temporal structure between different replayed sequences or the repeating pattern of the same sequence. This more advanced use of TDLM is discussed in the appendix on sequences of sequences.

Controlling confounds in sequence detection

Here, we describe the key features that allow TDLM to control for confounds in sequence detection.

Temporal autocorrelation

In linear methods, temporal autocorrelation can inflate statistical scores, and approaches such as autoregressive modelling are commonly used to counter such effects. However, autocorrelation is a particular problem for sequence analysis, where it interacts with correlations between the decoded neural representations. To see this, consider a situation where we are testing for the sequence Xi→Xj. TDLM is interested in the relationship between Xi at time t and Xj at time t + ∆t (see Equation 1). However, if the Xi and Xj time series are autocorrelated and are also correlated with one another, then Xj at time t + ∆t will be correlated with Xi at time t, and the analysis will report spurious sequences.

Correlations between states arise easily when the states are representations of stimuli decoded from neuroimaging data. If these states are decoded using an approach that requires exactly one state to be decoded at each moment, then the n states will be anti-correlated with each other. Conversely, if each state is decoded against a null state corresponding to the absence of stimuli, then the n states will typically be positively correlated with one another. Notably, these spurious relationships are symmetric between forward and backward directions, so one approach for dealing with them is to use the subtraction measure D described above (Equation 6). This works well in practice, but a limitation is that it prevents us from measuring forward and backward sequences separately. The remainder of this section describes an approach that allows separate measurement of forward and backward sequences.

Moving to multiple linear regression

The problems above arise because a simple pairwise regression estimates a linear relationship between Xi and Xj without accounting for other influences on Xj. If we knew the confounding signal, we could remove its influence by simply controlling for it in the regression; but we do not have access to the true underlying states, as these variables have been decoded, with error, from brain activity. Controlling for a noisy estimate of a confound does not remove its influence completely, so residual confounding can still load onto the transitions of interest and bias inference about sequences. We can, however, greatly reduce this problem by including co-regressors that are themselves correlated with the confound but estimated with different errors. The most natural choice is to include the decoded time courses of all the other states, each of which carries a different error, in the multiple linear regression of Equation 2. Applying this method to data with the same confound structure removes the false positives produced by the pairwise regression of Equation 1 (Figure 3a). Related approaches in the literature cannot estimate sequenceness separately in the forward and backward directions, but have to rely on their difference. The multiple regression accounts for the correlation structure of the data and allows correct inference; unlike the subtraction method proposed above (Figure 3a), it permits separate inference on forward and backward sequences.
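The intuition behind this move can be checked with a toy simulation: several decoded state time courses share a smooth background component (making them autocorrelated and mutually correlated) but contain no true sequence. In the sketch below, which uses entirely made-up signal parameters and is not the paper's simulation, the pairwise regression of Equation 1 yields an inflated lagged coefficient, while the multiple regression of Equation 2 soaks up most of the shared variance through the other states and shrinks that coefficient substantially.

```python
import numpy as np

rng = np.random.default_rng(0)
n_time, n_states, dt = 5000, 4, 5

# A smooth shared "background" makes all decoded states correlated and autocorrelated,
# even though no genuine state-to-state sequence is present.
background = np.convolve(rng.standard_normal(n_time), np.ones(50) / np.sqrt(50), mode="same")
X = background[:, None] + 0.5 * rng.standard_normal((n_time, n_states))

past, future = X[:-dt], X[dt:]
ones = np.ones((len(past), 1))

# Equation 1: regress state 1 at time t+dt on state 0 at time t alone.
beta_pairwise = np.linalg.lstsq(np.hstack([past[:, [0]], ones]), future[:, 1], rcond=None)[0][0]

# Equation 2: all states (plus a constant) compete to explain state 1 at time t+dt.
beta_multiple = np.linalg.lstsq(np.hstack([past, ones]), future[:, 1], rcond=None)[0][0]

print(f"pairwise beta (state 0 -> state 1):   {beta_pairwise:.3f}")
print(f"multiple-regression beta (same pair): {beta_multiple:.3f}")
```

Any residual confound that is symmetric between forward and backward transitions is then further absorbed at the second level by Tconst and, if needed, by the forward-minus-backward contrast D.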
Figure 3. Effects of state-to-state correlation and autocorrelation on temporal delayed linear modelling (TDLM). (a) Simple linear regression or cross-correlation approaches rely on a subtraction of forward and backward sequenceness (left panel), whereas TDLM relies on multiple linear regression and can separate forward and backward transitions (right panel). (b) Background neural dynamics, as during rest, can affect sequence detection; TDLM controls for this signal (right panel). (c) The spatial correlation between the sensor weights of the decoding models for each state affects sequence detection, indicating that reducing correlation between states is important for sequence analysis. (d) Null data are used to set the significance threshold for sequence detection.