Lukas Sönning
Reproducibility, replication, and preregistration Open
The replication crisis has raised our awareness that empirical findings may be much less robust than they seem; indeed, some highly influential psycholinguistic studies have seen failures to replicate. This shows that there are fundamental…
Down-sampling strategies in corpus phonology Open
Corpus-based work in segmental phonology is often forced to down-size the pool of relevant tokens to a manageable subset. The standard approach in corpus software is to select a random sample of observations. This is inefficient in corpus …
Case-control down-sampling in corpus research Open
When corpus researchers are forced to down-size their data, different down-sampling techniques can be used to increase the amount of information in the subset of corpus hits. In alternation studies, one strategy is the selection of tokens …
An analysis of bar chart usage in corpus data visualization Open
A recent survey of graph usage in corpus-based research has shown that the bar chart is the most widely used graph type for corpus data presentation. Motivated by this finding, the present paper offers a systematic review of bar chart usag…
Dispersion analysis Open
Dispersion analysis looks at how widely or evenly an item is distributed in a corpus. Indices express this information on a scale from 0 to 1, with higher values commonly denoting a more balanced distribution. Most measures rely on a divis…
Count regression models for keyness analysis Open
A wide variety of measures have been used in previous work to assess the keyness of items in a particular domain of language use. The present paper explores an approach to keyword analysis based on regression modeling. Specifically, we use…
Advancing our understanding of dispersion measures in corpus research Open
This paper offers a survey of recent corpus-based work, which shows that dispersion is typically measured across the text files in a corpus. Systematic insights into the behaviour of measures in such distributional settings are currently l…
Ordinal response scales: Psychometric grounding for design and analysis Open
Ordinal response scales are commonly used in applied linguistics. To summarize the distribution of ratings or judgments provided by informants, these are usually converted into numbers and then averaged or analyzed with ordinary regression…
Regression and random forests: Synergies for variationist corpus research Open
Logistic regression and random forests are widely used modeling approaches in corpus-based work. As most studies tend to focus on one form of analysis, this paper demonstrates how their strengths may be combined in variationist corpus rese…
Sensitivity of dispersion measures to distributional patterns and corpus design Open
While the purpose of dispersion measures is to quantify how evenly an item (or structure) is distributed in a corpus, recent work has shown that indices also respond to other features in the data: Juilland’s D varies systematically with th…
10 Years of TROLLing Open
The Tromsø Repository of Language and Linguistics (TROLLing) published its first dataset on June 13, 2014. Since then, the repository has grown to 173 datasets, each of which is available in open access and equipped with metadata explainin…
Latent-Variable Modelling of Ordinal Outcomes in Language Data Analysis Open
In empirical work, ordinal variables are typically analysed using means based on numeric scores assigned to categories. While this strategy has met with justified criticism in the methodological literature, it also generates simple and inf…
Latent-variable modeling of ordinal outcomes in language data analysis Open
In empirical work, ordinal variables are typically analyzed using means based on numeric scores assigned to categories. While this strategy has met with justified criticism in the methodological literature, it also generates simple and inf…
Advancing our understanding of dispersion measures in corpus research Open
While our inventory of dispersion measures is continuously growing, corpus linguists find little guidance for the choice among them. This paper offers a survey of recent corpus-based work, which shows that dispersion is typically measured …
Down-sampling from hierarchically structured corpus data Open
Resource constraints often require researchers to restrict their attention to a subset of the tokens returned by a corpus query. This paper sketches a methodology for down-sampling and offers a survey of current practices. The most prevale…
(Re-)viewing the Acquisition of Rhythm in the Light of L2 Phonological Theories Open
Previous work on non-native speech rhythm has often drawn on L2 phonological theory for the interpretation of findings. The explicit confrontation of theory-derived hypotheses with data remains scarce, however. This paper illustrates how a…
(Re-)viewing the acquisition of rhythm in the light of L2 phonological theories Open
Previous work on non-native speech rhythm has often drawn on L2 phonological theory for the interpretation of findings. The explicit confrontation of theory-derived hypotheses with data remains scarce, however. This paper illustrates how a…
Evaluation of keyness metrics: Reliability and interpretability Open
While keyword analysis has become an essential tool in corpus-based work, the question of how to quantify keyness has been subject to considerable methodological debate. This has given rise to a variety of computerized metrics for detectin…
Seeing the wood for the trees: Predictive margins for random forests Open
Recursive partitioning techniques such as classification trees and random forests offer a number of attractive features to corpus data analysts. However, the way in which these models are typically reported – a decision tree and/or set of …
Evaluation of text-level measures of lexical dispersion: Robustness and consistency Open
The traditional approach to measuring lexical dispersion is to form corpus parts of equal size and then compare the occurrence rate of an item across these units. In recent methodological work, this strategy has met with criticism due to i…
Drawing on principles of perception: The line plot Open
This paper draws attention to a display type that outperforms many competitors in the face of complexity: the line plot. Though familiar to most (if not all) linguists, this graph type is commonly associated with time series data. The true…
The replication crisis, scientific revolutions, and linguistics Open
Relevant issues also feature prominently in methodological handbooks (
Clear vs. dark /l/ in German Learner English: Dataset for chapter 5 in "Phonological variation in German Learner English" Open
This dataset contains tabular files with acoustic measurements for prevocalic and non-prevocalic laterals produced by n = 62 German learners of English and n = 26 native speakers of English (BrE and AmE). The German subjects are instructio…
Final voiced obstruents in German Learner English: Dataset for chapter 10 in "Phonological variation in German Learner English" Open
This dataset contains tabular files with (i) auditory data on final voiced obstruent (FVO) production by German learners and (ii) acoustic measurements of the preceding vowel duration by German learners and native speakers. The data origin…
The labiodental fricative /v/ in German Learner English: Dataset for chapter 8 in "Phonological variation in German Learner English" Open
This dataset contains tabular files with auditory classifications for onset /v/ produced by German learners of English. The data originate from two different studies. The data by Soenning (2020) include n = 62 speakers (instructional-setti…