Data transformation
A review: Data pre-processing and data augmentation techniques
This review paper provides an overview of data pre-processing in Machine learning, focusing on all types of problems while building the machine learning problems. It deals with two significant issues in the pre-processing process (i). issu…
Improving your data transformations: Applying the Box-Cox transformation
Many of us in the social sciences deal with data that do not conform to assumptions of normality and/or homoscedasticity/homogeneity of variance. Some research has shown that parametric tests (e.g., multiple regression, ANOVA) can be robus…
Seriously misleading results using inverse of Freeman‐Tukey double arcsine transformation in meta‐analysis of single proportions
Standard generic inverse variance methods for the combination of single proportions are based on transformed proportions using the logit, arcsine, and Freeman‐Tukey double arcsine transformations. Generalized linear mixed models are anothe…
A Review on Data Preprocessing Techniques Toward Efficient and Reliable Knowledge Discovery From Building Operational Data
The rapid development in data science and the increasing availability of building operational data have provided great opportunities for developing data-driven solutions for intelligent building energy management. Data preprocessing serves…
Understanding sequencing data as compositions: an outlook and review
Motivation: Although seldom acknowledged explicitly, count data generated by sequencing platforms exist as compositions for which the abundance of each component (e.g. gene or transcript) is only coherently interpretable relative to other c…
Non-targeted UHPLC-MS metabolomic data processing methods: a comparative investigation of normalisation, missing value imputation, transformation and scaling
The appropriate choice of normalisation, missing value imputation, transformation and scaling methods differs depending on the data analysis method and the choice of method is essential to maximise the biological derivations from UHPLC-MS …
Compositional Data Analysis
Compositional data are nonnegative data carrying relative, rather than absolute, information—these are often data with a constant-sum constraint on the sample values, for example, proportions or percentages summing to 1 or 100%, respectiv…
Impact of Big Data and Machine Learning on Digital Transformation in Marketing: A Literature Review
This paper describes the impact of big data and machine learning (ML) on digital transformation of the marketing industry and the challenges it faces from a data and information management perspective. To do this, the study identified area…
Data transformation: a focus on the interpretation
Several assumptions such as normality, linear relationship, and homoscedasticity are frequently required in parametric statistical analysis methods. Data collected from the clinical situation or experiments often violate these assumptions.…
Linnorm: improved statistical analysis for single cell RNA-seq expression data
Linnorm is a novel normalization and transformation method for the analysis of single cell RNA sequencing (scRNA-seq) data. Linnorm is developed to remove technical noises and simultaneously preserve biological variations in scRNA-seq data…
Evaluating Functional Diversity: Missing Trait Data and the Importance of Species Abundance Structure and Data Transformation
Functional diversity (FD) is an important component of biodiversity that quantifies the difference in functional traits between organisms. However, FD studies are often limited by the availability of trait data and FD indices are sensitive…
Wrex: A Unified Programming-by-Example Interaction for Synthesizing Readable Code for Data Scientists
Data wrangling is a difficult and time-consuming activity in computational notebooks, and existing wrangling tools do not fit the exploratory workflow for data scientists in these environments. We propose a unified interaction model based …
Foofah
Data transformation is a critical first step in modern data analysis: before any analysis can be done, data from a variety of sources must be wrangled into a uniform format that is amenable to the intended analysis and analytical software …
Compositional Data Analysis of Microbiome and Any-Omics Datasets: A Validation of the Additive Logratio Transformation
Microbiome and omics datasets are, by their intrinsic biological nature, of high dimensionality, characterized by counts of large numbers of components (microbial genes, operational taxonomic units, RNA transcripts, etc.). These data are g…
Cross-platform normalization of microarray and RNA-seq data for machine learning applications
Large, publicly available gene expression datasets are often analyzed with the aid of machine learning algorithms. Although RNA-seq is increasingly the technology of choice, a wealth of expression data already exist in the form of microarr…
Count data in biology—Data transformation or model reformation?
Statistical analyses are an integral component of scientific research, and for decades, biologists have applied transformations to data to meet the normal error assumptions for F and t tests. Over the years, there has been a movement from …
Interoperability and FAIRness through a novel combination of Web technologies
Data in the life sciences are extremely diverse and are stored in a broad spectrum of repositories ranging from those designed for particular data types (such as KEGG for pathway data or UniProt for protein data) to those that are general-…
An Incremental Dimensionality Reduction Method for Visualizing Streaming Multidimensional Data
Dimensionality reduction (DR) methods are commonly used for analyzing and visualizing multidimensional data. However, when data is a live streaming feed, conventional DR methods cannot be directly used because of their computational comple…
Incrementally Transforming Electronic Medical Records into the Observational Medical Outcomes Partnership Common Data Model: A Multidimensional Quality Assurance Approach
Background: The development and adoption of health care common data models (CDMs) has addressed some of the logistical challenges of performing research on data generated from disparate health care systems by standardizing data representati…
Why You Cannot Transform Your Way out of Trouble for Small Counts
Summary: While data transformation is a common strategy to satisfy linear modeling assumptions, a theoretical result is used to show that transformation cannot reasonably be expected to stabilize variances for small counts. Under broad assu…
Metadata Extraction and Management in Data Lakes With GEMMS
In addition to volume and velocity, Big data is also characterized by its variety. Variety in structure and semantics requires new integration approaches which can resolve the integration challenges also for large volumes of data. Data lak…
Deep Learning with Data Transformation and Factor Analysis for Student Performance Prediction
Student performance prediction is one of the most concerning issues in the field of education and training, especially educational data mining. The prediction supports students to select courses and design appropriate study plans for thems…
Reshaping and aggregating data: an introduction to reshape package.
It is common that data format extracted from clinical database does not meet the purpose of statistical analysis. In clinical research, variables are frequently measured repeatedly over the follow-up period. Such data can be displayed eith…
Automatic Transformation of Data Warehouse Schema to NoSQL Data Base: Comparative Study
Driven by the ever-growing volume of data from social networks (SN), data warehouse (DW) approaches must be adapted. Generally, the star, snowflake or constellation models are used as logical ones. All these models are inadequate when dealing with …
Handling Skewed Data: A Comparison of Two Popular Methods
Scientists in biomedical and psychosocial research need to deal with skewed data all the time. In the case of comparing means from two groups, the log transformation is commonly used as a traditional technique to normalize skewed data befo…
ImageGP 2 for enhanced data visualization and reproducible analysis in biomedical research
ImageGP is an extensively utilized, open‐access platform for online data visualization and analysis. Over the past 7 years, it has catered to more than 700,000 usages globally, garnering substantial user feedback. The updated version, Imag…
Data Preprocessing: The Techniques for Preparing Clean and Quality Data for Data Analytics Process
The model and pattern for real-time data mining have an important role in decision making. Meaningful real-time data mining depends fundamentally on the quality of the data, while only raw or rough data are available at the warehouse. The data available…
An empirical study on measurement of efficiency of digital transformation by using data envelopment analysis
Nowadays digitalization is an important topic for businesses and government agencies. There are important reports published about digitalization or digital transformation. This study aims to measure the relative efficiency of digital tra…
Automatic robust Box–Cox and extended Yeo–Johnson transformations in regression
The paper introduces an automatic procedure for the parametric transformation of the response in regression models to approximate normality. We consider the Box–Cox transformation and its generalization to the extended Yeo–Johnson transfor…
Healthcare data security and privacy in Data Warehouse architectures
Data Warehouse (DW) is a common term used in the Data Mining process for storing copious amounts of data ready for analysis. Organizations are starting to prioritize Data Warehouses, which are essential for mining their historical datasets…