Data exploration
Pavian: interactive analysis of metagenomics data for microbiome studies and pathogen identification
Summary: Pavian is a web application for exploring classification results from metagenomics experiments. With Pavian, researchers can analyze, visualize and transform results from various classifiers, such as Kraken, Centrifuge and MethaPhlA…
CZ CELLxGENE Discover: a single-cell data platform for scalable exploration, analysis and modeling of aggregated data
Hundreds of millions of single cells have been analyzed using high-throughput transcriptomic methods. The cumulative knowledge within these datasets provides an exciting opportunity for unlocking insights into health and disease at the lev…
CellProfiler Analyst: interactive data exploration, analysis and classification of large biological image sets
Summary: CellProfiler Analyst allows the exploration and visualization of image-based data, together with the classification of complex biological phenotypes, via an interactive user interface designed for biologists and data scientists. C…
SCImago Graphica: a new tool for exploring and visually communicating data
Despite the increasing number of data visualization authoring systems in recent years, it remains a challenge to simultaneously achieve high expressive power and ease of use in a single tool. In this paper we present SCImago Graphica, a no…
Futzing and Moseying: Interviews with Professional Data Analysts on Exploration Practices
We report the results of interviewing thirty professional data analysts working in a range of industrial, academic, and regulatory environments. This study focuses on participants' descriptions of exploratory activities and tool usage in t…
A cross-package Bioconductor workflow for analysing methylation array data
Methylation in the human genome is known to be associated with development and disease. The Illumina Infinium methylation arrays are by far the most common way to interrogate methylation across the human genome. This paper provides a Bioco…
Principal Component Analysis: A Natural Approach to Data Exploration
Principal component analysis (PCA) is often used for analyzing data in the most diverse areas. In this work, we report an integrated approach to several theoretical and practical aspects of PCA. We start by providing, in an intuitive and a…
giotto-tda: A Topological Data Analysis Toolkit for Machine Learning and Data Exploration
We introduce giotto-tda, a Python library that integrates high-performance topological data analysis with machine learning via a scikit-learn-compatible API and state-of-the-art C++ implementations. The library's ability to handle vario…
The State of the Art in Cartograms
Cartograms combine statistical and geographical information in thematic maps, where areas of geographical regions (e.g., countries, states) are scaled in proportion to some statistic (e.g., population, income). Cartograms make it possible …
MIRIA: A Mixed Reality Toolkit for the In-Situ Visualization and Analysis of Spatio-Temporal Interaction Data
In this paper, we present MIRIA, a Mixed Reality Interaction Analysis toolkit designed to support the in-situ visual analysis of user interaction in mixed reality and multi-display environments. So far, there are few options to effectively…
Finding Related Tables in Data Lakes for Interactive Data Science
Many modern data science applications build on data lakes, schema-agnostic repositories of data files and data products that offer limited organization and management capabilities. There is a need to build data lake search capabilities int…
AliTV—interactive visualization of whole genome comparisons
Whole genome alignments and comparative analysis are key methods in the quest of unraveling the dynamics of genome evolution. Interactive visualization and exploration of the generated alignments, annotations, and phylogenetic data are imp…
Controlling False Discoveries During Interactive Data Exploration
Recent tools for interactive data exploration significantly increase the chance that users make false discoveries. They allow users to (visually) examine many hypotheses and make inference with simple interactions, and thus incur the issue…
Software tools for visualizing Hi-C data
High-throughput assays for measuring the three-dimensional (3D) configuration of DNA have provided unprecedented insights into the relationship between DNA 3D configuration and function. Data interpretation from assays such as ChIA-PET and…
HiPiler: Visual Exploration of Large Genome Interaction Matrices with Interactive Small Multiples
This paper presents an interactive visualization interface, HiPiler, for the exploration and visualization of regions-of-interest in large genome interaction matrices. Genome interaction matrices approximate the physical distance of pairs of…
InsightPilot: An LLM-Empowered Automated Data Exploration System
Exploring data is crucial in data analysis, as it helps users understand and interpret the data more effectively. However, performing effective data exploration requires in-depth knowledge of the dataset, the user intent and expertise in d…
Glycowork: A Python package for glycan data science and machine learning
While glycans are crucial for biological processes, existing analysis modalities make it difficult for researchers with limited computational background to include these diverse carbohydrates into workflows. Here, we present glycowork, an …
A comprehensive review of tools for exploratory analysis of tabular industrial datasets
Exploratory data analysis plays a major role in obtaining insights from data. Over the last two decades, researchers have proposed several visual data exploration tools that can assist with each step of the analysis process. Nevertheless, …
Visualisation of Linked Data – Reprise
Goals, Process, and Challenges of Exploratory Data Analysis: An Interview Study
How do analysis goals and context affect exploratory data analysis (EDA)? To investigate this question, we conducted semi-structured interviews with 18 data analysts. We characterize common exploration goals: profiling (assessing data qual…
Taggle: Combining overview and details in tabular data visualizations
Most tabular data visualization techniques focus on overviews, yet many practical analysis tasks are concerned with investigating individual items of interest. At the same time, relating an item to the rest of a potentially large table is …
Interweaving Multimodal Interaction With Flexible Unit Visualizations for Data Exploration
Multimodal interfaces that combine direct manipulation and natural language have shown great promise for data visualization. Such multimodal interfaces allow people to stay in the flow of their visual exploration by leveraging the strength…
Methods for The Metagenomic Data Visualization and Analysis
Surveys of environmental microbial communities using metagenomic approach produce vast volumes of multidimensional data regarding the phylogenetic and functional composition of the microbiota. Faced with such complex data, a metagenomic re…
Mastering data visualization with Python: practical tips for researchers
Big data have revolutionized the way data are processed and used across all fields. In the past, research was primarily conducted with a focus on hypothesis confirmation using sample data. However, in the era of big data, this has shifted …
Dataset Discovery and Exploration: A Survey
Data scientists are tasked with obtaining insights from data. However, suitable data is often not immediately at hand, and there may be many potentially relevant datasets in a data lake or in open data repositories. As a result, data disco…
Aemoo: Linked Data exploration based on Knowledge Patterns
This paper presents a novel approach to Linked Data exploration that uses Encyclopedic Knowledge Patterns (EKPs) as relevance criteria for selecting, organising, and visualising knowledge. EKP are discovered by mining the linking structure…
Supporting Data Science in the Statistics Curriculum
This article describes a collaborative project across three institutions to develop, implement, and evaluate a series of tutorials and case studies that highlight fundamental tools of data science—such as visualization, data manipulation, …
mdciao: Accessible Analysis and Visualization of Molecular Dynamics Simulation Data
We present mdciao, an open-source command line tool and Python Application-Programmers-Interface (API) for easy, one-shot analysis and representation of molecular dynamics (MD) simulation data. Building upon the widely used concept of resi…
pandera: Statistical Data Validation of Pandas Dataframes
pandas is an essential tool in the data scientist's toolkit for modern data engineering, analysis, and modeling in the Python ecosystem. However, dataframes can often be difficult to reason about in terms of their data types and statisti…