Wang-Chiew Tan
Spatiotemporal prediction of obesity rates and model interpretability analysis from a public health perspective Open
This study, focusing on the assessment of obesity prevalence trends in public health management, proposes an improved Transformer model that integrates temporal embeddings with spatially-constrained feature dependencies rather than purely …
The Cambridge Report on Database Research Open
On October 19 and 20, 2023, the authors of this report convened in Cambridge, MA, to discuss the state of the database research field, its recent accomplishments and ongoing challenges, and future directions for research and community enga…
ESG Performance Prediction and Driver Factor Mining for Listed Companies Based on Machine Learning: A Multi-Source Heterogeneous Data Fusion Analysis Open
With the acceleration of global economic integration and the growing focus on sustainable development, Environmental, Social, and Governance (ESG) factors have become key standards for evaluating a company's long-term value and risk. Howev…
Diversity, Equity and Inclusion Activities in Database Conferences: A 2022 Report Open
The Diversity, Equity and Inclusion (DEI) initiative started as the Diversity/Inclusion initiative in 2020 [4]. The current report summarizes our activities in 2022. Our responsibility as a community is to ensure that attendees of DB confe…
TimelineQA: A Benchmark for Question Answering over Timelines Open
Lifelogs are descriptions of experiences that a person had during their life. Lifelogs are created by fusing data from the multitude of digital services, such as online photos, maps, shopping and content streaming services. Question answer…
Reimagining Retrieval Augmented Language Models for Answering Queries Open
We present a reality check on large language models and inspect the promise of retrieval augmented language models in comparison. Such language models are semi-parametric, where models integrate model parameters and knowledge from external…
Unstructured and structured data: Can we have the best of both worlds with large language models? Open
This paper presents an opinion on the potential of using large language models to query on both unstructured and structured data. It also outlines some research challenges related to the topic of building question-answering systems for bot…
Annotating Columns with Pre-trained Language Models Open
Inferring meta information about tables, such as column headers or relationships between columns, is an active research topic in data management as we find many tables are missing some of this information. In this paper, we study the pr…
Augmenting control arms with Real-World Data for cancer trials: Hybrid control arm methods and considerations Open
Randomized controlled trials (RCTs) are the gold standard for assessing drug safety and efficacy. However, RCTs have some drawbacks which have led to the use of single-arm studies to make certain internal drug development and regulatory de…
Constructing Explainable Opinion Graphs from Reviews Open
The Web is a major resource of both factual and subjective information. While there are significant efforts to organize factual information into knowledge bases, there is much less work on organizing opinions, which are abundant in subject…
Convex Aggregation for Opinion Summarization Open
Recent advances in text autoencoders have significantly improved the quality of the latent space, which enables models to generate grammatical and consistent text from aggregated latent vectors. As a successful application of this property…
Predictive case control designs for modification learning Open
Prediction models for clinical outcomes may be developed using a source dataset and additionally applied to new settings. Towards model external validation and model updating in the new setting, one procedure is model modification learning…
Deep or Simple Models for Semantic Tagging? It Depends on your Data [Experiments] Open
Semantic tagging, which has extensive applications in text mining, predicts whether a given piece of text conveys the meaning of a given semantic tag. The problem of semantic tagging is largely solved with supervised learning and today, de…
ExplainIt: Explainable Review Summarization with Opinion Causality Graphs Open
We present ExplainIt, a review summarization system centered around opinion explainability: the simple notion of high-level opinions (e.g., noisy room) being explainable by lower-level ones (e.g., loud fridge). ExplainIt utilizes a combinat…
Adaptive Rule Discovery for Labeling Text Data Open
Creating and collecting labeled data is one of the major bottlenecks in machine learning pipelines and the emergence of automated feature generation techniques such as deep learning, which typically requires a lot of training data, has fur…
OpinionDigest: A Simple Framework for Opinion Summarization Open
We present OpinionDigest, an abstractive opinion summarization framework, which does not rely on gold-standard summaries for training. The framework uses an Aspect-based Sentiment Analysis model to extract opinion phrases from reviews, and…
Enhancing Review Comprehension with Domain-Specific Commonsense Open
Review comprehension has played an increasingly important role in improving the quality of online services and products and commonsense knowledge can further enhance review comprehension. However, existing general-purpose commonsense knowl…
Emu: Enhancing Multilingual Sentence Embeddings with Semantic Specialization Open
We present Emu, a system that semantically enhances multilingual sentence embeddings. Our framework fine-tunes pre-trained multilingual sentence embeddings using two main components: a semantic classifier and a language discriminator. The …
Towards Productionizing Subjective Search Systems Open
Existing e-commerce search engines typically support search only over objective attributes, such as price and location, leaving the more desirable subjective attributes, such as romantic vibe and work-life balance, unsearchable. We found th…
Teddy: A System for Interactive Review Analysis Open
Reviews are integral to e-commerce services and products. They contain a wealth of information about the opinions and experiences of users, which can help better understand consumer decisions and improve user experience with products and s…
SubjQA: A Dataset for Subjectivity and Review Comprehension Open
Subjectivity is the expression of internal opinions or beliefs which cannot be objectively observed or verified, and has been shown to be important for sentiment analysis and word-sense disambiguation. Furthermore, subjectivity is an impor…
Sato: Contextual Semantic Type Detection in Tables Open
Detecting the semantic types of data columns in relational tables is important for various data preparation and information retrieval tasks such as data cleaning, schema matching, data discovery, and semantic search. However, existing dete…
Essentia: Mining Domain-Specific Paraphrases with Word-Alignment Graphs Open
Paraphrases are important linguistic resources for a wide variety of NLP applications. Many techniques for automatic paraphrase mining from general corpora have been proposed. While these techniques are successful at discovering generic pa…