Explanipedia

A real-world data resource of complex sensitive sentences based on documents from the Monsanto trial Open

Jan Neerbek, M. R. Eskildsen, Peter Dolog, Ira Assent, Nicoletta Calzolari, et al. · 2024

Computer science

In this work we present a corpus for the evaluation of sensitive information detection approaches that addresses the need for real world sensitive information for empirical studies. Our sentence corpus contains different notions of complex…

Sensitive Information Detection: Recursive Neural Networks for Encoding Context Open

Jan Neerbek · 2020

Computer science Biology

The amount of data for processing and categorization grows at an ever increasing rate. At the same time the demand for collaboration and transparency in organizations, government and businesses, drives the release of data from internal rep…

FCAST Open

Jan Neerbek · 2019

Computer science

FCAST, from Enron dataset. We use TREC Legal labels from domain experts. Dataset has 267366 sentences regarding Enron’s financial state. We use 40000 sentences for validation, 40000 for testing, and the rest for training. The ratio of sent…

FAS Open

Jan Neerbek · 2019

Chemistry

FAS, from Enron dataset. We use TREC Legal labels from domain experts. Dataset has 178266 sentences where Enron claims compliance with Financial Accounting Standards 3. We use 27000 sentences for validation, 27000 for training. The ratio …

EDENCE Open

Jan Neerbek · 2019

Computer science

EDENCE: 167913 sentences discussing tampering with evidence from Enron dataset. We use TREC Legal labels from domain experts. Dataset has 25000 sentences for validation, 25000 for testing. Approximately 23% in class 1.

PPAY Open

Jan Neerbek · 2019

Computer science

PPAY, from Enron dataset. We use TREC Legal labels from domain experts. Dataset has 134256 sentences about financial prepay transactions. We use 15000 sentences for validation, 15000 for testing. Approximately 13% labeled class 1.

train.txt Open

Jan Neerbek · 2019

Computer science

:unav

test.txt Open

Jan Neerbek · 2019

Geology

:unav

dev.txt Open

Jan Neerbek · 2019

Computer science

:unav

dev.txt Open

Jan Neerbek · 2019

Geography

:unav

test.txt Open

Jan Neerbek · 2019

Geology

:unav

dev.txt Open

Jan Neerbek · 2019

Computer science

:unav

dev.txt Open

Jan Neerbek · 2019

Computer science

:unav

test.txt Open

Jan Neerbek · 2019

Geology

:unav

train.txt Open

Jan Neerbek · 2019

Computer science

:unav

train.txt Open

Jan Neerbek · 2019

Computer science

:unav

test.txt Open

Jan Neerbek · 2019

Mathematics Biology

:unav

train.txt Open

Jan Neerbek · 2019

Computer science

:unav
