Jan Neerbek
A real-world data resource of complex sensitive sentences based on documents from the Monsanto trial Open
In this work we present a corpus for the evaluation of sensitive information detection approaches that addresses the need for real world sensitive information for empirical studies. Our sentence corpus contains different notions of complex…
Sensitive Information Detection: Recursive Neural Networks for Encoding Context Open
The amount of data for processing and categorization grows at an ever increasing rate. At the same time the demand for collaboration and transparency in organizations, government and businesses, drives the release of data from internal rep…
FCAST Open
FCAST, from Enron dataset. We use TREC Legal labels from domain experts. Dataset has 267366 sentences regarding Enron’s financial state. We use 40000 sentences for validation, 40000 for testing, and the rest for training. The ratio of sent…
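The split described above leaves the training size implicit ("the rest"). A minimal sketch of that arithmetic follows, using only the counts stated in the description; the function name `split_sizes` is a hypothetical helper introduced for illustration and is not part of the dataset or its tooling.

```python
def split_sizes(total, n_validation, n_test):
    """Return (train, validation, test) sizes when training takes the remainder."""
    n_train = total - n_validation - n_test
    if n_train < 0:
        raise ValueError("validation and test sizes exceed the total")
    return n_train, n_validation, n_test


# Counts taken from the FCAST description: 267366 sentences in total,
# 40000 for validation and 40000 for testing.
fcast_train, fcast_val, fcast_test = split_sizes(267366, 40000, 40000)
print(fcast_train)  # 187366
```

Under these figures, roughly 70% of the FCAST sentences remain for training.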
FAS Open
FAS, from Enron dataset. We use TREC Legal labels from domain experts. Dataset has 178266 sentences where Enron claims compliance with Financial Accounting Standards 3 . We use 27000 sentences for validation, 27000 for training. The ratio …
EDENCE Open
EDENCE: 167913 sentences discussing tampering with evidence, from the Enron dataset. We use TREC Legal labels from domain experts. We use 25000 sentences for validation and 25000 for testing. Approximately 23% are in class 1.
PPAY Open
PPAY, from the Enron dataset. We use TREC Legal labels from domain experts. The dataset has 134256 sentences about financial prepay transactions. We use 15000 sentences for validation and 15000 for testing. Approximately 13% are labeled class 1.
train.txt Open
test.txt Open
dev.txt Open