Saber A. Akhondi
Learning Section Weights for Multi-Label Document Classification
Multi-label document classification is a traditional task in NLP. Compared to single-label classification, each document can be assigned multiple classes. This problem is crucially important in various domains, such as tagging scientific a…
One Strike, You're Out: Detecting Markush Structures in Low Signal-to-Noise Ratio Images
Modern research increasingly relies on automated methods to assist researchers. An example of this is Optical Chemical Structure Recognition (OCSR), which aids chemists in retrieving information about chemicals from large amounts of docume…
Stress Testing BERT Anaphora Resolution Models for Reaction Extraction in Chemical Patents
The high volume of published chemical patents and the importance of a timely acquisition of their information give rise to automating information extraction from chemical patents. Anaphora resolution is an important component of comprehen…
ChemTables: a dataset for semantic classification on tables in chemical patents
Chemical patents are a commonly used channel for disclosing novel compounds and reactions, and hence represent important resources for chemical and pharmaceutical research. Key chemical data in patents is often presented in tables. Both th…
Chemtables: A Dataset for Semantic Classification on Tables in Chemical Patents
Chemical patents are a commonly used channel for disclosing novel compounds and reactions, and hence represent important resources for chemical and pharmaceutical research. Key chemical data in patents is often presented in tables. Both th…
ChEMU 2020: Natural Language Processing Methods Are Effective for Information Extraction From Chemical Patents
Chemical patents represent a valuable source of information about new chemical compounds, which is critical to the drug discovery process. Automated information extraction over chemical patents is, however, a challenging task due to the la…
ChEMU-Ref dataset for Modeling Anaphora Resolution in the Chemical Domain
In biochemistry, chemical compounds play an important role in pharmaceutical research and can help to save many lives from severe diseases. For chemical compound analysis, the discovery of compounds is usually first presented in chemical p…
ChEMU-Ref: A Corpus for Modeling Anaphora Resolution in the Chemical Domain
Biaoyan Fang, Christian Druckenbrodt, Saber A Akhondi, Jiayuan He, Timothy Baldwin, Karin Verspoor. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. 2021.
ChemTables: A Dataset for Semantic Classification of Tables in Chemical Patents
Chemical patents are a commonly used channel for disclosing novel compounds and reactions, and hence represent important resources for chemical and pharmaceutical research. Key chemical data in patents is often presented in tables. Both th…
CORA: A Deep Active Learning Covid-19 Relevancy Algorithm to Identify Core Scientific Articles
Ever since the COVID-19 pandemic broke out, the academic and scientific research community, as well as industry and governments around the world have joined forces in an unprecedented manner to fight the threat. Clinicians, biologists, che…
ChemTables Sample: dataset for table classification in chemical patents
Chemical patents are a commonly used channel for disclosing novel compounds and reactions, and hence represent important resources for chemical and pharmaceutical research. Key chemical data in patents is often presented in tables. Both th…
Word Embeddings for Chemical Patent Natural Language Processing
We evaluate chemical patent word embeddings against known biomedical embeddings and show that they outperform the latter extrinsically and intrinsically. We also show that using contextualized embeddings can induce predictive models of rea…
An extended overview of the CLEF 2020 ChEMU Lab: information extraction of chemical reactions from patents
The discovery of new chemical compounds is perceived as a key driver of the chemistry industry and many other economic sectors. The information about the new discoveries is usually disclosed in scientific literature and in particular, in c…
ChEMU dataset for information extraction from chemical patents
The discovery of new chemical compounds and their synthesis process is of great importance to the chemical industry. Patent documents contain critical and timely information about newly discovered chemical compounds, providing a rich resou…
Covid-19 Relevancy Algorithm Data Set for Identification of Core Scientific Articles
CORA Data Set
CORA: A Deep Active Learning Covid-19 Relevancy Algorithm to Identify Core Scientific Articles
Zubair Afzal, Vikrant Yadav, Olga Fedorova, Vaishnavi Kandala, Janneke van de Loo, Saber A. Akhondi, Pascal Coupet, George Tsatsaronis. Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020. 2020.
Detecting Chemical Reactions in Patents
Extracting chemical reactions from patents is a crucial task for chemists working on chemical exploration. In this paper we introduce the novel task of detecting the textual spans that describe or refer to chemical reactions within patents…
Automatic identification of relevant chemical compounds from patents. The training corpus.
Background: In commercial research and development projects, public disclosure of new chemical compounds often takes place in patents. Only a small proportion of these compounds are published in journals, usually a few years after the paten…
Automatic identification of relevant chemical compounds from patents
In commercial research and development projects, public disclosure of new chemical compounds often takes place in patents. Only a small proportion of these compounds are published in journals, usually a few years after the patent. Patent a…
Improving Chemical Named Entity Recognition in Patents with Contextualized Word Embeddings
Chemical patents are an important resource for chemical information. However, few chemical Named Entity Recognition (NER) systems have been evaluated on patent documents, due in part to their structural and linguistic complexity. In this p…
Text Mining for Chemical Compounds
Exploring the chemical and biological space covered by patent and journal publications is crucial in early-stage medicinal chemistry activities. The analysis provides understanding of compound prior art, novelty checking, validation of bi…
The biomedical abbreviation recognition and resolution (BARR) track: Benchmarking, evaluation and importance of abbreviation recognition systems applied to Spanish biomedical abstracts
Healthcare professionals are generating a substantial volume of clinical data in narrative form. As healthcare providers are confronted with serious time constraints, they frequently use telegraphic phrases, domain-specific abbreviations a…
Chemical entity recognition in patents by combining dictionary-based and statistical approaches
We describe the development of a chemical entity recognition system and its application in the CHEMDNER-patent track of BioCreative 2015. This community challenge includes a Chemical Entity Mention in Patents (CEMP) recognition task and a …
Extraction of chemical-induced diseases using prior knowledge and textual information
We describe our approach to the chemical-disease relation (CDR) task in the BioCreative V challenge. The CDR task consists of two subtasks: automatic disease-named entity recognition and normalization (DNER), and extraction of chemical-ind…
Erasmus MC at CLEF eHealth 2016: Concept recognition and coding in French texts
We participated in task 2 of the CLEF eHealth 2016 challenge. Two subtasks were addressed: entity recognition and normalization in a corpus of French drug labels and Medline titles, and ICD-10 coding of French death certificates. For both…