ProteinBERT Trained Model
2023 · Open Access · DOI: https://doi.org/10.7910/dvn/hi55j5 · OpenAlex: W4398262111
Trained ProteinBERT model weights for the paper "ProteinBERT: A universal deep-learning model of protein sequence and function". Code: https://github.com/nadavbra/protein_bert The weights are also available via FTP: ftp://ftp.cs.huji.ac.il/users/nadavb/protein_bert/epoch_92400_sample_23500000.pkl

ProteinBERT is a protein language model pretrained on ~106M proteins from UniRef90. The pretrained model can be fine-tuned on any protein-related task in a matter of minutes and achieves state-of-the-art performance on a wide range of benchmarks. ProteinBERT is built on Keras/TensorFlow.

ProteinBERT's deep-learning architecture is inspired by BERT, but contains several innovations, such as global-attention layers whose cost grows linearly with sequence length (compared to self-attention's quadratic growth). As a result, the model can process protein sequences of almost any length, including extremely long sequences of tens of thousands of amino acids. The model takes protein sequences as input, and can also take protein GO annotations as additional input, which helps it infer the function of the input protein and update its internal representations and outputs accordingly. Usage sketches for loading the weights, fine-tuning, and extracting embeddings are given below.

This pretrained TensorFlow/Keras model was produced by training for 28 days over ~670M records (~6.4 epochs over the entire UniRef90 training dataset of ~106M proteins).
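A minimal sketch of loading these pretrained weights with the proteinbert package from the linked repository. The call names (load_pretrained_model, create_model) follow the package's README; treat the exact keyword arguments as assumptions and check the repository for the current API.

```python
from proteinbert import load_pretrained_model

# load_pretrained_model() fetches the default weights dump on first use
# (epoch_92400_sample_23500000.pkl, the same file offered via FTP above);
# the package also accepts arguments pointing at a local copy of the dump.
pretrained_model_generator, input_encoder = load_pretrained_model()

# Build a concrete Keras model for sequences of up to 512 tokens
# (the global-attention layers allow much larger seq_len values as well).
model = pretrained_model_generator.create_model(seq_len=512)
model.summary()
```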
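A hedged fine-tuning sketch for a binary whole-sequence classification task, following the pattern shown in the repository's README. The sequences, labels, and hyperparameter values below are illustrative placeholders, not part of this record.

```python
from proteinbert import (OutputType, OutputSpec, FinetuningModelGenerator,
                         load_pretrained_model, finetune)

pretrained_model_generator, input_encoder = load_pretrained_model()

# OutputType(False, 'binary'): a single binary label per protein
# (the first flag selects per-residue vs. whole-sequence outputs).
output_spec = OutputSpec(OutputType(False, 'binary'), [0, 1])
model_generator = FinetuningModelGenerator(pretrained_model_generator, output_spec,
                                           dropout_rate=0.5)

# Hypothetical toy dataset: amino-acid strings with 0/1 labels.
train_seqs = ['MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ',
              'MSILVTRPSPAGEELVSRLRTLGQVAWHFPLIEF']
train_labels = [0, 1]
valid_seqs = ['MKKLVLSLSLVLAFSSATAAF']
valid_labels = [0]

# Stage 1 trains new layers with the pretrained backbone frozen,
# then unfreezes everything; epochs/lr here are illustrative only.
finetune(model_generator, input_encoder, output_spec,
         train_seqs, train_labels, valid_seqs, valid_labels,
         seq_len=512, batch_size=32, max_epochs_per_stage=5, lr=1e-4,
         begin_with_frozen_pretrained_layers=True)
```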
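To use the model as a feature extractor rather than fine-tuning it, the README suggests wrapping the pretrained model so its hidden layers become outputs. A sketch under that assumption (encode_X and get_model_with_hidden_layers_as_outputs are taken from the package; the example sequence is hypothetical):

```python
from proteinbert import load_pretrained_model
from proteinbert.conv_and_global_attention_model import get_model_with_hidden_layers_as_outputs

pretrained_model_generator, input_encoder = load_pretrained_model()

seqs = ['MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ']  # hypothetical example sequence
seq_len = 512  # must cover the longest input sequence plus special tokens

# Encode sequences into the model's input tensors (tokens + annotations).
X = input_encoder.encode_X(seqs, seq_len)
model = get_model_with_hidden_layers_as_outputs(
    pretrained_model_generator.create_model(seq_len))

# local_representations: per-residue embeddings, shape (n_seqs, seq_len, d_local)
# global_representations: whole-protein embeddings, shape (n_seqs, d_global)
local_representations, global_representations = model.predict(X, batch_size=1)
```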