Unique and exclusive peptide signatures directly identify intrinsically disordered proteins from sequences without structural information
2020 · Open Access · DOI: https://doi.org/10.6084/m9.figshare.12135996
Intrinsically disordered proteins (IDPs) are now widely accepted to play crucial roles in biological function. Identifying signatures of intrinsic disorder is a key step towards cataloguing where IDPs occur in proteomes. In this work, we systematically generated in silico a library of all 3368400 possible dipeptides, tripeptides, tetrapeptides and pentapeptides built from the 20 natural amino acids. This allowed us to identify 36 unique tetrapeptides that occur exclusively in IDPs and are absent from the complete primary sequence space of naturally occurring structured proteins. Of more than 530000 known naturally occurring primary sequences without any structural information, 1349 contain the tetrapeptide signatures of intrinsic disorder identified above. These sequences, whose cellular functions range from housekeeping to metabolism to transport, more than double the number of currently known IDPs. Along similar lines, we report 26577 pentapeptide signatures that are exclusive to IDPs and absent from naturally occurring structured proteins; these signatures identify ∼50% of more than half a million curated protein sequences without structural information as intrinsically disordered. The reported results are a major leap forward in exploring the functional manifestations of IDPs.
Communicated by Ramaswamy H. Sarma
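The library size quoted in the abstract follows from counting every peptide of length 2 to 5 over 20 residues: 20^2 + 20^3 + 20^4 + 20^5 = 400 + 8000 + 160000 + 3200000 = 3368400. The filtering idea (keep peptides seen in disordered proteins, discard any seen in structured proteins, then scan unannotated sequences) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names and toy sequences are hypothetical, k = 3 is used only to keep the example small, and a query is flagged if it contains at least one signature, a matching rule the abstract does not spell out. Rather than materializing every library peptide, the sketch only checks whether each observed peptide is a library member, which yields the same set difference.

```python
# Minimal sketch of exclusive k-mer "signatures"; not the authors' code.
# All sequences below are hypothetical toy data.

# The 20 natural amino acids (one-letter codes).
AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def library_size(min_k: int = 2, max_k: int = 5) -> int:
    """Count all peptides of length min_k..max_k over the 20 amino acids."""
    return sum(len(AMINO_ACIDS) ** k for k in range(min_k, max_k + 1))


def kmers(sequence: str, k: int) -> set[str]:
    """Contiguous length-k peptides in one sequence (empty if len < k)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def observed_kmers(sequences: list[str], k: int) -> set[str]:
    """Union of the length-k peptides found across a set of sequences."""
    found: set[str] = set()
    for seq in sequences:
        found |= kmers(seq, k)
    return found


def exclusive_signatures(disordered: list[str], structured: list[str], k: int) -> set[str]:
    """Peptides seen in disordered sequences but never in structured ones."""
    # Keep only members of the 20**k library, i.e. peptides made solely of
    # the 20 natural amino acids (drops ambiguity codes such as X).
    in_library = {p for p in observed_kmers(disordered, k) if set(p) <= AMINO_ACIDS}
    return in_library - observed_kmers(structured, k)


def flag_sequences(queries: list[str], signatures: set[str], k: int) -> list[str]:
    """Queries containing at least one signature peptide (assumed rule)."""
    return [q for q in queries if kmers(q, k) & signatures]


# Toy example with k = 3 (the source uses tetrapeptides and pentapeptides).
DISORDERED = ["PEPSSG", "GSSQEP"]
STRUCTURED = ["PEPLVAK", "GSSAVL"]
QUERIES = ["AKPSSL", "LVAKLV", "XXQEPX"]

signatures = exclusive_signatures(DISORDERED, STRUCTURED, 3)
print(library_size())                          # 3368400
print(sorted(signatures))                      # ['EPS', 'PSS', 'QEP', 'SQE', 'SSG', 'SSQ']
print(flag_sequences(QUERIES, signatures, 3))  # ['AKPSSL', 'XXQEPX']
```

Swapping in k = 4 or k = 5 and real sequence collections would mirror the tetrapeptide and pentapeptide analyses in outline, though the source does not describe its data sources or matching rule in enough detail to reproduce its numbers.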
Related Topics
- Type: dataset
- Language: en
- Landing Page: https://doi.org/10.6084/m9.figshare.12135996
- OA Status: gold
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4394334180
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4394334180 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.6084/m9.figshare.12135996 (Digital Object Identifier)
- Title: Unique and exclusive peptide signatures directly identify intrinsically disordered proteins from sequences without structural information (work title)
- Type: dataset (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2020 (year of publication)
- Publication date: 2020-01-01 (full publication date if available)
- Authors: Aditya Mittal, Anandkumar Madhavjibhai Changani, Sakshi Taparia (list of authors in order)
- Landing page: https://doi.org/10.6084/m9.figshare.12135996 (publisher landing page)
- Open access: Yes (whether a free full text is available)
- OA status: gold (open access status per OpenAlex)
- OA URL: https://doi.org/10.6084/m9.figshare.12135996 (direct OA link when available)
- Concepts: Intrinsically disordered proteins, Computational biology, Peptide, Computer science, Biology, Chemistry, Biophysics, Biochemistry (top concepts, i.e. fields and topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
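The payload table that follows lists the record as flat `key | value` rows whose dotted keys (for example `authorships[0].author.display_name`) mirror a nested JSON record. Below is a minimal Python sketch of one flattening rule that reproduces the row shapes visible in the table: nested objects are joined with dots, lists of objects get `[i]` indices, and lists of scalars (such as `related_works`, `indexed_in`, or the position lists in `abstract_inverted_index`) collapse into one comma-separated cell. This rule is an assumption inferred from the rows, not the page's actual rendering code; `flatten`, `to_rows` and `SAMPLE_RECORD` are hypothetical names, and the sample's nesting is inferred from the dotted keys rather than copied from OpenAlex.

```python
from typing import Any


def flatten(value: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested JSON into dotted keys; lists of objects get [i] indices."""
    flat: dict[str, Any] = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}.{key}" if prefix else key))
    elif isinstance(value, list) and all(not isinstance(x, (dict, list)) for x in value):
        # Lists of scalars become one comma-separated cell (e.g. position lists).
        flat[prefix] = ", ".join(str(x) for x in value)
    elif isinstance(value, list):
        # Lists of objects are expanded element by element with [i] indices.
        for i, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}[{i}]"))
    else:
        flat[prefix] = value
    return flat


def to_rows(flat: dict[str, Any]) -> list[str]:
    """Render flattened pairs as two-column Markdown table rows (None -> empty)."""
    rows = []
    for key, val in flat.items():
        text = "" if val is None else str(val)
        rows.append(f"| {key} | {text} |" if text else f"| {key} | |")
    return rows


# Hypothetical nested record assembled from values that appear in the table.
SAMPLE_RECORD = {
    "id": "https://openalex.org/W4394334180",
    "fwci": 0.0,
    "indexed_in": ["datacite"],
    "authorships": [
        {"author": {"display_name": "Aditya Mittal"}, "author_position": "first"},
    ],
    "abstract_inverted_index": {"a": [28, 43, 177]},
}

for row in to_rows(flatten(SAMPLE_RECORD)):
    print(row)
# | id | https://openalex.org/W4394334180 |
# | fwci | 0.0 |
# | indexed_in | datacite |
# | authorships[0].author.display_name | Aditya Mittal |
# | authorships[0].author_position | first |
# | abstract_inverted_index.a | 28, 43, 177 |
```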
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4394334180 |
| doi | https://doi.org/10.6084/m9.figshare.12135996 |
| ids.doi | https://doi.org/10.6084/m9.figshare.12135996 |
| ids.openalex | https://openalex.org/W4394334180 |
| fwci | 0.0 |
| type | dataset |
| title | Unique and exclusive peptide signatures directly identify intrinsically disordered proteins from sequences without structural information |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T12254 |
| topics[0].field.id | https://openalex.org/fields/13 |
| topics[0].field.display_name | Biochemistry, Genetics and Molecular Biology |
| topics[0].score | 0.9976000189781189 |
| topics[0].domain.id | https://openalex.org/domains/1 |
| topics[0].domain.display_name | Life Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1312 |
| topics[0].subfield.display_name | Molecular Biology |
| topics[0].display_name | Machine Learning in Bioinformatics |
| topics[1].id | https://openalex.org/T10044 |
| topics[1].field.id | https://openalex.org/fields/13 |
| topics[1].field.display_name | Biochemistry, Genetics and Molecular Biology |
| topics[1].score | 0.9962000250816345 |
| topics[1].domain.id | https://openalex.org/domains/1 |
| topics[1].domain.display_name | Life Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1312 |
| topics[1].subfield.display_name | Molecular Biology |
| topics[1].display_name | Protein Structure and Dynamics |
| topics[2].id | https://openalex.org/T10521 |
| topics[2].field.id | https://openalex.org/fields/13 |
| topics[2].field.display_name | Biochemistry, Genetics and Molecular Biology |
| topics[2].score | 0.989300012588501 |
| topics[2].domain.id | https://openalex.org/domains/1 |
| topics[2].domain.display_name | Life Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1312 |
| topics[2].subfield.display_name | Molecular Biology |
| topics[2].display_name | RNA and protein synthesis mechanisms |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C2778815515 |
| concepts[0].level | 2 |
| concepts[0].score | 0.6878673434257507 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q3408242 |
| concepts[0].display_name | Intrinsically disordered proteins |
| concepts[1].id | https://openalex.org/C70721500 |
| concepts[1].level | 1 |
| concepts[1].score | 0.5281130075454712 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q177005 |
| concepts[1].display_name | Computational biology |
| concepts[2].id | https://openalex.org/C2779281246 |
| concepts[2].level | 2 |
| concepts[2].score | 0.44660383462905884 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q172847 |
| concepts[2].display_name | Peptide |
| concepts[3].id | https://openalex.org/C41008148 |
| concepts[3].level | 0 |
| concepts[3].score | 0.3344722390174866 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[3].display_name | Computer science |
| concepts[4].id | https://openalex.org/C86803240 |
| concepts[4].level | 0 |
| concepts[4].score | 0.3283722996711731 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q420 |
| concepts[4].display_name | Biology |
| concepts[5].id | https://openalex.org/C185592680 |
| concepts[5].level | 0 |
| concepts[5].score | 0.32657232880592346 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q2329 |
| concepts[5].display_name | Chemistry |
| concepts[6].id | https://openalex.org/C12554922 |
| concepts[6].level | 1 |
| concepts[6].score | 0.24596193432807922 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q7100 |
| concepts[6].display_name | Biophysics |
| concepts[7].id | https://openalex.org/C55493867 |
| concepts[7].level | 1 |
| concepts[7].score | 0.1385611593723297 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q7094 |
| concepts[7].display_name | Biochemistry |
| keywords[0].id | https://openalex.org/keywords/intrinsically-disordered-proteins |
| keywords[0].score | 0.6878673434257507 |
| keywords[0].display_name | Intrinsically disordered proteins |
| keywords[1].id | https://openalex.org/keywords/computational-biology |
| keywords[1].score | 0.5281130075454712 |
| keywords[1].display_name | Computational biology |
| keywords[2].id | https://openalex.org/keywords/peptide |
| keywords[2].score | 0.44660383462905884 |
| keywords[2].display_name | Peptide |
| keywords[3].id | https://openalex.org/keywords/computer-science |
| keywords[3].score | 0.3344722390174866 |
| keywords[3].display_name | Computer science |
| keywords[4].id | https://openalex.org/keywords/biology |
| keywords[4].score | 0.3283722996711731 |
| keywords[4].display_name | Biology |
| keywords[5].id | https://openalex.org/keywords/chemistry |
| keywords[5].score | 0.32657232880592346 |
| keywords[5].display_name | Chemistry |
| keywords[6].id | https://openalex.org/keywords/biophysics |
| keywords[6].score | 0.24596193432807922 |
| keywords[6].display_name | Biophysics |
| keywords[7].id | https://openalex.org/keywords/biochemistry |
| keywords[7].score | 0.1385611593723297 |
| keywords[7].display_name | Biochemistry |
| language | en |
| locations[0].id | doi:10.6084/m9.figshare.12135996 |
| locations[0].is_oa | True |
| locations[0].source | |
| locations[0].license | cc-by |
| locations[0].pdf_url | |
| locations[0].version | |
| locations[0].raw_type | dataset |
| locations[0].license_id | https://openalex.org/licenses/cc-by |
| locations[0].is_accepted | False |
| locations[0].is_published | |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | https://doi.org/10.6084/m9.figshare.12135996 |
| indexed_in | datacite |
| authorships[0].author.id | https://openalex.org/A5011481258 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-4030-0951 |
| authorships[0].author.display_name | Aditya Mittal |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Aditya Mittal |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5062117560 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Anandkumar Madhavjibhai Changani |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Anandkumar Madhavjibhai Changani |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5055197052 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Sakshi Taparia |
| authorships[2].author_position | last |
| authorships[2].raw_author_name | Sakshi Taparia |
| authorships[2].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://doi.org/10.6084/m9.figshare.12135996 |
| open_access.oa_status | gold |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Unique and exclusive peptide signatures directly identify intrinsically disordered proteins from sequences without structural information |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T12254 |
| primary_topic.field.id | https://openalex.org/fields/13 |
| primary_topic.field.display_name | Biochemistry, Genetics and Molecular Biology |
| primary_topic.score | 0.9976000189781189 |
| primary_topic.domain.id | https://openalex.org/domains/1 |
| primary_topic.domain.display_name | Life Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1312 |
| primary_topic.subfield.display_name | Molecular Biology |
| primary_topic.display_name | Machine Learning in Bioinformatics |
| related_works | https://openalex.org/W4387497383, https://openalex.org/W2948807893, https://openalex.org/W2778153218, https://openalex.org/W2748952813, https://openalex.org/W1531601525, https://openalex.org/W4391375266, https://openalex.org/W2078814861, https://openalex.org/W2527526854, https://openalex.org/W2147733973, https://openalex.org/W2077487109 |
| cited_by_count | 0 |
| locations_count | 1 |
| best_oa_location.id | doi:10.6084/m9.figshare.12135996 |
| best_oa_location.is_oa | True |
| best_oa_location.source | |
| best_oa_location.license | cc-by |
| best_oa_location.pdf_url | |
| best_oa_location.version | |
| best_oa_location.raw_type | dataset |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | https://doi.org/10.6084/m9.figshare.12135996 |
| primary_location.id | doi:10.6084/m9.figshare.12135996 |
| primary_location.is_oa | True |
| primary_location.source | |
| primary_location.license | cc-by |
| primary_location.pdf_url | |
| primary_location.version | |
| primary_location.raw_type | dataset |
| primary_location.license_id | https://openalex.org/licenses/cc-by |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | https://doi.org/10.6084/m9.figshare.12135996 |
| publication_date | 2020-01-01 |
| publication_year | 2020 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 28, 43, 177 |
| abstract_inverted_index.20 | 57 |
| abstract_inverted_index.36 | 64 |
| abstract_inverted_index.H. | 192 |
| abstract_inverted_index.In | 36 |
| abstract_inverted_index.On | 136 |
| abstract_inverted_index.be | 170 |
| abstract_inverted_index.by | 190 |
| abstract_inverted_index.in | 11, 34, 69, 75, 152, 181 |
| abstract_inverted_index.is | 20 |
| abstract_inverted_index.of | 15, 17, 22, 42, 45, 81, 88, 109, 129, 159, 185 |
| abstract_inverted_index.to | 7, 62, 120, 122, 146, 169 |
| abstract_inverted_index.us | 61 |
| abstract_inverted_index.we | 139 |
| abstract_inverted_index.The | 173 |
| abstract_inverted_index.all | 46 |
| abstract_inverted_index.and | 52, 73, 150 |
| abstract_inverted_index.any | 98 |
| abstract_inverted_index.are | 3, 176 |
| abstract_inverted_index.for | 31 |
| abstract_inverted_index.key | 24 |
| abstract_inverted_index.now | 4 |
| abstract_inverted_index.one | 21 |
| abstract_inverted_index.out | 87 |
| abstract_inverted_index.the | 23, 55, 76, 104, 127, 130 |
| abstract_inverted_index.1349 | 101 |
| abstract_inverted_index.from | 118 |
| abstract_inverted_index.leap | 179 |
| abstract_inverted_index.more | 89, 124, 160 |
| abstract_inverted_index.play | 8 |
| abstract_inverted_index.than | 90, 125, 161 |
| abstract_inverted_index.that | 141 |
| abstract_inverted_index.this | 37 |
| abstract_inverted_index.26577 | 142 |
| abstract_inverted_index.Sarma | 193 |
| abstract_inverted_index.These | 112 |
| abstract_inverted_index.above | 105 |
| abstract_inverted_index.acids | 59 |
| abstract_inverted_index.amino | 58 |
| abstract_inverted_index.known | 92, 132 |
| abstract_inverted_index.major | 178 |
| abstract_inverted_index.roles | 10 |
| abstract_inverted_index.space | 80 |
| abstract_inverted_index.steps | 25 |
| abstract_inverted_index.their | 32 |
| abstract_inverted_index.using | 54 |
| abstract_inverted_index.work, | 38 |
| abstract_inverted_index.530000 | 91 |
| abstract_inverted_index.absent | 74, 151 |
| abstract_inverted_index.double | 126 |
| abstract_inverted_index.having | 114 |
| abstract_inverted_index.lines, | 138 |
| abstract_inverted_index.number | 128 |
| abstract_inverted_index.proper | 29 |
| abstract_inverted_index.report | 140 |
| abstract_inverted_index.unique | 65, 107 |
| abstract_inverted_index.widely | 5 |
| abstract_inverted_index.∼50% | 158 |
| abstract_inverted_index.allowed | 60 |
| abstract_inverted_index.contain | 103 |
| abstract_inverted_index.crucial | 9 |
| abstract_inverted_index.curated | 163 |
| abstract_inverted_index.forward | 180 |
| abstract_inverted_index.library | 44 |
| abstract_inverted_index.natural | 56 |
| abstract_inverted_index.present | 67 |
| abstract_inverted_index.primary | 78, 95 |
| abstract_inverted_index.protein | 164 |
| abstract_inverted_index.results | 174 |
| abstract_inverted_index.similar | 137 |
| abstract_inverted_index.towards | 26 |
| abstract_inverted_index.varying | 117 |
| abstract_inverted_index.without | 97, 166 |
| abstract_inverted_index.Further, | 86 |
| abstract_inverted_index.accepted | 6 |
| abstract_inverted_index.building | 27 |
| abstract_inverted_index.cellular | 115 |
| abstract_inverted_index.complete | 77 |
| abstract_inverted_index.disorder | 19 |
| abstract_inverted_index.identify | 63, 157 |
| abstract_inverted_index.possible | 47 |
| abstract_inverted_index.proteins | 2, 72 |
| abstract_inverted_index.reported | 175 |
| abstract_inverted_index.sequence | 79 |
| abstract_inverted_index.(3368400) | 48 |
| abstract_inverted_index.Ramaswamy | 191 |
| abstract_inverted_index.currently | 131 |
| abstract_inverted_index.disorder. | 111 |
| abstract_inverted_index.exclusive | 145 |
| abstract_inverted_index.exploring | 182 |
| abstract_inverted_index.functions | 116 |
| abstract_inverted_index.intrinsic | 18, 110 |
| abstract_inverted_index.metabolic | 121 |
| abstract_inverted_index.naturally | 82, 93, 153 |
| abstract_inverted_index.occurring | 83, 94, 154 |
| abstract_inverted_index.proteins, | 149, 156 |
| abstract_inverted_index.proteins. | 85, 135, 188 |
| abstract_inverted_index.sequences | 96, 102, 165 |
| abstract_inverted_index.synthesis | 41 |
| abstract_inverted_index.biological | 12 |
| abstract_inverted_index.disordered | 1, 71, 134, 148, 187 |
| abstract_inverted_index.functional | 183 |
| abstract_inverted_index.functions. | 13 |
| abstract_inverted_index.identified | 106 |
| abstract_inverted_index.occurrence | 33 |
| abstract_inverted_index.proteomes. | 35 |
| abstract_inverted_index.repertoire | 30 |
| abstract_inverted_index.sequences, | 113 |
| abstract_inverted_index.signatures | 16, 108, 144 |
| abstract_inverted_index.structural | 99, 167 |
| abstract_inverted_index.structured | 84, 155 |
| abstract_inverted_index.systematic | 39 |
| abstract_inverted_index.transport, | 123 |
| abstract_inverted_index.dipeptides, | 49 |
| abstract_inverted_index.disordered. | 172 |
| abstract_inverted_index.exclusively | 68 |
| abstract_inverted_index.information | 168 |
| abstract_inverted_index.Communicated | 189 |
| abstract_inverted_index.housekeeping | 119 |
| abstract_inverted_index.information, | 100 |
| abstract_inverted_index.pentapeptide | 143 |
| abstract_inverted_index.tripeptides, | 50 |
| abstract_inverted_index.Intrinsically | 0 |
| abstract_inverted_index.computational | 40 |
| abstract_inverted_index.intrinsically | 70, 133, 147, 171, 186 |
| abstract_inverted_index.pentapeptides | 53 |
| abstract_inverted_index.tetrapeptides | 51, 66 |
| abstract_inverted_index.Identification | 14 |
| abstract_inverted_index.half-a-million | 162 |
| abstract_inverted_index.manifestations | 184 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 3 |
| citation_normalized_percentile | |
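The `abstract_inverted_index` rows in the payload store the abstract as a word-to-positions map: each distinct word is paired with the 0-based positions at which it appears in the abstract. A minimal Python sketch that rebuilds running text from rows of the `| abstract_inverted_index.<word> | positions |` shape shown above is given here; the helper names are hypothetical, and the sample rows are an excerpt whose position lists have been cut down to the first sentence (positions 0 to 13).

```python
PREFIX = "abstract_inverted_index."


def parse_rows(rows: list[str]) -> dict[str, list[int]]:
    """Turn '| abstract_inverted_index.<word> | p1, p2 |' rows into {word: [p1, p2]}."""
    index: dict[str, list[int]] = {}
    for row in rows:
        cells = [c.strip() for c in row.strip().strip("|").split("|")]
        if len(cells) != 2 or not cells[0].startswith(PREFIX):
            continue  # skip rows that are not part of the inverted index
        word = cells[0][len(PREFIX):]
        index[word] = [int(p) for p in cells[1].split(",")]
    return index


def reconstruct(index: dict[str, list[int]]) -> str:
    """Place every word at each of its positions, then join in position order."""
    by_position: dict[int, str] = {}
    for word, positions in index.items():
        for pos in positions:
            by_position[pos] = word
    return " ".join(by_position[p] for p in sorted(by_position))


# Excerpt of the rows above, with position lists cut to the first sentence.
SAMPLE_ROWS = [
    "| language | en |",
    "| abstract_inverted_index.in | 11 |",
    "| abstract_inverted_index.to | 7 |",
    "| abstract_inverted_index.are | 3 |",
    "| abstract_inverted_index.now | 4 |",
    "| abstract_inverted_index.play | 8 |",
    "| abstract_inverted_index.roles | 10 |",
    "| abstract_inverted_index.widely | 5 |",
    "| abstract_inverted_index.crucial | 9 |",
    "| abstract_inverted_index.accepted | 6 |",
    "| abstract_inverted_index.proteins | 2 |",
    "| abstract_inverted_index.biological | 12 |",
    "| abstract_inverted_index.disordered | 1 |",
    "| abstract_inverted_index.functions. | 13 |",
    "| abstract_inverted_index.Intrinsically | 0 |",
]

print(reconstruct(parse_rows(SAMPLE_ROWS)))
# Intrinsically disordered proteins are now widely accepted to play crucial roles in biological functions.
```

Because a word that occurs several times simply lists several positions (for example `of` above), feeding in the full, untruncated rows should reproduce the whole abstract in the same way.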