Correcting Scale Distortion in RNA Sequencing Data
2024 · Open Access
· DOI: https://doi.org/10.21203/rs.3.rs-4745774/v1
RNA sequencing (RNA-seq) is the conventional genome-scale approach used to capture the expression levels of all detectable genes in a biological sample. It is now regularly used in the clinical diagnostic space for cancer patients. While the information gained is intended to impact treatment decisions, numerous technical and quality issues remain, including inaccuracies in the dissemination of gene-gene relationships. For such reasons, clinical decisions are still mostly driven by DNA biomarkers, such as gene mutations or fusions. In this study, we aimed to correct for systemic bias based on RNA-sequencing platforms in order to improve our understanding of gene-gene relationships. To do so, we examined standard pre-processed RNA-seq datasets obtained from three studies conducted by two consortium efforts: The Cancer Genome Atlas (TCGA) and Stand Up 2 Cancer (SU2C). Specifically, we examined the TCGA Bladder Cancer (n = 408) and Prostate Cancer (n = 498) studies as well as the SU2C Prostate Cancer study (n = 208). Using various statistical tests, we detected in all datasets expression-level-dependent biases that differ from sample to sample. Using simulations, we show that these biases corrupt gene-gene correlation estimates and t-tests between subpopulations. To mitigate these biases, we introduce two different nonlinear transforms based on statistical considerations. We demonstrate that these transforms effectively remove the observed per-sample biases, reduce sample-to-sample variance, and improve the characteristics of gene-gene correlation distributions. Using a novel simulation methodology that creates controlled differences between subpopulations, we show that these transforms reduce variability and slightly increase the sensitivity of two-population tests.
Altogether, these results improve our capacity to understand gene-gene relationships, and may lead to novel ways to utilize the information derived from clinical tests.
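
The abstract's claim that per-sample, expression-level-dependent biases corrupt gene-gene correlation estimates can be illustrated with a small simulation. The sketch below is not the authors' methodology: the distortion model (a per-sample slope on the log scale), the parameter values, and all function names are assumptions chosen for illustration, and quantile normalization is swapped in as a generic, well-known correction in place of the paper's two nonlinear transforms, which the abstract does not specify.

```python
import numpy as np


def simulate_log_expression(n_genes, n_samples, rng):
    """Simulate independent genes: a per-gene mean level plus sample noise."""
    gene_means = rng.uniform(2.0, 12.0, size=(n_genes, 1))
    noise = rng.normal(0.0, 0.5, size=(n_genes, n_samples))
    return gene_means + noise


def distort(log_expr, rng, strength=0.15):
    """Apply a hypothetical per-sample scale distortion (assumed model).

    Each sample (column) gets its own slope on the log scale, so the size of
    the shift depends on how far a gene's level is from the overall center.
    """
    n_samples = log_expr.shape[1]
    slopes = 1.0 + strength * rng.standard_normal(n_samples)
    center = log_expr.mean()
    return center + (log_expr - center) * slopes  # slopes broadcast over columns


def quantile_normalize(log_expr):
    """Generic quantile normalization: give every sample the same distribution.

    This is a stand-in correction, not one of the paper's transforms.
    """
    ranks = np.argsort(np.argsort(log_expr, axis=0), axis=0)
    reference = np.sort(log_expr, axis=0).mean(axis=1)
    return reference[ranks]


def mean_abs_offdiag_corr(log_expr):
    """Mean absolute gene-gene Pearson correlation (rows are genes)."""
    corr = np.corrcoef(log_expr)
    off_diagonal = corr[~np.eye(corr.shape[0], dtype=bool)]
    return float(np.mean(np.abs(off_diagonal)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = simulate_log_expression(n_genes=100, n_samples=200, rng=rng)
    distorted = distort(clean, rng)
    corrected = quantile_normalize(distorted)
    for label, data in [("clean", clean), ("distorted", distorted),
                        ("corrected", corrected)]:
        print(label, round(mean_abs_offdiag_corr(data), 3))
```

Because the simulated genes are independent, any sizeable mean absolute correlation in the distorted data is spurious. It arises because genes far from the center are all shifted together by each sample's slope. A rank-based correction removes most of it in this toy setting since ranks within a sample are unchanged by a monotone distortion.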
- Type: preprint
- Language: en
- Landing Page: https://doi.org/10.21203/rs.3.rs-4745774/v1, https://www.researchsquare.com/article/rs-4745774/latest.pdf
- OA Status: gold
- References: 7
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4401768385
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4401768385 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.21203/rs.3.rs-4745774/v1 (Digital Object Identifier)
- Title: Correcting Scale Distortion in RNA Sequencing Data (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-08-22 (full publication date if available)
- Authors: Christopher Thron, Farhad Jafari (list of authors in order)
- Landing page: https://doi.org/10.21203/rs.3.rs-4745774/v1 (publisher landing page)
- PDF URL: https://www.researchsquare.com/article/rs-4745774/latest.pdf (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: gold (open access status per OpenAlex)
- OA URL: https://www.researchsquare.com/article/rs-4745774/latest.pdf (direct OA link when available)
- Concepts: Scale (ratio), Distortion (music), Computational biology, Computer science, Biology, Cartography, Telecommunications, Geography, Amplifier, Bandwidth (computing) (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- References (count): 7 (number of works referenced by this work)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4401768385 |
| doi | https://doi.org/10.21203/rs.3.rs-4745774/v1 |
| ids.doi | https://doi.org/10.21203/rs.3.rs-4745774/v1 |
| ids.openalex | https://openalex.org/W4401768385 |
| fwci | 0.0 |
| type | preprint |
| title | Correcting Scale Distortion in RNA Sequencing Data |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10885 |
| topics[0].field.id | https://openalex.org/fields/13 |
| topics[0].field.display_name | Biochemistry, Genetics and Molecular Biology |
| topics[0].score | 0.6230999827384949 |
| topics[0].domain.id | https://openalex.org/domains/1 |
| topics[0].domain.display_name | Life Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1312 |
| topics[0].subfield.display_name | Molecular Biology |
| topics[0].display_name | Gene expression and cancer classification |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C2778755073 |
| concepts[0].level | 2 |
| concepts[0].score | 0.5714280009269714 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q10858537 |
| concepts[0].display_name | Scale (ratio) |
| concepts[1].id | https://openalex.org/C126780896 |
| concepts[1].level | 4 |
| concepts[1].score | 0.5668279528617859 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q899871 |
| concepts[1].display_name | Distortion (music) |
| concepts[2].id | https://openalex.org/C70721500 |
| concepts[2].level | 1 |
| concepts[2].score | 0.4638175368309021 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q177005 |
| concepts[2].display_name | Computational biology |
| concepts[3].id | https://openalex.org/C41008148 |
| concepts[3].level | 0 |
| concepts[3].score | 0.45607852935791016 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[3].display_name | Computer science |
| concepts[4].id | https://openalex.org/C86803240 |
| concepts[4].level | 0 |
| concepts[4].score | 0.25409072637557983 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q420 |
| concepts[4].display_name | Biology |
| concepts[5].id | https://openalex.org/C58640448 |
| concepts[5].level | 1 |
| concepts[5].score | 0.10402590036392212 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q42515 |
| concepts[5].display_name | Cartography |
| concepts[6].id | https://openalex.org/C76155785 |
| concepts[6].level | 1 |
| concepts[6].score | 0.10288435220718384 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q418 |
| concepts[6].display_name | Telecommunications |
| concepts[7].id | https://openalex.org/C205649164 |
| concepts[7].level | 0 |
| concepts[7].score | 0.10284855961799622 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q1071 |
| concepts[7].display_name | Geography |
| concepts[8].id | https://openalex.org/C194257627 |
| concepts[8].level | 3 |
| concepts[8].score | 0.0 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q211554 |
| concepts[8].display_name | Amplifier |
| concepts[9].id | https://openalex.org/C2776257435 |
| concepts[9].level | 2 |
| concepts[9].score | 0.0 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q1576430 |
| concepts[9].display_name | Bandwidth (computing) |
| keywords[0].id | https://openalex.org/keywords/scale |
| keywords[0].score | 0.5714280009269714 |
| keywords[0].display_name | Scale (ratio) |
| keywords[1].id | https://openalex.org/keywords/distortion |
| keywords[1].score | 0.5668279528617859 |
| keywords[1].display_name | Distortion (music) |
| keywords[2].id | https://openalex.org/keywords/computational-biology |
| keywords[2].score | 0.4638175368309021 |
| keywords[2].display_name | Computational biology |
| keywords[3].id | https://openalex.org/keywords/computer-science |
| keywords[3].score | 0.45607852935791016 |
| keywords[3].display_name | Computer science |
| keywords[4].id | https://openalex.org/keywords/biology |
| keywords[4].score | 0.25409072637557983 |
| keywords[4].display_name | Biology |
| keywords[5].id | https://openalex.org/keywords/cartography |
| keywords[5].score | 0.10402590036392212 |
| keywords[5].display_name | Cartography |
| keywords[6].id | https://openalex.org/keywords/telecommunications |
| keywords[6].score | 0.10288435220718384 |
| keywords[6].display_name | Telecommunications |
| keywords[7].id | https://openalex.org/keywords/geography |
| keywords[7].score | 0.10284855961799622 |
| keywords[7].display_name | Geography |
| language | en |
| locations[0].id | doi:10.21203/rs.3.rs-4745774/v1 |
| locations[0].is_oa | True |
| locations[0].source | |
| locations[0].license | cc-by |
| locations[0].pdf_url | https://www.researchsquare.com/article/rs-4745774/latest.pdf |
| locations[0].version | acceptedVersion |
| locations[0].raw_type | posted-content |
| locations[0].license_id | https://openalex.org/licenses/cc-by |
| locations[0].is_accepted | True |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | https://doi.org/10.21203/rs.3.rs-4745774/v1 |
| indexed_in | crossref |
| authorships[0].author.id | https://openalex.org/A5081709937 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-8960-2504 |
| authorships[0].author.display_name | Christopher Thron |
| authorships[0].countries | US |
| authorships[0].affiliations[0].institution_ids | https://openalex.org/I76756774 |
| authorships[0].affiliations[0].raw_affiliation_string | Department of Science and Mathematics, Texas A&M University-Central Texas, Killeen, TX 76549 USA |
| authorships[0].institutions[0].id | https://openalex.org/I76756774 |
| authorships[0].institutions[0].ror | https://ror.org/015hh0z25 |
| authorships[0].institutions[0].type | education |
| authorships[0].institutions[0].lineage | https://openalex.org/I76756774 |
| authorships[0].institutions[0].country_code | US |
| authorships[0].institutions[0].display_name | Texas A&M University – Central Texas |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Christopher Thron |
| authorships[0].is_corresponding | False |
| authorships[0].raw_affiliation_strings | Department of Science and Mathematics, Texas A&M University-Central Texas, Killeen, TX 76549 USA |
| authorships[1].author.id | https://openalex.org/A5006551481 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-1256-2441 |
| authorships[1].author.display_name | Farhad Jafari |
| authorships[1].countries | US |
| authorships[1].affiliations[0].institution_ids | https://openalex.org/I130238516 |
| authorships[1].affiliations[0].raw_affiliation_string | Department of Radiology, University of Minnesota, Minneapolis, MN 55455 USA |
| authorships[1].institutions[0].id | https://openalex.org/I130238516 |
| authorships[1].institutions[0].ror | https://ror.org/017zqws13 |
| authorships[1].institutions[0].type | education |
| authorships[1].institutions[0].lineage | https://openalex.org/I130238516 |
| authorships[1].institutions[0].country_code | US |
| authorships[1].institutions[0].display_name | University of Minnesota |
| authorships[1].author_position | last |
| authorships[1].raw_author_name | Farhad Jafari |
| authorships[1].is_corresponding | False |
| authorships[1].raw_affiliation_strings | Department of Radiology, University of Minnesota, Minneapolis, MN 55455 USA |
| has_content.pdf | True |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://www.researchsquare.com/article/rs-4745774/latest.pdf |
| open_access.oa_status | gold |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Correcting Scale Distortion in RNA Sequencing Data |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T03:46:38.306776 |
| primary_topic.id | https://openalex.org/T10885 |
| primary_topic.field.id | https://openalex.org/fields/13 |
| primary_topic.field.display_name | Biochemistry, Genetics and Molecular Biology |
| primary_topic.score | 0.6230999827384949 |
| primary_topic.domain.id | https://openalex.org/domains/1 |
| primary_topic.domain.display_name | Life Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1312 |
| primary_topic.subfield.display_name | Molecular Biology |
| primary_topic.display_name | Gene expression and cancer classification |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2748952813, https://openalex.org/W2390279801, https://openalex.org/W2358668433, https://openalex.org/W4396701345, https://openalex.org/W2376932109, https://openalex.org/W2001405890, https://openalex.org/W4396696052, https://openalex.org/W2382290278, https://openalex.org/W3126168585 |
| cited_by_count | 0 |
| locations_count | 1 |
| best_oa_location.id | doi:10.21203/rs.3.rs-4745774/v1 |
| best_oa_location.is_oa | True |
| best_oa_location.source | |
| best_oa_location.license | cc-by |
| best_oa_location.pdf_url | https://www.researchsquare.com/article/rs-4745774/latest.pdf |
| best_oa_location.version | acceptedVersion |
| best_oa_location.raw_type | posted-content |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by |
| best_oa_location.is_accepted | True |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | https://doi.org/10.21203/rs.3.rs-4745774/v1 |
| primary_location.id | doi:10.21203/rs.3.rs-4745774/v1 |
| primary_location.is_oa | True |
| primary_location.source | |
| primary_location.license | cc-by |
| primary_location.pdf_url | https://www.researchsquare.com/article/rs-4745774/latest.pdf |
| primary_location.version | acceptedVersion |
| primary_location.raw_type | posted-content |
| primary_location.license_id | https://openalex.org/licenses/cc-by |
| primary_location.is_accepted | True |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | https://doi.org/10.21203/rs.3.rs-4745774/v1 |
| publication_date | 2024-08-22 |
| publication_year | 2024 |
| referenced_works | https://openalex.org/W3098132351, https://openalex.org/W2137526110, https://openalex.org/W2921683137, https://openalex.org/W2107018762, https://openalex.org/W2177784250, https://openalex.org/W2990029798, https://openalex.org/W4388141697 |
| referenced_works_count | 7 |
| abstract_inverted_index.2 | 130 |
| abstract_inverted_index.= | 141, 147, 159 |
| abstract_inverted_index.a | 20, 237 |
| abstract_inverted_index.(n | 140, 146, 158 |
| abstract_inverted_index.In | 79 |
| abstract_inverted_index.To | 103, 194 |
| abstract_inverted_index.Up | 129 |
| abstract_inverted_index.We | 133, 213 |
| abstract_inverted_index.as | 74, 150, 152 |
| abstract_inverted_index.by | 70, 117 |
| abstract_inverted_index.do | 104 |
| abstract_inverted_index.in | 19, 28, 55, 93, 165 |
| abstract_inverted_index.is | 4, 24, 40 |
| abstract_inverted_index.of | 15, 58, 99, 232, 258 |
| abstract_inverted_index.on | 90, 205 |
| abstract_inverted_index.or | 77 |
| abstract_inverted_index.to | 10, 42, 84, 95, 177, 268, 275, 278 |
| abstract_inverted_index.we | 82, 106, 168, 181, 198, 247 |
| abstract_inverted_index.DNA | 71 |
| abstract_inverted_index.For | 61 |
| abstract_inverted_index.RNA | 1 |
| abstract_inverted_index.The | 122 |
| abstract_inverted_index.all | 16, 166 |
| abstract_inverted_index.and | 48, 127, 143, 190, 228, 254, 272 |
| abstract_inverted_index.are | 66 |
| abstract_inverted_index.for | 33, 86 |
| abstract_inverted_index.may | 273 |
| abstract_inverted_index.now | 25 |
| abstract_inverted_index.our | 97, 266 |
| abstract_inverted_index.so, | 105 |
| abstract_inverted_index.the | 5, 12, 29, 37, 56, 100, 136, 153, 221, 230, 280 |
| abstract_inverted_index.two | 118, 200, 259 |
| abstract_inverted_index.408) | 142 |
| abstract_inverted_index.498) | 148 |
| abstract_inverted_index.SU2C | 154 |
| abstract_inverted_index.TCGA | 137 |
| abstract_inverted_index.This | 23, 52 |
| abstract_inverted_index.bias | 88 |
| abstract_inverted_index.from | 113, 175, 283 |
| abstract_inverted_index.gene | 75 |
| abstract_inverted_index.lead | 274 |
| abstract_inverted_index.show | 182, 248 |
| abstract_inverted_index.such | 62, 73 |
| abstract_inverted_index.that | 173, 183, 208, 215, 216, 241, 249 |
| abstract_inverted_index.this | 80 |
| abstract_inverted_index.used | 9, 27 |
| abstract_inverted_index.ways | 277 |
| abstract_inverted_index.well | 151 |
| abstract_inverted_index.208). | 160 |
| abstract_inverted_index.Atlas | 125 |
| abstract_inverted_index.Stand | 128 |
| abstract_inverted_index.Using | 161, 179, 236 |
| abstract_inverted_index.While | 36 |
| abstract_inverted_index.aimed | 83 |
| abstract_inverted_index.based | 89, 204 |
| abstract_inverted_index.genes | 18 |
| abstract_inverted_index.novel | 238, 276 |
| abstract_inverted_index.order | 94 |
| abstract_inverted_index.space | 32 |
| abstract_inverted_index.still | 67 |
| abstract_inverted_index.study | 157 |
| abstract_inverted_index.these | 184, 196, 210, 217, 250, 263 |
| abstract_inverted_index.three | 114 |
| abstract_inverted_index.(TCGA) | 126 |
| abstract_inverted_index.Cancer | 123, 131, 139, 145, 156 |
| abstract_inverted_index.Genome | 124 |
| abstract_inverted_index.biases | 172, 185 |
| abstract_inverted_index.cancer | 34 |
| abstract_inverted_index.differ | 174 |
| abstract_inverted_index.driven | 69 |
| abstract_inverted_index.gained | 39 |
| abstract_inverted_index.impact | 43 |
| abstract_inverted_index.issues | 50 |
| abstract_inverted_index.levels | 14 |
| abstract_inverted_index.mostly | 68 |
| abstract_inverted_index.reduce | 225, 252 |
| abstract_inverted_index.remove | 220 |
| abstract_inverted_index.sample | 176 |
| abstract_inverted_index.study, | 81 |
| abstract_inverted_index.tests, | 164 |
| abstract_inverted_index.tests. | 261, 285 |
| abstract_inverted_index.(SU2C). | 132 |
| abstract_inverted_index.Bladder | 138 |
| abstract_inverted_index.RNA-seq | 110 |
| abstract_inverted_index.between | 192, 245 |
| abstract_inverted_index.biases, | 197, 224 |
| abstract_inverted_index.biases. | 212 |
| abstract_inverted_index.capture | 11 |
| abstract_inverted_index.correct | 85, 209 |
| abstract_inverted_index.corrupt | 186 |
| abstract_inverted_index.creates | 242 |
| abstract_inverted_index.derived | 282 |
| abstract_inverted_index.efforts | 120 |
| abstract_inverted_index.improve | 96, 229, 265 |
| abstract_inverted_index.quality | 49 |
| abstract_inverted_index.remain. | 51 |
| abstract_inverted_index.results | 264 |
| abstract_inverted_index.sample. | 22, 178 |
| abstract_inverted_index.studies | 115, 149 |
| abstract_inverted_index.t-tests | 191 |
| abstract_inverted_index.utilize | 279 |
| abstract_inverted_index.various | 162 |
| abstract_inverted_index.Prostate | 144, 155 |
| abstract_inverted_index.approach | 8 |
| abstract_inverted_index.capacity | 267 |
| abstract_inverted_index.clinical | 30, 64, 284 |
| abstract_inverted_index.datasets | 111, 167 |
| abstract_inverted_index.detected | 169 |
| abstract_inverted_index.examined | 107, 135 |
| abstract_inverted_index.fusions. | 78 |
| abstract_inverted_index.includes | 53 |
| abstract_inverted_index.increase | 256 |
| abstract_inverted_index.intended | 41 |
| abstract_inverted_index.mitigate | 195 |
| abstract_inverted_index.numerous | 46 |
| abstract_inverted_index.observed | 211, 222 |
| abstract_inverted_index.obtained | 112 |
| abstract_inverted_index.reasons, | 63 |
| abstract_inverted_index.slightly | 255 |
| abstract_inverted_index.standard | 108 |
| abstract_inverted_index.systemic | 87 |
| abstract_inverted_index.(RNA-seq) | 3 |
| abstract_inverted_index.conducted | 116 |
| abstract_inverted_index.decisions | 65 |
| abstract_inverted_index.dependent | 171 |
| abstract_inverted_index.different | 201 |
| abstract_inverted_index.gene-gene | 59, 101, 187, 233, 270 |
| abstract_inverted_index.including | 121 |
| abstract_inverted_index.introduce | 199 |
| abstract_inverted_index.mutations | 76 |
| abstract_inverted_index.nonlinear | 202 |
| abstract_inverted_index.patients. | 35 |
| abstract_inverted_index.platforms | 92 |
| abstract_inverted_index.regularly | 26 |
| abstract_inverted_index.technical | 47 |
| abstract_inverted_index.treatment | 44 |
| abstract_inverted_index.variance, | 227 |
| abstract_inverted_index.biological | 21 |
| abstract_inverted_index.consortium | 119 |
| abstract_inverted_index.controlled | 243 |
| abstract_inverted_index.decisions, | 45 |
| abstract_inverted_index.detectable | 17 |
| abstract_inverted_index.diagnostic | 31 |
| abstract_inverted_index.expression | 13 |
| abstract_inverted_index.per-sample | 223 |
| abstract_inverted_index.population | 260 |
| abstract_inverted_index.sequencing | 2 |
| abstract_inverted_index.simulation | 239 |
| abstract_inverted_index.transforms | 203, 218, 251 |
| abstract_inverted_index.understand | 269 |
| abstract_inverted_index.Altogether, | 262 |
| abstract_inverted_index.biomarkers, | 72 |
| abstract_inverted_index.correlation | 188, 234 |
| abstract_inverted_index.demonstrate | 214 |
| abstract_inverted_index.effectively | 219 |
| abstract_inverted_index.estimations | 189 |
| abstract_inverted_index.information | 38, 281 |
| abstract_inverted_index.methodology | 240 |
| abstract_inverted_index.sensitivity | 257 |
| abstract_inverted_index.statistical | 163, 206 |
| abstract_inverted_index.variability | 253 |
| abstract_inverted_index.conventional | 6 |
| abstract_inverted_index.diffferences | 244 |
| abstract_inverted_index.genome-scale | 7 |
| abstract_inverted_index.inaccuracies | 54 |
| abstract_inverted_index.particularly | 134 |
| abstract_inverted_index.simulations, | 180 |
| abstract_inverted_index.dissemination | 57 |
| abstract_inverted_index.pre-processed | 109 |
| abstract_inverted_index.understanding | 98 |
| abstract_inverted_index.RNA-sequencing | 91 |
| abstract_inverted_index.considerations | 207 |
| abstract_inverted_index.distributions. | 235 |
| abstract_inverted_index.relationships, | 271 |
| abstract_inverted_index.relationships. | 60, 102 |
| abstract_inverted_index.characteristics | 231 |
| abstract_inverted_index.subpopulations, | 246 |
| abstract_inverted_index.subpopulations. | 193 |
| abstract_inverted_index.expression-level | 170 |
| abstract_inverted_index.sample-to-sample | 226 |
| abstract_inverted_index.<title>Abstract</title> | 0 |
| cited_by_percentile_year | |
| countries_distinct_count | 1 |
| institutions_distinct_count | 2 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/3 |
| sustainable_development_goals[0].score | 0.44999998807907104 |
| sustainable_development_goals[0].display_name | Good health and well-being |
| citation_normalized_percentile.value | 0.15059051 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | False |
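
The `abstract_inverted_index` rows in the payload map each token to the word positions at which it occurs in the abstract. A minimal sketch of how readable text can be rebuilt from that structure follows. It assumes the index is available as a dictionary from token to a list of integer positions. The function name `rebuild_abstract`, the `skip_tokens` parameter, and the truncated sample dictionary are illustrative, not part of any library.

```python
def rebuild_abstract(inverted_index, skip_tokens=("<title>Abstract</title>",)):
    """Rebuild text from a {token: [positions]} inverted index.

    Tokens listed in skip_tokens (here, the markup token at position 0)
    are dropped before joining.
    """
    placed = {}
    for token, positions in inverted_index.items():
        if token in skip_tokens:
            continue
        for position in positions:
            placed[position] = token
    # Join tokens in ascending position order.
    return " ".join(placed[position] for position in sorted(placed))


if __name__ == "__main__":
    # Truncated excerpt of the index above, covering positions 0 to 8 only.
    sample = {
        "<title>Abstract</title>": [0],
        "RNA": [1],
        "sequencing": [2],
        "(RNA-seq)": [3],
        "is": [4],
        "the": [5],
        "conventional": [6],
        "genome-scale": [7],
        "approach": [8],
    }
    print(rebuild_abstract(sample))
```

Applied to the full index, the same function would reproduce the abstract word for word, including any spelling preserved in the indexed tokens.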