Exploratory Data Analysis and ETL with SAS on Hadoop Eco-system with Cervical Cancer Dataset
2020 · Open Access
· DOI: https://doi.org/10.31782/ijcrr.2020.121924
Objective: The main objective of this project is to explore and analyse a secondary dataset that was collected at the "Hospital Universitario de Caracas" in Caracas, Venezuela. Methods: The dataset comprises information on 858 patients, covering demographic information and medical history. A large number of records contain blank values, possibly because patients intentionally withheld answers for privacy reasons. SAS Studio is used for data exploration and data pre-processing. Data cleaning and data transformation are conducted based on the knowledge gathered during data exploration. Afterwards, the dataset was exported from SAS Studio and uploaded to the Hadoop Hortonworks platform for analysis. Lastly, five hypotheses were explored with the visualization tool Tableau.
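The abstract notes that many records contain blanks, which is usually the first thing a data-exploration step quantifies. The authors did this in SAS Studio; the Python sketch below is only an illustration of the idea, not their method. The column names, the sample rows, and the assumption that blanks are stored as empty strings are all hypothetical.

```python
import csv
import io

# Hypothetical sample: column names and values are invented for illustration,
# and blanks are assumed to be stored as empty strings.
SAMPLE_CSV = """age,num_pregnancies,smokes
34,2,
29,,0
41,3,1
52,,
"""


def missing_share_per_column(csv_text):
    """Return {column: fraction of rows whose value is blank}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    shares = {}
    for column in reader.fieldnames:
        # Treat empty or whitespace-only cells as missing.
        blanks = sum(1 for row in rows if (row[column] or "").strip() == "")
        shares[column] = blanks / len(rows)
    return shares


missing_share = missing_share_per_column(SAMPLE_CSV)
print(missing_share)
```

A per-column share like this can help decide which fields to impute and which to drop before the cleaned data is exported for further analysis.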
Related Topics
- Type
- article
- Language
- en
- Landing Page
- http://doi.org/10.31782/ijcrr.2020.121924
- https://doi.org/10.31782/ijcrr.2020.121924
- OA Status
- diamond
- Cited By
- 11
- References
- 7
- Related Works
- 10
- OpenAlex ID
- https://openalex.org/W3092410600
Raw OpenAlex JSON
- OpenAlex ID
- https://openalex.org/W3092410600 (Canonical identifier for this work in OpenAlex)
- DOI
- https://doi.org/10.31782/ijcrr.2020.121924 (Digital Object Identifier)
- Title
- Exploratory Data Analysis and ETL with SAS on Hadoop Eco-system with Cervical Cancer Dataset (Work title)
- Type
- article (OpenAlex work type)
- Language
- en (Primary language)
- Publication year
- 2020 (Year of publication)
- Publication date
- 2020-01-01 (Full publication date if available)
- Authors
- Xiaotian Cheng, V. Thiruchelvam, Daniel Mago Vistro (List of authors in order)
- Landing page
- https://doi.org/10.31782/ijcrr.2020.121924 (Publisher landing page)
- PDF URL
- https://doi.org/10.31782/ijcrr.2020.121924 (Direct link to full text PDF)
- Open access
- Yes (Whether a free full text is available)
- OA status
- diamond (Open access status per OpenAlex)
- OA URL
- https://doi.org/10.31782/ijcrr.2020.121924 (Direct OA link when available)
- Concepts
- Computer science, Exploratory analysis, Database, Cervical cancer, Cancer, Medicine, Data science, Internal medicine (Top concepts (fields/topics) attached by OpenAlex)
- Cited by
- 11 (Total citation count in OpenAlex)
- Citations by year (recent)
- 2024: 9, 2022: 1, 2021: 1 (Per-year citation counts (last 5 years))
- References (count)
- 7 (Number of works referenced by this work)
- Related works (count)
- 10 (Other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W3092410600 |
|---|---|
| doi | https://doi.org/10.31782/ijcrr.2020.121924 |
| ids.doi | https://doi.org/10.31782/ijcrr.2020.121924 |
| ids.mag | 3092410600 |
| ids.openalex | https://openalex.org/W3092410600 |
| fwci | 0.2937191 |
| type | article |
| title | Exploratory Data Analysis and ETL with SAS on Hadoop Eco-system with Cervical Cancer Dataset |
| biblio.issue | 19 |
| biblio.volume | 12 |
| biblio.last_page | 104 |
| biblio.first_page | 88 |
| topics[0].id | https://openalex.org/T10862 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9304999709129333 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | AI in cancer detection |
| topics[1].id | https://openalex.org/T11396 |
| topics[1].field.id | https://openalex.org/fields/36 |
| topics[1].field.display_name | Health Professions |
| topics[1].score | 0.9269999861717224 |
| topics[1].domain.id | https://openalex.org/domains/4 |
| topics[1].domain.display_name | Health Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/3605 |
| topics[1].subfield.display_name | Health Information Management |
| topics[1].display_name | Artificial Intelligence in Healthcare |
| topics[2].id | https://openalex.org/T11719 |
| topics[2].field.id | https://openalex.org/fields/18 |
| topics[2].field.display_name | Decision Sciences |
| topics[2].score | 0.9007999897003174 |
| topics[2].domain.id | https://openalex.org/domains/2 |
| topics[2].domain.display_name | Social Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1803 |
| topics[2].subfield.display_name | Management Science and Operations Research |
| topics[2].display_name | Data Quality and Management |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.5272592306137085 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C3018260909 |
| concepts[1].level | 2 |
| concepts[1].score | 0.5027096271514893 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q1322871 |
| concepts[1].display_name | Exploratory analysis |
| concepts[2].id | https://openalex.org/C77088390 |
| concepts[2].level | 1 |
| concepts[2].score | 0.4423116147518158 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q8513 |
| concepts[2].display_name | Database |
| concepts[3].id | https://openalex.org/C2778220009 |
| concepts[3].level | 3 |
| concepts[3].score | 0.42230477929115295 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q160105 |
| concepts[3].display_name | Cervical cancer |
| concepts[4].id | https://openalex.org/C121608353 |
| concepts[4].level | 2 |
| concepts[4].score | 0.18237954378128052 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q12078 |
| concepts[4].display_name | Cancer |
| concepts[5].id | https://openalex.org/C71924100 |
| concepts[5].level | 0 |
| concepts[5].score | 0.152265727519989 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q11190 |
| concepts[5].display_name | Medicine |
| concepts[6].id | https://openalex.org/C2522767166 |
| concepts[6].level | 1 |
| concepts[6].score | 0.1319684088230133 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q2374463 |
| concepts[6].display_name | Data science |
| concepts[7].id | https://openalex.org/C126322002 |
| concepts[7].level | 1 |
| concepts[7].score | 0.033539384603500366 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q11180 |
| concepts[7].display_name | Internal medicine |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.5272592306137085 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/exploratory-analysis |
| keywords[1].score | 0.5027096271514893 |
| keywords[1].display_name | Exploratory analysis |
| keywords[2].id | https://openalex.org/keywords/database |
| keywords[2].score | 0.4423116147518158 |
| keywords[2].display_name | Database |
| keywords[3].id | https://openalex.org/keywords/cervical-cancer |
| keywords[3].score | 0.42230477929115295 |
| keywords[3].display_name | Cervical cancer |
| keywords[4].id | https://openalex.org/keywords/cancer |
| keywords[4].score | 0.18237954378128052 |
| keywords[4].display_name | Cancer |
| keywords[5].id | https://openalex.org/keywords/medicine |
| keywords[5].score | 0.152265727519989 |
| keywords[5].display_name | Medicine |
| keywords[6].id | https://openalex.org/keywords/data-science |
| keywords[6].score | 0.1319684088230133 |
| keywords[6].display_name | Data science |
| keywords[7].id | https://openalex.org/keywords/internal-medicine |
| keywords[7].score | 0.033539384603500366 |
| keywords[7].display_name | Internal medicine |
| language | en |
| locations[0].id | doi:10.31782/ijcrr.2020.121924 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S2765045337 |
| locations[0].source.issn | 0975-5241, 2231-2196 |
| locations[0].source.type | journal |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | 0975-5241 |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | International Journal of Current Research and Review |
| locations[0].source.host_organization | https://openalex.org/P4323867251 |
| locations[0].source.host_organization_name | Radiance Research Academy |
| locations[0].source.host_organization_lineage | https://openalex.org/P4323867251 |
| locations[0].source.host_organization_lineage_names | Radiance Research Academy |
| locations[0].license | |
| locations[0].pdf_url | https://doi.org/10.31782/ijcrr.2020.121924 |
| locations[0].version | publishedVersion |
| locations[0].raw_type | journal-article |
| locations[0].license_id | |
| locations[0].is_accepted | True |
| locations[0].is_published | True |
| locations[0].raw_source_name | International Journal of Current Research and Review |
| locations[0].landing_page_url | http://doi.org/10.31782/ijcrr.2020.121924 |
| indexed_in | crossref |
| authorships[0].author.id | https://openalex.org/A5112876866 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Xiaotian Cheng |
| authorships[0].countries | MY |
| authorships[0].affiliations[0].institution_ids | https://openalex.org/I181397559 |
| authorships[0].affiliations[0].raw_affiliation_string | School of Computing, Asia Pacific University of Technology & Innovation, Kuala Lumpur, Malaysia. |
| authorships[0].institutions[0].id | https://openalex.org/I181397559 |
| authorships[0].institutions[0].ror | https://ror.org/03c52a632 |
| authorships[0].institutions[0].type | education |
| authorships[0].institutions[0].lineage | https://openalex.org/I181397559 |
| authorships[0].institutions[0].country_code | MY |
| authorships[0].institutions[0].display_name | Asia Pacific University of Technology & Innovation |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Cheng Xiaotian |
| authorships[0].is_corresponding | False |
| authorships[0].raw_affiliation_strings | School of Computing, Asia Pacific University of Technology & Innovation, Kuala Lumpur, Malaysia. |
| authorships[1].author.id | https://openalex.org/A5003051838 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-1908-0680 |
| authorships[1].author.display_name | V. Thiruchelvam |
| authorships[1].countries | MY |
| authorships[1].affiliations[0].institution_ids | https://openalex.org/I181397559 |
| authorships[1].affiliations[0].raw_affiliation_string | School of Computing, Asia Pacific University of Technology & Innovation, Kuala Lumpur, Malaysia. |
| authorships[1].institutions[0].id | https://openalex.org/I181397559 |
| authorships[1].institutions[0].ror | https://ror.org/03c52a632 |
| authorships[1].institutions[0].type | education |
| authorships[1].institutions[0].lineage | https://openalex.org/I181397559 |
| authorships[1].institutions[0].country_code | MY |
| authorships[1].institutions[0].display_name | Asia Pacific University of Technology & Innovation |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Vinesh Thiruchelvam |
| authorships[1].is_corresponding | False |
| authorships[1].raw_affiliation_strings | School of Computing, Asia Pacific University of Technology & Innovation, Kuala Lumpur, Malaysia. |
| authorships[2].author.id | https://openalex.org/A5078967756 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Daniel Mago Vistro |
| authorships[2].countries | MY |
| authorships[2].affiliations[0].institution_ids | https://openalex.org/I181397559 |
| authorships[2].affiliations[0].raw_affiliation_string | School of Computing, Asia Pacific University of Technology & Innovation, Kuala Lumpur, Malaysia. |
| authorships[2].institutions[0].id | https://openalex.org/I181397559 |
| authorships[2].institutions[0].ror | https://ror.org/03c52a632 |
| authorships[2].institutions[0].type | education |
| authorships[2].institutions[0].lineage | https://openalex.org/I181397559 |
| authorships[2].institutions[0].country_code | MY |
| authorships[2].institutions[0].display_name | Asia Pacific University of Technology & Innovation |
| authorships[2].author_position | last |
| authorships[2].raw_author_name | Daniel Mago Vistro |
| authorships[2].is_corresponding | False |
| authorships[2].raw_affiliation_strings | School of Computing, Asia Pacific University of Technology & Innovation, Kuala Lumpur, Malaysia. |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://doi.org/10.31782/ijcrr.2020.121924 |
| open_access.oa_status | diamond |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Exploratory Data Analysis and ETL with SAS on Hadoop Eco-system with Cervical Cancer Dataset |
| has_fulltext | True |
| is_retracted | False |
| updated_date | 2025-11-06T03:46:38.306776 |
| primary_topic.id | https://openalex.org/T10862 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9304999709129333 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | AI in cancer detection |
| related_works | https://openalex.org/W2065180665, https://openalex.org/W2374348909, https://openalex.org/W2317981192, https://openalex.org/W2394396836, https://openalex.org/W2901938453, https://openalex.org/W2929716001, https://openalex.org/W2376987262, https://openalex.org/W2357769479, https://openalex.org/W2367232817, https://openalex.org/W3146261580 |
| cited_by_count | 11 |
| counts_by_year[0].year | 2024 |
| counts_by_year[0].cited_by_count | 9 |
| counts_by_year[1].year | 2022 |
| counts_by_year[1].cited_by_count | 1 |
| counts_by_year[2].year | 2021 |
| counts_by_year[2].cited_by_count | 1 |
| locations_count | 1 |
| best_oa_location.id | doi:10.31782/ijcrr.2020.121924 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S2765045337 |
| best_oa_location.source.issn | 0975-5241, 2231-2196 |
| best_oa_location.source.type | journal |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | 0975-5241 |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | International Journal of Current Research and Review |
| best_oa_location.source.host_organization | https://openalex.org/P4323867251 |
| best_oa_location.source.host_organization_name | Radiance Research Academy |
| best_oa_location.source.host_organization_lineage | https://openalex.org/P4323867251 |
| best_oa_location.source.host_organization_lineage_names | Radiance Research Academy |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://doi.org/10.31782/ijcrr.2020.121924 |
| best_oa_location.version | publishedVersion |
| best_oa_location.raw_type | journal-article |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | True |
| best_oa_location.is_published | True |
| best_oa_location.raw_source_name | International Journal of Current Research and Review |
| best_oa_location.landing_page_url | http://doi.org/10.31782/ijcrr.2020.121924 |
| primary_location.id | doi:10.31782/ijcrr.2020.121924 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S2765045337 |
| primary_location.source.issn | 0975-5241, 2231-2196 |
| primary_location.source.type | journal |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | 0975-5241 |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | International Journal of Current Research and Review |
| primary_location.source.host_organization | https://openalex.org/P4323867251 |
| primary_location.source.host_organization_name | Radiance Research Academy |
| primary_location.source.host_organization_lineage | https://openalex.org/P4323867251 |
| primary_location.source.host_organization_lineage_names | Radiance Research Academy |
| primary_location.license | |
| primary_location.pdf_url | https://doi.org/10.31782/ijcrr.2020.121924 |
| primary_location.version | publishedVersion |
| primary_location.raw_type | journal-article |
| primary_location.license_id | |
| primary_location.is_accepted | True |
| primary_location.is_published | True |
| primary_location.raw_source_name | International Journal of Current Research and Review |
| primary_location.landing_page_url | http://doi.org/10.31782/ijcrr.2020.121924 |
| publication_date | 2020-01-01 |
| publication_year | 2020 |
| referenced_works | https://openalex.org/W2053994724, https://openalex.org/W4206343787, https://openalex.org/W2529313366, https://openalex.org/W2408281124, https://openalex.org/W1989292852, https://openalex.org/W2995522801, https://openalex.org/W2119047901 |
| referenced_works_count | 7 |
| abstract_inverted_index.a | 11, 39 |
| abstract_inverted_index.be | 51 |
| abstract_inverted_index.by | 54 |
| abstract_inverted_index.de | 19 |
| abstract_inverted_index.in | 21, 64, 81 |
| abstract_inverted_index.is | 6, 38, 62 |
| abstract_inverted_index.of | 3, 42, 84, 112 |
| abstract_inverted_index.on | 77 |
| abstract_inverted_index.to | 7, 31, 58, 96 |
| abstract_inverted_index.858 | 27 |
| abstract_inverted_index.SAS | 92 |
| abstract_inverted_index.and | 9, 34, 67, 71, 94 |
| abstract_inverted_index.are | 45, 74 |
| abstract_inverted_index.due | 57 |
| abstract_inverted_index.for | 100 |
| abstract_inverted_index.the | 55, 78, 82, 87, 109 |
| abstract_inverted_index.was | 89 |
| abstract_inverted_index.been | 106 |
| abstract_inverted_index.data | 65, 68, 72, 85 |
| abstract_inverted_index.five | 103 |
| abstract_inverted_index.from | 16, 91 |
| abstract_inverted_index.have | 105 |
| abstract_inverted_index.left | 46 |
| abstract_inverted_index.main | 1 |
| abstract_inverted_index.this | 4 |
| abstract_inverted_index.tool | 111 |
| abstract_inverted_index.with | 47, 108 |
| abstract_inverted_index.large | 40 |
| abstract_inverted_index.might | 50 |
| abstract_inverted_index.which | 14, 44, 49 |
| abstract_inverted_index.Hadoop | 97 |
| abstract_inverted_index.Studio | 61, 93 |
| abstract_inverted_index.basing | 76 |
| abstract_inverted_index.blank, | 48 |
| abstract_inverted_index.number | 41 |
| abstract_inverted_index.analyse | 10 |
| abstract_inverted_index.avoided | 53 |
| abstract_inverted_index.dataset | 13, 25, 88 |
| abstract_inverted_index.explore | 8 |
| abstract_inverted_index.history | 36 |
| abstract_inverted_index.medical | 35 |
| abstract_inverted_index.patient | 56 |
| abstract_inverted_index.privacy | 59 |
| abstract_inverted_index.process | 83 |
| abstract_inverted_index.project | 5 |
| abstract_inverted_index.records | 43 |
| abstract_inverted_index.Caracas" | 20 |
| abstract_inverted_index.Caracas, | 22 |
| abstract_inverted_index.Tableau. | 113 |
| abstract_inverted_index.cleaning | 70 |
| abstract_inverted_index.explored | 107 |
| abstract_inverted_index.exported | 90 |
| abstract_inverted_index.gathered | 80 |
| abstract_inverted_index.platform | 99 |
| abstract_inverted_index.relating | 30 |
| abstract_inverted_index.uploaded | 95 |
| abstract_inverted_index.utilized | 63 |
| abstract_inverted_index."Hospital | 17 |
| abstract_inverted_index.analysing | 101 |
| abstract_inverted_index.collected | 15 |
| abstract_inverted_index.comprises | 26 |
| abstract_inverted_index.conducted | 75 |
| abstract_inverted_index.knowledge | 79 |
| abstract_inverted_index.objective | 2 |
| abstract_inverted_index.patients' | 28 |
| abstract_inverted_index.secondary | 12 |
| abstract_inverted_index.Venezuela. | 23 |
| abstract_inverted_index.data.There | 37 |
| abstract_inverted_index.hypotheses | 104 |
| abstract_inverted_index.Hortonworks | 98 |
| abstract_inverted_index.Methods:The | 24 |
| abstract_inverted_index.demographic | 32 |
| abstract_inverted_index.exploration | 66 |
| abstract_inverted_index.information | 29, 33 |
| abstract_inverted_index.Objective:The | 0 |
| abstract_inverted_index.Universitario | 18 |
| abstract_inverted_index.intentionally | 52 |
| abstract_inverted_index.visualization | 110 |
| abstract_inverted_index.transformation | 73 |
| abstract_inverted_index.purpose.Lastly, | 102 |
| abstract_inverted_index.considerations.SAS | 60 |
| abstract_inverted_index.pre-processing.Data | 69 |
| abstract_inverted_index.exploration.Afterwards, | 86 |
| cited_by_percentile_year.max | 99 |
| cited_by_percentile_year.min | 89 |
| countries_distinct_count | 1 |
| institutions_distinct_count | 3 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/2 |
| sustainable_development_goals[0].score | 0.4300000071525574 |
| sustainable_development_goals[0].display_name | Zero hunger |
| citation_normalized_percentile.value | 0.64628939 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | False |
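
The `abstract_inverted_index.*` rows in the payload store the abstract as a map from each token to the positions where it occurs. A minimal sketch of how the running text can be rebuilt from such a map is shown here. The function name `rebuild_abstract` is a hypothetical helper, and the sample dictionary holds only positions 0 to 11 copied from the rows above, not the full index.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a {token: [positions]} inverted index."""
    by_position = {}
    for token, positions in inverted_index.items():
        for position in positions:
            by_position[position] = token
    # Emit tokens in ascending position order, separated by single spaces.
    return " ".join(by_position[p] for p in sorted(by_position))


# Partial index: only positions 0-11 from the payload table are included.
sample_index = {
    "Objective:The": [0],
    "main": [1],
    "objective": [2],
    "of": [3],
    "this": [4],
    "project": [5],
    "is": [6],
    "to": [7],
    "explore": [8],
    "and": [9],
    "analyse": [10],
    "a": [11],
}

abstract_start = rebuild_abstract(sample_index)
print(abstract_start)
```

Because tokens are split on whitespace only, fused tokens in the index such as `Objective:The` carry over unchanged into the rebuilt text.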