NIH Cloud Platform Interoperability Effort

NCI Cancer Research Data Commons (CRDC)

https://datacommons.cancer.gov/

Vision

Giving researchers a place where they can work together to access diverse data types for integrative analysis, furthering the goals of precision medicine and biomedical discoveries.

Mission

Provide access to standardized and harmonized cancer data in an expandable cloud-based infrastructure, enhancing the way data are shared to empower researchers to work in real time and with more connectivity.

Approach

To provide interoperable resources through federation, data harmonization, standards, and tools and services that can be reused across the research community and to enable enhanced data sharing.

Funder

The CRDC is funded through the NCI Cancer Moonshot.

PIs

Anand Basu, Andrey Fedorov, Bill Longabaugh, Bob Grossman, Brandi Davis-Dusenbery, Brian O’Connor, David Pot, Melissa Haendel, Ron Kikinis, Sam Volchenboum, Chris Chute, Clare Bernard.

Institutions

Brigham and Women’s Hospital, Enterprise Science and Computing (ESAC), Frederick National Laboratory for Cancer Research, General Dynamics Information Technology, Institute for Systems Biology, Oregon State University, Seven Bridges, The Broad Institute, University of Chicago, Johns Hopkins University.

Data

Data Repositories

New genomic, proteomic, imaging, canine, and clinical trial data are added continually through both existing and new data nodes.

Datasets

More Information

https://datacommons.cancer.gov/data#key-datasets

Tools

Cloud Resources

  • Seven Bridges - 400+ publicly available tools and workflows in Common Workflow Language (CWL), plus Dockstore, RStudio, Jupyter notebooks, and a collaborative genome browser
  • Broad - 700+ publicly available workflows and tools in Workflow Description Language (WDL), Integrative Genomics Viewer (IGV), Dockstore, Jupyter notebooks, BigQuery, machine learning (ML), and pipelines
  • ISB-CGC - Google: VMs, BigQuery, AI, ML, Pipelines, Cohorts, Image Viewers, Notebooks, Plotting, Dockstore
  • Bring your own tools; integrative analysis is supported.

Repositories Resources

  • GDC: Data Analysis, Visualization, and Exploration (DAVE) tools
  • PDC: PepQuery, Morpheus, Genome Browser, and common data analysis pipelines for data-dependent (DDA) and data-independent acquisition (DIA) proteomics
  • Infrastructure: Cancer Data Aggregator (CDA), Center for Cancer Data Harmonization (CCDH), Data Commons Framework (DCF)

Analytical Tools

Authentication


  • NCI Data Commons Framework Services (DCFS) by Gen3
  • Researcher Authentication Service (RAS)
  • eRA Commons IDs (controlled data)
  • Individual platform authentication via OpenID Connect (OIDC)
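OIDC sign-in typically follows the authorization code flow: the platform redirects the researcher to the identity provider, then receives a one-time code on its callback URL. The Python sketch below builds that request and checks the `state` value on the way back. The endpoint URL, client ID, and redirect URI are hypothetical placeholders, not real CRDC or RAS values, and the later exchange of the code for tokens is omitted.

```python
from __future__ import annotations
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

def build_authorization_url(authorize_endpoint: str, client_id: str,
                            redirect_uri: str) -> tuple[str, str]:
    """Build an OIDC authorization-code request; return (url, state)."""
    state = secrets.token_urlsafe(16)  # random value used to detect forged redirects
    params = {
        "response_type": "code",          # authorization code flow
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": "openid profile email",  # "openid" marks this as an OIDC request
        "state": state,
    }
    return f"{authorize_endpoint}?{urlencode(params)}", state

def handle_callback(callback_url: str, expected_state: str) -> str:
    """Check `state` on the redirect and return the authorization code."""
    query = parse_qs(urlparse(callback_url).query)
    if query.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch; rejecting callback")
    if "code" not in query:
        raise ValueError("no authorization code in callback")
    return query["code"][0]

if __name__ == "__main__":
    # Hypothetical endpoint and client values, for illustration only.
    url, state = build_authorization_url(
        "https://login.example.org/authorize",
        "example-analysis-platform",
        "https://platform.example.org/callback",
    )
    print(url)
    code = handle_callback(
        f"https://platform.example.org/callback?code=abc123&state={state}", state
    )
    print(code)  # abc123
```

Keeping `state` on the platform side and comparing it on return is what ties the callback to the request that this platform actually started.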

Indexing

  • Permanent globally unique IDs (GUIDs) for data stored in Google Cloud and Amazon Web Services (AWS)
  • GUIDs are cloud agnostic, promoting access and providing a mechanism for versioning data
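A cloud-agnostic GUID can be pictured as one identifier that resolves to several physical copies. The sketch below assumes the index returns records shaped like GA4GH Data Repository Service (DRS) objects, whose `access_methods` entries carry a `type` such as `s3` or `gs`. The record, identifier, checksum, and bucket paths are made up. The function picks the copy in the caller's preferred cloud. In DRS a method may carry an `access_id` instead of a direct `access_url`, which needs a further request; this sketch simply skips such methods.

```python
from __future__ import annotations

# Hypothetical record shaped like a GA4GH DRS object: one GUID, two cloud
# copies. The ID, checksum, and bucket paths are made up for illustration.
SAMPLE_OBJECT = {
    "id": "dg.EXAMPLE/0000-1111-2222",
    "size": 1048576,
    "checksums": [{"type": "md5", "checksum": "0123456789abcdef0123456789abcdef"}],
    "access_methods": [
        {"type": "s3", "access_url": {"url": "s3://example-aws-bucket/sample.bam"}},
        {"type": "gs", "access_url": {"url": "gs://example-gcp-bucket/sample.bam"}},
    ],
}

def resolve(drs_object: dict, preferred: list[str]) -> str:
    """Return the access URL of the first method type found in `preferred`.

    Methods that only carry an `access_id` (which would need a second
    request to obtain a URL) are skipped in this sketch.
    """
    methods = {m["type"]: m for m in drs_object.get("access_methods", [])}
    for kind in preferred:
        method = methods.get(kind)
        if method is not None and "access_url" in method:
            return method["access_url"]["url"]
    raise LookupError(f"no usable access method among {preferred} for {drs_object['id']}")

if __name__ == "__main__":
    # The same GUID resolves to whichever cloud the analysis runs in.
    print(resolve(SAMPLE_OBJECT, ["gs", "s3"]))  # gs://example-gcp-bucket/sample.bam
    print(resolve(SAMPLE_OBJECT, ["s3"]))        # s3://example-aws-bucket/sample.bam
```

Because analyses refer only to the GUID, moving or adding a cloud copy changes the record's access methods, not the analysis code.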

Authorization

  • dbGaP access
  • DCFS by Gen3
  • Authorization enabled by Trusted Partnerships with NIH
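The authorization rules above can be reduced to a toy access check: open-access files are available to anyone, while controlled files require an authenticated user (for example, signed in with an eRA Commons ID) who holds a dbGaP approval for the file's study. This is a hypothetical model for illustration only. The `FileRecord` and `User` types and the study accession are invented, and the actual DCFS/Gen3 policy handling is more detailed.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Optional

@dataclass(frozen=True)
class FileRecord:
    guid: str
    # Hypothetical: dbGaP study accession the file is controlled under;
    # None means the file is open access.
    study: Optional[str] = None

@dataclass
class User:
    username: str
    authenticated: bool  # e.g. signed in through an identity provider
    approved_studies: set[str] = field(default_factory=set)  # dbGaP approvals

def can_download(user: User, record: FileRecord) -> bool:
    """Decide whether `user` may access `record` under this toy policy."""
    if record.study is None:
        return True   # open-access data needs no approval
    if not user.authenticated:
        return False  # controlled data always requires sign-in
    return record.study in user.approved_studies

if __name__ == "__main__":
    controlled = FileRecord("dg.EXAMPLE/abcd", study="phs000000")  # made-up accession
    alice = User("alice", authenticated=True, approved_studies={"phs000000"})
    print(can_download(alice, controlled))  # True
```

Checking authentication before approvals mirrors the split in the lists above: identity comes from the authentication services, while permissions come from dbGaP.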

Data Models

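  • Many data models exist across the CRDC, including those of the Integrated Canine Data Commons (ICDC), Clinical Trials Data Commons (CTDC), Proteomic Data Commons (PDC), and Genomic Data Commons (GDC)
  • The Center for Cancer Data Harmonization (CCDH) develops an overarching model and the mappings to it
  • The CRDC also participates in Global Alliance for Genomics and Health (GA4GH) efforts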
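Harmonizing across repositories largely comes down to mapping each node's fields onto a shared model. The sketch below renames fields from two hypothetical repositories into common names and converts units where needed. The repository names, field names, mappings, and conversion are illustrative assumptions, not the actual GDC, PDC, or CCDH schemas.

```python
from __future__ import annotations
from typing import Any, Callable

# Hypothetical per-repository field mappings onto a shared model. These
# names are illustrative, not the real GDC/PDC schemas or the CCDH model.
FIELD_MAPS: dict[str, dict[str, str]] = {
    "repo_a": {"case_id": "subject_id", "primary_site": "anatomic_site",
               "age_at_diagnosis": "age_at_diagnosis_days"},
    "repo_b": {"patient": "subject_id", "tissue_site": "anatomic_site",
               "age_years": "age_at_diagnosis_days"},
}

# Hypothetical unit conversions, keyed by (repository, source field).
CONVERTERS: dict[tuple[str, str], Callable[[Any], Any]] = {
    ("repo_b", "age_years"): lambda years: round(years * 365.25),
}

def harmonize(source: str, record: dict[str, Any]) -> dict[str, Any]:
    """Rename a source record's fields into the shared model, converting units."""
    mapping = FIELD_MAPS[source]
    out: dict[str, Any] = {"source": source}
    for src_field, value in record.items():
        target = mapping.get(src_field)
        if target is None:
            continue  # fields without a mapping are dropped in this sketch
        convert = CONVERTERS.get((source, src_field))
        out[target] = convert(value) if convert else value
    return out

if __name__ == "__main__":
    a = harmonize("repo_a", {"case_id": "A-1", "primary_site": "Lung", "age_at_diagnosis": 21915})
    b = harmonize("repo_b", {"patient": "B-7", "tissue_site": "Lung", "age_years": 60})
    print(a["anatomic_site"] == b["anatomic_site"])  # True
```

Once both records use the same field names and units, a single query can span repositories without per-node special cases.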

Architecture

User Perspective

[Figure: CRDC architecture, user perspective]

System Perspective

[Figure: CRDC architecture, system perspective]
