The Cancer Genomics Cloud (CGC) is a cloud platform, powered by Seven Bridges and funded by the National Cancer Institute (NCI), for sharing and accessing cancer genomics data. The platform is built around data from The Cancer Genome Atlas (TCGA).

From the website: There are two types of data shared in the platform:

1. Open Access

   The Open Access tier includes information which is not unique to an individual. This includes information such as:

   - De-identified clinical and demographic data
   - Gene expression data
   - Copy number alterations in regions of the genome
   - Epigenetic data
   - Summaries of data across individuals

2. Controlled Access

   The Controlled Access tier includes information which is unique to an individual. This includes most raw data files, and some processed data such as:

   - Primary sequencing data (BAM and FASTQ files) from DNA, RNA, miRNA or bisulfite sequencing studies
   - Raw and processed SNP6 array data
   - Raw and processed Exon array data
   - Somatic and germline mutation calls for an individual (VCF and MAF files)
From the website: "Any researcher can access and use data within the Open Access tier (Open Data) as long as they agree to the data use restrictions and requirements outlined in the TCGA publication guidelines. Researchers requiring data from the Controlled Access tier (Controlled Data) for their studies need to obtain an approved Data Access Request through the Database of Genotypes and Phenotypes (dbGaP) and agree with all TCGA Data Use Certifications as well as the TCGA publication policy."
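The two-tier rule described above can be sketched as a small lookup plus an access check. This is only an illustrative model of the policy as quoted, not CGC code or a CGC API: the category labels, the `Tier` enum, and the `can_access` function with its two boolean flags are all hypothetical names introduced for this example.

```python
from enum import Enum


class Tier(Enum):
    OPEN = "open"
    CONTROLLED = "controlled"


# Hypothetical category labels (not CGC identifiers), mapped to the tier
# that the quoted description assigns to each kind of data.
DATA_TIERS = {
    "deidentified_clinical": Tier.OPEN,
    "gene_expression": Tier.OPEN,
    "copy_number_alterations": Tier.OPEN,
    "epigenetic": Tier.OPEN,
    "cross_individual_summaries": Tier.OPEN,
    "primary_sequencing": Tier.CONTROLLED,         # BAM and FASTQ files
    "snp6_array": Tier.CONTROLLED,
    "exon_array": Tier.CONTROLLED,
    "individual_mutation_calls": Tier.CONTROLLED,  # VCF and MAF files
}


def can_access(category, agreed_to_tcga_terms, has_dbgap_approval=False):
    """Return True if a researcher may use data in the given category.

    Assumption: agreeing to the TCGA guidelines and policies is modeled as
    one flag, and an approved dbGaP Data Access Request as another.
    """
    tier = DATA_TIERS[category]
    if tier is Tier.OPEN:
        # Open Data only requires agreeing to the TCGA guidelines.
        return agreed_to_tcga_terms
    # Controlled Data additionally requires an approved dbGaP request.
    return agreed_to_tcga_terms and has_dbgap_approval


if __name__ == "__main__":
    print(can_access("gene_expression", agreed_to_tcga_terms=True))
    print(can_access("primary_sequencing", agreed_to_tcga_terms=True))
    print(can_access("primary_sequencing", True, has_dbgap_approval=True))
```

In this sketch, a researcher who has agreed to the TCGA terms can use gene expression data straight away, but is refused primary sequencing data until the dbGaP approval flag is also set.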
From the website: "The CGC grew out of a pressing need to analyze large cancer genomics datasets, primarily the Cancer Genome Atlas (TCGA). TCGA is one of the richest and most complete genomics datasets, composed of 33 different tumor types or subtypes with data from thousands of patients. Funded by $375 million in taxpayer dollars over the past decade, the project collected and analyzed samples at institutions across the U.S. Multiple samples from each patient were analyzed using multiple approaches including genome sequencing, RNA sequencing, microRNA sequencing, and more. TCGA data represents more than 2.5 petabytes of information and continues to grow as more samples are analyzed."