Dr Christopher Meyer
This workshop will show attendees how to use the open-source Gen3 software platform to build, configure, and use data commons, data meshes, and workspaces for sharing and analyzing biomedical data.
Gen3 is an open-source data platform for building and operating data commons, data meshes, data hubs, and workspaces. Data commons are cloud-based platforms, with an associated governance structure, for managing, analyzing, and sharing data in support of a research community. A data mesh (also called a data ecosystem) consists of two or more data commons, data repositories, knowledge bases, and applications connected through a common set of software services. A mesh can optionally include a data hub, which lets users search for and discover data distributed across the mesh and move data from its commons and other connected resources into a Gen3 Workspace for exploration and analysis.
Gen3 has been used to develop and operate more than 15 data commons and 4 data meshes, many of them focused on cancer, which together make more than 28 PB of data, spanning more than 64 million FAIR data objects, available to the research community. These include large-scale data commons such as the NCI Cancer Research Data Commons (CRDC), the NHLBI BioData Catalyst platform, and the NIBIB Medical Imaging and Data Resource Center (MIDRC). Gen3 has also been used to set up smaller data commons that support particular research communities, such as the liquid biopsy research community (BloodPAC), the Climate and Health Outcomes Research Data Systems (CHORDS) research community, and many others.
We will cover the following topics:
- deployment of a new Gen3 resource using Helm charts;
- configuration and customization of the new Gen3 resource;
- creation of a data model to help curate and harmonize ingested data;
- data ingestion;
- building data search and access tools;
- customizing analysis workspaces with tools and tutorials.
Attendees will learn how to deploy a Gen3 data resource using Helm, load data into it, and configure it to meet the needs of their use cases and research communities. Attendees will also gain a general understanding of how to use Gen3 data commons, data meshes, and analysis workspaces.
The workshop is intended for researchers who wish to share biomedical data, to operate a small- to mid-scale data resource such as a data commons, data hub, or analysis workspace, or to use one of these resources for biomedical research.
Attendees should have an interest in data-sharing platforms from a contributor, operator, or researcher/user perspective, and should be familiar with command-line tools, cloud computing resources (AWS, GCP, Azure, etc.), and general data modeling concepts.