Mr Mitchell Shiell
This workshop explores why efficient research data management matters and equips participants with practical, open-source tools to implement it. Small laboratories now routinely generate data volumes that once required consortium-scale resources. Meanwhile, computational methods and collaborative science increasingly demand that data be findable, accessible, interoperable, and reusable (FAIR) at scale. Yet accessible data management infrastructure has lagged behind, with many groups managing data in databases or spreadsheets that, while functional internally, lack efficient APIs and searchable interfaces for external access and collaboration.
This workshop guides researchers to build searchable data portals using open-source software. Participants will transform tabular datasets into FAIR-compliant discovery platforms serving both human researchers and data-hungry applications. Drawing from real-world cases, including internal laboratory data hubs and niche community research portals, this hands-on session demonstrates practical solutions for research efficiency, data integrity, and collaboration. The workshop begins with RDM governance principles and FAIR implementation strategies, including guidance on when to implement custom research software infrastructure. Participants will then deploy a discovery portal using Elasticsearch, GraphQL, and Overture's (Overture.bio) search API (Arranger) and portal UI scaffolding (Stage), working hands-on to prepare datasets and configure custom search interfaces. Attendees will leave with a functioning locally deployed Overture instance configured to their data and a roadmap for expanding their infrastructure.
- Demonstrate practical implementation of FAIR principles through working examples
- Enable participants to evaluate when existing research data management systems should be replaced with custom research software
- Provide hands-on experience deploying a lightweight data discovery portal using open-source tools
- Equip participants with technical skills to configure software for their specific datasets
- Introduce using nginx to connect portals within institutional networks or to external collaborators
- Apply FAIR principles to evaluate their current data management workflows
- Assess when simpler database- or spreadsheet-based systems should be replaced or extended with custom infrastructure
- Deploy a functional data discovery portal using Elasticsearch, GraphQL, Arranger, and Stage
- Configure search interfaces and indices tailored to their tabular datasets
- Understand how to deploy portals with nginx within institutional networks or publicly
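As a rough illustration of the dataset preparation step, the sketch below converts tabular CSV text into the newline-delimited JSON body that Elasticsearch's bulk API accepts (an action line followed by a document line for each row). This is not the loading tool used in the workshop. The function name `csv_to_bulk_ndjson`, the index name `demo-index`, and the sample columns are hypothetical and chosen only for illustration.

```python
import csv
import io
import json


def csv_to_bulk_ndjson(csv_text: str, index_name: str) -> str:
    """Convert CSV text into an Elasticsearch bulk-API request body (NDJSON)."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Action line: names the index that should receive the document.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the row itself, with empty or missing cells dropped.
        document = {k: v for k, v in row.items() if v not in ("", None)}
        lines.append(json.dumps(document))
    # The bulk API expects every line, including the last, to end with a newline.
    return "".join(line + "\n" for line in lines)


# Hypothetical sample data; replace with the contents of your own CSV file.
SAMPLE_CSV = "sample_id,tissue,age\nS1,liver,54\nS2,lung,\n"

if __name__ == "__main__":
    print(csv_to_bulk_ndjson(SAMPLE_CSV, "demo-index"), end="")
```

All values are kept as strings here; a real index would usually pair this with a mapping that declares numeric and keyword fields explicitly.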
Primary Audience: Biocurators and bioinformaticians managing datasets in spreadsheets or databases that lack unified APIs and discoverable search interfaces for external access. Ideal for those supporting multi-institutional collaborations, making existing data FAIR-compliant, or building browser-accessible community resources.
Not Suited For: This workshop focuses on curated tabular data rather than raw file management. While the infrastructure can be extended to support file storage and retrieval, those capabilities are beyond the scope of this session.
- Basic command line familiarity (navigating directories, running commands)
- Laptop with admin privileges
- Git installed and operational
- Docker Desktop installed and operational
- At least 10 GB of free disk space
- Windows users: WSL2 properly configured and tested
- Optional: your own tabular dataset in CSV format
- Strongly recommended: pre-download the required Docker images (more below)
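Some of the prerequisites above can be checked ahead of time with a short script. The following is a minimal sketch, assuming that `git` and `docker` are expected on the `PATH` and that the 10 GB requirement applies to the drive holding the current directory. The names `REQUIRED_TOOLS`, `MIN_FREE_GB`, and `check_prerequisites` are hypothetical.

```python
import shutil

# Assumed command names for the tools listed in the prerequisites.
REQUIRED_TOOLS = ("git", "docker")
# Free space threshold from the prerequisites, in gigabytes.
MIN_FREE_GB = 10


def check_prerequisites(path: str = ".") -> dict:
    """Return a mapping of prerequisite name to True (met) or False (not met)."""
    # shutil.which returns None when the command is not found on PATH.
    results = {tool: shutil.which(tool) is not None for tool in REQUIRED_TOOLS}
    # disk_usage reports bytes; convert to decimal gigabytes for comparison.
    free_gb = shutil.disk_usage(path).free / 1e9
    results["disk_space"] = free_gb >= MIN_FREE_GB
    return results


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

This only confirms that the commands exist and that space is available. It does not verify that the Docker daemon is running or that WSL2 is configured, so those should still be tested manually.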