Few disciplines have witnessed more rapid adoption of cloud computing than life sciences research.
FREMONT, CA: Datasets are easily anonymized, pipelines usually consist of cloud-friendly open-source tools, and data are often shared, making the cloud a convenient meeting spot. Underlying all of these reasons, however, is an insatiable demand for computing power. If not done properly, scaling to the public cloud can create an array of security, cost, and data consistency problems. The design best practices listed here draw on several years of work with leading pharmaceutical firms to extend on-premises HPC clusters to the cloud and deploy dedicated but elastic clouds; they will help an organization make the most of the public cloud while avoiding the risks.
Design for portability- Bioinformatics clusters include lots of software with complex interdependencies. A good practice is to develop custom machine images that mirror the on-premises environment and can be deployed in the cloud. With ready-to-run images, firms can get up and running in the cloud much faster. By deploying a cloud environment similar to the on-prem environment, they are less likely to run into compatibility problems.
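One way to check that a cloud image really mirrors the on-premises environment is to compare the package versions installed in each. The following is a minimal sketch, not a definitive tool: the function name `diff_manifests` and the package lists are hypothetical, and it assumes each environment's installed software has already been exported as a simple name-to-version mapping.

```python
def diff_manifests(on_prem, cloud):
    """Compare two {package: version} mappings and report the differences."""
    # Packages present on-premises but absent from the cloud image.
    missing = sorted(set(on_prem) - set(cloud))
    # Packages present only in the cloud image.
    extra = sorted(set(cloud) - set(on_prem))
    # Packages present in both, but at different versions.
    mismatched = {
        name: (on_prem[name], cloud[name])
        for name in sorted(set(on_prem) & set(cloud))
        if on_prem[name] != cloud[name]
    }
    return {"missing": missing, "extra": extra, "mismatched": mismatched}


# Hypothetical manifests, invented for illustration only.
on_prem = {"samtools": "1.17", "bwa": "0.7.17", "gatk": "4.4.0.0"}
cloud_image = {"samtools": "1.17", "bwa": "0.7.15", "python3": "3.10.12"}

report = diff_manifests(on_prem, cloud_image)
print(report)
```

An empty report would suggest the image is a faithful copy, at least as far as package versions go; any entry under `missing` or `mismatched` is a likely source of compatibility problems.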
Leverage automation- Even with ready-to-run images, assembling clusters in the cloud can be challenging. Users have to worry about details such as filesystems, VPCs, DNS, security groups, VPNs, and more. Various solutions can build clusters automatically. Look for solutions that operate consistently across multiple clouds. Future research might involve collaboration or access to datasets residing in different clouds, so you will want the flexibility to run anywhere. Cloud instances start costing money from the moment they are deployed, so fast and accurate provisioning is essential. Additionally, make sure the chosen automation solution can support the custom images described above.
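To see why provisioning speed matters, consider a rough estimate of what a cluster costs before it runs its first job. This is only an illustrative sketch: the node count, hourly rate, and provisioning times are invented figures, not quotes from any provider, and it assumes billing starts when instances are deployed.

```python
def provisioning_cost(nodes, hourly_rate, minutes_to_ready):
    """Cost accrued by instances while the cluster is still being assembled."""
    return nodes * hourly_rate * (minutes_to_ready / 60)


# Hypothetical figures: 100 nodes billed at 2.00 per node-hour.
slow = provisioning_cost(nodes=100, hourly_rate=2.0, minutes_to_ready=45)
fast = provisioning_cost(nodes=100, hourly_rate=2.0, minutes_to_ready=15)

print(f"slow provisioning: {slow:.2f}")
print(f"fast provisioning: {fast:.2f}")
print(f"saved per cluster launch: {slow - fast:.2f}")
```

The saving repeats on every launch, so it grows quickly for elastic clusters that are created and torn down often.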
Containers are your friend. Just as custom images minimize differences between on-prem and cloud infrastructure, containers encapsulate applications and make them portable. A container runtime should be deployed in the VM image, along with workflow and pipeline management tools. Whether container images are stored in a public or private registry, using containers will help ensure that applications run consistently on-premises and across multiple clouds. Using the same workload manager across on-premises and cloud resources can also simplify adoption for researchers and other end users.
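The portability that containers provide can be illustrated with a small sketch that builds the same container invocation for two sites. The image name, tool arguments, and mount paths below are hypothetical, and Docker is assumed as the runtime purely for illustration; only the host data directory differs between environments.

```python
def container_command(image, tool_args, data_dir, runtime="docker"):
    """Build an argument list that runs a containerized tool against local data."""
    return [
        runtime, "run",
        "--rm",                      # remove the container when the tool exits
        "-v", f"{data_dir}:/data",   # mount the site-specific data directory
        image,
        *tool_args,
    ]


# Hypothetical image and arguments, invented for illustration only.
IMAGE = "registry.example.com/bio/aligner:1.0"
ARGS = ["align", "--input", "/data/sample.fastq", "--output", "/data/sample.bam"]

on_prem_cmd = container_command(IMAGE, ARGS, data_dir="/mnt/hpc/project")
cloud_cmd = container_command(IMAGE, ARGS, data_dir="/mnt/cloud/project")

print(" ".join(on_prem_cmd))
print(" ".join(cloud_cmd))
```

Because the image and the arguments are identical in both commands, the application sees the same software and the same `/data` layout wherever it runs.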