Types of Synthetic Biology-Relevant Data and Their Representation

Life Sciences Review | Wednesday, October 19, 2022

Properly trained deep learning networks take an input, generally represented as matrices or vectors of numbers, and use it to accurately predict an output. These mathematical representations are important for converting biological problems into ones amenable to model training.


FREMONT, CA: Synthetic biology and deep learning are synergistic. Synthetic biology can generate large datasets to train models, for example through DNA synthesis, while deep learning models can inform design, for example by generating novel parts or suggesting optimal experiments to conduct. Recent research at the interface of engineering biology and deep learning has highlighted this potential through successes including the design of novel biological parts and biomolecular implementations of artificial neural networks.


A properly trained deep learning network takes an input and uses it to accurately predict an output. Input data is generally represented as matrices or vectors of numbers. These mathematical representations are essential for converting biological problems into ones that are amenable to model training. Because the representation codifies which information the model receives and restricts the set of applicable learning algorithms, identifying the optimal data representation for a specific problem is crucial to developing high-performing and generalisable models.


Practitioners should choose data representations carefully, so that the independent variables pertinent to the problem are represented while the irrelevant or confounding variables that a model must learn to overlook are constrained. In addition, a well-chosen representation can be structured to reduce the problem space and increase data efficiency. Thus, it is necessary to understand the common types of synthetic biology-relevant data and how they can be numerically represented.


Sequence Data


Sequencing capabilities are expanding rapidly, producing a vast quantity of data in sequence space. This includes DNA, RNA, and amino acid sequences.


These data are represented as matrices by means of embeddings, which are functions that map sequence elements to vectors. The most basic embedding is a one-hot encoding, in which only a single element of each embedding vector is "hot", taking on a value of one, and the rest are zero.
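As a minimal sketch of the one-hot encoding described above, the following Python snippet turns a DNA sequence into a matrix with one row per position. The function name `one_hot_encode`, the alphabet order A, C, G, T, and the example sequence are illustrative assumptions, not details from the article.

```python
# Minimal sketch of one-hot encoding for a DNA sequence.
# The alphabet order (A, C, G, T) is an arbitrary choice made for illustration.
DNA_ALPHABET = "ACGT"

def one_hot_encode(sequence, alphabet=DNA_ALPHABET):
    """Return one row per sequence position, with a single 1 in each row."""
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    matrix = []
    for symbol in sequence.upper():
        row = [0] * len(alphabet)
        row[index[symbol]] = 1  # raises KeyError for symbols outside the alphabet
        matrix.append(row)
    return matrix

encoded = one_hot_encode("GATTACA")
for base, row in zip("GATTACA", encoded):
    print(base, row)
```

A sequence of length n over an alphabet of size k therefore becomes an n × k matrix. The same function would work for RNA or amino acid sequences if a different alphabet string is passed in.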


One-hot encoding is straightforward, but in certain cases it limits the representational power of the model because it disregards the idea that certain amino acids might behave similarly at a given point in the sequence. Amino acid embeddings learned from large unlabelled protein data sets outperform one-hot encodings in certain protein engineering tasks.
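The limitation can be illustrated with a short sketch: under one-hot encoding, every pair of distinct amino acids is exactly the same distance apart, so chemical similarity is invisible to the model. The helper names `one_hot` and `squared_distance` are hypothetical, and the choice of leucine, isoleucine, and aspartate as examples is ours rather than the article's.

```python
# Sketch: one-hot vectors place every pair of distinct amino acids equally far apart.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes

def one_hot(residue):
    """One-hot vector of length 20 for a single residue."""
    return [1 if aa == residue else 0 for aa in AMINO_ACIDS]

def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

# Leucine (L) and isoleucine (I) are chemically similar; aspartate (D) is not.
print(squared_distance(one_hot("L"), one_hot("I")))  # 2
print(squared_distance(one_hot("L"), one_hot("D")))  # 2
```

Both comparisons give the same value, which is why a learned embedding, whose vectors are free to sit closer together for residues that behave alike, can carry information that a one-hot encoding cannot.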


Molecular Structure Data


Molecular structure, at both small-molecule and macromolecular scales, can be described in various ways, including string-based representations and learned embeddings. A molecule can also be represented by its structural formula, which is encoded as a graph to which graph-based learning methods are directly applied.
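As a rough sketch of the graph encoding described above, the snippet below represents ethanol as a graph over its heavy atoms, with a node feature matrix and an adjacency matrix. The choice of ethanol, the three-element vocabulary, and the helper names are illustrative assumptions. For comparison, one widely used string-based format, SMILES, would write the same molecule as `CCO`.

```python
# Sketch: ethanol (CH3-CH2-OH) as a graph over its heavy atoms; hydrogens are left implicit.
atoms = ["C", "C", "O"]      # node labels, indexed 0..2
bonds = [(0, 1), (1, 2)]     # undirected edges: the C-C and C-O single bonds

ELEMENTS = ["C", "N", "O"]   # illustrative element vocabulary for node features

def node_features(atoms, elements=ELEMENTS):
    """One-hot element type for each atom: one row per node."""
    return [[1 if atom == e else 0 for e in elements] for atom in atoms]

def adjacency_matrix(n_nodes, edges):
    """Symmetric 0/1 matrix with a 1 wherever two nodes share a bond."""
    matrix = [[0] * n_nodes for _ in range(n_nodes)]
    for i, j in edges:
        matrix[i][j] = 1
        matrix[j][i] = 1
    return matrix

X = node_features(atoms)
A = adjacency_matrix(len(atoms), bonds)
print("node features:", X)
print("adjacency:", A)
```

Graph-based learning methods typically take this kind of pair, node features plus connectivity, as their input. Richer features, such as bond order or formal charge, could be added in the same way.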


Molecules are generally treated as objects in three-dimensional space, giving their constituent atoms explicit coordinates in addition to their existing node features. Machine learning workflows can use both these coordinates and the node features. Furthermore, these concepts extend to a higher-level view of molecular structure by defining nodes as nucleotides in DNA and RNA structures and as amino acids in proteins.
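One way to picture this higher-level view is a graph whose nodes are residues carrying three-dimensional coordinates, with edges drawn between residues that lie close together in space. The sketch below assumes a simple distance cutoff for defining edges, which is a common convention but not something the article specifies. The coordinates are made-up placeholder values, not measured geometry, and `contact_edges` is a hypothetical helper name.

```python
import math

# Placeholder coordinates (arbitrary units) for four residues; NOT measured geometry.
residues = [
    {"name": "ALA", "xyz": (0.0, 0.0, 0.0)},
    {"name": "GLY", "xyz": (3.0, 0.0, 0.0)},
    {"name": "SER", "xyz": (3.0, 4.0, 0.0)},
    {"name": "LEU", "xyz": (20.0, 0.0, 0.0)},
]

def contact_edges(nodes, cutoff):
    """Connect every pair of nodes whose Euclidean distance is within the cutoff."""
    edges = []
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            if math.dist(nodes[i]["xyz"], nodes[j]["xyz"]) <= cutoff:
                edges.append((i, j))
    return edges

# The cutoff value is an assumption chosen to suit the placeholder coordinates.
edges = contact_edges(residues, cutoff=6.0)
print(edges)  # [(0, 1), (0, 2), (1, 2)]
```

Each node keeps its own features (here only a residue name) alongside its coordinates, so a downstream model could use both. The fourth residue ends up with no edges because it lies beyond the cutoff from all the others.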


Image Data


Synthetic biology experiments also generate image data, such as microscopy files. Pixels are represented by the rows and columns of a matrix, and an extra dimension is added if the image contains multiple colour channels. For example, a colour image that is 400 × 600 pixels is represented as a 400 × 600 × 3 array holding the data for the red, green, and blue colour channels.
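The 400 × 600 × 3 example can be sketched in plain Python as a nested structure of rows, columns, and channels. This sketch assumes that 400 is the number of rows, that channels are ordered red, green, blue, and that intensities are 8-bit values from 0 to 255. None of these conventions is stated in the article, and `shape` is a hypothetical helper.

```python
# Sketch: a 400 x 600 colour image as a nested rows x columns x channels structure.
HEIGHT, WIDTH, CHANNELS = 400, 600, 3  # channels assumed to be ordered red, green, blue

# An all-black image: every pixel holds [R, G, B] = [0, 0, 0].
image = [[[0] * CHANNELS for _ in range(WIDTH)] for _ in range(HEIGHT)]

def shape(array):
    """Return the dimensions of a regular (non-ragged) nested list."""
    dims = []
    while isinstance(array, list):
        dims.append(len(array))
        array = array[0]
    return tuple(dims)

image[10][20] = [255, 0, 0]  # set the pixel at row 10, column 20 to pure red

# Selecting a single channel drops the third dimension again.
red_channel = [[pixel[0] for pixel in row] for row in image]

print(shape(image))        # (400, 600, 3)
print(shape(red_channel))  # (400, 600)
```

A greyscale image would need only the two-dimensional form. Images with more channels, such as several fluorescence channels, would simply have a larger third dimension.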


