Transcription Factor Bound Regions Prediction: Word2Vec Technique with Convolutional Neural Network

Chen, Rixin and Dai, Ruoxi and Wang, Mingye (2020) Transcription Factor Bound Regions Prediction: Word2Vec Technique with Convolutional Neural Network. Journal of Intelligent Learning Systems and Applications, 12 (01). pp. 1-13. ISSN 2150-8402

jilsa_2019120417191886.pdf - Published Version

Abstract

Genome-wide epigenomic datasets allow us to validate the biological function of motifs and to understand regulatory mechanisms more comprehensively. How different motifs determine whether transcription factors (TFs) can bind to DNA at a specific position is a critical research question. In this project, we apply computational techniques from Natural Language Processing (NLP) to predict Transcription Factor Bound Regions (TFBRs) given motif instances. Most existing motif prediction methods based on deep neural networks use one-hot-encoded base sequences as input features for TFBR identification, which offers only low-resolution, indirect insight into binding mechanisms. Moreover, the collective effect of motifs on binding sites is complicated to figure out.

In our pipeline, we apply the Word2Vec algorithm, with motif names as input, together with a Convolutional Neural Network (CNN) to predict TFBRs as a binary classification task, based on the ENCODE dataset. In this framing, we treat different types of motifs as separate “words” and their corresponding TFBR as the meaning of a “sentence”. Each “sentence” is simply a combination of motifs, and all the “sentences” together make up the whole “passage”. For each binding site, we perform binary classification within different cell types to show how our model performs across binding sites and cell types. Each “word” has a corresponding high-dimensional vector, and the distances between vectors can be computed, so we can extract the similarity between motifs and the explicit binding mechanism from our model. We apply the CNN to extract features, through convolution (mapping) and pooling, from the motif vectors produced by the Word2Vec algorithm, and achieve a peak accuracy of 87%.
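To make the “motifs as words” idea concrete, here is a minimal sketch of the forward pass the abstract describes, under several stated assumptions. It is not the authors' implementation. The motif names (`MOTIF_A`, `MOTIF_B`, `MOTIF_C`), the 3-dimensional vectors, the single width-2 filter, global max pooling, and the logistic output unit are all hypothetical choices for illustration; in the actual pipeline the vectors would come from a Word2Vec model trained on the motif “sentences”, and the filter and output weights would be learned from labelled ENCODE regions. The sketch shows (1) cosine similarity between motif vectors and (2) one convolution-plus-pooling step over a region's sequence of motif vectors feeding a binary TFBR prediction.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]

# Hypothetical 3-d vectors standing in for trained Word2Vec output;
# real motif names and dimensions would come from the ENCODE-based model.
TOY_VECTORS: Dict[str, Vector] = {
    "MOTIF_A": [1.0, 0.0, 0.0],
    "MOTIF_B": [0.9, 0.1, 0.0],
    "MOTIF_C": [0.0, 0.0, 1.0],
}
# Hypothetical width-2 filter: one weight row per motif position in the window.
TOY_KERNEL: List[Vector] = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two motif vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def embed_sentence(motifs: Sequence[str], table: Dict[str, Vector]) -> List[Vector]:
    """Map one region's 'sentence' of motif names to its sequence of vectors."""
    return [table[m] for m in motifs]


def conv1d_maxpool(seq: List[Vector], kernel: List[Vector], bias: float = 0.0) -> float:
    """Slide one filter over consecutive motif vectors, apply ReLU,
    then take the global max over all window positions."""
    width = len(kernel)
    if len(seq) < width:
        raise ValueError("sentence is shorter than the filter width")
    activations = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        z = bias + sum(
            w * x
            for k_row, x_row in zip(kernel, window)
            for w, x in zip(k_row, x_row)
        )
        activations.append(max(0.0, z))  # ReLU
    return max(activations)


def predict_bound(motifs: Sequence[str], table: Dict[str, Vector],
                  kernel: List[Vector], out_weight: float, out_bias: float) -> float:
    """Probability that a region is a TFBR: logistic unit on the pooled feature."""
    feature = conv1d_maxpool(embed_sentence(motifs, table), kernel)
    return 1.0 / (1.0 + math.exp(-(out_weight * feature + out_bias)))


if __name__ == "__main__":
    # Similar motifs point in similar directions; unrelated ones are orthogonal.
    print(cosine_similarity(TOY_VECTORS["MOTIF_A"], TOY_VECTORS["MOTIF_B"]))
    print(cosine_similarity(TOY_VECTORS["MOTIF_A"], TOY_VECTORS["MOTIF_C"]))
    # Hypothetical output weights; a trained model would learn these.
    print(predict_bound(["MOTIF_A", "MOTIF_B", "MOTIF_C"], TOY_VECTORS,
                        TOY_KERNEL, out_weight=2.0, out_bias=-2.0))
```

Global max pooling is used here because it turns a region with any number of motif instances into a fixed-size feature, which is what lets a single classifier handle “sentences” of different lengths. A real model would use many filters of several widths and train all weights by backpropagation; this sketch only illustrates the shape of the computation.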

Item Type: Article
Subjects: Afro Asian Library > Engineering
Depositing User: Unnamed user with email support@afroasianlibrary.com
Date Deposited: 16 Feb 2023 11:08
Last Modified: 29 Jun 2024 12:31
URI: http://classical.academiceprints.com/id/eprint/144