BERTNDA Database
miRNA-lncRNA-disease repository

Introduction

Non-coding RNAs (ncRNAs) are a diverse group of RNA molecules that do not code for proteins but play crucial roles in many biological processes in the human body. Long non-coding RNAs (lncRNAs), defined by a length of more than 200 nucleotides, are the largest class of ncRNAs and have been shown to play critical roles in transcription, translation, splicing, epigenetic regulation, immune responses, and cell cycle control, among other processes. For example, the lncRNAs HOTAIR, PCA3, and UCA1 have been identified as potential biomarkers for hepatocellular carcinoma recurrence, prostate cancer aggressiveness, and bladder cancer diagnosis, respectively.

MicroRNAs (miRNAs) are small, endogenous, non-coding RNA molecules that typically act as post-transcriptional gene repressors by binding to the 3'-untranslated regions (UTRs) of target mRNAs. Mounting evidence suggests that miRNAs have a significant influence on important biological processes such as cell development, proliferation, and differentiation.

BERTNDA is built as the union of three types of association data: miRNA-disease, miRNA-lncRNA, and lncRNA-disease. A disease is included in BERTNDA if it appears in either the lncRNA-disease or the miRNA-disease data, and miRNAs and lncRNAs are handled in the same way. Taking the union significantly expands the amount of data. In total, we obtained more than 43869 association pairs: 23581 miRNA-disease, 18250 miRNA-lncRNA, and 2036 lncRNA-disease associations, covering 1596 miRNAs, 2188 lncRNAs, and 1297 diseases.
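The union step can be illustrated with a minimal Python sketch. It assumes each dataset is available as a list of (source, target) name pairs. The function name build_union and the toy records are hypothetical and are not taken from the BERTNDA files. It is only one plausible reading of the union rule described above.

```python
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


def build_union(
    mirna_disease: Iterable[Pair],
    mirna_lncrna: Iterable[Pair],
    lncrna_disease: Iterable[Pair],
) -> Dict[str, Set[str]]:
    """Collect every entity that appears in at least one association list.

    Assumed pair layouts (hypothetical, for illustration only):
    (miRNA, disease), (miRNA, lncRNA), (lncRNA, disease).
    """
    # Materialize the inputs so each list can be scanned more than once.
    md, ml, ld = list(mirna_disease), list(mirna_lncrna), list(lncrna_disease)
    return {
        # A disease is kept if it occurs in either disease-related list.
        "diseases": {d for _, d in md} | {d for _, d in ld},
        "mirnas": {m for m, _ in md} | {m for m, _ in ml},
        "lncrnas": {l for _, l in ml} | {l for l, _ in ld},
    }


# Toy records, not real BERTNDA entries.
toy_mirna_disease = [("hsa-mir-21", "breast cancer"), ("hsa-mir-155", "lymphoma")]
toy_mirna_lncrna = [("hsa-mir-21", "HOTAIR")]
toy_lncrna_disease = [
    ("HOTAIR", "hepatocellular carcinoma"),
    ("UCA1", "bladder cancer"),
]

entities = build_union(toy_mirna_disease, toy_mirna_lncrna, toy_lncrna_disease)
counts = {name: len(members) for name, members in entities.items()}
print(counts)
```

Using sets means an entity that occurs in several association lists is counted only once, which is the behavior a union implies.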

The framework of our method

The prediction method consists of four parts. First, we use a graph data structure to obtain a comprehensive molecular representation that facilitates information extraction. Then, the feature extraction component builds a multi-scale representation that combines global and local information. The backbone network for the prediction task is based on a global self-attention mechanism, which excels at capturing correlations within the data. Leveraging the model's powerful representation capabilities, we achieve excellent prediction performance.
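To make the idea of global self-attention concrete, here is a minimal sketch of single-head scaled dot-product self-attention in plain Python. It is a generic textbook formulation, not the BERTNDA backbone itself: the function names, the single-head setup, and the identity projection matrices are assumptions chosen for illustration, and no external libraries are used so the arithmetic stays visible.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(
    x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix
) -> Tuple[Matrix, Matrix]:
    """Return (output, attention weights) for input rows x.

    Every row attends to every other row, which is what makes the
    attention "global".
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(k[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = matmul(q, k_t)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


# Two toy 2-dimensional node features and identity projections (illustrative only).
identity = [[1.0, 0.0], [0.0, 1.0]]
features = [[1.0, 0.0], [0.0, 1.0]]
output, weights = self_attention(features, identity, identity, identity)
```

Each row of weights sums to 1 and describes how strongly one node draws on every other node, so each output row is a weighted mixture of all value rows.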