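Identification of m6A Methylation Sites Based on Statistical Features (with Source Code)

This thesis describes how the NC, PSNP, and PSDP methods are used to extract feature vectors from RNA sequences, producing an RNA dataset that a computer can process.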


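Abstract: As information technology and computing have advanced, scientists have combined the study of living organisms with computation, and the field that grew out of this combination is bioinformatics. As the name suggests, the discipline uses computers to process and analyze biological information, which speeds up its interpretation and, in turn, the pace of research. Identifying and studying RNA methylation sites is a problem that calls for exactly this kind of interdisciplinary approach. Representing the properties of an RNA sequence fully in a computer requires statistical methods: features are extracted from the RNA sequences, and the resulting feature vectors are then classified by the computer.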

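This thesis covers three methods: the NC method, the statistics-based PSNP method, and the statistics-based PSDP method, which together identify m6A sites in RNA well. Identifying RNA methylation first requires constructing feature vectors for the RNA sequences. Position specificity and symmetric structure can each serve as a basis for feature extraction, but relying on such a property alone may give low prediction accuracy. New methods that can identify the distribution of m6A will accelerate genome-wide m6A detection. In this study we use the NC, PSNP, and PSDP methods to identify m6A sites, which yields results superior to those obtained from feature vectors extracted from a single attribute.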
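The feature extraction step can be illustrated with a minimal Python sketch. It assumes that NC denotes nucleotide composition (the fraction of each base in a sequence) and that PSNP denotes position-specific nucleotide propensity (the per-position difference between base frequencies in the positive and negative sets), as these abbreviations are commonly used in the m6A literature; the thesis's own definitions may differ. The function names and toy sequences are hypothetical. Under the same assumption, PSDP would apply the PSNP idea to adjacent dinucleotides instead of single bases.

```python
from collections import Counter

ALPHABET = "ACGU"  # RNA bases

def nc_features(seq):
    """Nucleotide composition: fraction of each base in the sequence."""
    counts = Counter(seq)
    return [counts[b] / len(seq) for b in ALPHABET]

def position_frequencies(seqs):
    """Per-position frequency of each base across equal-length sequences."""
    length = len(seqs[0])
    freqs = []
    for i in range(length):
        column = Counter(s[i] for s in seqs)
        freqs.append({b: column[b] / len(seqs) for b in ALPHABET})
    return freqs

def psnp_features(seq, pos_freqs, neg_freqs):
    """PSNP-style encoding (assumed definition): for the base observed at
    each position, positive-set frequency minus negative-set frequency."""
    return [pos_freqs[i][b] - neg_freqs[i][b] for i, b in enumerate(seq)]

# Toy sequences invented for illustration only (not data from the thesis).
positives = ["GGACU", "AGACA", "GGACC"]
negatives = ["CUAGU", "UCAUG", "ACAGG"]

pos_f = position_frequencies(positives)
neg_f = position_frequencies(negatives)

nc = nc_features("GGACU")
psnp = psnp_features("GGACU", pos_f, neg_f)
```

The two lists can be concatenated into a single feature vector per sequence before being passed to a classifier such as the SVM named in the keywords.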

Keywords: support vector machine (SVM); RNA methylation; sequence set; statistical features

Abstract: With the rapid development of information technology, scientists have combined the study of living organisms with computing, and this combination of the two disciplines gave rise to bioinformatics. As the name suggests, bioinformatics uses computers to process and analyze biological information, which speeds up its interpretation. It makes scientists far more efficient at handling biological data and so accelerates their research. The identification and study of RNA methylation sites likewise calls for this kind of multidisciplinary field. Fully representing the properties of RNA in a computer requires a variety of statistical methods: features are extracted from the RNA sequences, and the resulting feature vectors are then recognized by the computer.

   In this paper, we discuss three methods: the NC method, the PSNP method based on statistical features, and the PSDP method based on statistical features. Together, these three methods identify m6A sites in RNA well. To identify RNA methylation, we first need to construct feature vectors for the RNA sequences; physicochemical properties, position specificity, and symmetric structure can each serve as a way to extract feature vectors. However, proceeding in this way may yield low prediction accuracy. Developing new methods that can identify the distribution of m6A, as a complement to experimental techniques, will accelerate genome-wide m6A detection. In this study, we used the NC method, the PSNP method, and the PSDP method to identify m6A sites. The final results are superior to those obtained from feature vectors extracted from a single attribute.

Key words: support vector machine (SVM); RNA methylation; sequence set; statistical characteristics

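Contents

Chapter 1 Introduction

1.1 Background and significance of RNA methylation research

1.2 Current state of RNA methylation research in China and abroad

1.3 Main work of this thesis

Chapter 2 Overview of RNA and methylation identification

2.1 Definition and representation of RNA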