Promoters have a region where a protein (RNA polymerase) must make contact, and the helical DNA sequence must have a valid conformation so that the two pieces of the contact region align spatially. The data set contains DNA sequences of promoters and non-promoters.
data(promotergene)
A data frame with 106 observations and 58 variables.
The first variable, Class, is a factor with levels "+" for a promoter gene
and "-" for a non-promoter gene.
The remaining 57 variables, V2 to V58, are factors describing the sequence, one nucleotide position per variable.
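The structure described above can be checked directly in R. This is a minimal sketch, assuming the kernlab package (which ships this data set) is installed; it only inspects the data and makes no assumptions about the class balance.

```r
library(kernlab)      # package that provides the promotergene data set
data(promotergene)

dim(promotergene)                 # expected: 106 rows, 58 columns
levels(promotergene$Class)        # the two class labels, "+" and "-"
table(promotergene$Class)         # number of promoters vs. non-promoters

# Check that every sequence column (V2 to V58) is a factor
all(vapply(promotergene[, -1], is.factor, logical(1)))
```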
The DNA bases are coded as follows: a = adenine, c = cytosine, g = guanine, t = thymine.
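Because each sequence column holds one of these one-letter codes, a row can be turned back into a readable DNA string. The helper row_to_sequence and the lookup vector base_names below are hypothetical illustrations, not part of kernlab; the sketch is demonstrated on a small stand-in data frame (toy) built from factors the same way, so it runs without the package.

```r
# Lookup table for the one-letter base codes listed above
base_names <- c(a = "adenine", c = "cytosine", g = "guanine", t = "thymine")

# Hypothetical helper (not part of kernlab): collapse one row of the
# sequence factors (V2..V58) into a single lowercase string
row_to_sequence <- function(row) {
  paste(vapply(row, as.character, character(1)), collapse = "")
}

# Toy stand-in for one promotergene row, stored as factors like the real data
toy <- data.frame(V2 = factor("t"), V3 = factor("a"), V4 = factor("c"))

seq_str <- row_to_sequence(toy)
seq_str                                          # "tac"
unname(base_names[strsplit(seq_str, "")[[1]]])   # "thymine" "adenine" "cytosine"
```

With kernlab loaded, row_to_sequence(promotergene[1, -1]) should return the 57-letter sequence of the first observation.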
Towell, G., Shavlik, J. and Noordewier, M. Refinement of Approximate Domain Theories by Knowledge-Based Artificial Neural Networks. In Proceedings of the Eighth National Conference on Artificial Intelligence (AAAI-90).
library(kernlab)
data(promotergene)
## Create classification model using Gaussian Processes
prom <- gausspr(Class ~ ., data = promotergene, kernel = "rbfdot",
                kpar = list(sigma = 0.02), cross = 4)
prom
#> Gaussian Processes object of class "gausspr"
#> Problem type: classification
#>
#> Gaussian Radial Basis kernel function.
#> Hyperparameter : sigma = 0.02
#>
#> Number of training instances learned : 106
#> Train error : 0
#> Cross validation error : 0.1695157
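Once fitted, the Gaussian process model can be queried with kernlab's accessor and prediction functions rather than read off the printout. A minimal sketch, assuming kernlab is installed; the name pred is illustrative, and the cross-validation figure will vary between runs because the folds are drawn at random.

```r
library(kernlab)
data(promotergene)

prom <- gausspr(Class ~ ., data = promotergene, kernel = "rbfdot",
                kpar = list(sigma = 0.02), cross = 4)

# Accessors for the two error rates shown in the printout
error(prom)   # training error
cross(prom)   # 4-fold cross-validation error (depends on the random folds)

# Predicted classes on the training data, cross-tabulated against the truth
pred <- predict(prom, promotergene)
table(predicted = pred, actual = promotergene$Class)
```

The formula interface converts the factor columns into numeric indicator columns before the kernel is evaluated, which is why the factor-coded bases can be used with an RBF kernel at all.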
## Create model using Support Vector Machines
promsv <- ksvm(Class ~ ., data = promotergene, kernel = "laplacedot",
               kpar = "automatic", C = 60, cross = 4)
promsv
#> Support Vector Machine object of class "ksvm"
#>
#> SV type: C-svc (classification)
#> parameter : cost C = 60
#>
#> Laplace kernel function.
#> Hyperparameter : sigma = 0.0160396542958821
#>
#> Number of Support Vectors : 102
#>
#> Objective Function Value : -285.7855
#> Training error : 0.018868
#> Cross validation error : 0.086182
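The cross = 4 argument asks ksvm for a 4-fold cross-validation on the training data. To make that idea concrete, here is a hand-rolled version of the procedure. It is a sketch under stated assumptions: the seed, the fold assignment and the names k, folds and fold_err are illustrative, and because kpar = "automatic" re-estimates sigma on each training fold in this code, the result will not match ksvm's own cross-validation error exactly.

```r
library(kernlab)
data(promotergene)

set.seed(1)  # fold assignment is random; fix it for reproducibility
k <- 4
# Assign each of the 106 rows to one of k folds of (nearly) equal size
folds <- sample(rep(seq_len(k), length.out = nrow(promotergene)))

fold_err <- vapply(seq_len(k), function(i) {
  train <- promotergene[folds != i, ]
  test  <- promotergene[folds == i, ]
  fit <- ksvm(Class ~ ., data = train, kernel = "laplacedot",
              kpar = "automatic", C = 60)
  # Misclassification rate on the held-out fold
  mean(as.character(predict(fit, test)) != as.character(test$Class))
}, numeric(1))

fold_err        # one error rate per fold
mean(fold_err)  # comparable in spirit to the cross-validation error above
```

For the fitted promsv object itself, the accessors nSV(promsv), error(promsv) and cross(promsv) return the support vector count, training error and cross-validation error shown in the printout.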