International Science Index

33
10007534
FCNN-MR: A Parallel Instance Selection Method Based on Fast Condensed Nearest Neighbor Rule
Abstract:
Instance selection (IS) technique is used to reduce the data size to improve the performance of data mining methods. Recently, to process very large data set, several proposed methods divide the training set into some disjoint subsets and apply IS algorithms independently to each subset. In this paper, we analyze the limitation of these methods and give our viewpoint about how to divide and conquer in IS procedure. Then, based on fast condensed nearest neighbor (FCNN) rule, we propose a large data sets instance selection method with MapReduce framework. Besides ensuring the prediction accuracy and reduction rate, it has two desirable properties: First, it reduces the work load in the aggregation node; Second and most important, it produces the same result with the sequential version, which other parallel methods cannot achieve. We evaluate the performance of FCNN-MR on one small data set and two large data sets. The experimental results show that it is effective and practical.
Paper Detail
54
downloads
32
10007098
A Psychophysiological Evaluation of an Effective Recognition Technique Using Interactive Dynamic Virtual Environments
Abstract:

Recording psychological and physiological correlates of human performance within virtual environments and interpreting their impacts on human engagement, ‘immersion’ and related emotional or ‘effective’ states is both academically and technologically challenging. By exposing participants to an effective, real-time (game-like) virtual environment, designed and evaluated in an earlier study, a psychophysiological database containing the EEG, GSR and Heart Rate of 30 male and female gamers, exposed to 10 games, was constructed. Some 174 features were subsequently identified and extracted from a number of windows, with 28 different timing lengths (e.g. 2, 3, 5, etc. seconds). After reducing the number of features to 30, using a feature selection technique, K-Nearest Neighbour (KNN) and Support Vector Machine (SVM) methods were subsequently employed for the classification process. The classifiers categorised the psychophysiological database into four effective clusters (defined based on a 3-dimensional space – valence, arousal and dominance) and eight emotion labels (relaxed, content, happy, excited, angry, afraid, sad, and bored). The KNN and SVM classifiers achieved average cross-validation accuracies of 97.01% (±1.3%) and 92.84% (±3.67%), respectively. However, no significant differences were found in the classification process based on effective clusters or emotion labels.

Paper Detail
81
downloads
31
10005377
Predicting Groundwater Areas Using Data Mining Techniques: Groundwater in Jordan as Case Study
Abstract:
Data mining is the process of extracting useful or hidden information from a large database. Extracted information can be used to discover relationships among features, where data objects are grouped according to logical relationships; or to predict unseen objects to one of the predefined groups. In this paper, we aim to investigate four well-known data mining algorithms in order to predict groundwater areas in Jordan. These algorithms are Support Vector Machines (SVMs), Naïve Bayes (NB), K-Nearest Neighbor (kNN) and Classification Based on Association Rule (CBA). The experimental results indicate that the SVMs algorithm outperformed other algorithms in terms of classification accuracy, precision and F1 evaluation measures using the datasets of groundwater areas that were collected from Jordanian Ministry of Water and Irrigation.
Paper Detail
692
downloads
30
10004813
From Type-I to Type-II Fuzzy System Modeling for Diagnosis of Hepatitis
Abstract:

Hepatitis is one of the most common and dangerous diseases that affects humankind, and exposes millions of people to serious health risks every year. Diagnosis of Hepatitis has always been a challenge for physicians. This paper presents an effective method for diagnosis of hepatitis based on interval Type-II fuzzy. This proposed system includes three steps: pre-processing (feature selection), Type-I and Type-II fuzzy classification, and system evaluation. KNN-FD feature selection is used as the preprocessing step in order to exclude irrelevant features and to improve classification performance and efficiency in generating the classification model. In the fuzzy classification step, an “indirect approach” is used for fuzzy system modeling by implementing the exponential compactness and separation index for determining the number of rules in the fuzzy clustering approach. Therefore, we first proposed a Type-I fuzzy system that had an accuracy of approximately 90.9%. In the proposed system, the process of diagnosis faces vagueness and uncertainty in the final decision. Thus, the imprecise knowledge was managed by using interval Type-II fuzzy logic. The results that were obtained show that interval Type-II fuzzy has the ability to diagnose hepatitis with an average accuracy of 93.94%. The classification accuracy obtained is the highest one reached thus far. The aforementioned rate of accuracy demonstrates that the Type-II fuzzy system has a better performance in comparison to Type-I and indicates a higher capability of Type-II fuzzy system for modeling uncertainty.

Paper Detail
679
downloads
29
10003615
HRV Analysis Based Arrhythmic Beat Detection Using kNN Classifier
Abstract:
Health diseases have a vital significance affecting human being's life and life quality. Sudden death events can be prevented owing to early diagnosis and treatment methods. Electrical signals, taken from the human being's body using non-invasive methods and showing the heart activity is called Electrocardiogram (ECG). The ECG signal is used for following daily activity of the heart by clinicians. Heart Rate Variability (HRV) is a physiological parameter giving the variation between the heart beats. ECG data taken from MITBIH Arrhythmia Database is used in the model employed in this study. The detection of arrhythmic heart beats is aimed utilizing the features extracted from the HRV time domain parameters. The developed model provides a satisfactory performance with ~89% accuracy, 91.7 % sensitivity and 85% specificity rates for the detection of arrhythmic beats.
Paper Detail
1121
downloads
28
10002616
Methods of Geodesic Distance in Two-Dimensional Face Recognition
Abstract:

In this paper, we present a comparative study of three methods of 2D face recognition system such as: Iso-Geodesic Curves (IGC), Geodesic Distance (GD) and Geodesic-Intensity Histogram (GIH). These approaches are based on computing of geodesic distance between points of facial surface and between facial curves. In this study we represented the image at gray level as a 2D surface in a 3D space, with the third coordinate proportional to the intensity values of pixels. In the classifying step, we use: Neural Networks (NN), K-Nearest Neighbor (KNN) and Support Vector Machines (SVM). The images used in our experiments are from two wellknown databases of face images ORL and YaleB. ORL data base was used to evaluate the performance of methods under conditions where the pose and sample size are varied, and the database YaleB was used to examine the performance of the systems when the facial expressions and lighting are varied.

Paper Detail
968
downloads
27
10001465
Dependence of Dielectric Properties on Sintering Conditions of Lead Free KNN Ceramics Modified with Li-Sb
Abstract:
In order to produce lead free piezoceramics with optimum piezoelectric and dielectric properties, KNN modified with Li+ (as an A site dopant) and Sb5+ (as a B site dopant) (K0.49Na0.49Li0.02) (Nb0.96Sb0.04) O3 (referred as KNLNS in this paper) have been synthesized using solid state reaction method and conventional sintering technique. The ceramics were sintered in the narrow range of 1050°C-1090°C for 2-3 h to get precise information about sintering parameters. Detailed study of dependence of microstructural, dielectric and piezoelectric properties on sintering conditions was then carried out. The study suggests that the volatility of the highly hygroscopic KNN ceramics is not only sensitive to sintering temperatures but also to sintering durations. By merely reducing the sintering duration for a given sintering temperature we saw an increase in the density of the samples which was supported by the increase in dielectric constants of the ceramics. And since density directly or indirectly affects almost all the associated properties, other dielectric and piezoelectric properties were also enhanced as we approached towards the most suitable sintering temperature and duration combination. The detailed results are reported in this paper.
Paper Detail
1034
downloads
26
10002123
Wavelet Feature Selection Approach for Heart Murmur Classification
Abstract:
Phonocardiography is important in appraisal of congenital heart disease and pulmonary hypertension as it reflects the duration of right ventricular systoles. The systolic murmur in patients with intra-cardiac shunt decreases as pulmonary hypertension develops and may eventually disappear completely as the pulmonary pressure reaches systemic level. Phonocardiography and auscultation are non-invasive, low-cost, and accurate methods to assess heart disease. In this work an objective signal processing tool to extract information from phonocardiography signal using Wavelet is proposed to classify the murmur as normal or abnormal. Since the feature vector is large, a Binary Particle Swarm Optimization (PSO) with mutation for feature selection is proposed. The extracted features improve the classification accuracy and were tested across various classifiers including Naïve Bayes, kNN, C4.5, and SVM.
Paper Detail
1388
downloads
25
10000157
Effect of High-Energy Ball Milling on the Electrical and Piezoelectric Properties of (K0.5Na0.5)(Nb0.9Ta0.1)O3 Lead-Free Piezoceramics
Abstract:

Nanocrystalline powders of the lead-free piezoelectric material, tantalum-substituted potassium sodium niobate (K0.5Na0.5)(Nb0.9Ta0.1)O3 (KNNT), were produced using a Retsch PM100 planetary ball mill by setting the milling time to 15h, 20h, 25h, 30h, 35h and 40h, at a fixed speed of 250rpm. The average particle size of the milled powders was found to decrease from 12nm to 3nm as the milling time increases from 15h to 25h, which is in agreement with the existing theoretical model. An anomalous increase to 98nm and then a drop to 3nm in the particle size were observed as the milling time further increases to 30h and 40h respectively. Various sizes of these starting KNNT powders were used to investigate the effect of milling time on the microstructure, dielectric properties, phase transitions and piezoelectric properties of the resulting KNNT ceramics. The particle size of starting KNNT was somewhat proportional to the grain size. As the milling time increases from 15h to 25h, the resulting ceramics exhibit enhancement in the values of relative density from 94.8% to 95.8%, room temperature dielectric constant (εRT) from 878 to 1213, and piezoelectric charge coefficient (d33) from 108pC/N to 128pC/N. For this range of ceramic samples, grain size refinement suppresses the maximum dielectric constant (εmax), shifts the Curie temperature (Tc) to a lower temperature and the orthorhombic-tetragonal phase transition (Tot) to a higher temperature. Further increase of milling time from 25h to 40h produces a gradual degradation in the values of relative density, εRT, and d33 of the resulting ceramics.

Paper Detail
1630
downloads
24
9998975
Brainwave Classification for Brain Balancing Index (BBI) via 3D EEG Model Using k-NN Technique
Abstract:

In this paper, the comparison between k-Nearest Neighbor (kNN) algorithms for classifying the 3D EEG model in brain balancing is presented. The EEG signal recording was conducted on 51 healthy subjects. Development of 3D EEG models involves pre-processing of raw EEG signals and construction of spectrogram images. Then, maximum PSD values were extracted as features from the model. There are three indexes for balanced brain; index 3, index 4 and index 5. There are significant different of the EEG signals due to the brain balancing index (BBI). Alpha-α (8–13 Hz) and beta-β (13–30 Hz) were used as input signals for the classification model. The k-NN classification result is 88.46% accuracy. These results proved that k-NN can be used in order to predict the brain balancing application.

Paper Detail
1624
downloads
23
9999410
Performance Analysis of Genetic Algorithm with kNN and SVM for Feature Selection in Tumor Classification
Abstract:

Tumor classification is a key area of research in the field of bioinformatics. Microarray technology is commonly used in the study of disease diagnosis using gene expression levels. The main drawback of gene expression data is that it contains thousands of genes and a very few samples. Feature selection methods are used to select the informative genes from the microarray. These methods considerably improve the classification accuracy. In the proposed method, Genetic Algorithm (GA) is used for effective feature selection. Informative genes are identified based on the T-Statistics, Signal-to-Noise Ratio (SNR) and F-Test values. The initial candidate solutions of GA are obtained from top-m informative genes. The classification accuracy of k-Nearest Neighbor (kNN) method is used as the fitness function for GA. In this work, kNN and Support Vector Machine (SVM) are used as the classifiers. The experimental results show that the proposed work is suitable for effective feature selection. With the help of the selected genes, GA-kNN method achieves 100% accuracy in 4 datasets and GA-SVM method achieves in 5 out of 10 datasets. The GA with kNN and SVM methods are demonstrated to be an accurate method for microarray based tumor classification.

Paper Detail
2547
downloads
22
9999449
DWT Based Image Steganalysis
Abstract:

‘Steganalysis’ is one of the challenging and attractive interests for the researchers with the development of information hiding techniques. It is the procedure to detect the hidden information from the stego created by known steganographic algorithm. In this paper, a novel feature based image steganalysis technique is proposed. Various statistical moments have been used along with some similarity metric. The proposed steganalysis technique has been designed based on transformation in four wavelet domains, which include Haar, Daubechies, Symlets and Biorthogonal. Each domain is being subjected to various classifiers, namely K-nearest-neighbor, K* Classifier, Locally weighted learning, Naive Bayes classifier, Neural networks, Decision trees and Support vector machines. The experiments are performed on a large set of pictures which are available freely in image database. The system also predicts the different message length definitions.

Paper Detail
1639
downloads
21
9997960
Relevance Feedback within CBIR Systems
Abstract:

We present here the results for a comparative study of some techniques, available in the literature, related to the relevance feedback mechanism in the case of a short-term learning. Only one method among those considered here is belonging to the data mining field which is the K-nearest neighbors algorithm (KNN) while the rest of the methods is related purely to the information retrieval field and they fall under the purview of the following three major axes: Shifting query, Feature Weighting and the optimization of the parameters of similarity metric. As a contribution, and in addition to the comparative purpose, we propose a new version of the KNN algorithm referred to as an incremental KNN which is distinct from the original version in the sense that besides the influence of the seeds, the rate of the actual target image is influenced also by the images already rated. The results presented here have been obtained after experiments conducted on the Wang database for one iteration and utilizing color moments on the RGB space. This compact descriptor, Color Moments, is adequate for the efficiency purposes needed in the case of interactive systems. The results obtained allow us to claim that the proposed algorithm proves good results; it even outperforms a wide range of techniques available in the literature.

Paper Detail
1582
downloads
20
9997276
Modified Naïve Bayes Based Prediction Modeling for Crop Yield Prediction
Abstract:

Most of greenhouse growers desire a determined amount of yields in order to accurately meet market requirements. The purpose of this paper is to model a simple but often satisfactory supervised classification method. The original naive Bayes have a serious weakness, which is producing redundant predictors. In this paper, utilized regularization technique was used to obtain a computationally efficient classifier based on naive Bayes. The suggested construction, utilized L1-penalty, is capable of clearing redundant predictors, where a modification of the LARS algorithm is devised to solve this problem, making this method applicable to a wide range of data. In the experimental section, a study conducted to examine the effect of redundant and irrelevant predictors, and test the method on WSG data set for tomato yields, where there are many more predictors than data, and the urge need to predict weekly yield is the goal of this approach. Finally, the modified approach is compared with several naive Bayes variants and other classification algorithms (SVM and kNN), and is shown to be fairly good.

Paper Detail
3904
downloads
19
17108
A Distance Function for Data with Missing Values and Its Application
Abstract:

Missing values in data are common in real world applications. Since the performance of many data mining algorithms depend critically on it being given a good metric over the input space, we decided in this paper to define a distance function for unlabeled datasets with missing values. We use the Bhattacharyya distance, which measures the similarity of two probability distributions, to define our new distance function. According to this distance, the distance between two points without missing attributes values is simply the Mahalanobis distance. When on the other hand there is a missing value of one of the coordinates, the distance is computed according to the distribution of the missing coordinate. Our distance is general and can be used as part of any algorithm that computes the distance between data points. Because its performance depends strongly on the chosen distance measure, we opted for the k nearest neighbor classifier to evaluate its ability to accurately reflect object similarity. We experimented on standard numerical datasets from the UCI repository from different fields. On these datasets we simulated missing values and compared the performance of the kNN classifier using our distance to other three basic methods. Our  experiments show that kNN using our distance function outperforms the kNN using other methods. Moreover, the runtime performance of our method is only slightly higher than the other methods.

Paper Detail
1587
downloads
18
16638
An Improved k Nearest Neighbor Classifier Using Interestingness Measures for Medical Image Mining
Abstract:

The exponential increase in the volume of medical image database has imposed new challenges to clinical routine in maintaining patient history, diagnosis, treatment and monitoring. With the advent of data mining and machine learning techniques it is possible to automate and/or assist physicians in clinical diagnosis. In this research a medical image classification framework using data mining techniques is proposed. It involves feature extraction, feature selection, feature discretization and classification. In the classification phase, the performance of the traditional kNN k nearest neighbor classifier is improved using a feature weighting scheme and a distance weighted voting instead of simple majority voting. Feature weights are calculated using the interestingness measures used in association rule mining. Experiments on the retinal fundus images show that the proposed framework improves the classification accuracy of traditional kNN from 78.57 % to 92.85 %.

Paper Detail
2261
downloads
17
1475
A Learning-Community Recommendation Approach for Web-Based Cooperative Learning
Abstract:

Cooperative learning has been defined as learners working together as a team to solve a problem to complete a task or to accomplish a common goal, which emphasizes the importance of interactions among members to promote the whole learning performance. With the popularity of society networks, cooperative learning is no longer limited to traditional classroom teaching activities. Since society networks facilitate to organize online learners, to establish common shared visions, and to advance learning interaction, the online community and online learning community have triggered the establishment of web-based societies. Numerous research literatures have indicated that the collaborative learning community is a critical issue to enhance learning performance. Hence, this paper proposes a learning community recommendation approach to facilitate that a learner joins the appropriate learning communities, which is based on k-nearest neighbor (kNN) classification. To demonstrate the viability of the proposed approach, the proposed approach is implemented for 117 students to recommend learning communities. The experimental results indicate that the proposed approach can effectively recommend appropriate learning communities for learners.

Paper Detail
1186
downloads
16
9996958
Active Segment Selection Method in EEG Classification Using Fractal Features
Abstract:

BCI (Brain Computer Interface) is a communication machine that translates brain massages to computer commands. These machines with the help of computer programs can recognize the tasks that are imagined. Feature extraction is an important stage of the process in EEG classification that can effect in accuracy and the computation time of processing the signals. In this study we process the signal in three steps of active segment selection, fractal feature extraction, and classification. One of the great challenges in BCI applications is to improve classification accuracy and computation time together. In this paper, we have used student’s 2D sample t-statistics on continuous wavelet transforms for active segment selection to reduce the computation time. In the next level, the features are extracted from some famous fractal dimension estimation of the signal. These fractal features are Katz and Higuchi. In the classification stage we used ANFIS (Adaptive Neuro-Fuzzy Inference System) classifier, FKNN (Fuzzy K-Nearest Neighbors), LDA (Linear Discriminate Analysis), and SVM (Support Vector Machines). We resulted that active segment selection method would reduce the computation time and Fractal dimension features with ANFIS analysis on selected active segments is the best among investigated methods in EEG classification.

Paper Detail
1404
downloads
15
15213
Automatic Classification of Initial Categories of Alzheimer's Disease from Structural MRI Phase Images: A Comparison of PSVM, KNN and ANN Methods
Abstract:

An early and accurate detection of Alzheimer's disease (AD) is an important stage in the treatment of individuals suffering from AD. We present an approach based on the use of structural magnetic resonance imaging (sMRI) phase images to distinguish between normal controls (NC), mild cognitive impairment (MCI) and AD patients with clinical dementia rating (CDR) of 1. Independent component analysis (ICA) technique is used for extracting useful features which form the inputs to the support vector machines (SVM), K nearest neighbour (kNN) and multilayer artificial neural network (ANN) classifiers to discriminate between the three classes. The obtained results are encouraging in terms of classification accuracy and effectively ascertain the usefulness of phase images for the classification of different stages of Alzheimer-s disease.

Paper Detail
1779
downloads
14
283
Gene Expression Signature for Classification of Metastasis Positive and Negative Oral Cancer in Homosapiens
Abstract:

Cancer classification to their corresponding cohorts has been key area of research in bioinformatics aiming better prognosis of the disease. High dimensionality of gene data has been makes it a complex task and requires significance data identification technique in order to reducing the dimensionality and identification of significant information. In this paper, we have proposed a novel approach for classification of oral cancer into metastasis positive and negative patients. We have used significance analysis of microarrays (SAM) for identifying significant genes which constitutes gene signature. 3 different gene signatures were identified using SAM from 3 different combination of training datasets and their classification accuracy was calculated on corresponding testing datasets using k-Nearest Neighbour (kNN), Fuzzy C-Means Clustering (FCM), Support Vector Machine (SVM) and Backpropagation Neural Network (BPNN). A final gene signature of only 9 genes was obtained from above 3 individual gene signatures. 9 gene signature-s classification capability was compared using same classifiers on same testing datasets. Results obtained from experimentation shows that 9 gene signature classified all samples in testing dataset accurately while individual genes could not classify all accurately.

Paper Detail
1438
downloads
13
9119
Emotion Classification for Students with Autism in Mathematics E-learning using Physiological and Facial Expression Measures
Abstract:

Avoiding learning failures in mathematics e-learning environments caused by emotional problems in students with autism has become an important topic for combining of special education with information and communications technology. This study presents an adaptive emotional adjustment model in mathematics e-learning for students with autism, emphasizing the lack of emotional perception in mathematics e-learning systems. In addition, an emotion classification for students with autism was developed by inducing emotions in mathematical learning environments to record changes in the physiological signals and facial expressions of students. Using these methods, 58 emotional features were obtained. These features were then processed using one-way ANOVA and information gain (IG). After reducing the feature dimension, methods of support vector machines (SVM), k-nearest neighbors (KNN), and classification and regression trees (CART) were used to classify four emotional categories: baseline, happy, angry, and anxious. After testing and comparisons, in a situation without feature selection, the accuracy rate of the SVM classification can reach as high as 79.3-%. After using IG to reduce the feature dimension, with only 28 features remaining, SVM still has a classification accuracy of 78.2-%. The results of this research could enhance the effectiveness of eLearning in special education.

Paper Detail
1068
downloads
12
4468
Improvement in Power Transformer Intelligent Dissolved Gas Analysis Method
Abstract:
Non-Destructive evaluation of in-service power transformer condition is necessary for avoiding catastrophic failures. Dissolved Gas Analysis (DGA) is one of the important methods. Traditional, statistical and intelligent DGA approaches have been adopted for accurate classification of incipient fault sources. Unfortunately, there are not often enough faulty patterns required for sufficient training of intelligent systems. By bootstrapping the shortcoming is expected to be alleviated and algorithms with better classification success rates to be obtained. In this paper the performance of an artificial neural network, K-Nearest Neighbour and support vector machine methods using bootstrapped data are detailed and shown that while the success rate of the ANN algorithms improves remarkably, the outcome of the others do not benefit so much from the provided enlarged data space. For assessment, two databases are employed: IEC TC10 and a dataset collected from reported data in papers. High average test success rate well exhibits the remarkable outcome.
Paper Detail
1786
downloads
11
14323
Improved Tropical Wood Species Recognition System based on Multi-feature Extractor and Classifier
Abstract:
An automated wood recognition system is designed to classify tropical wood species.The wood features are extracted based on two feature extractors: Basic Grey Level Aura Matrix (BGLAM) technique and statistical properties of pores distribution (SPPD) technique. Due to the nonlinearity of the tropical wood species separation boundaries, a pre classification stage is proposed which consists ofKmeans clusteringand kernel discriminant analysis (KDA). Finally, Linear Discriminant Analysis (LDA) classifier and KNearest Neighbour (KNN) are implemented for comparison purposes. The study involves comparison of the system with and without pre classification using KNN classifier and LDA classifier.The results show that the inclusion of the pre classification stage has improved the accuracy of both the LDA and KNN classifiers by more than 12%.
Paper Detail
1266
downloads
10
3742
An SVM based Classification Method for Cancer Data using Minimum Microarray Gene Expressions
Abstract:
This paper gives a novel method for improving classification performance for cancer classification with very few microarray Gene expression data. The method employs classification with individual gene ranking and gene subset ranking. For selection and classification, the proposed method uses the same classifier. The method is applied to three publicly available cancer gene expression datasets from Lymphoma, Liver and Leukaemia datasets. Three different classifiers namely Support vector machines-one against all (SVM-OAA), K nearest neighbour (KNN) and Linear Discriminant analysis (LDA) were tested and the results indicate the improvement in performance of SVM-OAA classifier with satisfactory results on all the three datasets when compared with the other two classifiers.
Paper Detail
1815
downloads
9
5009
Identification of Cardiac Arrhythmias using Natural Resonance Complex Frequencies
Abstract:
An electrocardiogram (ECG) feature extraction system based on the calculation of the complex resonance frequency employing Prony-s method is developed. Prony-s method is applied on five different classes of ECG signals- arrhythmia as a finite sum of exponentials depending on the signal-s poles and the resonant complex frequencies. Those poles and resonance frequencies of the ECG signals- arrhythmia are evaluated for a large number of each arrhythmia. The ECG signals of lead II (ML II) were taken from MIT-BIH database for five different types. These are the ventricular couplet (VC), ventricular tachycardia (VT), ventricular bigeminy (VB), and ventricular fibrillation (VF) and the normal (NR). This novel method can be extended to any number of arrhythmias. Different classification techniques were tried using neural networks (NN), K nearest neighbor (KNN), linear discriminant analysis (LDA) and multi-class support vector machine (MC-SVM).
Paper Detail
1342
downloads
8
10488
A Novel Prediction Method for Tag SNP Selection using Genetic Algorithm based on KNN
Abstract:

Single nucleotide polymorphisms (SNPs) hold much promise as a basis for disease-gene association. However, research is limited by the cost of genotyping the tremendous number of SNPs. Therefore, it is important to identify a small subset of informative SNPs, the so-called tag SNPs. This subset consists of selected SNPs of the genotypes, and accurately represents the rest of the SNPs. Furthermore, an effective evaluation method is needed to evaluate prediction accuracy of a set of tag SNPs. In this paper, a genetic algorithm (GA) is applied to tag SNP problems, and the K-nearest neighbor (K-NN) serves as a prediction method of tag SNP selection. The experimental data used was taken from the HapMap project; it consists of genotype data rather than haplotype data. The proposed method consistently identified tag SNPs with considerably better prediction accuracy than methods from the literature. At the same time, the number of tag SNPs identified was smaller than the number of tag SNPs in the other methods. The run time of the proposed method was much shorter than the run time of the SVM/STSA method when the same accuracy was reached.

Paper Detail
1011
downloads
7
12509
New Adaptive Linear Discriminante Analysis for Face Recognition with SVM
Abstract:
We have applied new accelerated algorithm for linear discriminate analysis (LDA) in face recognition with support vector machine. The new algorithm has the advantage of optimal selection of the step size. The gradient descent method and new algorithm has been implemented in software and evaluated on the Yale face database B. The eigenfaces of these approaches have been used to training a KNN. Recognition rate with new algorithm is compared with gradient.
Paper Detail
772
downloads
6
6432
Feature Reduction of Nearest Neighbor Classifiers using Genetic Algorithm
Abstract:
The design of a pattern classifier includes an attempt to select, among a set of possible features, a minimum subset of weakly correlated features that better discriminate the pattern classes. This is usually a difficult task in practice, normally requiring the application of heuristic knowledge about the specific problem domain. The selection and quality of the features representing each pattern have a considerable bearing on the success of subsequent pattern classification. Feature extraction is the process of deriving new features from the original features in order to reduce the cost of feature measurement, increase classifier efficiency, and allow higher classification accuracy. Many current feature extraction techniques involve linear transformations of the original pattern vectors to new vectors of lower dimensionality. While this is useful for data visualization and increasing classification efficiency, it does not necessarily reduce the number of features that must be measured since each new feature may be a linear combination of all of the features in the original pattern vector. In this paper a new approach is presented to feature extraction in which feature selection, feature extraction, and classifier training are performed simultaneously using a genetic algorithm. In this approach each feature value is first normalized by a linear equation, then scaled by the associated weight prior to training, testing, and classification. A knn classifier is used to evaluate each set of feature weights. The genetic algorithm optimizes a vector of feature weights, which are used to scale the individual features in the original pattern vectors in either a linear or a nonlinear fashion. By this approach, the number of features used in classifying can be finely reduced.
Paper Detail
1097
downloads
5
11092
Texture Feature Extraction using Slant-Hadamard Transform
Abstract:
Random and natural textures classification is still one of the biggest challenges in the field of image processing and pattern recognition. In this paper, texture feature extraction using Slant Hadamard Transform was studied and compared to other signal processing-based texture classification schemes. A parametric SHT was also introduced and employed for natural textures feature extraction. We showed that a subtly modified parametric SHT can outperform ordinary Walsh-Hadamard transform and discrete cosine transform. Experiments were carried out on a subset of Vistex random natural texture images using a kNN classifier.
Paper Detail
1771
downloads
4
13379
Predictive Clustering Hybrid Regression(pCHR) Approach and Its Application to Sucrose-Based Biohydrogen Production
Abstract:
A predictive clustering hybrid regression (pCHR) approach was developed and evaluated using dataset from H2- producing sucrose-based bioreactor operated for 15 months. The aim was to model and predict the H2-production rate using information available about envirome and metabolome of the bioprocess. Selforganizing maps (SOM) and Sammon map were used to visualize the dataset and to identify main metabolic patterns and clusters in bioprocess data. Three metabolic clusters: acetate coupled with other metabolites, butyrate only, and transition phases were detected. The developed pCHR model combines principles of k-means clustering, kNN classification and regression techniques. The model performed well in modeling and predicting the H2-production rate with mean square error values of 0.0014 and 0.0032, respectively.
Paper Detail
1127
downloads