International Science Index

27
10008894
Lane Detection Using Labeling Based RANSAC Algorithm
Abstract:

In this paper, we propose labeling based RANSAC algorithm for lane detection. Advanced driver assistance systems (ADAS) have been widely researched to avoid unexpected accidents. Lane detection is a necessary system to assist keeping lane and lane departure prevention. The proposed vision based lane detection method applies Canny edge detection, inverse perspective mapping (IPM), K-means algorithm, mathematical morphology operations and 8 connected-component labeling. Next, random samples are selected from each labeling region for RANSAC. The sampling method selects the points of lane with a high probability. Finally, lane parameters of straight line or curve equations are estimated. Through the simulations tested on video recorded at daytime and nighttime, we show that the proposed method has better performance than the existing RANSAC algorithm in various environments.

Paper Detail
82
downloads
26
10008128
A Parallel Implementation of k-Means in MATLAB
Abstract:
The aim of this work is the parallel implementation of k-means in MATLAB, in order to reduce the execution time. Specifically, a new function in MATLAB for serial k-means algorithm is developed, which meets all the requirements for the conversion to a function in MATLAB with parallel computations. Additionally, two different variants for the definition of initial values are presented. In the sequel, the parallel approach is presented. Finally, the performance tests for the computation times respect to the numbers of features and classes are illustrated.
Paper Detail
155
downloads
25
10007607
K-Means Based Matching Algorithm for Multi-Resolution Feature Descriptors
Abstract:

Matching high dimensional features between images is computationally expensive for exhaustive search approaches in computer vision. Although the dimension of the feature can be degraded by simplifying the prior knowledge of homography, matching accuracy may degrade as a tradeoff. In this paper, we present a feature matching method based on k-means algorithm that reduces the matching cost and matches the features between images instead of using a simplified geometric assumption. Experimental results show that the proposed method outperforms the previous linear exhaustive search approaches in terms of the inlier ratio of matched pairs.

Paper Detail
232
downloads
24
10007221
Clustering Categorical Data Using the K-Means Algorithm and the Attribute’s Relative Frequency
Abstract:

Clustering is a well known data mining technique used in pattern recognition and information retrieval. The initial dataset to be clustered can either contain categorical or numeric data. Each type of data has its own specific clustering algorithm. In this context, two algorithms are proposed: the k-means for clustering numeric datasets and the k-modes for categorical datasets. The main encountered problem in data mining applications is clustering categorical dataset so relevant in the datasets. One main issue to achieve the clustering process on categorical values is to transform the categorical attributes into numeric measures and directly apply the k-means algorithm instead the k-modes. In this paper, it is proposed to experiment an approach based on the previous issue by transforming the categorical values into numeric ones using the relative frequency of each modality in the attributes. The proposed approach is compared with a previously method based on transforming the categorical datasets into binary values. The scalability and accuracy of the two methods are experimented. The obtained results show that our proposed method outperforms the binary method in all cases.

Paper Detail
457
downloads
23
10006049
A Minimum Spanning Tree-Based Method for Initializing the K-Means Clustering Algorithm
Abstract:

The traditional k-means algorithm has been widely used as a simple and efficient clustering method. However, the algorithm often converges to local minima for the reason that it is sensitive to the initial cluster centers. In this paper, an algorithm for selecting initial cluster centers on the basis of minimum spanning tree (MST) is presented. The set of vertices in MST with same degree are regarded as a whole which is used to find the skeleton data points. Furthermore, a distance measure between the skeleton data points with consideration of degree and Euclidean distance is presented. Finally, MST-based initialization method for the k-means algorithm is presented, and the corresponding time complexity is analyzed as well. The presented algorithm is tested on five data sets from the UCI Machine Learning Repository. The experimental results illustrate the effectiveness of the presented algorithm compared to three existing initialization methods.

Paper Detail
590
downloads
22
10002969
Customer Segmentation Model in E-commerce Using Clustering Techniques and LRFM Model: The Case of Online Stores in Morocco
Abstract:
Given the increase in the number of e-commerce sites, the number of competitors has become very important. This means that companies have to take appropriate decisions in order to meet the expectations of their customers and satisfy their needs. In this paper, we present a case study of applying LRFM (length, recency, frequency and monetary) model and clustering techniques in the sector of electronic commerce with a view to evaluating customers’ values of the Moroccan e-commerce websites and then developing effective marketing strategies. To achieve these objectives, we adopt LRFM model by applying a two-stage clustering method. In the first stage, the self-organizing maps method is used to determine the best number of clusters and the initial centroid. In the second stage, kmeans method is applied to segment 730 customers into nine clusters according to their L, R, F and M values. The results show that the cluster 6 is the most important cluster because the average values of L, R, F and M are higher than the overall average value. In addition, this study has considered another variable that describes the mode of payment used by customers to improve and strengthen clusters’ analysis. The clusters’ analysis demonstrates that the payment method is one of the key indicators of a new index which allows to assess the level of customers’ confidence in the company's Website.
Paper Detail
2921
downloads
21
16352
Quantity and Quality Aware Artificial Bee Colony Algorithm for Clustering
Abstract:

Artificial Bee Colony (ABC) algorithm is a relatively new swarm intelligence technique for clustering. It produces higher quality clusters compared to other population-based algorithms but with poor energy efficiency, cluster quality consistency and typically slower in convergence speed. Inspired by energy saving foraging behavior of natural honey bees this paper presents a Quality and Quantity Aware Artificial Bee Colony (Q2ABC) algorithm to improve quality of cluster identification, energy efficiency and convergence speed of the original ABC. To evaluate the performance of Q2ABC algorithm, experiments were conducted on a suite of ten benchmark UCI datasets. The results demonstrate Q2ABC outperformed ABC and K-means algorithm in the quality of clusters delivered.

Paper Detail
1601
downloads
20
6965
A Hybrid Approach for Color Image Quantization Using K-means and Firefly Algorithms
Abstract:
Color Image quantization (CQ) is an important problem in computer graphics, image and processing. The aim of quantization is to reduce colors in an image with minimum distortion. Clustering is a widely used technique for color quantization; all colors in an image are grouped to small clusters. In this paper, we proposed a new hybrid approach for color quantization using firefly algorithm (FA) and K-means algorithm. Firefly algorithm is a swarmbased algorithm that can be used for solving optimization problems. The proposed method can overcome the drawbacks of both algorithms such as the local optima converge problem in K-means and the early converge of firefly algorithm. Experiments on three commonly used images and the comparison results shows that the proposed algorithm surpasses both the base-line technique k-means clustering and original firefly algorithm.
Paper Detail
1650
downloads
19
9997047
Simultaneous Clustering and Feature Selection Method for Gene Expression Data
Abstract:

Microarrays are made it possible to simultaneously monitor the expression profiles of thousands of genes under various experimental conditions. It is used to identify the co-expressed genes in specific cells or tissues that are actively used to make proteins. This method is used to analysis the gene expression, an important task in bioinformatics research. Cluster analysis of gene expression data has proved to be a useful tool for identifying co-expressed genes, biologically relevant groupings of genes and samples. In this work K-Means algorithms has been applied for clustering of Gene Expression Data. Further, rough set based Quick reduct algorithm has been applied for each cluster in order to select the most similar genes having high correlation. Then the ACV measure is used to evaluate the refined clusters and classification is used to evaluate the proposed method. They could identify compact clusters with feature selection method used to genes are selected.

Paper Detail
1330
downloads
18
9996921
A New Hybrid K-Mean-Quick Reduct Algorithm for Gene Selection
Abstract:

Feature selection is a process to select features which are more informative. It is one of the important steps in knowledge discovery. The problem is that all genes are not important in gene expression data. Some of the genes may be redundant, and others may be irrelevant and noisy. Here a novel approach is proposed Hybrid K-Mean-Quick Reduct (KMQR) algorithm for gene selection from gene expression data. In this study, the entire dataset is divided into clusters by applying K-Means algorithm. Each cluster contains similar genes. The high class discriminated genes has been selected based on their degree of dependence by applying Quick Reduct algorithm to all the clusters. Average Correlation Value (ACV) is calculated for the high class discriminated genes. The clusters which have the ACV value as 1 is determined as significant clusters, whose classification accuracy will be equal or high when comparing to the accuracy of the entire dataset. The proposed algorithm is evaluated using WEKA classifiers and compared. The proposed work shows that the high classification accuracy.

Paper Detail
1678
downloads
17
3257
Geographic Profiling Based on Multi-point Centrography with K-means Clustering
Abstract:

Geographic Profiling has successfully assisted investigations for serial crimes. Considering the multi-cluster feature of serial criminal spots, we propose a Multi-point Centrography model as a natural extension of Single-point Centrography for geographic profiling. K-means clustering is first performed on the data samples and then Single-point Centrography is adopted to derive a probability distribution on each cluster. Finally, a weighted combinations of each distribution is formed to make next-crime spot prediction. Experimental study on real cases demonstrates the effectiveness of our proposed model.

Paper Detail
1340
downloads
16
631
Ontology-based Concept Weighting for Text Documents
Abstract:
Documents clustering become an essential technology with the popularity of the Internet. That also means that fast and high-quality document clustering technique play core topics. Text clustering or shortly clustering is about discovering semantically related groups in an unstructured collection of documents. Clustering has been very popular for a long time because it provides unique ways of digesting and generalizing large amounts of information. One of the issues of clustering is to extract proper feature (concept) of a problem domain. The existing clustering technology mainly focuses on term weight calculation. To achieve more accurate document clustering, more informative features including concept weight are important. Feature Selection is important for clustering process because some of the irrelevant or redundant feature may misguide the clustering results. To counteract this issue, the proposed system presents the concept weight for text clustering system developed based on a k-means algorithm in accordance with the principles of ontology so that the important of words of a cluster can be identified by the weight values. To a certain extent, it has resolved the semantic problem in specific areas.
Paper Detail
1752
downloads
15
8719
Image Segmentation Using the K-means Algorithm for Texture Features
Abstract:
This study aims to segment objects using the K-means algorithm for texture features. Firstly, the algorithm transforms color images into gray images. This paper describes a novel technique for the extraction of texture features in an image. Then, in a group of similar features, objects and backgrounds are differentiated by using the K-means algorithm. Finally, this paper proposes a new object segmentation algorithm using the morphological technique. The experiments described include the segmentation of single and multiple objects featured in this paper. The region of an object can be accurately segmented out. The results can help to perform image retrieval and analyze features of an object, as are shown in this paper.
Paper Detail
1725
downloads
14
322
A Growing Natural Gas Approach for Evaluating Quality of Software Modules
Abstract:

The prediction of Software quality during development life cycle of software project helps the development organization to make efficient use of available resource to produce the product of highest quality. “Whether a module is faulty or not" approach can be used to predict quality of a software module. There are numbers of software quality prediction models described in the literature based upon genetic algorithms, artificial neural network and other data mining algorithms. One of the promising aspects for quality prediction is based on clustering techniques. Most quality prediction models that are based on clustering techniques make use of K-means, Mixture-of-Guassians, Self-Organizing Map, Neural Gas and fuzzy K-means algorithm for prediction. In all these techniques a predefined structure is required that is number of neurons or clusters should be known before we start clustering process. But in case of Growing Neural Gas there is no need of predetermining the quantity of neurons and the topology of the structure to be used and it starts with a minimal neurons structure that is incremented during training until it reaches a maximum number user defined limits for clusters. Hence, in this work we have used Growing Neural Gas as underlying cluster algorithm that produces the initial set of labeled cluster from training data set and thereafter this set of clusters is used to predict the quality of test data set of software modules. The best testing results shows 80% accuracy in evaluating the quality of software modules. Hence, the proposed technique can be used by programmers in evaluating the quality of modules during software development.

Paper Detail
1228
downloads
13
891
A New Evolutionary Algorithm for Cluster Analysis
Abstract:

Clustering is a very well known technique in data mining. One of the most widely used clustering techniques is the kmeans algorithm. Solutions obtained from this technique depend on the initialization of cluster centers and the final solution converges to local minima. In order to overcome K-means algorithm shortcomings, this paper proposes a hybrid evolutionary algorithm based on the combination of PSO, SA and K-means algorithms, called PSO-SA-K, which can find better cluster partition. The performance is evaluated through several benchmark data sets. The simulation results show that the proposed algorithm outperforms previous approaches, such as PSO, SA and K-means for partitional clustering problem.

Paper Detail
1639
downloads
12
10736
Application of a New Hybrid Optimization Algorithm on Cluster Analysis
Abstract:

Clustering techniques have received attention in many areas including engineering, medicine, biology and data mining. The purpose of clustering is to group together data points, which are close to one another. The K-means algorithm is one of the most widely used techniques for clustering. However, K-means has two shortcomings: dependency on the initial state and convergence to local optima and global solutions of large problems cannot found with reasonable amount of computation effort. In order to overcome local optima problem lots of studies done in clustering. This paper is presented an efficient hybrid evolutionary optimization algorithm based on combining Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO), called PSO-ACO, for optimally clustering N object into K clusters. The new PSO-ACO algorithm is tested on several data sets, and its performance is compared with those of ACO, PSO and K-means clustering. The simulation results show that the proposed evolutionary optimization algorithm is robust and suitable for handing data clustering.

Paper Detail
1594
downloads
11
11050
Determining Cluster Boundaries Using Particle Swarm Optimization
Abstract:

Self-organizing map (SOM) is a well known data reduction technique used in data mining. Data visualization can reveal structure in data sets that is otherwise hard to detect from raw data alone. However, interpretation through visual inspection is prone to errors and can be very tedious. There are several techniques for the automatic detection of clusters of code vectors found by SOMs, but they generally do not take into account the distribution of code vectors; this may lead to unsatisfactory clustering and poor definition of cluster boundaries, particularly where the density of data points is low. In this paper, we propose the use of a generic particle swarm optimization (PSO) algorithm for finding cluster boundaries directly from the code vectors obtained from SOMs. The application of our method to unlabeled call data for a mobile phone operator demonstrates its feasibility. PSO algorithm utilizes U-matrix of SOMs to determine cluster boundaries; the results of this novel automatic method correspond well to boundary detection through visual inspection of code vectors and k-means algorithm.

Paper Detail
1152
downloads
10
12161
Color Image Segmentation Using Competitive and Cooperative Learning Approach
Abstract:
Color image segmentation can be considered as a cluster procedure in feature space. k-means and its adaptive version, i.e. competitive learning approach are powerful tools for data clustering. But k-means and competitive learning suffer from several drawbacks such as dead-unit problem and need to pre-specify number of cluster. In this paper, we will explore to use competitive and cooperative learning approach to perform color image segmentation. In competitive and cooperative learning approach, seed points not only compete each other, but also the winner will dynamically select several nearest competitors to form a cooperative team to adapt to the input together, finally it can automatically select the correct number of cluster and avoid the dead-units problem. Experimental results show that CCL can obtain better segmentation result.
Paper Detail
1096
downloads
9
4854
Enhancing K-Means Algorithm with Initial Cluster Centers Derived from Data Partitioning along the Data Axis with the Highest Variance
Abstract:
In this paper, we propose an algorithm to compute initial cluster centers for K-means clustering. Data in a cell is partitioned using a cutting plane that divides cell in two smaller cells. The plane is perpendicular to the data axis with the highest variance and is designed to reduce the sum squared errors of the two cells as much as possible, while at the same time keep the two cells far apart as possible. Cells are partitioned one at a time until the number of cells equals to the predefined number of clusters, K. The centers of the K cells become the initial cluster centers for K-means. The experimental results suggest that the proposed algorithm is effective, converge to better clustering results than those of the random initialization method. The research also indicated the proposed algorithm would greatly improve the likelihood of every cluster containing some data in it.
Paper Detail
2099
downloads
8
2779
A Software Framework for Predicting Oil-Palm Yield from Climate Data
Abstract:
Intelligent systems based on machine learning techniques, such as classification, clustering, are gaining wide spread popularity in real world applications. This paper presents work on developing a software system for predicting crop yield, for example oil-palm yield, from climate and plantation data. At the core of our system is a method for unsupervised partitioning of data for finding spatio-temporal patterns in climate data using kernel methods which offer strength to deal with complex data. This work gets inspiration from the notion that a non-linear data transformation into some high dimensional feature space increases the possibility of linear separability of the patterns in the transformed space. Therefore, it simplifies exploration of the associated structure in the data. Kernel methods implicitly perform a non-linear mapping of the input data into a high dimensional feature space by replacing the inner products with an appropriate positive definite function. In this paper we present a robust weighted kernel k-means algorithm incorporating spatial constraints for clustering the data. The proposed algorithm can effectively handle noise, outliers and auto-correlation in the spatial data, for effective and efficient data analysis by exploring patterns and structures in the data, and thus can be used for predicting oil-palm yield by analyzing various factors affecting the yield.
Paper Detail
1262
downloads
7
4738
Neural Networks Learning Improvement using the K-Means Clustering Algorithm to Detect Network Intrusions
Abstract:
In the present work, we propose a new technique to enhance the learning capabilities and reduce the computation intensity of a competitive learning multi-layered neural network using the K-means clustering algorithm. The proposed model use multi-layered network architecture with a back propagation learning mechanism. The K-means algorithm is first applied to the training dataset to reduce the amount of samples to be presented to the neural network, by automatically selecting an optimal set of samples. The obtained results demonstrate that the proposed technique performs exceptionally in terms of both accuracy and computation time when applied to the KDD99 dataset compared to a standard learning schema that use the full dataset.
Paper Detail
2003
downloads
6
7741
A Quantum-Inspired Evolutionary Algorithm forMultiobjective Image Segmentation
Abstract:
In this paper we present a new approach to deal with image segmentation. The fact that a single segmentation result do not generally allow a higher level process to take into account all the elements included in the image has motivated the consideration of image segmentation as a multiobjective optimization problem. The proposed algorithm adopts a split/merge strategy that uses the result of the k-means algorithm as input for a quantum evolutionary algorithm to establish a set of non-dominated solutions. The evaluation is made simultaneously according to two distinct features: intra-region homogeneity and inter-region heterogeneity. The experimentation of the new approach on natural images has proved its efficiency and usefulness.
Paper Detail
1485
downloads
5
2743
A New Vector Quantization Front-End Process for Discrete HMM Speech Recognition System
Abstract:

The paper presents a complete discrete statistical framework, based on a novel vector quantization (VQ) front-end process. This new VQ approach performs an optimal distribution of VQ codebook components on HMM states. This technique that we named the distributed vector quantization (DVQ) of hidden Markov models, succeeds in unifying acoustic micro-structure and phonetic macro-structure, when the estimation of HMM parameters is performed. The DVQ technique is implemented through two variants. The first variant uses the K-means algorithm (K-means- DVQ) to optimize the VQ, while the second variant exploits the benefits of the classification behavior of neural networks (NN-DVQ) for the same purpose. The proposed variants are compared with the HMM-based baseline system by experiments of specific Arabic consonants recognition. The results show that the distributed vector quantization technique increase the performance of the discrete HMM system.

Paper Detail
1407
downloads
4
13536
A Genetic Algorithm for Clustering on Image Data
Abstract:

Clustering is the process of subdividing an input data set into a desired number of subgroups so that members of the same subgroup are similar and members of different subgroups have diverse properties. Many heuristic algorithms have been applied to the clustering problem, which is known to be NP Hard. Genetic algorithms have been used in a wide variety of fields to perform clustering, however, the technique normally has a long running time in terms of input set size. This paper proposes an efficient genetic algorithm for clustering on very large data sets, especially on image data sets. The genetic algorithm uses the most time efficient techniques along with preprocessing of the input data set. We test our algorithm on both artificial and real image data sets, both of which are of large size. The experimental results show that our algorithm outperforms the k-means algorithm in terms of running time as well as the quality of the clustering.

Paper Detail
1412
downloads
3
7590
A New Algorithm for Cluster Initialization
Abstract:

Clustering is a very well known technique in data mining. One of the most widely used clustering techniques is the k-means algorithm. Solutions obtained from this technique are dependent on the initialization of cluster centers. In this article we propose a new algorithm to initialize the clusters. The proposed algorithm is based on finding a set of medians extracted from a dimension with maximum variance. The algorithm has been applied to different data sets and good results are obtained.

Paper Detail
1398
downloads
2
10285
Customer Segmentation in Foreign Trade based on Clustering Algorithms Case Study: Trade Promotion Organization of Iran
Abstract:
The goal of this paper is to segment the countries based on the value of export from Iran during 14 years ending at 2005. To measure the dissimilarity among export baskets of different countries, we define Dissimilarity Export Basket (DEB) function and use this distance function in K-means algorithm. The DEB function is defined based on the concepts of the association rules and the value of export group-commodities. In this paper, clustering quality function and clusters intraclass inertia are defined to, respectively, calculate the optimum number of clusters and to compare the functionality of DEB versus Euclidean distance. We have also study the effects of importance weight in DEB function to improve clustering quality. Lastly when segmentation is completed, a designated RFM model is used to analyze the relative profitability of each cluster.
Paper Detail
1522
downloads
1
4766
Object-Based Image Indexing and Retrieval in DCT Domain using Clustering Techniques
Abstract:

In this paper, we present a new and effective image indexing technique that extracts features directly from DCT domain. Our proposed approach is an object-based image indexing. For each block of size 8*8 in DCT domain a feature vector is extracted. Then, feature vectors of all blocks of image using a k-means algorithm is clustered into groups. Each cluster represents a special object of the image. Then we select some clusters that have largest members after clustering. The centroids of the selected clusters are taken as image feature vectors and indexed into the database. Also, we propose an approach for using of proposed image indexing method in automatic image classification. Experimental results on a database of 800 images from 8 semantic groups in automatic image classification are reported.

Paper Detail
1457
downloads