International Science Index
Sentiment Analysis: Comparative Analysis of Multilingual Sentiment and Opinion Classification Techniques
Sentiment analysis and opinion mining have become
emerging topics of research in recent years but most of the work
is focused on data in the English language. A comprehensive
research and analysis are essential which considers multiple
languages, machine translation techniques, and different classifiers.
This paper presents, a comparative analysis of different approaches
for multilingual sentiment analysis. These approaches are divided
into two parts: one using classification of text without language
translation and second using the translation of testing data to a
target language, such as English, before classification. The presented
research and results are useful for understanding whether machine
translation should be used for multilingual sentiment analysis or
building language specific sentiment classification systems is a better
approach. The effects of language translation techniques, features,
and accuracy of various classifiers for multilingual sentiment analysis
is also discussed in this study.
Causal Relation Identification Using Convolutional Neural Networks and Knowledge Based Features
Causal relation identification is a crucial task in information extraction and knowledge discovery. In this work, we present two approaches to causal relation identification. The first is a classification model trained on a set of knowledge-based features. The second is a deep learning based approach training a model using convolutional neural networks to classify causal relations. We experiment with several different convolutional neural networks (CNN) models based on previous work on relation extraction as well as our own research. Our models are able to identify both explicit and implicit causal relations as well as the direction of the causal relation. The results of our experiments show a higher accuracy than previously achieved for causal relation identification tasks.
Load Forecasting in Microgrid Systems with R and Cortana Intelligence Suite
Energy production optimization has been traditionally very important for utilities in order to improve resource consumption. However, load forecasting is a challenging task, as there are a large number of relevant variables that must be considered, and several strategies have been used to deal with this complex problem. This is especially true also in microgrids where many elements have to adjust their performance depending on the future generation and consumption conditions. The goal of this paper is to present a solution for short-term load forecasting in microgrids, based on three machine learning experiments developed in R and web services built and deployed with different components of Cortana Intelligence Suite: Azure Machine Learning, a fully managed cloud service that enables to easily build, deploy, and share predictive analytics solutions; SQL database, a Microsoft database service for app developers; and PowerBI, a suite of business analytics tools to analyze data and share insights. Our results show that Boosted Decision Tree and Fast Forest Quantile regression methods can be very useful to predict hourly short-term consumption in microgrids; moreover, we found that for these types of forecasting models, weather data (temperature, wind, humidity and dew point) can play a crucial role in improving the accuracy of the forecasting solution. Data cleaning and feature engineering methods performed in R and different types of machine learning algorithms (Boosted Decision Tree, Fast Forest Quantile and ARIMA) will be presented, and results and performance metrics discussed.
Stackelberg Security Game for Optimizing Security of Federated Internet of Things Platform Instances
This paper presents an approach for optimal cyber security decisions to protect instances of a federated Internet of Things (IoT) platform in the cloud. The presented solution implements the repeated Stackelberg Security Game (SSG) and a model called Stochastic Human behaviour model with AttRactiveness and Probability weighting (SHARP). SHARP employs the Subjective Utility Quantal Response (SUQR) for formulating a subjective utility function, which is based on the evaluations of alternative solutions during decision-making. We augment the repeated SSG (including SHARP and SUQR) with a reinforced learning algorithm called Naïve Q-Learning. Naïve Q-Learning belongs to the category of active and model-free Machine Learning (ML) techniques in which the agent (either the defender or the attacker) attempts to find an optimal security solution. In this way, we combine GT and ML algorithms for discovering optimal cyber security policies. The proposed security optimization components will be validated in a collaborative cloud platform that is based on the Industrial Internet Reference Architecture (IIRA) and its recently published security model.
Feature Selection and Predictive Modeling of Housing Data Using Random Forest
Predictive data analysis and modeling involving machine learning techniques become challenging in presence of too many explanatory variables or features. Presence of too many features in machine learning is known to not only cause algorithms to slow down, but they can also lead to decrease in model prediction accuracy. This study involves housing dataset with 79 quantitative and qualitative features that describe various aspects people consider while buying a new house. Boruta algorithm that supports feature selection using a wrapper approach build around random forest is used in this study. This feature selection process leads to 49 confirmed features which are then used for developing predictive random forest models. The study also explores five different data partitioning ratios and their impact on model accuracy are captured using coefficient of determination (r-square) and root mean square error (rsme).
Urban Big Data: An Experimental Approach to Building-Value Estimation Using Web-Based Data
Current real-estate value estimation, difficult for laymen, usually is performed by specialists. This paper presents an automated estimation process based on big data and machine-learning technology that calculates influences of building conditions on real-estate price measurement. The present study analyzed actual building sales sample data for Nonhyeon-dong, Gangnam-gu, Seoul, Korea, measuring the major influencing factors among the various building conditions. Further to that analysis, a prediction model was established and applied using RapidMiner Studio, a graphical user interface (GUI)-based tool for derivation of machine-learning prototypes. The prediction model is formulated by reference to previous examples. When new examples are applied, it analyses and predicts accordingly. The analysis process discerns the crucial factors effecting price increases by calculation of weighted values. The model was verified, and its accuracy determined, by comparing its predicted values with actual price increases.
Building a Scalable Telemetry Based Multiclass Predictive Maintenance Model in R
Many organizations are faced with the challenge of how to analyze and build Machine Learning models using their sensitive telemetry data. In this paper, we discuss how users can leverage the power of R without having to move their big data around as well as a cloud based solution for organizations willing to host their data in the cloud. By using ScaleR technology to benefit from parallelization and remote computing or R Services on premise or in the cloud, users can leverage the power of R at scale without having to move their data around.
Development of Prediction Models of Day-Ahead Hourly Building Electricity Consumption and Peak Power Demand Using the Machine Learning Method
To encourage building owners to purchase electricity at the wholesale market and reduce building peak demand, this study aims to develop models that predict day-ahead hourly electricity consumption and demand using artificial neural network (ANN) and support vector machine (SVM). All prediction models are built in Python, with tool Scikit-learn and Pybrain. The input data for both consumption and demand prediction are time stamp, outdoor dry bulb temperature, relative humidity, air handling unit (AHU), supply air temperature and solar radiation. Solar radiation, which is unavailable a day-ahead, is predicted at first, and then this estimation is used as an input to predict consumption and demand. Models to predict consumption and demand are trained in both SVM and ANN, and depend on cooling or heating, weekdays or weekends. The results show that ANN is the better option for both consumption and demand prediction. It can achieve 15.50% to 20.03% coefficient of variance of root mean square error (CVRMSE) for consumption prediction and 22.89% to 32.42% CVRMSE for demand prediction, respectively. To conclude, the presented models have potential to help building owners to purchase electricity at the wholesale market, but they are not robust when used in demand response control.
A Minimum Spanning Tree-Based Method for Initializing the K-Means Clustering Algorithm
The traditional k-means algorithm has been widely used as a simple and efficient clustering method. However, the algorithm often converges to local minima for the reason that it is sensitive to the initial cluster centers. In this paper, an algorithm for selecting initial cluster centers on the basis of minimum spanning tree (MST) is presented. The set of vertices in MST with same degree are regarded as a whole which is used to find the skeleton data points. Furthermore, a distance measure between the skeleton data points with consideration of degree and Euclidean distance is presented. Finally, MST-based initialization method for the k-means algorithm is presented, and the corresponding time complexity is analyzed as well. The presented algorithm is tested on five data sets from the UCI Machine Learning Repository. The experimental results illustrate the effectiveness of the presented algorithm compared to three existing initialization methods.
Prediction-Based Midterm Operation Planning for Energy Management of Exhibition Hall
Large exhibition halls require a lot of energy to maintain comfortable atmosphere for the visitors viewing inside. One way of reducing the energy cost is to have thermal energy storage systems installed so that the thermal energy can be stored in the middle of night when the energy price is low and then used later when the price is high. To minimize the overall energy cost, however, we should be able to decide how much energy to save during which time period exactly. If we can foresee future energy load and the corresponding cost, we will be able to make such decisions reasonably. In this paper, we use machine learning technique to obtain models for predicting weather conditions and the number of visitors on hourly basis for the next day. Based on the energy load thus predicted, we build a cost-optimal daily operation plan for the thermal energy storage systems and cooling and heating facilities through simulation-based optimization.
Hand Gesture Interpretation Using Sensing Glove Integrated with Machine Learning Algorithms
In this paper, we present a low cost design for a smart glove that can perform sign language recognition to assist the speech impaired people. Specifically, we have designed and developed an Assistive Hand Gesture Interpreter that recognizes hand movements relevant to the American Sign Language (ASL) and translates them into text for display on a Thin-Film-Transistor Liquid Crystal Display (TFT LCD) screen as well as synthetic speech. Linear Bayes Classifiers and Multilayer Neural Networks have been used to classify 11 feature vectors obtained from the sensors on the glove into one of the 27 ASL alphabets and a predefined gesture for space. Three types of features are used; bending using six bend sensors, orientation in three dimensions using accelerometers and contacts at vital points using contact sensors. To gauge the performance of the presented design, the training database was prepared using five volunteers. The accuracy of the current version on the prepared dataset was found to be up to 99.3% for target user. The solution combines electronics, e-textile technology, sensor technology, embedded system and machine learning techniques to build a low cost wearable glove that is scrupulous, elegant and portable.
Chemical Reaction Algorithm for Expectation Maximization Clustering
Clustering is an intensive research for some years
because of its multifaceted applications, such as biology, information
retrieval, medicine, business and so on. The expectation maximization
(EM) is a kind of algorithm framework in clustering methods, one
of the ten algorithms of machine learning. Traditionally, optimization
of objective function has been the standard approach in EM. Hence,
research has investigated the utility of evolutionary computing and
related techniques in the regard. Chemical Reaction Optimization
(CRO) is a recently established method. So the property embedded
in CRO is used to solve optimization problems. This paper presents
an algorithm framework (EM-CRO) with modified CRO operators
based on EM cluster problems. The hybrid algorithm is mainly
to solve the problem of initial value sensitivity of the objective
function optimization clustering algorithm. Our experiments mainly
take the EM classic algorithm:k-means and fuzzy k-means as an
example, through the CRO algorithm to optimize its initial value, get
K-means-CRO and FKM-CRO algorithm. The experimental results
of them show that there is improved efficiency for solving objective
function optimization clustering problems.
Performance Comparison of Different Regression Methods for a Polymerization Process with Adaptive Sampling
Developing complete mechanistic models for polymerization reactors is not easy, because complex reactions occur simultaneously; there is a large number of kinetic parameters involved and sometimes the chemical and physical phenomena for mixtures involving polymers are poorly understood. To overcome these difficulties, empirical models based on sampled data can be used instead, namely regression methods typical of machine learning field. They have the ability to learn the trends of a process without any knowledge about its particular physical and chemical laws. Therefore, they are useful for modeling complex processes, such as the free radical polymerization of methyl methacrylate achieved in a batch bulk process. The goal is to generate accurate predictions of monomer conversion, numerical average molecular weight and gravimetrical average molecular weight. This process is associated with non-linear gel and glass effects. For this purpose, an adaptive sampling technique is presented, which can select more samples around the regions where the values have a higher variation. Several machine learning methods are used for the modeling and their performance is compared: support vector machines, k-nearest neighbor, k-nearest neighbor and random forest, as well as an original algorithm, large margin nearest neighbor regression. The suggested method provides very good results compared to the other well-known regression algorithms.
Context Detection in Spreadsheets Based on Automatically Inferred Table Schema
Programming requires years of training. With natural language and end user development methods, programming could become available to everyone. It enables end users to program their own devices and extend the functionality of the existing system without any knowledge of programming languages. In this paper, we describe an Interactive Spreadsheet Processing Module (ISPM), a natural language interface to spreadsheets that allows users to address ranges within the spreadsheet based on inferred table schema. Using the ISPM, end users are able to search for values in the schema of the table and to address the data in spreadsheets implicitly. Furthermore, it enables them to select and sort the spreadsheet data by using natural language. ISPM uses a machine learning technique to automatically infer areas within a spreadsheet, including different kinds of headers and data ranges. Since ranges can be identified from natural language queries, the end users can query the data using natural language. During the evaluation 12 undergraduate students were asked to perform operations (sum, sort, group and select) using the system and also Excel without ISPM interface, and the time taken for task completion was compared across the two systems. Only for the selection task did users take less time in Excel (since they directly selected the cells using the mouse) than in ISPM, by using natural language for end user software engineering, to overcome the present bottleneck of professional developers.
A Static Android Malware Detection Based on Actual Used Permissions Combination and API Calls
Android operating system has been recognized by most application developers because of its good open-source and compatibility, which enriches the categories of applications greatly. However, it has become the target of malware attackers due to the lack of strict security supervision mechanisms, which leads to the rapid growth of malware, thus bringing serious safety hazards to users. Therefore, it is critical to detect Android malware effectively. Generally, the permissions declared in the AndroidManifest.xml can reflect the function and behavior of the application to a large extent. Since current Android system has not any restrictions to the number of permissions that an application can request, developers tend to apply more than actually needed permissions in order to ensure the successful running of the application, which results in the abuse of permissions. However, some traditional detection methods only consider the requested permissions and ignore whether it is actually used, which leads to incorrect identification of some malwares. Therefore, a machine learning detection method based on the actually used permissions combination and API calls was put forward in this paper. Meanwhile, several experiments are conducted to evaluate our methodology. The result shows that it can detect unknown malware effectively with higher true positive rate and accuracy while maintaining a low false positive rate. Consequently, the AdaboostM1 (J48) classification algorithm based on information gain feature selection algorithm has the best detection result, which can achieve an accuracy of 99.8%, a true positive rate of 99.6% and a lowest false positive rate of 0.
Mining User-Generated Contents to Detect Service Failures with Topic Model
Online user-generated contents (UGC) significantly change the way customers behave (e.g., shop, travel), and a pressing need to handle the overwhelmingly plethora amount of various UGC is one of the paramount issues for management. However, a current approach (e.g., sentiment analysis) is often ineffective for leveraging textual information to detect the problems or issues that a certain management suffers from. In this paper, we employ text mining of Latent Dirichlet Allocation (LDA) on a popular online review site dedicated to complaint from users. We find that the employed LDA efficiently detects customer complaints, and a further inspection with the visualization technique is effective to categorize the problems or issues. As such, management can identify the issues at stake and prioritize them accordingly in a timely manner given the limited amount of resources. The findings provide managerial insights into how analytics on social media can help maintain and improve their reputation management. Our interdisciplinary approach also highlights several insights by applying machine learning techniques in marketing research domain. On a broader technical note, this paper illustrates the details of how to implement LDA in R program from a beginning (data collection in R) to an end (LDA analysis in R) since the instruction is still largely undocumented. In this regard, it will help lower the boundary for interdisciplinary researcher to conduct related research.
Road Accidents Bigdata Mining and Visualization Using Support Vector Machines
Useful information has been extracted from the
road accident data in United Kingdom (UK), using data analytics
method, for avoiding possible accidents in rural and urban areas.
This analysis make use of several methodologies such as data
integration, support vector machines (SVM), correlation machines
and multinomial goodness. The entire datasets have been imported
from the traffic department of UK with due permission. The
information extracted from these huge datasets forms a basis for
several predictions, which in turn avoid unnecessary memory
lapses. Since data is expected to grow continuously over a period
of time, this work primarily proposes a new framework model
which can be trained and adapt itself to new data and make
accurate predictions. This work also throws some light on use of
SVM’s methodology for text classifiers from the obtained traffic
data. Finally, it emphasizes the uniqueness and adaptability of
SVMs methodology appropriate for this kind of research work.
Automatic Threshold Search for Heat Map Based Feature Selection: A Cancer Dataset Analysis
Public health is one of the most critical issues today;
therefore, there is great interest to improve technologies in the area
of diseases detection. With machine learning and feature selection,
it has been possible to aid the diagnosis of several diseases such
as cancer. In this work, we present an extension to the Heat Map
Based Feature Selection algorithm, this modification allows automatic
threshold parameter selection that helps to improve the generalization
performance of high dimensional data such as mass spectrometry.
We have performed a comparison analysis using multiple cancer
datasets and compare against the well known Recursive Feature
Elimination algorithm and our original proposal, the results show
improved classification performance that is very competitive against
Towards Developing a Self-Explanatory Scheduling System Based on a Hybrid Approach
In the study, we present a conceptual framework for developing a scheduling system that can generate self-explanatory and easy-understanding schedules. To this end, a user interface is conceived to help planners record factors that are considered crucial in scheduling, as well as internal and external sources relating to such factors. A hybrid approach combining machine learning and constraint programming is developed to generate schedules and the corresponding factors, and accordingly display them on the user interface. Effects of the proposed system on scheduling are discussed, and it is expected that scheduling efficiency and system understandability will be improved, compared with previous scheduling systems.
Evaluation of Ensemble Classifiers for Intrusion Detection
One of the major developments in machine learning in the past decade is the ensemble method, which finds highly accurate classifier by combining many moderately accurate component classifiers. In this research work, new ensemble classification methods are proposed with homogeneous ensemble classifier using bagging and heterogeneous ensemble classifier using arcing and their performances are analyzed in terms of accuracy. A Classifier ensemble is designed using Radial Basis Function (RBF) and Support Vector Machine (SVM) as base classifiers. The feasibility and the benefits of the proposed approaches are demonstrated by the means of standard datasets of intrusion detection. The main originality of the proposed approach is based on three main parts: preprocessing phase, classification phase, and combining phase. A wide range of comparative experiments is conducted for standard datasets of intrusion detection. The performance of the proposed homogeneous and heterogeneous ensemble classifiers are compared to the performance of other standard homogeneous and heterogeneous ensemble methods. The standard homogeneous ensemble methods include Error correcting output codes, Dagging and heterogeneous ensemble methods include majority voting, stacking. The proposed ensemble methods provide significant improvement of accuracy compared to individual classifiers and the proposed bagged RBF and SVM performs significantly better than ECOC and Dagging and the proposed hybrid RBF-SVM performs significantly better than voting and stacking. Also heterogeneous models exhibit better results than homogeneous models for standard datasets of intrusion detection.
Ubiquitous Life People Informatics Engine (U-Life PIE): Wearable Health Promotion System
Since Google launched Google Glass in 2012, numbers of commercial wearable devices were released, such as smart belt, smart band, smart shoes, smart clothes ... etc. However, most of these devices perform as sensors to show the readings of measurements and few of them provide the interactive feedback to the user. Furthermore, these devices are single task devices which are not able to communicate with each other. In this paper a new health promotion system, Ubiquitous Life People Informatics Engine (U-Life PIE), will be presented. This engine consists of People Informatics Engine (PIE) and the interactive user interface. PIE collects all the data from the compatible devices, analyzes this data comprehensively and communicates between devices via various application programming interfaces. All the data and informations are stored on the PIE unit, therefore, the user is able to view the instant and historical data on their mobile devices any time. It also provides the real-time hands-free feedback and instructions through the user interface visually, acoustically and tactilely. These feedback and instructions suggest the user to adjust their posture or habits in order to avoid the physical injuries and prevent illness.
An Analysis of Classification of Imbalanced Datasets by Using Synthetic Minority Over-Sampling Technique
Analysing unbalanced datasets is one of the challenges that practitioners in machine learning field face. However, many researches have been carried out to determine the effectiveness of the use of the synthetic minority over-sampling technique (SMOTE) to address this issue. The aim of this study was therefore to compare the effectiveness of the SMOTE over different models on unbalanced datasets. Three classification models (Logistic Regression, Support Vector Machine and Nearest Neighbour) were tested with multiple datasets, then the same datasets were oversampled by using SMOTE and applied again to the three models to compare the differences in the performances. Results of experiments show that the highest number of nearest neighbours gives lower values of error rates.
Case-Based Reasoning: A Hybrid Classification Model Improved with an Expert's Knowledge for High-Dimensional Problems
Data mining and classification of objects is the process of data analysis, using various machine learning techniques, which is used today in various fields of research. This paper presents a concept of hybrid classification model improved with the expert knowledge. The hybrid model in its algorithm has integrated several machine learning techniques (Information Gain, K-means, and Case-Based Reasoning) and the expert’s knowledge into one. The knowledge of experts is used to determine the importance of features. The paper presents the model algorithm and the results of the case study in which the emphasis was put on achieving the maximum classification accuracy without reducing the number of features.
A Case-Based Reasoning-Decision Tree Hybrid System for Stock Selection
Stock selection is an important decision-making problem. Many machine learning and data mining technologies are employed to build automatic stock-selection system. A profitable stock-selection system should consider the stock’s investment value and the market timing. In this paper, we present a hybrid system including both engage for stock selection. This system uses a case-based reasoning (CBR) model to execute the stock classification, uses a decision-tree model to help with market timing and stock selection. The experiments show that the performance of this hybrid system is better than that of other techniques regarding to the classification accuracy, the average return and the Sharpe ratio.
Comparing Emotion Recognition from Voice and Facial Data Using Time Invariant Features
The problem of emotion recognition is a challenging problem. It is still an open problem from the aspect of both intelligent systems and psychology. In this paper, both voice features and facial features are used for building an emotion recognition system. A Support Vector Machine classifiers are built by using raw data from video recordings. In this paper, the results obtained for the emotion recognition are given, and a discussion about the validity and the expressiveness of different emotions is presented. A comparison between the classifiers build from facial data only, voice data only and from the combination of both data is made here. The need for a better combination of the information from facial expression and voice data is argued.
An Empirical Evaluation of Performance of Machine Learning Techniques on Imbalanced Software Quality Data
The development of change prediction models can help the software practitioners in planning testing and inspection resources at early phases of software development. However, a major challenge faced during the training process of any classification model is the imbalanced nature of the software quality data. A data with very few minority outcome categories leads to inefficient learning process and a classification model developed from the imbalanced data generally does not predict these minority categories correctly. Thus, for a given dataset, a minority of classes may be change prone whereas a majority of classes may be non-change prone. This study explores various alternatives for adeptly handling the imbalanced software quality data using different sampling methods and effective MetaCost learners. The study also analyzes and justifies the use of different performance metrics while dealing with the imbalanced data. In order to empirically validate different alternatives, the study uses change data from three application packages of open-source Android data set and evaluates the performance of six different machine learning techniques. The results of the study indicate extensive improvement in the performance of the classification models when using resampling method and robust performance measures.
Intelligent Recognition of Diabetes Disease via FCM Based Attribute Weighting
In this paper, an attribute weighting method called fuzzy C-means clustering based attribute weighting (FCMAW) for classification of Diabetes disease dataset has been used. The aims of this study are to reduce the variance within attributes of diabetes dataset and to improve the classification accuracy of classifier algorithm transforming from non-linear separable datasets to linearly separable datasets. Pima Indians Diabetes dataset has two classes including normal subjects (500 instances) and diabetes subjects (268 instances). Fuzzy C-means clustering is an improved version of K-means clustering method and is one of most used clustering methods in data mining and machine learning applications. In this study, as the first stage, fuzzy C-means clustering process has been used for finding the centers of attributes in Pima Indians diabetes dataset and then weighted the dataset according to the ratios of the means of attributes to centers of theirs. Secondly, after weighting process, the classifier algorithms including support vector machine (SVM) and k-NN (k- nearest neighbor) classifiers have been used for classifying weighted Pima Indians diabetes dataset. Experimental results show that the proposed attribute weighting method (FCMAW) has obtained very promising results in the classification of Pima Indians diabetes dataset.
Discriminant Analysis as a Function of Predictive Learning to Select Evolutionary Algorithms in Intelligent Transportation System
In this paper, we present the use of the discriminant analysis to select evolutionary algorithms that better solve instances of the vehicle routing problem with time windows. We use indicators as independent variables to obtain the classification criteria, and the best algorithm from the generic genetic algorithm (GA), random search (RS), steady-state genetic algorithm (SSGA), and sexual genetic algorithm (SXGA) as the dependent variable for the classification. The discriminant classification was trained with classic instances of the vehicle routing problem with time windows obtained from the Solomon benchmark. We obtained a classification of the discriminant analysis of 66.7%.
Prediction of MicroRNA-Target Gene by Machine Learning Algorithms in Lung Cancer Study
MicroRNAs are small non-coding RNA found in
many different species. They play crucial roles in cancer such as
biological processes of apoptosis and proliferation. The identification
of microRNA-target genes can be an essential first step towards to
reveal the role of microRNA in various cancer types. In this paper,
we predict miRNA-target genes for lung cancer by integrating
prediction scores from miRanda and PITA algorithms used as a
feature vector of miRNA-target interaction. Then, machine-learning
algorithms were implemented for making a final prediction. The
approach developed in this study should be of value for future studies
into understanding the role of miRNAs in molecular mechanisms
enabling lung cancer formation.
Performance Comparison of Situation-Aware Models for Activating Robot Vacuum Cleaner in a Smart Home
We assume an IoT-based smart-home environment where the on-off status of each of the electrical appliances including the room lights can be recognized in a real time by monitoring and analyzing the smart meter data. At any moment in such an environment, we can recognize what the household or the user is doing by referring to the status data of the appliances. In this paper, we focus on a smart-home service that is to activate a robot vacuum cleaner at right time by recognizing the user situation, which requires a situation-aware model that can distinguish the situations that allow vacuum cleaning (Yes) from those that do not (No). We learn as our candidate models a few classifiers such as naïve Bayes, decision tree, and logistic regression that can map the appliance-status data into Yes and No situations. Our training and test data are obtained from simulations of user behaviors, in which a sequence of user situations such as cooking, eating, dish washing, and so on is generated with the status of the relevant appliances changed in accordance with the situation changes. During the simulation, both the situation transition and the resulting appliance status are determined stochastically. To compare the performances of the aforementioned classifiers we obtain their learning curves for different types of users through simulations. The result of our empirical study reveals that naïve Bayes achieves a slightly better classification accuracy than the other compared classifiers.