International Science Index

21
10007337
Distributed Coverage Control by Robot Networks in Unknown Environments Using a Modified EM Algorithm
Abstract:
In this paper, we study a distributed control algorithm for the problem of unknown area coverage by a network of robots. The coverage objective is to locate a set of targets in the area and to minimize the robots’ energy consumption. The robots have no prior knowledge about the location and also about the number of the targets in the area. One efficient approach that can be used to relax the robots’ lack of knowledge is to incorporate an auxiliary learning algorithm into the control scheme. A learning algorithm actually allows the robots to explore and study the unknown environment and to eventually overcome their lack of knowledge. The control algorithm itself is modeled based on game theory where the network of the robots use their collective information to play a non-cooperative potential game. The algorithm is tested via simulations to verify its performance and adaptability.
Paper Detail
75
downloads
20
10006027
A Reinforcement Learning Approach for Evaluation of Real-Time Disaster Relief Demand and Network Condition
Abstract:

Relief demand and transportation links availability is the essential information that is needed for every natural disaster operation. This information is not in hand once a disaster strikes. Relief demand and network condition has been evaluated based on prediction method in related works. Nevertheless, prediction seems to be over or under estimated due to uncertainties and may lead to a failure operation. Therefore, in this paper a stochastic programming model is proposed to evaluate real-time relief demand and network condition at the onset of a natural disaster. To address the time sensitivity of the emergency response, the proposed model uses reinforcement learning for optimization of the total relief assessment time. The proposed model is tested on a real size network problem. The simulation results indicate that the proposed model performs well in the case of collecting real-time information.

Paper Detail
675
downloads
19
10004827
A Hybrid Expert System for Generating Stock Trading Signals
Abstract:

In this paper, a hybrid expert system is developed by using fuzzy genetic network programming with reinforcement learning (GNP-RL). In this system, the frame-based structure of the system uses the trading rules extracted by GNP. These rules are extracted by using technical indices of the stock prices in the training time period. For developing this system, we applied fuzzy node transition and decision making in both processing and judgment nodes of GNP-RL. Consequently, using these method not only did increase the accuracy of node transition and decision making in GNP's nodes, but also extended the GNP's binary signals to ternary trading signals. In the other words, in our proposed Fuzzy GNP-RL model, a No Trade signal is added to conventional Buy or Sell signals. Finally, the obtained rules are used in a frame-based system implemented in Kappa-PC software. This developed trading system has been used to generate trading signals for ten companies listed in Tehran Stock Exchange (TSE). The simulation results in the testing time period shows that the developed system has more favorable performance in comparison with the Buy and Hold strategy.

Paper Detail
810
downloads
18
10002131
Bounded Rational Heterogeneous Agents in Artificial Stock Markets: Literature Review and Research Direction
Abstract:
In this paper, we provided a literature survey on the artificial stock problem (ASM). The paper began by exploring the complexity of the stock market and the needs for ASM. ASM aims to investigate the link between individual behaviors (micro level) and financial market dynamics (macro level). The variety of patterns at the macro level is a function of the AFM complexity. The financial market system is a complex system where the relationship between the micro and macro level cannot be captured analytically. Computational approaches, such as simulation, are expected to comprehend this connection. Agent-based simulation is a simulation technique commonly used to build AFMs. The paper proceeds by discussing the components of the ASM. We consider the roles of behavioral finance (BF) alongside the traditionally risk-averse assumption in the construction of agent’s attributes. Also, the influence of social networks in the developing of agents interactions is addressed. Network topologies such as a small world, distance-based, and scale-free networks may be utilized to outline economic collaborations. In addition, the primary methods for developing agents learning and adaptive abilities have been summarized. These incorporated approach such as Genetic Algorithm, Genetic Programming, Artificial neural network and Reinforcement Learning. In addition, the most common statistical properties (the stylized facts) of stock that are used for calibration and validation of ASM are discussed. Besides, we have reviewed the major related previous studies and categorize the utilized approaches as a part of these studies. Finally, research directions and potential research questions are argued. The research directions of ASM may focus on the macro level by analyzing the market dynamic or on the micro level by investigating the wealth distributions of the agents.
Paper Detail
1447
downloads
17
17298
Q-Learning with Eligibility Traces to Solve Non-Convex Economic Dispatch Problems
Abstract:

Economic Dispatch is one of the most important power system management tools. It is used to allocate an amount of power generation to the generating units to meet the load demand. The Economic Dispatch problem is a large scale nonlinear constrained optimization problem. In general, heuristic optimization techniques are used to solve non-convex Economic Dispatch problem. In this paper, ideas from Reinforcement Learning are proposed to solve the non-convex Economic Dispatch problem. Q-Learning is a reinforcement learning techniques where each generating unit learn the optimal schedule of the generated power that minimizes the generation cost function. The eligibility traces are used to speed up the Q-Learning process. Q-Learning with eligibility traces is used to solve Economic Dispatch problems with valve point loading effect, multiple fuel options, and power transmission losses.

Paper Detail
1163
downloads
16
1689
Biologically Inspired Controller for the Autonomous Navigation of a Mobile Robot in an Evasion Task
Abstract:
A novel biologically inspired controller for the autonomous navigation of a mobile robot in an evasion task is proposed. The controller takes advantage of the environment by calculating a measure of danger and subsequently choosing the parameters of a reinforcement learning based decision process. Two different reinforcement learning algorithms were used: Qlearning and Sarsa (λ). Simulations show that selecting dynamic parameters reduce the time while executing the decision making process, so the robot can obtain a policy to succeed in an escaping task in a realistic time.
Paper Detail
869
downloads
15
4981
Agent-based Simulation for Blood Glucose Control in Diabetic Patients
Abstract:
This paper employs a new approach to regulate the blood glucose level of type I diabetic patient under an intensive insulin treatment. The closed-loop control scheme incorporates expert knowledge about treatment by using reinforcement learning theory to maintain the normoglycemic average of 80 mg/dl and the normal condition for free plasma insulin concentration in severe initial state. The insulin delivery rate is obtained off-line by using Qlearning algorithm, without requiring an explicit model of the environment dynamics. The implementation of the insulin delivery rate, therefore, requires simple function evaluation and minimal online computations. Controller performance is assessed in terms of its ability to reject the effect of meal disturbance and to overcome the variability in the glucose-insulin dynamics from patient to patient. Computer simulations are used to evaluate the effectiveness of the proposed technique and to show its superiority in controlling hyperglycemia over other existing algorithms
Paper Detail
1045
downloads
14
6389
A Cognitive Robot Collaborative Reinforcement Learning Algorithm
Abstract:
A cognitive collaborative reinforcement learning algorithm (CCRL) that incorporates an advisor into the learning process is developed to improve supervised learning. An autonomous learner is enabled with a self awareness cognitive skill to decide when to solicit instructions from the advisor. The learner can also assess the value of advice, and accept or reject it. The method is evaluated for robotic motion planning using simulation. Tests are conducted for advisors with skill levels from expert to novice. The CCRL algorithm and a combined method integrating its logic with Clouse-s Introspection Approach, outperformed a base-line fully autonomous learner, and demonstrated robust performance when dealing with various advisor skill levels, learning to accept advice received from an expert, while rejecting that of less skilled collaborators. Although the CCRL algorithm is based on RL, it fits other machine learning methods, since advisor-s actions are only added to the outer layer.
Paper Detail
1018
downloads
13
2583
Behavioral Analysis of Team Members in Virtual Organization based on Trust Dimension and Learning
Abstract:
Trust management and Reputation models are becoming integral part of Internet based applications such as CSCW, E-commerce and Grid Computing. Also the trust dimension is a significant social structure and key to social relations within a collaborative community. Collaborative Decision Making (CDM) is a difficult task in the context of distributed environment (information across different geographical locations) and multidisciplinary decisions are involved such as Virtual Organization (VO). To aid team decision making in VO, Decision Support System and social network analysis approaches are integrated. In such situations social learning helps an organization in terms of relationship, team formation, partner selection etc. In this paper we focus on trust learning. Trust learning is an important activity in terms of information exchange, negotiation, collaboration and trust assessment for cooperation among virtual team members. In this paper we have proposed a reinforcement learning which enhances the trust decision making capability of interacting agents during collaboration in problem solving activity. Trust computational model with learning that we present is adapted for best alternate selection of new project in the organization. We verify our model in a multi-agent simulation where the agents in the community learn to identify trustworthy members, inconsistent behavior and conflicting behavior of agents.
Paper Detail
1225
downloads
12
12363
Optimizing Dialogue Strategy Learning Using Learning Automata
Abstract:
Modeling the behavior of the dialogue management in the design of a spoken dialogue system using statistical methodologies is currently a growing research area. This paper presents a work on developing an adaptive learning approach to optimize dialogue strategy. At the core of our system is a method formalizing dialogue management as a sequential decision making under uncertainty whose underlying probabilistic structure has a Markov Chain. Researchers have mostly focused on model-free algorithms for automating the design of dialogue management using machine learning techniques such as reinforcement learning. But in model-free algorithms there exist a dilemma in engaging the type of exploration versus exploitation. Hence we present a model-based online policy learning algorithm using interconnected learning automata for optimizing dialogue strategy. The proposed algorithm is capable of deriving an optimal policy that prescribes what action should be taken in various states of conversation so as to maximize the expected total reward to attain the goal and incorporates good exploration and exploitation in its updates to improve the naturalness of humancomputer interaction. We test the proposed approach using the most sophisticated evaluation framework PARADISE for accessing to the railway information system.
Paper Detail
986
downloads
11
13687
A Computer Model of Language Acquisition – Syllable Learning – Based on Hebbian Cell Assemblies and Reinforcement Learning
Abstract:
Investigating language acquisition is one of the most challenging problems in the area of studying language. Syllable learning as a level of language acquisition has a considerable significance since it plays an important role in language acquisition. Because of impossibility of studying language acquisition directly with children, especially in its developmental phases, computer models will be useful in examining language acquisition. In this paper a computer model of early language learning for syllable learning is proposed. It is guided by a conceptual model of syllable learning which is named Directions Into Velocities of Articulators model (DIVA). The computer model uses simple associational and reinforcement learning rules within neural network architecture which are inspired by neuroscience. Our simulation results verify the ability of the proposed computer model in producing phonemes during babbling and early speech. Also, it provides a framework for examining the neural basis of language learning and communication disorders.
Paper Detail
895
downloads
10
10811
Acquiring Contour Following Behaviour in Robotics through Q-Learning and Image-based States
Abstract:
In this work a visual and reactive contour following behaviour is learned by reinforcement. With artificial vision the environment is perceived in 3D, and it is possible to avoid obstacles that are invisible to other sensors that are more common in mobile robotics. Reinforcement learning reduces the need for intervention in behaviour design, and simplifies its adjustment to the environment, the robot and the task. In order to facilitate its generalisation to other behaviours and to reduce the role of the designer, we propose a regular image-based codification of states. Even though this is much more difficult, our implementation converges and is robust. Results are presented with a Pioneer 2 AT on a Gazebo 3D simulator.
Paper Detail
936
downloads
9
2609
Learning Classifier Systems Approach for Automated Discovery of Censored Production Rules
Abstract:
In the recent past Learning Classifier Systems have been successfully used for data mining. Learning Classifier System (LCS) is basically a machine learning technique which combines evolutionary computing, reinforcement learning, supervised or unsupervised learning and heuristics to produce adaptive systems. A LCS learns by interacting with an environment from which it receives feedback in the form of numerical reward. Learning is achieved by trying to maximize the amount of reward received. All LCSs models more or less, comprise four main components; a finite population of condition–action rules, called classifiers; the performance component, which governs the interaction with the environment; the credit assignment component, which distributes the reward received from the environment to the classifiers accountable for the rewards obtained; the discovery component, which is responsible for discovering better rules and improving existing ones through a genetic algorithm. The concatenate of the production rules in the LCS form the genotype, and therefore the GA should operate on a population of classifier systems. This approach is known as the 'Pittsburgh' Classifier Systems. Other LCS that perform their GA at the rule level within a population are known as 'Mitchigan' Classifier Systems. The most predominant representation of the discovered knowledge is the standard production rules (PRs) in the form of IF P THEN D. The PRs, however, are unable to handle exceptions and do not exhibit variable precision. The Censored Production Rules (CPRs), an extension of PRs, were proposed by Michalski and Winston that exhibit variable precision and supports an efficient mechanism for handling exceptions. A CPR is an augmented production rule of the form: IF P THEN D UNLESS C, where Censor C is an exception to the rule. Such rules are employed in situations, in which conditional statement IF P THEN D holds frequently and the assertion C holds rarely. By using a rule of this type we are free to ignore the exception conditions, when the resources needed to establish its presence are tight or there is simply no information available as to whether it holds or not. Thus, the IF P THEN D part of CPR expresses important information, while the UNLESS C part acts only as a switch and changes the polarity of D to ~D. In this paper Pittsburgh style LCSs approach is used for automated discovery of CPRs. An appropriate encoding scheme is suggested to represent a chromosome consisting of fixed size set of CPRs. Suitable genetic operators are designed for the set of CPRs and individual CPRs and also appropriate fitness function is proposed that incorporates basic constraints on CPR. Experimental results are presented to demonstrate the performance of the proposed learning classifier system.
Paper Detail
938
downloads
8
7494
A Modular On-line Profit Sharing Approach in Multiagent Domains
Abstract:
How to coordinate the behaviors of the agents through learning is a challenging problem within multi-agent domains. Because of its complexity, recent work has focused on how coordinated strategies can be learned. Here we are interested in using reinforcement learning techniques to learn the coordinated actions of a group of agents, without requiring explicit communication among them. However, traditional reinforcement learning methods are based on the assumption that the environment can be modeled as Markov Decision Process, which usually cannot be satisfied when multiple agents coexist in the same environment. Moreover, to effectively coordinate each agent-s behavior so as to achieve the goal, it-s necessary to augment the state of each agent with the information about other existing agents. Whereas, as the number of agents in a multiagent environment increases, the state space of each agent grows exponentially, which will cause the combinational explosion problem. Profit sharing is one of the reinforcement learning methods that allow agents to learn effective behaviors from their experiences even within non-Markovian environments. In this paper, to remedy the drawback of the original profit sharing approach that needs much memory to store each state-action pair during the learning process, we firstly address a kind of on-line rational profit sharing algorithm. Then, we integrate the advantages of modular learning architecture with on-line rational profit sharing algorithm, and propose a new modular reinforcement learning model. The effectiveness of the technique is demonstrated using the pursuit problem.
Paper Detail
846
downloads
7
15667
Relational Representation in XCSF
Abstract:
Generalization is one of the most challenging issues of Learning Classifier Systems. This feature depends on the representation method which the system used. Considering the proposed representation schemes for Learning Classifier System, it can be concluded that many of them are designed to describe the shape of the region which the environmental states belong and the other relations of the environmental state with that region was ignored. In this paper, we propose a new representation scheme which is designed to show various relationships between the environmental state and the region that is specified with a particular classifier.
Paper Detail
696
downloads
6
11990
A Probabilistic Reinforcement-Based Approach to Conceptualization
Abstract:

Conceptualization strengthens intelligent systems in generalization skill, effective knowledge representation, real-time inference, and managing uncertain and indefinite situations in addition to facilitating knowledge communication for learning agents situated in real world. Concept learning introduces a way of abstraction by which the continuous state is formed as entities called concepts which are connected to the action space and thus, they illustrate somehow the complex action space. Of computational concept learning approaches, action-based conceptualization is favored because of its simplicity and mirror neuron foundations in neuroscience. In this paper, a new biologically inspired concept learning approach based on the probabilistic framework is proposed. This approach exploits and extends the mirror neuron-s role in conceptualization for a reinforcement learning agent in nondeterministic environments. In the proposed method, instead of building a huge numerical knowledge, the concepts are learnt gradually from rewards through interaction with the environment. Moreover the probabilistic formation of the concepts is employed to deal with uncertain and dynamic nature of real problems in addition to the ability of generalization. These characteristics as a whole distinguish the proposed learning algorithm from both a pure classification algorithm and typical reinforcement learning. Simulation results show advantages of the proposed framework in terms of convergence speed as well as generalization and asymptotic behavior because of utilizing both success and failures attempts through received rewards. Experimental results, on the other hand, show the applicability and effectiveness of the proposed method in continuous and noisy environments for a real robotic task such as maze as well as the benefits of implementing an incremental learning scenario in artificial agents.

Paper Detail
933
downloads
5
2732
Adaptive PID Controller based on Reinforcement Learning for Wind Turbine Control
Abstract:
A self tuning PID control strategy using reinforcement learning is proposed in this paper to deal with the control of wind energy conversion systems (WECS). Actor-Critic learning is used to tune PID parameters in an adaptive way by taking advantage of the model-free and on-line learning properties of reinforcement learning effectively. In order to reduce the demand of storage space and to improve the learning efficiency, a single RBF neural network is used to approximate the policy function of Actor and the value function of Critic simultaneously. The inputs of RBF network are the system error, as well as the first and the second-order differences of error. The Actor can realize the mapping from the system state to PID parameters, while the Critic evaluates the outputs of the Actor and produces TD error. Based on TD error performance index and gradient descent method, the updating rules of RBF kernel function and network weights were given. Simulation results show that the proposed controller is efficient for WECS and it is perfectly adaptable and strongly robust, which is better than that of a conventional PID controller.
Paper Detail
2345
downloads
4
14894
Trajectory-Based Modified Policy Iteration
Abstract:
This paper presents a new problem solving approach that is able to generate optimal policy solution for finite-state stochastic sequential decision-making problems with high data efficiency. The proposed algorithm iteratively builds and improves an approximate Markov Decision Process (MDP) model along with cost-to-go value approximates by generating finite length trajectories through the state-space. The approach creates a synergy between an approximate evolving model and approximate cost-to-go values to produce a sequence of improving policies finally converging to the optimal policy through an intelligent and structured search of the policy space. The approach modifies the policy update step of the policy iteration so as to result in a speedy and stable convergence to the optimal policy. We apply the algorithm to a non-holonomic mobile robot control problem and compare its performance with other Reinforcement Learning (RL) approaches, e.g., a) Q-learning, b) Watkins Q(λ), c) SARSA(λ).
Paper Detail
831
downloads
3
8236
Learning Classifier Systems Approach for Automated Discovery of Crisp and Fuzzy Hierarchical Production Rules
Abstract:
This research presents a system for post processing of data that takes mined flat rules as input and discovers crisp as well as fuzzy hierarchical structures using Learning Classifier System approach. Learning Classifier System (LCS) is basically a machine learning technique that combines evolutionary computing, reinforcement learning, supervised or unsupervised learning and heuristics to produce adaptive systems. A LCS learns by interacting with an environment from which it receives feedback in the form of numerical reward. Learning is achieved by trying to maximize the amount of reward received. Crisp description for a concept usually cannot represent human knowledge completely and practically. In the proposed Learning Classifier System initial population is constructed as a random collection of HPR–trees (related production rules) and crisp / fuzzy hierarchies are evolved. A fuzzy subsumption relation is suggested for the proposed system and based on Subsumption Matrix (SM), a suitable fitness function is proposed. Suitable genetic operators are proposed for the chosen chromosome representation method. For implementing reinforcement a suitable reward and punishment scheme is also proposed. Experimental results are presented to demonstrate the performance of the proposed system.
Paper Detail
859
downloads
2
8830
Markov Game Controller Design Algorithms
Abstract:
Markov games are a generalization of Markov decision process to a multi-agent setting. Two-player zero-sum Markov game framework offers an effective platform for designing robust controllers. This paper presents two novel controller design algorithms that use ideas from game-theory literature to produce reliable controllers that are able to maintain performance in presence of noise and parameter variations. A more widely used approach for controller design is the H∞ optimal control, which suffers from high computational demand and at times, may be infeasible. Our approach generates an optimal control policy for the agent (controller) via a simple Linear Program enabling the controller to learn about the unknown environment. The controller is facing an unknown environment, and in our formulation this environment corresponds to the behavior rules of the noise modeled as the opponent. Proposed controller architectures attempt to improve controller reliability by a gradual mixing of algorithmic approaches drawn from the game theory literature and the Minimax-Q Markov game solution approach, in a reinforcement-learning framework. We test the proposed algorithms on a simulated Inverted Pendulum Swing-up task and compare its performance against standard Q learning.
Paper Detail
880
downloads
1
1978
A Learning Agent for Knowledge Extraction from an Active Semantic Network
Abstract:

This paper outlines the development of a learning retrieval agent. Task of this agent is to extract knowledge of the Active Semantic Network in respect to user-requests. Based on a reinforcement learning approach, the agent learns to interpret the user-s intention. Especially, the learning algorithm focuses on the retrieval of complex long distant relations. Increasing its learnt knowledge with every request-result-evaluation sequence, the agent enhances his capability in finding the intended information.

Paper Detail
878
downloads