The key challenge for learning-based autonomous systems operating in time-varying environments is to predict when the learned model may lose relevance. If the learned model loses relevance, then the autonomous system is at risk of making wrong decisions. The entropic value at risk (EVAR) is a computationally efficient and coherent risk measure that can be utilized to quantify this risk. In this paper, we present a Bayesian model and learning algorithms to predict the state-dependent EVAR of time-varying datasets. We discuss applications of EVAR to an exploration problem in which an autonomous agent has to choose a set of sensing locations in order to maximize the informativeness of the acquired data and learn a model of an underlying phenomenon of interest. We empirically demonstrate the efficacy of the presented model and learning algorithms on four real-world datasets.
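For reference, the standard definition of EVaR at confidence level 1 - alpha is EVaR_{1-alpha}(X) = inf_{z>0} (1/z) ln(M_X(z) / alpha), where M_X is the moment-generating function of X. The sketch below is a minimal illustration of that definition, assuming the MGF is replaced by its empirical average over samples and the infimum is approximated by a grid search over z. The function names and the grid are illustrative choices, and this is not the Bayesian model or learning algorithm described in the abstract.

```python
import math

def log_mean_exp(values, z):
    """Numerically stable ln((1/n) * sum(exp(z * x))), i.e. the log of the empirical MGF at z."""
    m = max(z * x for x in values)
    return m + math.log(sum(math.exp(z * x - m) for x in values) / len(values))

def empirical_evar(samples, alpha, z_grid=None):
    """Approximate EVaR_{1-alpha}(X) = inf_{z>0} (1/z) * ln(M_X(z) / alpha).

    The MGF M_X is replaced by its empirical estimate and the infimum over z
    by a minimum over a finite grid (an assumption of this sketch).
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if z_grid is None:
        # Log-spaced grid from 1e-3 to 1e3; an arbitrary illustrative choice.
        z_grid = [10 ** (k / 20) for k in range(-60, 61)]
    return min((log_mean_exp(samples, z) - math.log(alpha)) / z for z in z_grid)

risk = empirical_evar([1.0, 2.0, 3.0, 4.0], alpha=0.05)
```

For Gaussian data, EVaR has the closed form mu + sigma * sqrt(-2 ln alpha), which gives a convenient sanity check for the grid search.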
The Explore-Exploit Dilemma in Nonstationary Decision Making Under Uncertainty
Chapter 2 of Handling Uncertainty and Networked Structure in Robot Control
It is often assumed that autonomous systems operate in environments that can be described by a stationary (time-invariant) model. However, real-world environments are often nonstationary (time-varying): the underlying phenomena change over time, so stationary approximations of a nonstationary environment may quickly lose relevance. Here, two approaches are presented and applied in the context of reinforcement learning in nonstationary environments. In Section 2, the first approach applies reinforcement learning in the presence of a changing reward model. In particular, a functional termed the Fog-of-War is used to drive exploration, which results in the timely discovery of new models in nonstationary environments. In Section 3, the Fog-of-War functional is adapted in real time to reflect the heterogeneous information content of a real-world environment; this is critical for applying the approach of Section 2 in real-world environments.
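The Fog-of-War functional itself is not defined in this summary, so the sketch below is only a hypothetical illustration of the general idea it motivates: an exploration bonus that grows with the time since a location was last observed, so that stale estimates are eventually revisited. The linear growth rule, the `growth_rate` parameter, and the function names are all assumptions, not the chapter's definition.

```python
def fog(t_now, t_last_seen, growth_rate):
    """Hypothetical fog level: grows linearly with time since a location was last observed."""
    return growth_rate * (t_now - t_last_seen)

def choose_location(estimates, last_seen, t_now, growth_rate):
    """Pick the location that maximizes estimated reward plus the fog bonus."""
    scores = {loc: estimates[loc] + fog(t_now, last_seen[loc], growth_rate) for loc in estimates}
    return max(scores, key=scores.get)

def run(estimates, steps, growth_rate):
    """Simulate `steps` decisions; observing a location clears its fog.

    Reward estimates are held fixed here for simplicity; in a nonstationary
    setting, a revisit is what would refresh a stale estimate.
    """
    last_seen = {loc: 0 for loc in estimates}
    visits = []
    for t in range(1, steps + 1):
        loc = choose_location(estimates, last_seen, t, growth_rate)
        last_seen[loc] = t
        visits.append(loc)
    return visits

with_fog = run({"a": 1.0, "b": 0.0}, steps=20, growth_rate=0.1)
without_fog = run({"a": 1.0, "b": 0.0}, steps=20, growth_rate=0.0)
```

Without the bonus, the agent never returns to the location it currently believes is worse, so a change there would go unnoticed; with the bonus, that location is eventually revisited.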
A Hybridized Bayesian Parametric-Nonparametric Approach to the Pure Exploration Problem
Information-driven approaches to reinforcement learning (RL) and bandit problems largely rely on optimizing with respect to the expectation of calculated Kullback-Leibler (KL) divergence values. Although KL divergence may provide bounds on problem-domain models, bounds on the expected KL divergence itself are absent from information-driven approaches. We therefore focus our investigation on the pure exploration problem, a key component of RL and bandit problems, in which the objective is to efficiently gain knowledge about the problem domain. For this task, we develop an algorithm using a Poisson exposure process Cox Gaussian process (Pep-CGP), a hybridized Bayesian parametric-nonparametric Lévy process, and theoretically derive a bound on the Pep-CGP expectation of KL divergence. Our algorithm, Real-time Adaptive Prediction of Time-varying and Obscure Rewards (RAPTOR), is validated on four real-world datasets, on which RAPTOR outperforms baseline pure exploration approaches.
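As a concrete, simplified instance of optimizing an expectation of KL divergence values, the sketch below assumes each location's reward count is Poisson, uses the closed-form KL divergence between two Poisson distributions, averages it over a set of plausible future rates, and greedily picks a batch of the locations with the largest expected divergence. This is a stand-in for illustration only: it does not implement the Pep-CGP or RAPTOR, and the names, the example rates, and the sample-average step are assumptions.

```python
import math

def kl_poisson(lam_p, lam_q):
    """KL(Poisson(lam_p) || Poisson(lam_q)) = lam_p * ln(lam_p / lam_q) + lam_q - lam_p."""
    if lam_p <= 0 or lam_q <= 0:
        raise ValueError("Poisson rates must be positive")
    return lam_p * math.log(lam_p / lam_q) + lam_q - lam_p

def expected_kl(plausible_rates, current_rate):
    """Sample average of the KL from each plausible future rate to the current estimate."""
    return sum(kl_poisson(r, current_rate) for r in plausible_rates) / len(plausible_rates)

def select_batch(scores, k):
    """Return the k location ids with the largest scores (ties keep insertion order)."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical example: current rate estimates and a few plausible future rates
# (e.g., draws from some posterior) at three sensing locations.
current = {"s1": 2.0, "s2": 5.0, "s3": 1.0}
plausible = {"s1": [2.0, 2.1, 1.9], "s2": [3.0, 8.0, 5.0], "s3": [0.5, 2.5, 1.0]}
scores = {loc: expected_kl(plausible[loc], current[loc]) for loc in current}
batch = select_batch(scores, 2)
```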
Uninformed-to-Informed Exploration in Unstructured Real-World Environments
Conventionally, the process of learning the model (exploration) is initialized with either an uninformed or an informed policy, where the latter leverages observations to guide future exploration. Informed exploration is ideal, as it may allow a model to be learned from fewer samples. However, informed exploration cannot be implemented from the onset when a priori knowledge of the sensing-domain statistics is not available; such a policy would repeatedly sample only the first set of locations. Hence, we present a theoretically derived bound for transitioning from uninformed to informed exploration in unstructured real-world environments, which may be partially observable and time-varying. This bound is used in tandem with a sparsified Bayesian nonparametric Poisson Exposure Process, which learns to predict the value of information in partially observable and time-varying domains. The result is an uninformed-to-informed exploration policy that outperforms baseline exploration algorithms on real-world datasets.
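A minimal sketch of the uninformed-to-informed switch, assuming a hypothetical per-location sample threshold `n_min` stands in for the theoretically derived transition bound (which is not reproduced here). The policy visits the least-sampled location until every location has at least `n_min` samples, and only then exploits its value-of-information estimates.

```python
def next_location(counts, voi_estimates, n_min):
    """Hypothetical uninformed-to-informed policy.

    counts: number of samples taken so far at each location.
    voi_estimates: current value-of-information estimate at each location.
    n_min: assumed per-location threshold standing in for the paper's bound.
    """
    if min(counts.values()) < n_min:
        # Uninformed phase: cover the least-sampled location.
        return min(counts, key=counts.get)
    # Informed phase: exploit the learned value-of-information estimates.
    return max(voi_estimates, key=voi_estimates.get)

voi = {"a": 0.9, "b": 0.1, "c": 0.4}
print(next_location({"a": 2, "b": 0, "c": 1}, voi, n_min=2))  # b (uninformed phase)
print(next_location({"a": 3, "b": 2, "c": 2}, voi, n_min=2))  # a (informed phase)
```

In this sketch, starting directly in the informed phase with identical initial estimates would make `max` return the same location every time, which mirrors the failure mode described above.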
Exploitation by Informed Exploration between Isolated Operatives for Information-theoretic Data Harvesting
We consider the problem of ferrying data between nodes of a sparsely distributed sensing network of Unattended Ground Sensors (UGS) with endurance-constrained Unmanned Aerial Systems (UAS). The sensing domain in which the sparsely distributed UGS network is deployed is assumed to be highly nonstationary (time-varying) and noisy. This greatly complicates the data-ferrying problem, as the expected value-of-information at a sensing location can change rapidly. To address this issue, we present a new data-ferrying algorithm termed Exploitation by Informed Exploration between Isolated Operatives (EIEIO), and show that, under several reasonable assumptions and a model of the predicted accumulation of value-of-information, the problem can be simplified to a linear program. To solve the linear program, the UAS learns to anticipate the regions of the sensing domain with the highest degree of change. The degree of change is learned using a novel implementation of a Cox process called the Cox-Gaussian Process (CGP). Our approach does not require a priori knowledge of the sensing-domain model to arrive at an optimal UAS allocation strategy.
Given knowledge of the information content of all sampling locations, a closed path may be formed so that the data-ferrying agent visits the most informative subset of data-source locations.
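The linear program itself is not given in the abstract. As a hypothetical illustration of how a sortie allocation can reduce to a linear program, the sketch below assumes each UGS i has a predicted VoI v_i and an endurance cost c_i > 0, and solves: maximize sum_i v_i x_i subject to sum_i c_i x_i <= B and 0 <= x_i <= 1. This LP relaxation of the knapsack problem is solved exactly by filling UGS in decreasing order of v_i / c_i. The formulation, names, and numbers are assumptions, not EIEIO's.

```python
def fractional_allocation(voi, cost, budget):
    """Solve the LP  max sum(voi[i] * x[i])  s.t.  sum(cost[i] * x[i]) <= budget, 0 <= x[i] <= 1.

    For this fractional-knapsack LP, greedily filling items in decreasing
    voi/cost order is optimal. All costs are assumed to be positive.
    """
    x = {i: 0.0 for i in voi}
    remaining = budget
    for i in sorted(voi, key=lambda i: voi[i] / cost[i], reverse=True):
        if remaining <= 0:
            break
        # Take as much of UGS i as the remaining endurance budget allows.
        x[i] = min(1.0, remaining / cost[i])
        remaining -= x[i] * cost[i]
    return x

# Hypothetical example: three UGS with predicted VoI and endurance cost per visit.
alloc = fractional_allocation(
    {"a": 10.0, "b": 6.0, "c": 4.0}, {"a": 5.0, "b": 2.0, "c": 4.0}, budget=5.0
)
```

A fractional x_i can be read as a visitation priority; turning the selected subset into an actual closed path is a separate routing step.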
Learning to Exploit Time-Varying Heterogeneity in Distributed Sensing using the Information Exposure Rate
We consider the problem of ferrying data between nodes of a sparsely distributed sensing network of Unattended Ground Sensors (UGS) with endurance-constrained Unmanned Aerial Systems (UAS). The sensing domain in which the sparsely distributed UGS network is deployed is assumed to be highly nonstationary (time-varying) and noisy. This greatly complicates the data-ferrying problem, as the expected value-of-information at a sensing location can change rapidly. To address this issue, we present a new class of data-ferrying and persistent exploration algorithms termed Exploitation by Informed Exploration between Isolated Operatives (EIEIO), and show that, under several reasonable assumptions and a model of the predicted accumulation of value-of-information, the problem can be simplified to a linear program. To solve the linear program, the UAS learns to anticipate the regions of the sensing domain with the highest degree of change. The degree of change is learned using a novel implementation of a Cox process called the Cox-Gaussian Process (CGP). Our approach does not require a priori knowledge of the sensing-domain model to arrive at an optimal UAS allocation strategy.
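To illustrate learning which locations change most, the sketch below uses a conjugate Gamma-Poisson model: with a Gamma(shape a, rate b) prior on a location's event rate, observing k events over exposure time t yields a Gamma(a + k, b + t) posterior. This is a deliberately simplified stand-in that treats locations independently; the CGP described above is a Cox process driven by a Gaussian process, which this sketch does not attempt. The class and method names, the prior values, and the observations are assumptions.

```python
from dataclasses import dataclass

@dataclass
class GammaPoissonRate:
    """Gamma(shape, rate) posterior over the Poisson rate of change events at one location."""
    shape: float = 1.0
    rate: float = 1.0

    def update(self, count, exposure):
        """Condition on `count` events observed over `exposure` time units (conjugate update)."""
        self.shape += count
        self.rate += exposure

    def mean(self):
        """Posterior mean of the event rate."""
        return self.shape / self.rate

    def variance(self):
        """Posterior variance of the event rate."""
        return self.shape / self.rate ** 2

# Hypothetical observations: (events, exposure time) recorded at two UGS.
models = {"ugs1": GammaPoissonRate(), "ugs2": GammaPoissonRate()}
models["ugs1"].update(count=9, exposure=4.0)
models["ugs2"].update(count=2, exposure=4.0)
# Rank UGS by predicted rate of change, highest first.
ranking = sorted(models, key=lambda k: models[k].mean(), reverse=True)
```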
Video Descriptions: The videos depict the available Kullback-Leibler (KL) divergence in the Intel Berkeley Data Set in green and the modeled KL divergence in blue; the green circles turn red when they are selected in a particular episode.
Left Video: This video shows the performance of sequentially sampling the sensing locations in batches of 6.
Right Video: This video shows the performance of the Real-time Adaptive Prediction of Time-varying and Obscure Rewards (RAPTOR) sampling algorithm, where locations are selected in batches of 6.
Airborne Detection and Tracking of Geologic Leakage Sites
Safe storage of CO2 to reduce greenhouse gas emissions, without adversely affecting energy use or hindering economic growth, requires the development of monitoring technology capable of validating storage permanence while ensuring the integrity of sequestration operations. Soil gas monitoring has difficulty accurately distinguishing gas flux signals related to leakage from those associated with meteorologically driven changes in soil moisture and temperature. Integrated ground and airborne monitoring systems capable of directly detecting CO2 concentrations at storage sites are being deployed. Two complementary approaches to detecting leaks in carbon sequestration fields are presented. The first approach focuses on reducing the network communication required to fuse individual Gaussian Process (GP) CO2 sensing models into a global GP CO2 model. The GP fusion approach learns how to optimally allocate the static and mobile sensors. The second approach leverages a hierarchical GP-Sigmoidal Gaussian Cox Process for airborne predictive mission planning to optimally reduce the entropy of the global CO2 model. Results from the approaches will be presented.
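The abstract does not specify the fusion rule. One common way to combine Gaussian predictive distributions from several local GP models at the same location is a product-of-experts rule, in which precisions add and means are precision-weighted. The sketch below shows that rule together with the differential entropy 0.5 ln(2 pi e sigma^2), which shrinks as the fused variance shrinks. This is an assumed stand-in with illustrative names and numbers, not the GP fusion or GP-Sigmoidal Gaussian Cox Process approach from the abstract.

```python
import math

def fuse_gaussians(predictions):
    """Product-of-experts fusion of (mean, variance) predictions for one location."""
    precision = sum(1.0 / var for _, var in predictions)
    mean = sum(mu / var for mu, var in predictions) / precision
    return mean, 1.0 / precision

def gaussian_entropy(variance):
    """Differential entropy (nats) of a univariate Gaussian with the given variance."""
    return 0.5 * math.log(2 * math.pi * math.e * variance)

# Hypothetical CO2 predictions (ppm, ppm^2) from two local sensing models at one location.
mean, var = fuse_gaussians([(410.0, 4.0), (414.0, 4.0)])
```

This rule treats the local models as independent, so when they share data the fused variance can be overconfident.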
Paper | Bibtex
Authors & Details:
2014, A. Axelrod, Abstract, APS, Autonomy, C. Brown, Data Ferrying, Entropy Reduction, G. Chowdhary, GP Fusion, IC, IG, Information Theory, J. Jacob, Leak Detection, ML, Nonstationarity, R. Allamraju, RL, T. Mitchell
Adaptive Algorithms for Autonomous Data-Ferrying in Nonstationary Environments
Unattended ground sensors (UGS) in long-term distributed sensing deployments benefit greatly from the incorporation of unmanned aerial systems (UAS). For instance, the mobility of data-ferrying UAS may be leveraged to reduce the cost of communication between UGS, as well as extend the effective coverage and endurance of the distributed UGS network. Since the UAS are also limited in endurance, a UAS may only ferry data between a subset of the UGS during each sortie. This is particularly problematic for extended operations in nonstationary spatio-temporal domains, as the model obtained from the set of UGS may rapidly lose relevance. Moreover, the informativeness of, or the Value-of-Information (VoI) available at, each UGS may not be equal. Our approach, termed Exploitation by Informed Exploration between Isolated Operatives (EIEIO), learns a generative spatio-temporal model for the arrival of VoI at each UGS. Through EIEIO, we anticipate and prioritize the subset of UGS with the highest VoI for each data ferrying sortie. Furthermore, a lower bound on the requisite sampling time for homogeneous Poisson processes is leveraged to provide a bound on how many times the UAS must visit each UGS in order to learn a spatio-temporal VoI model.
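The specific bound is not reproduced in the abstract. As a hedged illustration of why a sampling-time requirement for a homogeneous Poisson process translates into a visit count, consider the rate estimate N/T with N ~ Poisson(lambda T): its variance is lambda / T, so its relative standard deviation is 1 / sqrt(lambda T), and requiring this to be at most epsilon gives T >= 1 / (lambda epsilon^2). Dividing by an assumed exposure time per visit gives a minimum number of visits. The sketch assumes a known lower bound on lambda; the function names and this relative-error criterion are illustrative, not the paper's bound.

```python
import math

def min_exposure_time(rate_lower_bound, rel_error):
    """Smallest T such that the relative std. dev. 1/sqrt(lambda * T) of N/T is <= rel_error."""
    return 1.0 / (rate_lower_bound * rel_error ** 2)

def min_visits(rate_lower_bound, rel_error, exposure_per_visit):
    """Visits needed when each UGS visit accumulates `exposure_per_visit` time units of data."""
    return math.ceil(min_exposure_time(rate_lower_bound, rel_error) / exposure_per_visit)

# Hypothetical numbers: rate at least 2 events per unit time, 10% relative error, 7 units per visit.
visits = min_visits(2.0, 0.1, 7.0)
```

Because T scales with 1/lambda, the least active UGS dominates the requirement, which is why a lower bound on the rate is needed.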