The key challenge for learning-based autonomous systems operating in time-varying environments is to predict when the learned model may lose relevance. If the learned model loses relevance, the autonomous system is at risk of making wrong decisions. The entropic value at risk (EVAR) is a computationally efficient, coherent risk measure that can be used to quantify this risk. In this paper, we present a Bayesian model and learning algorithms to predict the state-dependent EVAR of time-varying datasets. We discuss applications of EVAR to an exploration problem in which an autonomous agent must choose a set of sensing locations to maximize the informativeness of the acquired data and learn a model of an underlying phenomenon of interest. We empirically demonstrate the efficacy of the presented model and learning algorithms on four real-world datasets.
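For readers unfamiliar with EVAR: under its standard definition, the EVAR of a random loss X at confidence level 1 - alpha is EVAR_{1-alpha}(X) = inf over z > 0 of (ln M_X(z) - ln alpha) / z, where M_X is the moment-generating function. The sketch below is a plug-in estimate from i.i.d. samples, assuming an empirical MGF; it is not the state-dependent Bayesian model of the paper, and the function names are hypothetical. It relies on the fact that, with t = 1/z, the objective t * (ln M(1/t) - ln alpha) is convex in t, so a one-dimensional golden-section search suffices, which is one reason EVAR is cheap to compute.

```python
import math
from typing import Sequence


def log_empirical_mgf(samples: Sequence[float], z: float) -> float:
    """Numerically stable log of (1/n) * sum_i exp(z * x_i) via log-sum-exp."""
    scaled = [z * x for x in samples]
    peak = max(scaled)
    total = sum(math.exp(s - peak) for s in scaled)
    return peak + math.log(total) - math.log(len(samples))


def empirical_evar(samples: Sequence[float], alpha: float,
                   t_lo: float = 1e-6, t_hi: float = 1e4,
                   iterations: int = 200) -> float:
    """Hypothetical plug-in EVAR_{1-alpha}: inf_{z>0} (ln M(z) - ln alpha) / z.

    Searches over t = 1/z in [t_lo, t_hi], where the objective is convex,
    so golden-section search converges to the minimum on that interval.
    """
    if not samples:
        raise ValueError("need at least one sample")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    log_alpha = math.log(alpha)

    def objective(t: float) -> float:
        return t * (log_empirical_mgf(samples, 1.0 / t) - log_alpha)

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = t_lo, t_hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = objective(c), objective(d)
    for _ in range(iterations):
        if fc <= fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = objective(c)
        else:  # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = objective(d)
    # Degenerate cases (e.g. constant samples) put the minimum at an endpoint.
    return min(fc, fd, objective(t_lo), objective(t_hi))
```

The estimate always lies between the sample mean (alpha = 1) and the sample maximum (alpha approaching 0), and it grows as alpha shrinks, which is what makes it usable as a tunable measure of tail risk.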
Publications
Learning How-to-Learn While On-the-Job
The Explore-Exploit Dilemma in Nonstationary Decision Making Under Uncertainty
Chapter 2 of Handling Uncertainty and Networked Structure in Robot Control
It is often assumed that autonomous systems operate in environments that can be described by a stationary (time-invariant) model. However, real-world environments are often nonstationary (time-varying): the underlying phenomena change over time, so stationary approximations of a nonstationary environment may quickly lose relevance. Here, two approaches are presented and applied in the context of reinforcement learning in nonstationary environments. In Section 2, the first approach applies reinforcement learning in the presence of a changing reward model. In particular, a functional termed the Fog-of-War is used to drive exploration, which results in the timely discovery of new models in nonstationary environments. In Section 3, the Fog-of-War functional is adapted in real time to reflect the heterogeneous information content of a real-world environment; this adaptation is critical for applying the approach of Section 2 in real-world environments.
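The summary above does not define the Fog-of-War functional. Purely as a hypothetical stand-in, the sketch below scores each location by how long it has gone unobserved, so that stale regions, where a nonstationary model is most likely to have drifted, attract exploration. The names `fog_level` and `choose_location` and the timescale `tau` are assumptions for illustration, not the chapter's definition.

```python
import math
from typing import Dict


def fog_level(time_since_visit: float, tau: float) -> float:
    """Hypothetical fog in [0, 1): 0 right after a visit, rising toward 1
    as the last observation grows stale on timescale tau."""
    return 1.0 - math.exp(-time_since_visit / tau)


def choose_location(estimated_reward: Dict[str, float],
                    last_visit: Dict[str, float],
                    now: float, tau: float = 10.0,
                    weight: float = 1.0) -> str:
    """Pick the location maximizing estimated reward plus a fog bonus.

    The additive bonus is an assumed exploration rule, not the chapter's.
    """
    def score(loc: str) -> float:
        return estimated_reward[loc] + weight * fog_level(now - last_visit[loc], tau)
    return max(estimated_reward, key=score)
```

With equal staleness the agent exploits the higher estimated reward; a location that has not been seen for a long time can win even with a lower estimate.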
A Hybridized Bayesian Parametric-Nonparametric Approach to the Pure Exploration Problem
Information-driven approaches to reinforcement learning (RL) and bandit problems largely rely on optimizing an expectation of computed Kullback-Leibler (KL) divergence values. Although KL divergence may provide bounds on problem-domain models, information-driven approaches lack bounds on the expected KL divergence itself. We therefore focus our investigation on the pure exploration problem, a key component of RL and bandit problems, in which the objective is to efficiently gain knowledge about the problem domain. For this task, we develop an algorithm using a Poisson exposure process Cox Gaussian process (Pep-CGP), a hybridized Bayesian parametric-nonparametric Lévy process, and theoretically derive a bound on the Pep-CGP expectation of KL divergence. Our algorithm, Real-time Adaptive Prediction of Time-varying and Obscure Rewards (RAPTOR), is validated on four real-world datasets, on which RAPTOR outperforms baseline pure exploration approaches.
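Under a Poisson observation model, the KL divergence between two rate estimates has the closed form KL(Poisson(p) || Poisson(q)) = p ln(p / q) + q - p. The sketch below uses it to rank locations by how far a new rate estimate has moved from the old one. It is an assumed simplification: it omits the Pep-CGP posterior and the paper's bound on the expected KL divergence, and `rank_by_kl` is a hypothetical helper.

```python
import math
from typing import Dict, List


def poisson_kl(rate_p: float, rate_q: float) -> float:
    """KL(Poisson(rate_p) || Poisson(rate_q)) in nats; both rates must be > 0."""
    if rate_p <= 0.0 or rate_q <= 0.0:
        raise ValueError("rates must be positive")
    return rate_p * math.log(rate_p / rate_q) + rate_q - rate_p


def rank_by_kl(old_rates: Dict[str, float],
               new_rates: Dict[str, float], k: int) -> List[str]:
    """Hypothetical ranking: the k locations whose rate changed most, in KL."""
    scores = {loc: poisson_kl(new_rates[loc], old_rates[loc]) for loc in old_rates}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Because KL divergence is asymmetric, a rate that triples scores higher than one that halves, even though both are large relative changes.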
Uninformed-to-Informed Exploration in Unstructured Real-World Environments
Conventionally, the process of learning the model (exploration) is initialized with either an uninformed or an informed policy, where the latter leverages observations to guide future exploration. Informed exploration is ideal because it may allow a model to be learned from fewer samples. However, informed exploration cannot be used from the outset when a priori knowledge of the sensing-domain statistics is unavailable; such a policy would repeatedly sample only the first set of locations. Hence, we present a theoretically derived bound for transitioning from uninformed to informed exploration in unstructured real-world environments, which may be partially observable and time-varying. This bound is used in tandem with a sparsified Bayesian nonparametric Poisson Exposure Process, which learns to predict the value of information in partially observable and time-varying domains. The result is an uninformed-to-informed exploration policy that outperforms baseline exploration algorithms on real-world datasets.
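A minimal sketch of the switching idea follows. It assumes a fixed per-location visit count `min_visits` as a hypothetical stand-in for the paper's derived transition bound, and a sample-mean rate estimate in place of the sparsified Poisson Exposure Process. The uninformed phase is what prevents the failure mode described above, in which an informed policy keeps resampling only its first locations.

```python
from typing import Dict, List


def select_batch(visits: Dict[str, int], events: Dict[str, int],
                 batch_size: int, min_visits: int) -> List[str]:
    """Uninformed until every location has min_visits samples, then informed.

    min_visits is a hypothetical threshold, not the paper's bound.
    """
    if min_visits < 1:
        raise ValueError("min_visits must be at least 1")
    locations = sorted(visits)
    if min(visits.values()) < min_visits:
        # Uninformed phase: cover the least-visited locations (ties by name).
        return sorted(locations, key=lambda loc: (visits[loc], loc))[:batch_size]
    # Informed phase: rank by estimated events per visit.
    rate = {loc: events[loc] / visits[loc] for loc in locations}
    return sorted(locations, key=lambda loc: (-rate[loc], loc))[:batch_size]
```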
Exploitation by Informed Exploration between Isolated Operatives for Information-theoretic Data Harvesting
We consider the problem of ferrying data between nodes of a sparsely distributed sensing network of Unattended Ground Sensors (UGS) with endurance-constrained Unmanned Aerial Systems (UAS). The sensing domain in which the sparsely distributed UGS network is deployed is assumed to be highly nonstationary (time-varying) and noisy. This makes the data-ferrying problem very complicated, as the expected value-of-information at a sensing location can change rapidly. To address this issue, we present a new data-ferrying algorithm termed Exploitation by Informed Exploration between Isolated Operatives (EIEIO), and show that, under several reasonable assumptions and a model of the predicted accumulation of value-of-information, the problem can be simplified to a linear program. To solve the linear program, the UAS learns to anticipate the regions of the sensing domain with the highest degree of change. The degree of change is learned using a novel implementation of a Cox process called the Cox-Gaussian Process (CGP). Our approach does not require a priori knowledge of the sensing-domain model to arrive at an optimal UAS allocation strategy.
Given knowledge of the information content of all sampling locations, a closed path can be formed so that the data-ferrying agent visits the most informative subset of data-source locations.
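One simple way to form such a closed path is sketched below: keep the `k` most informative locations, then order them with a nearest-neighbor tour that starts and ends at the base. Both the top-`k` selection and the nearest-neighbor heuristic are assumptions for illustration (nearest-neighbor tours are generally not optimal), and `closed_ferry_path` is a hypothetical name.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def closed_ferry_path(base: Point, sites: Dict[str, Point],
                      value: Dict[str, float], k: int) -> List[str]:
    """Hypothetical planner: visit the k highest-value sites via a
    nearest-neighbor tour. The order starts and ends at "base"."""
    chosen = sorted(sites, key=lambda s: (-value[s], s))[:k]
    tour, here = ["base"], base
    remaining = set(chosen)
    while remaining:
        # Greedily fly to the closest unvisited chosen site (ties by name).
        nxt = min(remaining, key=lambda s: (math.dist(here, sites[s]), s))
        tour.append(nxt)
        here = sites[nxt]
        remaining.remove(nxt)
    tour.append("base")
    return tour
```

Selecting the subset first and routing second keeps the sketch simple; a planner that trades information against flight distance jointly would be closer to an orienteering formulation.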
Gaussian Process based Subsumption of a Parasitic Control Component
Many existing control architectures assume that the main control system being designed is the only controller that governs a system's actuators. However, with the increasing availability of off-the-shelf control packages, the number of internal, unadjustable control systems is increasing. Some of these control systems may behave parasitically by enforcing a rigid set of behaviors that can disrupt the desired system behavior. We present a control architecture that can subsume parasitic control behavior by iteratively shaping the main control command with an intelligent feed-forward term. Our architecture requires very little prior knowledge of the subsystem whose behavior is to be subsumed; instead, it relies on online-learned, sparsified predictive Gaussian Process (GP) models. We provide rigorous, quantifiable bounds relating the sparsification of the GP to the accuracy of estimating and subsuming the parasitic subsystem. The subsumption architecture is realized using a variant of D-type iterative learning control (ILC) and is validated through a series of flight tests on a Parrot AR Drone 2.0 quadrotor, in which the behavior of the quadrotor's sonar-based altitude control loop (maintaining a fixed altitude over ground surfaces) is subsumed by a main controller via a feed-forward term.
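A minimal sketch of the iterative feed-forward idea, assuming a hypothetical first-order altitude model in which a parasitic loop pulls the output toward a fixed set point `h0`. It uses a one-step-ahead (P-type) learning update, u_{k+1}(t) = u_k(t) + gamma * e_k(t+1), a common discrete-time stand-in rather than the paper's D-type variant, and it omits the sparsified GP model entirely. For this plant, the trial-to-trial error converges when |1 - gamma * b| < 1.

```python
from typing import List, Tuple


def simulate(u: List[float], y0: float = 0.0, b: float = 1.0,
             kappa: float = 0.5, h0: float = 1.0) -> List[float]:
    """Hypothetical plant: y(t+1) = y(t) + b*u(t) + kappa*(h0 - y(t)).

    The kappa term models a parasitic altitude-hold loop that cannot be
    switched off. Returns y(1), ..., y(N) for inputs u(0), ..., u(N-1).
    """
    y, outputs = y0, []
    for u_t in u:
        y = y + b * u_t + kappa * (h0 - y)
        outputs.append(y)
    return outputs


def learn_feedforward(reference: List[float], gamma: float = 0.8,
                      trials: int = 100) -> Tuple[List[float], List[float]]:
    """Repeat the task, shaping the feed-forward input from the last trial's error.

    reference[t] is the desired output y(t+1), so errors[t] = e(t+1) and the
    update u(t) += gamma * e(t+1) is the one-step-ahead learning law.
    """
    u = [0.0] * len(reference)
    for _ in range(trials):
        errors = [r - y for r, y in zip(reference, simulate(u))]
        u = [u_t + gamma * e_t for u_t, e_t in zip(u, errors)]
    final_errors = [r - y for r, y in zip(reference, simulate(u))]
    return u, final_errors
```

After enough trials, the learned input cancels the pull toward `h0` without the learner ever modifying the parasitic loop itself, which is the sense in which the behavior is subsumed.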
Authors & Details:
2015, A. Axelrod, ACC, Autonomy, BNPs, Conference, G. Chowdhary, H. Kingravi, ILC, ML, Parasitic Control, Robotics, ROS, Sparse BNPs
Learning to Exploit Time-Varying Heterogeneity in Distributed Sensing using the Information Exposure Rate
We consider the problem of ferrying data between nodes of a sparsely distributed sensing network of Unattended Ground Sensors (UGS) with endurance-constrained Unmanned Aerial Systems (UAS). The sensing domain in which the sparsely distributed UGS network is deployed is assumed to be highly nonstationary (time-varying) and noisy. This makes the data-ferrying problem very complicated, as the expected value-of-information at a sensing location can change rapidly. To address this issue, we present a new class of data-ferrying and persistent exploration algorithms termed Exploitation by Informed Exploration between Isolated Operatives (EIEIO), and show that, under several reasonable assumptions and a model of the predicted accumulation of value-of-information, the problem can be simplified to a linear program. To solve the linear program, the UAS learns to anticipate the regions of the sensing domain with the highest degree of change. The degree of change is learned using a novel implementation of a Cox process called the Cox-Gaussian Process (CGP). Our approach does not require a priori knowledge of the sensing-domain model to arrive at an optimal UAS allocation strategy.
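The linear program is not stated in the abstract. As an illustrative assumption, suppose it reads: maximize sum_i v_i x_i subject to sum_i c_i x_i <= B and 0 <= x_i <= 1, where v_i is the predicted VoI at UGS i, c_i the endurance cost of servicing it, B the sortie budget, and x_i the relaxed decision to service node i. With a single budget constraint this LP is a fractional knapsack, which a greedy ratio rule solves exactly, so the hypothetical `allocate_sortie` below needs no LP solver.

```python
from typing import Dict


def allocate_sortie(voi: Dict[str, float], cost: Dict[str, float],
                    budget: float) -> Dict[str, float]:
    """Solve the assumed LP: max sum(v_i x_i) s.t. sum(c_i x_i) <= budget,
    0 <= x_i <= 1. Greedy by VoI per unit cost is optimal for this
    single-constraint LP (fractional knapsack). Costs must be positive."""
    if any(c <= 0.0 for c in cost.values()):
        raise ValueError("costs must be positive")
    x = {node: 0.0 for node in voi}
    remaining = budget
    for node in sorted(voi, key=lambda n: voi[n] / cost[n], reverse=True):
        if remaining <= 0.0 or voi[node] <= 0.0:
            break  # budget spent, or no positive VoI left to collect
        x[node] = min(1.0, remaining / cost[node])
        remaining -= x[node] * cost[node]
    return x
```

At most one node ends up with a fractional value; a deployed planner would need a rounding or integer-programming step to turn that into an actual visit schedule.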
Video Descriptions: The videos depict the available Kullback-Leibler (KL) divergence in the Intel Berkeley Data Set in green and the modeled KL divergence in blue; the green circles turn red when they are selected in a given episode.
Left Video: This video shows the performance of sequentially sampling the sensing locations in batches of 6.
Right Video: This video shows the performance of the Real-time Adaptive Prediction of Time-varying and Obscure Rewards (RAPTOR) sampling algorithm, where locations are selected in batches of 6.
Collaborative Goal and Policy Learning from Human Operators of Construction Co-Robots
Human operators of real-world co-robots, such as excavators, require extensive experience to skillfully handle these complicated machines in uncertain, safety-critical environments. We consider the problem of human-robot collaborative learning and task execution, where efficient human-robot interaction is critical to accomplishing complex tasks safely and efficiently in uncertain environments. Our collaborative learning algorithm enables a construction co-robot to learn latent task subgoals from the demonstrations of skilled human operators, which can then be used to guide novice human operators in completing complex tasks under uncertainty. The effectiveness of our algorithm is demonstrated through experiments on a scaled model of an excavator with guided and unguided human operators. Our results demonstrate that when the co-robot's inferred subgoals are communicated back to the novice human operator, task performance improves significantly.
Airborne Detection and Tracking of Geologic Leakage Sites
Safe storage of CO2 to reduce greenhouse gas emissions without adversely affecting energy use or hindering economic growth requires the development of monitoring technology capable of validating storage permanence while ensuring the integrity of sequestration operations. Soil gas monitoring has difficulty accurately distinguishing gas flux signals related to leakage from those associated with meteorologically driven changes in soil moisture and temperature. Integrated ground and airborne monitoring systems capable of directly detecting CO2 concentrations at storage sites are being deployed. Two complementary approaches to detecting leaks in carbon sequestration fields are presented. The first approach focuses on reducing the network communication required to fuse individual Gaussian Process (GP) CO2 sensing models into a global GP CO2 model; this GP fusion approach learns how to optimally allocate the static and mobile sensors. The second approach leverages a hierarchical GP-Sigmoidal Gaussian Cox Process for airborne predictive mission planning that optimally reduces the entropy of the global CO2 model. Results from both approaches will be presented.
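One standard way to fuse independent Gaussian predictions of the same quantity (for example, the CO2 concentration at one point as predicted by several local GP models) is a product-of-experts, precision-weighted combination; the differential entropy of a Gaussian, 0.5 * ln(2 * pi * e * variance), then measures how uncertain the fused estimate is. The sketch assumes the local predictions are independent, which the communication-reducing fusion scheme above may not assume, and the function names are hypothetical.

```python
import math
from typing import List, Tuple


def fuse_gaussian_predictions(preds: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Product-of-experts fusion of (mean, variance) pairs at one location.

    Assumes independent experts; precisions add, and the fused mean is the
    precision-weighted average of the expert means.
    """
    if not preds or any(var <= 0.0 for _, var in preds):
        raise ValueError("need at least one prediction with positive variance")
    precision = sum(1.0 / var for _, var in preds)
    mean = sum(mu / var for mu, var in preds) / precision
    return mean, 1.0 / precision


def gaussian_entropy(variance: float) -> float:
    """Differential entropy (nats) of a 1-D Gaussian with the given variance."""
    return 0.5 * math.log(2.0 * math.pi * math.e * variance)
```

Because fused precision is the sum of the expert precisions, every added independent sensor lowers the fused variance and hence the entropy, which is the quantity an entropy-reducing mission planner would target.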
Authors & Details:
2014, A. Axelrod, Abstract, APS, Autonomy, C. Brown, Data Ferrying, Entropy Reduction, G. Chowdhary, GP Fusion, IC, IG, Information Theory, J. Jacob, Leak Detection, ML, Nonstationarity, R. Allamraju, RL, T. Mitchell
Adaptive Algorithms for Autonomous Data-Ferrying in Nonstationary Environments
Unattended ground sensors (UGS) in long-term distributed sensing deployments benefit greatly from the incorporation of unmanned aerial systems (UAS). For instance, the mobility of data-ferrying UAS may be leveraged to reduce the cost of communication between UGS, as well as extend the effective coverage and endurance of the distributed UGS network. Since the UAS are also limited in endurance, a UAS may only ferry data between a subset of the UGS during each sortie. This is particularly problematic for extended operations in nonstationary spatio-temporal domains, as the model obtained from the set of UGS may rapidly lose relevance. Moreover, the informativeness of, or the Value-of-Information (VoI) available at, each UGS may not be equal. Our approach, termed Exploitation by Informed Exploration between Isolated Operatives (EIEIO), learns a generative spatio-temporal model for the arrival of VoI at each UGS. Through EIEIO, we anticipate and prioritize the subset of UGS with the highest VoI for each data ferrying sortie. Furthermore, a lower bound on the requisite sampling time for homogeneous Poisson processes is leveraged to provide a bound on how many times the UAS must visit each UGS in order to learn a spatio-temporal VoI model.
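For intuition about why such a visit bound exists: if VoI arrives at a UGS as a homogeneous Poisson process with rate lambda, the maximum-likelihood rate estimate after observing for time T is N(T) / T, whose relative standard deviation is 1 / sqrt(lambda * T). Requiring that to be at most epsilon gives T >= 1 / (lambda * epsilon^2). This is a simple variance-based heuristic, not the lower bound used by EIEIO, and `visits_needed` with its per-visit `dwell_time` is a hypothetical illustration.

```python
import math


def required_observation_time(rate: float, rel_std: float) -> float:
    """Smallest T with 1/sqrt(rate*T) <= rel_std, i.e. T = 1/(rate*rel_std**2).

    Variance-based heuristic for a homogeneous Poisson process, not EIEIO's bound.
    """
    if rate <= 0.0 or rel_std <= 0.0:
        raise ValueError("rate and rel_std must be positive")
    return 1.0 / (rate * rel_std ** 2)


def visits_needed(rate: float, rel_std: float, dwell_time: float) -> int:
    """Hypothetical count of visits of length dwell_time that reach that time."""
    return math.ceil(required_observation_time(rate, rel_std) / dwell_time)
```

Note the inverse dependence on the rate: UGS with rare VoI arrivals need far more visits before their rate can be estimated to the same relative precision.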
Human Aware UAS Path Planning in Urban Environments using Nonstationary MDPs
A growing concern with deploying Unmanned Aerial Vehicles (UAVs) in urban environments is the potential violation of human privacy and the backlash this could entail. Therefore, there is a need for UAV path-planning algorithms that minimize the likelihood of invading human privacy. Such algorithms would be useful for pipeline and agricultural surveys, wildfire monitoring, and other missions where surveillance of humans should be avoided. We formulate the problem of human-aware path planning as a nonstationary Markov Decision Process and provide a novel model-based reinforcement learning solution that leverages Gaussian process clustering. Our algorithm is flexible enough to accommodate changes in human population densities and, unlike competing approaches employing Bayesian nonparametrics, is computable in real time. The approach is validated experimentally in a large-scale, long-duration experiment with both simulated and real UAVs.
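A minimal sketch of privacy-aware planning on a grid, assuming a known per-cell human-density map and deterministic moves. The Gaussian-process clustering that would supply and update that map is not reproduced, and `plan_path` with its `privacy_weight` parameter is hypothetical. Handling nonstationarity here simply means rerunning the planner whenever the density map changes.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def plan_path(density: List[List[float]], start: Cell, goal: Cell,
              privacy_weight: float = 10.0, step_cost: float = 1.0) -> List[Cell]:
    """Value iteration on a 4-connected grid (hypothetical planner).

    Entering a cell costs step_cost + privacy_weight * density of that cell.
    """
    rows, cols = len(density), len(density[0])
    cells = [(r, c) for r in range(rows) for c in range(cols)]

    def neighbors(cell: Cell) -> List[Cell]:
        r, c = cell
        steps = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
        return [(i, j) for i, j in steps if 0 <= i < rows and 0 <= j < cols]

    def cost(cell: Cell) -> float:
        return step_cost + privacy_weight * density[cell[0]][cell[1]]

    # value[s] = best achievable value (negative total cost) from s to the goal.
    value: Dict[Cell, float] = {cell: float("-inf") for cell in cells}
    value[goal] = 0.0
    for _ in range(len(cells)):  # enough sweeps for costs to propagate
        for cell in cells:
            if cell != goal:
                value[cell] = max(value[n] - cost(n) for n in neighbors(cell))

    # Greedy rollout of the converged values; costs are positive, so it
    # strictly increases in value and cannot cycle.
    path = [start]
    while path[-1] != goal:
        path.append(max(neighbors(path[-1]), key=lambda n: value[n] - cost(n)))
    return path
```

With a crowded center cell, the planner takes an equally short detour around the border instead of flying over people.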
Authors & Details:
2013, 2014, A. Axelrod, Autonomy, C. Crick, Conference, G. Chowdhary, H. Kingravi, ICRA, ML, NIPS, Nonstationarity, R. Allamraju, R. Grande, RL, Robotics, ROS, Sparse GPs, W. Sheng, Workshop
Method and Apparatus for Testing Quality of Seal and Package Integrity
Embodiments relate to a method and apparatus for determining information relating to a leak in a package. In an embodiment, a solenoid/gravity system is used to rapidly pressurize a flexible package to a desired pressure and to rapidly withdraw the pressurizing agent, where another solenoid is used to rapidly and retractably impact a region on the package under test. Sensors are used to sense data corresponding to a wave in the package generated from the region of impact. The data is acquired and processed to determine information regarding a leak in the package, such as whether there is a leak in the package under test, the size of the leak, and/or the location of the leak.