The task of the IHM is to develop a planning optimization based on reinforcement learning and to make it available for ML-based optimal control. It is particularly challenging to comprehensively capture the complex system of the emergency room, with its large number of actors and technical devices. The first step is therefore to identify the different fields of application and suitable methods; the methods suitable for the NotPASS system are identified by reproducing selected results.
Bayesian reinforcement learning (BRL) has established itself as a state-of-the-art method for data-efficient, ML-based optimal control. However, the PILCO (Probabilistic Inference for Learning Control) method is not generally applicable to complex inverse tasks, owing to its use of standard Gaussian processes and inflexible cost functionals and controllers in the context of model validation. Deep reinforcement learning (DRL) methods of this kind model the temporal dynamics with an ensemble of probabilistic neural networks and, based on model predictive control, enable data-efficient control of dynamic systems. Both methods rest on the direct minimization of a predefined cost functional, which contains multiple concatenations of the ML model with itself. For high-dimensional problems, and even with a moderate number of samples, this leads to runtime and convergence problems, so a new approach to ML-based optimal control is to be pursued in this joint project.

To overcome the aforementioned limitations of BRL, the joint project aims to extend the Proximal Policy Optimization (PPO) algorithm so that arbitrary cost functionals can be approximated with sufficient accuracy. Instead of the usual artificial neural networks (ANNs), probabilistic ML models are to be used as actors and critics, taking model uncertainties into account. The novel ML method Deep Gaussian Covariance Networks (DGCN), developed by the applicants, uses ANNs in combination with Gaussian processes (GPs) to map highly multimodal, nonlinear, and "noisy" data. It is expected that DGCN will allow the class of admissible cost functionals to be extended with significantly less data and fewer samples.
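The ANN-plus-GP combination can be illustrated with a minimal numpy sketch of a nonstationary GP: an input-dependent lengthscale function (which in DGCN would be produced by a trained ANN) parameterizes a Gibbs-type covariance, and standard GP regression equations then yield a predictive mean and variance. Everything here is an illustrative assumption, not the applicants' implementation; in particular, the hand-coded `lengthscale` function merely stands in for the learned network.

```python
import numpy as np

def lengthscale(x):
    # Stand-in for the ANN in a DGCN-style model: maps each input to a
    # positive local lengthscale (hypothetical function for illustration).
    return 0.3 + 0.2 * np.tanh(x)

def gibbs_kernel(xa, xb):
    # Nonstationary RBF ("Gibbs") covariance with input-dependent
    # lengthscales; reduces to the standard RBF kernel if l(x) is constant.
    la = lengthscale(xa)[:, None]
    lb = lengthscale(xb)[None, :]
    s = la**2 + lb**2
    d = (xa[:, None] - xb[None, :])**2
    return np.sqrt(2.0 * la * lb / s) * np.exp(-d / s)

def gp_predict(x_train, y_train, x_test, noise=1e-4):
    # Standard GP regression: posterior mean and pointwise variance.
    K = gibbs_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = gibbs_kernel(x_test, x_train)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = np.diag(gibbs_kernel(x_test, x_test)) - np.sum(v**2, axis=0)
    return mean, var

x_train = np.linspace(-2.0, 2.0, 20)
y_train = np.sin(3.0 * x_train)          # toy data standing in for system dynamics
mean, var = gp_predict(x_train, y_train, np.array([-1.0, 0.0, 1.0]))
```

The predictive variance `var` is what makes such a model attractive as a probabilistic actor or critic: it quantifies where the model is uncertain, growing toward the prior variance away from the training data.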
To check the extrapolation and forecasting ability of the developed ML algorithms, validation experiments are carried out whose data can in turn be used for online learning of the ML models. In this context, the probabilistic properties of DGCN are to be used to make a reliable statement about the error probability of the model predictions. Using suitable prediction measures, cross-validation, and confidence intervals, the explainability of the data is to be analyzed, creating transparency about the extent to which the model is reliable in predicting and analyzing the data. The algorithms are to be continuously optimized over the course of the project so that the results can ultimately be transmitted to a feedback system. As the project progresses, the output of the optimization results will be adapted to make it usable for the NotPASS system.
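One standard way to combine cross-validation with confidence intervals, as described above, is to measure the empirical coverage of the model's predictive intervals on held-out folds: if the model's uncertainty estimates are reliable, roughly 95% of held-out targets should fall inside its 95% intervals. The following is a small self-contained sketch with a generic GP regressor and synthetic data; the kernel, noise level, and data are illustrative assumptions, not project results.

```python
import numpy as np

def rbf(xa, xb, l=0.5):
    # Stationary RBF kernel for a generic probabilistic regressor.
    return np.exp(-(xa[:, None] - xb[None, :])**2 / (2.0 * l**2))

def gp_predict(x_tr, y_tr, x_te, noise=1e-2):
    # GP posterior mean and predictive variance (including noise).
    K = rbf(x_tr, x_tr) + noise * np.eye(len(x_tr))
    Ks = rbf(x_te, x_tr)
    mean = Ks @ np.linalg.solve(K, y_tr)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1) + noise
    return mean, var

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-2.0, 2.0, 60))
y = np.sin(2.0 * x) + 0.1 * rng.normal(size=60)   # synthetic noisy observations

# 5-fold cross-validation: count how often the held-out target falls
# inside the model's 95% confidence interval (empirical coverage).
folds = np.array_split(rng.permutation(60), 5)
hits, total = 0, 0
for test_idx in folds:
    train_idx = np.setdiff1d(np.arange(60), test_idx)
    mean, var = gp_predict(x[train_idx], y[train_idx], x[test_idx])
    lo = mean - 1.96 * np.sqrt(var)
    hi = mean + 1.96 * np.sqrt(var)
    hits += int(np.sum((y[test_idx] >= lo) & (y[test_idx] <= hi)))
    total += len(test_idx)

coverage = hits / total  # close to 0.95 indicates well-calibrated uncertainty
```

A coverage far below the nominal level would flag overconfident predictions; such a check could feed the kind of transparency statement about model reliability envisaged for the NotPASS system.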