Outlier Resistant Time Series Operations via Qualitative Robustness and Saddle-Point Game Formalizations. A Review: Filtering and Smoothing

Time series operations are sought in numerous applications, while the observations used in such operations are generally contaminated by data outliers. The objective is thus to design outlier resistant or “robust” time series operations whose performance is characterized by stability in the presence versus the absence of data outliers. Such a design is guided by the theory of qualitative robustness and is completed by saddle-point game formalizations. The approach is used for the development of outlier resistant filtering and smoothing operations.


Introduction
The fundamental desirable characteristic of outlier resistant or "robust" time series operations is performance stability; that is, a robust statistical procedure should guarantee small performance deviations for small perturbations in the data generating stochastic process. Thus, statistical robustness may be qualitatively defined along the latter lines, where, for an analytical definition, the use of appropriate stochastic distance measures is essential. This qualitative definition is developed by the theory of qualitative robustness, while it is also intimately related to robust saddle-point game theoretic formalizations. The theory of qualitative robustness provides necessary conditions to be satisfied by robust operations, while the robust saddle-point game theoretic formalizations provide specific solutions within the qualitatively robust class of operations.
In this paper, we will review this composite construction of statistically robust operations. We will then present solutions for outlier resistant or robust filtering and smoothing.
The definition of qualitative robustness was first given by Hampel (1971), who considered only memoryless data processes. The definition was extended to include processes with memory, first by Papantoni-Kazakos and Gray (1979) and then by Cox (1978), Bustos et al. (1984) and Papantoni-Kazakos (1984a, 1984b, 1987). Solutions for outlier resistant prediction, filtering and smoothing were first developed by Tsaknakis et al. (1986, 1988), while an overview of the theory can be found in Kazakos et al. (1990). Extensions of the theory of qualitative robustness to include robust block encoders and quantizers have also been developed.
Consider a scalar real operation g(·) on n-dimensional data sequences, where g(·) could be, for example, a test function in hypothesis testing or a parameter estimate. Let h_0g and h_g denote respectively the density functions of the random variables g(X^n) and g(Y^n), where X^n is generated by f_0^n and Y^n is generated by f^n. Qualitative robustness (Definition 1) then requires that closeness of f^n to f_0^n, in an appropriate stochastic distance, imply closeness of h_g to h_0g. From this definition, we conclude that qualitative robustness is a local (around f_0^n) stability property, parallel to the continuity property of real functions.
The specific analytical properties of a qualitatively robust data operation are exhibited by the saddle-point game setting. Consider a payoff function f(x, y) whose arguments x and y are both real and scalar, and where x and y take values respectively in the subsets A and B of the real line R. Consider the metric d(u, v) = |u - v| on the real line, and let the subsets A and B both be convex with respect to that metric. Let at least one of those two subsets also be compact with respect to the metric, and let the payoff function f(x, y) be convex in x, concave in y, and continuous in x and y, with respect to the same metric. Then, the existence of a saddle-point solution (x*, y*) is guaranteed, such that f(x*, y) <= f(x*, y*) <= f(x, y*) for all x in A and all y in B. If, on the other hand, the function f(x, y) is not continuous in x and y, then the existence of a saddle-point solution is not generally guaranteed.
The continuity of the payoff function is thus an essential property for the guaranteed existence of a saddle-point solution. The same is true when, instead of the scalars x and y, we have density functions such as f^n and h_g in Definition 1. In the latter case, the metric |u - v| on the real line is replaced by a stochastic distance measure. A weak distance that also represents closeness in data sequences, and that best reflects the outlier model, is the Prohorov distance [10], with data distortion measure as in (1). The Prohorov distance with data distortion measure as in (1) is a metric; that is, it satisfies the triangular property. For classes of memoryless processes, the distance is identical to the classical Prohorov distance. For representing closeness in performance, the Vasershtein or Rho-Bar distances [10] are appropriate. Indeed, those two distances are strong, and they both bound the difference in expected error performance. The choice of the data distortion measure within the latter distances depends on the particular application, where a popular and useful such choice is the difference squared distortion measure; the resulting distance then reflects how closely h_0g and h_g fit each other. The definition of qualitative robustness, in conjunction with the Prohorov and Rho-Bar or Vasershtein distances, leads to constructive sufficient conditions that data operations should satisfy [2], [6], [7] and [10]. These conditions are included in Theorem 1 below, whose proof can be found in [2].
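To make the Vasershtein distance concrete, the sketch below computes the empirical one-dimensional Vasershtein (Wasserstein) distance with distortion measure |u - v| between two equal-size samples; for equal-size samples it reduces to the mean absolute difference of the order statistics. The function name and the finite-sample setting are illustrative assumptions; the paper itself works with distances between process distributions, not finite samples.

```python
import numpy as np

def wasserstein_1d(x, y):
    """Empirical 1-D Vasershtein distance with distortion |u - v|.

    For two samples of equal size n, the optimal coupling pairs the
    order statistics, so the distance is mean(|x_(i) - y_(i)|).
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    assert x.size == y.size, "equal-size samples assumed in this sketch"
    return float(np.mean(np.abs(x - y)))

# Identical samples are at distance 0; shifting every datum by a
# constant c shifts the distance by exactly c.
d0 = wasserstein_1d([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
d1 = wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])
```

Because this distance bounds differences in expected error performance, small values of it certify the performance stability that qualitative robustness demands.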
Theorem 1: Consider a scalar real operation g(x^n) on data sequences x^n of length n. Let g(x^n) be bounded, and such that:
i. If n is finite, then g(x^n) is pointwise continuous as a function of the data; that is, small perturbations of the elements of x^n induce small changes in the value g(x^n).
ii. If n is asymptotically large, and given some data generating density function f_0, then g(x^n) is pointwise asymptotically continuous at f_0. That is, given ε > 0 and η > 0, there exist δ > 0, positive integers m and n_0, and for each n > n_0 some set A^n in R^n whose probability under f_0 exceeds 1 - η, such that |g(x^n) - g(y^n)| < ε whenever x^n belongs to A^n and all but at most m elements of y^n lie within δ of the corresponding elements of x^n.
Then the operation g(·) is qualitatively robust at the density function f_0^n, where, in Definition 1, the distance d_1(·,·) on the data processes is the Prohorov distance with data distortion measure as in (1), and the distance on the output densities is either the Vasershtein or the Rho-Bar distance with distortion measure ρ(u, v) equal either to |u - v| or to some continuous function of |u - v|.
From Theorem 1, we conclude that, to be qualitatively robust, it suffices that a data operation be bounded and continuous. For data sequences of finite length, continuity is defined in the usual functional sense. For asymptotically large data sequences, continuity is defined as follows, at some data generating density function: if some sequence x^n is representative of the latter density function, in the sense that it belongs to a high-probability set A^n, and if the majority of the elements of another sequence y^n are close to the corresponding elements of x^n, then the values g(x^n) and g(y^n) of the data operation are close as well. Due to the above results, we conclude that linear operations are not qualitatively robust. This is so because such operations are not bounded, and because closeness between the majority of corresponding elements of two sequences does not guarantee closeness in the values of those operations. As may be deduced from the presentation in this section, qualitative robustness is a performance stability property, and its time series applications include prediction, interpolation and filtering or smoothing.
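The boundedness and continuity conditions of Theorem 1 can be illustrated with a hypothetical numerical sketch: a clipped sample mean (bounded and continuous in the data) against the plain sample mean (linear, hence unbounded). The clipping level c and the outlier magnitude are arbitrary choices made for the illustration.

```python
import numpy as np

def sample_mean(x):
    # Linear operation: unbounded, hence not qualitatively robust.
    return float(np.mean(x))

def clipped_mean(x, c=2.0):
    # Bounded (|output| <= c) and continuous in the data, in the
    # sense of Theorem 1, part i.
    return float(np.mean(np.clip(x, -c, c)))

clean = np.zeros(100)        # nominal data sequence
dirty = clean.copy()
dirty[0] = 1e6               # a single outlier in one datum

# The linear mean shifts by 1e6/100 = 1e4; the clipped mean can
# shift by at most 2c/n = 0.04 no matter how large the outlier is.
shift_linear = abs(sample_mean(dirty) - sample_mean(clean))
shift_robust = abs(clipped_mean(dirty) - clipped_mean(clean))
```

A single corrupted datum thus drives the linear operation arbitrarily far, while the bounded, continuous operation moves by a controlled amount, which is exactly the stability property qualitative robustness formalizes.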
Solutions for the latter time series operations require combining qualitative robustness with saddle-point game theoretic formalizations. In this paper, we present such solutions for non-causal filtering or smoothing, as well as for causal filtering.

Robust Filtering
The objective of either non-causal or causal filtering is the extraction of information carrying data from noisy observations. That is, the outcomes generated by an information process are estimated when distorted by interference from a noise process. We will assume that the relationship between the information and noise processes is additive. In the robust filtering problem, the information and noise processes are modeled by two disjoint classes, F_S and F_N, respectively. Arbitrary dimensionality probability density functions in the classes F_S and F_N are denoted f_S and f_N, respectively.
(Transactions on Networks and Communications; Volume 6, No. 1, February 2018. Copyright © Society for Science and Education, United Kingdom.)
Let f_0S and f_0N be two nominal, well known, stationary density functions, such that f_0S ∈ F_S and f_0N ∈ F_N. Let us assume that some density function f_S from the class F_S is a priori selected by the system designer to represent the information process throughout the overall observation interval, and let us denote by ..., X_-1, X_0, X_1, ... a random data sequence generated by f_S. We initially assume that the class F_S consists of f_0S only.
Let us denote by ..., W_-1, W_0, W_1, ... random noise data sequences, and let ..., Z_-1, Z_0, Z_1, ... be data sequences from the nominal noise density function f_0N. Given some number ε_N in (0, 1), let the class F_N of noise processes then be such that
W_k = (1 - ξ_k) Z_k + ξ_k V_k ;  P(ξ_k = 1) = ε_N,   (2)
where ..., V_-1, V_0, V_1, ... is a random sequence generated by an arbitrary dimensionality stationary density function, and the {ξ_k} are independent Bernoulli variables. The noise model in (2) represents the occurrence of outliers, with probability ε_N per datum.
Given f_S in F_S and f_N in F_N, we assume that the data sequences from f_S and f_N are additive and that f_S and f_N are mutually independent. Then, if ..., Y_-1, Y_0, Y_1, ... denote random observation sequences, we have
Y_k = X_k + W_k,
where X_k is generated by f_S and W_k is generated by f_N as in (2). Then, the supremum in (7) reduces to the search of the infimum in (11) below, where F denotes the class induced by f_0S and f_N. We consider the class F_N of noise processes, as described by the probability density functions these processes induce, and we select this class to be given by expression (12) below.
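A minimal simulation sketch of this observation model: nominal Gaussian noise contaminated per datum with probability ε_N by an arbitrary outlier process, added to the information sequence. The unit variances, the outlier scale of 50, and the seed are assumptions made purely for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
eps_N, n = 0.05, 10000

X = rng.standard_normal(n)          # information sequence (from f_S)
Z = rng.standard_normal(n)          # nominal noise sequence (from f_0N)
V = 50.0 * rng.standard_normal(n)   # arbitrary outlier process (assumed)

xi = rng.random(n) < eps_N          # outlier occurrence: Bernoulli(eps_N)
W = np.where(xi, V, Z)              # contaminated noise, as in model (2)
Y = X + W                           # additive observation sequences
```

Roughly ε_N of the observations are dominated by the large-variance outlier process, while the remainder follow the nominal signal-plus-Gaussian-noise model; a robust filter must estimate X stably across both kinds of data.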
We then express Theorem 2 below. This theorem and the subsequent Lemma 1 are due to Tsaknakis et al. (1986). Given ε_N, n and l, the constant λ is positive and unique. Given n and l, λ decreases monotonically with increasing ε_N. For ε_N = 0, λ equals infinity, and the filtering operation in (19) then becomes identical to the linear, mean-squared filter that is optimal at the nominal Gaussian noise.
We observe that the filtering operation in (19) is a truncated linear function of the data; it is thus bounded and continuous in the sense of part i of Theorem 1, but it is not asymptotically continuous in the sense of part ii of the same theorem. The latter operation is therefore qualitatively robust for finite data dimensionalities n + l only. We will extend the operation in (19) to create a filtering operation that is both asymptotically and non-asymptotically robust. We distinguish between causal and non-causal filtering, and we then present two different extensions.
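A minimal sketch of a truncated linear filtering operation in the spirit of (19): the observations are saturated at ±λ before a linear FIR estimate is formed. The weights h and the level lam are placeholders, not the saddle-point quantities of (19); letting lam grow without bound recovers the purely linear filter, matching the ε_N = 0 behavior noted above.

```python
import numpy as np

def truncated_linear_filter(y, h, lam):
    """Saturate each observation at +/- lam, then apply linear weights h.

    The result is a bounded, continuous function of the data
    (Theorem 1, part i); the true h and lam of (19) would come from
    the saddle-point solution and are stand-ins here.
    """
    yc = np.clip(np.asarray(y, dtype=float), -lam, lam)
    return float(np.dot(np.asarray(h, dtype=float), yc))

h = np.full(5, 0.2)   # illustrative averaging weights over n + l = 5 data

nominal = truncated_linear_filter([1, 1, 1, 1, 1], h, lam=3.0)
outlier = truncated_linear_filter([1, 1, 1e9, 1, 1], h, lam=3.0)
```

Even a huge single outlier moves the estimate only from 1.0 to 1.4, since the corrupted datum is clipped to λ = 3 before the linear combination; an untruncated linear filter would instead be driven to about 2e8.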

Robust Non-Causal Filtering or Smoothing for Nominally Gaussian Information and Noise Processes
Consider the Gaussian densities f_0S and f_0N in Lemma 1. We then select some ε_N