Quantile-based measurements

Introduction

In the following a generic measurement process characterized by a self-similar behaviour will be taken into account. This process is defined over discrete time scales since it usually involves a sampling operation. Our purpose is to obtain as more information as possible about the physical phenomena under study from these measurement data.
In order to do so we need:
(A) a suitable analytical model to represent the process itself and
(B) an estimator for finding out the parameters of the proposed model.

It should be remembered that parameter estimation from experimental data tends to become less accurate as model complexity grows, unless available data also increases accordingly by considering longer observations. This sets the terms of a tradeoff between estimation accuracy, which is important for reliable detection of parameter changes, and measurement time, that affects the capability to track data variations. Longer observation periods provide more accurate measurements but they limit the capability of tracking variations over time. On the other hand, by observing only a small number of samples, measurements would be affected by a high variability and therefore lacking in information.

Most of the literature concerning data analysis is focused on the estimation of the mean or second-order statistics, like variance or autocorrelation function, of some appropriately chosen function of data measurements. It is important to note that in the case of a well-behaved time series, second order statistics are sufficient to represent the time series features. However, the observed data may be affected by occasional outliers, due to sporadic physical phenomena, that can greatly affect estimates of the arithmetic mean or the variance [see SSP 2005]. In these situations a more useful and robust statistic is provided by quantile analysis. Quantiles provide in fact additional information about the distribution of the measurements and in certain cases they may be of more interest than second-order statistics [see AMUEM 2008 ].

Time series

Let be the time series of the measurements obtained from a given physical phenomenon.
If network traffic measurements are taken into account, this time series represents the number of bytes or packets over consecutive time intervals having equal length T. In other applications, may represent the rainfall or the average temperature during an observation period, and so on. The main interest in the following is about network traffic measurements, however the same analysis can be applied also to other time physical phenomena.

The time series represents the highest time resolution process and therefore in the following we refer to it as: . Evolutions characterized by lower resolutions are represented by the m -aggregated process , which is a less-detailed copy of the process , obtained by averaging it:

(1)

It directly follows from this formula that the time series and have necessarily the same mean value.

Marginal probability distribution function

In this paragraph the marginal probability distribution function of these time series will be taken into account.

Let be the marginal probability distribution function of the time series and the marginal probability distribution of the m-aggregated time series .

In the following we will consider the properties of quantiles related to these distributions. In particular, the quantile associated to the violation probability g, for a generic random process having marginal distribution function ,is defined as:

(2)

Self-Similar processes

Self-similar processes are used to model data in a great variety of engineering problems, ranging from the classical applications in hydrology to analysis of turbulent flows and, lately, of traffic computer and communication networks. In these contexts, random processes are a useful tool in the absence of a feasible physical model. If a realistic physical model is not available or cannot be determined with sufficient accuracy, then a simple stationary model is often sufficient, since it allows to assess essential quantities with reasonable accuracy. Therefore, measurement and analysis of self-similar time series represents a useful methodology for characterization and diagnosis of physical phenomena.

Self-similarity is a concept, related to fractals and chaos theory, which can be associated with objects having the same geometrical, statistical or dynamical structure over different scales. In particular, a time series is self-similar when it exhibits the same statistical properties over different time scales, which can vary by many orders of magnitudes: from milliseconds to minutes and even hours; however, these properties do not hold indefinitely for real phenomena and, at some point, this structure breaks down when non-stationarity behaviors come into play.

Under the assumption that the starting time series is a self-similar random process with scaling exponent H , also known as the Hurst parameter, it follows that:

(3)

The parameter H is comprised between 0.5 for an uncorrelated process and 1.0 for a completely correlated process. The Hurst parameter quantifies the asymptotic self-similar scaling as well as the degree of long-range dependence.

From the relationship between marginal distribution probability functions, we obtain:

(4)

It follows that also the quantiles, associated to the same violation probability, satisfies a scaling relationship. The parameter m represents the mean value of the time series and .

Therefore, by considering quantiles the attention is focused on distribution-related effects. It has been also noted that quantiles take into account the scaling properties of measurement data. Results achieved by applying this analysis to network traffic time series are reported in: VTOOLS 2006, IMTC 2006, IMTC 2007, TRANS 2007, PEVA 2008 and TRANS 2008.

A generalization of the quantile analysis approach can be obtained by applying it to the time series of discrete wavelet transform (DWT) coefficients.

Discrete Wavelet Transform

In the following we apply the Discrete Wavelet Transform to the time series previously introduced. In this way, for each aggregation level, we obtain two time series: the time series of approximation coefficients and the time series of detail coefficients, where .

As already known from the early pioneering studies presented in literature, for a self-similar process a scaling relationship among wavelet coefficients exists and has the same form for both aggregation and detail coefficients. Therefore, by using the symbol   to generically indicate either of the two, it can be found the following scaling relationship:

(5)

if the definition of aggregate process given in (1) is used.

Quantile-based estimation

Let be the (1-g)-quantile of coefficients at scale j. It provides a bound on the value that the samples of can assume, which can be exceeded with a probability g, called violation probability. This means that:

(6)

The self-similarity relationship between the wavelet coefficients extends to their quantiles. In the case of detail coefficients, where the mean value is null, the scaling relationship can be obtained in a straightforward manner by substituting equation (5) in (6). It results in:

(7)

where:

(8)

This provides the scaling expression for detail coefficients:

(9)

In general, by taking into account also the mean value, the scaling relationship between wavelet coefficients is:

(10)

Rewriting expression (10) in a log-log scale the scaling exponent can be estimated by a linear regression, as done by the Abry-Veitch estimator that is based instead on time series variances.

This analysis has been also reported in: I2MTC 2009 and COST 2009.

Tool

In the following you can find the Matlab code of the quantile-based estimators. Please note that the software is under the GNU general public license [see http://www.gnu.org/licenses/].

dwtquantile.m

If you need help or more information, please contact me.
My e-mail is: giada@dei.unipd.it.

Bibliography

[I2MTC 2009] GIORGI G., DINDO S., VANTINI M., NARDUZZI C. A Real-Time Non-Intrusive Tool for Traffic Analysis, I2MTC 2009.
[COST 2009] GIORGI G., NARDUZZI C. Scaling Analysis of Wavelet Quantiles in Network Traffic, COST-TMA 2009.
[TRANS 2008] GIORGI G., NARDUZZI C. A study of measurement-based traffic models for network diagnostics. IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, vol. 57; p. 1642-1650.
[PEVA 2008] GIORGI G., NARDUZZI C. Rate-interval curves - A tool for the analysis and monitoring of network traffic. PERFORMANCE EVALUATION, vol. 65; p. 441-462.
[AMUEM 2008] GIORGI G., NARDUZZI C. Uncertainty of Quantile Estimates in the Measurement of Self-Similar Processes. In: Workshop on Advanced Methods for Uncertainty Estimation Measurements, AMUEM 2008. Sardagna, Trento, Italy.
[TRANS 2007] BENETAZZO L., GIORGI G., NARDUZZI C. On the Analysis of Communication and Computer Networks by Traffic Flow Measurement. IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, vol. 56; p. 1157-1164.
[IMTC 2007] GIORGI G., NARDUZZI C. A Study of Measurement-based traffic models for network diagnostics. In: 2007 IEEE Instrumentation and Measurement Technology Conference Proceedings. Varsavia, Polonia, May 01-03, 2007.
[IMTC 2006] BENETAZZO L., GIORGI G., NARDUZZI C. On the Analysis of Communication and Computers Networks by Traffic Flow Measurement. In: Proc. IEEE Instrum. Meas. Technol. Sorrento, Italy, 24-27 April 2006, p. 212-217.
[VTOOLS 2006] GIORGI G., NARDUZZI C. Analysis of Traffic Flow Measurement by Rate-Interval Curves. In: ACM International Conference Proceeding Series. Pisa, Italy, 11-13 October 2006, vol. 180.
[SSP 2005] GIORGI G., NARDUZZI C., PEGORARO P. A. Statistical analysis of local features in network traffic processes. In: Proceedings 13th IEEE Workshop on Statistical Signal Processing SSP'05.