ISAS 2006

The 3rd International Service Availability Symposium

Helsinki, Finland, May 15-16, 2006

Pre-conference tutorial

A 1-day pre-conference tutorial, supported by Nokia Research Center under a Scientific Conference Sponsorship, will be held on Sunday 14th May 2006.

Tutorial Venue

Radisson SAS Plaza Hotel
Mikonkatu 23
FIN-00100 Helsinki
Finland.

Program

9:00 - 12:00   "Assurance for Continuous Availability"
Professor Kishor Trivedi
Pratt School of Engineering
Duke University, U.S.
and
Veena B. Mendiratta
Bell Labs, Lucent Technologies
Naperville, Illinois, U.S.
 
12:00 - 13:30   Lunch (not included)
 
13:30 - 16:30   "Predictive Algorithms and Technologies for Availability Enhancement"
Professor Miroslaw Malek
Institut für Informatik
Humboldt-Universität zu Berlin, Germany

Assurance for Continuous Availability

Summary

In this tutorial we consider methods of availability assurance to accompany the design and deployment of continuous availability systems. The commonly used approach for high availability assurance is more akin to "assurance by argument". However, to have confidence in the availability projections, a more quantitative approach should be followed. This can be provided during the architecture and design phases of product development by means of formal probabilistic reliability models. The modeling approach using hierarchical or multilevel models is described via a few examples. Later in the development cycle, when the product (or prototype) is available, a hybrid approach using a judicious combination of experimental or test results and modeling can be applied. Once the system is deployed, a method of monitoring and statistical analysis can be used. This method is also described via examples. The data from monitoring can be used for feedback control so as to achieve the desired availability.
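As a rough illustration of the kind of hierarchical availability model mentioned above, the following sketch computes steady-state availability for individual components from a two-state (up/down) model and then combines the results in series and in parallel. The function names, the example architecture and all MTTF/MTTR figures are assumptions made for illustration only; they are not taken from the tutorial material.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60


def component_availability(mttf_hours, mttr_hours):
    """Steady-state availability of a two-state (up/down) component."""
    return mttf_hours / (mttf_hours + mttr_hours)


def series(*availabilities):
    """All sub-systems must be up, so the availabilities multiply."""
    return math.prod(availabilities)


def parallel(*availabilities):
    """At least one redundant sub-system must be up.

    Assumes independent failures and perfect failover, which is a
    simplification of real redundant architectures.
    """
    return 1.0 - math.prod(1.0 - a for a in availabilities)


def downtime_minutes_per_year(availability):
    """Expected downtime per year implied by a steady-state availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Hypothetical figures, chosen only to make the example concrete.
    server = component_availability(mttf_hours=2000.0, mttr_hours=2.0)
    network = component_availability(mttf_hours=10000.0, mttr_hours=1.0)

    # Lower level: a redundant server pair. Upper level: pair in series with network.
    server_pair = parallel(server, server)
    system = series(server_pair, network)

    print(f"system availability: {system:.6f}")
    print(f"expected downtime:   {downtime_minutes_per_year(system):.1f} min/year")
```

The two-level structure (component models feeding a system-level combination) is the simplest form of a hierarchical model; practical models would also account for imperfect coverage and correlated failures.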

About the Speakers

Kishor S. Trivedi holds the Hudson Chair in the Department of Electrical and Computer Engineering at Duke University, Durham, NC. He has been on the Duke faculty since 1975. He is the author of the well-known text Probability and Statistics with Reliability, Queuing and Computer Science Applications, a thoroughly revised second edition of which is being published by John Wiley. He has also published two other books: Performance and Reliability Analysis of Computer Systems (Kluwer Academic Publishers) and Queueing Networks and Markov Chains (John Wiley). His research interests are in reliability and performance assessment of computer and communication systems. He has published over 300 articles and lectured extensively on these topics. He has supervised 39 Ph.D. dissertations. He has made seminal contributions in software rejuvenation, solution techniques for Markov chains, fault trees, stochastic Petri nets and performability models. He has actively contributed to the quantification of security and survivability. He is a Fellow of the Institute of Electrical and Electronics Engineers and a Golden Core Member of the IEEE Computer Society. He is a co-designer of the HARP, SAVE, SHARPE and SPNP software packages, which have been widely circulated.

Veena B. Mendiratta is a Consulting Member of Technical Staff at Bell Labs, Lucent Technologies, Naperville, Illinois, USA. Her interests are in architecture, reliability, fault tolerant computing and software reliability engineering. Most of her work has focused on the reliability and performance analysis of telecommunications systems to guide system architecture solutions; recent work has focused on Voice over Packet solutions and architecting survivable switching solutions. She has presented papers and tutorials at several refereed conferences and is a member of IEEE and INFORMS. She holds a PhD in Operations Research from Northwestern University, Evanston, Illinois, USA and a B.Tech in Engineering from the Indian Institute of Technology, New Delhi, India.

Predictive Algorithms and Technologies for Availability Enhancement

Summary

Predicting the future has fascinated people throughout civilizations, but until the 20th century it was more magic than science. The ability to predict the future may have a significant impact on a wide spectrum of applications, ranging from communication systems to health monitoring. With the development of dynamic systems research, data mining and computer technology, we seem to be better equipped to tackle the prediction problem with a more science-based, goal-oriented approach.

In this tutorial, we focus on predictive algorithms and technologies which may have a major impact on computer systems availability and performance. We first survey long-term and short-term prediction techniques, introduce prediction quality measures, and then demonstrate how the availability of software and hardware systems can be increased by preventive measures which are triggered by short-term failure prediction mechanisms. We present and evaluate mainly non-parametric techniques which model and predict the occurrence of failures as a function of discrete and continuous measurements of system variables.

We introduce two modelling approaches in detail: Markov models and a function approximation technique utilising universal basis functions. The presented modelling methods are data driven rather than analytical and can handle large numbers of variables and large amounts of data. They offer the potential to capture the underlying dynamics of even high-dimensional and noisy systems. Both modelling techniques have been applied to real data from a commercial telecommunication platform. The data includes event-based log files and measured system states. We compare the effectiveness of the discussed techniques with other methods in terms of precision, recall, F-measure and cumulative cost. The two methods demonstrate significantly improved forecasting performance compared to alternative approaches such as linear ARMA models.
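For readers unfamiliar with the prediction quality measures named above, the sketch below shows how precision, recall and F-measure are commonly computed from the outcome counts of a failure predictor. The function name and the example counts are hypothetical and do not come from the tutorial's data; cumulative cost is omitted because its definition is not given here.

```python
def prediction_quality(true_pos, false_pos, false_neg):
    """Return (precision, recall, f_measure) for a failure predictor.

    true_pos  : failure warnings that were followed by an actual failure
    false_pos : failure warnings with no subsequent failure (false alarms)
    false_neg : actual failures that were not predicted (misses)
    """
    warnings = true_pos + false_pos
    failures = true_pos + false_neg

    # Precision: fraction of raised warnings that were correct.
    precision = true_pos / warnings if warnings else 0.0
    # Recall: fraction of actual failures that were predicted.
    recall = true_pos / failures if failures else 0.0
    # F-measure: harmonic mean of precision and recall.
    total = precision + recall
    f_measure = 2.0 * precision * recall / total if total else 0.0

    return precision, recall, f_measure


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    p, r, f = prediction_quality(true_pos=8, false_pos=2, false_neg=8)
    print(f"precision={p:.3f} recall={r:.3f} F-measure={f:.3f}")
```

A predictor that raises few but accurate warnings scores high on precision and low on recall; the F-measure balances the two in a single number.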

Finally, we present a plethora of preventive measures that can be applied once it is established that a failure appears to be imminent.

By using the presented prediction and prevention techniques, system availability may be improved by an order of magnitude.

About the Speaker

Miroslaw Malek received the M.Sc. degree in Electrical Engineering in 1970 and the Ph.D. degree in Computer Science in 1975, both from the Technical University of Wroclaw, Poland.

He has been professor and holder of the Chair in Computer Architecture and Communication at Humboldt University in Berlin since 1994. In 1977, he was a visiting scholar at the Department of Systems Design at the University of Waterloo, Waterloo, Ontario, Canada, then Assistant, Associate and Full Professor at the University of Texas at Austin, where he also held the Bettie Margaret Smith and the Southwestern Bell Professorships in Engineering. Malek's research interests focus on dependability, failure prediction, composability and mobility, mainly in distributed systems but also in parallel architectures, real-time systems, and interconnection networks. He has participated in two pioneering parallel computer projects, contributed to the theory and practice of parallel network design, developed the comparison-based method for system diagnosis, codeveloped comprehensive WSI and network testing techniques, proposed the consensus-based framework for responsive (fault-tolerant, real-time) computer systems design and failure prediction methods, and has made numerous other contributions, reflected in over 150 publications including five books.

He has organized, chaired and been a program committee member of numerous IEEE and ACM international conferences and workshops. Among others, he was Program Chairman and General Chairman of the Real-Time Systems Symposium in 1984 and 1985, respectively; General Chairman of the 24th Fault-Tolerant Computing Symposium in 1994; Program Co-chairman of the 22nd Symposium on Reliable Distributed Computing in 2003; and Program Chairman and General Chairman of the International Service Availability Symposium in 2004 and 2005, respectively. He serves or has served on the editorial boards of various journals, among them the Journal of Parallel and Distributed Computing and the Real-Time Systems journal.

Malek was a Visiting Scientist at Bell Labs in Murray Hill and at IBM's T. J. Watson Research Center, Yorktown Heights, NY. He held the IBM Chair at Keio University in Japan in 1992. He was also a Visiting Professor at Stanford University (1997/1998), New York University (2001), the Italian National Research Center and Pisa University (2002), the Chinese University of Hong Kong (2005) and Universita di Roma "La Sapienza" (2005/6).

Organized by the University of Helsinki and Nokia in cooperation with the GI/ITG Technical Committee "Dependability and Fault Tolerance" and the Service Availability Forum.