Research Statement for Karthik Pattabiraman
As computer systems grow more complex, it becomes increasingly difficult to ensure their correct operation. The problem is exacerbated by the pervasiveness of computer systems in mission- and life-critical applications, which renders the cost of their failure unacceptably high. Further, in a networked world, systems that were once considered isolated are becoming vulnerable to malicious security attacks. My primary goal is to design dependable systems that are resilient to errors and security attacks. During the course of my graduate research, I have pursued this goal by developing techniques spanning multiple areas of computer science, namely compilers and programming languages, computer architecture, and formal methods. This statement provides an overview of my past work and outlines my future plans.
1 Current Research
Existing approaches to dependability, such as duplication and type-safe languages, are "all-or-nothing" approaches, i.e., either everything in the system has to be protected or no guarantees can be provided. This results in unnecessary overheads as well as in wasteful detections, i.e., detection of benign errors. In contrast, my research leverages application properties to selectively protect data that is important to the application from a reliability or security point of view (critical data). We call this paradigm application-aware checking¹ [1][2]. Application-aware checking ensures the integrity of the critical data even if the rest of the application is not protected. This allows the application to tolerate both accidental and malicious faults.

The application-aware checking paradigm represents a fundamental shift in the way dependable systems are designed. Rather than blindly implementing dependability techniques at multiple levels of the system, the goal is to tailor the mechanisms at each level to fit the needs of the application, as this is what the end-user perceives. This tailoring provides the following advantages: (1) systems can be optimized to deliver dependability with minimum cost and disruption to the end-user, (2) the costs and benefits of the deployed techniques can be justified in terms that the application developer can understand, so that she can make an informed decision about which techniques to deploy, and (3) reliability and security can be considered along with performance, power and ease-of-use as variables in a multi-dimensional space of application demands. Different applications occupy different points in this multi-dimensional space, and the system can adapt based on the application.

My dissertation focuses on automated derivation and implementation of application-aware error and attack detectors. A detector is an executable assertion that checks a specific data value in the application. Detectors prevent errors and attacks from propagating in the application and causing failures or compromises. I have developed compiler-based techniques for deriving application-aware detectors and have implemented the detectors using reconfigurable hardware (FPGAs). I have also analyzed the efficacy of the derived detectors using formal verification techniques. These directions are summarized here.
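As a minimal illustration of what a detector is, the sketch below checks one critical value against an independently recomputed expression before the value is used. The names (compute_index, detector_index, inject_bit_flip) and the computation itself are hypothetical and serve only to illustrate the idea of an executable assertion; the detectors described in this statement are derived by the compiler and implemented in hardware, not written by hand in Python.

```python
class DetectorViolation(Exception):
    """Raised when a critical value fails its check."""


def compute_index(base: int, stride: int, i: int) -> int:
    # Hypothetical application code that produces the critical value.
    return base + stride * i


def detector_index(value: int, base: int, stride: int, i: int) -> int:
    # Executable assertion: recompute the critical value from its inputs
    # and compare the result before the value is allowed to propagate.
    expected = base + stride * i
    if value != expected:
        raise DetectorViolation(f"expected {expected}, got {value}")
    return value


def inject_bit_flip(value: int, bit: int) -> int:
    # Emulates a transient error that flips one bit of the stored value.
    return value ^ (1 << bit)


if __name__ == "__main__":
    v = compute_index(100, 4, 3)
    detector_index(v, 100, 4, 3)          # passes silently
    try:
        detector_index(inject_bit_flip(v, 2), 100, 4, 3)
    except DetectorViolation as err:
        print("detected:", err)
```

The check covers only the one value it is attached to, which is the sense in which the rest of the application can remain unprotected.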
1.1 Compiler-based Techniques
I have developed a unified compiler framework for deriving application-aware error and attack detectors from application code. The main steps followed in the framework are as follows: (1) identification of critical program variables from a reliability or security point of view using either heuristics or programmer annotations, (2) extraction of the backward slices of critical variables in the program, i.e., the instructions that can influence the critical variable's value, and (3) conversion of the extracted slices into efficient runtime checks that can be implemented using a combination of reconfigurable hardware and software.

For deriving error detectors, the critical variables are identified using heuristics based on the application's dataflow graph. For example, I found that errors in program variables having a high dynamic fanout (influence over other variables) cause over 80% of application failures [4]. Once the critical variables have been identified, their backward slices in the program are automatically extracted on a per-function basis. By creating specialized versions of the backward slices for each control path in the program, my technique derives optimal symbolic expressions to represent the computation of critical variables in the program [3]. I have implemented this technique by extending the LLVM compiler developed at the University of Illinois.

For deriving attack detectors, the programmer is required to identify security-critical variables through annotations. The backward slice of the security-critical variables is automatically extracted, and the dependencies in the slice are converted to a signature called the Information Flow Signature [5]. A security attack on the critical variable will impact the backward dependencies of the variable, thus violating the derived signature (for that application). This technique protects against a broad class of security attacks, including memory corruption attacks such as buffer overflows, and insider attacks in which parts of the application or the operating system are malicious. We have implemented this technique using the IMPACT compiler, also developed at the University of Illinois. This project was a joint effort with my colleagues at UIUC.
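The slice-extraction step (step 2 above) can be sketched on a toy example. The code below is a simplified illustration, not the LLVM- or IMPACT-based implementation: it assumes straight-line code with no branches, loops, or memory aliasing, and the instruction encoding (a destination name plus a tuple of source names) and the PROGRAM contents are invented for the example.

```python
from typing import List, Set, Tuple

# Hypothetical instruction format: (destination, source operands).
Instr = Tuple[str, Tuple[str, ...]]

PROGRAM: List[Instr] = [
    ("a", ()),          # 0: a = input
    ("b", ()),          # 1: b = input
    ("c", ("a", "b")),  # 2: c = a + b
    ("d", ("b",)),      # 3: d = b * 2
    ("e", ("c",)),      # 4: e = c + 1   (e is the critical variable)
]


def backward_slice(program: List[Instr], critical: str) -> List[int]:
    """Return indices of instructions that can influence `critical`.

    Walks the straight-line program backwards while tracking the set of
    variables whose current values still matter for the critical variable.
    """
    relevant: Set[str] = {critical}
    slice_indices: List[int] = []
    for idx in range(len(program) - 1, -1, -1):
        dest, sources = program[idx]
        if dest in relevant:
            slice_indices.append(idx)
            relevant.discard(dest)     # earlier definitions of dest are dead
            relevant.update(sources)   # its operands now matter instead
    slice_indices.reverse()
    return slice_indices


if __name__ == "__main__":
    print(backward_slice(PROGRAM, "e"))  # instruction 3 is not in the slice
```

In this toy program the instruction that defines d does not feed the critical variable, so it is left out of the slice and would not be covered by the resulting check.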
¹ I co-invented the application-aware checking paradigm along with my advisor and colleagues at the University of Illinois.
1.2 Reconfigurable Hardware
The application-aware error and attack detectors derived by my compiler framework have been implemented in reconfigurable hardware (FPGAs). The main advantage of hardware implementation is that errors and attacks can be detected before they have had a chance to propagate and affect the application. Further, the detectors can be executed in parallel with the application without incurring significant performance overheads. Hardware implementation also ensures that the system continues to provide reliability and security to applications even if the operating system fails or is compromised.

The hardware implementation of the application-aware detectors is an integral part of the Trusted Illiac project at the University of Illinois. The Trusted Illiac project uses reconfigurable and off-the-shelf hardware to provide transparent dependability to applications [1] and is a joint effort between the University of Illinois and multiple industrial partners such as AMD, HP, IBM, Xilinx and Nallatech. The project has received significant coverage in the media, both in newspapers such as the Chicago Tribune and the Seattle Times and in technical magazines such as Dr. Dobb's Journal. I was the lead student researcher in building the Trusted Illiac prototype, and I worked closely with fellow graduate students, undergraduate students and research programmers to bring this project to fruition. This has been a tremendous learning experience for me: in addition to the technical challenges, I faced significant challenges in leading a team of graduate students with diverse backgrounds in software and hardware design. This experience taught me the importance of articulating a clear vision and breaking down a larger problem into challenging sub-problems so that each person can unleash their creative energies towards a common goal. I look forward to similar experiences in the future.
1.3 Formal Verification
The third main thrust in my dissertation research has been the use of formal verification techniques for dependability validation. Formal verification is a valuable complement to empirical evaluation, and can be used to rigorously evaluate the capabilities of detection mechanisms and expose corner-case scenarios in their design. To this end, I built a formal framework to evaluate the effects of hardware errors on software programs and reason about detection mechanisms expressed in a generic form. The framework is called SymPLFIED [9] (Symbolic Program-Level Fault Injection and Error Detection Framework) and was implemented using the Maude system developed at UIUC. It uses symbolic execution and model checking to comprehensively explore all possible manifestations of an error in the program under a given hardware fault model. Based on this work, I was awarded the William C. Carter Award at the IEEE International Symposium on Dependable Systems and Networks (DSN), 2008. The award is given annually by the IEEE Technical Committee on Fault-Tolerant Computing and the IFIP Working Group on Dependable Computing to "recognize an individual who has made a significant contribution to the field of dependable computing based on his or her graduate dissertation research". Earlier in my graduate research program, I worked with a fellow graduate student to find security vulnerabilities in library functions [6] using automated theorem proving techniques. This was also implemented using the Maude system.
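To make the idea of exploring every manifestation of an error under a fault model more concrete, the sketch below enumerates all single-bit flips of one input and classifies each outcome. This is a deliberately simplified stand-in: it uses concrete exhaustive enumeration in Python rather than the symbolic execution and model checking that SymPLFIED performs in Maude, and the fault model (one bit flip in an 8-bit value), the toy program, and the range-check detector are all assumptions made for the example.

```python
from typing import Callable, Dict, List


def classify_bit_flips(
    program: Callable[[int], int],
    detector: Callable[[int], bool],
    value: int,
    width: int = 8,
) -> Dict[str, List[int]]:
    """Flip each bit of `value` in turn and classify the outcome.

    detected: the detector rejects the corrupted value
    benign:   the detector accepts it and the output is unchanged
    escaped:  the detector accepts it but the output is wrong
    """
    golden = program(value)
    outcomes: Dict[str, List[int]] = {"detected": [], "benign": [], "escaped": []}
    for bit in range(width):
        faulty = value ^ (1 << bit)
        if not detector(faulty):
            outcomes["detected"].append(bit)
        elif program(faulty) == golden:
            outcomes["benign"].append(bit)
        else:
            outcomes["escaped"].append(bit)
    return outcomes


def toy_program(x: int) -> int:
    # Hypothetical computation that ignores the two low-order bits.
    return x // 4


def range_detector(x: int) -> bool:
    # Hypothetical detector: the value is expected to stay below 64.
    return 0 <= x < 64


if __name__ == "__main__":
    print(classify_bit_flips(toy_program, range_detector, 20))
```

The "escaped" bucket corresponds to the corner cases that a formal analysis is meant to expose: errors that the detector accepts even though they change the program's result.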
1.4 Industrial Research Experience
During my PhD studies, I did a summer internship at Microsoft Research (Redmond), where I built a runtime system called Samurai to protect an application from memory corruption errors [10]. Samurai provides the benefits of type-safe languages such as Java and C# to programs written in type-unsafe languages such as C and C++. To use Samurai, the programmer identifies critical data that must be protected in the program. Samurai keeps multiple copies of the critical data at random locations in the process heap to minimize the probability of correlated errors. Valid stores in the program update all replicas, while invalid stores update only one of the replicas, thereby causing an inconsistency that is repaired using majority voting. Samurai was featured on "lambda-the-ultimate", a popular programming languages research weblog.

As we were designing and implementing Samurai, we realized that the principles embodied in Samurai could be abstracted and generalized as a memory model called Critical Memory [11]. Critical Memory interposes separate memory semantics for critical data on top of existing memory semantics. The main advantage is that even clients that are unaware of Critical Memory can interoperate seamlessly with it. Critical Memory can also be used to recover from concurrency errors, notably race conditions in lock-based programs [12] and violations of strong atomicity in transactional memory-based programs [11]. These projects were carried out in collaboration with colleagues at Microsoft Research.

I have also done internships at IBM Research (Watson), where I worked on virtual machine migration, and at Los Alamos National Labs, where I worked on power-aware parallel computing. In addition, I have collaborated with researchers from Sun Microsystems on modeling checkpointing schemes in large-scale parallel systems [13]. These experiences have broadened my perspective and given me a valuable glimpse into the world of industrial research.
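The replicate-and-vote idea behind Samurai, described earlier in this section, can be sketched as follows. This is an illustrative model only and not Samurai's implementation: the class and method names are hypothetical, the use of three replicas is an assumption (the text says only "multiple copies"), and the placement of replicas at random heap locations is not modelled.

```python
from collections import Counter
from typing import List


class ReplicatedValue:
    """Toy model of one critical datum kept in several replicas."""

    def __init__(self, value: int, copies: int = 3) -> None:
        self._replicas: List[int] = [value] * copies

    def critical_store(self, value: int) -> None:
        # A valid store updates every replica.
        self._replicas = [value] * len(self._replicas)

    def stray_store(self, replica: int, value: int) -> None:
        # An invalid store (e.g. through a wild pointer) hits one replica only.
        self._replicas[replica] = value

    def critical_load(self) -> int:
        # Majority vote across replicas, then repair any replica that differs.
        winner, votes = Counter(self._replicas).most_common(1)[0]
        if votes * 2 <= len(self._replicas):
            raise RuntimeError("no majority among replicas")
        self._replicas = [winner] * len(self._replicas)
        return winner


if __name__ == "__main__":
    cell = ReplicatedValue(7)
    cell.critical_store(42)
    cell.stray_store(1, 999)        # corrupts a single replica
    print(cell.critical_load())     # the vote recovers the valid value
```

Because a stray store reaches only one replica, the remaining replicas outvote it on the next load, and the corrupted copy is overwritten as part of the repair.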
2 Future Research
In my future research, I will pursue the following directions that build on my dissertation research and industry experience.
2.1 System Partitioning and Isolation
Software developers face a constant tension between ensuring system correctness and adding new features that increase the system's complexity. The pull for more features wins and increases system complexity to a point where it is not practical to ensure the correct operation of the entire system. One way to deal with this complexity effectively is to isolate the critical components needed for core functionality from the non-critical components. The key challenges involved in this approach are (1) automated identification and extraction of critical components in the system, (2) assurance that errors or attacks in non-critical components cannot propagate to critical components, and (3) mechanisms to automatically recover the non-critical components from the critical components in the event of a failure or compromise. My PhD research has explored the first two challenges, while the third remains an open problem. This is a direction I plan to explore in the near term.
2.2 Compilers and Architectural Techniques for Dependability
In the near- to medium-term, I would like to explore how compiler and architectural techniques can be used for improving system dependability. Specifically, I am interested in studying the impact of compile-time and runtime optimizations on application dependability. For example, the guarantees provided by security techniques often depend on the accuracy of the underlying static analysis, yet little research has been done on understanding this relationship. Our earlier work [5] briefly explored this space, but a lot more needs to be done before we can solve the problem. Another example is that modern superscalar processors have a significant amount of redundancy in functional units, which are often underutilized. By making small changes to the instruction scheduler, it is possible to exploit this redundancy to achieve high reliability. I have explored this idea in [14], but this is just one example of how small architectural changes can lead to huge wins in dependability. I also plan to involve the larger research community in promoting the importance of dependability in the architecture and compiler areas, which have traditionally focused on performance enhancement. To this end, I proposed and organized a workshop on Compiler and Architectural Techniques for Application-level Reliability and Security (CATARS) as part of the International Conference on Dependable Systems and Networks (DSN), 2008. The workshop received an overwhelmingly positive response from the attendees and we hope this workshop becomes a regular feature at DSN.
2.3 Dependability in the Multi-core Era
The advent of multi-core processors has provided a unique opportunity to obtain unprecedented performance gains in applications. However, multi-core processors may expose even more failures and security vulnerabilities in applications than exist today. In order to realize the performance benefits of multi-core without breaking existing applications, we need techniques that can leverage multi-core processors for detection and recovery. For example, in a multi-core processor the application's state can be distributed across the cores so that each core can check the operation of its neighbors. If a core fails, its state can be inferred from the state of its neighboring cores. Such techniques will require fundamental innovations in the design of both application software and multi-core processors. In the longer term, I plan to explore the area of multi-core dependability by drawing on my previous experience in compiler and reconfigurable hardware design. I will also leverage virtualization-based techniques to provide transparent recovery from errors and attacks.
2.4 Dependable Pervasive Systems
A pervasive system consists of a large number of computing devices that are invisibly embedded in the environment. These systems have been applied in a wide variety of domains such as health care, education, home automation and public places. As pervasive systems assume a larger role in our daily lives, it becomes important to ensure their dependability [15]. One challenge in this domain is that the system must be capable of transparent reconfiguration in the event of a failure. Further, malicious users must be incapable of penetrating the system even if they have physical access to one or more devices. Often these systems operate in harsh environments, and they must be capable of automatic adaptation to environmental conditions. Currently, adaptation, reconfiguration and intrusion-tolerance policies are specified and integrated manually into pervasive systems, which makes the systems brittle and error-prone. My goal is to automatically infer policies based on end-to-end system requirements and to integrate them automatically into the system. I will also investigate policies to mitigate operator error in these systems. This will allow pervasive systems to be deployed in mission-critical situations where high dependability is required, such as hospitals and disaster areas. I plan to draw upon diverse disciplines such as evolutionary biology and cognitive psychology to achieve these goals. This is the future of computing, and I will focus my research on shaping it.
References
[1] R. Iyer, Application-aware Reliability and Security: The Trusted Illiac Approach, Intl. Symp. on Network Computing and Applications (NCA), 2006.
[2] R. K. Iyer, Z. Kalbarczyk, K. Pattabiraman, W. Healey, W. W. Hwu, P. Klemperer and R. Farivar, Towards Application-Aware Security and Reliability, IEEE Security and Privacy 5, 1, Jan. 2007.
[3] K. Pattabiraman, Z. Kalbarczyk and R. K. Iyer, Automated Derivation of Application-aware Error Detectors using Static Analysis, 13th IEEE International On-Line Testing Symposium (IOLTS), July 2007.
[4] K. Pattabiraman, Z. Kalbarczyk and R. K. Iyer, Application-Based Metrics for Strategic Placement of Detectors, 11th Pacific Rim International Symposium on Dependable Computing (PRDC), December 2005.
[5] W. Healey, K. Pattabiraman, S. Ryoo, P. Dabrowski, R. Iyer, Z. Kalbarczyk and W. M. Hwu, Enforcing Critical Data Integrity through Information Flow Signatures, Technical Report (UILU-ENG-02-2202), University of Illinois.
[6] S. Chen, K. Pattabiraman, Z. Kalbarczyk and R. K. Iyer, Formal Reasoning of Various Categories of Widely Exploited Security Vulnerabilities by Pointer Taintedness Semantics, 9th IFIP International Information Security Conference (SEC), August 2004.
[7] P. Klemperer, S. Chen, K. Pattabiraman, Z. Kalbarczyk and R. K. Iyer, FPGA Implementation of Statically Derived Error Detectors, Workshop on Dependable and Secure Nano-Computing (WDSN), July 2007.
[8] P. Dabrowski, W. Healey, K. Pattabiraman, S. Chen, Z. Kalbarczyk and R. Iyer, Hardware Implementation of Information Flow Signatures Derived via Program Analysis, Workshop on Dependable and Secure Nano-Computing (WDSN), July 2008.
[9] K. Pattabiraman, N. Nakka, Z. Kalbarczyk and R. K. Iyer, SymPLFIED: Symbolic Program-Level Fault Injection and Error Detection Framework, International Conference on Dependable Systems and Networks (DSN), June 2008.
[10] K. Pattabiraman, V. Grover and B. G. Zorn, Samurai: Protecting Critical Heap Data in Unsafe Languages, European Systems Conference (EuroSys), April 2008.
[11] K. Pattabiraman, V. Grover and B. G. Zorn, Software Critical Memory: All Memory is not Created Equal, Technical Report (2006-128), Microsoft Corporation, 2006.
[12] R. Nagpal, K. Pattabiraman, D. Kirovski and B. G. Zorn, Tolerace: Tolerating and Detecting Races, Workshop on Software Tools for Multi-core Systems (STMCS), 2007.
[13] L. Wang, K. Pattabiraman, Z. Kalbarczyk, R. K. Iyer, L. Votta, C. Vick and A. Wood, Modeling Coordinated Checkpointing for Large-Scale Supercomputers, International Conference on Dependable Systems and Networks (DSN), 2005.
[14] N. Nakka, K. Pattabiraman and R. Iyer, Processor-Level Selective Replication, International Conference on Dependable Systems and Networks (DSN), June 2007.
[15] C. Fetzer et al., Challenges in making pervasive systems dependable, in Future Directions in Distributed Computing, pp. 186-190, 2003.