Log Warehousing: What? Why? How?

Dr. Anton Chuvakin

WRITTEN: 2006

DISCLAIMER: Security is a rapidly changing field of human endeavor. The threats we face literally change every day; moreover, many security professionals consider the rate of change to be accelerating. On top of that, to stay in touch with such an ever-changing reality, one has to evolve with the space as well. Thus, even though I hope that this document will be useful to my readers, please keep in mind that it was possibly written years ago. Also, keep in mind that some of the URLs might have gone 404; please Google around.

Introduction

It is easy to understand that decision makers want to base their decisions on accurate information that can be obtained without expending too much time and energy. For example, a marketing team hoping to run customer-tailored marketing campaigns would be happy to have quick access to information such as customer budgets and which customers purchased which items and when. The same applies to a security team choosing the next technology to deploy and trying to balance that need against the ongoing threat landscape. Along the same lines, it would be helpful to review current data in the context of historical data to determine what to do next.

This is where data warehousing comes in. The optimal way for enterprises to easily access relevant information for business decision-making is to place it in an optimized central repository known as a data warehouse. Advantages of data warehousing include enhanced end-user access, in one location, to data from a myriad of different sources (which saves the time and energy of collecting and analyzing data from the individual sources) and the ability to quickly run specified trend reports over current and historical data, adding a historical angle to present decision-making.
Let’s not forget, however, that over 25% of all enterprise data is log data, and this pool of unmanaged information is generated, unsecured, at rates sometimes reaching millions of messages per second. Key IT information buried in terabytes of log data must be located, analyzed, and reported on. Having a data warehouse as a centralized repository for this log data is crucial to IT security, regulatory compliance (especially considering that some mandates, such as PCI DSS, FISMA, and HIPAA, specifically call for log management), and immediate operational needs.

This paper introduces log data warehousing technology and explains how it aids enterprises in their security and compliance initiatives.

Setting the Stage: The origins of data warehousing

Let us introduce a few basic data warehousing concepts first. All organizations operate on two different types of information systems that perform different functions but work together to keep the enterprise running smoothly: operational and informational. Both are related to a category of business intelligence (BI) systems that help gather, analyze, and provide access to information about company operations on all levels, from employees to executive management.

Operational systems are vital to the day-to-day workings of the enterprise: managing accounting or payroll, for example. Informational systems manage the information that is used to make key decisions for managing and planning in the company, such as marketing plans and financial analysis.

Because of their importance to everyday operations, operational systems have historically had the first draw on company resources and were the first parts to be computerized and integrated. Such online transaction processing (OLTP) applications, which make it possible to automatically record millions of discrete transactions each day, now allow enterprises to grow without a proportional increase in resources.
With operational OLTP systems meeting day-to-day information needs for many enterprises, the resource focus has shifted to meeting informational business requirements. Data warehousing, a key BI element, stems from this need to meet decision-making business requirements and provide access to information without slowing down the operational systems. How can organizations effectively and efficiently access the data that is at the core of the enterprise’s most critical decision-making functions? Data warehousing is the answer.

What is a Data Warehouse?

A data warehouse refers, broadly, to an integrated solution that provides collection, retention, and analysis of transaction data for security, compliance, and systems management. More specifically, according to Bill Inmon, who coined the phrase, a data warehouse is “a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management’s decision making process.” He defines these terms as follows:

- Subject-oriented: Data that gives information about a particular subject instead of about a company's ongoing operations.
- Integrated: Data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole.
- Time-variant: All data in the data warehouse is identified with a particular time period.
- Non-volatile: Data is stable in a data warehouse. More data is added, but data is never removed. This enables management to gain a consistent picture of the business.

At this point, astute readers will notice that the above requirements are very similar to what they are looking for in their log management system. We will discuss this similarity in detail further in the paper.
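To make the similarity concrete, the four properties can be sketched as a tiny log store. This is only an illustration under stated assumptions, not the design of any particular product: the table name log_events, its columns, and the helper functions are all hypothetical, and SQLite (through Python's standard sqlite3 module) merely stands in for a real warehouse database.

```python
import sqlite3


def create_log_warehouse(conn):
    """Create a hypothetical append-only log table (all names are illustrative)."""
    # Integrated: events from every source share one schema.
    # Time-variant: every row must carry the time of the event.
    # Subject-oriented: rows are tagged by subject, e.g. "authentication".
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS log_events (
            id         INTEGER PRIMARY KEY,
            event_time TEXT NOT NULL,
            source     TEXT NOT NULL,
            subject    TEXT NOT NULL,
            message    TEXT NOT NULL
        )
        """
    )
    # Non-volatile: triggers reject any attempt to change or remove rows.
    for action in ("UPDATE", "DELETE"):
        conn.execute(
            f"""
            CREATE TRIGGER IF NOT EXISTS log_events_no_{action.lower()}
            BEFORE {action} ON log_events
            BEGIN
                SELECT RAISE(ABORT, 'log_events is append-only');
            END
            """
        )


def append_event(conn, event_time, source, subject, message):
    """Add one event; event_time is assumed to be an ISO 8601 UTC string."""
    conn.execute(
        "INSERT INTO log_events (event_time, source, subject, message) "
        "VALUES (?, ?, ?, ?)",
        (event_time, source, subject, message),
    )


def events_for_subject(conn, subject, start, end):
    """Return (event_time, source, message) rows with start <= event_time < end.

    Plain string comparison is enough here because ISO 8601 timestamps in one
    fixed format and time zone sort in chronological order.
    """
    cursor = conn.execute(
        "SELECT event_time, source, message FROM log_events "
        "WHERE subject = ? AND event_time >= ? AND event_time < ? "
        "ORDER BY event_time",
        (subject, start, end),
    )
    return cursor.fetchall()
```

In this sketch, a firewall event and a database event about the same subject land in one table and can be pulled back for any time window, while an attempted UPDATE or DELETE raises an sqlite3 error instead of altering history.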
A data warehouse environment typically consists of:

- a database designed for query and analysis;
- an extraction, transportation, transformation, and loading (ETL) solution, which is the tool that puts data from other sources into the central repository that is the data warehouse;
- an online analytical processing (OLAP) engine to perform complex analysis and ad hoc queries (unlike OLTP, which processes transactions, OLAP works on data that helps with planning, problem-solving, and decision support);
- possibly other site-specific BI applications.

Note that this stack of applications also looks remarkably parallel to leading log management solutions.

Why use data warehouses?

Data warehouses simplify a time-consuming and frustrating process. The data in question is key to company decision-making, so without a central repository, business managers must manually go through multiple distinct operational inquiries or reports in an attempt to find the information. Further, the processing required to extract the vital data from each operational system sometimes uses so much of the system's resources that the administrator must wait until after hours before running the queries required to produce the report. To boot, the time delays and the potential for human error mean that the resulting report may contain data that is inconsistent, inaccurate, or obsolete, as information sources can be unreliable and can alter or delete data.

Broadly, a data warehouse simplifies this process and makes data collection, analysis, and reporting quick and easy. Information is extracted from a variety of sources as it is generated and is integrated with existing data in the data warehouse. By the time an IT professional submits a query, the information sought is already at hand, with all inconsistencies or differences already resolved. Running queries on data from all the different sources becomes simple and efficient.
Data is organized and stored safely and reliably for as long as necessary. Organizations can make informed decisions.

More specifically, data warehouses:

- Offer easy but secure access to previously difficult-to-access data: Because it integrates enterprise data from a variety of sources, a data warehouse spares businesses the need to learn, understand, or find operational data in its complicated and esoteric points of origin.
- Provide immutable data: Inconsistencies and differences in the data from different sources are already resolved by the time the data enters the data warehouse. Since the data warehouse offers a common source of information, it quells concerns about the accuracy and precision of data.
- Record the past accurately: Decision-makers not only want to understand the “now” data, but they also want to see the “now in the context of then” data. A large portion of enterprise data has little business relevance until it is viewed in light of historical data. In the data warehouse, historical data is integrated with current data for quick access to information that will help shape the future of the business.
- See data from all vantage points: Businesses that can evaluate data from a variety of angles, in a variety of formats and levels of detail, and with a variety of parameters can more quickly meet their own information needs without expending time, energy, and manpower to collect, format, and separate the relevant information.
- Separate informational and operational processing: Different types of data have different processing requirements. Attempting to meet both informational and operational needs with one system simultaneously is inefficient, slows down the system, and makes it difficult to maintain normal system functioning. Data warehousing allows informational processing to occur separately from operational processing.
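The integration and “now in the context of then” points above can be illustrated with a short sketch. Everything specific in it is an assumption made for the example: the two raw record formats, the events table, and the function names are hypothetical, and an in-memory SQLite database stands in for the warehouse. It shows two differently shaped sources being normalized into one table, followed by a comparison of one day's count against the average of earlier days.

```python
import sqlite3
from statistics import mean

# Hypothetical raw records from two sources with different formats.
FIREWALL_LINES = [
    "2006-03-01 DENY 10.0.0.5",
    "2006-03-02 DENY 10.0.0.7",
    "2006-03-03 DENY 10.0.0.5",
    "2006-03-03 DENY 10.0.0.9",
]
APP_EVENTS = [
    {"day": "2006-03-01", "outcome": "login_failed", "user": "alice"},
    {"day": "2006-03-03", "outcome": "login_failed", "user": "bob"},
]


def extract_firewall(line):
    """Turn an assumed 'day action address' line into a common record."""
    day, action, address = line.split()
    return (day, "firewall", action.lower(), address)


def extract_app(event):
    """Turn an assumed application event dict into the same common record."""
    return (event["day"], "app", event["outcome"], event["user"])


def load(conn, records):
    """Load normalized records into the shared (hypothetical) events table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events "
        "(day TEXT, source TEXT, kind TEXT, detail TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", records)


def daily_counts(conn, kind):
    """Return {day: number of events of this kind}, across all sources."""
    cursor = conn.execute(
        "SELECT day, COUNT(*) FROM events WHERE kind = ? "
        "GROUP BY day ORDER BY day",
        (kind,),
    )
    return dict(cursor.fetchall())


def compare_to_history(counts, today):
    """Return (today's count, average count over all earlier days)."""
    history = [n for day, n in counts.items() if day < today]
    baseline = mean(history) if history else 0.0
    return counts.get(today, 0), baseline


def build_example():
    """Extract, transform, and load both sample sources into one store."""
    conn = sqlite3.connect(":memory:")
    load(conn, [extract_firewall(line) for line in FIREWALL_LINES])
    load(conn, [extract_app(event) for event in APP_EVENTS])
    return conn
```

Calling compare_to_history(daily_counts(build_example(), "deny"), "2006-03-03") returns that day's count alongside the average for the preceding days, which is the kind of historical context a single operational system rarely offers on its own. Because the query runs against the separate store, it also places no load on the systems that produced the records.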
Log data warehouse (LDW) as a platform for log analysis, audit and compliance applications: PCI Case Study

Let’s consider a large retail chain that deploys log warehousing technology to satisfy PCI DSS requirements across its entire organization, from the stores to the datacenters. The retailer went ahead and deployed a commercial log warehousing solution when its PCI auditor strongly suggested that they look into one to satisfy PCI Requirement 10 for the upcoming audit.

Given a diverse combination of PCI in-scope systems, some running legacy operating systems and retail-specific vertical applications, as well as an extreme shortage of skilled IT personnel, the retailer didn’t even consider building its own log management application. As a result, they went from not even collecting their logs to running an advanced log warehouse for collection and analysis.

The project took a few months and followed a phased approach. They decided to implement it from the outside in, based on their risk assessment as well as the auditor's suggestion. However, they also decided to incorporate their main processing systems, which were deployed internally, in Phase I. They started from their DMZ firewalls and then progressed to feeding the following logs into the log warehouse system:

- Actual payment processing applications from all involved servers
- The server logs from the above processing servers
- Select internal firewalls that control access to payment processing systems
- All Internet DMZ firewalls and intrusion-prevention systems
- Routers and other network gear from the in-scope network ranges
- Other in-scope servers which are connected to the main processing servers
- Databases that are involved in payment processing

A few things need to be said about the above approach. The sequence is based on both their risk assessment (thus the DMZ and outside network segments come first) and the complexity of log collection and analysis (thus databases come last).
The former led them to focus on the outside threat first, while the latter delayed some of the log collection efforts: it is much easier to forward Cisco PIX firewall logs to an analysis server than to configure, collect, and analyze database logs. Database logging is a significant challenge due to multiple factors, such as performance degradation, the difficulty of grabbing logs securely, and a relative dearth of database-specific log analytics to run on a data warehouse.

As a result of the phased approach and the clearly defined success criteria (namely, satisfying the PCI requirements), the project was a successful implementation of a log data warehouse. The organization passed the PCI audit with flying colors. In addition, the IT team demonstrated that their PCI logging implementation actually helped address a few other compliance mandates, such as the Sarbanes-Oxley Act, since PCI DSS covers essentially the same areas of IT governance. At the same time, the log data warehouse tools also strengthened their operational troubleshooting capabilities and even improved overall IT efficiency.

Conclusion: Future of Log Data Warehousing

To conclude, let’s try to peek at the future of log warehousing. First, the future will bring even more diversity and much more volume to the world of logs. Higher network bandwidth and bigger servers, coupled with more stringent compliance requirements, will challenge even the best organizations and will make log data warehousing technology even more essential. However, leading log data warehouse vendors will meet these challenges thanks to the strength and flexibility of their architecture as well as research into new log data analytics.

ABOUT THE AUTHOR:

This is an updated author bio, added to the paper at the time of reposting in 2009.

Dr. Anton Chuvakin (http://www.chuvakin.org) is a recognized security expert in the field of log management and PCI DSS compliance.
He is an author of the books "Security Warrior" and "PCI Compliance" and a contributor to "Know Your Enemy II", "Information Security Management Handbook" and others. Anton has published dozens of papers on log management, correlation, data analysis, PCI DSS, and security management (see the list at www.info-secure.org). His blog, http://www.securitywarrior.org, is one of the most popular in the industry.

In addition, Anton teaches classes and presents at many security conferences across the world; he recently addressed audiences in the United States, UK, Singapore, Spain, Russia and other countries. He works on emerging security standards and serves on the advisory boards of several security start-ups.

Currently, Anton is developing his security consulting practice, www.securitywarriorconsulting.com, focusing on logging and PCI DSS compliance for security vendors and Fortune 500 organizations. Dr. Anton Chuvakin was formerly a Director of PCI Compliance Solutions at Qualys. Previously, Anton worked at LogLogic as a Chief Logging Evangelist, tasked with educating the world about the importance of logging for security, compliance and operations. Before LogLogic, Anton was employed by a security vendor in a strategic product management role. Anton earned his Ph.D. degree from Stony Brook University.