Distributed Data Processing
See chapter 3, Stallings' book
Ho Sooi Hock

Outlines
- Types of Data Processing
- Advantages and Disadvantages of CDP
- Driving Factors and Reasons for DDP
- Advantages and Disadvantages of DDP
- Forms of DDP
- Distributed Applications and Databases
- Networking requirements for DDP

Data Processing
- Centralized data processing (CDP)
  - Computers, processing, data, control, staff and development are centralized
- Distributed data processing (DDP)
  - May include a centralized center plus satellite facilities
  - Involves distributed computers, data, and processing
  - Greater flexibility in meeting individual needs
  - More redundancy and more autonomy

Advantages of CDP
- Economies of scale in the purchase and operation of equipment and software
- Shared professional resources
- Ease of control for data processing procurement, standards for programming, data file structure and design of security policies

An Example of CDP [figure]

Distributed Data Processing (DDP)
- “A data processing system in which processing is decentralized, with the computers and storage devices in dispersed locations.”
- “Data processing in which some of the functions are performed in different places and connected by transmission facilities.”
Sources:
  www.iclml.mdx.ac.uk/DELBERT/glossary_terms.htm
  http://www.cogsci.princeton.edu/cgi-bin/webwn?stage=1&word=distributed+data+processing

Distributed Data Processing (DDP), continued
- One in which computers (usually smaller computers) are dispersed throughout an organisation
- Processes information in the most effective way based on operational, economic and geographic considerations
- Redundancy
- Autonomy

An Example of DDP [figure]

Key Factors Driving the Increase in DDP
- Dramatic and continuing decrease in hardware costs, accompanied by an increase in hardware capability
- More powerful desktop computers
- Improved user interfaces (GUI)
- Growing repertoire of applications software
- Technologies to share data across multiple servers

Reasons for DDP
- Need for new applications
  - On large centralized systems, development can take years
  - On small distributed systems, development can be component-based and very fast
- Need for short response time
  - Centralized systems result in contention among users and processes
  - Distributed systems provide dedicated resources

Benefits of DDP
- Responsiveness
- Availability
- Correspondence to organisation patterns
- Resource sharing
- Incremental growth
- Avoiding the “all or nothing” approach
- Increased user involvement and control
- Decentralised operation and centralised control
- End-user productivity
- Distance and location independence
- Privacy and security
- Vendor independence
- Flexibility

Potential Drawbacks of DDP
- Difficult to test and to diagnose failures
- Dependent on communication technology
- Incompatibility among equipment
- Incompatibility among data
- Network management and control
- Difficulty in control of corporate information resources
- Sub-optimisation
- Duplication of effort

Forms of Distributed Data Processing
- Distributed Applications
- Distributed Devices
- Network Management and Control
- Distributed Data

Distributed Applications
- One application is split into components that are dispersed among a number of machines
- One application is replicated on a number of machines
- A number of different applications are distributed among a number of machines
- Can be characterised by vertical or horizontal partitioning

Vertical Partitioning
- Data processing is distributed in a hierarchical structure
- This distribution may reflect organisational structure, or may simply be the most appropriate for the application

Examples of Vertical Partitioning
- Insurance
  - Data processing distribution is often a two-level hierarchy: branches prepare new contracts and process claims.
  - Summary information is sent to a head office.
- Process control
  - Each major operation is controlled by a workstation.
  - The microprocessors are responsible for the automated control of sensors.
  - All the workstations are linked to a higher-level computer concerned with operations planning, optimization, management of information and general corporate data processing.

Horizontal Partitioning
- Data processing is distributed among a number of computers that have a peer relationship
- There is no concept of master/slave
- Computers in a horizontal configuration normally operate autonomously
- One application is replicated on different systems, e.g. word processor and spreadsheet
- Different applications on different systems

Examples of Horizontal Partitioning
- Office automation support system
  - Secretarial staff and other personnel are equipped with personal computers linked in a network.
  - Each user’s PC contains software packages useful to that user (word processing, spreadsheet).
  - The systems exchange messages, files and other information.
- Air traffic control
  - Each regional center operates autonomously.
  - Within the center, several computers are used to process radar and radio data to provide a visual status to the traffic controllers.

Distributed Devices
- Support a distributed set of devices that can be controlled by processors, e.g. ATMs or laboratory interface equipment
- Distribution of processing technology to various locations of the manufacturing process in factory automation

Network Management and Control
- Control access to the facilities in the distributed system
- Monitor the status of various components in the distributed system
- Manage the communications facility to ensure availability and responsiveness
- Each distributed computer must include some management and control logic to interact with the central network management system.

Client/Server Architecture
- Combines the best aspects of both distributed and centralised computing
- Users work on powerful workstations or PCs, which support end-user programming and use of off-the-shelf software
- Good response time inherent in distributed architecture
- Cost-effective, e.g. economies of scale by centralising support for specialised functions
- Flexible and scalable

Distributed Data
- Small organisations can function with a collection of files, e.g. report files, spreadsheets
- Large organisations need one or more databases
- Distributed organisations will often require a distributed database
- “A distributed database is one in which portions of the data are dispersed among a number of computer systems.”
- Three ways to organise a database: centralised, replicated and partitioned

Centralised Database
- No duplication of data, little reorganisation required
- Used when security and integrity of the data are important
- Drawbacks
  - Contention when accessing data simultaneously
  - Slower response time
  - Poor reliability

Replicated Database
- All or part of the database is copied at two or more computers
- Less contention and improved response time
- Provides backup and recovery
- High reliability
- Drawbacks
  - High storage and database reorganisation costs
- Replication strategy variants
  - Real time (two-phase commit)
  - Near real time (batch backups, e.g. every 30 minutes)
  - Deferred (bulk transfer, once or twice per day)

Partitioned Database
- The database exists as distinct and non-overlapping segments
- Dispersed load, hence good response time
- Eliminates single point of failure
- Drawbacks
  - Difficult to produce ad hoc management reports
  - Complex processing logic
  - Not good for requests involving data from multiple partitions

Networking Implications of DDP
- Connectivity
  - The ability of components in the system to exchange data
- Availability
  - The percentage of the time that a particular function or application is available for users
- Performance
  - Response time

Summary
- Differences between CDP and DDP
- Advantages and disadvantages of CDP and DDP
- Different forms of DDP
- Distributed applications and databases
- Networking requirements

Acknowledgements
This module has been taught by Dr. Payam Mamaani Barnaghi since 2005.
Most slides have been adapted, with some changes, from his lecture materials and the original works of William Stallings.
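
Supplementary sketch: Partitioned Database
The Partitioned Database slide says that the database exists as distinct, non-overlapping segments, and that requests involving data from multiple partitions are a weak point. The following is a minimal in-memory sketch of that idea, not an implementation from the slides or from Stallings. The site names, the modulo placement rule, and the class and method names are all assumptions made for illustration.

```python
# Hypothetical site names; the slides do not name any sites.
SITES = ["branch_a", "branch_b", "branch_c"]


def site_for(customer_id: int) -> str:
    """Assign each customer id to exactly one site (non-overlapping segments).

    The modulo rule is an assumed placement policy chosen for simplicity.
    """
    return SITES[customer_id % len(SITES)]


class PartitionedDatabase:
    """Toy partitioned store: one dictionary per site, no replication."""

    def __init__(self):
        self.segments = {site: {} for site in SITES}

    def put(self, customer_id: int, balance: int) -> None:
        # A write touches only the site that owns the record.
        self.segments[site_for(customer_id)][customer_id] = balance

    def get(self, customer_id: int) -> int:
        # A single-record read also touches only one site.
        return self.segments[site_for(customer_id)][customer_id]

    def total_balance(self):
        """Ad hoc report across all data: every site must be contacted."""
        total = 0
        sites_contacted = 0
        for rows in self.segments.values():
            sites_contacted += 1
            total += sum(rows.values())
        return total, sites_contacted


db = PartitionedDatabase()
for cid in range(6):
    db.put(cid, 100 * cid)

print(site_for(4), db.get(4))   # one site serves the lookup
print(db.total_balance())       # the report visits all three sites
```

A single-record request is served by one site, which is where the dispersed load comes from, while the report has to visit every segment, which is why ad hoc management reports are listed as a drawback.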