Reliability and Safety


What can go wrong?

Risks of Computing

- Computers support many aspects of our safety and security:
  - Fly-by-wire aircraft
  - Patient monitoring and care administration
  - Financial transactions
  - Telephone networks
  - Military surveillance and responses

Possible States of a Computer

- Functioning correctly
- Functioning incorrectly
- Down
- Intentionally off

Causes of Computer Failure

- Faulty design
- Sloppy implementation
- Careless or insufficiently trained users
- Poor user interfaces
- Hardware/software malfunctions
- Specification errors
- Inconsistency between a system's scope and its application

Computer Users' Perspective

- Should understand the limitations of computers
- Need for proper training
- Need for responsible use
- Should know the difference between good products and bad ones

Computer Professionals' Perspective

- Study computer failures
- Study computer ethics

Perspective of an Educated Member of Society

- Helps us evaluate the reliability and safety of various computer applications
- Helps us evaluate computer technology

Three Categories of Failures

- Problems for individuals
- System failures that affect large numbers of people or cost large amounts of money
- Problems in safety-critical applications

Problems for Individuals

- Billing errors, caused by:
  - Flaws in the design and/or implementation of programs
  - Not enough care: input errors
  - Not enough testing: no check that values fall within a reasonable range
  - Not enough training

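A reasonable-range check is one of the cheapest defenses against billing errors: instead of mailing whatever the program computes, hold any bill that looks implausible for a person to look at. The sketch below is a minimal illustration under stated assumptions: the `Bill` record, the helper names, and the thresholds (an absolute cap and a multiple of the customer's usual bill) are all hypothetical, not taken from any real billing system.

```python
from dataclasses import dataclass

# Hypothetical thresholds; a real system would derive these from billing history.
MAX_ABSOLUTE = 10_000.0          # no ordinary household bill should exceed this
MAX_MULTIPLE_OF_AVERAGE = 5.0    # flag bills far above the customer's usual amount


@dataclass
class Bill:
    customer_id: str
    amount: float          # amount due, in dollars
    average_amount: float  # customer's typical bill, in dollars (0 if unknown)


def needs_human_review(bill: Bill) -> bool:
    """Return True if the bill falls outside a reasonable range."""
    if bill.amount < 0:
        return True  # negative charges are suspicious
    if bill.amount > MAX_ABSOLUTE:
        return True  # implausibly large in absolute terms
    if bill.average_amount > 0 and bill.amount > MAX_MULTIPLE_OF_AVERAGE * bill.average_amount:
        return True  # far above this customer's usual bill
    return False


def route_bills(bills):
    """Split bills into those safe to send automatically and those held for a person."""
    send, hold = [], []
    for bill in bills:
        (hold if needs_human_review(bill) else send).append(bill)
    return send, hold


if __name__ == "__main__":
    send, hold = route_bills([Bill("A", 80.0, 75.0), Bill("B", 12000.0, 60.0)])
    print([b.customer_id for b in send], [b.customer_id for b in hold])
```

The point of the design is that automation handles the common case while the check supplies the "human common sense" that purely automated processing lacks.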
Database Accuracy Problems

- Information in databases is not always accurate
- Automatic entry of information: mistakes can be overlooked
- Copies of incorrect information can spread to other systems
- Users are not knowledgeable enough about the system

Causes

- Large population
- Most of our financial interactions are with strangers
- Automated processing without human common sense
- Overconfidence in the accuracy of data
- Lack of accountability

Consumer Hardware and Software

- First releases usually have more serious errors
- Regularly sold with known bugs
- Hardware also has flaws
- Tradeoff between cost, debugging, and marketing
- Dishonesty, denial of problems, and lack of adequate response to complaints

System Failures

- Cost large amounts of money
- Complete shutdown of basic services
- Areas:
  - Communications
  - Business and financial systems
  - Military

Why?

- Not enough testing
- Technical difficulties
- Poor management decisions
- Dishonesty in promoting the system and responding to problems

Communications

- Phone service
- How bad?
  - Pagers
  - Phone calls
  - 911
  - Airport communications
  - Cellular phones

Business and Financial Systems

- Stock exchange
- ATMs
- Pepsi contest
  - Too many winning tickets issued

Destroying Business

- Loss of sales
- Incorrect information affects business
- Dissatisfied customers
- Incorrect prices
- Loss of data

Military

- Data management
- Weapons system design
- Battle simulation
- Battle management
  - Command/control
  - Communications
  - Intelligence
- Nuclear war

Why?

- Not enough testing
- Technical difficulties
- Poor management decisions
- Dishonesty in promoting the system and responding to problems
- Result: project delays and abandonment

The Denver Airport Baggage System

- Outbound luggage checked at ticket counters or curbside
  - To be delivered anywhere in under 10 minutes
  - Via an automated system of cars on tracks
  - To connecting flights or terminals
- Laser scanners
- Tracks carrying 4,000 cars

Problems Encountered

- Cars crashed into each other at intersections
- Luggage was misrouted, dumped, or flung
- Cars that were needed sat idle or were parked

Specific Problems

- Real-world problems
  - Scanners got dirty
  - Scanners were knocked out of alignment
- Software error
  - Cars were rerouted to a waiting area and sat idle

Causes

- Time allowed for development and testing was insufficient
- Significant changes in specifications were made after the project began
- Not enough debugging time
- Poor management
- Unrealistic plan

Safety-Critical Applications

- Use of computers in these areas is increasing rapidly
- Use of computers in these areas can save money
- Areas:
  - Military
  - Medical applications
  - Power plants
  - Aircraft
  - Trains

Aircraft - Fly by Wire

- Pilots do not directly control the plane
- Pilot actions are input to computers that control the aircraft systems
- Pilot interaction is critical
- Need for an easy way to override the computers
- Easy transfer between automatic and manual control

Air Traffic Control

- Long delays
- Increased risk of collisions
- Old machines and computer systems
- Political: the government spends money elsewhere

Case Study - Therac-25

- Software-controlled radiation therapy machine used to treat people with cancer
- Problems:
  - Massive overdoses administered
  - Repeated overdoses due to a faulty display
  - Deaths
- Operated in a dual mode: electron beam or X-ray photon beam

Why?

- Lapses in good safety design
- Insufficient testing
- Bugs in the software that controlled the machines
- Inadequate system for reporting and investigating accidents and deaths

Specific Problems

- Some hardware safety features were eliminated in newer models
- Software reused from older systems was assumed to be correct
- The machine malfunctioned frequently
- Weaknesses in the design of the operator interface
- Inadequate explanation of error messages, if there were any

Specific Problems (continued)

- The machine let the operator resume with a single keypress instead of shutting down automatically
- Inadequate documentation
- Poor test plan

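The contrast between one-key resumption and automatic shutdown can be made concrete with a small state machine. This is a hypothetical sketch, not the Therac-25's actual software: the `TreatmentController` class, its states, the mode strings, the idea of an independent "sensed mode" reading, and the supervisor-approved reset are all assumptions chosen to illustrate two principles, namely refusing to fire unless the hardware configuration is independently confirmed, and latching a fault so that an ordinary "proceed" key cannot clear it.

```python
from enum import Enum, auto


class State(Enum):
    READY = auto()
    TREATING = auto()
    PAUSED = auto()  # benign pause; the operator may resume
    FAULT = auto()   # latched: only a deliberate full reset clears it


class TreatmentController:
    """Hypothetical controller in which a fault latches instead of allowing one-key resume."""

    def __init__(self):
        self.state = State.READY
        self.fault_log = []  # every fault is recorded for later investigation

    def start(self, requested_mode: str, sensed_mode: str) -> bool:
        # Interlock: fire only if an independent sensor confirms the
        # hardware is configured for the requested mode.
        if self.state is not State.READY:
            return False
        if requested_mode != sensed_mode:
            self.fault(f"mode mismatch: requested {requested_mode}, sensed {sensed_mode}")
            return False
        self.state = State.TREATING
        return True

    def pause(self) -> None:
        if self.state is State.TREATING:
            self.state = State.PAUSED

    def fault(self, reason: str) -> None:
        self.state = State.FAULT
        self.fault_log.append(reason)

    def resume(self) -> bool:
        """The operator's one-key 'proceed': works after a pause, never after a fault."""
        if self.state is State.PAUSED:
            self.state = State.TREATING
            return True
        return False

    def full_reset(self, supervisor_approved: bool) -> bool:
        """Clearing a fault requires a separate, deliberate action."""
        if self.state is State.FAULT and supervisor_approved:
            self.state = State.READY
            return True
        return False
```

The design choice is to make the safe state the default: after a malfunction the machine stays stopped until someone deliberately investigates, rather than relying on the operator to judge whether a cryptic error message matters.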
Software Errors - Bugs

- The fatal error had a simple fix
- Fixes can still be complex and expensive, and the machine cannot be used while it is being fixed
- Bugs
  - Can be intermittent and hard to detect
  - Importance of self-checking
  - Importance of good programming techniques

Overconfidence

- Leaving out necessary changes
- Ignoring error messages
- Not using backup devices (video or audio)

Conclusion and Perspective

- Irresponsibility can lead to criminal charges
- Responsibility can lead to merit awards
- Importance of good software development
- Consequences of carelessness, cutting corners, unprofessional work, or attempts to avoid responsibility
- Lack of appreciation for risks
- Poor training

Ways to Prevent Problems

- Good computer systems
- Good training
- Accountability
- Individual responsibility
- Management responsibility
- IEEE Code of Ethics

Increasing Reliability and Safety

- What goes wrong?
  - Many lines of code and many programmers
  - See page 130
  - Problems are managerial, technical, social, legal, and ethical

Overconfidence

- Not appreciating the risks
- Ignoring warnings
- Not consulting manuals

Professional Techniques

- Use good software engineering techniques at all stages of development:
  - Specifications
  - Design
  - Implementation
  - Documentation
  - Testing

Professional Techniques (continued)

- Study the techniques and tools available
- Know or learn enough about the application field and the software or systems being used

Why Study Failures?

- Provides technical lessons
- Leads to improved hardware and software products
- Provides ethical data
- Leads to improved ethical codes and laws

Lessons Learned

- Accidents are not the result of unknown scientific principles but rather of a failure to apply well-known engineering practices
- Accidents will not be prevented by technological fixes alone; prevention requires control of all aspects of the development and operation of the system

Lessons Learned (continued)

- Software developers need to recognize the limitations of software and use hardware safety mechanisms

User Interfaces and Human Factors

- Aircraft control systems
  - The pilot needs feedback to understand what the automated system is doing at any time
  - The system should behave as the pilot expects
  - A workload that is too low can be dangerous

Redundancy and Self-Checking

- Redundancy with "judging" (voting) between copies: expensive
- Complex systems collect information to diagnose and correct errors
- Audit trails are vital
- Detailed records help protect against theft and help trace and correct errors

Redundancy and Self-Checking (continued)

- Some systems are designed to constantly monitor themselves and correct problems automatically
- In such systems, half of the computing power may be devoted to checking
- On detecting an error, the system:
  - Closes off part of the system
  - Reroutes
  - Corrects the problem and reroutes again

TESTING

- CRITICAL!
- Principles and techniques exist
- Another company can be hired to perform independent verification and validation

Dangerous Tendencies

- Operators
  - Bypass check mechanisms out of familiarity
- Technicians
  - Blame random mechanical or signal glitches rather than software
- Corporate managers
  - Initially deny and ignore problems, then cover them up
  - Finally, deal with expensive fixes

Overall Lessons Learned

- Should not declare a problem understood on the first hypothesis
- Should not expect management to follow through on field reports
- Overconfidence in software leads to economical designs with marginal safety

Overall Lessons Learned (continued)

- Enforcement of software engineering practices is often abysmal
- Basing risk assessments on individual subsystems often leads to unrealistic optimism

Lessons for Systems Engineering

- Hardware backups are valuable
- Software must not be presumed innocent
- Software-related errors can be indistinguishable from hardware glitches
- Audit trails are critical
- Risk estimates are subjective
- User feedback is valuable

Lessons for Software Engineering

- Documentation should be ongoing
- Designs should be kept simple
- Testing should be built into the software
- Software must be tested both outside the system and within it
- Reused software should be tested like new software

Lessons for Oversight

- Users are more likely than monitoring officials to make the initial observations
- Users need reliable information in order to be maximally valuable

Laws and Regulations

- Criminal and civil penalties
- Suits against the company that designs or sells the system
- Criminal charges when fraud or criminal negligence occurs
- Need for contracts
- Need for well-designed laws and standards

Regulation

- Requirement for approval by a government agency before a new product can be sold
  - Including specific testing requirements
- The profit motive can cause skimping on safety
- In some cases it is better to abandon a project
- Customers may be unable to judge a product's safety
- It is hard to sue large companies

Regulation (continued)

- Expensive and time-consuming
- Newer procedures may not be enforced
- Lots of paperwork

Professional Licensing

- Licensing of software development professionals to protect against poor quality and unethical behavior
  - Specific training
  - Passing a competency exam
  - Ethical requirements
  - Continuing education