Data Organization

              Chapter 04
•   Data Representation
      Number Systems
      Character Codes
•   Computer File Concepts (Data Organization)
      Elements of a Computer File
      Types of Files
      Access to Files
      File Organization & Access Methods
      File Maintenance
•   Information Processing Methods
      Batch Processing
      On-line Processing
      Real Time Processing
      Centralized Processing
      Decentralized Processing
      Distributed Processing
 Data Representation
• Data appears in several forms,
  …including graphic images, pictures, and sound.
• However…
  There are TWO basic types of data, which are
      characters and numbers.
  Characters include letters and special symbols.
      Example: SLIIT – Metro
  Numbers are processed using arithmetic operations
    such as add, subtract, multiply and divide.
      In this case we assign values to numbers and the
       processing results in new values.
Data Representation
• People represent data by using a group of
  characters, such as a group of letters for a
  name, or group of digits for a quantity.
• Computers ???
   Cannot represent this data in the same form
    that people use !!!
   The data is stored using NUMBER SYSTEMS.

Number Systems
There are FOUR important number systems.
All numbering systems are based on
    TWO concepts:
     1. Absolute value
     2. Positional value

Number Systems
• Decimal (Integer) Number System:
   People use this number system.
   Absolute values:        0 to 9
   Positional values are powers of:   10

• Binary Number System:
   Computers use this number system.
   Absolute values:        0 and 1
   Positional values are powers of:   2

Number Systems
• Octal Number System:
  Absolute values:        0 to 7
  Positional values are powers of:   8

• Hexadecimal Number System:
  Most complex numbering system.
  Absolute values:        0 to 15 (written 0-9, then A-F for 10 to 15)
  Positional values are powers of:   16
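
The two concepts above (absolute value and positional value) can be sketched in a few lines of Python. This is only an illustration; the function name positional_value is made up for this example and is not part of the original slides.

```python
def positional_value(digits, base):
    """Return the value of a digit string in the given base (2 to 16)."""
    symbols = "0123456789ABCDEF"
    total = 0
    for ch in digits.upper():
        absolute = symbols.index(ch)       # absolute value of this digit
        if absolute >= base:
            raise ValueError(f"digit {ch!r} is not valid in base {base}")
        # Moving one position left multiplies everything so far by the base.
        total = total * base + absolute
    return total

print(positional_value("1010101", 2))   # 85
print(positional_value("125", 8))       # 85
print(positional_value("55", 16))       # 85
```

The same quantity (decimal 85) has a different digit string in each base because the positional values differ.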

Number Systems
• Integer
  The simplest type of number that we want to
   store and manipulate as data is the integer.
  Whole numbers without decimal places.
  They are used to represent things that cannot be
   divided into smaller simpler things.
       e.g. number of people in an office, number of houses in
        a city.
  Could be either positive or negative numbers
  e.g. int

  Can be stored 100% accurately in a computer
  [Figure: a number with its most significant digit and least significant digit labelled]
Data Representation
• Computers represent data (integer) using
  patterns of ON-OFF states in a series of
  electronic circuits.

• Computer stores data converting the data to this
  two-state representation…
   Which is called binary representation.
Data Representation
• To show data in binary representation,
   people use…
    the digit 1 for the “ON” state, and
    the digit 0 for the “OFF” state.
• The digits 1 and 0 are called binary digits, or bits.

Bits & Bytes
• Bit is the smallest piece of data that can
  be recognized and used by a digital computer.
• A bit can either be a 1 or a 0.
• Byte is a grouping of eight bits.
• Nibble is half a byte. e.g. 1100.
• Byte is used as the basic unit of measuring
  the size of memories.

  Pure Binary Form
• Sign & Magnitude Format
    The decimal number is…
       First converted into binary, and
       An area of one or more bytes is selected, depending on the number of
        bits needed for the value.
       The first bit of the area is used to store the sign.
         – By convention;
              • 0 is positive, and
              • 1 is negative.
              • Note: In practice, negative numbers are usually stored in such a way that
                when they are added to their positive equivalents the result is zero - this is
                called “two's complement” representation.
       Example (one-byte area):
         – Decimal 85 (base 10) = Binary 1010101
                 Sign    Magnitude
                  0      1010101
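
As a rough sketch of the two formats mentioned above, the following Python functions encode an integer in a one-byte area. The function names and the default of 8 bits are assumptions for this example.

```python
def sign_magnitude(value, bits=8):
    """Encode an integer as one sign bit followed by (bits - 1) magnitude bits."""
    magnitude = abs(value)
    if magnitude >= 2 ** (bits - 1):
        raise ValueError("value does not fit in the chosen area")
    sign = "1" if value < 0 else "0"           # 0 is positive, 1 is negative
    return sign + format(magnitude, f"0{bits - 1}b")

def twos_complement(value, bits=8):
    """Encode an integer in two's complement using the given number of bits."""
    if not -(2 ** (bits - 1)) <= value < 2 ** (bits - 1):
        raise ValueError("value does not fit in the chosen area")
    return format(value % (2 ** bits), f"0{bits}b")

print(sign_magnitude(85))     # 01010101
print(sign_magnitude(-85))    # 11010101
print(twos_complement(-85))   # 10101011
```

Adding the two's complement patterns for 85 and -85 gives 256, which is zero once the carry out of the eighth bit is dropped.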
Octal and Hexadecimal
• Octal – Base 8 and Hexadecimal – Base
  16 numbers are also used in computing.

• These two number bases can be used to
  represent binary numbers in a short form
 i.e. Conversion between binary and
  hexadecimal or octal, and vice versa, can be
  done by grouping bits (three per octal digit,
  four per hexadecimal digit) without any arithmetic.
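
The short-form conversion can be sketched as a grouping of bits, three at a time for octal and four at a time for hexadecimal. The helper name binary_to_base is hypothetical.

```python
def binary_to_base(bits, group):
    """Convert a binary string by grouping bits: group=3 for octal, 4 for hex."""
    symbols = "0123456789ABCDEF"
    pad = (-len(bits)) % group                 # leading zeros to fill the first group
    bits = "0" * pad + bits
    groups = [bits[i:i + group] for i in range(0, len(bits), group)]
    # Each group is looked up on its own; no arithmetic on the whole number is needed.
    return "".join(symbols[int(g, 2)] for g in groups)

print(binary_to_base("1010101", 3))   # 125
print(binary_to_base("1010101", 4))   # 55
```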
Binary Coded Decimal (BCD)
• The values of 0 to 9 are each stored in four bits.
• This means that the number stored is not
  in the pure binary integer format, so special care needs
  to be taken by the processor
   if a value is stored in this way, as the
     processing required in mathematical
     instructions is more complex.

BCD (Binary Coded Decimals)
 •0   0000
 •1   0001
 •2   0010
 •3   0011
 •4   0100
 •5   0101
 •6   0110
 •7   0111
 •8   1000
 •9   1001
BCD (Binary Coded Decimals)
• Here each digit of a number is stored using a
  four-bit group, e.g. decimal 532:

           0101 0011 0010
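
A minimal sketch of BCD encoding for non-negative whole numbers, using the table above. The function names to_bcd and from_bcd are made up for illustration.

```python
def to_bcd(number):
    """Encode a non-negative integer as space-separated four-bit groups."""
    return " ".join(format(int(digit), "04b") for digit in str(number))

def from_bcd(groups):
    """Decode space-separated four-bit groups back into an integer."""
    return int("".join(str(int(group, 2)) for group in groups.split()))

print(to_bcd(532))                   # 0101 0011 0010
print(from_bcd("0101 0011 0010"))    # 532
```

Note that 532 in pure binary is 1000010100, a different pattern, which is why the processor has to treat BCD values specially.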
Real Numbers

  Cannot always be stored 100% accurately in a computer

Real Numbers
Real Numbers can be stored in two ways:

• Fixed Point Representation
   12.45
   670.75
• Floating Point Representation
   0.125 x 10^4
   6.24 x 10^3

Floating Point
• It is used when it is necessary to measure
  a value that can change smoothly.
• Such values are often recognizable by the
  presence of a decimal point or a fraction.

• Examples include temperature, length and weight.

Fixed Point Representation
• This is the normal way numbers are
  represented in day to day life.

7890.35     7867.456

• There are problems representing very large
  and very small numbers this way.

 Floating Point Representation
• Floating-point numbers usually require more
  bytes to represent than integers.
• They are based on logarithms and thus contain
  three parts.
• Sign – for negative and positive numbers.
• Exponent – representing the power that a base
  number is raised to.
• Mantissa – a number that is multiplied by the
  base raised to the exponent.
• The number is encoded and stored as:
          M * b^e
  where M is the mantissa, b the base and e the exponent.
Floating Point Representation

       4.345 x 10^4

      Mantissa: 4.345     Radix or base: 10     Exponent: 4

Floating Point Representation
In computing the mantissa is taken as
a fraction.

        0.4345 x 10^5

      Mantissa: 0.4345     Radix or base: 10     Exponent: 5
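
The same idea with base 2 instead of base 10 can be seen using Python's standard math.frexp, which returns a fractional mantissa and an exponent. The wrapper name decompose is an assumption for this sketch; real hardware formats differ in detail.

```python
import math

def decompose(x):
    """Split x into sign, fractional mantissa and base-2 exponent."""
    mantissa, exponent = math.frexp(abs(x))    # abs(x) == mantissa * 2**exponent
    sign = "-" if math.copysign(1.0, x) < 0 else "+"
    return sign, mantissa, exponent

sign, m, e = decompose(43450.0)
print(sign, m, e)
print(m * 2 ** e)             # 43450.0
print(decompose(8.0))         # ('+', 0.5, 4)
```

For non-zero values the mantissa returned here always lies between 0.5 and 1, so it is a fraction, as on the slide above.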
  Representing Characters
• Since the production of the first electronic computers in
  the late 1940s,
   there have been various methods developed for
     representing characters in computers.
• Characters are also stored in binary format.
• A chart is used to assign a number to each character.
• The common codes used today:
   American Standard Code for Information
     Interchange (ASCII) - 7 bit code
   Extended Binary Coded Decimal Interchange Code
     (EBCDIC) - 8 bit code
   Universal Code (UNICODE), a newer worldwide
     character standard - 16 bit code
ASCII
• The most common way of representing characters in
  personal computers.
• The name ASCII is pronounced “as-key”.
• A character consists of 7 bits
• 128 characters/combinations of 7 bits
   a-z, A-Z, 0-9, punctuation and some special characters.
• Computers use 8 bits, giving 256 characters
• Each character uses a different binary number.
• E.g, the name JOHN in ASCII is:
        J       O       H       N
     1001010 1001111 1001000 1001110
• Although ASCII is a seven-bit code, computers use
  an eight-bit version.
   ASCII used in:
    All microcomputers, … computers, and …
   Example:
               A               C
               65              67
            01000001        01000011
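
The ASCII patterns shown above can be reproduced with Python's built-in ord, which returns a character's code number. The helper name to_ascii_bits is made up for this example.

```python
def to_ascii_bits(text, width=7):
    """Return the ASCII code of each character as a binary string of `width` bits."""
    for ch in text:
        if ord(ch) > 127:
            raise ValueError(f"{ch!r} is not an ASCII character")
    return " ".join(format(ord(ch), f"0{width}b") for ch in text)

print(to_ascii_bits("JOHN"))      # 1001010 1001111 1001000 1001110
print(to_ascii_bits("AC", 8))     # 01000001 01000011
print(ord("A"), ord("C"))         # 65 67
```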
EBCDIC
• Mainly used in mini and mainframe computers.
• A character consists of 8 bits
• 256 characters/combinations of 8 bits
• The name of this code is pronounced “eb-si-dick”
• The chart is slightly different from the ASCII chart.
• E.g, the name JOHN in EBCDIC is:
        J       O        H        N
    11010001 11010110 11001000 11010101
• Notice that 32 bits are needed for the name, eight
  for each character.
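
For comparison, the EBCDIC patterns can be checked with one of Python's EBCDIC codecs. The slides do not say which EBCDIC code page is meant; cp037 is assumed here, and the helper name to_ebcdic_bits is hypothetical.

```python
def to_ebcdic_bits(text, codec="cp037"):
    """Return each character's EBCDIC code (assumed code page cp037) as 8 bits."""
    return " ".join(format(byte, "08b") for byte in text.encode(codec))

print(to_ebcdic_bits("JOHN"))               # 11010001 11010110 11001000 11010101
print(len("JOHN".encode("cp037")) * 8)      # 32
print("JOHN".encode("ascii") == "JOHN".encode("cp037"))   # False
```

The last line shows why ASCII and EBCDIC machines cannot exchange text without a translation step: the same name produces different bytes.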
• Data representation problems
  ASCII and EBCDIC computers cannot
   communicate without special hardware/software
  256 characters may not be enough in
   the future
  Potential successor is 16 bit Unicode

Unicode
• In an effort to create a single code for all
  languages, a 16-bit code has been developed.
• With 16-bits, there are 65,536 combinations.
  This was intended to be enough for the characters used in the alphabets
   and writing systems of the world (later versions of
   Unicode extend beyond 16 bits).
• Developed as the need arose to support many
  other languages.
• The 16 bit code is used to even support Asian
  languages.
• Although not widely used yet…
  It may some day be the standard code on all
   computers.
Other Simple Data Types
• Number and character are both examples of
  very simple forms of data.
• Computers understand these forms of data and
  can perform operations on them directly.
• Sometimes other simple forms of data are also used.
• One example that you may have already
  encountered is Boolean data - data that
  represents the values True and False.

Data Organization
• A file holds data that is required for providing
  information.
   I.e. it contains a collection of related
    information, which is processed as a single unit and
    is further divided into records and fields.
• Some files are processed at regular intervals to
  provide this information (e.g. payroll file) and
• Others will hold data that is required at regular
  occurrences (e.g. a file containing prices of items).

Data Organization
• From what viewpoints is a file considered
  in terms of its organization???
   There are TWO common ways of
     viewing files:
   Logical Files
   Physical Files

Logical File
• A “logical file” is a file viewed in terms of:
   what data items its records contain and
   what processing operations may be
    performed upon the file.
   The user of the file will normally adopt
    such a view.
• A logical file can usually give rise to a
  number of alternative physical file
  implementations.
Physical File
• A “physical file” is a file viewed in terms of:
  how the data is stored on a storage
    device such as a magnetic disk and
  how the processing operations are
    made possible.

Elements of a Computer File
• File is the simplest way to store data
  A file is made up of records
  A record is made up of fields
  A field is made up of characters
• Records are uniquely identified by a key (also
  called a key field in each record)
• Key fields are coded fields of characters

Elements of a Computer File
• Character: A character is the smallest element
  in a file and can be alphabetic, numeric or
  a special symbol.

• Field: An item of data within a record is called a
  field – it is made up of a number of characters,
  e.g. name, a date or an amount.

• Record: A record is made up of a number of
  related fields, e.g. a customer record, or an
  employee payroll record.
Types of Files
• Depending on the nature and the
  permanency of data, files can be classified
  as follows:
 Master File
 Transaction File
 Work File
 Security File
 Audit File
 Reference File
Master File
• A file that is permanent in the sense that it is
  never, apart from the time of its creation, empty.
• The normal means of updating a master file are:
   Adding records,
   Amending records,
   Deleting records
• It needs to be updated regularly to reflect the
  current status of an organization.
• Example:
   Employee File
Master File
• It can be subdivided into TWO types:
   Static master file:
      The data described is of a permanent or
       semi-permanent nature.
        –Products, Suppliers, Employers, etc.
   Dynamic master file:
      The data described is of a transitory nature.
        –Customer order, project files, etc.
Transaction File
• Transaction files contain data that record events.
• Records in a transaction file are placed in time
  order and are processed by a computer to
  update related master file records.
• Also known as movement file.
• Example:
   Customer’s orders for products (to update an order
    file)
   Details of price changes for products (to update the
    product file)
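
The relationship between a transaction file and a master file can be sketched as below. The record layout, the action names ("add", "amend", "delete") and the function name apply_transactions are all assumptions made for this illustration.

```python
def apply_transactions(master, transactions):
    """Return an updated copy of the master file; transactions are applied in time order."""
    updated = {key: dict(record) for key, record in master.items()}
    for action, key, data in transactions:
        if action == "add":
            if key in updated:
                raise KeyError(f"record {key} already exists")
            updated[key] = dict(data)
        elif action == "amend":
            updated[key].update(data)      # raises KeyError if the record is missing
        elif action == "delete":
            del updated[key]
        else:
            raise ValueError(f"unknown action {action!r}")
    return updated

# Hypothetical product master file, keyed by product code.
master = {"P001": {"name": "Pen", "price": 10.0},
          "P002": {"name": "Book", "price": 50.0}}
# Hypothetical transaction file: a price change, a new product, a deletion.
transactions = [("amend", "P001", {"price": 12.0}),
                ("add", "P003", {"name": "Ruler", "price": 5.0}),
                ("delete", "P002", None)]

new_master = apply_transactions(master, transactions)
print(sorted(new_master))            # ['P001', 'P003']
print(new_master["P001"]["price"])   # 12.0
```

The old master is left untouched, so it can be kept as a previous generation.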

 Work File
• Work files are temporary files that are created
  after one stage of processing to be used in the
  next stage.
• A work file is deleted when processing is
  complete.
• Work files are generated during processes that
  involve certain types of sorting and merging.
• They are also very typical of batch processing
  where a job may consist of a number of steps
  during each of which a different program is run.
• Work files are used to hold intermediate results between job
  steps.
• Also known as transfer file.
Security File

• These files are taken in order to provide
  back up copies, in case of loss or damage
  to current version.

 Audit File
• Audit files are a particular type of transaction file.
• They record events and enable the auditor to check
  the correct functioning of computer procedures.
• This is accomplished by storing copies of all the
  transactions, which cause the system’s master files
  to be updated.
• Example:
   Invoice number, date, cash amount for each invoice
   Date and amount of cash received
• The records are created at the time of master file
  update and accumulated.
Reference Files
• These contain data that may be required
  for reference purposes during processing
  or inquiring data.
• Also known as table file.
• Example:
  A price list, discount tables, tax tables, etc., are
   usually stored in such files.

  Files Organization & Files Access
• Key Fields
   When files of data are created…
     a means of access to particular records within those files is needed.
     In general terms this is usually done by giving each record a
      “key” field by which the record will be organized or identified.
     Such a key is normally a unique identifier of a record and is then
      called the “Primary Key”.


Files Organization & Files Access
• Key Fields
   Sometimes the primary key is made from the
    combination of two fields in which case it may be
    called a
   “Composite Key” or “Compound Key”.

Files Organization & Files Access
• Key Fields
   Any other field used for the purpose of identifying
    records, or sets of records, is called a “Secondary
    Key”.

Files Organization & Files Access
• System designers choose to organize, access,
  and process records and files in different ways
  depending on the type of application and the
  needs of users.
• The file organizations commonly used in
  business data processing applications are:
   Serial file organization
   Sequential file organization
   Direct / random file organization
   Indexed / indexed sequential file organization
• The selection of a particular file organization
  depends upon the type of application.
   Files Organization & Files Access
• Serial File Organization
   There is no sequence or order to records that are stored
    in a serial file.
   They are stored in the order they are received and new
    records are added at the end of the file.
   In order to access (read or amend) a record in a serial file,
    the whole file has to be read from the beginning until the
    desired record is located.
   This form of file is normally useful for storing
    transaction data, or for storing data prior to sorting.
   To add a record to a serial file, the file can be opened at the
    end and a new record appended.
   A record can be marked for deletion so that it is ignored
    when reading.
   A utility can then be used to remove marked records later.
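
The serial file operations described above can be sketched with an in-memory list standing in for the file. The class name SerialFile, its method names and the "key" and "deleted" fields are assumptions for this illustration.

```python
class SerialFile:
    """Records are kept in arrival order; a list stands in for the storage medium."""

    def __init__(self):
        self.records = []

    def append(self, record):
        # New records always go at the end of the file.
        self.records.append({"deleted": False, **record})

    def find(self, key):
        """Read from the beginning; return (record, number of records read)."""
        for reads, record in enumerate(self.records, start=1):
            if not record["deleted"] and record["key"] == key:
                return record, reads
        return None, len(self.records)

    def mark_deleted(self, key):
        record, _ = self.find(key)
        if record is not None:
            record["deleted"] = True       # ignored by later reads

    def compact(self):
        """The 'utility' step: physically remove records marked for deletion."""
        self.records = [r for r in self.records if not r["deleted"]]

f = SerialFile()
for key in (30, 10, 20):
    f.append({"key": key})
print(f.find(20)[1])     # 3 (every earlier record had to be read first)
f.mark_deleted(10)
f.compact()
print([r["key"] for r in f.records])   # [30, 20]
```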
 Files Organization & Access Method
• Sequential File Organization
   Records are organized in sequence.
   The records in the file are usually arranged into ascending
    or descending order, based on the key field value.
   Records can only be accessed sequentially.
   Key field in a record in a sequential file identifies which
    record was retrieved.
   It is important that the sequence key uniquely identifies the
    record, otherwise duplicates may exist and the sequence of
    records remains unpredictable and uncertain.
   To add a record to a sequential file, each record must be
    copied over to a new file, adding in the new record at the
    appropriate place.
   A record can be marked for deletion so that it is ignored
    when reading. A utility can then be used to remove marked
    records later.
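
Adding a record to a sequential file by copying to a new file can be sketched as follows. Lists stand in for the old and new files, and the function name insert_record and the "key" field are assumptions.

```python
def insert_record(old_file, new_record):
    """Copy every record to a new file, placing new_record in key order."""
    new_file = []
    placed = False
    for record in old_file:
        if record["key"] == new_record["key"]:
            raise ValueError("duplicate key: the sequence key must be unique")
        if not placed and new_record["key"] < record["key"]:
            new_file.append(new_record)    # the appropriate place has been reached
            placed = True
        new_file.append(record)
    if not placed:
        new_file.append(new_record)        # new key is higher than all existing keys
    return new_file

old_file = [{"key": 10}, {"key": 20}, {"key": 40}]
new_file = insert_record(old_file, {"key": 30})
print([r["key"] for r in new_file])   # [10, 20, 30, 40]
print(len(old_file))                  # 3 (the old file is unchanged)
```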
Files Organization & Access Method
• Advantage:
  Easy to organize and maintain.
• Disadvantage:
  The entire sequential file may need to
    be read just to retrieve and update a few
    records.
• Storage media:
  Magnetic tape, magnetic disk, and
    optical disk.
Files Organization & Access Method
• Direct File Organization
   Also called random file organization
   Where the record is stored is determined by the key
    field value
   The records are not stored in any particular sequence.
   Instead, a mathematical relationship is established
    between the record key value and the address of its
    physical location on the storage media.
   Records can be retrieved in random order.
   Must use secondary storage with random access
  Direct File Organization is very suitable for random
    processing or when a small proportion of the total
    number of records in a file is to be processed.
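
One possible "mathematical relationship" between key and address is the remainder after dividing the key by the number of storage slots. The slides do not name a specific method; the division-remainder rule, the linear probing used when two keys collide, and the class name DirectFile are all assumptions for this sketch.

```python
class DirectFile:
    """A list of slots stands in for addressable locations on a disk."""

    def __init__(self, slots=11):
        self.slots = [None] * slots

    def address(self, key):
        return key % len(self.slots)       # assumed key-to-address rule

    def store(self, key, data):
        start = self.address(key)
        for step in range(len(self.slots)):
            i = (start + step) % len(self.slots)   # try the next slot on a collision
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, data)
                return i
        raise OverflowError("file is full")

    def fetch(self, key):
        start = self.address(key)
        for step in range(len(self.slots)):
            i = (start + step) % len(self.slots)
            if self.slots[i] is None:
                return None                # an empty slot means the key is absent
            if self.slots[i][0] == key:
                return self.slots[i][1]
        return None

f = DirectFile()
print(f.store(1005, "Silva"))      # 4
print(f.store(1016, "Perera"))     # 5 (1016 also maps to 4, so the next slot is used)
print(f.fetch(1016))               # Perera
```

Most of the eleven slots stay empty here, which illustrates why this organization may use storage less efficiently.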
Files Organization & Access Method
 • Advantages:
    The access to, and retrieval of a record is
     quick and direct.
    Any record can be located and retrieved
     directly in a fraction of a second without the
     need for a sequential search of the file.
 • Disadvantage:
    May be less efficient in the use of storage
     space.

  Files Organization & Access Method
• Indexed File Organization
    Also called indexed sequential
    Actually two files - a data file and an index file
   Data file is sequential with records in increasing order by
     key field
    Index file has one record per record in the data file
    Index record contains the key value and location of each
     record in the data file.
    In addition, an index may hold pointers to only certain data records in the
     data file.
    The index, which helps in locating a record in the data file,
     basically consists of two columns:
      The first column contains the value of the record key.
      The second, a pointer to the physical location of the record in the
       data file.
Files Organization & Access Method
• Indexed File Organization
   Can be accessed sequentially or randomly,
    requiring random access storage
   Indexed sequential files are used for
    applications that sometimes require the file
    to be accessed randomly, and sometimes
    require the file to be processed sequentially.
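
A small sketch of an indexed sequential file: a data file in key order plus a two-column index of (key, location). The sample records and the names data_file, index, read_random and read_sequential are made up for illustration; the standard bisect module is used to search the index.

```python
import bisect

# Data file: records held in increasing order of key field.
data_file = [{"key": 101, "name": "Perera"},
             {"key": 205, "name": "Silva"},
             {"key": 310, "name": "Fernando"}]

# Index file: first column is the key value, second is the record's location.
index = [(record["key"], position) for position, record in enumerate(data_file)]
index_keys = [key for key, _ in index]

def read_random(key):
    """Look the key up in the index, then go straight to that location."""
    i = bisect.bisect_left(index_keys, key)
    if i < len(index_keys) and index_keys[i] == key:
        return data_file[index[i][1]]
    return None

def read_sequential():
    """Process the whole data file in key order, ignoring the index."""
    return [record["key"] for record in data_file]

print(read_random(205)["name"])   # Silva
print(read_random(999))           # None
print(read_sequential())          # [101, 205, 310]
```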

File Maintenance
• The data is made available by organizing the
  data in one of the ways described above.
• Files are kept current by regular updates
• Updates can mean
   Adding records
   Deleting records
   Changing data in records
• The process of adding, adjusting and removing
  records from a file is called file maintenance.

File Maintenance
• File Updating
  File updating means bringing the
    information in the file up to date so that it reflects the
    current position.
  In other words it is making the data current.
  The updating method depends on the
    file storage media.

File Maintenance
• File Updating
 When files on magnetic tapes are updated a new
  file is generated.
 The old file can be preserved for future use, in
  case the new file gets destroyed or damaged.
 This technique of updating is also called the
  “grandfather-father-son” technique.
 This name is given due to the fact that as a result
  of updating a generation of files is produced.
 This provides automatic security against data
  corruption on files and an automatic audit trail.
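
The generation idea can be sketched as a list that keeps the three most recent master files together with the transactions that produced them. The data layout and the function name run_update are assumptions for this example.

```python
def run_update(generations, transactions, keep=3):
    """Create a new master from the latest one and keep only `keep` generations."""
    latest_master = generations[-1]["master"]
    new_master = {**latest_master, **transactions}     # old master is left intact
    generations.append({"master": new_master, "transactions": transactions})
    del generations[:-keep]                            # drop anything older than the grandfather
    return new_master

generations = [{"master": {"A": 1}, "transactions": {}}]
run_update(generations, {"B": 2})
run_update(generations, {"A": 5})
run_update(generations, {"C": 7})
print(len(generations))                 # 3 (grandfather, father, son)

# If the son is damaged, it can be rebuilt from the father and the son's transactions.
father, son = generations[-2], generations[-1]
rebuilt = {**father["master"], **son["transactions"]}
print(rebuilt == son["master"])         # True
```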
   File Maintenance

[Figure: Updating Files Held on Magnetic Tape (Magnetic Tape Updating)]
File Maintenance
• Hit Rate
  This refers to the percentage of records
   updated against the total records in a
   file.
  For e.g. if only 50 records were
   updated in a master file containing 200
   records, then the hit rate will be:
     (50/200) * 100 = 25%.

 File Maintenance
• Fixed Length Format File
   Every single field and record in a file will have
    a defined length.
   As a result all records in the file will be of the
    same length.
   However, in each record, within fields, there
    may be blank spaces.
   Databases and 4th generation language files
    usually use this type of file, as it is easy to
    maintain and update data.
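
A fixed-length record can be sketched with Python's standard struct module. The layout here (6-character key, 20-character name, 4-byte amount) and the function names are hypothetical; values shorter than their field are padded with blanks, as described above, and longer values would be cut off.

```python
import struct

# Hypothetical layout: 6-byte key, 20-byte name, 4-byte unsigned amount.
RECORD = struct.Struct("<6s20sI")

def pack_record(key, name, amount):
    """Build one fixed-length record, padding text fields with blank spaces."""
    return RECORD.pack(key.ljust(6).encode("ascii"),
                       name.ljust(20).encode("ascii"),
                       amount)

def read_record(file_bytes, n):
    """Read record number n directly: it always starts at n * record length."""
    key, name, amount = RECORD.unpack_from(file_bytes, n * RECORD.size)
    return key.decode("ascii").rstrip(), name.decode("ascii").rstrip(), amount

file_bytes = pack_record("E001", "Silva", 500) + pack_record("E002", "Fernando", 750)
print(RECORD.size)                  # 30
print(read_record(file_bytes, 1))   # ('E002', 'Fernando', 750)
```

Because every record has the same length, the position of any record can be calculated, which is part of what makes this format easy to maintain.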
File Maintenance
• Variable Length Format File

 The record length will vary due to more
  or fewer fields in each record and due to
  some fields containing more data.

File Maintenance
• Data Backup and Recovery
  The designers of computer systems need to
    provide a reasonable backup facility that will
    restore the lost data in the event of an
    emergency, at a reasonable cost.
  There are a variety of methods used for backing
    up computer data.
  Three basic systems, which reflect a
    responsible approach to preserving data, are:
     Periodic full backups;
     Incremental backups;
     File generation backups;
 File Maintenance
• Data Backup and Recovery
 Periodic Full Backups
  The most common method used to ensure
   that data is not lost is to make periodic full
   backups.
   This is done by making regular copies of
   all data files and storing them in a secure
   location.
   The designed time period between
   backups will be dependent on the amount of
   data being processed through the computer
   system.
File Maintenance
• Data Backup and Recovery
 Incremental Backups
   In this approach, for most backups only
    the changes since the previous
    backup are recorded.
   The units of backup may be complete
    files, or records within files.
   At intervals a full backup is taken.

  File Maintenance
• Data Backup and Recovery
  File Generation Backups
    Where a file updating process produces a new
     master file and leaves the old one intact, the
     generation system of file backup may be used.
    This means that when a file is updated, the
     previous version is retained, along with any
     associated transactions.
    If the latest version of the master file is damaged
     or destroyed, it can be recovered by updating the
     previous master file with the corresponding
     transaction file.
  File Maintenance
• Data Recovery
• In the previous example of file generation backups, it can
  be seen that recovery is possible up to the point of the last
  backup being taken.
• A transaction log is often used to record all transactions
  following a backup, so that when a failure occurs, the
  system can be restored to the state that it was in
  immediately prior to the failure.
• Checks must be made on a regular basis to confirm that recovery
  is indeed possible, and that staff involved are aware of the
  backup processes used, the physical location of backup
  storage media, and its identification.
Information Processing Methods
• Batch Processing
  The prominent feature of batch processing is
    that the data is collected over a defined
    period of time, processed together and the
    information is obtained.
  The period of data collection can vary.
  For e.g. the end of the day, end of the week,
    end of the month or until a sufficient amount of
    data is collected.

 Information Processing Methods
• Batch Processing
  Advantages
     Large volumes of data are processed at once.
     This makes good use of the computer’s time
      because of off-line storage and operations.
     Processing efficiency is considered to be more
      important than rapid turnaround of results.
  Disadvantages
     Data capture and transmission are done
      manually, which is slow.
     The data may not be accurate. Thus verification
      and data control procedures usually have to be
      implemented with such a system.
     Information is not up to date.
     Distribution of results is done manually.

 Information Processing Methods
• On-line Processing
  In on-line processing, as soon as the data is received,
   it is entered into the computer, verification is done,
   validation is performed and the semi-processed data is
   stored for further processing.
  The updating and production of information is the same as in
   batch processing.
  Hence this mode of data processing is slightly faster than
   batch processing.
  On-line processing systems feature random and rapid input
   of transactions and immediate and direct access to record
   contents as and when needed.
 Information Processing Methods
• On-line Processing
  Advantages:
     A fast response time.
     Validity checks can be made on transactions at the
      time they are entered.
     Mistakes are picked up immediately.
     Decisions are based on a more complete
      set of information.
  Example:
     The customer credit status may be checked. The
      operator may then be given the option to accept an
      order, despite an unsatisfactory credit position, or to
      have that order put on a rejected list to be reported later.
Information Processing Methods
• Real Time Processing
  As soon as the data is entered on-line, verification and
   validation are performed, data is processed, files are
   updated, and information is generated and distributed to
   those who require it.
  Hence for a single input, the entire processing cycle is
   carried out.
  Real-time means immediate response from the
   computer.
  A system in which a transaction accesses and updates a
   file quickly enough to affect the original decision making
   is called a real time system.
Information Processing Methods
• Real Time Processing
 A real time system may be described as
  an on-line processing system with
  severe time limitations.
 It may be noted that a real time system
  uses on-line processing, but an on-line
  system need not necessarily operate in
  real time mode.

 Information Processing Methods
• Real Time Processing
  Advantages:
     Provides a fast turnaround of information
     Provides up to date information
     Helpful in decision support systems.
  Disadvantages:
     Hardware and software are very expensive.
     Difficult to plan and design the system
     Poor testing might provide incorrect or inaccurate
      results.

Information Processing Methods
• Real Time Processing
  Examples:
    Air traffic control system
    Reservation system
    Systems that provide immediate updating of
     customer accounts in savings banks.

 Information Processing Methods
• Centralized Processing
  This is a technique where the data is processed in one
   central location.
  For e.g. consider a group of companies which has
   branches in different areas.
     If the company adopts a centralized processing system, then the
      data will be collected from various departments and brought to the
      central office for processing.
     After processing the results are distributed to the respective
      branches.
  To transfer data and information from and to the central
   office the branches can use an electronic data
   communication method or use a manual data exchange
   method.
• Centralized Processing
• Advantages
 Since the processing is done from a central
   location proper control and standards can be
   maintained.
 The central location can monitor the operations
   of all branches and assess their performance.
 Less staff is required when compared to
   decentralized or distributed processing.
 Cost of hardware and software will be less.

 Information Processing Methods
• Centralized Processing
• Disadvantages
  Branches have no flexibility to obtain special
    information or use processing methods suitable
    for branch operations.
  The processing of data takes time and as a
    result immediate information is not available.
  There is high risk in the event of a central office
    computer failure, because all branches are
    depending on the central computer.

 Information Processing Methods
• Decentralized Processing
 In this method each branch will have a
  computer system, as a result it will be
  able to process its own data.
 The processed information may be
  distributed between the branches and
  the central office.

Information Processing Methods
• Decentralized Processing
• Advantages
 Branches have flexibility of using the most
  appropriate system that will suit the branch.
 Information can be processed faster and each
  branch has access to its own information.
 Shared risk in the event of failure of a branch
  or a central office computer.

 Information Processing Methods
• Decentralized Processing
• Disadvantages
 Since each branch processes its own data,
  standards may not be maintained.
 The central office will have no control over the
  branch activities.
 A large number of staff and technical personnel
  are required.
 There can be duplication of data in each of the
  branches.
 Information Processing Methods
• Distributed Processing
    This is an extension of decentralized processing where
     the branches will have their own information
     processing systems and databases, interconnected to
     share data, information and processing functions.
    The following definition can be considered:
       A distributed system is one in which there are several
        autonomous but interacting processors and/or data
        stored at different geographical locations.
    The development of database and network
     technologies has contributed to the growth of this
     approach.
 Information Processing Methods
• Distributed Processing
     This will enable the sharing and transfer of
       databases from one location to another
       (mobile data bases), performing some
       processing activities on behalf of other
       locations and reducing risk of data loss.
     Communication problems and computer
       security will be the main threats the firm will
       have to face.
Thank You!!!!


Description: Fundamentals of IT