# chapter4


```
Data & Data Organization

Chapter 04

Outline
• Data Representation
  - Number Systems
  - Character Codes
• Computer File Concepts (Data Organization)
  - Elements of a Computer File
  - Types of Files
  - File Organization & Access Methods
  - File Maintenance
• Information Processing Methods
  - Batch Processing
  - On-line Processing
  - Real Time Processing
  - Centralized Processing
  - Decentralized Processing
  - Distributed Processing

Data Representation
• Data appears in several forms, including graphic images, pictures, and sound.
• However, there are TWO basic types of data: characters and numbers.
  - Characters include letters and special symbols.
    Example: SLIIT – Metro
  - Numbers are processed using arithmetic operations such as add, subtract, multiply and divide.
    In this case we assign values to numbers, and the processing results in new values.
• People represent data by using a group of characters, such as a group of letters for a name or a group of digits for a quantity.
• Computers cannot represent this data in the same form that people use.
  Instead, the data is stored using NUMBER SYSTEMS.

Number Systems
There are FOUR important number systems.
All number systems are based on TWO concepts:
1. Absolute value
2. Positional value

• Decimal Number System:
  - People use this number system.
  - Absolute values: 0 to 9
  - Positional values are powers of 10
• Binary Number System:
  - Computers use this number system.
  - Absolute values: 0 to 1
  - Positional values are powers of 2
• Octal Number System:
  - Absolute values: 0 to 7
  - Positional values are powers of 8
• Hexadecimal Number System:
  - The most complex of the four number systems.
  - Absolute values: 0 to 15 (written as 0 to 9 and A to F)
  - Positional values are powers of 16

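Worked example (illustrative; not part of the original slides): the same quantity written in each number system, using absolute values multiplied by positional values.
  Decimal 85       = 8 x 10 + 5 x 1
  Binary 1010101   = 1 x 64 + 0 x 32 + 1 x 16 + 0 x 8 + 1 x 4 + 0 x 2 + 1 x 1 = 85
  Octal 125        = 1 x 64 + 2 x 8 + 5 x 1 = 85
  Hexadecimal 55   = 5 x 16 + 5 x 1 = 85
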
Integers
• The simplest type of number that we want to store and manipulate as data is the integer.
  - Whole numbers without decimal places.
  - They are used to represent things that cannot be divided into smaller, simpler things,
    e.g. the number of people in an office, the number of houses in a city.
  - Can be either positive or negative numbers.
  - e.g. int
• Can be stored 100% accurately in a computer.
[Figure: an integer with its most significant digit and least significant digit labelled]

Data Representation
• Computers represent data (integers) using patterns of ON-OFF states in a series of electronic circuits.
• A computer stores data by converting it to this two-state representation, which is called binary representation.
• To show data in binary representation on paper, people use:
  - the digit 1 for the “ON” state, and
  - the digit 0 for the “OFF” state.
• The digits 1 and 0 are called BINARY DIGITs, or BITs.

Bits & Bytes
• Bit is the smallest piece of data that can
be recognized and used by digital
computers.
• A bit can either be a 1 or a 0.
• Byte is a grouping of eight bits. e.g.
11000110.
• Nibble is half a byte. e.g. 1100.
• Byte is used as the basic unit of measuring
the size of memories.

Pure Binary Form
• Sign & Magnitude Format
  - The decimal number is first converted into binary.
  - An area of one or more bytes is selected, depending on the number of bits needed for the decimal value.
  - The first bit of the area is used to store the sign.
    By convention, 0 is positive and 1 is negative.
• Note: Negative numbers are usually stored in such a way that when they are added to their positive equivalents the result is zero; this is called “twos complement” representation.
• Example (one-byte area):
    Decimal 85 (base 10) = Binary 1010101

    Sign   Magnitude
    0      1010101

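Worked example (illustrative; not part of the original slides): storing -85 in a one-byte area.
  Sign & magnitude:   1 1010101             (sign bit 1, magnitude 85)
  Twos complement:    +85        = 01010101
                      invert     = 10101010
                      add 1      = 10101011  (this pattern represents -85)
  Check:              01010101 + 10101011 = 1 00000000; the carry out of the byte is discarded, leaving 00000000.
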
• Octal (base 8) and hexadecimal (base 16) numbers are also used in Computer Science.
• These two number bases can be used to represent binary numbers in a short-form notation,
  i.e. conversion between binary and octal or hexadecimal (and vice versa) can be done without any calculation.

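Worked example (illustrative; not part of the original slides): converting by grouping bits from the right.
  Binary 01010101 in groups of four:   0101 0101     -> hexadecimal 55
  Binary 01010101 in groups of three:  01 010 101    -> octal 125
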
Binary Coded Decimal (BCD)
• The values 0 to 9 are each stored in a four-bit group.
• This means that the number stored is not in the integer format. Special care needs to be taken by the processor if a value is stored in this way, as the processing required in mathematical instructions is more complex.

  Digit   BCD
  0       0000
  1       0001
  2       0010
  3       0011
  4       0100
  5       0101
  6       0110
  7       0111
  8       1000
  9       1001

• Here each digit of a number is stored using its own bit pattern:

  5    3    2
  0101 0011 0010

Real Numbers
• In general, real numbers cannot be stored 100% accurately in a computer.
• Real numbers can be stored in two ways:
  - Fixed Point Representation, e.g. 12.45 and 670.75
  - Floating Point Representation, e.g. 0.125 x 10^4 and 6.24 x 10^3

Floating Point
• It is used when it is necessary to measure a value that can change smoothly and continuously.
• Such values are often recognizable by the presence of a decimal point or a fraction.
• Examples are temperature, length, weight and voltage.

Fixed Point Representation
• This is the normal way numbers are
represented in day to day life.

7890.35     7867.456

• There are problems representing very large
and very small numbers this way.

568900000000000.678
0.0000000000000000567

Floating Point Representation
• Floating-point numbers usually require more bytes to represent than integers.
• They are based on exponent (scientific) notation and thus contain three parts:
  - Sign – for negative and positive numbers.
  - Exponent – the power that a base number is raised to.
  - Mantissa – the number that is multiplied by the base raised to the exponent.
• The number is encoded and stored as:
    M x b^e
  where M is the mantissa, b is the base and e is the exponent.
• Example:
    4.345 x 10^4      (mantissa 4.345, base 10, exponent 4)
• In computing, the mantissa is taken as a fraction:
    0.4345 x 10^5     (mantissa 0.4345, base 10, exponent 5)

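Worked example (illustrative; not part of the original slides): rewriting values so that the mantissa is a fraction.
  43450     = 4.345 x 10^4     = 0.4345 x 10^5
  7890.35   = 7.89035 x 10^3   = 0.789035 x 10^4
  0.00567   = 5.67 x 10^-3     = 0.567 x 10^-2
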
Representing Characters
• Since the production of the first electronic computers in the late 1940s, various methods have been developed for representing characters in computer systems.
• Characters are also stored in binary format.
• A chart is used to assign a number to each character.
• The common codes used today:
  - American Standard Code for Information Interchange (ASCII) - 7-bit code
  - Extended Binary Coded Decimal Interchange Code (EBCDIC) - 8-bit code
  - Universal Code (UNICODE), a new worldwide character standard - 16-bit code

ASCII
• The most common way of representing characters in personal computers.
• The name ASCII is pronounced “as-key”.
• A character consists of 7 bits, giving 128 characters (combinations of 7 bits):
  a-z, A-Z, 0-9, punctuation and some special characters.
• Although ASCII is a seven-bit code, computers use an eight-bit version, which allows 256 characters.
• Each character uses a different binary number.
• E.g., the name JOHN in ASCII is:
    J       O       H       N
    1001010 1001111 1001000 1001110
• ASCII is used in all microcomputers, many minicomputers, mainframe computers, and supercomputers.
• Example in the eight-bit version:
    Character   A          C
    Decimal     65         67
    Binary      01000001   01000011

EBCDIC
• Mainly used in minicomputers and mainframe computers.
• A character consists of 8 bits, giving 256 characters (combinations of 8 bits).
• The name of this code is pronounced “eb-si-dick”.
• The chart is slightly different from the ASCII chart.
• E.g., the name JOHN in EBCDIC is:
    J        O        H        N
    11010001 11010110 11001000 11010101
• Notice that 32 bits are needed for the name, eight for each character.

ASCII vs. EBCDIC
• Data representation problems:
  - ASCII and EBCDIC computers cannot communicate without special hardware/software.
  - 256 characters may not be enough in the future.
  - The potential successor is 16-bit Unicode.

UNICODE
• In an effort to create a single code for all characters, a 16-bit code has been developed.
• With 16 bits, there are 65,536 combinations, intended to be enough for the characters used in the alphabets and writing systems of the world.
• Developed as the need arose to support many other languages.
• The 16-bit code can even support Asian languages.
• Although not widely used yet, it may some day be the standard code on all computers.

Other Simple Data Types
• Number and character are both examples of
very simple forms of data.
• Computers understand these forms of data and
can perform operations on them directly.
• Sometimes other simple forms of data are also
used.
• One example that you may have already
encountered is Boolean data - data that
represents the values True and False.

Data Organization
• A file holds data that is required for providing information,
  i.e. it contains a collection of related information which is processed as a single unit and is further divided into records and fields.
• Some files are processed at regular intervals to provide this information (e.g. a payroll file), and
• others hold data that is required at regular occurrences (e.g. a file containing prices of items).
• From what viewpoints is a file considered in terms of its organization?
  There are TWO common ways of viewing files:
  - Logical Files
  - Physical Files

Logical File
• A “logical file” is a file viewed in terms of:
  - what data items its records contain, and
  - what processing operations may be performed upon the file.
  The user of the file will normally adopt such a view.
• A logical file can usually give rise to a number of alternative physical file implementations.

Physical File
• A “physical file” is a file viewed in terms of:
  - how the data is stored on a storage device such as a magnetic disk, and
  - how the processing operations are …

Elements of a Computer File
• A file is the simplest way to store data:
  - A file is made up of records.
  - A record is made up of fields.
  - A field is made up of characters.
• Records are uniquely identified by a key (also called a key field) in each record.
• Key fields are coded fields of characters.

• Character: A character is the smallest element in a file and can be alphabetic, numeric or special.
• Field: An item of data within a record is called a field; it is made up of a number of characters, e.g. a name, a date or an amount.
• Record: A record is made up of a number of related fields, e.g. a customer record or an employee payroll record.

Types of Files
• Depending on the nature and the
permanency of data, files can be classified
as follows:
Master File
Transaction File
Work File
Security File
Audit File
Reference File

Master File
• A file that is permanent in the sense that it is never empty, apart from the time of its creation.
• The normal means of updating a master file is by:
  - Amending records
  - Deleting records
• It needs to be updated regularly to reflect the current status of an organization.
• Example: Employee File
• It can be subdivided into TWO types:
  - Static master file: the things the file describes are of a permanent or semi-permanent nature.
    Example: Products, Suppliers, Employers, etc.
  - Dynamic master file: the things the file describes are of a transitory nature.
    Example: Customer orders, project files, etc.

Transaction File
• Transaction files contain data that record events.
• Records in a transaction file are placed in time order and are processed by a computer to update related master file records.
• Also known as a movement file.
• Example:
  - Customers’ orders for products (to update an order file)
  - Details of price changes for products (to update the product file)

Work File
• Work files are temporary files that are created after one stage of processing to be used in the next stage.
• A work file is deleted when processing is complete.
• Work files are generated during processes that involve certain types of sorting and merging.
• They are also very typical of batch processing, where a job may consist of a number of steps, during each of which a different program is run.
• Work files are used to hold intermediate results between job steps.
• Also known as a transfer file.

Security File
• These files are taken in order to provide backup copies, in case of loss of or damage to the current version.

Audit File
• Audit files are a particular type of transaction file.
• They record events and enable the auditor to check the correct functioning of computer procedures.
• This is accomplished by storing copies of all the transactions which cause the system’s master files to be updated.
• Example:
  - Invoice number, date and cash amount for each invoice raised
  - Date and amount of cash received
• The records are created at the time of the master file update and accumulated.

Reference Files
• These contain data that may be required for reference purposes during processing or when inquiring about data.
• Also known as a table file.
• Example: a price list, discount tables, tax tables, etc. are usually stored in such files.

File Organization & Access Methods
• Key Fields
  - When files of data are created, a means of access to particular records within those files is needed.
  - In general terms, this is usually done by giving each record a “key” field by which the record will be organized or identified.
  - Such a key is normally a unique identifier of a record and is then called the “Primary Key”.
  - Sometimes the primary key is made from the combination of two fields, in which case it may be called a “Composite Key” or “Compound Key”.
  - Any other field used for the purpose of identifying records, or sets of records, is called a “Secondary Key”.

• System designers choose to organize, access, and process records and files in different ways depending on the type of application and the needs of users.
• The commonly used file organizations are:
  - Serial file organization
  - Sequential file organization
  - Direct / random file organization
  - Indexed / indexed sequential file organization
• The selection of a particular file organization depends upon the type of application.

• Serial File Organization
  - There is no sequence or order to the records stored in a serial file.
  - They are stored in the order they are received, and new records are added at the end of the file.
  - In order to access (read or amend) a record in a serial file, the whole file has to be read from the beginning until the desired record is located.
  - This form of file is normally useful for storing transaction data, or for storing data prior to sorting.
  - To add a record to a serial file, the file can be opened at the end and a new record appended.
  - A record can be marked for deletion so that it is ignored when reading. A utility can then be used to remove marked records later.

• Sequential File Organization
  - Records are organized in sequence.
  - The records in the file are usually arranged in ascending or descending order, based on the value of the key field.
  - Records can only be accessed sequentially.
  - The key field of a record in a sequential file identifies which record was retrieved.
  - It is important that the sequence key uniquely identifies the record; otherwise duplicates may exist and the sequence of records remains unpredictable and uncertain.
  - To add a record to a sequential file, each record must be copied over to a new file, adding in the new record at the appropriate place.
  - A record can be marked for deletion so that it is ignored when reading. A utility can then be used to remove marked records later.
  - Advantage: easy to organize, maintain, and understand.
  - Disadvantage: the entire sequential file may need to be read just to retrieve and update a few records.
  - Storage media: magnetic tape, magnetic disk, and optical disk.

• Direct File Organization
  - Also called random file organization.
  - Where a record is stored is determined by its key field value.
  - The records are not stored in any particular sequence.
  - Instead, a mathematical relationship is established between the record key value and the address of its physical location on the storage medium.
  - Records can be retrieved in random order.
  - Must use secondary storage with random access capabilities.
  - Direct file organization is very suitable for random processing, or when a small proportion of the total number of records in a file is to be processed.
  - Advantage: access is quick and direct. Any record can be located and retrieved directly in a fraction of a second, without the need for a sequential search of the file.
  - Disadvantage: may be less efficient in the use of storage space.

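Worked example (illustrative; the slides do not say which mathematical relationship is used, so the division-remainder method and all figures here are assumptions):
  address = key mod number of storage locations
  With 97 storage locations:
    key 1234 -> 1234 mod 97 = 70
    key 5000 -> 5000 mod 97 = 53
    key 1331 -> 1331 mod 97 = 70   (same address as key 1234, so one of the two records needs an overflow location)
  Locations that no key maps to stay empty, which is one reason direct files may use storage space less efficiently.
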
• Indexed File Organization
  - Also called indexed sequential.
  - Actually two files: a data file and an index file.
  - The data file is sequential, with records in increasing order by key field.
  - The index file has one record per record in the data file.
  - Each index record contains the key value and the location of the corresponding record in the data file.
  - In addition, there is an index with pointers to certain data records in the file.
  - The index, which helps in locating a record in the data file, basically consists of two columns:
    - The first column contains the value of the record key.
    - The second contains a pointer to the physical location of the record in the file.
  - Can be accessed sequentially or randomly, requiring random-access storage.
  - Indexed sequential files are used for applications that sometimes require the file to be accessed randomly and sometimes require the file to be processed sequentially.

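Illustration (hypothetical keys and locations; not part of the original slides):
  Index file              Data file (in key order)
  Key    Location         Location   Record
  1001   0                0          1001 ...
  1005   1                1          1005 ...
  1012   2                2          1012 ...
  Random access: look up key 1005 in the index, then go directly to location 1.
  Sequential access: read the data file from location 0 onwards.
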
File Maintenance
• The data is made available by organizing the data in one of the three ways.
• File maintenance covers operations on the records of a file, such as:
  - Deleting records
  - Changing data in records
• File Updation
  - File updation means bringing the information in the file up to date so that it reflects the current position.
  - In other words, it is making the data current.
  - The updation method depends on the file storage media.
  - When files on magnetic tape are updated, a new file is generated.
  - The old file can be preserved for future use, in case the new file gets destroyed or damaged.
  - This technique of updation is also called the “grandfather-father-son” technique.
  - The name reflects the fact that, as a result of updation, a generation of files is produced.
  - This provides automatic security against data corruption on files and an automatic audit trail.
[Figure: Updating Files Held on Magnetic Tape (Magnetic Tape Updating)]

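Illustration (hypothetical file names; not part of the original slides): three generations produced by two update runs.
  Run 1:  master v1 (grandfather) + transaction file T1  ->  master v2 (father)
  Run 2:  master v2 (father)      + transaction file T2  ->  master v3 (son)
  If master v3 is damaged, repeat Run 2 using the retained master v2 and T2.
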
• Hit Rate
  - This refers to the percentage of records updated against the total number of records in a file.
  - E.g. if only 50 records were updated in a master file containing 200 records, then the hit rate will be:
      (50 / 200) x 100 = 25%

• Fixed Length Format File
  - Every field and record in the file has a defined length.
  - As a result, all records in the file are of the same length.
  - However, within the fields of each record there may be blank spaces.
  - Databases and 4th generation language files usually use this type of file, as it is easy to maintain and update data.
• Variable Length Format File
  - The record length varies, because records may contain more or fewer fields and some fields may contain more data.

• Data Backup and Recovery
  - The designers of computer systems need to provide a reasonable backup facility that will restore lost data in the event of an emergency, at a reasonable cost.
  - There are a variety of methods used for backing up computer data.
  - Three basic systems, which reflect a responsible approach to preserving data, are:
    - Periodic full backups
    - Incremental backups
    - File generation backups
• Periodic Full Backups
  - The most common method used to ensure that data is not lost is to make periodic full backups.
  - This is done by making regular copies of all data files and storing them in a secure location.
  - The designed time period between backups depends on the amount of data being processed through the computer system.
• Incremental Backups
  - In this approach, for most backups only the changes since the previous backup are recorded.
  - The units of backup may be complete files, or records within files.
  - At intervals, a full backup is taken.
• File Generation Backups
  - Where a file updating process produces a new master file and leaves the old one intact, the generation system of file backup may be used.
  - This means that when a file is updated, the previous version is retained, along with any associated transactions.
  - If the current master file is damaged or destroyed, it can be recovered by updating the previous master file with the corresponding transaction file.
• Data Recovery
  - In the previous example of file generation backups, it can be seen that recovery is possible up to the point of the last backup being taken.
  - A transaction log is often used to record all transactions following a backup, so that when a failure occurs, the system can be restored to the state that it was in immediately prior to the failure.
  - Checks must be made on a regular basis to confirm that recovery is indeed possible, and that the staff involved are aware of the backup processes used, the physical location of the backup storage media, and its identification.

Information Processing Methods
• Batch Processing
  - The prominent feature of batch processing is that the data is collected over a defined period of time, processed together, and the information is obtained.
  - The period of data collection can vary,
    e.g. the end of the day, the end of the week, the end of the month, or until a sufficient amount of data has been collected.
  - Advantages
    - Large volumes of data are processed at once.
    - This makes good use of the computer’s time because of off-line storage and operations.
    - Processing efficiency is considered to be more important than rapid turnaround of results.
    - Distribution of results is done manually.
  - Disadvantages
    - Data capture and transmission are done manually, which is slow.
    - The data may not be accurate. Thus verification and data control procedures usually have to be implemented with such a system.
    - Information is not up to date.

• On-line Processing
  - In on-line processing, as soon as the data is received it is entered into the computer, verification and validation are performed, and the semi-processed data is stored for further processing.
  - The updation and production of information is the same as in batch processing.
  - Hence this mode of data processing is slightly faster than batch processing.
  - On-line processing systems feature random and rapid input of contents as and when needed.
  - A fast response time.
  - Validity checks can be made on transactions at the time they are entered.
  - Mistakes are picked up immediately.
  - Decisions are based on a more complete set of information.
  - Example: The customer credit status may be checked. The operator may then be given the option to accept an order, despite an unsatisfactory credit position, or to have that order put on a rejected list to be reported later.

• Real Time Processing
  - As soon as the data is entered on-line, verification and validation are performed, the data is processed, files are updated, and information is generated and distributed to those who require it.
  - Hence, for a single input, the entire processing cycle is carried out.
  - Real-time means an immediate response from the computer.
  - A system in which a transaction accesses and updates a file quickly enough to affect the original decision making is called a real time system.
  - A real time system may be described as an on-line processing system with severe time limitations.
  - Note that a real time system uses on-line processing, but an on-line system need not necessarily operate in real time mode.
  - Advantages
    - Provides a fast turnaround of information.
    - Provides up-to-date information.
    - Helpful in decision support systems.
  - Disadvantages
    - Hardware and software are very expensive.
    - It is difficult to plan and design the system.
    - Poor testing might provide incorrect or inaccurate information.
  - Examples:
    - Air traffic control systems
    - Reservation systems
    - Systems that provide immediate updating of customer accounts in savings banks

• Centralized Processing
  - This is a technique where the data is processed in one central location.
  - E.g. consider a group of companies which has branches in different areas.
    - If the company adopts a centralized processing system, the data will be collected from the various departments and brought to the central office for processing.
    - After processing, the results are distributed to the respective branches.
  - To transfer data and information to and from the central office, the branches can use an electronic data communication method or a manual data exchange method.
  - Advantages
    - Since the processing is done from a central location, proper control and standards can be maintained.
    - The central location can monitor the operations of all branches and assess their performance.
    - Fewer staff are required when compared to decentralized or distributed processing.
    - The cost of hardware and software will be less.
  - Disadvantages
    - Branches have no flexibility to obtain special information or use processing methods suitable for branch operations.
    - The processing of data takes time, and as a result immediate information is not available.
    - There is a high risk in the event of a central office computer failure, because all branches depend on the central computer.

• Decentralized Processing
  - In this method each branch has its own computer system; as a result it is able to process its own data.
  - The processed information may be distributed between the branches and the central office.
  - Advantages
    - Branches have the flexibility of using the most appropriate system to suit the branch operations.
    - Information can be processed faster and each …
    - Shared risk in the event of failure of a branch or central office computer.
  - Disadvantages
    - Since each branch processes its own data, standards may not be maintained.
    - The central office will have no control over the branch activities.
    - A large number of staff and technical personnel are required.
    - There can be duplication of data in each of the branches.

• Distributed Processing
  - This is an extension of decentralized processing where the branches have their own information processing systems and databases, interconnected to share data, information and processing functions.
  - The following definition can be considered:
    A distributed system is one in which there are several autonomous but interacting processors and/or data stores at different geographical locations.
  - The development of database and network technologies has contributed to the growth of this technique.
  - This enables the sharing and transfer of databases from one location to another (mobile databases), performing some processing activities on behalf of other locations, and reducing the risk of data loss.
  - Communication problems and computer security will be the main threats the firm will have to face.

Thank You!!!!


```
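
The data representations described in the slides above (binary, octal and hexadecimal, sign and magnitude, twos complement, BCD, and ASCII) can be checked with a short Python sketch. This is only an illustration under stated assumptions: the function names (`to_base`, `sign_magnitude`, `twos_complement`, `bcd`, `ascii_bits`) and the default one-byte width are hypothetical choices made here and do not come from the slides.

```python
DIGITS = "0123456789ABCDEF"


def to_base(n: int, base: int) -> str:
    """Write a non-negative integer in base 2..16 using positional values."""
    if n < 0 or not 2 <= base <= 16:
        raise ValueError("expects n >= 0 and 2 <= base <= 16")
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, remainder = divmod(n, base)  # remainder is the next absolute value
        out.append(DIGITS[remainder])
    return "".join(reversed(out))


def sign_magnitude(n: int, bits: int = 8) -> str:
    """First bit is the sign (0 positive, 1 negative), the rest is the magnitude."""
    magnitude = to_base(abs(n), 2).zfill(bits - 1)
    if len(magnitude) > bits - 1:
        raise ValueError("value does not fit in the chosen area")
    return ("1" if n < 0 else "0") + magnitude


def twos_complement(n: int, bits: int = 8) -> str:
    """Bit pattern that gives zero when added to the pattern of -n (carry dropped)."""
    if not -(1 << (bits - 1)) <= n < (1 << (bits - 1)):
        raise ValueError("value does not fit in the chosen area")
    return format(n & ((1 << bits) - 1), f"0{bits}b")


def bcd(n: int) -> str:
    """Store each decimal digit of a non-negative integer in its own four-bit group."""
    return " ".join(format(int(digit), "04b") for digit in str(n))


def ascii_bits(text: str, bits: int = 7) -> str:
    """Look up each character's code number and show it as a bit pattern."""
    return " ".join(format(ord(ch), f"0{bits}b") for ch in text)


if __name__ == "__main__":
    print(to_base(85, 2), to_base(85, 8), to_base(85, 16))
    print(sign_magnitude(85), sign_magnitude(-85), twos_complement(-85))
    print(bcd(532))
    print(ascii_bits("JOHN"))
    print(ascii_bits("AC", bits=8))
```

The values used in the `__main__` section are the ones that appear in the slides (85, 532, JOHN, A and C), so the printed patterns can be compared with the slide examples directly.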
DOCUMENT INFO
 views: 27 posted: 5/26/2010 language: English pages: 92
Description: Fundamentals of IT