Lab WBC White blood cell by MikeJenny

									Microsoft Word
At the end of the chapter you will be able to:
• Create a new Word document;
• Add text to a document;
• Move around in the document;
• Work with and format text;
• Save a document;
• Create tables and work with them;
• Work with Clip Art and draw your own diagrams;
• Insert equations;
• Create document templates.

About Microsoft Word

Create a New Document
       To create a new document, click Start – Programs – Microsoft Word.
       If Word is already open, click File – New (CTRL + N).

Add Text to a Word Document
       The blinking insertion point shows where the text you type will appear.
       Use the mouse pointer to click buttons, select text, and so on. The pointer shape varies
with the task you are doing.
       When you type a paragraph, you do not need to press ENTER at the end of each line;
Word automatically moves to the next line. Press ENTER only when you want to start a new
paragraph.

Move Around in the Document
      If you use the keyboard to navigate, the easiest way to move around in the
document is to press the direction keys:
• RIGHT ARROW
• LEFT ARROW
• UP ARROW
• DOWN ARROW
• HOME – moves the insertion point to the beginning of the current line (CTRL + HOME moves
  it to the beginning of the document)
• END – moves the insertion point to the end of the current line (CTRL + END moves it to the
  end of the document)
• PAGE UP – moves up one screen in your document
• PAGE DOWN – moves down one screen in your document.
      You can also get where you want to go by clicking with the mouse.

Select the Text
        To select text in a document, you can:
•   Drag over the text: selects any amount of text;
•   Double-click a word: selects the word;
•   Click in the margin to the left of a line: selects the line;
•   Click to the left of a line and drag up or down: selects multiple lines.
       To undo a mistake, such as accidentally deleting a word, click the Undo button or
[Edit – Undo Typing]. If you decide you want to go through with the action after all, click the
Redo button.

Insert and Delete Text
         If you have already practiced moving the insertion point, you know how to insert text:
just click where you want to start inserting, and then type the new text.
         To delete just a few characters, press DELETE (deletes the character to the right of
the insertion point) or BACKSPACE (deletes the character to the left of the insertion point).
To delete a larger amount of text, such as several lines or paragraphs, first select it and then
click [Edit - Cut], press the DELETE key, or press CTRL + x.

Move and Copy a Text
        To move text, select it, click the Cut button (or [Edit - Cut], or CTRL + x), click in the
new location and then click the Paste button (or [Edit - Paste], or CTRL + v).
        To copy text, select it, click the Copy button (or [Edit - Copy], or CTRL + c), click in the
new location, and then click the Paste button (or [Edit - Paste], or Ctrl + v). (You can paste the
text as many times as you want; the text remains on the Clipboard – a temporary storage location
– until you cut or copy different text).

Formatting Text

       Characters
       Apply bold formatting:
           1. Select the text you want to change.
           2. On the Formatting toolbar, click Bold.
       Apply italic formatting:
           1. Select the text you want to change.
           2. On the Formatting toolbar, click Italic.
       Change the font:
           1. Select the text you want to change.
           2. On the Formatting toolbar, click a font name in the Font box.
       Change the size of text:
           1. Select the text you want to change.
           2. On the Formatting toolbar, type or click a point size in the Font Size box. For
            example, type 10.5
       Make text superscript or subscript:
           1. Select the text you want to format as superscript or subscript.
           2. On the Format menu, click Font, and then click the Font tab.
           3. Select the Superscript or Subscript check box.
       Underline text
       Do one of the following:
       a. Add a basic underline
           1. Select the text you want to change.
           2. On the Formatting toolbar, click Underline.
       b. Add a decorative underline
           1. Select the text you want to change.
           2. On the Format menu, click Font, and then click the Font tab.
           3. In the Underline style list, click the style you want.
           4. In the Underline color list, click the color you want.

About Text Alignment and Spacing
       The factors that determine how text is positioned are:
•   Page margins: determine the distance from the edge of the page for all the text on a page;
•   Paragraph indentation and alignment: determine how paragraphs fit between the margins;
•   Spacing before and/or after paragraphs: determines how much space appears before
    and/or after paragraphs;
•   Line spacing: determines how much space appears between lines. The types of line spacing are
    as follows:
        o Single: Accommodates the largest font in that line, plus a small amount of extra
            space. The amount of extra space varies depending on the font used.
        o 1.5 lines: One-and-one-half times that of single line spacing.
        o Double: Twice that of single line spacing.
        o At least: Minimum line spacing that is needed to fit the largest font or graphic on the
            line.
        o Exactly: Fixed line spacing that Microsoft Word does not adjust.
        o Multiple: Line spacing that is increased or decreased by a percentage that you
            specify. For example, setting line spacing to 1.2 will increase the space by 20 percent.
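
The line spacing types listed above amount to simple arithmetic on the height of a single-spaced line. The Python sketch below is a simplified model for illustration only, not Word's actual layout algorithm. The function name `line_height`, its mode names, and the idea of passing in the single-spaced height in points are all assumptions.

```python
def line_height(mode, single, value=None):
    """Return an approximate line height in points.

    mode   -- "single", "1.5", "double", "at_least", "exactly" or "multiple"
    single -- height of a single-spaced line for the font in use (assumed known)
    value  -- the amount you specify: points for "at_least" and "exactly",
              a factor such as 1.2 for "multiple"
    """
    if mode == "single":
        return single
    if mode == "1.5":
        return 1.5 * single
    if mode == "double":
        return 2 * single
    if value is None:
        raise ValueError(f"mode {mode!r} needs a value")
    if mode == "at_least":
        # Never smaller than what the largest font on the line needs.
        return max(value, single)
    if mode == "exactly":
        # Fixed spacing that is not adjusted to the content.
        return value
    if mode == "multiple":
        # 1.2 means 20 percent more than single spacing.
        return value * single
    raise ValueError(f"unknown mode {mode!r}")


if __name__ == "__main__":
    for mode, value in [("single", None), ("1.5", None), ("double", None),
                        ("at_least", 10), ("exactly", 10), ("multiple", 1.2)]:
        print(mode, line_height(mode, 12, value))
```

With a single-spaced height of 12 points, "exactly 10" returns 10, which is smaller than the line needs. This is the case in which items can appear cut off.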


        Positioning and aligning text
        Margins determine the overall width of the main text area - in other words, the space
between the text and the edge of the page ([File – Page Setup – Margins]).
        Horizontal alignment determines the appearance and orientation of the edges of the
paragraph: left-aligned [Format – Paragraph – Indents and Spacing – General Alignment - Left],
right-aligned [Format – Paragraph – Indents and Spacing – General Alignment - Right], centered
[Format – Paragraph – Indents and Spacing – General Alignment - Centered], or justified
[Format – Paragraph – Indents and Spacing – General Alignment - Justified].
        Vertical alignment determines the paragraph's position relative to the top and bottom
margins. This is useful, for example, when you are creating a title page, because you can position
text precisely at the top or center of the page, or justify the paragraphs so that they are spaced
evenly down the page.

        Changing the space between lines or paragraphs
        Line spacing determines the amount of vertical space between lines of text in a
paragraph. By default, lines are single-spaced, meaning that the spacing accommodates the
largest font in that line, plus a small amount of extra space.
        Paragraph spacing determines the amount of space above or below a paragraph. When
you press ENTER to start a new paragraph, the spacing is carried over to the next paragraph, but
you can change the settings for each paragraph.
        If a line contains a large text character, graphic, or formula, Microsoft Word increases the
spacing for that line. To space all lines evenly, use exact spacing, and specify an amount of space
that is large enough to fit the largest character or graphic in the line. If items appear cut off,
increase the amount of spacing.

Spelling and Grammar
        Some of the content in this topic may not be applicable to some languages.
        By default, Microsoft Word checks spelling and grammar automatically as you type,
using wavy red underlines to indicate possible spelling problems and wavy green underlines to
indicate possible grammatical problems.
        You can also check spelling and grammar all at once.
1. Make sure automatic spelling and grammar checking are turned on.
         On the Tools menu, click Options, and then click the Spelling & Grammar tab.
         Select the Check spelling as you type and Check grammar as you type check boxes.
2. Type in the document.
3. Right-click a word with a wavy red or green underline, and then select the command or the
spelling alternative you want.
        If you mistype a word but the result is not a misspelling (for example, typing "from"
instead of "form" or "there" instead of "their"), the spelling checker will not flag the word. To
catch those types of problems, use the grammar checker.

Save a Document
       To save a document for the first time, choose Save As from the File menu (you must
specify where to save it, the document name, and the file type). You can also save the
document by clicking the Save button on the toolbar (or by pressing CTRL + s); if the document
is being saved for the first time, Word asks for the name of the document.

Headers and Footers
        A header or footer is text (such as a page number, chapter title, or date) that appears at
the top or bottom of every page. To add headers and footers, click [View - Header and Footer].
You will see boxes for entering the headers and footers.

Bulleted and Numbered Lists
       To organize your information, you can add a simple bulleted list or create a
numbered list such as 1, 2, 3; a), b), c); or i., ii., iii.
       Bulleted and numbered lists in Microsoft Word are easy to create:
• Create a numbered list: [Format – Bullets and Numbering – Numbered], type each item, and
   confirm with ENTER.
• Create a bulleted list: [Format – Bullets and Numbering – Bulleted] and choose the kind of
   bullet you want.

Table
       A table is made up of rows and columns of cells that you can fill with text and graphics.
Tables are often used to organize and present information.

       Create a table
       Microsoft Word offers a number of ways to make a table. The best way depends on how
you like to work, and on how simple or complex the table needs to be.
• Click where you want to create a table.
• On the Table menu, point to Insert, and then click Table.
• Under Table size, select the number of columns and rows.
• Under AutoFit behavior, choose options to adjust table size.
• To use a built-in table format, click AutoFormat.

        Delete a table or clear its contents
        You can delete an entire table. You can also clear the contents of cells without deleting
the cells themselves.
Delete a table and its contents
• Click the table.
• On the Table menu, point to Delete, and then click Table.

       Merge cells into one cell in a table
       You can combine two or more cells in the same row or column into a single cell. For
example, you can merge several cells horizontally to create a table heading that spans several
columns.
    1. Select the cells you want to merge.
        To select:
        A cell: Click the left edge of the cell.
        A row: Click to the left of the row.
        A column: Click the column's top gridline or border.
        Multiple cells, rows, or columns: Drag across the cell, row, or column.
        Select multiple items that are not necessarily in order: Click the first cell, row, or
        column you want, press CTRL, and then click the next cells, rows, or columns you want.
        The entire table: Click the table move handle, or drag over the entire table.
        You can also select rows, columns, or the entire table by clicking in the table and then
using the Select commands on the Table menu, or by using keyboard shortcuts.
    2. On the Table menu, click Merge Cells.
        When you merge several cells in a column to create a vertically oriented table heading
that spans several rows, click Change Text Direction on the Tables and Borders toolbar to
change the orientation of the heading text.

        Insert text before a table
        Use this procedure to insert text before a table that is on the first line of the first page in a
document.
• To insert text before a table, click in the upper-left cell in the first row of the table, and then
    press ENTER. If you have text in the upper-left cell, place the insertion point before the text.
• Type the text.

      Copy a table
• In print layout view, rest the pointer on the upper-left corner of the table until the table move
   handle appears.
• Rest the pointer on the table move handle until a four-headed arrow appears.
• Press CTRL, and drag the copy to a new location.
You can also copy a table by selecting it and then copying and pasting.

Clip Art, Graphics and Drawing
       Add clip art or another type of graphic: [Insert – Picture – Clip Art].
       Create your own drawing: [View – Toolbars – Drawing] opens the Drawing toolbar,
which lets you create your own drawings.

Lines, Boxes and Shaded Backgrounds
       Add borders or shading. Select an item, click [Format – Borders and Shading],
choose the border that you want, and confirm with OK.

Multiple Columns
        To lay out text in two, three, or more columns, first select the text, then choose
Columns from the Format menu, set the number of columns you want, and apply the setting
to the selected text. To put the whole document in two columns, press CTRL + a to select the
whole document, choose Format – Columns, set the number of columns, and apply the setting
to the whole document.

Insert an equation
       Some of the content in this topic may not be applicable to some languages.
1. Click where you want to insert the equation.
2. On the Insert menu, click Object, and then click the Create New tab.
3. In the Object type box, click Microsoft Equation 3.0.
   If Microsoft Equation Editor is not available, you may need to install it.
4. Click OK.
5. Build the equation by selecting symbols from the Equation toolbar and by typing variables
and numbers. From the top row of the Equation toolbar, you can choose from more than 150
mathematical symbols. From the bottom row, you can choose from a variety of templates or
frameworks that contain symbols such as fractions, integrals, and summations.
       If you need help, click Equation Editor Help Topics on the Help menu.
6. To return to Microsoft Word, click the Word document.

Document templates
       When you are saving a template, Word switches to the User templates location (Tools
menu, Options command, File Locations tab), which by default is the Templates folder and its
subfolders. If you save a template in a different location, the template will not appear in the
Templates dialog box.

Exercises

Exercise 1
1. Create a new Word document (Start - Programs - Microsoft Word). Enter the following
text:

CRITICAL FACTS ABOUT LUNG CANCER
Smoking Is Not the Only Risk Factor
Although long-term cigarette smoking is solidly linked with lung cancer, more than half of those
who develop the disease have never smoked or have quit. Other risk factors include contact with:
• second-hand smoke
• asbestos
• radioactive gas radon
• diesel fuel
• toxic industrial chemicals.
Those working with certain minerals such as silica and beryllium are at increased risk, as are
some patients with recurring lung inflammation (tuberculosis is an example). Finally, though no
definite connection has been made, marijuana contains many of the cancer-promoting substances
found in conventional cigarettes.

Early Detection is the Key
Lung cancer can be effectively treated provided it is found at an early stage. Unfortunately, these
cancers are able to develop, grow and even spread to other body sites over a period of years
without producing any outward signs that something is wrong. It is not uncommon for the first
symptoms to appear outside the lungs after the cancer has spread. As a rule, once persistent
symptoms of lung cancer are present, the tumor is so far advanced that a good response to
treatment cannot be expected.

New Imaging Methods are Changing the Picture
Today radiologists and cancer specialists are finding new ways of detecting and treating lung
cancers earlier and more effectively. After obtaining a chest x-ray, which is usually the first step
in the work-up, a special type of exam called spiral computed tomography (CT) scanning could
produce three-dimensional images that clearly show the exact location, size, and shape of a lung
mass. Both CT and magnetic resonance (MR) imaging can detect cancer that has spread to other
parts of the body, a finding that will alter the treatment plan. Another new method, called
positron emission tomography, or PET, helps to determine whether a suspicious lung mass is
cancer and, if it is, how far it has progressed. Guidance by either x-rays or ultrasound makes it
easier to obtain a tissue sample - the final step in diagnosis.

2. Format your document as follows:
• Paper format: A4;
• Select the title of the document and follow the steps: Format - Font - Font (Arial); Font
    Style (Bold); Size (18); choose a color from Font color, and Shadow and All Caps from Effects.
• Text Spacing and Alignment: select the whole document and follow these steps: Format -
    Paragraph - Indents and Spacing: General - Alignment: Justified; Spacing - Line Spacing: 1.5
    lines.
• To emphasize the title, use a negative indent to push it out into the margin: Format -
    Paragraph - Indents and Spacing - Indentation: Left -0.5.
• Custom Margins: You can reduce the margins to fit more text on the page, or expand them to
    create a custom design. To set margins, click Page Setup on the File menu: margins: top =
    30 mm, bottom = 30 mm, side = 15 mm; paragraph indentation is 3.5 mm.
• Page Numbers, Headers and Footers: To add headers and footers, click Header and Footer
    (View menu). In the header area, insert the title of the document aligned to the right. In the
    footer area, insert the page number by using Insert - Page Numbers.
• Lines, Boxes and Shaded Background: To add borders and shading, select an item and then
    click Borders and Shading (Format menu): Borders - choose a setting, a style, a color and a
    width for the border; Shading - choose a fill. If you need a border for the whole document,
    choose Page Border from Borders and Shading.
3. Save the document as CriticalFactsAboutLungCancer.doc.

Exercise 2
1. Create a new Word document containing the following text:




Colorectal cancer is the third most common cancer in men and women. An estimated 131,000
Americans are diagnosed with this disease each year and some 55,000 die as a result of it.
Certain genetic factors play a role in the development of this cancer. The specific cause of
colorectal cancer is unknown; however, environmental, genetic, familial factors and
preexisting Ulcerative Colitis have been linked to the development of this cancer.
The survival rate in colorectal cancer is determined by the stage of the disease at the time of
diagnosis and, to some degree, by the response to treatment. Following is a current survival table
for patients at various stages of this illness. The statisticians have taken into consideration the
impact of proper treatment.
                             Nr.   Stage    5-year survival
                             1     Duke A   85-90%
                             2     Duke B   60-80%
                             3     Duke C   40-45%
                             4     Duke D   Less than 5%
Hints:
• The title of the text is written using WordArt. To activate the Drawing toolbar, from the View
    menu, choose Toolbars and check Drawing. Click the Insert WordArt button on the
    Drawing toolbar and write the title of the document.
• Format the second paragraph as a two-column text. First, write the text, then select it and
    from the Format menu choose the Columns option.
• Insert a table in the document: Table menu – Insert – Table. Choose the proper number of
    columns and rows.
2. Save the document as ColorectalCancer.doc.


Exercise 3
1. Draw the diagram below.




                                  Transmission of blood type

Hint:
Use View - Toolbars - Drawing (for the arrows) and Format - Font - Superscript (for
superscript). The text should be inserted into text boxes.
2. Save the document as BloodTypeTransmission.doc.

Exercise 4
1. Insert the next equations into a Word document:
a.
    \[ \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{X})^2 \]
b.
    \[ \Pr(A / B) = \frac{\Pr(A \cap B)}{\Pr(B)} \]

Hint:
From the Insert menu, choose the Object command and, subsequently, Microsoft Equation
3.0.
2. Save the file as Equations.doc.

Exercise 5
1. Create a new Word document (Start - Programs - Microsoft Word) and enter the
following text using Times New Roman, Regular, 12:

DEPARTMENT OF RADIOLOGY
BARIUM MEAL INFORMATION SHEET

Patient name:
Appointment: Date: Time:
HAVE NOTHING TO EAT OR DRINK AFTER MIDNIGHT ON:

Please keep the whole day free, as the examination may be prolonged.
The purpose of this x-ray examination is to investigate your stomach and upper digestive system.
It is important that your stomach is empty and, for this reason, no food or drink is allowed for six
hours prior to the examination.
At the start of the examination, you will be given a cup of white liquid to drink.
The lights in the room will be dimmed and the radiologist will take pictures of you in various
positions, both standing and lying.
You may be given an injection in your arm to help relax your stomach.
Please expect to be in the X-ray Department for up to one hour (longer if we have been asked to
perform a follow through examination to investigate your small bowel).
You may find that the barium makes you slightly constipated, in which case it would be
advisable to take a mild laxative if it proves necessary.
If you are diabetic, do not follow any dietary instructions without discussing this with us first.
If you are unable to attend for this examination, please inform us immediately so that we may
offer the appointment slot to someone else.
For women within childbearing age (teenage - middle age) it is very important that you have had
a period within the last 28 days before your appointment date. If you have not, would you please
contact us and we will change your appointment to a date when you have.

Hints:
1. Save the document as a template. From the File menu choose Save As.
     File type: Document Template
     File name: BariumMealProgramming (BariumMealProgramming.dot)
2. You want to inform three patients about their barium investigation appointments. To create the
announcement for these patients, open the My Documents folder and double-click the
template document. Enter the data for the first patient and save the document as FirstPatient.
Create the documents for the second and third patients in the same way and save them
as SecondPatient and ThirdPatient, respectively.


Exercise 6
Create the next table in a Word document:

Record no.  Name     Date of birth  Birth weight  Day of care  Weight  Blood pressure
                                    (gram)                     (gram)  Systolic  Diastolic
                                                                       (mmHg)    (mmHg)
                                                  1            3150    75        50
                                                  2            3140    80        50
1.          Barbara  02.08.2002     3200          3            3130    75        55
                                                  4            3150    75        50
                                                  5            3170    80        55
                                                  1            2700    70        45
                                                  2            2650    70        45
2.          John     02.09.2002     2800          3            2700    75        50
                                                  4            2800    75        50
                                                  5            2850    75        50

Hints:
• Insert a table with 8 columns and 12 rows.
• To make the table look as shown above, use Merge Cells from the Table menu.
Save the document as BirthTable.doc.

Microsoft Excel
At the end of the chapter you will be able to:
• Create an Excel Workbook;
• Work with Sheets.


Create a New Microsoft Excel Workbook
       Make sense of your data by organizing, calculating and analyzing it with Microsoft
Excel. You work with your data on one or more worksheets in a workbook.

Create a Workbook File
       You can create a new, blank workbook or, to save time, open an existing workbook or a
template and fill in your data.
       What is the difference between a workbook and a worksheet? A workbook is a Microsoft
Excel file containing one or more sheets; each worksheet is a "page" in the workbook on which
you enter and work with data. Every workbook starts with three worksheets, but you can add
more worksheets and other kinds of sheets.

What is on the Screen?
       When you create a new workbook, the Microsoft Excel window displays a worksheet with
a grid of rows and columns. Each box, or cell, has a reference indicating its column and row
location; for example, C3 refers to the cell at the intersection of column C and row 3.
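
A cell reference can be split into its column letters and row number mechanically. The following Python sketch is only an illustration of that idea; the function name `parse_cell_reference` is hypothetical, and it assumes plain references such as C3 or AA10, with columns after Z continuing as AA, AB, and so on.

```python
import re


def parse_cell_reference(reference):
    """Split a reference such as "C3" into (column_number, row_number).

    Column A is 1, B is 2, ..., Z is 26, AA is 27 (letters read as base 26).
    """
    match = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", reference)
    if match is None:
        raise ValueError(f"not a cell reference: {reference!r}")
    letters, digits = match.groups()
    column = 0
    for letter in letters.upper():
        column = column * 26 + (ord(letter) - ord("A") + 1)
    return column, int(digits)


if __name__ == "__main__":
    print(parse_cell_reference("C3"))  # (3, 3): column C is the third column
```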

Select sheets
       When you enter or change data, the changes affect all selected sheets. These changes may
replace data on the active sheet and other selected sheets.
To select                  Do this
A single sheet             Click the sheet tab. If you don't see the tab you want, click the tab
                           scrolling buttons to display the tab, and then click the tab.
Two or more adjacent       Click the tab for the first sheet, and then hold down SHIFT and click
sheets                     the tab for the last sheet.
Two or more nonadjacent    Click the tab for the first sheet, and then hold down CTRL and click
sheets                     the tabs for the other sheets.
All sheets in a workbook   Right-click a sheet tab, and then click Select All Sheets on the
                           shortcut menu.

If sheet tabs have been color-coded, the sheet tab name will be underlined in a user-specified
color when selected. If the sheet tab is displayed with a background color, the sheet has not been
selected.

Rename a sheet
1. To rename the active sheet, on the Format menu, point to Sheet and then click Rename.
2. Type the new name over the current name.

Add More Sheets to the Workbook
       To organize your data, you can add more sheets to a workbook by using Insert – Worksheet.
Another kind of sheet you can add is a chart sheet, which displays data graphically. The number
of sheets you can add to a workbook is limited only by available system memory.
Give workbook sheets meaningful names. Named tabs can help you locate sheets in your
workbook. Double-click the tab at the bottom of the window and type the name you want.

Enter data in worksheet cells (numbers, text, a date or a time)
   1. Click the cell where you want to enter data.
   2. Type the data and press ENTER or TAB.
      Numbers and text in a list
          1. Enter data in a cell in the first column, and then press TAB to move to the next
              cell.
          2. At the end of the row, press ENTER to move to the beginning of the next row.
          3. If the cell at the beginning of the next row doesn't become active, click Options
              on the Tools menu, and then click the Edit tab. Under Settings, select the Move
              selection after Enter check box, and then click Down in the Direction box.
      Dates Use a slash or a hyphen to separate the parts of a date; for example, type 9/5/2002
      or 5-Sep-2002. To enter today's date, press CTRL+; (semicolon).
      Times To enter a time based on the 12-hour clock, type a space and then a or p after the
      time; for example, 9:00 p. Otherwise, Microsoft Excel enters the time as AM. To enter
      the current time, press CTRL+SHIFT+: (colon).
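
The 12-hour rule for times can be sketched in a few lines of Python. This is a simplified illustration, not Excel's own parser: the function name `parse_time_entry` is hypothetical, and it only handles entries of the form "h:mm" with an optional a or p after a space.

```python
from datetime import time


def parse_time_entry(text):
    """Interpret an entry such as "9:00 p" the way the rule above describes.

    Without a suffix, or with "a", the time is taken as AM; "p" adds 12 hours.
    """
    parts = text.strip().split()
    suffix = parts[1].lower() if len(parts) > 1 else ""
    hour_text, minute_text = parts[0].split(":")
    hour, minute = int(hour_text), int(minute_text)
    if suffix.startswith("p") and hour < 12:
        hour += 12  # 9:00 p becomes 21:00
    elif suffix.startswith("a") and hour == 12:
        hour = 0    # 12:00 a is midnight
    return time(hour, minute)


if __name__ == "__main__":
    print(parse_time_entry("9:00 p"))  # 21:00:00
    print(parse_time_entry("9:00"))    # 09:00:00
```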

Work in Cells and Ranges
When you work with data in worksheet cells – for example entering, copying, deleting or
formatting data – first you select the area to work in. The selection can be a single cell or a range
of cells.
After making your selection, perform the action you want. Data you enter and work with can be
text, such as a list of names and addresses; values, such as blood pressure; or a formula that
calculates a value.
Use the TAB key to move to the next cell to the right. When you reach the end of a row, press
ENTER to move to the first cell in the next row.

Formatting numbers, dates, and times
If decimal points appear automatically in the numbers you enter: on the Tools menu, click
Options, click the Edit tab, and then clear the Fixed Decimal check box.
To remove decimal points from numbers you've already entered, you can multiply the numbers
by a power of 10. In an empty cell, enter a number such as 10, 100, or 1,000, depending upon the
number of decimal places you want to remove. For example, enter 100 in the cell if the numbers
contain two decimal places and you want whole numbers. Copy the cell to the Clipboard
and select a range of adjacent cells that contain numbers with decimal places. On the Edit menu,
click Paste Special, and then click Multiply.
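
The arithmetic behind this Paste Special trick is a plain multiplication of every selected cell by the copied number. The sketch below illustrates it in Python under stated assumptions: the function name `paste_special_multiply` and the sample values are invented for the example, and `Decimal` is used only to keep the decimal arithmetic exact.

```python
from decimal import Decimal


def paste_special_multiply(cells, factor):
    """Multiply every value in the selected range by the copied number."""
    return [Decimal(cell) * Decimal(factor) for cell in cells]


if __name__ == "__main__":
    # Hypothetical numbers with two decimal places; multiplying by 100
    # turns them into whole numbers.
    selected_range = ["1.25", "3.50", "12.07"]
    for value in paste_special_multiply(selected_range, "100"):
        print(value)
```

Choosing 10, 100, or 1,000 as the factor corresponds to removing one, two, or three decimal places.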
Numbers are not displayed or calculated as numeric values
If the numbers are aligned to the left of the cell and if you have not changed the default
alignment (General), the numbers are formatted or entered as text. To change them to numbers,
do the following:
    1. Select a blank cell that you know has the General number format.
        If you aren't sure of the cell format, click Cells on the Format menu, and then click the
        Number tab. In the Category box, click General, and then click OK.
    2. In the cell, type 1 and then press ENTER.
    3. Click the cell, and then click Copy on the Standard toolbar.
    4. Select the range of cells that contain the "text" numbers.
    5. On the Edit menu, click Paste Special, click Multiply, and then click OK.
The number in a worksheet is not the same as the number in the formula bar
The number format applied to a cell determines the way Microsoft Excel displays a number in
that cell on the worksheet. The format does not affect the cell value used in calculations, which is
displayed in the formula bar when the cell is active.
To remove number formats that may affect the displayed value, select the cells.
    1. On the Format menu, click Cells, and then click the Number tab.
    2. In the Category box, click General.
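
The difference between what a cell shows and what it stores can be illustrated with a short Python sketch. The names `displayed` and `stored_value` are hypothetical, and a fixed number of decimal places stands in for whatever number format is applied to the cell.

```python
def displayed(value, decimals):
    """Return the text a cell would show with a fixed number of decimals."""
    return f"{value:.{decimals}f}"


if __name__ == "__main__":
    stored_value = 3.14159                # what the formula bar shows
    print(displayed(stored_value, 2))     # 3.14, what the cell shows
    # Calculations use the stored value, not the rounded display text.
    print(stored_value * 2)
    print(float(displayed(stored_value, 2)) * 2)
```

The last two lines print different results, which is why totals can appear not to match the numbers shown on the worksheet.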
The number of custom number formats has been exceeded
You must delete one or more of the existing custom number formats in order to add new ones:
    1. On the Format menu, click Cells, and then click the Number tab.
    2. In the Category list, click Custom.
    3. At the bottom of the Type box, click the custom format you want to delete. Click Delete.

Formatting text
Rotated text is not displayed at the correct angle
To rotate text, click Cells on the Format menu and then click the Alignment tab. If
you've saved a workbook in another file format, the rotated text format might be lost. Most file
formats do not support rotation within the full 180 degrees (+90 through -90 degrees), which is
possible in the current version of Microsoft Excel. Earlier versions of Excel can rotate text only
at angles of +90, 0 (zero), or -90 degrees. If the specified angle of rotation cannot be maintained
in the other file format, the text is not rotated.
Borders are not displayed the way I want
Look at adjacent cells: If you apply borders to a selected cell, the border is also applied to
adjacent cells that share a bordered cell boundary. For example, if you apply a box border
enclosing the range B1:C5, the cells D1:D5 acquire a left border.
Check which border was last applied: If you apply two different types of borders to a shared cell
boundary, the most recently applied border is displayed.
Choose the appropriate border type: A selected range of cells is formatted as a single block of
cells. If you apply a right border to the range of cells B1:C5, the border is displayed only on the
right edge of the cells C1:C5. To display interior borders, use the corresponding button on the
Borders palette.

Enter Data Automatically
Avoid repetitive typing and save time by entering some kinds of data automatically. You can
automatically enter the same information in several cells or enter an incremental series.
To do this, enter the first values of the series, select those cells, and drag the fill handle. The
rest of the series is filled in automatically.
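
The idea of an incremental series can be pictured with a short Python sketch. It assumes the simplest rule, in which the step is the difference between the first two values entered; `extend_series` is a hypothetical helper written for illustration, not an Excel function.

```python
def extend_series(first_values, length):
    """Continue a series the way a fill handle might (assumed linear rule)."""
    if len(first_values) == 1:
        # A single starting value gives no step, so it is simply repeated.
        return [first_values[0]] * length
    step = first_values[1] - first_values[0]
    return [first_values[0] + i * step for i in range(length)]


print(extend_series([1, 2], 5))    # [1, 2, 3, 4, 5]
print(extend_series([5, 10], 4))   # [5, 10, 15, 20]
print(extend_series([7], 3))       # [7, 7, 7]
```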

Modify the Data
To edit a cell’s contents, double-click it and then make the change.
Use the Cut, Copy or Paste command: first make your selection and then right click to display
the shortcut menu.
Make a mistake? Click the Undo button.
Clear data from a cell. Select the cell and press Delete.

Adjust the Spacing and Alignment of Data
To help distinguish different types of information in cells, adjust the alignment of cell contents
using the alignment buttons. You can insert rows and columns to set data or labels apart using
the Rows and Columns commands (Insert menu). Adjust the width and height of rows and
columns by dragging or double-clicking the line to the right of the column letter or below the row
number in the header.
Merge cells across columns. You can easily merge headings across the top of a range of cells.
Type the title in the leftmost cell in the range, select the range, and then on the Format menu
click Cells – Alignment – Merge cells.

About functions
Functions are predefined formulas that perform calculations by using specific values, called
arguments, in a particular order, or structure. Functions can be used to perform simple or
complex calculations.
Structure of a function
Structure. The structure of a function begins with an equal sign (=), followed by the function
name, an opening parenthesis, the arguments for the function separated by commas, and a
closing parenthesis.
Arguments. Arguments can be numbers, text, logical values such as TRUE or FALSE, arrays,
error values such as #N/A, or cell references (a cell reference is the set of coordinates that a cell
occupies on a worksheet; for example, the reference of the cell at the intersection of column B
and row 3 is B3). The argument you designate must produce a valid value for that argument.
Arguments can also be constants, formulas, or other functions.

Entering formulas
When you create a formula that contains a function, the Insert Function dialog box helps you
enter worksheet functions. As you enter a function into the formula, the Insert Function dialog
box displays the name of the function, each of its arguments, a description of the function and
each argument, the current result of the function, and the current result of the entire formula.

About calculation operators
Operators specify the type of calculation that you want to perform on the elements of a formula.
Microsoft Excel includes four different types of calculation operators: arithmetic, comparison,
text, and reference.
Types of operators
a. Arithmetic operators To perform basic mathematical operations such as addition,
subtraction, or multiplication; combine numbers; and produce numeric results, use the following
arithmetic operators.

                                Arithmetic operator      Meaning (Example)
                                + (plus sign)            Addition (3+3)
                                – (minus sign)           Subtraction     (3–1)
                                                         Negation (–1)
                                * (asterisk)             Multiplication (3*3)
                                / (forward slash)        Division (3/3)
                                % (percent sign)         Percent (20%)
                                ^ (caret)                Exponentiation (3^2)

b. Comparison operators You can compare two values with the following operators. When two
values are compared by using these operators, the result is a logical value, either TRUE or
FALSE.

                    Comparison operator                  Meaning (Example)
                    = (equal sign)                       Equal to (A1=B1)
                    > (greater than sign)                Greater than (A1>B1)
                    < (less than sign)                   Less than (A1<B1)
                    >= (greater than or equal to sign)   Greater than or equal to (A1>=B1)
                    <= (less than or equal to sign)      Less than or equal to (A1<=B1)
                    <> (not equal to sign)               Not equal to (A1<>B1)

c. Text concatenation operator Use the ampersand (&) to join, or concatenate, two or more text
strings to produce a single piece of text.

 Text operator   Meaning (Example)
 & (ampersand)   Connects, or concatenates, two values to produce one continuous text value ("North"&"wind")
d. Reference operators Combine ranges of cells for calculations with the following operators.

Reference operator   Meaning (Example)
: (colon)            Range operator, which produces one reference to all the cells between two references,
                     including the two references (B5:B15)
, (comma)            Union operator, which combines multiple references into one reference
                     (SUM(B5:B15,D5:D15))
(space)              Intersection operator, which produces one reference to cells common to the two references
                     (B7:D7 C6:C8)
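
The three reference operators can be pictured as operations on sets of cells. The following Python sketch is only an analogy under that assumption: `cells` is a hypothetical helper that handles single-letter columns, and it is not how Excel stores references internally.

```python
def cells(ref):
    """Expand a range such as 'B5:B15' into a set of cell names."""
    start, end = ref.split(":")
    columns = range(ord(start[0]), ord(end[0]) + 1)
    rows = range(int(start[1:]), int(end[1:]) + 1)
    return {f"{chr(c)}{r}" for c in columns for r in rows}


range_ref = cells("B5:B15")                       # range operator (:)
union_ref = cells("B5:B15") | cells("D5:D15")     # union operator (,)
intersection = cells("B7:D7") & cells("C6:C8")    # intersection operator (space)

print(len(range_ref))    # 11
print(len(union_ref))    # 22
print(intersection)      # {'C7'}
```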

The order in which Excel performs operations in formulas
Formulas calculate values in a specific order. A formula in Excel always begins with an equal
sign (=). The equal sign tells Excel that the succeeding characters constitute a formula.
Following the equal sign are the elements to be calculated (the operands), which are separated by
calculation operators. Excel calculates the formula from left to right, according to a specific
order for each operator in the formula.
Operator precedence
If you combine several operators in a single formula, Excel performs the operations in the order
shown in the following table. If a formula contains operators with the same precedence — for
example, if a formula contains both a multiplication and division operator — Excel evaluates the
operators from left to right.

                       Operator           Description
                       : (colon)          Reference operators
                         (single space)
                       , (comma)
                       –                  Negation (as in –1)
                       %                  Percent
                       ^                  Exponentiation
                       * and /            Multiplication and division
                       + and –            Addition and subtraction
                       &                  Connects two strings of text (concatenation)
                       = < > <= >= <>     Comparison

Use of parentheses
To change the order of evaluation, enclose in parentheses the part of the formula to be calculated
first. For example, the following formula produces 11 because Excel calculates multiplication
before addition. The formula multiplies 2 by 3 and then adds 5 to the result: =5+2*3
In contrast, if you use parentheses to change the syntax, Excel adds 5 and 2 together and then
multiplies the result by 3 to produce 21: =(5+2)*3
In the example below, the parentheses around the first part of the formula force Excel to
calculate B4+25 first and then divide the result by the sum of the values in cells D5, E5, and F5:
=(B4+25)/SUM(D5:F5)
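
The same arithmetic can be checked outside Excel. The Python sketch below mirrors the three formulas; the cell values for B4 and D5:F5 are hypothetical, chosen only for illustration. One difference is worth noting: in Excel negation is applied before exponentiation, so =-2^2 gives 4, whereas Python evaluates -2 ** 2 as -4.

```python
# Multiplication is evaluated before addition, as in =5+2*3
print(5 + 2 * 3)        # 11

# Parentheses force the addition first, as in =(5+2)*3
print((5 + 2) * 3)      # 21

# Hypothetical worksheet values for =(B4+25)/SUM(D5:F5)
B4 = 75
D5, E5, F5 = 10, 20, 20
result = (B4 + 25) / sum([D5, E5, F5])
print(result)           # 2.0

# Python applies exponentiation before negation, unlike Excel
print(-2 ** 2)          # -4
```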


Create a formula
Formulas are equations that perform calculations on values in your worksheet. A formula starts
with an equal sign (=).
Create a simple formula
The following formulas contain operators and constants.

                                  Example formula       What it does
                                  =128+345              Adds 128 and 345
                                   =5^2                Squares 5
    1. Click the cell in which you want to enter the formula.
    2. Type = (an equal sign).
    3. Enter the formula.
    4. Press ENTER.
Create a formula that contains references or names: =A1+23
The following formulas contain relative references to and names of other cells. The cell that
contains the formula is known as a dependent cell when its value depends on the values in other
cells. For example, cell B2 is a dependent cell if it contains the formula =C2.

                Example formula     What it does
                =C2                 Uses the value in the cell C2
                =Sheet2!B2          Uses the value in cell B2 on Sheet2
                =Asset-Liability    Subtracts a cell named Liability from a cell named Asset

   1. Click the cell in which you want to enter the formula.
   2. In the formula bar (a bar at the top of the Excel window that you use to enter or edit
       values or formulas in cells or charts; it displays the constant value or formula stored in the
       active cell), type = (equal sign).
   3. Do one of the following:
           o To create a reference, select a cell, a range of cells, a location in another
               worksheet, or a location in another workbook. You can drag the border of the cell
               selection to move the selection, or drag the corner of the border to expand the
               selection.
           o To create a reference to a named range, press F3, select the name in the Paste
               name box, and click OK.
   4. Press ENTER.
Create a formula that contains a function: =AVERAGE(A1:B4)
The following formulas contain functions.

                        Example formula         What it does
                        =SUM(A:A)               Adds all numbers in column A
                        =AVERAGE(A1:B4)         Averages all numbers in the range
     1. Click the cell in which you want to enter the formula.
     2. To start the formula with the function, click Insert Function on the formula bar.
     3. Select the function you want to use. You can enter a question that describes what you
        want to do in the Search for a function box (for example, "add numbers" returns the
        SUM function), or browse from the categories in the Or Select a category box.
     4. Enter the arguments. To enter cell references as an argument, click Collapse Dialog to
        temporarily hide the dialog box. Select the cells on the worksheet, and then click
        Expand Dialog.
     5. When you complete the formula, press ENTER.

IF
Returns one value if a condition you specify evaluates to TRUE and another value if it evaluates
to FALSE.
Use IF to conduct conditional tests on values and formulas.
Syntax: IF(logical_test,value_if_true,value_if_false)
Logical_test is any value or expression that can be evaluated to TRUE or FALSE. For
example, A10=100 is a logical expression; if the value in cell A10 is equal to 100, the expression
evaluates to TRUE. Otherwise, the expression evaluates to FALSE. This argument can use any
comparison operator.
Value_if_true is the value that is returned if logical_test is TRUE.
Value_if_false is the value that is returned if logical_test is FALSE.
Remarks
    Up to seven IF functions can be nested as value_if_true and value_if_false arguments to
       construct more elaborate tests. See the last of the following examples.
    When the value_if_true and value_if_false arguments are evaluated, IF returns the value
       returned by those statements.
     If any of the arguments to IF are arrays, every element of the array is evaluated when the
       IF statement is carried out.
    Microsoft Excel provides additional functions that can be used to analyze your data based
       on a condition. For example, to count the number of occurrences of a string of text or a
       number within a range of cells, use the COUNTIF worksheet function. To calculate a
       sum based on a string of text or a number within a range, use the SUMIF worksheet
       function.
Example
The example may be easier to understand if you copy it to a blank worksheet.
   1. Create a blank workbook or worksheet and enter the data from the table below.
   2. If you copied the table instead of typing it, select cell A1 in the worksheet and press CTRL+V.
   3. To switch between viewing the results and viewing the formulas that return the results,
       press CTRL+` (grave accent), or on the Tools menu, point to Formula Auditing, and
       then click Formula Auditing Mode.

    A                               B
1   Cholesterol blood level         Normal Cholesterol blood level
2   250                             220
3   340
4   500
    Formula                         Description (Result)
    =IF(A2>$B$2,"ill","not ill")    Checks whether the first value is over the normal cholesterol blood level (ill)
    =IF(A3>$B$2,"ill","not ill")    Checks whether the second value is over the normal cholesterol blood level (ill)
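
The logic of the IF function can be sketched in Python. The block below uses the cholesterol values from the table and compares each with the normal level in cell B2; `excel_if` is a hypothetical helper, and the value 180 is an extra, made-up measurement added only to show the FALSE branch.

```python
NORMAL_LEVEL = 220             # cell B2
levels = [250, 340, 500]       # cells A2:A4


def excel_if(logical_test, value_if_true, value_if_false):
    """Return one of two values depending on the test, like IF."""
    return value_if_true if logical_test else value_if_false


results = [excel_if(level > NORMAL_LEVEL, "ill", "not ill") for level in levels]
print(results)                                           # ['ill', 'ill', 'ill']

# Hypothetical value below the normal level
print(excel_if(180 > NORMAL_LEVEL, "ill", "not ill"))    # not ill
```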


About charts
Charts are visually appealing and make it easy for users to see comparisons, patterns, and trends
in data. For instance, rather than having to analyze several columns of worksheet numbers, you
can see at a glance whether sales are falling or rising over quarterly periods, or how the actual
sales compare to the projected sales.
Creating charts
You can create a chart on its own sheet or as an embedded object on a worksheet. To create a
chart, you must first enter the data for the chart on the worksheet. Then select that data and use
the Chart Wizard to step through the process of choosing the chart type and the various chart
options, or use the Chart toolbar to create a basic chart that you can format later.

Create a chart
    1. Make sure the data on your worksheet is arranged properly for the type of chart you want
       to use.
       For Column, Bar, Line, Area, Surface or Radar chart
       Arrange your data in columns or in rows.
       For a Pie or Doughnut chart
       Regular pie charts have only one series of data, so you should use only one column or
       one row of data.
       For an XY scatter or Bubble chart
       Arrange your data in columns, with x values in the first column and corresponding y
       values and/or bubble size values in adjacent columns.
    2. Do one of the following:
       Customize your chart as you create it.
          1. Select the cells that contain the data you want to use for your chart.
          2. Click the Chart Wizard button.
          3. Follow the instructions in the Chart Wizard.
       Create a basic chart that you can customize later.
          1. Display the Chart toolbar. To show the Chart toolbar, point to Toolbars on the
              View menu and then click Chart.
          2. Select the cells that contain the data you want to use for your chart.
          3. Click Chart Type.

Add a trendline to a chart
   1. Click the data series (related data points that are plotted in a chart. Each data series in a
      chart has a unique color or pattern and is represented in the chart legend) to which you
      want to add a trendline (a graphic representation of trends in data series. Trendlines are
      used for the study of problems of prediction, also called regression analysis).
   2. On the Chart menu, click Add Trendline.
   3. On the Type tab, click the type of regression trendline or moving average you want.
          o If you select Polynomial, enter in the Order box the highest power for the
              independent variable.
          o If you select Moving Average, enter in the Period box the number of periods to
              be used to calculate the moving average.

Equations for calculating trendlines
Linear
Calculates the least squares fit for a line represented by the following equation:
$y = mx + b$
where m is the slope and b is the intercept.
Polynomial
Calculates the least squares fit through points by using the following equation:
$y = b + c_1 x + c_2 x^2 + c_3 x^3 + \dots + c_6 x^6$
where $b$ and $c_1, \dots, c_6$ are constants.
Logarithmic
Calculates the least squares fit through points by using the following equation:
$y = c \ln x + b$
where c and b are constants and ln is the natural logarithm function.
Exponential
Calculates the least squares fit through points by using the following equation:
$y = c e^{bx}$
where c and b are constants, and e is the base of the natural logarithm.
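
To make the linear case concrete, here is a minimal Python sketch of an ordinary least squares fit for a line with slope m and intercept b. It uses the standard closed-form solution and made-up data points; it illustrates the idea and is not a description of Excel's internal routine.

```python
def linear_fit(xs, ys):
    """Return slope m and intercept b of the least squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


# Hypothetical points that lie exactly on y = 2x + 1
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
m, b = linear_fit(xs, ys)
print(m, b)    # 2.0 1.0
```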

Display the R-squared value for a trendline
   1. Click the trendline for which you want to display the R-squared value (an indicator from
      0 to 1 that reveals how closely the estimated values for the trendline correspond to your
      actual data. A trendline is most reliable when its R-squared value is at or near 1. Also
      known as the coefficient of determination).
   2. On the Format menu, click Selected Trendline.
   3. On the Options tab, select Display R-squared value on chart.
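
As a rough illustration of what the R-squared value measures, the Python sketch below computes it as one minus the ratio of the residual sum of squares to the total sum of squares. The data are hypothetical, and the predicted values are assumed to come from the line y = 0.6x + 2.2 fitted to x = 1, ..., 5.

```python
def r_squared(ys, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


ys = [2, 4, 5, 4, 5]                      # hypothetical observed values
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]     # values on the assumed fitted line
print(round(r_squared(ys, predicted), 2))   # 0.6
```

A value of 0.6 means the line accounts for about 60% of the variation in these hypothetical y values.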

Sort a list
Sort rows in ascending order (A to Z, or 0 to 9) or descending (Z to A, or 9 to 0)
   1. Click a cell in the column you would like to sort by.
   2. Click Sort Ascending or Sort Descending.
Sort rows by 2 or more columns
For best results, the list you sort should have column labels.
   1. Click a cell in the list you want to sort.
   2. On the Data menu, click Sort.
   3. In the Sort by and Then by boxes, click the columns you want to sort.
   4. Select any other sort options you want, and then click OK.
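
Sorting by one column and then by a second can be sketched in Python by sorting on a tuple of keys. The records below are hypothetical (name, sex code, age), and the sketch only mirrors the Sort by / Then by idea.

```python
# Hypothetical rows: (name, sex code, age)
rows = [("Ana", 2, 34), ("Dan", 1, 51), ("Eva", 2, 47), ("Ion", 1, 29)]

# Sort by a single column (age), ascending
by_age = sorted(rows, key=lambda r: r[2])
print([r[0] for r in by_age])              # ['Ion', 'Ana', 'Eva', 'Dan']

# Sort by sex code, then by age, both descending
by_sex_then_age = sorted(rows, key=lambda r: (r[1], r[2]), reverse=True)
print([r[0] for r in by_sex_then_age])     # ['Eva', 'Ana', 'Dan', 'Ion']
```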

Exercises
Problem_1
1. Create a new Excel document with the following data:




2. Compute the cost of hospitalization by using the following formula: =B6*$G$3. Every formula
begins with an equal sign (=), and you can type references and values directly in it, for example
=B6*$G$3. Press ENTER to see the value resulting from the formula. To avoid retyping the same
formula, select the first cell that contains it and drag the fill handle down; the absolute reference
$G$3 stays fixed while B6 changes from row to row.
3. Create a Pie chart to show the sex distribution in the study and a Scatter chart of age against
duration of hospital stay.
You can insert a chart in a worksheet from the Insert menu by clicking Chart, or by clicking the
Chart Wizard button. First select the range of worksheet data you want to include in the chart. Then
click the Chart Wizard button. Follow the instructions in the wizard to specify the chart type and
options you want. The wizard offers you the option of creating a chart on the worksheet or
creating a separate chart sheet in the workbook. If you create a chart on the worksheet, you can
reposition and resize it.
For the Pie chart, first compute the number of males and females. On the Insert menu, choose
Function and then choose the COUNTIF function. For the range, select the data in the sex column;
for the criteria, select a cell that contains the code corresponding to the sex (for example, cell D5
for female and D7 for male).
4. Add a trendline to the scatter chart: select a point on the chart, right-click, and choose
Add Trendline. On the Type tab select Linear; on the Options tab select Display equation on chart
(to obtain the equation of the line) and Display R-squared value on chart (to obtain the coefficient
of determination).
5. Select all the data and sort it: Data – Sort – Sort by: Column C, descending; Then by:
Column B, descending; Then by: Column G, descending.
6. Save this file in the folder with your name: File – Save As – a dialog box (Save in:
FirstName.SecondName, File name: Excel_1 – Save).
7. Close the application and don't forget to log off: Start – Shut Down – in the dialog box that asks
"What do you want the computer to do?", click Log off FirstName.SecondName.
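
The calculations in steps 2 and 3 of Problem_1 can be sketched in Python. All values below are hypothetical stand-ins for the worksheet data (the price per day plays the role of cell G3, and the sex codes 1 and 2 are assumed), and `countif` is a hypothetical helper that imitates COUNTIF with an equality criterion.

```python
PRICE_PER_DAY = 120.0          # hypothetical value of cell G3
days = [3, 7, 2, 10]           # hypothetical days of hospitalization (column B)
sex = [1, 2, 2, 1]             # hypothetical sex codes

# Like filling =B6*$G$3 down: the price stays fixed, the days change per row
costs = [d * PRICE_PER_DAY for d in days]
print(costs)                   # [360.0, 840.0, 240.0, 1200.0]


def countif(values, criterion):
    """Count the values equal to the criterion, like COUNTIF."""
    return sum(1 for v in values if v == criterion)


print(countif(sex, 1), countif(sex, 2))    # 2 2
```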

Problem_2
1. Create a new Microsoft Excel document (Start - Programs - Microsoft Excel) and enter the
following data:
Table 2.12 comes from a paper giving the distribution of astigmatism in 1033 young men, aged
18-22, who were accepted for military service in Great Britain. Assume that astigmatism is
rounded to the nearest 10th of a diopter.




Table 2.12 Distribution of astigmatism in 1033 young men aged 18-22
2. Compute the grouped arithmetic mean (average): on the Insert menu choose Function and
then AVERAGE (for the arithmetic mean).
3. Compute the grouped standard deviation: Insert - Function - STDEV (standard deviation).
4. Plot a histogram to properly illustrate these data: select the data in the two columns and
on the Insert menu choose Chart - Column. The chart should look like this:

[Figure: Histogram. Column chart with a vertical axis from 0 to 500 and the astigmatism classes 0,0-0,1; 0,2-0,3; 0,4-0,5; 0,6-1,0; 1,1-2,0; 2,1-3,0; 3,1-4,0; 4,1-5,0; 5,1-6,0 on the horizontal axis.]

5. Save the document in your folder as Excel_2.
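
For grouped data, one common approach is to weight each class midpoint by its frequency. The Python sketch below assumes that approach and uses hypothetical midpoints and frequencies, not the values of Table 2.12; the standard deviation uses the sample form (dividing by n - 1), the same convention as STDEV.

```python
from math import sqrt

midpoints = [1.0, 2.0, 3.0]    # hypothetical class midpoints
frequencies = [2, 4, 2]        # hypothetical class frequencies

n = sum(frequencies)
grouped_mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n
grouped_var = sum(
    f * (m - grouped_mean) ** 2 for f, m in zip(frequencies, midpoints)
) / (n - 1)
grouped_sd = sqrt(grouped_var)

print(grouped_mean)            # 2.0
print(round(grouped_sd, 4))    # 0.7559
```
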
Problem_3
1. Create a new Microsoft Excel document (Start - Programs - Microsoft Excel) and enter the
following data:




Table 1.1. Serum-cholesterol levels before and after adopting a vegetarian diet.

2. Compute the arithmetic mean (Insert - Function - AVERAGE), the maximum (MAX), and the
minimum (MIN) of the cholesterol level before and after adopting a vegetarian diet, and of
the differences.
3. Create a scatter chart that represents the relationship between age and serum cholesterol level
before adopting the vegetarian diet. Add a trendline and display the equation and the R-squared
value on the chart.
4. Save the document in your folder as Excel_3. Minimize the Excel window and open a new
Word document. Copy into it all the data that you created in Excel. Save the Word document as
Word Excel.
5. Close all the applications and do not forget to log off: Start - Shut Down - Log off.

Problem_4
1. Create a new Excel document (Start – Programs – Microsoft Excel) and make a file with the
following data:
In the Patient status column enter the following formula: =IF(B2<$D$1,"No","Yes").
2. Sort the data by the Patient status column: select the data first, and then on the Data menu choose
Sort – Sort by: Patient status, descending. The data should now look as shown below:




3. To enable the tools for descriptive statistics, on the Tools menu choose Add-Ins and
then select Analysis ToolPak and Analysis ToolPak-VBA.
4. Compute the correlation between Blood level of sugar and Urinary level of sugar: Tools - Data
Analysis, and choose the same options as shown below:




5. Compute the covariance between Blood level of sugar and Urinary level of sugar: Tools -
Data Analysis, and choose the same options as shown below:




6. Calculate the parameters of descriptive statistics: Tools - Data Analysis - Descriptive Statistics,
and choose the same options as shown below:




7. Save the document in your folder as Excel_4.
8. Close the application and don't forget to log off: Start – Shut Down – in the dialog box that asks
"What do you want the computer to do?", click Log off FirstName.SecondName.
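
What the correlation and covariance tools compute in steps 4 and 5 can be sketched in Python. The data below are hypothetical, not the values of the exercise table, and the covariance is assumed to be the population form (dividing by n); the correlation is the Pearson coefficient.

```python
from math import sqrt


def covariance(xs, ys):
    """Population covariance (divides by n); assumed convention."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def correlation(xs, ys):
    """Pearson correlation coefficient."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


blood_sugar = [90, 110, 130, 150]    # hypothetical blood levels of sugar
urinary_sugar = [0, 1, 2, 3]         # hypothetical urinary levels of sugar

print(covariance(blood_sugar, urinary_sugar))     # 25.0
print(correlation(blood_sugar, urinary_sugar))    # 1.0 (perfectly linear data)
```
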
Microsoft PowerPoint
About Microsoft PowerPoint
Any time you communicate with a group of people, you are giving a presentation. You can
communicate information better and more easily with a PowerPoint presentation, a series of slides
that you create by using PowerPoint.

About creating presentations
Creating a presentation in Microsoft PowerPoint involves starting with a basic design; adding
new slides and content; choosing layouts (the arrangement of elements such as title and subtitle
text, lists, pictures, tables, charts, AutoShapes, and movies on the slide); modifying slide design, if
you want, by changing the color scheme or applying different design templates (files that
contain the styles in a presentation, including the type and size of bullets and fonts, background
design and fill, color schemes, and a slide master); and creating effects such as animated slide
transitions. The information below focuses on the options available to you when you start the
process.
The New Presentation task pane in PowerPoint gives you a range of ways to start creating a
presentation. These include:
Blank: Start with slides that have minimal design and no color applied to them.
Existing presentation: Base your new presentation on one that you have already written and
designed. This command creates a copy of an existing presentation so you can make the design
or content changes you want for the new presentation.
Design template: Base your presentation on a PowerPoint template that already has a design
concept, fonts, and color scheme. In addition to the templates that come with PowerPoint, you
can use one you created yourself.
Templates with suggested content: Use the AutoContent Wizard to apply a design template that
includes suggestions for text on your slides. You then type the text that you want.
A template on a Web site: Create a presentation using a template located on a Web site.

About PowerPoint views
Microsoft PowerPoint has three main views: normal view, slide sorter view, and slide show
view.
Normal view
Normal view is the main editing view, which you use to write and design your presentation. The
view has three working areas: on the left, tabs that alternate between an outline of your slide text
(Outline tab) and your slides displayed as thumbnails (Slides tab); on the right, the slide pane,
which displays a large view of the current slide; and on the bottom, the notes pane.
    Outline tab: Showing your slide text in outline form, this area is a great place to start
    writing your content — to capture your ideas, plan how you want to present them, and move
    slides and text around.
    Slides tab: Switch to this tab to see the slides in your presentation as thumbnail-sized
    images while you edit. The thumbnails make it easy for you to navigate through your
    presentation and to see the effects of your design changes. You can also rearrange, add, or
    delete slides.
    Slide pane: With the current slide shown in this large view, you can add text, insert
    pictures, tables, charts, drawing objects, text boxes, movies, sounds, hyperlinks, and
    animations.
    Notes pane: Add notes that relate to each slide's content, and use them in printed form to
    refer to as you give your presentation, or create notes that you want your audience to see
    either in printed form or on a Web page.
The Outline and Slides tabs change to display an icon when the pane becomes narrow, and if
you only want to see the current slide in the window as you edit, you can close the tabs with a
Close box in the right corner.
Slide sorter view
Slide sorter view is an exclusive view of your slides in thumbnail form.
When you are finished creating and editing your presentation, slide sorter gives you an overall
picture of it — making it easy to reorder, add, or delete slides and preview your transition and
animation effects.
Slide show view
Slide show view takes up the full computer screen, like an actual slide show presentation. In this
full-screen view, you see your presentation the way your audience will. You can see how your
graphics, timings, movies, animated elements, and transition effects will look in the actual show.

Create a presentation using a design template
   1. If the New Presentation task pane isn't displayed, on the File menu, click New.
   2. Under New, click From Design Template.
   3. In the Slide Design task pane, click a design template that you'd like to apply.
   4. If you want to keep the default title layout for the first slide, go to step 5. If you want a
      different layout for the first slide, on the Format menu, click Slide Layout, and then
      click the layout you want.
   5. On the slide or on the Outline tab, type the text for the first slide.
   6. To insert a new slide, on the toolbar, click New Slide, and click the layout
      you want for the slide.
   7. Repeat steps 5 and 6 to keep adding slides, and add any other design elements or effects
      you want.
   8. To save the presentation, on the File menu, click Save; in the File name box type a name
      for the presentation, and then click Save.

Duplicate slides within a presentation
Duplicated slides are inserted directly below the slides you have selected.
   1. On the Outline tab or Slides tab in normal view, select the slides you want to duplicate.
       (If you want to select slides in order, press SHIFT as you click; for slides not in order,
       press CTRL as you click.)
   2. On the Insert menu, click Duplicate Slide.

Change slide order
Do one of the following:
    On the Outline tab in normal view, select one or more slide icons and then drag the
       selection to a new location.
    On the Slides tab in normal view, select one or more slide thumbnails, and then drag the
       selection to a new location.
    In slide sorter view, select one or more slide thumbnails, and then drag the thumbnails to
       a new location.
To select multiple slides in a row, press SHIFT before clicking the slide icon or thumbnail.

Delete a slide
   1. On the Outline tab or Slides tab in normal view, select the slides you want to delete. (If
      you want to select slides in order, press SHIFT as you click; for slides not in order, press
      CTRL as you click.)
   2. On the Edit menu, click Delete Slide.

About adding text to a slide
There are four types of text you can add to a slide: placeholder text, text in an AutoShape, text in
a text box, and WordArt text.
The text you type into placeholders, such as titles and bulleted lists, can be edited on the slide or
on the Outline tab, and it can be exported from the Outline tab to Microsoft Word. Text in an
object, such as a text box or AutoShape, and WordArt text do not appear on the Outline tab and
must be edited on the slide.
Placeholders: Slide layouts contain text and object placeholders in a variety of combinations. In
the text placeholders, type titles, subtitles, and body text onto your slides. You can resize and
move placeholders and format them with borders and colors.
AutoShapes: AutoShapes such as callout balloons and block arrows lend themselves to text
messages. When you type text into an AutoShape, the text is attached to the shape and moves or
rotates with the shape.
Text boxes: Use text boxes to place text anywhere on a slide, such as outside a text placeholder.
For example, you can add a caption to a picture by creating a text box and positioning it near the
picture. Also, a text box is handy if you want to add text to an AutoShape, but you don't want the
text to attach to the shape. A text box can have a border, fill, shadow, or three-dimensional (3-D)
effect, and you can change its shape.
WordArt: Use WordArt for fancy text effects. WordArt can stretch, skew, curve, and rotate your
text or make it 3-D or vertical.

Add a picture
   1. Click where you want to insert the picture.
   2. Insert a picture from a file:
           On the Drawing toolbar click Insert Picture From File.
           Locate the folder that contains the picture that you want to insert, and then click
           the picture file.
           Do one of the following:
                   To embed the picture (embedded information is contained in a source file and
                      inserted into a destination file; once embedded, the object becomes part of the
                      destination file, and changes you make to the embedded object are reflected in
                      the destination file), click Insert.
                   To link (linked object: an object that is created in a source file and inserted
                      into a destination file, while maintaining a connection between the two
                      files) the picture to the picture file on your hard disk, click the arrow next
                      to Insert, and then click Link to File.

To animate your presentation
You can animate text, graphics, diagrams, charts, and other objects on your slides so that you can
focus on important points, control the flow of information, and add interest to your presentation.
Custom Animation
Custom animations can be applied to items on a slide, in a placeholder, or to a paragraph (which
includes single bullets or list items). For instance, you can apply the fly-in animation to all items
on a slide or you can apply it to a single paragraph in a bulleted list. Use entrance, emphasis, or
exit options, in addition to preset or custom motion path. Also, you can apply more than one
animation to an item; so, you can make that bullet item fly in and then fly out.
Most animation options include associated effects to choose from. These might include options
for playing a sound with the animation, and text animations usually let you apply the effect by
letter, word, or paragraph (such as having a title fly in a word at a time instead of all at once).
You can preview the animation of your text and objects for one slide or for the whole
presentation.
Animate the objects on a slide: on the Slide Show menu, choose Custom Animation. Then
select the object on the slide that you want to animate, and choose an effect from Add Effect.
Repeat these steps for every object you want to animate.
Animate the whole presentation: on the Slide Show menu, choose Slide Transition. Then choose a
type of transition; under Modify transition, choose the speed of the transition; under Advance
slide, choose On mouse click; and then choose Apply to All Slides.
To view your presentation, choose Slide Show from the View menu.

Start a slide show presentation
Do one of the following:
Start a slide show from within Microsoft PowerPoint
   1. Open the presentation you want to view as a slide show.
   2. Do one of the following:
           o    Click the Slide Show button at the lower left of the PowerPoint window.
           o    On the Slide Show menu, click View Show.
           o    Press F5.
Start a slide show from your desktop
   1. In My Computer or Microsoft Windows Explorer, locate the presentation file you want to
        open as a slide show.
   2. Right-click the file name, and then click Show.
Start a PowerPoint Show (.pps)
Use this procedure to open and play a slide show you have saved as a PowerPoint Show (.pps).
   1. In My Computer or Microsoft Windows Explorer, locate the PowerPoint Show file you
        want to open.
   2. Double-click the file name to open it.

Navigate between slides during a presentation
Use the following commands in slide show view. For each type of navigation, choose from
several methods.
Go to the next slide
    Click the mouse.
    Press SPACEBAR or ENTER.
    Right-click, and on the shortcut menu, click Next.
Go to the previous slide
    Press BACKSPACE.
    Right-click, and on the shortcut menu, click Previous.
Go to a specific slide
    Type the slide number, and then press ENTER.
    Right-click, point to Go on the shortcut menu, then point to By Title, and click the slide
        you want.
See previously viewed slide
    Right-click, point to Go on the shortcut menu, and then click Previously Viewed.

About hyperlinks and action buttons
In Microsoft PowerPoint, a hyperlink is a connection from a slide to another slide, a custom
show, a Web page, or a file. The hyperlink itself can be text or an object such as a picture, graph,
shape, or WordArt. An action button is a ready-made button that you can insert into your
presentation and define hyperlinks for.
If the link is to another slide, the destination slide is displayed in the PowerPoint presentation. In
PowerPoint, hyperlinks become active when you run your presentation, not when you are
creating it.
When you point to a hyperlink, the pointer becomes a hand, indicating that it is something you
can click. Text that represents a hyperlink is displayed underlined and in a color that coordinates
with your color scheme. Pictures, shapes and other object hyperlinks have no additional
formatting. You can add action settings, such as sound or highlighting, to emphasize hyperlinks.

About printing
You can print your entire presentation — the slides, outline, notes pages, and audience
handouts — in color, grayscale, or pure black and white. You can also print specific slides,
handouts, notes pages, or outline pages.
Color, black and white, or grayscale
Most presentations are designed to be shown in color, but slides and handouts are usually printed
in black and white or shades of gray (grayscale). When you choose to print, Microsoft
PowerPoint sets the colors in your presentation to match your selected printer's capabilities. For
example, if your selected printer is black and white, your presentation will automatically be set
to print in grayscale.
With print preview, you can see how your slides, notes, and handouts will look in pure black and
white or in grayscale, and adjust the look of objects before you print.
You can also make certain changes when you preview before printing. You can select:
    What you want to print: the presentation, handouts, notes pages, or just the outline.
    A layout for handouts.
    To add a frame around each slide for print out only.
    Orientation (portrait or landscape) for handouts, notes pages, or an outline.
    Header and footer options.
Slides and transparencies
You might choose to print only the slides and use them as handouts. Slides print one per page
and can be sized to fit a variety of paper sizes. Slides can also be sized to fit standard
transparencies (for overhead projectors), 35mm slides, or you can customize the fit and
orientation.
Outline, notes pages and handouts
Outline: You can choose to print all the text in your outline or just the slide titles, in either
landscape (horizontal) or portrait (vertical) orientation. The printout might look different from
the screen display; while you can show or hide formatting (such as bold or italic) in the Outline
pane on screen, on the printout the formatting will always appear.
Notes pages: Print your notes pages either for your own use when delivering a presentation or
to include as handouts for your audience. Notes pages can be designed and formatted with
colors, shapes, charts, and layout options. Each notes page includes a copy of the slide it refers to
and prints one slide per page, with the notes printed under the slide image. To print two slides
per page with the associated notes printed next to the slides, you can send the presentation to
Microsoft Word. Headers and footers on the notes pages are separate from the headers and
footers on the slides.
Handouts: You can design and create handouts similarly to the notes pages. However, you can
choose from many layout options for printing: from 1 slide per page to 9 slides per page. The 3-
slides-per page option includes lined space for note-taking by the audience. For additional layout
options, you can send the presentation to Microsoft Word. Headers and footers on handouts are
separate from the headers and footers on the slides.

Problems
Problem_1
1. Create a PowerPoint presentation about how a physician can use a computer in everyday
work (describe the hardware and software components and the principal programs, e.g. Word for
documents and letters; Excel for organizing, calculating and analyzing your data; PowerPoint for
presentations; Access for keeping your data about a subject in one place). Animate your presentation.
Save your presentation in your folder as First Presentation.
2. Close all programs, and from Shut Down choose Log off your name.

Problem_2
1. Create a PowerPoint presentation about the informatics lab. The presentation must have a
title slide, a slide about Word documents (when you can use a Word document), a slide about
Microsoft Excel (when you can use an Excel document) and an end slide.
2. The Word and Excel slides must have:
      Some action buttons that hyperlink to your Word lab and to your Excel lab.
      A title in WordArt format (see the Drawing toolbar).
      Some pictures and some AutoShapes (see the Drawing toolbar).
3. Save your presentation as Second presentation in your folder.
4. Close all programs, and from Shut Down choose Log off your name.
Microsoft Access
About Microsoft Access
Designing a database
Good database design ensures that your database is easy to maintain. You store data in tables and
each table contains data about only one subject, such as customers. Therefore, you update a
particular piece of data, such as an address, in just one place and that change automatically
appears throughout the database.
A well-designed database usually contains different types of queries that show the information
you need. A query might show a subset of data, such as all customers in London, or
combinations of data from different tables, such as order information combined with customer
information.
Before you use Microsoft Access to actually build tables, queries, forms, and other objects, it's a
good idea to sketch out and rework your design on paper first. You can also examine well-
designed databases similar to the one you are designing.
Follow these basic steps when designing your database.
1. Determine the purpose of your database
The first step in designing a database is to determine its purpose and how it's to be used:
       Talk to people who will use the database. Brainstorm about the questions you and they
would like the database to answer.
       Sketch out the reports you'd like the database to produce.
       Gather the forms you currently use to record your data.
2. Determine the fields you need in the database
Each field is a fact about a particular subject. For example, you might need to store the following
facts about your patients: id_number, name, address, city, date of birth, and phone number. You
need to create a separate field for each of these facts. When determining which fields you need,
keep these design principles in mind:
       Include all of the information you will need.
       Store information in the smallest logical parts. For example, patient names are often split
into two fields, FirstName and SecondName, so that it's easy to sort data by SecondName.
       Don't create fields for data that consists of lists of multiple items.
       Don't include derived or calculated data (data that is the result of an expression).
       Don't create fields that are similar to each other. For example, in a Patients table, if you
create both an age field and a date of birth field, it will be more difficult to find reliable
information about a patient's age.
3. Determine the tables you need in the database
Each table should contain information about one subject. Your list of fields will provide clues to
the tables you need.
4. Determine which table each field belongs to
When you decide which table each field belongs to, keep these design principles in mind:
       Add the field to only one table.
       Don't add the field to a table if it will result in the same information appearing in multiple
records in that table. If you determine that a field in a table will contain a lot of duplicate
information, that field is probably in the wrong table.
When each piece of information is stored only once, you update it in one place. This is more
efficient, and it also eliminates the possibility of duplicate entries that contain different
information.
5. Identify the field or fields with unique values in each record
In order for Microsoft Access to connect information stored in separate tables — for example, to
connect a customer with all the customer's orders — each table in your database must include a
field or set of fields that uniquely identifies each individual record in the table. Such a field or set
of fields is called a primary key.
6. Determine the relationship between tables
Now that you've divided your information into tables and identified primary key fields, you need
a way to tell Microsoft Access how to bring related information back together again in
meaningful ways. To do this, you define relationships between tables.
7. Refine your design
After you have designed the tables, fields, and relationships you need, it is time to study the
design and detect any flaws that might remain. It is easier to change your database design now
than it will be after you have filled the tables with data.
8. Enter data and create other database objects
When you are satisfied that the table structures meet the design principles described here, then
it's time to go ahead and add all your existing data to the tables. You can then create other
database objects – queries, forms, reports, data access pages, macros and modules.
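
The advice in step 2 not to store derived or calculated data can be illustrated with a short Python sketch. It is not part of Access; the function name age_on and the sample patient are assumptions made for this illustration. Only the date of birth is stored, and the age is calculated when it is needed, so it can never go out of date.

```python
from datetime import date


def age_on(date_of_birth: date, today: date) -> int:
    """Return the age in completed years on the given day."""
    years = today.year - date_of_birth.year
    # Subtract one year if this year's birthday has not been reached yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


# Hypothetical patient record: the date of birth is stored, the age is not.
patient = {"id_number": 1, "name": "Ana Pop", "date_of_birth": date(1980, 6, 15)}

print(age_on(patient["date_of_birth"], date(2024, 6, 14)))  # day before the birthday
print(age_on(patient["date_of_birth"], date(2024, 6, 15)))  # on the birthday
```

In Access the same idea would be expressed as a calculated field in a query or a calculated control on a form, rather than as a stored field in the table.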

Identify the field or fields with unique values in each record
In order for Microsoft Access to connect information stored in separate tables (for example, to
connect a patient with all of his or her consultations), each table in your database must include a
field or set of fields that uniquely identifies each individual record in the table. Such a field or set
of fields is called a primary key.

Determine the relationships between tables
Now that you've divided your information into tables and identified primary key fields, you need
a way to tell Microsoft Access how to bring related information back together again in
meaningful ways. To do this, you define relationships between tables.
A primary key is one or more fields (columns) whose value or values uniquely identify each
record in a table. A primary key cannot allow Null values and must always have a unique index.
A primary key is used to relate a table to foreign keys in other tables.
A relationship is an association established between common fields (columns) in two tables. A
relationship can be one-to-one, one-to-many or many-to-many.

Create an Access database
Microsoft Access provides two methods to create an Access database. You can use a Database
Wizard to create in one operation the required tables, forms, and reports for the type of database
you choose — this is the easiest way to start creating your database. Or you can create a blank
database and then add the tables, forms, reports, and other objects later — this is the most
flexible method, but it requires you to define each database element separately. Either way, you
can modify and extend your database at any time after it has been created.
Create a Blank Database
      1. On the File menu, click New.
      2. In the New File task pane, under New, click Blank Database.
      3. In the File New Database dialog box, specify a name and location for the database, and
then click Create.
Open an Access database
      1. On the File menu, click Open.
      2. Click a shortcut in the left side of the Open dialog box, or in the Look in box, click the
drive or folder that contains the Microsoft Access database that you want.
      3. In the folder list, double-click folders until you open the folder that contains the database.
      4. Double-click the database.

Create a table in Design View
       1. Click Tables under Objects.
       2. Double-click Create table in Design view.
       3. Define each of the fields in your table.
       4. Define a primary key field before saving your table:
          Open the table in Design view.
          Select the field or fields you want to define as the primary key.
             To select one field, click the row selector for the desired field.
             To select multiple fields, hold down the CTRL key and then click the row selector for
             each field.
          Click Primary Key on the toolbar.
You do not have to define a primary key, but it is usually a good idea. If you do not define a
primary key, Microsoft Access asks if you want Access to create one for you when you save the
table.
       5. When you are ready to save your table, click the Save button on the toolbar, and then type a
unique name for the table.

About primary keys
The power of a relational database system such as Microsoft Access comes from its ability to
quickly find and bring together information stored in separate tables using queries, forms and
reports. In order to do this, each table should include a field or set of fields that uniquely
identifies each record stored in the table. This information is called the primary key of the table.
Once you designate a primary key for a table, Access will prevent any duplicate or Null values
from being entered in the primary key fields.
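
The behaviour described above can be tried outside Access with a minimal Python sketch. It uses SQLite (through Python's built-in sqlite3 module) as a stand-in for Access, which is an assumption of this example; the table layout, the helper try_insert and the sample names are also hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is written out because SQLite, unlike Access, does not
# automatically forbid Null in every kind of primary key.
conn.execute(
    "CREATE TABLE Patients (id_number TEXT NOT NULL PRIMARY KEY, name TEXT)"
)
conn.execute("INSERT INTO Patients VALUES ('P001', 'Ana Pop')")


def try_insert(id_number, name):
    """Return True if the row was accepted, False if the primary key rejected it."""
    try:
        conn.execute("INSERT INTO Patients VALUES (?, ?)", (id_number, name))
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert("P002", "Ion Radu"))   # new key value: accepted
print(try_insert("P001", "Dan Marin"))  # duplicate key value: rejected
print(try_insert(None, "Eva Stan"))     # Null key value: rejected
```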
There are three kinds of primary keys that can be defined in Microsoft Access:
1. AutoNumber primary keys
An AutoNumber field can be set to automatically enter a sequential number as each record is
added to the table. Designating such a field as the primary key for a table is the simplest way to
create a primary key. If you don't set a primary key before saving a newly created table,
Microsoft Access will ask if you want it to create a primary key for you. If you answer Yes,
Microsoft Access will create an AutoNumber primary key.
2. Single-field primary keys
If you have a field that contains unique values such as ID numbers or part numbers, you can
designate that field as the primary key. You can specify a primary key for a field that already
contains data as long as that field does not contain duplicate values or Null values.
3. Multiple-field primary keys
In situations where you can't guarantee the uniqueness of any single field, you may be able to
designate two or more fields as the primary key. The most common situation where this arises is
in the table used to relate two other tables in a many-to-many relationship. For example, a
Consultations table can be related to the Patients table. Its primary key consists of two fields:
NumPac and Date_of_consultation.
About relationships in an Access database
After you've set up different tables for each subject in your Microsoft Access database, you need
a way of telling Microsoft Access how to bring that information back together again. The first
step in this process is to define relationships between your tables. After you've done that, you can
create queries, forms, and reports to display information from several tables at once. For
example, a single form can include information from four tables.
A one-to-many relationship
A one-to-many relationship is the most common type of relationship. In a one-to-many
relationship, a record in Table A can have many matching records in Table B, but a record in
Table B has only one matching record in Table A.
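
A one-to-many relationship can be sketched in Python with SQLite standing in for Access (an assumption of this example). The table and field names follow the Patients and Consultations tables used in this lab; the data types and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked to
conn.execute("CREATE TABLE Patients (Id_Pacients INTEGER PRIMARY KEY, Name TEXT)")
conn.execute(
    """
    CREATE TABLE Consultations (
        NumPac INTEGER NOT NULL REFERENCES Patients(Id_Pacients),
        Date_of_consultation TEXT NOT NULL,
        PRIMARY KEY (NumPac, Date_of_consultation)
    )
    """
)
conn.execute("INSERT INTO Patients VALUES (1, 'Ana Pop')")
conn.executemany(
    "INSERT INTO Consultations VALUES (?, ?)",
    [(1, "2024-01-10"), (1, "2024-03-02")],
)

# One patient (the "one" side) matches many consultations (the "many" side).
rows = conn.execute(
    "SELECT Name, Date_of_consultation FROM Patients "
    "JOIN Consultations ON NumPac = Id_Pacients "
    "ORDER BY Date_of_consultation"
).fetchall()
print(rows)

# A consultation that points to a patient who does not exist is refused.
try:
    conn.execute("INSERT INTO Consultations VALUES (99, '2024-04-01')")
    orphan_accepted = True
except sqlite3.IntegrityError:
    orphan_accepted = False
print(orphan_accepted)
```

Each consultation row carries the key of exactly one patient, while a patient's key may appear in any number of consultation rows.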
A many-to-many relationship
In a many-to-many relationship, a record in Table A can have many matching records in Table
B, and a record in Table B can have many matching records in Table A. This type of relationship
is only possible by defining a third table (called a junction table) whose primary key consists of
two fields — the foreign keys from both Tables A and B. A many-to-many relationship is really
two one-to-many relationships with a third table.
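
The junction table idea can be sketched as follows, again with SQLite standing in for Access. All table names, field names and sample rows here (TableA, TableB, Junction, the patients and doctors) are hypothetical and chosen only to mirror the Table A and Table B wording above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE TableA (a_id INTEGER PRIMARY KEY, a_name TEXT);
    CREATE TABLE TableB (b_id INTEGER PRIMARY KEY, b_name TEXT);
    -- Junction table: its primary key is the pair of foreign keys.
    CREATE TABLE Junction (
        a_id INTEGER NOT NULL REFERENCES TableA(a_id),
        b_id INTEGER NOT NULL REFERENCES TableB(b_id),
        PRIMARY KEY (a_id, b_id)
    );
    INSERT INTO TableA VALUES (1, 'patient Ana'), (2, 'patient Ion');
    INSERT INTO TableB VALUES (1, 'doctor Pop'), (2, 'doctor Radu');
    INSERT INTO Junction VALUES (1, 1), (1, 2), (2, 1);
    """
)


def matches_for_a(a_id):
    """Names of the Table B records linked to one Table A record."""
    rows = conn.execute(
        "SELECT b_name FROM Junction JOIN TableB USING (b_id) "
        "WHERE a_id = ? ORDER BY b_name",
        (a_id,),
    )
    return [name for (name,) in rows]


def matches_for_b(b_id):
    """Names of the Table A records linked to one Table B record."""
    rows = conn.execute(
        "SELECT a_name FROM Junction JOIN TableA USING (a_id) "
        "WHERE b_id = ? ORDER BY a_name",
        (b_id,),
    )
    return [name for (name,) in rows]


print(matches_for_a(1))  # one Table A record, many Table B matches
print(matches_for_b(1))  # one Table B record, many Table A matches
```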
A one-to-one relationship
In a one-to-one relationship, each record in Table A can have only one matching record in Table
B, and each record in Table B can have only one matching record in Table A. This type of
relationship is not common, because most information related in this way would be in one table.
You might use a one-to-one relationship to divide a table with many fields, to isolate part of a
table for security reasons, or to store information that applies only to a subset of the main table.

About forms
A form is a type of a database object that is primarily used to enter or display data in a database.
You can also use a form as a switchboard that opens other forms and reports in the database, or
as a custom dialog box that accepts user input and carries out an action based on the input.
Creating a form in Design view
You can customize a form in Design view in the following ways:
Record source: Change the tables and queries that a form is based on.
Controlling and assisting the user: You can set form properties to allow or prevent users from
adding, deleting, or editing records displayed in a form. You can also add custom Help to a form
to assist your users with using the form.
Form window: You can add or remove Maximize and Minimize buttons, short cut menus, and
other Form window elements.
Sections: You can add, remove, hide, or resize the header, footer, and details sections of a form.
You can also set section properties to control the appearance and printing of a form.
Controls: You can move, resize, or set the font properties of a control. You can also add
controls to display calculated values, totals, current date and time, and other useful information
on a form.
Add or edit data in a datasheet or form
       1.Open a datasheet or open a form in Form view.
       2.Do one of the following:
          To add a new record, click New Record button on the toolbar, type the data, and then
            press TAB to go to the next field. At the end of the record, press TAB to go to the
            next record.
          To edit data within a field, click in the field you want to edit, and then type the data.
          To replace the entire value, move the pointer to the leftmost part of the field until it
            changes into the plus pointer, and then click. Type the data.
Notes: To correct a typing mistake, press BACKSPACE. To cancel your changes in the current
field, press ESC. To cancel your changes in the entire record, press ESC again before you move
out of the field.

About designing a query
When you open a query in Design view, you can change the query in the ways described below.
Design view is a window that shows the design of these database objects: tables, queries, forms,
reports, macros, and data access pages. In Design view, you can create new database objects and
modify the design of existing ones.
Add or remove tables, queries and fields
You can add a table or query if the data you need isn't in the query, or remove a table or query if
you decide you don't need them. Once you add the tables or queries you need, you can then add
the fields that you want to work with to the design grid, or remove them if you decide you don't
need them.
Limit results by using criteria
You can limit the records that you see in the query's results or the records that are included in a
calculation by specifying criteria.
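
As a rough illustration of criteria, the Python sketch below uses SQLite in place of Access (an assumption of this example); the WHERE clause plays the role of the criteria. The Analysis table layout and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Analysis (Patient TEXT, Date_of_analysis TEXT, Glicemia INTEGER)"
)
conn.executemany(
    "INSERT INTO Analysis VALUES (?, ?, ?)",
    [
        ("Ana Pop", "2024-01-10", 95),
        ("Ana Pop", "2024-03-02", 126),
        ("Ion Radu", "2024-02-15", 110),
    ],
)

# Criteria limit the records that appear in the results ...
high_rows = conn.execute(
    "SELECT Patient, Date_of_analysis, Glicemia FROM Analysis "
    "WHERE Glicemia >= 110 ORDER BY Date_of_analysis"
).fetchall()
print(high_rows)

# ... and the records that are included in a calculation.
high_count = conn.execute(
    "SELECT COUNT(*) FROM Analysis WHERE Glicemia >= 110"
).fetchone()[0]
print(high_count)
```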

About reports
A report is an effective way to present your data in a printed format. Because you have control
over the size and appearance of everything on a report, you can display the information the way
you want to see it.
Most reports are bound to one or more tables and queries in the database. A report's record source
refers to the fields in the underlying tables and queries. A report need not contain all the fields
from each of the tables or queries that it is based on.
A bound report gets its data from its underlying record source. Other information on the report,
such as the title, date, and page number, is stored in the report's design.
Creating a report
You can create different types of reports quickly by using wizards. Use the Label Wizard to
create mailing labels, the Chart Wizard to create charts, or the Report Wizard to create a standard
report. The wizard asks you questions and creates a report based on your answers.
Customizing a report
You can customize a report in the following ways:
Record source: Change the tables and queries that a report is based on.
Sorting and grouping data: You can sort data in ascending or descending order. You can also
group records on one or more fields, and display subtotals and grand totals on a report.
Report window: You can add or remove Maximize and Minimize buttons, change the title bar
text, and other Report window elements.
Sections: You can add, remove, hide, or resize the header, footer, and details sections of a report.
You can also set section properties to control the appearance and printing of a report.
Controls: You can move, resize, or set the font properties of a control. You can also add controls
to display calculated values, totals, current date and time, and other useful information on a
report.
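
The sorting and grouping option above can be pictured with a small Python sketch. It does not use Access; the function grouped_report, the Cost field and the sample records are assumptions made for this illustration. Records are sorted, grouped on one field, and a subtotal per group and a grand total are calculated.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical consultation records with a cost for each consultation.
records = [
    {"Name": "Ion Radu", "Cost": 40},
    {"Name": "Ana Pop", "Cost": 50},
    {"Name": "Ana Pop", "Cost": 70},
]


def grouped_report(rows, group_field, value_field):
    """Return (group, values, subtotal) triples sorted by group, plus the grand total."""
    ordered = sorted(rows, key=itemgetter(group_field))  # groupby needs sorted input
    groups = []
    for key, members in groupby(ordered, key=itemgetter(group_field)):
        values = [member[value_field] for member in members]
        groups.append((key, values, sum(values)))
    grand_total = sum(subtotal for _, _, subtotal in groups)
    return groups, grand_total


groups, grand_total = grouped_report(records, "Name", "Cost")
for name, values, subtotal in groups:
    print(name, values, "subtotal:", subtotal)
print("grand total:", grand_total)
```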

Problems
Problem_1
1. Create a new Access database: Start - Programs - Microsoft Access. From the File menu choose
New - Microsoft Access Application. Name the database First Database.
2. Open the database file by double-clicking its icon and create two tables: from the Objects
column choose Tables, and then choose Create table in Design view.
The first table is named Patients and will have the following fields:
The second table is named Consultations and will have the following fields:




Primary key: The primary key for the table Patients will be the field named Id_Pacients; in the
table Consultations the primary key will consist of the fields NumPac and Date of consultations.
    o Open a table in Design view.
    o Select the field or fields you want to define as the primary key.
    o To select one field, click the row selector for the desired field.
    o To select multiple fields, hold down the CTRL key and then click the row selector for
       each field.
    o Click Primary Key on the toolbar.
To create a relationship between the two tables:
       o In Design view, open the table named Consultations.
       o Click in the NumPac field row, and in the Data Type column, click the arrow and
           select Lookup Wizard.
       o In the first dialog box of the Lookup Wizard, select the option that indicates you want
           the Lookup field to look up the values in a table or query.
       o Click Next and follow the directions in the remaining Lookup Wizard dialog boxes.
       When you click the Finish button, Microsoft Access creates a Lookup field whose
       properties are based on the choices you made in the wizard.
Enter 10 patients in your table. Every patient must have more than one consultation.
3. Close the application and don’t forget to log off: Start – Shut Down; in the dialog box that asks
"What do you want the computer to do?", click Log off FirstName.SecondName.

Problem_2
1. Create a new Access database: Start - Programs - Microsoft Access. From the File menu choose
New - Microsoft Access Application. Name the database Second Database.
2. Open the database file by double-clicking its icon and create two tables: from the Objects
column choose Tables, and then choose Create table in Design view.
The first table is named Patients and will have the following fields:
The second table is named Analysis and will have the following fields:




The primary key for the table Patients will be the field named Id_Pacients; in the table Analysis
the primary key will consist of the fields Patient and Date_of_analysis.
Create a relationship between the field Name from the Patients table and the field Patient from
the Analysis table. To create the relationship, follow these steps:
        o Open the table named Analysis in Design view.
        o Click in the Patient field row, and in the Data Type column, click the arrow and
            select Lookup Wizard.
        o In the first dialog box of the Lookup Wizard, select the option that indicates you want
            the Lookup field to look up the values in a table or query.
        o Click Next and follow the directions in the remaining Lookup Wizard dialog boxes.
        When you click the Finish button, Microsoft Access creates a Lookup field whose
        properties are based on the choices you made in the wizard.
Enter 10 patients in your table. Every patient must have more than one analysis.
3. Create a query to find which patients have a glicemia level of 110 or more. Follow these
steps:
            a. From the Access Objects choose Queries.
            b. Double-click Create query in Design view.
            c. Click the table Patients and then click the Add button. Repeat these
                steps to add the Analysis table.
            d. From the table Patients choose the field Name, and from the table Analysis choose
                the fields Date_of_analysis and Glicemia. As the criterion for the Glicemia column,
                enter >=110.
4. Create a report with all patients from the database.
        a. From the Access Objects choose Reports.
        b. Double-click Create report by using wizard.
        c. From the Patients table choose, by using the Add button, the following fields: Name,
            Date of birth and Sex; from the Analysis table choose Date_of_analysis, Glicemia and
            Cholesterol.
        d. Follow the remaining steps of the wizard until the report is finished. Name the report
            Patients.
5. Close the application and don’t forget to log off: Start – Shut Down; in the dialog box that asks
"What do you want the computer to do?", click Log off FirstName.SecondName.
EpiInfo 2000

About EpiInfo 2000
The main programs of EpiInfo can be accessed either through the PROGRAMS menu or by
clicking on the icon buttons. The buttons can be turned on or off with the BUTTONS item on
the SETTINGS menu.
EpiInfo 2000 has the following main menus:
            Programs
            Examples
            Adv.Stats
            Language
            Settings
            Manual
            DoEpiTutorials
From the Programs menu we can open the following programs:
Make View         Designing a new form or questionnaire
Enter Data        Entering data; opening an existing View and database;
                  searching for particular records
Analyze Data      READ a view or a data file or table; LIST the contents of
                  the database; obtain the FREQuency of values for a field;
                  cross-tabulate with the TABLES command and obtain the
                  resulting epidemiologic statistics; define a new variable
                  and assign a value; use an IF statement to determine and
                  assign case status; SELECT a subset of records to
                  process; RECODE values to group the AGE field;
                  WRITE data to another file or table; READ a non-Access
                  file and related tables in a view, and analyze data from
                  more than one table
StatCalc          An epidemiological calculator that produces statistics from
                  summary data entered on the screen. Offers three types of
                  calculation: statistics from 2-by-2 to 2-by-9 tables; sample
                  size calculation; chi-square for trend by the Mantel
                  Extension Method.
Epi Map           Adding a shape file to create a map; adding data to be
                  represented as a color density map; creating a map of
                  cholera cases in John Snow's London using X and Y
                  coordinates of case households
Nutrition         Entering data from one child's measurements;
                  interpreting results from calculations based on accepted
                  reference standards; graphing more than one result to
                  show a child's growth; customizing the data entry
                  screen
Visualize Data    To update the data
Word Processing   Word processor (for improving the results)
EXIT              To close EpiInfo sessions
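
To give an idea of the kind of statistics a calculator such as StatCalc produces from a 2-by-2 table, here is a minimal Python sketch using the standard textbook formulas for the odds ratio, the risk ratio and the uncorrected chi-square. It is not EpiInfo code and does not claim to reproduce StatCalc's output; the function name two_by_two and the sample counts are assumptions made for this illustration.

```python
def two_by_two(a, b, c, d):
    """Statistics for a 2-by-2 table.

    a, b = ill / not ill among the exposed
    c, d = ill / not ill among the unexposed
    """
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    risk_ratio = (a / (a + b)) / (c / (c + d))
    # Uncorrected (Pearson) chi-square with 1 degree of freedom.
    chi_square = n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )
    return odds_ratio, risk_ratio, chi_square


# Hypothetical summary data: 20 of 100 exposed and 10 of 100 unexposed people are ill.
odds_ratio, risk_ratio, chi_square = two_by_two(20, 80, 10, 90)
print(f"odds ratio {odds_ratio:.2f}, risk ratio {risk_ratio:.2f}, chi-square {chi_square:.2f}")
```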


MakeView Program
     Designing a new form or questionnaire (a View)
     Text and numeric fields
     Specifying a list of Legal values
     Inserting a grid, the automatic way to deal with repeating data within a questionnaire
     Large text (multiline) fields
To run MakeView, click on the Make View button on the main menu screen. You should see a
blank page for constructing a "View." Questionnaires are called Views in EpiInfo 2000 because
there can be more than one View of a database or data table.
To make a view, from the FILE menu choose Make New View. Enter a name for your project
database, such as your name or initials, and click OPEN.
A project or database (.MDB for "Microsoft Database") file can hold as many Views and data
tables as you wish (well, up to 1000, anyway). Place the cursor near the upper left corner of the
blank page and click the right mouse button. The field dialog box that appears offers options for
entering the prompt, the field type and length, and a number of the characteristics that were
previously implemented.
For the first field, enter the prompt "First Name" and press Enter twice. This makes a text field
that can hold up to 255 characters.
For the next field, you could move the cursor and right-click with the mouse on a suitable
location. Below First Name, right-click to add another field. Enter the prompt "Today’s Date,"
and use the scroll bar to the right of the field types to see the rest of the list of types. Choose the
DATE type and the appropriate date format, such as MM-DD-YYYY or DD-MM-YYYY, in the
dialog. Click OK. Add another field for "Date of Birth," using the same field type and pattern.
Click OK. Right-click on the form to make a field for AGE. Type "Age" as the prompt. Choose
NUMBER for the TYPE and then choose ### or ## from the PATTERN list. You can also type
patterns into the pattern window. Click on OK at the bottom of the dialog.
The next field is "Sex." We will use it to illustrate how variable names are constructed.
Right-click where you would like to place the field. Type "Male, Female, or Unknown Sex" in the
prompt window, press Enter, and note what appears in the Field Name window on the right.
If you need another page, then click on the ADD PAGE button under the page window on the
left side of the screen. The first page is saved automatically and a blank page appears.

Problem_1: Create a questionnaire with the following fields:
                           Name of field                 Type of field
                      FirstName                Text
                      SecondName               Text
                      Sex                      Yes/No (Yes = Male, No = Female)
                      DateOfBirth              Date
                      Address                  Text
                      Phone number             Number (9 from the Pattern list)
                      Profession               Text
                      DateOfConsultation       Date
                      Weight                   Number (### from the Pattern list)
                      Height                   Number (#.## from the Pattern list)
                      SystolicBloodPressure    Number (### from the Pattern list)
                      DiastolicBloodPressure   Number (### from the Pattern list)
                      Cholesterol              Number (### from the Pattern list)
                      Hypertension             Yes/No (Yes = ill, No = not ill)

The name of the questionnaire will be Example and you will save this file in your personal
folder.

Enter Program
    Entering data and verifying that your age calculation works
    Moving from page to page
    Opening an existing View and database
    Navigating from record to record
    Searching for particular records.
You should have the EXAMPLE questionnaire on the screen. If not, go back to the main menu
and choose ENTER DATA, then OPEN on the FILE menu, and select the database that you created
and the EXAMPLE view. A dialog window will appear; confirm with OK.
Enter data in the fields displayed. To move from one field to another, press TAB or click
in the field where you want to enter data. After you finish with a patient, click on the NEW
button to save the current record and move to a blank record.

Moving From Record to Record
Examine the records in the file by moving from record to record with the arrow buttons on the
lower left. The double arrows move to the first and last records; single arrows
move one record at a time. To move to a new record, click on the double right arrows twice.

Finding Records
To find records matching specified criteria, click on the FIND button on the left. A dialog box
appears. Choose the SystolicBloodPressure field and then type ">160" (without quotation
marks) in the field that appears. Click on the OK button to find all the records in which
SystolicBloodPressure is greater than 160. If you want to continue with the current record, click
on the BACK button.

Problem_2: Enter the data in the questionnaire EXAMPLE for the 25 patients.

Problem_3: Using the Find button, find all the records in which SystolicBloodPressure is greater
than 140, DiastolicBloodPressure is greater than 90, Sex is "Yes" and Hypertension
is "Yes".

Analysis Program
     READ a view or a data file or table
     LIST the contents of the database
     Obtain the FREQuency of values for a field
     Cross-tabulate with the TABLES command and resulting epidemiologic statistics
     The library of previous output, all in HTML for the Internet
     Choose how "Yes" and "No" are displayed
     Define a new variable and assigning a value
     Use an IF statement to determine and assign case status
     SELECT a subset of records to process
     RECODE values to group the AGE field
     WRITE data to another file or table
     READ a non Access file
     READ related tables in a view and analyze data from more than one table
To run Analysis, click on the ANALYZE DATA button on the main menu screen. Note that all
commands are shown in the tree view on the left. Clicking on a command will bring up a dialog
that places the command in appropriate form in the program editor at the bottom of the screen.
Results appear in the third window above the program editor, which is a simplified version of the
Microsoft Internet browser.
The most frequently used Analysis commands are:
        Data: Read (Import), Relate, Write (Export), Merge
        Select/If: Select, Cancel Select, If, Sort, Cancel Sort
        Statistics: List, Frequencies, Tables, Means, Graph, Map
        Advanced Statistics: Linear Regression, Logistic Regression, Kaplan-Meier
        Frequencies, Complex Sample Tables, Complex Sample Mean
        Output: Header, Type, RouteOut, CloseOut, PrintOut, Storing Output
Selection of statistical methods depends on the type of data and the purpose of the analysis.
EpiInfo provides a number of statistical methods within the commands FREQ, TABLES,
MEANS, and REGRESS.
The commands used for statistical analysis in EpiInfo, the types of data to be analyzed, and the
purposes of various analyses are as follows:

For all types of data
Purpose                                     See the data, record by record
Analysis command                            LIST

For Text, Numbers or Dates – Categorical Data
Purpose                                determine the frequency of each value
Analysis command                       FREQ
Statistical method(s)                  confidence intervals on proportions

For Text, Numbers or Dates – Categorical Data – 2 values per variable
Purpose                 evaluate association between variables cross tabulation 2x2 tables
Analysis command        TABLES (Risk Factor)(Outcome)
Statistical method(s)   odds ratios with confidence limits
                        Risk Ratio with confidence limits
                        Chi Square
                        Fisher Exact

For Text, Numbers or Dates – Categorical Data – 2 values per variable with multiple values for a possible
confounder or modifier used for stratification
Purpose                       assessing confounding and interaction (Effect Modification)
Analysis command              TABLES (Risk Factor)(Outcome)(Stratifier(s))
Statistical method(s)         Mantel-Haenszel odds ratio with confidence limits
                              Summary Risk Ratio with confidence limits
                              Fisher Exact

For Text, Numbers or Dates – Categorical Data – More than 2 Values per Variable
Purpose               cross tabulation RxC
Analysis command      TABLES (Risk Factor)(Outcome) optional (Stratifier(s))
Statistical method(s) Chi Square

For Numbers – Continuous Data, Single Variable
Purpose               express series of numbers as a single value
Analysis command      MEANS VariableName
Statistical method(s) Sum, Mean, Median, Mode, Percentiles, Standard Deviation, Variance

For Numbers – Continuous Data, Single Variable
Purpose                      test whether a series of differences (e.g. before and after) differs from zero
Analysis command             MEANS VariableName
Statistical method(s)        t-test

For Numbers – Continuous Data, Grouped by Another (Categorical) Variable
Purpose                          see if group values differ
Analysis command                 MEANS Number GroupVar
Statistical method(s)            Bartlett’s test for homogeneity of variances
                                 ANOVA
                                 (and Student’s t-test if two groups)
                                 Kruskal-Wallis Test
                                 (Mann-Whitney/Wilcoxon if two groups)

For Numbers – Continuous Data, Two Numeric Variables
Purpose               measure Correlation
Analysis command      REGRESS Outcome=Risk Factor
Statistical method(s) Pearson Correlation Coefficient
                      Mean, Beta coefficient, Lower and Upper Confidence Limits, Standard Error,
                      F-Test, Y-Intercept

For Numbers – Continuous Data, More Than Two Numeric Variables
Purpose               predict outcome from Risk Factors
Analysis command      REGRESS Outcome=Risk Factor(s)
Statistical method(s) Pearson Correlation Coefficient
                      Mean, Beta coefficient, Lower and Upper Confidence Limits, Standard Error,
                      Partial F-Test, Y-Intercept


READing a View in Analysis
READ makes one or more views the active dataset. It also removes any previously active
datasets and associated DEFINED variables, and dataset-specific commands.
Syntax:
READ <Table specification> LINKNAME= <LinkTable Name>
FILESPEC HDR="NO" FMT="<File format>"
LINKNAME: The name of a link table in the current (home) MDB that constitutes a link to an
external file or data table.
FILESPEC: Additional information necessary to process the data table, which depends upon the
type of data being processed.
EpiInfo can read several kinds of files: Epi2000 Views, Access Tables, Desktop
Databases, Microsoft Excel, ODBC databases, Text Files, HTML Files.
Click on the READ command. A dialog box appears so that you can choose a database and a
view.
Problem_3: Read the questionnaire that you created.

LISTS
LIST does a line listing of the current dataset. If variable names are given, LIST will list only
these variables. LIST * will list all variables of all active records, using several pages across to
accommodate all the variables, if necessary.
Click on the LIST command. In the dialog that appears, choose one or more variables or click
on "ALL" to choose all. The variables are displayed in columns in a scrolling window. Click
the button with the "X" in the upper right corner of the Grid to leave the Grid. Try the LIST
command again, but choose HTML as the output format. This time the results appear in the
form of an Internet web page displayed in the small browser included with EpiInfo 2000. This
browser displays web pages on the local machine, but is not itself connected with the Internet.
Problem_4: List all the fields from the questionnaire EXAMPLE.

FREQUENCIES
FREQ produces a table from the table(s) specified in the last READ statement, showing how
many records have each value of the variable. Confidence limits for each proportion are
included.
Syntax: FREQ <Variable>
Choose the Frequencies command. In the dialog box, use the dropdown menu to select one or
more variables, and then click OK. After a short wait, the results should appear in the browser
window. Scroll up and down and note that each table is accompanied by yellow bars to the right
that indicate the frequencies. Statistics will be displayed below the table if the value of the
variable is numeric, as in CHOLESTEROL, but not for Yes/No fields like HYPERTENSION.
Example_1: FREQ ILL

                              ILL     Frequency   Percent   Cum Percent
                              Yes     46          61.3%     61.3%
                              No      29          38.7%     100.0%
                              Total   75          100.0%    100.0%
95% Conf Limits
                                        Yes   49.4%    72.4%
                                        No    27.6%    50.6%
In the table:
ILL shows values for the variable ILL. The representation of Yes and No is determined by a
SET command under Options, but the underlying data values are always 0 for No and 1 for Yes.
Frequency is the number of records in the dataset having the indicated value. In the example,
there are 46 people who said they were ill and 29 who were not ill.
Percent is the frequency of each value divided by the total (e.g., for the ill, 46/75 or 61.3%).
Cum Percent is the running total of the Percent column.
Total shows the total number of observations in the table (in this example, 75) and the total
percent, which is always 100.0%.
95% Confidence Limits are given for each proportion. If the 75 records had been chosen
randomly from a large number of interviews, we would predict with 95% confidence (i.e., be
wrong 5 out of 100 tries) that the proportion of ill in the larger population lies between 49.4% and
72.4%.
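
The Percent and Cum Percent columns described above can be checked by hand. The following is a minimal Python sketch (Python is not part of EpiInfo; the function names are illustrative). The confidence limits computed here use the simple normal-approximation (Wald) formula, which is an assumption and not necessarily the method EpiInfo applies, so the limits need not agree exactly with the 49.4% to 72.4% shown in the output.

```python
import math

def frequency_table(counts):
    """Return rows of (value, frequency, percent, cumulative percent) and the total."""
    total = sum(counts.values())
    rows = []
    cumulative = 0.0
    for value, freq in counts.items():
        percent = 100.0 * freq / total
        cumulative += percent
        rows.append((value, freq, percent, cumulative))
    return rows, total

def wald_limits(freq, total, z=1.96):
    """Normal-approximation (Wald) 95% limits for a proportion.
    Assumption: illustrative only, EpiInfo may use a different method."""
    p = freq / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p - half_width, p + half_width

# Counts taken from the FREQ ILL example: 46 ill, 29 not ill.
rows, total = frequency_table({"Yes": 46, "No": 29})
for value, freq, pct, cum in rows:
    print(f"{value:5s} {freq:3d} {pct:6.1f}% {cum:6.1f}%")

low, high = wald_limits(46, total)
print(f"Wald 95% limits for Yes: {100 * low:.1f}% to {100 * high:.1f}%")
```

The percentages reproduce the table (61.3% and 38.7%); the Wald limits are close to, but not identical with, the limits printed by FREQ.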

Problem_5: Using the Frequencies command, find the frequencies of HYPERTENSION using
CHOLESTEROL as the weight variable.

TABLES
TABLES does a cross-tabulation of the specified variables and sends the table to the screen,
printer, or other current output. Values of the first variable will appear on the left margin of the
table, and those of the second variable will be across the top of the table. Normally cells contain
counts of records matching the values in the corresponding marginal labels.
Example_2: Yes/No or True/False Variables and 2x2 Tables
In epidemiology, 2x2 tables are frequently used. In these tables, there is usually an "exposure"
variable that has two levels (e.g. exposed vs. not exposed) and a dichotomous (2-value) outcome
variable (e.g. the person had the disease or outcome of interest or they did not; see Table 1).
With 2x2 tables, the odds ratio (OR), risk ratio (RR) and other parameters can be calculated.
To get the correct estimates for these parameters in EpiInfo, the table must be set up as follows:
Table 1. Standard table setup and notation for EpiInfo (count data and unmatched data)
                                         Disease
                            Exposed      Yes         No           Total
                            Yes          a           b            n1
                            No           c           d            n0
                            Total        m1          m0           n


The "exposure" is the row variable, and the "disease" or "outcome" is the column variable. The
cases with the exposure of interest should be in the first row (in this example "Yes") and those
without the exposure in the second row. For the "disease" or "outcome" variable, those with the
outcome of interest should be in the first column (in this example "Yes") and those without the
disease in the second column.
Parameters, such as the risk ratio and odds ratio, are calculated regardless of whether the table is
set up correctly or incorrectly. The computer program relies on the user to assure that the
information is provided correctly, as depicted in Table 1. What happens if the table is set up
incorrectly? There are eight possible ways to mix up a 2x2 table. Only two possible odds ratios
can be calculated, the true OR or its inverse (for the CAT by CHD table shown later in this
section, 2.86 or 1/2.86 = 0.35). There are eight different possible "risk ratios" that can be
calculated, of which only one is correct (RR = 2.45 for the same table).
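
To see why the layout matters, the sketch below (plain Python, not an EpiInfo feature; all function names are illustrative) builds the eight row/column orderings of a 2x2 table and collects the distinct odds ratios and risk ratios they produce. It uses the CAT by CHD counts from the TABLES CAT CHD output later in this section (27, 95, 44, 443).

```python
def odds_ratio(table):
    (a, b), (c, d) = table
    return (a * d) / (b * c)

def risk_ratio(table):
    (a, b), (c, d) = table
    return (a / (a + b)) / (c / (c + d))

def arrangements(table):
    """All eight row/column orderings of a 2x2 table, including the correct one."""
    (a, b), (c, d) = table
    base = [
        ((a, b), (c, d)),  # correct layout (Table 1)
        ((c, d), (a, b)),  # rows swapped
        ((b, a), (d, c)),  # columns swapped
        ((d, c), (b, a)),  # rows and columns swapped
    ]
    # Transposing each layout swaps the roles of exposure and disease.
    transposed = [((r1[0], r2[0]), (r1[1], r2[1])) for r1, r2 in base]
    return base + transposed

correct = ((27, 95), (44, 443))
tables = arrangements(correct)
distinct_or = sorted({round(odds_ratio(t), 4) for t in tables})
distinct_rr = sorted({round(risk_ratio(t), 4) for t in tables})

print("distinct odds ratios:", distinct_or)
print("distinct risk ratios:", distinct_rr)
```

Only two odds ratios appear (the true value and its inverse), while every arrangement gives a different "risk ratio", so a wrongly oriented table silently changes the RR.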
In the output below the exposure is a dichotomous variable, CAT (catecholamine level); those
exposed have high CAT levels (1) and the nonexposed have low CAT levels (0). The outcome is
coronary heart disease (CHD); those who had a coronary event are 1 and those without CHD are
0. The first and last three records of the data for CAT and CHD are depicted below.
 Evans County data listing for CAT and CHD
Current View             C:\data\Data.mdb : viewKkmyn
No Of Records            609               Date                  02/22/2002 4:51:35 PM

                                      ID           CAT    CHD
                                      21           0      0
                                      31           0      0
                                      51           1      1
                                       ...          ...    ...
                                      19091        0      0
                                      19121        0      0
                                      19161        0      0
The output from the TABLES command for a 2x2 table is provided in three sections. The first
section provides the table, the second the parameter estimates, and the third the statistical tests.

TABLES CAT CHD
First section, the Table. We continue with the above example, showing the TABLES command
for a 2x2 Table:
Current View      C:\data\Sample.mdb : viewEvansCounty
No Of Records     609            Date          02/24/2002 8:26:25 PM

CAT by CHD
                                                CHD
                                      CAT       Yes      No    Total
                                      Yes       27       95    122
                                      No        44       443   487
                                      Total     71       538   609

Single Table Analysis
                                              Point               95% Confidence Interval
                                              Estimate            Lower            Upper
     PARAMETERS: Odds-based
     Odds Ratio (cross product)               2.8615              1.6878           4.8514 (T)
     Odds Ratio (MLE)                         2.8554              1.6690           4.8350 (M)
                                                                  1.6148           4.9853 (F)
     PARAMETERS: Risk-based
     Risk Ratio (RR)                          2.4495              1.5837           3.7887 (T)
     Risk Difference (RD)                     13.0962             5.3021           20.8903 (T)

     (T=Taylor series; C=Cornfield; M=Mid-P; F=Fisher Exact)

     STATISTICAL TESTS                        Chi-square          1-tailed p       2-tailed p
     Chi square - uncorrected                 16.2465                              0.0000567826
     Chi square - Mantel-Haenszel             16.2198                              0.0000575712
     Chi square - corrected (Yates)           14.9998                              0.0001086935
     Mid-p exact                                                  0.0000910000
     Fisher exact                                                 0.0001400000
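
The point estimates in the Single Table Analysis can be recomputed from the four cell counts using the notation of Table 1. The following Python sketch is only a hand check, not EpiInfo code; the function name is illustrative. The confidence limits use the log-based formula exp(ln(OR) ± 1.96 × SE), which is assumed here to correspond to the limits marked (T) in the output.

```python
import math

def two_by_two(a, b, c, d, z=1.96):
    """Estimates for a 2x2 table laid out as in Table 1:
    rows = exposure (Yes, No), columns = disease (Yes, No)."""
    n1, n0 = a + b, c + d          # row totals
    m1, m0 = a + c, b + d          # column totals
    n = n1 + n0
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    risk_exposed = a / n1
    risk_unexposed = c / n0
    return {
        "odds_ratio": odds_ratio,
        "or_low": math.exp(math.log(odds_ratio) - z * se_log_or),
        "or_high": math.exp(math.log(odds_ratio) + z * se_log_or),
        "risk_ratio": risk_exposed / risk_unexposed,
        "risk_difference_pct": 100 * (risk_exposed - risk_unexposed),
        # Uncorrected (Pearson) chi-square for a 2x2 table
        "chi_square": n * (a * d - b * c) ** 2 / (n1 * n0 * m1 * m0),
    }

# Cell counts from the CAT by CHD table above.
result = two_by_two(27, 95, 44, 443)
for name, value in result.items():
    print(f"{name:20s} {value:8.4f}")
```

The values agree with the output above: odds ratio 2.8615 (1.6878 to 4.8514), risk ratio 2.4495, risk difference 13.0962 and uncorrected chi-square 16.2465.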

When the table is set up correctly for EpiInfo (as depicted in Table 1), the row proportions have
epidemiologic meaning. In a cohort study, in the first row, the row percentage in the first
column is the "risk" of disease among the exposed; in the second row, the risk in the unexposed.
If the data are from an outbreak of acute disease, these are sometimes referred to as "attack
rates." If the data are based on prevalence information, the row proportions are the prevalence of
disease in the exposed and unexposed.
Interpretation of Statistical Tests
All of the p-values in the example are < 0.001, indicating a statistically significant association
between CAT and CHD in the study population. One question is the use of one-tailed vs. two-
tailed p-values. Many authors argue that two-tailed p-values are appropriate in the majority of
situations. Which of the two-sided p-values should be used? When the cell sizes are reasonably
large, all of the p-values will be similar; when the data are sparse, the exact p-values (Fisher
exact and mid-p exact) seem to be the most frequently recommended. Prior to performing
analyses, the user should decide which p-value(s) to use to determine statistical significance.
Occasionally one method will have a statistically significant p-value and others may not.

Problem_6: Click the TABLES command. In the EXPOSURE VARIABLE field, choose
SEX and for the OUTCOME VARIABLE, choose HYPERTENSION. This will perform a
cross-tabulation of SEX by HYPERTENSION.
MEANS
MEANS <Variable1> [<Variable2>]
 <Variable1>: A numeric variable to be used to calculate means (or * for all numeric variables)
<Variable2>: Any variable used for cross-tabulation (optional, or * for all numeric variables)
The MEANS command has two formats. If only one variable is supplied, the program produces
a table like that produced by FREQuencies, plus descriptive statistics.
If two variables are supplied, the first is a numeric variable containing data to be analyzed and
the second is a variable that indicates how groups will be distinguished. The output of this
format is a table like that produced by TABLES, plus descriptive statistics of the numeric
variable for each value of the group variable.
MEANS produces the following statistical tests:
     Parametric tests
            o ANOVA (for two or more samples)
            o Student's t-test (for two samples)
     Non-parametric tests
            o Kruskal-Wallis one-way analysis of variance (for two or more samples)
            o Mann-Whitney U Test = Wilcoxon Rank Sum Test (for two samples)
Further details are given in the chapter on Statistics.
Example_3: For A Single Numeric Variable — the MEANS Command
The MEANS command is used when the variable of interest is numeric and measured on a
continuous scale. A continuous variable can have decimal values (real numbers like 44.645) or
integer values (44). In some ways, AGE can be considered either categorical (with 1-year
categories) or continuous, but the MEANS command is appropriate only when the mean (average)
of the values is of interest. The mean AGE of one or more groups of people is useful
information, whereas the mean of the numeric codes for countries of the world would usually be
of no interest, even though both sets of data contain numeric values.
Let’s examine the output from a request for MEANS AGE in the viewOSWEGO dataset.
Current View    D:\EPI2000\Sample.Mdb : viewOswego
No Of Records   74           Date          10/28/1999 9:31:38 AM
MEANS of age
                               AGE     Frequency   Percent   Cum Percent
                             3         1           1.4%      1.4%
                             7         2           2.7%      4.1%
                             8         2           2.7%      6.8%
                             9         1           1.4%      8.1%
                             10        1           1.4%      9.5%
                             11        4           5.4%      14.9%
                             12        1           1.4%      16.2%
                             13        2           2.7%      18.9%
                             14        1           1.4%      20.3%
                             15        3           4.1%      24.3%
                             16        1           1.4%      25.7%
                             ...       ...         ...       ...
                             18        1           1.4%      31.1%
                             70        1           1.4%      95.9%
                             72        1           1.4%      97.3%
                             74        1           1.4%      98.6%
                             77        1           1.4%      100.0%
                             Total     74          100.0%    100.0%

Total       Sum Mean        Variance         Std Dev      Std Err
74          2744 37.0811    461.0344         21.4717      2.4960
T statistic=14.8560               df=73      p-value=0.0000
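
The last two lines of this output are linked by simple arithmetic: the standard error is the standard deviation divided by the square root of the number of records, and the T statistic is the mean divided by the standard error (a test of whether the mean differs from zero). A minimal Python sketch of that check follows; it is not EpiInfo code and the function name is illustrative.

```python
import math

def one_sample_t(n, mean, std_dev, mu0=0.0):
    """Standard error, t statistic and degrees of freedom for testing mean = mu0."""
    std_err = std_dev / math.sqrt(n)
    t_statistic = (mean - mu0) / std_err
    return std_err, t_statistic, n - 1

# Summary values from the MEANS AGE output: Total = 74, Sum = 2744, Std Dev = 21.4717.
n = 74
mean_age = 2744 / n
std_err, t_statistic, df = one_sample_t(n, mean_age, 21.4717)

print(f"mean = {mean_age:.4f}")
print(f"std err = {std_err:.4f}")
print(f"t = {t_statistic:.4f} with df = {df}")
```

The results match the printed Mean (37.0811), Std Err (2.4960), T statistic (14.8560) and df (73).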
Example_4: Comparing a Numeric Variable Across Groups – The MEANS Command
with a Second Variable
The MEANS command can compare mean values of a variable between groups of records. The
numeric variable of interest, AGE, for example, is processed as for a single variable, but another
categorical or "group" variable (such as ill/not ill) is used to divide the records into groups for
comparison. MEANS AGE ILL, for example, compares the ages of the ill and well persons and
provides statistics to evaluate whether or not there is really a difference.
If there are only two groups, the equivalent of an independent t-test is performed. If there are
more than two groups, then a one-way analysis of variance (ANOVA) is computed. "One way"
means that there is only one grouping variable (in the above example: ill/not ill). If there were
two grouping variables, such as ill/not ill and sex, then that would be a two-way ANOVA, which
EpiInfo does not perform. The one-way ANOVA can be thought of as an extension of the
independent t-test to more than two groups. Because the ANOVA test requires some
assumptions about the data and the underlying population, another test (Kruskal-Wallis, also
known as the Mann-Whitney/Wilcoxon test if there are only two groups) is also provided. This
is a non-parametric test, meaning that it does not assume that the underlying population
follows a particular distribution. Non-parametric tests are generally more conservative in
detecting a statistically significant difference, so a result that is "significant" in the
non-parametric test will usually also be so in the ANOVA test.
With a grouping variable, the MEANS command has the form:
MEANS <continuous var> <grouping var>
 The output is provided in 5 different sections:
    1. A table of the two variables with the continuous variable forming the rows and the
     grouping variable forming the columns.
    2. Descriptive information of the continuous variable by each group: number of
     observations, mean, variance, and standard deviation; minimum and maximum values; the
     25th, 50th (median), and 75th percentiles; and the mode.
    3. An Analysis of Variance (ANOVA) table and a p-value for whether or not the
     means are equal.
    4. A test to determine whether the variances in each group are similar (Bartlett's test for
     homogeneity of variance).
    5. A non-parametric equivalent to the independent t-test and one-way ANOVA.
Using the Oswego dataset, let’s examine the ages of ill and well persons in the outbreak.
MEANS AGE ILL
                                             ILL
                                   AGE       yes    no     TOTAL
                                   3         1      0      1
                                   7         1      1      2
                                   8         2      0      2
                                   etc.      etc.   etc.   etc.
                                   72        1      0      1
                                   74        1      0      1
                                   76        1      0      1
                                   TOTAL     46     29     75

Descriptive Statistics for Each Value of Crosstab Variable
                               Obs Total      Mean    Variance Std Dev
                          yes 46   1806.0000 39.2609 477.2638 21.8464
                          no 29    955.0000   32.9310 423.7094 20.5842
                        Minimum 25%        Median 75%       Maximum Mode
                  yes   3.0000    17.0000 38.5000 59.0000 77.0000      15.0000
                  no    7.0000    14.0000 35.0000 50.0000 69.0000      11.0000
ANOVA, a Parametric Test for Inequality of Population Means
(For normally distributed data only)
                           Variation     SS           df    MS           F statistic
                           Between       712.6550     1     712.6550     1.5604
                           Within        33340.7316   73    456.7224
                           total         34053.3867   74
P-value = 0.2156

Bartlett's Test for Inequality of Population Variances
                          Bartlett's chi square=   0.1193   df=1     P value=0.7298
A small p-value (less than 0.05) suggests that the variances are not homogeneous and that the
ANOVA may not be appropriate.
Mann-Whitney/Wilcoxon Two-Sample Test (Kruskal-Wallis test for two groups)
                          Kruskal-Wallis H (equivalent to Chi square) =      1.1612
                          Degrees of freedom =                               1
                          P value =                                          0.2812
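
The ANOVA table and Bartlett's chi square above can be rebuilt from the Descriptive Statistics section alone (Obs, Total and Variance for each group). The Python sketch below is a hand check under that assumption, not EpiInfo code; the function names are illustrative and the p-values are not computed.

```python
import math

def anova_from_summary(groups):
    """One-way ANOVA from per-group (n, total, variance) tuples."""
    n_all = sum(n for n, _, _ in groups)
    k = len(groups)
    grand_mean = sum(total for _, total, _ in groups) / n_all
    ss_between = sum(n * (total / n - grand_mean) ** 2 for n, total, _ in groups)
    ss_within = sum((n - 1) * var for n, _, var in groups)
    df_between, df_within = k - 1, n_all - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "f_statistic": ms_between / ms_within,
    }

def bartlett_chi_square(groups):
    """Bartlett's statistic for homogeneity of variances, from the same summary tuples."""
    n_all = sum(n for n, _, _ in groups)
    k = len(groups)
    pooled_var = sum((n - 1) * var for n, _, var in groups) / (n_all - k)
    numerator = (n_all - k) * math.log(pooled_var) - sum(
        (n - 1) * math.log(var) for n, _, var in groups
    )
    correction = 1 + (sum(1 / (n - 1) for n, _, _ in groups) - 1 / (n_all - k)) / (3 * (k - 1))
    return numerator / correction

# (Obs, Total, Variance) for ill = yes and ill = no, from the output above.
groups = [(46, 1806.0, 477.2638), (29, 955.0, 423.7094)]
anova = anova_from_summary(groups)
bartlett = bartlett_chi_square(groups)

print(f"SS between = {anova['ss_between']:.4f}, SS within = {anova['ss_within']:.4f}")
print(f"F = {anova['f_statistic']:.4f} on {anova['df_between']} and {anova['df_within']} df")
print(f"Bartlett chi square = {bartlett:.4f}")
```

The sums of squares, the F statistic (1.5604) and Bartlett's chi square (0.1193) agree with the output to within rounding of the printed variances.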


Problem_7: Click the MEANS command. In the Means of field, choose a numeric variable such as
SystolicBloodPressure, and in the Cross-tabulate by value of field, choose Hypertension.
The program will perform:
    Descriptive Statistics for Each Value of Crosstab Variable
    ANOVA, a Parametric Test for Inequality of Population Means
    Mann-Whitney/Wilcoxon Two-Sample Test (Kruskal-Wallis test for two groups)

REGRESS
REGRESS can be used for simple linear regression (only one independent variable), for multiple
linear regression (more than one independent variable), and for quantifying the relationship
between two continuous variables (correlation). Regression is used when the primary interest is
to predict one dependent variable (y) from one or more independent variables (x1, ... xk).
The correlation coefficient or r (sometimes referred to as the Pearson correlation coefficient) is a
useful measure of how two continuous variables are related. If the correlation is greater than 0,
the variables are positively correlated; as x increases, y also increases. If the correlation is less
than 0, the variables are negatively correlated; as x increases, y decreases. If the correlation is
exactly 0, then the variables are uncorrelated. The correlation coefficient can vary between +1
and -1. For positive correlations (r > 0), the closer to +1 the stronger the correlation; for negative
correlations (r < 0), the closer to -1 the stronger the correlation.
If the data are ordinal or far from normal, significance tests based on the Pearson correlation
coefficient are not valid and a non-parametric equivalent to Pearson’s should be used.
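
As an illustration of how the sign and size of r behave, the following Python sketch computes the Pearson correlation coefficient from its definition. The small data lists are hypothetical and chosen only to give easily checked results; the function name is illustrative and is not an EpiInfo command.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance of x and y divided by the product of their spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
r_rising = pearson_r(x, [2, 4, 6, 8, 10])    # y rises exactly with x
r_falling = pearson_r(x, [10, 8, 6, 4, 2])   # y falls exactly as x rises
r_partial = pearson_r(x, [2, 1, 4, 3, 5])    # y tends to rise with x

print(r_rising, r_falling, r_partial)
```

The first two cases give the extremes +1 and -1; the third gives 0.8, a positive but imperfect correlation.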
Example_5: Simple Linear Regression
Current View:   D:\EPI2000\Sample.Mdb:viewEstriolAndBirthweight
Record Count:   31    Deleted Records: Excluded    Date: 09/118/2001 17:14 AM
REGRESS BIRTHWEIGHT = ESTRIOL

Birthweight = Estriol
Correlation Coefficient: r^2=0.37
                     Source         df   Sum of Squares     Mean Square        F-statistic
                     Regression     1    248.421            248.421            16.811
                     Residuals      29   428.547            14.777
                     Total          30   676.968

                        Variable     Coefficient    Std Error      F-test    P-Value
                        Intercept    21.536         2.636          66.7390   0.000000
                        Estriol    0.606       0.148       16.8108   0.000076

Correlation coefficient The Pearson correlation coefficient, or "r". In this example, the
correlation is 0.61, indicating a relatively strong positive correlation between estriol and
birthweight. With only a single independent variable, the correlation = square root (R2).
r^2 Sometimes represented as r2 or R2, R squared. The R2 value =
Regression Sum of Squares / Total Sum of Squares (in the above example, 248.421/676.968 =
0.367). The R2 can be thought of as the proportion of variance of y (in this example,
birthweight) that can be explained by x (in this example, estriol). In this example, 37% of the
variance in birthweight can be explained by the women's estriol levels. If R2 = 1, then all of the
variability is explained, which would mean that all data points fall on the regression line. If R2 =
0, then no variance is explained. A 95% confidence interval for the R2 value is also provided
(0.02, 0.64).
F-Statistic The F-statistic is the Regression Mean Square / Residual Mean Square (in the above
example, 248.421/14.777 = 16.81). The F-statistic is calculated to determine if the slope of the
regression line is significantly different from 0. The corresponding P-value is listed in the
P-Value column of the coefficient table (0.000076 for Estriol).
Mean The average value for ESTRIOL; could also be determined with FREQ ESTRIOL.
Coefficient The slope of the line, sometimes referred to as the "regression coefficient." In this
example, 0.606 can be interpreted as: for every 1-unit increase in estriol (1 mg/24 hr), birthweight
increases by 0.61 units (g/100). Statistics concerning the slope are also provided:
the standard error (Std Error), 0.148, and the 95% confidence interval (0.307921, 0.908460).
The interpretation would be that although we observed a 0.61 increase in birthweight for every
1-unit increase in Estriol, we are 95% confident that the "true" slope lies between
0.31 and 0.91. (As above, the confidence interval type can be changed, for example to 90%,
with the command SET CONFIDENCE= 90.)
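The quantities described above can be recomputed by hand from the ANOVA table. The following Python lines are a minimal sketch, assuming the sums of squares and degrees of freedom are copied from the output shown earlier; the variable names are illustrative and are not EpiInfo identifiers.

```python
# Values copied from the ANOVA table in the REGRESS output (assumed transcription).
ss_regression = 248.421
ss_residual = 428.547
ss_total = 676.968
df_regression = 1
df_residual = 29

# R squared: share of the total variability explained by the regression.
r_squared = ss_regression / ss_total

# F-statistic: regression mean square divided by residual mean square.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_statistic = ms_regression / ms_residual

# With a single independent variable, |r| is the square root of R squared.
r = r_squared ** 0.5

print(round(r_squared, 3), round(f_statistic, 2), round(r, 2))
```

The sign of r is taken from the slope, which is positive in this example.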

GRAPH
Numerous settings are available in the GRAPH module, and these can be saved as a Graph
Template. When a Graph Template is referred to by name, the settings are taken from this
template. If explicit settings are given as above, they override the settings in the Graph
Template.
EpiInfo 2000 can produce the following types of graph: Line, Bar, Horizontal Bar, Histogram,
Mark, Pie, Area, Pareto, Scatter, Scatter Line, Doughnut, Surface, Polar, Cube.
Y Axis titles and scale can be defined using YTITLE, YRANGE and YTICK. LEGEND can
be used to specify labels for the legend; if not specified, either the variable name or prompt will
be used, depending on the PROMPTS setting.
X Axis titles and the scale can be defined using XTITLE, XRANGE, and XTICK. XLABELS
can be used to specify labels for each value of the X axis; if not specified, the value will be used.
XORIENT can be used to specify the direction in which the X axis labels will be displayed.
Problem_8: Click the GRAPH command. In the Graph Type field choose Pie, as Title enter "Sex
distribution", and as Y Variable choose Sex. Use a 3D diagram. Your diagram will appear as
below:
Problem_9: Click the GRAPH command. In the Graph Type field choose Scatter, Y Variable:
Cholesterol, X Variable: Weight, Title: "Correlation between Weight and Cholesterol". Your
diagram will appear as below:




Problem_10
a. Create a questionnaire about diabetes. This questionnaire will have the following fields: Id_patient,
Age, Sex, Profession, Weight, Height, SystolicBloodPressure, DiastolicBloodPressure,
SugarBloodLevel, and Diabetes. Choose the correct type for each field.
b. Enter the data for 30 patients.

Problem_11
a. Create a questionnaire named HIV QUESTIONNAIRE about AIDS with the following fields:
Id_patient, Name, Age, Sex (Yes = Male, No = Female), Occupation, STD (sexually transmitted
diseases) in antecedents (Yes, No), Unprotected sexual intercourse (Yes/No), HIV Test (Yes =
Positive and No = Negative), T4-lymphocytes, Stage of disease (
                       Laboratory stages   Stages of disease
                                           A               B              C
                       T4-lymphocytes/µl   asymptomatic    symptoms,      symptoms,
                                           infection       not yet AIDS   AIDS
                       more than 500       A1              B1             C1
                       200 to 499          A2              B2             C2
                       less than 200       A3              B3             C3
), Ill (Yes/No). Choose the correct type for each field.
b. Fill in the fields specified above for 40 patients.
c. Read the HIV QUESTIONNAIRE.
d. List the following fields from the questionnaire: Age, Sex, Occupation, STD, Stage of disease and
Ill.
e. Using the Frequencies command, find the frequencies of STD, Age, Sex, Occupation and HIV
test.
f. Click the TABLES command. In the EXPOSURE VARIABLE field, choose Unprotected
sexual intercourse and for the OUTCOME VARIABLE, choose T4-lymphocytes. This will
perform a cross-tabulation of Unprotected sexual intercourse by T4-lymphocytes.
g. Click the MEANS command. In the Means of field, choose a numeric variable such as
T4-lymphocytes and for Cross-tabulate by value of, choose Ill.
h. Click the GRAPH command. Create a histogram of Occupation and of Stage of disease.
i. Is there any correlation between age and T4-lymphocytes?
Descriptive Statistics
Measures of Central Location
A. The Arithmetic Mean
The arithmetic mean is the sum of all the observations divided by the number of
observations.
In statistical terms it is written as:
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
The sign $\Sigma$ (sigma) is referred to as a summation sign. The expression $\sum_{i=1}^{n} x_i$
is simply a short way of writing the quantity $x_1 + x_2 + \dots + x_n$.
Example_1:
If $x_1 = 2$, $x_2 = 5$, $x_3 = -4$, then:
\[ \sum_{i=1}^{3} x_i = 2 + 5 - 4 = 3 \]
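The summation and the arithmetic mean can be checked with a few lines of Python. This is only an illustrative sketch; the function name arithmetic_mean is an assumption made for the example.

```python
def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)

sample = [2, 5, -4]               # x1, x2, x3 from Example_1
total = sum(sample)               # the summation: 2 + 5 - 4
mean = arithmetic_mean(sample)    # total divided by n = 3

print(total, mean)
```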

B. The Median
An alternative measure of central location, perhaps second in popularity to the arithmetic mean,
is the median, or sample median.
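Suppose there are n observations in a sample. If these observations are ordered from smallest to
largest, then the sample median is defined as follows:
\[ \text{median} = \begin{cases} x_{\left(\frac{n+1}{2}\right)} & \text{if } n \text{ is odd} \\ \dfrac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2} & \text{if } n \text{ is even} \end{cases} \]
where $x_{(k)}$ denotes the kth observation in the ordered sample.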
Example_2: Compute the sample median for the sample of birth weights of live-born infants
born at a private hospital in San Diego, California, during a 1-week period (g).
Solution: First, arrange the sample in ascending order:
2069, 2581, 2759, 2834, 2838, 2841, 3031, 3101, 3200, 3245, 3248, 3260, 3265, 3314, 3323,
3484, 3541, 3609, 3649, 4146
Since n = 20 is even, the sample median is the average of the 10th and 11th largest observations:
\[ \text{Median} = \frac{3245 + 3248}{2} = 3246.5 \text{ g} \]
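The odd/even rule for the median can be sketched in Python as follows. The function name sample_median is illustrative, and the birth-weight list assumes the 18th ordered value is 3609, the value used in the percentile example later in this chapter.

```python
def sample_median(values):
    """Median by the odd/even rule: middle value, or mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # the ((n+1)/2)th ordered value
    return (ordered[mid - 1] + ordered[mid]) / 2  # mean of the (n/2)th and (n/2+1)th

birth_weights = [
    2069, 2581, 2759, 2834, 2838, 2841, 3031, 3101, 3200, 3245,
    3248, 3260, 3265, 3314, 3323, 3484, 3541, 3609, 3649, 4146,
]
median_weight = sample_median(birth_weights)
print(median_weight)
```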
Observations:
If the distribution is symmetric, then the relative position of the points on each side of the sample
median will be the same.
If the distribution is positively skewed (or skewed to the right), the points above the median will
tend to be farther from the median in absolute value than the points below the median.
If the distribution is negatively skewed (or skewed to the left), the points below the median will
tend to be farther from the median in absolute value than the points above the median.

C. The Mode
The mode is the most frequently occurring value among all observations in a sample.
Example_3: Consider the sample of time intervals between successive menstrual periods of a
group of 500 women aged 18-21, as shown in the table. The frequency column gives the number
of women who reported each of the respective values.
value      frequency      value         frequency          value         frequency
24         5              29            96                 34            7
25         10             30            63                 35            3
26         28             31            24                 36            2
27         64             32            9                  37            1
28         185            33            2                  38            1
28 days is the mode because it is the most frequently occurring value.
Some distributions have more than one mode. In fact, one useful method of classifying
distributions is the number of modes present. A distribution with one mode is referred to as
unimodal; two modes, bimodal; three modes, trimodal; and so forth.
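As an illustration, the mode (or modes) can be read from a frequency table by looking for the highest count. The sketch below assumes the table of Example_3 is stored as a Python dictionary; the name modes is hypothetical.

```python
# Frequency table from Example_3: interval length in days -> number of women.
frequencies = {
    24: 5, 25: 10, 26: 28, 27: 64, 28: 185,
    29: 96, 30: 63, 31: 24, 32: 9, 33: 2,
    34: 7, 35: 3, 36: 2, 37: 1, 38: 1,
}

def modes(freq_table):
    """Return every value whose frequency equals the highest frequency."""
    highest = max(freq_table.values())
    return [value for value, count in freq_table.items() if count == highest]

sample_modes = modes(frequencies)
print(sample_modes, len(sample_modes))
```

A result with a single element corresponds to a unimodal distribution; two elements would indicate two equally frequent values.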
Example_4: Total/HDL cholesterol (mmol/L) and weight (kg) for 10 patients are:
                    Cholesterol   6.8   5.3   4.3    5.0    7.1    5.5   3.8   4.6   4.0   6.0
                    Weight        90    75    70     73     110    67    60    65    59    80
Compute the arithmetic mean and median for cholesterol and weight.
Solution:
a. The mean of the cholesterol levels is given by:
\[ \bar{x}_c = \frac{6.8 + 5.3 + 4.3 + 5.0 + 7.1 + 5.5 + 3.8 + 4.6 + 4.0 + 6.0}{10} = 5.24 \]
The mean of weight:
\[ \bar{x}_w = \frac{90 + 75 + 70 + 73 + 110 + 67 + 60 + 65 + 59 + 80}{10} = 74.9 \]
b. First we order the cholesterol levels from smallest to largest:
3.8 4.0 4.3 4.6 5.0 5.3 5.5 6.0 6.8 7.1
The median of cholesterol is given by:
\[ \frac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2} = \frac{x_{(5)} + x_{(6)}}{2} = \frac{5.0 + 5.3}{2} = 5.15 \]
Next we order the weights from smallest to largest:
59 60 65 67 70 73 75 80 90 110
The median of weight is given by:
\[ \frac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2} = \frac{x_{(5)} + x_{(6)}}{2} = \frac{70 + 73}{2} = 71.5 \]

Measures of Spread
The Range
The range is the difference between the largest and the smallest observation in a sample.
The range in the sample of birth weights of live-born infants born at a private hospital in San
Diego, California, during a 1-week period (g) is: 4146-2069=2077g

Quantiles
Another approach that addresses some of the shortcomings of the range in quantifying the spread
in a data set is the use of quantiles or percentiles.
The pth percentile is the value Vp such that p% of the sample points are less than or equal to Vp.
The median, being the 50th percentile, is a special case of quantile.

The pth percentile is defined by:
The (k+1)th largest sample point if $np/100$ is not an integer, where k is the largest integer less than $np/100$;
The average of the $(np/100)$th and $(np/100 + 1)$th largest observations if $np/100$ is an integer.
The spread of a distribution can be characterized by specifying several percentiles. For example,
the 10th and 90th percentiles are often used to characterize spread. Percentiles have the
advantages over the range of being less sensitive to outliers and of not being much affected by
the sample size (n).
Example_5: Compute the 10th and 90th percentile for the birth weight data.
Solution: Since 20 x 0.1=2 and 20 x 0.9=18 are integers, the 10th and 90th percentiles are
defined by:
10th percentile: average of the 2nd and the 3rd largest values = (2581+2759)/2 = 2670g
90th percentile: average of the 18th and 19th largest values = (3609+3649)/2 = 3629g.
There is no limit to the number of percentiles that can be computed. Frequently used percentiles
are quartiles (25th, 50th, and 75th percentiles), quintiles (20th, 40th, 60th, and 80th percentiles),
and deciles (10th, 20th, ….. 90th percentiles).
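The range and the percentile rule used in this section can be sketched in Python. This is a minimal sketch, assuming p is a whole number strictly between 0 and 100; the function name percentile is illustrative, and statistical packages may use other interpolation rules that give slightly different values.

```python
def percentile(values, p):
    """pth percentile by the rule in the text (p: integer, 0 < p < 100)."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    k, remainder = divmod(n * p, 100)
    if remainder != 0:
        # np/100 is not an integer: take the (k+1)th ordered value (index k).
        return ordered[k]
    # np/100 is an integer k: average the kth and (k+1)th ordered values.
    return (ordered[k - 1] + ordered[k]) / 2

birth_weights = [
    2069, 2581, 2759, 2834, 2838, 2841, 3031, 3101, 3200, 3245,
    3248, 3260, 3265, 3314, 3323, 3484, 3541, 3609, 3649, 4146,
]
sample_range = max(birth_weights) - min(birth_weights)
p10 = percentile(birth_weights, 10)
p90 = percentile(birth_weights, 90)
print(sample_range, p10, p90)
```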

The Variance and Standard Deviation
The sample variance, or variance, is defined as follows:
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} \]
The sample standard deviation, or standard deviation, is defined as follows:
\[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} = \sqrt{\text{sample variance}} \]

Example_6: The white blood counts taken on admission of all patients entering a small hospital
in Pennsylvania, on a given day, are:
                                                    i       xi      i     xi
                                                    1       7       6     3
                                                    2       35      7     10
                                                    3       5       8     13
                                                    4       9       9     8
                                                    5       8       10    12
Compute the variance of the white blood count:
Solution: First we compute the arithmetic mean:
\[ \bar{x} = \frac{x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7 + x_8 + x_9 + x_{10}}{10} = \frac{7 + 35 + 5 + 9 + 8 + 3 + 10 + 13 + 8 + 12}{10} = \frac{110}{10} = 11 \]
Second, we compute the variance:
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{1}{10-1} \left[ (x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_{10} - \bar{x})^2 \right] \]
\[ s^2 = \frac{1}{9} \left[ (7-11)^2 + (35-11)^2 + (5-11)^2 + (9-11)^2 + (8-11)^2 + (3-11)^2 + (10-11)^2 + (13-11)^2 + (8-11)^2 + (12-11)^2 \right] = \frac{720}{9} = 80 \]
And now we can compute the standard deviation:
\[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} = \sqrt{80} = 8.94 \]
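The same calculation can be reproduced in Python. The sketch below follows the definition with the n - 1 divisor; the function name sample_variance is an assumption made for this illustration.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

wbc = [7, 35, 5, 9, 8, 3, 10, 13, 8, 12]   # white blood counts from Example_6
variance = sample_variance(wbc)
standard_deviation = math.sqrt(variance)
print(variance, round(standard_deviation, 2))
```

Python's standard library offers statistics.variance and statistics.stdev, which use the same n - 1 divisor.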

The coefficient of variation is given by:
\[ CV = \frac{s}{\bar{x}} \times 100 \]
Example_7: Compute the coefficient of variation for the data in Example_6.
Solution: The coefficient of variation of the last example is:
\[ CV = \frac{8.94}{11} \times 100 = 81.27 \]
The coefficient of variation is most useful in comparing the variability of several different samples,
each with a different arithmetic mean. This is because a higher variability is usually expected
when the mean increases, and CV is a measure that accounts for this variability.
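A short Python sketch of the coefficient of variation for the white blood count data follows. It uses statistics.stdev (n - 1 divisor) and statistics.mean from the standard library; because the standard deviation is not rounded to 8.94 first, the result is about 81.3 rather than 81.27.

```python
import statistics

wbc = [7, 35, 5, 9, 8, 3, 10, 13, 8, 12]   # white blood counts

s = statistics.stdev(wbc)      # sample standard deviation (n - 1 divisor)
x_bar = statistics.mean(wbc)   # arithmetic mean
cv = s / x_bar * 100           # coefficient of variation, in percent

print(round(cv, 2))
```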

Correlation indices
Sum of deviation products (SPE) is given by:
\[ SPE = \sum_{i=1}^{n} (x_i - \bar{x}) \cdot (y_i - \bar{y}) \]
Covariance is given by:
\[ COV(x, y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x}) \cdot (y_i - \bar{y}) \]
Coefficient of correlation is given by:
\[ r = \frac{COV(x, y)}{S_x \cdot S_y} \]
a. correlation from -0.25 to +0.25 = little or no relationship
b. correlation from 0.25 to 0.50 (or -0.25 to -0.50) = an acceptable degree of association
c. correlation from 0.50 to 0.75 (or -0.50 to -0.75) = a moderate to good association
d. correlation above 0.75 (or below -0.75) = a very good association.
Coefficient of determination is given by:
\[ d = r^2 \]
Example_8: For the statistical data of Example_4, compute the covariance between cholesterol and
weight.
Solution:
The covariance is given by:
\[ COV(x, y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x}) \cdot (y_i - \bar{y}) \]
With $\bar{x} = 5.24$ and $\bar{y} = 74.9$, the ten products of deviations are:
\[ COV(x, y) = \frac{1}{10} \left[ 23.556 + 0.006 + 4.606 + 0.456 + 65.286 - 2.054 + 21.456 + 6.336 + 19.716 + 3.876 \right] \]
\[ COV(x, y) = \frac{143.240}{10} = 14.324 \]
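The correlation indices can be computed for the Example_4 data with the Python sketch below. It is an illustration under one stated assumption: S_x and S_y are computed with the same 1/n divisor as the covariance, so that r coincides with the Pearson coefficient. The function names are hypothetical.

```python
import math

cholesterol = [6.8, 5.3, 4.3, 5.0, 7.1, 5.5, 3.8, 4.6, 4.0, 6.0]
weight = [90, 75, 70, 73, 110, 67, 60, 65, 59, 80]

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """COV(x, y): sum of deviation products (SPE) divided by n."""
    x_bar, y_bar = mean(xs), mean(ys)
    spe = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return spe / len(xs)

def correlation(xs, ys):
    """r = COV(x, y) / (Sx * Sy); Sx, Sy use the 1/n divisor (assumption)."""
    sx = math.sqrt(covariance(xs, xs))
    sy = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sx * sy)

cov_xy = covariance(cholesterol, weight)
r = correlation(cholesterol, weight)
d = r ** 2   # coefficient of determination
print(round(cov_xy, 3), round(r, 2), round(d, 2))
```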

Grouped Data
Sometimes the sample size is prohibitively large to display all the raw data. Also, data are
frequently collected in grouped form, since the required degree of accuracy to specify a
measured quantity exactly is often lacking, because of either measurement error or imprecise
patient recall.
A frequency distribution is an ordered display of each value in a data set together with its
frequency, that is, the number of times that value occurs in the data set. In addition, the
percentage of sample points that take on a particular value is also typically given.
We work with: frequency, cumulative frequency (CUM FREQ), relative frequency (PERCENT),
and cumulative percent (CUM PERCENT). Cumulative frequency (CUM FREQ) is the number of
observations in the sample that are less than or equal to a specific number x.
The PERCENT is
\[ \text{PERCENT} = \frac{\text{FREQUENCY}}{n} \times 100 \]
while the cumulative percent is
\[ \text{CUM PERCENT} = \frac{\text{CUM FREQ}}{n} \times 100 \]
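The four columns of a frequency distribution can be built with a short Python sketch. The white blood counts used earlier in this chapter serve as sample data; the function name frequency_table and the tuple layout are assumptions made for this illustration.

```python
from collections import Counter

def frequency_table(values):
    """Rows of (value, FREQUENCY, CUM FREQ, PERCENT, CUM PERCENT)."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cum_freq = 0
    for value in sorted(counts):
        freq = counts[value]
        cum_freq += freq                      # observations <= value
        rows.append((value, freq, cum_freq,
                     100 * freq / n,          # PERCENT
                     100 * cum_freq / n))     # CUM PERCENT
    return rows

wbc = [7, 35, 5, 9, 8, 3, 10, 13, 8, 12]
table = frequency_table(wbc)
for row in table:
    print(row)
```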

Some general instructions for categorizing the data are:
Subdivide the data into k intervals, starting at some lower bound y1 and ending at some upper
bound yk+1.
The first interval is from y1 inclusive to y2 exclusive; the second interval is from y2 inclusive to
y3 exclusive; ….. the kth interval is from yk inclusive to yk+1 exclusive. The rationale for this
representation is to make certain that the group intervals include all possible values and do not
overlap.
The group intervals are generally chosen to be equal, although the appropriateness of equal
group sizes should be dictated more by subject-matter considerations.
A count is made of the number of units that fall in each interval, which is denoted by the
frequency within that interval.
The midpoint of each group interval is computed for calculation of descriptive statistics. The
midpoint of the first interval is denoted by $m_1 = \frac{y_1 + y_2}{2}$, the midpoint of the second interval by
$m_2 = \frac{y_2 + y_3}{2}$, and the midpoint of the last interval by $m_k = \frac{y_k + y_{k+1}}{2}$.
Finally, for the purpose of computing descriptive statistics, the group intervals and their
midpoints, $m_i$, and frequencies, $f_i$, are displayed concisely in a table such as the following:
                       Group interval    Midpoint of group interval   Frequency
                       >= y1, < y2       m1                           f1
                       >= y2, < y3       m2                           f2
                       >= yi, < yi+1     mi                           fi
                       ……….              …………….                       ……………….
                       …………              …………..                       ……………..
                       >= yk, < yk+1     mk                           fk

Example_8:
                   Birthwt   Frequency     CUM FREQ       PERCENT      CUM PERCENT
                   32        1             1              1.000        1.000
                   58        1             2              1.000        2.000
                   64        1             3              1.000        3.000
                   67        1             4              1.000        4.000
                   68        1             5              1.000        5.000
                   83        1             6              1.000        6.000
                   85        2             8              2.000        8.000
                   86        1             9              1.000        9.000
                   87        1             10             1.000        10.000
                   88        2             12             2.000        12.000
                          89           3              15                 3.000            15.000
                          91           1              16                 1.000            16.000
                          92           1              17                 1.000            17.000
                          93           1              18                 1.000            18.000
                          94           2              20                 2.000            20.000
                          95           1              21                 1.000            21.000
                          96           1              22                 1.000            22.000
                          98           3              25                 3.000            25.000
                          99           1              26                 1.000            26.000
                          100          1              27                 1.000            27.000
                          101          1              28                 1.000            28.000
                          102          1              29                 1.000            29.000
                          103          1              30                 1.000            30.000
                          104          5              35                 5.000            35.000
                          105          2              37                 2.000            37.000
                          106          1              38                 1.000            38.000
                          107          1              39                 1.000            39.000
                          108          4              43                 4.000            43.000
                          109          2              45                 2.000            45.000
                          110          2              47                 2.000            47.000

Example_9: Obtain the frequency class table for typical total/HDL cholesterol for the following
four interval classes: 3.0 to 4.9, 5.0 to 6.0, 6.1 to 7.0 and 7.1 to 8.9 using the data from
Example_4. Draw the associated histogram.
Solution: The frequency class table is:

                                      Interval     Frequency       CUM FREQ        CUM PERCENT
                                      3.0-4.9      4               4               40
                                      5.0-6.0      4               8               80
                                      6.1-7.0      1               9               90
                                      7.1-8.9      1               10              100
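The class table above can be reproduced with the following Python sketch. It assumes the class limits are inclusive at both ends, which is adequate here because the data are recorded to one decimal place; the variable names are illustrative.

```python
cholesterol = [6.8, 5.3, 4.3, 5.0, 7.1, 5.5, 3.8, 4.6, 4.0, 6.0]
classes = [(3.0, 4.9), (5.0, 6.0), (6.1, 7.0), (7.1, 8.9)]
n = len(cholesterol)

# Frequency: number of observations falling inside each class.
frequencies = [
    sum(1 for x in cholesterol if low <= x <= high)
    for low, high in classes
]

# Cumulative frequency and cumulative percent.
cum_freq = []
running = 0
for f in frequencies:
    running += f
    cum_freq.append(running)
cum_percent = [100 * c / n for c in cum_freq]

for (low, high), f, c, cp in zip(classes, frequencies, cum_freq, cum_percent):
    print(f"{low}-{high}", f, c, cp)
```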
The histogram associated with these data is (Excel):
[Figure: Histogram. Left axis: Frequency (0 to 5); right axis: Cumulative % (0% to 120%); X axis: Bin (4.9, 6.0, 7.0, 8.9, More).]




Problems:
Problem_1: Sample of birth weights of live-born infants born at a private hospital in San
Diego, California, during a 1-week period (g).
                                       i    xi      i       xi      i     xi        i      xi
                                       1    3265    6       3323    11    2581      16     2759
                                       2    3260    7       3649    12    2841      17     3248
                                       3    3245    8       3200    13    3609      18     3314
                                       4    3484    9       3031    14    2838      19     3101
                                       5    4146    10      2069    15    3541      20     2834
Table ... Birth weight of live-born infants, hospital in San Diego, California.
Compute the arithmetic mean for that sample.

Problem_2:
The white blood counts taken on admission of all patients entering a small hospital in
Pennsylvania, on a given day, are:
                                         i    xi   i    xi
                                         1    7    6    3
                                         2    35   7    10
                                         3    5    8    13
                                         4    9    9    8
                                         5    8    10   12
Table ... White blood counts.
Compute the median white-blood count.

Problem_3: Table 2.1 comes from a paper giving the distribution of astigmatism in 1033
young men, aged 18-22, who were accepted for military service in Great Britain. Assume that
astigmatism is rounded to the nearest 10th of a diopter.
                                Degree of astigmatism    Frequency
                                (diopters)
                                0.0 or less than 0.2     458
                                0.2-0.3                  268
                                0.4-0.5                  151
                                0.6-1.0                  79
                                1.1-2.0                  44
                                2.1-3.0                  19
                                3.1-4.0                  9
                                4.1-5.0                  3
                                5.1-6.0                  2
Table 2.1 Distribution of astigmatism in 1033 young men aged 18-22
a. Compute the grouped arithmetic mean (average).
b. Compute the grouped standard deviation.
c. Plot a histogram to properly illustrate these data.

Problem_4: The data in Table 2.2 are a sample of cholesterol levels taken from 21 hospital
employees who were on a standard American diet and who agreed to adopt a vegetarian diet for
1 month. Serum-cholesterol measurements were made before adopting the diet and 1 month
after.

                         Subject   Age   Before    After   Before-After
                         1         45    195       146
                         2         25    145       155
                         3         36    205       178
                         4         85    159       146
                         5         45    244       208
                         6         69    166       147
                         7         57    250       202
                         8         51    236       215
                         9         42    192       184
                         10        31    224       208
                         11        26    238       206
                         12        59    197       169
                         13        76    169       182
                         14        84    158       127
                         15        52    151       149
                         16        43    197       178
                         17        61    180       161
                         18        50    222       187
                         19        42    168       176
                         20        49    168       145
                         21        58    167       154

Table 2.2 Serum-cholesterol levels before and after adopting a vegetarian diet.
a. Compute the mean change in cholesterol.
b. Compute the standard deviation of the change in cholesterol levels.
c. Compute the median change in cholesterol.
d. Compute the covariance and coefficient of correlation between age and cholesterol level after
adopting vegetarian diet.
Probability
Example_1: The probability of developing a new case of breast cancer in 30 years in 40-year-old
women who have never had breast cancer is approximately 1/11. This probability means that
over a large sample of 40-year-old women who never had breast cancer, approximately 1 in 11
will develop the disease over 30 years, with this proportion becoming increasingly close to 1 in
11 as the number of women sampled increases.

Definition of probability
In referring to probabilities of events, an event is any set of outcomes of interest. The probability
of an event is the relative frequency of this set of outcomes over an indefinitely large number of
trials.
1. The probability of an event E, denoted by Pr(E), always satisfies $0 \le \Pr(E) \le 1$.
2. If outcomes A and B are two events that cannot both happen at the same time, then Pr(A or B
occurs) = Pr(A) + Pr(B).

Example_2: Let A be the event that a person has normotensive diastolic blood pressure (DBP)
readings (DBP < 90) and let B be the event that a person has borderline DBP readings
($90 \le DBP < 95$). Suppose that Pr(A) = 0.7 and Pr(B) = 0.1. Let C be the event that a person
has DBP < 95. Compute Pr(C).
Solution: If the outcomes A and B are two events that cannot both happen at the same time, then
Pr(A or B occurs) = Pr(A) + Pr(B)
In our case: Pr(C) = 0.7 + 0.1 = 0.8
Two events A and B are mutually exclusive if they cannot both happen at the same time.
Example_3: Let x be DBP, C be the event that $x \ge 90$, and D be the event that $75 \le x \le 100$.
The events C and D are not mutually exclusive, since they both occur when $90 \le x \le 100$.

Some useful probabilistic notation
The symbol {} is used as shorthand for the phrase "the event".
$A \cup B$ is the event that either A or B occurs, or they both occur.
Figures 1 and 2 diagrammatically depict $A \cup B$ both for the case where A and B are and are not
mutually exclusive.

Figure 1. Diagrammatic representation of $A \cup B$: A, B are mutually exclusive

Figure 2. Diagrammatic representation of $A \cup B$: A, B are not mutually exclusive
Example_4: Let the event A and B be define as in Example_2; that is A={x<90},
 B  {90  x  95} , where x=DBP. Then, A  B  {x  95}
 A B is the event that both A and B occur simultaneously. A B is depicted diagrammatically
in Figure 3.



Figure 3. Diagrammatic representation of A ∩ B.

Example_5: Let C be the event that x ≥ 90 and D be the event that 75 ≤ x ≤ 100.
Then C ∩ D = {90 ≤ x ≤ 100}.
Ā is the event that A does not occur. It is sometimes referred to as the complement of A.
Notice that Pr(Ā) = 1 - Pr(A), since Ā occurs only when A does not occur.

Example_6: Let the events be A = {x < 90} and C = {x ≥ 90}. Then C = Ā, since C can only
occur when A does not occur.

The multiplication law of probability
Two events A and B are referred to as independent events if
Pr(A ∩ B) = Pr(A) × Pr(B)
Example_7: Suppose we are conducting a hypertension-screening program in the home. We are
interested in whether the mother or father is hypertensive, which is described, respectively, by
the events A = {mother's DBP ≥ 95}, B = {father's DBP ≥ 95}.
Suppose we know that Pr(A) = 0.1, Pr(B) = 0.2. What can we say about
Pr(A ∩ B) = Pr(mother's DBP ≥ 95 and father's DBP ≥ 95)?
Solution: If A and B are independent events, then:
Pr(A ∩ B) = Pr(A) × Pr(B) = 0.1 × 0.2 = 0.02

Example_8: Consider all possible diastolic blood pressure measurements from a mother and her
first child.
Let A = {mother's DBP ≥ 95} and
B = {first-born child's DBP ≥ 80}.
Suppose Pr(A ∩ B) = 0.05, Pr(A) = 0.1 and Pr(B) = 0.2.

Then Pr(A ∩ B) = 0.05 > Pr(A) × Pr(B) = 0.02 and the events A, B would be dependent.
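The independence check used in Example_7 and Example_8 can be sketched in Python. The function name `is_independent` and the tolerance `tol` are illustrative assumptions, not part of the text; the probabilities are the ones stated in the two examples.

```python
import math

def is_independent(p_a, p_b, p_a_and_b, tol=1e-9):
    """Return True when Pr(A and B) matches Pr(A) x Pr(B) within tol (assumed tolerance)."""
    return math.isclose(p_a_and_b, p_a * p_b, rel_tol=0.0, abs_tol=tol)

# Example_7: if the two events are independent, the joint probability
# is the product of the two marginal probabilities.
joint_if_independent = 0.1 * 0.2

# Example_8: the stated joint probability (0.05) is compared with the product.
example_8_independent = is_independent(0.1, 0.2, 0.05)

print(round(joint_if_independent, 4), example_8_independent)
```

A tolerance is used because floating-point products rarely match a decimal value exactly.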


The Addition Law of Probability:
If A and B are any events, then
Pr(A ∪ B) = Pr(A) + Pr(B) - Pr(A ∩ B)
Example_9: Suppose two doctors, A and B, diagnose all patients coming into a clinic for
syphilis. Let the events A+ = {doctor A makes a positive diagnosis}, B+ = {doctor B makes a
positive diagnosis}. Suppose that doctor A diagnoses 10% of all patients as positive, doctor B
diagnoses 17% of all patients as positive, and both doctors diagnose 8% of all patients as
positive. Suppose a patient is referred for further lab tests if either doctor A or B makes a
positive diagnosis. What is the probability that a patient will be referred for further lab tests?
Pr(A+) = 0.1
Pr(B+) = 0.17
Pr(A+ ∩ B+) = 0.08
Therefore, from the addition law of probability,
Pr(A+ ∪ B+) = Pr(A+) + Pr(B+) - Pr(A+ ∩ B+) = 0.1 + 0.17 - 0.08 = 0.19
Thus, 19% of all patients will be referred for further lab tests.
Addition law of probability for independent events
If two events A and B are independent, then
Pr(A ∪ B) = Pr(A) + Pr(B) × [1 - Pr(A)]
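Both forms of the addition law can be sketched as small Python functions. The function names are illustrative assumptions; the numbers come from Example_9 and Example_7.

```python
def pr_union(p_a, p_b, p_a_and_b):
    """Addition law: Pr(A or B) = Pr(A) + Pr(B) - Pr(A and B)."""
    return p_a + p_b - p_a_and_b

def pr_union_independent(p_a, p_b):
    """Addition law for independent events: Pr(A) + Pr(B) x [1 - Pr(A)]."""
    return p_a + p_b * (1 - p_a)

# Example_9: probability that at least one doctor makes a positive diagnosis.
referred = pr_union(0.10, 0.17, 0.08)
print(round(referred, 2))
```

For independent events the two functions agree, because Pr(A and B) is then Pr(A) x Pr(B).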

Conditional probability
$$\Pr(B/A) = \frac{\Pr(A \cap B)}{\Pr(A)}$$
If A and B are independent events, then Pr(B/A) = Pr(B) = Pr(B/Ā).
If two events A, B are dependent, then Pr(B/A) ≠ Pr(B) ≠ Pr(B/Ā) and
Pr(A ∩ B) ≠ Pr(A) × Pr(B).

The relative risk (RR) of B given A is:
$$RR = \frac{\Pr(B/A)}{\Pr(B/\bar{A})}$$
If the two events A, B are independent, then the relative risk will be 1, and if the two events
are dependent, then the relative risk will be different from 1. The more the dependence between
the events increases, the further the RR is from 1.
Example_10: Suppose that 1 person in 10000 among people with negative skin tests has TB,
or Pr(B/Ā) = 0.0001, and 1 person in 100 among those with positive skin tests has TB, or
Pr(B/A) = 0.01. Compute the relative risk.
Solution:
$$RR = \frac{\Pr(B/A)}{\Pr(B/\bar{A})} = \frac{0.01}{0.0001} = 100$$
That means people with positive skin tests are 100
times as likely to have TB as those with negative skin tests.
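A minimal Python sketch of the conditional probability and relative risk definitions, using the figures from Example_10. The function names are illustrative assumptions.

```python
def conditional(p_a_and_b, p_a):
    """Pr(B/A) = Pr(A and B) / Pr(A); Pr(A) must be greater than 0."""
    return p_a_and_b / p_a

def relative_risk(p_b_given_a, p_b_given_not_a):
    """RR of B given A: Pr(B/A) divided by Pr(B/not A)."""
    return p_b_given_a / p_b_given_not_a

# Example_10: TB given a positive skin test versus a negative skin test.
rr = relative_risk(0.01, 0.0001)
print(round(rr))
```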

Total probability rule
If A1, ..., Ak are mutually exclusive and exhaustive events, then
$$\Pr(B) = \sum_{i=1}^{k} \Pr(B/A_i) \times \Pr(A_i)$$
Example_11: We are planning a 5-year study of cataract in a population of 5000 people 60 years
of age or older. We know from census data that 45% of this population are ages 60-64, 28%
are ages 65-69, 20% are ages 70-74 and 7% are age 75 or older. We also know from a study that
2.4%, 4.6%, 8.8% and 15.3% of the people in those respective age groups will develop cataract
over the next 5 years. What percentage of our population will develop cataract over 5 years and
how many cataracts does this percentage represent?
Solution: Let A1 = {ages 60-64}, A2 = {ages 65-69}, A3 = {ages 70-74}, A4 = {ages 75+}. These
events are mutually exclusive and exhaustive, since exactly one event occurs for each person in
our population. We also know that Pr(A1) = 0.45, Pr(A2) = 0.28, Pr(A3) = 0.2, Pr(A4) = 0.07,
Pr(B/A1) = 0.024, Pr(B/A2) = 0.046, Pr(B/A3) = 0.088, Pr(B/A4) = 0.153.
Using the total probability rule,
Pr(B) = Pr(B/A1) × Pr(A1) + Pr(B/A2) × Pr(A2) + Pr(B/A3) × Pr(A3) + Pr(B/A4) × Pr(A4) =
0.024 × 0.45 + 0.046 × 0.28 + 0.088 × 0.2 + 0.153 × 0.07 = 0.052
Thus 5.2% of our population will develop cataract over the next 5 years, which represents a total
of 5000 × 0.052 = 260 persons with cataract.
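The total probability rule in Example_11 amounts to a weighted sum, which can be sketched in Python as follows. The variable names are illustrative assumptions; the probabilities are those given in the example.

```python
# Age-group probabilities Pr(Ai) and cataract rates Pr(B/Ai) from Example_11.
age_group_probs = [0.45, 0.28, 0.20, 0.07]
cataract_rates = [0.024, 0.046, 0.088, 0.153]

# Total probability rule: sum of Pr(B/Ai) x Pr(Ai) over all age groups.
p_cataract = sum(rate * prob for rate, prob in zip(cataract_rates, age_group_probs))

# Expected number of cataract cases in a population of 5000.
expected_cases = 5000 * p_cataract

print(round(p_cataract, 3), round(expected_cases))
```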

Bayes’ rule and screening tests
Screening tests
The predictive value positive (PV+) of a screening test is the probability that a person has
disease given that the test is positive: Pr(disease/test+).
The predictive value negative (PV-) of a screening test is the probability that a person does not
have the disease given that the test is negative: Pr(no disease/test-).
The sensitivity of a symptom is the probability that the symptom is present given that the person
has disease.
The specificity is the probability that the symptom is not present given that the person does not
have disease.
Let A = {symptom} and B = {disease}.
Predictive value positive = PV+ = Pr(B/A)
Predictive value negative = PV- = Pr(non B/ non A)
Sensitivity = Pr(A/B)
Specificity = Pr(nonA/ nonB)

Bayes’ rule
Let A = {symptom} and B = {disease}. Pr(B) = prevalence of disease in the reference
population.
$$PV^{+} = \Pr(B/A) = \frac{\Pr(A/B) \times \Pr(B)}{\Pr(A/B) \times \Pr(B) + \Pr(A/\bar{B}) \times \Pr(\bar{B})}$$
This can be written:
$$PV^{+} = \frac{\Pr(B) \times sensitivity}{\Pr(B) \times sensitivity + (1 - \Pr(B)) \times (1 - specificity)}$$
$$PV^{-} = \frac{(1 - \Pr(B)) \times specificity}{(1 - \Pr(B)) \times specificity + \Pr(B) \times (1 - sensitivity)}$$

Example_12: Suppose that 84% of hypertensives and 23% of normotensives are classified as
hypertensive by an automated blood pressure machine. What are the predictive value positive and
predictive value negative of the machine, assuming that 20% of the adult population is
hypertensive?
Solution: The sensitivity = 0.84 and specificity = 1-0.23 = 0.77. Pr(B) = 0.2
Thus, from Bayes' rule it follows that
PV+ = 0.2x0.84/(0.2x0.84+0.8x0.23) = 0.48
PV- = 0.8x0.77/(0.8x0.77+0.2x0.16) = 0.95.
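The two predictive values in Example_12 can be sketched in Python from prevalence, sensitivity and specificity. The function name `predictive_values` and its argument names are illustrative assumptions.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (PV+, PV-) computed with Bayes' rule."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    true_neg = (1 - prevalence) * specificity
    false_neg = prevalence * (1 - sensitivity)
    pv_pos = true_pos / (true_pos + false_pos)
    pv_neg = true_neg / (true_neg + false_neg)
    return pv_pos, pv_neg

# Example_12: automated blood pressure machine.
pv_pos, pv_neg = predictive_values(prevalence=0.20, sensitivity=0.84, specificity=0.77)
print(round(pv_pos, 2), round(pv_neg, 2))
```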

Generalized Bayes' rule
Let B1, B2, ..., Bk be a set of mutually exclusive and exhaustive disease states, that is, at least
one disease state must occur and no two disease states can occur at the same time. Let A
represent the presence of a symptom or a set of symptoms. Then:
$$\Pr(B_i/A) = \frac{\Pr(A/B_i) \times \Pr(B_i)}{\sum_{j=1}^{k} \Pr(A/B_j) \times \Pr(B_j)}$$

Example_13: Suppose that a 60-year-old male who has never smoked cigarettes presents with
symptoms consisting of a chronic cough and occasional breathlessness to a physician. The
physician becomes concerned and orders the patient admitted to a hospital for a lung biopsy.
Suppose that the results of the lung biopsy are consistent with either lung cancer or sarcoidosis, a
fairly common, nonfatal lung disease. In this case:
Symptom A = {chronic cough}
Disease state B1 = normal, B2 = lung cancer and B3 = sarcoidosis
Suppose that Pr(A/B1) = 0.001, Pr(A/B2) = 0.9 and Pr(A/B3) = 0.9
And that in 60-year-old, never-smoking males: Pr(B1) = 0.99, Pr(B2) = 0.001 and Pr(B3) = 0.009.
What are the probabilities Pr(Bi/A) of the three disease states given the previous symptom?
Solution: Bayes’ rule can be used to answer this question.
$$\Pr(B_1/A) = \frac{\Pr(A/B_1) \times \Pr(B_1)}{\sum_{j=1}^{3} \Pr(A/B_j) \times \Pr(B_j)} = \frac{0.001 \times 0.99}{0.001 \times 0.99 + 0.9 \times 0.001 + 0.9 \times 0.009} = 0.099$$

$$\Pr(B_2/A) = \frac{\Pr(A/B_2) \times \Pr(B_2)}{\sum_{j=1}^{3} \Pr(A/B_j) \times \Pr(B_j)} = \frac{0.9 \times 0.001}{0.001 \times 0.99 + 0.9 \times 0.001 + 0.9 \times 0.009} = 0.09$$

$$\Pr(B_3/A) = \frac{\Pr(A/B_3) \times \Pr(B_3)}{\sum_{j=1}^{3} \Pr(A/B_j) \times \Pr(B_j)} = \frac{0.9 \times 0.009}{0.001 \times 0.99 + 0.9 \times 0.001 + 0.9 \times 0.009} = 0.811$$


Thus, although the unconditional probability of sarcoidosis is very low, the conditional
probability of this disease given these symptoms and this age-sex-smoking group is the highest
of the three, about 81%.
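The generalized Bayes' rule calculation in Example_13 can be sketched in Python. The function name `posterior` is an illustrative assumption; the inputs are the probabilities stated in the example.

```python
def posterior(likelihoods, priors):
    """Generalized Bayes' rule: Pr(Bi/A) for each disease state Bi."""
    joint = [lik * prior for lik, prior in zip(likelihoods, priors)]
    total = sum(joint)  # denominator: sum over all states of Pr(A/Bj) x Pr(Bj)
    return [j / total for j in joint]

# Example_13: states are normal, lung cancer, sarcoidosis.
likelihoods = [0.001, 0.9, 0.9]   # Pr(A/Bi)
priors = [0.99, 0.001, 0.009]     # Pr(Bi)

post = posterior(likelihoods, priors)
print([round(p, 3) for p in post])
```

Because the disease states are mutually exclusive and exhaustive, the three posterior probabilities sum to 1.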

Discrete probability distributions
Random Variables
A random variable is a numerical quantity that takes different values with specified probabilities.
Two types of random variables are discussed in this text: discrete and continuous.
A random variable for which there exists a discrete set of values with specified probabilities is a
discrete random variable: number of hospitalizations, number of children …
A random variable whose values form a continuum is a continuous random variable: age,
glycemia, biological parameters, blood pressure…
A probability mass function is a mathematical relationship that assigns to any possible
value r of a discrete random variable X the probability Pr(X=r).
Example_14: Many new drugs have been introduced in the last decade to bring hypertension
under control, that is, to reduce high blood pressure to normotensive levels. A physician agrees
to use a new antihypertensive drug on a trial basis on the first 4 untreated hypertensive patients.
From previous experience with the drug, the drug company expects that for any clinical practice
the probability that 0 patients out of 4 will be brought under control is 0.008, 1 patient out of 4
is 0.076, 2 patients out of 4 is 0.265, 3 patients out of 4 is 0.411 and
all 4 patients out of 4 is 0.240. This probability mass function, or probability distribution, is
displayed in the next table:
                                  Pr(X=r)   0.008   0.076   0.265   0.411   0.240
                                  r         0       1       2       3       4
The probability of any particular value must be between 0 and 1 and the sum of the probabilities
of all values must equal exactly 1. Thus, 0 < Pr(X=r) ≤ 1 and Σ Pr(X=r) = 1.
For our example: 0.008+0.076+0.265+0.411+0.240 = 1

The expected value of a discrete random variable
The expected value for a discrete random variable is defined as:
$$E(X) = \sum_{i=1}^{k} x_i \Pr(X = x_i)$$


Example_15: Find the expected value for the random variable depicted in Example_14.
Solution: E(X)=0(0.008)+1(0.076)+2(0.265)+3(0.411)+4(0.240)=2.80
Thus, on average about 2.8 hypertensives would be expected to be brought under control for
every 4 that are treated.

Example_16: Consider the random variable that has the probability mass function given in the
next table, representing the number of episodes of otitis media in the first 2 years of life:
                        r           0       1       2       3       4       5       6
                        Pr(X=r)     0.129   0.264   0.271   0.185   0.095   0.039   0.017
a. What is the expected number of episodes of otitis media in the first 2 years of life?
b. Compute the variance and standard deviation for the random variable.
Solution:
a. $$M(X) = \sum_{i=1}^{n} x_i \Pr(x_i)$$
M(X)=0×0.129+1×0.264+2×0.271+3×0.185+4×0.095+5×0.039+6×0.017
M(X)=2.038
Thus, on the average a child would be expected to have 2 episodes of otitis media in the first 2
years of life.
b. The variance is given by:
$$V(X) = \sum_{i=1}^{n} [x_i - M(X)]^2 \times \Pr(x_i)$$

V(X) = 1.967
The standard deviation is given by:
$$\sigma = \sqrt{1.967} = 1.402$$
Permutation and combination
The number of permutations of n things taken k at a time is
$${}_nP_k = n(n-1) \times \dots \times (n-k+1)$$
It represents the number of ways of selecting k items out of n, where the order of selection is
important.
Example_17: Suppose there are 3 female schizophrenics aged 50-59 and 6 eligible controls
living in the same community. How many ways are there of selecting three controls?
Solution: To answer this question, consider the number of permutations of 6 things taken 3 at
a time:
$${}_6P_3 = 6 \times 5 \times 4 = 120$$
Thus there are 120 ways of choosing the controls. For example, one way would be to match
control A to case 1, control B to case 2 and control C to case 3. Another way would be to match
control F to case 1, control C to case 2 and control D to case 3.
n! = n factorial and is defined as
n!=n(n-1)×…. ×2×1
Example_18: Evaluate 5!
Solution: 5!=5×4×3×2×1= 120
The quantity 0! has no intuitive meaning, but for consistency it is defined as 1.
The number of combinations of n things taken k at a time is:
$${}_nC_k = \frac{n!}{k!(n-k)!}$$

Example_19: Evaluate 7C3.
Solution:
$${}_7C_3 = \frac{7 \times 6 \times 5}{3 \times 2 \times 1} = 7 \times 5 = 35$$
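The counts in Example_17, Example_18 and Example_19 can be checked with Python's standard `math` module (the functions `math.perm` and `math.comb` require Python 3.8 or later). The variable names are illustrative assumptions.

```python
import math

# Example_17: ordered selections of 3 controls out of 6 (permutations).
ways_to_match = math.perm(6, 3)

# Example_18: 5 factorial.
five_factorial = math.factorial(5)

# Example_19: unordered selections of 3 things out of 7 (combinations).
seven_choose_three = math.comb(7, 3)

print(ways_to_match, five_factorial, seven_choose_three)
```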

The Binomial Distribution
All examples involving the binomial distribution have common structure: a sample of n
independent trials, each of which can have only two possible outcomes, which are denoted as
"success" and "failure". The probability of success at each trial is assumed to be constant p, and
hence the probability of a failure at each trial is 1-p=q.

Example_20: What is the probability of obtaining 2 boys out of 5 children if the probability of a
boy is 0.51 at each birth and sexes of successive children are considered independent random
variables?
Solution: Using a binomial distribution with n=5, p=0.51, k=2, compute
$$\Pr(X = 2) = {}_5C_2 (0.51)^2 (0.49)^3 = \frac{5 \times 4}{2 \times 1} (0.51)^2 (0.49)^3 = 0.306$$

Example_21: We know that children develop chronic bronchitis in the first year of life in 3 out
of 20 households where both parents are chronic bronchitics, as compared with the national
incidence rate of chronic bronchitis, which is 5% in the first year of life. Is this difference real or
can it be attributed to chance? Specifically, how likely are infants in at least 3 out of 20 households
to develop chronic bronchitis if the probability of developing disease in any one household is
0.05?
Solution: n=20, p=0.05
The probability of observing at least 3 cases out of 20 with disease is given by:
$$\Pr(X \ge 3) = \sum_{k=3}^{20} \binom{20}{k} (0.05)^k (0.95)^{20-k} = 1 - \sum_{k=0}^{2} \binom{20}{k} (0.05)^k (0.95)^{20-k}$$
= 1 - (0.3585 + 0.3774 + 0.1887) = 0.0754
The observed proportion is 3/20 = 0.15, three times the national rate of 0.05. If the true rate
were 0.05, at least 3 cases out of 20 would occur by chance with probability 0.0754, that is, in
about 7.54% of such samples.
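The binomial probabilities in Example_20 and Example_21 can be sketched in Python. The function names `binom_pmf` and `binom_tail` are illustrative assumptions; the formula is the usual binomial one, nCk × p^k × q^(n-k).

```python
import math

def binom_pmf(k, n, p):
    """Pr(X = k) for a binomial distribution with n trials and success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_tail(k_min, n, p):
    """Pr(X >= k_min), computed as 1 minus the sum of Pr(X = k) for k < k_min."""
    return 1 - sum(binom_pmf(k, n, p) for k in range(k_min))

# Example_20: 2 boys out of 5 children with Pr(boy) = 0.51.
p_two_boys = binom_pmf(2, 5, 0.51)

# Example_21: at least 3 affected households out of 20 with p = 0.05.
p_at_least_three = binom_tail(3, 20, 0.05)

print(round(p_two_boys, 3), round(p_at_least_three, 3))
```

The tail probability agrees with the hand calculation up to rounding of the three intermediate terms.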

Problems
Problem_1 Consider a family with a mother, father and two children.
Let
A1 = {mother has influenza}
A2 = {father has influenza}
A3 = {first child has influenza}
A4 = {second child has influenza}
B = {at least one child has influenza}
C = {at least one parent has influenza}
D = {at least one person has influenza}
a. What does A1 ∪ A2 mean?
b. What does A1 ∩ A2 mean?
c. Are A3 and A4 mutually exclusive?
d. What does A3 ∪ B mean?
e. What does A3 ∩ B mean?
f. Express C in terms of A1, A2, A3, A4.
g. Express D in terms of B and C.
h. What does Ā1 (the complement of A1) mean?
i. What does Ā2 (the complement of A2) mean?
j. Represent the complement of C in terms of A1, A2, A3, A4.
k. Represent the complement of D in terms of B and C.

Problem_2 A drug company is developing a new pregnancy-test kit for use on an outpatient
basis. The company uses the pregnancy test on 100 women who are known to be pregnant, of
whom 95 are positive using the test. The company uses the pregnancy test on 100 other women
who are known to not be pregnant, of whom 99 are negative using the test.
a. What is the sensitivity of the test?
b. What is the specificity of the test?
The company anticipates that of the women who will use the pregnancy test kit, 10% will
actually be pregnant.
c. What is the predictive value positive for the test?

Problem_3 We can classify infants as low birthweight if they have birthweight ≤ 2500 g and
as normal birthweight if they have birthweight > 2500 g. Infants are also classified by
length of gestation in the following four categories: < 20 weeks, 20 - 27 weeks, 28 - 36 weeks,
> 36 weeks. Assume that the probabilities of the different periods of gestation are as given in the
next table:

                                 Length of gestation   Probability
                                 < 20 weeks            0.0004
                                 20 - 27 weeks         0.0059
                                 28 - 36 weeks         0.0855
                                 > 36 weeks            0.9082
Also assume that the probability of being low birthweight given that the length of gestation is <
20 weeks is 0.540, the probability of being low birthweight given that the length of gestation is
20 - 27 weeks is 0.813, the probability of being low birthweight given that the length of
gestation is 28 - 36 weeks is 0.379 and the probability of being low birthweight given that the
length of gestation is > 36 weeks is 0.035.
a. What is the probability of having a low birthweight infant?
b. Show that the events (length of gestation ≤ 27 weeks) and (low birthweight) are not
independent.
c. What is the probability of having a length of gestation ≤ 36 weeks given that a child is low
birthweight?

Problem_4 Evaluate the probability of 2 lymphocytes out of 10 white blood cells if the
probability that any one cell is a lymphocyte is 0.2.

Problem_5 Evaluate the probabilities of obtaining k neutrophils out of 5 cells for k=0, 1, 2, 3,
4, 5 where the probability that any one cell is a neutrophil is 0.6.

Problem_6 In a sample of 110 persons there are 50 men, of whom 10 are RH-.
Of the women, 8 are RH-.
a. What is the probability that a person with RH- from the sample is a man?
b. What is the probability that a person from the sample has RH+?
c. What is the probability that, out of 4 persons from the sample, 1 has RH-?
Estimation
Estimation of the Mean of a Distribution
Point estimation of the Mean
A natural estimator to use for estimating the population mean µ is the sample mean:
$$\bar{x} = \sum_{i=1}^{n} \frac{x_i}{n}$$
Example_1: Five samples of size 10 drawn from the birthweights from 1000 consecutive deliveries
at Boston City Hospital are listed in the next table.
                           Sample
Individual     1       2       3       4       5
1              2750    5018    2750    2863    3884
2              3317    5613    3544    3232    3345
3              3969    3033    1758    2240    2211
4              2211    2807    3402    3402    3657
5              2807    2948    3742    3260    2466
6              4196    3430    3827    3317    3119
7              3062    4196    3345    3005    3005
8              3827    3771    3884    2438    3289
9              3572    3572    3572    3119    3969
10             3430    3260    3345    3374    2778
x̄              3314

Compute the mean for sample two.
Solution:
$$\bar{x}_2 = \frac{5018 + 5613 + 3033 + 2807 + 2948 + 3430 + 4196 + 3771 + 3572 + 3260}{10} = \frac{37648}{10} \approx 3765$$
Exercise: compute the mean for samples 3-5.
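The sample means can be computed in Python with the standard `statistics` module. This is a minimal sketch; the dictionary name `samples` is an illustrative assumption and the data are the five columns of the table in Example_1.

```python
from statistics import fmean

# The five samples of size 10 from the table in Example_1 (birthweights).
samples = {
    1: [2750, 3317, 3969, 2211, 2807, 4196, 3062, 3827, 3572, 3430],
    2: [5018, 5613, 3033, 2807, 2948, 3430, 4196, 3771, 3572, 3260],
    3: [2750, 3544, 1758, 3402, 3742, 3827, 3345, 3884, 3572, 3345],
    4: [2863, 3232, 2240, 3402, 3260, 3317, 3005, 2438, 3119, 3374],
    5: [3884, 3345, 2211, 3657, 2466, 3119, 3005, 3289, 3969, 2778],
}

# Sample mean: the sum of the observations divided by their number.
sample_means = {label: fmean(values) for label, values in samples.items()}

for label, m in sample_means.items():
    print(label, round(m))
```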

The Variance and the Standard Deviation
The variance is a measure of spread and is defined by:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$$
The most usual form of this measure has n-1 in the denominator rather than n. The
resulting measure is called the sample variance (or variance).
The sample variance, or variance, is defined as follows:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$

The standard deviation
Like the sample variance, the sample standard deviation, or standard deviation, is a measure of
spread and is defined as follows:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} = \sqrt{\text{sample variance}}$$
Standard Error of the Mean
Let x1, ..., xn be a random sample from a population with underlying mean µ and variance σ2.
The set of sample means in repeated random samples of size n from this population has variance
σ2/n. The standard error of the mean, or standard error, is defined as follows:
$$SE = \frac{\sigma}{\sqrt{n}}$$

Example_2: Compute the standard error of the mean for the third sample from Example_1.
The mean of the third sample is x̄ = 3317.
Compute the variance of this sample using the next formula:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$$
For our data, the variance is given by:
$$s^2 = \frac{1}{10}\left[(2750-3317)^2 + (3544-3317)^2 + (1758-3317)^2 + (3402-3317)^2 + (3742-3317)^2 + (3827-3317)^2 + (3345-3317)^2 + (3884-3317)^2 + (3572-3317)^2 + (3345-3317)^2\right]$$
$$s^2 = \frac{321489 + 51529 + 2430481 + 7225 + 180625 + 260100 + 784 + 321489 + 65025 + 784}{10} = \frac{3639531}{10} = 363953.1$$
The standard deviation of the sample is:
$$s = \sqrt{s^2} = \sqrt{363953.1} = 603.29 \approx 603$$
The standard error of the mean is given by:
$$SE = \frac{s}{\sqrt{n}} = \frac{603}{\sqrt{10}} = \frac{603}{3.162} \approx 190.7$$
Exercise: compute the standard error of the mean for the remaining samples in Example_1.
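The standard error calculation can be sketched in Python for the third sample of Example_1. This sketch assumes the divisor n for the variance, as in the worked example, which corresponds to `statistics.pstdev`; the function name `standard_error` is an illustrative assumption.

```python
import math
from statistics import pstdev

# Third sample from the table in Example_1.
sample_3 = [2750, 3544, 1758, 3402, 3742, 3827, 3345, 3884, 3572, 3345]

def standard_error(data):
    """Standard deviation (divisor n, as in Example_2) divided by sqrt(n)."""
    return pstdev(data) / math.sqrt(len(data))

sd_3 = pstdev(sample_3)
se_3 = standard_error(sample_3)

print(round(sd_3), round(se_3, 1))
```

Replacing `pstdev` with `statistics.stdev` would use the n-1 divisor of the sample variance instead. The result differs slightly from the hand calculation because no intermediate rounding is applied.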

Confidence interval
Known variance and n large (>30)
A 95% confidence interval for µ when σ is known is defined by:
$$\left(m - Z_{\alpha}\frac{\sigma}{\sqrt{n}};\; m + Z_{\alpha}\frac{\sigma}{\sqrt{n}}\right)$$
Example_3: Compute a 95% confidence interval for the mean basal body temperature using the
data 97.2, 96.8, 97.4, 97.4, 97.3, 97.0, 97.1, 97.3, 97.2, 97.3, assuming that standard deviation is
0.20.
Solution:
First we must compute the mean of the data:
$$m = \frac{97.2 + 96.8 + 97.4 + 97.4 + 97.3 + 97.0 + 97.1 + 97.3 + 97.2 + 97.3}{10} = 97.2$$
Now we can compute the confidence interval for α=0.05 (Zα = 1.96). The 95% confidence
interval is given by:
$$\left(m - Z_{\alpha}\frac{\sigma}{\sqrt{n}};\; m + Z_{\alpha}\frac{\sigma}{\sqrt{n}}\right) = \left(97.2 - 1.96\frac{0.2}{\sqrt{10}};\; 97.2 + 1.96\frac{0.2}{\sqrt{10}}\right) = (97.2 - 0.12;\; 97.2 + 0.12) = (97.08;\; 97.32)$$
Example_4: Consider the 5 samples of size 10 from the population of birthweights shown in
Example_1. Assume that σ is known to be 600. The interval:
$$\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}};\; \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right) = \left(\bar{x} - 1.96\frac{600}{\sqrt{10}};\; \bar{x} + 1.96\frac{600}{\sqrt{10}}\right) = (\bar{x} - 372;\; \bar{x} + 372)$$
will be different for each sample and is given in the next figure, whose interval endpoints are:

Sample    Mean x̄    x̄ - 372    x̄ + 372
1         3314      2942       3686
2         3765      3393       4137
3         3317      2945       3689
4         3025      2653       3397
5         3172      2800       3544

In the figure, a dashed line has been added to represent an imaginary value for µ. The idea is
that over a large number of hypothetical samples of size 10, 95% of such intervals will contain
the parameter µ.
A 100% × (1-α) confidence interval for the mean is defined by the interval:
$$\left(m - Z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}};\; m + Z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$



   Factors affecting the length of a confidence interval
The length of a 100% × (1-α) confidence interval is 2 × Z(1-α/2) × σ/√n and is determined by n, σ and α.
a. As the sample size (n) increases, the length of the confidence interval decreases.
b. As the standard deviation (σ), which reflects the variability of individual observations,
increases, the length of the confidence interval increases.
c. As the confidence desired increases (α decreases), the length of the confidence interval
increases.

Example_4: Compute a 95% confidence interval for the underlying mean basal body temperature
assuming that the mean of the sample is 97.2, the number of days sampled is 100 and the standard
deviation is 0.2. Compute the 99% confidence interval for the same data assuming that the standard
deviation is 0.4.
Solution: The 95% confidence interval is given by:
$$\left(m - Z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}};\; m + Z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = \left(97.2 - 1.96\frac{0.2}{\sqrt{100}};\; 97.2 + 1.96\frac{0.2}{\sqrt{100}}\right) = (97.2 - 0.04;\; 97.2 + 0.04) = (97.16;\; 97.24)$$
The 99% confidence interval is given by:
$$\left(m - Z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}};\; m + Z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}\right), \text{ where } Z_{1-\alpha/2} = Z_{1-0.01/2} = Z_{0.995} = 2.576$$
$$\left(m - Z_{0.995}\frac{\sigma}{\sqrt{n}};\; m + Z_{0.995}\frac{\sigma}{\sqrt{n}}\right) = \left(97.2 - 2.576\frac{0.4}{10};\; 97.2 + 2.576\frac{0.4}{10}\right) = (97.2 - 0.1;\; 97.2 + 0.1) = (97.1;\; 97.3)$$
Confidence interval – unknown variance:
A 100% × (1-α) confidence interval for the mean is given by:
$$\left(m - t_{n-1,\,1-\alpha/2}\frac{s}{\sqrt{n}};\; m + t_{n-1,\,1-\alpha/2}\frac{s}{\sqrt{n}}\right)$$
Example_5: Suppose we have the birth weight data from a sample of 10 newborn children: 97,
117, 140, 78, 99, 148, 108, 135, 126, 121. Compute a 95% confidence interval for the mean,
assuming that the variance is unknown.
Solution:
    a. compute the mean of the sample:
    $m = \frac{97 + 117 + 140 + 78 + 99 + 148 + 108 + 135 + 126 + 121}{10} = 116.9$

    b. compute the standard deviation (sample variance, with divisor n - 1): $s^2 = \frac{\sum_{i=1}^{n}(x_i - m)^2}{n-1}$, $s = \sqrt{s^2}$
    $s^2 = \frac{1}{9}[(97 - 116.9)^2 + (117 - 116.9)^2 + (140 - 116.9)^2 + (78 - 116.9)^2 + (99 - 116.9)^2 + (148 - 116.9)^2 + (108 - 116.9)^2 + (135 - 116.9)^2 + (126 - 116.9)^2 + (121 - 116.9)^2]$
    $s^2 = \frac{1}{9}[396.01 + 0.01 + 533.61 + 1513.21 + 320.41 + 967.21 + 79.21 + 327.61 + 82.81 + 16.81] = \frac{4236.9}{9} = 470.77$
    $s = \sqrt{s^2} = \sqrt{470.77} = 21.7$
    c. compute the 95% confidence interval:
    $(m - t_{n-1,1-\alpha/2}\frac{s}{\sqrt{n}};\ m + t_{n-1,1-\alpha/2}\frac{s}{\sqrt{n}})$
    $= (116.9 - t_{9,0.975}\frac{21.7}{\sqrt{10}};\ 116.9 + t_{9,0.975}\frac{21.7}{\sqrt{10}}) = (116.9 - 2.262 \cdot \frac{21.7}{\sqrt{10}};\ 116.9 + 2.262 \cdot \frac{21.7}{\sqrt{10}}) = (101.38;\ 132.42)$
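The calculation in Example_5 can be checked with a short Python sketch. It uses only the standard library; since the standard library has no Student t quantile function, the critical value t(9, 0.975) = 2.262 is copied from the table value quoted in the solution rather than computed. The variable names are illustrative assumptions, not part of the original text.

```python
from math import sqrt
from statistics import mean, stdev

# Birth weight sample from Example_5.
weights = [97, 117, 140, 78, 99, 148, 108, 135, 126, 121]

n = len(weights)
m = mean(weights)      # sample mean
s = stdev(weights)     # sample standard deviation (divisor n - 1)

# Assumption: t_{9, 0.975} is taken from a t table, not computed here.
t_crit = 2.262

half_width = t_crit * s / sqrt(n)
ci = (m - half_width, m + half_width)

print(f"mean = {m:.1f}, s = {s:.1f}")
print(f"95% CI = ({ci[0]:.2f}; {ci[1]:.2f})")
```

Note that `statistics.stdev` divides by n - 1, which is the divisor needed for the t-based interval; `statistics.pstdev` would divide by n instead.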
Confidence interval – sampling proportion
Consider the problem of estimating the prevalence p of a disease in a population. If f is the
sample proportion of the disease in a sample of size n, the 100% × (1-α) confidence interval for p
is given by:
$(f - z_{1-\alpha/2}\sqrt{\frac{f(1-f)}{n}};\ f + z_{1-\alpha/2}\sqrt{\frac{f(1-f)}{n}})$

Example_6: Suppose we are interested in estimating the prevalence rate of breast cancer among
50-54-year-old women whose mothers have had breast cancer. Suppose that in a random sample
of 10000 such women, 400 are found to have had breast cancer at some point in their lives.
Compute a 95% confidence interval for the prevalence rate of breast cancer.
Solution: The best point estimate of the prevalence rate is given by the sample proportion:
$f = \frac{400}{10000} = 0.04$
An approximate 95% confidence interval, with α = 0.05 and $z_{1-\alpha/2} = 1.96$, is given by:
$(0.04 - 1.96\sqrt{\frac{0.04 \cdot 0.96}{10000}};\ 0.04 + 1.96\sqrt{\frac{0.04 \cdot 0.96}{10000}}) = (0.04 - 0.004;\ 0.04 + 0.004) = (0.036;\ 0.044)$


Problems
Problem_1: A study of psychological and physiological changes in a cohort of dialysis
patients with end-stage renal disease was conducted. 102 patients were initially ascertained at
baseline; 69 of the 102 patients were reascertained at an 18-month follow-up visit. The data in
the next table were reported:

             E. Coli              S. aureus            P. aeruginosa
Laboratory   Different   Common   Different   Common   Different Common
             media       medium   media       medium   media       medium
A            27.5        23.8     25.4        23.9     20.1        16.7
B            24.6        21.1     24.8        24.2     18.4        17.0
C            25.3        25.4     24.6        25.0     16.8        17.1
D            28.7        25.4     29.8        26.7     21.7        18.2
E            23.0        24.8     27.5        25.3     20.1        16.7
F            26.8        25.7     28.1        25.2     20.3        19.2
G            24.7        26.8     31.2        27.1     22.8        18.8
H            24.3        26.2     24.3        26.5     19.9        18.1
I            24.9        26.3     25.4        25.1     19.3        19.2
a. Provide a point and interval estimation (95% confidence interval) for the mean of each of the
parameters at baseline and follow-up.
b. Do you have any opinion on the physiological and psychological changes in this group of
patients?

Problem_2: Suppose we wish to estimate the concentration (µg/mL) of a specific dose of
ampicillin in the urine after various periods of time. We recruit 25 volunteers and find that they
have a mean concentration of 7.0 µg/mL with a standard deviation of 2.0 µg/mL. Assume that the
underlying population distribution of concentration is normal.
a. Find a 95% confidence interval for the population mean concentration.
b. Find a 99% confidence interval for the population variance of the concentrations.
c. How large a sample would be needed to ensure that the length of the confidence interval in a is 0.5
µg/mL if we assume that the sample standard deviation remains at 2.0 µg/mL?
Hypothesis testing

General concepts
Steps in testing a statistical hypothesis:
Step 1. State the research question in terms of statistical hypotheses.
The hypotheses can be formulated in terms of null and alternative hypotheses, which can be
defined as follows:
The null hypothesis, denoted by H0, is the hypothesis that is to be tested.
The alternative hypothesis, denoted by H1, is the hypothesis that in some sense contradicts the
null hypothesis.
As a result, there are four possible outcomes:
         1. We accept H0, and H0 is in fact true.
         2. We accept H0, and H1 is in fact true.
         3. We reject H0, and H0 is in fact true.
         4. We reject H0, and H1 is in fact true.
Step 2. Decide on the appropriate test statistic (parameter of the test) for the hypothesis. The test
statistic has a known probability distribution if the null hypothesis is assumed true.
The probability of a type I error is the probability of rejecting the null hypothesis given that H0
is true. The probability of a type I error is usually denoted by α and is commonly referred to as
the significance level of a test.
The probability of a type II error is the probability of accepting the null hypothesis given that
H1 is true. The probability of a type II error is usually denoted by β.
Step 3. Select the level of significance for the statistical test (the alpha value). This is the probability
of incorrectly rejecting the null hypothesis when it is actually true.
Traditional value: α = 0.05.
Step 4. Perform the calculation of the test statistic.
Step 5. State the conclusion with the critical area:
If the test statistic is in the RA (rejection area), then accept H1 and reject H0.
If the test statistic is in the AA (acceptance area), then accept H0 and reject H1.
State the conclusion with the p-value:
The p-value for a hypothesis test is the α level at which we would be indifferent between
accepting or rejecting H0. The importance of the p-value is that it tells us exactly how significant
the results are without performing repeated significance tests at different α levels.
The significance of a p-value:
         If 0.01 ≤ p < 0.05, then the results are significant.
         If 0.001 ≤ p < 0.01, then the results are highly significant.
         If p < 0.001, then the results are very highly significant.
         If p > 0.05, the results are considered not statistically significant.
         If 0.05 ≤ p < 0.1, then a trend toward statistical significance is sometimes noted.

The power of the test is defined as:
1 - β = 1 - probability of a type II error.
A one-tailed test is a test in which the values of the parameter being studied under the alternative
hypothesis are allowed to be either greater than or less than the values of the parameter under the
null hypothesis, but not both.
A two-tailed test is a test in which the values of the parameter being studied (in this case μ)
under the alternative hypothesis are allowed to be either greater than or less than the values of
the parameter under the null hypothesis (μ0).

We apply the general concepts to several one-sample hypothesis-testing situations:
   • the mean of a normal distribution with known variance (one-sample z test)
   • the mean of a normal distribution with unknown variance (one-sample t test)
   • the variance of a normal distribution (one-sample chi-square test)
Each of the hypothesis tests can be concluded in one of two ways:
   • specify critical values to determine the acceptance and rejection regions (critical-value
       method)
   • compute p-values (p-value method).

One-Sample Test for the Mean of a Normal Distribution with
Known Variance: Two-Sided Alternative
Example_1: Suppose we want to compare fasting serum-cholesterol levels among recent Asian
immigrants to the United States with typical levels found in the general population in the United
States. Suppose we assume that cholesterol levels in women aged 21-40 in the United States are
approximately normally distributed with mean 190 mg/dL and standard deviation 40 mg/dL. It is
not known whether cholesterol levels among recent Asian immigrants are higher or lower than those
in the general U.S. population. Assume that levels among recent Asian immigrants are normally
distributed with unknown mean μ and standard deviation σ = 40 mg/dL.
We wish to test the null hypothesis H0: μ = μ0 = 190, σ² = 1600 versus the alternative hypothesis
H1: μ ≠ μ0, σ² = 1600.
Blood tests are performed on 100 female Asian immigrants aged 21-40 and the mean level is
found to be 181.52 mg/dL.
What can be concluded on the basis of this evidence?

One-Sample Test for the Mean of a Normal Distribution with
Known Variance
To test the hypothesis H0: μ = μ0, σ = σ0 versus H1: μ ≠ μ0, σ = σ0
with a significance level α, the best (most powerful) test is based on the sample mean $\bar{x}$. If
$|\bar{x} - \mu_0| > z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}$ then H0 is rejected.
If
$|\bar{x} - \mu_0| \le z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}$ then H0 is accepted.
To test the hypothesis H0: μ = μ0, σ = σ0 versus H1: μ ≠ μ0, σ = σ0
with a significance level of α, we compute
$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
If $z < z_{\alpha/2}$ or $z > z_{1-\alpha/2}$, then we reject H0.

If $z_{\alpha/2} \le z \le z_{1-\alpha/2}$, then we accept H0.
The value z is called a test statistic, because the test procedure is based on this statistic.
The values $z_{\alpha/2}$ and $z_{1-\alpha/2}$ are called critical values because the outcome of the test depends on whether the test
statistic satisfies $z < z_{\alpha/2}$ or $z > z_{1-\alpha/2}$, whereby we reject H0, or $z_{\alpha/2} \le z \le z_{1-\alpha/2}$, whereby we
accept H0.
The general approach, where we compute a test statistic and determine the outcome of a test by
comparing the test statistic to a critical value determined by the type I error, is called the
critical-value method of hypothesis testing.
Example_2: Test the hypothesis that the cholesterol levels of recent Asian immigrants are
different from those in the general United States population using the data in Example_1.
Solution: We compute the test statistic:
$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{181.52 - 190}{40 / \sqrt{100}} = \frac{-8.48}{4} = -2.12$
For the two-sided test with α = 0.05, the critical values are $z_{\alpha/2} = z_{0.025} = -1.96$ and $z_{1-\alpha/2} = z_{0.975} = 1.96$.

Since $z < z_{\alpha/2}$, it follows that we reject H0 at the 5% level of significance. We conclude that the
mean cholesterol level of recent Asian immigrants is significantly lower than the mean for the
general U.S. population.

Alternatively, we might want to compute a p-value. A p-value is computed in two different
ways, depending on whether z is less than or greater than 0.

p-value for the One-Sample z Test for the Mean of a Normal
Distribution with Known Variance (Two-Sided Alternative)
$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
p-value (Φ denotes the cumulative distribution function of the standard normal distribution):
$p = 2\Phi(z)$ if $z \le 0$
$p = 2[1 - \Phi(z)]$ if $z > 0$
If p < 0.05, then H0 is rejected and the results are declared statistically significant.
If p ≥ 0.05, then H0 is accepted and the results are declared not statistically significant.
We will refer to this approach as the p-value method.

Example_3: Compute the p-value for the hypothesis test in Example_1.
Solution: Since z = -2.12, the p-value for the test is twice the left-hand tail area:
$p = 2\Phi(z) = 2\Phi(-2.12) = 2[1 - \Phi(2.12)] = 2(1 - 0.983) = 0.034$
The results are statistically significant, with a p-value of 0.034.
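Both the critical-value method and the p-value method for the cholesterol example can be sketched in a few lines of Python. The function name `one_sample_z_test` is a hypothetical helper introduced for illustration; `statistics.NormalDist` from the standard library supplies Φ and its inverse.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided one-sample z test with known sigma.

    Returns the test statistic, the two-sided p-value and the
    decision of the critical-value method (True = reject H0).
    """
    std_normal = NormalDist()
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # z_{1-alpha/2}
    if z <= 0:
        p_value = 2 * std_normal.cdf(z)
    else:
        p_value = 2 * (1 - std_normal.cdf(z))
    reject = abs(z) > z_crit
    return z, p_value, reject

# Cholesterol data: sample mean 181.52, mu0 = 190, sigma = 40, n = 100.
z, p, reject = one_sample_z_test(181.52, 190, 40, 100)
print(f"z = {z:.2f}, p = {p:.3f}, reject H0: {reject}")
```

The two methods always agree: |z| exceeds the critical value exactly when the p-value is below α.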

The Power of a Test
Power of a One-Sample z Test for the Mean of a Normal Distribution with
Known Variance (Two-Sided Alternative)
The power of the two-sided test H0: μ = μ0 versus H1: μ ≠ μ0 for the specific alternative μ = μ1,
where the underlying distribution is normal and the population variance (σ²) is known, is given
by:
$\text{Power} = \Phi\left(z_{\alpha/2} + \frac{|\mu_0 - \mu_1|\sqrt{n}}{\sigma}\right)$
Example_4: A new drug in the class of calcium channel blockers is to be tested for the treatment
of patients with unstable angina, a severe type of angina. The effect this drug will have on heart
rate is unknown. Suppose that 20 patients are to be studied and the change in heart rate after 48 hours
is known to have a standard deviation of 10 beats per minute. What power would such a study
have of detecting a significant difference in heart rate over 48 hours if it is hypothesized that the
true mean change in heart rate from baseline to 48 hours could be 5 beats per minute in either
direction?
Solution: The power is given by:
$\text{Power} = \Phi\left(z_{\alpha/2} + \frac{|\mu_0 - \mu_1|\sqrt{n}}{\sigma}\right)$
$\text{Power} = \Phi\left(z_{0.05/2} + \frac{5\sqrt{20}}{10}\right) = \Phi(-1.96 + 2.236) = \Phi(0.276) = 0.61$
The study would have a 61% chance of detecting a significant difference.
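The power formula can be evaluated directly, as in the following Python sketch. The function name `z_test_power` is a hypothetical helper; the formula is the one stated above, which counts only the tail in the direction of the true difference.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(mu0, mu1, sigma, n, alpha=0.05):
    """Power of the two-sided one-sample z test against mu = mu1."""
    std_normal = NormalDist()
    z_alpha_half = std_normal.inv_cdf(alpha / 2)   # negative, e.g. -1.96
    shift = abs(mu0 - mu1) * sqrt(n) / sigma
    return std_normal.cdf(z_alpha_half + shift)

# Heart-rate example: difference of 5 beats per minute, sigma = 10, n = 20.
power = z_test_power(0, 5, 10, 20)
print(f"power = {power:.2f}")
```

Calling the function with a larger n shows how the power grows with sample size, which is the usual way such a formula is used when planning a study.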

One-Sample t Test
To test the hypothesis H0: μ = μ0 versus H1: μ ≠ μ0 with significance level α, assuming that σ²
is the same under both hypotheses and is unknown, the best test is based on the test statistic t,
given by:
$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
If $t < t_{n-1,\alpha/2}$ or $t > t_{n-1,1-\alpha/2}$ then H0 is rejected.

If $t_{n-1,\alpha/2} \le t \le t_{n-1,1-\alpha/2}$ then H0 is accepted.

Note that from the symmetry of the t distribution: $t_{n-1,\alpha/2} = -t_{n-1,1-\alpha/2}$
Example_5: Occupational medicine is a relatively new field in medicine, whereby specific
health hazards are identified for particular occupations. One topic of recent interest is the effect of
fire fighting on pulmonary function. Suppose a group of 26 male fire fighters, 25-35 years old,
is identified and the change in their pulmonary function over a 5-year period is measured. Over 5
years it is found that the fire fighters have a mean decline in forced expiratory volume (FEV),
which is the volume of air expelled in 1 second, of 0.27 liters with a sample standard deviation
of 0.32 liters. Can any conclusions be drawn about the occupational exposure if the expected
change over 5 years is 0.10 liters in normal males in this age group?
Solution: A two-sided test will be used, since the pulmonary function of fire fighters may decline
either more than expected because of exposure or less than expected because of the likelihood of
their being healthier than the general population.
Assume that the decline in FEV is normally distributed with mean μ and unknown variance σ².
To test H0: μ = 0.10 versus H1: μ ≠ 0.10 we compute the test statistic:
$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{0.27 - 0.10}{0.32 / \sqrt{26}} = \frac{0.17}{0.063} = 2.70$
Under H0, t follows a t distribution with 25 degrees of freedom.
We know from the table that t25, 0.99 = 2.485 and t25, 0.995 = 2.787, and therefore the p-value is between
2(1-0.995) = 0.01 and 2(1-0.99) = 0.02. This result is statistically significant with
0.01 < p < 0.02 and we conclude that the pulmonary function of fire fighters declines
significantly faster than that of typical 25-35-year-old males.
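The fire-fighter calculation can be sketched in Python as follows. Because the standard library has no t distribution, the two quantiles for 25 degrees of freedom are copied from the table values quoted above, and the p-value is only bracketed, exactly as in the solution. The names `one_sample_t` and `T25_QUANTILES` are hypothetical and introduced only for illustration.

```python
from math import sqrt

def one_sample_t(sample_mean, mu0, s, n):
    """Return the one-sample t statistic and its degrees of freedom."""
    return (sample_mean - mu0) / (s / sqrt(n)), n - 1

# Assumption: upper quantiles of the t distribution with 25 degrees of
# freedom, copied from the table values quoted in the text.
T25_QUANTILES = {0.99: 2.485, 0.995: 2.787}

t, df = one_sample_t(0.27, 0.10, 0.32, 26)

# Bracket the two-sided p-value between the tabulated quantiles.
p_upper = 2 * (1 - 0.99) if abs(t) > T25_QUANTILES[0.99] else 1.0
p_lower = 2 * (1 - 0.995) if abs(t) < T25_QUANTILES[0.995] else 0.0

print(f"t = {t:.2f} with {df} degrees of freedom")
print(f"{p_lower:.2f} < p < {p_upper:.2f}")
```

With unrounded arithmetic the statistic comes out near 2.71 rather than 2.70, because the text rounds the denominator to 0.063; the conclusion is unchanged.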

The Relationship between Hypothesis Testing and Confidence
Intervals (Two-Sided Case)
Suppose we are testing H0: μ = μ0, σ² = σ0² versus H1: μ ≠ μ0, σ² = σ0². H0 is rejected with a
two-sided level α test if and only if the two-sided 100% × (1 - α) confidence interval for μ does
not contain μ0. H0 is accepted with a two-sided level α test if and only if the two-sided
100% × (1 - α) confidence interval for μ contains μ0.

Example_6: Consider the cholesterol data in Example_1. The two-sided 95% confidence interval
for μ is given by:
$(\bar{x} - z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}})$
$= (181.52 - \frac{1.96 \cdot 40}{10},\ 181.52 + \frac{1.96 \cdot 40}{10}) = (181.52 - 7.84,\ 181.52 + 7.84) = (173.68,\ 189.36)$
This confidence interval contains all values of μ0 for which we accept H0: μ = μ0 and does not
contain any value μ0 for which we could reject H0. In particular, μ0 = 190 lies outside the
interval, so H0: μ = 190 is rejected at the 5% level.
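The equivalence between the two-sided test and the confidence interval can be illustrated with a short Python sketch. The helper names (`z_interval`, `rejects_by_interval`, `rejects_by_z`) are hypothetical; the data are the cholesterol values from Example_1.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(sample_mean, sigma, n, alpha=0.05):
    """Two-sided 100%(1 - alpha) confidence interval, sigma known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

def rejects_by_interval(mu0, sample_mean, sigma, n, alpha=0.05):
    """Reject H0: mu = mu0 when mu0 falls outside the interval."""
    low, high = z_interval(sample_mean, sigma, n, alpha)
    return not (low <= mu0 <= high)

def rejects_by_z(mu0, sample_mean, sigma, n, alpha=0.05):
    """Reject H0: mu = mu0 with the critical-value method."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)

low, high = z_interval(181.52, 40, 100)
print(f"95% CI = ({low:.2f}, {high:.2f})")

# The two decision rules agree for every candidate value of mu0.
for mu0 in (170, 175, 181.52, 185, 190, 195):
    assert rejects_by_interval(mu0, 181.52, 40, 100) == rejects_by_z(mu0, 181.52, 40, 100)
```

For μ0 = 190 both rules reject H0, since 190 lies above the upper limit of the interval.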

One-Sample χ² Test
Used to test the association between two qualitative variables, each with two mutually
exclusive and independent values. Example: illness and risk factor.
We must test the hypothesis H0: risk and illness are independent versus H1: risk and illness are
dependent.
We use two 2 × 2 contingency tables: one observed and one expected.
The test statistic is given by:
$\chi^2 = \sum_{i=1}^{L \cdot C} \frac{(f_i^o - f_i^t)^2}{f_i^t}$, where $f_i^o$ and $f_i^t$ are the observed and expected frequencies and the sum runs over the L · C cells of the table (L rows by C columns).
Example_7: Suppose we wish to investigate, in a population, whether a risk factor (stress) is
associated with a specific illness (hypertension). A sample of 500 people was observed, with
the following results (FR = risk factor, HTA = hypertension):
                                                    FR+     FR-
                                           HTA+     100     120
                                           HTA-     70      210
a. What are the appropriate hypotheses?
b. What are the appropriate procedures to test these hypotheses? (Use α = 0.05; for this
     value and one degree of freedom the rejection region is given by [3.84, ∞).)
Solution: We must test the hypothesis H0: risk and illness are independent versus H1: risk and
illness are dependent. In medical terms: H0: stress is not a risk factor for hypertension versus
H1: stress is a risk factor for hypertension.
The observed contingency table is:
                                                FR+   FR-    Total
                                       HTA+     100   120    220
                                       HTA-     70    210    280
                                       Total    170   330    500
The expected contingency table is (each cell = row total × column total / grand total):
                                                FR+                FR-                Total
                                       HTA+     220 × 170 / 500    220 × 330 / 500    220
                                       HTA-     280 × 170 / 500    280 × 330 / 500    280
                                       Total    170                330                500

which gives, rounded to whole numbers:
                                                FR+   FR-    Total
                                       HTA+     75    145    220
                                       HTA-     95    185    280
                                       Total    170   330    500
The test statistic is:
$\chi^2 = \frac{(100 - 75)^2}{75} + \frac{(120 - 145)^2}{145} + \frac{(70 - 95)^2}{95} + \frac{(210 - 185)^2}{185} = 22.6$
RA = [3.84, ∞)
22.6 ∈ RA = [3.84, ∞)
In conclusion, H0 is rejected and H1 is accepted: stress is a risk factor for hypertension.
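The expected table and the test statistic can be computed with the following Python sketch. It keeps the expected frequencies unrounded (74.8, 145.2, 95.2, 184.8), so the statistic comes out at about 22.97 instead of the 22.6 obtained with expected frequencies rounded to whole numbers; the conclusion is the same. The variable names are illustrative assumptions.

```python
# Observed 2 x 2 table: rows = HTA+, HTA-; columns = FR+, FR-.
observed = [[100, 120],
            [70, 210]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected frequency of each cell = row total * column total / grand total.
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum over all cells of (observed - expected)^2 / expected.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

critical_value = 3.84   # rejection region [3.84, inf) for alpha = 0.05, 1 df
print(f"chi-square = {chi2:.2f}, reject H0: {chi2 >= critical_value}")
```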

The Paired t Test
To test the hypothesis H0: Δ = 0 versus H1: Δ ≠ 0, where Δ is the underlying mean difference
and the variance is unknown, the best test is based on the test statistic t, given by:
$t = \frac{\bar{d}}{s_d / \sqrt{n}}$
where $\bar{d}$ is the mean difference
$\bar{d} = \frac{d_1 + d_2 + \dots + d_n}{n}$
$s_d = \sqrt{\left[\sum_{i=1}^{n} d_i^2 - \left(\sum_{i=1}^{n} d_i\right)^2 / n\right] / (n - 1)}$
and n = number of matched pairs.
If $t > t_{n-1,1-\alpha/2}$ or $t < -t_{n-1,1-\alpha/2}$, then H0 is rejected.

If $-t_{n-1,1-\alpha/2} \le t \le t_{n-1,1-\alpha/2}$, then H0 is accepted.
Example_8: Suppose the paired-sample study design is adopted and the sample data in Table
.... are obtained. The systolic blood-pressure (bp) level of the ith woman is denoted at baseline by
xi1 and at follow-up by xi2.

         i       Systolic blood-pressure     Systolic blood-pressure     di = xi2 - xi1
                 level while not using       level while using
                 OC’s (xi1)                  OC’s (xi2)
         1       115                         128                         13
         2       112                         115                         3
         3       107                         106                         -1
         4       119                         128                         9
         5       115                         122                         7
         6       138                         145                         7
         7       126                         132                         6
         8       105                         109                         4
         9       104                         102                         -2
         10      115                         117                         2
Table ... Systolic blood-pressure levels (mm Hg) in 10 women while not using (baseline) and
while using (follow-up) oral contraceptives.
Assess the statistical significance of the OC-BP data in Table ....
Solution:
$\bar{d} = \frac{13 + 3 - 1 + 9 + 7 + 7 + 6 + 4 - 2 + 2}{10} = 4.80$
$s_d^2 = \{[13^2 + 3^2 + \dots + 2^2] - 10(4.80)^2\} / 9 = 20.844$
$s_d = 4.566$
$t = 4.80 / (4.566 / \sqrt{10}) = 3.32$
There are 10 - 1 = 9 degrees of freedom, and we know from the table that t9, 0.975 = 2.262. Since t =
3.32 > 2.262, H0 can be rejected using a two-sided significance test with α = 0.05.
To compute the p-value, we know from the table that t9, 0.9995 = 4.781 and t9, 0.995 = 3.250. Thus,
since 3.25 < 3.32 < 4.781, it follows that 0.0005 < p/2 < 0.005 or 0.001 < p < 0.01.
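The paired t statistic for the blood-pressure data can be checked with a short Python sketch. The critical value t(9, 0.975) = 2.262 is copied from the table value quoted above, since the standard library does not provide t quantiles; the variable names are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, stdev

# Systolic blood pressure (mm Hg) at baseline and at follow-up.
baseline = [115, 112, 107, 119, 115, 138, 126, 105, 104, 115]
follow_up = [128, 115, 106, 128, 122, 145, 132, 109, 102, 117]

# Differences d_i = x_i2 - x_i1 (follow-up minus baseline).
d = [after - before for before, after in zip(baseline, follow_up)]

n = len(d)
d_bar = mean(d)     # mean difference
s_d = stdev(d)      # standard deviation of the differences (divisor n - 1)
t = d_bar / (s_d / sqrt(n))

# Assumption: t_{9, 0.975} taken from a t table, not computed here.
t_crit = 2.262
print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.3f}, t = {t:.2f}")
print(f"reject H0 at alpha = 0.05: {abs(t) > t_crit}")
```

`statistics.stdev` applied to the differences gives the same value as the computational formula for s_d, because both use the n - 1 divisor.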

Problems
Problem_1 Suppose the annual incidence of asthma in the general population among children
0-4 years of age is 1.4% for boys and 1% for girls.
a. If 10 cases are observed over one year among 500 boys 0-4 years of age with smoking
    mothers, then test if there is a significant difference in asthma incidence between this group
    and the general population using the critical-value method with a two-sided test.
b. Report the p-value corresponding to your answer to part a.

Problem_2 Plasma-glucose levels are used to determine the presence of diabetes. Suppose the
mean in plasma-glucose concentration (mg/dL) in 35-44-year-olds is 4.86 with standard
deviation 0.54. A study of 100 sedentary persons in this group is planned to test if they have
a higher or lower level of plasma glucose than the general population.
a. If the expected difference is 0.10 units, then what is the power of such a study if a two-sided
    test is to be used with α = 0.05?
b. Answer the same question as in a if the expected difference is 0.20 units.

Problem_3 Much discussion has appeared in the medical literature in recent years on the role
of diet in the development of heart disease. The serum-cholesterol levels of a group of people
who eat a primarily macrobiotic diet are measured. Among 24 of them, aged 20-39, the mean
cholesterol level was found to be 175 mg/dL with a standard deviation of 35 mg/dL.
a. If the mean cholesterol level in the general population in this age group is 230 mg/dL and the
    distribution is assumed to be normal, then test the hypothesis that the group of people on a
    macrobiotic diet has cholesterol levels different from those of the general population.
b. Compute a 95% confidence interval for the true mean cholesterol level in this group.

Problem_4 One method for assessing the effectiveness of a drug is to note its concentration in
blood and/or urine sample at certain periods of time after giving the drug. Suppose we wish to
compare the concentrations of two types of aspirin (type A and B) in urine specimens taken from
the same person, 1 hour after he or she has taken the drug. Hence a specific dosage of either type
A or a type B aspirin is given at one time and the 1 hour urine concentration is measured. One
week later, after the first aspirin has presumably been cleared from the system, the same dosage
of the other aspirin is given to the same person and the 1 hour urine concentration is noted. Since
the order of giving the drugs may affect the results, a table of random numbers is used to decide
which of the two types of aspirin to give first. This experiment is performed on 10 people; the
results are given in table ...

Table x....
Concentration of aspirin in urine
Person                        Aspirin A 1 hour       Aspirin B 1 hour
                              concentration (mg%)    concentration (mg%)
1                             15                     13
2                             26                     20
3                             13                     10
4                             28                     21
5                             17                     17
6                             20                     22
7                             7                      5
8                             36                     30
9                             12                     7
10                            18                     11
Mean                          19.20                  15.60
sd (standard deviation)       8.63                   7.78

Suppose we wish to test the hypothesis that the concentrations of the two drugs are the same in
urine specimens.
a. What are the appropriate hypotheses?
b. What are the appropriate procedures to test these hypotheses?
c. Conduct the test.
d. What is the best point estimation of the difference in concentrations between the two drugs?
e. What is a 95% confidence interval for the mean difference?

Problem_5 Suppose we wish to test, in a population, if there is an association between
smoking and lung cancer. A sample of 400 people was observed, with the following
results:
                                               smoking+    smoking-
                               Lung cancer+    100         120
                               Lung cancer-    70          210
a. What are the appropriate hypotheses?
b. What are the appropriate procedures to test these hypotheses? (Use α = 0.05; for this
   value and one degree of freedom the rejection region is given by [3.84, ∞).)

								