Posted on: 8/7/2011. Public Domain.
Microsoft Word

At the end of the chapter you will be able to:
o Create a new Word document;
o Add text to a document;
o Move around in the document;
o Work with and format text;
o Save a document;
o Create tables and work with them;
o Work with Clip Art and draw your own diagrams;
o Insert equations;
o Create document templates.

About Microsoft Word

Create a New Document
To create a new document, follow these steps: Start – Programs – Microsoft Word. If Word is already open: File – New (Ctrl+N).

Add Text to a Word Document
The blinking insertion point shows where the text you type will appear. Use the mouse pointer to click buttons, select text and so on. The pointer shape varies with the task you are doing.
When you write a paragraph, you do not need to press ENTER at the end of each line; Word automatically moves to the next line. Press ENTER only when you want to start a new paragraph.

Move Around in the Document
If you use the keyboard to navigate, the easiest way to move around in the document is to press the direction keys:
RIGHT ARROW, LEFT ARROW, UP ARROW, DOWN ARROW – move the insertion point one character or one line in the given direction
HOME – moves the insertion point to the beginning of the line (CTRL+HOME moves it to the beginning of the document)
END – moves the insertion point to the end of the line (CTRL+END moves it to the end of the document)
PAGE UP – moves up one screen in your document
PAGE DOWN – moves down one screen in your document
You can also get where you want to go by clicking with the mouse.

Select the Text
To select text in a document, you can:
o Drag over the text that you want to select: selects any amount of text;
o Double-click a word: selects the word;
o Click to the left of a line: selects the line;
o Click to the left of a line and drag up or down: selects multiple lines.
To undo a mistake, such as accidentally deleting a word, click the Undo button or choose [Edit – Undo Typing]. If you decide you want to go through with the action after all, click the Redo button.
Insert and Delete Text
If you have already practiced moving the insertion point, you know how to insert text: just click where you want to start inserting, and then type the new text.
To delete just a few characters, use DELETE (deletes the characters to the right of the insertion point) or BACKSPACE (deletes the characters to the left of the insertion point). To delete more characters, lines or paragraphs, first select them and then choose [Edit - Cut], press the DELETE key, or press CTRL + X.

Move and Copy a Text
To move text, select it, click the Cut button (or [Edit - Cut], or CTRL + X), click in the new location and then click the Paste button (or [Edit - Paste], or CTRL + V).
To copy text, select it, click the Copy button (or [Edit - Copy], or CTRL + C), click in the new location, and then click the Paste button (or [Edit - Paste], or CTRL + V).
You can paste the text as many times as you want; the text remains on the Clipboard (a temporary storage location) until you cut or copy different text.

Formatting Text Characters

Apply bold formatting:
1. Select the text you want to change.
2. On the Formatting toolbar, click Bold.

Apply italic formatting:
1. Select the text you want to change.
2. On the Formatting toolbar, click Italic.

Change the font:
1. Select the text you want to change.
2. On the Formatting toolbar, click a font name in the Font box.

Change the size of text:
1. Select the text you want to change.
2. On the Formatting toolbar, type or click a point size in the Font Size box. For example, type 10.5.

Make text superscript or subscript:
1. Select the text you want to format as superscript or subscript.
2. On the Format menu, click Font, and then click the Font tab.
3. Select the Superscript or Subscript check box.

Underline text
Do one of the following:
a. Add a basic underline
1. Select the text you want to change.
2. On the Formatting toolbar, click Underline.
b. Add a decorative underline
1. Select the text you want to change.
2. On the Format menu, click Font, and then click the Font tab.
3. In the Underline style list, click the style you want.
4. In the Underline color list, click the color you want.

About Text Alignment and Spacing
The factors that determine how text is positioned are:
o Page margins: determine the distance from the edge of the page for all the text on a page;
o Paragraph indentation and alignment: determine how paragraphs fit between the margins;
o Spacing before and/or after paragraphs: determines how much space occurs before and/or after paragraphs;
o Line spacing: determines how much space occurs between lines.
The types of line spacing are as follows:
o Single: Accommodates the largest font in that line, plus a small amount of extra space. The amount of extra space varies depending on the font used.
o 1.5 lines: One-and-one-half times that of single line spacing.
o Double: Twice that of single line spacing.
o At least: Minimum line spacing that is needed to fit the largest font or graphic on the line.
o Exactly: Fixed line spacing that Microsoft Word does not adjust.
o Multiple: Line spacing that is increased or decreased by a percentage that you specify. For example, setting line spacing to 1.2 will increase the space by 20 percent.

Positioning and aligning text
Margins determine the overall width of the main text area, in other words, the space between the text and the edge of the page ([File – Page Setup – Margins]).
Horizontal alignment determines the appearance and orientation of the edges of the paragraph:
o left-aligned [Format – Paragraph – Indents and Spacing – General Alignment - Left];
o right-aligned [Format – Paragraph – Indents and Spacing – General Alignment - Right];
o centered [Format – Paragraph – Indents and Spacing – General Alignment - Centered];
o justified [Format – Paragraph – Indents and Spacing – General Alignment - Justified].
Vertical alignment determines the paragraph's position relative to the top and bottom margins.
This is useful, for example, when you are creating a title page, because you can position text precisely at the top or center of the page, or justify the paragraphs so that they are spaced evenly down the page.

Changing the space between lines or paragraphs
Line spacing determines the amount of vertical space between lines of text in a paragraph. By default, lines are single-spaced, meaning that the spacing accommodates the largest font in that line, plus a small amount of extra space.
Paragraph spacing determines the amount of space above or below a paragraph. When you press ENTER to start a new paragraph, the spacing is carried over to the next paragraph, but you can change the settings for each paragraph.
If a line contains a large text character, graphic, or formula, Microsoft Word increases the spacing for that line. To space all lines evenly, use exact spacing, and specify an amount of space that is large enough to fit the largest character or graphic in the line. If items appear cut off, increase the amount of spacing.

Spelling and Grammar
Some of the content in this topic may not be applicable to some languages.
By default, Microsoft Word checks spelling and grammar automatically as you type, using wavy red underlines to indicate possible spelling problems and wavy green underlines to indicate possible grammatical problems. You can also check spelling and grammar all at once.
1. Make sure automatic spelling and grammar checking are turned on. On the Tools menu, click Options, and then click the Spelling & Grammar tab. Select the Check spelling as you type and Check grammar as you type check boxes.
2. Type in the document.
3. Right-click a word with a wavy red or green underline, and then select the command or the spelling alternative you want.
If you mistype a word but the result is not a misspelling (for example, typing "from" instead of "form" or "there" instead of "their"), the spelling checker will not flag the word.
To catch those types of problems, use the grammar checker.

Save a Document
To save a document for the first time, choose Save As from the File menu. You must specify where you want to save the document, its name, and its file type. You can also save the document by clicking the Save button on the toolbar (or pressing CTRL + S); if the document is being saved for the first time, Word asks for the name of the document.

Headers and Footers
A header or footer is text (such as a page number, chapter title, or date) that appears at the top or bottom of every page. To add headers and footers, click [View - Header and Footer]. You will see boxes for entering the headers and footers.

Bulleted and Numbered Lists
To organize your information, you can add a simple bulleted list or create a numbered list such as 1, 2, 3 or a), b), c) or i., ii., iii. Bulleted and numbered lists in Microsoft Word are easy to create:
o Create a numbered list: [Format – Bullets and Numbering – Numbered], then type each item and confirm with ENTER.
o Create a bulleted list: [Format – Bullets and Numbering – Bulleted] and choose the kind of bullet.

Table
A table is made up of rows and columns of cells that you can fill with text and graphics. Tables are often used to organize and present information.

Create a table
Microsoft Word offers a number of ways to make a table. The best way depends on how you like to work, and on how simple or complex the table needs to be.
Click where you want to create a table.
On the Table menu, choose Insert, and then Table.
Under Table size, select the number of columns and rows.
Under AutoFit behavior, choose options to adjust table size.
To use a built-in table format, click AutoFormat.

Delete a table or clear its contents
You can delete an entire table. You can also clear the contents of cells without deleting the cells themselves.

Delete a table and its contents
Click the table.
On the Table menu, point to Delete, and then click Table.

Merge cells into one cell in a table
You can combine two or more cells in the same row or column into a single cell. For example, you can merge several cells horizontally to create a table heading that spans several columns.
1. Select the cells you want to merge. To select:
o A cell: Click the left edge of the cell.
o A row: Click to the left of the row.
o A column: Click the column's top gridline or border.
o Multiple cells, rows, or columns: Drag across the cell, row, or column.
o Multiple items that are not necessarily in order: Click the first cell, row, or column you want, press CTRL, and then click the next cells, rows, or columns you want.
o The entire table: Click the table move handle, or drag over the entire table.
You can also select rows, columns, or the entire table by clicking in the table and then using the Select commands on the Table menu, or by using keyboard shortcuts.
2. On the Table menu, click Merge Cells.
When you merge several cells in a column to create a vertically oriented table heading that spans several rows, click Change Text Direction on the Tables and Borders toolbar to change the orientation of the heading text.

Insert text before a table
Use this procedure to insert text before a table that is on the first line of the first page in a document.
To insert text before a table, click in the upper-left cell in the first row of the table, and then press ENTER. If you have text in the upper-left cell, place the insertion point before the text.
Type text.

Copy a table
In print layout view, rest the pointer on the upper-left corner of the table until the table move handle appears.
Rest the pointer on the table move handle until a four-headed arrow appears.
Press CTRL, and drag the copy to a new location.
You can also copy a table by selecting it and then copying and pasting.

Clip Art, Graphics and Drawing
Add clip art or another type of graphic: [Insert – Picture – Clip Art].
Create your own drawing: [View – Toolbars – Drawing] opens the Drawing toolbar, which lets you create your own drawings.

Lines, Boxes and Shaded Backgrounds
Add borders or shading: select an item, click [Format – Borders and Shading], choose the border that you want, and confirm with OK.

Multiple Columns
If you want text in two, three or more columns, first select the text, then choose Columns from the Format menu, choose the number of columns that you want, and apply the setting to the selected text. If you want the whole document in two columns, press CTRL + A to select the whole document, then choose Format – Columns, set the number of columns and apply it to the whole document.

Insert an equation
Some of the content in this topic may not be applicable to some languages.
1. Click where you want to insert the equation.
2. On the Insert menu, click Object, and then click the Create New tab.
3. In the Object type box, click Microsoft Equation 3.0. If Microsoft Equation Editor is not available, you may need to install it.
4. Click OK.
5. Build the equation by selecting symbols from the Equation toolbar and by typing variables and numbers. From the top row of the Equation toolbar, you can choose from more than 150 mathematical symbols. From the bottom row, you can choose from a variety of templates or frameworks that contain symbols such as fractions, integrals, and summations. If you need help, click Equation Editor Help Topics on the Help menu.
6. To return to Microsoft Word, click the Word document.

Document templates
When you are saving a template, Word switches to the User templates location (Tools menu, Options command, File Locations tab), which by default is the Templates folder and its subfolders. If you save a template in a different location, the template will not appear in the Templates dialog box.

Exercises

Exercise 1
1. Create a new Word document (Start - Programs - Microsoft Word).
Enter the following text:

CRITICAL FACTS ABOUT LUNG CANCER

Smoking Is Not the Only Risk Factor
Although long-term cigarette smoking is solidly linked with lung cancer, more than half of those who develop the disease have never smoked or have quit. Other risk factors include contact with:
o second-hand smoke
o asbestos
o radioactive gas radon
o diesel fuel
o toxic industrial chemicals.
Those working with certain minerals such as silica and beryllium are at increased risk, as are some patients with recurring lung inflammation (tuberculosis is an example). Finally, though no definite connection has been made, marijuana contains many of the cancer-promoting substances found in conventional cigarettes.

Early Detection is the Key
Lung cancer can be effectively treated provided it is found at an early stage. Unfortunately, these cancers are able to develop, grow and even spread to other body sites over a period of years without producing any outward signs that something is wrong. It is not uncommon for the first symptoms to appear outside the lungs after the cancer has spread. As a rule, once persistent symptoms of lung cancer are present, the tumor is so far advanced that a good response to treatment cannot be expected.

New Imaging Methods are Changing the Picture
Today radiologists and cancer specialists are finding new ways of detecting and treating lung cancers earlier and more effectively. After obtaining a chest x-ray, which is usually the first step in the work-up, a special type of exam called spiral computed tomography (CT) scanning can produce three-dimensional images that clearly show the exact location, size, and shape of a lung mass. Both CT and magnetic resonance (MR) imaging can detect cancer that has spread to other parts of the body, a finding that will alter the treatment plan. Another new method, called positron emission tomography, or PET, helps to determine whether a suspicious lung mass is cancer and, if it is, how far it has progressed.
Guidance by either x-rays or ultrasound makes it easier to obtain a tissue sample - the final step in diagnosis.

2. Format your document as follows:
o Paper format: A4.
o Title: select the title of the document and follow the steps: Format - Font - Font (Arial); Font Style (Bold); Size (18); choose a color from Font color, and Shadow and All Caps from Effects.
o Text Spacing and Alignment: select the whole document and follow these steps: Format - Paragraph - Indents and Spacing: General - Alignment: Justified; Spacing - Line Spacing: 1.5 lines. To emphasize the title, use a negative indent to push it out into the margin: Format - Paragraph - Indents and Spacing - Indentation: Left -0.5.
o Custom Margins: you can reduce the margins to fit more text on the page, or expand them to create a custom design. To set margins click Page Setup from the File menu: margins: top = 30 mm, bottom = 30 mm, side = 15 mm; paragraph indentation is 3.5 mm.
o Page Numbers, Headers and Footers: to add headers and footers, click Header and Footer (View menu). In the header area, insert the title of the document aligned to the right. In the footer area, insert the page number by using Insert - Page Numbers.
o Lines, Boxes and Shaded Background: to add borders and shading, select an item and then click Borders and Shading (Format menu): Borders - choose a setting, a style, a color and a width for the border; Shading - choose a fill. If you need a border for the whole document, choose Page Border from Borders and Shading.
3. Save the document as CriticalFactsAboutLungCancer.doc.

Exercise 2
1. Create a new Word document containing the following text:
Colorectal cancer is the third most common cancer in men and women. An estimated 131,000 Americans are diagnosed with this disease each year and some 55,000 die as a result of it. Certain genetic factors play a role in the development of this cancer.
The specific cause of colorectal cancer is unknown; however, environmental, genetic, familial factors and preexisting Ulcerative Colitis have been linked to the development of this cancer.
The survival rate in colorectal cancer is determined by the stage of the disease at the time of diagnosis and, to some degree, by the response to treatment. Following is a current survival table for patients at various stages of this illness. The statisticians have taken into consideration the impact of proper treatment.

Nr.   Stage     5-year survival
1     Duke A    85-90 %
2     Duke B    60-80 %
3     Duke C    40-45 %
4     Duke D    Less than 5 %

Hints:
o The title of the text is written using WordArt. To activate the Drawing toolbar, from the View menu, choose Toolbars and check Drawing. Click the Insert WordArt button on the Drawing toolbar and write the title of the document.
o Format the second paragraph as a two-column text. First, write the text, then select it and from the Format menu choose the Columns option.
o Insert a table in the document: Table menu – Insert – Table. Choose the proper number of columns and rows.
2. Save the document as ColorectalCancer.doc.

Exercise 3
1. Draw the diagram below.
[Diagram: Transmission of blood type]
Hint: Use View - Toolbars - Drawing (for the arrows) and Format - Font - Superscript (for superscript). The text should be inserted into text boxes.
2. Save the document as BloodTypeTransmission.doc.

Exercise 4
1. Insert the following equations into a Word document:
a. $\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{X})^2$
b. $\Pr(A \cap B) = \Pr(A / B) \cdot \Pr(B)$
Hint: From the Insert menu, choose the Object command and, subsequently, Microsoft Equation 3.0.
2. Save the file as Equations.doc.

Exercise 5
1. Create a new Word document (Start - Programs - Microsoft Word) and enter the following text using Times New Roman, Regular, 12:

DEPARTMENT OF RADIOLOGY
BARIUM MEAL INFORMATION SHEET
Patient name:
Appointment: Date: Time:
Have NOTHING TO EAT OR DRINK AFTER MIDNIGHT ON:
Please keep the whole day free, as the examination may be prolonged.
The purpose of this x-ray examination is to investigate your stomach and upper digestive system. It is important that your stomach is empty and, for this reason, no food or drink is allowed for six hours prior to the examination.
At the start of the examination, you will be given a cup of white liquid to drink. The lights in the room will be dimmed and the radiologist will take pictures of you in various positions, both standing and lying. You may be given an injection in your arm to help relax your stomach.
Please expect to be in the X-ray Department for up to one hour (longer if we have been asked to perform a follow through examination to investigate your small bowel).
You may find that the barium makes you slightly constipated, in which case it would be advisable to take a mild laxative if it proves necessary.
If you are diabetic, do not follow any dietary instructions without discussing this with us first.
If you are unable to attend for this examination, please inform us immediately so that we may offer the appointment slot to someone else.
For women of childbearing age (teenage - middle age) it is very important that you have had a period within the last 28 days before your appointment date. If you have not, would you please contact us and we will change your appointment to a date when you have.

Hints:
1. Save the document as a template. From the File menu choose Save As. File type: Document Template. File name: BariumMealProgramming (BariumMealProgramming.dot).
2. You want to inform three patients about their barium investigation appointment.
To create the announcement for these patients, open the My Documents folder and double-click the template document. Enter the data for the first patient and save the document as FirstPatient. Create the documents for the second and third patients in a similar way and save them as SecondPatient and ThirdPatient, respectively.

Exercise 6
Create the following table in a Word document:

Record  Name     Date of     Birth weight  Day of  Weight  Blood pressure
no.              birth       (gram)        care    (gram)  Systolic (mmHg)  Diastolic (mmHg)
1       Barbara  02.08.2002  3200          1       3150    75               50
                                           2       3140    80               50
                                           3       3130    75               55
                                           4       3150    75               50
                                           5       3170    80               55
2       John     02.09.2002  2800          1       2700    70               45
                                           2       2650    70               45
                                           3       2700    75               50
                                           4       2800    75               50
                                           5       2850    75               50

Hints: Insert a table with 8 columns and 12 rows. To make the table look as shown above, use Merge Cells from the Table menu. Save the document as BirthTable.doc.

Microsoft Excel

At the end of the chapter you will be able to:
o Create an Excel Workbook;
o Work with Sheets.

Create a New Microsoft Excel Workbook
Make sense of your data by organizing, calculating and analyzing it with Microsoft Excel. You work with your data on one or more worksheets in a workbook.

Create a Workbook File
You can create a new, blank workbook or, to save time, open an existing workbook or a template and fill in your data.

What is the difference between a workbook and a worksheet?
A workbook is a Microsoft Excel file containing one or more sheets; each worksheet is a "page" in the workbook on which you enter and work with data. Every workbook starts with three worksheets, but you can add worksheets and other kinds of sheets.

What is on the Screen?
When you create a new workbook, the Microsoft Excel window displays a worksheet with a grid of rows and columns. Each box, or cell, has a reference indicating its row and column location; for example, C3 refers to the cell at the intersection of column C and row 3.

Select sheets
When you enter or change data, the changes affect all selected sheets.
These changes may replace data on the active sheet and other selected sheets.

To select: Do this
o A single sheet: Click the sheet tab. If you don't see the tab you want, click the tab scrolling buttons to display the tab, and then click the tab.
o Two or more adjacent sheets: Click the tab for the first sheet, and then hold down SHIFT and click the tab for the last sheet.
o Two or more nonadjacent sheets: Click the tab for the first sheet, and then hold down CTRL and click the tabs for the other sheets.
o All sheets in a workbook: Right-click a sheet tab, and then click Select All Sheets on the shortcut menu.
If sheet tabs have been color-coded, the sheet tab name will be underlined in a user-specified color when selected. If the sheet tab is displayed with a background color, the sheet has not been selected.

Rename a sheet
1. To rename the active sheet, on the Format menu, point to Sheet and then click Rename.
2. Type the new name over the current name.

Add More Sheets to the Workbook
To organize your data, you can add more sheets to a workbook. Another kind of sheet you can add is a chart sheet, which displays data graphically. To add a worksheet, use Insert – Worksheet. The number of sheets you can add to a workbook is limited only by available system memory.
Give workbook sheets meaningful names. Named tabs can help you locate sheets in your workbook. Double-click the tab at the bottom of the window and type the name you want.

Enter data in worksheet cells (numbers, text, a date or a time)
1. Click the cell where you want to enter data.
2. Type the data and press ENTER or TAB.

Numbers and text in a list
1. Enter data in a cell in the first column, and then press TAB to move to the next cell.
2. At the end of the row, press ENTER to move to the beginning of the next row.
3. If the cell at the beginning of the next row doesn't become active, click Options on the Tools menu, and then click the Edit tab.
Under Settings, select the Move selection after Enter check box, and then click Down in the Direction box.

Dates
Use a slash or a hyphen to separate the parts of a date; for example, type 9/5/2002 or 5-Sep-2002. To enter today's date, press CTRL+; (semicolon).

Times
To enter a time based on the 12-hour clock, type a space and then a or p after the time; for example, 9:00 p. Otherwise, Microsoft Excel enters the time as AM. To enter the current time, press CTRL+SHIFT+: (colon).

Work in Cells and Ranges
When you work with data in worksheet cells (for example entering, copying, deleting or formatting data), first select the area to work in. The selection can be a single cell or a range of cells. After making your selection, perform the action you want.
The data you enter and work with can be text, such as a list of names and addresses; values, such as blood pressure; or a formula that calculates a value.
Use the TAB key to move to the next cell to the right. When you reach the end of a row, press ENTER to move to the first cell in the next row.

Formatting numbers, dates, and times
To stop Excel from inserting decimal points automatically in the numbers you type, on the Tools menu, click Options, click the Edit tab, and then clear the Fixed Decimal check box.
To remove decimal points from numbers you've already entered, you can multiply the numbers by a power of 10. In an empty cell, enter a number such as 10, 100, or 1,000, depending upon the number of decimal places you want to remove. For example, enter 100 in the cell if the numbers contain two decimal places and you want whole numbers. Copy the cell to the Clipboard and select a range of adjacent cells that contain numbers with decimal places. On the Edit menu, click Paste Special, and then click Multiply.

Numbers are not displayed or calculated as numeric values
If the numbers are aligned to the left of the cell and you have not changed the default alignment (General), the numbers are formatted or entered as text. To change them to numbers, do the following:
1. Select a blank cell that you know has the General number format. If you aren't sure of the cell format, click Cells on the Format menu, and then click the Number tab. In the Category box, click General, and then click OK.
2. In the cell, type 1 and then press ENTER.
3. Click the cell, and then click Copy on the Standard toolbar.
4. Select the range of cells that contain the "text" numbers.
5. On the Edit menu, click Paste Special, click Multiply, and then click OK.

The number in a worksheet is not the same as the number in the formula bar
The number format applied to a cell determines the way Microsoft Excel displays a number in that cell on the worksheet. The format does not affect the cell value used in calculations, which is displayed in the formula bar when the cell is active. To remove number formats that may affect the displayed value, select the cells, and then:
1. On the Format menu, click Cells, and then click the Number tab.
2. In the Category box, click General.

The number of custom number formats has been exceeded
If the number of custom number formats has been exceeded, you must delete one or more of the existing custom number formats in order to add new ones.
1. On the Format menu, click Cells, and then click the Number tab.
2. In the Category list, click Custom.
3. At the bottom of the Type box, click the custom format you want to delete. Click Delete.

Formatting text

Rotated text is not displayed at the correct angle
To rotate text, on the Format menu, click Cells, and then click the Alignment tab.
If you've saved a workbook in another file format, the rotated text format might be lost. Most file formats do not support rotation within the full 180 degrees (+90 through –90 degrees), which is possible in the current version of Microsoft Excel. Earlier versions of Excel can rotate text only at angles of +90, 0 (zero), or –90 degrees. If the specified angle of rotation cannot be maintained in the other file format, the text is not rotated.
Borders are not displayed the way I want
o Look at adjacent cells: If you apply borders to a selected cell, the border is also applied to adjacent cells that share a bordered cell boundary. For example, if you apply a box border enclosing the range B1:C5, the cells D1:D5 acquire a left border.
o Check which border was last applied: If you apply two different types of borders to a shared cell boundary, the most recently applied border is displayed.
o Choose the appropriate border type: A selected range of cells is formatted as a single block of cells. If you apply a right border to the range of cells B1:C5, the border is displayed only on the right edge of the cells C1:C5. To display interior borders, use the button on the Borders palette.

Enter Data Automatically
Avoid repetitive typing and save time by entering some kinds of data automatically. You can automatically enter the same information in several cells or enter an incremental series. To do this, enter the first items of the series, select them, and drag the fill handle. The rest of the series is filled in automatically.

Modify the Data
To edit a cell's contents, double-click it and then make the change.
Use the Cut, Copy or Paste command: first make your selection and then right-click to display the shortcut menu.
Made a mistake? Click the Undo button.
Clear data from a cell: select the cell and press DELETE.

Adjust the Spacing and Alignment of Data
To help distinguish different types of information in cells, adjust the alignment of cell contents using the alignment buttons.
You can insert rows and columns to set data or labels apart using the Rows and Columns commands (Insert menu).
Adjust the width and height of rows and columns by dragging or double-clicking the line to the right of the column letter or below the row number in the header.
Merge cells across columns. You can easily merge headings across the top of a range of cells.
Type the title in the leftmost cell in the range, select the range, and then, on the Format menu, click Cells – Alignment – Merge cells.

About functions
Functions are predefined formulas that perform calculations by using specific values, called arguments, in a particular order, or structure. Functions can be used to perform simple or complex calculations.

Structure of a function
Structure. The structure of a function begins with an equal sign (=), followed by the function name, an opening parenthesis, the arguments for the function separated by commas, and a closing parenthesis.
Arguments. Arguments can be numbers, text, logical values such as TRUE or FALSE, arrays, error values such as #N/A, or cell references. A cell reference is the set of coordinates that a cell occupies on a worksheet; for example, the reference of the cell that appears at the intersection of column B and row 3 is B3. The argument you designate must produce a valid value for that argument. Arguments can also be constants, formulas, or other functions.

Entering formulas
When you create a formula that contains a function, the Insert Function dialog box helps you enter worksheet functions. As you enter a function into the formula, the Insert Function dialog box displays the name of the function, each of its arguments, a description of the function and each argument, the current result of the function, and the current result of the entire formula.

About calculation operators
Operators specify the type of calculation that you want to perform on the elements of a formula. Microsoft Excel includes four different types of calculation operators: arithmetic, comparison, text, and reference.

Types of operators
a. Arithmetic operators
To perform basic mathematical operations such as addition, subtraction, or multiplication; combine numbers; and produce numeric results, use the following arithmetic operators.
Arithmetic operator                    Meaning (Example)
+ (plus sign)                          Addition (3+3)
- (minus sign)                         Subtraction (3-1); Negation (-1)
* (asterisk)                           Multiplication (3*3)
/ (forward slash)                      Division (3/3)
% (percent sign)                       Percent (20%)
^ (caret)                              Exponentiation (3^2)

b. Comparison operators

You can compare two values with the following operators. When two values are compared by using these operators, the result is a logical value, either TRUE or FALSE.

Comparison operator                    Meaning (Example)
= (equal sign)                         Equal to (A1=B1)
> (greater than sign)                  Greater than (A1>B1)
< (less than sign)                     Less than (A1<B1)
>= (greater than or equal to sign)     Greater than or equal to (A1>=B1)
<= (less than or equal to sign)        Less than or equal to (A1<=B1)
<> (not equal to sign)                 Not equal to (A1<>B1)

c. Text concatenation operator

Use the ampersand (&) to join, or concatenate, two or more text strings to produce a single piece of text.

Text operator                          Meaning (Example)
& (ampersand)                          Connects, or concatenates, two values to produce one continuous text value ("North"&"wind")

d. Reference operators

Combine ranges of cells for calculations with the following operators.

Reference operator                     Meaning (Example)
: (colon)                              Range operator, which produces one reference to all the cells between two references, including the two references (B5:B15)
, (comma)                              Union operator, which combines multiple references into one reference (SUM(B5:B15,D5:D15))
  (space)                              Intersection operator, which produces one reference to the cells common to the two references (B7:D7 C6:C8)

The order in which Excel performs operations in formulas

Formulas calculate values in a specific order. A formula in Excel always begins with an equal sign (=). The equal sign tells Excel that the succeeding characters constitute a formula. Following the equal sign are the elements to be calculated (the operands), which are separated by calculation operators. Excel calculates the formula from left to right, according to a specific order for each operator in the formula.
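The three reference operators described above can be pictured as operations on sets of cells. The following Python sketch is only an illustration of that idea, not how Excel works internally; the helper name expand_range is hypothetical, and it assumes single-letter column names (A to Z).

```python
import re


def expand_range(ref):
    """Expand a reference such as 'B5:B15' into the set of cell names it covers.

    Hypothetical helper for illustration; handles single-letter columns only.
    """
    match = re.fullmatch(r"([A-Z])(\d+):([A-Z])(\d+)", ref)
    if match is None:
        raise ValueError(f"unsupported reference: {ref!r}")
    col1, row1, col2, row2 = match.groups()
    cols = range(min(ord(col1), ord(col2)), max(ord(col1), ord(col2)) + 1)
    rows = range(min(int(row1), int(row2)), max(int(row1), int(row2)) + 1)
    return {f"{chr(c)}{r}" for c in cols for r in rows}


# Colon (range operator): every cell from B5 through B15.
range_cells = expand_range("B5:B15")

# Comma (union operator): the cells of both ranges together.
union_cells = expand_range("B5:B15") | expand_range("D5:D15")

# Space (intersection operator): only the cells common to both ranges.
common_cells = expand_range("B7:D7") & expand_range("C6:C8")

print(len(range_cells))      # 11
print(len(union_cells))      # 22
print(sorted(common_cells))  # ['C7']
```

The intersection example matches the table: row 7 (B7:D7) and column C (C6:C8) share exactly one cell, C7.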
Operator precedence

If you combine several operators in a single formula, Excel performs the operations in the order shown in the following table. If a formula contains operators with the same precedence (for example, both a multiplication and a division operator), Excel evaluates the operators from left to right.

Operator               Description
: (colon)              Reference operators
  (single space)
, (comma)
-                      Negation (as in -1)
%                      Percent
^                      Exponentiation
* and /                Multiplication and division
+ and -                Addition and subtraction
&                      Connects two strings of text (concatenation)
= < > <= >= <>         Comparison

Use of parentheses

To change the order of evaluation, enclose in parentheses the part of the formula to be calculated first. For example, the following formula produces 11 because Excel calculates multiplication before addition. The formula multiplies 2 by 3 and then adds 5 to the result.

=5+2*3

In contrast, if you use parentheses to change the syntax, Excel adds 5 and 2 together and then multiplies the result by 3 to produce 21.

=(5+2)*3

In the example below, the parentheses around the first part of the formula force Excel to calculate B4+25 first and then divide the result by the sum of the values in cells D5, E5, and F5.

=(B4+25)/SUM(D5:F5)

Create a formula

Formulas are equations that perform calculations on values in your worksheet. A formula starts with an equal sign (=).

Create a simple formula

The following formulas contain operators and constants.

Example formula    What it does
=128+345           Adds 128 and 345
=5^2               Squares 5

1. Click the cell in which you want to enter the formula.
2. Type = (an equal sign).
3. Enter the formula.
4. Press ENTER.

Create a formula that contains references or names: =A1+23

The following formulas contain relative references to and names of other cells. The cell that contains the formula is known as a dependent cell when its value depends on the values in other cells.
For example, cell B2 is a dependent cell if it contains the formula =C2.

Example formula     What it does
=C2                 Uses the value in the cell C2
=Sheet2!B2          Uses the value in cell B2 on Sheet2
=Asset-Liability    Subtracts a cell named Liability from a cell named Asset

1. Click the cell in which you want to enter the formula.
2. In the formula bar (the bar at the top of the Excel window in which you enter or edit values or formulas in cells or charts; it displays the constant value or formula stored in the active cell), type = (equal sign).
3. Do one of the following:
   o To create a reference, select a cell, a range of cells, a location in another worksheet, or a location in another workbook. You can drag the border of the cell selection to move the selection, or drag the corner of the border to expand the selection.
   o To create a reference to a named range, press F3, select the name in the Paste name box, and click OK.
4. Press ENTER.

Create a formula that contains a function: =AVERAGE(A1:B4)

The following formulas contain functions.

Example formula     What it does
=SUM(A:A)           Adds all numbers in column A
=AVERAGE(A1:B4)     Averages all numbers in the range

1. Click the cell in which you want to enter the formula.
2. To start the formula with the function, click Insert Function on the formula bar.
3. Select the function you want to use. You can enter a question that describes what you want to do in the Search for a function box (for example, "add numbers" returns the SUM function), or browse the categories in the Or select a category box.
4. Enter the arguments. To enter cell references as an argument, click Collapse Dialog to temporarily hide the dialog box. Select the cells on the worksheet, and then press Expand Dialog.
5. When you complete the formula, press ENTER.

IF

Returns one value if a condition you specify evaluates to TRUE and another value if it evaluates to FALSE. Use IF to conduct conditional tests on values and formulas.
Syntax: IF(logical_test,value_if_true,value_if_false)

logical_test is any value or expression that can be evaluated to TRUE or FALSE. For example, A10=100 is a logical expression; if the value in cell A10 is equal to 100, the expression evaluates to TRUE. Otherwise, the expression evaluates to FALSE. This argument can use any comparison operator.
value_if_true is the value that is returned if logical_test is TRUE.
value_if_false is the value that is returned if logical_test is FALSE.

Remarks

Up to seven IF functions can be nested as value_if_true and value_if_false arguments to construct more elaborate tests.
When the value_if_true and value_if_false arguments are evaluated, IF returns the value returned by those statements.
If any of the arguments to IF are arrays, every element of the array is evaluated when the IF statement is carried out.
Microsoft Excel provides additional functions that can be used to analyze your data based on a condition. For example, to count the number of occurrences of a string of text or a number within a range of cells, use the COUNTIF worksheet function. To calculate a sum based on a string of text or a number within a range, use the SUMIF worksheet function.

Example

The example may be easier to understand if you copy it to a blank worksheet.
1. Create a blank workbook or worksheet.
2. Enter the data from the following table, starting in cell A1 (if you copied the table, select cell A1 and press CTRL+V).
3. To switch between viewing the results and viewing the formulas that return the results, press CTRL+` (grave accent), or on the Tools menu, point to Formula Auditing, and then click Formula Auditing Mode.
     A                          B
1    Cholesterol blood level    Normal cholesterol blood level
2    250                        220
3    340
4    500

Formula                          Description (Result)
=IF(A2>$B$2,"ill","not ill")     Checks whether the value in row 2 is above the normal cholesterol blood level in B2 (ill)
=IF(A3>$B$2,"ill","not ill")     Checks whether the value in row 3 is above the normal cholesterol blood level in B2 (ill)

About charts

Charts are visually appealing and make it easy for users to see comparisons, patterns, and trends in data. For instance, rather than having to analyze several columns of worksheet numbers, you can see at a glance whether sales are falling or rising over quarterly periods, or how the actual sales compare to the projected sales.

Creating charts

You can create a chart on its own sheet or as an embedded object on a worksheet. To create a chart, you must first enter the data for the chart on the worksheet. Then select that data and use the Chart Wizard to step through the process of choosing the chart type and the various chart options, or use the Chart toolbar to create a basic chart that you can format later.

Create a chart

1. Make sure the data on your worksheet is arranged properly for the type of chart you want to use.
   For a Column, Bar, Line, Area, Surface, or Radar chart: Arrange your data in columns or in rows.
   For a Pie or Doughnut chart: Regular pie charts have only one series of data, so you should use only one column or one row of data.
   For an XY scatter or Bubble chart: Arrange your data in columns, with x values in the first column and corresponding y values and/or bubble size values in adjacent columns.
2. Do one of the following:
   Customize your chart as you create it.
   1. Select the cells that contain the data you want to use for your chart.
   2. Click the Chart Wizard button.
   3. Follow the instructions in the Chart Wizard.
   Create a basic chart that you can customize later.
   4. Display the Chart toolbar. To show the Chart toolbar, point to Toolbars on the View menu and then click Chart.
   5.
Select the cells that contain the data you want to use for your chart.
   6. Click Chart Type.

Add a trendline to a chart

1. Click the data series (related data points that are plotted in a chart; each data series in a chart has a unique color or pattern and is represented in the chart legend) to which you want to add a trendline (a graphic representation of trends in a data series; trendlines are used for the study of problems of prediction, also called regression analysis).
2. On the Chart menu, click Add Trendline.
3. On the Type tab, click the type of regression trendline or moving average you want.
   o If you select Polynomial, enter in the Order box the highest power for the independent variable.
   o If you select Moving Average, enter in the Period box the number of periods to be used to calculate the moving average.

Equations for calculating trendlines

Linear: Calculates the least squares fit for a line represented by the following equation:
$y = mx + b$
where m is the slope and b is the intercept.

Polynomial: Calculates the least squares fit through points by using the following equation:
$y = b + c_1 x + c_2 x^2 + c_3 x^3 + \dots + c_6 x^6$
where $b$ and $c_1, \dots, c_6$ are constants.

Logarithmic: Calculates the least squares fit through points by using the following equation:
$y = c \ln x + b$
where c and b are constants and ln is the natural logarithm function.

Exponential: Calculates the least squares fit through points by using the following equation:
$y = c e^{bx}$
where c and b are constants, and e is the base of the natural logarithm.

Display the R-squared value for a trendline

1. Click the trendline for which you want to display the R-squared value (an indicator from 0 to 1 that reveals how closely the estimated values for the trendline correspond to your actual data; a trendline is most reliable when its R-squared value is at or near 1; also known as the coefficient of determination).
2. On the Format menu, click Selected Trendline.
3. On the Options tab, select Display R-squared value on chart.
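To make the linear trendline and its R-squared value less of a black box, the least squares fit can be computed by hand. The Python sketch below uses the standard textbook formulas for the slope, intercept, and coefficient of determination; it is an illustration, not a statement of how Excel implements the trendline. The function name linear_trendline and the sample data are hypothetical.

```python
def linear_trendline(xs, ys):
    """Return (m, b, r_squared) for the least squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx              # slope
    b = mean_y - m * mean_x    # intercept
    # R-squared = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return m, b, 1 - ss_res / ss_tot


# Made-up sample data, for illustration only.
x_values = [1, 2, 3, 4, 5]
y_values = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept, r_squared = linear_trendline(x_values, y_values)
# Prints approximately: y = 1.99x + 0.05, R^2 = 0.9973
print(f"y = {slope:.2f}x + {intercept:.2f}, R^2 = {r_squared:.4f}")
```

An R-squared value this close to 1 means the fitted line follows the sample points closely. Under the same assumptions, a logarithmic trendline of the form y = c ln x + b could be sketched by passing the natural logarithms of the x values to the same function.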
Sort a list

Sort rows in ascending order (A to Z, or 0 to 9) or descending order (Z to A, or 9 to 0)

1. Click a cell in the column you would like to sort by.
2. Click Sort Ascending or Sort Descending.

Sort rows by 2 or more columns

For best results, the list you sort should have column labels.
1. Click a cell in the list you want to sort.
2. On the Data menu, click Sort.
3. In the Sort by and Then by boxes, click the columns you want to sort.
4. Select any other sort options you want, and then click OK.

Exercises

Problem_1

1. Create a new Excel document with the following data:
2. Compute the cost of hospitalization by using the following formula: =B6*$G$3. Every formula begins with an equal sign (=). You can enter values and references directly in a formula, for example by typing =B6*$G$3. Press ENTER to see the value resulting from the formula. To avoid typing the same formula repeatedly, select the cell that contains the first formula and drag the fill handle down.
3. Create a Pie chart that shows the sex distribution in the study and a Scatter chart of age against duration of hospital stay. You can insert a chart in a worksheet from the Insert menu by clicking Chart, or by clicking the Chart Wizard button. First select the range of worksheet data you want to include in the chart. Then click the Chart Wizard button. Follow the instructions in the wizard to specify the chart type and options you want. The wizard offers you the option of creating a chart on the worksheet or creating a separate chart sheet in the workbook. If you create a chart on the worksheet, you can reposition and resize it. For the Pie chart, you must first compute the number of males and the number of females. On the Insert menu, click Function, and then choose the COUNTIF function. For the range, select the data in the sex column; for the criteria, select a cell that contains the code corresponding to that sex (for example, cell D5 for female and D7 for male).
4.
To the scatter chart, add a trendline: select a point on the chart, right-click, and choose Add Trendline. On the Type tab, select Linear; on the Options tab, select Display equation on chart (to obtain the equation of the line) and Display R-squared value on chart (to obtain the coefficient of determination).
5. Select the entire data range and sort it: Data – Sort – Sort by: Column C, descending; Then by: Column B, descending; Then by: Column G, descending.
6. Save this file in the folder that bears your name: File – Save As; in the dialog box, choose Save in: FirstName.SecondName, File name: Excel_1, and click Save.
7. Close the application and do not forget to log off: Start – Shut Down; in the dialog box that asks "What do you want the computer to do?", click Log off FirstName.SecondName.

Problem_2

1. Create a new Microsoft Excel document (Start – Programs – Microsoft Excel) and enter the following data: Table 2.12 comes from a paper giving the distribution of astigmatism in 1033 young men, aged 18-22, who were accepted for military service in Great Britain. Assume that astigmatism is rounded to the nearest 10th of a diopter.
   Table 2.12 Distribution of astigmatism in 1033 young men aged 18-22
2. Compute the grouped arithmetic mean (average): on the Insert menu, choose Function and then AVERAGE (for the arithmetic mean).
3. Compute the grouped standard deviation: Insert – Function – STDEV (standard deviation).
4. Plot a histogram to properly illustrate these data: select the data in the two columns and, on the Insert menu, choose Chart – Column. The chart should look like this:
   [Figure: Histogram (column chart) of the number of men in each astigmatism class: 0.0-0.1, 0.2-0.3, 0.4-0.5, 0.6-1.0, 1.1-2.0, 2.1-3.0, 3.1-4.0, 4.1-5.0, 5.1-6.0 diopters; vertical axis from 0 to 500.]
5. Save the document in your folder as Excel_2.

Problem_3

1. Create a new Microsoft Excel document (Start – Programs – Microsoft Excel) and enter the following data: Table 1.1. Serum-cholesterol levels before and after adopting a vegetarian diet.
2.
Compute the arithmetic mean (Insert – Function – AVERAGE), the maximum (MAX), and the minimum (MIN) of the cholesterol levels before and after adopting the vegetarian diet, and of the differences.
3. Create a scatter chart that represents the link between age and serum cholesterol level before adopting the vegetarian diet. Add a trendline and display the equation and the R-squared value on the chart.
4. Save the document in your folder as Excel_3. Minimize the Excel window and open a new Word document. Copy into it all the data that you created in Excel. Save the Word document as Word Excel.
5. Close all the applications and do not forget to log off: Start – Shut Down – Log off.

Problem_4

1. Create a new Excel document (Start – Programs – Microsoft Excel) and make a file with the following data: In the Patient status column, enter the following formula: =IF(B2<$D$1,"No","Yes").
2. Sort the data by the Patient status column: first select the data and then, on the Data menu, choose Sort – Sort by: Patient status, descending. The data should now look as below:
3. To set up the tools for descriptive statistics, on the Tools menu, choose Add-Ins and then select Analysis ToolPak and Analysis ToolPak-VBA:
4. Compute the correlation between Blood level of sugar and Urinary level of sugar: Tools – Data Analysis, and choose the same options as below:
5. Compute the covariance between Blood level of sugar and Urinary level of sugar: Tools – Data Analysis, and choose the same options as below:
6. Calculate the parameters of descriptive statistics: Tools – Data Analysis – Descriptive Statistics, and choose the same options as below:
7. Save the document in your folder as Excel_4.
8. Close the application and do not forget to log off: Start – Shut Down; in the dialog box that asks "What do you want the computer to do?", click Log off FirstName.SecondName.

Microsoft PowerPoint

About Microsoft PowerPoint

Any time you communicate with a group of people, you are giving a presentation.
You can communicate information better and more easily with a PowerPoint presentation, a series of slides that you create by using PowerPoint.

About creating presentations

Creating a presentation in Microsoft PowerPoint involves starting with a basic design; adding new slides and content; choosing layouts (the arrangement of elements such as title and subtitle text, lists, pictures, tables, charts, AutoShapes, and movies on a slide); modifying slide design, if you want, by changing the color scheme or applying different design templates (a design template is a file that contains the styles in a presentation, including the type and size of bullets and fonts, background design and fill, color schemes, and a slide master); and creating effects such as animated slide transitions. The information below focuses on the options available to you when you start the process.

The New Presentation task pane in PowerPoint gives you a range of ways to start creating a presentation. These include:

Blank: Start with slides that have minimal design and no color applied to them.
Existing presentation: Base your new presentation on one that you have already written and designed. This command creates a copy of an existing presentation so you can make the design or content changes you want for the new presentation.
Design template: Base your presentation on a PowerPoint template that already has a design concept, fonts, and color scheme. In addition to the templates that come with PowerPoint, you can use one you created yourself.
Templates with suggested content: Use the AutoContent Wizard to apply a design template that includes suggestions for text on your slides. You then type the text that you want.
A template on a Web site: Create a presentation using a template located on a Web site.

About PowerPoint views

Microsoft PowerPoint has three main views: normal view, slide sorter view, and slide show view.

Normal view

Normal view is the main editing view, which you use to write and design your presentation.
The view has three working areas: on the left, tabs that alternate between an outline of your slide text (Outline tab) and your slides displayed as thumbnails (Slides tab); on the right, the slide pane, which displays a large view of the current slide; and on the bottom, the note pane. Outline tab: Showing your slide text in outline form, this area is a great place to start writing your content — to capture your ideas, plan how you want to present them, and move slides and text around. Slides tab: Switch to this tab to see the slides in your presentation as thumbnail-sized images while you edit. The thumbnails make it easy for you to navigate through your presentation and to see the effects of your design changes. You can also rearrange, add, or delete slides. Slide pane: With the current slide shown in this large view, you can add text, insert pictures, tables, charts, drawing objects, text boxes, movies, sounds, hyperlinks, and animations. Notes pane: Add notes that relate to each slide's content, and use them in printed form to refer to as you give your presentation, or create notes that you want your audience to see either in printed form or on a Web page. The Outline and Slides tabs change to display an icon when the pane becomes narrow, and if you only want to see the current slide in the window as you edit, you can close the tabs with a Close box in the right corner. Slide sorter view Slide sorter view is an exclusive view of your slides in thumbnail form. When you are finished creating and editing your presentation, slide sorter gives you an overall picture of it — making it easy to reorder, add, or delete slides and preview your transition and animation effects. Slide show view Slide show view takes up the full computer screen, like an actual slide show presentation. In this full-screen view, you see your presentation the way your audience will. 
You can see how your graphics, timings, movies, animated elements, and transition effects will look in the actual show. Create a presentation using a design template 1. If the New Presentation task pane isn't displayed, on the File menu, click New. 2. Under New, click From Design Template. 3. In the Slide Design task pane, click a design template that you'd like to apply. 4. If you want to keep the default title layout for the first slide, go to step 5. If you want a different layout for the first slide, on the Format menu, click Slide Layout, and then click the layout you want. 5. On the slide or on the Outline tab, type the text for the first slide. 6. To insert a new slide, on the toolbar, click New Slide , and click the layout you want for the slide. 7. Repeat steps 5 and 6 to keep adding slides, and add any other design elements or effects you want. 8. To save the presentation, on the File menu, click Save; in the File name box type a name for the presentation, and then click Save. Duplicate slides within a presentation Duplicated slides are inserted directly below the slides you have selected. 1. On the Outline tab or Slides tab in normal view, select the slides you want to duplicate. (If you want to select slides in order, press SHIFT as you click; for slides not in order, press CTRL as you click.) 2. On the Insert menu, click Duplicate Slide. Change slide order Do one of the following: On the Outline tab in normal view, select one or more slide icons and then drag the selection to a new location. On the Slides tab in normal view, select one or more slide thumbnails, and then drag the selection to a new location. In slide sorter view, select one or more slide thumbnails, and then drag the thumbnails to a new location. To select multiple slides in a row, press SHIFT before clicking the slide icon or thumbnail. Delete a slide 1. On the Outline tab or Slides tab in normal view, select the slides you want to delete. 
(If you want to select slides in order, press SHIFT as you click; for slides not in order, press CTRL as you click.) 2. On the Edit menu, click Delete Slide. About adding text to a slide There are four types of text you can add to a slide: placeholder text, text in an AutoShape, text in a text box, and WordArt text. The text you type into placeholders, such as titles and bulleted lists, can be edited on the slide or on the Outline tab, and it can be exported from the Outline tab to Microsoft Word. Text in an object, such as a text box or AutoShape, and WordArt text do not appear on the Outline tab and must be edited on the slide. Placeholders: Slide layouts contain text and object placeholders in a variety of combinations. In the text placeholders, type titles, subtitles, and body text onto your slides. You can resize and move placeholders and format them with borders and colors. AutoShapes: AutoShapes such as callout balloons and block arrows lend themselves to text messages. When you type text into an AutoShape, the text is attached to the shape and moves or rotates with the shape. Text boxes: Use text boxes to place text anywhere on a slide, such as outside a text placeholder. For example, you can add a caption to a picture by creating a text box and positioning it near the picture. Also, a text box is handy if you want to add text to an AutoShape, but you don't want the text to attach to the shape. A text box can have a border, fill, shadow, or three-dimensional (3-D) effect, and you can change its shape. WordArt: Use WordArt for fancy text effects. WordArt can stretch, skew, curve, and rotate your text or make it 3-D or vertical. Add a picture 1. Click where you want to insert the picture. 2. Insert a picture from a file: On the Drawing toolbar click Insert Picture From File. Locate the folder that contains the picture that you want to insert, and then click the picture file. 
Do one of the following:
To embed the picture, click Insert. (An embedded object is information that is contained in a source file and inserted into a destination file. Once embedded, the object becomes part of the destination file. Changes you make to the embedded object are reflected in the destination file.)
To link the picture to the picture file on your hard disk, click the arrow next to Insert, and then click Link to File. (A linked object is an object that is created in a source file and inserted into a destination file, while maintaining a connection between the two files.)

To animate your presentation

You can animate text, graphics, diagrams, charts, and other objects on your slides so that you can focus on important points, control the flow of information, and add interest to your presentation.

Custom Animation

Custom animations can be applied to items on a slide, in a placeholder, or to a paragraph (which includes single bullets or list items). For instance, you can apply the fly-in animation to all items on a slide or you can apply it to a single paragraph in a bulleted list. Use entrance, emphasis, or exit options, in addition to preset or custom motion paths. You can also apply more than one animation to an item, so you can make a bullet item fly in and then fly out. Most animation options include associated effects to choose from. These might include options for playing a sound with the animation, and text animations usually let you apply the effect by letter, word, or paragraph (such as having a title fly in a word at a time instead of all at once). You can preview the animation of your text and objects for one slide or for the whole presentation.

Animate the objects on a slide: on the Slide Show menu, click Custom Animation. Then select the object on the slide that you want to animate and choose an effect from Add Effect. Repeat these steps for every object you want to animate.
Animate the whole presentation: on the Slide Show menu, click Slide Transition.
Then choose a type of transition; under Modify transition, choose the speed of the transition; under Advance slide, select On mouse click; and then click Apply to All Slides. To view your presentation, on the View menu, click Slide Show.

Start a slide show presentation

Do one of the following:

Start a slide show from within Microsoft PowerPoint
1. Open the presentation you want to view as a slide show.
2. Do one of the following:
   o Click Slide Show at the lower left of the PowerPoint window.
   o On the Slide Show menu, click View Show.
   o Press F5.

Start a slide show from your desktop
1. In My Computer or Microsoft Windows Explorer, locate the presentation file you want to open as a slide show.
2. Right-click the file name, and then click Show.

Start a PowerPoint Show (.pps)
Use this procedure to open and play a slide show you have saved as a PowerPoint Show (.pps).
1. In My Computer or Microsoft Windows Explorer, locate the PowerPoint Show file you want to open.
2. Double-click the file name to open it.

Navigate between slides during a presentation

Use the following commands in slide show view. For each type of navigation, choose from several methods.

Go to the next slide: Click the mouse. Press SPACEBAR or ENTER. Right-click, and on the shortcut menu, click Next.
Go to the previous slide: Press BACKSPACE. Right-click, and on the shortcut menu, click Previous.
Go to a specific slide: Type the slide number, and then press ENTER. Right-click, point to Go on the shortcut menu, then point to By Title, and click the slide you want.
See the previously viewed slide: Right-click, point to Go on the shortcut menu, and then click Previously Viewed.

About hyperlinks and action buttons

In Microsoft PowerPoint, a hyperlink is a connection from a slide to another slide, a custom show, a Web page, or a file. The hyperlink itself can be text or an object such as a picture, graph, shape, or WordArt. An action button is a ready-made button that you can insert into your presentation and define hyperlinks for.
If the link is to another slide, the destination slide is displayed in the PowerPoint presentation. In PowerPoint, hyperlinks become active when you run your presentation, not when you are creating it. When you point to a hyperlink, the pointer becomes a hand, indicating that it is something you can click. Text that represents a hyperlink is displayed underlined and in a color that coordinates with your color scheme. Pictures, shapes and other object hyperlinks have no additional formatting. You can add action settings, such as sound or highlighting, to emphasize hyperlinks. About printing You can print your entire presentation — the slides, outline, notes pages, and audience handouts — in color, grayscale, or pure black and white. You can also print specific slides, handouts, notes pages, or outline pages. Color, black and white, or grayscale Most presentations are designed to be shown in color, but slides and handouts are usually printed in black and white or shades of gray (grayscale). When you choose to print, Microsoft PowerPoint sets the colors in your presentation to match your selected printer's capabilities. For example, if your selected printer is black and white, your presentation will automatically be set to print in grayscale. With print preview, you can see how your slides, notes, and handouts will look in pure black and white or in grayscale, and adjust the look of objects before you print. You can also make certain changes when you preview before printing. You can select: What you want to print: the presentation, handouts, notes pages, or just the outline. A layout for handouts. To add a frame around each slide for print out only. Orientation (portrait or landscape) for handouts, notes pages, or an outline. Header and footer options. Slides and transparencies You might choose to print only the slides and use them as handouts. Slides print one per page and can be sized to fit a variety of paper sizes. 
Slides can also be sized to fit standard transparencies (for overhead projectors) or 35mm slides, or you can customize the fit and orientation.

Outline, notes pages and handouts

Outline: You can choose to print all the text in your outline or just the slide titles, in either landscape (horizontal) or portrait (vertical) orientation. The printout might look different from the screen display; while you can show or hide formatting (such as bold or italic) in the Outline pane on screen, on the printout the formatting will always appear.

Notes pages: Print your notes pages either for your own use when delivering a presentation or to include as handouts for your audience. Notes pages can be designed and formatted with colors, shapes, charts, and layout options. Each notes page includes a copy of the slide it refers to and prints one slide per page, with the notes printed under the slide image. To print two slides per page with the associated notes printed next to the slides, you can send the presentation to Microsoft Word. Headers and footers on the notes pages are separate from the headers and footers on the slides.

Handouts: You can design and create handouts similarly to the notes pages. However, you can choose from many layout options for printing: from 1 slide per page to 9 slides per page. The 3-slides-per-page option includes lined space for note-taking by the audience. For additional layout options, you can send the presentation to Microsoft Word. Headers and footers on handouts are separate from the headers and footers on the slides.

Problems

Problem_1

1. Create a PowerPoint presentation about how a physician can use a computer in everyday work (describe the hardware and software components and the principal programs, for example: Word for documents and letters; Excel for organizing, calculating, and analyzing data; PowerPoint for presentations; Access for keeping the data about a subject in one place). Animate your presentation.
Save your presentation in your folder as First Presentation.
2. Close all programs and, from Shut Down, choose Log off your name.

Problem_2
1. Create a PowerPoint presentation about the informatics lab. The presentation must have a title slide, a slide about Word documents (when you can use a Word document), a slide about Microsoft Excel (when you can use an Excel document), and an end slide.
2. The Word and Excel slides must have:
o Action buttons that hyperlink to your Word lab and to your Excel lab.
o A title in WordArt format (see the Drawing toolbar).
o Some pictures and some AutoShapes (see the Drawing toolbar).
3. Save your presentation as Second presentation in your folder.
4. Close all programs and, from Shut Down, choose Log off your name.

Microsoft Access

About Microsoft Access

Designing a database

Good database design ensures that your database is easy to maintain. You store data in tables, and each table contains data about only one subject, such as customers. Therefore, you update a particular piece of data, such as an address, in just one place, and that change automatically appears throughout the database. A well-designed database usually contains different types of queries that show the information you need. A query might show a subset of data, such as all customers in London, or combinations of data from different tables, such as order information combined with customer information. Before you use Microsoft Access to actually build tables, queries, forms, and other objects, it's a good idea to sketch out and rework your design on paper first. You can also examine well-designed databases similar to the one you are designing. Follow these basic steps when designing your database.

1. Determine the purpose of your database
The first step in designing a database is to determine its purpose and how it's to be used:
o Talk to people who will use the database.
o Brainstorm about the questions you and they would like the database to answer.
o Sketch out the reports you'd like the database to produce.
o Gather the forms you currently use to record your data.

2. Determine the fields you need in the database
Each field is a fact about a particular subject. For example, you might need to store the following facts about your patients: id_number, name, address, city, date of birth, and phone number. You need to create a separate field for each of these facts. When determining which fields you need, keep these design principles in mind:
o Include all of the information you will need.
o Store information in the smallest logical parts. For example, patient names are often split into two fields, FirstName and SecondName, so that it's easy to sort data by SecondName.
o Don't create fields for data that consists of lists of multiple items.
o Don't include derived or calculated data (data that is the result of an expression).
o Don't create fields that are similar to each other. For example, in a Patients table, if you create both an age field and a date of birth field, the same fact is stored in two places and it will be more difficult to find the information about a patient's age.

3. Determine the tables you need in the database
Each table should contain information about one subject. Your list of fields will provide clues to the tables you need.

4. Determine which table each field belongs to
When you decide which table each field belongs to, keep these design principles in mind:
o Add the field to only one table.
o Don't add the field to a table if it will result in the same information appearing in multiple records in that table. If you determine that a field in a table will contain a lot of duplicate information, that field is probably in the wrong table.
When each piece of information is stored only once, you update it in one place. This is more efficient, and it also eliminates the possibility of duplicate entries that contain different information.

5.
Identify the field or fields with unique values in each record
In order for Microsoft Access to connect information stored in separate tables (for example, to connect a customer with all the customer's orders), each table in your database must include a field or set of fields that uniquely identifies each individual record in the table. Such a field or set of fields is called a primary key.

6. Determine the relationships between tables
Now that you've divided your information into tables and identified primary key fields, you need a way to tell Microsoft Access how to bring related information back together again in meaningful ways. To do this, you define relationships between tables.

7. Refine your design
After you have designed the tables, fields, and relationships you need, it is time to study the design and detect any flaws that might remain. It is easier to change your database design now than it will be after you have filled the tables with data.

8. Enter data and create other database objects
When you are satisfied that the table structures meet the design principles described here, it's time to go ahead and add all your existing data to the tables. You can then create other database objects: queries, forms, reports, data access pages, macros, and modules.

Identify the field or fields with unique values in each record
In order for Microsoft Access to connect information stored in separate tables (for example, to connect a patient with all of his or her consultations), each table in your database must include a field or set of fields that uniquely identifies each individual record in the table. Such a field or set of fields is called a primary key.

Determine the relationships between tables
Now that you've divided your information into tables and identified primary key fields, you need a way to tell Microsoft Access how to bring related information back together again in meaningful ways. A primary key is one or more fields (columns) whose value or values uniquely identify each record in a table. A primary key cannot allow Null values and must always have a unique index. It is used to relate a table to foreign keys in other tables. To bring the information together, you define relationships between tables. A relationship is an association established between common fields (columns) in two tables; it can be one-to-one, one-to-many, or many-to-many.

Create an Access database

Microsoft Access provides two methods to create an Access database. You can use a Database Wizard to create, in one operation, the required tables, forms, and reports for the type of database you choose; this is the easiest way to start creating your database. Or you can create a blank database and then add the tables, forms, reports, and other objects later; this is the most flexible method, but it requires you to define each database element separately. Either way, you can modify and extend your database at any time after it has been created.

Create a Blank Database
1. On the File menu, click New.
2. In the New File task pane, under New, click Blank Database.
3. In the File New Database dialog box, specify a name and location for the database, and then click Create.

Open an Access database
1. On the File menu, click Open.
2. Click a shortcut in the left side of the Open dialog box, or in the Look in box, click the drive or folder that contains the Microsoft Access database that you want.
3. In the folder list, double-click folders until you open the folder that contains the database.
4. Double-click the database.

Create a table in Design View
1. Click Tables under Objects.
2. Double-click Create table in Design view.
3. Define each of the fields in your table.
4. Define a primary key field before saving your table:
o Open a table in Design view.
o Select the field or fields you want to define as the primary key. To select one field, click the row selector for the desired field. To select multiple fields, hold down the CTRL key and then click the row selector for each field.
o Click Primary Key on the toolbar.
You do not have to define a primary key, but it is usually a good idea. If you do not define a primary key, Microsoft Access asks if you want Access to create one for you when you save the table.
5. When you are ready to save your table, click the Save button on the toolbar, and then type a unique name for the table.

About primary keys

The power of a relational database system such as Microsoft Access comes from its ability to quickly find and bring together information stored in separate tables using queries, forms, and reports. In order to do this, each table should include a field or set of fields that uniquely identifies each record stored in the table. This information is called the primary key of the table. Once you designate a primary key for a table, Access will prevent any duplicate or Null values from being entered in the primary key fields. There are three kinds of primary keys that can be defined in Microsoft Access:

1.
AutoNumber primary keys
An AutoNumber field can be set to automatically enter a sequential number as each record is added to the table. Designating such a field as the primary key for a table is the simplest way to create a primary key. If you don't set a primary key before saving a newly created table, Microsoft Access will ask if you want it to create a primary key for you. If you answer Yes, Microsoft Access will create an AutoNumber primary key.

2. Single-field primary keys
If you have a field that contains unique values such as ID numbers or part numbers, you can designate that field as the primary key. You can specify a primary key for a field that already contains data as long as that field does not contain duplicate values or Null values.

3. Multiple-field primary keys
In situations where you can't guarantee the uniqueness of any single field, you may be able to designate two or more fields as the primary key. The most common situation where this arises is in the table used to relate two other tables in a many-to-many relationship. For example, a Consultations table can relate to the Patients table. Its primary key consists of two fields: NumPac and Date_of_consultation.

About relationships in an Access database

After you've set up different tables for each subject in your Microsoft Access database, you need a way of telling Microsoft Access how to bring that information back together again. The first step in this process is to define relationships between your tables. After you've done that, you can create queries, forms, and reports to display information from several tables at once. For example, a form can include information from four tables.

A one-to-many relationship
A one-to-many relationship is the most common type of relationship. In a one-to-many relationship, a record in Table A can have many matching records in Table B, but a record in Table B has only one matching record in Table A.
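The one-to-many relationship and the multiple-field primary key described above can also be tried outside Access. The following is a minimal sketch using Python's built-in sqlite3 module rather than Access itself. The table names and the fields Id_Pacients, NumPac, and Date_of_consultation follow the Patients and Consultations example; the columns Name and Diagnosis and all sample rows are assumptions made up for the illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a one-to-many relationship."""
    con = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this setting is switched on.
    con.execute("PRAGMA foreign_keys = ON")
    con.execute("""
        CREATE TABLE Patients (
            Id_Pacients INTEGER PRIMARY KEY,  -- single-field primary key
            Name        TEXT NOT NULL         -- hypothetical column
        )""")
    con.execute("""
        CREATE TABLE Consultations (
            NumPac               INTEGER NOT NULL REFERENCES Patients(Id_Pacients),
            Date_of_consultation TEXT    NOT NULL,
            Diagnosis            TEXT,        -- hypothetical column
            PRIMARY KEY (NumPac, Date_of_consultation)  -- multiple-field key
        )""")
    # One patient (the "one" side) with two consultations (the "many" side).
    con.execute("INSERT INTO Patients VALUES (1, 'Patient A')")
    con.executemany(
        "INSERT INTO Consultations VALUES (?, ?, ?)",
        [(1, "2011-01-10", "check-up"), (1, "2011-03-02", "follow-up")],
    )
    return con


def consultations_for(con, patient_id):
    """Return the consultation dates stored for one patient."""
    rows = con.execute(
        "SELECT Date_of_consultation FROM Consultations "
        "WHERE NumPac = ? ORDER BY Date_of_consultation",
        (patient_id,),
    )
    return [row[0] for row in rows]


def try_insert(con, row):
    """Return True if the row is accepted, False if a key rule rejects it."""
    try:
        con.execute("INSERT INTO Consultations VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


con = build_demo_db()
print(consultations_for(con, 1))
# Same patient and date as an existing record: the primary key rejects it.
print(try_insert(con, (1, "2011-01-10", "duplicate key")))
# Patient 99 does not exist: the relationship (foreign key) rejects it.
print(try_insert(con, (99, "2011-01-10", "no such patient")))
```

The two rejected inserts show the same protections the text attributes to a primary key and a relationship: no duplicate key values, and no consultation without a matching patient.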
A many-to-many relationship
In a many-to-many relationship, a record in Table A can have many matching records in Table B, and a record in Table B can have many matching records in Table A. This type of relationship is only possible by defining a third table (called a junction table) whose primary key consists of two fields: the foreign keys from both Tables A and B. A many-to-many relationship is really two one-to-many relationships with a third table.

A one-to-one relationship
In a one-to-one relationship, each record in Table A can have only one matching record in Table B, and each record in Table B can have only one matching record in Table A. This type of relationship is not common, because most information related in this way would be in one table. You might use a one-to-one relationship to divide a table with many fields, to isolate part of a table for security reasons, or to store information that applies only to a subset of the main table.

About forms

A form is a type of database object that is primarily used to enter or display data in a database. You can also use a form as a switchboard that opens other forms and reports in the database, or as a custom dialog box that accepts user input and carries out an action based on the input.

Creating a form in Design view
You can customize a form in Design view in the following ways:
o Record source: Change the tables and queries that a form is based on.
o Controlling and assisting the user: You can set form properties to allow or prevent users from adding, deleting, or editing records displayed in a form. You can also add custom Help to a form to assist your users with using the form.
o Form window: You can add or remove Maximize and Minimize buttons, shortcut menus, and other Form window elements.
o Sections: You can add, remove, hide, or resize the header, footer, and details sections of a form. You can also set section properties to control the appearance and printing of a form.
o Controls: You can move, resize, or set the font properties of a control. You can also add controls to display calculated values, totals, current date and time, and other useful information on a form.

Add or edit data in a datasheet or form
1. Open a datasheet or open a form in Form view.
2. Do one of the following:
o To add a new record, click the New Record button on the toolbar, type the data, and then press TAB to go to the next field. At the end of the record, press TAB to go to the next record.
o To edit data within a field, click in the field you want to edit, and then type the data.
o To replace the entire value, move the pointer to the leftmost part of the field until it changes into the plus pointer, and then click. Type the data.
Notes: To correct a typing mistake, press BACKSPACE. To cancel your changes in the current field, press ESC. To cancel your changes in the entire record, press ESC again before you move out of the field.

About designing a query

When you open a query in Design view (a window that shows the design of these database objects: tables, queries, forms, reports, macros, and data access pages; in Design view, you can create new database objects and modify the design of existing ones), you can make the following changes to it.

Add or remove tables, queries and fields
You can add a table or query if the data you need isn't in the query, or remove a table or query if you decide you don't need it. Once you add the tables or queries you need, you can add the fields that you want to work with to the design grid, or remove them if you decide you don't need them.

Limit results by using criteria
You can limit the records that you see in the query's results, or the records that are included in a calculation, by specifying criteria.

About reports

A report is an effective way to present your data in a printed format. Because you have control over the size and appearance of everything on a report, you can display the information the way you want to see it.
Most reports are bound to one or more tables and queries in the database. A report's record source refers to the fields in the underlying tables and queries. A report need not contain all the fields from each of the tables or queries that it is based on. A bound report gets its data from its underlying record source. Other information on the report, such as the title, date, and page number, is stored in the report's design.

Creating a report
You can create different types of reports quickly by using wizards. Use the Label Wizard to create mailing labels, the Chart Wizard to create charts, or the Report Wizard to create a standard report. The wizard asks you questions and creates a report based on your answers.

Creating a report by using a wizard
You can customize a report in the following ways:
o Record source: Change the tables and queries that a report is based on.
o Sorting and grouping data: You can sort data in ascending or descending order. You can also group records on one or more fields, and display subtotals and grand totals on a report.
o Report window: You can add or remove Maximize and Minimize buttons, change the title bar text, and change other Report window elements.
o Sections: You can add, remove, hide, or resize the header, footer, and details sections of a report. You can also set section properties to control the appearance and printing of a report.
o Controls: You can move, resize, or set the font properties of a control. You can also add controls to display calculated values, totals, current date and time, and other useful information on a report.

Problems

Problem_1
1. Create a new Access database: Start - Programs - Microsoft Access. From the File menu, choose New - Microsoft Access Application. Name the database First Database.
2. Open the database file by double-clicking its icon and create two tables: from the Objects column, choose Tables, and then choose Create table in Design view.
The first table is named Patients and will have the following fields:
The second table is named Consultations and will have the following fields:
Primary key: The primary key for the Patients table will be the field named Id_Pacients; in the Consultations table, the primary key will consist of the fields NumPac and Date_of_consultation.
o Open a table in Design view.
o Select the field or fields you want to define as the primary key.
o To select one field, click the row selector for the desired field.
o To select multiple fields, hold down the CTRL key and then click the row selector for each field.
o Click Primary Key on the toolbar.
To create a relationship between the two tables:
o In Design view, open the table named Consultations.
o Click in the NumPac row and, in the Data Type column, click the arrow and select Lookup Wizard.
o In the first dialog box of the Lookup Wizard, select the option that indicates you want the Lookup field to look up the values in a table or query.
o Click Next and follow the directions in the remaining Lookup Wizard dialog boxes. When you click the Finish button, Microsoft Access creates a Lookup field whose properties are based on the choices you made in the wizard.
Enter 10 patients in your table. Every patient must have more than one consultation.
3. Close the application and don't forget to log off: Start - Shut Down; in the dialog box that asks "What do you want the computer to do?", click Log off FirstName.SecondName.

Problem_2
1. Create a new Access database: Start - Programs - Microsoft Access. From the File menu, choose New - Microsoft Access Application. Name the database Second Database.
2. Open the database file by double-clicking its icon and create two tables: from the Objects column, choose Tables, and then choose Create table in Design view.
The first table is named Patients and will have the following fields:
The second table is named Analysis and will have the following fields:
The primary key for the Patients table will be the field named Id_Pacients; in the Analysis table, the primary key will consist of the fields Patient and Date_of_analysis.
Create a relationship between the field Name from the Patients table and the field Patient from the Analysis table. To create the relationship, follow these steps:
o Open the table named Analysis in Design view.
o Click in the Patient row and, in the Data Type column, click the arrow and select Lookup Wizard.
o In the first dialog box of the Lookup Wizard, select the option that indicates you want the Lookup field to look up the values in a table or query.
o Click Next and follow the directions in the remaining Lookup Wizard dialog boxes. When you click the Finish button, Microsoft Access creates a Lookup field whose properties are based on the choices you made in the wizard.
Enter 10 patients in your table. Every patient must have more than one consultation.
3. Create a query to find which patients have a glycemia level of more than 110. To do so, follow these steps:
a. From the Access Objects, choose Queries.
b. Double-click Create query in Design view.
c. Click the Patients table and then click the Add button. Repeat these steps to add the Analysis table.
d. From the Patients table, choose the field Name; from the Analysis table, choose the fields Date_of_analysis and Glicemia. As the criterion for the Glicemia column, enter >110.
4. Create a report with all patients from the database.
a. From the Access Objects, choose Reports.
b. Double-click Create report by using wizard.
c. From the Patients table, use the Add button to choose the following fields: Name, Date of birth, and Sex; from the Analysis table, choose Date_of_analysis, Glicemia, and Cholesterol.
d. Follow the remaining steps to finish the report.
The name of the report must be Patients.
5. Close the application and don't forget to log off: Start - Shut Down; in the dialog box that asks "What do you want the computer to do?", click Log off FirstName.SecondName.

EpiInfo 2000

About EpiInfo 2000

The main programs of EpiInfo can be accessed either through the PROGRAMS menu or by clicking on the icon buttons. The buttons can be turned on or off with the BUTTONS item on the SETTINGS menu. EpiInfo 2000 has the following principal menus:
o Programs
o Examples
o Adv.Stats
o Language
o Settings
o Manual
o DoEpiTutorials

From the Programs menu we can open the following programs:

Make View: Designing a new form or questionnaire.

Enter Data: Entering data; opening an existing View and database; searching for particular records.

Analyze Data: READ a view or a data file or table; LIST the contents of the database; obtain the FREQuency of values for a field; cross-tabulate with the TABLES command and obtain the resulting epidemiologic statistics; define a new variable and assign a value; use an IF statement to determine and assign case status; SELECT a subset of records to process; RECODE values to group the AGE field; WRITE data to another file or table; READ a non-Access file and related tables in a view and analyze data from more than one table.

StatCalc: An epidemiological calculator that produces statistics from summary data entered on the screen. It offers three types of calculation: statistics from 2-by-2 to 2-by-9 tables; sample size calculation; chi-square for trend by the Mantel extension method.
Epi Map: Adding a shape file to create a map; adding data to be represented as a color density map; creating a map of cholera cases in John Snow's London using X and Y coordinates of case households.

Nutrition: Entering data from one child's measurements; interpreting results from calculations based on accepted reference standards; graphing more than one result to show a child's growth; customizing the data entry screen.

Visualize Data: To update the data.

Word Processing: Word processor (for improving the results).

EXIT: To close EpiInfo sessions.

MakeView Program

o Designing a new form or questionnaire (a View)
o Text and numeric fields
o Specifying a list of Legal values
o Inserting a grid, the automatic way to deal with repeating data within a questionnaire
o Large text (multiline) fields

To run MakeView, click on the Make View button on the main menu screen. You should see a blank page for constructing a "View." Questionnaires are called Views in EpiInfo 2000 because there can be more than one View of a database or data table. To make a view, choose Make New View from the FILE menu. Enter a name for your project database, such as your name or initials, and click OPEN. A project or database (.MDB for "Microsoft Database") file can hold as many Views and data tables as you wish (well, up to 1000, anyway).

Place the cursor near the upper left corner of the blank page and click the right mouse button. The field dialog box that appears offers options for entering the prompt, the field type and length, and a number of the characteristics that were previously implemented. For the first field, enter the prompt "First Name" and press Enter twice. This makes a text field that can hold up to 255 characters. For the next field, move the cursor to a suitable location and right-click with the mouse. Below First Name, right-click to add another field. Enter the prompt "Today's Date," and use the scroll bar to the right of the field types to see the rest of the list of types.
Choose the DATE type and the appropriate date format, MM-DD-YYYY or DD-MM-YYYY, in the dialog. Click OK. Add another field for "Date of Birth," using the same field type and pattern. Click OK. Right-click on the form to make a field for AGE. Type "Age" as the prompt. Choose NUMBER for the TYPE and then choose ### or ## from the PATTERN list. You can also type patterns into the pattern window. Click OK at the bottom of the dialog. The next field is "Sex." We will use it to illustrate how variable names are constructed. Right-click where you would like to place the field. Type "Male, Female, or Unknown Sex" in the prompt window, press Enter, and note what appears in the Field Name window on the right. If you need another page, click on the ADD PAGE button under the page window on the left side of the screen. The first page is saved automatically and a blank page appears.

Problem_1: Create a questionnaire with the following fields:

Name of field             Type of field
FirstName                 Text
SecondName                Text
Sex                       Yes/No (Yes = Male, No = Female)
DateOfBirth               Date
Address                   Text
Phone number              Number (9 from the Pattern list)
Profession                Text
DateOfConsultation        Date
Weight                    Number (### from the Pattern list)
Height                    Number (#.## from the Pattern list)
SystolicBloodPressure     Number (### from the Pattern list)
DiastolicBloodPressure    Number (### from the Pattern list)
Cholesterol               Number (### from the Pattern list)
Hypertension              Yes/No (Yes = ill, No = not ill)

The name of the questionnaire will be Example, and you will save this file in your personal folder.

Enter Program

o Entering data and verifying that your age calculation works
o Moving from page to page
o Opening an existing View and database
o Navigating from record to record
o Searching for particular records

You should have the EXAMPLE questionnaire on the screen. If not, go back to the main menu and choose ENTER DATA, then OPEN on the FILE menu, and then the database that you created and the EXAMPLE view.
A window appears; confirm with OK. Enter data in the fields displayed. To move from one field to another, press TAB or move the mouse pointer into the field where you want to enter data. When you finish with a patient, click on the NEW button to save the current record and move to a blank record.

Moving From Record to Record
Examine the records in the file by moving from record to record with the arrow buttons on the lower left. The double arrows move to the first and last records; single arrows move one record at a time. To move to a new record, click on the double right arrows twice.

Finding Records
To find records matching specified criteria, click on the FIND button on the left. A dialog box appears. Choose the SystolicBloodPressure field and then type ">160" (without quotation marks) in the field that appears. Click on the OK button to find all the records in which SystolicBloodPressure is greater than 160. If you want to continue with the current record, click on the BACK button.

Problem_2: Enter the data in the questionnaire EXAMPLE for 25 patients.

Problem_3: Using the Find button, find all the records in which Systolic blood pressure is greater than 140, Diastolic blood pressure is greater than 90, the Sex of the patient is "Yes", and the Hypertension field has the option "Yes".
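The FIND step described in this section (SystolicBloodPressure with the criterion >160) amounts to filtering records by a comparison. The short Python sketch below illustrates that idea only; it does not use EpiInfo, and the helper name find_records and the three sample records are hypothetical.

```python
import operator

# Comparison symbols as they might be typed in a search criterion,
# mapped to the Python functions that perform the comparison.
OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
}


def find_records(records, field, criterion):
    """Return the records whose `field` satisfies a criterion such as '>160'."""
    # Check two-character symbols first so that '>=' is not read as '>'.
    for symbol in sorted(OPERATORS, key=len, reverse=True):
        if criterion.startswith(symbol):
            threshold = float(criterion[len(symbol):])
            compare = OPERATORS[symbol]
            return [r for r in records if compare(r[field], threshold)]
    raise ValueError(f"unsupported criterion: {criterion!r}")


# Hypothetical records using field names from the EXAMPLE questionnaire.
records = [
    {"FirstName": "A", "SystolicBloodPressure": 170, "DiastolicBloodPressure": 95},
    {"FirstName": "B", "SystolicBloodPressure": 150, "DiastolicBloodPressure": 85},
    {"FirstName": "C", "SystolicBloodPressure": 165, "DiastolicBloodPressure": 90},
]

matches = find_records(records, "SystolicBloodPressure", ">160")
print([r["FirstName"] for r in matches])  # prints ['A', 'C']
```

Several criteria can be combined by applying find_records again to the list it returns, once per field.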
Analysis Program

o READ a view or a data file or table
o LIST the contents of the database
o Obtain the FREQuency of values for a field
o Cross-tabulate with the TABLES command and obtain the resulting epidemiologic statistics
o The library of previous output, all in HTML for the Internet
o Choose how "Yes" and "No" are displayed
o Define a new variable and assign a value
o Use an IF statement to determine and assign case status
o SELECT a subset of records to process
o RECODE values to group the AGE field
o WRITE data to another file or table
o READ a non-Access file
o READ related tables in a view and analyze data from more than one table

To run Analysis, click on the ANALYZE DATA button on the main menu screen. Note that all commands are shown in the tree view on the left. Clicking on a command will bring up a dialog that places the command in appropriate form in the program editor at the bottom of the screen. Results appear in the third window, above the program editor, which is a simplified version of the Microsoft Internet browser.

The most frequently used Analysis commands are:
o Data: Read (Import), Relate, Write (Export), Merge
o Select/If: Select, Cancel Select, If, Sort, Cancel Sort
o Statistics: List, Frequencies, Tables, Means, Graph, Map
o Advanced Statistics: Linear Regression, Logistic Regression, Kaplan-Meier Frequencies, Complex Sample Tables, Complex Sample Means
o Output: Header, Type, RouteOut, CloseOut, PrintOut, Storing Output

Selection of statistical methods depends on the type of data and the purpose of the analysis. EpiInfo provides a number of statistical methods within the commands FREQ, TABLES, MEANS, and REGRESS.
The commands used for statistical analysis in EpiInfo, the types of data to be analyzed, and the purposes of various analyses are as follows:

For all types of data
Purpose: see the data, record by record
Analysis command: LIST

For Text, Numbers or Dates (Categorical Data)
Purpose: determine the frequency of each value
Analysis command: FREQ
Statistical method(s): confidence intervals on proportions

For Text, Numbers or Dates (Categorical Data, 2 values per variable)
Purpose: evaluate association between variables; cross-tabulation; 2x2 tables
Analysis command: TABLES (Risk Factor)(Outcome)
Statistical method(s): odds ratio with confidence limits; risk ratio with confidence limits; chi-square; Fisher exact

For Text, Numbers or Dates (Categorical Data, 2 values per variable, with multiple values for a possible confounder or modifier used for stratification)
Purpose: assess confounding and interaction (effect modification)
Analysis command: TABLES (Risk Factor)(Outcome)(Stratifier(s))
Statistical method(s): Mantel-Haenszel odds ratio with confidence limits; summary risk ratio with confidence limits; Fisher exact

For Text, Numbers or Dates (Categorical Data, more than 2 values per variable)
Purpose: cross-tabulation RxC
Analysis command: TABLES (Risk Factor)(Outcome), optionally (Stratifier(s))
Statistical method(s): chi-square

For Numbers (Continuous Data, single variable)
Purpose: express a series of numbers as a single value
Analysis command: MEANS VariableName
Statistical method(s): sum, mean, median, mode, percentiles, standard deviation, variance

For Numbers (Continuous Data, single variable)
Purpose: test whether a series of differences (e.g. before and after) differs from zero
Analysis command: MEANS VariableName
Statistical method(s): t-test

For Numbers (Continuous Data, grouped by another, categorical, variable)
Purpose: see if group values differ
Analysis command: MEANS Number GroupVar
Statistical method(s): ANOVA (and Student's t-test if two groups); Bartlett's test for homogeneity of variances; Kruskal-Wallis test (Mann-Whitney/Wilcoxon if two groups)

For Numbers (Continuous Data, two numeric variables)
Purpose: measure correlation
Analysis command: REGRESS Outcome=Risk Factor
Statistical method(s): Pearson correlation coefficient; mean, beta coefficient, lower and upper confidence limits, standard error, F-test, Y-intercept

For Numbers (Continuous Data, more than two numeric variables)
Purpose: predict outcome from risk factors
Analysis command: REGRESS Outcome=Risk Factor(s)
Statistical method(s): Pearson correlation coefficient; mean, beta coefficient, lower and upper confidence limits, standard error, partial F-test, Y-intercept

READing a View in Analysis

READ makes one or more views the active dataset. It also removes any previously active datasets, the associated DEFINEd variables, and dataset-specific commands.

Syntax: READ <Table specification> LINKNAME= <LinkTable Name> FILESPEC HDR="NO" FMT="<File format>"

LINKNAME: The name of a link table in the current (home) MDB that constitutes a link to an external file or data table.
FILESPEC: Additional information necessary to process the data table, which depends upon the type of data being processed.

With EpiInfo we can read several kinds of files: Epi2000 Views, Access tables, desktop databases, Microsoft Excel, ODBC databases, text files, and HTML files. Click on the READ command. A dialog box appears so that you can choose a database and a view.

Problem_3: Read the questionnaire that you created.

LISTS

LIST does a line listing of the current dataset. If variable names are given, LIST will list only these variables.
LIST * will list all variables of all active records, using several pages across to accommodate all the variables, if necessary.
Click on the LIST command. In the dialog that appears, choose one or more variables or click on "ALL" to choose all. The variables are displayed in columns in a scrolling window. Click the button with the "X" in the upper right corner of the Grid to leave the Grid. Try the LIST command again, but choose HTML as the output format. This time the results appear as a web page displayed in the small browser included with EpiInfo 2000. This browser displays web pages on the local machine, but is not itself connected to the Internet.

Problem_4: List all the fields from the questionnaire EXAMPLE.

FREQUENCIES
FREQ produces a table from the table(s) specified in the last READ statement, showing how many records have each value of the variable. Confidence limits for each proportion are included.
Syntax: FREQ <Variable>
Choose the Frequencies command. In the dialog box, use the dropdown menu to select one or more variables, and then click OK. After a short wait, the results should appear in the browser window. Scroll up and down and note that each table is accompanied by yellow bars to the right that indicate the frequencies. Statistics are displayed below the table if the variable is numeric, as in Cholesterol, but not for Yes/No fields like HYPERTENSION.

Example_1: FREQ ILL

ILL      Frequency   Percent   Cum Percent
Yes      46          61.3%     61.3%
No       29          38.7%     100.0%
Total    75          100.0%    100.0%

95% Conf Limits
Yes      49.4%       72.4%
No       27.6%       50.6%

In the table:
ILL shows the values of the variable ILL. The representation of Yes and No is determined by a SET command under Options, but the underlying data values are always 0 for No and 1 for Yes.
Frequency is the number of records in the dataset having the indicated value. In the example, there are 46 people who said they were ill and 29 who were not ill.
Percent is the frequency of a value divided by the total (for Yes, 46/75 or 61.3%).
Cum. Percent is the running total of the Percent column.
Total shows the total number of observations in the table (in this example, 75) and the total percent, which is always 100.0%.
95% Confidence Limits are given for each proportion. If the 75 records had been chosen randomly from a large number of interviews, we would predict with 95% confidence (i.e. accepting that we would be wrong in 5 out of 100 tries) that the proportion of ill persons in the larger population lies between 49.4% and 72.4%.

Problem_5: Using the Frequencies command, find the frequencies of HYPERTENSION, using CHOLESTEROL as the weight variable.

TABLES
TABLES does a cross-tabulation of the specified variables and sends the table to the screen, printer, or other current output. Values of the first variable appear on the left margin of the table, and those of the second variable across the top. Normally, cells contain counts of records matching the values in the corresponding marginal labels.

Example_2: Yes/No or True/False Variables and 2x2 Tables
In epidemiology, 2x2 tables are frequently used. In these tables, there is usually an "exposure" variable that has two levels (e.g. exposed vs. not exposed) and a dichotomous (2-value) outcome variable (e.g. the person had the disease or outcome of interest or did not; see Table 1). With 2x2 tables, the odds ratio (OR), risk ratio (RR) and other parameters can be calculated. To get the correct estimates for these parameters in EpiInfo, the table must be set up as follows:

Table 1. Standard table setup and notation for EpiInfo (count data and unmatched data)

              Disease
Exposed    Yes    No     Total
Yes        a      b      n1
No         c      d      n0
Total      m1     m0     n

The "exposure" is the row variable, and the "disease" or "outcome" is the column variable. The cases with the exposure of interest should be in the first row (in this example "Yes") and those without the exposure in the second row.
For the "disease" or "outcome" variable, those with the outcome of interest should be in the first column (in this example "Yes") and those without the disease in the second column. Parameters, such as the risk ratio and odds ratio, are calculated regardless of whether the table is set up correctly or incorrectly. The program relies on the user to ensure that the information is provided correctly, as depicted in Table 1.

What happens if the table is set up incorrectly? There are eight possible ways to mix up a 2x2 table. Only two possible odds ratios can be calculated, the true OR (1.75) or its inverse (0.57 or 1/1.75). There are eight different possible "risk ratios" that can be calculated, of which only one (RR=1.24) is correct.

In the output below the exposure is a dichotomous variable, CAT (catecholamine level); those exposed have high CAT levels (1) and the nonexposed have low CAT levels (0). The outcome is coronary heart disease (CHD); those who had a coronary event are 1 and those without CHD are 0. The first and last three records of the data for CAT and CHD are depicted below.

Evans County data listing for CAT and CHD
Current View C:\data\Data.mdb : viewKkmyn    No Of Records 609    Date 02/22/2002 4:51:35 PM

ID       CAT   CHD
21       0     0
31       0     0
51       1     1
...      ...   ...
19091    0     0
19121    0     0
19161    0     0

The output from the TABLES command for a 2x2 table is provided in three sections. The first section provides the table, the second the parameter estimates, and the third the statistical tests.

TABLES CAT CHD
First section, the Table.
We continue with the above example, showing the TABLES command for a 2x2 Table:

Current View C:\data\Sample.mdb : viewEvansCounty    No Of Records 609    Date 02/24/2002 8:26:25 PM

CAT by CHD
              CHD
CAT        Yes    No     Total
Yes        27     95     122
No         44     443    487
Total      71     538    609

Single Table Analysis
                                 Point       95% Confidence Interval
                                 Estimate    Lower      Upper
PARAMETERS: Odds-based
Odds Ratio (cross product)       2.8615      1.6878     4.8514 (T)
Odds Ratio (MLE)                 2.8554      1.6690     4.8350 (M)
                                             1.6148     4.9853 (F)
PARAMETERS: Risk-based
Risk Ratio (RR)                  2.4495      1.5837     3.7887 (T)
Risk Difference (RD)             13.0962     5.3021     20.8903 (T)
(T=Taylor series; C=Cornfield; M=Mid-P; F=Fisher Exact)

STATISTICAL TESTS                Chi-square   1-tailed p      2-tailed p
Chi square - uncorrected         16.2465                      0.0000567826
Chi square - Mantel-Haenszel     16.2198                      0.0000575712
Chi square - corrected (Yates)   14.9998                      0.0001086935
Mid-p exact                                   0.0000910000
Fisher exact                                  0.0001400000

When the table is set up correctly for EpiInfo (as depicted in Table 1), the row proportions have epidemiologic meaning. In a cohort study, the row percentage in the first column of the first row is the "risk" of disease among the exposed; in the second row, it is the risk in the unexposed. If the data are from an outbreak of acute disease, these are sometimes referred to as "attack rates." If the data are based on prevalence information, the row proportions are the prevalence of disease in the exposed and unexposed.

Interpretation of Statistical Tests
All of the p-values in the example are < 0.001, indicating a statistically significant association between CAT and CHD in the study population. One question is the use of one-tailed vs. two-tailed p-values. Many authors argue that two-tailed p-values are appropriate in the majority of situations. Which of the two-sided p-values should be used? When the cell sizes are reasonably large, all of the p-values will be similar; when the data are sparse, the exact p-values, with the exact mid-p p-values, seem to be the most frequently recommended.
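The point estimates and the uncorrected chi-square in the Single Table Analysis above can be checked by hand from the four cell counts. The following is a minimal Python sketch, not EpiInfo code: the function name `two_by_two` and its returned keys are hypothetical, and the odds ratio limits use the usual log-scale Taylor series approximation with z = 1.96, which is assumed here to correspond to the limits marked (T).

```python
import math


def two_by_two(a, b, c, d, z=1.96):
    """Association measures for a 2x2 table laid out as in Table 1.

    a, b: exposed with / without the disease
    c, d: unexposed with / without the disease
    Hypothetical helper for illustration only; not part of EpiInfo.
    """
    n1, n0 = a + b, c + d            # row totals (exposed, unexposed)
    m1, m0 = a + c, b + d            # column totals (disease yes, no)
    n = n1 + n0

    odds_ratio = (a * d) / (b * c)   # cross-product ratio
    risk_exposed = a / n1
    risk_unexposed = c / n0
    risk_ratio = risk_exposed / risk_unexposed
    risk_difference = 100 * (risk_exposed - risk_unexposed)  # percentage points

    # Taylor series limits for the odds ratio, computed on the log scale
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    or_lower = odds_ratio * math.exp(-z * se_log_or)
    or_upper = odds_ratio * math.exp(z * se_log_or)

    # Uncorrected (Pearson) chi-square for a 2x2 table
    chi_square = n * (a * d - b * c) ** 2 / (n1 * n0 * m1 * m0)

    return {
        "odds_ratio": odds_ratio,
        "or_lower": or_lower,
        "or_upper": or_upper,
        "risk_ratio": risk_ratio,
        "risk_difference": risk_difference,
        "chi_square": chi_square,
    }


# Cell counts from the CAT by CHD table
evans = two_by_two(27, 95, 44, 443)
for name, value in evans.items():
    print(f"{name}: {value:.4f}")
```

Swapping rows or columns in the call changes the results, which is the point made earlier about setting the table up as in Table 1.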
Prior to performing analyses, the user should decide which p-value(s) to use to determine statistical significance. Occasionally one method will give a statistically significant p-value and others will not.

Problem_6: Click the TABLES command. In the EXPOSURE VARIABLE field, choose SEX and for the OUTCOME VARIABLE, choose HYPERTENSION. This will perform a cross-tabulation of SEX by HYPERTENSION.

MEANS
Syntax: MEANS <Variable1> [<Variable2>]
<Variable1>: A numeric variable to be used to calculate means (or * for all numeric variables)
<Variable2>: Any variable used for cross-tabulation (optional, or * for all numeric variables)
The MEANS command has two formats. If only one variable is supplied, the program produces a table like that produced by FREQuencies, plus descriptive statistics. If two variables are supplied, the first is a numeric variable containing the data to be analyzed and the second is a variable that indicates how groups will be distinguished. The output of this format is a table like that produced by TABLES, plus descriptive statistics of the numeric variable for each value of the group variable.
MEANS produces the following statistical tests:
Parametric tests
- ANOVA (for two or more samples)
- Student's t-test (for two samples)
Non-parametric tests
- Kruskal-Wallis one-way analysis of variance (for two or more samples)
- Mann-Whitney U Test = Wilcoxon Rank Sum Test (for two samples)
Further details are given in the chapter on Statistics.

Example_3: For a Single Numeric Variable – the MEANS Command
The MEANS command is used when the variable of interest is numeric and measured on a continuous scale. A continuous variable can have decimal values (real numbers like 44.645) or integer values (44). In some ways, AGE can be considered either categorical (with 1-year categories) or continuous, but to use the MEANS command the mean (average) of the values must be of interest.
The Mean AGE of one or more groups of people is useful information, whereas the Mean of the numeric codes for countries of the world would usually be of no interest, even though both sets of data might contain numeric values. Let's examine the output from a request for MEANS AGE in the viewOSWEGO dataset.

Current View D:\EPI2000\Sample.Mdb : viewOswego    No Of Records 74    Date 10/28/1999 9:31:38 AM

MEANS of age
AGE      Frequency   Percent   Cum Percent
3        1           1.4%      1.4%
7        2           2.7%      4.1%
8        2           2.7%      6.8%
9        1           1.4%      8.1%
10       1           1.4%      9.5%
11       4           5.4%      14.9%
12       1           1.4%      16.2%
13       2           2.7%      18.9%
14       1           1.4%      20.3%
15       3           4.1%      24.3%
16       1           1.4%      25.7%
...      ...         ...       ...
18       1           1.4%      31.1%
70       1           1.4%      95.9%
72       1           1.4%      97.3%
74       1           1.4%      98.6%
77       1           1.4%      100.0%
Total    74          100.0%    100.0%

Total    Sum     Mean      Variance   Std Dev   Std Err
74       2744    37.0811   461.0344   21.4717   2.4960
T statistic=14.8560 df=73 p-value=0.0000

Example_4: Comparing a Numeric Variable Across Groups – The MEANS Command with a Second Variable
The MEANS command can compare mean values of a variable between groups of records. The numeric variable of interest, AGE, for example, is processed as for a single variable, but another categorical or "group" variable (such as ill/not ill) is used to divide the records into groups for comparison. MEANS AGE ILL, for example, compares the ages of the ill and well persons and provides statistics to evaluate whether or not there is really a difference. If there are only two groups, the equivalent of an independent t-test is performed. If there are more than two groups, then a one-way analysis of variance (ANOVA) is computed. "One way" means that there is only one grouping variable (in the above example: ill/not ill). If there were two grouping variables, such as ill/not ill and sex, then that would be a two-way ANOVA, which EpiInfo does not perform. The one-way ANOVA can be thought of as an extension of the independent t-test to more than two groups.
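The arithmetic behind a one-way ANOVA can be sketched from group summaries alone. The following Python fragment is an illustration under stated assumptions, not EpiInfo's implementation: the function name `one_way_anova` is hypothetical, and the inputs are the rounded group sizes, means and variances that the Oswego output for MEANS AGE ILL reports later in this section, so the results agree with that output only up to rounding.

```python
def one_way_anova(groups):
    """One-way ANOVA from per-group summaries.

    groups: list of (n, mean, variance) tuples, one per group, where
    variance is the sample variance (divisor n - 1).
    Hypothetical helper for illustration only.
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n

    # Between-group variation: how far each group mean is from the grand mean
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-group variation: pooled spread inside each group
    ss_within = sum((n - 1) * var for n, _, var in groups)

    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_statistic


# (n, mean, variance) for ill = yes and ill = no, rounded as printed
oswego = [(46, 39.2609, 477.2638), (29, 32.9310, 423.7094)]
ss_b, ss_w, df_b, df_w, f_stat = one_way_anova(oswego)
print(f"SS between = {ss_b:.2f}, SS within = {ss_w:.2f}")
print(f"df = ({df_b}, {df_w}), F = {f_stat:.4f}")
```

With only two groups, F is the square of the independent t statistic, which is why the text treats ANOVA as an extension of the t-test.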
Because the ANOVA test requires some assumptions about the data and the underlying population, another test (Kruskal-Wallis, known as the Mann-Whitney/Wilcoxon test when there are only two groups) is also provided. This is a non-parametric test, meaning that it does not require assumptions about the distribution of the underlying population. Non-parametric tests are more conservative in detecting a statistically significant difference, so a result that is "significant" in the non-parametric test will usually also be so in the ANOVA test.

With a grouping variable, the MEANS command has the form:
MEANS <continuous var> <grouping var>
The output is provided in 5 different sections:
1. A table of the two variables with the continuous variable forming the rows and the grouping variable forming the columns.
2. Descriptive information of the continuous variable by each group: number of observations, mean, variance, and standard deviation; minimum and maximum values; the 25th, 50th (median), and 75th percentiles; and the mode.
3. An Analysis of Variance (ANOVA) table and a p-value for whether or not the means are equal.
4. A test to determine whether the variances in each group are similar (Bartlett's test for homogeneity of variance).
5. A non-parametric equivalent to the independent t-test and one-way ANOVA.

Using the Oswego dataset, let's examine the ages of ill and well persons in the outbreak.

MEANS AGE ILL
            ILL
AGE      yes    no     TOTAL
3        1      0      1
7        1      1      2
8        2      0      2
etc.     etc.   etc.   etc.
72       1      0      1
74       1      0      1
76       1      0      1
TOTAL    46     29     75

Descriptive Statistics for Each Value of Crosstab Variable
       Obs   Total       Mean      Variance   Std Dev
yes    46    1806.0000   39.2609   477.2638   21.8464
no     29    955.0000    32.9310   423.7094   20.5842

       Minimum   25%       Median    75%       Maximum   Mode
yes    3.0000    17.0000   38.5000   59.0000   77.0000   15.0000
no     7.0000    14.0000   35.0000   50.0000   69.0000   11.0000

ANOVA, a Parametric Test for Inequality of Population Means (For normally distributed data only)
Variation   SS           df   MS         F statistic
Between     712.6550     1    712.6550   1.5604
Within      33340.7316   73   456.7224
Total       34053.3867   74
P-value = 0.2156

Bartlett's Test for Inequality of Population Variances
Bartlett's chi square = 0.1193   df = 1   P value = 0.7298
A small p-value (less than 0.05) suggests that the variances are not homogeneous and that the ANOVA may not be appropriate.

Mann-Whitney/Wilcoxon Two-Sample Test (Kruskal-Wallis test for two groups)
Kruskal-Wallis H (equivalent to Chi square) = 1.1612   Degrees of freedom = 1   P value = 0.2812

Problem_7: Click the MEANS command. In the Means of field, choose a numeric variable such as SystolicBloodPressure, and for Cross-tabulate by value of, choose Hypertension. The program will produce:
- Descriptive Statistics for Each Value of Crosstab Variable
- ANOVA, a Parametric Test for Inequality of Population Means
- Mann-Whitney/Wilcoxon Two-Sample Test (Kruskal-Wallis test for two groups)

REGRESS
REGRESS can be used for simple linear regression (only one independent variable), for multiple linear regression (more than one independent variable), and for quantifying the relationship between two continuous variables (correlation). Regression is used when the primary interest is to predict one dependent variable (y) from one or more independent variables (x1, ... xk). The correlation coefficient or r (sometimes referred to as the Pearson correlation coefficient) is a useful measure of how two continuous variables are related.
If the correlation is greater than 0, the variables are positively correlated; as x increases, y also increases. If the correlation is less than 0, the variables are negatively correlated; as x increases, y decreases. If the correlation is exactly 0, then the variables are uncorrelated. The correlation coefficient can vary between +1 and -1. For positive correlations (r > 0), the closer to +1 the stronger the correlation; for negative correlations (r < 0), the closer to -1 the stronger the correlation. If the data are ordinal or far from normal, significance tests based on the Pearson correlation coefficient are not valid and a non-parametric equivalent to Pearson's should be used.

Example_5: Simple Linear Regression
Current View: D:\EPI2000\Sample.Mdb:viewEstriolAndBirthweight    Record Count: 31    Deleted Records: Excluded    Date: 09/118/2001 17:14 AM

REGRESS BIRTHWEIGHT = ESTRIOL
Birthweight = Estriol
Correlation Coefficient: r^2=0.37

Source       df   Sum of Squares   Mean Square   F-statistic
Regression   1    248.421          248.421       16.811
Residuals    29   428.547          14.777
Total        30   676.968

Variable    Coefficient   Std Error   F-test    P-Value
Intercept   21.536        2.636       66.7390   0.000000
Estriol     0.606         0.148       16.8108   0.000076

Correlation coefficient: The Pearson correlation coefficient, or "r". In this example, the correlation is 0.61, indicating a relatively strong positive correlation between estriol and birthweight. With only a single independent variable, the correlation = square root (R2).

r^2: Sometimes represented as r2 or R2, R squared. The R2 value = Regression Sum of Squares / Total Sum of Squares (in the above example, 248.421/676.968 = 0.367). The R2 can be thought of as the proportion of the variance of y (in this example, birthweight) that can be explained by x (in this example, estriol). In this example, 37% of the variance in birthweight can be explained by the women's estriol levels. If R2 = 1, then all of the variability is explained, which would mean that all data points fall on the regression line.
If R2 = 0, then no variance is explained. A 95% confidence interval for the R2 value is also provided (0.02, 0.64).

F-Statistic: The F-statistic is the Regression Mean Square / Residual Mean Square (in the above example, 248.421/14.777 = 16.81). The F-statistic is calculated to determine whether the slope of the regression line is significantly different from 0. EpiInfo does not print a p-value next to this F-statistic; with a single independent variable, the same test appears in the coefficient table (F-test 16.8108, P-Value 0.000076 for Estriol).

Mean: The average value for ESTRIOL; could also be determined with FREQ ESTRIOL.

Coefficient: The slope of the line, sometimes referred to as the "regression coefficient." In this example, 0.608 can be interpreted as: for every 1 unit increase in estriol (1 mg/24 hr), there is a 0.61 unit increase in birthweight (g/100). Statistics concerning the slope are also provided: the standard error (Std Error), 0.146812, and the 95% confidence interval (0.307921, 0.908460). The interpretation would be that although we observed a 0.61 increase in birthweight for every 1 unit increase in Estriol, we are 95% confident that the "true" slope lies between 0.31 and 0.91. (The confidence level can be changed, for example to 90%, with the command SET CONFIDENCE= 90.)

GRAPH
Numerous settings are available in the GRAPH module, and these can be saved as a Graph Template. When a Graph Template is referred to by name, the settings are taken from this template. If explicit settings are given, they override the settings in the Graph Template. EpiInfo 2000 can produce the following types of graph: Line, Bar, Horizontal Bar, Histogram, Mark, Pie, Area, Pareto, Scatter, Scatter Line, Doughnut, Surface, Polar, Cube. Y Axis titles and scale can be defined using YTITLE, YRANGE and YTICK. LEGEND can be used to specify labels for the legend; if not specified, either the variable name or prompt will be used, depending on the PROMPTS setting. X Axis titles and the scale can be defined using XTITLE, XRANGE, and XTICK.
XLABELS can be used to specify labels for each value of the X axis; if not specified, the value will be used. XORIENT can be used to specify the direction in which the X axis labels will be displayed.

Problem_8: Click the GRAPH command. In the Graph Type field choose Pie, as Title enter "Sex distribution" and as Y Variable choose Sex. Use a 3D diagram. Your diagram will appear as below:

Problem_9: Click the GRAPH command. In the Graph Type field choose Scatter, Y Variable: Cholesterol, X Variable: Weight, Title: "Correlation between Weight and Cholesterol". Your diagram will appear as below:

Problem_10
a. Create a questionnaire about diabetes. This questionnaire will have the following fields: Id_patient, Age, Sex, Profession, Weight, Height, SystolicBloodPressure, DiastolicBloodPressure, SugarBloodLevel, and Diabetes. Choose the correct type for each of these fields.
b. Enter the data for 30 patients.

Problem_11
a. Create a questionnaire named HIV QUESTIONNAIRE about AIDS with the following fields: Id_patient, Name, Age, Sex (Yes = Male, No = Female), Occupation, STD (sexually transmitted diseases) in antecedents (Yes, No), Unprotected sexual intercourse (Yes/No), HIV Test (Yes = Positive and No = Negative), T4-lymphocytes, Stage of disease (see the table below), Ill (Yes/No). Choose the correct type for each of these fields.

Laboratory stages     Stages of disease
T4-lymphocytes/µl     A: asymptomatic infection   B: symptoms, not yet AIDS   C: symptoms, AIDS
more than 500         A1                          B1                          C1
200 to 499            A2                          B2                          C2
less than 200         A3                          B3                          C3

b. Fill in the fields specified above for 40 patients.
c. Read the HIV QUESTIONNAIRE.
d. List the following fields from the questionnaire: Age, Sex, Occupation, STD, Stage of disease and Ill.
e. Using the Frequencies command, find the frequencies of STD, Age, Sex, Occupation and HIV test.
f. Click the TABLES command. In the EXPOSURE VARIABLE field, choose Unprotected sexual intercourse and for the OUTCOME VARIABLE, choose T4-lymphocytes. This will perform a cross-tabulation of Unprotected sexual intercourse by T4-lymphocytes.
g. Click the MEANS command. In the Means of field, choose a numeric variable such as T4-lymphocytes, and for Cross-tabulate by value of, choose Ill.
h. Click the GRAPH command. Create a histogram of Occupation and of Stage of disease.
i. Is there any correlation between age and T4-lymphocytes?

Descriptive Statistics

Measures of Central Location

A. The Arithmetic Mean
The arithmetic mean is the sum of all the observations divided by the number of observations. In statistical terms it is written as:
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
The sign $\Sigma$ (sigma) is referred to as a summation sign. The expression $\sum_{i=1}^{n} x_i$ is simply a short way of writing the quantity $x_1 + x_2 + \dots + x_n$.

Example_1: If $x_1 = 2$, $x_2 = 5$, $x_3 = -4$, then:
$\sum_{i=1}^{3} x_i = 2 + 5 - 4 = 3$
so the arithmetic mean is $\bar{x} = 3/3 = 1$.

B. The Median
An alternative measure of central location, perhaps second in popularity to the arithmetic mean, is the median, or sample median. Suppose there are n observations in a sample. If these observations are ordered from smallest to largest, then the sample median is defined as follows:
- the $\left(\frac{n+1}{2}\right)$th largest observation, $x_{(n+1)/2}$, if n is odd;
- the average of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th largest observations, $\frac{x_{n/2} + x_{n/2+1}}{2}$, if n is even.

Example_2: Compute the sample median for the sample of birth weights of live-born infants born at a private hospital in San Diego, California, during a 1-week period (g).
Solution: First, arrange the sample in ascending order:
2069, 2581, 2759, 2834, 2838, 2841, 3031, 3101, 3200, 3245, 3248, 3260, 3265, 3314, 3323, 3484, 3541, 3609, 3649, 4146
Since n is even, the sample median is the average of the 10th and 11th largest observations:
$\text{Median} = \frac{3245 + 3248}{2} = 3246.5$ g

Observations:
If the distribution is symmetric, then the relative position of the points on each side of the sample median will be the same.
If the distribution is positively skewed (or skewed to the right), the points above the median will tend to be farther from the median in absolute value than the points below the median.
If the distribution is negatively skewed (or skewed to the left), the points below the median will tend to be farther from the median in absolute value than the points above the median.

C. The Mode
The mode is the most frequently occurring value among all the observations in a sample.

Example_3: Consider the sample of time intervals between successive menstrual periods of a group of 500 women aged 18-21, as shown in the table. The frequency column gives the number of women who reported each of the respective durations.

value   frequency      value   frequency      value   frequency
24      5              29      96             34      7
25      10             30      63             35      3
26      28             31      24             36      2
27      64             32      9              37      1
28      185            33      2              38      1

28 days is the mode because it is the most frequently occurring value.
Some distributions have more than one mode. In fact, one useful method of classifying distributions is by the number of modes present. A distribution with one mode is referred to as unimodal; two modes, bimodal; three modes, trimodal; and so forth.

Example_4: Total/HDL cholesterol (mmol/L) and weight (kg) for 10 patients are:

Cholesterol   6.8   5.3   4.3   5.0   7.1   5.5   3.8   4.6   4.0   6.0
Weight        90    75    70    73    110   67    60    65    59    80

Compute the arithmetic mean and median for cholesterol and weight.
Solution:
a. The mean of the cholesterol levels is given by:
$\bar{x}_c = \frac{6.8 + 5.3 + 4.3 + 5.0 + 7.1 + 5.5 + 3.8 + 4.6 + 4.0 + 6.0}{10} = 5.24$
The mean of weight:
$\bar{x}_w = \frac{90 + 75 + 70 + 73 + 110 + 67 + 60 + 65 + 59 + 80}{10} = 74.9$
b. First we order the cholesterol levels from smallest to largest:
3.8 4.0 4.3 4.6 5.0 5.3 5.5 6.0 6.8 7.1
Since n = 10 is even, the median of cholesterol is given by:
$\frac{x_{n/2} + x_{n/2+1}}{2} = \frac{x_5 + x_6}{2} = \frac{5.0 + 5.3}{2} = 5.15$
Next we order the weights from smallest to largest:
59 60 65 67 70 73 75 80 90 110
The median of weight is given by:
$\frac{x_{n/2} + x_{n/2+1}}{2} = \frac{x_5 + x_6}{2} = \frac{70 + 73}{2} = 71.5$

Measures of Spread

The Range
The range is the difference between the largest and the smallest observation in a sample.
The range in the sample of birth weights of live-born infants born at a private hospital in San Diego, California, during a 1-week period (g) is: 4146 - 2069 = 2077 g.

Quantiles
Another approach that addresses some of the shortcomings of the range in quantifying the spread in a data set is the use of quantiles or percentiles. The pth percentile is the value $V_p$ such that p% of the sample points are less than or equal to $V_p$. The median, being the 50th percentile, is a special case of a quantile.
The pth percentile is defined by:
- the (k+1)th largest sample point if $np/100$ is not an integer, where k is the largest integer less than $np/100$;
- the average of the $(np/100)$th and $(np/100+1)$th largest observations if $np/100$ is an integer.
The spread of a distribution can be characterized by specifying several percentiles. For example, the 10th and 90th percentiles are often used to characterize spread. Percentiles have the advantages over the range of being less sensitive to outliers and of not being much affected by the sample size (n).

Example_5: Compute the 10th and 90th percentiles for the birth weight data.
Solution: Since 20 x 0.1 = 2 and 20 x 0.9 = 18 are integers, the 10th and 90th percentiles are defined by:
10th percentile: average of the 2nd and 3rd largest values = (2581+2759)/2 = 2670 g
90th percentile: average of the 18th and 19th largest values = (3609+3649)/2 = 3629 g.
There is no limit to the number of percentiles that can be computed. Frequently used percentiles are quartiles (25th, 50th, and 75th percentiles), quintiles (20th, 40th, 60th, and 80th percentiles), and deciles (10th, 20th, ..., 90th percentiles).
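The percentile rule above translates directly into a few lines of code. The sketch below is only an illustration of that rule in Python: the function name `percentile` is hypothetical, p is assumed to be a whole number strictly between 0 and 100, and the data are the 20 birth weights used in Example_5.

```python
def percentile(sample, p):
    """p-th percentile by the rule stated above (p a whole number, 0 < p < 100).

    Hypothetical helper for illustration; statistical packages often use
    other interpolation rules and may give slightly different values.
    """
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(sample)
    n = len(ordered)
    k, remainder = divmod(n * p, 100)
    if remainder == 0:
        # np/100 is an integer: average the k-th and (k+1)-th ordered values
        return (ordered[k - 1] + ordered[k]) / 2
    # np/100 is not an integer: take the (k+1)-th ordered value
    return ordered[k]


birth_weights = [
    2069, 2581, 2759, 2834, 2838, 2841, 3031, 3101, 3200, 3245,
    3248, 3260, 3265, 3314, 3323, 3484, 3541, 3609, 3649, 4146,
]

sample_range = max(birth_weights) - min(birth_weights)
p10 = percentile(birth_weights, 10)
p50 = percentile(birth_weights, 50)   # the median
p90 = percentile(birth_weights, 90)
print(sample_range, p10, p50, p90)
```

Because the median is the 50th percentile, the same function reproduces the sample median computed earlier for these data.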
The Variance and Standard Deviation
The sample variance, or variance, is defined as follows:
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$
The sample standard deviation, or standard deviation, is defined as follows:
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} = \sqrt{\text{sample variance}}$

Example_6: The white blood counts taken on admission of all patients entering a small hospital in Pennsylvania, on a given day, are:

i    xi       i    xi
1    7        6    3
2    35       7    10
3    5        8    13
4    9        9    8
5    8        10   12

Compute the variance of the white blood count.
Solution: First we must compute the arithmetic mean:
$\bar{x} = \frac{x_1 + x_2 + \dots + x_{10}}{10} = \frac{7 + 35 + 5 + 9 + 8 + 3 + 10 + 13 + 8 + 12}{10} = \frac{110}{10} = 11$
Second, we compute the variance:
$s^2 = \frac{1}{10-1}\left[(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_{10} - \bar{x})^2\right]$
$s^2 = \frac{1}{9}\left[(7-11)^2 + (35-11)^2 + (5-11)^2 + (9-11)^2 + (8-11)^2 + (3-11)^2 + (10-11)^2 + (13-11)^2 + (8-11)^2 + (12-11)^2\right] = \frac{720}{9} = 80$
And now we can compute the standard deviation:
$s = \sqrt{80} = 8.94$

The coefficient of variation is given by:
$CV = 100 \times \frac{s}{\bar{x}}$

Example_7: Compute the coefficient of variation for the data of Example_6.
Solution: The coefficient of variation is:
$CV = 100 \times \frac{8.94}{11} = 81.27$
The coefficient of variation is most useful in comparing the variability of several different samples, each with a different arithmetic mean. This is because a higher variability is usually expected when the mean increases, and the CV is a measure that accounts for this variability.

Correlation indices
The sum of deviation products (SPE) is given by:
$SPE = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$
The covariance is given by:
$COV(x, y) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$
The coefficient of correlation is given by:
$r = \frac{COV(x, y)}{S_x S_y}$
a. correlation from -0.25 to +0.25 = little or no relationship
b. correlation from 0.25 to 0.50 (or -0.25 to -0.50) = an acceptable degree of association
c. correlation from 0.50 to 0.75 (or -0.50 to -0.75) = a moderate to good association
d. correlation above 0.75 (or below -0.75) = a very good association.
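The formulas above can be tried out on the data already used in this chapter. The following Python sketch is an illustration only, and all function names are hypothetical. Two choices are assumptions rather than part of the text: the covariance uses the divisor n exactly as written above, and the correlation is computed as SPE divided by the square root of the product of the two sums of squared deviations, which equals COV(x, y)/(Sx Sy) whenever the covariance and the standard deviations use the same divisor.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Sample variance with divisor n - 1, as defined above."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def coefficient_of_variation(xs):
    """CV = 100 * s / mean, using the unrounded standard deviation."""
    return 100 * math.sqrt(sample_variance(xs)) / mean(xs)


def covariance(xs, ys):
    """Covariance with divisor n, following the formula above."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def correlation(xs, ys):
    """Pearson r as SPE / sqrt(SSx * SSy); the divisor cancels out."""
    mx, my = mean(xs), mean(ys)
    spe = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return spe / math.sqrt(ss_x * ss_y)


# White blood counts from Example_6
wbc = [7, 35, 5, 9, 8, 3, 10, 13, 8, 12]
wbc_variance = sample_variance(wbc)
wbc_sd = math.sqrt(wbc_variance)
wbc_cv = coefficient_of_variation(wbc)

# Cholesterol and weight from Example_4
cholesterol = [6.8, 5.3, 4.3, 5.0, 7.1, 5.5, 3.8, 4.6, 4.0, 6.0]
weight = [90, 75, 70, 73, 110, 67, 60, 65, 59, 80]
cov_cw = covariance(cholesterol, weight)
r_cw = correlation(cholesterol, weight)

print(wbc_variance, round(wbc_sd, 2), round(wbc_cv, 2))
print(round(cov_cw, 3), round(r_cw, 3))
```

The coefficient of variation printed here uses the unrounded standard deviation, so it differs in the second decimal from a hand calculation that starts from s rounded to 8.94.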
The coefficient of determination is given by:
$d = r^2$

Example_8: For the data of Example_4, compute the covariance between cholesterol and weight.
Solution: The covariance is given by:
$COV(x, y) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$
With $\bar{x} = 5.24$ and $\bar{y} = 74.9$, the ten deviation products are:
$COV(x, y) = \frac{1}{10}\left[23.556 + 0.006 + 4.606 + 0.456 + 65.286 - 2.054 + 21.456 + 6.336 + 19.716 + 3.876\right]$
$COV(x, y) = \frac{143.24}{10} = 14.324$

Grouped Data
Sometimes the sample size is prohibitively large to display all the raw data. Also, data are frequently collected in grouped form, since the required degree of accuracy to specify a measured quantity exactly is often lacking, because of either measurement error or imprecise patient recall.
A frequency distribution is an ordered display of each value in a data set together with its frequency, that is, the number of times that value occurs in the data set. In addition, the percentage of sample points that take on a particular value is also typically given.
We work with: frequency, cumulative frequency (CUM FREQ), relative frequency (PERCENT), and cumulative percent (CUM PERCENT). The cumulative frequency (CUM FREQ) is the number of sample points that are less than or equal to a specific number x.
$\text{PERCENT} = \frac{\text{FREQUENCY} \times 100}{n}$, while the cumulative percent is $\text{CUM PERCENT} = \frac{\text{CUM FREQ} \times 100}{n}$.

Some general instructions for categorizing the data are:
Subdivide the data into k intervals, starting at some lower bound $y_1$ and ending at some upper bound $y_{k+1}$.
The first interval is from $y_1$ inclusive to $y_2$ exclusive; the second interval is from $y_2$ inclusive to $y_3$ exclusive; ...; the kth interval is from $y_k$ inclusive to $y_{k+1}$ exclusive. The rationale for this representation is to make certain that the group intervals include all possible values and do not overlap.
The group intervals are generally chosen to be equal, although the appropriateness of equal group sizes should be dictated more by subject-matter considerations.
A count is made of the number of units that fall in each interval, which is denoted by the frequency within that interval.
The midpoint of each group interval is computed for the calculation of descriptive statistics. The midpoint of the first interval is denoted by $m_1 = \frac{y_1 + y_2}{2}$, the midpoint of the second interval by $m_2 = \frac{y_2 + y_3}{2}$, and the midpoint of the last interval by $m_k = \frac{y_k + y_{k+1}}{2}$.
Finally, for the purpose of computing descriptive statistics, the group intervals, their midpoints $m_i$, and their frequencies $f_i$ are displayed concisely in a table such as the following:

Group interval      Midpoint of group interval   Frequency
>= y1, < y2         m1                           f1
>= y2, < y3         m2                           f2
...                 ...                          ...
>= yi, < yi+1       mi                           fi
...                 ...                          ...
>= yk, < yk+1       mk                           fk

Example_8:

Birthwt   Frequency   CUM FREQ   PERCENT   CUM PERCENT
32        1           1          1.000     1.000
58        1           2          1.000     2.000
64        1           3          1.000     3.000
67        1           4          1.000     4.000
68        1           5          1.000     5.000
83        1           6          1.000     6.000
85        2           8          2.000     8.000
86        1           9          1.000     9.000
87        1           10         1.000     10.000
88        2           12         2.000     12.000
89        3           15         3.000     15.000
91        1           16         1.000     16.000
92        1           17         1.000     17.000
93        1           18         1.000     18.000
94        2           20         2.000     20.000
95        1           21         1.000     21.000
96        1           22         1.000     22.000
98        3           25         3.000     25.000
99        1           26         1.000     26.000
100       1           27         1.000     27.000
101       1           28         1.000     28.000
102       1           29         1.000     29.000
103       1           30         1.000     30.000
104       5           35         5.000     35.000
105       2           37         2.000     37.000
106       1           38         1.000     38.000
107       1           39         1.000     39.000
108       4           43         4.000     43.000
109       2           45         2.000     45.000
110       2           47         2.000     47.000

Example_9: Obtain the frequency class table for total/HDL cholesterol for the following four interval classes: 3.0 to 4.9, 5.0 to 6.0, 6.1 to 7.0 and 7.1 to 8.9, using the data from Example_4. Draw the associated histogram.
Solution: The frequency class table is:

Interval   Frequency   CUM FREQ   CUM PERCENT
3.0-4.9    4           4          40
5.0-6.0    4           8          80
6.1-7.0    1           9          90
7.1-8.9    1           10         100

The histogram associated with these data is (Excel):
[Figure: Excel histogram. Horizontal axis: Bin (4.9, 6.0, 7.0, 8.9, More); left axis: Frequency; right axis: Cumulative %.]

Problems:
Problem_1: Sample of birth weights of live-born infants born at a private hospital in San Diego, California, during a 1-week period (g).
i xi i xi i xi i xi 1 3565 6 3323 11 2581 16 2759 2 3260 7 3649 12 2841 17 3248 3 3245 8 3200 13 3609 18 3314 4 3484 9 3031 14 2838 19 3101 5 4146 10 3069 15 3541 20 2834 Table ... Birth weight of live-born infants, hospital in San Diego, California. Compute the arithmetic mean for that sample. Problem_2: The white blood counts taken on admission of all patients entering a small hospital in Pennsylvania, on a given day, are: i xi i xi 1 7 6 3 2 35 7 10 3 5 8 13 4 9 9 8 5 8 10 12 Table ... Wight blood counts. Compute the median white-blood count. Problem_3: Table 2.12 comes from a paper giving the distribution of astigmatism in 133 young men, aged 18-22, who were accepted for military service in Great Britain. Assume that astigmatism is rounded to the nearest 10th of a diopter. Degree of astigmatism Frequency (diopters) 0.0 or less than 0.2 485 0.2-0.3 268 0.4-0.5 151 0.6-1.0 79 1.1-2.0 44 2.1-3.0 19 3.1-4.0 9 4.1-5.0 3 5.1-6.0 2 Table 2.1 Distribution of astigmatism in 1033 young men aged 18-22 a. Compute the grouped arithmetic mean (average). b. Compute the grouped standard deviation. c. Plot a histogram to properly illustrate these data. Problem_4: The data in Table 2.13 are sample of cholesterol levels taken from 22 hospital employees who were on a standard American diet and who agreed to adopt a vegetarian diet for 1 month. Serum-cholesterol measurements were made before adopting the diet and 1 month after. 1 Subject Age Before After Before-After 2 1 45 195 146 3 2 25 145 155 4 3 36 205 178 5 4 85 159 146 6 5 45 244 208 7 6 69 166 147 8 7 57 250 202 9 8 51 236 215 10 9 42 192 184 11 10 31 224 208 12 11 26 238 206 13 12 59 197 169 14 13 76 169 182 15 14 84 158 127 16 15 52 151 149 17 16 43 197 178 18 17 61 180 161 19 18 50 222 187 20 19 42 168 176 21 20 49 168 145 22 21 58 167 154 Tabel 2.2 Serum-cholesterol levels before and after adopting a vegetarian diet. a. Compute the mean change in cholesterol. b. 
Compute the standard deviation of the change in cholesterol levels.
c. Compute the median change in cholesterol.
d. Compute the covariance and the coefficient of correlation between age and cholesterol level after adopting the vegetarian diet.

Probability

Example_1: The probability of developing a new case of breast cancer over 30 years in 40-year-old women who have never had breast cancer is approximately 1/11. This probability means that over a large sample of 40-year-old women who have never had breast cancer, approximately 1 in 11 will develop the disease over 30 years, with this proportion becoming increasingly close to 1 in 11 as the number of women sampled increases.

Definition of probability

In referring to probabilities of events, an event is any set of outcomes of interest. The probability of an event is the relative frequency of this set of outcomes over an indefinitely large number of trials.
1. The probability of an event E, denoted by Pr(E), always satisfies $0 \le \Pr(E) \le 1$.
2. If A and B are two events that cannot both happen at the same time, then Pr(A or B occurs) = Pr(A) + Pr(B).

Example_2: Let A be the event that a person has a normotensive diastolic blood pressure (DBP) reading (DBP < 90) and let B be the event that a person has a borderline DBP reading ($90 \le \text{DBP} < 95$). Suppose that Pr(A) = 0.7 and Pr(B) = 0.1. Let C be the event that a person has DBP < 95. Compute Pr(C).

Solution: C occurs exactly when A or B occurs, and A and B cannot both happen at the same time, so Pr(A or B occurs) = Pr(A) + Pr(B). In our case: Pr(C) = 0.7 + 0.1 = 0.8.

Two events A and B are mutually exclusive if they cannot both happen at the same time.

Example_3: Let x be DBP, C be the event that $x \ge 90$, and D be the event that $75 \le x \le 100$. The events C and D are not mutually exclusive, since they both occur when $90 \le x \le 100$.

Some useful probabilistic notation

The symbol {} is used as shorthand for the phrase "the event".
$A \cup B$ is the event that either A or B occurs, or they both occur.
Figures 1 and 2 depict $A \cup B$ diagrammatically, both for the case where A and B are mutually exclusive and for the case where they are not.

[Figure 1. Diagrammatic representation of $A \cup B$: A, B are mutually exclusive]

[Figure 2. Diagrammatic representation of $A \cup B$: A, B are not mutually exclusive]

Example_4: Let the events A and B be defined as in Example_2; that is, $A = \{x < 90\}$, $B = \{90 \le x < 95\}$, where x = DBP. Then $A \cup B = \{x < 95\}$.

$A \cap B$ is the event that both A and B occur simultaneously. $A \cap B$ is depicted diagrammatically in Figure 3.

[Figure 3. Diagrammatic representation of $A \cap B$]

Example_5: Let C be the event that $x \ge 90$ and D be the event that $75 \le x \le 100$. Then $C \cap D = \{90 \le x \le 100\}$.

$\bar{A}$ is the event that A does not occur. It is sometimes referred to as the complement of A. Notice that $\Pr(\bar{A}) = 1 - \Pr(A)$, since $\bar{A}$ occurs only when A does not occur.

Example_6: Let $A = \{x < 90\}$ and $C = \{x \ge 90\}$. Then $C = \bar{A}$, since C can only occur when A does not occur.

The multiplication law of probability

Two events A and B are referred to as independent events if
\[ \Pr(A \cap B) = \Pr(A) \times \Pr(B) \]

Example_7: Suppose we are conducting a hypertension-screening program in the home. We are interested in whether the mother or father is hypertensive, which is described, respectively, by the events A = {mother's DBP $\ge$ 95} and B = {father's DBP $\ge$ 95}. Suppose we know that Pr(A) = 0.1 and Pr(B) = 0.2. What can we say about $\Pr(A \cap B)$ = Pr(mother's DBP $\ge$ 95 and father's DBP $\ge$ 95)?

Solution: If A and B are independent events, then:
\[ \Pr(A \cap B) = \Pr(A) \times \Pr(B) = 0.1 \times 0.2 = 0.02 \]

Example_8: Consider all possible diastolic blood pressure measurements from a mother and her first child. Let A = {mother's DBP $\ge$ 95} and B = {first-born child's DBP $\ge$ 95}. Suppose $\Pr(A \cap B) = 0.05$, Pr(A) = 0.1 and Pr(B) = 0.2. Then
\[ \Pr(A \cap B) = 0.05 > \Pr(A) \times \Pr(B) = 0.02 \]
and the events A and B would be dependent.

The Addition Law of Probability

If A and B are any events, then
\[ \Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B) \]

Example_9: Suppose two doctors, A and B, diagnose all patients coming into a clinic for syphilis.
Let the events A+ = {doctor A makes a positive diagnosis} and B+ = {doctor B makes a positive diagnosis}. Suppose that doctor A diagnoses 10% of all patients as positive, doctor B diagnoses 17% of all patients as positive, and both doctors diagnose 8% of all patients as positive. Suppose a patient is referred for further lab tests if either doctor A or doctor B makes a positive diagnosis. What is the probability that a patient will be referred for further lab tests?

Pr(A+) = 0.1, Pr(B+) = 0.17, $\Pr(A^+ \cap B^+) = 0.08$

Therefore, from the addition law of probability,
\[ \Pr(A^+ \cup B^+) = \Pr(A^+) + \Pr(B^+) - \Pr(A^+ \cap B^+) = 0.1 + 0.17 - 0.08 = 0.19 \]
Thus, 19% of all patients will be referred for further lab tests.

Addition law of probability for independent events

If two events A and B are independent, then
\[ \Pr(A \cup B) = \Pr(A) + \Pr(B) \times [1 - \Pr(A)] \]

Conditional probability

\[ \Pr(B/A) = \frac{\Pr(A \cap B)}{\Pr(A)} \]

If A and B are independent events, then $\Pr(B/A) = \Pr(B) = \Pr(B/\bar{A})$. If two events A, B are dependent, then $\Pr(B/A) \ne \Pr(B) \ne \Pr(B/\bar{A})$ and $\Pr(A \cap B) \ne \Pr(A) \times \Pr(B)$.

The relative risk (RR) of B given A is:
\[ RR = \frac{\Pr(B/A)}{\Pr(B/\bar{A})} \]
If the two events A, B are independent, then the relative risk is 1; if the two events are dependent, then the relative risk is different from 1. The stronger the dependence between the events, the further the RR is from 1.

Example_10: Suppose that 1 person in 10000 among the people with negative skin tests has TB, or $\Pr(B/\bar{A}) = 0.0001$, and 1 person in 100 among those with positive skin tests has TB, or $\Pr(B/A) = 0.01$. Compute the relative risk.

Solution:
\[ RR = \frac{\Pr(B/A)}{\Pr(B/\bar{A})} = \frac{0.01}{0.0001} = 100 \]
That means people with positive skin tests are 100 times as likely to have TB as those with negative skin tests.

Total probability rule

\[ \Pr(B) = \sum_{i=1}^{k} \Pr(B/A_i) \times \Pr(A_i) \]

Example_11: We are planning a 5-year study of cataract in a population of 5000 people 60 years of age or older.
We know from census data that 45% of this population is aged 60-64, 28% is aged 65-69, 20% is aged 70-74 and 7% is aged 75 or older. We also know from a study that 2.4%, 4.6%, 8.8% and 15.3% of the people in those respective age groups will develop cataract over the next 5 years. What percentage of our population will develop cataract over 5 years, and how many persons with cataract does this percentage represent?

Solution: Let A1 = {ages 60-64}, A2 = {ages 65-69}, A3 = {ages 70-74}, A4 = {ages 75+}. These events are mutually exclusive and exhaustive, since exactly one event occurs for each person in our population. We also know that Pr(A1) = 0.45, Pr(A2) = 0.28, Pr(A3) = 0.2, Pr(A4) = 0.07, Pr(B/A1) = 0.024, Pr(B/A2) = 0.046, Pr(B/A3) = 0.088, Pr(B/A4) = 0.153, where B = {develops cataract over the next 5 years}.

Using the total probability rule,
Pr(B) = Pr(B/A1) x Pr(A1) + Pr(B/A2) x Pr(A2) + Pr(B/A3) x Pr(A3) + Pr(B/A4) x Pr(A4)
      = 0.024 x 0.45 + 0.046 x 0.28 + 0.088 x 0.2 + 0.153 x 0.07
      = 0.052
Thus 5.2% of our population will develop cataract over the next 5 years, which represents a total of 5000 x 0.052 = 260 persons with cataract.

Bayes' rule and screening tests

Screening tests

The predictive value positive (PV+) of a screening test is the probability that a person has the disease given that the test is positive: Pr(disease/test+).
The predictive value negative (PV-) of a screening test is the probability that a person does not have the disease given that the test is negative: Pr(no disease/test-).
The sensitivity of a symptom is the probability that the symptom is present given that the person has the disease.
The specificity is the probability that the symptom is not present given that the person does not have the disease.

Let A = {symptom} and B = {disease}.
Predictive value positive = PV+ = Pr(B/A)
Predictive value negative = PV- = Pr(non B/non A)
Sensitivity = Pr(A/B)
Specificity = Pr(non A/non B)

Bayes' rule

Let A = {symptom} and B = {disease}. Pr(B) = prevalence of disease in the reference population.
\[ PV^+ = \Pr(B/A) = \frac{\Pr(A/B) \times \Pr(B)}{\Pr(A/B) \times \Pr(B) + \Pr(A/\bar{B}) \times \Pr(\bar{B})} \]
This can be written:
\[ PV^+ = \frac{\Pr(B) \times \text{sensitivity}}{\Pr(B) \times \text{sensitivity} + (1 - \Pr(B)) \times (1 - \text{specificity})} \]
\[ PV^- = \frac{(1 - \Pr(B)) \times \text{specificity}}{(1 - \Pr(B)) \times \text{specificity} + \Pr(B) \times (1 - \text{sensitivity})} \]

Example_12: Suppose that 84% of hypertensives and 23% of normotensives are classified as hypertensive by an automated blood pressure machine. What are the predictive value positive and the predictive value negative of the machine, assuming that 20% of the adult population is hypertensive?

Solution: The sensitivity = 0.84 and the specificity = 1 - 0.23 = 0.77. Pr(B) = 0.2. Thus, from Bayes' rule it follows that
PV+ = 0.2 x 0.84 / (0.2 x 0.84 + 0.8 x 0.23) = 0.48
PV- = 0.8 x 0.77 / (0.8 x 0.77 + 0.2 x 0.16) = 0.95

Generalized Bayes' rule

Let B1, B2, ..., Bk be a set of mutually exclusive and exhaustive disease states; that is, at least one disease state must occur and no two disease states can occur at the same time. Let A represent the presence of a symptom or a set of symptoms. Then:
\[ \Pr(B_i/A) = \frac{\Pr(A/B_i) \times \Pr(B_i)}{\sum_{j=1}^{k} \Pr(A/B_j) \times \Pr(B_j)} \]

Example_13: Suppose that a 60-year-old male who has never smoked cigarettes presents to a physician with symptoms consisting of a chronic cough and occasional breathlessness. The physician becomes concerned and orders the patient admitted to a hospital for a lung biopsy. Suppose that the results of the lung biopsy are consistent with either lung cancer or sarcoidosis, a fairly common, nonfatal lung disease. In this case:
Symptom A = {chronic cough}
Disease states: B1 = normal, B2 = lung cancer and B3 = sarcoidosis
Suppose that Pr(A/B1) = 0.001, Pr(A/B2) = 0.9 and Pr(A/B3) = 0.9, and that in 60-year-old, never-smoking males Pr(B1) = 0.99, Pr(B2) = 0.001 and Pr(B3) = 0.009. What are the probabilities Pr(Bi/A) of the three disease states given the previous symptom?

Solution: Bayes' rule can be used to answer this question.
\[ \Pr(B_1/A) = \frac{\Pr(A/B_1) \times \Pr(B_1)}{\sum_{j=1}^{3} \Pr(A/B_j) \times \Pr(B_j)} = \frac{0.001 \times 0.99}{0.001 \times 0.99 + 0.9 \times 0.001 + 0.9 \times 0.009} = 0.099 \]
\[ \Pr(B_2/A) = \frac{\Pr(A/B_2) \times \Pr(B_2)}{\sum_{j=1}^{3} \Pr(A/B_j) \times \Pr(B_j)} = \frac{0.9 \times 0.001}{0.001 \times 0.99 + 0.9 \times 0.001 + 0.9 \times 0.009} = 0.09 \]
\[ \Pr(B_3/A) = \frac{\Pr(A/B_3) \times \Pr(B_3)}{\sum_{j=1}^{3} \Pr(A/B_j) \times \Pr(B_j)} = \frac{0.9 \times 0.009}{0.001 \times 0.99 + 0.9 \times 0.001 + 0.9 \times 0.009} = 0.811 \]
Thus, although the unconditional probability of sarcoidosis is very low, the conditional probability of this disease given these symptoms and this age-sex-smoking group is the highest of the three, equal to 81%.

Discrete probability distributions

Random Variables

A random variable is a numerical quantity that takes different values with specified probabilities. Two types of random variables are discussed in this text: discrete and continuous.
A random variable for which there exists a discrete set of values with specified probabilities is a discrete random variable: number of hospitalizations, number of children ...
A random variable whose values form a continuum is a continuous random variable: age, glycemia, biological parameters, blood pressure ...
A probability mass function is a mathematical relationship that assigns to any possible value r of a discrete random variable X the probability Pr(X = r).

Example_14: Many new drugs have been introduced in the last decade to bring hypertension under control, that is, to reduce high blood pressure to normotensive levels. A physician agrees to use a new antihypertensive drug on a trial basis on the first 4 untreated hypertensive patients. From previous experience with the drug, the drug company expects that for any clinical practice the probability that 0 patients out of 4 will be brought under control is 0.008, 1 patient out of 4 is 0.076, 2 patients out of 4 is 0.265, 3 patients out of 4 is 0.411 and all 4 patients out of 4 is 0.240.
This probability mass function, or probability distribution, is displayed in the next table:

r          0       1       2       3       4
Pr(X=r)    0.008   0.076   0.265   0.411   0.240

The probability of any particular value must be between 0 and 1, and the sum of the probabilities of all values must equal exactly 1. Thus, $0 < \Pr(X = r) \le 1$ and $\sum_r \Pr(X = r) = 1$.
For our example: 0.008 + 0.076 + 0.265 + 0.411 + 0.240 = 1

The expected value of a discrete random variable

The expected value of a discrete random variable is defined as:
\[ E(X) = \sum_{i=1}^{k} x_i \Pr(X = x_i) \]

Example_15: Find the expected value for the random variable depicted in Example_14.

Solution: E(X) = 0(0.008) + 1(0.076) + 2(0.265) + 3(0.411) + 4(0.240) = 2.80
Thus, on average about 2.8 hypertensives would be expected to be brought under control for every 4 that are treated.

Example_16: Consider the random variable with the probability mass function given in the next table, representing the number of episodes of otitis media in the first 2 years of life:

r          0       1       2       3       4       5       6
Pr(X=r)    0.129   0.264   0.271   0.185   0.095   0.039   0.017

a. What is the expected number of episodes of otitis media in the first 2 years of life?
b. Compute the variance and standard deviation of the random variable.

Solution:
a. \[ M(X) = \sum_{i=1}^{n} x_i \Pr(x_i) \]
M(X) = 0×0.129 + 1×0.264 + 2×0.271 + 3×0.185 + 4×0.095 + 5×0.039 + 6×0.017
M(X) = 2.038
Thus, on average a child would be expected to have 2 episodes of otitis media in the first 2 years of life.
b. The variance is given by:
\[ V(X) = \sum_{i=1}^{n} [x_i - M(X)]^2 \Pr(x_i) \]
V(X) = 1.967
The standard deviation is given by: $\sigma = \sqrt{1.967} = 1.402$

Permutation and combination

The number of permutations of n things taken k at a time is
\[ {}_nP_k = n(n-1) \times \dots \times (n-k+1) \]
It represents the number of ways of selecting k items out of n, where the order of selection is important.

Example_17: Suppose there are 3 female schizophrenics aged 50-59 and 6 eligible controls living in the same community. How many ways are there of selecting three controls?
Solution: To answer this question, consider the number of permutations of 6 things taken 3 at a time:
\[ {}_6P_3 = 6 \times 5 \times 4 = 120 \]
Thus there are 120 ways of choosing the controls. For example, one way would be to match control A to case 1, control B to case 2 and control C to case 3. Another way would be to match control F to case 1, control C to case 2 and control D to case 3.

n! = n factorial and is defined as n! = n(n-1) × ... × 2 × 1

Example_18: Evaluate 5!
Solution: 5! = 5×4×3×2×1 = 120

The quantity 0! has no intuitive meaning, but for consistency it is defined as 1.

The number of combinations of n things taken k at a time is:
\[ {}_nC_k = \frac{n!}{k!(n-k)!} \]

Example_19: Evaluate ${}_7C_3$
Solution:
\[ {}_7C_3 = \frac{7 \times 6 \times 5}{3 \times 2 \times 1} = 7 \times 5 = 35 \]

The Binomial Distribution

All examples involving the binomial distribution have a common structure: a sample of n independent trials, each of which can have only two possible outcomes, which are denoted as "success" and "failure". The probability of success at each trial is assumed to be a constant p, and hence the probability of a failure at each trial is 1 - p = q.

Example_20: What is the probability of obtaining 2 boys out of 5 children if the probability of a boy is 0.51 at each birth and the sexes of successive children are considered independent random variables?

Solution: Using a binomial distribution with n = 5, p = 0.51, k = 2, compute
\[ \Pr(X = 2) = {}_5C_2 (0.51)^2 (0.49)^3 = \frac{5 \times 4}{2 \times 1} (0.51)^2 (0.49)^3 = 0.306 \]

Example_21: We know that children develop chronic bronchitis in the first year of life in 3 out of 20 households where both parents are chronic bronchitics, as compared with the national incidence rate of chronic bronchitis, which is 5% in the first year of life. Is this difference real or can it be attributed to chance? Specifically, how likely are infants in at least 3 out of 20 households to develop chronic bronchitis if the probability of developing the disease in any one household is 0.05?
Solution: n = 20, p = 0.05
The probability of observing at least 3 cases with disease out of 20 is given by:
\[ \Pr(X \ge 3) = \sum_{k=3}^{20} {}_{20}C_k (0.05)^k (0.95)^{20-k} = 1 - \sum_{k=0}^{2} {}_{20}C_k (0.05)^k (0.95)^{20-k} = 1 - (0.3585 + 0.3774 + 0.1887) = 0.0754 \]
The observed proportion is 3/20 = 0.15, against a national rate of 0.05; the probability of obtaining at least 3 affected households out of 20 by chance alone is 0.0754.

Problems

Problem_1
Consider a family with a mother, a father and two children. Let
A1 = {mother has influenza}
A2 = {father has influenza}
A3 = {first child has influenza}
A4 = {second child has influenza}
B = {at least one child has influenza}
C = {at least one parent has influenza}
D = {at least one person has influenza}
a. What does $A_1 \cup A_2$ mean?
b. What does $A_1 \cap A_2$ mean?
c. Are A3 and A4 mutually exclusive?
d. What does $A_3 \cup B$ mean?
e. What does $A_3 \cap B$ mean?
f. Express C in terms of A1, A2, A3, A4.
g. Express D in terms of B and C.
h. What does $\bar{A}_1$ mean?
i. What does $\bar{A}_2$ mean?
j. Represent $\bar{C}$ in terms of A1, A2, A3, A4.
k. Represent $\bar{D}$ in terms of B and C.

Problem_2
A drug company is developing a new pregnancy-test kit for use on an outpatient basis. The company uses the pregnancy test on 100 women who are known to be pregnant, of whom 95 are positive using the test. The company uses the pregnancy test on 100 other women who are known not to be pregnant, of whom 99 are negative using the test.
a. What is the sensitivity of the test?
b. What is the specificity of the test?
The company anticipates that of the women who will use the pregnancy-test kit, 10% will actually be pregnant.
c. What is the predictive value positive for the test?

Problem_3
We can classify infants as low birthweight if they have a birthweight $\le$ 2500 g and as normal birthweight if they have a birthweight > 2500 g. Infants are also classified by length of gestation in the following four categories: < 20 weeks, 20 - 27 weeks, 28 - 36 weeks, > 36 weeks.
Assume that the probabilities of the different periods of gestation are as given in the next table:

Length of gestation   Probability
< 20 weeks            0.0004
20 - 27 weeks         0.0059
28 - 36 weeks         0.0855
> 36 weeks            0.9082

Also assume that the probability of being low birthweight is 0.540 given that the length of gestation is < 20 weeks, 0.813 given that the length of gestation is 20 - 27 weeks, 0.379 given that the length of gestation is 28 - 36 weeks, and 0.035 given that the length of gestation is > 36 weeks.
a. What is the probability of having a low birthweight infant?
b. Show that the events (length of gestation $\le$ 27 weeks) and (low birthweight) are not independent.
c. What is the probability of having a length of gestation $\le$ 36 weeks given that a child is low birthweight?

Problem_4
Evaluate the probability of 2 lymphocytes out of 10 white blood cells if the probability that any one cell is a lymphocyte is 0.2.

Problem_5
Evaluate the probabilities of obtaining k neutrophils out of 5 cells for k = 0, 1, 2, 3, 4, 5, where the probability that any one cell is a neutrophil is 0.6.

Problem_6
In a sample of 110 persons there are 50 men, 10 of whom are RH-. Of the women, 8 are RH-.
a. What is the probability that a person from the sample who is RH- is a man?
b. What is the probability that a person from the sample is RH+?
c. What is the probability that, out of 4 persons from the sample, 1 is RH-?

Estimation

Estimation of the Mean of a Distribution

Point estimation of the Mean

A natural estimator to use for estimating the population mean µ is the sample mean:
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]

Example_1: The birthweights from 1000 consecutive deliveries at Boston City Hospital are enumerated in the next table.
              Sample
Individual    1      2      3      4      5
1             2750   5018   2750   2863   3884
2             3317   5613   3544   3232   3345
3             3969   3033   1758   2240   2211
4             2211   2807   3402   3402   3657
5             2807   2948   3742   3260   2466
6             4196   3430   3827   3317   3119
7             3062   4196   3345   3005   3005
8             3827   3771   3884   2438   3289
9             3572   3572   3572   3119   3969
10            3430   3260   3345   3374   2778
Mean          3314

Compute the mean for sample 2.

Solution:
\[ \bar{x}_2 = \frac{5018 + 5613 + 3033 + 2807 + 2948 + 3430 + 4196 + 3771 + 3572 + 3260}{10} = \frac{37648}{10} \approx 3765 \]

Exercise: compute the mean for samples 3-5.

The Variance and the Standard Deviation

The variance is a measure of spread and is defined by:
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} \]
The most usual form of this measure has n-1 in the denominator rather than n. The resulting measure is called the sample variance (or variance) and is defined as follows:
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} \]

The standard deviation

Like the sample variance, the sample standard deviation, or standard deviation, is a measure of spread and is defined as follows:
\[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} = \sqrt{\text{sample variance}} \]

Standard Error of the Mean

Let x1, ..., xn be a random sample from a population with underlying mean µ and variance σ². The set of sample means in repeated random samples of size n from this population has variance σ²/n. The standard error of the mean, or standard error, is defined as follows:
\[ SE = \frac{\sigma}{\sqrt{n}} \]

Example_2: Compute the standard error of the mean for the third sample from Example_1.

The mean of the third sample is 3317 (3316.9 before rounding). Compute the variance of this sample using the formula:
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} \]
For our data, the variance is given by:
\[ s^2 = \frac{1}{10}\left[(2750-3317)^2 + (3544-3317)^2 + (1758-3317)^2 + (3402-3317)^2 + (3742-3317)^2 + (3827-3317)^2 + (3345-3317)^2 + (3884-3317)^2 + (3572-3317)^2 + (3345-3317)^2\right] \]
\[ s^2 = \frac{321489 + 51529 + 2430481 + 7225 + 180625 + 260100 + 784 + 321489 + 65025 + 784}{10} = \frac{3639531}{10} = 363953.1 \]
The standard deviation of the sample is:
\[ s = \sqrt{s^2} = \sqrt{363953.1} = 603.28 \approx 603 \]
The standard error of the mean is given by:
\[ SE = \frac{s}{\sqrt{n}} = \frac{603}{\sqrt{10}} = \frac{603}{3.162} \approx 190.7 \]

Exercise: compute the standard error of the mean for the rest of the samples in Example_1.
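The calculation in Example_2 can be checked with a short Python sketch. The function name `describe_sample` and the `ddof` parameter are illustrative choices, not part of the course text; `ddof=0` reproduces the divisor n used in Example_2, and `ddof=1` gives the usual sample variance with n-1. Because the code works with the unrounded mean and standard deviation, its standard error differs slightly in the first decimal from the hand calculation.

```python
import math

def describe_sample(values, ddof=0):
    """Return (mean, variance, standard deviation, standard error).

    ddof=0 divides the sum of squares by n (as in Example_2);
    ddof=1 divides by n-1 (the usual sample variance).
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / (n - ddof)
    sd = math.sqrt(variance)
    se = sd / math.sqrt(n)
    return mean, variance, sd, se

# Third sample from the table in Example_1
sample_3 = [2750, 3544, 1758, 3402, 3742, 3827, 3345, 3884, 3572, 3345]

mean, variance, sd, se = describe_sample(sample_3)
print(f"mean={mean:.1f} variance={variance:.1f} sd={sd:.1f} se={se:.1f}")
```

Passing the other columns of the table to the same function gives the values asked for in the exercise.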
Confidence interval

Known variance and n large (>30)

A 95% confidence interval for µ when σ is known is defined by:
\[ \left(m - Z_\alpha \frac{\sigma}{\sqrt{n}} \; ; \; m + Z_\alpha \frac{\sigma}{\sqrt{n}}\right) \]

Example_3: Compute a 95% confidence interval for the mean basal body temperature using the data 97.2, 96.8, 97.4, 97.4, 97.3, 97.0, 97.1, 97.3, 97.2, 97.3, assuming that the standard deviation is 0.20.

Solution: First we must compute the mean of the data:
\[ m = \frac{97.2 + 96.8 + 97.4 + 97.4 + 97.3 + 97.0 + 97.1 + 97.3 + 97.2 + 97.3}{10} = 97.2 \]
Now we can compute the confidence interval for α = 0.05 (Zα = 1.96). The 95% confidence interval is given by:
\[ \left(m - Z_\alpha \frac{\sigma}{\sqrt{n}} \; ; \; m + Z_\alpha \frac{\sigma}{\sqrt{n}}\right) = \left(97.2 - 1.96 \frac{0.2}{\sqrt{10}} \; ; \; 97.2 + 1.96 \frac{0.2}{\sqrt{10}}\right) = (97.2 - 0.12 \; ; \; 97.2 + 0.12) = (97.08 \; ; \; 97.32) \]

Example_4: Consider the 5 samples of size 10 from the population of birthweights shown in Example_1. Assume that σ is known to be 600. The interval
\[ \left(\bar{x} - 1.96 \frac{\sigma}{\sqrt{n}} \; ; \; \bar{x} + 1.96 \frac{\sigma}{\sqrt{n}}\right) = \left(\bar{x} - 1.96 \frac{600}{\sqrt{10}} \; ; \; \bar{x} + 1.96 \frac{600}{\sqrt{10}}\right) = (\bar{x} - 372 \; ; \; \bar{x} + 372) \]
will be different for each sample and is given in the next figure:

Sample   Mean   Lower limit (mean - 372)   Upper limit (mean + 372)
1        3314   2942                       3686
2        3765   3393                       4137
3        3317   2945                       3689
4        3025   2653                       3397
5        3172   2800                       3544

A dashed line has been added to the figure to represent an imaginary value for µ. The idea is that over a large number of hypothetical samples of size 10, 95% of such intervals will contain the parameter µ.

A 100% x (1-α) confidence interval for the mean is defined by the interval:
\[ \left(m - Z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} \; ; \; m + Z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}\right) \]

Factors affecting the length of a confidence interval

The length of a 100% x (1-α) confidence interval is $2 z_{1-\alpha/2}\, \sigma / \sqrt{n}$ and is determined by n, σ and α.
a. As the sample size (n) increases, the length of the confidence interval decreases.
b. As the standard deviation (σ), which reflects the variability of individual observations, increases, the length of the confidence interval increases.
c. As the confidence desired increases (α decreases), the length of the confidence interval increases.
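The interval from Example_3 and the effect of n on its length can be sketched in Python. The helper name `z_confidence_interval` is an assumption made for illustration; the quantile comes from the standard library's `statistics.NormalDist` (Python 3.8 or later), which returns about 1.95996 where the text rounds to 1.96.

```python
import math
from statistics import NormalDist

def z_confidence_interval(mean, sigma, n, confidence=0.95):
    """Two-sided confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{1 - alpha/2}
    half_width = z * sigma / math.sqrt(n)
    return mean - half_width, mean + half_width

# Basal body temperature data from Example_3, sigma assumed known (0.20)
temps = [97.2, 96.8, 97.4, 97.4, 97.3, 97.0, 97.1, 97.3, 97.2, 97.3]
m = sum(temps) / len(temps)
lo, hi = z_confidence_interval(m, 0.20, len(temps))
print(f"95% CI: ({lo:.2f}; {hi:.2f})")

# Length of the interval for increasing sample sizes (factor a in the list)
lengths = []
for n in (10, 100, 1000):
    a, b = z_confidence_interval(m, 0.20, n)
    lengths.append(b - a)
print(lengths)
```

Raising `confidence` to 0.99 or increasing `sigma` in the same call widens the interval, which matches factors b and c.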
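Example_4: Compute a 95% confidence interval for the underlying mean basal body temperature, assuming that the mean of the sample is 97.2, the number of days sampled is 100 and the standard deviation is 0.2. Compute the 99% confidence interval for the same data assuming that the standard deviation is 0.4.

Solution: The 95% confidence interval is given by:
\[ \left(m - Z_\alpha \frac{\sigma}{\sqrt{n}} \; ; \; m + Z_\alpha \frac{\sigma}{\sqrt{n}}\right) = \left(97.2 - 1.96 \frac{0.2}{\sqrt{100}} \; ; \; 97.2 + 1.96 \frac{0.2}{\sqrt{100}}\right) = (97.2 - 0.04 \; ; \; 97.2 + 0.04) = (97.16 \; ; \; 97.24) \]
The 99% confidence interval is given by:
\[ \left(m - Z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} \; ; \; m + Z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}\right), \quad \text{where } Z_{1-\frac{\alpha}{2}} = Z_{1-\frac{0.01}{2}} = Z_{0.995} = 2.576 \]
\[ \left(m - Z_{0.995} \frac{\sigma}{\sqrt{n}} \; ; \; m + Z_{0.995} \frac{\sigma}{\sqrt{n}}\right) = \left(97.2 - 2.576 \frac{0.4}{\sqrt{100}} \; ; \; 97.2 + 2.576 \frac{0.4}{\sqrt{100}}\right) = (97.2 - 0.1 \; ; \; 97.2 + 0.1) = (97.1 \; ; \; 97.3) \]

Confidence interval – unknown variance

A 100% x (1-α) confidence interval for the mean is given by:
\[ \left(m - t_{n-1,\,1-\frac{\alpha}{2}} \frac{s}{\sqrt{n}} \; ; \; m + t_{n-1,\,1-\frac{\alpha}{2}} \frac{s}{\sqrt{n}}\right) \]

Example_5: Suppose we have the birth weight data from a sample of 10 newborn children: 97, 117, 140, 78, 99, 148, 108, 135, 126, 121. Compute a 95% confidence interval for the mean assuming that the variance is unknown.

Solution:
a. Compute the mean of the sample:
\[ m = \frac{97 + 117 + 140 + 78 + 99 + 148 + 108 + 135 + 126 + 121}{10} = 116.9 \]
b. Compute the standard deviation: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$, $s = \sqrt{s^2}$
\[ s^2 = \frac{1}{10-1}\left[(97-116.9)^2 + (117-116.9)^2 + (140-116.9)^2 + (78-116.9)^2 + (99-116.9)^2 + (148-116.9)^2 + (108-116.9)^2 + (135-116.9)^2 + (126-116.9)^2 + (121-116.9)^2\right] \]
\[ s^2 = \frac{1}{9}\left[396.01 + 0.01 + 533.61 + 1513.21 + 320.41 + 967.21 + 79.21 + 327.61 + 82.81 + 16.81\right] = \frac{4236.9}{9} = 470.77 \]
\[ s = \sqrt{s^2} = \sqrt{470.77} = 21.7 \]
c. Compute the 95% confidence interval:
\[ \left(m - t_{n-1,\,1-\frac{\alpha}{2}} \frac{s}{\sqrt{n}} \; ; \; m + t_{n-1,\,1-\frac{\alpha}{2}} \frac{s}{\sqrt{n}}\right) = \left(116.9 - t_{9,\,0.975} \frac{21.7}{\sqrt{10}} \; ; \; 116.9 + t_{9,\,0.975} \frac{21.7}{\sqrt{10}}\right) = \left(116.9 - 2.262 \frac{21.7}{\sqrt{10}} \; ; \; 116.9 + 2.262 \frac{21.7}{\sqrt{10}}\right) = (101.38 \; ; \; 132.42) \]

Confidence interval – sampling proportion

Consider the problem of estimating the prevalence p of a disease in a population.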
If f is the sample proportion of the disease in a sample of size n, the 100% x (1-α) confidence interval for p is given by:
\[ \left(f - Z_{1-\frac{\alpha}{2}} \sqrt{\frac{f(1-f)}{n}} \; ; \; f + Z_{1-\frac{\alpha}{2}} \sqrt{\frac{f(1-f)}{n}}\right) \]

Example_6: Suppose we are interested in estimating the prevalence rate of breast cancer among 50-54-year-old women whose mothers have had breast cancer. Suppose that in a random sample of 10000 such women, 400 are found to have had breast cancer at some point in their lives. Compute a 95% confidence interval for the prevalence rate of breast cancer.

Solution: The best point estimate of the prevalence rate is given by the proportion:
\[ f = \frac{400}{10000} = 0.04 \]
An approximate 95% confidence interval for α = 0.05 and Zα = 1.96 is given by:
\[ \left(0.04 - 1.96 \sqrt{\frac{0.04 \times 0.96}{10000}} \; ; \; 0.04 + 1.96 \sqrt{\frac{0.04 \times 0.96}{10000}}\right) = (0.04 - 0.004 \; ; \; 0.04 + 0.004) = (0.036 \; ; \; 0.044) \]

Problems

Problem_1: A study of psychological and physiological changes in a cohort of dialysis patients with end-stage renal disease was conducted. 102 patients were initially ascertained at baseline; 69 of the 102 patients were reascertained at an 18-month follow-up visit. The data in the next table were reported:

             E. coli                S. aureus              P. aeruginosa
Laboratory   Different   Common     Different   Common     Different   Common
             media       medium     media       medium     media       medium
A            27.5        23.8       25.4        23.9       20.1        16.7
B            24.6        21.1       24.8        24.2       18.4        17.0
C            25.3        25.4       24.6        25.0       16.8        17.1
D            28.7        25.4       29.8        26.7       21.7        18.2
E            23.0        24.8       27.5        25.3       20.1        16.7
F            26.8        25.7       28.1        25.2       20.3        19.2
G            24.7        26.8       31.2        27.1       22.8        18.8
H            24.3        26.2       24.3        26.5       19.9        18.1
I            24.9        26.3       25.4        25.1       19.3        19.2

a. Provide a point estimate and an interval estimate (95% confidence interval) for the mean of each of the parameters at baseline and follow-up.
b. Do you have any opinion on the physiological and psychological changes in this group of patients?

Problem_2: Suppose we wish to estimate the concentration (µg/mL) of a specific dose of ampicillin in the urine after various periods of time.
We recruit 25 volunteers and find that they have a mean concentration of 7.0 µg/mL with a standard deviation of 2.0 µg/mL. Assume that the underlying population distribution of concentrations is normal.
a. Find a 95% confidence interval for the population mean concentration.
b. Find a 99% confidence interval for the population variance of the concentrations.
c. How large a sample would be needed to ensure that the length of the confidence interval in a is 0.5 µg/mL, if we assume that the sample standard deviation remains at 2.0 µg/mL?

Hypothesis testing

General concepts

Steps in testing a statistical hypothesis:

Step 1. State the research question in terms of statistical hypotheses. The hypotheses are formulated as null and alternative hypotheses, which can be defined as follows:
The null hypothesis, denoted by H0, is the hypothesis that is to be tested.
The alternative hypothesis, denoted by H1, is the hypothesis that in some sense contradicts the null hypothesis.
As a result, there are four possible outcomes:
1. We accept H0, and H0 is in fact true.
2. We accept H0, and H1 is in fact true.
3. We reject H0, and H0 is in fact true.
4. We reject H0, and H1 is in fact true.

Step 2. Decide on the appropriate test statistic (parameter of the test) for the hypothesis. The test statistic has a known probability distribution if the null hypothesis is assumed true.
The probability of a type I error is the probability of rejecting the null hypothesis given that H0 is true. It is usually denoted by α and is commonly referred to as the significance level of a test.
The probability of a type II error is the probability of accepting the null hypothesis given that H1 is true. It is usually denoted by β.

Step 3. Select the level of significance for the statistical test, or alpha value. This is the probability of incorrectly rejecting the null hypothesis when it is actually true. Traditional value: α = 0.05.

Step 4.
Perform the calculation of the test statistic.

Step 5. State the conclusion.
State the conclusion with the critical area:
If the test statistic is in the RA (rejection area), then accept H1 and reject H0.
If the test statistic is in the AA (acceptance area), then accept H0 and reject H1.
State the conclusion with the p-value:
The p-value for a hypothesis test is the α level at which we would be indifferent between accepting and rejecting H0. The importance of the p-value is that it tells us exactly how significant the results are without performing repeated significance tests at different α levels.

The significance of a p-value:
If $0.01 \le p < 0.05$, then the results are significant.
If $0.001 \le p < 0.01$, then the results are highly significant.
If $p < 0.001$, then the results are very highly significant.
If $p > 0.05$, the results are considered not statistically significant.
If $0.05 \le p < 0.1$, then a trend toward statistical significance is sometimes noted.

The power of the test is defined as: 1 - β = 1 - probability of a type II error.

A one-tailed test is a test in which the values of the parameter being studied under the alternative hypothesis are allowed to be either greater than or less than the values of the parameter under the null hypothesis, but not both.
A two-tailed test is a test in which the values of the parameter being studied (in this case µ) under the alternative hypothesis are allowed to be either greater than or less than the values of the parameter under the null hypothesis (µ0).

We apply the general concepts to several one-sample hypothesis-testing situations:
the mean of a normal distribution with known variance (one-sample z test)
the mean of a normal distribution with unknown variance (one-sample t test)
the variance of a normal distribution (one-sample chi-square test)

Each of the hypothesis tests can be concluded in one of two ways:
specify critical values to determine the acceptance and rejection regions (critical-value method)
compute p-values (p-value method).
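The p-value thresholds listed above can be written as a small lookup helper. This is only a sketch: the function name `describe_p_value` and the exact wording of the returned labels are illustrative choices. The boundaries follow the list, so p = 0.05 falls in the "trend" band rather than the "significant" band.

```python
def describe_p_value(p):
    """Map a p-value to the verbal significance categories listed in the text."""
    if not 0 <= p <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    if p < 0.001:
        return "very highly significant"
    if p < 0.01:
        return "highly significant"
    if p < 0.05:
        return "significant"
    if p < 0.1:
        return "not significant (trend toward significance)"
    return "not significant"

for p in (0.0004, 0.005, 0.034, 0.07, 0.30):
    print(p, "->", describe_p_value(p))
```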
One-Sample Test for the Mean of a Normal Distribution with Known Variance: Two-Sided Alternative

Example_1: Suppose we want to compare fasting serum-cholesterol levels among recent Asian immigrants to the United States with typical levels found in the general population in the United States. Suppose we assume that cholesterol levels in women aged 21-40 in the United States are approximately normally distributed with mean 190 mg/dL and standard deviation 40 mg/dL. It is not known whether cholesterol levels among recent Asian immigrants are higher or lower than those in the general U.S. population. We assume that the levels among recent Asian immigrants are normally distributed with unknown mean µ and standard deviation 40. We wish to test the null hypothesis H0: µ = µ0 = 190, σ² = 1600 versus the alternative hypothesis H1: µ ≠ µ0, σ² = 1600. Blood tests are performed on 100 female Asian immigrants aged 21-40 and the mean level is found to be 181.52 mg/dL. What can be concluded on the basis of this evidence?

One-Sample Test for the Mean of a Normal Distribution with Known Variance

To test the hypothesis H0: µ = µ0, σ = σ0 versus the one-sided alternative H1: µ < µ0, σ = σ0 with a significance level α, the best (most powerful) test is based on $\bar{x}$:
If $\frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} < z_\alpha$, then H0 is rejected.
If $\frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \ge z_\alpha$, then H0 is accepted.

To test the hypothesis H0: µ = µ0, σ = σ0 versus the two-sided alternative H1: µ ≠ µ0, σ = σ0 with a significance level of α, we compute
\[ z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \]
If $z < z_{\alpha/2}$ or $z > z_{1-\alpha/2}$, then we reject H0.
If $z_{\alpha/2} \le z \le z_{1-\alpha/2}$, then we accept H0.

The value z is called a test statistic, because the test procedure is based on this statistic. The values $z_{\alpha/2}$ and $z_{1-\alpha/2}$ are called critical values, because the outcome of the test depends on whether the test statistic satisfies $z < z_{\alpha/2}$ or $z > z_{1-\alpha/2}$, whereby we reject H0, or $z_{\alpha/2} \le z \le z_{1-\alpha/2}$, whereby we accept H0.
The general approach, in which we compute a test statistic and determine the outcome of a test by comparing the test statistic to a critical value determined by the type I error, is called the critical-value method of hypothesis testing.

Example_2: Test the hypothesis that the cholesterol levels of recent Asian immigrants are different from those in the general United States population using the data in Example_1.

Solution: We compute the test statistic:
\[ z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{181.52 - 190}{40 / \sqrt{100}} = \frac{-8.48}{4} = -2.12 \]

For the two-sided test with $\alpha = 0.05$, the critical values are $z_{\alpha/2} = z_{0.025} = -1.96$ and $z_{1-\alpha/2} = z_{0.975} = 1.96$.

Since $z = -2.12 < z_{\alpha/2} = -1.96$, it follows that we reject $H_0$ at the 5% level of significance. We conclude that the mean cholesterol level of recent Asian immigrants is significantly lower than the mean for the general U.S. population.

Alternatively, we might want to compute a p-value. A p-value is computed in two different ways, depending on whether $z$ is less than or greater than 0.

p-value for the One-Sample z Test for the Mean of a Normal Distribution with Known Variance (Two-Sided Alternative)

\[ z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \]

p-value:
\[ p = 2\,\Phi(z) \quad \text{if } z \le 0 \]
\[ p = 2\,[1 - \Phi(z)] \quad \text{if } z > 0 \]

If $p < 0.05$, then $H_0$ is rejected and the results are declared statistically significant.
If $p \ge 0.05$, then $H_0$ is accepted and the results are declared not statistically significant.

We will refer to this approach as the p-value method.

Example_3: Compute the p-value for the hypothesis test in Example_1.

Solution: Since $z = -2.12$, the p-value for the test is twice the left-hand tail area:
\[ p = 2\,\Phi(z) = 2\,\Phi(-2.12) = 2\,[1 - \Phi(2.12)] = 2\,(1 - 0.983) = 0.034 \]

The results are statistically significant with a p-value of 0.034.
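The critical-value method of Example_2 and the p-value method of Example_3 can be checked together in a few lines. This is a minimal sketch using only Python's standard library; the function name `one_sample_z_test` and its return layout are assumptions made for illustration, not part of the text.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Two-sided one-sample z test with known sigma (illustrative helper).

    Returns the test statistic z, the upper critical value z_{1-alpha/2},
    the two-sided p-value, and whether H0 is rejected at level alpha.
    """
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    z = (xbar - mu0) / (sigma / sqrt(n))
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # p-value rule from the text: 2*Phi(z) if z <= 0, else 2*[1 - Phi(z)].
    if z <= 0:
        p = 2 * std_normal.cdf(z)
    else:
        p = 2 * (1 - std_normal.cdf(z))
    # Critical-value rule: reject if z < z_{alpha/2} or z > z_{1-alpha/2}.
    reject = z < -z_crit or z > z_crit
    return z, z_crit, p, reject


# Data from Example_1: xbar = 181.52, mu0 = 190, sigma = 40, n = 100.
z, z_crit, p, reject = one_sample_z_test(181.52, 190, 40, 100)
print(round(z, 2), round(z_crit, 2), round(p, 3), reject)
```

Both methods lead to the same decision here because the p-value falls below 0.05 exactly when the statistic falls outside the two critical values.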
The Power of a Test

Power of a One-Sample z Test for the Mean of a Normal Distribution with Known Variance (Two-Sided Alternative)

The power of the two-sided test $H_0: \mu = \mu_0$ versus $H_1: \mu \neq \mu_0$ for the specific alternative $\mu = \mu_1$, where the underlying distribution is normal and the population variance ($\sigma^2$) is known, is given by:
\[ \text{Power} = \Phi\left( z_{\alpha/2} + \frac{|\mu_0 - \mu_1| \sqrt{n}}{\sigma} \right) \]

Example_4: A new drug in the class of calcium channel blockers is to be tested for the treatment of patients with unstable angina, a severe type of angina. The effect this drug will have on heart rate is unknown. Suppose that 20 patients are to be studied and the change in heart rate after 48 hours is known to have a standard deviation of 10 beats per minute. What power would such a study have of detecting a significant difference in heart rate over 48 hours if it is hypothesized that the true mean change in heart rate from baseline to 48 hours could be 5 beats per minute in either direction?

Solution: The power is given by:
\[ \text{Power} = \Phi\left( z_{\alpha/2} + \frac{|\mu_0 - \mu_1| \sqrt{n}}{\sigma} \right) = \Phi\left( z_{0.05/2} + \frac{5 \sqrt{20}}{10} \right) = \Phi(-1.96 + 2.236) = \Phi(0.276) = 0.61 \]

The study would have a 61% chance of detecting a significant difference.

One-Sample t Test

To test the hypothesis $H_0: \mu = \mu_0$ versus $H_1: \mu \neq \mu_0$ with significance level $\alpha$, assuming that $\sigma^2$ is the same under both hypotheses and is unknown, the best test is based on the test statistic $t$, given by:
\[ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \]

If $t < t_{n-1,\,\alpha/2}$ or $t > t_{n-1,\,1-\alpha/2}$, then $H_0$ is rejected.
If $t_{n-1,\,\alpha/2} \le t \le t_{n-1,\,1-\alpha/2}$, then $H_0$ is accepted.

Note that from the symmetry of the t distribution:
\[ t_{n-1,\,\alpha/2} = -\,t_{n-1,\,1-\alpha/2} \]

Example_5: Occupational medicine is a relatively new field in medicine, whereby specific health hazards are identified for particular occupations. One topic of recent interest is the effect of fire fighting on pulmonary function. Suppose a group of 26 male fire fighters aged 25-35 years is identified and the change in their pulmonary function over a 5-year period is measured.
Over 5 years it is found that the fire fighters have a mean decline in forced expiratory volume (FEV), which is the volume of air expelled in 1 second, of 0.27 liters with a sample standard deviation of 0.32 liters. Can any conclusions be drawn about the occupational exposure if the expected change over 5 years is 0.10 liters in normal males in this age group?

Solution: A two-sided test will be used, since the pulmonary function of fire fighters may decline either more than expected because of exposure or less than expected because of the likelihood of their being healthier than the general population. Assume that the decline in FEV is normally distributed with mean $\mu$ and that the variance $\sigma^2$ is unknown. To test $H_0: \mu = 0.10$ versus $H_1: \mu \neq 0.10$ we compute the test statistic:
\[ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{0.27 - 0.10}{0.32 / \sqrt{26}} = \frac{0.17}{0.063} = 2.70 \]

Under $H_0$, $t$ follows a t distribution with 25 degrees of freedom. We know from the table that $t_{25,\,0.99} = 2.485$ and $t_{25,\,0.995} = 2.787$, and therefore the p-value is between 2(1 - 0.995) = 0.01 and 2(1 - 0.99) = 0.02. This result is statistically significant with $0.01 < p < 0.02$, and we conclude that the pulmonary function of fire fighters declines significantly faster than that of typical 25-35-year-old males.

The Relationship between Hypothesis Testing and Confidence Intervals (Two-Sided Case)

Suppose we are testing $H_0: \mu = \mu_0$, $\sigma^2 = \sigma_0^2$ versus $H_1: \mu \neq \mu_0$, $\sigma^2 = \sigma_0^2$. $H_0$ is rejected with a two-sided level $\alpha$ test if and only if the two-sided $100\% \times (1 - \alpha)$ confidence interval for $\mu$ does not contain $\mu_0$. $H_0$ is accepted with a two-sided level $\alpha$ test if and only if the two-sided $100\% \times (1 - \alpha)$ confidence interval for $\mu$ contains $\mu_0$.

Example_6: Consider the cholesterol data in Example_1. The two-sided 95% confidence interval for $\mu$ is given by:
\[ \left( \bar{x} - z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}},\; \bar{x} + z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}} \right) = \left( 181.52 - \frac{1.96\,(40)}{10},\; 181.52 + \frac{1.96\,(40)}{10} \right) = (181.52 - 7.84,\; 181.52 + 7.84) = (173.68,\; 189.36) \]

This confidence interval contains all values of $\mu_0$ for which we would accept $H_0: \mu = \mu_0$ and does not contain any value of $\mu_0$ for which we would reject $H_0$. In particular, $\mu_0 = 190$ lies outside the interval, so $H_0: \mu = 190$ is rejected at the 5% level.
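The arithmetic in the power calculation for the heart-rate study, the fire-fighter t statistic, and the cholesterol confidence interval can be re-derived numerically. The sketch below uses only the standard library; all three function names are assumptions chosen for illustration. Because the standard library has no t distribution, the t statistic is compared against the table values quoted in the text ($t_{25,\,0.99} = 2.485$, $t_{25,\,0.995} = 2.787$) rather than converted to an exact p-value.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def z_test_power(delta, sigma, n, alpha=0.05):
    """Power = Phi(z_{alpha/2} + |mu0 - mu1| * sqrt(n) / sigma)."""
    z_alpha_half = STD_NORMAL.inv_cdf(alpha / 2)  # negative, about -1.96
    return STD_NORMAL.cdf(z_alpha_half + abs(delta) * sqrt(n) / sigma)


def one_sample_t_statistic(xbar, mu0, s, n):
    """t = (xbar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    return (xbar - mu0) / (s / sqrt(n))


def z_confidence_interval(xbar, sigma, n, alpha=0.05):
    """Two-sided 100%(1 - alpha) interval for mu when sigma is known."""
    half_width = STD_NORMAL.inv_cdf(1 - alpha / 2) * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


# Heart-rate study: difference of 5 bpm, sigma = 10, n = 20.
power = z_test_power(5, 10, 20)
print("power:", round(power, 2))

# Fire fighters: xbar = 0.27, mu0 = 0.10, s = 0.32, n = 26.
t = one_sample_t_statistic(0.27, 0.10, 0.32, 26)
print("t:", round(t, 2), "between table values:", 2.485 < t < 2.787)

# Cholesterol data: xbar = 181.52, sigma = 40, n = 100.
low, high = z_confidence_interval(181.52, 40, 100)
print("95% CI:", round(low, 2), round(high, 2), "contains 190:", low <= 190 <= high)
```

The unrounded t statistic comes out slightly above the 2.70 reported in the solution, because the solution rounds the denominator to 0.063 before dividing; the conclusion $0.01 < p < 0.02$ is unaffected.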
One-Sample $\chi^2$ Test

Used to test the association between two qualitative variables, each with two mutually exclusive values. Example: illness and risk factor.

We must test the hypothesis $H_0$: risk and illness are independent versus $H_1$: risk and illness are dependent.

We use two 2 x 2 contingency tables: one observed and one expected. The test statistic is given by:
\[ \chi^2 = \sum_{i=1}^{LC} \frac{(f_i^{o} - f_i^{t})^2}{f_i^{t}} \]
where $f_i^{o}$ and $f_i^{t}$ are the observed and expected frequencies, and the sum runs over all cells of the table.

Example_7: Suppose we wish to investigate, in a population, whether a risk factor (stress) can be associated with a specific illness (hypertension). A sample of 500 people was observed, with the following result:

            FR+     FR-
    HTA+    100     120
    HTA-     70     210

a. What are the appropriate hypotheses?
b. What are the appropriate procedures to test these hypotheses? (Use $\alpha = 0.05$; for this value and one degree of freedom the rejection region is given by $[3.84, \infty)$.)

Solution: We must test the hypothesis $H_0$: risk and illness are independent versus $H_1$: risk and illness are dependent. In medical words: $H_0$: stress is not a risk factor for hypertension versus $H_1$: stress is a risk factor for hypertension.

The observed contingency table is:

            FR+     FR-     Total
    HTA+    100     120     220
    HTA-     70     210     280
    Total   170     330     500

The expected contingency table is:

            FR+             FR-             Total
    HTA+    220*170/500     220*330/500     220
    HTA-    280*170/500     280*330/500     280
    Total   170             330             500

which, rounded to whole numbers, gives:

            FR+     FR-     Total
    HTA+     75     145     220
    HTA-     95     185     280
    Total   170     330     500

The test statistic is:
\[ \chi^2 = \frac{(100 - 75)^2}{75} + \frac{(120 - 145)^2}{145} + \frac{(70 - 95)^2}{95} + \frac{(210 - 185)^2}{185} = 22.6 \]

RA = $[3.84, \infty)$ and $22.6 \in$ RA.

In conclusion, $H_1$ is accepted; stress is a risk factor for hypertension.

The Paired t Test

To test the hypothesis $H_0: \Delta = 0$ versus $H_1: \Delta \neq 0$, where $\Delta$ is the true mean difference and the variance is unknown, the best test is based on the test statistic $t$, given by:
\[ t = \frac{\bar{d}}{s_d / \sqrt{n}} \]
where $\bar{d}$ is the mean difference
\[ \bar{d} = \frac{d_1 + d_2 + \dots + d_n}{n}, \]
$s_d$ is the sample standard deviation of the differences
\[ s_d = \sqrt{ \frac{ \sum_{i=1}^{n} d_i^2 - \left( \sum_{i=1}^{n} d_i \right)^2 / n }{ n - 1 } } \]
and $n$ = number of matched pairs.

If $t > t_{n-1,\,1-\alpha/2}$ or $t < -t_{n-1,\,1-\alpha/2}$, then $H_0$ is rejected.
If $-t_{n-1,\,1-\alpha/2} \le t \le t_{n-1,\,1-\alpha/2}$, then $H_0$ is accepted.

Example_8: Suppose the paired-sample study design is adopted and the sample data in Table .... are obtained. The systolic blood-pressure (bp) level of the ith woman is denoted at baseline by $x_{i1}$ and at follow-up by $x_{i2}$.

     i    Systolic bp while not     Systolic bp while     d_i = x_i2 - x_i1
          using OC's (x_i1)         using OC's (x_i2)
     1          115                       128                   13
     2          112                       115                    3
     3          107                       106                   -1
     4          119                       128                    9
     5          115                       122                    7
     6          138                       145                    7
     7          126                       132                    6
     8          105                       109                    4
     9          104                       102                   -2
    10          115                       117                    2

Table ... Systolic blood-pressure levels (mm Hg) in 10 women while not using (baseline) and while using (follow-up) oral contraceptives.

Assess the statistical significance of the OC-BP data in Table ....

Solution:
\[ \bar{d} = \frac{13 + 3 - 1 + 9 + 7 + 7 + 6 + 4 - 2 + 2}{10} = 4.80 \]
\[ s_d^2 = \frac{[13^2 + 3^2 + \dots + 2^2] - 10\,(4.80)^2}{9} = 20.844, \qquad s_d = 4.566 \]
\[ t = \frac{4.80}{4.566 / \sqrt{10}} = 3.32 \]

There are 10 - 1 = 9 degrees of freedom, and we know from the table that $t_{9,\,0.975} = 2.262$. Since $t = 3.32 > 2.262$, $H_0$ can be rejected using a two-sided significance test with $\alpha = 0.05$. To compute the p-value, we know from the table that $t_{9,\,0.9995} = 4.781$ and $t_{9,\,0.995} = 3.250$. Thus, since 3.25 < 3.32 < 4.781, it follows that 0.0005 < p/2 < 0.005, or 0.001 < p < 0.01.

Problems

Problem_1
Suppose the annual incidence of asthma in the general population among children 0-4 years of age is 1.4% for boys and 1% for girls.
a. If 10 cases are observed over one year among 500 boys 0-4 years of age with smoking mothers, then test whether there is a significant difference in asthma incidence between this group and the general population using the critical-value method with a two-sided test.
b. Report the p-value corresponding to your answer to part a.

Problem_2
Plasma-glucose levels are used to determine the presence of diabetes. Suppose the mean plasma-glucose concentration (mg/dL) in 35-44-year-olds is 4.86 with standard deviation 0.54.
A study of 100 sedentary persons in this group is planned to test whether they have a higher or lower level of plasma glucose than the general population.
a. If the expected difference is 0.10 units, then what is the power of such a study if a two-sided test is to be used with $\alpha$ = 0.05?
b. Answer the same question as in part a if the expected difference is 0.20 units.

Problem_3
Much discussion has appeared in the medical literature in recent years on the role of diet in the development of heart disease. The serum-cholesterol levels of a group of people who eat a primarily macrobiotic diet are measured. Among 24 of them, aged 20-39, the mean cholesterol level was found to be 175 mg/dL with a standard deviation of 35 mg/dL.
a. If the mean cholesterol level in the general population in this age group is 230 mg/dL and the distribution is assumed to be normal, then test the hypothesis that the group of people on a macrobiotic diet has cholesterol levels different from those of the general population.
b. Compute a 95% confidence interval for the true mean cholesterol level in this group.

Problem_4
One method for assessing the effectiveness of a drug is to note its concentration in blood and/or urine samples at certain periods of time after giving the drug. Suppose we wish to compare the concentrations of two types of aspirin (types A and B) in urine specimens taken from the same person, 1 hour after he or she has taken the drug. Hence a specific dosage of either type A or type B aspirin is given at one time and the 1-hour urine concentration is measured. One week later, after the first aspirin has presumably been cleared from the system, the same dosage of the other aspirin is given to the same person and the 1-hour urine concentration is noted. Since the order of giving the drugs may affect the results, a table of random numbers is used to decide which of the two types of aspirin to give first. This experiment is performed on 10 people; the results are given in table ...
Table x.... Concentration of aspirin in urine

    Person                     Aspirin A 1-hour         Aspirin B 1-hour
                               concentration (mg%)      concentration (mg%)
     1                               15                       13
     2                               26                       20
     3                               13                       10
     4                               28                       21
     5                               17                       17
     6                               20                       22
     7                                7                        5
     8                               36                       30
     9                               12                        7
    10                               18                       11
    Mean                             19.20                    15.60
    sd (standard deviation)           8.63                     7.78

Suppose we wish to test the hypothesis that the concentrations of the two drugs are the same in urine specimens.
a. What are the appropriate hypotheses?
b. What are the appropriate procedures to test these hypotheses?
c. Conduct the test.
d. What is the best point estimate of the difference in concentrations between the two drugs?
e. What is a 95% confidence interval for the mean difference?

Problem_5
Suppose we wish to test, in a population, whether there is an association between smoking and lung cancer. A sample of 500 people was observed, with the following result:

                     smoking+    smoking-
    Lung cancer+       100         120
    Lung cancer-        70         210

a. What are the appropriate hypotheses?
b. What are the appropriate procedures to test these hypotheses? (Use $\alpha$ = 0.05; for this value and one degree of freedom the rejection region is given by $[3.84, \infty)$.)
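The procedures that Problem_4 and Problem_5 call for are the paired t test and the $\chi^2$ test from the worked examples. As a hedged illustration, the sketch below runs both on the data of Example_7 and Example_8 rather than on the problem data; the function names are assumptions made for this sketch. The expected frequencies are left unrounded, so the $\chi^2$ value comes out slightly higher than the 22.6 obtained with the rounded table. The $\chi^2$ p-value uses the identity that, with one degree of freedom, the upper tail area equals `erfc(sqrt(x / 2))`.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Chi-square statistic and p-value (1 df) for a 2 x 2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / n.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    p_value = erfc(sqrt(statistic / 2))  # upper tail, valid for 1 df only
    return statistic, p_value


def paired_t_statistic(first, second):
    """Mean difference, sd of differences, and t for paired samples."""
    d = [b - a for a, b in zip(first, second)]  # d_i = x_i2 - x_i1
    n = len(d)
    d_bar = sum(d) / n
    s_d = sqrt((sum(x * x for x in d) - sum(d) ** 2 / n) / (n - 1))
    return d_bar, s_d, d_bar / (s_d / sqrt(n))


# Example_7: rows are HTA+ / HTA-, columns are FR+ / FR-.
chi2, chi2_p = chi_square_2x2([[100, 120], [70, 210]])
print("chi-square:", round(chi2, 2), "in rejection region:", chi2 >= 3.84)

# Example_8: systolic bp without OC's (x_i1) and with OC's (x_i2).
baseline = [115, 112, 107, 119, 115, 138, 126, 105, 104, 115]
follow_up = [128, 115, 106, 128, 122, 145, 132, 109, 102, 117]
d_bar, s_d, t = paired_t_statistic(baseline, follow_up)
print("d_bar:", d_bar, "s_d:", round(s_d, 3), "t:", round(t, 2))
```

Comparing `t` with the tabulated $t_{9,\,0.975} = 2.262$ then reproduces the decision reached in Example_8.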