ICS3M – Assignment #3 “FINDING THE LINE OF BEST FIT”

In this assignment we are going to use arrays and applets to do some introductory statistics. The data set we are using compares the hours a student studies with the grade achieved in a particular class. The data will be used to create a “scatter plot”, similar to the one shown below.

[Figure: "ICS3M Marks Scatterplot". X-axis: Hours Studied (0 to 20); Y-axis: Grade Achieved (0 to 120).]

The next step in the process will be to add the “Line of Best Fit”. The line of best fit is a linear approximation of the data. (Simply put, it’s the straight line that best fits the data.)

[Figure: "ICS3M Marks Scatterplot" with the Line of Best Fit drawn through the points. X-axis: Hours Studied (0 to 20); Y-axis: Grade Achieved (0 to 120).]

Calculating the Line of Best Fit

Recall that our equation of a line is as follows:

$$Y = mX + b$$

We can calculate the values of m and b using the formulas below, which are written in "sigma" (∑) notation. These formulas might look difficult at first, but if you split them up into pieces and store the result of each piece, they are not too hard to handle.

$$m = \frac{\left(\sum_{i=1}^{N} X_i\right)\left(\sum_{i=1}^{N} Y_i\right) - N\sum_{i=1}^{N}\left(X_i Y_i\right)}{\left(\sum_{i=1}^{N} X_i\right)^2 - N\sum_{i=1}^{N} X_i^2}$$

$$b = \frac{\sum_{i=1}^{N} Y_i}{N} - m\,\frac{\sum_{i=1}^{N} X_i}{N}$$

Unfortunately, the formula for calculating the line of best fit is not straightforward: it involves several summations over the data. Here’s an example of how to convert a sum of N numbers into a loop in Java.

Sigma:

$$\text{sumX} = \sum_{i=1}^{N} X_i$$

Java:

    int sumX = 0;
    for (int i = 0; i < N; i++)
        sumX += x[i];

In sigma notation, we start at 1 and go to N. Remember that array indices start at 0, so we loop from 0 to N – 1.

What is to be submitted:
- Hard copy of the program (print out).
- All program files on a floppy disk (attach to hard copy).
- Saved copy of the program on the shared I: drive.
- External documentation containing:
  - Pseudo-code.
  - A write-up summarizing the program design (1 page minimum).
  - A spreadsheet including data and scatter plot to verify the program's solution.
  - Hand-calculated results (or a spreadsheet showing the totals of the summations above).

Try to follow these steps to complete the assignment successfully:

Step 1
- Complete the pseudo-code (expand on steps 2-6).
- Test the data using spreadsheet software. Do the calculations by hand or with a spreadsheet to verify possible solutions. (This way you can check your program for errors as you progress.)

Step 2
- Start coding your program: set up the applet.
- Try to use methods wherever applicable; they will make the programming easier.

Step 3
- Draw the scatter plot using the data in your array(s).
- Use a dot, an ‘x’ or a small circle for each data point.
- Use a nice colour scheme: try different colours to show pass / fail, or different colours for each Level of Achievement (0-49, 50-59, 60-69, 70-79, 80-100).
- Label your axes.

Step 4
- Calculate the line of best fit.
- Use the hand calculations from Step 1 to check the program for errors.

Step 5
- Draw the line of best fit.
- Label the line.
- Show the equation of the line.

Step 6
- Attempt any of the higher expectations you can do.

Marking Scheme (/100)          NAME: __________________________

SP2.01 – use constants, variables, expressions, and assignment statements to store and manipulate numeric, character and logical data in programs.
SP2.02 – incorporate one-dimensional and two-dimensional arrays into computer programs.
SP2.03 – write programs that use related arrays to store and extract data.

K/U & App.
| Criterion | Level 1 | Level 2 | Level 3 | Level 4 | Marks |
|---|---|---|---|---|---|
| Scatterplot | Scatter plot and points are poorly presented | Scatter plot and points are adequately presented | Scatter plot and points are well presented | Presentation of scatter plot and points is exemplary | /20 |
| Line of Best Fit | Line of best fit is incorrect and poorly displayed | Line of best fit is correct but poorly displayed | Line of best fit is correct and well displayed | Line of best fit is correct, labeled and well displayed | /20 |
| Higher Expectations | *see assignment for possible extras | *see assignment for possible extras | *see assignment for possible extras | *see assignment for possible extras | /30 |
| Total | | | | | /70 |

SPV.03 – produce appropriate internal and external documentation.
SP2.09 – adhere to defined programming style, including naming conventions for variables and subroutines, indentation and spacing.
SP2.10 – incorporate and maintain internal documentation to a specific set of standards, including author, date, file name, purpose, and explanatory comments.
SP2.11 – develop external documentation to summarize the design.

Comm.

| Criterion | Level 1 | Level 2 | Level 3 | Level 4 | Marks |
|---|---|---|---|---|---|
| SP2.09 Naming and Indenting | Has barely adhered to defined programming style. | Has partially adhered to defined programming style. | Has mostly adhered to defined programming style. | Has adhered to defined programming style. | /5 |
| SP2.10 Internal Documentation | Has incorporated few comments and header blocks | Has incorporated some comments and header blocks | Has incorporated many comments and header blocks | Has many meaningful comments and header blocks. | /5 |
| SP2.11 External Documentation | Has poorly summarized the program design | Has adequately summarized the program design | Has mostly summarized the program design | Has effectively summarized the program design | /5 |
| Total | | | | | /15 |

SP1.07 – solve the same problem using various tools (spreadsheet software).
SP1.08 – verify solutions to problems.

Think. / Inq.
| Criterion | Level 1 | Level 2 | Level 3 | Level 4 | Marks |
|---|---|---|---|---|---|
| SP1.07 Solve using spreadsheet software | Has poorly solved the problem using spreadsheet software | Has adequately solved the problem using spreadsheet software | Has mostly solved the problem using spreadsheet software | Has effectively solved the problem using spreadsheet software | /7 |
| SP1.08 Verify Solution | Has poorly described and verified the solution. | Has poorly described or verified the solution. | Has mostly described and verified the solution. | Has excellently described and verified the solution. | /8 |
| Total | | | | | /15 |

Higher Expectations

Additions to the project must be included in the program summary documentation. (5 marks each)

- Calculating and displaying the mean (average) for X & Y.

$$\text{mean}_X = \frac{\sum_{i=1}^{N} X_i}{N}$$

- Calculating and displaying the median (middle value) for X & Y.
- Calculating and displaying the mode (most frequently occurring value) for X & Y.
- Displaying all the data points in the applet (Labels).
- Calculating and displaying the variance and standard deviation.

$$\text{var} = S^2 = \frac{\sum_{i=1}^{N} \left(X_i - \bar{X}\right)^2}{N - 1} \qquad \text{S.Dev} = S = \sqrt{\text{var}}$$

- Instead of using Labels to list the coordinates, make your applet interactive by using TextFields and allowing the user to change data points. Update all of the information on screen after every change. (15 marks!)

[Figure: sample applet output. A data table sits beside the "ICS3M Marks Scatterplot" (X-axis: Hours Studied, 0 to 20; Y-axis: Grade Achieved, 0 to 120) with the Line of Best Fit drawn and labelled Y = mX + b.]

| X | Y |
|---|---|
| 15 | 60 |
| 19 | 98 |
| 1 | 23 |
| 9 | 50 |
| 13 | 76 |
| 16 | 85 |
| 17 | 82 |
| 5 | 46 |
| 4 | 32 |
| 18 | 95 |
| 18 | 85 |
| 12 | 65 |
| 13 | 61 |
| 14 | 79 |
| 8 | 59 |

X (Hours Studied): Mean = 12.13, Median = 13, Mode = 13 or 18, Variance = 31.12, Standard Deviation = 5.58
Y (Grade Achieved): Mean = 66.4, Median = 65, Mode = 85
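
As a way to check your own calculations, the summation formulas for m and b and the mean, median and variance extras can be sketched in plain Java, without any applet code. This is only a minimal sketch under stated assumptions: the class name `BestFitSketch` and all method names are hypothetical, and the two arrays are the 15 (X, Y) pairs read from the sample output above. The pairing is inferred from the figure, although it does reproduce the printed means of 12.13 and 66.4.

```java
import java.util.Arrays;

// Hypothetical helper class; none of these names come from the assignment.
class BestFitSketch {
    // Assumed data: hours studied (X) and grade achieved (Y) from the sample output.
    static final double[] X = {15, 19, 1, 9, 13, 16, 17, 5, 4, 18, 18, 12, 13, 14, 8};
    static final double[] Y = {60, 98, 23, 50, 76, 85, 82, 46, 32, 95, 85, 65, 61, 79, 59};

    // Sigma of a[i]: loop from 0 to N - 1 because array indices start at 0.
    static double sum(double[] a) {
        double total = 0;
        for (int i = 0; i < a.length; i++) {
            total += a[i];
        }
        return total;
    }

    // Sigma of a[i] * b[i]; calling sumProducts(a, a) gives the sum of squares.
    static double sumProducts(double[] a, double[] b) {
        double total = 0;
        for (int i = 0; i < a.length; i++) {
            total += a[i] * b[i];
        }
        return total;
    }

    // m = ((sumX)(sumY) - N * sumXY) / ((sumX)^2 - N * sumX2)
    static double slope(double[] x, double[] y) {
        int n = x.length;
        double sumX = sum(x);
        double sumY = sum(y);
        return (sumX * sumY - n * sumProducts(x, y))
             / (sumX * sumX - n * sumProducts(x, x));
    }

    // b = sumY / N - m * sumX / N
    static double intercept(double[] x, double[] y) {
        int n = x.length;
        return sum(y) / n - slope(x, y) * sum(x) / n;
    }

    static double mean(double[] a) {
        return sum(a) / a.length;
    }

    // Sample variance: divide by N - 1, as in the formula on the extras page.
    static double variance(double[] a) {
        double avg = mean(a);
        double total = 0;
        for (int i = 0; i < a.length; i++) {
            total += (a[i] - avg) * (a[i] - avg);
        }
        return total / (a.length - 1);
    }

    // Middle value of a sorted copy (average of the two middle values if N is even).
    static double median(double[] a) {
        double[] sorted = a.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    public static void main(String[] args) {
        System.out.printf("Y = %.2fX + %.2f%n", slope(X, Y), intercept(X, Y));
        System.out.printf("mean X = %.2f, median X = %.0f%n", mean(X), median(X));
        System.out.printf("variance X = %.2f, std dev X = %.2f%n",
                variance(X), Math.sqrt(variance(X)));
    }
}
```

With this data the slope should come out near 3.77 and the intercept near 20.70, and the mean, median and variance of X should agree with the values shown in the sample output. Storing each summation in its own method mirrors the advice to split the formulas into pieces, and the same totals can be compared against your spreadsheet.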