CHAPTER 3 METHODS AND PROCEDURES
This chapter presents the specific methods and procedures used to conduct the data collection and analysis for the study. The study is comprised of four main objectives and this Methods and Procedures chapter attempts to address the particularities of each, reflecting the methodological techniques required for the different research questions and objectives for each section.
The chapter first describes the research design employed to carry out the study and its philosophical underpinnings. The second section of the current chapter then describes the methods employed for developing the MRPI, including all of the procedures involved for carrying out two pilot tests and a field test of the MRPI. The final two sections of the chapter detail the development of the norming tables and normed scores, which resulted in the MRPI religiosity benchmarks.
Research Design
Exploratory-Descriptive Study Design
This study included several methodological processes such as concept or definition development (conceptualization), survey or measurement development (operationalization), and norm development (application). This three-phase process of measurement is closely related to the model outlined by Punch (1998) that includes: 1) defining the concept; 2) selecting measures for the concept; and 3) obtaining empirical information from the measures. He adds a fourth sub-category of evaluating the validity of the measures, assessing to what extent the indicators represent the concept empirically. The current study reflected Punch's three-phase methodological process by: 1) developing/defining a concept of religiosity from the Islamic perspective based on the literature; 2) developing/creating a measurement instrument for measuring the concept of religiosity from the Islamic perspective; and 3) conducting a norming analysis of Malaysian Muslim youth, based on data collected from the developed instrument.
Based on the above three methodological phases, the current study was both exploratory and descriptive in nature, reflecting the gaps in the literature (as discussed in Chapter 2) in the area of conceptualization and instrumentation of religiosity from the Islamic perspective. The exploratory element of the design reflects the first two phases – concept and instrument development, and the descriptive element is represented by the third phase, which included a norming analysis of the study data for the purpose of developing Islamic religiosity benchmarks for Malaysian Muslim youth.
Norming is typically affiliated with standardized testing in educational settings or in psychological testing. Kubiszyn and Borich (1996) claimed that the purpose of testing is to provide objective data that can be used along with subjective impressions to make better educational decisions. They discussed two main types of tests used to make educational decisions: criterion-referenced tests and norm-referenced tests. Criterion-referenced tests provide information about a student or test subject's level of proficiency in or mastery of some skill or set of skills. This is accomplished by comparing their performance to a standard of mastery called a criterion. Such information tells us whether a test recipient needs more or less work on some skills or sub-skills, but says nothing about the individual's performance relative to others (Rodriguez, 1997).
Norm-referenced tests, on the other hand, yield information regarding the test taker's performance in comparison to a norm or average of performance by similar individuals. Norms are statistics that describe the test performance of a defined group (Rodriguez, 1997). As Brown (1976) noted, potentially there are a number of possible norm groups for any test. Since a person's relative ranking may vary widely, depending upon the norm group used for comparison, Brown claimed that the composition of the norm group is a crucial factor in the interpretation of norm-referenced scores (Rodriguez, 1997). In the current study, norming was not conducted on an educational-based or standardized test, but rather in the field of religious psychology. Therefore, the objective of norm development was not to compare students to determine areas of strength and weakness in regard to academic proficiency. Rather, norming was conducted to develop benchmarks for understanding differences and similarities in different areas of Malaysian Muslim youth religiosity according to a variety of demographic variables.
Operationalization of ‘Islamic Religiosity’: Development of the Measurement Instrument
Procedures – Pilot Test 1
The second phase of the study, the operationalization of the Islamic religiosity concept or the development of the measurement instrument, included going from the concept developed from the review of the literature to the creation of a Muslim religiosity measurement instrument. To accomplish this, in addition to the development of items and survey dimensions, various statistical analyses were used to establish reliability and validity for the measurement instrument. This is required when a concept or construct is translated into a functioning and operating reality (the operationalization of it), and there is a need to know how well the translation was conducted (Trochim, 2002).
Following the development of the concept for Islamic religiosity, the next task was to cull out the major constructs used to develop operational definitions, followed by actual survey items. This process entailed the further development of the religiosity concept into sub-units, or major dimensions. Dimensions for the religiosity instrument were extracted from the literature review of the study. Once the major dimensions were defined for the religiosity concept, operational definitions were then developed. The use of operational definitions allowed for the creation of survey items using the rational or deductive method, which relies on a theory or theoretical model to develop items and constructs (Burisch, 1984).
There are a number of standardized models that can be used to develop religiosity and personality dimension measurements. One well-known model is the Model of Measurement Development by Brown (1983) (see Appendix A). The current research incorporated an adapted version of the Brown model (see Appendix B), adjusted according to the specific objectives and uniqueness of the current study.
From the operational definitions for the religiosity concept, survey items for each of the concepts were developed in a creative process. To maintain a spirit of authenticity and religious orthodoxy from the perspective of the majority school of Islamic thought, that of the ahl al Sunnah wal jama’at, the development of survey items was guided by the two main sources of traditional Islamic knowledge, the Holy Qur'an and sayings (hadith) of Prophet Muhammad.
Guided by the operational definitions and the above-mentioned sources of traditional Islamic knowledge, items were developed, compiled and refined culminating in a pilot survey. In terms of item development, Golden, Sawicki and Franzen (1984) listed three possible sources for items: theory, nomination by experts, and other tests. They indicated that whatever their source, items initially are selected on the basis of their face validity. That is, the items appear to measure what they are intended to measure. When test developers employ a deductive or theory-based approach, such as was done in the current study, item selection often stops when theoretically relevant items are written. Most test developers, however, typically administer the items and score the responses according to empirical strategies based on pilot testing (Meier, 1994).
Though the current study relied on a rational or theory-based approach, where applicable, items from existing religiosity and personality measurements that were consistent with the goals and criteria of the present research were considered for the pilot instrument. This research aimed to incorporate items that were either directly or closely related to the sources of Islamic knowledge noted previously. Items from other instruments were only included if they fulfilled this criterion. The original pilot survey comprised 214 items in total (see Appendix C for original survey items). According to Golden, Sawicki and Franzen (1984), the initial item pool during test development should be two to four times the final number of desired items.
Accordingly, the current study started with more than twice the number of items (224) as were included in the final version of the instrument (107).
Prior to pilot testing the instrument, the method of scaling was determined. According to Trochim (2002), the first step in scale development is to determine what it is that one is trying to measure. This was determined based on the major constructs resulting from the first phase of the study. As the pilot instrument was comprised of two major dimensions – Islamic Worldview and Religious Personality – Likert-like or Likert-type scales using five response alternatives (1 – 5) were used to measure respondents' level of agreement with item statements (Clason & Dormody, 2000). The Islamic Worldview scale was comprised of items assessing respondents' perceptions towards statements about Islamic aqidah (creed/fundamental belief). Therefore, a Likert-like scale was developed (see Chapter 4 for item responses for both dimensions). For the Religious Personality scale, a Likert-type scale was used as this particular dimension attempted to assess frequency of behaviors in the daily lives of respondents.
Participants – Pilot Test 1
The target population for pilot testing was Malaysian Muslim youth between the ages of 16 and 35, of any race. Although a majority of Malaysia's Muslims are ethnic Malays, this study was not interested in race per se. The population of youth selected, which was also the target for the field study, could be more aptly considered 'young adults' (Boeree, 1997). Young adults are a transition group that tends to either be engaged in higher education or the world of work for the first time. Often, they are energetic and idealistic and are on the brink of taking the full responsibilities of adults (Azimi, Turiman & Ezhar, 1997). This 16-35 year-old population is, according to Erik Erikson, in the human developmental stage in which the task is to achieve a degree of intimacy with others, as opposed to remaining in isolation. “Intimacy is the ability to be close to others, as a lover, a friend, and as a participant in society” (Boeree, 1997). The developmental need for young adults to become active participants in society makes this group an important population for inquiry into religiosity and ostensibly, nation building.
Once the pilot instrument was solidified, the first pilot test was conducted and included four groups of youth ages 16-35. Pilot test 1 included 224 respondents sampled from four different groups of young people. The breakdown of the sample was as follows:
Table 3.1: Sample Size and Distribution for Pilot Test 1
Rakan Muda Youth Program
Serdang Youth Center (Pusat Belia) 10
Female Prison Inmates (Penjara Wanita) 15
The sample size of 224 was sufficient for determining reliability and for conducting item analysis, as reliability tends to increase with sample size (Statsoft, 2003). Findings from the pilot test, including reliability and item measures, were used to further refine the instrument. For the first pilot test, a convenience sample was selected based on their locality being close to UPM and representing a mix of social affiliations or backgrounds. For example, the UKM sub-sample, the largest represented group in the sample, was comprised of students from an introductory-level Islamic studies course. The UPM sub-sample was comprised of psychology students. The Rakan Muda group was comprised of young people involved in a sports/recreation focused youth organization, while Pusat Belia represented another non-sports focused youth organization. Finally, the Penjara Wanita sub-sample was comprised of female prison inmates. In this way, the pilot test sample was represented by young people from fairly diverse backgrounds.
Data Collection – Pilot Test 1
For the first pilot test, surveying was conducted by the primary researcher and a co-researcher. Surveys were administered to respondents in classroom settings and respondents were invited to be honest in their reporting. An introduction of the project and purpose of the surveys was provided along with instructions on how to complete the surveys. Respondents were then reminded that their answers would be anonymous and, as such, they should not be afraid to answer honestly. No time limit was given, as all respondents completed the surveys within half an hour.
Data Analysis – Pilot Test 1
For the first pilot test, upon collection of the data, the data was entered into SPSS (Statistical Package for Social Sciences), version 11. Data was also explored and cleaned – i.e. any input errors were removed to avoid non-sampling errors. In addition, reverse scored items were treated so that all items could be summed according to the same scale. Tests of normality were conducted including visual examination of both histograms and box plots, along with numerical statistics such as skewness and kurtosis descriptive measures. As a final step to test the normality of the data, normality plots such as the Q-Q plot and Detrended Q-Q plot were executed using SPSS.
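The numerical normality checks described above can be illustrated with a minimal Python sketch. This is not the SPSS procedure used in the study: it computes the simple moment-based skewness and excess kurtosis, whereas SPSS reports bias-adjusted versions, so values may differ slightly. The function names and the sample scores are hypothetical.

```python
def _central_moment(scores, order):
    """Population central moment of the given order."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** order for x in scores) / n

def skewness(scores):
    """Moment-based skewness: negative values indicate a longer left tail."""
    return _central_moment(scores, 3) / _central_moment(scores, 2) ** 1.5

def excess_kurtosis(scores):
    """Moment-based kurtosis minus 3, so a normal distribution scores about 0."""
    return _central_moment(scores, 4) / _central_moment(scores, 2) ** 2 - 3.0

# Hypothetical scale totals (not study data)
symmetric = [1, 2, 3, 4, 5]
left_skewed = [1, 4, 5, 5, 5]   # most respondents score near the top

print(skewness(symmetric), skewness(left_skewed))
```

A clearly negative skewness, as in the second hypothetical list, is the kind of result that would prompt a closer look at histograms and Q-Q plots.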
Upon completion of the first pilot test, following data cleaning, items were reviewed using statistical measures such as the Cronbach Alpha test for reliability (internal consistency) and item analysis, for the purpose of refinement and reducing the overall number as previously mentioned. According to Clason and Dormody (2000), Likert's original work assumed an attitude scale would first be pilot tested for reliability assessment of the individual items. This reliability assessment includes the correlation (ratio) between the individual item score and the total score (item-total correlation). Items that did not correlate with the total were discarded.
Reliability is the internal consistency or stability of assessment results. The usual definition of reliability refers to a measurement method's ability to produce consistent scores (Meier, 1994). The most frequently used methods for examining the reliability of forced-choice assessments are: 1) internal consistency reliability, 2) equivalent forms reliability and 3) repeated measures reliability. Internal consistency was used in the current study to measure the reliability for each of the dimensions of the pilot instrument. Internal consistency testing provides an estimate of test stability without having to administer the test twice (Meier, 1994). Reliability was thus measured using the SPSS Cronbach Alpha test for internal consistency.
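For readers who wish to see how an internal consistency coefficient of this kind is computed, the following is a minimal Python sketch of the standard Cronbach's Alpha formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). It is offered as an illustration only and is not the SPSS routine used in the study; the function name and the response data are hypothetical.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's Alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])                       # number of items on the scale
    items = list(zip(*rows))               # one tuple of scores per item
    totals = [sum(row) for row in rows]    # each respondent's scale total
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))

# Hypothetical responses on a three-item scale (not study data)
responses = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 5],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

Because the three hypothetical items rise and fall together across respondents, the resulting coefficient is high; items that vary independently of one another would pull it toward zero.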
“Content-related validation involves essentially the systematic examination of the test content to determine whether it covers a representative sample of the behavior domain to be measured” (Anastasi, 1988, p. 140). Content validity was assessed using the following three methods:
Literature Review. A literature review for content was conducted and then analyzed using an informal content analysis approach to establish the foundation for a concept for religiosity from the Islamic perspective. Based on the concept and information from the analysis along with gaps in the literature, a model was developed for religiosity from the Islamic perspective and subsequently used to guide the development of the test instrument.
Committee of Arbiters. A committee of arbiters was convened, comprised of academicians who had expertise in areas relating to the study, namely, Islam and psychology or psychological testing. The committee was comprised of seven arbiters. The arbiters were encouraged to provide comments or suggestions for every item on the survey, according to the operational definitions for each dimension. A special form was provided for them to record their responses and final decisions on each item, whether to keep the item as presented to them, make edits to it, or remove it from the instrument. The panel of arbiters included:
1. Dr. Saiyed Fareed Ahmad (Islamic Sociology and Measurement, UIA)
2. Dr. Jamil Farooqi (Islamic Sociology, UIA)
3. Dr. Noor Mohammad Osmani (Quran and Sunnah, UIA)
4. Dr. Mustaffa Mahmood Al-Emam (Psychology and Measurement, UIA)
5. Dr. Mohd. Shah Jani (Quran and Sunnah, UIA)
6. Dr. Zahid Emby (Social Anthropology, UPM)
7. Dr. Muhammed Awang (Psychology and Measurement, UKM)
A criterion of 70% of arbiter consensus was used to determine whether to act on any arbiter suggestions to either delete or revise the items (Norri, 2004). Additional comments and suggestions were also recorded. The table and data were analyzed twice. The pilot survey was thus edited and revised according to arbiter suggestions, taking into consideration other factors such as results from item analysis/item discrimination along with feedback from the Research Team.
Research Team. A Research Team of experts from diverse academic backgrounds (UPM and UKM researchers) was used to further ensure content validity of the items of the instrument. Their feedback and input during the item development phase helped to ensure that the items reflected the correct and comprehensive content domain of the religiosity construct and dimensions. The Research Team provided final say and determination in regard to the deletion or retention of the survey items, taking into account arbiters' evaluations and results from statistical analysis of the items.
Item analysis is the process of evaluating each item to determine its overall effectiveness. Item analysis "investigates the performance of items considered individually either in relation to some external criterion or in relation to the remaining items on the test" (Thompson & Levitov, 1985, p. 163). Item analysis has both quantitative and qualitative testing elements and both were employed in the current study. Quantitatively, instrument homogeneity was assessed using item analysis and specifically the SPSS Cronbach's Alpha reliability test. Toward this end, the alpha if item deleted value for each item was listed and every item reviewed in detail. The computation of Cronbach's Alpha when a particular item is removed from consideration was used to determine the item's contribution to the entire test's assessment performance (SPSS, 1998). If the alpha if item deleted was greater than the standardized item alpha for the dimension as a whole, then the item was removed from the scale (Sidek Mohd Noah, 1998). In addition, the corrected item-total correlation values, which display the corrected point biserial correlation and measure the association between individual test items and overall test performance (SPSS, 1998), were listed to help determine which items to retain. This measure is the correlation between an item and the rest of the scale, without the item considered part of the scale. If the correlation is low, it means the item is not really measuring what the rest of the test is trying to measure (Sherry, 1997). Following item analysis, items retained on the scales were purged of typos and of grammatical and sentence structure errors. Any 'low quality' or unclear items were also removed from the pilot instrument.
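The two item statistics described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the SPSS output used in the study: the sketch compares each alpha-if-item-deleted value against the raw alpha of the full scale (the study used the standardized item alpha), and all function names and response data are hypothetical.

```python
from statistics import mean, variance

def cronbach_alpha(rows):
    """Raw Cronbach's Alpha for respondents x items."""
    k = len(rows[0])
    item_variance_sum = sum(variance(item) for item in zip(*rows))
    return (k / (k - 1)) * (1 - item_variance_sum / variance([sum(r) for r in rows]))

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def item_statistics(rows):
    """Corrected item-total correlation and alpha if item deleted, per item."""
    stats = []
    for i in range(len(rows[0])):
        rest = [row[:i] + row[i + 1:] for row in rows]   # scale without item i
        stats.append({
            "item": i,
            "corrected_item_total_r": pearson([row[i] for row in rows],
                                              [sum(r) for r in rest]),
            "alpha_if_deleted": cronbach_alpha(rest),
        })
    return stats

# Hypothetical responses; the fourth item (index 3) does not track the others
rows = [[4, 5, 4, 3], [3, 3, 2, 3], [5, 5, 5, 2], [2, 3, 2, 3], [4, 4, 5, 3]]
overall = cronbach_alpha(rows)
stats = item_statistics(rows)
flagged = [s["item"] for s in stats if s["alpha_if_deleted"] > overall]
print(flagged)
```

In this hypothetical data the fourth item has a negative corrected item-total correlation, and removing it raises alpha, so it is the only item flagged for removal.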
To establish construct validity of the pilot survey, factor analysis was conducted to confirm the hypothesized underlying dimensions of the scales. Typically, the main applications of factor analysis are: (1) to reduce the number of variables and (2) to detect structure in the relationships between variables, that is to classify variables. Used as a structure detection method, such as in the current study, findings from factor analysis may add evidence for the construct validity of the scales (validation of hypothesized factors) (Kinnear & Gray, 1999). A structure detection or confirmatory factor analysis was thus conducted using SPSS, to confirm the existence and validity of the religiosity scale dimensions that were developed at the outset of the study.
Pilot Test 2 – Establishing Normality of the Scales
A second pilot test was conducted for two reasons: the major revisions made to the measurement instrument following the first pilot test, and the non-normality of the Islamic Worldview dimension in Pilot Test 1.
Participants – Pilot Test 2
The second pilot test included a smaller sample (n = 93) taken from only two groups of young people. The sample for the second pilot test is illustrated in the following table:
Table 3.2: Sample Size and Distribution for Pilot Test 2
Kiblat School 35
Pilot Test 2 was conducted with two groups of youth. The first was a class of Universiti Kebangsaan Malaysia (UKM) (first-year) students and the second was a private Islamic high school – the Kiblat School - in Kajang, Selangor state, which was comprised of a class of Form 4 – 6 students. The researcher along with one other co-researcher conducted surveying of the two groups.
Data Analysis – Pilot Test 2
Pilot Test 2 was conducted to retest the Islamic Worldview scale, due to the results of the first pilot test indicating a negative skewness (skewness measure = -1.371). Analysis of Pilot Test 2 data thus focused primarily on re-examining the data to ensure normality and reliability (Cronbach Alpha) of the scales following the major revisions from the first pilot test.
Figure 3.1: MRPI Development - Reliability and Validity Methodology Flow Chart
[Flow chart elements: Cronbach's Alpha (internal consistency); item-total correlation / alpha if item deleted; arbiters' evaluation; Research Team feedback evaluation; item discrimination index (field test only)]
Further Refinement of the Religiosity Instrument – Field Test
Upon completion of the second pilot test and the establishment of normality for both of the major MRPI scales, the religiosity instrument was field tested among 1,990 Malaysian Muslim youth. Upon completion of the field test, further refinement of the MRPI instrument was conducted based on results and feedback from the field test.
Following the field test of Malaysian Muslim youth, the data was cleaned and analyzed for further refinement of the MRPI. Instrument refinement began with exploratory data analysis that was conducted to examine the data for errors and to examine the nature of the data, including assessing normality for each of the two religiosity scales.
Data Cleaning – Field Test
Data cleaning included the removal of typos and of cases with a high number of missing values (i.e. more than 10). All other cases with missing values were retained in the sample, and their missing values were replaced with the series mean for the item using SPSS. This was done in order to retain as many cases as possible. Following the deletion of cases and replacement of missing values, the scores of negatively worded (i.e. reverse scored) items were reversed in order to allow for summation of the scales.
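The cleaning steps above can be sketched in Python as follows. This is a hedged illustration, not the SPSS procedure used in the study. It assumes that missing values are coded as None, that item means are computed after the high-missing cases are dropped, and that reversal on a five-point scale maps a score x to 6 − x. The function name, parameters, and data are hypothetical.

```python
def clean_responses(rows, reverse_items, max_missing=10, scale_max=5):
    """Drop high-missing cases, impute item means, then reverse-score items."""
    # 1) Remove cases with more than max_missing missing values
    kept = [row for row in rows if sum(v is None for v in row) <= max_missing]

    # 2) Replace remaining missing values with the item (series) mean;
    #    assumes every item has at least one observed value
    n_items = len(kept[0])
    means = []
    for j in range(n_items):
        observed = [row[j] for row in kept if row[j] is not None]
        means.append(sum(observed) / len(observed))
    filled = [[means[j] if v is None else v for j, v in enumerate(row)]
              for row in kept]

    # 3) Reverse negatively worded items so all items sum in one direction
    return [[scale_max + 1 - v if j in reverse_items else v
             for j, v in enumerate(row)]
            for row in filled]

# Hypothetical three-item data; the threshold is lowered to 1 for illustration
raw = [[5, None, 2], [3, 4, None], [None, None, 1], [4, 2, 5]]
cleaned = clean_responses(raw, reverse_items={2}, max_missing=1)
print(cleaned)
```

In the hypothetical data the third case is dropped for having two missing values, and the remaining gaps are filled with item means before the third item is reversed.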
Social Desirability/Outlier Scale – Field Test
One important addition to the exploratory data analysis (EDA) methodology for the field test data was the creation of a social desirability/outlier scale as an additional data screening and data cleaning procedure. Such scales are typically used to filter outlier cases caused either by socially desirable responding (SDR), i.e. the tendency to give answers that make the respondent look good (Paulhus, 1991), or by a very small sub-group of respondents whose responses are honest, yet unusually high. Such outliers tend to skew the data.
This is a common problem in surveying, particularly in fields where respondents may wish to appear better than they are, such as in the case of religiosity measurements. Poll data on religious behavior and practice are notoriously unreliable, as individuals often describe their own behavior inaccurately by answering questions according to what they think they should be doing (Robinson, 2001). For example, many polls indicate that the percentage of adults who regularly attend a religious service is about 40% in the U.S., 20% in Canada, and perhaps 10% or less in Europe. But when numbers are actually counted, the true figures are about half the stated figures (about 20% in the U.S. and 10% in Canada). The 50% figure also appears to apply in the UK (Robinson, 2001). In regard to this phenomenon, Furlong (2002) commented: "...people questioned about how much they go to church, give figures which, if true, would add up to twice those given by the churches" (p. 216).
To address this critical issue as it pertains to the current study, a social desirability/outlier scale was developed for the second of the two religiosity scales – the Religious Personality dimension – to attempt to filter out cases representing extreme outliers. Done in this way, no assumptions or accusations of lying by respondents were made; rather, cases were removed that indicate unusually high scoring, regardless of the reason.
The social desirability/outlier scale was constructed by identifying six items (behaviors) from the Religious Personality scale that indicated the possibility of an inclination toward social desirability in relation to religious and social behaviors. The items selected were:
6) I use the lessons from the Qur'an and Hadith in my conversations.
23) I am the first to give salam when meeting another Muslim
46) I make effort to fulfill my promises
58) I do non-obligatory prayers (salat sunnat) wherever I am
75) I perceive all non-Muslims that I see as potential Muslims
85) I gossip about others (reverse score item)
The scale was constructed and executed according to the methodology proposed by Sidek Mohd. Noah (1998). Respondents who gave a score of 5 (always) on four or more of the six items (≥ 66.67%) were deleted from the sample, as such a pattern of responses indicates a strong possibility of the case being an outlier.
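The screening rule described above can be expressed as a short Python sketch. The item numbers come from the list given earlier; everything else is an assumption made for illustration, including the representation of a case as a mapping from item number to score, the function names, and the premise that item 85 has already been reverse scored so that 5 is the most favorable response.

```python
SDR_ITEMS = (6, 23, 46, 58, 75, 85)   # item numbers listed in the text

def is_outlier(case, items=SDR_ITEMS, top_score=5, threshold=4):
    """True if the case scores top_score on at least `threshold` of the items.

    `case` maps item number -> score (item 85 assumed already reversed).
    """
    top_count = sum(1 for item in items if case.get(item) == top_score)
    return top_count >= threshold

def filter_cases(cases):
    """Keep only the cases that the screening rule does not flag."""
    return [case for case in cases if not is_outlier(case)]

# Hypothetical respondents (not study data)
four_tops = {6: 5, 23: 5, 46: 5, 58: 5, 75: 3, 85: 4}
three_tops = {6: 5, 23: 5, 46: 5, 58: 4, 75: 3, 85: 4}
remaining = filter_cases([four_tops, three_tops])
print(len(remaining))
```

Note that the rule makes no claim about why a case scores at the top of the scale; it removes the case on the response pattern alone.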
Age-Related Scoring Adjustments - Field Test
One additional adjustment made following the field test and prior to norm development was the scoring for one particular item on the Religious Personality scale. Question number 15 had to be adjusted due to its wording. Upon further examination of the MRPI instrument following the field test, it became apparent that item #15 was questionable or not relevant for certain groups of respondents, particularly those who responded (in the demographic data profile) that they did not have any income.
The item in question was:
15) I do not pay alms (reverse score item)
The above item was believed to be irrelevant to a number of young people who do not qualify for paying alms because many young people surveyed do not have any income. Item #15 was thus 'treated' by replacing the responses of all respondents who did not report an income, or who reported an income of '0', with missing values.
In addition to item #15, several other items had to be removed completely from the Religious Personality scale analysis due to their potential irrelevance to a large number of youth respondents. For example, items about relations with siblings would not be relevant to young people without siblings. Since the Likert-like scales used in the surveys did not include an option such as “not applicable”, these items had to be removed from the analysis. The items were:
27. I easily forgive my siblings when they hurt me
43. I care about my good relations with my siblings
82. I make a serious effort to fulfill wedding invitations
89. My siblings and I compete in serving our parents
92. I offer my guests the best of what I have when I am hosting them in my home
These five items were subsequently removed from the analysis of the Religious Personality dimension.
Item Analysis and Item Discrimination – Field Test
Although reliability and item analysis were conducted following the first pilot test of the MRPI instrument, a second round of item analysis and an item discrimination test were undertaken following the field test to ensure that all items on the scale were valid and discriminating, given the major revisions made to the scales following pilot testing. This additional step of item discrimination was taken due to the length of the instrument (150+ items) and the fact that pilot test results indicated the possibility that the Islamic Worldview scale was 'too easy', given its nature as a basic knowledge scale. Item discrimination was thus used to help ensure that all the items had acceptable power to discriminate between high and low scorers. In addition, in scale development and refinement, one often goes through several rounds of generating and eliminating items, until one arrives at a final set that makes up a reliable scale (Statsoft, 2003).
For the item discrimination test, using Kelley's (1939) "27% of sample" group size, values of 0.4 and above are regarded as high and less than 0.2 as low (Ebel, 1954). According to others, 25% can also be used as a cutoff for the high and low scoring groups (Alabama Department of Education, 2002). Following Ebel's cutoff of 0.2, the current study used a 0.2 cutoff point for the Islamic Worldview scale; however, a 0.3 threshold was used for the Religious Personality scale, as only 6 out of 100 items had discrimination indices below 0.2. The higher threshold of 0.3 was thus used for that scale in order to reduce its size more substantially as well as increase its power of discrimination between low and high scorers.
The discrimination test was conducted by first ordering the total scores for each of the main constructs from highest to lowest. Then, the top and bottom 27% of the sample were retained and the middle 46% deleted. Scores were then converted: responses of 4 or 5 were converted to '1' (i.e. 'high') while responses of 1 to 3 were converted to '0' (i.e. 'low'). Then, for each item on the MRPI, the total summed score for the bottom 27% of the sample was subtracted from the total summed score for the top 27%. This number was then divided by the number of respondents in each of the groups. The result was an index that can in principle range from -1 to 1, with values between 0 and 1 expected for items that function as intended. The higher the index, the more discriminating the item proved to be.
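The procedure just described can be sketched in Python as follows. This is an illustration only and not the analysis actually run for the study. It assumes the group size is obtained by rounding 27% of the sample to the nearest whole respondent and that ties in total score are left in their original order. The function name and the response data are hypothetical.

```python
def discrimination_indices(rows, fraction=0.27):
    """Upper-lower discrimination index for each item.

    rows: one list of 1-5 item scores per respondent.
    """
    # Order respondents by total score, highest first
    ranked = sorted(rows, key=sum, reverse=True)

    # Keep the top and bottom groups (group size is an assumed rounding)
    group_size = max(1, round(len(ranked) * fraction))
    top, bottom = ranked[:group_size], ranked[-group_size:]

    def is_high(score):
        # Responses of 4 or 5 count as 1 ("high"), 1 to 3 as 0 ("low")
        return 1 if score >= 4 else 0

    indices = []
    for j in range(len(rows[0])):
        top_sum = sum(is_high(row[j]) for row in top)
        bottom_sum = sum(is_high(row[j]) for row in bottom)
        indices.append((top_sum - bottom_sum) / group_size)
    return indices

# Hypothetical data: item 0 separates high and low scorers, item 1 does not
rows = [[5, 4], [5, 4], [4, 4], [4, 4], [3, 4],
        [3, 4], [3, 4], [2, 4], [1, 4], [1, 4]]
indices = discrimination_indices(rows)
retained = [j for j, d in enumerate(indices) if d >= 0.2]   # 0.2 cutoff
print(indices, retained)
```

The second hypothetical item receives the same response from every respondent, so it cannot separate the two groups and falls below the 0.2 cutoff.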
Following pilot testing and revisions, the current study relied on a quantitative norming methodology to develop group norming tables for the Islamic religiosity dimensions and their sub-dimensions. Norms – average results for large, representative population-based samples – can provide a useful reference point from which to compare results for large, diverse populations (Hermann & Provost, 2003). Norms are statistics that describe the test performance of a defined group of test-takers or respondents (Noll, Scannell & Craig, 1979). The process of constructing norms is called norming. McDaniel (1994) argued that the result of norming is a table that allows the user to convert any raw score to a derived score that instantly compares the individual with the normative group. There are several types of norm-referenced scores (also called derived scores). Brown (1976) discussed four major types: percentiles, standard scores, developmental scales, and ratios and quotients (Rodriguez, 1997).
Criterion-Referenced vs. Norm-Referenced Tests
There is an important difference between norm-referenced and criterion-referenced tests. In criterion-referenced tests, scores are compared to some absolute measure of behavior, a criterion; on the other hand, norm-referenced scores are compared among individuals (Glaser, 1963). Norm-referenced test takers are therefore not compared to any so-called absolute measure, but rather to their norm group. Thus, norm-referenced tests do not claim to test participants against an absolute standard, but rather against their peers. This is more appropriate for religiosity testing, due to the philosophical nature of religion as a topic of inquiry and the many challenges in measuring it, for which an absolute criterion for success is impossible to determine.
The procedures undertaken for establishing group norms for the current study followed the norming model proposed by Crocker and Algina (1986):
Figure 3.2: Model for Conducting Norming Study (Source: Crocker & Algina, 1986)
1. Identify the population of interest
2. Identify the most critical statistics that will be computed for the sample data
3. Decide on the tolerable amount of sampling error
4. Devise a procedure for drawing a sample from the population of interest
5. Estimate the minimum sample size required to hold the sampling error within the specified limits
6. Draw the sample and collect the data
7. Compute the values of the group statistics of interest and their standard errors
8. Identify the types of normative scores that will be needed and prepare the normative conversion tables
9. Prepare written documentation of the norming procedure and guidelines for interpretation of the normative scores
Key steps from the model that were included in the norming methodology for the current study are detailed below.
Identify the Population of Interest
The first step in norming according to Crocker and Algina, identifying the population of interest, involves specifying the general category that one wishes to know more about through the research and to which findings can be generalized. A population is a number of individuals or objects in a group with at least one similar characteristic (Konting, 2000). For the current study, the target population of interest was Malaysian Muslim youth ages 16-35.
Identify Critical Statistics
Statistics included in the current study for the purpose of norming focused on measures of central tendency and dispersion, namely the mean, median, range and standard deviation. In addition, according to the norming literature, in order to assess overall performance most psychological tests employ a standardization sample that allows the test makers to create a normal distribution that can be used for comparison of any specific future test score (VerWys, 2001). Norms in the current study were therefore developed according to a standard scoring (or z-score) method. Hopkins and Stanley (1981; p. 52) defined standard scores as "scores expressed in terms of a standard, constant mean and a standard, constant standard deviation." Standard scores are obtained by dividing each deviation score (the raw score minus the mean raw score) by the standard deviation of the particular distribution:
$$z = \frac{x - \bar{X}}{s}$$
where
z = the standard score
x = the raw score
$\bar{X}$ = the mean raw score
s = the standard deviation of the distribution (Rodriguez, 1997).
Brown (1976; p. 185) discussed the following five properties of standard (z) scores:
1. They are expressed as a scale having a mean of 0 and a standard deviation of 1.
2. The absolute value of a z score indicates the distance of the raw score from the mean of the distribution. The sign of the z score indicates whether the raw score falls above or below the mean; scores above the mean will have positive signs; scores below the mean, negative signs.
3. Inasmuch as standard scores are expressed on an interval scale, they can be subjected to algebraic operations.
4. The transformation of raw scores to standard scores is linear. Thus, the shape of the distribution of z scores is identical to the distribution of raw scores.
5. If the distribution of raw scores is normal, the range of z scores will be from approximately -3 to +3.
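The standard-score transformation defined above can be sketched as follows. This is a minimal illustration rather than the study's computation: the raw scores are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not state which form was used.

```python
from statistics import mean, stdev


def z_scores(raw_scores):
    """Convert raw scores to standard (z) scores: z = (x - mean) / sd."""
    m = mean(raw_scores)
    s = stdev(raw_scores)  # sample SD (n - 1 denominator); an assumption
    return [(x - m) / s for x in raw_scores]


raw = [10, 12, 14, 16, 18]  # hypothetical raw scores
z = z_scores(raw)
```

Consistent with the first property listed by Brown, the resulting z scores have a mean of 0 and a standard deviation of 1, and the raw score equal to the mean maps to z = 0.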
Standard or z-scores, usually between -3 and +3 for a normal distribution, can be converted into a linear scale. For the present study, the z-scores were 'transformed' or converted into a scale of 1 to 6 indicating six 'levels' of religiosity: very low, low, moderate-to-low, moderate-to-high, high and very high, as follows:
Very Low MRPI Score = 1 (< -2SD)
Low MRPI Score = 2 (-2SD to -1SD)
Moderate-to-Low MRPI Score = 3 (-1SD to Mean)
Moderate-to-High MRPI Score = 4 (Mean to 1SD)
High MRPI Score = 5 (1SD to 2SD)
Very High MRPI Score = 6 (> 2SD)
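The conversion of z-scores to the six levels can be sketched as below. This is an illustrative sketch only. The outer cut-offs follow the stated rules (below -2SD is level 1, above 2SD is level 6); the treatment of scores falling exactly on an interior boundary (-1, 0 or +1) is not specified in the text and is an assumption here, as is the function name.

```python
LEVELS = {
    1: "very low",
    2: "low",
    3: "moderate-to-low",
    4: "moderate-to-high",
    5: "high",
    6: "very high",
}


def mrpi_level(z):
    """Map a z-score to one of the six MRPI levels (1-6)."""
    if z < -2:
        level = 1      # below -2SD
    elif z < -1:
        level = 2      # -2SD to -1SD
    elif z < 0:
        level = 3      # -1SD to the mean
    elif z <= 1:
        level = 4      # the mean to +1SD (a score at the mean is placed here by assumption)
    elif z <= 2:
        level = 5      # +1SD to +2SD
    else:
        level = 6      # above +2SD
    return level, LEVELS[level]


example_levels = [mrpi_level(z)[0] for z in (-2.5, -1.5, -0.5, 0.5, 1.5, 2.5)]
```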
Using the above z-score method, the norm tables for each religiosity dimension were developed, leading to norm-based scales from which 'levels' of religiosity were derived for MRPI respondents. The use of z-scores, with the mean and standard deviation of the norm group as a benchmark, allows for a norm-referenced comparison rather than an objective assessment based on some identified criterion of mastery. For example, on a criterion-referenced test, it might be predetermined that to obtain a score of 6, or 'very high', respondents must score greater than 90%. A norm-referenced test, however, sets no objective criterion; rather, the criterion is established by the norm group's results. In the case of the current study, that criterion is determined by the z-score method, that is, according to the norm group's mean and standard deviation.
The z-score method, which uses a relative approach to criterion determination, is more appropriate given the nature of the MRPI. As the MRPI is a religious-based scale, there are no realistic objective criteria for determining levels of religiosity, as previously discussed in the section on Limitations of the Study in Chapter 1. A norm-based approach, rather, allows for the assessment and determination of levels of religiosity without having to identify an objective criterion. Thus, the characteristics of the norm group, from which religiosity levels are derived, are of critical importance in a norm-referenced test. Z-scores used for MRPI religiosity measurement are thus based on the norm group mean and standard deviation and indicate, in relative terms, 'levels' of religiosity. These derived levels then allow different groups and individuals to be compared according to the operational definitions for each of the constructs comprising the religiosity model.
Using the above z-score method, the standard deviation was used to differentiate the 'levels' of religiosity. Accordingly, if a respondent's score fell between the mean and one standard deviation above it (+1), that respondent was depicted as being in the moderate-to-high range. Likewise, if a respondent's score fell between the mean and one standard deviation below it (-1), that respondent was deemed to be in the moderate-to-low range. These 'levels' are relative to the overall norming sample and its mean and standard deviation. The labels 'very low' to 'very high' depict the respondent's score on a relative basis, that is, juxtaposed against the entire norm group.
Determine an Acceptable Amount of Sampling Error
Crocker and Algina (1986) identify the third step in the norming framework as “deciding on the tolerable amount of sampling error (discrepancy between the sample estimate and the population parameter) for one or more of the statistics in step 2. (Frequently the sampling error of the mean is specified.)” (Rodriguez, 1997; para 1 'Norming' section). When one computes the mean for a norming sample, one obtains an estimate of that parameter in the population. This estimate is subject to sampling error. If all the possible samples of a given size were drawn from the population and the mean calculated for each sample, then it would be possible to describe the sampling distribution of the mean. The standard deviation of this distribution of means is called the standard error of the mean (SM). The SM is estimated on the basis of a single sample by the formula:
$$SM = \frac{\sigma}{\sqrt{n}}$$
where
σ = standard deviation of scores for the sample
√n = square root of the sample size
As can be seen from this formula, the two determinants of the accuracy of the sample mean are the standard deviation of the sample and the size of the sample group. Thus, the greater the variability, the larger the sample size needed to achieve a given level of sampling error (Rodriguez, 1997).
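The dependence of the standard error of the mean on both determinants can be illustrated with a short sketch. The standard deviations and sample sizes below are hypothetical values chosen only to show the relationship; they are not taken from the study data.

```python
from math import sqrt


def standard_error_of_mean(sd, n):
    """Standard error of the mean: sample standard deviation / sqrt(sample size)."""
    return sd / sqrt(n)


# Hypothetical values: the same variability with two sample sizes.
se_n100 = standard_error_of_mean(12.0, 100)
se_n400 = standard_error_of_mean(12.0, 400)
# Hypothetical values: twice the variability at the larger sample size.
se_wide = standard_error_of_mean(24.0, 400)
```

Quadrupling the sample size halves the standard error, while doubling the standard deviation doubles it, which is why a more variable population requires a larger sample to reach the same level of sampling error.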
Level of error can be determined in a number of ways depending on how much error one deems acceptable (Hill, 1998). Roscoe (1975) uses 10% as a "rule of thumb" acceptable level. According to Krejcie and Morgan (1970), the general rule for acceptable margins of error in educational and social research is as follows: for categorical data, a 5% margin of error is acceptable, and for continuous data, a 3% margin of error is acceptable. Weisberg and Bowen (1977), in a book dedicated to survey research, provide a table of maximum sampling error related to sample size for simple randomly selected samples. Their table indicates that if one is prepared to accept an error level of 5% in a survey, then a sample size of 400 observations is required. If 10% is acceptable, then a sample of 100 is sufficient, provided the sampling procedure is simple random (Hill, 1998).
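The sample sizes quoted from Weisberg and Bowen are consistent with the usual worst-case margin of error for a proportion under simple random sampling. The sketch below rests on that assumption: the formula (95% confidence, proportion of 0.5) is not given in the text, which cites only the table, and the function name is illustrative.

```python
from math import sqrt


def max_margin_of_error(n, z=1.96, p=0.5):
    """Worst-case margin of error for a proportion under simple random sampling.

    Assumes a 95% confidence level (z = 1.96) and p = 0.5, which maximizes p * (1 - p).
    """
    return z * sqrt(p * (1 - p) / n)


margin_400 = max_margin_of_error(400)   # approximately 5%
margin_100 = max_margin_of_error(100)   # approximately 10%
```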
In the current study, given the breadth of the sample and population, this step was not originally considered. The Krejcie and Morgan (1970) formula for determining sample size was relied on instead, owing to the size of the target population and the diversity of the sampling framework. These two factors led to the assumption that the sampling error would be somewhat large, and as such sampling error was not relied on to determine sample size.
Given constraints in sampling, however, the Krejcie and Morgan target was not achieved for four of the six sample sub-populations. To determine the adequacy of the sample size for the population as a whole, a retrospective analysis of the level of sampling error for each of the two norming variables was conducted. As reported in Table 3.3 below, the standard errors of the mean for the two norming variables were .313 (Islamic Worldview) and .852 (Religious Personality), which was expected given the diversity of the sampling framework. The diversity of the sample most likely contributed to the high standard deviations for the religiosity dimensions, leading to a high level of sampling error.
Table 3.3: Levels of Sampling Error for the Norming Variables
Variable                  Standard Error of the Mean
Islamic Worldview         .313
Religious Personality     .852
According to DeRoche (2005), the smaller the standard error of the sample's statistic, the smaller the confidence interval, thus the more precise the estimate of the population parameter. Given the much lower standard deviation for the Islamic Worldview dimension as compared to the Religious Personality dimension, the reported standard error was lower. The amount of sampling error for the religiosity variables reported above, however, could not have been a product of a small sample size as n = 1,692 is a large sample size (Blaikie, 2003). According to Bartlett et al. (2001), researchers may increase the margin of error value when a higher margin of error is acceptable or they may decrease these values when a higher degree of precision is needed. For the current study, a higher margin of error was deemed acceptable due to the diversity of the sample and the study's exploratory nature and objectives.
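The link between standard error and confidence interval width noted by DeRoche can be illustrated with the two standard errors reported in Table 3.3. The 95% confidence level and the normal approximation used below are assumptions made for illustration; the study itself does not report confidence intervals.

```python
Z_95 = 1.96  # normal critical value for a 95% confidence level (an assumption)

# Standard errors of the mean as reported in Table 3.3.
standard_errors = {
    "Islamic Worldview": 0.313,
    "Religious Personality": 0.852,
}

# Half-width of the confidence interval around each mean: z * SE.
half_widths = {name: Z_95 * se for name, se in standard_errors.items()}
```

Under these assumptions, the interval around the Islamic Worldview mean is narrower than that around the Religious Personality mean, in line with the smaller standard error reported for that dimension.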
Devise a Procedure for Drawing the Sample
There are several types of sampling design: random sampling; stratified random sampling; cluster sampling; and systematic sampling (Konting, 2000). Cluster sampling, one of the methods on which this study relied, is used when sampling units are comprised of more than one element (e.g., classrooms, schools, factories, city blocks). These aggregates or clusters of elements are then randomly selected, or the entire cluster is sampled. In its simplest form, cluster sampling consists of sampling clusters only once and treating all elements of the selected clusters as comprising the sample. This is referred to as single-stage sampling (Rodriguez, 1997).
Conversely, in multistage sampling, selection proceeds in stages, each of which requires a different type of sampling frame from which appropriate clusters are drawn. For example, if a researcher is interested in conducting a norming study with a sample of fourth graders in a particular state, a random sample of counties is first drawn. Second, within the counties selected, districts are randomly sampled. Third, within each district, schools are randomly drawn. Fourth, within the schools selected, fourth grade classrooms are randomly sampled. Finally, all fourth graders within the classrooms selected comprise the sample. Alternatively, fourth graders may be randomly selected within classrooms (Rodriguez, 1997).
For the current study, with the target population of interest being so large (Muslim youth 16-35 years old), a multistage sampling design was employed. To sample such a large population, it was first reduced to six sub-populations or strata, labeled in the study as 'cluster groups'. The six clusters or sub-populations of youth selected for the study aimed to represent a cross-section or diverse sample of youth from the overall target population. Given the size of the target population and its huge diversity, groups were selected in an attempt to capture the diversity of the population in a measurable sample.
In this way, a stratified sampling methodology was incorporated at the outset. The six cluster groups or sub-populations were chosen from a master list (see Appendix D) of youth populations in Malaysia through brainstorming/feedback from the Research Team, which included several youth studies experts. Based on their recommendations and feedback, the six sub-populations selected were: IPTA (Public Institutions of Higher Learning) students, political party members, youth organization members, youth factory workers, Serenti (drug rehabilitation center) trainees, and 'general youth' (youth-at-large). The overall sample size for the study (prior to data cleaning), after summing the six individual youth clusters, was 1,990 respondents. Accordingly, the six clusters and their subsamples represented “successful” or “achieved” youth, e.g. IPTA students and degree holders; “general” or “unaffiliated” youth, e.g. Youth At-large; “troubled” youth, e.g. Serenti youth; “affiliated” youth, e.g. Youth Organization members, and so on. The diversity of the sampling framework, though challenging to complete, included many different 'types' of young people broken down according to a variety of social groupings.
In light of the different sampling techniques mentioned above, following the initial breakdown of the overall population into sub-populations or strata, the current study then utilized a cluster sampling approach, which included different techniques for each sub-population, given various constraints on time and resources as well as the particular unique makeup of each youth cluster. For example, four of the six subpopulations or clusters were selected firstly according to randomly selected geographic locations, followed by other means. To determine the geographic locations for four of the sub-populations, the nation was broken into four zones: North (Perlis, Kedah, Penang and Perak states); Central (Selangor, Kuala Lumpur and Putrajaya states); South (Negeri Sembilan, Melaka and Johor states); and East (Pahang, Terengganu and Kelantan states). For each zone, one state was selected through simple random selection, and surveying was conducted exclusively in that state.
Table 3.4: States Selected for Sampling
Zone       State Selected
North      Perlis
Central    Kuala Lumpur
South      Johor
East       Kelantan
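The zone-based selection of states described in this section can be sketched as a simple random draw of one state per zone. The zone membership is taken from the text; the function name and the fixed seed are illustrative assumptions, and a run of this sketch will not necessarily reproduce the states actually drawn in the study.

```python
import random

# Zones and their states as listed in the text.
ZONES = {
    "North": ["Perlis", "Kedah", "Penang", "Perak"],
    "Central": ["Selangor", "Kuala Lumpur", "Putrajaya"],
    "South": ["Negeri Sembilan", "Melaka", "Johor"],
    "East": ["Pahang", "Terengganu", "Kelantan"],
}


def select_states(zones, seed=None):
    """Draw one state per zone by simple random selection."""
    rng = random.Random(seed)  # seeded only so the illustration is repeatable
    return {zone: rng.choice(states) for zone, states in zones.items()}


selected = select_states(ZONES, seed=2024)
```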
The sampling framework, i.e. the specific sampling methods employed for each sub-population, is provided below.
IPTA Students. Selected from a randomly selected combination of universities, faculties and programs/departments from the 13 IPTAs throughout Malaysia. At the program/department level, either all available students were sampled or students were sampled according to class (due to time/resource constraints).
Youth Organizations. All youth organizations under the Malaysian Youth Council were considered. According to the four states randomly selected, organizations were then selected randomly. At the organizational level, either all members for the selected organization within each of the four states were sampled, or members were selected at the discretion of the organization staff.
Political Party Youth. Three of the predominantly Muslim parties were selected: PAS, UMNO and KEADILAN. Respondents were first selected according to the four randomly selected states. Each party was then sampled in each of the four states or, in cases where members could not be reached, at the discretion of each party's governing body.
Serenti Youth. 'Fresh' trainees/inmates (i.e. in their first year) of government-run Serenti drug rehabilitation centers were sampled geographically based on four randomly selected states. At the site level, either all trainees of the given center that fit the sampling criteria were included, or Serenti staff selected participants based on their availability.
Youth At-large. 'Supermarket surveys': data for this sub-population were gathered from respondents at malls and shopping centers in the greater Kuala Lumpur area. Respondents were approached at random by enumerators at each site, and subsequently selected only if they indicated a lack of affiliation with any formal youth groups. This sample was purposive in the sense that the target group had to fit specific criteria for inclusion in the sampling. The sampling technique used was chosen due to the unknown size of the target population, and the challenge in sampling its members based on the chosen criteria.
Young Factory Workers. Respondents were first selected according to the four randomly selected states, followed by a random sampling of type of factory within each state. In some cases, the factory selected by random design could not be included due to sampling constraints (e.g., factories would not allow time off for sampling). In such cases, factories were selected according to availability due to time and resource constraints.
Figure 3.3: Sampling Flowchart for MRPI Field Test
[Flowchart: Muslim Youth 16-35, divided into the cluster groups (e.g. Youth Organization Members, Political Party Members, Youth Factory Workers), each broken down by the states, institutions, organizations and sites sampled.]
Minimum Sample Size Required
The sample size determination was based on the goal of developing norming tables for the two religiosity dimensions (and sub-dimensions) according to the target population of Muslim youth ages 16-35. Following the breakdown of the sample by sub-population, the next step was to determine the number of respondents to be sampled within each, based on Krejcie and Morgan's (1970) formula and table. The Krejcie and Morgan formula for determining sample size was chosen due to its applicability to non-parametric tests, which require large sample sizes. Their table, therefore, represents conservative sample estimates.
According to Krejcie and Morgan (1970), populations of over 100,000 require a minimum sample size of 384 - 400 respondents. The youth population between the ages of 15 and 34 in Malaysia is approximately 9.75 million, or around 42.5% of the total population (Doraisamy, 2002). From this, it was deduced that the 16 - 35 year old study population well exceeded 100,000. The minimum sample size required for the target population for the current study, therefore, was 384 - 400. However, Isaac and Michael (1995) argue that a large sample size is essential when the total sample is to be sub-divided into several sub-samples to be compared with one another, and when the parent population consists of a wide range of variables and characteristics, so that there is a risk of missing or misrepresenting those differences (Hill, 1998). As the current study was broken down into sub-populations, i.e. cluster groups using a stratified sampling method, a minimum sample size was determined for each sub-population in addition to the overall target population, in order to improve the validity of the sample. This conservative sampling target was set at 384 for each sub-population, due primarily to the large and unknown sizes of several of the sub-populations. However, due to the many constraints and challenges of such a large and ambitious sampling framework, the goal of 384 was not reached for four of the six sub-populations. The resulting sample size (following data cleaning) according to the six cluster groups was as follows:
Table 3.5: Field Test Sample Size According to Cluster Group
Cluster Group          n
Youth Organization     429
Political Party        147
Youth At-large         244
Factory Workers        196
Nevertheless, given the overall sample size of 1,692, the sample was sufficient to develop the norming tables for the target population as a whole. According to Blaikie (2003) and others, “…In studies with large populations, a sample of around 1,000 may be satisfactory and one of 2,000 will be very satisfactory” (p. 166).
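The minimum of 384 cited from Krejcie and Morgan (1970) can be reproduced with a short sketch of their sample size formula. The form of the formula and its default parameters (chi-square of 3.841 for one degree of freedom at the .95 confidence level, a population proportion of .50 and a 5% degree of accuracy) are stated here as assumptions based on the commonly cited version of the formula; the function name is illustrative.

```python
def krejcie_morgan(population, chi_sq=3.841, p=0.5, d=0.05):
    """Sample size for a finite population (commonly cited Krejcie-Morgan form).

    s = X^2 * N * P(1 - P) / (d^2 * (N - 1) + X^2 * P(1 - P))
    """
    numerator = chi_sq * population * p * (1 - p)
    denominator = d ** 2 * (population - 1) + chi_sq * p * (1 - p)
    return round(numerator / denominator)


# Approximate Malaysian youth population (15 to 34) cited in the text.
required_for_youth = krejcie_morgan(9_750_000)
required_for_million = krejcie_morgan(1_000_000)
```

Because the required sample size levels off as the population grows, very large populations all yield a figure of about 384 under these parameter values.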
Draw Sample and Collect Data
The author conducted the data collection with assistance from other members of the Research Team. In most cases, the author and Research Team members administered the surveys and were present when participants completed them. However, in certain cases where the survey site was far from the University or other logistical problems arose, the surveys were given to a contact person along with instructions, and the surveys were completed and mailed back to the researcher. This occurred mostly with those groups in Kelantan, Perlis and Johor states, though not frequently. The introduction and instructions for completing the surveys were given and the participants were walked through the survey's brief demographic introduction, where participants were asked for personal information, excluding their names in order to maintain anonymity. The participants were then provided approximately 30 to 45 minutes to complete the surveys. Details on the clusters or sub-populations follow:
1. IPTA Students
The total number of IPTA students sampled for the current study prior to data cleaning was 427. Based on the methodology above, four IPTA institutions were randomly selected from the master list of 13. The schools chosen were: UPM (Serdang), UPSI (Tanjung Malim), UM (K.L.) and UTM (Johor Bahru). From these four institutions the following faculties/departments were randomly chosen, shown with the number of respondents from each institution:
UPSI: (119)
- Counseling
- Accounting
- Economics
UPM: (115)
- Human Ecology
- English as a Second Language (TESL)
- Veterinary Medicine
- Landscape Design
UM: (120)
- Academy Islam
- Law
- Arts and Sciences
- Medicine
UTM: (49)
- Engineering
- Computer Sciences/IT
2. Youth Organization Members
The total sample of youth organization members for the current study prior to data cleaning was 505. Based on the above methodology, organizations were randomly selected based on the master list of the Malaysian Youth Council. All of the organizations on the master list were first broken down by organization type, followed by a random sampling of two organizations for each type such as religious organizations, membership organizations, recreational organizations, leadership organizations, etc. Sampling of the selected organizations was conducted at the site with the most members, or the site with the most availability for sampling based on recommendations by the organization staff. The breakdown of the sample was as follows:
Angkatan Belia Islam Malaysia (ABIM) - (43)
FELDA Youth (Perak) - (27)
Persatuan Belia India Muslim Malaysia (GEPIMA) - (38)
International Youth Center (IYC) - X-Games volunteers - (82)
Malaysian Association Youth Club (MAYC) (Melaka) - (38)
Pertubuhan Kebajikan Islam Malaysia (PERKIM) - (31)
Persatuan Kebangsaan Pelajar Islam Malaysia (PKPIM) - (37)
Persatuan Puteri Islam Malaysia (PPIM) - (131)
Persekutuan Pengakap Malaysia (Scouts) - (42)
Gabungan Pelajar Melayu Semenanjunj (GPMS) - (36)
3. Serenti Youth
The Serenti youth sample of 359 was taken from four Serenti centers in Malaysia that were selected based on the four randomly selected states discussed previously. The four centers selected for sampling were: Sungai Besi, near Kuala Lumpur; Muar, Johor Bahru; Perlis; and the Serenti Wanita (women's center) in Kelantan. The breakdown for the Serenti sample was as follows:
Serenti Perlis: (96)
Serenti Sungai Besi: (86)
Serenti Muar: (97)
Serenti Wanita (Kelantan): (80)
4. Political Party Youth Members
The political party sample total for the field test of the MRPI was 189. Political party youth, as a cluster, represented the lowest total number of respondents due to the fact that only three organizations were represented in the sample and many difficulties were presented in securing opportunities for sampling the members. The sample was comprised of members from UMNO, PAS and Keadilan from the four randomly selected states mentioned above: Johor, Kelantan, Kuala Lumpur, and Perlis. The breakdown of the sample was as follows:
PAS: (110)
- Kelantan
- Perlis
- Johor
- KL
UMNO: (51)
- Kelantan
- Johor
- KL
Keadilan: (28)
- Johor
- Kelantan
5. Youth At-Large
The youth at-large sample was distinctly different from the other samples. The total sample size was 250. The youth at-large sample, unlike the others, was a purposive sample. The sampling targeted unaffiliated youth from shopping centers in Kuala Lumpur, utilizing a team of enumerators hired to recruit respondents from seven different malls and shopping centers. The enumerators were instructed to identify young people who were not affiliated with any particular youth organization or in school. Respondents were paid RM5 for filling out the survey. The breakdown of the youth at-large sample according to name of shopping center/mall was as follows:
Central Market: (40)
Kota Raya: (35)
SoGo: (41)
MidValley: (15; enumerators had difficulty gaining access to the mall due to security)
KLCC: (40)
Pertama Complex: (40)
Sunway Pyramid: (39)
6. Youth Factory Workers
The youth factory worker sample originally targeted line worker staff in factories from the four selected states. The total sample size for this cluster was 249. Based on the sampling methodology stated above, the factories were randomly selected from a list based on type of factory, i.e. textiles, foodstuffs, electronics, etc. Due to difficulties in obtaining samples from the originally selected factories, the Research Team was forced to change its sampling strategy midway through the data collection process. Factory workers and political party youth were the two most challenging samples from which to obtain respondents. Factory workers work on strict time schedules and are afforded little time for breaks. The Research Team was therefore only able to sample respondents at the following sites by sponsoring lunch for the respondents and sampling them during their lunch breaks. Even using this approach, several of the factories turned down the request for sampling. In addition, it was evident during the sampling that the sample was comprised not only of line workers, but also of middle management personnel. As a result, factories were sampled in the following locations:
Johor: (67)
Perlis: (47)
Kelantan: (90)
Kuala Lumpur (Rawang): (45)
Compute Values of the Group Statistics of Interest
The key statistics included above (step 2) were computed for each of the six groups using the SPSS (version 11) statistical software package. This included the mean, median, standard deviation and standard error of the mean. The z-scores for determination of norms were computed using means and standard deviations, according to the formula identified above.
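For illustration, the group statistics named above can be computed with a few lines of Python. This is a sketch using Python's standard library in place of SPSS, which was the software actually used in the study; the group names and scores are hypothetical.

```python
from math import sqrt
from statistics import mean, median, stdev


def describe(scores):
    """Mean, median, standard deviation and standard error of the mean for one group."""
    n = len(scores)
    sd = stdev(scores)  # sample SD (n - 1 denominator); an assumption
    return {
        "n": n,
        "mean": mean(scores),
        "median": median(scores),
        "sd": sd,
        "se_mean": sd / sqrt(n),
    }


# Hypothetical raw scores for two illustrative groups.
groups = {
    "group_a": [20, 22, 24, 26, 28],
    "group_b": [30, 31, 35, 38, 41, 47],
}
summary = {name: describe(scores) for name, scores in groups.items()}
```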
Identify Types of Normative Scores Needed
In acquiring overall scores for the Islamic religiosity dimensions for each of the groups of youth, norms for the two main dimensions, Islamic Worldview and Religious Personality, were developed according to five demographic variables: age group, cluster, sex, educational level, and residence (rural/urban). Additional norms for the Islamic Worldview and Religious Personality sub-dimensions were developed only for the cluster group and age group variables, as these were the two demographic variables of most interest to the researcher. To complete the above tasks, four norming tables were developed: two for the main religiosity dimensions and an additional two for the religiosity sub-dimensions. From these tables, normed mean scores were computed for each of the above demographic variables.
Figure 3.4: Flowchart for Conducting the Norming Study
1. Identify the population of interest: Malaysian Muslim youth ages 16-35
2. Identify the most critical statistics that will be computed for the sample data: Mean, standard deviation
3. Decide on the tolerable amount of sampling error: N/A
4. Devise a procedure for drawing a sample from the population of interest: Multistage, stratified and cluster sampling
5. Estimate the minimum sample size: 384 - 400
6. Draw the sample and collect the data: According to six clusters or sub-populations of the target population
7. Compute the values of the group statistics of interest and their standard errors: Compute z-scores based on mean, standard deviation; development of norming tables
8. Identify the types of normative scores that will be needed and prepare the normative conversion tables: Norming tables created for Islamic Worldview and Religious Personality variables and their sub-dimensions. Norm scores developed and compared according to five demographic variables: age group, cluster, sex, educational level, and place of residence (rural/urban)
Normed Score Results for Five Demographic Variables
Following the creation of the norming tables, normed scores were created according to five demographic sub-groups: age group, cluster group, sex, level of educational attainment and place of residence. Descriptive analysis on the normed scores was carried out for each. As the main goal of a norming study is the development of norming tables (McDaniel, 1994), the identification of normed scores for the sample population is not a necessary step in a norming study. Nevertheless, in the current study, to begin to understand differences in religiosity among the youth sampled, norm scores were identified for the different sub-groups within the norming sample and subsequently compared using comparison-of-means tests.
The results of norm development for the different demographic sub-groups can provide benchmarks based on the MRPI normed scores. In the following diagram, an example is provided of what the results of normed score development might look like for different sub-groups of youth:
Figure 3.5: Example of Normed Score Benchmarks for Islamic Worldview Dimension
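[Diagram: example benchmarks placed on the MRPI normed-score scale (maximum 6.0), from highest to lowest: Islamic Scholars (5.5); Political Party Youth (4.8); IPTA Youth (3.9); Factory Worker Youth (3.4); Serenti Youth (2.9); Factory Worker Youth (2.1)]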
The diagram illustrates how different sub-groups can provide normed score benchmarks that can be used for ranking or comparison of MRPI respondents. By developing group normed scores, comparisons can begin to be made between different groups of respondents, to understand strengths and weaknesses in religiosity. For example, based on the benchmarks above, if a future group of MRPI respondents comprised of Islamic scholars took the MRPI and obtained a group normed score of 5.5 on the Islamic Worldview dimension, this would act as a benchmark for that sub-group. A score of 5.5 is high according to the MRPI scale. Thus, an individual who took the MRPI and scored 5.5 would see that the score is not only high, but is also near to that of the Islamic scholars sub-group. Benchmarking thus allows MRPI respondents to locate their score against an established range of their peers and other test takers. In this way benchmarks can provide a meaningful context for understanding the efficacy of different groups of respondents in the different MRPI religiosity constructs.
Following the creation of normed scores, to uncover significant differences between the sub-groups of each demographic variable, hypotheses were developed and comparison-of-means tests (either ANOVA or t-test) were conducted. The null hypotheses used to conduct the comparison-of-means tests were written as follows:
1) There are no differences in the mean scores for Islamic Worldview and Religious Personality among the six cluster groups of youth.
2) There are no differences in the mean scores for Islamic Worldview and Religious Personality among the three age groups of youth.
3) There are no differences in the mean scores for Islamic Worldview and Religious Personality for males and females.
4) There are no differences in the mean scores for Islamic Worldview and Religious Personality among the seven levels of formal educational attainment.
5) There are no differences in the mean scores for Islamic Worldview and Religious Personality for rural and urban youth.
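As a rough sketch of how the five hypotheses map onto the two kinds of test, the Python below assumes the usual convention (a t-test for variables with two groups, one-way ANOVA for three or more); the chapter does not state which test was applied to which variable. It also shows how the one-way ANOVA F statistic is computed. The scores are invented, and the names `choose_test` and `one_way_anova_f` are hypothetical.

```python
from statistics import mean

# Number of groups per demographic variable, as stated in the null hypotheses.
GROUP_COUNTS = {
    "cluster group": 6,
    "age group": 3,
    "sex": 2,
    "educational attainment": 7,
    "place of residence": 2,
}

def choose_test(n_groups):
    """Assumed convention: t-test for two groups, ANOVA for three or more."""
    if n_groups < 2:
        raise ValueError("need at least two groups to compare means")
    return "t-test" if n_groups == 2 else "ANOVA"

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of per-group score lists."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

plan = {variable: choose_test(k) for variable, k in GROUP_COUNTS.items()}

# Hypothetical scores for three groups, invented to exercise the function.
f_stat, df_between, df_within = one_way_anova_f(
    [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
)
```

The sketch stops at the F statistic; obtaining the p-value requires the F distribution, which a statistical package would normally supply.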
Following hypothesis testing, post-hoc tests (Bonferroni) were conducted for significant variables with more than three groups. The Bonferroni test was selected because it is moderately powerful, it is formulated to handle variance in the sample sizes and it is designed for comparison between all possible pairs of groups (Howell, 2002). A value of p < .05 was considered statistically significant. No comparisons or analyses were conducted between or across different variables.
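The arithmetic behind a Bonferroni correction over all pairwise comparisons can be sketched as follows. This is a generic illustration, not the output of the software used in the study: the group labels are placeholders and `bonferroni_plan` is a hypothetical helper.

```python
from itertools import combinations

def bonferroni_plan(labels, alpha=0.05):
    """List every pair of groups and the per-comparison alpha.

    With k groups there are k * (k - 1) / 2 pairwise comparisons, and the
    Bonferroni correction divides the overall alpha by that number.
    """
    pairs = list(combinations(labels, 2))
    return pairs, alpha / len(pairs)

# Placeholder labels for the six cluster groups (names are hypothetical).
clusters = [f"cluster_{i}" for i in range(1, 7)]
pairs, adjusted_alpha = bonferroni_plan(clusters)
# Six groups give 15 pairs, so each comparison is tested at 0.05 / 15.
```

Some statistical packages apply the same correction by multiplying each pairwise p-value by the number of comparisons and comparing the result with .05, which leads to the same decisions.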
Summary of Chapter
This chapter reported on the methods and procedures undertaken in the current study, beginning with an explanation of the research design employed. The study was described as basic research in the psychology-sociology of religion that utilized an exploratory-descriptive approach to achieve the study objectives. The study included four main research tasks: the development of a concept; the creation of a measurement instrument based on the concept (the MRPI); the testing of the measurement instrument; and the development of norms. To complete these tasks, an extensive review of the literature, also known as a library search, was conducted followed by multiple quantitative research methods including two pilot tests to establish reliability and validity of the MRPI. Following instrument development, a major field test of the instrument was conducted. The final section of the chapter detailed the steps taken in conducting the field test, and developing the norming tables and normed scores from the field test data, which resulted in MRPI religiosity benchmarks.