Comparison of Rasch Model Logits and Likert Mean Score in Testing the Normality Assumption

The lack of empirical evidence on Rasch model logits as a candidate score for normality testing motivates this study of whether logits can replace the Likert mean score. The research also examines how the logits vary across the zones selected for Adversity Quotient (AQ) measurement. AQ was measured using IKBAR (Instrumen Kecerdasan Menghadapi Cabaran), based on the CORE model (Control, Ownership, Reach and Endurance), on 1,845 polytechnic students from five zones in Malaysia selected by proportionate stratified multistage cluster sampling. The findings revealed that, for all zones, the skewness and kurtosis values of the Rasch model logits were greater than those of the Likert mean score. In terms of the gaps between the two scores, Borneo showed the largest gap for Control and AQ in both skewness and kurtosis. For skewness, the North showed the largest Endurance gap and the West the largest Reach gap. For kurtosis, the East showed the largest gaps for Reach and Endurance. The Rasch model logits for all zones generally failed to meet the normality acceptance range. A paired-samples t-test revealed a significant mean difference between the two scores, with Rasch model logits higher than the Likert mean score. Based on these results, Rasch model logits show potential as an alternative to the Likert mean score for researchers conducting normality testing. Further studies are proposed to explore the use of logits in different settings and their applicability to other normality tests in educational research.


Introduction
The normality assumption should be taken into account when using parametric statistical tests (Ghasemi and Zahediasl, 2012). Most statistical techniques, such as t tests, correlation, regression, and ANOVA, require the data to be normally distributed (Chua Yan Piaw, 2006; Drezner et al., 2009; Hair et al., 2013; Mohd Rafi Yaacob, 2013). A wrong interpretation of normality testing will affect the final results. If the normality assumption is violated, the interpretation of the data is likely to be invalid or unreliable (Nornadiah and Wah, 2011). Lay and Khoo (2008) emphasized the importance of examining the distribution of the data before choosing the appropriate statistical test for data analysis; otherwise the interpretations may not be reliable or valid (Park, 2008). One of the new instruments developed using the Rasch model is Instrumen Kecerdasan Menghadapi Cabaran (IKBAR). It is used to measure Adversity Quotient (AQ) in Malaysia using samples of polytechnic students (Mohd and Ahmad, 2015; Mohd and Ahmad, 2016). This instrument used logits in conducting the normality analysis. Several limitations have been identified in the assumptions and measurement capabilities of Classical Test Theory (CTT) for rating scales (Smith et al., 2002). Rating scale data, such as Likert scale or other ordinal data, are often mistaken for interval data and misused in parametric statistical procedures (Bode and Wright, 1999). In this study, IKBAR uses a four-point Likert scale (strongly agree, agree, disagree, and strongly disagree). Most researchers still prefer to use the Likert mean score even though they realize that the data are ordinal. In the Rasch model, logits are interval data and therefore have good potential to fulfil the requirements of normality analysis.
Hence, this research investigates the normality pattern of Rasch logits and the Likert mean score in different settings (polytechnic zones). It also investigates how applicable Rasch logits are in normality testing and how large the gaps are between the values produced by the Rasch model logits and the Likert mean score. Rasch-based research has been conducted, but information on the integration of normality testing and the Rasch model is still very limited (Mohd et al., 2018b; Mohd et al., 2018d; Mohd et al., 2018a; Mohd et al., 2018c; Muhamad et al., 2018).
The other issue is the misunderstanding of normality testing. The distribution of larger samples tends to spread better and form a bell-shaped curve when the characteristics of the population are plotted by frequency. Confusion arises when a sample drawn from a normal population does not itself appear normally distributed; only some samples show a seemingly normal distribution, especially when the sample size is small.
The confusion might be due to graphical techniques (the eyeball test) and formal normality tests showing inconsistent results for the same data. Some tests can only be applied under certain situations or assumptions. Furthermore, different normality tests often produce different results, for example when some tests reject and other tests fail to reject the null hypothesis of normality (Nornadiah and Wah, 2011). These differing findings have misled researchers, so the selection of a normality test must be given priority (Nornadiah and Wah, 2011). To overcome this problem, skewness and kurtosis tests are proposed because they are relatively accurate for both small and large samples (Kim, 2013). In this research, skewness and kurtosis were applied to large samples.
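As a rough illustration of the skewness and kurtosis approach, the sketch below computes both statistics from raw scores and checks them against a chosen cutoff. It uses the simple population moment formulas, which is an assumption: statistical packages such as SPSS apply small-sample adjustments, so their output will differ slightly. The function names and the example data are hypothetical.

```python
def skewness_kurtosis(values):
    """Return (skewness, excess kurtosis) from population moments.

    Assumes at least two distinct values; otherwise the variance is zero
    and the ratios are undefined.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return skew, excess_kurt


def within_range(value, cutoff):
    """True if the statistic lies inside the symmetric range [-cutoff, +cutoff].

    Treating the boundary as inclusive is an assumption of this sketch.
    """
    return abs(value) <= cutoff


# Hypothetical scores: one symmetric set and one with a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 2, 10]

for name, data in (("symmetric", symmetric), ("right_tailed", right_tailed)):
    skew, kurt = skewness_kurtosis(data)
    print(name, round(skew, 3), round(kurt, 3), within_range(skew, 1.0))
```

The cutoff passed to `within_range` would be whichever acceptance range the researcher adopts, for example 1.0, 2.0 or 3.0.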

Rasch Model and Logits
The Rasch Model (RM) helps explain responses on observable traits in order to estimate the unobservable traits of a particular construct. RM models two important parameters in testing, namely (1) item difficulty (observable trait) and (2) respondents' ability (unobservable trait). Equation (1) gives the relationship between the parameters for dichotomous data and Equation (2) for polytomous data [20]:
$$P_{ni}(x = 1 \mid B_n, D_i) = \frac{e^{(B_n - D_i)}}{1 + e^{(B_n - D_i)}} \tag{1}$$
where $P_{ni}$ is the probability of person n getting a correct answer for Item i, $B_n$ is the person ability and $D_i$ is the item difficulty. This probability equals the constant e (the base of the natural logarithm, approximately 2.7183) raised to the power $(B_n - D_i)$, divided by one plus the same value.
$$P_{ni1}(x = 1 \mid B_n, D_i, F_1) = \frac{e^{(B_n - [D_i + F_1])}}{1 + e^{(B_n - [D_i + F_1])}} \tag{2}$$
In Equation (2), $F_1$ is the difficulty of the first threshold, and this difficulty calibration is estimated only once for this threshold across the entire set of items in the rating scale.
A logit scale is constructed in the same way that we construct an Amp scale: we deduce a theory that produces equal-interval, linear measures and derive a method for applying that theory. In the case of qualitative ordered observations (right/wrong, present/absent, none/some/all), the necessary and sufficient theory is the Rasch model, and the method of application is numerous administrations of similar agents (test items) to relevant objects (persons). The theory is stated in Equation (3):
$$\log_e\left(\frac{P_{ni}}{1 - P_{ni}}\right) = B_n - D_i \tag{3}$$
This is a "linear" model because all elements can be represented as fixed positions along one straight line. In games of chance, (Probability of Success) / (Probability of Failure) is called the "odds of success", and log_e[(Probability of Success) / (Probability of Failure)] is called the log-odds. The units of measurement constructed by this theory are called "log-odds units" or "logits" (Wright, 1993). These logits turn the data from ordinal into interval data for the purposes of this normality testing.
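The relationship between the dichotomous probability and the logit can be illustrated numerically. The sketch below assumes the standard dichotomous Rasch form, P = e^(B - D) / (1 + e^(B - D)), and shows that taking the log-odds of that probability recovers B - D. The function names and example values are hypothetical, and this is not the estimation procedure used by WINSTEPS.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of success for a person of given ability on an item of
    given difficulty, both expressed in logits (dichotomous Rasch form)."""
    exponent = math.exp(ability - difficulty)
    return exponent / (1.0 + exponent)


def log_odds(p):
    """Natural log of the odds of success, i.e. the value in logits."""
    return math.log(p / (1.0 - p))


# Hypothetical person ability and item difficulty, in logits.
ability, difficulty = 1.5, 0.3
p = rasch_probability(ability, difficulty)

# The log-odds of p equals ability minus difficulty (up to rounding error).
print(round(p, 4), round(log_odds(p), 4))
```

When ability equals difficulty, the probability is 0.5 and the log-odds is zero, which is why the difference B - D can be read as a position on a single linear scale.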

Methodology
This study used a quantitative approach and a survey research design, with a questionnaire from the Instrumen Kecerdasan Menghadapi Cabaran (IKBAR) by Mohd and Ahmad (2015). The questionnaire contains 66 items on a four-point Likert scale, representing the four domains of Adversity Quotient (AQ), namely CORE: Control, Ownership, Reach and Endurance. A sample of 1,892 was drawn from a population of 18,828 polytechnic students using the proportionate stratified multistage cluster sampling technique. The return rate of the questionnaire was 97.52% (1,845 students), which is considered acceptable (Loewenthal, 2001; Christensen et al., 2011). The respondents consisted of 994 males (53.9%) and 851 females (46.1%) from five polytechnics in Malaysia, one in each of the five zones: Western (456, 24.7%), Northern (393, 21.3%), Southern (375, 20.3%), Eastern (363, 19.7%) and Borneo (258, 14%). The data were analysed using SPSS for Windows Version 23 and, under the polytomous Rasch model, using WINSTEPS 3.71.0.1. The skewness and kurtosis normality test was chosen because it offers several levels of normality acceptance range and is suitable for large sample sizes.
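The sample figures reported above can be cross-checked with simple arithmetic. The sketch below treats 1,892 as the number of questionnaires distributed, which is an assumption implied by the reported 97.52% return rate rather than stated outright; the variable names are illustrative.

```python
# Zone counts and gender counts as reported in the Methodology.
zone_counts = {
    "Western": 456,
    "Northern": 393,
    "Southern": 375,
    "Eastern": 363,
    "Borneo": 258,
}
gender_counts = {"male": 994, "female": 851}

distributed = 1892  # assumed: questionnaires distributed
returned = sum(zone_counts.values())

# Return rate and percentage share of each zone among returned questionnaires.
return_rate = round(100 * returned / distributed, 2)
zone_pct = {zone: round(100 * n / returned, 1) for zone, n in zone_counts.items()}
gender_pct = {g: round(100 * n / returned, 1) for g, n in gender_counts.items()}

print(returned, return_rate)
print(zone_pct)
print(gender_pct)
```

The zone counts and the gender counts both sum to 1,845, and the resulting percentages agree with those reported.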

Results
Objective 1: To investigate the normality pattern of skewness and kurtosis by using CORE model based on different zones from Rasch model logits and Likert mean score perspectives.
Table 1 revealed that, for all zones, the skewness and kurtosis values of the Rasch model logits were greater than those of the Likert mean score. Figures 1 and 2 show that this occurred across the constructs and zones. In descending order, the mean range between logits for skewness was 1.179 (North), 1.159 (West), 1.144 (Borneo), 0.875 (East) and 0.864 (South). For kurtosis it was 2.947 (Borneo), 2.174 (West), 2.154 (North), 1.585 (East) and 1.412 (South). Table 2 shows that the range was 0.864 to 1.179 for skewness and 1.412 to 2.947 for kurtosis.
Objective 2: To examine the gaps between Rasch model logits and Likert mean score according to zones.
Table 3 and Figure 3 showed that Borneo had the largest gaps for Control and AQ in both skewness and kurtosis. In terms of skewness, the North had the largest Endurance gap and the West the largest Reach gap. For kurtosis, Figure 4 showed that the East had the largest gaps for Reach and Endurance.
Objective 3: To identify the capability of Rasch model logits and Likert mean score to meet the normality acceptance range by using CORE model based on different zones.
Figures 5 to 9 revealed that the Rasch logits tend to deviate from the normality range more than the Likert mean score does. Whether the acceptance range is taken as ± 1.0 (Leech et al., 2015), ± 2.0 (Garson, 2012; Lomax and Hahs-Vaughn, 2012) or ± 3.0 (Peat and Barton, 2005), some of the constructs and tests failed to meet it. Thus, the Rasch logits for all zones mostly failed to fulfil the requirements of the skewness and kurtosis analysis.
Objective 4: To examine the significant differences between Rasch model logits and Likert mean score.
The significant differences between Rasch model logits and Likert mean score were examined to check whether the two sets of values differed across all zones and tests (skewness and kurtosis). A paired-samples t-test was chosen because the two sets of values are related to each other in contributing to normality. The criteria for running the test, such as random sampling, normality and type of data, were fulfilled. On average, Rasch model logits were 1.549 points higher than the Likert mean score (95% CI for the Likert minus logits difference [-1.898, -1.201]), as shown in Table 4. Table 5 showed that Rasch model logits and the Likert mean score were moderately and positively correlated (r = 0.518, p < 0.001). Table 6 revealed a significant difference between the Likert mean score (M = -0.0396, SD = 0.285) and the Rasch model logits (M = 1.510, SD = 1.351), t(49) = -8.927, p < 0.001.
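The reported paired-samples statistics can be checked for internal consistency using only the figures quoted above. The sketch below recovers the standard error from the mean difference and t value, rebuilds the 95% confidence interval, and compares the implied standard deviation of the differences with the one obtained from the two standard deviations and the correlation. The critical value 2.0096 for 49 degrees of freedom is supplied by hand as an assumption, and the variable names are illustrative.

```python
import math

# Figures as reported (Likert mean score minus Rasch model logits).
mean_diff = -1.549
t_stat = -8.927
df = 49
sd_likert, sd_logit, r = 0.285, 1.351, 0.518

# Assumed two-sided 95% critical value of Student's t for df = 49.
t_crit = 2.0096

# Standard error and standard deviation of the paired differences.
se = mean_diff / t_stat
sd_diff = se * math.sqrt(df + 1)

# Confidence interval rebuilt from the mean difference and standard error.
ci_lower = mean_diff - t_crit * se
ci_upper = mean_diff + t_crit * se

# Standard deviation of differences implied by the two SDs and correlation.
sd_from_r = math.sqrt(sd_likert**2 + sd_logit**2 - 2 * r * sd_likert * sd_logit)

print(round(ci_lower, 3), round(ci_upper, 3))
print(round(sd_diff, 3), round(sd_from_r, 3))
```

Under these assumptions the rebuilt interval and the two estimates of the standard deviation of the differences agree with the reported values to within rounding.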

Conclusions
This study showed that the normality pattern fluctuates considerably for both skewness and kurtosis. The potential of logits was tested because of their strength as interval data and to see whether they can replace the Likert mean score for normality testing. The gaps between the two scores in skewness and kurtosis across the CORE constructs produced significant differences. Researchers can consider using logits for their normality tests, but the values deviate further from normality than those of the Likert mean score. The direct implication for researchers is that logits have difficulty fulfilling the normality acceptance range even though the data are interval. Future studies are suggested to examine how Rasch model logits behave in different research settings and under other types of normality testing.