Monday, 9 December 2013

Interpreting Test Scores & Item Analysis

      Last week's lecture....
   
   Personally, I like this chapter/topic, probably because I loveeee calculations. Haha~ =D Anyway, questions regarding calculations have been eliminated from the final exam. =( As Dr. Lee said, if you get the calculation right, you’ll score, but if there are any calculation errors, you’ll get the whole question wrong.


     Basically, in this lecture, we were taught the most basic statistical analysis, to find out how the candidates performed on the test and how good the test items are. I found this lecture interesting, as I can analyze performance on the test and identify weaknesses or problems of the test (if any). If we were to carry out this analysis, we have to take note that our sample size should not be too small. It should have at least 30 students.

Interpreting Test Scores:
     There are two ways of interpreting test scores, namely (i) measures of central tendency (mean, mode, median) and (ii) measures of dispersion (range, standard deviation). Actually, I learned (i) and (ii) in Form 4 Additional Mathematics. Luckily, I can still recall some of the formulas to calculate mean, mode, median, range and standard deviation. =))
      From my understanding, mode is the score with the highest frequency (the score that appears the most), mean is the average score whereas median is the middle score when the scores are arranged in order. On the other hand, range is the difference between the highest and lowest scores whereas standard deviation (s.d.) is a measure of the dispersion of a set of data from its mean. The more spread out the data, the higher the standard deviation.
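
      To see the five measures side by side, here is a small Python sketch. The scores are made up by me for illustration (they are not from the tutorial), and I am assuming the population formula for s.d. (divide by N), which is why it uses `pstdev` and not `stdev`.

```python
import statistics

# Hypothetical test scores, made up for illustration only.
scores = [4, 6, 6, 7, 8, 9, 9, 9, 10, 12]

mean = statistics.mean(scores)           # average score
median = statistics.median(scores)       # middle score of the sorted list
mode = statistics.mode(scores)           # score with the highest frequency
score_range = max(scores) - min(scores)  # highest score minus lowest score
sd = statistics.pstdev(scores)           # population s.d. (assumption: divide by N)

print(mean, median, mode, score_range, round(sd, 2))
```

      If the class is treated as a sample of a bigger group, `statistics.stdev` (divide by N-1) would be used instead, and the s.d. comes out slightly larger.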

Item Analysis:
        This part is interesting, as we learned how to evaluate test items. We learned about these two important things: (i) item difficulty and (ii) item discrimination. The following is the summary of what I've learnt.
     
      The index of difficulty/facility value (FV) shows us how easy or difficult the test item is. It can be calculated using the formula FV=R/N or FV=(Correct U+Correct L)/2n, where R= no. of correct answers, N= total no. of candidates, U= upper half, L= lower half, and n= no. of candidates in each group (upper or lower). Usually, items with FV between 0.3 and 0.7 are accepted. If the FV of the item is low, it means that the item is difficult and vice versa.
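
      The FV formula is simple enough to write as a tiny Python function. This is only a sketch: the function name and the numbers (12 correct in the upper group, 6 correct in the lower group, 15 candidates per group) are my own made-up example, not from the lecture.

```python
def facility_value(correct_upper, correct_lower, group_size):
    """FV = (Correct U + Correct L) / 2n, where n is the size of each group."""
    return (correct_upper + correct_lower) / (2 * group_size)

# Hypothetical item: 12 of the upper 15 and 6 of the lower 15 answered correctly.
fv = facility_value(correct_upper=12, correct_lower=6, group_size=15)
print(round(fv, 2))  # 0.6
```

      Since 2n is just the total number of candidates N, this gives the same answer as FV=R/N with R=18 and N=30.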

      The index of discrimination (D) shows us whether or not the test item discriminates the more able students from the less able ones. The test item is considered good if the good students tend to do well on it and the poor students tend to do badly on it. It can be calculated using the formula D=(Correct U-Correct L)/n. The item is regarded as good (functioning effectively) if its D value is between 0.4 and 0.6. A test item with D value +1 discriminates perfectly whereas a test item with D value 0 doesn’t discriminate at all. If an item has a D value less than 0 (negative value), it means that the item discriminates in completely the wrong way. In addition, if the key discriminates negatively or the distractors discriminate positively, the item should be eliminated.
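
      The D formula can be sketched the same way. The function name is my own; the numbers are the ones for Item Y in the corrections for 3(b) further down this post (3 correct in the upper group, 8 correct in the lower group, n=15).

```python
def discrimination_index(correct_upper, correct_lower, group_size):
    """D = (Correct U - Correct L) / n, where n is the size of each group."""
    return (correct_upper - correct_lower) / group_size

# Item Y from the 3(b) corrections: more lower-group candidates got it right.
d_item_y = discrimination_index(correct_upper=3, correct_lower=8, group_size=15)
print(round(d_item_y, 4))  # -0.3333
```

      A negative result like this means the weaker candidates did better on the item than the stronger ones.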

      In a nutshell, it’s indeed important to know how to analyze test items and categorize them based on their difficulty and discrimination indices. Items or distractors which are not appropriate are eliminated and replaced. Items which are good are stored in the “item bank”. This will save a lot of time for teachers as they can reuse the objective questions later.

     The following is the tutorial task that I've done. Calculations, calculations,... analyze, interpret... 


** Corrections for 3(b) **

Item X: 
D = (10-8)/2
    = 0.4 
IT DISCRIMINATES FAIRLY EFFECTIVELY.

Item Y: 
D = (3-8)/15
    = -0.3333
IT DISCRIMINATES NEGATIVELY, IN AN ENTIRELY WRONG WAY.

Solutions to Ques 2(b) s.d.


** Addition after tutorial discussion **

4(c) 
Item X: 
FV=0.6 (fairly easy, between 0.4-0.6) and D=0.26667 (discriminates positively).
Overall, it functions effectively.
SO, THE ITEM SHOULD NOT BE ELIMINATED.

Item Y: 
FV=0.17857 (<0.2, so the item is very difficult) and D=0.21495 (discriminates positively).
ITEM Y CAN BE KEPT, BUT IT WOULD BE BETTER IF IT IS REVISED.

Item Z:
FV=0.46667 (fairly difficult) and D=-0.4 (discriminates negatively, in an entirely wrong way).
ITEM Z SHOULD BE ELIMINATED.
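
The three verdicts above can be reproduced with a rough rule of thumb in Python. This is only my own sketch, not an official rule from the lecture: I am assuming that a negative D means eliminate, that an FV outside the usual 0.3 to 0.7 band means keep but revise, and that everything else is kept.

```python
def verdict(fv, d, fv_low=0.3, fv_high=0.7):
    """Rough keep/revise/eliminate rule (my own assumption, not an official rule)."""
    if d < 0:
        return "eliminate"        # discriminates the wrong way
    if fv < fv_low or fv > fv_high:
        return "keep but revise"  # too difficult or too easy
    return "keep"

# FV and D values for Items X, Y and Z from 4(c).
items = {"X": (0.6, 0.26667), "Y": (0.17857, 0.21495), "Z": (0.46667, -0.4)}
results = {name: verdict(fv, d) for name, (fv, d) in items.items()}
print(results)
```

A stricter version could also ask for D to be at least 0.4 before keeping an item unchanged, but with that rule Item X would need revising too.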

4(d)
Item X: 
Distractors A and D are performing well whereas distractor B may not be working.
NO DISTRACTOR SHOULD BE ELIMINATED/MODIFIED. 

Item Y:
Distractor B functions well but distractors C and D attract the better candidates.
MODIFY DISTRACTORS C and D.

Item Z:
Distractors A, C and D also attract the wrong candidates. More upper-level candidates selected the distractors as their answer for the question.
MAYBE THERE'S SOMETHING WRONG WITH THE STEM (QUESTION). SO, ELIMINATE THE WHOLE ITEM.
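
The idea behind the distractor check in 4(d) can be sketched in Python as well. The counts below are made up by me (the real tutorial figures are not reproduced here), and the labels are my own wording: a distractor is treated as working when it attracts more lower-group than upper-group candidates.

```python
def check_distractors(upper_counts, lower_counts, key):
    """Label each wrong option by who it attracts (a sketch, labels are my own)."""
    report = {}
    for option in upper_counts:
        if option == key:
            continue  # the key is the correct answer, not a distractor
        upper, lower = upper_counts[option], lower_counts[option]
        if upper + lower == 0:
            report[option] = "not working (nobody chose it)"
        elif lower > upper:
            report[option] = "working"
        elif upper > lower:
            report[option] = "attracts the better candidates"
        else:
            report[option] = "no clear pattern"
    return report

# Hypothetical counts of candidates choosing each option; C is the key.
upper = {"A": 1, "B": 0, "C": 12, "D": 2}
lower = {"A": 4, "B": 0, "C": 6, "D": 5}
report = check_distractors(upper, lower, key="C")
print(report)
```

With these made-up counts, A and D come out as working and B as not working, which is the same pattern as Item X above.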