Monday, 9 December 2013

Interpreting Test Scores & Item Analysis

      Last week's lecture....
   
   Personally, I like this chapter/topic, probably because I loveeee calculations. Haha~ =D Anyway, questions regarding calculations have been eliminated from the final exam. =( As Dr. Lee said, if you get the calculations right, you'll score, but if there are any calculation errors, you'll get the whole question wrong.


     Basically, in this lecture, we were taught the most basic statistical analyses, to find out how candidates performed on the test and how good the test items are. I found this lecture interesting, as I can analyze performance on the test and identify weaknesses or problems with the test (if any). If we were to carry out this analysis, we have to take note that our sample size should not be too small. It should have at least 30 students.

Interpreting Test Scores:
     There are two ways of interpreting test scores, namely (i) measures of central tendency (mean, mode, median) and (ii) measures of dispersion (range, standard deviation). Actually, I learned (i) and (ii) in Form 4 Additional Mathematics. Luckily, I can still recall some of the formulas for calculating the mean, mode, median, range and standard deviation. =))
      From my understanding, the mode is the score with the highest frequency (the score that appears the most), the mean is the average score, and the median is the middle score. On the other hand, the range is the difference between the highest and lowest scores, whereas the standard deviation (s.d.) is a measure of the dispersion of a set of data from its mean. The more spread out the data, the higher the s.d.
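
Just to see these formulas in action, here is a minimal Python sketch; the score list is invented for illustration, and I use the population standard deviation (statistics.stdev would give the sample version):

```python
from statistics import mean, median, mode, pstdev

# A made-up set of test scores, just for illustration
scores = [55, 60, 60, 65, 70, 75, 80]

print("Mean:  ", mean(scores))               # average score
print("Median:", median(scores))             # middle score
print("Mode:  ", mode(scores))               # most frequent score
print("Range: ", max(scores) - min(scores))  # highest score - lowest score
print("S.D.:  ", pstdev(scores))             # spread of the scores around the mean
```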

Item Analysis:
        This part is interesting, as we learned how to evaluate test items. We learned about these two important things: (i) item difficulty and (ii) item discrimination. The following is the summary of what I've learnt.
     
      The index of difficulty/facility value (FV) shows us how easy or difficult a test item is. It can be calculated using the formula FV = R/N or FV = (Correct U + Correct L)/2n, where R = no. of correct answers, N = total no. of candidates, U = upper half, L = lower half, and n = no. of candidates in a group. Usually, items with FV between 0.30 and 0.70 are accepted. If the FV of an item is low, it means that the item is difficult, and vice versa.
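
Here is a small sketch of the two FV formulas, with made-up counts (18 correct out of 30 candidates); the function names are just my own:

```python
def facility_value(correct, total):
    # FV = R / N: proportion of all candidates who answered the item correctly
    return correct / total

def facility_value_groups(correct_upper, correct_lower, group_size):
    # FV = (Correct U + Correct L) / 2n, using only the upper and lower groups
    return (correct_upper + correct_lower) / (2 * group_size)

fv = facility_value(18, 30)  # hypothetical: 18 of 30 candidates got the item right
verdict = "accepted" if 0.30 <= fv <= 0.70 else "too easy or too difficult"
print(f"FV = {fv:.2f} -> {verdict}")
```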

      The index of discrimination (D) shows us whether or not a test item discriminates the more able students from the less able ones. A test item is considered good if the good students tend to do well on it and the poor students tend to do badly on the same item. It can be calculated using the formula D = (Correct U - Correct L)/n. An item is regarded as good if its D value is between 0.4 and 0.6 (it functions effectively). A test item with a D value of +1 discriminates perfectly, whereas a test item with a D value of 0 doesn't discriminate at all. If an item has a D value less than 0 (a negative value), it means that the item discriminates in a completely wrong way. In addition, if the key discriminates negatively or a distractor discriminates positively, the item should be eliminated.
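
And a similar sketch for D; the counts (12 and 6 correct, groups of 15) are invented:

```python
def discrimination_index(correct_upper, correct_lower, group_size):
    # D = (Correct U - Correct L) / n, where n is the size of one group
    return (correct_upper - correct_lower) / group_size

def interpret(d):
    if d < 0:
        return "discriminates in a completely wrong way - eliminate or revise"
    if d == 0:
        return "does not discriminate at all"
    if d == 1:
        return "discriminates perfectly"
    if 0.4 <= d <= 0.6:
        return "functions effectively"
    return "discriminates positively"

d = discrimination_index(12, 6, 15)  # hypothetical upper/lower counts
print(f"D = {d:.2f} -> {interpret(d)}")
```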

      In a nutshell, it's indeed important to know how to analyze test items and categorize them based on their difficulty and discrimination indices. Items or distractors which are not appropriate are eliminated and replaced. Items which are good are stored in the "item bank". This saves a lot of time for teachers, as they can reuse the objective questions later.

     The following is the tutorial task that I've done. Calculations, calculations,... analyze, interpret... 


** Corrections for 3(b) **

Item X: 
D = (10-4)/15
    = 0.4 
IT DISCRIMINATES FAIRLY EFFECTIVELY.

Item Y: 
D = (3-8)/15
    = -0.3333
IT DISCRIMINATES NEGATIVELY, IN AN ENTIRELY WRONG WAY. 

** Solutions to Question 2(b): standard deviation **


** Addition after tutorial discussion **

4(c) 
Item X: 
FV=0.6 (fairly easy, between 0.4 and 0.6) and D=0.26667 (discriminates positively).
Overall, it functions effectively.
SO, THE ITEM SHOULD NOT BE ELIMINATED.

Item Y: 
FV=0.17857 (< 0.20: the item is very difficult) and D=0.21495 (discriminates positively).
ITEM Y CAN BE KEPT, BUT IT WOULD BE BETTER IF IT WERE REVISED.

Item Z:
FV=0.46667 (fairly difficult) and D=-0.4 (discriminates negatively, in an entirely wrong way).
ITEM Z SHOULD BE ELIMINATED.

4(d)
Item X: 
Distractors A and D are performing well, whereas distractor B may not be working.
NO DISTRACTOR SHOULD BE ELIMINATED/MODIFIED. 

Item Y:
Distractor B functions well but distractors C and D attract the better candidates.
MODIFY DISTRACTORS C and D.

Item Z:
Distractors A, C and D also attract the wrong candidates. More upper-group candidates selected these distractors as their answer to the question. 
MAYBE THERE'S SOMETHING WRONG WITH THE STEM (QUESTION). SO, ELIMINATE THE WHOLE ITEM.
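
Looking back at 4(d), the distractor check can be sketched like this (all counts invented; the rule of thumb is that a distractor is suspect when the upper group picks it at least as often as the lower group):

```python
# Hypothetical counts of candidates choosing each option; the key is C
upper = {"A": 1, "B": 2, "C": 10, "D": 2}  # more able half
lower = {"A": 4, "B": 5, "C": 4,  "D": 2}  # less able half
key = "C"

for option in upper:
    if option == key:
        continue  # skip the correct answer
    if upper[option] >= lower[option]:
        print(f"Distractor {option}: attracts the better candidates - revise it")
    else:
        print(f"Distractor {option}: seems to be working")
```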



Friday, 29 November 2013

Which Do You Prefer? Analytic Marking??? Holistic Marking???



In my opinion, it’s quite difficult to choose whether to use holistic or analytic marking in evaluating students’ work. This is because they are two different approaches and are used for different purposes.

In holistic marking, the work is evaluated for its overall quality, whereas analytic marking is done on separate criteria such as grammar, content and voice. Holistic marking assigns a single score to represent a weighing of the whole work, whereas analytic marking assigns different scores to different factors.

Although a teacher can evaluate students’ writing faster using holistic marking, it does not help in diagnosing students’ strengths and weaknesses, which means it does not help much in students’ further stages of learning. Meanwhile, analytic marking does help to diagnose students’ strengths and weaknesses. Hence, the teacher can know more about students’ performance, and students will receive more information about their writing.

For holistic marking, teachers have to be extensively trained to use the scale accurately (Brown, 2010). So, if I were to choose between those two marking methods, I would prefer analytic marking, since I’m a novice teacher and am not yet able to mark papers as a whole without some guidance. When I gain more experience, I will switch to holistic marking, as it will save a lot of time when marking piles of papers, especially during the final examination, when teachers are given limited time to mark the papers. 


Saturday, 12 October 2013

The Cognitive Domain of Bloom's Taxonomy

Tutorial Task: Design 6 Questions on 
the Cognitive Domain of Bloom's Taxonomy
      The picture above displays the 6 questions designed by my group. Unfortunately... T_T We didn't set a specific topic. There is also no text given as a reference for answering the questions. The questions that we designed are mostly open-ended, meaning that the question we set for the "Comprehension" level could also be used for "Application", for example. Although we did state the verb for each level, the answers to each question are too broad. 
Why? What? How? 
If students are asked to answer the questions "Why...?", "What...?" and "How...?", there will be various answers. Then, how are we going to evaluate the answers? Which one is right? Which one is wrong? Which answers deserve more marks? This will be hard!!! 
     
      So, what I learned is that if I were to design questions based on the 6 levels of the cognitive domain of Bloom's Taxonomy, I should set a specific topic or provide a text if I want to ask open-ended questions. 

The picture on the left illustrates the 6 questions designed by another group. Although their questions are also open-ended (mostly WH- questions), they are examples of good questions for the different levels of the cognitive domain. This is because they provided a text so that students can refer to it while answering the questions. 

The picture on the right also shows that the questions were designed based on a particular topic, which is "Pollution". 

The examples shown above are questions on the cognitive domain of Bloom's Taxonomy, designed based on one particular topic, picture or text only. Actually, the questions designed need not be based on ONE topic only. Let's have a look at the picture below. 

There are pictures, instructions, a topic and a text which can guide the students in answering the questions. 

Thursday, 3 October 2013

Tests... @@

There have always been tests in schools throughout the curriculum. Why are there so many tests??? Wow… if I started counting from kindergarten, through primary and then secondary school, I have been in school for more or less 15 years. How many tests have I taken in those 15 years??? Innumerable!!! Even in uni, we have to sit for quizzes, mid-terms and final exams. So, why do teachers test students? Hmm… Actually, teachers use various kinds of tests to find out how well students are learning and whether their instruction has been successful, to place students at different levels, to report the performance of schools, and so on.

I just got to know that there are six types of tests, namely (i) progress test, (ii) achievement test, (iii) diagnostic test, (iv) placement test, (v) proficiency test, and (vi) aptitude test. Progress and achievement tests are similar, in that both are used to determine whether students have acquired the appropriate skills and knowledge. However, an achievement test is usually given at the end of a given period of instruction.

The following table illustrates types of tests and purposes, and when the tests are administered.


** Types of Tests and Purposes **


Sunday, 29 September 2013

Key Ideas in Validity & Reliability for Teachers

Assessment and Measurement in Teaching: Professor Patty LeBlanc
Part 1: http://www.youtube.com/watch?v=IF-oeuidRuU
Part 2: http://www.youtube.com/watch?v=C3Zc8g9BwKg 

          After watching the videos, I have a better understanding of validity and reliability. The following is a summary of the important points that I got from the videos. 

          According to Dr. Patty LeBlanc, the two key questions in assessment concern validity and reliability:
(i) Validity: Does this test measure what it is supposed to measure?
(ii) Reliability: Does this test consistently measure what it is supposed to measure?

      Validity is the more important concept for classroom-based tests and in educational measurement (standardized tests). It concerns whether or not a test measures what it claims to measure: "Does this test measure what was taught and learned?" There are 3 basic ways to determine the validity of a test: (i) content validity (Does the test measure what was taught?), (ii) construct validity (Does the test measure the characteristic/quality/construct it is designed to measure?) and (iii) criterion/predictive validity (Does the test predict performance on some other relevant measure?).

          On the other hand, reliability deals with consistency of measurement. E.g. it involves giving the same test over and over again to different individuals, OR giving a test multiple times to the same individual --> then take the measurements --> average them --> determine the consistency of measurement. 

How to determine the reliability of a test?
Take multiple measurements of the test to determine consistency, and express that consistency mathematically as a number between 0 and 1. The higher the number (score), the greater the reliability/consistency.
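
For example, test-retest reliability can be expressed as the correlation between two administrations of the same test. A minimal sketch, with invented scores (statistics.correlation needs Python 3.10+):

```python
from statistics import correlation

# Hypothetical scores of the same six students on two sittings of one test
first_sitting  = [55, 62, 70, 71, 80, 90]
second_sitting = [58, 60, 73, 69, 83, 88]

r = correlation(first_sitting, second_sitting)  # Pearson's r, between -1 and 1
print(f"Test-retest reliability = {r:.2f}")     # closer to 1 = more consistent
```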

Factors that influence reliability
(i) The number of subjects (people) that are tested. 
The higher the number, the more accurate the reliability score will be. 
(ii) The number of items on a test. One essay is not enough to measure everything that is covered in a course. Generally, 30 items are recommended for the assessment of knowledge or skills. 

 * Relationship between Validity and Reliability * 
If a test is VALID, it will be RELIABLE!
A test may be RELIABLE but can still NOT be VALID.

The picture below clearly illustrates the relationship. 
[Two dartboard pictures: (A) and (B)]
Accurate = VALID; Precise = RELIABLE

Picture A: Precise/Consistent, Not Accurate (Reliable, Not Valid)
Picture B: Accurate, Precise (Valid, Reliable)

NOTE: The darts are consistent in both pictures. Meaning that
if the darts are CONSISTENT, they can be ACCURATE or NOT ACCURATE. 
However, if the darts are ACCURATE, they will be CONSISTENT. 

Therefore, if a test is VALID, it will be RELIABLE.
If a test is RELIABLE, it can be VALID or NOT VALID.