Monday, 9 December 2013

Interpreting Test Scores & Item Analysis

      Last week's lecture....
   
   Personally, I like this chapter/topic, probably because I loveeee calculations. Haha~ =D Anyway, questions regarding calculations have been eliminated from the final exam. =( As Dr. Lee said, if you get the calculations right you'll score, but if there are calculation errors you'll get the whole question wrong.


     Basically, in this lecture, we were taught the most basic statistical analysis, to find out how candidates performed on the test and how good the test items are. I found this lecture interesting, as I can analyze performance on the test and identify weaknesses or problems with the test (if any). If we were to carry out this analysis, we have to take note that our sample size should not be too small; it should have at least 30 students.

Interpreting Test Scores:
     There are two ways of interpreting test scores, namely (i) measures of central tendency (mean, mode, median) and (ii) measures of dispersion (range, standard deviation). Actually, I learned (i) and (ii) in Form 4 Additional Mathematics. Luckily, I can still recall some of the formulas for calculating the mean, mode, median, range and standard deviation. =))
      From my understanding, the mode is the score with the highest frequency (the score that appears most often), the mean is the average score, and the median is the middle score. On the other hand, the range is the difference between the highest and lowest scores, whereas the standard deviation (s.d.) is a measure of the dispersion of a set of data from its mean. The more spread apart the data, the higher the standard deviation.
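
Just to double-check my own understanding, here is a minimal Python sketch of these measures (the scores are made-up sample data, not from the lecture or tutorial):

```python
# Minimal sketch: central tendency and dispersion for a set of test scores.
# The scores below are invented sample data, not from the tutorial.
from statistics import mean, median, mode, pstdev

scores = [45, 52, 60, 60, 67, 71, 74, 78, 80, 85]

print("Mean  :", mean(scores))              # average score
print("Median:", median(scores))            # middle score
print("Mode  :", mode(scores))              # most frequent score
print("Range :", max(scores) - min(scores)) # highest minus lowest
print("S.D.  :", pstdev(scores))            # population standard deviation
```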

Item Analysis:
        This part is interesting, as we learned how to evaluate test items. We learned about two important things: (i) item difficulty and (ii) item discrimination. The following is a summary of what I've learnt.
     
      The index of difficulty/facility value (FV) shows us how easy or difficult the test item is. It can be calculated using the formula FV = R/N or FV = (Correct U + Correct L)/2n, where R = no. of correct answers, N = total no. of candidates, U = upper half, L = lower half, and n = no. of candidates in a group. Usually, items with FV between 0.30 and 0.70 are accepted. If the FV of an item is low, it means that the item is difficult, and vice versa.
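
A small Python sketch of the two FV formulas above (the counts are hypothetical, just to see the formulas in action):

```python
# Facility value (FV) of an item, following the two formulas above.
# The counts used here are made-up examples, not from the tutorial.

def fv_whole_group(correct, total):
    """FV = R / N, using the whole group of candidates."""
    return correct / total

def fv_upper_lower(correct_upper, correct_lower, n_per_group):
    """FV = (Correct U + Correct L) / 2n, using the upper and lower halves."""
    return (correct_upper + correct_lower) / (2 * n_per_group)

print(fv_whole_group(18, 30))      # 0.6 -> fairly easy, within the 0.30-0.70 range
print(fv_upper_lower(12, 6, 15))   # 0.6 as well, using the two halves
```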

      The index of discrimination (D) shows us whether or not a test item discriminates the more able students from the less able ones. A test item is considered good if the good students tend to do well on it and the poor students tend to do badly on the same item. It can be calculated using the formula D = (Correct U - Correct L)/n. An item is regarded as good if its D value is between 0.4 and 0.6 (it functions effectively). A test item with a D value of +1 discriminates perfectly, whereas a test item with a D value of 0 doesn't discriminate at all. If an item has a D value less than 0 (a negative value), it means that the item discriminates in a completely wrong way. In addition, if the key discriminates negatively or the distractors discriminate positively, the item should be eliminated.
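
And the same kind of sketch for the discrimination index (again with made-up numbers, only to illustrate the formula):

```python
# Discrimination index D = (Correct U - Correct L) / n, as defined above.
# The numbers are hypothetical examples for illustration only.

def discrimination_index(correct_upper, correct_lower, n_per_group):
    return (correct_upper - correct_lower) / n_per_group

print(discrimination_index(13, 6, 15))   # ~0.47 -> functions effectively
print(discrimination_index(15, 0, 15))   # 1.0  -> discriminates perfectly
print(discrimination_index(15, 15, 15))  # 0.0  -> does not discriminate at all
print(discrimination_index(4, 10, 15))   # -0.4 -> discriminates negatively (wrong way)
```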

      In a nutshell, it's indeed important to know how to analyze test items and categorize them based on their difficulty and discrimination indices. Items or distractors which are not appropriate are eliminated and replaced. Items which are good are stored in the "item bank". This saves a lot of time for teachers, as they can reuse the objective questions later.

     The following is the tutorial task that I've done. Calculations, calculations,... analyze, interpret... 


** Corrections for 3(b) **

Item X: 
D = (10-8)/2
    = 0.4 
IT DISCRIMINATES FAIRLY EFFECTIVELY.

Item Y: 
D = (3-8)/15
    = -0.3333
IT DISCRIMINATES NEGATIVELY, IN AN ENTIRELY WRONG WAY.

Solutions to Ques 2(b) s.d.


** Addition after tutorial discussion **

4(c) 
Item X: 
FV = 0.6 (fairly easy, between 0.4 and 0.6) and D = 0.26667 (discriminates positively).
Overall, it functions effectively.
SO, THE ITEM SHOULD NOT BE ELIMINATED.

Item Y: 
FV = 0.17857 (< 0.2, so the item is very difficult) and D = 0.21495 (discriminates positively).
ITEM Y CAN BE KEPT, BUT IT WOULD BE BETTER IF IT WERE REVISED.

Item Z:
FV = 0.46667 (fairly difficult) and D = -0.4 (discriminates negatively, in an entirely wrong way).
ITEM Z SHOULD BE ELIMINATED.

4(d)
Item X: 
Distractors A and D are performing well, whereas distractor B may not be working.
NO DISTRACTOR SHOULD BE ELIMINATED/MODIFIED. 

Item Y:
Distractor B functions well but distractors C and D attract the better candidates.
MODIFY DISTRACTORS C and D.

Item Z:
Distractors A, C and D also attract the wrong candidates: more upper-level candidates selected the distractors as their answers to the question.
MAYBE THERE'S SOMETHING WRONG WITH THE STEM (QUESTION). SO, ELIMINATE THE WHOLE ITEM.



Friday, 29 November 2013

Which Do You Prefer? Analytic Marking??? Holistic Marking???



In my opinion, it’s quite difficult to choose whether to use holistic or analytic marking in evaluating students’ work. This is because they are two different things and are used for different purposes.

In holistic marking, the work is evaluated for its overall quality, whereas analytic marking is done on separate criteria such as grammar, content and voice. Holistic marking assigns a single score to represent a weighting of the whole work, whereas analytic marking assigns a different score to each factor.
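
To make the contrast concrete for myself, here is a rough Python sketch of the two approaches; the criteria, weights and scores are entirely my own assumptions, not from Brown or the lecture:

```python
# Holistic marking: one overall score for the whole piece of writing.
# Analytic marking: a separate score per criterion, optionally combined
# with weights into a total. Criteria, weights and scores are assumed.

analytic_scores = {"content": 8, "grammar": 6, "organisation": 7, "voice": 7}  # each out of 10
weights = {"content": 0.4, "grammar": 0.2, "organisation": 0.2, "voice": 0.2}

analytic_total = sum(analytic_scores[c] * weights[c] for c in analytic_scores)
holistic_score = 7  # a single impressionistic score out of 10

print("Analytic breakdown:", analytic_scores)
print("Weighted analytic total:", round(analytic_total, 2))
print("Holistic score:", holistic_score)
```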

Although teachers can evaluate students' writing faster using holistic marking, it does not help in diagnosing students' strengths and weaknesses, which means that it does not help much in students' further stages of learning. Meanwhile, analytic marking helps to diagnose students' strengths and weaknesses. Hence, teachers can know more about students' performance, and students receive more information about their writing.

For holistic marking, teachers have to be extensively trained to use the scale accurately (Brown, 2010). So, if I were to choose between these two marking methods, I would prefer analytic marking, since I'm a novice teacher and am not yet able to mark papers as a whole without some guidance. When I gain more experience, I will switch to holistic marking, as it will save a lot of time when marking piles of papers, especially during the final examination, when teachers are given limited time to mark the papers.


Saturday, 12 October 2013

The Cognitive Domain of Bloom's Taxonomy

Tutorial Task: Design 6 Questions on 
the Cognitive Domain of Bloom's Taxonomy
      The picture above displays the 6 questions designed by my group members. Unfortunately... T_T We didn't set a specific topic, and there is no text given as a reference for answering the questions. The questions that we designed are mostly open-ended, meaning that the question we set for the "Comprehension" level could also be used for "Application", for example. Although we did state the verb for each level, the answers to each question are too broad.
Why? What? How? 
If students are asked to answer the questions "Why...? What...? and How...?", there will be various answers. Then, how are we going to evaluate the answers? Which one is right? Which one is wrong? Which answers deserve more marks? This will be hard!!!
     
      So, what I learned is that if I were to design questions based on the 6 levels of the cognitive domain of Bloom's Taxonomy, I should set a specific topic or provide a text if I want to ask open-ended questions.

The picture on the left illustrates the 6 questions designed by another group. Although their questions are also open-ended (mostly WH- questions), they are examples of good questions for the different levels of the cognitive domain. This is because they provide a text so that students can refer to it while answering the questions.

The picture on the right also shows that the questions designed are based on a particular topic, which is "Pollution".

The examples shown above are questions on the cognitive domain of Bloom's Taxonomy, designed based on a particular topic, picture or text only. Actually, the questions designed need not necessarily be based on ONE topic only. Let's have a look at the picture below.

There are pictures, instructions, a topic and a text which can guide the students in answering the questions.

Thursday, 3 October 2013

Tests... @@

There have always been tests in schools throughout the curriculum. Why are there so many tests??? Wow… if I start counting from kindergarten, through primary and then secondary school, I was in school for more or less 15 years. How many tests did I take in those 15 years??? Innumerable!!! Even in uni, we have to sit for quizzes, mid-terms and final exams. So, why do teachers test students? Hmm… Actually, teachers use various kinds of tests to find out how well students are learning and whether their instruction has been successful, to place students at different levels, to report the performance of schools, and so on.

I just got to know that there are six types of tests, namely (i) progress test, (ii) achievement test, (iii) diagnostic test, (iv) placement test, (v) proficiency test, and (vi) aptitude test. Progress tests and achievement tests are similar, as both are used to determine whether students have acquired the appropriate skills and knowledge. However, an achievement test is usually given at the end of a given period of instruction.

The following table illustrates types of tests and purposes, and when the tests are administered.


** Types of Tests and Purposes **


Sunday, 29 September 2013

Key Ideas in Validity & Reliability for Teachers

Assessment and Measurement in Teaching: Professor Patty LeBlanc
Part 1: http://www.youtube.com/watch?v=IF-oeuidRuU
Part 2: http://www.youtube.com/watch?v=C3Zc8g9BwKg 

          After watching the videos, I have a better understanding of validity and reliability. The following is a summary of the important points that I got from the videos.

          According to Dr. Patty LeBlanc, the two key questions in assessment are validity and reliability.
(i) Validity: Does this test measure what it is supposed to measure?
(ii) Reliability: Does this test consistently measure what it is supposed to measure?

      Validity is a more important concept for classroom-based tests and in educational measurement (standardized tests). It concerns whether or not a test measures what it claims to measure. "Does this test measure what was taught and learned?" There are 3 basic ways to determine the validity of a test, which are (i) content validity (Does the test measure what was taught?), (ii) construct validity (Does the test measure the characteristic/quality/construct it is designed to measure?) and (iii) criterion/predictive validity (Does the test relate to or predict performance on some other criterion?).

          On the other hand, reliability deals with consistency of measurement. E.g. it involves giving the same test over and over again to different individuals, OR a test given multiple times to the same individual --> then take the measurements --> average --> determine the consistency of measurement.

How to determine the reliability of a test?
Take multiple measures of the test to determine consistency, and express the consistency mathematically as a number between 0 and 1. The higher the number (score), the greater the reliability/consistency.
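
To see what this looks like in practice, here is a small Python sketch of one common way (not necessarily the method Dr. LeBlanc uses) to get such a number: a test-retest correlation between two sittings of the same test. The scores are invented for illustration only.

```python
# Test-retest reliability estimate: give the same test twice to the same
# students and correlate the two sets of scores. A value closer to 1 means
# more consistent measurement. Scores below are invented sample data.
from statistics import correlation  # Pearson's r, available in Python 3.10+

first_sitting  = [55, 62, 70, 48, 81, 77, 66, 59]
second_sitting = [58, 60, 72, 50, 79, 80, 64, 61]

r = correlation(first_sitting, second_sitting)
print("Test-retest reliability estimate:", round(r, 3))
```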

Factors that influence reliability
(i) The number of subjects (people) that are tested. 
The higher the number, the more accurate the reliability score would be. 
(ii) The number of items on a test. One essay is not enough to measure everything that is covered in a course. Generally, 30 items are recommended for the assessment of knowledge or skills. 

 * Relationship between Validity and Reliability * 
If a test is VALID, it will be RELIABLE!
A test may be RELIABLE but can still NOT be VALID.

The picture below clearly illustrates the relationship. 
           (A)                                            (B)            
Accurate = VALID; Precise = RELIABLE

Picture A: Precise/Consistent, Not Accurate (Reliable, Not Valid)
Picture B: Accurate, Precise (Valid, Reliable)

NOTE: The darts are consistent in both pictures, meaning that
if the darts are CONSISTENT, they can be ACCURATE or NOT ACCURATE. 
However, if the darts are ACCURATE, they will be CONSISTENT. 

Therefore, if a test is VALID, it will be RELIABLE.
If a test is RELIABLE, it can be VALID or NOT VALID.


Friday, 27 September 2013

Different Learners, Different Abilities, Different Learning Styles --> SAME TEST


"For a fair selection, everybody has to take the same exam: please climb that tree."
          Is it still considered fair if you ask a fish or an elephant to climb a tree (which is impossible for them) and include a monkey as one of the participants? The result is obvious: only the monkey is able to climb the tree. So, what about the rest of the animals? Did they fail the test??? What if you ask the animals to swim? The result will be different! Every animal has its own abilities. The ability to climb a tree does not reflect their actual abilities. 

"Everybody is a genius, but if you judge a fish by its ability to climb a tree, it will live its whole life believing that it is stupid."
-Albert Einstein


    The same goes for human beings. Different students have different abilities and learning styles. So, in my opinion, it's indeed unfair to judge everyone's abilities by a particular exam. However, due to our educational system, every student must sit for a standardized exam. Actually, if a student performs badly in a test, it doesn't mean that he or she is "not good". Meanwhile, the one performing well in the same test is not always good at everything. This is because some people can do well under exam conditions, whereas others can present their knowledge better orally. This is why I think it's not fair to assess or measure students' performance using a particular test only.

Monday, 16 September 2013

Thought on Youtube Video Watched (What was Significant to Me in the Assessment Used)


READ
REMEMBER
REGURGITATE
     While watching the video, I came across the three terms above and, of course, we are all too familiar with them. This was what we as 90's kids did during our primary and secondary education. We sat for innumerable quizzes, exams and tests. We read, remembered and regurgitated the facts, throwing them out during tests. Sometimes, we didn't even have the time to digest the new information. After the tests, we forgot everything. So seriously, there is not much that we can learn from memorising factual information. We couldn't apply it in real-world situations! What teachers expect from students is not memorising skills but other skills and abilities, such as communication and leadership skills and the ability to collaborate with each other. 

     From what I have seen in the video, the implementation of alternative assessment (performance-based assessment) in education requires students to develop performances, where they create products and the teacher assesses and evaluates their performances based on a scoring guide. In my point of view, this is indeed a good approach for teachers to test students' abilities and knowledge. Through projects, teachers can assess students' understanding of particular subjects and their capability in applying certain concepts. Moreover, performance assessment emphasises in-depth learning and focuses less on drills. This is what we need in 21st century education. Traditionally, teachers could not test leadership skills, problem-solving skills and so on through pen-and-paper tests. However, these vital skills can be assessed through performance-based practice in the classroom, where students do their projects independently, with the teacher as facilitator, and demonstrate what they know. 
          
     One part of the video that caught my attention is when Professor Linda Darling-Hammond from Stanford University School of Education mentions that "the time is teaching and learning". Actually, conducting performance-based assessment consumes a lot of time and energy. Nevertheless, while conducting the assessment in real life, students are actually learning, and teachers can give immediate feedback on what to do to meet students' needs. 


Saturday, 14 September 2013

Feedback on Anderson's Article: Three Things I liked about the Article


          Firstly, I liked the article by Anderson because it reminds me that there is a need to shift from traditional to alternative assessment practices in education nowadays. I agree with her point that traditional lecturing causes students to build castles in the air or to stop paying attention in the classroom and do something else. On the other hand, the shift from traditional to alternative assessment is crucial in order to facilitate active language learning. This is because, traditionally, evaluation of student learning is based on objective questions, which inhibit active learning and students' higher-order thinking skills. Significantly, in this fast-paced world and modern education, objective questions are obviously not adequate or relevant for measuring students' performance.  

          Other than that, I liked this article as it highlights the differences between traditional and alternative assessment. The clear and detailed explanations by Anderson make the philosophical beliefs and theoretical assumptions behind the assessments more comprehensible and interesting. Besides that, the comparison of the two assessments is summarised in a diagram, which makes it much simpler and easier to grasp. 


        Another thing that I found interesting about this article is the issues related to rubrics under the alternative (constructivist) assessment paradigm. Anderson mentions that in a constructivist classroom, students are required to establish rubrics and develop criteria. However, this would be time-consuming. Therefore, I think it would be challenging for teachers to decide whether they want their students to gain knowledge or to learn how to learn.