Testing Writing in the EFL Classroom: STUDENT EXPECTATIONS

Nahla Nola Bacha
LEBANON
No EFL program can deny or ignore the significance of testing for evaluating learners' acquisition of the target language. An important area of concern in testing is how students view their own achievements. Students' expectations of test results often differ from, and run higher than, their actual results, which may negatively affect their motivation. This situation calls for raising students' awareness of their abilities.
The focus of this article is testing writing in the EFL classroom. Specifically, it describes a study comparing students’ expectations of grades with their actual grades earned for essays assigned in Freshman English classes at the Lebanese American University. The results confirm a divergence between expected and actual grades, as has been reported in other research. The article concludes with implications for classroom teaching and testing.

Background
Experience has shown teachers, researchers, and school administrators that, just like language itself, testing practices in ELT are not static but dynamic and changing. One controversial area is testing writing, which requires that test construction and evaluation criteria be based on course objectives and teaching methodologies. In the English language classroom, especially at the high school and university levels, teachers are always challenged by how to reliably and validly evaluate students' writing skills, so that the students will be better prepared for internal and external proficiency and achievement exams. Indeed, writing in the academic community is paramount; a student cannot be successful without a certain level of academic writing proficiency.
Another question that many ELT programs are addressing is how students perceive the process used to evaluate their work. Do they know how they are being tested and what is acceptable by the standards of the institution and their teachers? These are questions this study seeks to answer, but first, it is necessary to differentiate between assessment and evaluation of writing and to present the main issues involved.
Assessing and evaluating writing
There are many reasons for testing writing in the English language classroom, including to meet diagnostic, proficiency, and promotional needs. Each purpose requires different test construction (Bachman 1990, 1991; Pierce 1991). Recent approaches to academic writing instruction have necessitated testing procedures that deal with both the process and the product of writing (Cohen 1994; Connor-Linton 1995; Upshur and Turner 1995). It is generally accepted by teachers and researchers that there are two main goals of testing: first, to provide feedback during the process of acquiring writing proficiency (also referred to as responding or assessing), and second, to assign a grade or score that will indicate the level of the written product (also referred to as evaluating).
The present study focuses on evaluating student essays, that is, assigning scores in order to indicate proficiency level. Evaluation of writing in ELT has a long history, with various procedures and scoring criteria being revised and adapted to meet the needs of administrators, teachers, and learners (see Oller and Perkins 1980; Siegel 1990; Silva 1990; Douglas 1995; Shohamy 1995; Tchudi 1997; Bacha 2001). For testing writing, reliability and validity, as well as choice of topics and rater training, are important and must be addressed whatever the purpose of the testing situation may be (Jacobs et al. 1981; Kroll 1990; Hamp-Lyons 1991; Airasian 1994; Kunnan 1998; Elbow 1999; Bacha 2001).
Reliability
Reliability is the degree to which the scores assigned to students' work accurately and consistently indicate their levels of performance or proficiency. Correlation coefficients of .80 and above between readers' scores (inter-rater reliability), as well as between the scores assigned by the same reader to the same task (intra-rater reliability), are considered acceptable for decision making (Bachman 1990). Research indicates that the gender, background, and training of the reader can affect the reliability of scores (Brown 1991; Cushing-Weigle 1994). Thus, to maintain reliability, many programs put heavy emphasis on the training of raters and as a result have obtained high positive correlations (Jacobs et al. 1981; Hamp-Lyons 1991).
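For illustration only, and not part of the original article, the following minimal sketch shows how an inter-rater reliability check of this kind might be computed as a correlation between two raters' scores on the same set of essays. The scores are invented, and the availability of the scipy library is assumed.

# Illustrative sketch: inter-rater reliability as a correlation between
# two raters' scores on the same essays (invented data, not study data).
from scipy.stats import pearsonr

rater_a = [72, 65, 80, 58, 90, 77, 69, 84]  # hypothetical scores from rater A
rater_b = [70, 68, 78, 60, 88, 75, 72, 81]  # hypothetical scores from rater B

r, p = pearsonr(rater_a, rater_b)
print(f"Inter-rater correlation r = {r:.2f} (p = {p:.3f})")
# A coefficient of .80 or above is conventionally taken as acceptable
# for decision making (Bachman 1990).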
Validity
Validity is the degree to which a test or assignment actually measures what it is intended to measure. There are five important aspects of validity (Hamp-Lyons 1991; Jacobs et al. 1981):
1. Face validity Does the test appear to measure what it purports to measure?
2. Content validity Does the test require writers to perform tasks similar to what they are normally required to do in the classroom? Does it sample these tasks representatively?
3. Concurrent validity Does the test require the same skill or sub-skills that other similar tests require?
4. Construct validity Do the test results provide significant information about a learner’s ability to communicate effectively in English?
5. Predictive validity Does the test predict learners' performance at some future time?
To what extent should we teachers communicate these reliability and validity concerns to our students? Teachers' awareness of the issues of reliability and validity is crucial, but perhaps equally important is how accurately students perceive their own abilities and the extent to which they understand what is considered acceptable EFL writing at the university level.
Perceptions of achievement
Research comparing how students perceive their language abilities with faculty perceptions and actual performance indicates that there is a problem that needs to be addressed (Kroll 1990). In a survey carried out by Pennington (1997) with students graduating from university in the United Kingdom, 42 of the 48 students rated their writing ability as very good or quite good. In contrast, the teachers did not indicate such confidence. Another study indicated that first-year university students, who were L1 speakers of Arabic, rated their EFL writing skills in general as good, while faculty rated their skills as only fair (Bacha 1993). There were similar findings in another study comparing student and faculty grade expectations with actual test scores (Douglas 1995). In a needs analysis project carried out at Kuwait University, Basturkmen (1998:5) reported that "over 60% of faculty members perceived students to have inadequate writing skills." She also found that students' English language proficiency did not meet professors' expectations and that students were not aware of the level of proficiency expected of them (Basturkmen 1998:5). Basturkmen concludes that one curricular objective should be to "raise students' awareness of the levels of proficiency which the faculty find acceptable" (1998:5).
If EFL students studying at the university level are deficient in academic language skills, a critical question is, to what extent are the students aware of their deficiencies? From the studies cited above, it appears they are not very aware of their deficiencies or, at best, seem to be more confident of their abilities, and thus hold higher grade expectations, than is warranted by their teachers' perceptions or by their actual test scores. This study will examine the problem in the Lebanese university context.
Survey on student grade expectations
Participants and procedure
During the Fall 2000 semester at the Lebanese American University, 150 students in the Freshman English 1 course in the EFL Program (the first of four required courses) were surveyed on their grade expectations. These courses stress essay writing and reading comprehension skills, focusing on sentences, paragraphs, and short essays. The students who completed the survey were L1 Arabic speakers who had studied English during their preuniversity schooling and were pursuing different majors in the Schools of Arts and Sciences, Business, Engineering and Architecture, and Pharmacy. They had English entrance scores equivalent to TOEFL scores of 525 to 574 and were enrolled in Freshman English 1 sections with between 25 and 30 students each.
Specifically, the survey was given in order to find out if there were any differences between students' grade expectations and the actual grades they earned. The survey was given two weeks before the end of the semester, on the belief that students would have a better idea of their abilities at that point than at the beginning of the semester. They were requested to indicate the grade range they expected on two end-of-course essays. The five grade ranges were: below 60%, failing; 60–69%, fair; 70–79%, satisfactory; 80–89%, good; and 90–100%, excellent.
Essay 1 (E 1) was given toward the end of the semester in the Freshman English 1 course. It is usually in the comparison or contrast rhetorical mode, with a choice of different topics, and is completed in two fifty-minute class periods. During the first class period, students write a first draft. The teacher makes comments for improvement on the first draft, which is then rewritten during the second period. Essay 1 constitutes 20% of the final course grade.
Essay 2 (E 2) was given at the end of the semester as part of the final exam for the course, which also included a reading comprehension and vocabulary component. The reading and vocabulary component of the final exam is similar in content for all Freshman English 1 sections, but students have a choice of three or four topics in the essay section, with each topic requiring a different rhetorical mode. Essay 2 also constitutes 20% of the final course grade.
Table 2
Percentage of Students Selecting Each Grade Range for Essays 1 and 2
Expected vs. Actual Grades (figures are in percentages)

                 90–100%   80–89%   70–79%   60–69%   below 60%
Expected E 1        2.5      37.7     50.6      9.3       0.0
Actual E 1          0.5       4.0     41.6     42.1      11.9
Expected E 2        5.6      46.3     44.2      3.9       0.0
Actual E 2          0.0       6.9     36.1     42.6      14.4
The survey asked students to indicate their grade expectations for these two end-of-course essays. In addition, for each essay, the students were asked to indicate their grade expectations for the three major sub-skills of essay writing emphasized in the course: language (sentence structure, grammar, vocabulary, coherence, mechanics), organization (format, logical order of ideas, thesis and topic sentences), and content (major and minor supporting ideas). To indicate each expected grade, students selected one of the five possible grade ranges.
Results and discussion
A statistical comparison was made on a random sample of 30 surveys using the Wilcoxon Signed Ranks Test, a non-parametric test that compares mean ranks of paired scores when a normal distribution cannot be assumed. The Wilcoxon test indicated significant differences (p < .001) on all comparisons, confirming that the differences between expected and actual grades shown by the survey are very unlikely to be due to chance.
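As a purely illustrative sketch (the study's raw data are not reproduced here), the following shows how such a paired comparison could be run with the Wilcoxon test. The expected and actual grades below are invented, and the availability of the scipy library is assumed.

# Illustrative sketch: Wilcoxon Signed Ranks Test on paired expected vs.
# actual essay grades (invented data, not the study's data).
from scipy.stats import wilcoxon

expected = [75, 80, 70, 85, 78, 72, 90, 68, 74, 82]  # hypothetical expected grades (%)
actual   = [64, 70, 62, 71, 66, 60, 75, 58, 65, 69]  # hypothetical actual grades (%)

stat, p = wilcoxon(expected, actual)
print(f"Wilcoxon statistic = {stat}, p = {p:.4f}")
# A p-value below .001 would indicate that the difference between
# expected and actual grades is very unlikely to be due to chance.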
Because the survey responses were tallied as mean averages, it is not possible to pinpoint how accurately individual students predicted their own grades. The results are most revealing when student expectations are examined as a whole, and they show that students' grade expectations differed from their actual grades.
Table 1 shows that the mean actual scores of the students on the two essays are one grade level lower (10%) than their mean grade expectations.

Table 1
Differences in Mean Expected Grades and Mean Actual Grades
(expressed as a percentage of total possible grade)

                 Mean Expected Grade   Mean Actual Grade
Essay 1 (E 1)           74%                   64%
Essay 2 (E 2)           75%                   65%

Since the gap between mean expected and mean actual grades is large, a whole proficiency level, a question raised is whether the students are aware of the criteria for each grade level. In other words, do students understand what is expected of them in the writing skills on which they are being tested? From random interviews with students and faculty, it seems they are not and that more work needs to be done in this area in the university's EFL program. All of our efforts to set up valid and reliable testing criteria seem self-defeating if the learners themselves are unaware of their potential achievement level or what is expected in their writing. These are important issues that need to be addressed in any educational program.

Table 2 compares the percentage of students who expected each of the possible grade ranges with the percentage of students who actually received those grades on Essays 1 and 2. We can see that no student expected to fail on either of the essays, but actual results show a failure rate of 11.9 percent on Essay 1 and 14.4 percent on Essay 2. The most accurate predictions were made in the grade range 70–79%. Perhaps many of the students placed their expectations in this range because it represented a cautious and modest expectation.

As can be seen in Table 2, expected and actual grades differed in the 60–69% grade range, with only 9.3% and 3.9% of the students accurately predicting grades on Essays 1 and 2, respectively. In the grade range 80–89%, students showed overconfident predictions of 37.7% and 46.3% on Essays 1 and 2, while only 4.0% and 6.9% actually attained these levels, respectively. Students were most overconfident in their predictions of grades between 90–100%; only 0.5% of the students actually attained this score on Essay 1, and none did so on Essay 2.

Table 3
Percentage of Students Selecting Each Grade Range for Writing Sub-skills in Essay 1
Expected vs. Actual Grades (figures are in percentages)

                          90–100%   80–89%   70–79%   60–69%   below 60%
Expected Language            7.1      36.1     44.1     12.7       0.0
Actual Language              0.5       3.4     36.9     36.5      22.7
Expected Organization       10.2      48.5     34.9      6.6       0.0
Actual Organization          0.0       4.4     40.9     41.9      12.8
Expected Content             9.0      49.1     36.4      5.6       0.0
Actual Content               0.0       5.9     38.9     45.3       9.9
Table 3 shows expected and actual grades for the three sub-skills of writing (language, organization, and content) in Essay 1 (E 1). It indicates that the actual scores were lower than student expectations and that failure was not expected. In fact, the findings show that for E 1 there is a failure rate of 22.7%, 12.8%, and 9.9% on language, organization, and content, respectively. Again, grade expectations and actual grades were closest in the grade range 70–79%. Students had much higher expectations than they actually obtained for both of the upper grade ranges, 80–89% and 90–100%. Of the three sub-skills, language proved to be the weakest for students, indicating a need to focus more on this sub-skill in the classroom.
Table 4 shows expected and actual grades for the three sub-skills of writing in Essay 2 (E 2). Similar to E 1, it indicates that students’ expectations in the sub-skills for that essay were higher than their actual test scores, and that all students expected to pass. In general, student expectations in the sub-skills were higher for E 2 than for E 1. Perhaps students gained more confidence in their abilities by the end of the semester and thus expected higher grades at the completion of the course, even though their actual scores do not support this expectation. In fact, no student attained a grade level of 90–100% in any of the sub-skills in E 2, and there were more actual scores in the failing range than in the grade range 80–89%. Also similar to E 1, students’ expectations were most realistic in the grade range 70–79%.
Table 4
Percentage of Students Selecting Each Grade Range for Writing Sub-skills in Essay 2
Expected vs. Actual Grades (figures are in percentages)

                          90–100%   80–89%   70–79%   60–69%   below 60%
Expected Language            9.5      38.0     45.7      6.8       0.0
Actual Language              0.0       5.9     34.7     42.1      17.3
Expected Organization       14.8      50.1     32.9      2.1       0.0
Actual Organization          0.0       6.9     36.1     44.6      12.4
Expected Content            10.1      50.4     35.3      4.2       0.0
Actual Content               0.0       7.9     37.1     42.1      12.9

Implications

The results obtained from this survey reveal that students and their instructors have different perceptions of acceptable essay writing. This has important implications for writing evaluation in the university's EFL program. Teachers need to help students increase their awareness and understanding of the proficiency levels required in writing essays.
One way teachers can do this is by showing their students sample essays, perhaps drawn from the students' own work, that represent each of the grade levels from poor to excellent. These model essays could be photocopied for the class so that they can be read and discussed in detail. Students could take part in practice evaluation sessions by assigning grades for each sample essay, including the three sub-skills of language, organization, and content, according to the criteria for essays used by the EFL program. Such practice evaluation could be done in small groups, with each group justifying the grades it assigns in short oral presentations to the rest of the class, followed by questions and discussion. Once this exercise is done, the teacher could discuss the different grade ranges and comment on the grades assigned by the groups in light of what grades the essays would likely receive in a testing situation.
A second way to raise students' awareness of essay evaluation criteria is through individual or small group conferences held periodically with the teacher. In fact, although student-teacher conferences are carried out irregularly, they have been quite successful in the EFL program at the university, especially for lower proficiency level writers. Students become more involved in the evaluation process and more aware of what is expected in their essays, and thus realistically build confidence in their writing.
In addition to these awareness-raising activities, teachers need to revisit periodically the writing criteria being used for essay evaluation in light of recent research and innovations in teaching writing. Teachers also might need to clarify criteria for the different proficiency levels for the various types of writing tasks assigned throughout a semester. Essay tests in certain rhetorical modes, such as narration or description, might require different evaluation criteria than those used for essays in the comparison or contrast mode. Although the essay tests included in this survey were from the end of the semester, teachers might want to consider whether they should evaluate essays written earlier in the course according to objectives covered up to that point.
Conclusion
Testing is an inextricable part of the instructional process. If a test is to provide meaningful information on which teachers and administrators can base their decisions, then many variables and concerns must be considered. Testing writing is undeniably difficult. Although we teachers try hard to help students acquire acceptable writing proficiency levels, are we aware that perhaps our students do not know what is expected of them and do not have a realistic concept of their own writing abilities?
This article has reported the grade expectations of students and the actual grades they earned on two important end-of-semester essays. Results show that students' expectations are significantly higher than their actual proficiency levels. Developing test procedures for more valid and reliable evaluation is necessary and important; however, it does very little to motivate students to continue learning if their perceived levels of performance are not compatible with those of their teachers. In addition to the need to develop valid and reliable testing procedures, we must not overlook the need to raise students' awareness of their abilities. It is perhaps only through this understanding that genuine learning occurs.
Note: This is a revised version of a paper presented at the 21st Annual TESOL Greece convention, held in April 2000. The author received a grant from the Center for Research and Development at the Lebanese American University to support this research.
References
Airasian, P. W. 1994. Classroom assessment (2nd ed.). New York: McGraw-Hill.
Bacha, N. N. 1993. Faculty and EFL student perceptions of the language abilities of the students in the English courses at the Lebanese American University, Byblos Branch. Unpublished survey results, Byblos, Lebanon.
———. 2001. Writing evaluation: What can analytic versus holistic scoring tell us? System, 29, 3, pp. 371–383.
Bachman, L. F. 1990. Fundamental considerations in language testing. Oxford: Oxford University Press.
———. 1991. What does language testing have to offer? TESOL Quarterly, 25, 4, pp. 671–672.

English Proficiency Test - The Oral Component of a Primary School


Ishbel Hingle and Viv Linington
Many teachers feel comfortable setting pencil-and-paper tests. Years of experience marking written work have made them familiar with the level of written competence pupils need in order to succeed in a specific standard. However, teachers often feel much less secure when dealing with tests that measure speaking and listening, even though these skills are regarded as essential components of a diagnostic test that measures overall linguistic proficiency. Although second-language English pupils often come from an oral rather than a written culture, and so are likely to be more proficient in this mode of communication, at least in their own language, speaking in English may be a different matter. In English medium schools in particular, a low level of English may impede students' acquisition of knowledge. Identifying the correct level of English of each student is therefore all the more challenging and important.
This article outlines some of the problem areas described by researchers when designing a test of oral production for beginning-level speakers of English and suggests ways in which they may be addressed.

How does one set a test which does not intimidate children but encourages them to provide an accurate picture of their oral ability?
In replying to this question, one needs to consider briefly the findings of researchers working in the field of language testing. "The testing of speaking is widely regarded as the most challenging of all language tests to prepare, administer and score," writes Harold Madsen, an international expert on testing (Madsen 1983:147). This is especially true when examining beginning-level pupils who have just started to acquire English, such as those applying for admission to primary school. Theorists suggest three reasons why this type of test is so different from more conventional types of tests.
Firstly, the nature of the speaking skill itself is difficult to define. Because of this, it is not easy to establish criteria to evaluate a speaking test. Is "fluency" more important than "accuracy," for example? If we agree fluency is more important, then how will we define this concept? Are we going to use "amount of information conveyed per minute" or "quickness of response" as our definition of fluency?
A second set of problems emerges when testing beginning-level speakers of English: getting them to speak in the first place, and defining the role the tester will play while the speaking is taking place. Relevant elicitation procedures that will prompt speakers to demonstrate their optimum oral performance are unique to each group of speakers and perhaps even unique to each occasion on which they are tested. The tester will therefore need to act as a partner in the production process, while at the same time evaluating a number of things about this production.
A third set of difficulties emerges if one tries to treat an oral test like any other more conventional test. "In the latter, the test is often seen as an object with an identity and purpose of its own, and the children taking the test are often reduced to subjects whose only role is to react to the test instrument" (Madsen 1983:159). In oral tests, however, the priority is reversed. The people involved are important, not the test, and what goes on between tester and testee may have an existence independent of the test instrument and still remain a valid response.
How can one accommodate these difficulties and still come up with a valid test of oral production?
In answering this question, especially in relation to the primary school mentioned earlier, I would like to refer to the experience I and one of my colleagues, Viv Linington, had in designing such a test for the Open Learning Systems Education Trust (OLSET) to measure the success of their English-in-Action Programme with Sub B pupils. This Programme is designed to teach English to pupils in the earliest grades of primary school, using the medium of the tape recorder or radio.
In devising this test, we decided to use fluency as our basic criterion, i.e., "fluency" in the sense Brumfit uses it: "the maximally effective operation of the language system so far acquired by the student" (Brumfit 1984:543). To this end, we decided to record the total number of words used by each pupil on the test administration and to employ this as an overall index to rank order the testees in terms of performance.
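A minimal sketch of this word-count index, for illustration only: the short transcripts below are adapted from pupil responses quoted later in the article, and the tallying and ranking procedure is one possible way of doing it, not the authors' actual scoring script.

# Illustrative sketch: ranking pupils by total words produced in the oral
# test, used here as a rough fluency index (invented pupil labels).
transcripts = {
    "Pupil A": "I can see a car and a woman going to the shop",
    "Pupil B": "Boy and bicycle",
    "Pupil C": "The boys they played with the cow's bells then they got some apples",
}

# Count the words each pupil produced.
word_counts = {name: len(text.split()) for name, text in transcripts.items()}

# Rank order from most to fewest words produced.
ranked = sorted(word_counts.items(), key=lambda item: item[1], reverse=True)
for rank, (name, count) in enumerate(ranked, start=1):
    print(f"{rank}. {name}: {count} words")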
To address the second and third sets of problems outlined above, we decided to use elicitation procedures with which the children were familiar. For the activities in Figures 1 and 2, the teacher would need to find a picture full of images the pupils could relate to, such as children playing. Students could participate in the following types of activities:
• an informal interview, to put the children at ease by getting them to talk about themselves, their families and their home or school lives (See Figure 1).
• a set of guided answers to questions about a poster, to test their knowledge of the real life objects and activities depicted on the poster as well as their ability to predict the consequences of these activities (See Figure 2).
• narratives based upon packs of story cards, to generate extended language in which the children might display such features as cohesion or a knowledge of the English tense system in an uninterrupted flow of speaking.
The tester should capture personal details by asking the following type of questions:
What is your name?
Where do you live?
Do you have any brothers or sisters?
Does anyone else live at home with you?
Now tell me, what do you all do when you get up in the morning?
How do you all go to school and work?
Do you have any brothers or sisters in this school?
What standards are they in?
Which subject do you enjoy most? Why?
What do you do at break?
Tell me about your best friends.
What does your mother/grandmother cook for dinner?
Can you tell me how she cooks it? Why do you all enjoy this food most?
Do you listen to the radio/watch TV in your house?
What is your favorite programme? Why do you enjoy it most?
What do you do when you are getting ready to sleep in the evening?
What time do you go to sleep? Why?
Now look at the picture and tell me what this little boy is doing. Let's give him a name.
What do you suggest?
FIGURE 1

Instead of treating the situation as a "test," we asked testers to treat it as a "game." Both partners would be seated informally on the ground (with, in our case, a recorder placed unobtrusively on the floor between them because of the research nature of our test). If the occasion was unthreatening to the pupil, with the tester acting in a warm, friendly way, we anticipated the child would respond in a similar way and thus produce a more accurate picture of his or her oral productive ability. We suggested the tester act as a Listener/Speaker only while the test was being conducted, and as Assessor once the test administration was over.
To maintain a more human approach to the testing situation, we decided to allow the tester a certain flexibility in choosing questions to suit each particular child, and also in the amount of time she spent on each subtest. The time allowed for testing each pupil would be limited to 8 minutes, and all three subtests would be covered during this period, but the amount of time spent on each could vary.
Question banks were provided for testers to select questions they felt were within the range of each child's experience, but there was an understanding that how and why questions were more difficult to answer than other Wh-questions. A range of both types should therefore be used.
Story packs also provided for a range of experiences and could be used by the tester telling a story herself first, thus demonstrating what was required of the pupil. However, it was anticipated that some pupils might be sufficiently competent to use the story packs without any prompting from the teacher. Pupils could place the cards in any order they chose, as the sole purpose of this procedure was to generate language. Story packs were composed of picture stories that had been photocopied from appropriate level books, cut up into individual pictures, and mounted on cardboard. Six pictures to a story pack were considered sufficient to prompt the anticipated length of a story pupils could handle.
Questions for guided response:
What are the children doing? Where are they?
How many children are there? Are there more boys than girls? How do you know this?
What is the girl in the green dress doing? What are the boys going to do when they finish playing marbles?
Do you think the children are happy? Have you ever played marbles?
(If yes) How do you play marbles?
(If no) What other game do you play with your friends?
How do you play it?
Now look at the picture and tell me what this little boy is doing. Let's give him a name.
What do you suggest?
FIGURE 2

This test of oral production was administered at both rural and urban schools to children who were on the English-in-Action Programme and those who were not. The comparative results are not relevant here, but findings about which aspects of the test worked and which did not may be of assistance to those who wish to set similar tests. In summarising these findings, I will comment on the administration of the test, the success of each subtest in eliciting language, and, finally, on the criteria we used for evaluating the test outcomes.
Firstly, both testers commented that this type of test was more difficult to organise and administer than other kinds of evaluation tests they had used. This was caused by the need to find a quiet and relatively private place to administer the test and record the outcome, and because the procedure could be done only on a one-to-one basis. We had anticipated this type of feedback but were also not surprised when told that subsequent administrations "were much easier and the children were more enthusiastic about participating than the previous time." The testing procedure was new to both tester and testee, but once experienced, it gave children greater freedom of expression than other kinds of tests.
Secondly, while the test as a whole did elicit oral language production, the amount and type of language varied from subtest to subtest. The interview produced rather less language than the other two subtests; it also elicited rather learned chunks of language, which we called "patterned responses."
The guided responses, on the other hand, produced a much greater variety of answers, couched in a fairly wide range of grammatical structures. But even these responses consisted on the whole of single words or phrases. Open-ended questions evoked longer responses from the more able students, but seemed to confound less able students. For example, the question "What can you see in the picture?" produced the answer "I can see a car and a woman going to the shop and a boy had a bicycle and the other one riding a bicycle," from a bright pupil, but only "Boy and bicycle" from a weaker pupil.
Higher order Wh-questions such as "What do you think is in the suitcase?" or "What will happen next?" seemed to produce only "I don't know" responses from even the most competent pupils. They seemed to lack the linguistic resources, or perhaps the cognitive resources, to predict or suggest answers.
The narrative subtest, based on the story cards, elicited the best display of linguistic ability from the testees, both in terms of amount of language produced and range of grammatical structures used.
Competent pupils were able to respond well to the tell/retell aspect and constructed sentences of 7 to 10 words in length, joined by a variety of coordinating devices. They also employed past tense forms in retelling the story, such as the following:
The boys they played with the cow’s what ...... what ...... a ...... bells three bells ...... then they got some apples and went to swim ...... the monkey saw them swim and putted them shirts and shorts ...... some they said hey ...... I want my shirts ...... wait I want my shirts ...... but mon- key she run away
Less competent students could describe isolated images on each card without using narrative in any way to link them together.
From these results we therefore concluded that the story packs were the most successful of the three elicitation procedures we used in stimulating optimum language output.
The final issue from the findings of the OLSET test that is relevant here is the criteria used for assessing the language output. Our decision to count "number of words produced" as a measure of speaking ability was a mixed blessing. Initially it did seem to rank order the pupils in terms of ability and gave us a base for comparison at subsequent test administrations, but non-verbal factors such as self-confidence, familiarity with the tester, and presence of the teacher may have affected even these results. In the second administration of the test, it was not at all accurate, because improvement in the ability to speak and respond in English was reflected more in the quality of how the testees spoke than in the quantity of language they produced. Several of the more competent pupils spoke less the second time round but displayed features not present in their own home languages, such as prepositions and articles; correctly used subordinating and coordinating conjunctions they had been introduced to only in the course of conversation; and employed a variety of tenses in their storytelling. We therefore used this data to develop a number of assessment levels, or descriptive band scales, based upon these various grammatical competencies, when evaluating the pupils' output (a band scale outlines a set of linguistic features and skills a pupil needs to display in order to be placed in that category).
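To illustrate the idea of a descriptive band scale only (the actual OLSET descriptors are not reproduced here), the sketch below represents bands as sets of observable features and places a pupil in the highest band whose criteria are met; the band labels, feature names, and thresholds are invented.

# Illustrative sketch: assigning a pupil to a descriptive band based on
# observed features of their speech (bands and criteria are invented,
# not the OLSET descriptors).
observed = {
    "uses_past_tense": True,
    "uses_conjunctions": True,
    "links_pictures_into_narrative": False,
}

def assign_band(features: dict) -> str:
    """Place a pupil in the highest band whose criteria are all met."""
    if all(features.values()):
        return "Band 3: sustained narrative with varied tenses and conjunctions"
    if features["uses_past_tense"] or features["uses_conjunctions"]:
        return "Band 2: connected sentences with some grammatical control"
    return "Band 1: isolated words and phrases"

print(assign_band(observed))  # -> Band 2 for this invented pupil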
In response to our discussion, some schools have begun to introduce two components in their diagnostic test. The first is a multiple-choice comprehension test and the second an oral test based upon a set of story cards.
The same test will be used for pupils at all levels of the primary school, following the lead provided by a test produced by the Human Sciences Research Council for the same purpose. However, the expected proficiency levels to enter a particular grade or standard will be different.
In conclusion, let me summarise the advice I would give to teachers who need to design speaking tests but who are afraid to take the plunge into this area of assessment:
• Do not be afraid to set such a test in the first place.
• Draw on your own materials to set a test appropriate for your group of testees.
• Keep the factor of time constant for each test administration.
• Give the testee the opportunity to lead once he or she is at ease.
• Do not allow factors such as accent to cloud your perception of linguistic competence.
• Rely on your own instinctive judgment when assigning a value to performance on such a test.
• Try and think of this value in terms of words rather than marks.
References
Brumfit, C. 1984. Communicative methodology in language teaching. Cambridge: Cambridge University Press.
Madsen, H. S. 1983. Techniques in testing. New York: Oxford University Press.
This article was originally published in the April 1997 issue.