National International Business English Test (Level 2): Oral Examiner Training, 2011.07.19
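Contents
1. Concepts in oral testing
2. Assessment format and content
3. Development of the rating criteria
4. Interpreting the rating criteria
5. Video introduction to the test procedure
6. Video of a mock test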
Concepts in oral testing: Rater (also judge, marker, scorer, assessor)
The judge or observer who operates a rating scale in the measurement of oral and written proficiency. The reliability of raters depends in part on the quality of their training, the purpose of which is to ensure a high degree of comparability, both inter- and intra-rater. Since raters are human and are therefore subject to individual biases, close attention is paid not only to reliability, but also to analyses of rater bias. (Davies 1999)
A person who assigns a score or rating to a test taker’s oral or written performance on the basis of a set of rating criteria. (Richards 2005)
Concepts in oral testing: Rater training
The preparation of raters for their task of judging performances. The training often takes the form of a workshop in which raters are introduced to the test format, the test tasks and the rating criteria, and exemplar performances at each defined level of performance are presented and discussed. Raters are then asked to evaluate a series of performances and to compare their ratings, discussing the grounds for any differences between them. (Davies 1999)
Concepts in oral testing: Rater qualities
- Language proficiency
- Sensitivity to language
- Empathy
- Fairness
- Flexibility
Concepts in oral testing: Interlocutor
In language testing the term is used to refer to the interviewer or facilitator of communication in an oral interview. The interlocutor may also be the assessor. (Davies 1999)
A neutral term referring to any person with whom someone is speaking. In language testing, the term is sometimes used to refer to a teacher or other trained person who acts during a test as the person with whom the student or candidate interacts in order to complete a speaking task. (Richards 2005)
Concepts in oral testing: Interlocutor effect (Chinese: 对话人效应, 对话人影响)
The effect of interlocutor behavior in an oral interview test. Variability in interlocutor behavior is a potential source of measurement error.
Concepts in oral testing: Interlocutor effect
Below are examples of aspects of interlocutor input that may affect either the language behavior of the candidate or the way in which raters assess this behavior:
- The complexity and formality of the language produced by the interlocutor
- The number of turns initiated or allocated by the interlocutor
- The degree of empathy established with the candidate
- The extent to which the interlocutor adheres to the script or guidelines provided
Concepts in oral testing: Marker reliability (Weir 2005:34) (Chinese: 评分人信度)
The extent to which markers:
- are in overall agreement;
- rank a group of students in the same order;
- rate individuals at the same level of severity;
- are consistent in their own judgments throughout the whole marking process.
Concepts in oral testing: Marker reliability
Markers need to be consistent in two ways:
- Each marker needs to be internally consistent (intra-rater reliability), i.e., given a particular quality of performance, the marker needs to award the same mark whenever this quality appears.
- There needs to be consistency of marking between markers (inter-rater reliability), i.e., one marker will award the same mark as another when confronted with a performance of the same quality.
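Two of the aspects of marker reliability listed by Weir, ranking candidates in the same order and rating at the same level of severity, can be checked numerically. The sketch below is illustrative only: the choice of Spearman's rank correlation for rank order and of a difference in mean scores for severity is an assumption, not a procedure taken from this training, and the scores in `marker_a` and `marker_b` are made-up data.

```python
from statistics import mean


def ranks(scores):
    """Return 1-based ranks; tied scores share the average of their ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied scores that starts at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either list is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank lists."""
    return pearson(ranks(x), ranks(y))


def severity_gap(x, y):
    """Mean score of marker x minus mean score of marker y (positive: x is more lenient)."""
    return mean(x) - mean(y)


# Hypothetical scores given by two markers to the same five candidates.
marker_a = [22, 18, 15, 9, 20]
marker_b = [20, 16, 13, 7, 18]

print(spearman(marker_a, marker_b))
print(severity_gap(marker_a, marker_b))
```

In this made-up data the two markers put the candidates in the same order, yet marker B is consistently harsher, which shows why rank agreement and severity need to be checked separately.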
Concepts in oral testing: Intra-rater reliability (or internal consistency; Chinese: 评分人内部信度)
Raters agree with themselves, over a period of a few days, about the ratings that they give. (Luoma 2004)
The degree to which an examiner or judge making subjective ratings of ability gives the same evaluation of that ability when he or she makes an evaluation on two or more different occasions. (Richards 2005)
Concepts in oral testing: Inter-rater reliability (Chinese: 评分人间信度)
The degree to which different examiners or judges making different subjective ratings of ability agree in their evaluations of that ability. (Richards 2005)
The level of consensus between two or more independent raters in their judgments of candidates’ performance. (Davies 1999)
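As a rough illustration of how the "level of consensus" between two raters might be quantified, the sketch below computes the proportion of exact agreement and Cohen's kappa, which corrects that proportion for agreement expected by chance. Cohen's kappa is a commonly used statistic introduced here as an assumption; the definitions above do not prescribe any particular measure. The ratings are made-up data on a 1-5 scale.

```python
from collections import Counter


def exact_agreement(a, b):
    """Proportion of candidates to whom both raters gave the same score."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa for two raters; undefined if chance agreement equals 1."""
    n = len(a)
    observed = exact_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: probability that both raters pick the same category
    # if each rated independently according to their own score distribution.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical scores from two raters for the same eight candidates.
rater_1 = [5, 4, 4, 3, 2, 3, 4, 5]
rater_2 = [5, 4, 3, 3, 2, 3, 4, 4]

print(exact_agreement(rater_1, rater_2))
print(cohen_kappa(rater_1, rater_2))
```

The same two functions could be applied to one rater's scores on two occasions as a simple check of intra-rater consistency.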
Concepts in oral testing: Intra-rater reliability
The extent to which a particular rater is consistent in using a proficiency scale. The consistency of a particular rater can be affected by such factors as:
- The nature of the scale (inadequately defined scales may cause rater uncertainty, which leads to random assignment of scores);
- The order in which candidates or test papers are assessed (raters may be normative in their behavior, so each new performance may be assessed in relation to the previous one rather than according to the standards defined by the scale);
- The ‘stability’ of the rater (raters’ interpretation of the scale may change either because they are tired or distracted after successive marking of a number of test performances or because considerable time has elapsed since their training session or since the previous test administration).
(Davies 1999)
Concepts in oral testing: Halo effect (Chinese: 光环效应, 成见效应)
The distorting influence of early impressions on subsequent judgments of a subject’s attributes or performance, or the tendency of a rater to let an overall judgment of the person influence judgments on more specific attributes. (Davies 1999)
Concepts in oral testing: Holistic scoring (Chinese: 综合性评分)
Also called global assessment (全局性评价), global scoring (全局性评分), impressionistic assessment (印象性评价) or holistic assessment (整体性评价).
A method of scoring where a single score is assigned to writing or speaking samples on the basis of an overall impressionistic assessment of the test taker’s performance on a writing or speaking task as a whole. (Richards et al. 2005)
Concepts in oral testing: Holistic scoring (Chinese: 综合性评分)
A type of marking procedure common in communicative language testing, whereby raters judge a stretch of discourse impressionistically according to its overall properties rather than providing separate scores for particular features of the language produced. (Davies 1999)
Concepts in oral testing: Analytic scoring (Chinese: 分项评分, 分析性评分)
A method of scoring that separates and weights different features of the test taker’s performance on a writing or speaking task and assigns separate scores to each feature. (Richards et al. 2005)
A method of subjective scoring often used in the assessment of speaking and writing skills, where a separate score is awarded for each of a number of features of a task, as opposed to one global score.
Concepts in oral testing: Analytic scoring
Advantages:
- Raters are required to focus on each of the nominated aspects of performance individually, thus ensuring that they are all addressing the same features of the performance.
- It allows for more exact diagnostic reporting of literacy or oracy development, especially where skills may be developing at different rates.
- It leads to greater reliability, as each candidate is awarded a number of scores.
Concepts in oral testing: Analytic scoring
Problems:
- The focus on specified aspects of the performance may divert raters’ attention from its overall effect.
- A halo effect may distort the scores because of the number of judgments required.
- It is time-consuming.
(Davies 1999)
Concepts in oral testing: Accuracy
Pronunciation must be clearly intelligible even if some influences from L1 remain. Grammatical and lexical accuracy is high, though grammatical errors that do not impede communication are acceptable.
Concepts in oral testing: Fluency and coherence
This refers to the ability to talk with normal levels of continuity, rate and effort, to convey information, and to express or justify opinions in coherent, connected speech.
Concepts in oral testing: Appropriacy
The use of language must be generally appropriate to function and to context. The intention of the speaker must be clear and unambiguous.
Concepts in oral testing: Interactivity
This refers to the ability to take an active part in the development of the discourse, showing sensitivity to turn taking and without undue hesitation.
Concepts in oral testing: Range
A wide range of language must be available to the candidate. Any specific items which cause difficulties can be smoothly substituted or avoided.
Concepts in oral testing: Size
The candidate must be capable of making lengthy and complex contributions where appropriate, and should be able to expand and develop ideas with minimal help from the interlocutor.
Concepts in oral testing: Division of duties between oral examiners
Assessor A: interlocutor and rater (holistic scoring)
Assessor B: rater (analytic scoring)
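Assessment format and content
Part 1 (1 minute): warm-up conversation between examiner and candidate. Information given to the candidate: the examiner asks questions. Information the candidate must provide: personal information. Number of items: 2.
Part 2 (6 minutes): oral presentation by the candidate. Information given to the candidate: an information card (picture or text). Information the candidate must provide: an in-depth statement of personal views on business communication, international trade practice and related topics.
Part 3 (5 minutes): conversation between candidates. Information the candidate must provide: an exchange with others on business communication, international trade practice and related topics.
Score: 50 points.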
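Development of the rating criteria: theoretical basis
- Sociolinguistics
- Speech act theory
- Communicative language ability theory
- Systemic functional linguistics
- Cohesion theory
- Register theory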
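Development of the rating criteria: practical basis
The oral test of the National International Business English Test (Level 2) draws on: CET, PETS, OPI, IELTS, BULATS, BEC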
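Development of the rating criteria: practical basis
Analytic rating item: accuracy. How each test describes it:
Domestic tests
- College English Test (Bands 4 and 6) oral test: pronunciation, intonation, grammar and vocabulary
- PETS (Level 5)
Foreign tests
- OPI: lexical control; structural control
- IELTS: lexical resource; grammatical range and accuracy; pronunciation
- BEC: grammar and vocabulary; pronunciation
- BULATS: how accurately they use the language (grammar and vocabulary)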
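Development of the rating criteria: practical basis
Analytic rating item: coherence. How each test describes it:
Domestic tests
- College English Test (Bands 4 and 6) oral test: coherence; range of language; length of utterances
- PETS (Level 5): use of discourse
Foreign tests
- OPI: delivery
- IELTS: fluency and coherence
- BEC: coherence
- BULATS: how well they develop the conversation and organize their ideas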
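Development of the rating criteria: practical basis
Analytic rating item: appropriateness. How each test describes it:
Domestic tests
- College English Test (Bands 4 and 6) oral test: ability to adapt to the situation
Foreign tests
- OPI: sociolinguistic competence
- BEC: relevance
- BULATS: how appropriately they use the language (grammar and vocabulary)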
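Development of the rating criteria: practical basis
Analytic rating item: interactiveness. How each test describes it:
Domestic tests
- PETS (Level 5): interactive communication
Foreign tests
- OPI: interactive comprehension
- BEC: interactive communication
- BULATS: how positively they contributed to the conversation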
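Development of the rating criteria: practical basis
Global rating item: global scoring. How each test describes it:
Domestic tests
- PETS (Level 5): communication; language expression; vocabulary; pronunciation
Foreign tests
- OPI: global tasks and functions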
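Interpreting the rating criteria: oral test score
The oral test is scored out of 50 points: an analytic score (maximum 25 points) plus a global score (maximum 25 points). A total of 30 points or above is a pass.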
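The arithmetic on this slide can be sketched as follows: the analytic score (分项评分, maximum 25) and the global score (总体评分, maximum 25) are added to give a total out of 50, and 30 or above is a pass. The function and constant names are illustrative, and treating 0 as the lowest admissible component score is an assumption, since the slide states only the maxima.

```python
ANALYTIC_MAX = 25  # maximum analytic score stated on the slide
GLOBAL_MAX = 25    # maximum global score stated on the slide
PASS_MARK = 30     # a total of 30 or above is a pass


def total_score(analytic, global_score):
    """Add the two component scores after checking that each is within range."""
    if not 0 <= analytic <= ANALYTIC_MAX:
        raise ValueError(f"analytic score must be between 0 and {ANALYTIC_MAX}")
    if not 0 <= global_score <= GLOBAL_MAX:
        raise ValueError(f"global score must be between 0 and {GLOBAL_MAX}")
    return analytic + global_score


def is_pass(analytic, global_score):
    """Return True when the total reaches the pass mark."""
    return total_score(analytic, global_score) >= PASS_MARK


# Hypothetical candidate: 14 points from the analytic rater, 17 from the global rater.
print(total_score(14, 17), is_pass(14, 17))
```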
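Interpreting the rating criteria: accuracy
Vocabulary: basic vocabulary; general business vocabulary (discount, sales, business relationship); specialized business vocabulary (FOB, insurance, packing, shipment)
Grammar: parts of speech, word order, sentence patterns, tense
Pronunciation: pronunciation of individual words, stress, rhythm and intonation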
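Analytic score: accuracy
5 points: Grammar is correct, though minor errors occur in complex structures. Word choice is appropriate and varied. Pronunciation and intonation are standard and clear.
4 points: Grammar is basically correct, though minor errors occur in complex structures. Word choice is fairly appropriate and fairly varied. Pronunciation and intonation are basically accurate and clear.
3 points: Simple grammatical structures are used properly, but complex structures contain many errors. Vocabulary is limited, and word choice is inappropriate in complex tasks. Pronunciation and intonation are partly accurate; clarity is average.
2 points: Simple grammatical structures are used improperly. Word choice is often inappropriate. Pronunciation and intonation contain many errors; clarity is poor.
1 point: Vocabulary and grammar are severely lacking; the candidate can produce only isolated words and phrases. Pronunciation and intonation contain very many errors and are unclear.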
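Interpreting the rating criteria: coherence
The candidate can continuously produce units of discourse that are consistent in meaning. The different propositions in the whole stretch of speech fit within a single framework of meaning; that is, the whole stretch of speech develops around one topic.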
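Analytic score: coherence
5 points: Expression is fluent and coherent; the candidate uses a rich variety of cohesive devices.
4 points: Expression is basically fluent and coherent; the candidate uses a fairly rich variety of cohesive devices.
3 points: Expression is basically coherent, with occasional pauses; the candidate uses basic cohesive devices.
2 points: Expression is incoherent; the candidate uses only simple cohesive devices.
1 point: Expression is incoherent; the candidate cannot use cohesive devices.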
Interpreting the rating criteria: appropriateness
Mode, field, tenor, register
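Analytic score: appropriateness
5 points: Responses are appropriate and conform to the conventions of expression in business contexts.
4 points: Responses are fairly appropriate and conform to the conventions of expression in business contexts.
3 points: The candidate is aware of the business context; responses basically conform to the conventions of expression in business contexts.
2 points: The candidate is aware of the business context, but responses do not conform to business conventions.
1 point: The candidate is not aware of the business context; responses do not conform to business conventions.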
Interpreting the rating criteria: interactivity
Expressing, listening, giving feedback
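Analytic score: interactivity
5 points: The candidate responds and introduces topics actively and appropriately, and completes the communicative task effectively.
4 points: The candidate responds and introduces topics, and completes the communicative task fairly well.
3 points: The candidate responds and introduces topics, and basically completes the communicative task.
2 points: The candidate often needs help and prompting to communicate with the partner, and cannot adequately complete the communicative task.
1 point: The candidate cannot communicate with the partner and cannot complete the communicative task.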
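Analytic score: business knowledge
5 points: Business knowledge is rich and is applied accurately and with ease.
4 points: Business knowledge is fairly rich and is applied fairly accurately.
3 points: The candidate has basic business knowledge and applies it with basic accuracy.
2 points: Business knowledge is insufficient and is not applied accurately enough.
1 point: Business knowledge is lacking.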
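The five analytic criteria, each rated from 1 to 5 points, are consistent with the stated analytic maximum of 25. A minimal sketch of adding them up is shown below; it assumes that the analytic score is the unweighted sum of the five criterion scores and that only whole points from 1 to 5 are awarded, neither of which the slides state explicitly. The English criterion names used as dictionary keys are illustrative.

```python
# Assumed keys for the five analytic criteria (names are illustrative).
CRITERIA = (
    "accuracy",
    "coherence",
    "appropriateness",
    "interactivity",
    "business_knowledge",
)


def analytic_total(scores):
    """Sum the five criterion scores, assuming equal weighting (an assumption)."""
    if set(scores) != set(CRITERIA):
        raise ValueError("scores must contain exactly the five analytic criteria")
    for name in CRITERIA:
        value = scores[name]
        # Assumption: raters award whole points from 1 to 5 per criterion.
        if not isinstance(value, int) or not 1 <= value <= 5:
            raise ValueError(f"{name} must be a whole number from 1 to 5")
    return sum(scores[name] for name in CRITERIA)


# Hypothetical ratings for one candidate.
sample = {
    "accuracy": 4,
    "coherence": 3,
    "appropriateness": 4,
    "interactivity": 3,
    "business_knowledge": 4,
}
print(analytic_total(sample))
```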
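Global score: 21-25 points
- Uses language and business knowledge skillfully to complete the communicative task effectively
- Vocabulary is rich and grammatical structures are fairly complex
- Pronunciation is accurate; an accent is acceptable as long as it does not impede understanding
- Expression is coherent and fluent
- Language use conforms to the conventions of expression in business contexts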
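Global score: 16-20 points
- Uses language and business knowledge to complete the communicative task effectively
- Vocabulary is fairly rich; grammatical structures contain some errors that do not impede understanding
- Pronunciation is basically accurate
- Expression is fairly coherent and fluent, with occasional pauses
- Language use fairly well conforms to the conventions of expression in business contexts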
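Global score: 11-15 points
- Uses language and business knowledge to basically complete the communicative task
- Vocabulary is not rich and grammatical structures are simple
- Pronunciation is not accurate enough and sometimes impedes understanding
- Expression is basically coherent, with occasional long pauses
- Language use basically conforms to the conventions of expression in business contexts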
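Global score: 6-10 points
- The language and business knowledge used are insufficient to complete the communicative task
- Vocabulary and grammar contain many errors
- Pronunciation is not accurate enough and sometimes impedes communication
- Communication breaks down because the candidate lacks vocabulary and grammatical structures
- Language use does not conform to the conventions of expression in business contexts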
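Global score: 1-5 points
- The language and business knowledge used cannot complete the communicative task
- Vocabulary and grammar are severely lacking, with many errors
- Pronunciation is inaccurate and often causes communication to break down
- The candidate cannot produce coherent utterances
- Language use does not conform to the conventions of expression in business contexts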
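The five global-score bands above (1-5, 6-10, 11-15, 16-20 and 21-25 points) can be represented as a simple lookup table, for example to check that a score entered on a score card falls in the band whose descriptors the rater had in mind. This is a sketch only: the one-line summaries paraphrase the first descriptor of each band, and the function name is hypothetical.

```python
# (lowest score, highest score, short paraphrase of the band's first descriptor)
GLOBAL_BANDS = (
    (21, 25, "completes the communicative task effectively and skillfully"),
    (16, 20, "completes the communicative task effectively"),
    (11, 15, "basically completes the communicative task"),
    (6, 10, "language and business knowledge insufficient for the task"),
    (1, 5, "unable to complete the communicative task"),
)


def global_band(score):
    """Return (low, high, summary) for the band that contains the score."""
    for low, high, summary in GLOBAL_BANDS:
        if low <= score <= high:
            return low, high, summary
    raise ValueError("global score must be between 1 and 25")


low, high, summary = global_band(18)
print(f"{low}-{high} points: {summary}")
```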
Oral test score card
Oral test score report form
The End. Thank you!