Performance Assessment: Literacy Assessment. Seminar report for Special Topics in Educational Assessment. Advisor: Professor 余民寧. Presenter: 謝智如. 15 October 2015 (104.10.15)
Outline. 1. Literacy assessment: related issues and applications. 2. Journal article report: Assessment literacy and student learning: the case for explicitly developing students' 'assessment literacy'
1. Literacy Assessment: Related Issues and Applications
What is literacy? Literacy is traditionally understood as the ability to read and write. The term's meaning has been expanded to include the ability to use language, numbers, images and other means to understand and use the dominant symbol systems of a culture. The concept of literacy is expanding in OECD countries to include skills to access knowledge through technology and the ability to assess complex contexts.
What is literacy? The United Nations Educational, Scientific and Cultural Organization (UNESCO) defines literacy as the "ability to identify, understand, interpret, create, communicate and compute, using printed and written materials associated with varying contexts. Literacy involves a continuum of learning in enabling individuals to achieve their goals, to develop their knowledge and potential, and to participate fully in their community and wider society."
The difference between literacy and competence. Definitions of literacy and competence vary widely, but literacy includes basic disciplinary knowledge together with the ability to apply it to solve problems in daily life and at work. Beyond ability alone, it also covers the habits and attitudes involved in using disciplinary knowledge to solve problems. Literacy therefore spans both ability and attitude, with an emphasis on integrated knowledge and on the habits and attitudes needed to solve real-world problems.
Many other new literacies: reading literacy, scientific literacy, mathematical literacy, information and media literacy, environmental literacy, health literacy, value literacy, multicultural literacy.
Literacy assessment. By the definition of literacy, literacy assessment falls within the scope of performance assessment. Performance assessment emphasises actual performance: the teacher judges (or scores) the validity of the student's performance process and/or the quality of the finished product, separately or in combination, to determine the student's level of achievement. Example: concept application, i.e. applying learned concepts and knowledge to solve practical problems encountered in daily life.
Literacy assessment programmes. IEA: Trends in International Mathematics and Science Study (TIMSS). OECD: Programme for International Student Assessment (PISA), which assesses literacy in reading, mathematics and science. IEA: Progress in International Reading Literacy Study (PIRLS).
PISA. PISA is conducted every three years. Its purpose is to provide cross-national comparisons of how well fifteen-year-old students have learned the knowledge and skills needed for life, and to analyse the effectiveness of each country's education system, thereby defining what citizens' literacy should include. Items are designed around applications in context and are not restricted to curriculum content; students must understand information and flexibly integrate, evaluate and reflect on it to construct their own answers to problem situations. The focus is on whether students can use what they have learned to meet real-world challenges, not merely on mastery of the school curriculum. The assessment covers literacy in three domains: reading, mathematics and science. This renewed concept of literacy is combined with the idea of lifelong learning and centres on the key knowledge and skills needed for adult life, covering formal and informal settings such as the formal curriculum, extracurricular clubs, the family environment and school climate.
PISA assessment cycle. 2000: reading as the major domain, science and mathematics as minor domains. 2003: mathematics major, reading and science minor. 2006: science major, reading and mathematics minor. 2009: reading major again, science and mathematics minor. 2012: mathematics major, reading and science minor, with an additional computer-based assessment of problem solving.
PISA 2006 Scientific Literacy Assessment. PISA 2006 reflects scientific literacy through four dimensions (context, competencies, knowledge, attitudes). It mainly tests students' ability to apply scientific knowledge, drawing on physics, chemistry, biology and earth science across the items to acquire new scientific knowledge, explain scientific phenomena, use evidence to interpret science-related issues, and address the relationship between science and technology.
PISA 2006 Scientific Literacy Assessment (example slides)
Assessing attitudes: attitudinal items. Most PISA 2006 science units included a new attitudinal item linking the unit to students' attitudes toward the scientific issue concerned. These items took two main forms: one probed students' interest in learning about the science involved, the other surveyed students' support for (agreement with) the science in question. Such items appeared in a grey box; students simply ticked the option matching their own view. There were no correct answers, and responses did not count toward students' total test scores. The items asked students to indicate their degree of agreement with statements about a specific issue, ticking for each statement the option that best represented their opinion.
Assessing attitudes (example slides)
Assessing attitudes. PISA 2006 collected attitudinal data through two channels: a separate attitude questionnaire, and items embedded in the test itself, so that attitude data were gathered and integrated while students answered the competency/knowledge questions. These contextualised items allowed PISA 2006 to capture students' attitudes toward specific science tasks. Unlike results obtained from general questions about attitudes, this approach let PISA 2006 examine (1) whether students' attitudes vary across contexts, and (2) whether students' attitude responses relate more closely to a particular question or to groups of questions.
PIRLS reading literacy assessment. Reading is at the heart of education: it lets us acquire knowledge through words and symbols, and almost every school subject is learned through reading. In theory, Grade 3 is a critical period in reading development: before it, children "learn to read", acquiring basic skills such as word recognition and a basic grasp of text genres and comprehension; after Grade 3 they "read to learn". PIRLS therefore assesses Grade 4 students, examining whether they have the basic reading skills and are moving on toward acquiring new knowledge through reading.
PIRLS reading literacy assessment. The Progress in International Reading Literacy Study (PIRLS) shows how well our students read. Compared with students in other countries, how strong is their reading ability, and have reading scores improved? Do our Grade 4 students value reading, and do they enjoy it? Do our students have homes that foster the development of reading and writing? How do our schools plan reading instruction? How do our teachers' instructional practices compare with those of other countries?
PIRLS's definition of reading literacy: 1. the ability to understand and use written language; 2. the ability to construct meaning from a variety of texts; 3. the ability to learn new things through reading; 4. participation in communities of readers at school and in everyday life; 5. reading for enjoyment.
What the reading literacy assessment contains. PIRLS instruments: passages of 1,200 to 1,600 words each that students read and answer questions about, plus five questionnaires, for students, parents, schools, teachers and the curriculum, to give a rounded picture of the environmental factors that influence reading.
What the reading literacy assessment contains: 1. It measures students' ability to understand informational and narrative texts; comprehension is assessed for both the content and the form of the text. An example of a form question: "How do the numbered boxes help the reader understand the article? Write down one way." 2. Students' reading behaviour and interest. 3. The reading environment parents provide, e.g. how often parents and children read together and the books available at home. 4. The reading instruction and library arrangements that schools and teachers provide.
PIRLS levels of reading comprehension (process, description, strategies):
Direct processes
- Focus on and retrieve explicitly stated information: the reader locates information clearly stated in the text. Strategies: 1. finding information relevant to the reading goal; 2. finding specific ideas; 3. searching for definitions of words or phrases; 4. identifying the setting of a story (e.g. time, place); 5. finding the topic sentence or main idea when it is explicitly stated.
- Make straightforward inferences: connecting two or more pieces of information in the text. Strategies: 1. inferring that one event caused another; 2. concluding the main point of a series of arguments; 3. identifying the referent of a pronoun; 4. inferring the main idea of the text; 5. describing the relationship between characters.
Interpretation processes
- Interpret and integrate ideas and information: the reader draws on prior knowledge to connect information not explicitly stated in the text. Strategies: 1. discerning the overall message or theme; 2. considering alternative actions a character could have taken; 3. comparing and contrasting information in the text; 4. inferring the mood or tone of a story; 5. interpreting how the information applies to the real world.
- Examine and evaluate the text: critiquing and reflecting on the information in the text. Strategies: 1. evaluating the likelihood that the events described could really happen; 2. describing how the author devised a surprise ending; 3. judging the completeness of the information in the text; 4. identifying the author's point of view.
PIRLS sample item: "An Unbelievable Night". Students read the passage and answer the questions.
PIRLS scoring example: "An Unbelievable Night". Scoring follows performance-level descriptions drawn up in advance.
Taiwanese students' performance in PIRLS 2006. In 2006, students from 46 countries/regions took part; against an international average of 500, Taiwanese students averaged 535, ranking 22nd. Students performed noticeably better on direct processes (541, 73% correct) than on interpretation processes (530, 49% correct). In terms of depth of reading, most of our children remain at the literal, surface level of the text.
Analysing the broader relationships in the results. The questionnaires also cover basic demographic data, students' reading attitudes, the home reading environment, teachers' reading instruction, school reading policies and the overall reading curriculum, so that the relationships between these reading conditions and environments and students' reading achievement can be examined. With such a detailed survey, participating countries can see what reading education looks like in each country and use this to improve or adjust their reading-education policies, teaching and curricula. When a country participates in PIRLS repeatedly and accumulates assessment data over many years, it can also track how students' reading ability changes as teaching or policy changes.
Taiwan: the Ministry of Education's programme office for enhancing citizens' core literacy. To understand the literacy of students educated under the 12-year basic education system, it addresses five literacies (language, mathematics, science, digital, and character/aesthetic literacy) as well as teachers' professional literacy. http://literacytw.naer.edu.tw/five.php A green-glazed leather-bag-shaped pot is used as the symbol of literacy: the capability you carry with you after the process of being shaped and fired.
2. Journal article report: Assessment literacy and student learning: the case for explicitly developing students' 'assessment literacy'
Introduction. Nicol (2009) raised a key issue for first-year students: "How to enable students to feel part of their programmes' academic culture while encouraging them to take responsibility for their own learning."
To become self-regulated learners, students need to be able to judge their work, identify its merits, locate its weaknesses and determine ways to improve it. Judgement here includes assessing whether one's own response to an assessment task is appropriate and does what was required.
It also requires them to judge how good their response is in relation to the relevant academic achievement standards (Sadler 2009). Students' understanding of the purposes of assessment and the processes surrounding assessment is part of the context within which they learn to make those judgements and become effectively self-regulating.
Francis (2008) argues, however, that first-year students in particular are likely to over-rate their understanding of the assessment process, and that there is a disjuncture between what they think they are being assessed on and what the marking criteria and achievement standards require of them.
By helping to clarify the meaning of learning goals and criteria, and through the provision of feedback, formative assessment encourages students to keep realigning their work to what is required. Nicol consequently applied a framework based on task structure, learner regulation and an associated set of assessment principles to inform the redesign of formative assessment in two first-year courses.
The redesign drew on Gibbs and Simpson's (2004) 11 assessment conditions and Nicol and Macfarlane-Dick's (2006) 7 principles of good feedback practice. For students to have a sense of control over their own learning, formative assessment practices must help them develop the skills needed to monitor, judge and manage their learning (Nicol 2009, 338).
What is "assessment literacy"? The literature suggests that students' capacity to become successful self-regulated learners can be affected by various aspects of the assessment process.
1. Students need to understand the purpose of assessment and how it connects with their learning trajectory.
2. They need to be aware of the processes of assessment and how these might affect their capacity to submit responses that are on task, on time and completed with appropriate academic integrity.
3. Opportunities need to be provided for them to practise judging their own responses to assessment tasks, so that they can learn to identify what is good about their work and what could be improved.
The authors therefore conceptualised students' capacity to develop these aspects of assessment as "assessment literacy", defined as students' understanding of the rules surrounding assessment in their course context, their use of assessment tasks to monitor or further their learning, and their ability to work with the guidelines on standards in their context to produce work of a predictable standard.
Previous related research. A few earlier questionnaires have measured students' experience of assessment, but none has focused on the concept of assessment literacy; the study's literature review provides some information on this gap. The work of O'Donovan, Price and Rust (2004) and Rust, Price and O'Donovan (2003) on improving students' understanding of assessment comes closest to the concept of assessment literacy.
O'Donovan, Price and Rust (2004) found that relying only on explicit statements of the assessment criteria does not, by itself, help students understand what assessors are looking for or what the assessment expects of them. The authors have been developing a growing body of work on criterion-based assessment methods, including the development and use of assessment rubrics (grids), grade descriptors and benchmark statements, to build students' assessment capability; through this information students can grasp more precisely the meaning of the task and the expectations the teacher is expressing.
Knowledge has both explicit and tacit dimensions, and learners need to construct that knowledge from experience for themselves for it to have meaning for them; this applies to knowledge of assessment criteria and standards as much as to disciplinary knowledge.
Their approach was to develop, through structured activities, students' knowledge of how assessment responses would be marked and, in turn, their understanding of how their own responses would be judged.
The intervention was based on the notion that once students started making judgements about the quality of the work in front of them, they could apply that evaluative way of thinking to their own work, helping them to self-monitor it during its production and to identify ways to improve its quality.
A 90-minute marking workshop. Students were given two exemplar pieces of work that they had to mark and provide feedback on. The assignments were similar in nature and format to the next piece of assessment that the participating students were about to begin for their own coursework, but covered different topics with different instructions.
During the workshop, students discussed their marking and rationales in small groups before reporting to the whole class the marks they awarded and their justifications.
At that point, the lecturer led a discussion of the students' rationales and related them to the application of the marking criteria.
The small student groups then had a chance to reconsider their marks and rationales, and finally the lecturer showed the whole class his or her annotated exemplars with the feedback, mark and rationale.
Across three years of repeated studies, students who participated in the workshop showed significant improvement in subsequent assessment pieces compared with students who did not participate.
Their research shows that relying only on the explicit expression of assessment criteria, standards and processes as a method of transferring knowledge about assessment does not work.
The provision of explicit criteria and summarised standards descriptors needs to be complemented by opportunities for students and staff to share the experience of judging the quality of responses, in order to build tacit knowledge into students' repertoire, improve their assessment literacy and, hence, their assessment outcomes.
Purpose of the present study. The authors aimed to test this assertion by quantifying the impact of developing students' assessment literacy on their assessment literacy levels and on their learning outcomes.
They set a stringent, high-risk testing scenario in which the assessment-literacy-developing intervention was much briefer (50 minutes) than those used in the O'Donovan, Price and Rust (2004; Rust, Price and O'Donovan 2003) studies.
They first developed a questionnaire to operationalise some key concepts in the assessment literacy arena, then implemented these measures in a pre- and post-test framework such that, between the two measurement episodes, the students in the experimental cohort were exposed to an assessment-literacy-building intervention.
A control group in the same programme of study, but at a different campus location, received only the pre-test instrument and no intervention. The paper reports the results of this intervention.
Method: Participants. The sample comprised 369 volunteer business students from two campuses (Campus A and Campus B) in Queensland, Australia (56% female, 44% male, mean age 19.1 years). After excluding those who did not record an identification number or did not complete both administrations of the survey, 349 valid cases remained for analysis.
Materials: Assessment literacy. Measured with the Assessment Literacy Survey developed by the study's authors: 30 items on a five-point Likert scale ranging from 1 = 'strongly disagree' to 5 = 'strongly agree', covering the four subscales listed below (a scoring sketch follows the list):
1. Students' understanding of the local assessment protocols and performance standards (6 items), e.g. 'I understand the criteria against which my work will be assessed'.
2. Students' use of assessment tasks for enhancing or monitoring their learning, including assessment for learning (6 items), e.g. 'I use assessment to figure out what is important to learn', and assessment for grading (4 items), e.g. 'I think the University makes me do assessment to produce work that can be judged for the University's marking and grading purposes'.
3. Students' orientation toward putting into the production of assessable work only the minimum effort needed merely to pass the course requirements (6 items), e.g. 'My aim is to pass the course with as little work as possible'.
4. Students' ability to judge their own and others' responses to assessment tasks (8 items), e.g. 'I feel confident that I could judge my peer's work accurately using my knowledge of the criteria and achievement standards provided'.
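As a concrete illustration of how responses to a Likert instrument of this kind might be scored, here is a minimal Python sketch. The item names (au1..au6, al1..al6, meo1..meo6, aj1..aj8) are hypothetical, not taken from the paper, and each subscale score is simply the mean of its items.

```python
# Minimal sketch (hypothetical item names): score the four Assessment Literacy
# Survey subscales as mean item responses on the 1-5 Likert scale.
import pandas as pd

subscales = {
    "AU":  [f"au{i}" for i in range(1, 7)],   # understanding of protocols/standards
    "AL":  [f"al{i}" for i in range(1, 7)],   # assessment for learning
    "MEO": [f"meo{i}" for i in range(1, 7)],  # minimum effort orientation
    "AJ":  [f"aj{i}" for i in range(1, 9)],   # judgement of own/others' work
}

def score_survey(responses: pd.DataFrame) -> pd.DataFrame:
    """Return one mean score per subscale for each respondent."""
    return pd.DataFrame(
        {name: responses[items].mean(axis=1) for name, items in subscales.items()}
    )
```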
Assessment performance. The course assessment consisted of a quiz, a report and a final exam. A multiple-choice quiz served as a measure of pre-intervention academic ability. The outcome of main interest was students' achievement (mark/grade) on the assessment task related to the intervention, a written report, marked against a rubric with a performance descriptor for each criterion at each performance standard. The final exam tested key knowledge from lectures, tutorials and the textbook, and comprised multiple-choice items, written items and short-answer case-scenario questions.
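The paper does not publish the report rubric itself, so the following is a purely hypothetical sketch of how a criterion-by-standard rubric with performance descriptors could be represented and turned into a mark; all criteria, descriptors and point values here are invented for illustration.

```python
# Hypothetical rubric sketch: criterion -> performance standard -> descriptor.
from typing import Dict

Rubric = Dict[str, Dict[str, str]]

report_rubric: Rubric = {
    "Argument": {
        "excellent": "Clear, well-supported thesis",
        "satisfactory": "Thesis present but thinly supported",
    },
    "Use of evidence": {
        "excellent": "Sources integrated and evaluated",
        "satisfactory": "Sources listed with little evaluation",
    },
}

# Assumed scoring scheme: map each standard to points and sum over criteria.
points = {"excellent": 2, "satisfactory": 1}

def mark_report(judgements: Dict[str, str]) -> int:
    """judgements maps each criterion to the standard the marker selected."""
    return sum(points[standard] for standard in judgements.values())
```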
Procedure. The study was quasi-experimental: the intervention group (Campus A) completed the pre-test and post-test and received the assessment-literacy-building intervention, while the control group (Campus B) completed only the pre-test.
1. Assessment rubric phase. Participants were told how to use the assessment rubric when completing the task; the assessment criteria and their content were explained; markers' judgements would be based on matching responses to the assessment criteria; and judgements would be moderated against the academic standards briefly described in the instructions.
2. Pre-test assessment literacy survey phase. The pre-test allowed comparison between the two campuses to detect any initial group differences in assessment literacy and to establish baseline levels; an initial factor analysis was also conducted on the pre-test data.
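A baseline comparison of this kind is typically an independent-samples t-test with an effect size. The sketch below is not the authors' code; it assumes two arrays of pre-test subscale scores, one per campus, and shows one common way to compute the test and a pooled-SD Cohen's d.

```python
# Sketch: baseline check of group differences between campuses on a subscale.
import numpy as np
from scipy import stats

def baseline_comparison(campus_a: np.ndarray, campus_b: np.ndarray):
    """Independent-samples t-test plus pooled-SD Cohen's d."""
    t, p = stats.ttest_ind(campus_a, campus_b)
    n_a, n_b = len(campus_a), len(campus_b)
    pooled_sd = np.sqrt(((n_a - 1) * campus_a.var(ddof=1) +
                         (n_b - 1) * campus_b.var(ddof=1)) / (n_a + n_b - 2))
    d = (campus_a.mean() - campus_b.mean()) / pooled_sd
    return t, p, d
```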
3. Intervention phase (Campus A only): a session of about 45 minutes. (1) A 'think, pair and share' exercise: participants considered, judged and practised marking two exemplars (real examples of student work) to decide their quality (excellent, good, satisfactory or poor). The aim was to help participants learn to judge a piece of work, identify the criteria they used to judge it, and use those criteria to recognise different standards of achievement.
(i) Participants made practice judgements on the exemplars ('think'), then explained and justified their judgements to the person next to them ('pair and share'). (ii) Randomly selected pairs shared their decisions with the whole class. (iii) Out of this conversation emerged a list of criteria expressed by the participants in their own language.
(2) Where judgements of the exemplars differed, the marking of the exemplars was explained and participants voted by show of hands on where their judgements fell against the standards; finally, the course convenor explained the mark he had chosen and his reasons. (3) Participants were then directed to the assessment rubric and shown how their judgements corresponded to the academic standards.
4. Post-test assessment literacy survey phase. The intervention group completed the assessment literacy survey again as a post-test, to gauge changes in their assessment literacy levels.
5. Assessment outcome phase. Three weeks later, participants completed a literature analysis and a 1,500-word report (part of the course assessment), using the same rubric as before. Subject tutors at both campuses marked the reports against the criteria, and participants' report marks were used as the outcome measure in the subsequent statistical analyses.
Results: Preliminary analyses. Factor analysis was undertaken to confirm the underlying structure of the survey items. The four factors represented the following constructs: Assessment Literacy (Understanding) (AU); Assessment for Learning (AL); Minimum Effort Orientation (MEO); and Assessment Literacy (Judgement) (AJ). The factor loading matrix for this final solution is presented in Table 1 (pre-test and post-test responses). As the table shows, the four-factor model was found to be reliable across both the pre- and post-tests.
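For readers who want to run this kind of preliminary analysis on their own survey data, the sketch below shows one way to obtain a varimax-rotated four-factor solution and a per-subscale Cronbach's alpha. It is an illustrative reconstruction, not the authors' analysis script, and the exact estimation method used in the paper may differ.

```python
# Sketch: four-factor solution on the 30 items plus subscale reliability.
import pandas as pd
from sklearn.decomposition import FactorAnalysis

def four_factor_loadings(items: pd.DataFrame) -> pd.DataFrame:
    """Varimax-rotated loadings for a 4-factor solution (respondents x 30 items)."""
    fa = FactorAnalysis(n_components=4, rotation="varimax", random_state=0)
    fa.fit(items.values)
    return pd.DataFrame(fa.components_.T, index=items.columns,
                        columns=["F1", "F2", "F3", "F4"])

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for the items of one subscale."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)
```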
As anticipated, the MEO scale scores were negatively correlated with AL (r = -.26, p < .001), AU (r = -.30, p < .001) and AJ (r = -.27, p < .001). The assessment for learning scale (AL) was positively correlated with AU (r = .42, p < .001) and AJ (r = .33, p < .001). The other two assessment literacy scales, understanding (AU) and judgement (AJ), were also positively correlated (r = .50, p < .001). This is evidence of appropriate and predicted patterns of convergent and discriminant validity (Campbell and Fiske 1959).
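A convergent/discriminant validity check of this kind reduces to a Pearson correlation matrix over the four subscale scores; a minimal sketch (assuming a data frame with columns AU, AL, MEO and AJ, one row per respondent) follows. Per-pair p-values could be obtained with scipy.stats.pearsonr.

```python
# Sketch: Pearson correlations among the four subscale scores
# (MEO is expected to correlate negatively with the other three).
import pandas as pd

def scale_correlations(scores: pd.DataFrame) -> pd.DataFrame:
    return scores[["AU", "AL", "MEO", "AJ"]].corr(method="pearson")
```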
Assessment of group differences (pre-test). Campus A participants scored significantly lower on assessment for learning (AL T1 M = 3.71) than Campus B participants (M = 3.92), t(334) = 3.02, p = .003, d = .33. These results point towards a general trend of slightly poorer baseline motivation and assessment literacy for Campus A (intervention) students, especially in their use of assessment for learning. These baseline differences must be considered in any later comparison of average report marks between the campus groups, in addition to determining the intervention's impact on assessment literacy levels and any associated improvement in report mark for Campus A. All means, standard deviations and effect sizes are shown in Table 2.
Intervention impact upon assessment literacy. Paired-sample t-tests were conducted to investigate the impact of the intervention on participants' effort, use of assessment and assessment literacy levels. The intervention was effective in producing positive and significant change across the three assessment literacy factors: AU t(153) = 10.21, p = .00, d = .73; AJ t(157) = 6.51, p = .00, d = .52; and AL t(155) = 6.03, p = .00, d = .39. These effect sizes reveal that the intervention resulted in medium to large changes in the three assessment literacy levels (see Table 3). Considering the brevity of the intervention, the magnitude of this impact may be considered a hefty return on a small investment. No significant change occurred in the attitudinal measure of MEO, t(157) = 1.67, p = .10, d = .08. All means, standard deviations and effect sizes are shown in Table 3.
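The pre/post comparison described above amounts to a paired-samples t-test with an effect size for the change. The following sketch assumes matched arrays of pre- and post-test scores for the intervention group, and computes Cohen's d on the difference scores, which may differ slightly from the exact formula used in the paper.

```python
# Sketch: paired pre/post comparison for one assessment literacy subscale.
import numpy as np
from scipy import stats

def paired_change(pre: np.ndarray, post: np.ndarray):
    """Paired-samples t-test plus Cohen's d for the pre-to-post change."""
    t, p = stats.ttest_rel(post, pre)
    diff = post - pre
    d = diff.mean() / diff.std(ddof=1)  # d computed on the difference scores
    return t, p, d
```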
Impact of assessment literacy on assessment results Correlational analysis and a regression model were developed to examine Campus A (intervention) students ’ report mark as a function of changes in their motivation or assessment literacy levels due to the intervention.
No significant relationship was found between changes in MEO and report mark. This is most likely due to the intervention not leading to any significant change in MEO levels (refer to Table 3). Table 4 describes outcomes from the correlational analysis.
As the change in MEO for Campus A was not significantly correlated with report mark, this variable was not entered into the multiple regression model, which comprised the change scores for AL (ALCh), AU (AUCh) and AJ (AJCh) along with the report mark.
Thus, of the three assessment literacy factors, improving students' ability to judge the standards of their own and others' work appears to be the most critical to enhanced learning outcomes. Unstandardised (B) and standardised (β) regression coefficients, and squared part correlations for each predictor in the model, are shown in Table 5.
Unique contribution of improved assessment literacy. To determine whether the improved assessment literacy levels predicted report marks over and above participants' post-test motivation (MEO T2) and pre-existing academic ability (the pre-intervention quiz), a hierarchical multiple regression analysis was conducted for the intervention group.
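A hierarchical regression of this kind enters the control variables first and the assessment literacy change scores second, reading the increase in R-squared as their unique contribution. The sketch below uses assumed column names (report_mark, meo_t2, quiz, au_change, aj_change, al_change) that are not taken from the paper.

```python
# Sketch: two-block hierarchical regression predicting report mark.
import pandas as pd
import statsmodels.api as sm

def hierarchical_regression(df: pd.DataFrame):
    y = df["report_mark"]
    # Block 1: controls (post-test MEO and pre-intervention quiz score)
    block1 = sm.add_constant(df[["meo_t2", "quiz"]])
    m1 = sm.OLS(y, block1).fit()
    # Block 2: add the three assessment literacy change scores
    block2 = sm.add_constant(
        df[["meo_t2", "quiz", "au_change", "aj_change", "al_change"]]
    )
    m2 = sm.OLS(y, block2).fit()
    r2_change = m2.rsquared - m1.rsquared  # unique contribution of block 2
    return m1.rsquared, m2.rsquared, r2_change, m2.params
```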
This finding is practically significant given the brevity of the intervention; it seems that from an intervention of just 50 minutes duration, designed to develop students ’ judgement abilities, their marks on a related task can be significantly increased. A summary of the hierarchical regression for predicting report mark is shown in Table 6.
Between-group differences in report mark as a function of assessment literacy for the pre- and post-test data. For both groups' pre-test data, the three assessment literacy factors (AU T1, AJ T1, AL T1) did not significantly correlate with report mark. As hypothesised, the improved assessment literacy factors AU T2, AJ T2 and AL T2, as measured at post-test, were more strongly (and significantly) related to report mark, such that higher assessment literacy levels were related to higher report marks.
While the intervention did not significantly alter Campus A's levels of MEO, this attitudinal factor, as measured at both time one and time two, did significantly relate to report mark, with a higher propensity towards using minimal effort related to lower report marks. Table 7 presents these relationships.
Discussion and conclusions. This study theorised the notion of assessment literacy as multi-dimensional, and has shown how the dimensions of assessment literacy differentially contribute to the educational gains derived from this pedagogical intervention. Specifically, after controlling for prior academic ability and motivational attitude, one dimension of assessment literacy stands out as the 'high-leverage' dimension: the ability to judge actual works against criteria and standards.
The importance of this finding is that it was the nature of the intervention (i.e. getting students to look at and judge actual examples of student work) that created the gains in this dimension of assessment literacy. This implies that interventions aimed at garnering enhanced learning from assessment should target the development of assessment literacy.
This in turn means creating an emphasis on a meta-dialogue about assessment, its purposes and how it functions. A further implication is that the gains typically attributable to formative feedback could be enhanced not by a more detailed explication of the feedback by lecturers, but rather by deploying assessment literacy (judgement)-enhancing protocols at the formative feedback points during the semester.
These findings support the view that helping students to develop their ability to judge their own and others' work will likely enhance their learning outcomes. Interventions that give students practice in judging work against standards develop the judgement dimension of assessment literacy, which in turn allows them to perform better themselves on similar tasks.
Compared with O'Donovan, Price and Rust (2004), focusing directly on assessment literacy allowed the intervention to be much shorter. Beyond assessment literacy itself, the study raises a further idea: the return on pedagogical investment. In a higher education environment of heavy workloads, shrinking resources and administrative support, and increasingly diverse students, it matters what change in students teachers can expect from a given change in teaching practice. Is it worth building this kind of activity into regular teaching practice and extending it to other courses, especially since students' gains may grow along a curve rather than a straight line, and may even persist into the later stages of their studies?
Student comments such as '... now I understand what it's all about ...', 'I think I'm starting to get this ...' and 'Now I know what's expected of me' indicated that students had learned from the experience. Comments from teaching staff such as 'What a fabulous activity ...' and '... really useful' indicated that not only students but also teaching staff perceived benefits in the intervention.
References
PISA assessment cycles. Retrieved 9 October 2015 from the Taiwan PISA National Research Center, http://pisa.nutn.edu.tw/pisa_tw_03.htm
PIRLS reading literacy assessment. Retrieved from the Taiwan PIRLS team website, http://lrn.ncu.edu.tw/Teacher%20web/hwawei/PIRLS_home.htm
Smith, C. D., Worsfold, K., Davies, L., Fisher, R., & McPhail, R. (2013). Assessment literacy and student learning: The case for explicitly developing students' 'assessment literacy'. Assessment & Evaluation in Higher Education, 38(1), 44-60.
Thank you for listening!