Introduction to Clinical Statistics
Zhang Boheng (张博恒), MD, PhD
Center for Evidence-Based Medicine, Fudan University; International Clinical Epidemiology Shanghai Training Center
bzhang@zshospital.net

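Why do statistical analysis?
The purpose of statistical analysis is to use the information in sample data to make valid inferences about the population under study. This is done by describing the sample data with summary measures, which retain enough information to estimate the characteristics of the study population.
2004-6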

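Clinical research questions about populations
In developing countries, can formula (replacement) feeding, compared with breastfeeding, increase the survival of infants whose mothers are HIV-positive? How can we build a model of survival after coronary artery bypass surgery? Can patient characteristics predict postoperative survival? Compared with medical treatment, does bypass surgery improve survival at 1, 3 and 5 years? Can local therapy replace surgical resection for small hepatocellular carcinoma? Can high-dose interferon after radical resection reduce the recurrence rate of hepatocellular carcinoma? (revised 2004-6)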

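Today's topics
Population, sample and individual; types of data: continuous vs. categorical; how to describe data: statistics and graphs; measures of central tendency and dispersion; standard error and 95% confidence interval; choosing an appropriate statistical method for the data; evaluation of diagnostic tests.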

Population, sample and individual
"Aristotle maintained that women have fewer teeth than men; although he was twice married, it never occurred to him to verify this statement by examining his wives' mouths." -- Bertrand Russell, The Impact of Science on Society, 1952. "It is a capital mistake to theorize before one has data." -- Sir Arthur Conan Doyle, A Scandal in Bohemia.

Population, sample and individual
And, for another viewpoint: "If your experiment needs statistics, you ought to have done a better experiment." -- Ernest Rutherford. The bench-science perspective: you can control all the variables! Clinicians, however, know better: human variation is large, and often inexplicable. Statistics help us describe it and generalize at least enough to improve our ability to practice medicine.

Population, sample and individual
Aristotle made a claim about a population of women (compared with the population of men). He actually had a sample of two women at hand, and he could have counted the teeth of the two individuals in that sample. The population is the collection of all people about whom you would like to ask a research question. This might be a fairly clear-cut, easily defined set of people: "What proportion of people 65 or older in the US today have Alzheimer's disease?" Or it might be a more hypothetical group: "How much of a reduction in symptomatic days could a person expect if treated with a new antiviral for flu?"

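Population, sample and individual
In practice, we cannot study every member of a population. So we study a sample and generalize the findings to the whole population. The sample size is the number of individuals in the sample (not the number of measurements taken on each subject!). Good study design helps us obtain a representative sample; good statistical analysis helps us answer questions about the population.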

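Example: nude-mouse metastasis model of HCC. Immune-reconstitution group vs. control group: CD3 31.5% vs. 14.2%; CD4 XX; CD8 *. Two levels of observation: the nude mouse and the cell.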

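Types of data
Quantitative ("how much?"): continuous variables, e.g. age, weight, height, blood pressure; discrete counts, e.g. number of children in a family, number of days in hospital.
Categorical ("what type?"): ordinal variables, e.g. tumor stage (I, II, III) or good > fair > poor; nominal variables, e.g. male/female, healthy/ill, ABO blood group.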

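Types of data: converting between types
Quantitative data can be converted into categorical data: normal vs. abnormal (value); "young, middle-aged, old". Converting a continuous variable into an ordinal one reduces the information in the data, and so lowers the sensitivity, or power, of statistical tests.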
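As a minimal sketch of this information loss, the Python snippet below groups the 11 ages used in the quantitative-data example into three categories. The cut-offs (<40 "young", 40-59 "middle-aged", ≥60 "old") are hypothetical; the slides do not define them. Ten distinct values collapse into three categories, and ages 14 years apart become indistinguishable.

```python
# Hypothetical cut-offs (not defined in the slides): <40 young,
# 40-59 middle-aged, >=60 old.
AGES = [21, 32, 34, 34, 42, 44, 46, 48, 52, 56, 64]


def age_group(age: int) -> str:
    """Map a continuous age onto an ordinal category."""
    if age < 40:
        return "young"
    if age < 60:
        return "middle-aged"
    return "old"


groups = [age_group(a) for a in AGES]
print(len(set(AGES)), "distinct ages ->", len(set(groups)), "categories")
# Ages 42 and 56 differ by 14 years but fall into the same category.
print(age_group(42) == age_group(56))
```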

Describing categorical data: counts and percentages (example bar chart, N = 74)
Notes: the vertical axis can show counts or percentages. In this example the counts do not add up to 74 because individuals can have multiple risk factors; a table may be a more parsimonious presentation for such data.
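A small sketch with hypothetical multi-response data (the original chart's values are not available) shows why the counts can exceed N: each patient is counted once per risk factor, so the percentages, each computed with N as the denominator, need not sum to 100%.

```python
from collections import Counter

# Hypothetical data: each set holds the risk factors of one patient,
# so a single patient can contribute to several bars of the chart.
patients = [
    {"smoking", "hypertension"},
    {"smoking"},
    {"diabetes", "hypertension", "smoking"},
    set(),  # a patient with no recorded risk factor
]

n = len(patients)
counts = Counter(factor for risk_factors in patients for factor in risk_factors)
for factor, count in sorted(counts.items()):
    # Percentages use the number of patients (N) as the denominator.
    print(f"{factor:<13}{count:>3}{100 * count / n:>6.0f}%")
print("total count:", sum(counts.values()), "vs N =", n)
```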

Describing categorical data: proportion (composition), rate and ratio; proportion vs. rate; standardization.
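Direct standardization can be sketched as follows. The hospitals, age strata, counts and standard population are all hypothetical. Both hospitals have identical age-specific rates, yet their crude rates differ because their age composition differs; applying each hospital's age-specific rates to a common standard population removes that difference.

```python
# Hypothetical data: (cases, population) per age stratum [young, old].
hospital_a = [(10, 1000), (40, 500)]
hospital_b = [(30, 3000), (16, 200)]
# Hypothetical standard population for the same two strata.
standard_pop = [2000, 1000]


def crude_rate(strata):
    """Total cases divided by total population, ignoring age structure."""
    return sum(c for c, _ in strata) / sum(p for _, p in strata)


def direct_standardized_rate(strata, standard):
    """Apply each stratum-specific rate to the standard population."""
    expected_cases = sum(c / p * w for (c, p), w in zip(strata, standard))
    return expected_cases / sum(standard)


for name, strata in [("A", hospital_a), ("B", hospital_b)]:
    print(
        f"hospital {name}: crude {1000 * crude_rate(strata):.1f} per 1000, "
        f"standardized {1000 * direct_standardized_rate(strata, standard_pop):.1f} per 1000"
    )
```

The choice of standard population changes the standardized values themselves, so standardized rates are meaningful only for comparison, not as absolute figures.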

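Describing quantitative data
Below is a set of ages (11 cases): 21, 32, 34, 34, 42, 44, 46, 48, 52, 56, 64.
Age is a quantitative variable, so a bar chart is not appropriate. We are more interested in features of the age distribution: Where is its center (e.g. the mean)? How variable are the ages? Are there values that differ greatly from most of the data (outliers)? Visual tools help us answer these questions.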

Describing quantitative data
Graphs: 1. stem-and-leaf plot; 2. histogram; 3. boxplot.
Numbers: 1. location: mean, median, mode; 2. spread: range, variance, standard deviation, percentiles; 3. shape: skewness. *Exception: describing survival data.

Stem-and-leaf diagram (for small datasets)
We could group the data and tally the frequencies:
20: X
30: XXX
40: XXXX
50: XX
60: X
But why "hide" the details? Instead, we use the tens digit as the stem and the units digit as the leaf:
2* | 1
3* | 244
4* | 2468
5* | 26
6* | 4
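The stem-and-leaf layout can be generated with a short function. This is a sketch that assumes non-negative integers, using the tens digit as the stem and the units digit as the leaf, and printing empty stems if there is a gap in the data.

```python
from collections import defaultdict

AGES = [21, 32, 34, 34, 42, 44, 46, 48, 52, 56, 64]


def stem_and_leaf(values):
    """Return stem-and-leaf lines: tens digit as stem, units digit as leaf."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(str(v % 10))
    return [
        f"{stem}* | {''.join(leaves[stem])}"
        for stem in range(min(leaves), max(leaves) + 1)
    ]


for line in stem_and_leaf(AGES):
    print(line)
```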

Mean and variance: examples. Median and percentiles. Outliers.

Central tendency: arithmetic mean; geometric mean; median.
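A sketch using Python's standard `statistics` module (`geometric_mean` needs Python 3.8 or later) computes these measures for the 11 ages. The antibody titres are a hypothetical addition, not from the slides, included because doubling dilutions are a typical setting where the geometric mean is preferred over the arithmetic mean.

```python
import statistics

AGES = [21, 32, 34, 34, 42, 44, 46, 48, 52, 56, 64]
# Hypothetical reciprocal antibody titres (1:10 ... 1:160), not from the
# slides: doubling dilutions are a typical setting for the geometric mean.
TITRES = [10, 20, 40, 80, 160]

print("arithmetic mean age:", statistics.mean(AGES))      # 43
print("median age:", statistics.median(AGES))             # 44
print("arithmetic mean titre:", statistics.mean(TITRES))  # 62
print("geometric mean titre:", round(statistics.geometric_mean(TITRES), 2))
```

For the titres, the arithmetic mean (62) is pulled up by the largest dilution, while the geometric mean (40) sits at the middle dilution.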

Mean vs. median
The mean is sensitive to a few very large (or small) values ("outliers"); the median is "resistant" to outliers. The mean is mathematically attractive. 50% of the sample lies above the median and 50% below it.
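To see the difference, the sketch below replaces the largest age (64) with a hypothetical data-entry error of 640: the mean jumps from 43 to about 95, while the median stays at 44.

```python
import statistics

AGES = [21, 32, 34, 34, 42, 44, 46, 48, 52, 56, 64]
# Hypothetical data-entry error: the last age typed as 640 instead of 64.
with_outlier = AGES[:-1] + [640]

for label, data in [("original", AGES), ("with outlier", with_outlier)]:
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(f"{label:<13} mean = {mean:6.2f}  median = {median}")
```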

Dispersion: variation is important!

Dispersion: variance; standard deviation; percentiles, e.g. the interquartile range IQR = Q0.75 - Q0.25.
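A sketch computing the spread measures for the 11 ages with the standard `statistics` module. `statistics.variance` and `statistics.stdev` use the sample (n - 1) divisor; `statistics.quantiles` uses its default "exclusive" method, and other software may use a different quartile definition and report slightly different Q1 and Q3.

```python
import statistics

AGES = [21, 32, 34, 34, 42, 44, 46, 48, 52, 56, 64]

data_range = max(AGES) - min(AGES)
variance = statistics.variance(AGES)  # sample variance, divisor n - 1
sd = statistics.stdev(AGES)
q1, q2, q3 = statistics.quantiles(AGES, n=4)  # default method="exclusive"
iqr = q3 - q1

print(f"range = {data_range}, variance = {variance:.1f}, SD = {sd:.2f}")
print(f"Q1 = {q1}, median = {q2}, Q3 = {q3}, IQR = {iqr}")
```

For these ages this gives a range of 43, a sample variance of 149.4, an SD of about 12.22 and an IQR of 52 - 34 = 18.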

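Standard error and 95% confidence interval
Describing the sample: mean and standard deviation. Estimating the population mean: compute the standard error, SE = SD / √n (the standard deviation divided by the square root of the sample size).
95% CI for the population mean: sample mean ± 1.96 × SE (a large-sample approximation); this is the form commonly reported in papers.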
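Applied to the 11 ages, a sketch: the standard error is the SD divided by √n. The 1.96 multiplier is the large-sample normal value; with only 11 observations a t multiplier (about 2.228 for 10 degrees of freedom, hard-coded here from a t table) gives a somewhat wider and more appropriate interval.

```python
import math
import statistics

AGES = [21, 32, 34, 34, 42, 44, 46, 48, 52, 56, 64]

n = len(AGES)
mean = statistics.mean(AGES)
sd = statistics.stdev(AGES)
se = sd / math.sqrt(n)  # standard error = SD / sqrt(n)

# Large-sample (normal) 95% CI: mean +/- 1.96 * SE
ci_normal = (mean - 1.96 * se, mean + 1.96 * se)
# Small-sample alternative: t quantile for n - 1 = 10 df, from a t table
T_975_DF10 = 2.228
ci_t = (mean - T_975_DF10 * se, mean + T_975_DF10 * se)

print(f"mean = {mean}, SD = {sd:.2f}, SE = {se:.2f}")
print("95% CI (1.96): ({:.1f}, {:.1f})".format(*ci_normal))
print("95% CI (t, 10 df): ({:.1f}, {:.1f})".format(*ci_t))
```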

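Standard deviation vs. standard error of the mean (when do you use one, but not the other?)
The standard deviation is descriptive: it quantifies the variability of individual observations around the sample mean, and it is an important statistic when judging whether two samples come from the same population. Central limit theorem: means of samples drawn from the same population are approximately normally distributed (for sufficiently large samples). The standard error of the mean is used when the sample mean serves as an estimate of the population mean; it measures the precision of that estimate and depends on both the SD and the sample size. The two are therefore linked (SE = SD/√n): as the sample size increases, the standard error decreases, whereas the standard deviation does not systematically shrink but becomes a more stable estimate of the population SD.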
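A small simulation sketch illustrates the distinction. It assumes a hypothetical normal population with mean 43 and SD 12 and a fixed random seed: as n grows, the sample SD settles near the population SD of 12, while the SE keeps shrinking in proportion to 1/√n.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible
# Hypothetical population: normally distributed, mean 43, SD 12.
POP_MEAN, POP_SD = 43, 12


def sample_sd_and_se(n):
    """Draw one sample of size n; return its SD and the SE of its mean."""
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(n)]
    sd = statistics.stdev(sample)
    return sd, sd / math.sqrt(n)


results = {n: sample_sd_and_se(n) for n in (10, 100, 1000)}
for n, (sd, se) in results.items():
    print(f"n = {n:4d}   SD = {sd:5.2f}   SE = {se:5.2f}")
```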

Normal distribution (the basis of statistical inference for many populations)
Mean = median = mode: all take the same value in the distribution. Remember: 68.3% of the data lie within ±1 s.d. of the mean; 95.0% within ±1.96 s.d.; 95.45% within ±2 s.d.; 99.7% within ±3 s.d.
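The coverage figures can be checked with the error function from Python's `math` module, using P(|Z| < k) = erf(k/√2) for a standard normal variable Z.

```python
import math


def coverage(k):
    """Probability that a normally distributed value lies within mean +/- k SD."""
    return math.erf(k / math.sqrt(2))


for k in (1, 1.96, 2, 3):
    print(f"within +/-{k} SD: {100 * coverage(k):.2f}%")
```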

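Inferential statistics: generalize conclusions from the sample to the population; evaluate the strength of the evidence; compare; predict.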

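Statistical methods for quantitative data
Design | Normally distributed | Not normally distributed
Paired data (2 groups) | paired t-test | sign test; Wilcoxon signed-rank test
Two independent groups | two-sample t-test | Wilcoxon / Mann-Whitney test; median test
Matched (randomized block) design | randomized block ANOVA | nonparametric block comparison: M test (Friedman)
Multiple groups | ANOVA for a completely randomized design | nonparametric multi-group comparison: H test (Kruskal-Wallis)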
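As a sketch of the simplest entry in the table, the paired t-test works on the within-pair differences: t = mean difference / (SD of differences / √n), with n - 1 degrees of freedom. The before/after values are hypothetical. The p-value would come from a t table or a library routine (for example SciPy's `scipy.stats.ttest_rel`), which this standard-library sketch does not compute.

```python
import math
import statistics

# Hypothetical paired measurements, e.g. blood pressure before/after treatment.
before = [150, 142, 160, 138, 155, 147]
after = [140, 139, 150, 136, 149, 141]

diffs = [b - a for b, a in zip(before, after)]
mean_d = statistics.mean(diffs)
se_d = statistics.stdev(diffs) / math.sqrt(len(diffs))  # SE of the mean difference
t = mean_d / se_d
df = len(diffs) - 1
print(f"mean difference = {mean_d:.2f}, t = {t:.2f}, df = {df}")
```

Here t is about 4.48 on 5 degrees of freedom, above the two-sided 5% critical value of 2.571.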

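Contingency table analysis
Rows | Columns | Type of association | Statistic
Nominal | Nominal | General association | Pearson's χ2
Nominal | Ordinal | Row mean scores differ (trend analysis) | CMH χ2
Ordinal | Ordinal | Correlation (trend analysis) | CMH χ2
*For a 2 × 2 (fourfold) table, all of these analyses coincide.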
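For the general-association case, Pearson's χ2 compares the observed counts O with the counts E expected under independence (row total × column total / N) and sums (O - E)²/E over all cells. A sketch with a hypothetical 2 × 2 table; the statistic is compared with the χ2 distribution on (r - 1)(c - 1) degrees of freedom, where 3.84 is the 5% critical value for 1 df.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic and df for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical 2 x 2 table: rows = treatment/control, columns = event/no event.
chi2, df = pearson_chi_square([[20, 80], [35, 65]])
print(f"chi-square = {chi2:.2f}, df = {df}")
```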

Making predictions: regression analysis
Dependent variable: quantitative → linear regression; ordinal or nominal → logistic regression; time to event → Cox regression.

Figure: Descriptive epidemiology, pattern of occurrence. Scatter plot of the prevalence of HIV+ against the index of community mosquito infestation, annotated with the correlation coefficient r and r-squared.
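The correlation coefficient r and r-squared shown on such a plot can be computed as in this sketch. The data are hypothetical (the figure's values are not available), and a high r only describes a pattern of occurrence; it does not by itself establish a causal link.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical community-level data, not taken from the original figure.
mosquito_index = [2, 4, 5, 7, 9, 12]
hiv_prevalence = [3, 4, 6, 6, 9, 11]
r = pearson_r(mosquito_index, hiv_prevalence)
print(f"r = {r:.3f}, r-squared = {r * r:.3f}")
```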

Evaluation of diagnostic tests: study design

Design of a diagnostic test study

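Evaluating a diagnostic test against the gold standard
Test result | Disease present | Disease absent
Test + | a | b
Test - | c | d
Sensitivity = a/(a+c); specificity = d/(b+d); positive predictive value = a/(a+b); negative predictive value = d/(c+d); positive likelihood ratio = sensitivity/(1 - specificity); negative likelihood ratio = (1 - sensitivity)/specificity.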
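A sketch computing the six indices from the cells a, b, c and d of the 2 × 2 table; the counts in the usage line are hypothetical. The predictive values depend on disease prevalence in the study sample, whereas sensitivity, specificity and the likelihood ratios do not.

```python
def diagnostic_metrics(a, b, c, d):
    """Indices for a 2 x 2 table vs. the gold standard.

    a = true positives, b = false positives,
    c = false negatives, d = true negatives.
    LR+ is undefined (division by zero) when specificity is exactly 1.
    """
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "PPV": a / (a + b),
        "NPV": d / (c + d),
        "LR+": sensitivity / (1 - specificity),
        "LR-": (1 - sensitivity) / specificity,
    }


# Hypothetical counts: 90 TP, 20 FP, 10 FN, 180 TN.
metrics = diagnostic_metrics(90, 20, 10, 180)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```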

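What do medical papers usually report?
Most studies report means (for normally distributed data) or medians (for non-normal data). Some report standard deviations and/or standard errors. Be careful! A figure may show an error bar that could be either. If the data are non-normal (skewed, multimodal, with very long or very short tails, etc.), papers usually report medians and percentiles rather than means and standard deviations. When writing a paper, keep one main thread, the question the study sets out to answer: Do you want to ask about the average or typical person? Or do you want to figure out how unusual your patient might be?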

The usual epidemiological (scientific) approach
1. Identify a problem: clinical suspicion; case series; review of the medical literature. 2. Formulate a hypothesis (asking the right question); good hypotheses are specific, measurable and plausible. 3. Test the hypothesis (assumptions vs. type of data). 4. Re-examine: always question the VALIDITY of the result(s): chance, bias and causality.

Accuracy of conclusions
Chance: the role of random error in the outcome measure(s) (p-value; power of the study and the confidence interval), largely determined by sample size. Bias: the role of systematic error in the outcome measure(s). Selection bias: subjects not representative. Information bias: error(s) in subject data or classification. Confounding: a third variable (causally) associated with both X and Y.
