Game optimization is about more than frame rate.

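What will we talk about?
When and where is optimization needed?
C compared with C++
Performance problems in C++
Algorithms first
Do we need the advanced features of C++?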

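Optimization is everywhere
The best optimizer is your brain, not the compiler
Measure, don't guess
Windows games should not get special privileges
A Windows game is first of all a Windows program
Every bit of resource saved is meaningful
The expert's choice is not to optimize
The Popo 龙珠 example: the hotspot was in calls to the Python interface, not in image display as first assumed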

FPS?
What does a gain of 10 fps mean? Compare 10 fps -> 20 fps with 100 fps -> 120 fps
Peak frame rate and average frame rate
Loading time
Keep an eye on CPU usage
Get to know Windows a little better
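The question about what 10 fps means becomes concrete once frame rates are converted to time per frame. The helpers below are a small illustrative sketch (the function names are not from the talk): going from 10 fps to 20 fps saves 50 ms per frame, while going from 100 fps to 120 fps saves less than 2 ms.

```cpp
// Time spent on one frame, in milliseconds, at a given frame rate.
constexpr double frame_time_ms(double fps)
{
    return 1000.0 / fps;
}

// Milliseconds saved per frame when the rate rises from from_fps to to_fps.
constexpr double saved_ms(double from_fps, double to_fps)
{
    return frame_time_ms(from_fps) - frame_time_ms(to_fps);
}
```

For example, saved_ms(10, 20) is 50 ms, whereas saved_ms(100, 120) is roughly 1.67 ms, so the same "+10 fps or more" can describe very different amounts of work removed.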

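Timers
timeGetTime
QueryPerformanceCounter
RDTSC
Precision
Effects of a multitasking environment
The pipeline
The uncertainty principle
RDTSC is less precise than you might imagine, and it is affected by the pipeline (call cpuid first to serialize)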
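One common way to reduce the noise a multitasking system adds to a measurement is to repeat the run and keep the shortest time. The sketch below assumes std::chrono::steady_clock as a portable stand-in for the Windows timers named on the slide; min_elapsed_ns is a hypothetical helper, not something from the talk.

```cpp
#include <chrono>
#include <cstdint>

// Runs `work` `runs` times and returns the shortest wall-clock duration in
// nanoseconds, or -1 if runs <= 0. Taking the minimum discards runs that were
// interrupted by other processes.
template <typename F>
std::int64_t min_elapsed_ns(F work, int runs)
{
    using clock = std::chrono::steady_clock;
    std::int64_t best = -1;
    for (int i = 0; i < runs; ++i) {
        const auto start = clock::now();
        work();
        const auto stop = clock::now();
        const std::int64_t ns =
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
        if (best < 0 || ns < best)
            best = ns;
    }
    return best;
}
```

The measured code should do enough work to be well above the resolution of the clock; timing a handful of instructions this way mostly measures the timer itself.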

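Micro and macro
Micro-level optimization
  CPU instructions and the pipeline
  Throughput and latency
  Limited hardware optimization: the hardware optimizes away some operations while keeping the logic correct
  Limited compiler optimization
Macro-level optimization
  Improving algorithms and code structure
  Reducing the amount of data to process, and how often and how many times it is processed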

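C versus C++ at the micro level
Is C 10% faster than C++?
C++ compilers have improved
Don't put blind faith in books
Where is the evidence?
More sensible parameter passing
Inline
The stack and function calls
Use of static variables
thiscall, stdcall, and function return (add esp,xxx)
The origin of the stack
CPU instructions are optimized for mainstream languages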

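C++ offers stronger language features
new/delete versus malloc/free
C++ exceptions versus setjmp/longjmp
Virtual functions versus arrays of function pointers
Templates versus macros
The standard library, function overloading, and so on
These make it easier for developers to write higher-quality code
C++ mostly trades space for time
In almost every case its time efficiency beats that of C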

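Advantages of C
Simple
More portable
Cleaner interfaces
Fewer ambiguities
Low CRT overhead
Fast compilation
C object code is generally smaller than C++ object code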

C++ demands that you know more

STL
The best-loved containers: std::map, std::string, std::vector, std::list
Most of the time they are not used correctly

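std::map
Insertion is slow: O(log(N))
There is extra memory cost (three pointers plus a colour); in VC 7.1 each node costs 14 bytes
Most of the time, all we need is lookup
Array + binary search
A hash map usually improves efficiency, but not always
There are further optimization techniques
Lua's implementation of its map
The implementation in 大话西游
Reference: "Effective C++"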
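The "array + binary search" alternative can be sketched with a sorted std::vector and std::lower_bound. This is a minimal illustration for a table that is filled once and then only queried; Entry, sort_by_key and lookup are hypothetical names, and the int-to-string mapping is an assumption chosen for the example.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<int, std::string>;

// Sort once, after the table has been filled.
inline void sort_by_key(std::vector<Entry> &table)
{
    std::sort(table.begin(), table.end(),
              [](const Entry &a, const Entry &b) { return a.first < b.first; });
}

// Returns a pointer to the value stored under `key`, or nullptr if absent.
// Lookup is O(log N) over contiguous memory, with no per-node pointers.
inline const std::string *lookup(const std::vector<Entry> &table, int key)
{
    auto it = std::lower_bound(table.begin(), table.end(), key,
                               [](const Entry &e, int k) { return e.first < k; });
    return (it != table.end() && it->first == key) ? &it->second : nullptr;
}
```

The trade-off is that inserting into the middle of a sorted vector is O(N), so this layout suits data that changes rarely and is looked up often.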

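std::string
There is also a kind of string called const char *
const std::string &
Don't rely on COW (copy-on-write)
Consider multithreaded environments
With a good design, COW is usually unnecessary
How does Lua handle strings?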
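The two parameter types named on the slide can be combined so that neither kind of caller pays for a copy. The following is only a sketch; count_char is a hypothetical function invented for the example.

```cpp
#include <cstddef>
#include <string>

// Takes a C string: a caller holding a literal constructs no std::string.
inline std::size_t count_char(const char *s, char c)
{
    std::size_t n = 0;
    for (; *s != '\0'; ++s)
        if (*s == c)
            ++n;
    return n;
}

// Takes a const reference: a caller holding a std::string makes no copy.
// Note: forwarding through c_str() stops at the first embedded '\0'.
inline std::size_t count_char(const std::string &s, char c)
{
    return count_char(s.c_str(), c);
}
```

Passing std::string by value instead would copy the buffer on every call unless the implementation uses copy-on-write, which is exactly the dependency the slide warns against.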

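std::vector
std::vector is not merely an array
We usually use a vector as an array
vector::push_back() often triggers memory reallocation
vector::reserve()
vector::clear() does not necessarily release memory
Optimization for POD types
Remember: arrays have been supported ever since C
In VC, 100 calls to push_back cause 13 memory reallocations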
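The reallocation count can be observed by watching capacity() change during a run of push_back calls. The helper below is an illustrative sketch (count_reallocations is a hypothetical name); the exact number without reserve() depends on the growth policy of the standard library in use.

```cpp
#include <cstddef>
#include <vector>

// Counts how many times the capacity changes while appending n ints.
inline int count_reallocations(std::size_t n, bool reserve_first)
{
    std::vector<int> v;
    if (reserve_first)
        v.reserve(n);                  // one up-front allocation
    int reallocations = 0;
    std::size_t cap = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != cap) {     // capacity changed: the buffer was reallocated
            ++reallocations;
            cap = v.capacity();
        }
    }
    return reallocations;
}
```

With reserve_first set to true the function returns 0 for any n, because reserve() guarantees that no reallocation happens until the size exceeds the reserved capacity.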

std::list
std::list is a doubly linked list
std::list carries extra memory overhead
A linked list can insert in constant time, but when N is small the advantage is not obvious.

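Using STL correctly
STL is a powerful tool that C++ provides
Every use of STL has a cost
STL cannot solve all of our problems
Getting the best performance out of code takes our brains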

Reinventing the wheel?
Don't reimplement STL just because you can
Almost every MyVector, MyString, and MyMap is worse than std::vector, std::string, and std::map
Learn more about STL
Learn more about C++

Using the CRT
sprintf(s,"%d",n); why not use itoa?
sprintf(s,""); why not use s[0]='\0'; ?
printf versus puts
Don't overlook the cost of the CRT

Rewriting the CRT?
Optimizing memcpy: an MMX version, an SSE version…
  Optimizing any copy of less than 64k of data is pointless
Rewriting the string library: MyStrlen, MyStrcmp…
  The CRT can do better
  Intrinsic functions: #pragma intrinsic()

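Memory optimization
Unroll loops and eliminate data dependencies
Process data in parallel
Shrink data structures and keep data packed closely together
Align data
Understand how memory works
Anything further is beyond the scope of this talk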
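"Unroll loops and eliminate data dependencies" can be illustrated with a sum that uses several independent accumulators, so that consecutive additions do not each wait for the previous result. This is a sketch under the assumption of unsigned 32-bit data, where wrap-around makes the result identical to a plain loop; sum_unrolled is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>

// Sums n values with four independent accumulators.
inline std::uint32_t sum_unrolled(const std::uint32_t *data, std::size_t n)
{
    std::uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {       // four elements per iteration
        s0 += data[i];
        s1 += data[i + 1];
        s2 += data[i + 2];
        s3 += data[i + 3];
    }
    for (; i < n; ++i)                 // remaining 0 to 3 elements
        s0 += data[i];
    return s0 + s1 + s2 + s3;
}
```

Modern compilers often perform this transformation themselves at higher optimization levels, so a hand-unrolled version is worth keeping only if profiling shows a gain.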

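Memory management optimization
C++ provides more flexible memory management mechanisms
new/delete is not necessarily the best way (STL, for one, does not use it)
Custom memory allocators
Easier debugging
Allocation speed and memory fragmentation are equally important
Watch out for problems across separate modules; DLLs are where mistakes are most likely
Multiple allocation strategies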
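One of the simplest custom allocators is a pool of fixed-size blocks carved out of a single allocation. The class below is a minimal sketch, not the allocator used by the author; BlockPool and its members are hypothetical names. It assumes block_size is greater than zero, is not thread-safe, and relies on the vector's buffer being aligned for std::max_align_t, which holds on common implementations.

```cpp
#include <cstddef>
#include <vector>

// Fixed-size block pool: one up-front allocation, O(1) allocate and release,
// and no fragmentation inside the pool.
class BlockPool {
public:
    BlockPool(std::size_t block_size, std::size_t block_count)
        : block_size_(round_up(block_size)),
          storage_(block_size_ * block_count)
    {
        free_.reserve(block_count);
        for (std::size_t i = 0; i < block_count; ++i)
            free_.push_back(storage_.data() + i * block_size_);
    }

    BlockPool(const BlockPool &) = delete;             // free_ points into storage_
    BlockPool &operator=(const BlockPool &) = delete;

    // Returns a block, or nullptr when the pool is exhausted.
    void *allocate()
    {
        if (free_.empty())
            return nullptr;
        void *p = free_.back();
        free_.pop_back();
        return p;
    }

    // Gives a block obtained from allocate() back to the pool.
    void release(void *p)
    {
        free_.push_back(static_cast<unsigned char *>(p));
    }

    std::size_t available() const { return free_.size(); }

private:
    // Rounds the block size up so every block starts on an aligned address.
    static std::size_t round_up(std::size_t n)
    {
        const std::size_t a = alignof(std::max_align_t);
        return (n + a - 1) / a * a;
    }

    std::size_t block_size_;
    std::vector<unsigned char> storage_;
    std::vector<unsigned char *> free_;
};
```

Because the pool owns all of its memory, blocks handed out by one module are never freed by another module's heap, which sidesteps the DLL boundary problem the slide mentions as long as the pool itself lives in one module.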

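Algorithms
C++ is better suited to implementing more complex game engines
As an engine grows more complex and gains layers, efficiency drops
The higher complexity is there to serve macro-level optimization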

Dirty rectangles

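Problems
Merging dirty rectangles is not a simple algorithm
The merged dirty region is not a rectangle, which makes image clipping awkward
Many objects move around on the screen
Scrolling the screen
Complexity of the graphics engine design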

Improved dirty rectangles
Grid-based processing
A rendering pipeline
Drawing operations as objects
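Grid-based processing can be read as follows: instead of merging arbitrary rectangles, split the screen into fixed-size cells and flag every cell that a changed rectangle touches. The sketch below is one possible interpretation, not the engine described in the talk; Rect, DirtyGrid and the cell size are assumptions made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Rect { int x, y, w, h; };   // pixels; w and h are assumed positive

class DirtyGrid {
public:
    DirtyGrid(int width, int height, int cell)
        : width_(width), height_(height), cell_(cell),
          cols_((width + cell - 1) / cell),
          rows_((height + cell - 1) / cell),
          dirty_(static_cast<std::size_t>(cols_) * static_cast<std::size_t>(rows_), 0) {}

    // Flags every cell the rectangle touches; off-screen parts are clipped.
    void mark(const Rect &r)
    {
        const int left   = r.x < 0 ? 0 : r.x;
        const int top    = r.y < 0 ? 0 : r.y;
        const int right  = (r.x + r.w > width_  ? width_  : r.x + r.w) - 1;  // inclusive
        const int bottom = (r.y + r.h > height_ ? height_ : r.y + r.h) - 1;  // inclusive
        if (left > right || top > bottom)
            return;                                    // nothing visible
        for (int cy = top / cell_; cy <= bottom / cell_; ++cy)
            for (int cx = left / cell_; cx <= right / cell_; ++cx)
                dirty_[index(cx, cy)] = 1;
    }

    bool is_dirty(int cx, int cy) const { return dirty_[index(cx, cy)] != 0; }

    std::size_t dirty_count() const
    {
        return static_cast<std::size_t>(std::count(dirty_.begin(), dirty_.end(), 1));
    }

    // Call after the frame has been redrawn.
    void clear() { std::fill(dirty_.begin(), dirty_.end(), 0); }

private:
    std::size_t index(int cx, int cy) const
    {
        return static_cast<std::size_t>(cy) * static_cast<std::size_t>(cols_)
             + static_cast<std::size_t>(cx);
    }

    int width_, height_, cell_, cols_, rows_;
    std::vector<unsigned char> dirty_;
};
```

Marking is a few integer divisions per rectangle and never requires merging, and every dirty cell is itself a rectangle, which keeps clipping simple. The cost is that a small change redraws a whole cell.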

Scrolling optimization
A larger back buffer
Fragmented grid cells

Covering (overlap) optimization

Advanced C++ features: angel or devil?

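Template: avoiding duplicated code
Version with a run-time flag:
void _blit(pixel *dst,const pixel *src,size_t s,bool mask_blit)
{
    for (size_t i=s;i!=0;--i,++dst,++src) {
        // copy every pixel, or skip the mask colour when mask_blit is set
        if (!mask_blit || *src!=mask_color)
            *dst=*src;
    }
}
Template version: the flag becomes a compile-time parameter and the body stays the same:
template <bool mask_blit>
void _blit(pixel *dst,const pixel *src,size_t s)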

void blit(pixel *dst,const pixel *src,size_t s)
{
    _blit<false>(dst,src,s);   // plain copy
}
void mask_blit(pixel *dst,const pixel *src,size_t s)
{
    _blit<true>(dst,src,s);    // copy that skips mask_color
}
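A self-contained version of the blit example might look like the sketch below. The pixel type and the value of mask_color are assumptions (16-bit pixels with magenta as the colour key), and the helper is renamed blit_impl because names beginning with an underscore are reserved at global scope.

```cpp
#include <cstddef>
#include <cstdint>

using pixel = std::uint16_t;         // assumption: 16-bit pixels
const pixel mask_color = 0xF81F;     // assumption: magenta in RGB565 as the colour key

// MaskBlit is a compile-time constant, so for blit_impl<false> the compiler
// can drop the comparison from the loop entirely.
template <bool MaskBlit>
inline void blit_impl(pixel *dst, const pixel *src, std::size_t n)
{
    for (; n != 0; --n, ++dst, ++src) {
        if (!MaskBlit || *src != mask_color)
            *dst = *src;
    }
}

// Plain copy of n pixels.
inline void blit(pixel *dst, const pixel *src, std::size_t n)
{
    blit_impl<false>(dst, src, n);
}

// Copy of n pixels that leaves dst untouched wherever src holds mask_color.
inline void mask_blit(pixel *dst, const pixel *src, std::size_t n)
{
    blit_impl<true>(dst, src, n);
}
```

Both entry points share one loop in the source, yet each instantiation compiles to code specialised for its case, which is the duplication the slide aims to avoid.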

Matrix operations
Matrix A,B,C;
A=B+C;
Matrix operator+(const Matrix &lhs,const Matrix &rhs);
Matrix & Matrix::operator=(const Matrix &v);
How can we avoid returning a temporary object?
Rewrite it as A=B; A+=C;

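Expression Templates
template<typename T>
class add_type {
    const T& _lhs;
    const T& _rhs;
public:
    // Only remembers the operands; nothing is computed yet.
    add_type(const T &lhs,const T &rhs) :_lhs(lhs),_rhs(rhs) {}
    // Evaluates lhs + rhs directly into result.
    // Caution: if result is the same object as _rhs, the first assignment
    // overwrites _rhs before it is added.
    T& calculate(T &result) const
    {
        result=_lhs;
        result+=_rhs;
        return result;
    }
};
// Caution: as written, this operator+ matches any pair of operands of the same type T.
template <typename T>
add_type<T> operator+(const T &a,const T &b)
{
    return add_type<T>(a,b);
}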

class Matrix {
    /* ... */
    Matrix& operator+=(const Matrix &v);
    // A=B+C resolves to this overload, which evaluates the sum into *this
    Matrix& operator=(const add_type<Matrix> &v) { return v.calculate(*this); }
};
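To make the effect observable, the sketch below counts Matrix copy constructions while evaluating A = B + C. It is a simplified variant of the technique, not the slide's exact code: MatrixSum is a non-template stand-in for add_type<Matrix>, the 2x2 layout and the copies counter are assumptions for the demonstration, and the sum is evaluated element by element so that A = A + B is also safe.

```cpp
struct Matrix;

// Records the operands of B + C without computing anything.
struct MatrixSum {
    const Matrix &lhs;
    const Matrix &rhs;
};

struct Matrix {
    static int copies;              // copy constructions, for the demonstration only
    double m[4];

    Matrix() : m{0, 0, 0, 0} {}
    Matrix(double a, double b, double c, double d) : m{a, b, c, d} {}
    Matrix(const Matrix &o)
    {
        for (int i = 0; i < 4; ++i) m[i] = o.m[i];
        ++copies;
    }
    Matrix &operator=(const Matrix &) = default;

    Matrix &operator+=(const Matrix &o)
    {
        for (int i = 0; i < 4; ++i) m[i] += o.m[i];
        return *this;
    }

    // A = B + C lands here: the sum is written straight into *this.
    Matrix &operator=(const MatrixSum &s)
    {
        for (int i = 0; i < 4; ++i) m[i] = s.lhs.m[i] + s.rhs.m[i];
        return *this;
    }
};

int Matrix::copies = 0;

// Builds the lightweight expression object instead of a temporary Matrix.
inline MatrixSum operator+(const Matrix &a, const Matrix &b)
{
    return MatrixSum{a, b};
}
```

After A = B + C, Matrix::copies is still 0: the only object created by the expression is a MatrixSum holding two references. The expression object must not outlive its operands, so it should be consumed in the same statement that creates it.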

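Compile-time computation
template<int N>
class factorial {
public:
    enum { value = N * factorial<N-1>::value };
};
// The specialization ends the recursion; value must be public to be readable
template<>
class factorial<1> {
public:
    enum { value = 1 };
};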
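In C++11 and later, the same compile-time result can be obtained with a constexpr function. This is an alternative technique that the slide does not use, shown only for comparison; factorial_of is a hypothetical name.

```cpp
// An ordinary function that the compiler may evaluate at compile time.
constexpr unsigned long long factorial_of(unsigned n)
{
    return n <= 1 ? 1ULL : n * factorial_of(n - 1);
}

// static_assert forces evaluation during compilation.
static_assert(factorial_of(5) == 120, "5! is computed by the compiler");
static_assert(factorial_of(0) == 1, "base case");
```

Unlike the class template, the function can also be called with values known only at run time.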

Bubble sort
// Swaps a and b if they are out of order
inline void compare_swap(int &a,int &b)
{
    if (a>b) {
        int t=a;
        a=b;
        b=t;
    }
}
void sort(int *data,int n)
{
    for (int i=0;i<n-1;i++) {
        for (int j=i+1;j<n;j++) {
            compare_swap(data[i],data[j]);
        }
    }
}

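// Inner loop unrolled at compile time: compares *data with data[N] down to data[1]
template<int N>
struct inner_loop {
    static inline void expand(int* data) {
        compare_swap(*data, data[N]);
        inner_loop<N-1>::expand(data);
    }
};
template<>
struct inner_loop<0> {
    static inline void expand(int*) {}
};
// Outer loop unrolled at compile time: one inner pass, then advance to the next element
template<int N>
struct sort {
    static inline void expand(int* data) {
        inner_loop<N-1>::expand(data);
        sort<N-1>::expand(++data);
    }
};
template<>
struct sort<1> {
    static inline void expand(int*) {}
};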

#include <stdio.h>

int main()
{
    int a[]={3,2,1};
    const int len=sizeof(a)/sizeof(a[0]);
    sort<len>::expand(a);
    for (int i=0;i<len;i++) {
        printf("%d,",a[i]);
    }
    return 0;
}

// Code after sort<len>::expand(a) is expanded (len is 3)
compare_swap(*data,data[2]);
compare_swap(*data,data[1]);
++data;
compare_swap(*data,data[1]);

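Compute as much as possible at compile time?
There are many more template techniques
"Modern C++ Design: Generic Programming and Design Patterns Applied"
Consequences of abusing templates
Slower compilation
Tighter coupling between modules
Higher demands on teammates
Harder debugging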

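Compilation efficiency matters too
Use dynamic link libraries wherever possible, and use them correctly
Where allowed, write part of the code in .c
Reduce .h dependencies
Precompiled headers are the root of all evil
Simple things are more beautiful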
