Optimizing existing large codebase
S. Ponce - CERN

Practical consequences in C++

Guidelines
- we want as few heap memory allocations as possible
  - stack usage is much better!
- we want contiguous memory blocks, especially for containers
  - that means containers of objects, with no pointers involved
  - e.g. vector<Obj*> or array<vector<Obj>> are banned!

2 main rules
- use containers of objects, not of pointers
  - use (const) references everywhere
- avoid any unnecessary copy of data
  - including implicit ones
  - use container reservation
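The two rules can be made measurable with a minimal sketch that counts copies. It assumes a hypothetical Obj type instrumented with a global copy counter; Obj, g_objCopies, makeObjs, sumByValue, sumByRef, sumImplicitCopy and copiesDuring are illustrative names, not code from the talk. It compares passing a container by value vs by const reference, an implicit copy hidden in a range-for loop, and vector growth with and without reserve.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Global counter incremented by every copy of an Obj (illustrative only).
std::size_t g_objCopies = 0;

// Hypothetical object type instrumented to count its copies. Declaring the
// copy operations suppresses the implicit move constructor, so every
// relocation of an Obj inside a vector shows up as a copy.
struct Obj {
    float x = 0.f, y = 0.f, z = 0.f;
    Obj() = default;
    Obj(float a, float b, float c) : x(a), y(b), z(c) {}
    Obj(const Obj& o) : x(o.x), y(o.y), z(o.z) { ++g_objCopies; }
    Obj& operator=(const Obj& o) {
        x = o.x; y = o.y; z = o.z;
        ++g_objCopies;
        return *this;
    }
};

// Container of objects, optionally reserved up front.
std::vector<Obj> makeObjs(std::size_t n, bool reserveFirst) {
    std::vector<Obj> v;
    if (reserveFirst) v.reserve(n);  // one allocation for all n objects
    for (std::size_t i = 0; i < n; ++i)
        v.emplace_back(static_cast<float>(i), 0.f, 0.f);  // built in place
    return v;  // moving a vector hands over its buffer, elements are not copied
}

// Passing by value copies the whole container, element by element.
float sumByValue(std::vector<Obj> v) {
    float s = 0.f;
    for (const Obj& o : v) s += o.x;
    return s;
}

// Passing by const reference copies nothing.
float sumByRef(const std::vector<Obj>& v) {
    float s = 0.f;
    for (const Obj& o : v) s += o.x;
    return s;
}

// Implicit copy: 'auto' without '&' copies every element of the loop.
float sumImplicitCopy(const std::vector<Obj>& v) {
    float s = 0.f;
    for (auto o : v) s += o.x;
    return s;
}

// Runs f and returns how many Obj copies it performed.
template <class F>
std::size_t copiesDuring(F f) {
    g_objCopies = 0;
    f();
    return g_objCopies;
}
```

With reserve, filling the vector performs no copy at all; without it, each reallocation copies the elements already stored. Passing by value or looping with plain `auto` silently copies all 100 objects.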
Container of objects in memory

Simple vector case
std::vector<int> v;
Layout in memory: x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 ...

Vector of objects
struct A { float x, y, z; };
std::vector<A> v;
Layout in memory: x0 y0 z0 (A0) x1 y1 z1 (A1) x2 y2 z2 (A2) ...
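The layout of a vector of objects can be checked directly: std::vector guarantees contiguous storage, so element i starts exactly i * sizeof(A) bytes after data(). The sketch below assumes struct A has no padding (three floats, 12 bytes), which holds on mainstream compilers but is not required by the standard; byteOffset and printLayout are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Same object as on the slide: three floats.
struct A { float x, y, z; };

// Byte offset of element i from the start of the vector's buffer.
// std::vector stores its elements contiguously, so this is i * sizeof(A).
std::size_t byteOffset(const std::vector<A>& v, std::size_t i) {
    const char* base = reinterpret_cast<const char*>(v.data());
    const char* elem = reinterpret_cast<const char*>(&v[i]);
    return static_cast<std::size_t>(elem - base);
}

// Prints where each float lives relative to the buffer start:
// x0 y0 z0 x1 y1 z1 ... follow each other with no gap between objects.
void printLayout(const std::vector<A>& v) {
    const char* base = reinterpret_cast<const char*>(v.data());
    for (std::size_t i = 0; i < v.size(); ++i) {
        std::printf("A%zu: x@%td y@%td z@%td\n", i,
                    reinterpret_cast<const char*>(&v[i].x) - base,
                    reinterpret_cast<const char*>(&v[i].y) - base,
                    reinterpret_cast<const char*>(&v[i].z) - base);
    }
}
```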
Container of pointers in memory

Naïve view
struct A { float x, y, z; };
std::vector<A*> v;
Layout in memory: ptr0 ptr1 ptr2 ptr3 ptr4 ptr5 ptr6 ptr7 ptr8 ptr9 ...

Realistic view
Layout in memory: ptr0 ... ptr9 are contiguous, but the objects they point to, (x0 y0 z0), (x1 y1 z1), ..., (x9 y9 z9), are scattered around the heap
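The two-level structure of a container of pointers can be sketched as follows. Here std::unique_ptr<A> stands in for the raw A* (to avoid leaks) while keeping the same shape: a contiguous array of pointers plus one separate heap allocation per object, i.e. n + 1 allocations against a single one for a reserved vector<A>. makePointerContainer, makeObjectContainer, sumX and pointerSlotsContiguous are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct A { float x, y, z; };

// Container of pointers: one allocation for the pointer array,
// plus one allocation per object made by std::make_unique.
std::vector<std::unique_ptr<A>> makePointerContainer(std::size_t n) {
    std::vector<std::unique_ptr<A>> v;
    v.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        v.push_back(std::make_unique<A>(A{static_cast<float>(i), 0.f, 0.f}));
    return v;
}

// Container of objects: a single buffer holding all the A values.
std::vector<A> makeObjectContainer(std::size_t n) {
    std::vector<A> v;
    v.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        v.push_back(A{static_cast<float>(i), 0.f, 0.f});
    return v;
}

// Two loads per element: the pointer, then the object it points to.
float sumX(const std::vector<std::unique_ptr<A>>& v) {
    float s = 0.f;
    for (const auto& p : v) s += p->x;
    return s;
}

// One load per element, read straight from the buffer.
float sumX(const std::vector<A>& v) {
    float s = 0.f;
    for (const A& a : v) s += a.x;
    return s;
}

// Only the pointer slots are guaranteed to be adjacent; where the objects
// themselves end up is up to the allocator.
bool pointerSlotsContiguous(const std::vector<std::unique_ptr<A>>& v) {
    for (std::size_t i = 0; i + 1 < v.size(); ++i)
        if (&v[i + 1] != &v[i] + 1) return false;
    return true;
}
```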
Container of objects in cache

Memory view for vector<A>
Each line corresponds to a cache line (64 bytes, 16 floats)
Layout in cache lines 0x0000, 0x0040, 0x0080, 0x00C0: x0 y0 z0 x1 y1 z1 x2 y2 z2 ... x9 y9 z9 ... packed back to back
All data are nicely collocated in cache
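The cache picture for a vector of objects follows from simple arithmetic. Assuming, as on the slide, 64-byte cache lines, and 12-byte objects with no padding, 10 elements take 120 bytes: at least ceil(120 / 64) = 2 cache lines, and 3 if the buffer does not begin on a line boundary. The sketch below counts the lines actually touched; kCacheLine, cacheLineOf, minCacheLines and cacheLinesTouched are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Assumption taken from the slide: a cache line is 64 bytes (16 floats).
constexpr std::size_t kCacheLine = 64;

struct A { float x, y, z; };

// Index of the cache line containing the given address.
std::uintptr_t cacheLineOf(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) / kCacheLine;
}

// Lower bound: number of lines needed to hold n contiguous objects.
std::size_t minCacheLines(std::size_t n) {
    return (n * sizeof(A) + kCacheLine - 1) / kCacheLine;
}

// Number of distinct cache lines holding the floats of a vector<A>.
std::size_t cacheLinesTouched(const std::vector<A>& v) {
    std::set<std::uintptr_t> lines;
    for (const A& a : v) {
        // A 4-byte aligned float never straddles a 64-byte boundary,
        // so the start address of each field identifies its line.
        lines.insert(cacheLineOf(&a.x));
        lines.insert(cacheLineOf(&a.y));
        lines.insert(cacheLineOf(&a.z));
    }
    return lines.size();
}
```

A contiguous buffer therefore always touches either the minimum number of lines or one more, whatever its starting address.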
Container of pointers in cache

Memory view for vector<A*>
Each line corresponds to a cache line (64 bytes, 16 floats)
Layout in cache lines 0x0000 to 0x0240: the pointers p0 p1 ... p9 sit together at 0x0000, while the objects (x0 y0 z0) ... (x9 y9 z9) are spread over the following cache lines, up to 0x0240
Cache nightmare: data is completely sparse
Note from Andrzej: this is already optimistic
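To observe the difference instead of just drawing it, a rough benchmark sketch can sum the same data held both ways. Assumptions: std::unique_ptr stands in for the raw pointers of vector<A*>; the pointer order is shuffled to imitate a fragmented heap, since objects allocated one after another on a fresh heap often end up close together; makeScattered, sumObjects, sumPointers, millis and compareLayouts are hypothetical helpers. Timings depend entirely on the machine, the compiler flags and n, so no particular ratio is claimed here.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <memory>
#include <random>
#include <vector>

struct A { float x, y, z; };

// Sink that keeps the optimizer from discarding the sums being timed.
volatile double g_sink = 0.0;

// n separately allocated objects, visited in a shuffled order. The shuffle
// is an assumption standing in for a heap fragmented by a long-running job.
std::vector<std::unique_ptr<A>> makeScattered(std::size_t n, unsigned seed) {
    std::vector<std::unique_ptr<A>> v;
    v.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        v.push_back(std::make_unique<A>(A{1.f, 2.f, 3.f}));
    std::shuffle(v.begin(), v.end(), std::mt19937{seed});
    return v;
}

// Sequential walk over one contiguous buffer.
double sumObjects(const std::vector<A>& v) {
    double s = 0.0;
    for (const A& a : v) s += a.x + a.y + a.z;
    return s;
}

// One pointer chase per element, to wherever the object lives.
double sumPointers(const std::vector<std::unique_ptr<A>>& v) {
    double s = 0.0;
    for (const auto& p : v) s += p->x + p->y + p->z;
    return s;
}

// Wall-clock duration of f() in milliseconds.
template <class F>
double millis(F f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Prints one timing per layout; numbers vary with machine, flags and n.
void compareLayouts(std::size_t n) {
    const std::vector<A> objects(n, A{1.f, 2.f, 3.f});
    const auto pointers = makeScattered(n, 42u);
    const double tObj = millis([&] { g_sink = sumObjects(objects); });
    const double tPtr = millis([&] { g_sink = sumPointers(pointers); });
    std::printf("n = %zu: vector<A> %.3f ms, vector<unique_ptr<A>> %.3f ms\n",
                n, tObj, tPtr);
}
```

Both functions compute the same value; only the memory access pattern differs, which is exactly what the two cache diagrams contrast.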