Low-power design: Software Design
• Compile
– Instruction selection and reordering, and loop optimization (merging, unrolling, etc.), to reduce switching activity
– Register allocation to minimize external memory references
• Coding
– Rewrite critical blocks (e.g. inner loops) in assembly if needed
– Change the representation of data
• Algorithm
– Select the best of the candidate algorithms
– Reduce "hot" operations
  • e.g. external memory reads/writes, some coprocessor operations
• OS
– DPM (Dynamic Power Management): automatic transition to low-power states
  • e.g. Suspend, Sleep, Idle
– DVFS (Dynamic Voltage and Frequency Scaling)
llxx@ustc.edu.cn 21/62
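As a rough illustration of why loop merging cuts external memory references, the sketch below counts reads against a stand-in memory object. `CountingMemory`, `separate_loops`, and `merged_loop` are hypothetical names introduced only for this example; a real compiler performs the transformation on machine code, not on Python.

```python
class CountingMemory:
    """Hypothetical stand-in for external memory that counts every read."""

    def __init__(self, data):
        self._data = list(data)
        self.reads = 0

    def __len__(self):
        return len(self._data)

    def read(self, i):
        self.reads += 1
        return self._data[i]


def separate_loops(mem):
    """Two passes: each element is fetched from memory twice."""
    total = 0
    for i in range(len(mem)):
        total += mem.read(i)
    peak = mem.read(0)
    for i in range(1, len(mem)):
        peak = max(peak, mem.read(i))
    return total, peak


def merged_loop(mem):
    """One merged pass: each element is fetched once and reused."""
    first = mem.read(0)
    total, peak = first, first
    for i in range(1, len(mem)):
        x = mem.read(i)  # held in a local (a register, in compiled code)
        total += x
        peak = max(peak, x)
    return total, peak


samples = [3, 9, 4, 7]
mem_separate = CountingMemory(samples)
mem_merged = CountingMemory(samples)
result_separate = separate_loops(mem_separate)
result_merged = merged_loop(mem_merged)
```

Both versions return the same result, but the merged loop issues half as many reads on this input.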
Code optimization
$P = P_{dyn} + P_{sc} + P_{lk} = 0.5\,C_L V_{DD}^2 A f + I_{sc} V_{DD} A + I_{lk} V_{DD}$
• Goal
– Reduce the Hamming distance between consecutive instructions
• Instruction reordering
– Eliminate pipeline stalls caused by load delays, branch delays, jump delays, etc.
• Register reallocation

Table 1: Code table

Original code
| Instruction | Rd | Rn | Rm |
| ADD R0,R1,R2 | 0000 | 0001 | 0010 |
| SUB R3,R4,R5 | 0011 | 0100 | 0101 |
| MUL R6,R0,R3 | 0110 | 0000 | 0011 |
| Bit changes | 4 | 3 | 5 |

Optimized code
| Instruction | Rd | Rn | Rm |
| ADD R0,R6,R7 | 0000 | 0110 | 0111 |
| SUB R1,R2,R5 | 0001 | 0010 | 0101 |
| MUL R3,R0,R1 | 0011 | 0000 | 0001 |
| Bit changes | 2 | 2 | 2 |

Note: Rd is the destination register; Rn and Rm are the source operands.
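The bit-change counts in Table 1 can be reproduced with a short sketch that sums the Hamming distance of each register field over consecutive instructions. The helper names `hamming` and `field_bit_changes` are assumptions made for this example, and only the three 4-bit register fields from the table are modeled, not full instruction encodings.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def field_bit_changes(program):
    """Sum per-field Hamming distances over consecutive instructions.

    program is a list of (Rd, Rn, Rm) register numbers in issue order.
    Returns [Rd total, Rn total, Rm total].
    """
    totals = [0, 0, 0]
    for prev, curr in zip(program, program[1:]):
        for k in range(3):
            totals[k] += hamming(prev[k], curr[k])
    return totals


# Register numbers taken from Table 1.
original = [(0, 1, 2), (3, 4, 5), (6, 0, 3)]   # ADD, SUB, MUL before reallocation
optimized = [(0, 6, 7), (1, 2, 5), (3, 0, 1)]  # ADD, SUB, MUL after reallocation

original_changes = field_bit_changes(original)    # [4, 3, 5]
optimized_changes = field_bit_changes(optimized)  # [2, 2, 2]
```

Register reallocation halves the total number of toggled bits here, from 12 to 6, which reduces the switching-activity factor in the dynamic power term.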
Operation substitution
• X² + AX + B = (X + A) * X + B (one multiplication instead of two)
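A minimal sketch of the substitution: both functions below compute the same value, but the factored form needs one multiplication instead of two. The function names are illustrative only, and the operation counts in the comments refer to the arithmetic as written, not to any particular processor.

```python
def poly_direct(x, a, b):
    """x^2 + a*x + b as written: 2 multiplications, 2 additions."""
    return x * x + a * x + b


def poly_factored(x, a, b):
    """(x + a) * x + b: 1 multiplication, 2 additions."""
    return (x + a) * x + b


# Both forms agree on integer inputs.
same = all(
    poly_direct(x, a, b) == poly_factored(x, a, b)
    for x in range(-5, 6)
    for a in range(-3, 4)
    for b in range(-3, 4)
)
```

With floating-point inputs the two forms can differ in the last bits because the rounding steps occur in a different order.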
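Algorithm selection
• Problem: for a 1-byte variable v, count the 1 bits in its binary representation
– Algorithm 1: use division and remainder.
– Algorithm 2: use the AND (&) operation together with shifts.
– Algorithm 3: use the AND (&) operation, iterating only over the 1 bits of v.
– Algorithm 4: use branch operations.
– Algorithm 5: use a lookup table.

Table 4: Different algorithms solving the same problem
| Algorithm | Time complexity | Energy | CPU cycles | Instructions |
| Algorithm 1 | O(log₂ v) | 1762.208738 | 676 | 112 |
| Algorithm 2 | O(log …) | 1642.718447 | 626 | 101 |
| Algorithm 3 | O(…) | 1449.174757 | 552 | 98 |
| Algorithm 4 | O(1) | 899.345720 | 312 | 45 |
| Algorithm 5 | O(1) | 223.689320 | 83 | 18 |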
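The slide names the five algorithms but does not give their code, so the following is one plausible reading of each, not the implementations that were measured. All function names are assumptions, and algorithm 4 is interpreted here as a fixed sequence of per-bit tests.

```python
def count_div_mod(v):
    """Algorithm 1 (assumed form): repeated division and remainder."""
    count = 0
    while v:
        count += v % 2
        v //= 2
    return count


def count_shift_and(v):
    """Algorithm 2 (assumed form): test the low bit, then shift, 8 times."""
    count = 0
    for _ in range(8):
        count += v & 1
        v >>= 1
    return count


def count_clear_lowest(v):
    """Algorithm 3 (assumed form): v & (v - 1) clears the lowest set bit,
    so the loop runs once per 1 bit."""
    count = 0
    while v:
        v &= v - 1
        count += 1
    return count


def count_branches(v):
    """Algorithm 4 (assumed form): one branch per bit, no loop."""
    count = 0
    if v & 0x01:
        count += 1
    if v & 0x02:
        count += 1
    if v & 0x04:
        count += 1
    if v & 0x08:
        count += 1
    if v & 0x10:
        count += 1
    if v & 0x20:
        count += 1
    if v & 0x40:
        count += 1
    if v & 0x80:
        count += 1
    return count


# Algorithm 5: precomputed 256-entry table, one memory read per query.
BIT_COUNT_TABLE = [count_clear_lowest(i) for i in range(256)]


def count_lookup(v):
    """Algorithm 5 (assumed form): table lookup."""
    return BIT_COUNT_TABLE[v]


ALL_COUNTERS = [count_div_mod, count_shift_and, count_clear_lowest,
                count_branches, count_lookup]
```

All five return the same count for every byte value; they differ only in how many instructions and memory accesses they spend getting there, which is what the energy figures compare.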
Program-level optimization for multimedia: source code optimization in MP3 decode
[Figure: MP3 decoder block diagram: encoded bitstream → frame unpacking → reconstruction → inverse mapping → PCM audio samples]
• MP3 decode stages:
– Synchronizing the bitstream and decoder
– Huffman decoding
– Requantization
– Stereo processing
– IMDCT (Inverse MDCT)
– Polyphase synthesis filter bank
• The IMDCT function is the major energy consumer: apply symbolic optimization and rewrite critical loops

IMDCT function energy simulation results on Sim-Panalyzer
| Energy (pf) | Original | Optimized | Saving |
| ALU | 18,898 | 20,421.2654 | -8% |
| Multiplier | 57,194 | 41,133.1802 | 28% |
| Total | 197,547 | 91,197.2750 | 54% |
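The percentages in the energy table follow from (original − optimized) / original. The sketch below recomputes them from the table's own figures; the dictionary layout and the name `saving_percent` are choices made for this example.

```python
# (original, optimized) energy values copied from the IMDCT table.
energy = {
    "ALU": (18898.0, 20421.2654),
    "Multiplier": (57194.0, 41133.1802),
    "Total": (197547.0, 91197.2750),
}


def saving_percent(original, optimized):
    """Relative energy saving in percent; negative means energy went up."""
    return 100.0 * (original - optimized) / original


savings = {name: round(saving_percent(o, n)) for name, (o, n) in energy.items()}
```

The ALU figure is negative because the rewritten code does slightly more ALU work, while the multiplier saving and the overall total still improve.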