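February 14: Electronics Weekly has published a technical article comparing the 64-bit ARM v8 chip and IBM's Power8 chip presented at this year's ISSCC. An excerpt translation comes first, followed by the original article: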
California-based IC vendor Applied Micro presented a chip built on the 64-bit ARM v8 architecture at this year's ISSCC.
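The chip is aimed at servers. Its 84 million transistor Potenza processor module (PMD) has two ARM cores sharing a 256kbyte L2 cache. Each core of the PMD has a four-wide out-of-order superscalar micro-architecture.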
Fabricated on 10 metal layer 40nm CMOS, each PMD occupies 14.8mm² and runs at up to 3GHz, averaging 4.5W from 0.9V under representative workloads.
ISSCC paper 5.8, ‘A 3GHz 64b ARM v8 processor in 40nm bulk CMOS technology’
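In another server highlight, IBM revealed its Power8 processor, which has 12 eight-threaded cores and 96Mbyte of L3 cache, with a 2.5x performance improvement over the earlier Power7+.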
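Implemented in 22nm SOI eDRAM technology with 15 levels of metal, the processor has huge off-chip bandwidth, integrated voltage regulation and resonant clocking.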
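The distributed system of micro-regulators, implemented in the same 22nm eDRAM technology, can deliver 12.3A to the Power8 with 90.5% efficiency and a density of 36W/mm². Its multimode resonant clock covers a range from 2.5GHz to 5GHz, reduces clock grid power by 33%, and switches dynamically between high and low resonant frequency modes without idle cycles.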
ISSCC paper 5.1, ‘POWER8: A 12-core server-class processor in 22nm SOI with 7.6Tbit/s off-chip bandwidth.’ (Excerpt translated by Liu Guangming, 元器件交易网)
A highlight of this year’s ISSCC is an insight into ARM’s 64bit v8 architecture, courtesy of California’s Applied Micro.
Aimed at servers, its 84 million transistor Potenza processor module (PMD) has two ARM cores sharing a 256kbyte L2 cache.
Each core of the PMD has a four-wide out-of-order superscalar micro-architecture.
“Execution units are crafted with pipeline designs for concurrent handling of one load, one store, two integer, as well as multiple ASIMD/floating-point operations,” said the firm. “Micro-architectural elements include branch predication, separate L1 instruction and data-caches, L1 and L2 data pre-fetch, and hardware table walk.”
The initial server configuration has four PMDs on the chip sharing 8Mbyte of L3 cache and four DRAM channels via a central switch.
Fabricated on 10 metal layer 40nm bulk CMOS, each PMD occupies 14.8mm² and runs at up to 3GHz, averaging 4.5W consumption from 0.9V “under representative workloads”, said the paper.
In detail, the ISSCC presentation deals with memory building blocks – largely a 2kbyte (0.374μm² 6T SRAM) cell, power distribution and on-chip clock distribution.
ISSCC paper 5.8, ‘A 3GHz 64b ARM v8 processor in 40nm bulk CMOS technology’
In another highlight, also aiming at servers, IBM is revealing its Power8 processor which has 12 eight-threaded cores with 96Mbyte of L3 achieving 2.5x performance improvement over its earlier Power7+.
Implemented in 22nm SOI eDRAM technology with 15 levels of metal, the processor has huge off-chip bandwidth, integrated voltage regulation and resonant clocking.
So novel are the last two features that the IEEE has allocated them their own separate papers at ISSCC 2014.
The distributed system of micro-regulators, implemented in the same 22nm eDRAM technology, can deliver 12.3A to the Power8 with 90.5% efficiency and a density of 36W/mm². Its multimode resonant clock can oscillate from 2.5GHz to greater than 5GHz and can “reduce clock grid power by 33% and dynamically switch between high and low resonant frequency modes without idle cycles”, claims IBM.
ISSCC paper 5.1, ‘POWER8: A 12-core server-class processor in 22nm SOI with 7.6Tbit/s off-chip bandwidth.’
“IBM and Applied Micro processors represent two major aspects of processor development: extreme performance for ‘big data’ handling and power efficiency for cloud computing,” said session chair Atsuki Inoue of Fujitsu. “These greater levels of performance and energy efficiency in ever more dense form factors will extend our abilities from increasing multimedia social computing to scientific and medical applications, such as understanding human genomes.”
Both Intel and AMD are talking about their new processors in the same session.
Intel’s 4.31 billion transistor, 15-core Ivytown Xeon has 37.5MB of shared L3 cache on a 22nm finfet process with nine metal layers.
In a second paper, Intel is talking about its 22nm finfet CMOS graphics core with adaptive clocking to deal with voltage droops. Operation is down to 0.38V.
A third Intel paper covers its Haswell fourth-generation Core processor, also using 22nm finfets. It has integrated voltage regulators and graphics with embedded DRAM providing 102Gbyte/s bandwidth at 1.22pJ/bit. Compared with earlier versions, standby power is down by 95%, and floating point capability is doubled.
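As a rough illustration of what the Haswell embedded DRAM figures imply, bandwidth multiplied by energy per bit gives the power spent moving data. The sketch below assumes 1Gbyte means 10^9 bytes and that the link runs at its full quoted bandwidth; neither assumption comes from the paper, and the function name is made up for this example.

```python
def transfer_power_watts(bandwidth_gbyte_per_s: float, energy_pj_per_bit: float) -> float:
    """Power needed to move data at a given bandwidth and energy cost per bit.

    Assumes 1 Gbyte = 1e9 bytes and sustained operation at full bandwidth.
    """
    bits_per_second = bandwidth_gbyte_per_s * 1e9 * 8   # bytes/s -> bits/s
    joules_per_bit = energy_pj_per_bit * 1e-12          # pJ -> J
    return bits_per_second * joules_per_bit


# Figures quoted for Haswell's embedded DRAM interface
edram_power = transfer_power_watts(102, 1.22)
print(f"{edram_power:.3f} W")
```

Under those assumptions the interface would draw roughly 1W at peak transfer rate.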
AMD’s 236 million transistor x86 Steamroller occupies 29.47mm² in 28nm and uses a shared 96kbyte 3-way instruction cache and 10kbyte branch target buffer to improve single and multithreaded performance compared with its earlier 32nm design. Steamroller is another processor using resonant clocking.
It also gets a second paper, once more dealing with clocks throttling back automatically to deal with voltage droops. AMD estimates it allows 7-15% better power efficiency, as frequency can be maintained at lower voltage.
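The transistor counts and areas quoted above allow a quick density comparison between the Potenza PMD and Steamroller. This is a back-of-envelope sketch using only the figures in the article; the helper name is made up for this example, and the two blocks use different processes (40nm and 28nm) with different mixes of cache and logic, so the numbers are not a like-for-like measure of design quality.

```python
def density_mtr_per_mm2(transistors: float, area_mm2: float) -> float:
    """Average transistor density in millions of transistors per mm^2."""
    return transistors / area_mm2 / 1e6


# Figures quoted in the article
potenza_pmd = density_mtr_per_mm2(84e6, 14.8)      # Applied Micro, 40nm bulk CMOS
steamroller = density_mtr_per_mm2(236e6, 29.47)    # AMD, 28nm

print(f"Potenza PMD: {potenza_pmd:.2f} Mtr/mm^2")
print(f"Steamroller: {steamroller:.2f} Mtr/mm^2")
```

On these figures the PMD averages under 6 million transistors per mm² and Steamroller about 8 million.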