On Monday, Ceremorphic of San Jose, California, formally debuted chip efforts that have been kept in stealth mode for two years, discussing a chip the company claims will revolutionize the efficiency of AI computing in terms of power consumption.
“It’s counterintuitive today, but higher performance is lower power,” said Venkat Mattela, founder and CEO of the company, in an interview with ZDNet via Zoom.
Mattela believes that numerous patents on low-power operation will enable his company’s chip to produce the same accuracy on signature tasks of machine learning with much less computing effort.
“What I’m trying to do is not just building a semiconductor chip but also the math and the algorithms to reduce the workload,” he said. “If a workload takes a hundred operations, I want to bring it down to fifty operations, and if fifty operations cost less energy than a hundred, I want to say mine is a higher-performance system.”
Mattela is wading into a heavily contested market, one where startups such as Cerebras Systems, Graphcore, and SambaNova have received vast sums of money and where, for all their achievements, they still struggle to topple the industry heavyweight, Nvidia.
Mattela is inclined to take the long view. His last startup, Redpine Signals, was built over a period of fourteen years, starting in 2006. That company was sold to chipmaker Silicon Labs in March of 2020 for $314 million for its low-power Bluetooth and Wi-Fi chip technology. (The chip is now being used in the recently introduced Garmin Fenix 7 smartwatch.)
The lesson of that fourteen-year effort at Redpine and now at Ceremorphic is twofold: “I have a lot of patience,” he observed of himself with a chuckle. And “I don’t do incremental things.”
Mattela contends that when he takes on a problem in an area of chip design, he does so in a way that gets meaningfully ahead of the state of the art. The Redpine wireless chip technology Silicon Labs bought, he said, went up against the offerings of giant companies, Qualcomm and Broadcom, in Bluetooth and Wi-Fi.
“I took a big challenge, I went against them, but only with one metric, ultra-low-energy wireless, twenty-six times less energy than the best in the industry,” said Mattela.
Now, Mattela believes he has a similarly winning focus on power, along with three other qualities he deems both unique in the AI chip market and essential to the discipline: reliability, quantum-safe security, and an ability to function in multiple markets.
To make all that possible, Mattela held onto the microprocessor assets that had been developed at Redpine, to form the foundation of Ceremorphic, and retained eighteen employees from that effort, whom he has complemented by hiring another 131 people. The company has offices in both San Jose, the official HQ, and a gleaming new office building in Hyderabad, India.
Mattela has an intriguing list of 26 U.S. patents with his name on them, and an equally intriguing list of 14 U.S. patent applications from the last few years.
What Mattela dubs a “Hierarchical Learning Processor,” or HLP, consists of a computing element for machine learning running at 2-gigahertz; a custom floating-point unit at the same clock frequency; a custom-designed multi-threading workload scheduling approach; and specially-designed 16-lane PCIe gen-6 circuitry to connect the processor to a system’s host processor such as an x86 chip.
The last of these, the PCIe part, could almost be its own company, claims Mattela.
“Right now, what is in production is PCIe-4, the dominant one, and PCIe-5 just started last year,” explained Mattela. “And with us, PCIe-6 will be in production in 2024 – I own that technology.”
“That’s $12 million if you had to license that,” he said of PCIe-6. “That alone is a significant thing to design.” The PCIe link will allow Mattela to further refine the energy consumption of a total system, he said.
At the heart of the chip’s advantage are analog circuits resting underneath digital. Some companies have used analog circuits extensively for AI processing, the most well-known being startup Mythic, which in 2020 revealed a chip that can multiply vectors and matrices – the heart of machine learning – not as digital multiplications but as combinations of continuous energy waveforms in accordance with Ohm’s Law, what the company calls analog computing.
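The arithmetic behind that idea can be sketched in a few lines of Python. This is a minimal, idealized illustration, not Mythic's or Ceremorphic's actual design: it assumes each weight is stored as a conductance, each input is applied as a voltage, Ohm's Law gives the current through each cell, and the currents on a shared output line simply add up. The function name `crossbar_matvec` and the sample values are hypothetical.

```python
# Illustrative sketch only: an idealized resistive crossbar.
# It does not model noise, device variation, or any vendor's real circuit.

def crossbar_matvec(conductances, voltages):
    """Return the current on each output line of an idealized crossbar.

    conductances[i][j] is the conductance (siemens) linking input j to output i.
    voltages[j] is the voltage (volts) applied to input j.
    """
    currents = []
    for row in conductances:
        if len(row) != len(voltages):
            raise ValueError("each row needs one conductance per input voltage")
        # Ohm's Law gives each cell's current, I = G * V.
        # The cell currents then sum on the shared output line.
        currents.append(sum(g * v for g, v in zip(row, voltages)))
    return currents


weights = [[1.0, 2.0], [0.5, 0.0]]  # hypothetical conductances
inputs = [3.0, 4.0]                 # hypothetical input voltages
outputs = crossbar_matvec(weights, inputs)
print(outputs)  # [11.0, 1.5]
```

Each output current is one multiply-accumulate result, which is why a grid of such cells can stand in for a matrix-vector product.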
The Ceremorphic HLP chip will use analog computing more selectively than Mythic, Mattela told ZDNet.
“At the lowest level of the hierarchy” of chip functionality, “I do analog computation,” explained Mattela. “But higher level, I don’t do analog because I want to make the programming model easy.”
That means “twenty-three patterns” for multiply-accumulate in analog via the HLP’s micro-architecture. The analog multiplications will be a more efficient use of voltage than digital, he argued.
“At a higher level, it looks like a vector processing and data-path processing combination.”
The various chip features will contribute to making possible the four qualities Mattela promotes.
In addition to power-efficient operation, there is reliability. AI silicon has a reliability problem today, claimed Mattela.
Machine learning chips have gotten vastly larger than conventional microprocessors. Nvidia’s “A100” GPU is already a fairly hefty, by classic standards, 826 square millimeters. But novel chips from startups can be much larger, such as Cerebras’s WSE-2 chip, measuring 46,225 square millimeters, almost the entire surface of a twelve-inch silicon wafer.
“When you have more silicon, there is a greater possibility of failure because there are alpha particles, neutron bombardment,” observed Mattela. “In the last two years, people are already saying, my systems are failing in the data center.”
Mattela claims a unique hardware-software combination will enable his chip to “predict faults and correct them.”
“Reliable performance computing engineering is our key contribution,” he said.
The third quality Mattela is emphasizing is security, including protection against future quantum systems that could conceivably break conventional data security.
“So far, security systems have been designed to counter hacking by humans,” explains Mattela. “But going forward, you can’t assume computing power will be limited, and that it will take two days to break it [a system], you had better assume perhaps two minutes!”
The Ceremorphic chip has “quantum-resistant random-number generation,” said Mattela, which “cannot be broken by a very high-performance computer.” In practical terms, said Mattela, that means such a system would take perhaps a month to break, affording a customer time to change the security key to foil the attack.
The fourth property is what Mattela refers to as scaling. What Mattela means by that is addressing multiple markets with one chip. The chip will be able to function in deep learning, automotive applications, robotics, life sciences, and some sort of future Metaverse application.
The same HLP will serve to do both training and inference, the two aspects of machine learning.
Scaling to multiple markets, claimed Mattela, will make his chip more relevant than those of competitors. He argues that startups such as Cerebras are impressive but not ultimately as relevant.
“It’s very fine engineering, yes, and you can always do something that nobody else can do, but your purpose is not to do something which nobody does,” said Mattela.
“Your purpose is to create an outcome that everybody makes money, and it has impact, it has some value to the market.”
Of course, Cerebras and the others are shipping products, while Mattela hasn’t even produced samples yet.
To make Ceremorphic’s design a winner, Mattela has what would appear to be an ace in his pocket: Taiwan Semi’s 5-nanometer chip process, which is one of the manufacturing giant’s “advanced technologies,” a chip process to which not every customer is given access.
“When I say to people that I am doing 5-nanometer, they say, How could you get 5-nanometer,” said Mattela, with evident delight. “Some of these companies with hundreds of millions in funding are not in 5-nm; they are in 7-nm.”
One reason is a close relationship Mattela cultivated with TSM years ago when he was at analog chip giant Analog Devices. More important, his sale of Redpine to Silicon Labs boosted his credibility. TSM, he says, had to believe that he can see his product through to fruition, for only then does TSM get paid in full.
“It takes many, many years,” said Mattela, referring to the production process. “I have to spend $200 million, I have to produce a chip, the chip has to work, and then if it works, they [TSM] will get paid at that time.”
Mattela’s team designed the chip as a prototype initially in what’s called a shuttle run, a small batch of chips, in the slightly less-sophisticated 7-nanometer process. The company will this year expand its shuttle run batches to 5-nanometer.
While Ceremorphic expects to provide first customer samples of its chip next year, full-scale production in 5-nanometer will probably not occur until 2024, he says. “These are very aggressive dates” for design in a leading-edge process such as 5-nano, Mattela observed of the timeline.
A gating factor is cost. He points out that to move from shuttle run to full use of a wafer – what’s known as full mask – is the difference between $2 million and over $10 million.
For that, Ceremorphic will need to raise further capital. So far, Ceremorphic is self-funded, with Mattela and friends and family putting together $50 million in a Series A round, an extremely small amount of funding relative to AI chip startups such as SambaNova that have received billions in venture capital.
Mattela’s preferred route to future funding, he said, is through partnerships, though a formal Series B investment round is also a possibility in 2023.
The first instantiation of the HLP will be as a PCI card that bundles together what the chip needs to function in a computer system. “Typical system OEMs are the target,” he said. That route to market, he believes, will make his machine more broadly available.
“Every company needs a training supercomputer,” he said. “I want to provide the training supercomputer that can be affordable to every enterprise.”
In contrast to something mammoth such as Facebook’s Research SuperComputer for AI – 6,080 GPUs and 175 petabytes of flash storage – the Ceremorphic PCI blade would be intended to make the technology more accessible.
“If I can provide one-tenth of a building size computer in a box, that’s the sweet spot.”
While his part is not yet shipping, Mattela is already predicting a rapid shakeout in the AI chip startup market. The challengers such as Graphcore have raised a lot of money and made very little revenue, he speculates, just a fraction of what Nvidia makes in a quarter off of AI.
“There are four to five companies today; they have close to five billion today [in capital raised], that’s a lot of money,” said Mattela. But, “the number one company, every quarter makes two billion dollars,” referring to Nvidia’s data center revenue.
“If you don’t even make 1% of the number one company, and you’re losing money, that’s not a business,” said Mattela.
The shakeout will come sooner rather than later, Mattela prophesies, because of profligacy. “In today’s hot market, they went and got money, good for them, but whatever money they got, I don’t think they really figured out how to spend the money because the amount being spent is just abnormal,” he said.
Junior engineers are being paid vast sums at the AI startups, he maintains. “If a fresher is getting $200K [in annual salary], that’s not sustainable,” he said, using the tech industry jargon for the most junior position, “because the guy will be productive after two years, but, by then, the money is already gone.”