A New Advanced Semiconductor Category
CORNAMI’s multicore TruStream architecture creates a new semiconductor category that lets any C++ programmer maximize the concurrency and pipelining of any algorithm directly in hardware. To support this capability, CORNAMI has developed and patented technology that maps a Directed Graph (cyclic or acyclic) onto a scalable, potentially unlimited collection of processors and executes it efficiently and concurrently.
Directed Graphs are the foundational structure for all task management, so it is important to pause a moment and talk about what this milestone means for the next generation of computing.
For a long time, CPU chip vendors used Frequency Scaling, increasing clock rates and the corresponding power consumption, to upgrade performance. When clock rates hit around 4 GHz in 2005, Frequency Scaling stopped being effective because the industry was unable to dissipate the additional heat. While the growth rate of transistors on a single die was unaffected, the next level of performance could only be achieved by using those transistors as multiple processor cores. That same year marked the introduction of the first multicore CPU chips to the market, and today most commercial CPU chips are multicore.
However, there was an issue. With the exception of a small class of problems, the current programming approach to computing made it very difficult to effectively scale application performance by adding processor cores. The current computing model enforces very tight instruction execution control, relying on a Program Counter to define the “currently executing” instruction as well as the “next to be executed” instruction. As the number of processor cores working on a problem increased from “N” to “N+1”, the new processor core had to manage its interaction with the N cores already deployed, and each of those N cores had to add the new core to its existing management and scheduling overhead.
- For two cores you have each core coordinating with the other (1-2 and 2-1),
- For three cores the coordination overhead goes from 2 to 6 (1-2, 1-3, 2-1, 2-3, 3-1, 3-2),
- For four cores it goes to 12 (1-2, 1-3, 1-4, 2-1, 2-3, 2-4, 3-1, 3-2, 3-4, 4-1, 4-2, 4-3), …
The overhead rises quadratically according to the formula N² – N. For five cores the coordination overhead becomes 20, for seven it becomes 42, and at ten it becomes 90! Very quickly the coordination overhead outstrips the capabilities of the entire processor core resource, because the capacity contributed by each additional processor core is dramatically overshadowed by the overhead required to coordinate it. This is the legacy of today’s computing architecture, where Program Counter-based control is the key focus.
TruStream’s native programming control structure is expressed as a Directed Graph, where maximum algorithmic concurrency is defined by its width and maximum pipelining is defined by its depth. TruStream dramatically reduces coordination overhead, so that as the number of processor cores scales, overhead increases only by log N. This means that as the number of processors grows, instead of overhead consuming an ever larger share of the total processor core resource, it shrinks as a share of the total resource.
Both the TruStream control system and processor cores are directly implemented in a custom chip, with programming done through C++ extensions. Chips can be easily linked together to scale resources to support larger and larger Directed Graphs. This creates a new semiconductor category of easily deployed hardware support for applications that can be quickly modified just by changing the underlying application software. If you think about the different types of semiconductors (FPGAs, metal layer ASICs, standard cell ASICs, and full custom ASICs), they all share the common feature of organizing and structuring concurrent computing elements in silicon. The differences are in application logic density, clock rate, engineering time and cost, and production unit cost. With current techniques, regardless of which semiconductor type is chosen, developing hardware support for an application is expensive and time-consuming. Moreover, once you are done, that structure is fixed in silicon, so remedying bugs or adding new features means going through the same expensive and time-consuming process.
CORNAMI’s TruStream hardware provides the same function of coordinating and structuring concurrent computing elements, but instead of requiring many months, millions of dollars, and large teams of semiconductor engineers, application logic is defined in standard software by typical programmers. With TruStream, a software developer can now easily achieve the level of hardware support for their applications that was once reserved for expensive, long-lead-time ASICs.