Cores That Don't Count [pdf] (2021)

mofosyne a year ago

This is about unstable cores that randomly output incorrect calculations, and ways to mitigate that via better hardware testing and by duplicating the parts of the core that fail most often.

I did, however, initially think from the title that it was about 1-bit CPUs like the MC14500B Industrial Control Unit (ICU), a CMOS one-bit microprocessor designed by Motorola in 1977 for simple control applications. It completely lacks an ALU so essentially cannot count, but it is designed for PLCs.

winwang a year ago

Hey. It could count to 1, which is something.

freeqaz a year ago

Unrelated to the topic being discussed, but my mind immediately went to "per core pricing", which is common for databases. Some SQL servers are charged for by the number of CPU cores in a system, and CPU manufacturers would often offer an SKU with fewer, faster cores to compensate for this.

Taking that thought and thinking about adding "silent" cores is interesting to me. What if your CPU core is actually backed by multiple cores instead to get the "fastest" speed possible? For example imagine if you had say 2 CPU cores that appeared as one and each core would guess the opposite branch of the other (branch prediction) so that it was "right" more of the time.

An interesting thought that had never occurred to me. It's horribly inefficient but for constrained cases where peak performance is all that matters, I wonder if this style of thought would help. ("Competitive Code Execution"?)

buildbot a year ago

People have thought about it, but it’s so incredibly wasteful that it’s impractical. At 20% branching (about one branch every five instructions), you rapidly run out of resources pending the winning branch: every unresolved branch doubles the number of paths, so you spend possibly 8 cores just to predict three branches ahead, or roughly 15 instructions. That’s pretty rough!
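
A minimal sketch of that arithmetic, assuming every branch is two-way, each path gets its own core, and branches are evenly spaced (one per five instructions for 20% branching). The function names are made up for illustration:

```python
def paths_in_flight(branches_ahead: int) -> int:
    """Paths (cores) needed to follow both sides of every pending branch."""
    # Each unresolved two-way branch doubles the number of live paths.
    return 2 ** branches_ahead


def instructions_covered(branches_ahead: int, instructions_per_branch: int = 5) -> int:
    """Approximate instructions of lookahead bought by that many branches."""
    # 20% branching is modeled as one branch per 5 instructions (an assumption).
    return branches_ahead * instructions_per_branch


if __name__ == "__main__":
    for depth in range(1, 5):
        print(depth, paths_in_flight(depth), instructions_covered(depth))
```

The cost grows exponentially with depth while the lookahead grows only linearly, which is the waste described above.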
- hinkley a year ago
  
  I wonder if you could put more logic units per core and load balance to prevent thermal throttling, or if you’d make the communication pathways slower at a rate that exceeds the gains.
  - buildbot a year ago
    
    Yep, you can do that, and yep, it gets slower.
    That’s basically the tradeoff Apple made with their M series chips vs AMD/Intel, which until recently have been chasing fast and narrow designs. Apple, in contrast, has a crazy “wide” core, i.e. it can issue and retire many more instructions per clock than basically any other mainstream CPU.
- eep_social a year ago
  
  In distributed computing, a few layers of abstraction up, an analogous technique of sending two identical RPCs to distinct backends can be used to reduce tail latency.
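
A rough sketch of the duplicate-RPC idea using Python's asyncio. The backends here are simulated with sleeps; `fake_backend` and `hedged_call` are hypothetical names, not a real RPC library:

```python
import asyncio


async def fake_backend(name, delay_s):
    """Stand-in for a real RPC: replies after delay_s seconds."""
    await asyncio.sleep(delay_s)
    return f"reply from {name}"


async def hedged_call(delays):
    """Send the same request to every backend and return the first reply."""
    tasks = [asyncio.create_task(fake_backend(name, d)) for name, d in delays.items()]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    # The slower duplicates are no longer needed, so cancel them.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    # If the first finisher failed, result() re-raises its exception.
    return next(iter(done)).result()


if __name__ == "__main__":
    # One backend is having a slow moment; the caller only sees the fast one.
    print(asyncio.run(hedged_call({"backend-a": 0.2, "backend-b": 0.01})))
```

The caller pays for two requests but sees the latency of the faster one, the same trade of extra work for speed as running both sides of a branch.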
userbinator a year ago

> For example imagine if you had say 2 CPU cores that appeared as one and each core would guess the opposite branch of the other (branch prediction) so that it was "right" more of the time.

I believe some CPUs do speculate down both paths of a branch if the branch predictor is really uncertain which one to take.
jcul a year ago

Not exactly the same thing, but I remember talking with a co-worker before about strategies for using a core and its hyperthreaded sibling on the same workload to get a speedup.
However, in practice I think it would be really difficult to prevent them from just thrashing each other's cache and competing for shared resources.
- lallysingh a year ago
  
  Yeah, your options are to spin on a few lines of cache (e.g. an iterated function or processing a ring buffer) or to use streaming cache ops.