possible article of interest #18

Closed · Gingeropolous opened this issue Feb 6, 2019 · 4 comments

@Gingeropolous

https://arxiv.org/abs/1902.00112

"HashCore: Proof-of-Work Functions for General Purpose Processors"

posted by sarang in #monero-pow

@SChernykh (Collaborator)

Related article mentioned there: https://www.computer.org/csdl/proceedings/ispass/2017/3890/00/07975285.pdf

@tevador (Owner) commented Feb 6, 2019

@Gingeropolous

Using a well-tested and optimized benchmark suite to design a PoW certainly has its merits.

However, if I were peer-reviewing this paper, I'd ask the authors to include more details about the runtimes of their 'widgets', especially in relation to PoW verification time. They mention a "running time on the order of a few seconds or less", which is certainly too high for practical use. Symmetric PoW algorithms must have run times on the order of milliseconds.
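
To put rough numbers on that (the block count and timings below are made up, just for scale): a syncing node has to re-verify the PoW of every block, so a multi-second function turns initial sync into weeks of pure PoW work, while a millisecond-scale function keeps it at hours:

```cpp
#include <cstdio>

// Back-of-the-envelope: how much time does initial sync spend on PoW
// verification alone? Block count and per-hash times are illustrative only.
int main() {
    const double blocks = 1.8e6;            // hypothetical chain height
    const double slow_pow_s = 2.0;          // "a few seconds or less"
    const double fast_pow_s = 0.005;        // millisecond-scale target

    std::printf("slow PoW: %.1f days of verification\n",
                blocks * slow_pow_s / 86400.0);
    std::printf("fast PoW: %.1f hours of verification\n",
                blocks * fast_pow_s / 3600.0);
}
```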

@JustFranz

From reading the blurb and the beginning of the article, plus skimming the rest, this is what I get:
GPP (CPU) manufacturers optimize their hardware for benchmarks. The benchmark in question is SPEC CPU 2017, which is meant to represent end-user workloads (43 benchmarks).

Optimize how? ASIC-like circuitry for specific cases? General optimizations for real-world workloads? Something in the microcode?

Are the CPU manufacturers designing their CPUs for real-life workloads, or are they cheating on a benchmark? (There is precedent: Nvidia making drivers tuned for benchmarks, https://www.geek.com/games/futuremark-confirms-nvidia-is-cheating-in-benchmark-553361/ )

Sure, it makes sense to have good benchmark scores, and there is a history of cheating on prevalent benchmarks. The article does not show that this is happening, though. Since SPEC CPU 2017 is a general benchmark designed to test different common workloads, one can imagine that a CPU will be capable of them in proportion to its computing power.

A specific end-user workload, or a representation of it, can get its own ASIC circuit, and you can massively parallelize it. A CPU might have something that makes it 5% faster than the competition because of a benchmark (software) quirk somewhere, but it won't be prepared to handle a billion instances of it. (I am not familiar with the nature of the 43 benchmarks.)

The premise of the paper makes for an "interesting" story, but I think they start at too high a level of abstraction by looking at benchmarks.

ASIC-resistant PoW - why? Custom ASICs (or even common Bitcoin ones) are restricted to a few manufacturers, and because there is money to be made there is little incentive to sell them to the public (hello, dusty ASICs being sold at exorbitant prices just as a new generation, developed in secret, is coming out that will make the last generation obsolete).
What is more available? The ordinary CPU (the GPU less so; as was seen during the boom, a high-powered GPU is also more of a niche product).

A CPU does something. You know what you can make it do, and you can measure how long it took and what it took. What are the smallest things you can make a CPU do? How much does that use the CPU? What can you make the CPU do so that it uses as many resources as possible?
Randomization and sequencing of those things is key to thwarting fixed circuit designs.
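
Something like this toy sketch (not RandomX and not the paper's construction, just the bare idea): derive a fresh sequence of simple operations from a seed and execute it, so there is no single fixed sequence an ASIC could hard-wire:

```cpp
#include <cstdint>
#include <cstdio>

// Toy illustration of "randomize and sequence the primitives":
// the seed picks a fresh sequence of simple ALU operations each time,
// so there is no single fixed pipeline that matches every program.
struct SplitMix64 {
    uint64_t s;
    uint64_t next() {
        uint64_t z = (s += 0x9E3779B97F4A7C15ULL);
        z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
        z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
        return z ^ (z >> 31);
    }
};

uint64_t run_random_program(uint64_t seed, int length) {
    SplitMix64 rng{seed};
    uint64_t r[4] = {seed, ~seed, seed * 3, seed ^ 0xABCDEFULL};
    for (int i = 0; i < length; ++i) {
        uint64_t insn = rng.next();
        int dst = insn & 3;
        int src = (insn >> 2) & 3;
        switch ((insn >> 4) & 3) {                 // four toy "instructions"
            case 0: r[dst] += r[src]; break;
            case 1: r[dst] ^= r[src]; break;
            case 2: r[dst] *= r[src] | 1; break;
            case 3: r[dst] = (r[dst] << 7) | (r[dst] >> 57); break;  // rotate
        }
    }
    return r[0] ^ r[1] ^ r[2] ^ r[3];
}

int main() {
    std::printf("%016llx\n", (unsigned long long)run_random_program(42, 256));
}
```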

Let's say that for a specific sequence of instructions there is dedicated circuitry or other magic in a CPU to boost some edge case in a benchmark/application (dubious). To get the benefit of those hidden optimizations you must use SPEC CPU 2017-esque workloads, or the CPU is simply good at the workload because of its prevalence. Is that a good PoW? Does it prevent ASIC optimizations to the highest degree?

This paper raises an interesting question (CPU and benchmark optimizations/cheating), doesn't answer it, makes a PoW out of it (you can make a PoW out of anything) and leaves it at that. Is it optimized? Is it ASIC-resistant?

The special question is: is there magic sauce in CPUs that was put there to make them especially fast at certain tasks? If so, how flexible is that special sauce (useful leverage for an ASIC-resistant PoW?), or is it just for cheating on a benchmark?

You'd start by testing a suspect benchmark on newer- and older-generation CPUs and seeing if there is an uptick not explained by general IPC or frequency. If you find it, you deconstruct the program and figure out what it might be, then test mutations of it on other architectures.
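
Roughly, that test could look like this sketch (all scores and clocks below are invented): normalize each benchmark's generational speedup by clock frequency and flag the ones that jump far beyond the typical ratio:

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical data: per-benchmark scores on an old and a new CPU,
// plus assumed clock speeds. Flag benchmarks whose clock-adjusted
// generational speedup is far above the average speedup.
struct Result { std::string name; double old_score, new_score; };

int main() {
    const double old_ghz = 3.5, new_ghz = 3.7;   // assumed base clocks
    std::vector<Result> results = {
        {"gcc",       40.0, 46.0},
        {"mcf",       35.0, 40.0},
        {"x264",      50.0, 58.0},
        {"exchange2", 30.0, 55.0},               // invented outlier
    };

    double typical = 0.0;
    for (const auto& r : results)
        typical += (r.new_score / r.old_score) * (old_ghz / new_ghz);
    typical /= results.size();

    for (const auto& r : results) {
        double adjusted = (r.new_score / r.old_score) * (old_ghz / new_ghz);
        if (adjusted > 1.25 * typical)
            std::printf("%s: %.2fx vs. typical %.2fx -- worth deconstructing\n",
                        r.name.c_str(), adjusted, typical);
    }
}
```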

A real-world workload is good for estimating real-world performance (bottlenecks in RAM, GPU, IO, under-utilization of hardware). A synthetic workload is good for measuring the performance of parts, or combinations of parts, of the CPU.

An ASIC-resistant PoW will be a purpose-made workload, synthetic in nature compared to all the other tasks out there. It will have certain hardware parameters in mind and should be designed to use as much of the CPU as possible, and be fine-grained enough to do it. A real-world benchmark will remain in the real world.

If a CPU can introduce a magic speedup for a piece of code, or pieces of code that match a pattern, couldn't an ASIC do the same?

Regarding speculative execution, branch prediction and the like (correct me if I'm wrong): these are useful when the CPU is underutilized. A CPU can spend 30% of its resources now, while they aren't otherwise being used, to have 5% of the work done instantly, instead of spending 5% of resources and 5% of time later when they could be used for something else. If a PoW uses all or most of the CPU, or the circuits that would be doing the applicable prediction/speculation work, none of it will matter?
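
For what it's worth, the standard toy demo of what branch prediction buys (not from the paper) is the sorted-vs-shuffled branchy loop:

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <random>
#include <vector>

// Classic demo of what branch prediction buys: the same data-dependent
// branch is cheap on sorted (predictable) data and expensive on shuffled
// (unpredictable) data. Compile without aggressive auto-vectorization
// (e.g. -O1) or the branch may be turned into a conditional move.
static uint64_t sum_over_threshold(const std::vector<uint32_t>& v) {
    uint64_t sum = 0;
    for (uint32_t x : v)
        if (x >= 128) sum += x;              // the branch under test
    return sum;
}

static double time_ms(const std::vector<uint32_t>& v) {
    auto t0 = std::chrono::steady_clock::now();
    volatile uint64_t sink = sum_over_threshold(v);
    (void)sink;
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    std::mt19937 rng(1);
    std::vector<uint32_t> data(1 << 24);
    for (auto& x : data) x = rng() & 255;    // values 0..255, half pass the test

    double shuffled = time_ms(data);
    std::sort(data.begin(), data.end());     // make the branch predictable
    double sorted = time_ms(data);
    std::printf("shuffled: %.1f ms, sorted: %.1f ms\n", shuffled, sorted);
}
```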

@JustFranz

A question I've been pondering but can't begin to wrap my head around in any meaningful way.

Is there an algorithm that:

  1. Executes faster on a CPU with all of the modern out-of-order, speculation, and prediction tricks.
  2. Is not also easier to implement as an ASIC because of it (or where does the balance lie)?
  3. If there is some kind of implementation edge in favor of CPUs, how does that weigh against just loading the available overhead with more raw compute?
  4. If there is something to those tricks that doesn't need free CPU time, how do you optimise for it without optimising for ASICs?
  5. Let's say you have an algorithm that leaves some of the CPU idle for some of the time and you cannot make it any better: is it more beneficial to make the algo more prediction-friendly and have it use the remaining CPU, or to leave it as is?

Since it is a cat-and-mouse game, none of this is an immediate concern. There is lower-hanging fruit. Maybe it will become a whack-a-mole game that Monero can manage to play.

@hyc (Collaborator) commented Feb 7, 2019

Funnily enough, they include RandomX in their references. At least they've done some useful homework. And yes, CPU optimizations tend to be tailored to the SPEC benchmark, and a SPEC workload might represent a best case for a CPU workload. But the optimizations aren't always in the hardware - frequently they're just tricks built into the compiler. I already investigated this path when I first came up with RandomJS. Also, their approach is necessarily tied to a specific ISA. I had this same realization before and considered bundling gcc into the PoW to avoid being tied to a single ISA. It's really not that practical.

@hyc (Collaborator) commented Feb 11, 2019

Now that I've read to the end of the paper... they actually include gcc in their PoW. This opens up a potential hole in reproducibility across gcc versions and target platforms.

@tevador closed this as completed May 5, 2019