Added miner documentation files.

todxx · Jul 31, 2020 · 212b40b
1 parent bac531f
commit 212b40b

Showing 10 changed files with 1,300 additions and 0 deletions.
diff --git a/doc/CN_AUTOTUNING_WITH_TRM.txt b/doc/CN_AUTOTUNING_WITH_TRM.txt
@@ -0,0 +1,129 @@
+Team Red Miner CryptoNight Auto-Tuning Support
+==============================================
+
+TL;DR
+-----
+TRM now has auto-tuning capabilities and the recommended way is to either always run the miner without specific configurations or use a script to deduce the best configurations. The former means deleting any --cn_config arguments from existing start scripts. For the latter, edit the included run_autotune_quick.bat/.sh with your algo, pool, wallet and password. Run the script and wait until it completes. Expected runtime is 5-15 mins. Open the logfile autotune_quick.txt and copy the command line arg at the end of the file with the CN configs to your start script, replacing any existing --cn_config argument. If you change the rig setup (e.g. add/remove gpus), do this process again.
+
+
+Introduction
+------------
+In TRM 0.5, we had to add a set of additional tuning parameters for CN variants. The CN config format now contains an optional fine-tuning suffix consisting of a colon and three letters. This meant that the scan range became much larger than before, turning the tuning of a rig into a tedious project. Moreover, it's not trivial to explain when to try different combinations of the fine-tuning parameters.
+
+We also noticed that many of the previous variants could benefit from small tweaks to the CN configs as well, but in highly random ways. There is no simple default formula that provides the optimal CN config for TRM for both Windows and Linux, across all driver versions and sets of chosen clocks and timings. 
+
+Given the above, we felt it was necessary to include auto-tuning support in the miner rather than ask the users to spend hours on end and still not be sure they are running TRM in the best possible configuration for their rig(s). In some ways, this can be described as a much better default mode where we don't have to guess but can take 5 mins on startup and benchmark our way to the best found config, without restarting the miner and still mining at close to maximum speed while benchmarking.
+
+We have also added a manual tuning mode using key menus in the miner. This means you can switch freely between CN configs without restarting the miner. The only limitation is that intensities can't be increased.
+
+
+The CN Configuration
+--------------------
+The full CN config now consists of the following parts. Please note that this is mostly for reference; the idea with the auto-tuning is rather that the user should have to care _less_ about the parameters involved.
+
+o Prefix: an optional L. This enables a more compressed deployment on the gpu
+          where each work unit handles more pads. Mostly used for small pad
+          algos (Turtle, UPX2) or on small gpus like 550s.
+
+o Thread 1 intensity X: typically a number between 1-16 for 8GB gpus and
+          standard 2MB pad size, but it depends on the algo and pad size.
+
+o Tweak setting: one of the following chars:
+    o -  No tweak enabled.
+    o +  First tweak option (only option for Lexa/Baffin/Polaris cards).
+    o *  Second tweak option (for Vegas, especially with timing mods enabled).
+    o .  I don't care, let the miner choose.
+
+o Thread 2 intensity Y: the second thread's intensity. Can be zero to disable
+          thread 2.
+
+o Finetune: three chars that are loosely defined from the user's perspective
+          since we might change these at any time. The standard finetune set
+          is always AAA. Not all algos have all combinations available. 
+    o Finetune 1: one of ABCDE
+    o Finetune 2: one of AB
+    o Finetune 3: one of AB
+
+Please note that it's _always_ possible to specify the CN config without the added finetune part.
+
+Example configs:
+
+    16*14:CAA    Common Vega config for 2MB pad algos.
+   L28+28:AAA    Common Vega config for small pad algos (Turtle, UPX2).
+      8+7:AAA    Common 470-580 config for 2MB pad algos.
+    16+14:CBB    Common 470-580 and Vega config for 4MB pad algos.
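
The parts listed above can be captured in a small parser. The Python sketch below is illustrative only and is not part of TRM: the helper name parse_cn_config and the dictionary keys it returns are assumptions made for this example, and the pattern encodes only the rules stated in this section (it does not check which finetune combinations a given algo actually supports).

```python
import re

# Pattern built from the parts listed above: optional L prefix, thread 1
# intensity, tweak char, thread 2 intensity, optional ":XYZ" finetune suffix.
_CN_CONFIG_RE = re.compile(
    r"^(?P<prefix>L)?(?P<t1>\d+)(?P<tweak>[-+*.])(?P<t2>\d+)"
    r"(?::(?P<f1>[A-E])(?P<f2>[AB])(?P<f3>[AB]))?$"
)


def parse_cn_config(text):
    """Split one CN config string into its parts (hypothetical helper)."""
    m = _CN_CONFIG_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not a valid CN config: {text!r}")
    finetune = None
    if m.group("f1"):
        finetune = m.group("f1") + m.group("f2") + m.group("f3")
    return {
        "l_prefix": m.group("prefix") is not None,
        "thread1": int(m.group("t1")),
        "tweak": m.group("tweak"),
        "thread2": int(m.group("t2")),
        "finetune": finetune,  # None when the suffix is omitted
    }


if __name__ == "__main__":
    # The example configs from this section, plus one without a suffix.
    for cfg in ("16*14:CAA", "L28+28:AAA", "8+7:AAA", "16+14:CBB", "16+15"):
        print(cfg, parse_cn_config(cfg))
```

A string such as 16+14:FAA is rejected because F is outside the ABCDE range given for Finetune 1.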
+
+
+Types of Auto-Tuning available
+------------------------------
+The miner supports the following types of auto-tuning:
+
+o AUTO:  A (default) quick scan on start-up for all GPUs, only checking a small
+         set of known good configs. If the user has specified specific config
+         units (intensities), only a single configuration per gpu is scanned,
+         otherwise a sensible default range is used.
+
+o QUICK: The same quick scan as AUTO, but with better reporting.
+
+o SCAN:  A thorough exhaustive search of all configurations. This can take a
+         long time to execute (> 1-2h) but guarantees that all configs have
+         been tested. It can also scan across multiple intensities.
+
+o NONE:  Disable all forms of auto-tuning, just run with the provided or
+         default configs. This is not recommended.
+
+
+The Auto-Tuning Process
+-----------------------
+Note 1: the quality of the auto-tuning output can vary, and in rare cases there might even be false best modes chosen. In general you will end up with a config that is either your best choice under the chosen clocks and timings, or it is at least very very close. That said, verify that you're getting the expected hashrates from the chosen configs, otherwise revert to the 2nd or 3rd best config in the final list printed in step (7) below.
+
+Note 2: during the auto-tuning, there is a small probability that false hw errors will occur when switching between configs. Therefore, please disregard any small number of hw errors that occur during the auto-tuning process. Errors that occur when the process has completed for a gpu and it's listed as 100% done are real errors and indicate your clocks and/or timings are too aggressive.
+
+1.  Set the clocks and timings you aim to run in production, but be a little
+    more generous. This primarily means lowering memclk and raising voltages
+    just a little. We want the clock regime to be compatible with the final
+    clocks from an optimization perspective, but we also want to minimize the
+    risk of crashing during the auto-tuning process.
+
+2.  Edit the run_autotune_quick.bat/.sh:
+     o Set your pool, user and password.
+     o If you only want to work with a subset of your gpus, set the DEVS
+       variable to a -d x,y,z argument.
+
+3.  Execute run_autotune_quick.bat/.sh. The miner will shut down when the
+    auto-tuning process has completed.
+
+4.  Open the autotune_quick_log.txt file and scroll to the bottom. Copy the 
+    command line argument with the CN configs printed by the miner to
+    your start script for the miner, either adding it or replacing any
+    existing --cn_config argument.
+
+5.  Optional continuation: if you _really_ want to make sure you are running
+    the best possible configuration for your gpu(s), open the 
+    run_autotune_full.bat/.sh file and insert the same variables as for the
+    quick script.
+
+6.  You can also enter CN configurations for all gpus in the full scan. The
+    only reason for doing so would be to force the start intensities to
+    specific values for a gpu. The scan will try all possible configurations
+    at each intensity level, decreasing the intensity one step at a time.
+    The miner will normally choose high start intensities when it knows it is
+    going to scan a full range, making this step unnecessary. If you still 
+    want to add it, set the CN_CFG variable to e.g. 
+    --cn_config=16+15,8+8 for a two-gpu system. Do NOT configure more than the
+    two intensities (i.e. 16+15 rather than 16+15:AAA) or you will disable the
+    auto-tuning.
+
+7.  Start run_autotune_full.bat/.sh and go grab lunch, coffee or dinner. This
+    will take a while.
+
+8.  Open the autotune_full_log.txt file and scroll to the bottom. Compare the
+    final output values to the values from the quick scan and use any mix of
+    the two.
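
For the optional CN_CFG value described in step 6, a small check before launching the full scan can catch a finetune suffix that would disable auto-tuning. The Python sketch below is an illustration under stated assumptions, not part of the shipped scripts: build_full_scan_cn_config is a hypothetical helper, and the only rule it enforces is the one from step 6 (intensities only, no :XYZ suffix).

```python
def build_full_scan_cn_config(per_gpu_configs):
    """Join per-gpu configs into one --cn_config argument (hypothetical helper).

    Rejects any config carrying a finetune suffix, since step 6 states that
    configuring more than the two intensities disables the auto-tuning.
    """
    for cfg in per_gpu_configs:
        if ":" in cfg:
            raise ValueError(
                f"{cfg!r} has a finetune suffix; use intensities only, e.g. 16+15"
            )
    return "--cn_config=" + ",".join(per_gpu_configs)


if __name__ == "__main__":
    # The two-gpu example from step 6.
    print(build_full_scan_cn_config(["16+15", "8+8"]))
```

The resulting string is what would be assigned to the CN_CFG variable in run_autotune_full.bat/.sh.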
+
+
+Manual menu-based tuning
+------------------------
+For manual tuning, we've also added a key-driven menu subsystem in the miner that reuses the same mechanisms as the auto-tuning mode. You enter it by pressing 't' and then one of 0-9 (or a-f for 10-15) to select the gpu.
+
+The mode itself should hopefully be self-explanatory. You can cycle all available options per finetune parameter, tweak mode and L prefix enabled or not. You can also move freely between intensities <= the ones chosen at startup. This is a great power-user mode and for playing around with the configs found by the auto-tune process.
diff --git a/doc/CN_GENERAL_TUNING.txt b/doc/CN_GENERAL_TUNING.txt
@@ -0,0 +1,68 @@
+Team Red Miner CryptoNight Tuning
+=================================
+
+IMPORTANT IMPORTANT IMPORTANT: this document precedes the new tuning document included in the release, CN_AUTOTUNING_WITH_TRM.txt. While the information below is still accurate, it is also somewhat outdated. It is very much recommended to read the auto-tuning guide and use the new auto-tuning support to trim your rigs.
+
+Note: the hashrates mentioned in this document are for the main Monero PoW variants, CNv8 and CN/r.
+
+Introduction
+------------
+This miner is leaner than other CN miners. This can translate into either increased hashrate, lower power draw, or both, or none of the above. Your mileage may vary and is highly dependent on mem straps, modded timings and clocks. For most CN variants, if this miner is running at the same hashrate as other CN miners, you can expect your power draw per GPU to decrease by 5-20W depending on gpu model and clocks.
+
+There are fewer controls in this miner than in the standard CN miner. You specify a config for one or two threads, and a mode. You provide one intensity value per thread in the range 0-16. Moreover, you can choose between three modes, +, - and *. The + and - modes will most often not have any effect on the end result, but it never hurts to try them. The effect of the + and - modes varies with your gpu model and clocks, making it difficult to make general recommendations on when to use one or the other.
+
+The * mode is different. It's designed specifically to take advantage of modded timings on Vega cards. Whenever you use modded timings with tightened latency, you should use the * mode. The AMD Memory Tweak tool released by ElioVP is truly an amazing addition to the Vega toolset. We recommend all Vega owners to read up on the tool and the current latest and greatest mem timings provided by the community. The tool is available at https://github.com/Eliovp/amdmemorytweak, and the Bitcointalk ANN thread can be found here: https://bitcointalk.org/index.php?topic=5123724.0
+
+Standard Tuning Guide
+---------------------
+
+[Windows drivers]
+For some Polaris cards, the good ol' blockchain driver works fine. However, the one driver that seems to be a good fit across the board is 18.6.1, and that's the driver we have used in all our Windows tests.
+
+[Windows swap space]
+You should set up your swap space to be at least the sum of all GPU memory you intend to use when mining. Typically, for a 4GB card this is 3.5GB and for an 8GB card it's 7.5GB. Playing it safe is recommended, i.e. rather add up the full memory size of all your GPUs and set the swap to the total sum or more.
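
The swap arithmetic above is easy to script. The Python sketch below is only an illustration: both function names are hypothetical, and treating the 3.5GB/7.5GB figures as "card size minus a fixed 0.5GB" is an assumption read off the two examples in the text, not a rule stated by the miner.

```python
def recommended_swap_gb(gpu_memory_gb):
    """Play-it-safe swap size: the full memory size of every GPU, summed."""
    return sum(gpu_memory_gb)


def minimum_swap_gb(gpu_memory_gb, reserve_gb=0.5):
    """Lower bound on swap size.

    Assumption: each card uses its full size minus reserve_gb when mining,
    matching the 4GB -> 3.5GB and 8GB -> 7.5GB examples in the text.
    """
    return sum(size - reserve_gb for size in gpu_memory_gb)


if __name__ == "__main__":
    rig = [8, 8, 8, 8, 8, 8]  # hypothetical rig with six 8GB cards
    print("minimum swap (GB):    ", minimum_swap_gb(rig))
    print("recommended swap (GB):", recommended_swap_gb(rig))
```

For the hypothetical six-card rig this gives a 45GB minimum and a 48GB play-it-safe value.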
+
+[Linux drivers]
+For your Vegas to reach max possible hashrate under linux, you need amdgpu-pro drivers >= 18.30. Polaris, Baffin and Lexa Pro cards are not as sensitive to the driver version. Also, please note that this release does not include ROCm support for CN variants; it will be included in an upcoming release instead.
+
+[Polaris cards (470-580)]
+The standard configuration is 8+8 for all of these cards. 8+6 or 7+7 might give the same optimal hashrate, and 9+9 can, especially under linux, give a better result for 480/580. For some cards, 16+14 is the best choice but also increases the probability of stale shares. You must have good mem straps to reach a good hashrate. Normally e.g. the Pimp My Straps function in SRB Polaris Bios Editor is sufficient. For mem clk, boosting it as much as possible while avoiding mem errors is a good thing. The core clk should generally end up between 1230-1270. For 580s, a boosted core clk to 1300 can push the hashrate to 1100 h/s while still staying at a reasonable power draw. We have seen few Polaris cards not being able to reach 1020-1030 h/s with this miner when the proper mem straps are in place. With the introduction of CN/r, which is more power hungry than CNv8, the core clks mentioned above might be too high and skew your efficiency. Either lower them somewhat or make sure your temps and power draw at-the-wall is under control.
+
+[Baffin and Lexa Pro cards (550-560)]
+As of v0.3.8, this miner is better optimized for these smaller cards. The major additions are that the '+' mode has been optimized and an 'L' prefix mode designed for the smallest Lexa Pro cards has been added. Some rules of thumb when you optimize your rigs:
+
+o We'd expect 4+4 and 4+3 to be the only interesting configs for 4GB cards.
+o For Lexa Pro cards with 8 CUs, prefix your config with 'L', i.e. L4+3.
+o The 'L' prefix is designed for Lexa Pro, but can also work well for Baffin with 10 CUs.
+o Many 2GB Lexa Pro can't do L4+3 under Win, only L3+3. For max performance you should try Linux and L4+3.
+o For an overkill full range test, you should try all of 4+4,4+3,3+3,3+2,2+2 in four modes: X+Y,LX+Y,X-Y,LX-Y
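
The overkill full range test in the last bullet expands to 20 configs (five intensity pairs in four modes). The Python sketch below enumerates them; it is an illustration only, and the function name full_range_candidates is made up for this example.

```python
from itertools import product

# Intensity pairs and modes exactly as listed in the last bullet above.
INTENSITY_PAIRS = [(4, 4), (4, 3), (3, 3), (3, 2), (2, 2)]
MODES = [("", "+"), ("L", "+"), ("", "-"), ("L", "-")]  # X+Y, LX+Y, X-Y, LX-Y


def full_range_candidates():
    """Return every config string for the full range test (hypothetical helper)."""
    return [
        f"{prefix}{x}{tweak}{y}"
        for (x, y), (prefix, tweak) in product(INTENSITY_PAIRS, MODES)
    ]


if __name__ == "__main__":
    configs = full_range_candidates()
    print(len(configs), "configs to try:")
    print(" ".join(configs))
```

Each string can then be tried in turn, one config per run or via the manual tuning menu.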
+
+[Vega cards]
+The Vegas can end up anywhere from 1900-2450 h/s depending on if it's a 56 or 64, a reference card or not and your choice of clocks and modded mem timings. With the AMD Memory Tweak tool, we have an additional dimension to play with, and therefore this section has been expanded into a separate Vega tweaking document with full examples of how to bring different Vegas to their maximum potential with this miner. If you only want a quick overview of the interesting configurations for a Vega: try 14+14, 14-14, 14*14, 15+15, 15*15, 16+14, 16*14. You can also try 16+15, 16*15, 15+14, 15*14, etc. The mem clk is very important, and you should aim for as high as possible while keeping your rig stable. If you have a Vega 64 and don't mod your timings, a higher core clk will have a significant effect on the hashrate, but tweaking timings is much more efficient. The 16+14 configuration will often not show its true capability before hitting 1500 core clk. Your power draw should still stay reasonable (as in lower than other miners at more standard clocks). For a lower core clk around 1408, some cards do best with 16+14, others with 15+15, some with 14+14, YMMV. Again, with modded timings you can keep your core clk around 1408 and still hit a very high hashrate.
+
+[Older cards]
+We're sorry, we only support 470-580, 550/560 and Vega cards. There are reports of people successfully running the miner on Fiji and Tonga cards (R9 290X etc), but we do not test on said devices.
+
+Benchmark results (CNv8 results, somewhat outdated but still indicative)
+------------------------------------------------------------------------
+For most Polaris cards below, one-click Pimp My Straps in SRB Polaris Bios Editor has been used for mem straps.
+
+6 x Rx 470 8GB (Samsung mem) rig
+8+8, 1250/900 cclk, 2000/900 mclk, 6105 h/s, total rig 685W
+
+Rx 560 4GB (Samsung mem)
+4+4, 1230/900 cclk, 2050/900 mclk, 540 h/s, unknown power draw
+
+Rx 570 8GB (Samsung mem)
+8+8, 1270/900 cclk, 2100/900 mclk, 1030 h/s @ ~100W at wall
+
+Rx 580 8GB (Hynix mem)
+8+8, 1250/900 cclk, 2000/900 mclk, 1029 h/s @ ~105W at wall
+
+Vega 56 reference card (56 bios, ppt mod)
+16+14, 1413/880 cclk, 935/880 mclk, ~2000 h/s @ ~197W at wall
+
+Vega 64 liquid cooling
+15+15, 1408/880 cclk, 1100/880 mclk, ~2100 h/s @ ~190W(?) at wall
+16+14, 1560/925 cclk, 1100/880 mclk, ~2270 h/s @ ~210W(?) at wall
+
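
To compare the results above on efficiency rather than raw hashrate, hashes per watt can be computed directly from the listed figures. The Python sketch below is illustrative only: the helper name is hypothetical, the wattages are the approximate at-the-wall values quoted above, and the Rx 560 and Vega 64 entries are left out because their power draw is unknown or marked uncertain.

```python
# (label, hashrate in h/s, power in W) taken from the benchmark list above.
BENCHMARKS = [
    ("6 x Rx 470 8GB rig", 6105, 685),
    ("Rx 570 8GB", 1030, 100),
    ("Rx 580 8GB", 1029, 105),
    ("Vega 56 reference", 2000, 197),
]


def hashes_per_watt(hashrate, watts):
    """Efficiency in h/s per watt (hypothetical helper)."""
    return hashrate / watts


if __name__ == "__main__":
    for label, hashrate, watts in BENCHMARKS:
        print(f"{label:20s} {hashes_per_watt(hashrate, watts):5.2f} h/s per W")
```

Note that the rig figure is a total for six cards, while the other entries are per card, so only the ratios are comparable.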