Proof:
Ethereum Stock
Wolf's Ethereum Miner
For those of you at work, it's roughly a 9% increase in performance: 28 MH/s (stock) vs 31 MH/s (Wolf).
This was also done with the stock VBIOS on the cards - not modified.
Features:
* LDS Usage Eliminated:
The original Ethash kernel made heavy use of LDS (local data share). Removing it lowers latency and, in theory, reduces power consumption.
* Compact:
Fits in the 32 KiB code cache - currently at 21,752 bytes.
* VGPR Count Reduction:
More waves can be kept in flight than with the original. Genoil's optimized miner could only achieve three waves because it used around 80 VGPRs (of 256 total). Wolf's Ethereum Miner uses exactly 64, which allows four.
* Cycle Reduction:
While cycle count matters less in some algorithms (GCN masks the latency with other wavefronts), reducing it still provides a modest increase in performance.
* L1 Cache Bypassed on DAG Reads:
Since DAG accesses are already scattered, there is no sense in caching them. The original Ethash kernel was not able to set the bit in the load instruction that controls this.
Currently, only available for Hawaii - others can be done upon buyer's request.
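To make the register-count and code-size figures in the feature list concrete, here is a rough back-of-the-envelope sketch. It is not part of the miner. The function names are made up for illustration, and the limit of 10 resident waves per SIMD is an assumption about GCN hardware rather than something stated in this post. It also ignores register allocation granularity.

```python
# Back-of-the-envelope occupancy and code-size check (illustrative only).
# Assumptions: 256 VGPRs available per SIMD lane (the "256 total" quoted above),
# at most 10 resident waves per SIMD, and a 32 KiB code cache.
VGPRS_TOTAL = 256
MAX_WAVES_PER_SIMD = 10
CODE_CACHE_BYTES = 32 * 1024


def waves_per_simd(vgprs_per_wave: int) -> int:
    """Waves that fit in the register file, capped by the assumed hardware limit."""
    if vgprs_per_wave <= 0:
        raise ValueError("vgprs_per_wave must be positive")
    return min(MAX_WAVES_PER_SIMD, VGPRS_TOTAL // vgprs_per_wave)


def code_cache_usage(kernel_bytes: int) -> float:
    """Fraction of the code cache taken up by a kernel of the given size."""
    return kernel_bytes / CODE_CACHE_BYTES


if __name__ == "__main__":
    print(waves_per_simd(80))                 # around 80 VGPRs -> 3 waves
    print(waves_per_simd(64))                 # exactly 64 VGPRs -> 4 waves
    print(f"{code_cache_usage(21752):.1%}")   # 21,752 bytes -> 66.4% of 32 KiB
```

Dropping from about 80 to 64 registers is what moves the kernel from three waves to four, and 21,752 bytes leaves roughly a third of the code cache free.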
Price: Available upon request - price is dependent on many factors. Interested farm owners can contact OhGodAGirl or Wolf0
Note: NVIDIA cannot be supported to the same degree, because their instruction set architecture is not documented. For the correct price, minor optimizations can be attempted, but buyers must understand that the tools are limited to begin with.
Special Thanks: Thanks to Genoil, who made a better ethminer to start with by improving the host code, which I based mine on. I modified his code to load my custom GPU binary (entirely new code for the GPU, only loaded by the ethminer).