Journey to a faster Gear Optimizer

Princeps@sopuli.xyz to Guild Wars 2@lemmy.wtf
parallelization.discretize-optimizer.pages.dev

Hey fellow Tyrian Lemmings,

my first post on Lemmy, hurray \o/ !
I have dedicated a significant amount of time to improving the performance of the Discretize Gear Optimizer. It now supports multi-core processing for calculations and heuristics, enabling it to evaluate large numbers of Rune, Sigil, and Food combinations simultaneously!

What is the Gear Optimizer? In short, it runs damage calculations for specific in-game scenarios to determine the ideal gear combinations.

Unfortunately, due to known circumstances I am unable to provide links to previous Reddit posts with beginner-friendly details and explanations. However, there are a few noteworthy videos I'd like to mention:

You can access the work-in-progress version here: https://parallelization.discretize-optimizer.pages.dev/?m=fractals
The code is freely available here: https://github.com/discretize/discretize-gear-optimizer/tree/parallelization

The details are technical, so I don't expect individuals without a background in computer science or coding to fully understand them.

Details

Previously, the Optimizer used a single-threaded approach that exhaustively enumerated all possible combinations, calculated the damage/survivability/healing values, and stored the top 50 results in memory. This blocked the main thread, resulting in a UI update rate of only 15 FPS whenever a progress update was yielded. Far from optimal.
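
For illustration, here is a minimal TypeScript sketch of that old approach (all names are made up for the example, not the optimizer's actual API): everything runs in one loop on the main thread, and only the best 50 results are kept.

```ts
// Illustrative sketch of the old single-threaded search (names are made up,
// not the optimizer's actual API): enumerate every combination on the main
// thread, score it, and keep only the best 50 results in memory.
type Result = { affixes: string[]; score: number };

function exhaustiveSearch(
  combinations: Iterable<string[]>,
  score: (affixes: string[]) => number,
  keep = 50,
): Result[] {
  const best: Result[] = [];
  for (const affixes of combinations) {
    best.push({ affixes, score: score(affixes) });
    // Keep the list sorted and trimmed to `keep` entries; since this loop
    // never yields, the UI thread is starved of frames while it runs.
    best.sort((a, b) => b.score - a.score);
    if (best.length > keep) best.length = keep;
  }
  return best;
}
```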

The new approach incorporates three interesting concepts that significantly speed up the calculations:

  • Usage of WebWorkers (multiple threads)
  • Core calculations in Rust compiled to WebAssembly
  • Heuristics for removing dead-weight combinations that are unlikely to be useful

Various useful options can be adjusted in the optimizer's UI settings.

WebWorkers

Currently, deploying WebWorkers is the most widely available method for websites to utilize additional hardware threads. The old algorithm was designed in a way that prevented parallel processing. Thus, I had to devise a method to distribute the workload among WebWorkers. Each work chunk must be independent to avoid synchronization overhead.
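
As a rough sketch of what that looks like in TypeScript (the worker file name and message shapes are assumptions, not the optimizer's actual code), each worker receives its own independent set of chunks and reports back once it is done:

```ts
// Rough sketch of fanning independent chunks out to WebWorkers (worker file
// name and message shapes are assumptions, not the optimizer's actual code).
const threadCount = navigator.hardwareConcurrency ?? 4;
const workers = Array.from(
  { length: threadCount },
  () => new Worker(new URL('./calc.worker.ts', import.meta.url), { type: 'module' }),
);

function runChunks<TChunk, TResult>(chunks: TChunk[]): Promise<TResult[]> {
  return Promise.all(
    workers.map(
      (worker, i) =>
        new Promise<TResult>((resolve) => {
          worker.onmessage = (e: MessageEvent<TResult>) => resolve(e.data);
          // Round-robin assignment: worker i evaluates every threadCount-th
          // chunk and only reports back when it has finished all of them.
          worker.postMessage(chunks.filter((_, j) => j % threadCount === i));
        }),
    ),
  );
}
```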

To address this, I introduced the concept of an "Affix tree." Each Gear Slot can accept different affixes. For example, the helm can be Assassin, Berserker, or Dragon, resulting in three valid combinations. When these three choices are added to the next level (Shoulders), we end up with 3 * 3 = 9 combinations. By repeating this process for all 14 gear slots, we end up with 3^14 = 4,782,969 combinations. This may not sound too overwhelming, but let's reconsider it with 5 affixes: 5^14 = 6,103,515,625. Quite spicy, indeed.

However, there are even more combinations for extras. Assuming one rune candidate, one candidate for each of the two sigil slots, two nourishment candidates, and two enhancement candidates, we get 1 * 1 * 1 * 2 * 2 = 4 extras combinations. For 5 affixes, this amounts to 6,103,515,625 * 4 = 24,414,062,500 combinations in total. Although each individual combination is quick to calculate, the total time adds up when dealing with billions of them. Hence the need to crunch these numbers on multiple cores simultaneously.
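
The arithmetic from the last two paragraphs as a tiny snippet, using only the example values from above:

```ts
// Combination counts from the text above (example values only).
const gearSlots = 14;

// Gear combinations grow exponentially with the number of allowed affixes.
const gearCombinations = (affixCount: number) => affixCount ** gearSlots;
const threeAffixes = gearCombinations(3); // 4,782,969
const fiveAffixes = gearCombinations(5); // 6,103,515,625

// Extras multiply on top: 1 rune * 1 sigil1 * 1 sigil2 * 2 nourishments * 2 enhancements.
const extrasCombinations = 1 * 1 * 1 * 2 * 2; // 4

// Total work for 5 affixes: 6,103,515,625 * 4 = 24,414,062,500 combinations.
const total = fiveAffixes * extrasCombinations;
```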

By utilizing the affix tree concept, we can divide the work into multiple chunks. Each layer of the tree contains (subtrees in the previous layer) * (affix choices at the current layer) subtrees. We can simply assign a number of subtrees to each thread, allowing for independent evaluation. In the end, we merge the best results from each subtree to find the global maximum. Subtree evaluation employs a depth-first search, since memory allocations must be kept to a minimum when the traversal runs potentially trillions of times.
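
Here is a simplified TypeScript sketch of the idea (data shapes and names are assumptions): enumerate the subtree roots down to some split depth, hand them out as chunks, and walk each subtree depth-first while reusing a single buffer.

```ts
// Simplified sketch of splitting the affix tree (data shapes are assumptions):
// `affixOptions[slot]` lists the valid affixes for each of the 14 gear slots.
type Affix = string;

// Every prefix of length `splitDepth` is the root of an independent subtree
// that can be handed to a worker as one chunk.
function subtreeRoots(affixOptions: Affix[][], splitDepth: number): Affix[][] {
  let roots: Affix[][] = [[]];
  for (let slot = 0; slot < splitDepth; slot++) {
    roots = roots.flatMap((prefix) => affixOptions[slot].map((affix) => [...prefix, affix]));
  }
  return roots;
}

// Depth-first walk of one subtree. A single buffer is reused for the current
// combination so the traversal allocates (almost) nothing per leaf.
function evaluateSubtree(
  root: Affix[],
  affixOptions: Affix[][],
  onLeaf: (gear: readonly Affix[]) => void,
): void {
  const current: Affix[] = [...root];
  const walk = (slot: number): void => {
    if (slot === affixOptions.length) {
      onLeaf(current); // complete gear combination reached
      return;
    }
    for (const affix of affixOptions[slot]) {
      current[slot] = affix;
      walk(slot + 1);
    }
  };
  walk(root.length);
}
```

With 5 affixes per slot, a split depth of 2 already yields 25 independent chunks that could be distributed like in the WebWorker sketch above.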

Rust / WebAssembly

JavaScript can be slow, especially when code is not optimized. Rust, on the other hand, forces you to think explicitly about memory allocation. The Rust implementation is compiled to WebAssembly (WASM), a form of bytecode, similar to JVM bytecode, that nearly all browsers can execute. Initially, I benchmarked a barebones implementation that traverses the affix tree without performing any calculations and found the Rust version to be significantly faster. This gave me hope. The new Rust implementation appears to be between 2x and 5x faster than the JS implementation when running on a single thread, depending on the machine and the specific problem. Moreover, when adding more threads, the performance gains scale nearly linearly.
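
Conceptually, each worker just instantiates the WASM module and calls into it. A hypothetical worker-side glue file might look like this; the module path and the exported `calculate` function are placeholders, while the actual bindings are generated by wasm-bindgen/wasm-pack.

```ts
// Hypothetical worker-side glue for the WASM core. The module path and the
// exported `calculate` function are placeholders; wasm-bindgen / wasm-pack
// (web target) generate an async `init` plus the exported Rust functions.
import init, { calculate } from './pkg/gear_optimizer_wasm';

self.onmessage = async (e: MessageEvent) => {
  await init();                   // compile/instantiate the WebAssembly module
  const best = calculate(e.data); // run the Rust depth-first search over this worker's chunk
  self.postMessage(best);         // send the top results back to the main thread
};
```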

Heuristics

During discussions with some members of the GW2 development Discord (special thanks to Greaka), I realized that computing every combination of extras (sigils, runes, food) is unnecessary. Around 95% of the extras combinations are likely irrelevant. To address this, I implemented a benchmarking phase in which 1000 random affix combinations are tested for each extras combination. We retain the 1000 best results and calculate how often each extras combination appears among them. As it turns out, the benchmark quickly converges to the optimal extras combination. Each run has a slight variance of 2-3%, which is not ideal but sufficient to discard extras combinations that have 0 appearances in the top 1000 results. This is also why no progress appears right after clicking "Calculate": progress is currently only reported for the actual calculation phase, not the benchmarking phase.
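
A simplified sketch of that benchmarking phase (names and signatures are illustrative, not the actual implementation):

```ts
// Sketch of the benchmarking heuristic (all names illustrative): score a fixed
// number of random affix combinations for every extras combination, keep the
// global top 1000 samples, and drop every extras combination that never appears.
function selectPromisingExtras<E>(
  extrasCombinations: E[],
  randomAffixCombination: () => string[],
  score: (affixes: string[], extras: E) => number,
  samplesPerExtras = 1000,
  keep = 1000,
): E[] {
  const scored: { extras: E; value: number }[] = [];
  for (const extras of extrasCombinations) {
    for (let i = 0; i < samplesPerExtras; i++) {
      scored.push({ extras, value: score(randomAffixCombination(), extras) });
    }
  }
  // Keep only the best `keep` samples overall, then count how often each
  // extras combination appears among them.
  scored.sort((a, b) => b.value - a.value);
  const appearances = new Map<E, number>();
  for (const { extras } of scored.slice(0, keep)) {
    appearances.set(extras, (appearances.get(extras) ?? 0) + 1);
  }
  // Extras combinations with zero appearances are discarded from the full run.
  return extrasCombinations.filter((extras) => (appearances.get(extras) ?? 0) > 0);
}
```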

Limitations

Currently, the Rust implementation lacks numerous features:

  • Stopping/resuming the calculation is not possible
  • Infusions are not yet calculated
  • Displaying the best results for each combination (as an option in the result table) is not feasible since we don't calculate all combinations.
  • Mobile compatibility may vary

Going further

I plan to implement infusion calculations and the stop/resume mechanism, and to bring the overall UX up to par with the JS implementation.

Additionally, it should be possible to use a regression model to directly calculate the optimal gear without brute-forcing. I have pondered this idea but couldn't come up with the required mathematical models, as my expertise in this area is limited. If anyone with a background in ML/math is interested in tackling this challenge, please let me know. I would be more than happy to discuss and implement ideas.

Let me know what you think. Maybe you find a bug or two :)

I am of course available for any questions.

Thank you for reading :)
