I don't think that an LLM could do it. But the mechanics of Baba Is You should be in the training set, since it's a relatively well-known indie game.

I feel like the amount of data required to train any neural network would be larger than all the levels that currently exist for Baba Is You.

You'd probably just end up overfitting the hell out of your model.

But the mechanics are explained in text on the internet.

That would require an LLM, then. But multiple full walkthroughs are also explained in text on the internet, so how would you be sure it was figuring things out by itself?

As I said: I don't think an LLM could do it (since LLMs can't reason). Just saying that it wouldn't have to deduce the mechanics from a single screenshot.

I'm saying that if you're attempting to parse the mechanics of play by shoving in the whole internet and saying "well the instructions are in there somewhere" then the best tool for that is an LLM.