Conversation
|
This is a perfectly acceptable solution since you described how you built it. The fact that it does not learn is not a problem, but you are doomed to be sub-optimal in that case, since the optimal solution requires an initial phase of exploration followed by a pure exploitation phase. Note that this hand-crafted solution might be the base for a more advanced solution, and you can propose another entry. I'll merge and evaluate. Maybe I'll re-use the seed from the other entry to make things fair. |
|
Using seed 78, I got 13.71 ± 0.46 |
|
To answer your question about BG/HC/cerebellum/cortex, the goal is also to find out whether they are necessary, or whether we can find some original structure that solves the task optimally. |
|
Thank you. To perform optimally on this specific problem, it is enough to have a bias for going around the arena clockwise or anti-clockwise, and to start performing sharper turns that lead the agent through the mid-section once it has encountered the reward. Implementing the bias is easy. Going through the mid-section requires a little fiddling with a second, thresholded steering mechanism that draws you towards the next wall; to activate this second turning mechanism you just need a self-exciting neuron that can be switched on; and to detect when energy levels rise you need a delayed memory cell that computes the difference between the current and previous energy level. It takes about 20 non-zero weights to implement. But I don't feel that this would be a satisfying solution, because there is no learning at all: all the intelligence (learning and reasoning ability) lies with the developer. Francois Chollet's ARC challenge highlights this problem by using a private test set, so developers can't "meta-overfit" their approaches to the task (domain). With your challenge, I like that you have these hard constraints on network size / compute budget (something Chollet picked up too, after OpenAI kind-of brute-forced his challenge with o3 high). However, I think there should be a little more freedom in network design to encourage more complex solutions, e.g., having multiple leak rates / neuron models and allowing synaptic short-term plasticity. The unconstrained activation function already opens up many possibilities, but you don't get access to synaptic variables there. |
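To make the "self-exciting neuron plus delayed memory cell" idea concrete, here is a toy sketch in Python. All names, thresholds, and the exact update scheme are my illustrative assumptions, not the actual ~20 weights of the hand-crafted network:

```python
class RewardDetector:
    """Hypothetical sketch: a delayed memory cell detects a rise in the
    energy level (i.e., reward was found), and a self-exciting neuron
    latches that event so the second steering mechanism stays on."""

    def __init__(self):
        self.prev_energy = 0.0  # delayed memory cell: previous energy level
        self.latch = 0.0        # self-exciting neuron state

    def step(self, energy):
        delta = energy - self.prev_energy  # current minus previous energy
        self.prev_energy = energy
        # Once delta > 0 pushes the neuron above threshold, the
        # self-excitation (recurrent weight) keeps it switched on.
        self.latch = 1.0 if (self.latch > 0.5 or delta > 0.0) else 0.0
        return self.latch
```

The returned latch value would gate the second thresholded steering mechanism described above.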
|
Actually, you can have multiple leak rates if you use an array instead of a constant. As for the neuron model, the goal is really to check how far we can go with a single neuron model; if I allowed more exotic models, there would be a risk of solving the task inside the neuron instead of using the population. Still, you can do a lot with this kind of neuron. See for example A Robust Model of Gated Working Memory, figure 3, which implements a working memory using 3 neurons. To solve the task optimally, it is not sufficient to circle around: you really need to explore in order to find the source, and then exploit this knowledge to take the shortest path. If your model can implement this "algorithm", then you only need a working memory. |
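For concreteness, a per-neuron array of leak rates might look like the following minimal sketch. It assumes a standard leaky-integrator update of the form V ← (1 − leak)·V + leak·tanh(W·V + x); the actual update rule and parameter layout in the challenge code may differ:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
leak = rng.uniform(0.05, 1.0, size=n)   # per-neuron leak rates: an array, not a scalar
W = rng.normal(0.0, 0.5, size=(n, n))   # recurrent weights (illustrative values)
V = np.zeros(n)                         # neuron states
x = np.ones(n)                          # constant external input

for _ in range(100):
    # Leaky-integrator update: neurons with a small leak change slowly
    # (long memory), neurons with leak near 1 follow their input quickly.
    V = (1.0 - leak) * V + leak * np.tanh(W @ V + x)
```

Mixing slow and fast neurons in one population is exactly what makes structures like the 3-neuron gated working memory feasible.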
|
Yes, that is true. However, I didn't even consider using multiple leaks, since the wording in the challenge and the parameter specification seemed to rule this out. You can certainly do a lot with simple neurons; I'd only object that you potentially rule out many interesting and valuable solutions with this constraint. Technically, you can already do arbitrary computations in the activation function. Someone picked up the same idea I described for achieving an optimal score. I posted my solution for completeness' sake there: #8 (comment) |
|
I added your new solution and credited you and @snowgoon88 (is that right?) |
tl;dr I "hacked" the task with a simple heuristic. I may try the same with the next task, too.
Expected score: 13.5
I began with the idea of building a network inspired by the fly’s compass system---a ring attractor network that tracks heading direction (cf. https://doi.org/10.1038/s41593-024-01766-5). The plan was for the compass to integrate distance-sensor data during movement, allowing me to construct an allocentric representation that could then interface with a memory module.
While developing this network, I realized that the multiple processing stages required (at minimum: recurrent excitatory and inhibitory components + recurrently connected shifter populations), paired with the inherent delays of continuous-time neurons (with stable, i.e., slow dynamics), would introduce significant lag in steering. (Compensating for that lag would have required an additional cerebellum-like forward model. In a way, this provided a nice illustration of the kinds of problems faced by evolution.)
While refactoring my first draft, I noticed that you shouldn't take the full sensory input for immediate steering decisions, since you only need to avoid the wall closest to you. Moreover, only the left- and rightmost distance sensors are necessary for this. So I stripped away all of the complexity and replaced it with an extremely simple heuristic: threshold the outermost sensor inputs and steer away from whichever side reports the closer wall. The result is an agent that simply circles the outer lane of the arena. Because the resulting network has only two parameters, a very good agent can be found by random sampling.
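The heuristic fits in a few lines. The function name, sensor ordering, and default values below are my assumptions for illustration, not the challenge's actual interface:

```python
def steer(distances, threshold=0.35, gain=1.0):
    """Threshold the outermost distance sensors and steer away from
    whichever side reports the closer wall. Only two free parameters
    (threshold and gain), so random sampling finds good settings."""
    left, right = distances[0], distances[-1]
    turn = 0.0
    if left < threshold:    # wall close on the left -> turn right (negative)
        turn -= gain * (threshold - left)
    if right < threshold:   # wall close on the right -> turn left (positive)
        turn += gain * (threshold - right)
    return turn
```

With a suitable turning bias added on top, this keeps the agent circling the outer lane of the arena.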
At least for me, this raises the question of whether such a handcrafted solution will be considered valid for the challenge (I read the challenge as being primarily about learning, not handcrafting). Even if it is disregarded as a solution, I find it a useful contribution because it (a) may serve as a baseline for more advanced models and (b) shows the "true" complexity of the current task (i.e., how "hackable" it is), and could stimulate a discussion about what counts as a handcrafted / (in)valid entry. After all, this solution still respects the challenge’s “no more than 64 constants” rule. I could put a fancy optimizer on top of this network topology, but the discovered algorithm would remain essentially the same.
The original solution I envisioned would have been much more complex and would have required a lot of time for hyperparameter tuning and network design. Would it be a more positive outcome if I built a mini compass system plus hippocampus, cerebellum, and basal ganglia just to solve this simple task? I'd wager such a more complex solution would generalize to a new, unknown problem setting about as well as a Deep-Q network playing Breakout with the colors reversed.
I’m curious to see whether the next task will still be “hackable.” I feel it's a very tight balance between a task that is solvable with minimal resources and one that remains genuinely complex.
*I also built a slightly more complicated heuristic that adapts to the randomly chosen reward position, but since it cannot be effectively optimized via random guessing and seems to completely miss the point of the challenge, I omit it here.