Avoid explore-exploit


Estimate value

Why do we use the “explore-exploit” dichotomy when there is much better language to describe our goals?

We often look at the “explore” half of explore-exploit as collecting new tools (e.g. discovering new slot machines). As long as you’re only collecting new tools rather than using them, you’re exploring.

It’s easy to analogize this to the training vs. inference dichotomy. When you’re training, you’re exploring (training is often analogized to exploring via gradient descent or ascent). When you’re running inference (no weight updates), you’re presumably only exploiting.

The issue is that the world is not this binary. Consider what the distinction between “explore” and “exploit” is in the context of searching the web; see Search the web. Is reading the “exploit,” relative to the “explore” of collecting a list of documents to consider? Most would consider reading exploratory as well.

Instead, we might define “explore” and “exploit” relative to how novel the content we are looking at is. If we’re doing a web search for a new term, it’s more explore. If we’re doing a web search for something we already know how to do (e.g. look up an article we’ve already read), it’s exploit. If we’re reading an article with many concepts we’ve never seen, it’s explore. If we’re reading an article with many concepts we’ve seen before, some of them recently, it’s exploit. In producing a list of web articles, favoring those with language we’re already familiar with would be leaning towards exploitation rather than exploration.
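To make the novelty-based definition concrete, here is a minimal sketch under a loud assumption: a document’s novelty is approximated by the fraction of its distinct words that are not already in a reader’s known vocabulary. The functions `terms`, `novelty`, and `rank_by_novelty` are hypothetical illustrations, not from any library, and treating words as “concepts” is a crude stand-in. Note that the score is a continuous value between 0 and 1 rather than a two-way label.

```python
import re


def terms(text: str) -> set[str]:
    """Distinct lowercased words; a crude (assumed) stand-in for 'concepts'."""
    return set(re.findall(r"[a-z']+", text.lower()))


def novelty(text: str, known: set[str]) -> float:
    """Fraction of distinct terms in `text` not in `known`, from 0.0 to 1.0."""
    doc_terms = terms(text)
    if not doc_terms:
        return 0.0
    return len(doc_terms - known) / len(doc_terms)


def rank_by_novelty(
    docs: dict[str, str], known: set[str], familiar_first: bool = True
) -> list[tuple[str, float]]:
    """Score every document, then sort by novelty (most familiar first by default)."""
    scored = [(title, novelty(body, known)) for title, body in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=not familiar_first)


# Hypothetical vocabulary and documents, for illustration only.
known = terms("gradient descent updates the weights during training")
docs = {
    "review": "training updates the weights with gradient descent",
    "new topic": "the cynefin framework sorts problems into domains",
}

for title, score in rank_by_novelty(docs, known):
    print(f"{title}: {score:.2f}")
# review: 0.14
# new topic: 0.86
```

Passing `familiar_first=False` reverses the ordering. Either way, the ranking expresses degrees of familiarity, which a binary explore/exploit label flattens.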

Other times we use “explore” to mean searching for value (not the same as learning). Finding an article that is “valuable” to us means that we’ve found an article that will help us achieve something valuable; it’s not valuable in itself. We could define the state of having an article that helps us achieve a goal as valuable, but most would say it only has instrumental value.

Sometimes we say that to focus on something is to explore it (e.g. digging in deep), but other times narrowing our focus means exploiting (taking advantage of a good article we found).

Learning something new sounds like the epitome of novelty, but it takes focused work (discipline, directed research), which is not necessarily something anyone would associate with exploration. Whenever we learn something new, we must build from everything we already know, i.e. use only our existing language. Doing so may require reading an article built from concepts we already understand rather than one with many new words, which would require more work to fill in the transitive dependencies.

In general, explore and exploit are not so strictly divided (so binary). Sometimes what we are doing is more explorative than what we were just doing, yet still not as explorative as it could be. It’s hard to turn explore and exploit into graded terms; saying something is “more explorative” is rather long-winded.

The language of a “domain” is also rather imprecise; it may be better to switch to tasks. You can fit many tasks into a particular domain (such as “Clear” in the Cynefin framework), but each task may also have aspects that fit into other domains. Tasks are, or at least can be, more precisely defined, so it’d be better to reduce the language to them.

In general, the language of explore-exploit is too imprecise.

Design test

Stop using “explore-exploit” in your own language or notes.

Estimate cost

It’s easy to limit your vocabulary; you should eventually forget this word pair by simply not using it.