September 15th 2016 to January 22nd 2019

obscurity divers

exploring the trackless depths of the net where the wild hidden things are

By Chris Koeritz

i've been using a search engine a lot these days.  i think we all have.  and i've noticed an odd thing: some questions are harder to get good answers for than others.  perhaps that seems really obvious, since some topics are harder to comprehend than others, search engines don't have flawless language parsing skills yet, and so on.  but the weaknesses i'm seeing in the search results don't quite seem grounded in how difficult the question is, or in the knowledge base it assumes.  it seems instead that there are occasionally topics that are just harder to answer by automated means.

if that were true, why would it be so? 

using a search engine can still be a frustrating experience for some questions, for potentially numerous reasons...  the information may simply not be visible to the search engine, yet still exist, hidden away in the dark parts of the net.  or the search could be blocked by mundane issues, such as when terms in the query are way more popular in other usages, so that one's search is hijacked by pages and pages of trivia before any result that might usefully answer it.  or maybe the answer isn't there at all, anywhere on the internet, even though the subject is still known to humans, some of whom even know the answer (that is, the internet doesn't know everything yet, and never will).

of course there are topics still unknowable to us tiny humans, like "where did the first atom come from and how do i get there on a map?", or "how many intelligent species have existed in the universe so far?".  and of course some questions cannot be answered because they really don't make a lot of sense, such as "how blue is my turnip brain tomorrow yesterday?".  but i posit that there are whole classes of questions that are totally sensible but still don't yield good search results; i think everyone has seen an example of this.

so, i thought to myself, i thought, why not try to take more notes when this breakdown in a search happens, and write down what i'm looking for and how it's missing (what came up instead, what i actually needed to see, etc)...  and then what if i posted this information in some kind of spooling "couldn't find it" blog?  it also seems really important to update the record if i do find the information later, via whatever means, even if i had to consult the "real" world.  if this "missing data" record goes up on a blog or website, and gets indexed by search engines, then my "hard to search for" question is at least documented.

if people really did this, the process wouldn't necessarily create useful reading material, nor even usable search results.  i probably wouldn't want to read someone else's failed search terms... not unless they were into some freaky stuff, anyway.  but this process is one method for filling in the cracks in our knowledge with some electronic spackle, and it's a way to expand our understanding to cover more topics, reaching everywhere the mind traces.  if we keep a record of the blank places in our online knowledge, maybe we can eventually fill them in.

there, that's my silly obscurity divers meme.  thank you very much for being willing to rummage around in my old notes.  come back any time.
and yes, i admit the title of this article made it seem more exciting than it really was.  that's marketing!
--fred

* maybe a special tag on each spooling page could provide the semantic information needed to keep these things (spooled failure blogs) from showing up as the results of searches.  we don't, after all, want the search engine to substitute someone's failed search for an answer to the search.  the spooled failures are initially an absence of results, and then maybe eventually a sign-post to the results (when people actually go back and fill in the things they found out.  also unicorns!).  so the semantic tag embedded in a spooled failure page would indicate that the information there is purely connectivity, not a result in itself.  search engines could avoid showing any of these search failure pages as actual search results, but the engine could still build a neural network of associations from the topics listed on the page, in effect using the page as a hint about the specifics of the question, or even about what an expected result would look like for such a query.  and then the usual problems of accurately weighting the informational value of these pages would ensue.  perhaps community-minded aficionados would jump into their meta search engine managers and rate the value of these interstitial pages, thus providing some feedback about each search failure page's "quality".
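for what it's worth, part of this mechanism already exists: the standard "robots" meta tag can tell an engine not to index a page while still following its links, which is pretty close to the "purely connectivity, not a result" behavior.  a sketch of what a spooled failure page's head might look like, where the second tag is entirely hypothetical (invented name and value, not any real convention):

```html
<head>
  <title>things i couldn't find</title>
  <!-- existing convention: keep this page out of search results,
       but still let crawlers follow the links on it -->
  <meta name="robots" content="noindex, follow">
  <!-- hypothetical semantic marker: this page is connectivity between
       topics, not an answer; "unresolved" would flip to "resolved"
       once the missing information is filled in -->
  <meta name="spooled-failure" content="unresolved">
</head>
```

the "robots" directive only solves the don't-show-this-as-a-result half; the hypothetical tag is what would let an engine deliberately mine the page for associations instead of merely ignoring it.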

** hmmm, a lot of this work could be done by the search engines themselves, if they offered a very flexible rating system for how well they answered a search.  if the search engine itself took in comments about what was really sought and how the results missed it, it could chew on these things and come back at a later time with other proposals for the search, and ask the user if those are any better.  if users would put up with that, and rate their searches somewhat honestly, then the search engine could actually get smarter all the time, or at least get to know us tricksy humans a lot better.  so, google, where's the "rate this search" button on your search interface, feeding human responses to a giant AI that tries to figure out what results are missing from the overall goog database, eh?
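if such a button existed, the record it feeds could be as simple as a query, a rating, and a note about what was actually wanted.  a toy sketch in python (all the names, the 1-to-5 scale, and the 2.5 threshold are my own inventions, not anything a real engine does):

```python
from dataclasses import dataclass


@dataclass
class SearchFeedback:
    """one user's rating of how well a search was answered."""
    query: str
    rating: int   # 1 (useless) .. 5 (answered perfectly)
    note: str = ""  # what was really sought, and how the results missed


class FeedbackLog:
    """collects ratings and surfaces queries the engine keeps failing on."""

    def __init__(self):
        self.entries = []

    def rate(self, query, rating, note=""):
        self.entries.append(SearchFeedback(query, rating, note))

    def failing_queries(self, threshold=2.5):
        """queries whose average rating falls below the threshold --
        candidates for the engine to chew on and re-propose later."""
        totals = {}
        for e in self.entries:
            total, count = totals.get(e.query, (0, 0))
            totals[e.query] = (total + e.rating, count + 1)
        return [q for q, (total, count) in totals.items()
                if total / count < threshold]


log = FeedbackLog()
log.rate("obscure vacuum tube pinout", 1, "got guitar amp forums instead")
log.rate("obscure vacuum tube pinout", 2)
log.rate("python list comprehension", 5)
print(log.failing_queries())  # → ['obscure vacuum tube pinout']
```

the interesting part is what the engine would do with the notes; even this crude average is enough to flag which searches deserve a second look.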

[posted September 15 2016]