obscurity divers
exploring the trackless depths of the net where the wild
hidden things are
By Chris Koeritz
i've been using a search engine a lot these days. i think we all have. and i've
noticed an odd thing: some questions are harder to get good answers to than others.
perhaps that seems really obvious, since some topics are harder to comprehend than others, and
search engines don't have flawless language parsing skills yet, and so on. but the
weaknesses i'm seeing in the search results don't quite seem grounded in how difficult the
question was, or in the knowledge base assumed by the question. it seems instead that
there are occasionally topics that are just harder to answer by automated means.
if that were true, why would it be so?
using a search engine can still be a frustrating experience for some questions, for
potentially numerous reasons... the information may simply not be visible to the search
engine, yet still exist hidden away in the dark parts of the net. or the search could be
blocked by mundane issues, such as when the terms in the query are far more popular in other
usages, so that the search is hijacked by pages and pages of trivia before any result that
might usefully answer it. or maybe the answer isn't there at all, anywhere on the
internet, even though the subject is still known to humans, some of whom even know the
answer (that is, the internet doesn't know everything yet, and never will).
of course there are topics still unknowable to us tiny humans, like "where did the first atom
come from and how do i get there on a map?", or "how many intelligent species have existed in
the universe so far?". and of course some questions cannot be answered because they
really don't make a lot of sense, such as "how blue is my turnip brain tomorrow
yesterday?". but i posit that there are whole classes of questions that are totally
sensible but still don't yield good search results; i think everyone has seen an example of
this.
so, i thought to myself, i thought, why not take more notes when this breakdown in a
search happens, and write down what i'm looking for and how it's missing (what came up
instead, what i actually needed to see, etc)... and then post this information in
some kind of spooling "couldn't find it" blog. it also seems really important to
update this record if i do find the information later, by whatever means, even if i needed to
consult the "real" world. if one posts this "missing data" record on a blog or
website, and the record is indexed by search engines, then my "hard to search for" question is
also documented.
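the note-keeping workflow above could be sketched as a tiny record format. here's a minimal sketch in python; the `SearchFailure` class and all its field names are my own invented illustration for this post, not any real tool:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SearchFailure:
    """one "couldn't find it" record for the spooling blog."""
    query: str        # what was typed into the search engine
    wanted: str       # what the searcher was actually hoping to find
    got_instead: str  # what the results page showed instead
    resolution: Optional[str] = None  # filled in later if the answer turns up

    def resolve(self, answer: str) -> None:
        """update the record once the answer is found, even offline."""
        self.resolution = answer

# example record: logged at failure time, resolved later
entry = SearchFailure(
    query="obscure widget frobnication",
    wanted="instructions for frobnicating the widget",
    got_instead="pages of trivia about a band named Widget",
)
entry.resolve("found it in a paper manual at the library")
```

the `resolution` field starting out empty is the whole point: the record is useful even before the answer exists, and gets updated in place when it does.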
if people really did this, the process wouldn't necessarily create useful reading material,
nor even usable search results. i probably wouldn't want to read someone else's failed
search terms... not unless they were into some freaky stuff, anyway. but this process is
one method for filling in the cracks in our knowledge with some electronic spackle, and
it's a way to expand our understanding to cover more topics, reaching everywhere the mind
traces. if we keep a record of the blank places in our online knowledge, maybe we can
eventually fill them in.
there, that's my silly obscurity divers meme. thank you very much for being willing to
rummage around in my old notes. come back any time.
and yes, i admit the title of this article made it seem more exciting than it really
was. that's marketing!
--fred
* maybe a special tag for this spooling page could provide the semantic information needed to
keep these things (spooled failure blogs) from showing up as the results of searches. we
don't, after all, want the search engine to substitute someone's failed search for an answer to
the search. the spooled failures are initially an absence of results, and then maybe
eventually a signpost to the results (when people actually go back and fill in the things
they found out. also unicorns!). so the semantic tag embedded in a spooled failure
page would indicate that the information there is purely connectivity, not a result in
itself. search engines could avoid showing any of these search failure pages as actual
search results, but the engine could still build a neural network of associations from the
topics listed on the page, in effect using the page as a hint about understanding more
specifics of the question or even what an expected result would be from such a query.
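a rough real-world approximation of such a tag already exists in the robots meta convention, which can tell engines not to list a page while still letting them crawl its links; the second tag below is a purely hypothetical custom marker of my own invention, sketching what a dedicated failure-record signal might look like:

```html
<!-- real convention: don't show this page as a result, but still follow its links -->
<meta name="robots" content="noindex, follow">
<!-- hypothetical marker: this page is a spooled search-failure record -->
<meta name="search-failure-record" content="true">
```

the hypothetical tag would only become meaningful if search engines agreed to treat such pages as connectivity hints rather than results, as described above.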
and then the usual problems of accurately weighting the informational value of these pages
would ensue. perhaps community-ranking aficionados would jump into their meta search
engine managers and rate the value of these interstitial pages, thus providing some feedback
about the quality of each search failure page.
** hmmm, a lot of this work could be done by the search engines themselves, if they offered a
very flexible rating system for how well they answered the search. if the search engine
itself were to take the comments about what was really sought and how it was missing, it could
chew on these things and come back at a later time with other proposals for the search, and
ask the user if these are any better. if the users would put up with it, and rate their
searches somewhat honestly, then the search engine could actually get smarter all the time, or
at least get to know the tricksy humans a lot better. so, google, where's the "rate this
search" button on your search interface, feeding human responses to a giant AI that tries
to figure out what results are missing from the overall goog database, eh?