Posts Tagged ‘search’
Healthcare Bill Will…
Curated Information: The Next Big Thing?
That is what Wolfram|Alpha hopes to become. Stephen Wolfram, the mind behind Mathematica and other projects, is aiming to deliver more relevant and accurate information than current search engines are capable of, according to a report by The Independent and Wolfram’s blog. Frequent readers of this space know that I have a strong desire to see information made more useful. The explosion of content being added to the already vast Null Information was begging for a service that can accomplish what Wolfram Alpha is presumed to be capable of; that is, making progress “towards what many consider to be the internet’s Holy Grail – a global store of information that understands and responds to ordinary language in the same way a person does.” That way, the quantity of information becomes irrelevant. Quality, as measured by relevance, will be king.
Indeed, the idea of combining the old and the new guard is what I find striking about the promise of this service. The internet provides an easily malleable platform that can bring diverse sets of resources to bear on the instant and collaborative sourcing of information. At the same time, there is the age-old model in which the value of information depends on its origin. Expertise had, and continues to have, a role to play. I believe that tackling the Null Information will require a division of labor in evaluating and pricing, if you will, information. To that end, the folks at Wolfram Alpha seem to get it. They hope to leverage the knowledge of experts in curating information. In this type of environment, where the old and the new are harmoniously integrated, the notion of publishing will continue to evolve and thrive.
The other interesting evolution is in the ability of machines to interact with humans. Humanity’s attempt at taking baby steps towards the broad proliferation of artificial intelligence lives on…
An excerpt of The Independent report:
The new system, Wolfram Alpha, showcased at Harvard University in the US last week, takes the first step towards what many consider to be the internet’s Holy Grail – a global store of information that understands and responds to ordinary language in the same way a person does.
Although the system is still new, it has already produced massive interest and excitement among technology pundits and internet watchers.
Computer experts believe the new search engine will be an evolutionary leap in the development of the internet. Nova Spivack, an internet and computer expert, said that Wolfram Alpha could prove just as important as Google. “It is really impressive and significant,” he wrote. “In fact it may be as important for the web (and the world) as Google, but for a different purpose.”
Tom Simpson, of the blog Convergenceofeverything.com, said: “What are the wider implications exactly? A new paradigm for using computers and the web? Probably. Emerging artificial intelligence and a step towards a self-organising internet? Possibly… I think this could be big.”
Wolfram Alpha will not only give a straight answer to questions such as “how high is Mount Everest?”, but it will also produce a neat page of related information – all properly sourced – such as geographical location and nearby towns, and other mountains, complete with graphs and charts.
The real innovation, however, is in its ability to work things out “on the fly”, according to its British inventor, Dr Stephen Wolfram. If you ask it to compare the height of Mount Everest to the length of the Golden Gate Bridge, it will tell you. Or ask what the weather was like in London on the day John F Kennedy was assassinated, it will cross-check and provide the answer. Ask it about D sharp major, it will play the scale. Type in “10 flips for four heads” and it will guess that you need to know the probability of coin-tossing. If you want to know when the next solar eclipse over Chicago is, or the exact current location of the International Space Station, it can work it out.
Dr Wolfram, an award-winning physicist who is based in America, added that the information is “curated”, meaning it is assessed first by experts. This means that the weaknesses of sites such as Wikipedia, where doubts are cast on the information because anyone can contribute, are taken out. It is based on his best-selling Mathematica software, a standard tool for scientists, engineers and academics for crunching complex maths.
“I’ve wanted to make the knowledge we’ve accumulated in our civilisation computable,” he said last week. “I was not sure it was possible. I’m a little surprised it worked out so well.”
The Art of Building a Successful Social Site
Here is an insightful and impressive article on the ingredients necessary for building a successful venture in the social media space. It concerns Stack Overflow, a free question-and-answer site built by developers for developers that has fostered a strong and committed online community in under one year. It was founded by Joel Spolsky and Jeff Atwood in an attempt to fill a void that search engines have not successfully addressed. It is the perfect example of a service built to reach into the vast and largely unexplored Null Information. Below is an excerpt from the article:
Why Search Engines are Failing when it Comes to Collaborative Sites
According to Spolsky, there are certain reasons why search engines are failing when it comes to Q&A sites, and they are the same issues Stack Overflow is trying to solve.
- Sign-up scams: Sites that a search engine may send you to where you must first sign up and pay, if you want an answer.
- Register: A “road bump” that many sites have, and one Spolsky thinks reduces participation dramatically
- Wrong answers: When searching for highly technical questions, a search engine may send you to a forum that has multiple answers. If you are unsure which answer is the correct one, you waste too much time working through the wrong ones.
- Obsolete results: Google, for instance, will oftentimes give an older page priority. In turn, the page you are served is often outdated and no longer relevant.
How did Stack Overflow address this issue? By applying and implementing what they call “The Nine Building Blocks of Social Engineering” in an effort to create a site that was anthropologically correct and would encourage people to behave in a way that would work.
Below is a talk given by Joel Spolsky on this issue:
The Search Pie Chart
Is the Internet Almost Full?
So asks Seth Godin in a post that explores the implications of the explosion in content creation and dissemination. Below is an excerpt of the post:
… Of course, the decentralized nature of the net means that it will never be physically full. As long as we can keep making hard drives, we won’t run out of space to store those inane videos of your Aunt Sally. What is full is our attention.
Ten years ago, you had a shot of at least being aware of everything that mattered. Five years ago, you had to be really selective about what you took in, but at least it was possible to know what you didn’t know. Today, it’s impossible. Today, you can’t even read every article on a thin slice of a thin topic.
You can’t keep up with the status of your friends on the social networks. No way. You can’t read every important blog… you can’t even read all the blogs that tell you what the important blogs are saying.
Used to be, you could finish reading your email, hit “check email” and nothing new would show up. Now, of course, the new mail is probably a longer list than the mail you just finished processing.
The internet isn’t full, but we are.
This is a wonderful observation. I do not think that this particular topic has been properly examined. Of course, an earlier observation I made about Null Information concerns exactly the issue that Seth is referring to. He is noting the tremendous amount of content, and information about that content, that is being generated, and the relative scarcity of our own time and attention to make use of it. Indeed, even with increased productivity and a prolonged workday, we have a limited amount of time and attention to give. In contrast, what goes into the Null Information is expanding rapidly. One is a finite resource; the other has infinite room for growth, especially given how storage is becoming cheaper and increasingly portable.
There are two issues this raises for me. One is the need for drastic improvements in the evaluation, sorting and synthesis of information. This could take the form of increased efficiency and exactness in the location, indexing and delivery of relevant information (e.g. better search algorithms), or of tackling the problem with renewed vigor through the age-old approach of division of labor and specialization. More so than at any time before, there is a need for the development of expertise, not only in filling one’s brain with knowledge about a body of information, but also in locating where the information is, how it can be utilized and who can benefit most from it, and in making it readily accessible. The distributed nature of this task could allow the behemoth that Null Information is becoming to be taken on piece by piece.
The other issue that this paradigm reminds me of is the limited nature of our foresight. Sure enough, we have an explosion in the amount of information being made available, and we are seeing a big danger sign staring us right in the face. But my sense is that people in earlier ages, when the printing press was first invented and made popular, were probably thinking the same thing: fearing the rapid dissemination of all that information and knowledge, good and evil alike. Looking at it from today’s perspective, it is difficult to argue that we have not fared well since the advent of the printing press, and we express little concern about all the books that are being published en masse and filling up library shelves. Society as a whole has gradually figured out a way of filtering out what is relevant and important, although that filtering is still heavily shaped by marketing rather than by the substance of the books.
I think that, in the long run, the issue is not going to be so much that there is too much information out there for us to pay attention to, but rather that people, companies and society at large must figure out a way to organize information and make it relevant and useful to the seeker.
Null Information

A Google Search for "information" on 12/17/08
A given individual is at most going to be able to check the information contained in the first few pages of this result. Let us assume that the Google search engine is an all-powerful one that scours the entire internet and returns all the relevant information for a given search query. Let us also say that one would have the patience to review and digest information from the first 1000 results, which would be 100 pages of search results. I understand that almost no one does this. But, for the sake of argument, let us assume that it happens. If that is the case, the 1000 search results would be less than 0.00004% of the overall outcome. The remaining share, more than 99.99996% of the information returned, is what I would consider to be Null Information. It is not that this astounding amount of content lacks intrinsic value. It is just that it has not been properly tapped into or made relevant to what is currently being searched for.
To put it in perspective, let us consider the population of the U.S. On 12/18/08 at 01:14 GMT the population of the U.S. is projected to be 305,904,346. The equivalent of the search result would be taking ~100 people to be representative of what the American people are like. The remaining 305,904,246 people are completely left out of this outcome. All the intricate and unique information about these people is not really accessed or accounted for. This body of knowledge is the Null Information. It can be argued that it is possible to come up with 100 Americans who are a good representation of the general population. However, we can agree that whoever decides who gets to be one of the hundred had better be really good for the outcome to command the widest consensus. There are a few ways to ensure that this selection process comes as close to being good as possible.
One is to go ahead and undertake a massive census of all the people of the country and find out the characteristic traits of each and every American; if not all, then as many as possible. Endowed with this knowledge, one can synthesize the data to filter out the most common traits shared by the people and select the top 100 people having them. This requires that a centralized entity undertake such an effort. This would be equivalent to how search is evolving, where the likes of Google are amassing as much personalized information as possible so that they can customize the delivery of services to the interested party. The other option is to leave it up to the people of the U.S. to decide who gets to be their representatives. This would be much like an election, but with an important twist. The election is not to select a member of the House of Representatives or the Senate, but the equivalent of selecting 100 presidents, with as close to a 100% participation rate as possible. Already one can see how messy a process this can be. If it can be pulled off, it would give the people a direct platform to make their desires known. It would also provide direct access to the collective, individualized wants and needs of the people, and would truly provide as accurate a representation as possible. There are information retrieval approaches developed around this concept. Tagging and bookmarking are attempts at creating a platform for the democratic expression of interests and desires. Using the information gathered this way, entities can then provide services back to the participants that are directly applicable.
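To make the second, "election" approach a little more concrete, here is a minimal sketch of ranking pages by how many distinct people bookmarked them under a given tag. It is only an illustration under stated assumptions: the function name `top_by_votes`, the `(user, item, tag)` record layout and the sample data are all hypothetical and do not describe how any particular tagging or bookmarking service works.

```python
from collections import Counter


def top_by_votes(bookmarks, tag, k=3):
    """Rank items by the number of distinct users who bookmarked them under `tag`.

    `bookmarks` is an iterable of (user, item, tag) tuples. This record layout
    is an assumption made for the sketch, not a real service's data model.
    """
    voters = {}
    for user, item, item_tag in bookmarks:
        if item_tag == tag:
            # A set means each user counts once per item, like one vote per voter.
            voters.setdefault(item, set()).add(user)
    counts = Counter({item: len(users) for item, users in voters.items()})
    return counts.most_common(k)


# Hypothetical sample data.
bookmarks = [
    ("ana", "page-a", "search"),
    ("ben", "page-a", "search"),
    ("ana", "page-a", "search"),  # repeat bookmark by the same user: counted once
    ("cy", "page-b", "search"),
    ("ben", "page-c", "health"),
]

print(top_by_votes(bookmarks, "search"))
```

The point of the sketch is only that the ranking comes from the participants themselves rather than from a central entity's model of them; with low participation, a handful of voters decide the outcome for everyone.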
The decision as to which method is the most effective is a difficult one to make. On the one hand, the first approach puts all the power of decision making in one entity. Although such an entity can aspire to take into account the ways in which one piece of information is more relevant than another (in the case of Google, taking into account not only the content of pages but also the degree of inter-linking between the page under consideration and other pages), the final decision remains exclusively with that entity. Reliance on a single method of deciphering relevance has its own pitfalls, as it gives other parties an incentive to find ways to game the system. A very good example of this is the budding industry of “search engine optimization.” Incidentally, a search for “search engine optimization” by itself returns > 27 million results. On the other hand, allowing participants to express their interests is an involved process that is prone to a high level of inefficiency and to difficulty in consensus building. If there is not 100% buy-in to the selection process, then the same problem of a few deciding what is best for all is encountered. In the long run, both approaches will likely continue to develop and find ways to stay useful for the masses. Either way, there remains a core problem in tapping into the Null Information. Whether it is companies acquiring as much personalized data as possible, so that when individual X makes an inquiry about Y, (s)he is not inundated with billions of irrelevant results that waste the resources required for storage, retrieval and review, or the building of a large-scale platform for as many people as possible to participate in a democratic process of self-expression, there is a lot to be done about the Null Information. There is an astounding need not only to search for information, but also to sort and customize it.
As search technologies become more powerful, this may be accomplished with ease, provided that there is existing data connecting individuals with pieces of information. There is also a need to provide venues for expression of one kind or another. With the rapid rise in content creation and dissemination, it is going to be a battle fought over a long period of time, with the aforementioned entities playing catch-up with the population at large. Now that is a battle for the ages!
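For readers who want to check the arithmetic behind the Null Information argument in this post, here is a small sketch. The population figure and the sample sizes come from the post; the total number of search results is not stated in the text, so `ASSUMED_TOTAL_RESULTS` is purely an assumption, chosen as the smallest round total for which 1000 results come to no more than 0.00004%.

```python
# Figures quoted in the post.
US_POPULATION = 305_904_346   # projected U.S. population on 12/18/08
SAMPLE_PEOPLE = 100           # people taken as "representative"
RESULTS_REVIEWED = 1_000      # first 100 pages at 10 results per page

# ASSUMPTION: the post does not give the total result count. 2.5 billion is the
# total at which 1,000 results equal 0.00004%; any larger total gives a smaller share.
ASSUMED_TOTAL_RESULTS = 2_500_000_000


def share_percent(part, whole):
    """Return part / whole expressed as a percentage."""
    return 100.0 * part / whole


reviewed = share_percent(RESULTS_REVIEWED, ASSUMED_TOTAL_RESULTS)
null_information = 100.0 - reviewed
sampled = share_percent(SAMPLE_PEOPLE, US_POPULATION)
left_out = US_POPULATION - SAMPLE_PEOPLE

print(f"Share of results reviewed:  {reviewed:.5f}%")
print(f"Share left unreviewed:      {null_information:.5f}%")
print(f"Share of population sampled: {sampled:.7f}%")
print(f"People left out:            {left_out:,}")
```

Under that assumed total, the reviewed share and the sampled share of the population are of the same order of magnitude, which is what makes the 100-person analogy a fair one.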

