This is part one of a multi-part series. The series is continued here.
The Semantic Web has been touted as Web 3.0 and the next evolutionary step in internet communications. A Scientific American article from 2001 told us what to expect of a semantic internet: how devices should communicate and how information should be passed to people and devices. Somehow everything we own should be able to understand its surroundings and make meaningful decisions based on current events. The method described is a brute-force approach in which software that wants to participate in the semantic web must add fantastic amounts of metadata so the ‘agents’ will have a basis for making these decisions.
There’s the problem. No one is going to re-categorize the entire internet, so there needs to be an automated way of doing all of this. Organizing information has been a field of study for hundreds (if not thousands) of years. The term ontology was coined in the early 17th century and has been applied to this problem. Ontology languages facilitate connections between words to form ideas. If you know that a ‘cup’ and a ‘mug’ are both ‘containers’ and that ‘milk’ is a ‘liquid’, you should be able to determine the context of a statement such as “A cup of milk”.
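The idea above can be sketched in a few lines of code. This is a toy illustration, not any real ontology language: the `is_a` table and the terms in it are invented to mirror the cup/mug/container example.

```python
# Toy sketch of ontology-style "is-a" relations (hypothetical data,
# mirroring the cup/mug/container example from the text).
is_a = {
    "cup": "container",
    "mug": "container",
    "milk": "liquid",
    "water": "liquid",
}

def category(word):
    """Follow is-a links until reaching a root concept."""
    while word in is_a:
        word = is_a[word]
    return word

def interpret(phrase):
    """Map each known word in a phrase to its root concept."""
    return {w: category(w) for w in phrase.lower().split() if w in is_a}

print(interpret("A cup of milk"))
# {'cup': 'container', 'milk': 'liquid'}
```

With those relations in place, a program can see that “a cup of milk” and “a mug of water” share the same structure, a container holding a liquid, even though they share no words.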
Ontologies help to simplify the problem, but they don’t solve it. It’s still a brute-force approach in which much of the information must be entered by hand. Additional information is still needed to handle words with multiple meanings, or words whose meaning is determined by surrounding context. Trying to solve natural language problems with keywords alone is impossible.
Newer research focuses on pattern recognition to find relationships between documents. Pattern recognition systems have been applied to text before, but mostly in academic settings. Internet entrepreneurs are finding that this is a marketable field, and several new search engines are trying to apply these algorithms to organize search results and make finding what you want easier. In practice, the results you find through these engines aren’t much better than those from Google or Yahoo.
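One common way to find relationships between documents without any hand-built metadata is to compare their word statistics. The sketch below, with made-up example documents, uses simple bag-of-words vectors and cosine similarity; real systems are far more sophisticated, but the principle is the same.

```python
# Hedged sketch: relate documents by comparing word-count vectors.
# The documents here are invented for illustration.
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine of the angle between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

doc1 = Counter("the semantic web links data for machines".split())
doc2 = Counter("machines read linked data on the semantic web".split())
doc3 = Counter("recipe for chocolate cake with milk".split())

print(cosine_similarity(doc1, doc2))  # higher: overlapping vocabulary
print(cosine_similarity(doc1, doc3))  # lower: little shared vocabulary
```

Documents that score highly against each other can be grouped with no human ever tagging them, which is what makes this approach attractive compared with hand-built ontologies.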
Dr. Riza C. Berkan, founder of Hakia.com, is a pioneer in this field and admits that his search engine is in its infancy and works poorly with short queries (which are far more common). Hakia categorizes results and, as the user clicks on links, builds a longer query that yields more precise results. With the short queries most people present to search engines, there isn’t enough context to discover a useful meaning for many of the terms. As far as I can tell, this is still a supervised learning method. There is, however, another field where these same techniques are being employed.
Click here to continue on to part II