The Way of the Software Engineer

Web companies, and ad-revenue-based media companies in particular, have amassed great amounts of data on their users. However, they’re data rich and problem poor. Access and error logs are everywhere, but they’re rarely mined for anything more than simple metrics. I’ve used several log analyzers and web trend trackers, such as HBX, and they all produce pretty graphs, but those graphs raise more questions than they answer. In part this is because people haven’t fully formed the questions they want answered, but mostly it’s because these tools only graph the data instead of actually analyzing it.

If I’m a business owner trying to find which group of my customers makes me the most money, so that I know where to spend my marketing budget, I don’t want the IP addresses they’re coming from. I want to know that they’re small businesses in Virginia that generally order small quantities of my product frequently and pay by MasterCard. A human can work this out by spending a few minutes typing SQL queries, but when the data set is very large or the grouping very subtle, a tool becomes necessary.
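When you already know which attributes to group by, the SQL approach really is quick. The sketch below uses Python’s built-in sqlite3 module with a hypothetical orders table (the table name, columns and rows are all made up for illustration) to total revenue per customer segment and rank the segments.

```python
import sqlite3

# Hypothetical order data: (customer_type, state, payment, quantity, revenue).
ORDERS = [
    ("small business", "VA", "MasterCard", 3, 60.0),
    ("small business", "VA", "MasterCard", 2, 40.0),
    ("small business", "VA", "MasterCard", 1, 20.0),
    ("small business", "VA", "MasterCard", 4, 80.0),
    ("enterprise", "CA", "invoice", 100, 150.0),
    ("consumer", "NY", "Visa", 1, 15.0),
    ("consumer", "NY", "Visa", 2, 25.0),
]

def top_segments(rows):
    """Group orders by segment and return them sorted by total revenue, highest first."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute(
        "CREATE TABLE orders (customer_type TEXT, state TEXT, payment TEXT, "
        "quantity INTEGER, revenue REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT customer_type, state, payment,
               COUNT(*)      AS num_orders,
               AVG(quantity) AS avg_quantity,
               SUM(revenue)  AS total_revenue
        FROM orders
        GROUP BY customer_type, state, payment
        ORDER BY total_revenue DESC
        """
    ).fetchall()
    conn.close()
    return result

for row in top_segments(ORDERS):
    print(row)
```

With this made-up data the small-business, Virginia, MasterCard segment comes out on top. The catch is that someone had to decide to group by exactly those three columns; when the revealing combination of attributes isn’t known in advance, or the data is too large to explore by hand, this approach stops scaling.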

Most programmers are familiar with pattern matching through regular expressions or other token parsers, but these systems don’t deal well with “noisy” data. For example, how would you write a regular expression for a handwritten alphanumeric character?
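A small illustration of that brittleness, using Python’s re module and a hypothetical five-digit ZIP code field: the pattern accepts exactly the characters it describes, so a single smudged or misread character makes the whole match fail. There is no notion of “close enough”.

```python
import re

# Matches exactly five ASCII digits, e.g. a US ZIP code.
ZIP_PATTERN = re.compile(r"[0-9]{5}")

def is_zip(text):
    """Return True only if the whole string is five digits."""
    return ZIP_PATTERN.fullmatch(text) is not None

# The last three samples contain "noise": a letter O, a lowercase l, a stray space.
samples = ["22201", "222O1", "2220l", "22 201"]
for s in samples:
    print(repr(s), is_zip(s))
```

A human reader would read all four strings as the same ZIP code; the regular expression can only answer yes or no.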

In the ’60s the answer to this problem was “neural networks”, which were supposed to solve every kind of human decision-making problem, from expert systems to natural language processing (NLP). When neural networks fell short of those promises, they acquired a stigma that remains to this day (Dr. Marvin Minsky discussed this a few years ago). Ironically, the problem that caused the stigma (a single-layer perceptron cannot learn a function that isn’t linearly separable, such as XOR) was solved only a few years later, several times in fact, by training multi-layer networks with back propagation. I don’t know of any widespread applications of neural network classifiers outside research institutions other than the now-defunct Apple Newton, but they’re a perfectly viable method of problem solving.
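The limitation most often associated with that stigma is that a single-layer perceptron can only separate its inputs with a straight line (a hyperplane), so it cannot learn XOR. The sketch below uses the classic perceptron learning rule to show it learning AND but not XOR; the learning rate, epoch count and zero initial weights are arbitrary choices.

```python
def step(x):
    """Threshold activation: fire (1) only if the weighted sum is positive."""
    return 1 if x > 0 else 0

def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron rule: nudge the weights toward the target on every misclassification."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            error = target - step(w[0] * x1 + w[1] * x2 + b)
            w[0] += lr * error * x1
            w[1] += lr * error * x2
            b += lr * error
    return w, b

def accuracy(w, b, samples):
    """Fraction of samples the trained perceptron classifies correctly."""
    correct = sum(step(w[0] * x1 + w[1] * x2 + b) == t for (x1, x2), t in samples)
    return correct / len(samples)

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

for name, data in (("AND", AND), ("XOR", XOR)):
    w, b = train_perceptron(data)
    print(name, "accuracy:", accuracy(w, b, data))
```

AND is linearly separable, so the perceptron rule is guaranteed to converge and reaches 100% accuracy. No choice of w and b classifies all four XOR cases, so XOR never reaches 100% however long it trains. Adding a hidden layer removes the limitation; training that hidden layer is the part back propagation made practical.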

Perhaps that’s why I’m writing this. Neural nets aren’t going to suddenly sprout a HAL, but they deserve their rightful place in the programmer’s toolbox.

Generally, neural networks are trained by a teacher (supervised learning), meaning you need a known answer to compare against the network’s output in order to adjust the weights on its synapses. However, several methods have been proposed for building neural networks that learn without supervision. Unsupervised learning methods train on their input alone, but we’ll talk about that later.
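Training with a teacher can be sketched as online back propagation on a tiny network that learns XOR. Every training example pairs an input with the answer the teacher expects, and the network’s error on that answer is pushed backwards through the weights by gradient descent. The layer sizes, learning rate, epoch count and random seed below are arbitrary assumptions, and a network this small can occasionally settle in a poor local minimum, so treat it as an illustration rather than a recipe.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Labeled training data: each input is paired with the teacher's answer.
XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]

class TinyMLP:
    """2 inputs -> n_hidden sigmoid units -> 1 sigmoid output."""

    def __init__(self, n_hidden=4, seed=1):
        rng = random.Random(seed)  # seeded so runs are repeatable
        self.w_h = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n_hidden)]
        self.b_h = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.w_o = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b_o = rng.uniform(-1, 1)

    def forward(self, x):
        hidden = [sigmoid(w[0] * x[0] + w[1] * x[1] + b) for w, b in zip(self.w_h, self.b_h)]
        out = sigmoid(sum(w * h for w, h in zip(self.w_o, hidden)) + self.b_o)
        return hidden, out

    def loss(self, data):
        """Mean squared error between the network's outputs and the teacher's answers."""
        return sum((self.forward(x)[1] - t) ** 2 for x, t in data) / len(data)

    def train_step(self, x, target, lr):
        hidden, out = self.forward(x)
        # Error signal at the output: derivative of 0.5 * (out - target)^2
        # with respect to the output unit's weighted input.
        delta_o = (out - target) * out * (1.0 - out)
        # Back-propagate to the hidden units, using the output weights before updating them.
        delta_h = [delta_o * w * h * (1.0 - h) for w, h in zip(self.w_o, hidden)]
        # Gradient-descent weight updates.
        for j, h in enumerate(hidden):
            self.w_o[j] -= lr * delta_o * h
        self.b_o -= lr * delta_o
        for j, d in enumerate(delta_h):
            self.w_h[j][0] -= lr * d * x[0]
            self.w_h[j][1] -= lr * d * x[1]
            self.b_h[j] -= lr * d

def train(net, data, epochs=5000, lr=0.5):
    """Online training: update the weights after every labeled example."""
    for _ in range(epochs):
        for x, target in data:
            net.train_step(x, target, lr)

net = TinyMLP()
print("loss before:", net.loss(XOR))
train(net, XOR)
print("loss after: ", net.loss(XOR))
for x, target in XOR:
    print(x, "->", round(net.forward(x)[1], 3), "(teacher says", target, ")")
```

With most seeds the four outputs end up close to 0, 1, 1, 0; comparing the loss before and after training shows how far a particular run got.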
