JOONE is a toolset used to build and run neural networks in Java. To demonstrate its capability, I’ve built a simple supervised network and trained it on a common data set used for other machine learning projects. By using a common data set, comparisons can be made between the different approaches.
The data set is drawn from the Audubon Society Field Guide and describes the characteristics of mushrooms found in North America. The version I’m using was compiled by the UCI Machine Learning Repository. It contains 8124 records, one per line. Each record holds a classification followed by 22 mushroom characteristics, with every value represented by a single character in a comma separated list. The first value is the poisonous (p) or edible (e) classification.
p,x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u
e,x,s,y,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,n,g
JOONE requires semicolon separated numerical values for input, so I replaced each character value with its alphabetical position and changed the commas to semicolons. Missing values were given the value 27.
16;24;19;14;20;16;6;3;14;11;5;5;19;19;23;23;16;23;15;16;11;19;21
5;24;19;25;20;1;6;3;2;11;5;3;19;19;23;23;16;23;15;16;14;14;7
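The conversion described above can be sketched in a few lines of Java. This is a minimal illustration, not the script I actually used: the class name MushroomEncoder and the method encodeLine are hypothetical, and it assumes missing values appear as '?' in the source file.

```java
import java.util.StringJoiner;

// Hypothetical helper (not part of JOONE): converts one comma separated
// mushroom record into the semicolon separated numeric form shown above.
final class MushroomEncoder {
    // Assumption: any field that is not a lowercase letter (for example '?')
    // is a missing value and is written as 27.
    static final int MISSING = 27;

    static String encodeLine(String csvLine) {
        StringJoiner out = new StringJoiner(";");
        for (String field : csvLine.trim().split(",")) {
            String f = field.trim();
            int value = MISSING;
            if (!f.isEmpty()) {
                char c = f.charAt(0);
                if (c >= 'a' && c <= 'z') {
                    value = c - 'a' + 1; // 'a' -> 1, 'b' -> 2, ... 'z' -> 26
                }
            }
            out.add(Integer.toString(value));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // First sample record from the data set.
        System.out.println(encodeLine("p,x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u"));
    }
}
```

Running main on the first sample record should reproduce the first converted line shown above.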
The network has three layers: 22 input nodes, 10 hidden nodes, and a single output node. If the output node is 16 (p), the mushroom is classified as poisonous; if it is 5 (e), it is classified as edible. The hidden nodes and the output node use a sigmoid activation function. The network is trained on the first 3000 records of the data set using JOONE’s built-in back propagation functions and a Root Mean Squared Error (RMSE) function. The remaining 5124 records can be used to verify the trained network. Training ran in batches of 10,000 iterations (epochs), with a serialized representation of the network stored to disk every 100 iterations. This allowed fine-grained monitoring of the application's progress and ensured that the trained network could be recovered in the event of a crash.
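For readers unfamiliar with RMSE, here is a small sketch of how the measure is computed from desired and actual outputs. It is a standalone illustration under my own naming (the Rmse class is hypothetical), and JOONE's internal calculation may differ in detail.

```java
// Hypothetical helper (not part of JOONE): root mean squared error between
// the desired outputs and the outputs the network actually produced.
final class Rmse {
    static double rmse(double[] desired, double[] actual) {
        if (desired.length == 0 || desired.length != actual.length) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double sumOfSquares = 0.0;
        for (int i = 0; i < desired.length; i++) {
            double diff = desired[i] - actual[i];
            sumOfSquares += diff * diff;
        }
        // Mean of the squared differences, then the square root.
        return Math.sqrt(sumOfSquares / desired.length);
    }
}
```

A perfect fit gives an RMSE of 0, and larger values mean the outputs are further from the targets on average.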
Serialization is a mechanism by which an object in memory is converted into a portable form (XML in this case) so that it can later be retrieved and restored to memory exactly as it was. Here, Java's Serializable interface is used to store a neural network object that contains the network diagram, weighted synapses, and trainer (error).
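The round trip can be sketched with the standard java.io classes. This is only an illustration under stated assumptions: NetSnapshot and Checkpoint are hypothetical names, NetSnapshot is a stand-in for the real network object, and ObjectOutputStream writes Java's default binary stream rather than XML.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a trained network; the real network object
// would take its place.
final class NetSnapshot implements Serializable {
    private static final long serialVersionUID = 1L;
    final int epoch;
    final double[] weights;

    NetSnapshot(int epoch, double[] weights) {
        this.epoch = epoch;
        this.weights = weights.clone();
    }
}

// Hypothetical helper: converts any Serializable object to bytes and back.
final class Checkpoint {
    static byte[] toBytes(Serializable obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

To checkpoint to disk, the byte array could be written to a file every 100 iterations, or the ByteArrayOutputStream swapped for a FileOutputStream.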
The error after the first 100 iterations was ~5%, and it decreased to 4.25% after 50,000 iterations. While this is rather slow, the error is still decreasing and could reach acceptable levels after a few million iterations.
Further research should be done on the design of the network and its training. Adding another layer or changing the number of hidden nodes could help the network converge more quickly. The serialization mechanism could also provide an easy way to distribute and parallelize the training. If the current RMSE of the network were stored along with the serialized net, each node could determine whether its error is less than the current “best” for a group of nodes. The node with the lowest error would write a new serialized net and global error file, and nodes with greater error would load the lowest-error net to continue training.
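The selection step of that idea might look like the sketch below. Nothing here has been implemented or tested against JOONE: BestNetSelector, Candidate, and both methods are hypothetical names, and the file handling and the training itself are left out.

```java
import java.util.List;

// Hypothetical sketch of the "lowest error wins" rule for a group of
// training nodes.
final class BestNetSelector {
    // One node's report: who it is and the RMSE of its current net.
    static final class Candidate {
        final String nodeId;
        final double rmse;

        Candidate(String nodeId, double rmse) {
            this.nodeId = nodeId;
            this.rmse = rmse;
        }
    }

    // A node publishes its net only if it beats the stored global error.
    static boolean shouldPublish(double localRmse, double globalBestRmse) {
        return localRmse < globalBestRmse;
    }

    // Returns the candidate with the lowest RMSE.
    static Candidate best(List<Candidate> candidates) {
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("no candidates");
        }
        Candidate best = candidates.get(0);
        for (Candidate c : candidates) {
            if (c.rmse < best.rmse) {
                best = c;
            }
        }
        return best;
    }
}
```

A node that fails the shouldPublish check would discard its own net and reload the published one before its next training batch.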
Here are the files required to continue developing this network:
I tried to execute your code but it doesn’t work.
I trained the net, and when I test it, the answers are wrong.
I normalized the input and queried it with normalized data too.
Have you made any progress with JOONE using this dataset?
My code works with the XOR dataset.
Thanks!
Left by marmundo on June 22nd, 2009
Thanks for trying it out!
My code is designed to work with JOONE 2.0.0RC1. Are you using this version? Is it giving you an error that you could send me? The input data should not require any pre-processing. Can you send me what you’re trying?
Left by admin on June 22nd, 2009