MLlib is Spark’s machine learning (ML) library. Its goal is to make practical machine learning scalable and easy.

I put together a complete, step-by-step classification example on the Iris flower data set, running in the BeakerX Jupyter kernel. It covers the following steps:

  • Setup
  • Data Preparation
  • Testing and Prediction
  • Validation

The example is written in Scala, but you could use any other language that runs on the JVM.

My example can be found in this Gist.
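For orientation, here is a condensed sketch of the four steps using the `spark.ml` pipeline API. It is not a copy of the Gist: the file name `iris.csv`, the column names, the 80/20 split and the choice of `DecisionTreeClassifier` are all assumptions made for illustration, so adjust them to your own data.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.DecisionTreeClassifier
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}
import org.apache.spark.sql.SparkSession

object IrisSketch {
  // Assumed column names; rename to match the header of your CSV file.
  val featureCols = Array("sepal_length", "sepal_width", "petal_length", "petal_width")

  // Data preparation and classifier bundled as one pipeline.
  def buildPipeline(): Pipeline = {
    // Turn the species name into a numeric label column.
    val indexer = new StringIndexer().setInputCol("species").setOutputCol("label")
    // Collect the four measurements into a single feature vector.
    val assembler = new VectorAssembler().setInputCols(featureCols).setOutputCol("features")
    // Assumed classifier; the Gist may use a different one.
    val classifier = new DecisionTreeClassifier().setLabelCol("label").setFeaturesCol("features")
    new Pipeline().setStages(Array(indexer, assembler, classifier))
  }

  def main(args: Array[String]): Unit = {
    // Setup: a local Spark session.
    val spark = SparkSession.builder().appName("iris-sketch").master("local[*]").getOrCreate()

    // Data preparation: read the CSV (assumed path) and split it into train and test sets.
    val iris = spark.read.option("header", "true").option("inferSchema", "true").csv("iris.csv")
    val Array(train, test) = iris.randomSplit(Array(0.8, 0.2), seed = 42L)

    // Training, then prediction on the held-out rows.
    val model = buildPipeline().fit(train)
    val predictions = model.transform(test)
    predictions.select("species", "label", "prediction").show(5)

    // Validation: accuracy on the test set.
    val accuracy = new MulticlassClassificationEvaluator()
      .setLabelCol("label")
      .setPredictionCol("prediction")
      .setMetricName("accuracy")
      .evaluate(predictions)
    println(s"Test accuracy: $accuracy")

    spark.stop()
  }
}
```

In a notebook cell or in `spark-shell` you can paste the body of `main` as top-level statements instead of wrapping it in an object.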

I leave it up to you to replace the classifier with, e.g., NaiveBayes or MultilayerPerceptronClassifier. The MLlib Programming Guide contains the right level of information and is easy to use.
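As a starting point for that swap, the sketch below configures both alternatives. It assumes the prepared data has a numeric `label` column and a `features` vector column with four entries; those column names, the hidden layer size, the iteration count and the seed are illustrative choices, not values taken from the Gist.

```scala
import org.apache.spark.ml.classification.{MultilayerPerceptronClassifier, NaiveBayes}

object ClassifierChoices {
  // The default multinomial model type requires non-negative feature values,
  // which holds for the four Iris measurements.
  val naiveBayes: NaiveBayes = new NaiveBayes()
    .setLabelCol("label")
    .setFeaturesCol("features")

  // Layer sizes: 4 input features, one hidden layer of 5 units (arbitrary),
  // and 3 output classes for the three Iris species.
  val perceptron: MultilayerPerceptronClassifier = new MultilayerPerceptronClassifier()
    .setLabelCol("label")
    .setFeaturesCol("features")
    .setLayers(Array(4, 5, 3))
    .setMaxIter(100)
    .setSeed(42L)
}
```

Either object is an estimator, so it can take the place of the existing classifier wherever `fit` is called on the training data.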

 

