{"id":587,"date":"2018-11-02T13:41:06","date_gmt":"2018-11-02T12:41:06","guid":{"rendered":"https:\/\/www.pschatzmann.ch\/home\/?p=587"},"modified":"2020-11-21T22:22:51","modified_gmt":"2020-11-21T21:22:51","slug":"random-forrest-classifier-in-spark-ml","status":"publish","type":"post","link":"https:\/\/www.pschatzmann.ch\/home\/2018\/11\/02\/random-forrest-classifier-in-spark-ml\/","title":{"rendered":"Random Forest Classifier in Spark ML"},"content":{"rendered":"<p><a href=\"https:\/\/spark.apache.org\/docs\/latest\/ml-guide.html\">MLlib<\/a> is Spark\u2019s machine learning (ML) library. Its goal is to make practical machine learning scalable and easy.<\/p>\n<p>I tried to build a complete step-by-step classification example for the\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Iris_flower_data_set\"><i>Iris<\/i><span>\u00a0<\/span>flower data set<\/a>, using the <a href=\"http:\/\/beakerx.com\/\">BeakerX<\/a> <a href=\"http:\/\/jupyter.org\/\">Jupyter<\/a>\u00a0kernel. It covers the following steps:<\/p>\n<ul>\n<li>Setup<\/li>\n<li>Data Preparation<\/li>\n<li>Testing and Prediction<\/li>\n<li>Validation<\/li>\n<\/ul>\n<p>The example is written in Scala, but you could use any other language that runs on the JVM.<\/p>\n<p><a href=\"https:\/\/nbviewer.jupyter.org\/gist\/pschatzmann\/a8a3741c20d6a063d7dc6f9d82f9c29b\">My example can be found in this Gist<\/a><\/p>\n<p>I leave it up to you to replace the classifier with, e.g.,\u00a0<span class=\"nc\">NaiveBayes or a\u00a0MultilayerPerceptronClassifier. The <a href=\"https:\/\/spark.apache.org\/docs\/latest\/ml-guide.html\">MLlib Programming Guide<\/a><\/span>\u00a0contains the right level of information and is easy to use.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>MLlib is Spark\u2019s machine learning (ML) library. Its goal is to make practical machine learning scalable and easy. 
I tried to make a complete step by step classification example using the\u00a0Iris\u00a0flower data set\u00a0\u00a0using the BeakerX Jupyter\u00a0kernel which covers the following steps Setup Data Preparation Testing and Prediction Validation The example [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":590,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_crdt_document":"","_import_markdown_pro_load_document_selector":0,"_import_markdown_pro_submit_text_textarea":"","_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"footnotes":""},"categories":[14],"tags":[],"class_list":["post-587","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-machine-learning"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.5 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Random Forrest Classifier in Spark ML - Phil Schatzmann<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.pschatzmann.ch\/home\/2018\/11\/02\/random-forrest-classifier-in-spark-ml\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Random Forrest Classifier in Spark ML - Phil Schatzmann\" \/>\n<meta property=\"og:description\" content=\"MLlib is Spark\u2019s machine learning (ML) library. It&#8217;s goal is to make practical machine learning scalable and easy. 
I tried to make a complete step by step classification example using the\u00a0Iris\u00a0flower data set\u00a0\u00a0using the BeakerX Jupyter\u00a0kernel which covers the following steps Setup Data Preparation Testing and Prediction Validation The example [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.pschatzmann.ch\/home\/2018\/11\/02\/random-forrest-classifier-in-spark-ml\/\" \/>\n<meta property=\"og:site_name\" content=\"Phil Schatzmann\" \/>\n<meta property=\"article:published_time\" content=\"2018-11-02T12:41:06+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2020-11-21T21:22:51+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.pschatzmann.ch\/wp-content\/uploads\/2018\/11\/spark-logo-trademark.png\" \/>\n\t<meta property=\"og:image:width\" content=\"376\" \/>\n\t<meta property=\"og:image:height\" content=\"200\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"pschatzmann\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"pschatzmann\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.pschatzmann.ch\\\/home\\\/2018\\\/11\\\/02\\\/random-forrest-classifier-in-spark-ml\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.pschatzmann.ch\\\/home\\\/2018\\\/11\\\/02\\\/random-forrest-classifier-in-spark-ml\\\/\"},\"author\":{\"name\":\"pschatzmann\",\"@id\":\"https:\\\/\\\/www.pschatzmann.ch\\\/home\\\/#\\\/schema\\\/person\\\/73a53638a4e34e8373405fd737dac9b1\"},\"headline\":\"Random Forrest Classifier in Spark ML\",\"datePublished\":\"2018-11-02T12:41:06+00:00\",\"dateModified\":\"2020-11-21T21:22:51+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.pschatzmann.ch\\\/home\\\/2018\\\/11\\\/02\\\/random-forrest-classifier-in-spark-ml\\\/\"},\"wordCount\":121,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/www.pschatzmann.ch\\\/home\\\/#\\\/schema\\\/person\\\/73a53638a4e34e8373405fd737dac9b1\"},\"image\":{\"@id\":\"https:\\\/\\\/www.pschatzmann.ch\\\/home\\\/2018\\\/11\\\/02\\\/random-forrest-classifier-in-spark-ml\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.pschatzmann.ch\\\/wp-content\\\/uploads\\\/2018\\\/11\\\/spark-logo-trademark.png\",\"articleSection\":[\"Machine Learning\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.pschatzmann.ch\\\/home\\\/2018\\\/11\\\/02\\\/random-forrest-classifier-in-spark-ml\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.pschatzmann.ch\\\/home\\\/2018\\\/11\\\/02\\\/random-forrest-classifier-in-spark-ml\\\/\",\"url\":\"https:\\\/\\\/www.pschatzmann.ch\\\/home\\\/2018\\\/11\\\/02\\\/random-forrest-classifier-in-spark-ml\\\/\",\"name\":\"Random Forrest Classifier in Spark ML - Phil 
Schatzmann\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.pschatzmann.ch\\\/home\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.pschatzmann.ch\\\/home\\\/2018\\\/11\\\/02\\\/random-forrest-classifier-in-spark-ml\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.pschatzmann.ch\\\/home\\\/2018\\\/11\\\/02\\\/random-forrest-classifier-in-spark-ml\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.pschatzmann.ch\\\/wp-content\\\/uploads\\\/2018\\\/11\\\/spark-logo-trademark.png\",\"datePublished\":\"2018-11-02T12:41:06+00:00\",\"dateModified\":\"2020-11-21T21:22:51+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.pschatzmann.ch\\\/home\\\/2018\\\/11\\\/02\\\/random-forrest-classifier-in-spark-ml\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.pschatzmann.ch\\\/home\\\/2018\\\/11\\\/02\\\/random-forrest-classifier-in-spark-ml\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.pschatzmann.ch\\\/home\\\/2018\\\/11\\\/02\\\/random-forrest-classifier-in-spark-ml\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.pschatzmann.ch\\\/wp-content\\\/uploads\\\/2018\\\/11\\\/spark-logo-trademark.png\",\"contentUrl\":\"https:\\\/\\\/www.pschatzmann.ch\\\/wp-content\\\/uploads\\\/2018\\\/11\\\/spark-logo-trademark.png\",\"width\":376,\"height\":200},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.pschatzmann.ch\\\/home\\\/2018\\\/11\\\/02\\\/random-forrest-classifier-in-spark-ml\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.pschatzmann.ch\\\/home\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Random Forrest Classifier in Spark ML\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.pschatzmann.ch\\\/home\\\/#website\",\"url\":\"https:\\\/\\\/www.pschatzmann.ch\\\/home\\\/\",\"name\":\"Phil Schatzmann 
Consulting\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.pschatzmann.ch\\\/home\\\/#\\\/schema\\\/person\\\/73a53638a4e34e8373405fd737dac9b1\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.pschatzmann.ch\\\/home\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\\\/\\\/www.pschatzmann.ch\\\/home\\\/#\\\/schema\\\/person\\\/73a53638a4e34e8373405fd737dac9b1\",\"name\":\"pschatzmann\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.pschatzmann.ch\\\/wp-content\\\/uploads\\\/2022\\\/08\\\/pschatzmann.png\",\"url\":\"https:\\\/\\\/www.pschatzmann.ch\\\/wp-content\\\/uploads\\\/2022\\\/08\\\/pschatzmann.png\",\"contentUrl\":\"https:\\\/\\\/www.pschatzmann.ch\\\/wp-content\\\/uploads\\\/2022\\\/08\\\/pschatzmann.png\",\"width\":305,\"height\":305,\"caption\":\"pschatzmann\"},\"logo\":{\"@id\":\"https:\\\/\\\/www.pschatzmann.ch\\\/wp-content\\\/uploads\\\/2022\\\/08\\\/pschatzmann.png\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Random Forrest Classifier in Spark ML - Phil Schatzmann","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.pschatzmann.ch\/home\/2018\/11\/02\/random-forrest-classifier-in-spark-ml\/","og_locale":"en_US","og_type":"article","og_title":"Random Forrest Classifier in Spark ML - Phil Schatzmann","og_description":"MLlib is Spark\u2019s machine learning (ML) library. It&#8217;s goal is to make practical machine learning scalable and easy. 
I tried to make a complete step by step classification example using the\u00a0Iris\u00a0flower data set\u00a0\u00a0using the BeakerX Jupyter\u00a0kernel which covers the following steps Setup Data Preparation Testing and Prediction Validation The example [&hellip;]","og_url":"https:\/\/www.pschatzmann.ch\/home\/2018\/11\/02\/random-forrest-classifier-in-spark-ml\/","og_site_name":"Phil Schatzmann","article_published_time":"2018-11-02T12:41:06+00:00","article_modified_time":"2020-11-21T21:22:51+00:00","og_image":[{"width":376,"height":200,"url":"https:\/\/www.pschatzmann.ch\/wp-content\/uploads\/2018\/11\/spark-logo-trademark.png","type":"image\/png"}],"author":"pschatzmann","twitter_card":"summary_large_image","twitter_misc":{"Written by":"pschatzmann","Est. reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.pschatzmann.ch\/home\/2018\/11\/02\/random-forrest-classifier-in-spark-ml\/#article","isPartOf":{"@id":"https:\/\/www.pschatzmann.ch\/home\/2018\/11\/02\/random-forrest-classifier-in-spark-ml\/"},"author":{"name":"pschatzmann","@id":"https:\/\/www.pschatzmann.ch\/home\/#\/schema\/person\/73a53638a4e34e8373405fd737dac9b1"},"headline":"Random Forrest Classifier in Spark ML","datePublished":"2018-11-02T12:41:06+00:00","dateModified":"2020-11-21T21:22:51+00:00","mainEntityOfPage":{"@id":"https:\/\/www.pschatzmann.ch\/home\/2018\/11\/02\/random-forrest-classifier-in-spark-ml\/"},"wordCount":121,"commentCount":0,"publisher":{"@id":"https:\/\/www.pschatzmann.ch\/home\/#\/schema\/person\/73a53638a4e34e8373405fd737dac9b1"},"image":{"@id":"https:\/\/www.pschatzmann.ch\/home\/2018\/11\/02\/random-forrest-classifier-in-spark-ml\/#primaryimage"},"thumbnailUrl":"https:\/\/www.pschatzmann.ch\/wp-content\/uploads\/2018\/11\/spark-logo-trademark.png","articleSection":["Machine 
Learning"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.pschatzmann.ch\/home\/2018\/11\/02\/random-forrest-classifier-in-spark-ml\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.pschatzmann.ch\/home\/2018\/11\/02\/random-forrest-classifier-in-spark-ml\/","url":"https:\/\/www.pschatzmann.ch\/home\/2018\/11\/02\/random-forrest-classifier-in-spark-ml\/","name":"Random Forrest Classifier in Spark ML - Phil Schatzmann","isPartOf":{"@id":"https:\/\/www.pschatzmann.ch\/home\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.pschatzmann.ch\/home\/2018\/11\/02\/random-forrest-classifier-in-spark-ml\/#primaryimage"},"image":{"@id":"https:\/\/www.pschatzmann.ch\/home\/2018\/11\/02\/random-forrest-classifier-in-spark-ml\/#primaryimage"},"thumbnailUrl":"https:\/\/www.pschatzmann.ch\/wp-content\/uploads\/2018\/11\/spark-logo-trademark.png","datePublished":"2018-11-02T12:41:06+00:00","dateModified":"2020-11-21T21:22:51+00:00","breadcrumb":{"@id":"https:\/\/www.pschatzmann.ch\/home\/2018\/11\/02\/random-forrest-classifier-in-spark-ml\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.pschatzmann.ch\/home\/2018\/11\/02\/random-forrest-classifier-in-spark-ml\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.pschatzmann.ch\/home\/2018\/11\/02\/random-forrest-classifier-in-spark-ml\/#primaryimage","url":"https:\/\/www.pschatzmann.ch\/wp-content\/uploads\/2018\/11\/spark-logo-trademark.png","contentUrl":"https:\/\/www.pschatzmann.ch\/wp-content\/uploads\/2018\/11\/spark-logo-trademark.png","width":376,"height":200},{"@type":"BreadcrumbList","@id":"https:\/\/www.pschatzmann.ch\/home\/2018\/11\/02\/random-forrest-classifier-in-spark-ml\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.pschatzmann.ch\/home\/"},{"@type":"ListItem","position":2,"name":"Random Forrest Classifier in Spark 
ML"}]},{"@type":"WebSite","@id":"https:\/\/www.pschatzmann.ch\/home\/#website","url":"https:\/\/www.pschatzmann.ch\/home\/","name":"Phil Schatzmann Consulting","description":"","publisher":{"@id":"https:\/\/www.pschatzmann.ch\/home\/#\/schema\/person\/73a53638a4e34e8373405fd737dac9b1"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.pschatzmann.ch\/home\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/www.pschatzmann.ch\/home\/#\/schema\/person\/73a53638a4e34e8373405fd737dac9b1","name":"pschatzmann","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.pschatzmann.ch\/wp-content\/uploads\/2022\/08\/pschatzmann.png","url":"https:\/\/www.pschatzmann.ch\/wp-content\/uploads\/2022\/08\/pschatzmann.png","contentUrl":"https:\/\/www.pschatzmann.ch\/wp-content\/uploads\/2022\/08\/pschatzmann.png","width":305,"height":305,"caption":"pschatzmann"},"logo":{"@id":"https:\/\/www.pschatzmann.ch\/wp-content\/uploads\/2022\/08\/pschatzmann.png"}}]}},"post_mailing_queue_ids":[],"_links":{"self":[{"href":"https:\/\/www.pschatzmann.ch\/home\/wp-json\/wp\/v2\/posts\/587","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.pschatzmann.ch\/home\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.pschatzmann.ch\/home\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.pschatzmann.ch\/home\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.pschatzmann.ch\/home\/wp-json\/wp\/v2\/comments?post=587"}],"version-history":[{"count":1,"href":"https:\/\/www.pschatzmann.ch\/home\/wp-json\/wp\/v2\/posts\/587\/revisions"}],"predecessor-version":[{"id":2219,"href":"https:\/\/www.pschatzmann.ch\/home\/wp-json\/wp\/v2\/posts\/587\/revisions\/2219"}],"wp:featuredmedia":[{"embeddable":tr
ue,"href":"https:\/\/www.pschatzmann.ch\/home\/wp-json\/wp\/v2\/media\/590"}],"wp:attachment":[{"href":"https:\/\/www.pschatzmann.ch\/home\/wp-json\/wp\/v2\/media?parent=587"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.pschatzmann.ch\/home\/wp-json\/wp\/v2\/categories?post=587"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.pschatzmann.ch\/home\/wp-json\/wp\/v2\/tags?post=587"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}