Distributed Machine Learning with Apache Mahout

This article introduces Mahout, a library for scalable machine learning, and studies potential applications through two Mahout projects. It was written by Linda Terlouw. Linda is a computer scientist who works on Data Science (Data Analysis, Data Visualization, Process Mining).

Apache Mahout is a library for scalable machine learning. Originally a subproject of Apache Lucene (a high-performance text search engine library), Mahout has since become a top-level Apache project.

While Mahout has only been around for a few years, it has established itself as a frontrunner in the field of machine learning technologies. Mahout has been adopted by Foursquare, which uses Mahout with Apache Hadoop and Apache Hive to power its recommendation engine; Twitter, which creates user interest models using Mahout; and Yahoo!, which uses Mahout in its anti-spam analytic platform. Other commercial and academic uses of Mahout are catalogued at https://mahout.apache.org/general/powered-by-mahout.html.

This Refcard will present the basics of Mahout by studying two possible applications:

  • Training and testing a Random Forest for handwriting recognition using Amazon Web Services EMR, and
  • Running a recommendation engine on a standalone Spark cluster.

In this article there are 10 sections:

  1. Introduction
  2. Machine Learning
  3. Algorithms Supported in Apache Mahout
  4. Installing Apache Mahout
  5. Example of Multi-Class Classification Using Amazon Elastic MapReduce
  6. Getting and Preparing the Data
  7. Classifying From Command Line Using Amazon Elastic MapReduce
  8. Interpreting the Test Results
  9. Using Apache Mahout With Apache Spark for Recommendations
  10. Running Mahout from Java or Scala

