Weka has a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization.
RapidMiner was developed at the Technical University of Dortmund, Germany. It provides a GUI and a Java API for developing your own applications, along with data handling, visualization, and modeling with machine learning algorithms.
Environment for Developing KDD-Applications Supported by Index-Structures (ELKI) is open source (AGPLv3) data mining software written in Java. The focus of ELKI is research in algorithms, with an emphasis on unsupervised methods in cluster analysis and outlier detection.
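To make "outlier detection" concrete, here is a minimal sketch of one simple unsupervised approach: score each point by the distance to its k-th nearest neighbor, so that isolated points get high scores. The class and method names are hypothetical and this is not ELKI's API. It is a brute-force version that compares every pair of points, which is the kind of work index structures are meant to speed up.

```java
import java.util.Arrays;

// Hypothetical names; a brute-force sketch, not ELKI's API or implementation.
class KnnOutlierSketch {

    /** Score of each point = Euclidean distance to its k-th nearest neighbor. */
    static double[] knnDistanceScores(double[][] points, int k) {
        int n = points.length;
        if (k < 1 || k >= n) {
            throw new IllegalArgumentException("k must be in [1, n-1]");
        }
        double[] scores = new double[n];
        for (int i = 0; i < n; i++) {
            // Distances from point i to every other point.
            double[] dists = new double[n - 1];
            int idx = 0;
            for (int j = 0; j < n; j++) {
                if (j != i) {
                    dists[idx++] = euclidean(points[i], points[j]);
                }
            }
            Arrays.sort(dists);
            scores[i] = dists[k - 1];
        }
        return scores;
    }

    static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Index of the largest value (the first one on ties). */
    static int argMax(double[] values) {
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Four points in a tight cluster and one far away.
        double[][] points = {{0, 0}, {0, 1}, {1, 0}, {1, 1}, {10, 10}};
        double[] scores = knnDistanceScores(points, 2);
        System.out.println("scores: " + Arrays.toString(scores));
        System.out.println("most outlying point index: " + argMax(scores));
    }
}
```

The choice of k is an assumption of the sketch; larger values make the score less sensitive to small groups of nearby points.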
Massive Online Analysis (MOA) is a popular open source framework for data stream mining, with an active, growing community. It includes a collection of machine learning algorithms (classification, regression, clustering, outlier detection, concept drift detection, and recommender systems) and tools for evaluation. MOA is related to the Weka project and is also written in Java, but it scales to more demanding problems.
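Data stream mining differs from batch learning in that each instance is typically seen once and then discarded. A scheme commonly used to evaluate stream learners is test-then-train (prequential) evaluation: predict each incoming instance first, then use it for training. The sketch below illustrates the idea in plain Java with a tiny perceptron; all names are hypothetical and none of this is MOA's API.

```java
import java.util.Random;

// Hypothetical names throughout; an illustration only, not MOA's API.
class PrequentialSketch {

    /** A tiny online learner: a perceptron for labels in {0, 1}. */
    static class OnlinePerceptron {
        private final double[] weights;
        private double bias;

        OnlinePerceptron(int numFeatures) {
            this.weights = new double[numFeatures];
        }

        int predict(double[] x) {
            double sum = bias;
            for (int i = 0; i < weights.length; i++) {
                sum += weights[i] * x[i];
            }
            return sum > 0 ? 1 : 0;
        }

        /** Update on a single example; nothing is stored afterwards. */
        void train(double[] x, int label) {
            int error = label - predict(x); // -1, 0, or +1
            if (error == 0) {
                return;
            }
            for (int i = 0; i < weights.length; i++) {
                weights[i] += error * x[i];
            }
            bias += error;
        }
    }

    /** Test-then-train: predict each instance first, then learn from it. */
    static double prequentialAccuracy(double[][] stream, int[] labels) {
        OnlinePerceptron learner = new OnlinePerceptron(stream[0].length);
        int correct = 0;
        for (int t = 0; t < stream.length; t++) {
            if (learner.predict(stream[t]) == labels[t]) {
                correct++;
            }
            learner.train(stream[t], labels[t]);
        }
        return (double) correct / stream.length;
    }

    public static void main(String[] args) {
        // Synthetic stream: label is 1 when the two features sum to a positive value.
        Random rng = new Random(42);
        int n = 2000;
        double[][] stream = new double[n][2];
        int[] labels = new int[n];
        for (int t = 0; t < n; t++) {
            stream[t][0] = rng.nextDouble() * 2 - 1;
            stream[t][1] = rng.nextDouble() * 2 - 1;
            labels[t] = stream[t][0] + stream[t][1] > 0 ? 1 : 0;
        }
        System.out.printf("prequential accuracy: %.3f%n",
                prequentialAccuracy(stream, labels));
    }
}
```

Because every instance is used for testing before the learner has seen it, no separate hold-out set is needed, which suits streams that never end.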
Apache SAMOA is a machine learning (ML) framework that provides a programming abstraction for distributed streaming ML algorithms. It enables the development of new ML algorithms without dealing directly with the complexity of the underlying distributed stream processing engines (DSPEs, such as Apache Storm, Apache S4, and Apache Samza). Users can develop a distributed streaming ML algorithm once and execute it on multiple DSPEs.
JSAT is a library for quickly getting started with machine learning problems. It is developed by its author in their free time and made available under the GPL 3. Part of the library's purpose is self-education, so all code is self-contained: JSAT has no external dependencies and is pure Java.
Java-ML is a Java API with a collection of machine learning algorithms implemented in Java. It has no GUI; it provides only a standard interface for each type of algorithm.
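The value of a standard interface is that calling code does not need to change when the algorithm does. The sketch below shows the idea with a hypothetical `Classifier` interface and two toy implementations; the interface, its method names, and the classes are assumptions made for illustration and are not Java-ML's actual API.

```java
import java.util.HashMap;
import java.util.Map;

class CommonInterfaceSketch {

    /** Hypothetical common interface; Java-ML's real interfaces may differ. */
    interface Classifier {
        void fit(double[][] features, String[] labels);
        String predict(double[] instance);
    }

    /** Always predicts the most frequent training label. */
    static class MajorityClass implements Classifier {
        private String majority;

        @Override
        public void fit(double[][] features, String[] labels) {
            Map<String, Integer> counts = new HashMap<>();
            int best = 0;
            for (String label : labels) {
                int count = counts.merge(label, 1, Integer::sum);
                if (count > best) {
                    best = count;
                    majority = label;
                }
            }
        }

        @Override
        public String predict(double[] instance) {
            return majority;
        }
    }

    /** Predicts the label of the closest training instance (1-NN). */
    static class NearestNeighbor implements Classifier {
        private double[][] trainX;
        private String[] trainY;

        @Override
        public void fit(double[][] features, String[] labels) {
            this.trainX = features;
            this.trainY = labels;
        }

        @Override
        public String predict(double[] instance) {
            int best = 0;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int i = 0; i < trainX.length; i++) {
                double dist = 0; // squared Euclidean distance
                for (int d = 0; d < instance.length; d++) {
                    double diff = trainX[i][d] - instance[d];
                    dist += diff * diff;
                }
                if (dist < bestDist) {
                    bestDist = dist;
                    best = i;
                }
            }
            return trainY[best];
        }
    }

    /** Works with any fitted Classifier, whatever the algorithm behind it. */
    static double accuracy(Classifier model, double[][] x, String[] y) {
        int correct = 0;
        for (int i = 0; i < x.length; i++) {
            if (model.predict(x[i]).equals(y[i])) {
                correct++;
            }
        }
        return (double) correct / x.length;
    }

    public static void main(String[] args) {
        double[][] x = {{0, 0}, {0, 1}, {5, 5}, {6, 5}, {5, 6}};
        String[] y = {"a", "a", "b", "b", "b"};
        Classifier[] models = {new MajorityClass(), new NearestNeighbor()};
        for (Classifier model : models) {
            model.fit(x, y);
            System.out.println(model.getClass().getSimpleName()
                    + " training accuracy: " + accuracy(model, x, y));
        }
    }
}
```

Swapping one algorithm for another only changes the line that constructs the model; the training and evaluation code stays the same.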
MLlib (Spark) is Apache Spark's scalable machine learning library. Although Spark itself is written largely in Scala, the library and the platform support Java, Scala, and Python bindings. The library is new and the list of algorithms is long.
H2O is a machine learning API for smarter applications. It scales statistics, machine learning, and math over big data. H2O is extensible, and developers can build their own blocks from the simple math "legos" in its core.
RankLib is a library of learning-to-rank algorithms. Currently, eight popular algorithms have been implemented.
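A learning-to-rank model orders a list of items for a query, and its output is usually judged with a ranking metric. As an illustration, the sketch below computes NDCG@k, one common such metric, for a ranked list of graded relevance labels. It assumes the variant with gain 2^rel - 1 and a log2(position + 1) discount; the names are hypothetical and this is not RankLib's code.

```java
import java.util.Arrays;

// Hypothetical names; a sketch of one common NDCG variant, not RankLib's code.
class NdcgSketch {

    /** DCG@k for relevance labels listed in ranked order (best-ranked first). */
    static double dcg(int[] relevance, int k) {
        double sum = 0;
        int n = Math.min(k, relevance.length);
        for (int i = 0; i < n; i++) {
            double gain = Math.pow(2, relevance[i]) - 1;
            // Position i (0-based) is rank i + 1, discounted by log2(rank + 1).
            double discount = Math.log(i + 2) / Math.log(2);
            sum += gain / discount;
        }
        return sum;
    }

    /** NDCG@k = DCG@k divided by the DCG@k of the ideal (descending) order. */
    static double ndcg(int[] relevance, int k) {
        int[] ideal = relevance.clone();
        Arrays.sort(ideal); // ascending
        for (int i = 0, j = ideal.length - 1; i < j; i++, j--) {
            int tmp = ideal[i]; // reverse to descending
            ideal[i] = ideal[j];
            ideal[j] = tmp;
        }
        double idealDcg = dcg(ideal, k);
        return idealDcg == 0 ? 0 : dcg(relevance, k) / idealDcg;
    }

    public static void main(String[] args) {
        // Relevance labels of the documents in the order a ranker returned them.
        int[] ranked = {3, 0, 2, 1};
        System.out.printf("NDCG@4: %.4f%n", ndcg(ranked, 4));
    }
}
```

A perfectly ordered list scores 1, and the score drops as relevant items are pushed further down the ranking.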
A bigger list including more niche libraries is on Demnag.