Deploying Breiman’s Random Forest Algorithm in Machine Learning

Authors

  • Divya Sai Jaladi, Senior Lead Application Developer, SCDMV, 10311 Wilson Boulevard, Blythewood, SC 29016, United States
  • Sandeep Vutla, Assistant Vice President, Senior Data Engineer, Chubb, 202 Halls Mill Rd, Whitehouse Station, NJ 08889, United States

Keywords:

Artificial Intelligence (AI), Machine Learning, Algorithm, Data, Training, Accuracy

Abstract

This paper presents the implementation and evaluation of Leo Breiman’s Random Forest algorithm within the Weka data mining environment, with a particular focus on imbalanced datasets, which hinder machine learning performance in domains such as fraud detection and rare disease diagnosis. The study extends Weka’s Random Forest with variable importance calculations, which identify the most predictive attributes. The implementation modifies existing Weka classes and adds new extensions that support out-of-bag (OOB) error estimation and variable permutation testing. The implementation is verified by comparing its results with those of Breiman’s original Fortran code, and the two show comparable variable importance behavior. To simplify the use of Breiman’s code, data converters and a Java-based GUI were developed. Additional improvements, such as converting Breiman’s statically sized Fortran arrays to dynamically allocated arrays in Fortran 90, further enhance usability. This work supports more robust model evaluation and attribute selection in Random Forests and offers insights for future development and practical application in machine learning research and software tools.
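The abstract relies on two mechanisms, OOB error estimation and permutation-based variable importance, without showing how they fit together. The following is a minimal, self-contained Python sketch of the idea as Breiman described it. It is not the authors’ Weka extension. Each tree is trained on a bootstrap sample, and the rows left out of that sample (its out-of-bag cases) are used to estimate error. A variable’s importance is the average drop in per-tree OOB accuracy after that variable’s values are shuffled among the OOB cases. To keep the sketch short, each tree is assumed to be a single-split decision stump, a simplification of Breiman’s fully grown trees. All function names (`fit_forest`, `oob_error`, `permutation_importance`, and so on) are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, idx, features):
    """Simplified tree (assumption): the single split (feature, threshold) with
    the lowest weighted Gini impurity among the randomly chosen features."""
    best = None
    for f in features:
        for t in sorted({X[i][f] for i in idx}):
            left = [y[i] for i in idx if X[i][f] <= t]
            right = [y[i] for i in idx if X[i][f] > t]
            if not left or not right:
                continue
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # no usable split: always predict the bootstrap majority
        label = majority([y[i] for i in idx])
        return (features[0], float("inf"), label, label)
    return best[1:]


def predict(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label


def fit_forest(X, y, n_trees, mtry, rng):
    """Train n_trees stumps, each on a bootstrap sample, and keep the rows
    left out of that sample (the tree's out-of-bag cases)."""
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        oob = sorted(set(range(n)) - set(idx))
        features = rng.sample(range(p), mtry)  # random candidate variables
        forest.append((fit_stump(X, y, idx, features), oob))
    return forest


def oob_error(forest, X, y):
    """Classify each row by majority vote of the trees for which it was OOB."""
    votes = [Counter() for _ in X]
    for stump, oob in forest:
        for i in oob:
            votes[i][predict(stump, X[i])] += 1
    scored = [i for i, v in enumerate(votes) if v]
    wrong = sum(votes[i].most_common(1)[0][0] != y[i] for i in scored)
    return wrong / len(scored) if scored else 0.0


def permutation_importance(forest, X, y, rng):
    """For each variable m: per tree, shuffle m among that tree's OOB rows and
    record the drop in OOB accuracy; average the drops over the trees."""
    p = len(X[0])
    totals = [0.0] * p
    used = 0
    for stump, oob in forest:
        if not oob:
            continue
        used += 1
        base = sum(predict(stump, X[i]) == y[i] for i in oob)
        for m in range(p):
            shuffled = [X[i][m] for i in oob]
            rng.shuffle(shuffled)
            hits = 0
            for i, value in zip(oob, shuffled):
                row = list(X[i])
                row[m] = value  # only variable m is perturbed
                hits += predict(stump, row) == y[i]
            totals[m] += (base - hits) / len(oob)
    return [t / used for t in totals] if used else totals


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: the label depends only on variable 0; variables 1 and 2 are noise.
    X = [[rng.random(), rng.random(), rng.random()] for _ in range(200)]
    y = [int(row[0] > 0.5) for row in X]
    forest = fit_forest(X, y, n_trees=100, mtry=2, rng=rng)
    print("OOB error:", oob_error(forest, X, y))
    print("importance:", permutation_importance(forest, X, y, rng))
```

On the toy data, variable 0 should score well above the two noise variables. Permuting a variable that a stump never splits on cannot change that stump’s predictions, so noise variables score near zero. Breiman’s description averages the per-tree drop in correct OOB votes. This sketch divides each tree’s drop by its number of OOB cases, so the scores read as accuracy losses.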

Published

2019-02-09