Deploying Breiman’s Random Forest Algorithm in Machine Learning
Keywords:
Artificial Intelligence (AI), Machine Learning, Algorithm, Data, Training, Accuracy
Abstract
This paper presents the implementation and evaluation of Leo Breiman’s Random Forest algorithm within the Weka data mining environment, with a particular focus on imbalanced datasets, which hinder machine learning performance in domains such as fraud detection and rare disease diagnosis. The study extends Weka’s Random Forest functionality with variable importance calculations, allowing key predictive attributes to be identified. The implementation modifies existing Weka classes and introduces new extensions that support out-of-bag (OOB) error estimation and variable permutation testing. The implementation is verified by comparing results from Weka with those obtained from Breiman’s original Fortran code; the two show comparable variable importance behavior. To simplify the use of Breiman’s code, data converters and a Java-based GUI were developed. Additional improvements, such as moving Breiman’s static Fortran code to dynamic array handling in Fortran 90, further enhance usability. This work contributes to more robust model evaluation and attribute selection in Random Forests, offering insights for future development and practical application in machine learning research and software tools.
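The idea behind OOB-based variable permutation testing can be illustrated with a minimal sketch. It is not the Weka or Fortran implementation described in this paper. It assumes a deliberately simplified ensemble (bagged one-split decision stumps instead of full trees with random feature selection), a synthetic two-attribute dataset, and a per-tree average of the OOB error rather than Breiman's aggregated OOB vote. All function names are hypothetical. For each stump, the samples left out of its bootstrap draw are scored once as they are and once with a single attribute shuffled; the average drop in accuracy serves as that attribute's importance.

```python
import random


def fit_stump(X, y, features):
    """Return (feature, threshold, left_label, right_label) with the fewest training errors."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i, row in enumerate(X) if row[f] <= t]
            right = [y[i] for i, row in enumerate(X) if row[f] > t]
            l_lab = max(set(left), key=left.count)  # left is never empty: t is an observed value
            r_lab = max(set(right), key=right.count) if right else l_lab
            errors = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    return best[1:]


def predict(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab


def oob_permutation_importance(X, y, n_trees=20, seed=0):
    """Return (importance per attribute, mean per-tree OOB error).

    Simplification: the OOB error is averaged per tree, not aggregated
    per sample by majority vote across trees.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    importance = [0.0] * p
    oob_errors = []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample (with replacement)
        in_bag = set(boot)
        oob = [i for i in range(n) if i not in in_bag]  # samples this tree never saw
        if not oob:
            continue
        stump = fit_stump([X[i] for i in boot], [y[i] for i in boot], range(p))
        correct = sum(predict(stump, X[i]) == y[i] for i in oob)
        oob_errors.append(1 - correct / len(oob))
        for j in range(p):
            # Shuffle attribute j among the OOB samples only, then re-score.
            shuffled = [X[i][j] for i in oob]
            rng.shuffle(shuffled)
            permuted_correct = 0
            for i, value in zip(oob, shuffled):
                row = list(X[i])
                row[j] = value
                permuted_correct += predict(stump, row) == y[i]
            importance[j] += (correct - permuted_correct) / len(oob)
    used = len(oob_errors)
    return [imp / used for imp in importance], sum(oob_errors) / used


# Synthetic data (an assumption for illustration): attribute 0 determines
# the class, attribute 1 is pure noise.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(120)]
y = [int(row[0] > 0.5) for row in X]

importance, oob_error = oob_permutation_importance(X, y)
print("importance per attribute:", importance)
print("mean per-tree OOB error:", oob_error)
```

In this toy setting the informative attribute should receive a clearly positive importance, while shuffling the noise attribute should leave OOB accuracy unchanged. That is the behavior a variable importance measure is meant to expose.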