Mojmelo v0.1: Machine Learning algorithms in pure Mojo

I’m happy to announce the release of Mojmelo v0.1, a set of machine learning algorithms implemented from scratch in pure Mojo 🔥. The source is on GitHub at yetalit/Mojmelo.

Here is the list of algorithms:

  • Linear Regression

  • Polynomial Regression

  • Logistic Regression

  • KNN

  • KMeans

  • DBSCAN

  • SVM

  • Naive Bayes

    1. GaussianNB

    2. MultinomialNB

  • Decision Tree (Regression/Classification)

  • Random Forest (Regression/Classification)

  • GBDT (Regression/Classification)

  • PCA

Preprocessing and model selection:

  • normalize

  • MinMaxScaler

  • StandardScaler

  • KFold

  • GridSearchCV

  • LabelEncoder

Documentation: https://yetalit.github.io/Mojmelo/docs/_index.html

With version 0.1, the fundamental development phase is complete, so Mojmelo is now ready to be tested on users’ own datasets.

Due to some limitations, I haven’t been able to provide user-friendly benchmark results yet. If you run any benchmarks yourself, feel free to submit them to the repository by opening a pull request.

Finally, a huge thanks to everyone who showed interest in or supported the project. It was really motivating along the way ❤️

This is really nice, great work Doby!

Six months later, I’d like to share some updates:

  • The HDBSCAN algorithm has been added to the list of algorithms.
  • Algorithms that expose a model interface now support save() and load() functions, so you can save your trained models and load them later.
  • The Mojmelo codebase has been updated for Mojo v1.0.0b1, and the new version will be released to the community channel through this PR.
  • Initial benchmarks of speed and correctness are complete. Here are the results (ARI = Adjusted Rand Index):
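The post does not say how the timings in the tables were measured. As a hypothetical sketch only, a Python harness for the scikit-learn side could time repeated calls with `time.perf_counter` and report the mean ± sample standard deviation; the function name `time_call`, the default of 5 repeats, and the assumption that the ± values are standard deviations are all illustrative, not taken from the Mojmelo benchmarks.

```python
import statistics
import time
from typing import Callable, Tuple


def time_call(fn: Callable[[], object], repeats: int = 5) -> Tuple[float, float]:
    """Hypothetical helper: run fn `repeats` times and return
    (mean, sample standard deviation) of wall-clock seconds."""
    if repeats < 2:
        # statistics.stdev needs at least two data points
        raise ValueError("need at least 2 repeats to estimate a standard deviation")
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()  # high-resolution monotonic clock
        fn()
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations), statistics.stdev(durations)


if __name__ == "__main__":
    # Stand-in workload; in a real benchmark this would be e.g. lambda: model.fit(X)
    mean_s, std_s = time_call(lambda: sorted(range(100_000), reverse=True))
    print(f"{mean_s:.4f} ± {std_s:.4f}")
```

Timing fit and predict separately, as the tables do, just means wrapping each call in its own lambda.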

KMeans

| Model | Fit Time (s) | ARI vs sklearn | ARI vs truth |
|---|---|---|---|
| sklearn KMeans | 0.2716 ± 0.0012 | - | 0.9389 |
| mojmelo KMeans | 0.1870 ± 0.0052 | 0.8821 | 0.9389 |

HDBSCAN (algorithm='boruvka_kdtree')

| Model | Fit Time (s) | ARI vs sklearn | ARI vs truth |
|---|---|---|---|
| skl-contrib HDBS | 1.1495 ± 0.0083 | - | 0.9997 |
| mojmelo HDBS | 0.3198 ± 0.0079 | 0.9930 | 0.9932 |

DBSCAN (algorithm='kd_tree')

| Model | Fit Time (s) | ARI vs sklearn | ARI vs truth |
|---|---|---|---|
| sklearn DBS | 1.1434 ± 0.0055 | - | 0.8566 |
| mojmelo DBS | 0.4028 ± 0.0038 | 0.9996 | 0.8566 |
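The clustering tables above report the Adjusted Rand Index (ARI), which scores how well two labelings of the same points agree, corrected for chance: 1.0 means identical partitions up to relabeling, and values near 0 mean chance-level agreement. "ARI vs sklearn" compares Mojmelo’s labels with the reference implementation’s labels, while "ARI vs truth" compares against ground-truth labels. For illustration, here is a minimal pure-Python sketch of the standard ARI formula; it is not taken from the benchmark code, and treating DBSCAN’s noise label (-1) as an ordinary cluster label is an assumption about how noise was scored.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Adjusted Rand Index between two labelings of the same n points."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    if n < 2:
        return 1.0  # degenerate case: trivially identical partitions
    # Pairs of points that share a cluster in both labelings (contingency cells)
    index = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs sharing a cluster in each labeling separately (row / column sums)
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    total_pairs = comb(n, 2)
    expected = sum_a * sum_b / total_pairs  # expected index under random labeling
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # e.g. both labelings put every point in one cluster
    return (index - expected) / (max_index - expected)


if __name__ == "__main__":
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition, labels swapped
    print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))  # one cluster split in two
```

The first call scores 1.0 because ARI ignores label names, and the second scores 4/7 (about 0.57), which matches the example in scikit-learn’s `adjusted_rand_score` documentation.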

KNN (algorithm='kd_tree')

| Model | Fit Time (s) | Predict Time (s) | Accuracy |
|---|---|---|---|
| sklearn KNN | 0.0353 ± 0.0005 | 1.7600 ± 0.0063 | 0.8543 |
| mojmelo KNN | 0.0149 ± 0.0006 | 0.2126 ± 0.0040 | 0.8347 |

SVM

| Model | Fit Time (s) | Predict Time (s) | Accuracy |
|---|---|---|---|
| sklearn SVM | 1.0595 ± 0.0010 | 0.3066 ± 0.0002 | 0.9798 |
| mojmelo SVM | 0.8733 ± 0.0129 | 0.0603 ± 0.0032 | 0.9797 |

DecisionTreeClassifier

| Model | Fit Time (s) | Predict Time (s) | Accuracy |
|---|---|---|---|
| sklearn DTC | 0.9051 ± 0.0008 | 0.0004 ± 0.0000 | 0.9300 |
| mojmelo DTC | 0.0749 ± 0.0028 | 0.0002 ± 0.0000 | 0.9328 |

DecisionTreeRegressor

| Model | Fit Time (s) | Predict Time (s) | MSE |
|---|---|---|---|
| sklearn DTR | 0.6466 ± 0.0006 | 0.0005 ± 0.0000 | 8247.9358 |
| mojmelo DTR | 0.0795 ± 0.0049 | 0.0003 ± 0.0000 | 8192.1982 |

RandomForestClassifier

| Model | Fit Time (s) | Predict Time (s) | Accuracy |
|---|---|---|---|
| sklearn RFC | 0.4707 ± 0.0064 | 0.0140 ± 0.0003 | 0.9182 |
| mojmelo RFC | 0.4534 ± 0.0094 | 0.0040 ± 0.0000 | 0.9174 |

RandomForestRegressor

| Model | Fit Time (s) | Predict Time (s) | MSE |
|---|---|---|---|
| sklearn RFR | 2.0257 ± 0.0050 | 0.0134 ± 0.0004 | 8454.5517 |
| mojmelo RFR | 1.2247 ± 0.0094 | 0.0067 ± 0.0002 | 9155.6895 |

PCA (svd_solver='full')

| Model | Fit Time (s) | Transform Time (s) | Explained Var |
|---|---|---|---|
| sklearn PCA | 0.2070 ± 0.0025 | 0.0061 ± 0.0000 | 0.5363 |
| mojmelo PCA | 0.0737 ± 0.0003 | 0.0270 ± 0.0015 | 0.5363 |