Databricks is a solid platform for building, training and deploying ML models, and it is easy to learn and use (1-2 days to get productive)
- It has useful features like experiment tracking, logging and versioning of the best model during training, and storing the artifacts in a catalog (backed by cloud storage such as DBFS)
- It allows easy reloading, retraining and redeployment of models
- It uses the MLflow library to perform these activities
- Unity Catalog acts as a versioned data store (metastore, schemas, tables, etc) with governance policies and access control
- It is flexible in connecting with different cloud providers (AWS, Google Cloud, etc) to get access to compute and storage (cloud buckets)
- Both data and models can be easily read from and written to the cloud through its APIs
- One main thing to remember is that you still need to know how to build, train and tune models, i.e. the data science process itself
- Databricks can’t build the model for you. It’s a tool to simplify the MLOps process and provide all the audit, governance, policy features in one connected platform
- It provides ready-made connectors to data lakes and warehouses like Snowflake, for easy data ingestion and other data engineering activities
- I am sharing a sample code notebook for building, training, tuning and deploying a regression model that predicts a disease using synthetic data
- It can easily be extended to other models, hyperparameter tuning, deployment targets, etc
https://github.com/datawisdomx1/DataScience-AI-ML-Ops-Platforms-Tools