ML Workflow
Most ML projects follow the same well-trodden life cycle:
- Look at the big picture.
- Frame the problem
- How will the model benefit the company?
- What current solutions exist, if any?
- Supervised, unsupervised, or reinforcement learning?
- Classification, regression, or something else?
- Batch learning or online learning?
- Select a performance measure (RMSE, based on the ℓ2 norm ‖·‖₂; MAE, based on the ℓ1 norm ‖·‖₁; or something else)
- Check any assumptions
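A minimal sketch of the two performance measures named above, assuming NumPy is available. The helper names rmse and mae and the sample values are made up for illustration. Because RMSE squares the errors before averaging, it weights large errors more heavily than MAE does.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error: the l2 norm of the errors divided by sqrt(m)."""
    errors = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return np.sqrt(np.mean(errors ** 2))


def mae(y_true, y_pred):
    """Mean absolute error: the l1 norm of the errors divided by m."""
    errors = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return np.mean(np.abs(errors))


# Made-up labels and predictions, for illustration only.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 7.0]

print("RMSE:", rmse(y_true, y_pred))
print("MAE: ", mae(y_true, y_pred))
```

RMSE is never smaller than MAE on the same errors; the gap grows when a few errors are much larger than the rest.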
- Get the data.
- Familiarize yourself with it (pandas)
- Plot histograms (matplotlib)
- Create a test set
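The "get the data" steps could look roughly like the sketch below. It assumes pandas, matplotlib, and scikit-learn are installed; the DataFrame is synthetic and its column names are hypothetical stand-ins for a real dataset. train_test_split is one common way to set a test set aside, not the only one.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the column names are hypothetical.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "rooms": rng.integers(1, 8, size=200),
    "income": rng.normal(50.0, 15.0, size=200),
    "price": rng.normal(300.0, 80.0, size=200),
})

# Familiarize yourself with the data.
print(df.head())
df.info()
print(df.describe())

# One histogram per numeric attribute.
axes = df.hist(bins=20, figsize=(9, 6))

# Hold out 20% of the rows as a test set; fix the seed so the split is repeatable.
train_set, test_set = train_test_split(df, test_size=0.2, random_state=42)
print(len(train_set), len(test_set))
```

Setting the test set aside this early keeps later exploration and tuning from leaking information about it.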
- Discover and visualize the data to gain insights.
- Scatter plots (pandas)
- Correlation plots
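A rough sketch of the visualization step, assuming pandas and matplotlib. The data is synthetic: "price" is deliberately built to depend on "income" so that the correlation is visible, and all column names are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display

import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Synthetic data: "price" is constructed to correlate with "income".
rng = np.random.default_rng(0)
income = rng.normal(50.0, 15.0, size=300)
df = pd.DataFrame({
    "income": income,
    "price": 4.0 * income + rng.normal(0.0, 20.0, size=300),
    "noise": rng.normal(0.0, 1.0, size=300),
})

# Pearson correlation of every attribute with the target.
corr_matrix = df.corr()
print(corr_matrix["price"].sort_values(ascending=False))

# Grid of pairwise scatter plots (histograms on the diagonal by default).
axes = scatter_matrix(df, figsize=(8, 8))

# A single scatter plot; a low alpha makes dense regions easier to see.
ax = df.plot(kind="scatter", x="income", y="price", alpha=0.3)
```

Note that corr() measures linear correlation only, so a near-zero value does not rule out a nonlinear relationship.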
- Prepare the data.
- Clean data
- Delete rows with nulls (dropna)
- Delete attributes with nulls (drop)
- Replace nulls with default value (fillna or imputer)
- Convert text categories to numbers (factorize() for integer codes, or one-hot encoding)
- Feature scaling (min-max scaling or standardization)
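The cleaning options above can be sketched as follows, assuming pandas and scikit-learn. The tiny DataFrame and its column names are invented for illustration, and SimpleImputer is used here as one example of an imputer.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder, StandardScaler

# Invented data with one missing value and one text attribute.
df = pd.DataFrame({
    "rooms": [3.0, np.nan, 5.0, 4.0],
    "income": [40.0, 55.0, 70.0, 85.0],
    "region": ["north", "south", "north", "east"],
})

# Three ways to handle nulls.
rows_dropped = df.dropna(subset=["rooms"])                # delete rows with nulls
column_dropped = df.drop(columns=["rooms"])               # delete the attribute
median_filled = df["rooms"].fillna(df["rooms"].median())  # fill with a default value

# The same median fill with an imputer, which remembers the medians it learned
# so they can be reapplied to new data.
imputer = SimpleImputer(strategy="median")
num = pd.DataFrame(
    imputer.fit_transform(df[["rooms", "income"]]),
    columns=["rooms", "income"],
)

# Text to numbers: integer codes, or one column per category.
codes, categories = pd.factorize(df["region"])
one_hot = OneHotEncoder().fit_transform(df[["region"]]).toarray()

# Feature scaling: squeeze into [0, 1], or center to mean 0 with unit variance.
min_max = MinMaxScaler().fit_transform(num)
standardized = StandardScaler().fit_transform(num)
```

Integer codes imply an ordering between categories that usually does not exist, which is the main reason to prefer one-hot encoding for unordered text attributes.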
- Select a model and train it.
- Cross-validation (cross_val_score)
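A small sketch of cross-validation with cross_val_score, assuming scikit-learn. The dataset comes from make_regression and LinearRegression is an arbitrary choice of model. scikit-learn scorers follow a "higher is better" convention, so the MSE comes back negated and is flipped before taking the square root.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic regression data, for illustration only.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)

model = LinearRegression()

# 5-fold cross-validation: train on 4 folds, evaluate on the 5th, five times.
scores = cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=5)

# Convert the negated MSE values into one RMSE per fold.
rmse_scores = np.sqrt(-scores)
print("RMSE per fold:", rmse_scores)
print("Mean:", rmse_scores.mean(), "Std:", rmse_scores.std())
```

The standard deviation across folds gives a rough sense of how precise the estimate is, which a single validation split cannot provide.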
- Fine-tune your model.
- GridSearchCV
- RandomizedSearchCV
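The two search strategies might be used as in the sketch below, assuming scikit-learn and SciPy. The model, the hyperparameter ranges, and the synthetic data are arbitrary choices for illustration, kept small so the example runs quickly.

```python
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Synthetic regression data, for illustration only.
X, y = make_regression(n_samples=150, n_features=5, noise=10.0, random_state=42)

# Grid search: try every combination (2 x 2 = 4 candidates here).
param_grid = {"n_estimators": [10, 30], "max_features": [2, 4]}
grid_search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    cv=3,
    scoring="neg_mean_squared_error",
)
grid_search.fit(X, y)
print("Grid search best:", grid_search.best_params_)

# Randomized search: sample a fixed number of candidates from distributions.
param_distributions = {
    "n_estimators": randint(10, 50),  # integers from 10 to 49
    "max_features": randint(1, 6),    # integers from 1 to 5
}
random_search = RandomizedSearchCV(
    RandomForestRegressor(random_state=42),
    param_distributions,
    n_iter=5,
    cv=3,
    scoring="neg_mean_squared_error",
    random_state=42,
)
random_search.fit(X, y)
print("Randomized search best:", random_search.best_params_)
```

Grid search cost grows multiplicatively with each added hyperparameter, while randomized search keeps the budget fixed at n_iter, which tends to make it the more practical option for large search spaces.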
- Present your solution.
- Launch, monitor and maintain your system.