ML Workflow
An ML project typically follows a standard, well-trodden workflow:
- Look at the big picture.
  - Frame the problem
    - How will the model benefit the company?
    - What current solutions exist, if any?
    - Supervised, unsupervised, or reinforcement learning?
    - Classification, regression, or something else?
    - Batch learning or online learning?
  - Select a performance measure (RMSE, based on the ℓ2 norm ‖∙‖₂; MAE, based on the ℓ1 norm ‖∙‖₁; or another measure)
  - Check any assumptions
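
The two performance measures named above can be sketched in a few lines. This is a minimal illustration, assuming NumPy is available; the function names `rmse` and `mae` and the sample values are made up for the example.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error: the l2 norm of the errors divided by sqrt(m)."""
    errors = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean(errors ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error: the l1 norm of the errors divided by m."""
    errors = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.mean(np.abs(errors)))


# Made-up labels and predictions, for illustration only.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 7.0]

print("RMSE:", rmse(y_true, y_pred))
print("MAE: ", mae(y_true, y_pred))
```

Because RMSE squares the errors before averaging, it weights large errors more heavily than MAE does, so MAE is often the safer choice when the data contains many outliers.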
- Get the data.
  - Familiarize yourself with it (pandas)
  - Plot histograms (matplotlib)
  - Create a test set
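
A rough sketch of the "get the data" step. The DataFrame here is synthetic (the column names `rooms`, `income`, and `price` are invented for the example); in a real project it would come from something like `pd.read_csv`. The split uses scikit-learn's `train_test_split`, which is one common option and an assumption on my part, since the notes do not name a tool for it.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so this runs without a display

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a real dataset (hypothetical columns).
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "rooms": rng.integers(1, 8, size=200),
    "income": rng.normal(5.0, 1.5, size=200),
    "price": rng.normal(200.0, 50.0, size=200),
})

# Familiarize yourself with the data.
print(df.head())       # first few rows
df.info()              # column dtypes and non-null counts
print(df.describe())   # summary statistics for numeric columns

# One histogram per numeric attribute (call plt.show() in an interactive session).
axes = df.hist(bins=30, figsize=(9, 6))

# Set the test set aside now and do not look at it until the end.
train_set, test_set = train_test_split(df, test_size=0.2, random_state=42)
print(len(train_set), "train rows,", len(test_set), "test rows")
```

Fixing `random_state` keeps the split identical between runs, so the model never gets to see the test rows through repeated reshuffling.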
 
- Discover and visualize the data to gain insights.
  - Scatter plots (pandas)
  - Correlation plots
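
The exploration step might look like the sketch below. The data is again synthetic, with `price` deliberately built to depend on `income` so that the correlation is visible; all column names are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so this runs without a display

import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Synthetic data: price depends linearly on income plus noise (made up).
rng = np.random.default_rng(0)
income = rng.normal(5.0, 1.5, size=300)
df = pd.DataFrame({
    "income": income,
    "rooms": rng.integers(1, 8, size=300),
    "price": 40.0 * income + rng.normal(0.0, 20.0, size=300),
})

# Pearson correlation of every attribute with the target.
corr_matrix = df.corr()
print(corr_matrix["price"].sort_values(ascending=False))

# Pairwise scatter plots, with a histogram of each attribute on the diagonal.
axes = scatter_matrix(df[["price", "income", "rooms"]], figsize=(9, 6))

# Zoom in on the most promising pair; alpha reveals dense regions.
ax = df.plot(kind="scatter", x="income", y="price", alpha=0.3)
```

Note that the correlation coefficient only captures linear relationships, so the scatter plots are still worth inspecting for nonlinear patterns.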
 
- Prepare the data.
  - Clean data
    - Delete rows with nulls (dropna)
    - Delete attributes with nulls (drop)
    - Replace nulls with a default value (fillna or imputer)
  - Convert text to numbers (factorize() for integer codes, or one-hot encoding)
  - Feature scaling (min-max scaling or standardization)
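
The preparation options listed above, sketched on a tiny made-up DataFrame. The "imputer" is assumed here to be scikit-learn's `SimpleImputer`, and one-hot encoding is shown with `OneHotEncoder`; the column names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder, StandardScaler

# Tiny made-up dataset with missing values and one text attribute.
df = pd.DataFrame({
    "rooms": [3.0, 4.0, np.nan, 5.0, 2.0],
    "income": [4.2, 5.1, 3.8, np.nan, 6.0],
    "region": ["north", "south", "south", "east", "north"],
})

# Cleaning option 1: delete the rows that have a null in "rooms".
rows_dropped = df.dropna(subset=["rooms"])

# Cleaning option 2: delete the whole attribute.
attr_dropped = df.drop("rooms", axis=1)

# Cleaning option 3: replace nulls with a default value (here the median).
filled = df["rooms"].fillna(df["rooms"].median())

# Same idea with an imputer, which remembers the medians for later reuse.
num = df[["rooms", "income"]]
imputer = SimpleImputer(strategy="median")
num_imputed = pd.DataFrame(
    imputer.fit_transform(num), columns=num.columns, index=num.index
)

# Text to integer codes: each distinct string gets an integer.
codes, categories = pd.factorize(df["region"])

# Text to one-hot vectors: one binary column per category.
encoder = OneHotEncoder()
one_hot = encoder.fit_transform(df[["region"]]).toarray()

# Feature scaling: min-max maps each column to [0, 1];
# standardization gives each column zero mean and unit variance.
min_max = MinMaxScaler().fit_transform(num_imputed)
standardized = StandardScaler().fit_transform(num_imputed)
```

In a real project, the imputer, encoder, and scalers should be fitted on the training set only and then applied to the test set and to new data, so that no information leaks from the test set.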
- Select a model and train it.
  - Cross-validation (cross_val_score)
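
A minimal sketch of evaluating a model with `cross_val_score`. The data is synthetic and `LinearRegression` is just a placeholder model chosen for the example; the notes do not prescribe one.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic regression problem: a linear signal plus a little noise (made up).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

model = LinearRegression()

# 10-fold cross-validation: train on 9 folds, evaluate on the held-out fold.
# scikit-learn scorers follow "greater is better", hence the negated MSE.
scores = cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=10)
rmse_scores = np.sqrt(-scores)

print("RMSE per fold:", rmse_scores)
print("Mean:", rmse_scores.mean(), "Std:", rmse_scores.std())
```

Looking at the standard deviation across folds as well as the mean gives a sense of how precise the performance estimate is, which a single validation split cannot provide.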
 
- Fine-tune your model.
  - GridSearchCV
  - RandomizedSearchCV
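
Both search tools can be sketched side by side. The data is synthetic, and the model (`RandomForestRegressor`) and the hyperparameter ranges are arbitrary choices made for this illustration.

```python
import numpy as np
from scipy.stats import randint
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Small synthetic regression problem (made up).
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = 3.0 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.1, size=120)

# Grid search: tries every combination in the grid (2 x 2 = 4 candidates).
param_grid = {"n_estimators": [10, 30], "max_features": [2, 4]}
grid_search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    cv=3,
    scoring="neg_mean_squared_error",
)
grid_search.fit(X, y)
print("Grid search best:", grid_search.best_params_)

# Randomized search: samples n_iter candidates from the given distributions.
param_distributions = {
    "n_estimators": randint(10, 50),  # integers in [10, 50)
    "max_features": randint(1, 5),    # integers in [1, 5)
}
random_search = RandomizedSearchCV(
    RandomForestRegressor(random_state=42),
    param_distributions,
    n_iter=5,
    cv=3,
    scoring="neg_mean_squared_error",
    random_state=42,
)
random_search.fit(X, y)
print("Randomized search best:", random_search.best_params_)
```

Grid search suits a small number of combinations; randomized search is usually preferable when the search space is large, because the number of candidates is controlled directly through `n_iter`.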
 
- Present your solution.
- Launch, monitor, and maintain your system.