Data wrangling (also called preprocessing or data prep) is often the most important and time-consuming part of a data science project. Depending on the quality of the data sources, 50%-60% of the initial effort can go into extracting, cleaning, formatting, standardising, encoding categorical data, imputing missing values, removing junk data, and slicing/dicing the data before it is ready to pass to the algorithms. Scikit-learn has a very good guide for most of these steps that is worth going through (a short pipeline sketch follows the list below). It covers:
- Standardization, or mean removal and variance scaling
- Non-linear transformation
- Normalization
- Encoding categorical features
- Discretization
- Imputation of missing values
- Generating polynomial features
- Custom transformers
https://scikit-learn.org/stable/modules/preprocessing.html
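
As a rough sketch of how a few of these steps fit together (the column names, values, and strategies below are assumptions for illustration, not taken from the guide), standardisation, encoding, and imputation can be combined in a single scikit-learn pipeline:

```python
# A minimal sketch of a few of the preprocessing steps listed above,
# applied to a small made-up DataFrame (column names and values are
# assumptions for illustration only).
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer

# Toy data: one numeric column with a missing value, one categorical column.
df = pd.DataFrame({
    "age": [25, 32, None, 47],                     # numeric: impute + scale
    "city": ["Delhi", "Pune", "Delhi", "Mumbai"],  # categorical: encode
})

# Numeric columns: fill missing values, then standardise (zero mean, unit variance).
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: one-hot encode.
categorical = OneHotEncoder(handle_unknown="ignore")

# Apply each transformer to its own columns.
prep = ColumnTransformer([
    ("num", numeric, ["age"]),
    ("cat", categorical, ["city"]),
])

X = prep.fit_transform(df)
print(X)
```

The same `prep` object can then be dropped in front of an estimator inside another `Pipeline`, so the identical wrangling steps are applied at training and prediction time.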
Scikit-learn also has a separate guide on imputation of missing values (a small example follows): https://scikit-learn.org/stable/modules/impute.html#impute
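
A short sketch of the simplest case, filling NaNs with a column statistic (the array values here are made up for illustration):

```python
# Impute missing values with SimpleImputer; other strategies include
# "median", "most_frequent", and "constant".
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[7.0, np.nan],
              [np.nan, 4.0],
              [5.0, 6.0]])

imputer = SimpleImputer(strategy="mean")
print(imputer.fit_transform(X))
# NaNs are replaced by each column's mean:
# [[7. 5.]
#  [6. 4.]
#  [5. 6.]]
```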