@@ -5,14 +5,15 @@
 1. Learn more about the problem. Search for similar Kaggle competitions. Check the task in [Papers with Code](https://paperswithcode.com/).
 2. Do a basic data exploration. Try to understand the problem and get a sense of what might be important.
 3. Get a baseline model working.
-4. Create `scikit-learn` compatible metric if needed.
+4. Design an evaluation method as close as possible to the final evaluation. Plot local evaluation metrics against the public ones (correlation) to validate how well your validation strategy works.
 5. Try different approaches for preprocessing (encodings, Deep Feature Synthesis, lags, aggregations, imputers, ...). If you're working as a group, split preprocessing and feature generation across files.
-6. Plot learning curves ([sklearn](https://scikit-learn.org/stable/modules/learning_curve.html) or [external tools](https://github.com/reiinakano/scikit-plot)) to avoid overfitting.
-7. Tune hyper-parameters once you've settled on an specific approach. ([optuna](https://optuna.readthedocs.io/)).
-8. Plot and visualize the predictions (histograms, random prediction, ...) to make sure they're doing as expected. Explain the predictions with [SHAP](https://github.com/slundberg/shap).
-9. Think about what postprocessing heuristics can be done to improve or correct predictions.
-10. [Stack](https://scikit-learn.org/stable/auto_examples/ensemble/plot_stack_predictors.html) classifiers ([example](https://www.kaggle.com/couyang/featuretools-sklearn-pipeline#ML-Pipeline)).
-11. Try AutoML models. For tabular data: [TPOT](https://github.com/EpistasisLab/tpot), [AutoSklearn](https://github.com/automl/auto-sklearn), [AutoGluon](https://auto.gluon.ai/stable/index.html), Google AI Platform, [PyCaret](https://github.com/pycaret/pycaret), [Fast.ai](https://docs.fast.ai/), [Alex](https://github.com/Alex-Lekov/AutoML_Alex).For time series: [AtsPy](https://github.com/firmai/atspy), [DeepAR](https://docs.aws.amazon.com/forecast/latest/dg/aws-forecast-recipe-deeparplus.html).
+6. Plot learning curves ([sklearn](https://scikit-learn.org/stable/modules/learning_curve.html) or [external tools](https://github.com/reiinakano/scikit-plot)) to spot overfitting.
+7. Plot the real and predicted target distributions to see how well your model understands the underlying distribution. Apply any postprocessing that might fix small things.
+8. Tune hyper-parameters once you've settled on a specific approach ([hyperopt](https://github.com/hyperopt/hyperopt), [optuna](https://optuna.readthedocs.io/)).
+9. Plot and visualize the predictions (histograms, random prediction, ...) to make sure they behave as expected. Explain the predictions with [SHAP](https://github.com/slundberg/shap).
+10. Think about what postprocessing heuristics can be done to improve or correct predictions.
+11. [Stack](https://scikit-learn.org/stable/auto_examples/ensemble/plot_stack_predictors.html) classifiers ([example](https://www.kaggle.com/couyang/featuretools-sklearn-pipeline#ML-Pipeline)).
+12. Try AutoML models. For tabular data: [TPOT](https://github.com/EpistasisLab/tpot), [AutoSklearn](https://github.com/automl/auto-sklearn), [AutoGluon](https://auto.gluon.ai/stable/index.html), Google AI Platform, [PyCaret](https://github.com/pycaret/pycaret), [Fast.ai](https://docs.fast.ai/), [Alex](https://github.com/Alex-Lekov/AutoML_Alex). For time series: [AtsPy](https://github.com/firmai/atspy), [DeepAR](https://docs.aws.amazon.com/forecast/latest/dg/aws-forecast-recipe-deeparplus.html).
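
Step 4 in the list above suggests checking how closely local validation tracks the public leaderboard. A minimal sketch of that check, assuming you have logged one local score and one public score per submission. The scores below are made-up placeholders, and `pearson` is a hypothetical helper written for this example, not part of any library.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Placeholder scores, one pair per submission (illustrative, not real results).
local_cv = [0.712, 0.725, 0.731, 0.748, 0.752]
public_lb = [0.705, 0.719, 0.722, 0.741, 0.749]

r = pearson(local_cv, public_lb)
print(f"local vs. public correlation: {r:.3f}")
```

A correlation close to 1 suggests that improvements measured locally tend to carry over to the public score. A low or negative value is a hint that the validation split does not resemble the competition's evaluation data.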

 ## Preprocessing Resources

@@ -43,3 +44,4 @@
 - [Sktime](https://github.com/alan-turing-institute/sktime) / [Aeon](https://github.com/aeon-toolkit/aeon)
 - [Awesome Collection](https://github.com/MaxBenChrist/awesome_time_series_in_python)
 - [Video with great ideas](https://www.youtube.com/watch?v=9QtL7m3YS9I)
+- [Tutorial Kaggle Notebook](https://www.kaggle.com/code/tumpanjawat/s3e19-course-eda-fe-lightgbm)
IPFS.md (+1)
@@ -2,6 +2,7 @@

 - It's a file system with [content-based addressing](https://www.youtube.com/watch?v=5Uj6uR3fp-U).
   - Files are automatically deduplicated.
+  - [It chunks, hashes and organizes blobs in a smart way](https://docs.google.com/presentation/d/1Gx8vSqrWZ7X-3SCgITXqQdinZQeXIAA7ITqL25SsPN8/edit#slide=id.g741b4d76cd_0_13).
 - Once something is added, it can't be changed anymore.
 - IPFS supports versioning using commits.
 - Keeping files available is a challenge. If the nodes storing a file go down, it'll disappear from the network. Filecoin can help with this by adding incentives to the equation.
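
The first bullets (content-based addressing, chunking, hashing, deduplication) can be illustrated with a toy in-memory store. This is a sketch of the general idea only, not IPFS's actual format or API: real IPFS uses its own chunking, content identifiers and a Merkle DAG. `BlobStore`, `CHUNK_SIZE` and the newline-separated manifest are assumptions made up for this example.

```python
import hashlib

CHUNK_SIZE = 4  # tiny on purpose so the demo produces several chunks


class BlobStore:
    """Toy content-addressed store: every blob is keyed by its SHA-256 digest."""

    def __init__(self):
        self.blobs = {}  # hex digest -> raw bytes

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # Identical content hashes to the same key, so it is stored only once.
        self.blobs.setdefault(key, data)
        return key

    def add_file(self, data: bytes) -> str:
        # Split into fixed-size chunks, store each one, then store a manifest
        # listing the chunk keys. The manifest's own hash addresses the file.
        chunk_keys = [
            self.put(data[i:i + CHUNK_SIZE])
            for i in range(0, len(data), CHUNK_SIZE)
        ]
        manifest = "\n".join(chunk_keys).encode()
        return self.put(manifest)

    def read_file(self, root: str) -> bytes:
        manifest = self.blobs[root].decode()
        return b"".join(self.blobs[key] for key in manifest.splitlines())


store = BlobStore()
first = store.add_file(b"abcdabcdabcd")
second = store.add_file(b"abcdabcdabcd")   # same content, same address
changed = store.add_file(b"abcdabcdabcX")  # one byte differs, new address

print(first == second, first != changed, len(store.blobs))
```

Because the address is derived from the content, changing a single byte yields a new root address rather than modifying the old file, which is one way to read the "can't be changed anymore" bullet. The unchanged `abcd` chunk is still shared between both versions.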