:art:

3 files changed (+12 -2)

Data/Analytics Engineering.md (+1)
@@ -48,6 +48,7 @@
 - [Flipside Crypto](https://github.com/FlipsideCrypto/external-models)
 - [MetricsDAO](https://github.com/MetricsDAO)
 - [RA Analytics](https://github.com/rittmananalytics/ra_data_warehouse)
+- [Anomstack](https://github.com/andrewm4894/anomstack)
 
 ### Related [[Podcasts]]
 

Datathons.md (+10 -2)
@@ -7,7 +7,7 @@
 3. Get a baseline model working.
 4. Design an evaluation method as close as possible to the final evaluation. Plot local evaluation metrics against the public ones (correlation) to validate how well your validation strategy works.
 5. Try different approaches for preprocessing (encodings, Deep Feature Synthesis, lags, aggregations, imputers, ...). If you're working as a group, split preprocessing and feature generation between files.
-6. Plot learning curves ([sklearn](https://scikit-learn.org/stable/modules/learning_curve.html) or [external tools](https://github.com/reiinakano/scikit-plot)) to avoid overfitting.
+6. Plot learning curves ([sklearn](https://scikit-learn.org/stable/modules/learning_curve.html) or [external tools](https://github.com/reiinakano/scikit-plot)) to avoid overfitting.
 7. Plot the real and predicted target distributions to see how well your model understands the underlying distribution. Apply any postprocessing that might fix small things.
 8. Tune hyper-parameters once you've settled on a specific approach ([hyperopt](https://github.com/hyperopt/hyperopt), [optuna](https://optuna.readthedocs.io/)).
 9. Plot and visualize the predictions (histograms, random predictions, ...) to make sure they behave as expected. Explain the predictions with [SHAP](https://github.com/slundberg/shap).
@@ -44,4 +44,12 @@
 - [Sktime](https://github.com/alan-turing-institute/sktime) / [Aeon](https://github.com/aeon-toolkit/aeon)
 - [Awesome Collection](https://github.com/MaxBenChrist/awesome_time_series_in_python)
 - [Video with great ideas](https://www.youtube.com/watch?v=9QtL7m3YS9I)
-- [Tutorial Kaggle Notebook](https://www.kaggle.com/code/tumpanjawat/s3e19-course-eda-fe-lightgbm)
+- [Tutorial Kaggle Notebook](https://www.kaggle.com/code/tumpanjawat/s3e19-course-eda-fe-lightgbm)
+
+## Datathon Platforms
+
+- [Kaggle](https://www.kaggle.com/competitions)
+- [MLContest](https://mlcontests.com/)
+- [Humyn](https://app.humyn.ai/)
+- [DrivenData](https://www.drivendata.org/competitions/)
+- [Xeek](https://xeek.ai/challenges)

Open Data.md (+1)
@@ -287,6 +287,7 @@
 - [Victoriano's Data Sources](https://victorianoi.notion.site/Data-Sources-79b28912c6d941af99e6ef102c578fa0)
 - [Data is Plural](https://www.data-is-plural.com/)
 - [Public APIs](https://github.com/public-api-lists/public-api-lists)
+- [Real Time Datasets](https://github.com/bytewax/awesome-public-real-time-datasets)
 
 ## Open Source Web Data IDE
 