···170170#### Data Package Managers
171171172172- [Qri](https://qri.io/). An evolution of the classical open portals that added [[Decentralized Protocols]] (IPFS) and computing on top of the data. Sadly, [it came to an end early in 2022](https://qri.io/winding_down).
173173-- [Datalad](https://www.datalad.org/). [Extended to IPFS](https://kinshukk.github.io/posts/gsoc-summary-and-future-thoughts/). Is a [great candidate](https://archive.fosdem.org/2020/schedule/event/open_research_datalad/) and uses Git Annex (distributed binary object tracking layer on top of git).
173173+- [Datalad](https://www.datalad.org/). [Extended to IPFS](https://kinshukk.github.io/posts/gsoc-summary-and-future-thoughts/).
174174+ - Is a [great tool](https://archive.fosdem.org/2020/schedule/event/open_research_datalad/) and uses Git Annex (a distributed binary object tracking layer on top of Git).
175175+ - Complicated to wrap your head around: lots of different commands and concepts. On the other hand, both Datalad and the underlying Git Annex are very powerful and flexible.
176176+ - The handbook is very good, but it's a lot of reading if you just want to test things out.
174177- [Huggingface Datasets](https://huggingface.co/docs/datasets).
175178- [Quilt](https://github.com/quiltdata/quilt).
179179+ - Forces both Python and S3.
176180- [Oxen](https://github.com/Oxen-AI/Oxen).
181181+ - Data is not accessible from other tools.
182182+ - [Docs](https://github.com/Oxen-AI/oxen-release#-oxen-release) are sparse.
183183+ - Definitely more in the Git for Data space than a Dataset Package Manager.
177184- [Frictionless Data](https://frictionlessdata.io/projects/#software-and-standards).
178185- [Datopian Data CLI](https://github.com/datopian/data-cli). Successor of [DPM](https://github.com/frictionlessdata/dpm-js).
179186- [LakeFS](https://lakefs.io/blog/git-for-data/). More like Git for Data.
···186193- [Splitgraph](https://github.com/splitgraph/sgr).
187194- [Deep Lake](https://github.com/activeloopai/deeplake).
188195- [Dim](https://github.com/c-3lab/dim).
196196+ - Hard to grok how to use it from the docs.
197197+ - Quite small surface area. You can basically install datasets from URLs, create new ones, or apply some kind of GPT-3 transformation on top of them.
189198- [Juan Benet's data](https://github.com/jbenet/data).
190199- [Colah's data](https://github.com/colah/data).
191200- [Dolt](https://docs.dolthub.com/) is another interesting project in the space with some awesome data structures. They also [do data bounties](https://www.dolthub.com/repositories/dolthub/us-businesses)!