:art:

+10 -1
Open Data.md
@@ -170,10 +170,17 @@
 #### Data Package Managers
 
 - [Qri](https://qri.io/). An evolution of the classical open portals that added [[Decentralized Protocols]] (IPFS) and computing on top of the data. Sadly, [it came to an end early in 2022](https://qri.io/winding_down).
-- [Datalad](https://www.datalad.org/). [Extended to IPFS](https://kinshukk.github.io/posts/gsoc-summary-and-future-thoughts/). Is a [great candidate](https://archive.fosdem.org/2020/schedule/event/open_research_datalad/) and uses Git Annex (distributed binary object tracking layer on top of git).
+- [Datalad](https://www.datalad.org/). [Extended to IPFS](https://kinshukk.github.io/posts/gsoc-summary-and-future-thoughts/).
+  - Is a [great tool](https://archive.fosdem.org/2020/schedule/event/open_research_datalad/) and uses Git Annex (distributed binary object tracking layer on top of git).
+  - Complicated to wrap your head around. Lots of different commands and concepts. On the other hand, it's very powerful and flexible. Git Annex is complex but powerful and flexible.
+  - The handbook is very good, but it's a lot of reading if you just want to test things out.
 - [Huggingface Datasets](https://huggingface.co/docs/datasets).
 - [Quilt](https://github.com/quiltdata/quilt).
+  - Forces both Python and S3.
 - [Oxen](https://github.com/Oxen-AI/Oxen).
+  - Data is not accesible from other tools.
+  - [Docs](https://github.com/Oxen-AI/oxen-release#-oxen-release) are sparse.
+  - Definitely more in the Git for Data space than Dataset Package Manager.
 - [Frictionless Data](https://frictionlessdata.io/projects/#software-and-standards).
 - [Datopian Data CLI](https://github.com/datopian/data-cli). Sucesor of [DPM](https://github.com/frictionlessdata/dpm-js).
 - [LakeFS](https://lakefs.io/blog/git-for-data/). More like Git for Data.
@@ -186,6 +193,8 @@
 - [Splitgraph](https://github.com/splitgraph/sgr).
 - [Deep Lake](https://github.com/activeloopai/deeplake).
 - [Dim](https://github.com/c-3lab/dim).
+  - Hard to grok how to use it from the docs.
+  - Quite small surface area. You can basically install datasets from URLs, create new ones, or apply some kind of GPT3 transformation on top of them.
 - [Juan Benet's data](https://github.com/jbenet/data).
 - [Colah's data](https://github.com/colah/data).
 - [Dolt](https://docs.dolthub.com/) is another interesting project in the space with some awesome data structures. They also [do data bounties](https://www.dolthub.com/repositories/dolthub/us-businesses)!