feat: ✨ Add guidelines for quick data usage and incremental changes

- Expanded "Open Data.md" with a motivational quote and clarification on "Open Data".
- Enhanced "Planning.md" with a step-by-step guide for rapid project execution.
- Updated "Systems.md" to emphasize evolutionary over revolutionary changes.
- Added a fast writing methodology to "Writing.md".

+22 -1
+8
Open Data.md
···

  We have better and cheaper infrastructure. That includes things like faster storage, better compute, and larger amounts of data. We need to improve our data workflows now. What does a world where people collaborate on datasets look like? [The data is there. We just need to use it](https://twitter.com/auren/status/1509340748054945794).

+ **[The best thing to do with your data will be thought of by someone else](https://youtu.be/_agrBn50kyE?t=925)**.
+
  During the last few years, a large number of new data and open source tools have emerged. There are new query engines (e.g., DuckDB, DataFusion, ...), execution frameworks (WASM), data standards (Arrow, Parquet, ...), and a growing set of open data marketplaces (Datahub, HuggingFace Datasets, Kaggle Datasets).

  These trends are already making their way towards movements like [DeSci](https://ethereum.org/en/desci/) or smaller projects like [Py-Code Datasets](https://py-code.org/datasets). But we still need more tooling around data to improve interoperability as much as possible. Lots of companies have figured out how to make the most of their datasets. **We should use similar tooling and approaches companies are using to manage the open datasets that surround us**. A sort of [Data Operating system](https://data-operating-system.com/).
···
  - **Exploratory**. Allow drill downs and customization. Offer a [simple way](https://lite.datasette.io/) for people to query/explore the data.
  - **Dynamic**. Use only the data you need. No need to pull 150GB.
  - **Default APIs**. For some datasets, allowing REST API / GraphQL endpoints might be useful. Same with providing an SQL interface.
+ - Users should be able to clone public datasets with a single CLI command.
  - **Don't break history**. If a dataset is updated, the [old versions should still be accessible](https://www.heltweg.org/posts/how-to-make-sure-no-one-cares-about-your-open-data/).
+ - Make sure the datasets are there for the long run. This might take different forms (using a domain name, IPFS, ...).

  ## Frequently Asked Questions

···
  ### 12. Curated and small data (e.g., at the community level) is not reachable by Google. How can we help there?

  Indeed! With LLMs on the rise, community curated datasets become more important as they don't appear in the big data dumps.
+
+ ### 13. Wait, wait... What do you mean by "Open Data"?
+
+ I use it as a generic term to refer to data and content that can be freely used, modified, and shared by anyone for any purpose. Generally aligned with the [Open Definition](https://opendefinition.org/od/2.1/en/) and [The Open Data Commons](https://opendatacommons.org/).

  ## Related Projects

+7
Planning.md
···
  - List of options.

  No matter what the final plan is, [[Teamwork|document it]] and you'll have a log of all the plans to reflect back on.
+
+ Speed is important. [You can build / do things quickly](https://learnhowtolearn.org/how-to-build-extremely-quickly/).
+
+ 1. Make an outline of the project.
+ 2. For each item in the outline, make an outline. Do this recursively until the items are small.
+ 3. Fill in/do each item as fast as possible. Do not perfect/iterate on them as you go.
+ 4. Finally, once completely done, go back and perfect.
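The four steps above can be read as a small recursive routine. The sketch below is only an illustration, not something from the linked post: `Item`, its `size` effort estimate, and the `halve` splitter are hypothetical names introduced here, and "filling in" an item is reduced to writing a placeholder draft.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """One outline entry. `size` is a hypothetical effort estimate."""
    title: str
    size: int
    children: list["Item"] = field(default_factory=list)
    draft: str = ""


def expand(item, split, small=1):
    """Steps 1-2: outline recursively until every leaf is small."""
    if item.size <= small:
        return item
    item.children = [expand(child, split, small) for child in split(item)]
    return item


def fill(item):
    """Step 3: fill each leaf quickly, without polishing. Returns leaf titles."""
    if not item.children:
        item.draft = f"rough: {item.title}"
        return [item.title]
    done = []
    for child in item.children:
        done.extend(fill(child))
    return done


def polish(item):
    """Step 4: only after everything is filled, go back and perfect."""
    if not item.children:
        item.draft = item.draft.replace("rough", "final")
    for child in item.children:
        polish(child)


def halve(item):
    """Hypothetical splitter: break an item into two smaller parts."""
    left = item.size // 2
    return [Item(f"{item.title}.a", left), Item(f"{item.title}.b", item.size - left)]


project = expand(Item("project", 4), halve)
leaves = fill(project)
print(leaves)
polish(project)
```

Keeping `fill` and `polish` as separate passes mirrors the point of the list: nothing gets perfected until the whole outline has a rough draft.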
+1 -1
Systems.md
···

  Complex systems usually have [attractor landscapes](https://ncase.me/attractors/) that can be used to change them. [The world is richer and more complicated than we give it credit for](https://slatestarcodex.com/2017/03/16/book-review-seeing-like-a-state/).

- A good approach to incrementally change a system (similar to [[Evolution|natural selection]]) is to:
+ Evolution is easier than revolution. A good approach to incrementally change a system (similar to [[Evolution|natural selection]]) is to:

  1. Start by identifying the highest-leverage level to optimize at: Ask whether you're optimizing the machine or a cog within it. Complex systems might change in unexpected ways (butterfly effects). Minor differences in starting points make big differences on future states.
  2. Begin optimizing the system by following the [Theory of Constraints](https://en.wikipedia.org/wiki/Theory_of_constraints): At any time, just one of a system's inputs is constraining its other inputs from achieving a greater total output. Make incremental changes. Alter the incentive landscape. [If you can make your system less miserable, make your system less miserable!](https://astralcodexten.substack.com/p/book-review-the-cult-of-smart)
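As a rough illustration of the Theory of Constraints step above, the sketch below models a system as a serial pipeline whose output is capped by its slowest stage, and applies small changes only at the current constraint. The stage names, capacities, and the `improve` helper are all hypothetical; real systems are rarely this linear.

```python
def bottleneck(capacity):
    """Return the stage whose capacity currently limits the whole system."""
    return min(capacity, key=capacity.get)


def throughput(capacity):
    """A serial pipeline can output no more than its slowest stage."""
    return min(capacity.values())


def improve(capacity, step=1, rounds=3):
    """Make small changes, always at the current constraint."""
    capacity = dict(capacity)  # leave the caller's data untouched
    history = []
    for _ in range(rounds):
        stage = bottleneck(capacity)
        capacity[stage] += step
        history.append((stage, throughput(capacity)))
    return history


# Hypothetical stages and capacities (units per day).
stages = {"intake": 10, "review": 4, "publish": 6}

# Improving a stage that is not the constraint changes nothing.
unchanged = throughput({**stages, "intake": 20})

# Improving the constraint raises output, and the constraint then moves.
history = improve(stages, step=3, rounds=3)
print(unchanged, history)
```

The history shows the constraint shifting from one stage to another as each is relieved, which is why the changes stay incremental: every small step changes where the next one should go.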
+6
Writing.md
···
      2. It is interesting. Written from your own experience, in your personal voice.
      3. It is grounded in reality.
  - Reading is the inhale, writing is the exhale. Breathe.
+ - [To write fast](https://learnhowtolearn.org/how-to-build-extremely-quickly/):
+     1. Get a topic to write about.
+     2. Quickly write the outline.
+     3. Repeat step 2 for each section recursively, until the lowest-level sections are small enough to not need outlines.
+     4. Speedrun. Without caring about quality, fill in each outline (starting at the lowest level) until the whole doc is filled out.
+     5. Enjoy the feeling of being 90% done while you go back and perfect the doc, color the title text, add pictures, etc.

  ## Executable Writing
