docs: ✏️ add insights on data modeling and future technologies

Enhanced "Data/Analytics Engineering.md" with notes on the importance of robust data models and the stories behind null values. Updated "Data/Data Culture.md" to emphasize how hard it is to model reality accurately. In "Data/Data Engineering.md", refined the consistency practices by clarifying the schema-on-write note. Lastly, extended "Future.md" with distributed/decentralized data storage systems and a concrete prediction market example.

+6 -2
+2
Data/Analytics Engineering.md
···
28 28 - Reduce the areas where business logic can be injected, create "time to live" policies on last mile transforms, build a culture of standardizing + celebrating access to cross-functional codebases.
29 29 - People default to writing business logic in the tool they are most comfortable with. The best way for data teams to prevent sprawling business logic is to limit last mile transforms in other tools and invite others into their tools. [The logic will be written, and if the data team gate-keeps, it will be written outside of their visibility](https://ian-macomber.medium.com/data-systems-tend-towards-production-be5a86f65561)! If a data team can educate and encourage contributions to their codebase, they invite code to be written where it most belongs.
30 30 - Modern data warehouses [might need new model design paradigms](https://github.com/ActivitySchema/ActivitySchema/blob/main/2.0.md).
   31 + - Good data models make good products.
   32 + - Behind every null value there is a story.
31 33
32 34 ## Resources
33 35
+1
Data/Data Culture.md
···
137 137 - [[Experimentation]]
138 138 - Observe sudden, unexplained special variation in your data, which you must then investigate to uncover new control factors that you don't already know about.
139 139 - Don't over rely on data. [Data is inherently objectifying](https://schmud.de/posts/2024-08-18-data-is-a-bad-idea.html) and naturally reduces complex conceptions and process into coarse representations. There's a certain fetish for data that can be quantified ([McNamara fallacy](https://en.wikipedia.org/wiki/McNamara_fallacy))
    140 + - [It's hard to capture reality with data](https://javisantana.com/fastdata/40-things-I-learned-about-data.html). Modelling reality always gets complex. There are always small nuances, special conditions, things that changed, edge cases and, of course, errors (which sometimes became features). Data visualizations are lossy.
140 141
141 142 ## Tools
142 143
+1 -1
Data/Data Engineering.md
···
22 22 - **Simplicity**: Each steps is easy to understand and modify. Rely on immutable data. Write only. No deletes. No updates. Avoid having too much "state". Hosting static files on S3 is much less friction and maintenance than a server somewhere serving an API.
23 23 - **Reliability**: Errors in the pipelines can be recovered. Pipelines are monitored and tested. Data is saved in each step (storage is cheap) so it can be used later if needed. For example, adding a new column to a table can be done extracting the column from the intermediary data without having to query the data source. It is better to support 1 feature that works reliably and has a great UX than 2 that are unreliable or hard to use. One solid step is better than 2 finicky ones.
24 24 - **[[Modularity]]**: Steps are independent, declarative, and [[Idempotence|itempotent]]. This makes pipelines composable.
25    - - **Consistency**: Same conventions and design patterns across pipelines. If a failure is actionable by the user, clearly let them know what they can do. Schema on write.
   25 + - **Consistency**: Same conventions and design patterns across pipelines. If a failure is actionable by the user, clearly let them know what they can do. Schema on write as there is always a schema.
26 26 - **Efficiency**: Low event latency when needed. Easy to scale up and down. A user should not be able to configure something that will not work. Don't mix heterogeneous workloads under the same tooling (e.g: big data warehouses doing simple queries 95% of their time and 1 big batch once a day).
27 27 - **Flexibility**: Steps change to conform data points. Changes don't stop the pipeline or losses data. Fail fast and upstream.
28 28
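A minimal sketch of the "Schema on write" and "Write only. No deletes. No updates." points from the hunk above, written in Python. The `SCHEMA` columns, the `SchemaError` class, and the `validate` and `append` helpers are hypothetical names chosen for illustration; nothing here comes from the notes themselves. The idea it tries to show is that a record is checked against a declared schema before it is appended, and a failure tells the user which field to fix.

```python
# Hypothetical schema: column name -> expected Python type.
SCHEMA = {"user_id": int, "event": str, "amount": float}


class SchemaError(ValueError):
    """Raised when a record does not match the declared schema."""


def validate(record: dict, schema: dict = SCHEMA) -> dict:
    """Return the record unchanged if it matches the schema, else raise."""
    missing = schema.keys() - record.keys()
    extra = record.keys() - schema.keys()
    if missing or extra:
        # Actionable message: name the offending columns.
        raise SchemaError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for name, expected in schema.items():
        if not isinstance(record[name], expected):
            raise SchemaError(
                f"{name}: expected {expected.__name__}, "
                f"got {type(record[name]).__name__}"
            )
    return record


def append(log: list, record: dict) -> list:
    """Append-only write: validate first, never update or delete."""
    log.append(validate(record))
    return log
```

Under these assumptions, a record with a string `user_id` is rejected before it reaches the log, so readers downstream never have to guess the schema after the fact.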
+2 -1
Future.md
···
12 12 - The ignorance of Social Media and its [full impact on s_o_ciety](https://twitter.com/M_B_Petersen/status/1483457679800651787).
13 13 - Is "being bad for society" an emergent property of social networks as they grow?
14 14 - Current Voting Systems.
15    - - Not relying more into tools like Prediction Markets.
   15 + - Not relying more into tools like Prediction Markets (e.g: [to spot papers that might not replicate](https://vitalik.eth.limo/general/2024/11/09/infofinance.html)).
16 16
17 17 ## Predictions
18 18
···
36 36 - If you work in tech, this is a fun thought experiment. Imagine you don’t need any money and can devote your time to benefiting everyone by building common digital infrastructure. What would exist in a better future?
37 37 - Content Addressed Data + Immutability
38 38 - CRDTs
   39 + - Distributed / decentralized data storage systems.
39 40 - Homomorphic Encryption
40 41 - Prolly/Merkle Trees
41 42 - Differential/Timely Dataflow