atmosphereconf-vods.wisp.place/
WEBVTT

1
00:00:00.000 --> 00:02:05.000
Excellent.

2
00:02:05.000 --> 00:02:10.400
Thank you very much.

3
00:02:10.400 --> 00:02:16.800
I know that we're running weird on time somewhere, so... perfect.

4
00:02:16.800 --> 00:02:17.800
I won't need it all.

5
00:02:17.800 --> 00:02:20.440
I'll go through this quickly. Hopefully. But yeah,

6
00:02:20.440 --> 00:02:24.520
thank you for being here. Thank you for joining this talk. Enterprise data is maybe not the

7
00:02:24.520 --> 00:02:31.000
most exciting topic, but I hope that I will make this interesting for you.

8
00:02:31.000 --> 00:02:36.120
There we go. All right. So the title of this talk, it's the thing

9
00:02:36.120 --> 00:02:37.880
where, when the headline is a question

10
00:02:37.880 --> 00:02:40.640
and it can be answered no, maybe it's not worth your time.

11
00:02:40.640 --> 00:02:43.000
So I'm going to tell you a little bit in advance

12
00:02:43.000 --> 00:02:46.560
the answer to "Did Lexicon accidentally solve the enterprise data problem?"

13
00:02:46.560 --> 00:02:48.400
The answer is probably no, but I want to give this talk

14
00:02:48.400 --> 00:02:50.640
because I really want it to be yes.

15
00:02:50.640 --> 00:02:58.360
And I'm going to explain why over the next, say, 29 minutes and 16 seconds.

16
00:02:58.360 --> 00:02:59.360
So a little bit about me.

17
00:02:59.360 --> 00:03:01.160
My name is Emily Gorcenski.

18
00:03:01.160 --> 00:03:04.280
I'm a data scientist by background.

19
00:03:04.280 --> 00:03:10.640
R&D is actually sort of where I started my career.

20
00:03:10.640 --> 00:03:13.400
I realized at some point that all the algorithms I was doing

21
00:03:13.400 --> 00:03:16.280
for biotechnology and aerospace and all that good stuff

22
00:03:16.280 --> 00:03:18.480
were in this new and upcoming field that they were calling

23
00:03:18.480 --> 00:03:22.160
data science, and that those people got paid a lot more money than I was making.

24
00:03:22.160 --> 00:03:24.920
So I just said I'm a data scientist now, and it stuck.

25
00:03:24.920 --> 00:03:29.080
But you can find me on the internet here.

26
00:03:29.080 --> 00:03:32.120
And how did I get into AT Proto stuff?

27
00:03:32.120 --> 00:03:34.560
Well, it started because I think that the internet

28
00:03:34.560 --> 00:03:35.800
is horrible.

29
00:03:35.800 --> 00:03:41.080
And I really wanted to curate my own presence on the internet.

30
00:03:41.080 --> 00:03:45.320
So if you've ever had a tweet or a skeet go viral,

31
00:03:45.320 --> 00:03:46.760
it's like the worst thing in the world.

32
00:03:46.760 --> 00:03:50.040
It's not actually something you want to happen.

33
00:03:50.040 --> 00:03:51.540
So I created a little script called Skeeter

34
00:03:51.540 --> 00:03:54.640
Deleter, which I use to curate my own feed.

35
00:03:54.640 --> 00:03:56.360
So I get to pick the things that I want to keep,

36
00:03:56.360 --> 00:03:58.240
so that if anyone visits me, they

37
00:03:58.240 --> 00:04:02.240
can't go back in my timeline and find things to cancel me over.

38
00:04:02.240 --> 00:04:06.000
But I will say some things that you'll probably cancel me over today.

39
00:04:06.000 --> 00:04:08.000
Data things.

40
00:04:08.000 --> 00:04:11.000
I also run a labeler called brand block online because the

41
00:04:11.000 --> 00:04:12.000
internet is horrible.

42
00:04:12.000 --> 00:04:16.320
The worst part about it is the evil brands who are coming in trying to be funny. So I built a

43
00:04:16.320 --> 00:04:21.920
labeler to block the brands. So if you don't want to see, on your timeline, Arby's

44
00:04:21.920 --> 00:04:24.000
trying to joke with Taco Bell,

45
00:04:24.000 --> 00:04:26.000
this is your tool.

46
00:04:26.000 --> 00:04:30.000
And this is something new that I just recently built,

47
00:04:30.000 --> 00:04:32.000
because I finally figured out how to do OAuth,

48
00:04:32.000 --> 00:04:35.680
kind of, for AT Proto.

49
00:04:35.680 --> 00:04:39.000
This is just like a little wish list slash registry

50
00:04:39.000 --> 00:04:42.800
slash mutual aid compilation tool.

51
00:04:42.800 --> 00:04:45.040
I'm going to be opening this up for beta.

52
00:04:45.040 --> 00:04:48.760
So if you want to test it out, I'm going to try to launch

53
00:04:48.760 --> 00:04:50.080
this by the end of the conference.

54
00:04:50.080 --> 00:04:52.000
But you do need to follow me on Bluesky.

55
00:04:52.000 --> 00:04:54.920
That's not intentionally manipulative.

56
00:04:54.920 --> 00:04:58.320
It's just the easiest way that I have to limit the number

57
00:04:58.320 --> 00:05:00.000
of people using it right now.

58
00:05:00.000 --> 00:05:04.160
I don't feel like doing an invite system.

59
00:05:04.160 --> 00:05:08.040
And then in my professional world, I'm the CTO of a startup.

60
00:05:08.040 --> 00:05:11.400
The title is fancier than the actual

61
00:05:11.400 --> 00:05:14.000
what I'm doing day to day.

62
00:05:14.000 --> 00:05:15.520
What I'm doing day to day is actually building

63
00:05:15.520 --> 00:05:17.880
just accounting software.

64
00:05:17.880 --> 00:05:25.360
So if you have QuickBooks data and you want dashboards, I can do that.

65
00:05:25.360 --> 00:05:27.440
But before I joined the startup world,

66
00:05:27.440 --> 00:05:33.720
I was doing consulting for eight years, which is why I talk about enterprise data.

67
00:05:33.720 --> 00:05:37.600
So the thing that's interesting about the enterprise data space, and I told you that

68
00:05:37.600 --> 00:05:38.600
I'm a data scientist.

69
00:05:38.600 --> 00:05:40.000
I came from an R&D background.

70
00:05:40.000 --> 00:05:42.000
I peaked in Fortran.

71
00:05:42.000 --> 00:05:44.000
I'm not one of these web people.

72
00:05:44.000 --> 00:05:46.000
If you look at the AT Proto, or I'm sorry,

73
00:05:46.000 --> 00:05:48.920
atproto, I learned that it's atproto.

74
00:05:48.920 --> 00:05:51.760
I thought it was actually Austria Proto for a while.

75
00:05:53.440 --> 00:06:00.360
If you look at the tooling, it's in TypeScript, it's in Go, and it's in, if you're

76
00:06:00.360 --> 00:06:02.680
really cool, it's in Rust.

77
00:06:02.680 --> 00:06:05.760
And those are like the hip languages, right?

78
00:06:05.760 --> 00:06:09.000
And those aren't the languages that data people use.

79
00:06:09.000 --> 00:06:11.880
But it's also reflective of the mindset

80
00:06:11.880 --> 00:06:14.480
that atproto developers have, which

81
00:06:14.480 --> 00:06:16.520
is that when you're building web applications, you're

82
00:06:16.520 --> 00:06:18.480
treating data like a hot potato.

83
00:06:18.480 --> 00:06:22.080
Like, data comes in, you want to get rid of it as quickly as you can, right?

84
00:06:22.080 --> 00:06:25.560
Because the longer that you're holding on to data and processing it and doing things to it,

85
00:06:25.560 --> 00:06:28.720
the harder your system becomes to maintain and operate

86
00:06:28.720 --> 00:06:30.640
at scale.

87
00:06:30.640 --> 00:06:35.000
And so web developers have built all of these really cool tools and languages.

88
00:06:35.000 --> 00:06:36.000
And they're, like, really hip.

89
00:06:36.000 --> 00:06:39.600
We've got these really cool conferences in places like the University of British Columbia.

90
00:06:39.600 --> 00:06:44.000
And they wear jeans to work and they're really awesome.

91
00:06:44.000 --> 00:06:48.240
And data people are kind of different than that.

92
00:06:48.240 --> 00:06:50.880
Like, they kind of emerged from the world of database

93
00:06:50.880 --> 00:06:52.640
administrators, and database administrators

94
00:06:52.640 --> 00:06:56.960
are famous for being grumpy, and they don't like change and they don't like you and they don't

95
00:06:56.960 --> 00:07:01.080
like your fancy languages. Like, we're all toiling in the SQL mines.

96
00:07:01.080 --> 00:07:04.960
We might be writing some Python. If you're really, like,

97
00:07:04.960 --> 00:07:13.000
fancy, you might do a little Scala or Spark in the data space.
And a lot of these folks are sitting there maintaining systems that have been around for 25 years.

98
00:07:13.000 --> 00:07:16.000
And they're, like, not wearing jeans and hoodies to work.

99
00:07:16.000 --> 00:07:18.200
They're wearing, like, khakis and polos

100
00:07:18.200 --> 00:07:19.440
because they have serious jobs

101
00:07:19.440 --> 00:07:21.600
and they're holding serious data

102
00:07:21.600 --> 00:07:23.520
that all of these, like, multi-billion-dollar companies

103
00:07:23.520 --> 00:07:26.360
will stop operating if anything happens to it.

104
00:07:26.360 --> 00:07:27.780
So they don't want to do change.

105
00:07:27.780 --> 00:07:31.660
They don't want to do a lot of fancy technology development.

106
00:07:31.660 --> 00:07:33.140
And this is a little bit of hyperbole.

107
00:07:33.140 --> 00:07:35.380
I have known database administrators who

108
00:07:35.380 --> 00:07:38.640
from time to time do wear jeans.

109
00:07:38.640 --> 00:07:43.160
The thing about enterprise data is that it's not really technology development.

110
00:07:43.160 --> 00:07:47.600
It's anthropology. Because when you're dealing with a company that has been around

111
00:07:47.600 --> 00:07:55.140
for 20, 30, 50, 100, 150, sometimes more years, the information that you have is all

112
00:07:55.140 --> 00:08:02.240
historical and it reflects the relationships of a business, its departments, its entities,

113
00:08:02.240 --> 00:08:05.000
its customers, and its suppliers.

114
00:08:05.000 --> 00:08:10.780
You all are familiar with Conway's law, that the architecture of a system reflects the

115
00:08:10.780 --> 00:08:14.400
communication patterns of a company.

116
00:08:14.400 --> 00:08:17.400
Well, that all comes through in enterprise data.

117
00:08:17.400 --> 00:08:21.240
I've actually worked with data systems where you can pinpoint the exact day

118
00:08:21.240 --> 00:08:24.240
that two teams stopped talking to each other.

119
00:08:24.240 --> 00:08:26.760
And you can do that because the fields change.

120
00:08:26.760 --> 00:08:35.640
And then you have to carry around this conditional with a magically coded date, because on June 16th

121
00:08:35.640 --> 00:08:40.760
of 2007, these two teams went through a reorg.

122
00:08:40.760 --> 00:08:44.920
And then all of your logic has to carry that through for all time.

123
00:08:44.920 --> 00:08:47.920
And so this is like an anthropological problem,

124
00:08:47.920 --> 00:08:50.600
which makes sharing and using this data

125
00:08:50.600 --> 00:08:55.600
and doing anything meaningful with it very, very difficult.

126
00:08:55.600 --> 00:08:57.500
So what is the problem with enterprise data?

127
00:08:57.500 --> 00:09:00.500
The problem is that data access is slow, it's difficult,

128
00:09:00.500 --> 00:09:02.060
it's expensive.

129
00:09:02.060 --> 00:09:05.700
So is the data engineering.

130
00:09:05.700 --> 00:09:08.080
I've gone to data engineering teams as a consultant.

131
00:09:08.080 --> 00:09:09.400
I said, tell me about your problems.

132
00:09:09.400 --> 00:09:10.600
They said, well, here's our backlog.

133
00:09:10.600 --> 00:09:14.880
And I said, how many tickets do you do each week?

134
00:09:14.880 --> 00:09:17.920
And they said, that's ambitious.

135
00:09:17.920 --> 00:09:20.920
We do maybe 20 tickets per quarter.

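NOTE Editorial sketch, not part of the talk: the hard-coded reorg date described in cues 120 to 122 might look roughly like this in Python. Only the date comes from the anecdote; the record layout and the field names `sales_region` and `territory_code` are hypothetical.

```python
from datetime import date

# Cutover date from the anecdote; everything else here is a hypothetical illustration.
REORG_DATE = date(2007, 6, 16)


def customer_region(record: dict) -> str:
    """Return the region of a record regardless of which team's schema wrote it."""
    if record["created"] < REORG_DATE:
        # Rows written before the reorg use the (hypothetical) legacy field.
        return record["sales_region"]
    # Rows written on or after the reorg use the (hypothetical) replacement field.
    return record["territory_code"]


old_row = {"created": date(2005, 3, 1), "sales_region": "EMEA"}
new_row = {"created": date(2010, 9, 30), "territory_code": "EMEA"}
```

Every consumer of the data has to repeat or import this branch, which is the "carry that through for all time" cost the speaker describes.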
136
00:09:20.920 --> 00:09:22.760
And I said, well, how many new tickets per quarter

137
00:09:22.760 --> 00:09:23.120
do you get?

138
00:09:23.120 --> 00:09:25.480
They say 40.

139
00:09:25.480 --> 00:09:26.800
So that's not a solution.

140
00:09:26.800 --> 00:09:29.920
That's not like a situation that's getting better.

141
00:09:29.920 --> 00:09:32.040
And then you also see things like data science

142
00:09:32.040 --> 00:09:35.800
teams. Like, early, mid 2000s, everyone's like,

143
00:09:35.800 --> 00:09:39.280
oh, you've got to have data scientists because AI is coming.

144
00:09:39.280 --> 00:09:42.800
So they went out and they pulled all of these people out of academia.

145
00:09:42.800 --> 00:09:48.400
A lot of neuropsych people, a lot of astronomy, physics,

146
00:09:48.400 --> 00:09:50.560
all of that, people who know good mathy stuff,

147
00:09:50.560 --> 00:09:51.840
and they put them in a team, and they didn't really

148
00:09:51.840 --> 00:09:54.880
explain to them how to work in an enterprise environment.

149
00:09:54.880 --> 00:09:59.400
And so they built a lot of stuff that was really cool, but it wasn't in phase with product development.

150
00:10:00.000 --> 00:10:02.060
And so then you have these really cool algorithms

151
00:10:02.060 --> 00:10:03.920
that nobody knows how to deploy.

152
00:10:03.920 --> 00:10:07.480
Nobody knows how to run the code for it.

153
00:10:07.480 --> 00:10:08.880
And by the time you actually get it,

154
00:10:08.880 --> 00:10:10.920
they're like six months out of date anyways,

155
00:10:10.920 --> 00:10:13.400
so it doesn't actually add value.

156
00:10:13.400 --> 00:10:17.760
And then data folks don't really do agile.

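NOTE Editorial sketch, not part of the talk: the backlog arithmetic from the consulting anecdote (about 20 tickets closed and 40 opened per quarter) can be written out in Python. The function name and the starting backlog are assumptions made for illustration.

```python
def backlog_after(quarters: int, start: int = 0,
                  closed_per_quarter: int = 20,
                  opened_per_quarter: int = 40) -> int:
    """Project the open-ticket count, using the rates quoted in the talk as defaults."""
    backlog = start
    for _ in range(quarters):
        # A backlog cannot go below zero even if the team closes more than arrives.
        backlog = max(0, backlog + opened_per_quarter - closed_per_quarter)
    return backlog


one_year = backlog_after(4)  # net growth of 20 tickets per quarter
```

With the quoted rates the backlog grows by 20 tickets every quarter, so after one year a team that started from zero is 80 tickets behind.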
157
00:10:17.760 --> 00:10:24.000
Like, version control? Like, they're still clicking tools, like building ETL pipelines with drag and drop, right?

158
00:10:24.000 --> 00:10:26.000
Like, continuous delivery isn't a thing.

159
00:10:26.000 --> 00:10:28.000
CD, it's like, I think that's something you put music on.

160
00:10:28.000 --> 00:10:32.080
And DevOps is a fancy term to data folks

161
00:10:32.080 --> 00:10:33.280
that nobody can explain, right?

162
00:10:33.280 --> 00:10:35.720
Like, the data folks, they don't really understand

163
00:10:35.720 --> 00:10:39.160
how to do rapid, fast, agile software development.

164
00:10:39.160 --> 00:10:42.280
Again, hyperbole, because a lot of things are getting better.

165
00:10:42.280 --> 00:10:44.800
But if you go into an enterprise, there's still a lot of stuff

166
00:10:44.800 --> 00:10:46.880
where they don't even know how to use Git.

167
00:10:46.880 --> 00:10:49.680
I've actually gone into clients at big companies

168
00:10:49.680 --> 00:10:51.200
and had day one,

169
00:10:51.200 --> 00:10:53.320
I'm like, all right, let's talk about your architecture.

170
00:10:53.320 --> 00:10:59.000
And by day three, I'm like, OK, here's how to do git status.

171
00:10:59.000 --> 00:11:02.200
And the problem is that within software development,

172
00:11:02.200 --> 00:11:04.200
we've favored software developers.

173
00:11:04.200 --> 00:11:07.000
We've given them a lot of tools to build software really quick.

174
00:11:07.000 --> 00:11:10.000
And a lot of what we've done is we've absolved them of the duty and

175
00:11:10.000 --> 00:11:13.000
the responsibility to care about things like data quality,

176
00:11:13.000 --> 00:11:15.920
data semantics, the relevance of the data.

177
00:11:15.920 --> 00:11:18.160
We're just like, here, keep the data quick,

178
00:11:18.160 --> 00:11:19.120
throw it somewhere.

179
00:11:19.120 --> 00:11:21.960
And then they're pushing it to Kafka Streams and stuff like that.

180
00:11:21.960 --> 00:11:23.200
And it's all great.

181
00:11:23.200 --> 00:11:24.000
It's great for them,

182
00:11:24.000 --> 00:11:30.000
and then some data engineer has to sort out that mess.

183
00:11:30.000 --> 00:11:32.000
And we've tried solving this with, like, lots of

184
00:11:32.000 --> 00:11:35.720
different architectures. If you talk to data people, they love talking about architecture.

185
00:11:35.720 --> 00:11:37.640
We've gone from data warehouses to data lakes.

186
00:11:37.640 --> 00:11:40.760
Data lakes became data swamps.

187
00:11:40.760 --> 00:11:43.800
We've built things like data vaults, which is, I guess,

188
00:11:43.800 --> 00:11:45.600
if you really want to make your data model

189
00:11:45.600 --> 00:11:47.480
complicated, data vault is great.

190
00:11:47.480 --> 00:11:50.160
We've added Greek letters to it, a lambda architecture,

191
00:11:50.160 --> 00:11:52.960
or a kappa architecture.

192
00:11:52.960 --> 00:11:56.640
And this has all gone sort of in a cyclical pattern, right?

193
00:11:56.640 --> 00:12:01.400
This shift between centralization and decentralization.

194
00:12:01.400 --> 00:12:05.880
What happens is most enterprises are like, we need a big database.

195
00:12:05.880 --> 00:12:08.080
They build a big database and they're like,

196
00:12:08.080 --> 00:12:10.760
here's our data warehouse, it's OLAP, it's all this stuff.

197
00:12:10.760 --> 00:12:13.560
Then there's a bottleneck and there's a backlog.

198
00:12:13.560 --> 00:12:15.360
They say this isn't working, and then finally

199
00:12:15.360 --> 00:12:18.720
somebody with enough political clout in the organization is like, screw you all,

200
00:12:18.720 --> 00:12:20.920
I'm building my own database.

201
00:12:20.920 --> 00:12:23.280
And then that happens once, and then the next team is like, well, they did it.

202
00:12:23.280 --> 00:12:24.680
So I'm going to do it.

203
00:12:24.680 --> 00:12:28.240
And so the next thing you know, like, you're three, four, five years later, and now you've got the

204
00:12:28.240 --> 00:12:32.160
shadow IT situation going on. There's all of these different data systems.

205
00:12:32.160 --> 00:12:35.120
And somebody goes, why are we spending all this money on data systems?

206
00:12:35.120 --> 00:12:37.000
Let's do another big consolidation.

207
00:12:37.000 --> 00:12:39.360
I've come in on the back end of $100 million

208
00:12:39.360 --> 00:12:43.120
failed data architecture consolidation projects.

209
00:12:43.120 --> 00:12:46.500
Tons of money go into this.

210
00:12:46.500 --> 00:12:49.560
It's all been very expensive and nothing has worked.

211
00:12:49.560 --> 00:12:51.500
Nothing has worked.

212
00:12:51.500 --> 00:12:53.820
Sometimes things get a little bit better,

213
00:12:53.820 --> 00:12:57.000
but every organization I talk to has the same exact problems.

214
00:12:57.000 --> 00:13:05.400
And so when I was at ThoughtWorks we came up with this solution that I'm going to talk about.

215
00:13:05.400 --> 00:13:06.920
But a little bit about why it doesn't work,

216
00:13:06.920 --> 00:13:09.720
sorry, I almost skipped a slide here.

217
00:13:09.720 --> 00:13:11.800
You just have this situation where data governance

218
00:13:11.800 --> 00:13:14.020
is a mess, data catalogs are all

219
00:13:14.020 --> 00:13:15.560
kind of terrible.

220
00:13:15.560 --> 00:13:19.220
Schemas are just discombobulated, semantics are even worse.

221
00:13:19.220 --> 00:13:20.460
Like, systems don't talk to each other.

222
00:13:20.460 --> 00:13:22.480
People don't talk to each other.

223
00:13:22.480 --> 00:13:26.240
It's just a bad situation most of the time.

224
00:13:26.240 --> 00:13:29.160
And what we actually need is a scalable and consistent way

225
00:13:29.160 --> 00:13:30.640
to define the exchange of data

226
00:13:30.640 --> 00:13:36.480
between organizations, systems, parties, companies, whatever.

227
00:13:36.480 --> 00:13:38.240
We need platform independence, right?

228
00:13:38.240 --> 00:13:42.040
So we have systems, lots of companies are like,

229
00:13:42.040 --> 00:13:45.040
we're going to be on Azure and AWS and Google

230
00:13:45.040 --> 00:13:47.760
because we don't want to put all our eggs in one basket.

231
00:13:47.760 --> 00:13:49.920
And now you have three different types of ecosystems.

232
00:13:49.920 --> 00:13:51.000
It's like the Tower of Babel.

233
00:13:51.000 --> 00:13:53.920
Nobody's speaking the same data language.

234
00:13:53.920 --> 00:13:57.920
And we need lightweight oriented models.

235
00:13:57.920 --> 00:14:00.360
So when I was at ThoughtWorks, we came up with this idea

236
00:14:00.360 --> 00:14:01.080
called data mesh.

237
00:14:01.080 --> 00:14:02.760
Data mesh is a bit like communism.

238
00:14:02.760 --> 00:14:10.000
It came out of conditions. And the conditions that we talked about were all of this, like, fragmented

239
00:14:10.000 --> 00:14:17.000
ecosystem of not having this way, this common way of defining what it is that you mean

240
00:14:17.000 --> 00:14:18.000
when you're talking about data.

241
00:14:18.000 --> 00:14:21.000
So if you don't know what you're talking about when it comes to data,

242
00:14:21.000 --> 00:14:27.000
you don't know how to even describe a way to share and exchange it.

243
00:14:27.000 --> 00:14:33.840
So, data mesh came up with... it's a decentralized model of working with data that focuses on data as a product.

244
00:14:33.840 --> 00:14:36.400
So we actually wanted to think about: what

245
00:14:36.400 --> 00:14:38.080
if you worked with big data in the same way

246
00:14:38.080 --> 00:14:40.080
that you work with microservices?

247
00:14:40.080 --> 00:14:41.840
Which is kind of a nonsensical way of thinking

248
00:14:41.840 --> 00:14:43.000
about it, because microservices

249
00:14:43.000 --> 00:14:50.200
are designed to be really, really small and big data is, like, by definition very, very big.

250
00:14:50.200 --> 00:14:55.000
So we came up with these ideas of, like, okay, let's make data a product. Well, when

251
00:14:55.000 --> 00:14:58.680
you're doing microservices development, you're usually domain-oriented. You usually

252
00:14:58.680 --> 00:14:59.840
have a platform like Kubernetes,

253
00:15:00.000 --> 00:15:02.920
and that is something to deploy your service on.

254
00:15:02.920 --> 00:15:04.400
And then, of course, everyone said, well,

255
00:15:04.400 --> 00:15:05.120
that doesn't work.

256
00:15:05.120 --> 00:15:05.960
We need to govern it.

257
00:15:05.960 --> 00:15:08.520
And so then we stapled on federated computational

258
00:15:08.520 --> 00:15:11.600
governance to make them happy.

259
00:15:11.600 --> 00:15:15.920
And then tried to figure out what that was going to be.

260
00:15:15.920 --> 00:15:17.720
And this is a really great theory.

261
00:15:17.720 --> 00:15:18.960
Again, data mesh is like communism.

262
00:15:18.960 --> 00:15:25.920
It's really great in theory. The practice has been a little bit less stellar in some cases.

263
00:15:25.920 --> 00:15:30.120
Because domain-oriented data products, they're a good idea,

264
00:15:30.120 --> 00:15:31.680
but we haven't given anyone the tools

265
00:15:31.680 --> 00:15:35.200
to actually do it and implement it correctly.

266
00:15:37.760 --> 00:15:42.240
So, what if we just pretended like those challenges didn't exist?

267
00:15:42.240 --> 00:15:45.480
Like, what if we just decided that we're going to start from scratch?

268
00:15:45.480 --> 00:15:48.080
We're not going to talk about all of your semantic layer.

269
00:15:48.080 --> 00:15:50.480
We're not going to talk about all of your legacy systems.

270
00:15:50.480 --> 00:15:53.360
We're just going to build a microservice around data.

271
00:15:53.360 --> 00:15:55.160
What would that look like?

272
00:15:55.160 --> 00:15:58.080
It's actually not that bad of an idea.

273
00:15:58.080 --> 00:16:02.280
The problem is, once you start to try to implement that,

274
00:16:02.280 --> 00:16:03.560
everything falls apart.

275
00:16:03.560 --> 00:16:04.840
The tooling sucks.

276
00:16:04.840 --> 00:16:06.000
The data catalogs suck.

277
00:16:06.000 --> 00:16:08.000
It requires a ton of platform engineering.

278
00:16:08.000 --> 00:16:10.000
Nothing is really set up for this.

279
00:16:10.000 --> 00:16:13.000
There's no standard language for defining what a data

280
00:16:13.000 --> 00:16:17.000
product is or how it should interact.

281
00:16:17.000 --> 00:16:20.000
There's no standard way of joining the semantics

282
00:16:20.000 --> 00:16:23.000
from two different domains together.

283
00:16:23.000 --> 00:16:24.800
And there's a lot of languages that have come up,

284
00:16:24.800 --> 00:16:28.200
like OWL and RDF and all this other stuff,

285
00:16:28.200 --> 00:16:31.960
to try to define how data looks and how data should be

286
00:16:31.960 --> 00:16:32.960
shaped.

287
00:16:32.960 --> 00:16:35.240
But they have all of these gaps about how we're

288
00:16:35.240 --> 00:16:40.520
supposed to define how we're supposed to work with data.

289
00:16:40.520 --> 00:16:44.120
So when you actually dive into, and this is where we get to Lexicon, there's a little

290
00:16:44.120 --> 00:16:47.600
bit of Chekhov's gun that I'm setting up here.

291
00:16:47.600 --> 00:16:52.200
If you look at the way that we model data, we care about really two things.

292
00:16:52.200 --> 00:16:54.600
We look at the nouns and the verbs.

293
00:16:54.600 --> 00:16:57.520
And the nouns are what the data means,

294
00:16:57.520 --> 00:16:59.440
how it's structured, who owns it,

295
00:16:59.440 --> 00:17:00.500
who can access it,

296
00:17:00.500 --> 00:17:03.200
what are the permissions, etc.

297
00:17:03.200 --> 00:17:05.840
And sometimes those have adjectives, like the type

298
00:17:05.840 --> 00:17:08.920
of the data or the frequency that it's updated,

299
00:17:08.920 --> 00:17:12.040
the freshness of it, things like that.

300
00:17:12.040 --> 00:17:16.920
And then we also have, so I'll jump into the verbs in a second, but this is a great example

301
00:17:16.920 --> 00:17:19.840
of talking about the nouns. So I pulled this, like,

302
00:17:19.840 --> 00:17:28.400
random OWL 2 definition off of, I don't know, the internet somewhere, which is great.

303
00:17:28.400 --> 00:17:32.080
It's talking about the physical quality of the thermal energy of a system.

304
00:17:32.080 --> 00:17:34.960
And it defines all of this really specific stuff.

305
00:17:34.960 --> 00:17:37.840
Half of the characters, solidly more than half of the characters in here,

306
00:17:37.840 --> 00:17:40.220
are metadata, like meta-metadata,

307
00:17:40.220 --> 00:17:44.640
about the links of where to find the actual thing that you're seeing.

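NOTE Editorial sketch, not part of the talk: the "nouns" and their "adjectives" listed in cues 293 to 299 (meaning, structure, owner, access, update frequency, freshness) could be modeled as a small Python record. The class name `Noun`, its fields, and the example values are all hypothetical; this is not the OWL definition shown on the slide.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class Noun:
    """A hypothetical description of one data entity (a 'noun')."""
    name: str
    meaning: str                      # what the data means
    structure: dict[str, str]         # how it is structured: field name -> type
    owner: str                        # who owns it
    readers: set[str] = field(default_factory=set)   # who else can access it
    update_frequency: timedelta = timedelta(days=1)  # adjective: how often it updates
    last_updated: datetime | None = None             # adjective: freshness

    def can_read(self, who: str) -> bool:
        return who == self.owner or who in self.readers

    def is_fresh(self, now: datetime) -> bool:
        # Fresh means the last update falls within one update interval of `now`.
        if self.last_updated is None:
            return False
        return now - self.last_updated <= self.update_frequency


invoice = Noun(
    name="invoice",
    meaning="A request for payment issued to a customer",
    structure={"invoice_id": "string", "amount_cents": "integer"},
    owner="finance",
    readers={"analytics"},
    last_updated=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
```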
308
00:17:44.640 --> 00:17:50.920
So this is really cool if you want to know that this number

309
00:17:50.920 --> 00:17:52.960
is about the physical quality of the thermal energy

310
00:17:52.960 --> 00:17:54.760
of a system, but it doesn't tell you what

311
00:17:54.760 --> 00:17:56.600
that means in any sort of context.

312
00:17:56.600 --> 00:18:07.680
It doesn't tell you what you should do about it or why you should care about it. The problem with that type of approach and the problem with all of those types of data specification

313
00:18:07.680 --> 00:18:11.720
languages that we've seen is that they tend to be very static.

314
00:18:11.720 --> 00:18:13.880
It assumes that data is fully self-contained,

315
00:18:13.880 --> 00:18:16.840
that you can just kind of look at a thing and describe

316
00:18:16.840 --> 00:18:18.780
its properties and be like, aha, I

317
00:18:18.780 --> 00:18:21.500
now have a perfect platonic ideal of a chair.

318
00:18:21.500 --> 00:18:23.760
I've defined exactly what a chair is,

319
00:18:23.760 --> 00:18:28.000
but you don't do anything about how a chair should be used.

320
00:18:34.000 --> 00:18:43.000
And then on top of that it becomes really difficult to start to contextualize it, because in reality you need to start to think about how these nouns become composed with each other, how they interplay.

321
00:18:43.000 --> 00:18:50.000
And so there's some stuff out there working on how to overlay different data definition languages

322
00:18:50.000 --> 00:18:55.000
in a practical systems context, but we're still missing the whole action element of it.


323
00:19:00.400 --> 00:19:08.000
The thing is, with building a microservice-style architecture, you also need to define how these services interact. That's actually the critical thing about it.

324
00:19:08.000 --> 00:19:12.880
If you look at something like the OpenAPI spec, it's really great.

325
00:19:12.880 --> 00:19:15.840
It talks about how you access the data,

326
00:19:15.840 --> 00:19:18.280
what to do with the data, what are the things that you can,

327
00:19:18.280 --> 00:19:21.920
like how do you write it, update it, delete it, whatever.

328
00:19:21.920 --> 00:19:23.680
You define your API spec.

329
00:19:23.680 --> 00:19:26.080
It tells you exactly, like, this is going to be a PUT,

330
00:19:26.080 --> 00:19:30.200
this is going to be a POST, whatever.

331
00:19:30.200 --> 00:19:32.800
How to request permission for the data is often part

332
00:19:32.800 --> 00:19:37.000
of that broader spec, and how systems should handle the data, right?

333
00:19:39.000 --> 00:19:43.840
And so you can do that with those tools.

334
00:19:43.840 --> 00:19:46.760
But if you start looking at things like OpenAPI,

335
00:19:46.760 --> 00:19:52.560
then they start to sort of fall away on the definitions of the nouns.

336
00:19:52.560 --> 00:19:55.560
So OpenAPI is great because it gives you

337
00:19:55.560 --> 00:19:57.720
a little bit of a shape of the data,

338
00:19:57.720 --> 00:19:59.760
but beyond sort of the

339
00:20:00.000 --> 00:20:05.000
basic sort of JSON types, it's really limited in that case.

340
00:20:05.000 --> 00:20:09.000
It doesn't really tell you about what the data means in a broader context.

341
00:20:10.000 --> 00:20:18.000
So all of this sort of ecosystem of defining data

342
00:20:18.000 --> 00:20:21.600
through these sort of specification languages,

343
00:20:21.600 --> 00:20:24.620
all this also omits this concept of domain ownership

344
00:20:24.620 --> 00:20:25.360
whatsoever.

345
00:20:25.360 --> 00:20:28.840
You're basically just saying data exists, here it is.

346
00:20:28.840 --> 00:20:30.840
Or systems exist, here they are.

347
00:20:30.840 --> 00:20:36.360
We're not really talking about how to access them, who owns them, what they should do about them.

348
00:20:36.360 --> 00:20:40.600
So if you're going to build a good data product language, you need to have a concrete encoding of your domain

349
00:20:40.600 --> 00:20:41.080
ownership.

350
00:20:41.080 --> 00:20:43.640
So who owns this?

351
00:20:43.640 --> 00:20:45.400
What do they own it for?

352
00:20:45.400 --> 00:20:50.760
You need to have a clear, accessible, and extensible definition of the nouns and a flexible,

353
00:20:51.240 --> 00:20:55.840
versionable definition of the verbs. So you need to be able to encode change over time.

354
00:20:56.040 --> 00:20:58.480
You need to be able to encode what it is that you're

355
00:20:58.480 --> 00:21:00.240
describing and who owns it.

356
00:21:00.240 --> 00:21:03.800
So those three things are key.

357
00:21:03.800 --> 00:21:11.520
As I mentioned, data mesh came, like, was a great idea for trying to put together these data

358
00:21:11.520 --> 00:21:15.000
products, but it really suffered from, like, four key flaws.

359
00:21:15.000 --> 00:21:17.000
The first is, if you know ThoughtWorks at all,

360
00:21:17.000 --> 00:21:19.000
ThoughtWorks tended to be a little bit dogmatic.

361
00:21:19.000 --> 00:21:22.000
It was a criticism of the company as long as I worked there,

362
00:21:22.000 --> 00:21:25.760
that existed even before I worked there.

363
00:21:25.760 --> 00:21:29.160
And data mesh was like, we were trying to do everything with

364
00:21:29.160 --> 00:21:30.080
microservices.

365
00:21:30.080 --> 00:21:37.560
And so we just sort of almost naively applied that to a really big data architecture project.

366
00:21:37.560 --> 00:21:38.360
And this exists.

367
00:21:38.360 --> 00:21:39.560
There's no tooling for it.

368
00:21:39.560 --> 00:21:47.000
There's no equivalent to Kubernetes for standing up, like, a massive large-scale data processing system.

369
00:21:47.000 --> 00:21:50.000
Aside from Kubernetes, which is really difficult if you've ever

370
00:21:50.000 --> 00:21:51.920
actually tried to do big data processing on Kubernetes,

371
00:21:51.920 --> 00:21:54.480
like running Spark on Kubernetes is really difficult.

372
00:21:56.800 --> 00:21:58.800
A lot of change, like again,

373
00:21:58.800 --> 00:22:00.960
data mesh, it requires a lot of

374
00:22:00.960 --> 00:22:02.400
reeducation, right?

375
00:22:02.400 --> 00:22:04.720
The way that you work with data, the way that your data

376
00:22:04.720 --> 00:22:06.240
people are actually doing things.

377
00:22:06.240 --> 00:22:07.880
You have to teach them DevOps.

378
00:22:07.880 --> 00:22:09.780
You have to teach them Agile, you have to teach

379
00:22:09.780 --> 00:22:15.120
them domain ownership, really difficult. And then we also spent the last, like,

380
00:22:15.120 --> 00:22:18.640
decade decoupling software devs from any downstream

381
00:22:18.640 --> 00:22:20.240
responsibility for their data, and now we're

382
00:22:20.240 --> 00:22:22.640
turning around saying, oh, wait, no, no, no, now you have

383
00:22:22.640 --> 00:22:24.640
to be responsible for that again.

384
00:22:24.640 --> 00:22:28.800
And they didn't really like that.

385
00:22:28.800 --> 00:22:34.760
So I mentioned that we need all of these definitions

386
00:22:34.760 --> 00:22:38.000
around nouns and verbs and domain ownership and stuff like that.

387
00:22:38.000 --> 00:22:43.000
And yes, this is now Chekhov's gun going off, because if you look at Lexicon, you look at the spec,

388
00:22:43.000 --> 00:22:48.180
it actually kind of elegantly solved all three of those problems, and I don't even think

389
00:22:48.180 --> 00:22:50.680
that it was trying to.

390
00:22:50.680 --> 00:22:59.640
The first thing that it does is this reverse-DNS addressing, which is brilliant. This solves a huge problem

391
00:22:59.640 --> 00:23:05.680
for the enterprise data space because now you can actually encode your domain ownership literally

392
00:23:05.680 --> 00:23:12.600
in a domain, like a domain name solution.

393
00:23:12.600 --> 00:23:15.800
And you're able to, you can do that across an entire organization.

394
00:23:15.800 --> 00:23:18.320
So you can actually set up an organization,

395
00:23:18.320 --> 00:23:20.920
give every department, every team, whatever,

396
00:23:20.920 --> 00:23:28.000
a path in that domain name, reverse it, and then they can host their own lexicons.

397
00:23:28.000 --> 00:23:30.700
It does come up with the definition of the nouns.

398
00:23:30.700 --> 00:23:34.100
This is actually, I don't know if you noticed when I started.

399
00:23:34.100 --> 00:23:35.580
I decided, I was really bored.

400
00:23:35.580 --> 00:23:37.580
So I decided yesterday,

401
00:23:37.580 --> 00:23:38.820
instead of doing it in, like, Google Sites,

402
00:23:38.820 --> 00:23:41.000
I would build my own slideshow system on AT Proto.

403
00:23:41.000 --> 00:23:45.000
So I vibe coded a slideshow

404
00:23:45.000 --> 00:23:51.800
system, and this is the lexicon for the slideshow.

405
00:23:51.800 --> 00:23:54.960
And so you can see that the nouns are defined here.

406
00:23:54.960 --> 00:23:56.200
I need a title.

407
00:23:56.200 --> 00:24:00.000
I need a short description. I need maybe a timestamp for when it's created.

408
00:24:00.000 --> 00:24:02.000
All of that is there and it's contextual.

409
00:24:02.000 --> 00:24:09.200
It's not just the data type, like the JSON data type. Of course that is in there, but it tells me what

410
00:24:09.200 --> 00:24:13.320
the meaning of that data is and why it's relevant and why I should care about it.
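
The slide itself is not captured in this transcript, so what follows is only a minimal sketch of what a Lexicon record for the slideshow could look like, written as a Python dictionary. The NSID `com.example.slides.slideshow`, the field descriptions, and the helper names `owning_domain` and `missing_required` are all assumptions made for illustration. The sketch shows the two ideas from the passage above: the identifier encodes who owns the definition as a reversed domain name, and each noun carries a description of what it means rather than only a JSON type.

```python
# Hypothetical Lexicon document for a slideshow record. The NSID and every
# description below are illustrative assumptions, not the schema from the talk.
SLIDESHOW_LEXICON = {
    "lexicon": 1,
    "id": "com.example.slides.slideshow",
    "defs": {
        "main": {
            "type": "record",
            "description": "A slide deck published by its author.",
            "key": "tid",
            "record": {
                "type": "object",
                "required": ["title", "createdAt"],
                "properties": {
                    "title": {
                        "type": "string",
                        "maxLength": 300,
                        "description": "Human-readable name shown in listings.",
                    },
                    "description": {
                        "type": "string",
                        "maxLength": 3000,
                        "description": "Short summary of what the deck covers.",
                    },
                    "createdAt": {
                        "type": "string",
                        "format": "datetime",
                        "description": "When the author first created the deck.",
                    },
                },
            },
        }
    },
}


def owning_domain(nsid: str) -> str:
    """Return the DNS domain that an NSID's authority segments point at.

    An NSID is a reversed domain name followed by one name segment, so
    dropping the last segment and reversing the rest recovers the domain.
    """
    *authority, _name = nsid.split(".")
    return ".".join(reversed(authority))


def missing_required(lexicon: dict, record: dict) -> list[str]:
    """List required properties of the main record that `record` lacks."""
    schema = lexicon["defs"]["main"]["record"]
    return [name for name in schema.get("required", []) if name not in record]
```

Under these assumptions, `owning_domain(SLIDESHOW_LEXICON["id"])` resolves to `slides.example.com`, which is the sense in which a team's place in the organization's domain name doubles as its ownership record.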

411
00:24:13.320 --> 00:24:16.800
And if I had gone a little bit farther, I would have been able to put the

412
00:24:16.800 --> 00:24:23.280
verbs in here as well for how to author a slideshow, how to assign permissions to a slideshow,

413
00:24:23.280 --> 00:24:25.360
how to revoke permissions to a slideshow.

414
00:24:25.360 --> 00:24:28.640
So I could turn this into essentially a Google Docs,

415
00:24:28.640 --> 00:24:31.280
or the Google Slides type of situation or PowerPoint type

416
00:24:31.280 --> 00:24:35.440
of situation by encoding what the actions are directly in

417
00:24:35.440 --> 00:24:47.260
the lexicon. So again, I mentioned this kind of casually solved all three of those problems, which would

418
00:24:47.260 --> 00:24:52.520
be a really big benefit for an enterprise, because not only does that mean that if you can define

419
00:24:52.520 --> 00:24:56.280
a lexicon for your data products, you can start sharing that data across your

420
00:24:56.280 --> 00:24:59.720
organization but also outside of your organization if you

421
00:25:00.000 --> 00:25:02.560
want to publish those lexicons openly.

422
00:25:02.560 --> 00:25:05.740
And this is, by the way, not even talking about using

423
00:25:05.740 --> 00:25:07.880
the PDS as your platform.

424
00:25:07.880 --> 00:25:09.880
I'm not necessarily talking about, like,

425
00:25:09.880 --> 00:25:13.600
replace your data platform with a PDS and everything will be solved.
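
The verbs mentioned above (authoring a slideshow, assigning permissions, revoking permissions) were not written for the talk, so the following is only a rough sketch of how two of them might be declared as Lexicon `procedure` definitions. The NSIDs `com.example.slides.grantAccess` and `com.example.slides.revokeAccess`, the parameter names, the role values, and the helper functions are all hypothetical.

```python
def procedure(nsid: str, description: str, required: list[str], properties: dict) -> dict:
    """Build a Lexicon document whose main definition is a procedure (a verb)."""
    return {
        "lexicon": 1,
        "id": nsid,
        "defs": {
            "main": {
                "type": "procedure",
                "description": description,
                "input": {
                    "encoding": "application/json",
                    "schema": {
                        "type": "object",
                        "required": required,
                        "properties": properties,
                    },
                },
            }
        },
    }


# Hypothetical verbs for the slideshow example; names and fields are assumptions.
GRANT_ACCESS = procedure(
    "com.example.slides.grantAccess",
    "Give another account permission to view or edit a slideshow.",
    ["slideshow", "grantee", "role"],
    {
        "slideshow": {"type": "string", "format": "at-uri"},
        "grantee": {"type": "string", "format": "did"},
        "role": {"type": "string", "knownValues": ["viewer", "editor"]},
    },
)

REVOKE_ACCESS = procedure(
    "com.example.slides.revokeAccess",
    "Remove a previously granted permission from an account.",
    ["slideshow", "grantee"],
    {
        "slideshow": {"type": "string", "format": "at-uri"},
        "grantee": {"type": "string", "format": "did"},
    },
)


def verb_summary(documents: list[dict]) -> dict[str, list[str]]:
    """Map each verb name (the last NSID segment) to its required inputs."""
    summary = {}
    for doc in documents:
        verb = doc["id"].rsplit(".", 1)[-1]
        summary[verb] = doc["defs"]["main"]["input"]["schema"]["required"]
    return summary
```

Because each verb is its own document with its own identifier, a verb can in principle be revised or superseded without touching the noun definitions, which is one way to read the "flexible, versionable definition of the verbs" requirement from earlier in the talk.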

426
00:25:13.600 --> 00:25:20.200
I'm actually just talking about using a flexible data definition language that gets away

427
00:25:20.200 --> 00:25:25.000
from all of the sort of dogmatism of ontologies and all of that stuff

428
00:25:25.000 --> 00:25:32.000
and focuses on actual shareability rather than fully fleshing out the definition.

429
00:25:32.000 --> 00:25:34.000
This is a standard way of doing it.

430
00:25:34.000 --> 00:25:37.000
I think Emily Hunt spoke about this yesterday,

431
00:25:37.000 --> 00:25:40.400
sharing exploding-stars data

432
00:25:40.400 --> 00:25:44.440
through a globally accessible standardized feed

433
00:25:44.440 --> 00:25:51.480
that every astronomy lab in the world can subscribe to and then see the exact data format for that

434
00:25:51.480 --> 00:25:56.160
data without having to go through and set up their own Kafka consumers, without having to run

435
00:25:56.160 --> 00:25:58.300
a Kafka cluster, all of that stuff.

436
00:25:58.300 --> 00:26:00.300
That's a really compelling idea.

437
00:26:00.300 --> 00:26:05.640
It's a really, really, really, really brilliant way of being able to make that data available

438
00:26:05.640 --> 00:26:09.680
in a way that people can not just access it but understand

439
00:26:09.680 --> 00:26:12.360
what it means.

440
00:26:12.360 --> 00:26:17.400
So then this, like what I look at, actually the first time I saw Lexicon, I said,

441
00:26:17.400 --> 00:26:22.000
I've been trying to develop this as a data product specification language for the last three years.

442
00:26:22.000 --> 00:26:28.000
And here it is. So it's worth exploring. I think that this is actually relevant.

443
00:26:28.000 --> 00:26:33.000
Because if you know me from any of my other work, you know that I don't really like fascists.

444
00:26:33.000 --> 00:26:41.000
And one of the problems in the data space is that, I've mentioned the sort of cyclical behavior.

445
00:26:41.000 --> 00:26:44.560
We've gone from data warehouses to data lakes,

446
00:26:44.560 --> 00:26:46.720
to streaming architectures and back again.

447
00:26:46.720 --> 00:26:49.640
And the trend right now is towards consolidation.

448
00:26:49.640 --> 00:26:51.480
And so you see companies like Palantir

449
00:26:51.480 --> 00:26:53.920
who go up to these big organizations and government

450
00:26:53.920 --> 00:26:57.040
agencies, and they say we can solve your data consolidation

451
00:26:57.040 --> 00:26:59.040
problems.

452
00:26:59.040 --> 00:27:04.920
And I've never known anyone who's used Palantir who likes it. Right? But they sell

453
00:27:04.920 --> 00:27:10.480
it and they make a fuckload of money selling it because they're promising companies to solve

454
00:27:10.480 --> 00:27:15.200
exactly this problem, that we can make all of your data accessible in one

455
00:27:15.200 --> 00:27:22.960
place really easily.
And so my argument is if we want to be able to break the

456
00:27:22.960 --> 00:27:28.320
stranglehold that these rent-seeking platform organizations like Palantir,

457
00:27:28.320 --> 00:27:31.120
like Oracle, like all of these other companies have,

458
00:27:31.120 --> 00:27:34.160
we need to have a decentralized way of sharing data,

459
00:27:34.160 --> 00:27:37.760
and it needs to be consistent, and we need to get away from trying to

460
00:27:37.760 --> 00:27:42.480
overly define the languages and the specs and all of that stuff, the ontologies, and we need

461
00:27:42.480 --> 00:27:47.520
to focus on the ability for sharing and decentralizing that data.

462
00:27:47.520 --> 00:27:49.520
That is necessary to fight fascism.

463
00:27:49.520 --> 00:27:52.440
It's not just good business.

464
00:27:52.440 --> 00:27:56.400
So what do we need to make that work?

465
00:27:56.400 --> 00:28:00.600
The problem, which I mentioned going back to the beginning, is that software developers love

466
00:28:00.600 --> 00:28:05.000
working with cool technologies, TypeScript and Go and Rust and all that stuff.

467
00:28:05.000 --> 00:28:07.600
And us data folks, we don't know any of that.

468
00:28:07.600 --> 00:28:09.400
We don't work in those languages.

469
00:28:09.400 --> 00:28:10.840
We need better Python tooling.

470
00:28:10.840 --> 00:28:15.200
The MarshalX library is great, but I couldn't use it to build my own lexicons.

471
00:28:15.200 --> 00:28:21.240
Maybe I'm not smart enough, but I just couldn't get it to work to build my own stuff that wasn't built around Bluesky.
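
The speaker's own Python tooling is not shown in the transcript. As a rough illustration of the kind of tooling gap described here, the sketch below turns a Lexicon record definition into a Python data class using only the standard library. The function name `dataclass_from_lexicon`, the example NSID, and the type mapping are assumptions, and only flat records with primitive field types are handled.

```python
from dataclasses import field, fields, make_dataclass
from typing import Optional

# Only primitive Lexicon types are mapped; refs, arrays, unions, and blobs
# are deliberately left out of this sketch.
_PYTHON_TYPES = {"string": str, "integer": int, "boolean": bool}


def dataclass_from_lexicon(lexicon: dict, class_name: str) -> type:
    """Create a data class from the main record definition of a lexicon."""
    schema = lexicon["defs"]["main"]["record"]
    required = set(schema.get("required", []))
    properties = schema["properties"]

    specs = []
    # Required fields go first because fields with defaults must come last.
    for name in sorted(properties, key=lambda n: n not in required):
        py_type = _PYTHON_TYPES[properties[name]["type"]]
        if name in required:
            specs.append((name, py_type))
        else:
            specs.append((name, Optional[py_type], field(default=None)))
    return make_dataclass(class_name, specs)


# Hypothetical lexicon used only to exercise the generator.
EXAMPLE_LEXICON = {
    "lexicon": 1,
    "id": "com.example.slides.slideshow",
    "defs": {
        "main": {
            "type": "record",
            "key": "tid",
            "record": {
                "type": "object",
                "required": ["title", "createdAt"],
                "properties": {
                    "title": {"type": "string"},
                    "description": {"type": "string"},
                    "createdAt": {"type": "string", "format": "datetime"},
                },
            },
        }
    },
}

Slideshow = dataclass_from_lexicon(EXAMPLE_LEXICON, "Slideshow")
deck = Slideshow(title="Enterprise data", createdAt="2026-01-01T00:00:00Z")
field_names = [f.name for f in fields(Slideshow)]
```

This version builds classes at runtime with `make_dataclass`, which avoids generating source code or touching an abstract syntax tree. A fuller generator would also need to resolve references between definitions and validate string formats.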

472
00:28:21.240 --> 00:28:27.720
So I had to write my own Lexicon, like, data class generator,

473
00:28:27.720 --> 00:28:29.640
which was hard because I don't know,

474
00:28:29.640 --> 00:28:31.000
like I'm not a computer science person.

475
00:28:31.000 --> 00:28:33.000
I don't know what an abstract syntax tree is.

476
00:28:33.000 --> 00:28:37.000
I think that's like the thing that the spotted lanternflies eat.

477
00:28:37.000 --> 00:28:43.160
But if you actually look at some of the stuff, like lexicon.gov is an amazing, like, it's a global

478
00:28:43.160 --> 00:28:50.620
data catalog, like it's solved a problem that I've watched companies pour millions of dollars into, and it just solved it, like, oh,

479
00:28:50.620 --> 00:28:51.620
that's cool.

480
00:28:51.620 --> 00:28:55.440
You can register your lexicons and then we can make a data catalog for them.

481
00:28:56.080 --> 00:28:58.880
And it just did it.

482
00:28:58.880 --> 00:29:01.200
So applying that tooling and making that tooling

483
00:29:01.200 --> 00:29:03.280
available at the enterprise level or the business level

484
00:29:03.280 --> 00:29:06.640
would be really great.

485
00:29:06.640 --> 00:29:07.800
And then we also need a mindset.

486
00:29:07.800 --> 00:29:09.920
Like a lot of the conversation I've been having here

487
00:29:09.920 --> 00:29:12.400
was really wrapped around social internet,

488
00:29:12.400 --> 00:29:14.360
which is awesome because I really believe

489
00:29:14.360 --> 00:29:17.040
in the social internet and like was moved to tears for

490
00:29:17.040 --> 00:29:22.680
errands talk today. But this is so much bigger than just social internet. Right.
This is data

491
00:29:22.680 --> 00:29:27.000
exchange at scale, which is more than just people talking to people.

492
00:29:27.000 --> 00:29:30.000
It's also about exchanging information.

493
00:29:30.000 --> 00:29:33.500
We need better permissions management and private data.

494
00:29:33.500 --> 00:29:42.720
I know all of that is in progress, but right now you can't have somebody publishing their internal data to a PDS and then, like, putting it out there.

495
00:29:42.720 --> 00:29:45.320
And then we also need to have the tooling

496
00:29:45.320 --> 00:29:49.480
to be able to run that stuff inside closed ecosystems.

497
00:29:49.480 --> 00:29:55.000
And then also, like, brave data people to start building radical shit, because again there's

498
00:29:55.000 --> 00:29:59.760
too much conservatism in the data space, unwillingness

499
00:30:00.000 --> 00:30:02.760
to try new approaches, try new technologies.

500
00:30:02.760 --> 00:30:05.880
If more data people can start doing things like, hey look,

501
00:30:05.880 --> 00:30:08.640
I just solved a globally distributed data problem

502
00:30:08.640 --> 00:30:14.000
with this cool protocol and some Python and some lexicons,

503
00:30:14.000 --> 00:30:17.000
then I think that we'll be able to do some really great stuff.

504
00:30:17.000 --> 00:30:20.000
So I'll wrap it up there just about on time.

505
00:30:20.000 --> 00:30:21.660
Thank you so much. Let's chat about data.

506
00:30:21.660 --> 00:30:23.660
Thank you.