1WEBVTT
2
31
400:00:00.000 --> 00:02:05.000
5Excellent.
6
72
800:02:05.000 --> 00:02:10.400
9Thank you very much.
10
113
1200:02:10.400 --> 00:02:16.800
13I know that we're running weird on some time somewhere, so people can perfect.
14
154
1600:02:16.800 --> 00:02:17.800
17I won't need it all.
18
195
2000:02:17.800 --> 00:02:20.440
21I'll go through this quickly. Hopefully. But yeah,
22
236
2400:02:20.440 --> 00:02:24.520
25thank you for being here. Thank you for joining this talk. Enterprise data is maybe not the
26
277
2800:02:24.520 --> 00:02:31.000
29like most exciting topic, but I hope that I will make this interesting for you.
30
318
3200:02:31.000 --> 00:02:36.120
33There we go. All right. So the title of this talk, it's the thing
34
359
3600:02:36.120 --> 00:02:37.880
37where you, when the headline is a question,
38
3910
4000:02:37.880 --> 00:02:40.640
41and it can be answered no, maybe it's not worth your time.
42
4311
4400:02:40.640 --> 00:02:43.000
45So I'm going to tell you a little bit in advance
46
4712
4800:02:43.000 --> 00:02:46.560
49that the answer of "did Lexicon accidentally solve the enterprise data problem?"
50
5113
5200:02:46.560 --> 00:02:48.400
53The answer is probably no, but I want to give this talk
54
5514
5600:02:48.400 --> 00:02:50.640
57because I really want it to be yes.
58
5915
6000:02:50.640 --> 00:02:58.360
61And I'm going to explain why over the next say 29 minutes and 16 seconds.
62
6316
6400:02:58.360 --> 00:02:59.360
65So a little bit about me.
66
6717
6800:02:59.360 --> 00:03:01.160
69My name is Emily Gorcenski.
70
7118
7200:03:01.160 --> 00:03:04.280
73I'm a data scientist by background.
74
7519
7600:03:04.280 --> 00:03:10.640
77I was actually R&D is sort of where I started my career.
78
7920
8000:03:10.640 --> 00:03:13.400
81I realized at some point that all the algorithms I was doing
82
8321
8400:03:13.400 --> 00:03:16.280
85for biotechnology and aerospace and all that good stuff
86
8722
8800:03:16.280 --> 00:03:18.480
89was in this new and upcoming field that they were calling
90
9123
9200:03:18.480 --> 00:03:22.160
93data science and that those people get paid a lot more money than I was making.
94
9524
9600:03:22.160 --> 00:03:24.920
97So I just said I'm a data scientist now and it stuck.
98
9925
10000:03:24.920 --> 00:03:29.080
101But you can find me on the internet here.
102
10326
10400:03:29.080 --> 00:03:32.120
105And how did I get into AT Proto stuff?
106
10727
10800:03:32.120 --> 00:03:34.560
109Well, it started because I think that the internet
110
11128
11200:03:34.560 --> 00:03:35.800
113is horrible.
114
11529
11600:03:35.800 --> 00:03:41.080
117And I really wanted to curate my own presence on the internet.
118
11930
12000:03:41.080 --> 00:03:45.320
121So if you've ever had a tweet or a skeet go viral,
122
12331
12400:03:45.320 --> 00:03:46.760
125it's like the worst thing in the world.
126
12732
12800:03:46.760 --> 00:03:50.040
129It's not actually something you want to happen.
130
13133
13200:03:50.040 --> 00:03:51.540
133So I created a little script called Skeeter
134
13534
13600:03:51.540 --> 00:03:54.640
137Deleter, which I used to curate my own feed.
138
13935
14000:03:54.640 --> 00:03:56.360
141So I get to pick the things that I want to keep,
142
14336
14400:03:56.360 --> 00:03:58.240
145so that if anyone visits me, they
146
14737
14800:03:58.240 --> 00:04:02.240
149can't go back in my timeline and find things to cancel me over.
150
15138
15200:04:02.240 --> 00:04:06.000
153But I will say some things that you'll probably cancel me over today.
154
15539
15600:04:06.000 --> 00:04:08.000
157Data things.
158
15940
16000:04:08.000 --> 00:04:11.000
161I also run a labeler called brand block online because the
162
16341
16400:04:11.000 --> 00:04:12.000
165Internet is horrible.
166
16742
16800:04:12.000 --> 00:04:16.320
169The worst part about it is the evil brands who are coming in trying to be funny. So I built a
170
17143
17200:04:16.320 --> 00:04:21.920
173labeler to block the brands. So if you don't want to see on your timeline, Arby's
174
17544
17600:04:21.920 --> 00:04:24.000
177trying to joke with Taco Bell.
178
17945
18000:04:24.000 --> 00:04:26.000
181This is your tool.
182
18346
18400:04:26.000 --> 00:04:30.000
185And this is something new that I just recently built,
186
18747
18800:04:30.000 --> 00:04:32.000
189because I finally figured out how to do OAuth,
190
19148
19200:04:32.000 --> 00:04:35.680
193the kind for AT Proto.
194
19549
19600:04:35.680 --> 00:04:39.000
197This is just like a little wish list slash registry
198
19950
20000:04:39.000 --> 00:04:42.800
201slash mutual aid compilation tool.
202
20351
20400:04:42.800 --> 00:04:45.040
205I'm going to be opening this up for beta.
206
20752
20800:04:45.040 --> 00:04:48.760
209So if you want to test it out, I'm going to try to launch
210
21153
21200:04:48.760 --> 00:04:50.080
213this by the end of the conference.
214
21554
21600:04:50.080 --> 00:04:52.000
217But you do need to follow me on Bluesky.
218
21955
22000:04:52.000 --> 00:04:54.920
221That's not intentionally manipulative.
222
22356
22400:04:54.920 --> 00:04:58.320
225It's just the easiest way that I have to limit the number
226
22757
22800:04:58.320 --> 00:05:00.000
229of people using it right now.
230
23158
23200:05:00.000 --> 00:05:04.160
233I don't feel like doing an invite system.
234
23559
23600:05:04.160 --> 00:05:08.040
237And then my professional world, I'm a CTO of a startup.
238
23960
24000:05:08.040 --> 00:05:11.400
241The title is fancier than the actual
242
24361
24400:05:11.400 --> 00:05:14.000
245what I'm doing day to day.
246
24762
24800:05:14.000 --> 00:05:15.520
249What I'm doing day to day is actually building
250
25163
25200:05:15.520 --> 00:05:17.880
253just accounting software.
254
25564
25600:05:17.880 --> 00:05:25.360
257So if you have QuickBooks data and you want dashboards, I can do that.
258
25965
26000:05:25.360 --> 00:05:27.440
261But before I joined the startup world,
262
26366
26400:05:27.440 --> 00:05:33.720
265I was doing consulting for eight years, which is why I talk about enterprise data.
266
26767
26800:05:33.720 --> 00:05:37.600
269So the thing that's interesting about the enterprise data space, and I told you that
270
27168
27200:05:37.600 --> 00:05:38.600
273I'm a data scientist.
274
27569
27600:05:38.600 --> 00:05:40.000
277I came from an R&D background.
278
27970
28000:05:40.000 --> 00:05:42.000
281I peaked in Fortran.
282
28371
28400:05:42.000 --> 00:05:44.000
285I'm not one of these web people.
286
28772
28800:05:44.000 --> 00:05:46.000
289If you look at the AT Proto or I'm sorry,
290
29173
29200:05:46.000 --> 00:05:48.920
293at Proto, I learned that it's at Proto.
294
29574
29600:05:48.920 --> 00:05:51.760
297I thought it was actually Austria Proto for a while.
298
29975
30000:05:53.440 --> 00:06:00.360
301If you look at the tooling, it's in TypeScript, it's in Go, and it's in, if you're
302
30376
30400:06:00.360 --> 00:06:02.680
305really cool, it's in Rust.
306
30777
30800:06:02.680 --> 00:06:05.760
309And those are like the hip languages, right?
310
31178
31200:06:05.760 --> 00:06:09.000
313And those aren't the languages that data people use.
314
31579
31600:06:09.000 --> 00:06:11.880
317But it's also reflective of the mindset
318
31980
32000:06:11.880 --> 00:06:14.480
321that at Proto developers have, which
322
32381
32400:06:14.480 --> 00:06:16.520
325is that when you're building web applications, you're
326
32782
32800:06:16.520 --> 00:06:18.480
329treating data like a hot potato.
330
33183
33200:06:18.480 --> 00:06:22.080
333Like data comes in, you want to get rid of it as quickly as you can, right?
334
33584
33600:06:22.080 --> 00:06:25.560
337Because the longer that you're holding on to data and processing it and doing things to it,
338
33985
34000:06:25.560 --> 00:06:28.720
341the harder your system becomes to maintain and operate
342
34386
34400:06:28.720 --> 00:06:30.640
345at scale.
346
34787
34800:06:30.640 --> 00:06:35.000
349And so web developers have built all of these really cool tools and languages.
350
35188
35200:06:35.000 --> 00:06:36.000
353And they're like, really hip.
354
35589
35600:06:36.000 --> 00:06:39.600
357We've got these really cool conferences and places like the University of British Columbia.
358
35990
36000:06:39.600 --> 00:06:44.000
361And they wear jeans to work and they're really awesome.
362
36391
36400:06:44.000 --> 00:06:48.240
365And data people are kind of different than that.
366
36792
36800:06:48.240 --> 00:06:50.880
369Like they kind of emerge from the world of database
370
37193
37200:06:50.880 --> 00:06:52.640
373administrators and database administrators
374
37594
37600:06:52.640 --> 00:06:56.960
377are famous for being grumpy and they don't like change and they don't like you and they don't
378
37995
38000:06:56.960 --> 00:07:01.080
381like your fancy languages. Like, we're all toiling in the SQL mines.
382
38396
38400:07:01.080 --> 00:07:04.960
385We might be writing some Python. If you're really, like,
386
38797
38800:07:04.960 --> 00:07:13.000
389fancy, you might do a little Scala or Spark in the data space. And a lot of these folks are sitting there maintaining systems that have been around for 25 years.
390
39198
39200:07:13.000 --> 00:07:16.000
393And they're like not wearing jeans and hoodies to work.
394
39599
39600:07:16.000 --> 00:07:18.200
397They're wearing like khakis and polos
398
399100
40000:07:18.200 --> 00:07:19.440
401because they have serious jobs
402
403101
40400:07:19.440 --> 00:07:21.600
405and they're holding serious data
406
407102
40800:07:21.600 --> 00:07:23.520
409that all of these like multi-billion companies
410
411103
41200:07:23.520 --> 00:07:26.360
413will stop operating if anything happens to it.
414
415104
41600:07:26.360 --> 00:07:27.780
417So they don't want to do change.
418
419105
42000:07:27.780 --> 00:07:31.660
421They don't want to do a lot of fancy technology development.
422
423106
42400:07:31.660 --> 00:07:33.140
425And this is a little bit of hyperbole.
426
427107
42800:07:33.140 --> 00:07:35.380
429I have known database administrators who
430
431108
43200:07:35.380 --> 00:07:38.640
433from time to time do wear jeans.
434
435109
43600:07:38.640 --> 00:07:43.160
437The thing about enterprise data is that it's not really technology development.
438
439110
44000:07:43.160 --> 00:07:47.600
441It's anthropology. Because when you're dealing with a company that has been around
442
443111
44400:07:47.600 --> 00:07:55.140
445for 20, 30, 50, 100, 150, sometimes longer years, the information that you have is all
446
447112
44800:07:55.140 --> 00:08:02.240
449historical and it reflects the relationships of a business, its departments, its entities,
450
451113
45200:08:02.240 --> 00:08:05.000
453its customers, and its suppliers.
454
455114
45600:08:05.000 --> 00:08:10.780
457You all are familiar with Conway's law, that the architecture of a system reflects the
458
459115
46000:08:10.780 --> 00:08:14.400
461communication patterns of a company.
462
463116
46400:08:14.400 --> 00:08:17.400
465Well, that all comes through in enterprise data.
466
467117
46800:08:17.400 --> 00:08:21.240
469I've actually worked with data systems where you can pinpoint the exact day
470
471118
47200:08:21.240 --> 00:08:24.240
473that two teams stopped talking to each other.
474
475119
47600:08:24.240 --> 00:08:26.760
477And you can do that because the fields change.
478
479120
48000:08:26.760 --> 00:08:35.640
481And then you have to carry around this conditional with a magically coded date because on June 16th
482
483121
48400:08:35.640 --> 00:08:40.760
485of 2007, these two teams went through a reorg.
486
487122
48800:08:40.760 --> 00:08:44.920
489And then all of your logic has to carry that through for all time.
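
NOTE
A minimal sketch of the date-keyed conditional described above, assuming a record whose
field changed at the reorg. Only the June 16, 2007 date comes from the talk; the field
names `region_code` and `territory` and the sample records are invented for illustration.
```python
from datetime import date
# Cutover date taken from the talk; everything else below is hypothetical.
REORG_DATE = date(2007, 6, 16)
def region_for(record: dict) -> str:
    """Return the region, reading whichever field the owning team used at the time."""
    # Before the reorg one team wrote `region_code`; afterwards another wrote `territory`.
    if record["created"] < REORG_DATE:
        return record["region_code"]
    return record["territory"]
old = {"created": date(2005, 3, 1), "region_code": "NE"}
new = {"created": date(2009, 8, 20), "territory": "Northeast"}
print(region_for(old), region_for(new))
```
Every downstream query that touches this field has to repeat the same branch, which is
the sense in which the reorg is carried "for all time".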
490
491123
49200:08:44.920 --> 00:08:47.920
493And so this is like an anthropological problem,
494
495124
49600:08:47.920 --> 00:08:50.600
497which makes sharing and using this data
498
499125
50000:08:50.600 --> 00:08:55.600
501and doing anything meaningful with it very, very difficult.
502
503126
50400:08:55.600 --> 00:08:57.500
505So what is the problem with enterprise data?
506
507127
50800:08:57.500 --> 00:09:00.500
509The problem is that data access is slow, it's difficult,
510
511128
51200:09:00.500 --> 00:09:02.060
513it's expensive.
514
515129
51600:09:02.060 --> 00:09:05.700
517So is the data engineering.
518
519130
52000:09:05.700 --> 00:09:08.080
521I've gone to data engineering teams as a consultant.
522
523131
52400:09:08.080 --> 00:09:09.400
525I said, tell me about your problems.
526
527132
52800:09:09.400 --> 00:09:10.600
529They said, well, here's our backlog.
530
531133
53200:09:10.600 --> 00:09:14.880
533And I said, how many tickets do you do each week?
534
535134
53600:09:14.880 --> 00:09:17.920
537And they said, that's ambitious.
538
539135
54000:09:17.920 --> 00:09:20.920
541We do maybe 20 tickets per quarter.
542
543136
54400:09:20.920 --> 00:09:22.760
545And I said, well, how many new tickets per quarter
546
547137
54800:09:22.760 --> 00:09:23.120
549do you get?
550
551138
55200:09:23.120 --> 00:09:25.480
553They say 40.
554
555139
55600:09:25.480 --> 00:09:26.800
557So that's not a solution.
558
559140
56000:09:26.800 --> 00:09:29.920
561That's not, like, a situation that is getting better.
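
NOTE
The arithmetic behind that exchange, as a rough sketch. The rates (20 tickets closed and
40 opened per quarter) are the ones quoted in the talk; the starting backlog of zero and
the function name are assumptions made for illustration.
```python
def backlog_after(quarters: int, start: int = 0, closed: int = 20, opened: int = 40) -> int:
    """Open tickets after `quarters`, given fixed closed and opened counts per quarter."""
    backlog = start
    for _ in range(quarters):
        # The backlog cannot go negative even if the team outpaces new requests.
        backlog = max(0, backlog + opened - closed)
    return backlog
# One year at the quoted rates, starting from an empty backlog.
print(backlog_after(4))
```
At those rates the backlog grows by 20 tickets every quarter regardless of where it starts.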
562
563141
56400:09:29.920 --> 00:09:32.040
565And then you also see things like data science
566
567142
56800:09:32.040 --> 00:09:35.800
569teams. Like, early, mid 2000s, everyone's like,
570
571143
57200:09:35.800 --> 00:09:39.280
573oh you got to have data scientists because AI is coming.
574
575144
57600:09:39.280 --> 00:09:42.800
577So they went out and they pulled all of these people out of academia.
578
579145
58000:09:42.800 --> 00:09:48.400
581A lot of neuropsych people, a lot of astronomy physics,
582
583146
58400:09:48.400 --> 00:09:50.560
585all of that, people who know good mathy stuff,
586
587147
58800:09:50.560 --> 00:09:51.840
589and they put them in a team, they didn't really
590
591148
59200:09:51.840 --> 00:09:54.880
593explain to them how to work in an enterprise environment.
594
595149
59600:09:54.880 --> 00:09:59.400
597And so they built a lot of stuff that was really cool, but it wasn't in phase with product development.
598
599150
60000:10:00.000 --> 00:10:02.060
601And so then you have these really cool algorithms
602
603151
60400:10:02.060 --> 00:10:03.920
605that nobody knows how to deploy.
606
607152
60800:10:03.920 --> 00:10:07.480
609Nobody knows how to run the code for it.
610
611153
61200:10:07.480 --> 00:10:08.880
613And by the time you actually get it,
614
615154
61600:10:08.880 --> 00:10:10.920
617they're like six months out of date anyways,
618
619155
62000:10:10.920 --> 00:10:13.400
621so it doesn't actually add value.
622
623156
62400:10:13.400 --> 00:10:17.760
625And then data folks don't really do agile.
626
627157
62800:10:17.760 --> 00:10:24.000
629Like, version control? Like, they're still clicking tools, like building ETL pipelines with drag and drop, right?
630
631158
63200:10:24.000 --> 00:10:26.000
633Like, continuous delivery isn't a thing.
634
635159
63600:10:26.000 --> 00:10:28.000
637CD, it's like, a thing that you put music on.
638
639160
64000:10:28.000 --> 00:10:32.080
641And DevOps is a fancy term to data folks
642
643161
64400:10:32.080 --> 00:10:33.280
645that nobody can explain, right?
646
647162
64800:10:33.280 --> 00:10:35.720
649Like the data folks they don't really understand
650
651163
65200:10:35.720 --> 00:10:39.160
653how to do rapid fast agile software development.
654
655164
65600:10:39.160 --> 00:10:42.280
657Again, hyperbole, because a lot of things are getting better.
658
659165
66000:10:42.280 --> 00:10:44.800
661But if you go into an enterprise, there's still a lot of stuff
662
663166
66400:10:44.800 --> 00:10:46.880
665where they don't even know how to use Git.
666
667167
66800:10:46.880 --> 00:10:49.680
669I've actually gone into clients at big companies
670
671168
67200:10:49.680 --> 00:10:51.200
673and had day one.
674
675169
67600:10:51.200 --> 00:10:53.320
677I'm like, all right, let's talk about your architecture.
678
679170
68000:10:53.320 --> 00:10:59.000
681And by day three, I'm like, OK, here's how to do git status.
682
683171
68400:10:59.000 --> 00:11:02.200
685And the problem is that within software development,
686
687172
68800:11:02.200 --> 00:11:04.200
689we've favored software developers.
690
691173
69200:11:04.200 --> 00:11:07.000
693We've given them a lot of tools to build software really quick.
694
695174
69600:11:07.000 --> 00:11:10.000
697And a lot of what we've done is we've absolved them of the duty and
698
699175
70000:11:10.000 --> 00:11:13.000
701the responsibility to care about things like data quality,
702
703176
70400:11:13.000 --> 00:11:15.920
705data semantics, the relevance of the data.
706
707177
70800:11:15.920 --> 00:11:18.160
709We're just like, here, keep the data quick,
710
711178
71200:11:18.160 --> 00:11:19.120
713throw it somewhere.
714
715179
71600:11:19.120 --> 00:11:21.960
717And then they're pushing it to Kafka Streams and stuff like that.
718
719180
72000:11:21.960 --> 00:11:23.200
721And it's all great.
722
723181
72400:11:23.200 --> 00:11:24.000
725It's great for them,
726
727182
72800:11:24.000 --> 00:11:30.000
729and then some data engineer has to sort out that mess.
730
731183
73200:11:30.000 --> 00:11:32.000
733And we've tried solving this with like lots of
734
735184
73600:11:32.000 --> 00:11:35.720
737different architectures. If you talk to data people they love talking about architecture,
738
739185
74000:11:35.720 --> 00:11:37.640
741we've gone from data warehouses to data lakes.
742
743186
74400:11:37.640 --> 00:11:40.760
745Data lakes became data swamps.
746
747187
74800:11:40.760 --> 00:11:43.800
749We've built things like data vaults, which is, I guess,
750
751188
75200:11:43.800 --> 00:11:45.600
753if you really want to make your data model
754
755189
75600:11:45.600 --> 00:11:47.480
757complicated, Data Vault is great.
758
759190
76000:11:47.480 --> 00:11:50.160
761We've added Greek letters to it, a lambda architecture,
762
763191
76400:11:50.160 --> 00:11:52.960
765or a kappa architecture.
766
767192
76800:11:52.960 --> 00:11:56.640
769And this has all gone sort of in a cyclical pattern, right?
770
771193
77200:11:56.640 --> 00:12:01.400
773This shift between centralization and decentralization.
774
775194
77600:12:01.400 --> 00:12:05.880
777What happens is most enterprises are like, we need a big database.
778
779195
78000:12:05.880 --> 00:12:08.080
781They build a big database and they're like,
782
783196
78400:12:08.080 --> 00:12:10.760
785here's our data warehouse, it's OLAP, it's all this stuff.
786
787197
78800:12:10.760 --> 00:12:13.560
789Then there's a bottleneck and there's a backlog.
790
791198
79200:12:13.560 --> 00:12:15.360
793They say this isn't working and then finally,
794
795199
79600:12:15.360 --> 00:12:18.720
797somebody with enough political clout in the organization is like, screw you all
798
799200
80000:12:18.720 --> 00:12:20.920
801I'm building my own database.
802
803201
80400:12:20.920 --> 00:12:23.280
805And then that happens once, and then the next team is like, well, they did it.
806
807202
80800:12:23.280 --> 00:12:24.680
809So I'm going to do it.
810
811203
81200:12:24.680 --> 00:12:28.240
813And so the next thing you know, like, you're three, four, five years later, and now you've got the
814
815204
81600:12:28.240 --> 00:12:32.160
817shadow IT situation going on, there's all of these different data systems.
818
819205
82000:12:32.160 --> 00:12:35.120
821And somebody goes, why are we spending all this money on data systems?
822
823206
82400:12:35.120 --> 00:12:37.000
825Let's do another big consolidation.
826
827207
82800:12:37.000 --> 00:12:39.360
829I've come in on the back end of $100 million
830
831208
83200:12:39.360 --> 00:12:43.120
833failed data architecture consolidation projects.
834
835209
83600:12:43.120 --> 00:12:46.500
837Tons of money go into this.
838
839210
84000:12:46.500 --> 00:12:49.560
841It's all been very expensive and nothing has worked.
842
843211
84400:12:49.560 --> 00:12:51.500
845Nothing has worked.
846
847212
84800:12:51.500 --> 00:12:53.820
849Sometimes things get a little bit better,
850
851213
85200:12:53.820 --> 00:12:57.000
853but every organization I talk to has the same exact problems.
854
855214
85600:12:57.000 --> 00:13:05.400
857And so when I was at ThoughtWorks we came up with this solution that I'm going to talk about.
858
859215
86000:13:05.400 --> 00:13:06.920
861But a little bit about why it doesn't work,
862
863216
86400:13:06.920 --> 00:13:09.720
865sorry, I almost skipped a slide here.
866
867217
86800:13:09.720 --> 00:13:11.800
869You just have this situation of data governance
870
871218
87200:13:11.800 --> 00:13:14.020
873is a mess, data catalogs are all
874
875219
87600:13:14.020 --> 00:13:15.560
877kind of terrible.
878
879220
88000:13:15.560 --> 00:13:19.220
881Schemas are discombobulated, semantics are even worse.
882
883221
88400:13:19.220 --> 00:13:20.460
885Like systems don't talk to each other.
886
887222
88800:13:20.460 --> 00:13:22.480
889People don't talk to each other.
890
891223
89200:13:22.480 --> 00:13:26.240
893It's just a bad situation for most of the time.
894
895224
89600:13:26.240 --> 00:13:29.160
897And what we actually need is a scalable and consistent way
898
899225
90000:13:29.160 --> 00:13:30.640
901to define the exchange of data
902
903226
90400:13:30.640 --> 00:13:36.480
905between organizations, systems, parties, companies, whatever.
906
907227
90800:13:36.480 --> 00:13:38.240
909We need platform independence, right?
910
911228
91200:13:38.240 --> 00:13:42.040
913So we have systems, lots of companies are like,
914
915229
91600:13:42.040 --> 00:13:45.040
917we're going to be on Azure and AWS and Google
918
919230
92000:13:45.040 --> 00:13:47.760
921because we don't want to put all our eggs in one basket.
922
923231
92400:13:47.760 --> 00:13:49.920
925And now you have three different types of ecosystems.
926
927232
92800:13:49.920 --> 00:13:51.000
929It's like Tower of Babel.
930
931233
93200:13:51.000 --> 00:13:53.920
933Nobody's speaking the same data language.
934
935234
93600:13:53.920 --> 00:13:57.920
937And we need lightweight oriented models.
938
939235
94000:13:57.920 --> 00:14:00.360
941So when I was at ThoughtWorks, we came up with this idea
942
943236
94400:14:00.360 --> 00:14:01.080
945called data mesh.
946
947237
94800:14:01.080 --> 00:14:02.760
949Data mesh is a bit like communism.
950
951238
95200:14:02.760 --> 00:14:10.000
953It came out of conditions. And the conditions that we talked about were all of this like fragmented
954
955239
95600:14:10.000 --> 00:14:17.000
957ecosystem of not having this way, this common way of defining what it is that you mean
958
959240
96000:14:17.000 --> 00:14:18.000
961when you're talking about data.
962
963241
96400:14:18.000 --> 00:14:21.000
965So if you don't know what you're talking about when it comes to data,
966
967242
96800:14:21.000 --> 00:14:27.000
969you don't know how to even describe a way to share and exchange it.
970
971243
97200:14:27.000 --> 00:14:33.840
973So, data mesh came up with, it's a decentralized model of working with data that focuses on data as a product.
974
975244
97600:14:33.840 --> 00:14:36.400
977So we actually wanted to think about what
978
979245
98000:14:36.400 --> 00:14:38.080
981if you worked with big data in the same way
982
983246
98400:14:38.080 --> 00:14:40.080
985that you work with microservices.
986
987247
98800:14:40.080 --> 00:14:41.840
989Which is kind of a nonsensical way of thinking
990
991248
99200:14:41.840 --> 00:14:43.000
993about it because microservices
994
995249
99600:14:43.000 --> 00:14:50.200
997are designed to be really, really small and big data is like by definition very, very big.
998
999250
100000:14:50.200 --> 00:14:55.000
1001So we came up with these ideas of like, okay, let's make data a product. Well, when
1002
1003251
100400:14:55.000 --> 00:14:58.680
1005you're doing microservices development, you're usually domain oriented. You usually
1006
1007252
100800:14:58.680 --> 00:14:59.840
1009have a platform like Kubernetes.
1010
1011253
101200:15:00.000 --> 00:15:02.920
1013and that is something to deploy your service on.
1014
1015254
101600:15:02.920 --> 00:15:04.400
1017And then, of course, everyone said, well,
1018
1019255
102000:15:04.400 --> 00:15:05.120
1021that doesn't work.
1022
1023256
102400:15:05.120 --> 00:15:05.960
1025We need to govern it.
1026
1027257
102800:15:05.960 --> 00:15:08.520
1029And so then we stapled on federated computational
1030
1031258
103200:15:08.520 --> 00:15:11.600
1033governance to make them happy.
1034
1035259
103600:15:11.600 --> 00:15:15.920
1037And then try to figure out what that was going to be.
1038
1039260
104000:15:15.920 --> 00:15:17.720
1041And this is a really great theory.
1042
1043261
104400:15:17.720 --> 00:15:18.960
1045Again, data mesh is like communism.
1046
1047262
104800:15:18.960 --> 00:15:25.920
1049It's really great in theory. The practice has been a little bit less stellar in some cases.
1050
1051263
105200:15:25.920 --> 00:15:30.120
1053Because domain-oriented data products, they're a good idea,
1054
1055264
105600:15:30.120 --> 00:15:31.680
1057but we haven't given anyone the tools
1058
1059265
106000:15:31.680 --> 00:15:35.200
1061of how to actually do it and implement it correctly.
1062
1063266
106400:15:37.760 --> 00:15:42.240
1065So, what if we just pretended like those challenges didn't exist?
1066
1067267
106800:15:42.240 --> 00:15:45.480
1069Like what if we just decided that we're going to start from scratch?
1070
1071268
107200:15:45.480 --> 00:15:48.080
1073We're not going to talk about all of your semantic layer.
1074
1075269
107600:15:48.080 --> 00:15:50.480
1077We're not going to talk about all of your legacy systems.
1078
1079270
108000:15:50.480 --> 00:15:53.360
1081We're just going to build a microservice around data.
1082
1083271
108400:15:53.360 --> 00:15:55.160
1085What would that look like?
1086
1087272
108800:15:55.160 --> 00:15:58.080
1089It's actually not that bad of an idea.
1090
1091273
109200:15:58.080 --> 00:16:02.280
1093The problem is, once you start to try to implement that,
1094
1095274
109600:16:02.280 --> 00:16:03.560
1097everything falls apart.
1098
1099275
110000:16:03.560 --> 00:16:04.840
1101The tooling sucks.
1102
1103276
110400:16:04.840 --> 00:16:06.000
1105The data catalogs suck.
1106
1107277
110800:16:06.000 --> 00:16:08.000
1109It requires a ton of platform engineering.
1110
1111278
111200:16:08.000 --> 00:16:10.000
1113Nothing is really set up for this.
1114
1115279
111600:16:10.000 --> 00:16:13.000
1117There's no standard language for defining what a data
1118
1119280
112000:16:13.000 --> 00:16:17.000
1121product is or how it should interact.
1122
1123281
112400:16:17.000 --> 00:16:20.000
1125There's no standard way of joining the semantics
1126
1127282
112800:16:20.000 --> 00:16:23.000
1129from two different domains together.
1130
1131283
113200:16:23.000 --> 00:16:24.800
1133And there's a lot of languages that have come up
1134
1135284
113600:16:24.800 --> 00:16:28.200
1137like OWL and RDF and all this other stuff,
1138
1139285
114000:16:28.200 --> 00:16:31.960
1141to try to define how data looks and how data should be
1142
1143286
114400:16:31.960 --> 00:16:32.960
1145shaped.
1146
1147287
114800:16:32.960 --> 00:16:35.240
1149But they have all of these gaps about how we're
1150
1151288
115200:16:35.240 --> 00:16:40.520
1153supposed to define how we're supposed to work with data.
1154
1155289
115600:16:40.520 --> 00:16:44.120
1157So when you actually dive into, and this is where we get to lexicon, there's a little
1158
1159290
116000:16:44.120 --> 00:16:47.600
1161bit of Chekhov's gun that I'm setting up here.
1162
1163291
116400:16:47.600 --> 00:16:52.200
1165If you look at the way that we model data, we care about really two things.
1166
1167292
116800:16:52.200 --> 00:16:54.600
1169We look at the nouns and the verbs.
1170
1171293
117200:16:54.600 --> 00:16:57.520
1173And the nouns are what the data means,
1174
1175294
117600:16:57.520 --> 00:16:59.440
1177how it's structured, who owns it,
1178
1179295
118000:16:59.440 --> 00:17:00.500
1181who can access it,
1182
1183296
118400:17:00.500 --> 00:17:03.200
1185what are the permissions, etc.
1186
1187297
118800:17:03.200 --> 00:17:05.840
1189And sometimes those have adjectives like the type
1190
1191298
119200:17:05.840 --> 00:17:08.920
1193of the data or the frequency that it's updated,
1194
1195299
119600:17:08.920 --> 00:17:12.040
1197the freshness of it, things like that.
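
NOTE
One way to picture the "nouns" and their "adjectives" from this passage, as a sketch in
Python. The class name `DataNoun`, every field name, and the invoice example are
assumptions made for illustration; they are not from the talk or from any real
specification.
```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Optional
@dataclass
class DataNoun:
    """One 'noun' in the talk's sense. Every field name is illustrative."""
    name: str
    meaning: str                 # what the data means
    structure: dict[str, str]    # how it's structured: field name to type name
    owner: str                   # who owns it
    readers: set[str] = field(default_factory=set)  # who else can access it
    # 'Adjectives': qualities attached to the noun.
    update_frequency: str = "unknown"
    max_staleness_hours: Optional[int] = None
    def can_read(self, principal: str) -> bool:
        """Owners can always read; everyone else must be listed in `readers`."""
        return principal == self.owner or principal in self.readers
invoice = DataNoun(
    name="invoice",
    meaning="A request for payment issued to a customer",
    structure={"invoice_id": "string", "amount_due": "decimal", "issued_on": "date"},
    owner="finance",
    readers={"sales"},
    update_frequency="daily",
    max_staleness_hours=24,
)
print(invoice.can_read("sales"), invoice.can_read("marketing"))
```
Nothing in this structure says what anyone may do with an invoice beyond reading it.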
1198
1199300
120000:17:12.040 --> 00:17:16.920
1201And then we also have, so I'll jump into the verbs in a second, but this is a great example
1202
1203301
120400:17:16.920 --> 00:17:19.840
1205of talking about the nouns. So I pulled this like
1206
1207302
120800:17:19.840 --> 00:17:28.400
1209random OWL 2 definition off of, I don't know, the internet somewhere, which is great.
1210
1211303
121200:17:28.400 --> 00:17:32.080
1213It's talking about the physical quality of the thermal energy of a system.
1214
1215304
121600:17:32.080 --> 00:17:34.960
1217And it defines all of this really specific stuff.
1218
1219305
122000:17:34.960 --> 00:17:37.840
1221Half of the characters, solidly more than half of the characters in here,
1222
1223306
122400:17:37.840 --> 00:17:40.220
1225are metadata, like meta metadata,
1226
1227307
122800:17:40.220 --> 00:17:44.640
1229about the links of where to find the actual thing that you're seeing.
1230
1231308
123200:17:44.640 --> 00:17:50.920
1233So this is really cool if you want to know that this number
1234
1235309
123600:17:50.920 --> 00:17:52.960
1237is about the physical quality of the thermal energy
1238
1239310
124000:17:52.960 --> 00:17:54.760
1241of a system, but it doesn't tell you what
1242
1243311
124400:17:54.760 --> 00:17:56.600
1245that means in any sort of context.
1246
1247312
124800:17:56.600 --> 00:18:07.680
1249It doesn't tell you what you should do about it or why you should care about it. The problem with that type of approach and the problem with all of those types of data specification
1250
1251313
125200:18:07.680 --> 00:18:11.720
1253languages that we've seen is that they tend to be very static.
1254
1255314
125600:18:11.720 --> 00:18:13.880
1257It assumes that data is fully self-contained,
1258
1259315
126000:18:13.880 --> 00:18:16.840
1261that you can just kind of look at a thing and describe
1262
1263316
126400:18:16.840 --> 00:18:18.780
1265its properties and be like, aha, I
1266
1267317
126800:18:18.780 --> 00:18:21.500
1269now have a perfect platonic ideal of a chair.
1270
1271318
127200:18:21.500 --> 00:18:23.760
1273I've defined exactly what a chair is,
1274
1275319
127600:18:23.760 --> 00:18:28.000
1277but you don't do anything about how a chair should be used.
1278
1279320
128000:18:34.000 --> 00:18:43.000
1281And then on top of that it becomes really difficult to start to contextualize it because in reality you need to start to think about how these nouns become composed with each other, how they interplay.
1282
1283321
128400:18:43.000 --> 00:18:50.000
1285And so there's some stuff out there working on how to overlay different data definition languages
1286
1287322
128800:18:50.000 --> 00:18:55.000
1289in a practical systems context, but we're still missing the whole action element of it.
1290
1291323
129200:19:00.400 --> 00:19:08.000
1293The thing is, with building a microservice-style architecture, you also need to define how these services interact. That's actually the critical thing about it.
1294
1295324
129600:19:08.000 --> 00:19:12.880
1297If you look at something like the OpenAPI spec, it's really great.
1298
1299325
130000:19:12.880 --> 00:19:15.840
1301It talks about how you access the data,
1302
1303326
130400:19:15.840 --> 00:19:18.280
1305what to do with the data, what are the things that you can,
1306
1307327
130800:19:18.280 --> 00:19:21.920
1309like how do you write it, update it, delete it, whatever.
1310
1311328
131200:19:21.920 --> 00:19:23.680
1313You define your API spec.
1314
1315329
131600:19:23.680 --> 00:19:26.080
1317It tells you exactly like this is going to be a put,
1318
1319330
132000:19:26.080 --> 00:19:30.200
1321this is going to be a post, whatever.
1322
1323331
132400:19:30.200 --> 00:19:32.800
1325How to request permission for the data is often part
1326
1327332
132800:19:32.800 --> 00:19:37.000
1329of that broader spec, and how systems should handle the data, right?
1330
1331333
133200:19:39.000 --> 00:19:43.840
1333And so you can do that with those tools.
1334
1335334
133600:19:43.840 --> 00:19:46.760
1337But if you start looking at things like OpenAPI,
1338
1339335
134000:19:46.760 --> 00:19:52.560
1341then they start to sort of fall away on the definitions of the nouns.
1342
1343336
134400:19:52.560 --> 00:19:55.560
1345So OpenAPI is great because it gives you
1346
1347337
134800:19:55.560 --> 00:19:57.720
1349a little bit of a shape of the data,
1350
1351338
135200:19:57.720 --> 00:19:59.760
1353but beyond sort of the
1354
1355339
135600:20:00.000 --> 00:20:05.000
1357basic sort of JSON types, it's really limited in that case.
1358
1359340
136000:20:05.000 --> 00:20:09.000
1361It doesn't really tell you about what the data means in a broader context.
1362
1363341
136400:20:10.000 --> 00:20:18.000
1365So all of this sort of ecosystem of defining data
1366
1367342
136800:20:18.000 --> 00:20:21.600
1369through these sort of specification languages,
1370
1371343
137200:20:21.600 --> 00:20:24.620
1373all this also omits this concept of domain ownership
1374
1375344
137600:20:24.620 --> 00:20:25.360
1377whatsoever.
1378
1379345
138000:20:25.360 --> 00:20:28.840
1381You're basically just saying data exists, here it is.
1382
1383346
138400:20:28.840 --> 00:20:30.840
1385Or systems exist, here they are.
1386
1387347
138800:20:30.840 --> 00:20:36.360
1389We're not really talking about how to access them, who owns them, what they should do about them.
1390
1391348
139200:20:36.360 --> 00:20:40.600
1393So if you're going to build a good data product language, you need to have a concrete encoding of your domain
1394
1395349
139600:20:40.600 --> 00:20:41.080
1397ownership.
1398
1399350
140000:20:41.080 --> 00:20:43.640
1401So who owns this?
1402
1403351
140400:20:43.640 --> 00:20:45.400
1405What do they own it for?
1406
1407352
140800:20:45.400 --> 00:20:50.760
1409You need to have a clear and accessible and an extensible definition of the nouns and a flexible
1410
1411353
141200:20:51.240 --> 00:20:55.840
1413versionable definition of the verbs. So you need to be able to encode change over time.
1414
1415354
141600:20:56.040 --> 00:20:58.480
1417You need to be able to encode what it is that you're
1418
1419355
142000:20:58.480 --> 00:21:00.240
1421describing and who owns it.
1422
1423356
142400:21:00.240 --> 00:21:03.800
1425So those three things are key.
1426
1427357
142800:21:03.800 --> 00:21:11.520
1429As I mentioned, data mesh was a great idea for trying to put together these data
1430
1431358
143200:21:11.520 --> 00:21:15.000
1433products, but it really suffered from like four key flaws.
1434
1435359
143600:21:15.000 --> 00:21:17.000
1437The first is, if you know ThoughtWorks at all,
1438
1439360
144000:21:17.000 --> 00:21:19.000
1441ThoughtWorks tended to be a little bit dogmatic.
1442
1443361
144400:21:19.000 --> 00:21:22.000
1445It was a criticism of the company as long as I worked there,
1446
1447362
144800:21:22.000 --> 00:21:25.760
1449that existed even before I worked there.
1450
1451363
145200:21:25.760 --> 00:21:29.160
1453And data mesh was like we were trying to do everything with
1454
1455364
145600:21:29.160 --> 00:21:30.080
1457microservices.
1458
1459365
146000:21:30.080 --> 00:21:37.560
1461And so we just sort of almost naively applied that to a really big data architecture project.
1462
1463366
146400:21:37.560 --> 00:21:38.360
1465And this exists.
1466
1467367
146800:21:38.360 --> 00:21:39.560
1469There's no tooling for it.
1470
1471368
147200:21:39.560 --> 00:21:47.000
1473There's no equivalent to Kubernetes for standing up like a massive large scale data processing system.
1474
1475369
147600:21:47.000 --> 00:21:50.000
1477Aside from Kubernetes, which is really difficult if you've ever
1478
1479370
148000:21:50.000 --> 00:21:51.920
1481actually tried to do big data processing on Kubernetes,
1482
1483371
148400:21:51.920 --> 00:21:54.480
1485like running Spark on Kubernetes is really difficult.
1486
1487372
148800:21:56.800 --> 00:21:58.800
1489A lot of change, like again,
1490
1491373
149200:21:58.800 --> 00:22:00.960
1493data mesh, it requires a lot of
1494
1495374
149600:22:00.960 --> 00:22:02.400
1497reeducation, right?
1498
1499375
150000:22:02.400 --> 00:22:04.720
1501The way that you work with data, the way that your data
1502
1503376
150400:22:04.720 --> 00:22:06.240
1505people are actually doing things.
1506
1507377
150800:22:06.240 --> 00:22:07.880
1509You have to teach them DevOps.
1510
1511378
151200:22:07.880 --> 00:22:09.780
1513You have to teach them Agile. You have to teach
1514
1515379
151600:22:09.780 --> 00:22:15.120
1517them domain ownership, really difficult. And then we also spent the last, like,
1518
1519380
152000:22:15.120 --> 00:22:18.640
1521decade decoupling software devs from any downstream
1522
1523381
152400:22:18.640 --> 00:22:20.240
1525responsibility for their data, and now we're
1526
1527382
152800:22:20.240 --> 00:22:22.640
1529turning around saying, oh, wait, no, no, no, now you have
1530
1531383
153200:22:22.640 --> 00:22:24.640
1533to be responsible for that again.
1534
1535384
153600:22:24.640 --> 00:22:28.800
1537And they didn't really like that.
1538
1539385
154000:22:28.800 --> 00:22:34.760
1541So I mentioned that we need all of these definitions
1542
1543386
154400:22:34.760 --> 00:22:38.000
1545around nouns and verbs and domain ownership and stuff like that.
1546
1547387
154800:22:38.000 --> 00:22:43.000
1549And yes, this is now Chekhov's gun going off, because if you look at lexicon, you look at the spec,
1550
1551388
155200:22:43.000 --> 00:22:48.180
1553It actually kind of elegantly solved all three of those problems and I don't even think
1554
1555389
155600:22:48.180 --> 00:22:50.680
1557that it was trying to.
1558
1559390
156000:22:50.680 --> 00:22:59.640
1561The first thing is that this reverse DNS addressing is brilliant. This solves a huge problem
1562
1563391
156400:22:59.640 --> 00:23:05.680
1565for the enterprise data space because now you can actually encode your domain ownership literally
1566
1567392
156800:23:05.680 --> 00:23:12.600
1569in a domain, like a domain name solution.
1570
1571393
157200:23:12.600 --> 00:23:15.800
1573And you can do that across an entire organization.
1574
1575394
157600:23:15.800 --> 00:23:18.320
1577So you can actually set up an organization,
1578
1579395
158000:23:18.320 --> 00:23:20.920
1581give every department, every team, whatever,
1582
1583396
158400:23:20.920 --> 00:23:28.000
1585a path in that domain name, reverse it, and then they can host their own lexicons.
1586
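Editor's sketch of the reverse DNS idea described above, assuming a hypothetical organization that owns `example.com` and gives each team a subdomain. The function names and the `invoice` record name are illustrative only and are not from the talk. The one fact relied on is that a lexicon identifier (NSID) is a reversed domain authority followed by a name segment.

```python
def authority_for(team_domain: str) -> str:
    """Reverse a DNS name: 'finance.example.com' -> 'com.example.finance'."""
    return ".".join(reversed(team_domain.lower().split(".")))


def nsid_for(team_domain: str, name: str) -> str:
    """Build an identifier for a schema owned by the given (hypothetical) team domain."""
    return f"{authority_for(team_domain)}.{name}"


def owning_domain(nsid: str) -> str:
    """Recover the owning domain by dropping the final name segment and reversing the rest."""
    *authority, _name = nsid.split(".")
    return ".".join(reversed(authority))


invoice_id = nsid_for("finance.example.com", "invoice")
print(invoice_id)                 # com.example.finance.invoice
print(owning_domain(invoice_id))  # finance.example.com
```

The point of the sketch is that ownership can be read straight off the identifier: whoever controls the domain is the team answerable for the schema.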
1587397
158800:23:28.000 --> 00:23:30.700
1589It does come up with the definition of the nouns.
1590
1591398
159200:23:30.700 --> 00:23:34.100
1593This is actually, I don't know if you notice when I started it.
1594
1595399
159600:23:34.100 --> 00:23:35.580
1597I decided, I was really bored.
1598
1599400
160000:23:35.580 --> 00:23:37.580
1601So I decided yesterday.
1602
1603401
160400:23:37.580 --> 00:23:38.820
1605Instead of doing it in, like, Google Slides,
1606
1607402
160800:23:38.820 --> 00:23:41.000
1609I would build my own slideshow system on AT Proto.
1610
1611403
161200:23:41.000 --> 00:23:45.000
1613So I vibe coded a slideshow
1614
1615404
161600:23:45.000 --> 00:23:51.800
1617system. This is the lexicon for the slideshow.
1618
1619405
162000:23:51.800 --> 00:23:54.960
1621And so you can see that the nouns are defined here.
1622
1623406
162400:23:54.960 --> 00:23:56.200
1625I need a title.
1626
1627407
162800:23:56.200 --> 00:24:00.000
1629I need a short description. I need maybe a time stamp when it's created.
1630
1631408
163200:24:00.000 --> 00:24:02.000
1633All of that is there and it's contextual.
1634
1635409
163600:24:02.000 --> 00:24:09.200
1637It's not just the data type, like the JSON data type. Of course that is in there, but it tells me what
1638
1639410
164000:24:09.200 --> 00:24:13.320
1641the meaning of that data is and why it's relevant and why I should care about it.
1642
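Editor's sketch of what a record lexicon like the one on the slide might look like, written as a Python dict. The speaker's actual schema is not in the transcript, so the NSID `com.example.slides.slideshow`, the field names, the length limits, and the description strings are all assumptions. The overall shape (a `record` definition with an object schema, `required`, and per-property `description`) follows the published Lexicon format.

```python
# Hypothetical lexicon for a slideshow record; not the speaker's real schema.
SLIDESHOW_LEXICON = {
    "lexicon": 1,
    "id": "com.example.slides.slideshow",  # assumed NSID
    "defs": {
        "main": {
            "type": "record",
            "description": "A slide deck authored by one account.",
            "key": "tid",
            "record": {
                "type": "object",
                "required": ["title", "createdAt"],
                "properties": {
                    "title": {
                        "type": "string",
                        "maxLength": 300,
                        "description": "Human-readable title shown to the audience.",
                    },
                    "description": {
                        "type": "string",
                        "maxLength": 1000,
                        "description": "Short summary of what the deck is about.",
                    },
                    "createdAt": {
                        "type": "string",
                        "format": "datetime",
                        "description": "When the deck was first created.",
                    },
                },
            },
        }
    },
}


def missing_fields(lexicon: dict, record: dict) -> list[str]:
    """Return the required fields that a candidate record does not provide."""
    schema = lexicon["defs"]["main"]["record"]
    return [name for name in schema["required"] if name not in record]


def meaning_of(lexicon: dict, field_name: str) -> str:
    """Look up the contextual description attached to a field."""
    return lexicon["defs"]["main"]["record"]["properties"][field_name]["description"]


print(missing_fields(SLIDESHOW_LEXICON, {"title": "Enterprise data"}))  # ['createdAt']
print(meaning_of(SLIDESHOW_LEXICON, "title"))
```

The descriptions are what carry the context the speaker is pointing at: the type says a title is a string, the description says what the title is for.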
1643411
164400:24:13.320 --> 00:24:16.800
1645And if I had gone a little bit farther, I would have been able to put the
1646
1647412
164800:24:16.800 --> 00:24:23.280
1649verbs in here as well for how to author a slideshow, how to assign permissions to a slideshow,
1650
1651413
165200:24:23.280 --> 00:24:25.360
1653how to revoke permissions to a slideshow.
1654
1655414
165600:24:25.360 --> 00:24:28.640
1657So I could turn this into essentially a Google Docs,
1658
1659415
166000:24:28.640 --> 00:24:31.280
1661or the Google Slides type of situation or PowerPoint type
1662
1663416
166400:24:31.280 --> 00:24:35.440
1665of situation by encoding what the actions are directly in
1666
1667417
166800:24:35.440 --> 00:24:47.260
1669the lexicon. So again, I mentioned this kind of casually solved all three of those problems, which would
1670
1671418
167200:24:47.260 --> 00:24:52.520
1673be a really big benefit for an enterprise, because not only does that mean that if you can define
1674
1675419
167600:24:52.520 --> 00:24:56.280
1677a lexicon for your data products, you can start sharing that data across your
1678
1679420
168000:24:56.280 --> 00:24:59.720
1681organization but also outside of your organization if you
1682
1683421
168400:25:00.000 --> 00:25:02.560
1685want to publish those lexicons openly.
1686
1687422
168800:25:02.560 --> 00:25:05.740
1689And this is, by the way, not even talking about using
1690
1691423
169200:25:05.740 --> 00:25:07.880
1693the PDS as your platform.
1694
1695424
169600:25:07.880 --> 00:25:09.880
1697I'm not necessarily talking about, like,
1698
1699425
170000:25:09.880 --> 00:25:13.600
1701replace your data platform with a PDS and everything will be solved.
1702
1703426
170400:25:13.600 --> 00:25:20.200
1705I'm actually just talking about use a flexible data definition language that gets away
1706
1707427
170800:25:20.200 --> 00:25:25.000
1709from all of the sort of dogmatism of ontologies and all of that stuff
1710
1711428
171200:25:25.000 --> 00:25:32.000
1713and focus on actual shareability rather than fully fleshing out the definition.
1714
1715429
171600:25:32.000 --> 00:25:34.000
1717This is a standard way of doing it.
1718
1719430
172000:25:34.000 --> 00:25:37.000
1721I think Emily Hunt spoke about this yesterday,
1722
1723431
172400:25:37.000 --> 00:25:40.400
1725sharing exploding-star data
1726
1727432
172800:25:40.400 --> 00:25:44.440
1729through a globally accessible standardized feed
1730
1731433
173200:25:44.440 --> 00:25:51.480
1733that every astronomy lab in the world can subscribe to and then see the exact data format for that
1734
1735434
173600:25:51.480 --> 00:25:56.160
1737data without having to go through and set up their own Kafka consumers without having to run
1738
1739435
174000:25:56.160 --> 00:25:58.300
1741a Kafka cluster, all of that stuff.
1742
1743436
174400:25:58.300 --> 00:26:00.300
1745That's a really compelling idea.
1746
1747437
174800:26:00.300 --> 00:26:05.640
1749It's a really, really, really, really brilliant way of being able to make that data available
1750
1751438
175200:26:05.640 --> 00:26:09.680
1753in a way that people can not just access it but understand
1754
1755439
175600:26:09.680 --> 00:26:12.360
1757what it means.
1758
1759440
176000:26:12.360 --> 00:26:17.400
1761So then this, like what I look at, actually the first time I saw lexicon, I said,
1762
1763441
176400:26:17.400 --> 00:26:22.000
1765I've been trying to develop this as a data product specification language for the last three years.
1766
1767442
176800:26:22.000 --> 00:26:28.000
1769And here it is. So it's worth exploring. I think that this is actually relevant.
1770
1771443
177200:26:28.000 --> 00:26:33.000
1773Because if you know me from any of my other work, you know that I don't really like fascists.
1774
1775444
177600:26:33.000 --> 00:26:41.000
1777And one of the problems in the data space is that I've mentioned the sort of cyclical behavior.
1778
1779445
178000:26:41.000 --> 00:26:44.560
1781We've gone from data warehouses to data lakes,
1782
1783446
178400:26:44.560 --> 00:26:46.720
1785to streaming architectures and back again.
1786
1787447
178800:26:46.720 --> 00:26:49.640
1789And the trend right now is towards consolidation.
1790
1791448
179200:26:49.640 --> 00:26:51.480
1793And so you see companies like Palantir
1794
1795449
179600:26:51.480 --> 00:26:53.920
1797who go up to these big organizations and government
1798
1799450
180000:26:53.920 --> 00:26:57.040
1801agencies, and they say we can solve your data consolidation
1802
1803451
180400:26:57.040 --> 00:26:59.040
1805problems.
1806
1807452
180800:26:59.040 --> 00:27:04.920
1809And I've never known anyone who's used Palantir, who likes it. Right? But they sell
1810
1811453
181200:27:04.920 --> 00:27:10.480
1813it and they make a fuckload of money selling it because they're promising companies to solve
1814
1815454
181600:27:10.480 --> 00:27:15.200
1817exactly this problem that we can make all of your data accessible in one
1818
1819455
182000:27:15.200 --> 00:27:22.960
1821place really easily. And so my argument is if we want to be able to break the
1822
1823456
182400:27:22.960 --> 00:27:28.320
1825stranglehold that these rent-seeking platform organizations like Palantir,
1826
1827457
182800:27:28.320 --> 00:27:31.120
1829like Oracle, like all of these other companies have,
1830
1831458
183200:27:31.120 --> 00:27:34.160
1833we need to have a decentralized way of sharing data,
1834
1835459
183600:27:34.160 --> 00:27:37.760
1837and it needs to be consistent, and we need to get away from trying to
1838
1839460
184000:27:37.760 --> 00:27:42.480
1841overly define the languages and the specs and all of that stuff, the ontologies and we need
1842
1843461
184400:27:42.480 --> 00:27:47.520
1845to focus on the ability for sharing and decentralizing that data.
1846
1847462
184800:27:47.520 --> 00:27:49.520
1849That is necessary to fight fascism.
1850
1851463
185200:27:49.520 --> 00:27:52.440
1853It's not just good business.
1854
1855464
185600:27:52.440 --> 00:27:56.400
1857So what do we need to make that work?
1858
1859465
186000:27:56.400 --> 00:28:00.600
1861The problem, as I mentioned going back to the beginning, is that software developers love
1862
1863466
186400:28:00.600 --> 00:28:05.000
1865working with cool technologies, TypeScript and Go and Rust and all that stuff.
1866
1867467
186800:28:05.000 --> 00:28:07.600
1869And us data folks, we don't know any of that.
1870
1871468
187200:28:07.600 --> 00:28:09.400
1873We don't work in those languages.
1874
1875469
187600:28:09.400 --> 00:28:10.840
1877We need better Python tooling.
1878
1879470
188000:28:10.840 --> 00:28:15.200
1881The MarshalX library is great, but I couldn't use it to build my own lexicons.
1882
1883471
188400:28:15.200 --> 00:28:21.240
1885Maybe I'm not smart enough, but I just couldn't get it to work to build my own stuff that wasn't built around Bluesky.
1886
1887472
188800:28:21.240 --> 00:28:27.720
1889So I had to write my own lexicon data class generator,
1890
1891473
189200:28:27.720 --> 00:28:29.640
1893which was hard because I don't know,
1894
1895474
189600:28:29.640 --> 00:28:31.000
1897like I'm not a computer science person.
1898
1899475
190000:28:31.000 --> 00:28:33.000
1901I don't know what an abstract syntax tree is.
1902
1903476
190400:28:33.000 --> 00:28:37.000
1905I think that's like the thing that the spotted lantern flies eat.
1906
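Editor's sketch of what a small lexicon-to-dataclass generator could look like in Python. This is not the speaker's generator, which is not shown in the transcript. It assumes a simplified record schema containing only `string`, `integer`, and `boolean` properties, and it uses the standard library's `dataclasses.make_dataclass`, so no abstract syntax tree is needed. The `SCHEMA` contents and the `Slideshow` name are hypothetical.

```python
from dataclasses import field, fields, make_dataclass
from typing import Optional

# Only a few primitive lexicon types are mapped here; this is a deliberate simplification.
_TYPE_MAP = {"string": str, "integer": int, "boolean": bool}


def dataclass_from_record(name: str, record_schema: dict) -> type:
    """Build a dataclass from the object schema of a lexicon record definition."""
    required = set(record_schema.get("required", []))
    props = record_schema["properties"]
    specs = []
    # Required fields go first, because dataclass fields without defaults
    # must come before fields with defaults.
    for prop, spec in props.items():
        if prop in required:
            specs.append((prop, _TYPE_MAP[spec["type"]]))
    for prop, spec in props.items():
        if prop not in required:
            specs.append((prop, Optional[_TYPE_MAP[spec["type"]]], field(default=None)))
    return make_dataclass(name, specs)


# Hypothetical schema for illustration only.
SCHEMA = {
    "type": "object",
    "required": ["title", "createdAt"],
    "properties": {
        "title": {"type": "string"},
        "description": {"type": "string"},
        "createdAt": {"type": "string", "format": "datetime"},
    },
}

Slideshow = dataclass_from_record("Slideshow", SCHEMA)
deck = Slideshow(title="Enterprise data", createdAt="2025-01-01T00:00:00Z")
print([f.name for f in fields(Slideshow)])  # ['title', 'createdAt', 'description']
print(deck.description)                     # None
```

A real generator would also need to handle nested objects, arrays, references, and unions, which this sketch leaves out.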
1907477
190800:28:37.000 --> 00:28:43.160
1909But if you actually look at some of the stuff, like lexicon.gov, it's amazing. It's a global
1910
1911478
191200:28:43.160 --> 00:28:50.620
1913data catalog. It solved a problem that I've watched companies pour millions of dollars into, and it just solved it, like, go.
1914
1915479
191600:28:50.620 --> 00:28:51.620
1917That's cool.
1918
1919480
192000:28:51.620 --> 00:28:55.440
1921You can register your lexicons and then we can make a data catalog for them.
1922
1923481
192400:28:56.080 --> 00:28:58.880
1925And it just did it.
1926
1927482
192800:28:58.880 --> 00:29:01.200
1929So applying that tooling and making that tooling
1930
1931483
193200:29:01.200 --> 00:29:03.280
1933available at the enterprise level or the business level
1934
1935484
193600:29:03.280 --> 00:29:06.640
1937would be really great.
1938
1939485
194000:29:06.640 --> 00:29:07.800
1941And then we also need a mindset.
1942
1943486
194400:29:07.800 --> 00:29:09.920
1945Like a lot of the conversation I've been having here
1946
1947487
194800:29:09.920 --> 00:29:12.400
1949was really wrapped around social internet,
1950
1951488
195200:29:12.400 --> 00:29:14.360
1953which is awesome because I really believe
1954
1955489
195600:29:14.360 --> 00:29:17.040
1957in the social internet and like was moved to tears for
1958
1959490
196000:29:17.040 --> 00:29:22.680
1961errands talk today. But this is so much bigger than just social internet. Right. This is data
1962
1963491
196400:29:22.680 --> 00:29:27.000
1965exchange at scale, which is more than just people talking to people.
1966
1967492
196800:29:27.000 --> 00:29:30.000
1969It's also about exchanging information.
1970
1971493
197200:29:30.000 --> 00:29:33.500
1973We need better permissions management and private data.
1974
1975494
197600:29:33.500 --> 00:29:42.720
1977I know all of that is in progress, but right now you can't have somebody publishing their internal data to a PDS and then like putting it out there.
1978
1979495
198000:29:42.720 --> 00:29:45.320
1981And then we also need to have the tooling
1982
1983496
198400:29:45.320 --> 00:29:49.480
1985to be able to run that stuff inside closed ecosystems.
1986
1987497
198800:29:49.480 --> 00:29:55.000
1989And then also like brave data people to start building radical shit because again there's
1990
1991498
199200:29:55.000 --> 00:29:59.760
1993too much conservatism in the data space, an unwillingness
1994
1995499
199600:30:00.000 --> 00:30:02.760
1997to try new approaches, try new technologies.
1998
1999500
200000:30:02.760 --> 00:30:05.880
2001If more data people can start doing things like, hey look,
2002
2003501
200400:30:05.880 --> 00:30:08.640
2005I just solved a globally distributed data problem
2006
2007502
200800:30:08.640 --> 00:30:14.000
2009with this cool protocol and some Python and some lexicons,
2010
2011503
201200:30:14.000 --> 00:30:17.000
2013then I think that we'll be able to do some really great stuff.
2014
2015504
201600:30:17.000 --> 00:30:20.000
2017So I'll wrap it up there just about on time.
2018
2019505
202000:30:20.000 --> 00:30:21.660
2021Thank you so much. Let's chat about data.
2022
2023506
202400:30:21.660 --> 00:30:23.660
2025Thank you.
2026