# GigaBrain Architecture

This document provides a comprehensive overview of GigaBrain's architecture, design decisions, and implementation details.

## Table of Contents

- [Overview](#overview)
- [Core Components](#core-components)
- [Data Structures](#data-structures)
- [Query Processing](#query-processing)
- [API Layer](#api-layer)
- [Storage Layer](#storage-layer)
- [Concurrency Model](#concurrency-model)
- [Performance Optimizations](#performance-optimizations)
- [Monitoring and Observability](#monitoring-and-observability)
- [Future Architecture Considerations](#future-architecture-considerations)

## Overview

GigaBrain is designed as a high-performance, in-memory graph database with the following architectural principles:

- **Performance First**: Optimized data structures and algorithms
- **Memory Efficiency**: Compact representation with minimal overhead
- **Concurrency**: Lock-free structures where possible
- **Modularity**: Clean separation of concerns
- **Extensibility**: Plugin architecture for storage and algorithms

## Core Components

### Graph Engine (`src/core/`)

The heart of GigaBrain, responsible for graph storage and basic operations.

```
src/core/
├── mod.rs          # Core types and traits
├── graph.rs        # Main Graph implementation
├── node.rs         # Node representation
├── relationship.rs # Relationship representation
├── property.rs     # Property system
└── schema.rs       # Schema management
```

**Key Features:**
- Adjacency-list representation for fast traversal
- Efficient node and relationship storage
- Property type system with compression
- Label and property key management

### Query Engine (`src/cypher/`)

Cypher query language parser and execution engine.

```
src/cypher/
├── mod.rs       # Public interface
├── lexer.rs     # Tokenization
├── parser.rs    # AST generation
├── planner.rs   # Query planning
├── executor.rs  # Query execution
└── optimizer.rs # Query optimization
```

**Query Processing Pipeline:**
1. **Lexical Analysis**: Convert the query string to tokens
2. **Parsing**: Build the Abstract Syntax Tree (AST)
3. **Planning**: Generate an execution plan
4. **Optimization**: Apply optimization rules
5. **Execution**: Execute the plan against the graph

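The five stages above can be sketched as a chain of functions, each consuming the previous stage's output. The names below (`lex`, `plan`) are illustrative stand-ins, not GigaBrain's actual API:

```rust
/// Stage 1: lexical analysis — split the query into tokens.
fn lex(query: &str) -> Vec<String> {
    query.split_whitespace().map(str::to_string).collect()
}

/// Stages 3-4 collapsed: derive a trivial "plan" by keeping only the
/// clause keywords, which stand in for logical plan operators.
fn plan(tokens: &[String]) -> Vec<String> {
    tokens
        .iter()
        .filter(|t| matches!(t.to_uppercase().as_str(), "MATCH" | "WHERE" | "RETURN"))
        .cloned()
        .collect()
}

fn main() {
    let tokens = lex("MATCH (n:Person) WHERE n.age > 30 RETURN n");
    let ops = plan(&tokens);
    println!("{:?}", ops); // ["MATCH", "WHERE", "RETURN"]
}
```

Keeping each stage a pure function makes the stages independently testable and swappable, which is the point of splitting the pipeline into `lexer.rs`, `parser.rs`, `planner.rs`, and so on.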
### Algorithm Library (`src/algorithms/`)

Graph algorithms for analytics and computation.

```
src/algorithms/
├── mod.rs         # Algorithm traits and common utilities
├── pathfinding.rs # Shortest path, A*, Dijkstra
├── centrality.rs  # PageRank, betweenness, closeness
├── community.rs   # Louvain, modularity optimization
├── traversal.rs   # BFS, DFS, random walks
└── simple.rs      # Basic graph metrics
```

**Algorithm Categories:**
- **Pathfinding**: Shortest paths, reachability
- **Centrality**: Node importance measures
- **Community**: Clustering and community detection
- **Traversal**: Graph exploration patterns

### API Layer (`src/server/`)

Dual REST and gRPC API interfaces.

```
src/server/
├── mod.rs        # Server orchestration
├── rest.rs       # HTTP/REST endpoints
├── grpc.rs       # gRPC service implementation
├── auth.rs       # JWT authentication
└── middleware.rs # Request processing middleware
```

**Features:**
- Concurrent REST (Axum) and gRPC (Tonic) servers
- JWT-based authentication with RBAC
- Request timing and rate limiting
- CORS support for web applications

### Storage Layer (`src/storage/`)

Pluggable storage backends for persistence.

```
src/storage/
├── mod.rs              # Storage traits
├── memory.rs           # In-memory storage
├── persistent_store.rs # Persistent storage interface
├── rocksdb.rs          # RocksDB backend
├── wal.rs              # Write-ahead logging
└── btree.rs            # B-tree index structures
```

## Data Structures

### Graph Representation

GigaBrain uses an adjacency-list representation optimized for graph traversal:

```rust
pub struct Graph {
    nodes: DashMap<NodeId, Node>,
    relationships: DashMap<RelationshipId, Relationship>,
    // Adjacency lists for fast traversal
    outgoing: DashMap<NodeId, RoaringBitmap>,
    incoming: DashMap<NodeId, RoaringBitmap>,
    // Indexes for efficient queries
    label_index: DashMap<LabelId, RoaringBitmap>,
    property_index: DashMap<PropertyKeyId, BTreeMap<PropertyValue, RoaringBitmap>>,
}
```

**Design Decisions:**
- **DashMap**: Concurrent hash map for thread-safe access
- **RoaringBitmap**: Compressed bitmap for node/relationship sets
- **BTreeMap**: Ordered index for range queries on properties

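A simplified, std-only version of this layout — `HashMap` and `BTreeSet` standing in for `DashMap` and `RoaringBitmap` — shows how the adjacency lists and the label index cooperate during a traversal. The types and method names here are illustrative, not GigaBrain's real API:

```rust
use std::collections::{BTreeSet, HashMap};

type NodeId = u64;
type LabelId = u32;

/// Simplified graph: adjacency lists plus a label index.
#[derive(Default)]
struct TinyGraph {
    outgoing: HashMap<NodeId, BTreeSet<NodeId>>,
    label_index: HashMap<LabelId, BTreeSet<NodeId>>,
}

impl TinyGraph {
    fn add_edge(&mut self, from: NodeId, to: NodeId) {
        self.outgoing.entry(from).or_default().insert(to);
    }

    fn label_node(&mut self, label: LabelId, node: NodeId) {
        self.label_index.entry(label).or_default().insert(node);
    }

    /// All neighbors of `node` carrying `label`: one adjacency lookup
    /// intersected with one index lookup — no scan over all nodes.
    fn labeled_neighbors(&self, node: NodeId, label: LabelId) -> Vec<NodeId> {
        match (self.outgoing.get(&node), self.label_index.get(&label)) {
            (Some(n), Some(l)) => n.intersection(l).copied().collect(),
            _ => Vec::new(),
        }
    }
}

fn main() {
    let mut g = TinyGraph::default();
    g.add_edge(1, 2);
    g.add_edge(1, 3);
    g.label_node(7, 2); // only node 2 carries label 7
    println!("{:?}", g.labeled_neighbors(1, 7)); // [2]
}
```

Set intersection over the adjacency list and the label index is exactly the operation RoaringBitmap accelerates in the real structure — the bitmap just does it in compressed form.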
### Node Structure

```rust
pub struct Node {
    pub id: NodeId,
    pub labels: SmallVec<[LabelId; 2]>,
    pub properties: PropertyMap,
}
```

**Optimizations:**
- **SmallVec**: Stack allocation for the common case of few labels
- **PropertyMap**: Efficient property storage with type compression

### Relationship Structure

```rust
pub struct Relationship {
    pub id: RelationshipId,
    pub start_node: NodeId,
    pub end_node: NodeId,
    pub rel_type: RelationshipTypeId,
    pub properties: PropertyMap,
}
```

### Property System

```rust
pub enum PropertyValue {
    Null,
    Boolean(bool),
    Integer(i64),
    Float(f64),
    String(String),
    List(Vec<PropertyValue>),
    Map(HashMap<String, PropertyValue>),
}

pub struct Property {
    pub key_id: PropertyKeyId,
    pub value: PropertyValue,
}
```

**Features:**
- Type-safe property values
- Recursive data structures (lists, maps)
- Efficient serialization for storage

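A std-only sketch of the recursive value type in action — building a nested value and walking it with pattern matching. The enum mirrors the one above (minus the interned key IDs); `leaf_count` is an illustrative helper, not part of GigaBrain:

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum PropertyValue {
    Null,
    Boolean(bool),
    Integer(i64),
    Float(f64),
    String(String),
    List(Vec<PropertyValue>),
    Map(HashMap<String, PropertyValue>),
}

/// Count the scalar leaves in a (possibly nested) value.
fn leaf_count(v: &PropertyValue) -> usize {
    match v {
        PropertyValue::List(items) => items.iter().map(leaf_count).sum(),
        PropertyValue::Map(entries) => entries.values().map(leaf_count).sum(),
        _ => 1,
    }
}

fn main() {
    let mut m = HashMap::new();
    m.insert("name".to_string(), PropertyValue::String("Ada".into()));
    m.insert(
        "scores".to_string(),
        PropertyValue::List(vec![PropertyValue::Integer(1), PropertyValue::Integer(2)]),
    );
    println!("{}", leaf_count(&PropertyValue::Map(m))); // 3
}
```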
## Query Processing

### Parser Architecture

The Cypher parser uses a recursive-descent approach built from nom combinators:

```rust
// Example parser structure
fn parse_match_clause(input: &str) -> IResult<&str, MatchClause> {
    let (input, _) = tag("MATCH")(input)?;
    let (input, pattern) = parse_pattern(input)?;
    let (input, where_clause) = opt(parse_where_clause)(input)?;

    Ok((input, MatchClause { pattern, where_clause }))
}
```

### Query Planning

The query planner converts the AST into executable plans:

```rust
pub struct QueryPlan {
    pub operators: Vec<LogicalOperator>,
    pub estimated_cost: f64,
}

pub enum LogicalOperator {
    NodeScan { label: Option<LabelId> },
    Filter { predicate: Expression },
    Expand { rel_types: Vec<RelationshipTypeId> },
    Project { expressions: Vec<Expression> },
    Limit { count: usize },
}
```

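The planner's core decision can be sketched as choosing the cheaper entry point by comparing estimated cardinalities. The 10% selectivity threshold below is an invented heuristic for illustration, not GigaBrain's actual cost model:

```rust
#[derive(Debug, PartialEq)]
enum ScanChoice {
    LabelScan,
    FullScan,
}

/// Prefer a label-index scan when the label is selective enough;
/// otherwise a full node scan avoids the index overhead.
fn choose_scan(total_nodes: u64, labeled_nodes: Option<u64>) -> ScanChoice {
    match labeled_nodes {
        // Selective label: fewer than 10% of all nodes carry it.
        Some(n) if n * 10 < total_nodes => ScanChoice::LabelScan,
        _ => ScanChoice::FullScan,
    }
}

fn main() {
    println!("{:?}", choose_scan(1_000_000, Some(5_000))); // LabelScan
    println!("{:?}", choose_scan(1_000_000, None));        // FullScan
}
```

In a real planner this comparison happens per operator and feeds the `estimated_cost` field shown above.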
### Execution Engine

The executor implements a pull-based model:

```rust
trait Executor {
    fn execute(&self, plan: QueryPlan) -> Result<QueryResult>;
}

pub struct QueryResult {
    pub columns: Vec<String>,
    pub rows: Vec<Vec<Value>>,
    pub statistics: ExecutionStatistics,
}
```

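"Pull-based" means each operator draws rows from its child on demand. Rust iterators express this directly — a std-only sketch where filter and limit compose lazily over a scan (the `Row` type is illustrative, not the real operator interface):

```rust
/// A "row" is just a node id with an age property for this sketch.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    node_id: u64,
    age: i64,
}

/// Scan -> Filter -> Limit as a lazy pull-based pipeline: rows are
/// produced one at a time, and Limit stops pulling after two matches,
/// so rows past the limit are never materialized.
fn run_pipeline(rows: Vec<Row>) -> Vec<Row> {
    rows.into_iter()
        .filter(|r| r.age > 30) // Filter operator
        .take(2)                // Limit operator
        .collect()              // drain the pipeline
}

fn main() {
    let rows = vec![
        Row { node_id: 1, age: 25 },
        Row { node_id: 2, age: 40 },
        Row { node_id: 3, age: 35 },
        Row { node_id: 4, age: 50 },
    ];
    println!("{:?}", run_pipeline(rows)); // rows for node_ids 2 and 3
}
```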
## API Layer

### REST API Design

The REST API follows RESTful principles with consistent error handling:

```rust
// Example endpoint structure
async fn create_node(
    State(graph): State<Arc<Graph>>,
    Json(request): Json<CreateNodeRequest>,
) -> Result<Json<CreateNodeResponse>, ApiError> {
    let node_id = graph.create_node();
    // Add labels and properties...

    Ok(Json(CreateNodeResponse { node_id }))
}
```

### gRPC Service

The gRPC service provides high-performance binary-protocol access:

```protobuf
service GigaBrainService {
  rpc CreateNode(CreateNodeRequest) returns (CreateNodeResponse);
  rpc ExecuteCypher(CypherQueryRequest) returns (CypherQueryResponse);
  // ... other operations
}
```

### Authentication Flow

JWT authentication with role-based access control:

```rust
pub struct AuthService {
    encoding_key: EncodingKey,
    decoding_key: DecodingKey,
}

pub enum Role {
    Admin,     // Full access
    ReadWrite, // CRUD operations
    ReadOnly,  // Query access only
}
```

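A minimal sketch of how a `Role` enum like the one above can gate operations. The `Operation` variants and the permission mapping are assumptions for illustration, not GigaBrain's actual authorization table:

```rust
#[derive(Clone, Copy, PartialEq)]
enum Role {
    Admin,
    ReadWrite,
    ReadOnly,
}

#[derive(Clone, Copy)]
enum Operation {
    Query,
    Mutate,
    ManageUsers,
}

/// Role-based gate: every role may query, writers may mutate,
/// and only admins may manage users.
fn is_allowed(role: Role, op: Operation) -> bool {
    match (role, op) {
        (_, Operation::Query) => true,
        (Role::Admin | Role::ReadWrite, Operation::Mutate) => true,
        (Role::Admin, Operation::ManageUsers) => true,
        _ => false,
    }
}

fn main() {
    assert!(is_allowed(Role::ReadOnly, Operation::Query));
    assert!(!is_allowed(Role::ReadOnly, Operation::Mutate));
    assert!(is_allowed(Role::Admin, Operation::ManageUsers));
    println!("all RBAC checks passed");
}
```

In practice the role would be decoded from the JWT claims by the middleware before the check runs.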
## Storage Layer

### Memory Storage

The default in-memory storage provides maximum performance:

```rust
pub struct MemoryStore {
    data: DashMap<Vec<u8>, Vec<u8>>,
    indexes: DashMap<String, BTreeMap<Vec<u8>, Vec<Vec<u8>>>>,
}
```

### Persistent Storage

RocksDB backend for durability:

```rust
pub struct RocksDBStore {
    db: Arc<DB>,
    cf_handles: HashMap<String, ColumnFamily>,
}
```

**Features:**
- Write-ahead logging for crash recovery
- Configurable compression and caching
- Backup and restoration support

### Transaction Support

ACID transactions with configurable isolation levels:

```rust
pub struct Transaction {
    id: TransactionId,
    isolation_level: IsolationLevel,
    read_timestamp: u64,
    write_set: HashMap<Vec<u8>, Vec<u8>>,
    read_set: HashSet<Vec<u8>>,
}

pub enum IsolationLevel {
    ReadUncommitted,
    ReadCommitted,
    RepeatableRead,
    Serializable,
}
```

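Under optimistic concurrency, a read set and read timestamp like those above are validated at commit time: if any key the transaction read was overwritten after `read_timestamp`, the transaction must abort. A std-only sketch of that check (the version map and key encoding are illustrative assumptions):

```rust
use std::collections::{HashMap, HashSet};

/// Last-modified timestamp of each key in the store.
type VersionMap = HashMap<Vec<u8>, u64>;

/// Commit-time validation: succeed only if nothing this transaction
/// read was overwritten after its read timestamp.
fn validate(read_set: &HashSet<Vec<u8>>, read_timestamp: u64, versions: &VersionMap) -> bool {
    read_set.iter().all(|key| {
        versions
            .get(key)
            // A key absent from the version map was never written.
            .map_or(true, |&last_write| last_write <= read_timestamp)
    })
}

fn main() {
    let mut versions = VersionMap::new();
    versions.insert(b"node:1".to_vec(), 5); // last written at t=5

    let mut read_set = HashSet::new();
    read_set.insert(b"node:1".to_vec());

    println!("{}", validate(&read_set, 7, &versions)); // true: read after last write
    println!("{}", validate(&read_set, 3, &versions)); // false: key changed at t=5
}
```

Stricter isolation levels tighten what goes into the read set; `Serializable` additionally has to guard against phantom reads, which this sketch does not model.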
## Concurrency Model

### Lock-Free Structures

GigaBrain uses lock-free or low-contention data structures where possible:

- **DashMap**: Sharded concurrent hash map (per-shard locking keeps contention low)
- **Crossbeam**: Lock-free queues and channels
- **Atomic operations**: For counters and flags

### Read-Write Access Patterns

```rust
// Optimistic read access
let node = graph.nodes.get(&node_id)?;

// Write access with minimal locking
graph.nodes.insert(node_id, node);

// Bulk operations with batching
let mut batch = graph.begin_batch();
for (id, node) in nodes {
    batch.insert_node(id, node);
}
batch.commit()?;
```

### Thread Pool Management

```rust
// Query execution thread pool
let query_pool = ThreadPoolBuilder::new()
    .num_threads(num_cpus::get())
    .thread_name(|i| format!("query-worker-{}", i))
    .build()?;

// I/O thread pool for storage operations
let io_pool = ThreadPoolBuilder::new()
    .num_threads(4)
    .thread_name(|i| format!("io-worker-{}", i))
    .build()?;
```

## Performance Optimizations

### Memory Layout

- **Data locality**: Related data stored together
- **Cache-friendly**: Minimize pointer chasing
- **Compression**: Property value compression
- **Pooling**: Object pools for frequent allocations

### Query Optimization

- **Index selection**: Choose optimal indexes
- **Join ordering**: Minimize intermediate results
- **Predicate pushdown**: Filter early in the pipeline
- **Projection pushdown**: Select only the needed columns

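Predicate pushdown is easy to quantify: filtering before expanding keeps intermediate results small. A std-only toy model counting the rows each ordering materializes (the fixed fan-out of 3 relationships per node is an illustrative assumption):

```rust
/// Each node has an age; expanding a node yields FANOUT relationship
/// rows in this toy model.
fn rows_materialized(ages: &[i64], pushdown: bool) -> usize {
    const FANOUT: usize = 3;
    if pushdown {
        // Filter first, then expand only the survivors.
        ages.iter().filter(|&&a| a > 30).count() * FANOUT
    } else {
        // Expand everything; the filter runs on the expanded rows.
        ages.len() * FANOUT
    }
}

fn main() {
    let ages = [25, 40, 35, 22, 50];
    println!("{}", rows_materialized(&ages, true));  // 9: only 3 nodes expand
    println!("{}", rows_materialized(&ages, false)); // 15: all 5 nodes expand
}
```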
### Caching Strategy

```rust
pub struct QueryCache {
    // LRU cache for parsed queries
    parsed_queries: LruCache<String, QueryPlan>,
    // Result cache for expensive computations
    result_cache: LruCache<QueryKey, QueryResult>,
}
```

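`LruCache` comes from an external crate; the eviction idea can be sketched with std alone — a capacity-bounded map plus a recency queue. This simplified version has an O(n) touch (real implementations use an intrusive list for O(1)):

```rust
use std::collections::{HashMap, VecDeque};

/// Toy LRU cache mapping query strings to "plans".
struct TinyLru {
    capacity: usize,
    map: HashMap<String, String>,
    recency: VecDeque<String>, // front = least recently used
}

impl TinyLru {
    fn new(capacity: usize) -> Self {
        Self { capacity, map: HashMap::new(), recency: VecDeque::new() }
    }

    fn get(&mut self, key: &str) -> Option<&String> {
        if self.map.contains_key(key) {
            self.touch(key); // a hit refreshes recency
        }
        self.map.get(key)
    }

    fn put(&mut self, key: String, value: String) {
        if self.map.len() >= self.capacity && !self.map.contains_key(&key) {
            if let Some(lru) = self.recency.pop_front() {
                self.map.remove(&lru); // evict least recently used
            }
        }
        self.map.insert(key.clone(), value);
        self.touch(&key);
    }

    fn touch(&mut self, key: &str) {
        self.recency.retain(|k| k != key);
        self.recency.push_back(key.to_string());
    }
}

fn main() {
    let mut cache = TinyLru::new(2);
    cache.put("q1".into(), "plan1".into());
    cache.put("q2".into(), "plan2".into());
    cache.get("q1");                        // q1 is now most recent
    cache.put("q3".into(), "plan3".into()); // evicts q2
    println!("{}", cache.get("q2").is_none()); // true
    println!("{}", cache.get("q1").is_some()); // true
}
```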
### Batch Operations

```rust
// Efficient batch inserts
pub fn batch_create_nodes(&self, nodes: Vec<CreateNodeRequest>) -> Result<Vec<NodeId>> {
    let mut node_ids = Vec::with_capacity(nodes.len());

    // Pre-allocate a contiguous block of IDs with one atomic increment
    let start_id = self.next_node_id.fetch_add(nodes.len(), Ordering::Relaxed);

    // Batch insert with minimal locking
    for (i, node_req) in nodes.into_iter().enumerate() {
        let node_id = NodeId(start_id + i);
        let node = self.create_node_from_request(node_id, node_req)?;
        self.nodes.insert(node_id, node);
        node_ids.push(node_id);
    }

    Ok(node_ids)
}
```

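The ID pre-allocation trick above can be demonstrated in isolation: one `fetch_add` reserves a contiguous block, so concurrent batches never hand out overlapping IDs. A std-only sketch:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Reserve `count` consecutive IDs with a single atomic increment;
/// returns the first ID of the reserved block.
fn reserve_block(counter: &AtomicUsize, count: usize) -> usize {
    counter.fetch_add(count, Ordering::Relaxed)
}

fn main() {
    let counter = Arc::new(AtomicUsize::new(0));
    let mut handles = Vec::new();

    // Four concurrent "batch inserts" of 100 nodes each.
    for _ in 0..4 {
        let c = Arc::clone(&counter);
        handles.push(thread::spawn(move || reserve_block(&c, 100)));
    }

    let mut starts: Vec<usize> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    starts.sort();
    // Blocks are disjoint regardless of scheduling order.
    println!("{:?}", starts); // [0, 100, 200, 300]
    assert_eq!(counter.load(Ordering::Relaxed), 400);
}
```

`Relaxed` ordering suffices here because only the atomicity of the increment matters, not its ordering relative to other memory operations.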
### Algorithm Optimizations

- **Parallel execution**: Multi-threaded algorithms
- **Early termination**: Stop as soon as the criteria are met
- **Approximation**: Trade accuracy for speed
- **Incremental updates**: Avoid full recomputation

## Monitoring and Observability

### Metrics Collection

```rust
// Performance metrics
struct Metrics {
    node_count: AtomicU64,
    relationship_count: AtomicU64,
    query_count: AtomicU64,
    query_duration: Histogram,
    memory_usage: Gauge,
}
```

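`Histogram` and `Gauge` above come from a metrics crate; the core idea — record samples, read out a percentile — can be sketched with std alone. This uses the nearest-rank method and is for illustration only:

```rust
/// Toy histogram: record query durations (ms) and report a percentile.
#[derive(Default)]
struct TinyHistogram {
    samples: Vec<u64>,
}

impl TinyHistogram {
    fn record(&mut self, millis: u64) {
        self.samples.push(millis);
    }

    /// Nearest-rank percentile; `p` in 0..=100.
    fn percentile(&self, p: usize) -> Option<u64> {
        if self.samples.is_empty() {
            return None;
        }
        let mut sorted = self.samples.clone();
        sorted.sort_unstable();
        // ceil(p * n / 100), clamped to at least rank 1
        let rank = ((p * sorted.len() + 99) / 100).max(1);
        Some(sorted[rank - 1])
    }
}

fn main() {
    let mut h = TinyHistogram::default();
    for d in [3u64, 5, 8, 13, 21] {
        h.record(d);
    }
    println!("{:?}", h.percentile(50)); // Some(8)
}
```

Production histogram types avoid the per-read sort by bucketing samples, which is why the real `Metrics` struct delegates to a crate.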
### Distributed Tracing

```rust
use tracing::{instrument, Span};

#[instrument(skip(self, query))]
async fn execute_query(&self, query: &str) -> Result<QueryResult> {
    let span = Span::current();
    // Derive a coarse query type for the span (read vs. write)
    let query_type = if query.trim_start().to_uppercase().starts_with("MATCH") {
        "read"
    } else {
        "write"
    };
    span.record("query.type", query_type);

    // Execution with span tracking
    let result = self.execute_internal(query).await?;

    span.record("result.rows", result.rows.len());
    Ok(result)
}
```

### Health Checks

```rust
pub struct HealthCheck {
    pub status: HealthStatus,
    pub checks: Vec<ComponentHealth>,
}

pub struct ComponentHealth {
    pub component: String,
    pub status: HealthStatus,
    pub message: Option<String>,
    pub details: HashMap<String, Value>,
}
```

## Future Architecture Considerations

### Distributed Deployment

- **Sharding**: Horizontal partitioning strategies
- **Replication**: Master-slave and multi-master setups
- **Consensus**: Raft for distributed coordination
- **Load balancing**: Smart routing and failover

### Stream Processing

- **Real-time updates**: Event-driven architecture
- **Change streams**: Continuous query results
- **Temporal graphs**: Time-based operations
- **Event sourcing**: Audit trail and replay

### Machine Learning Integration

- **Graph embeddings**: Node and edge vectorization
- **Feature extraction**: Graph-based features
- **Model serving**: Integrated ML inference
- **AutoML**: Automated algorithm selection

This architecture enables GigaBrain to deliver high performance while maintaining flexibility and extensibility for future enhancements.