docs/ARCHITECTURE.md at main · cameron.stream/gigabrain

cameron.stream / gigabrain
fork
this repo has no description
fork
gigabrain / docs / ARCHITECTURE.md
at main 517 lines 13 kB view raw view rendered
wrap content
Cameron Pfiffer Add comprehensive documentation suite 10mo ago
ededb829
  1# GigaBrain Architecture
  2
  3This document provides a comprehensive overview of GigaBrain's architecture, design decisions, and implementation details.
  4
  5## Table of Contents
  6
  7- [Overview](#overview)
  8- [Core Components](#core-components)
  9- [Data Structures](#data-structures)
 10- [Query Processing](#query-processing)
 11- [API Layer](#api-layer)
 12- [Storage Layer](#storage-layer)
 13- [Concurrency Model](#concurrency-model)
 14- [Performance Optimizations](#performance-optimizations)
 15
 16## Overview
 17
 18GigaBrain is designed as a high-performance, in-memory graph database with the following architectural principles:
 19
 20- **Performance First**: Optimized data structures and algorithms
 21- **Memory Efficiency**: Compact representation with minimal overhead
 22- **Concurrency**: Lock-free structures where possible
 23- **Modularity**: Clean separation of concerns
 24- **Extensibility**: Plugin architecture for storage and algorithms
 25
 26## Core Components
 27
 28### Graph Engine (`src/core/`)
 29
 30The heart of GigaBrain, responsible for graph storage and basic operations.
 31
 32```
 33src/core/
 34├── mod.rs              # Core types and traits
 35├── graph.rs            # Main Graph implementation
 36├── node.rs             # Node representation
 37├── relationship.rs     # Relationship representation
 38├── property.rs         # Property system
 39└── schema.rs           # Schema management
 40```
 41
 42**Key Features:**
 43- Adjacency list representation for fast traversal
 44- Efficient node and relationship storage
 45- Property type system with compression
 46- Label and property key management
 47
 48### Query Engine (`src/cypher/`)
 49
 50Cypher query language parser and execution engine.
 51
 52```
 53src/cypher/
 54├── mod.rs              # Public interface
 55├── lexer.rs            # Tokenization
 56├── parser.rs           # AST generation
 57├── planner.rs          # Query planning
 58├── executor.rs         # Query execution
 59└── optimizer.rs        # Query optimization
 60```
 61
 62**Query Processing Pipeline:**
 631. **Lexical Analysis**: Convert query string to tokens
 642. **Parsing**: Build Abstract Syntax Tree (AST)
 653. **Planning**: Generate execution plan
 664. **Optimization**: Apply optimization rules
 675. **Execution**: Execute plan against graph
 68
 69### Algorithm Library (`src/algorithms/`)
 70
 71Graph algorithms for analytics and computation.
 72
 73```
 74src/algorithms/
 75├── mod.rs              # Algorithm traits and common utilities
 76├── pathfinding.rs      # Shortest path, A*, Dijkstra
 77├── centrality.rs       # PageRank, betweenness, closeness
 78├── community.rs        # Louvain, modularity optimization
 79├── traversal.rs        # BFS, DFS, random walks
 80└── simple.rs           # Basic graph metrics
 81```
 82
 83**Algorithm Categories:**
 84- **Pathfinding**: Shortest paths, reachability
 85- **Centrality**: Node importance measures
 86- **Community**: Clustering and community detection
 87- **Traversal**: Graph exploration patterns
 88
 89### API Layer (`src/server/`)
 90
 91Dual REST and gRPC API interfaces.
 92
 93```
 94src/server/
 95├── mod.rs              # Server orchestration
 96├── rest.rs             # HTTP/REST endpoints
 97├── grpc.rs             # gRPC service implementation
 98├── auth.rs             # JWT authentication
 99└── middleware.rs       # Request processing middleware
100```
101
102**Features:**
103- Concurrent REST (Axum) and gRPC (Tonic) servers
104- JWT-based authentication with RBAC
105- Request timing and rate limiting
106- CORS support for web applications
107
108### Storage Layer (`src/storage/`)
109
110Pluggable storage backends for persistence.
111
112```
113src/storage/
114├── mod.rs              # Storage traits
115├── memory.rs           # In-memory storage
116├── persistent_store.rs # Persistent storage interface
117├── rocksdb.rs          # RocksDB backend
118├── wal.rs              # Write-ahead logging
119└── btree.rs            # B-tree index structures
120```
121
122## Data Structures
123
124### Graph Representation
125
126GigaBrain uses an adjacency list representation optimized for graph traversal:
127
128```rust
129pub struct Graph {
130    nodes: DashMap<NodeId, Node>,
131    relationships: DashMap<RelationshipId, Relationship>,
132    // Adjacency lists for fast traversal
133    outgoing: DashMap<NodeId, RoaringBitmap>,
134    incoming: DashMap<NodeId, RoaringBitmap>,
135    // Indexes for efficient queries
136    label_index: DashMap<LabelId, RoaringBitmap>,
137    property_index: DashMap<PropertyKeyId, BTreeMap<PropertyValue, RoaringBitmap>>,
138}
139```
140
141**Design Decisions:**
142- **DashMap**: Concurrent hash map for thread-safe access
143- **RoaringBitmap**: Compressed bitmap for node/relationship sets
144- **BTreeMap**: Ordered index for range queries on properties
145
146### Node Structure
147
148```rust
149pub struct Node {
150    pub id: NodeId,
151    pub labels: SmallVec<[LabelId; 2]>,
152    pub properties: PropertyMap,
153}
154```
155
156**Optimizations:**
157- **SmallVec**: Stack allocation for common case of few labels
158- **PropertyMap**: Efficient property storage with type compression
159
160### Relationship Structure
161
162```rust
163pub struct Relationship {
164    pub id: RelationshipId,
165    pub start_node: NodeId,
166    pub end_node: NodeId,
167    pub rel_type: RelationshipTypeId,
168    pub properties: PropertyMap,
169}
170```
171
172### Property System
173
174```rust
175pub enum PropertyValue {
176    Null,
177    Boolean(bool),
178    Integer(i64),
179    Float(f64),
180    String(String),
181    List(Vec<PropertyValue>),
182    Map(HashMap<String, PropertyValue>),
183}
184
185pub struct Property {
186    pub key_id: PropertyKeyId,
187    pub value: PropertyValue,
188}
189```
190
191**Features:**
192- Type-safe property values
193- Recursive data structures (lists, maps)
194- Efficient serialization for storage
195
196## Query Processing
197
198### Parser Architecture
199
200The Cypher parser uses a recursive descent approach with nom combinators:
201
202```rust
203// Example parser structure
204fn parse_match_clause(input: &str) -> IResult<&str, MatchClause> {
205    let (input, _) = tag("MATCH")(input)?;
206    let (input, pattern) = parse_pattern(input)?;
207    let (input, where_clause) = opt(parse_where_clause)(input)?;
208    
209    Ok((input, MatchClause { pattern, where_clause }))
210}
211```
212
213### Query Planning
214
215The query planner converts AST to executable plans:
216
217```rust
218pub struct QueryPlan {
219    pub operators: Vec<LogicalOperator>,
220    pub estimated_cost: f64,
221}
222
223pub enum LogicalOperator {
224    NodeScan { label: Option<LabelId> },
225    Filter { predicate: Expression },
226    Expand { rel_types: Vec<RelationshipTypeId> },
227    Project { expressions: Vec<Expression> },
228    Limit { count: usize },
229}
230```
231
232### Execution Engine
233
234The executor implements a pull-based model:
235
236```rust
237trait Executor {
238    fn execute(&self, plan: QueryPlan) -> Result<QueryResult>;
239}
240
241pub struct QueryResult {
242    pub columns: Vec<String>,
243    pub rows: Vec<Vec<Value>>,
244    pub statistics: ExecutionStatistics,
245}
246```
247
248## API Layer
249
250### REST API Design
251
252The REST API follows RESTful principles with consistent error handling:
253
254```rust
255// Example endpoint structure
256async fn create_node(
257    State(graph): State<Arc<Graph>>,
258    Json(request): Json<CreateNodeRequest>,
259) -> Result<Json<CreateNodeResponse>, ApiError> {
260    let node_id = graph.create_node();
261    // Add labels and properties...
262    
263    Ok(Json(CreateNodeResponse { node_id }))
264}
265```
266
267### gRPC Service
268
269The gRPC service provides high-performance binary protocol access:
270
271```protobuf
272service GigaBrainService {
273  rpc CreateNode(CreateNodeRequest) returns (CreateNodeResponse);
274  rpc ExecuteCypher(CypherQueryRequest) returns (CypherQueryResponse);
275  // ... other operations
276}
277```
278
279### Authentication Flow
280
281JWT authentication with role-based access control:
282
283```rust
284pub struct AuthService {
285    encoding_key: EncodingKey,
286    decoding_key: DecodingKey,
287}
288
289pub enum Role {
290    Admin,      // Full access
291    ReadWrite,  // CRUD operations
292    ReadOnly,   // Query access only
293}
294```
295
296## Storage Layer
297
298### Memory Storage
299
300The default in-memory storage provides maximum performance:
301
302```rust
303pub struct MemoryStore {
304    data: DashMap<Vec<u8>, Vec<u8>>,
305    indexes: DashMap<String, BTreeMap<Vec<u8>, Vec<Vec<u8>>>>,
306}
307```
308
309### Persistent Storage
310
311RocksDB backend for durability:
312
313```rust
314pub struct RocksDBStore {
315    db: Arc<DB>,
316    cf_handles: HashMap<String, ColumnFamily>,
317}
318```
319
320**Features:**
321- Write-ahead logging for crash recovery
322- Configurable compression and caching
323- Backup and restoration support
324
325### Transaction Support
326
327ACID transactions with configurable isolation levels:
328
329```rust
330pub struct Transaction {
331    id: TransactionId,
332    isolation_level: IsolationLevel,
333    read_timestamp: u64,
334    write_set: HashMap<Vec<u8>, Vec<u8>>,
335    read_set: HashSet<Vec<u8>>,
336}
337
338pub enum IsolationLevel {
339    ReadUncommitted,
340    ReadCommitted,
341    RepeatableRead,
342    Serializable,
343}
344```
345
346## Concurrency Model
347
348### Lock-Free Structures
349
350GigaBrain uses lock-free data structures where possible:
351
352- **DashMap**: Lock-free concurrent hash map
353- **Crossbeam**: Lock-free queues and channels
354- **Atomic operations**: For counters and flags
355
356### Read-Write Access Patterns
357
358```rust
359// Optimistic read access
360let node = graph.nodes.get(&node_id)?;
361
362// Write access with minimal locking
363graph.nodes.insert(node_id, node);
364
365// Bulk operations with batching
366let mut batch = graph.begin_batch();
367for (id, node) in nodes {
368    batch.insert_node(id, node);
369}
370batch.commit()?;
371```
372
373### Thread Pool Management
374
375```rust
376// Query execution thread pool
377let query_pool = ThreadPoolBuilder::new()
378    .num_threads(num_cpus::get())
379    .thread_name(|i| format!("query-worker-{}", i))
380    .build()?;
381
382// I/O thread pool for storage operations
383let io_pool = ThreadPoolBuilder::new()
384    .num_threads(4)
385    .thread_name(|i| format!("io-worker-{}", i))
386    .build()?;
387```
388
389## Performance Optimizations
390
391### Memory Layout
392
393- **Data locality**: Related data stored together
394- **Cache-friendly**: Minimize pointer chasing
395- **Compression**: Property value compression
396- **Pooling**: Object pools for frequent allocations
397
398### Query Optimization
399
400- **Index selection**: Choose optimal indexes
401- **Join ordering**: Minimize intermediate results
402- **Predicate pushdown**: Filter early in pipeline
403- **Projection pushdown**: Select only needed columns
404
405### Caching Strategy
406
407```rust
408pub struct QueryCache {
409    // LRU cache for parsed queries
410    parsed_queries: LruCache<String, QueryPlan>,
411    // Result cache for expensive computations
412    result_cache: LruCache<QueryKey, QueryResult>,
413}
414```
415
416### Batch Operations
417
418```rust
419// Efficient batch inserts
420pub fn batch_create_nodes(&self, nodes: Vec<CreateNodeRequest>) -> Result<Vec<NodeId>> {
421    let mut node_ids = Vec::with_capacity(nodes.len());
422    
423    // Pre-allocate IDs
424    let start_id = self.next_node_id.fetch_add(nodes.len(), Ordering::Relaxed);
425    
426    // Batch insert with minimal locking
427    for (i, node_req) in nodes.into_iter().enumerate() {
428        let node_id = NodeId(start_id + i);
429        let node = self.create_node_from_request(node_id, node_req)?;
430        self.nodes.insert(node_id, node);
431        node_ids.push(node_id);
432    }
433    
434    Ok(node_ids)
435}
436```
437
438### Algorithm Optimizations
439
440- **Parallel execution**: Multi-threaded algorithms
441- **Early termination**: Stop when criteria met
442- **Approximation**: Trade accuracy for speed
443- **Incremental updates**: Avoid full recomputation
444
445## Monitoring and Observability
446
447### Metrics Collection
448
449```rust
450// Performance metrics
451struct Metrics {
452    node_count: AtomicU64,
453    relationship_count: AtomicU64,
454    query_count: AtomicU64,
455    query_duration: Histogram,
456    memory_usage: Gauge,
457}
458```
459
460### Distributed Tracing
461
462```rust
463use tracing::{instrument, Span};
464
465#[instrument(skip(self, query))]
466async fn execute_query(&self, query: &str) -> Result<QueryResult> {
467    let span = Span::current();
468    span.record("query.type", &query_type);
469    
470    // Execution with span tracking
471    let result = self.execute_internal(query).await?;
472    
473    span.record("result.rows", result.rows.len());
474    Ok(result)
475}
476```
477
478### Health Checks
479
480```rust
481pub struct HealthCheck {
482    pub status: HealthStatus,
483    pub checks: Vec<ComponentHealth>,
484}
485
486pub struct ComponentHealth {
487    pub component: String,
488    pub status: HealthStatus,
489    pub message: Option<String>,
490    pub details: HashMap<String, Value>,
491}
492```
493
494## Future Architecture Considerations
495
496### Distributed Deployment
497
498- **Sharding**: Horizontal partitioning strategies
499- **Replication**: Master-slave and multi-master setups
500- **Consensus**: Raft for distributed coordination
501- **Load balancing**: Smart routing and failover
502
503### Stream Processing
504
505- **Real-time updates**: Event-driven architecture
506- **Change streams**: Continuous query results
507- **Temporal graphs**: Time-based operations
508- **Event sourcing**: Audit trail and replay
509
510### Machine Learning Integration
511
512- **Graph embeddings**: Node and edge vectorization
513- **Feature extraction**: Graph-based features
514- **Model serving**: Integrated ML inference
515- **AutoML**: Automated algorithm selection
516
517This architecture enables GigaBrain to deliver high performance while maintaining flexibility and extensibility for future enhancements.
Configure Feed

Configure Feed