GigaBrain Architecture#
This document provides a comprehensive overview of GigaBrain's architecture, design decisions, and implementation details.
Table of Contents#
- Overview
- Core Components
- Data Structures
- Query Processing
- API Layer
- Storage Layer
- Concurrency Model
- Performance Optimizations
Overview#
GigaBrain is designed as a high-performance, in-memory graph database with the following architectural principles:
- Performance First: Optimized data structures and algorithms
- Memory Efficiency: Compact representation with minimal overhead
- Concurrency: Lock-free structures where possible
- Modularity: Clean separation of concerns
- Extensibility: Plugin architecture for storage and algorithms
Core Components#
Graph Engine (src/core/)#
The heart of GigaBrain, responsible for graph storage and basic operations.
src/core/
├── mod.rs # Core types and traits
├── graph.rs # Main Graph implementation
├── node.rs # Node representation
├── relationship.rs # Relationship representation
├── property.rs # Property system
└── schema.rs # Schema management
Key Features:
- Adjacency list representation for fast traversal
- Efficient node and relationship storage
- Property type system with compression
- Label and property key management
Query Engine (src/cypher/)#
Cypher query language parser and execution engine.
src/cypher/
├── mod.rs # Public interface
├── lexer.rs # Tokenization
├── parser.rs # AST generation
├── planner.rs # Query planning
├── executor.rs # Query execution
└── optimizer.rs # Query optimization
Query Processing Pipeline:
- Lexical Analysis: Convert query string to tokens
- Parsing: Build Abstract Syntax Tree (AST)
- Planning: Generate execution plan
- Optimization: Apply optimization rules
- Execution: Execute plan against graph
Algorithm Library (src/algorithms/)#
Graph algorithms for analytics and computation.
src/algorithms/
├── mod.rs # Algorithm traits and common utilities
├── pathfinding.rs # Shortest path, A*, Dijkstra
├── centrality.rs # PageRank, betweenness, closeness
├── community.rs # Louvain, modularity optimization
├── traversal.rs # BFS, DFS, random walks
└── simple.rs # Basic graph metrics
Algorithm Categories:
- Pathfinding: Shortest paths, reachability
- Centrality: Node importance measures
- Community: Clustering and community detection
- Traversal: Graph exploration patterns
API Layer (src/server/)#
Dual REST and gRPC API interfaces.
src/server/
├── mod.rs # Server orchestration
├── rest.rs # HTTP/REST endpoints
├── grpc.rs # gRPC service implementation
├── auth.rs # JWT authentication
└── middleware.rs # Request processing middleware
Features:
- Concurrent REST (Axum) and gRPC (Tonic) servers
- JWT-based authentication with RBAC
- Request timing and rate limiting
- CORS support for web applications
Storage Layer (src/storage/)#
Pluggable storage backends for persistence.
src/storage/
├── mod.rs # Storage traits
├── memory.rs # In-memory storage
├── persistent_store.rs # Persistent storage interface
├── rocksdb.rs # RocksDB backend
├── wal.rs # Write-ahead logging
└── btree.rs # B-tree index structures
Data Structures#
Graph Representation#
GigaBrain uses an adjacency list representation optimized for graph traversal:
pub struct Graph {
nodes: DashMap<NodeId, Node>,
relationships: DashMap<RelationshipId, Relationship>,
// Adjacency lists for fast traversal
outgoing: DashMap<NodeId, RoaringBitmap>,
incoming: DashMap<NodeId, RoaringBitmap>,
// Indexes for efficient queries
label_index: DashMap<LabelId, RoaringBitmap>,
property_index: DashMap<PropertyKeyId, BTreeMap<PropertyValue, RoaringBitmap>>,
}
Design Decisions:
- DashMap: Concurrent hash map for thread-safe access
- RoaringBitmap: Compressed bitmap for node/relationship sets
- BTreeMap: Ordered index for range queries on properties
Node Structure#
pub struct Node {
pub id: NodeId,
pub labels: SmallVec<[LabelId; 2]>,
pub properties: PropertyMap,
}
Optimizations:
- SmallVec: Stack allocation for common case of few labels
- PropertyMap: Efficient property storage with type compression
Relationship Structure#
pub struct Relationship {
pub id: RelationshipId,
pub start_node: NodeId,
pub end_node: NodeId,
pub rel_type: RelationshipTypeId,
pub properties: PropertyMap,
}
Property System#
pub enum PropertyValue {
Null,
Boolean(bool),
Integer(i64),
Float(f64),
String(String),
List(Vec<PropertyValue>),
Map(HashMap<String, PropertyValue>),
}
pub struct Property {
pub key_id: PropertyKeyId,
pub value: PropertyValue,
}
Features:
- Type-safe property values
- Recursive data structures (lists, maps)
- Efficient serialization for storage
Query Processing#
Parser Architecture#
The Cypher parser uses a recursive descent approach with nom combinators:
// Example parser structure
fn parse_match_clause(input: &str) -> IResult<&str, MatchClause> {
let (input, _) = tag("MATCH")(input)?;
let (input, pattern) = parse_pattern(input)?;
let (input, where_clause) = opt(parse_where_clause)(input)?;
Ok((input, MatchClause { pattern, where_clause }))
}
Query Planning#
The query planner converts AST to executable plans:
pub struct QueryPlan {
pub operators: Vec<LogicalOperator>,
pub estimated_cost: f64,
}
pub enum LogicalOperator {
NodeScan { label: Option<LabelId> },
Filter { predicate: Expression },
Expand { rel_types: Vec<RelationshipTypeId> },
Project { expressions: Vec<Expression> },
Limit { count: usize },
}
Execution Engine#
The executor implements a pull-based model:
trait Executor {
fn execute(&self, plan: QueryPlan) -> Result<QueryResult>;
}
pub struct QueryResult {
pub columns: Vec<String>,
pub rows: Vec<Vec<Value>>,
pub statistics: ExecutionStatistics,
}
API Layer#
REST API Design#
The REST API follows RESTful principles with consistent error handling:
// Example endpoint structure
async fn create_node(
State(graph): State<Arc<Graph>>,
Json(request): Json<CreateNodeRequest>,
) -> Result<Json<CreateNodeResponse>, ApiError> {
let node_id = graph.create_node();
// Add labels and properties...
Ok(Json(CreateNodeResponse { node_id }))
}
gRPC Service#
The gRPC service provides high-performance binary protocol access:
service GigaBrainService {
rpc CreateNode(CreateNodeRequest) returns (CreateNodeResponse);
rpc ExecuteCypher(CypherQueryRequest) returns (CypherQueryResponse);
// ... other operations
}
Authentication Flow#
JWT authentication with role-based access control:
pub struct AuthService {
encoding_key: EncodingKey,
decoding_key: DecodingKey,
}
pub enum Role {
Admin, // Full access
ReadWrite, // CRUD operations
ReadOnly, // Query access only
}
Storage Layer#
Memory Storage#
The default in-memory storage provides maximum performance:
pub struct MemoryStore {
data: DashMap<Vec<u8>, Vec<u8>>,
indexes: DashMap<String, BTreeMap<Vec<u8>, Vec<Vec<u8>>>>,
}
Persistent Storage#
RocksDB backend for durability:
pub struct RocksDBStore {
db: Arc<DB>,
cf_handles: HashMap<String, ColumnFamily>,
}
Features:
- Write-ahead logging for crash recovery
- Configurable compression and caching
- Backup and restoration support
Transaction Support#
ACID transactions with configurable isolation levels:
pub struct Transaction {
id: TransactionId,
isolation_level: IsolationLevel,
read_timestamp: u64,
write_set: HashMap<Vec<u8>, Vec<u8>>,
read_set: HashSet<Vec<u8>>,
}
pub enum IsolationLevel {
ReadUncommitted,
ReadCommitted,
RepeatableRead,
Serializable,
}
Concurrency Model#
Lock-Free Structures#
GigaBrain uses lock-free data structures where possible:
- DashMap: Lock-free concurrent hash map
- Crossbeam: Lock-free queues and channels
- Atomic operations: For counters and flags
Read-Write Access Patterns#
// Optimistic read access
let node = graph.nodes.get(&node_id)?;
// Write access with minimal locking
graph.nodes.insert(node_id, node);
// Bulk operations with batching
let mut batch = graph.begin_batch();
for (id, node) in nodes {
batch.insert_node(id, node);
}
batch.commit()?;
Thread Pool Management#
// Query execution thread pool
let query_pool = ThreadPoolBuilder::new()
.num_threads(num_cpus::get())
.thread_name(|i| format!("query-worker-{}", i))
.build()?;
// I/O thread pool for storage operations
let io_pool = ThreadPoolBuilder::new()
.num_threads(4)
.thread_name(|i| format!("io-worker-{}", i))
.build()?;
Performance Optimizations#
Memory Layout#
- Data locality: Related data stored together
- Cache-friendly: Minimize pointer chasing
- Compression: Property value compression
- Pooling: Object pools for frequent allocations
Query Optimization#
- Index selection: Choose optimal indexes
- Join ordering: Minimize intermediate results
- Predicate pushdown: Filter early in pipeline
- Projection pushdown: Select only needed columns
Caching Strategy#
pub struct QueryCache {
// LRU cache for parsed queries
parsed_queries: LruCache<String, QueryPlan>,
// Result cache for expensive computations
result_cache: LruCache<QueryKey, QueryResult>,
}
Batch Operations#
// Efficient batch inserts
pub fn batch_create_nodes(&self, nodes: Vec<CreateNodeRequest>) -> Result<Vec<NodeId>> {
let mut node_ids = Vec::with_capacity(nodes.len());
// Pre-allocate IDs
let start_id = self.next_node_id.fetch_add(nodes.len(), Ordering::Relaxed);
// Batch insert with minimal locking
for (i, node_req) in nodes.into_iter().enumerate() {
let node_id = NodeId(start_id + i);
let node = self.create_node_from_request(node_id, node_req)?;
self.nodes.insert(node_id, node);
node_ids.push(node_id);
}
Ok(node_ids)
}
Algorithm Optimizations#
- Parallel execution: Multi-threaded algorithms
- Early termination: Stop when criteria met
- Approximation: Trade accuracy for speed
- Incremental updates: Avoid full recomputation
Monitoring and Observability#
Metrics Collection#
// Performance metrics
struct Metrics {
node_count: AtomicU64,
relationship_count: AtomicU64,
query_count: AtomicU64,
query_duration: Histogram,
memory_usage: Gauge,
}
Distributed Tracing#
use tracing::{instrument, Span};
#[instrument(skip(self, query))]
async fn execute_query(&self, query: &str) -> Result<QueryResult> {
let span = Span::current();
span.record("query.type", &query_type);
// Execution with span tracking
let result = self.execute_internal(query).await?;
span.record("result.rows", result.rows.len());
Ok(result)
}
Health Checks#
pub struct HealthCheck {
pub status: HealthStatus,
pub checks: Vec<ComponentHealth>,
}
pub struct ComponentHealth {
pub component: String,
pub status: HealthStatus,
pub message: Option<String>,
pub details: HashMap<String, Value>,
}
Future Architecture Considerations#
Distributed Deployment#
- Sharding: Horizontal partitioning strategies
- Replication: Master-slave and multi-master setups
- Consensus: Raft for distributed coordination
- Load balancing: Smart routing and failover
Stream Processing#
- Real-time updates: Event-driven architecture
- Change streams: Continuous query results
- Temporal graphs: Time-based operations
- Event sourcing: Audit trail and replay
Machine Learning Integration#
- Graph embeddings: Node and edge vectorization
- Feature extraction: Graph-based features
- Model serving: Integrated ML inference
- AutoML: Automated algorithm selection
This architecture enables GigaBrain to deliver high performance while maintaining flexibility and extensibility for future enhancements.