Knowledge Graphs
Knowledge Graph Fundamentals
Knowledge graphs represent information as nodes (entities) connected by edges (relationships). Each edge can have a type and direction. Properties store attributes on both nodes and edges, enabling rich data modeling beyond simple connections.
- Nodes: Entities (people, places, concepts)
- Edges: Relationships between entities
- Properties: Attributes on nodes/edges
- Directed or undirected connections
- Labels/types for categorization
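The node, edge, and property model above can be sketched in a few lines of plain Python. This is an illustrative in-memory structure, not how any particular graph database stores data; the class names (`Node`, `Edge`, `PropertyGraph`) and the sample data are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Node:
    id: str
    label: str                                   # category, e.g. "Person"
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    source: str                                  # id of the start node
    target: str                                  # id of the end node
    type: str                                    # relationship type, e.g. "KNOWS"
    properties: dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical minimal property graph with directed, typed edges."""

    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}
        self.edges: list[Edge] = []

    def add_node(self, node: Node) -> None:
        self.nodes[node.id] = node

    def add_edge(self, edge: Edge) -> None:
        # Refuse dangling edges: both endpoints must already exist.
        if edge.source not in self.nodes or edge.target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append(edge)

    def outgoing(self, node_id: str, edge_type: Optional[str] = None) -> list[Edge]:
        # Follow edges in their stored direction, optionally filtered by type.
        return [
            e for e in self.edges
            if e.source == node_id and (edge_type is None or e.type == edge_type)
        ]


graph = PropertyGraph()
graph.add_node(Node("alice", "Person", {"age": 34}))
graph.add_node(Node("bob", "Person"))
graph.add_edge(Edge("alice", "bob", "KNOWS", {"since": 2019}))
```

Because edges are directed, `graph.outgoing("alice", "KNOWS")` finds Bob while `graph.outgoing("bob")` is empty; an undirected model would store or query both directions.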
A triple (subject, predicate, object) is the atomic unit of knowledge representation. 'Alice knows Bob' becomes (Alice, knows, Bob). Triples combine into graphs that represent complex knowledge, and they are the foundation of RDF and the semantic web standards.
- Subject: The entity being described
- Predicate: The relationship type
- Object: Target entity or literal value
- Composable into complex graphs
- Machine-readable knowledge
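A minimal sketch of subject-predicate-object triples as plain Python tuples, with wildcard pattern matching. The store, the `match` helper, and the sample facts are assumptions for illustration; a real RDF store would use IRIs and typed literals rather than bare strings.

```python
Triple = tuple[str, str, str]

triples: set[Triple] = {
    ("Alice", "knows", "Bob"),
    ("Bob", "knows", "Carol"),
    ("Alice", "worksAt", "Acme"),
    ("Alice", "age", "34"),          # the object here is a literal value
}


def match(store, subject=None, predicate=None, obj=None) -> list[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        (s, p, o)
        for (s, p, o) in store
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    )


def friends_of_friends(store, person: str) -> list[str]:
    """Compose two triple patterns into a two-hop query."""
    result = []
    for _, _, friend in match(store, subject=person, predicate="knows"):
        for _, _, fof in match(store, subject=friend, predicate="knows"):
            result.append(fof)
    return result
```

Chaining two patterns, as `friends_of_friends` does, is the same idea that triple-pattern query languages build on.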
Ontologies are formal specifications of the concepts, relationships, and constraints in a domain. They enable reasoning, inference, and semantic interoperability, and range from simple taxonomies to complex formal logic.
- Class hierarchies (is-a relationships)
- Property definitions and constraints
- Domain and range specifications
- Inference rules
- Cross-domain integration
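To make class hierarchies and domain/range inference concrete, here is a small sketch in the spirit of RDFS entailment. The class names, the `manages` property, and the single-parent hierarchy are simplifying assumptions; real ontologies allow multiple parents and far richer constraints.

```python
SUBCLASS_OF = {              # child class -> parent class (is-a)
    "Manager": "Employee",
    "Employee": "Person",
    "Person": "Agent",
}
DOMAIN = {"manages": "Manager"}    # the subject of 'manages' is a Manager
RANGE = {"manages": "Employee"}    # the object of 'manages' is an Employee


def superclasses(cls: str) -> list[str]:
    """Walk the is-a chain upward from cls (cls itself is excluded)."""
    result = []
    seen = {cls}
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        if cls in seen:            # guard against accidental cycles
            break
        seen.add(cls)
        result.append(cls)
    return result


def infer_types(facts: list[tuple[str, str, str]]) -> dict[str, set[str]]:
    """Derive types from domain/range declarations, then add all superclasses."""
    types: dict[str, set[str]] = {}
    for s, p, o in facts:
        if p in DOMAIN:
            types.setdefault(s, set()).add(DOMAIN[p])
        if p in RANGE:
            types.setdefault(o, set()).add(RANGE[p])
    for direct in types.values():
        for cls in list(direct):
            direct.update(superclasses(cls))
    return types
```

From the single fact `("Dana", "manages", "Eli")`, the sketch concludes that Dana is a Manager (and therefore an Employee, Person, and Agent) without any of those types being stated explicitly.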
Graph traversal navigates the graph by following edges from node to node, which enables multi-hop queries, pathfinding, and pattern matching. Traversal algorithms such as BFS and DFS underpin recommendation and fraud detection systems.
- Multi-hop queries
- Shortest path algorithms
- Pattern matching
- Neighborhood exploration
- Subgraph extraction
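Two of the operations listed above, shortest path and neighborhood exploration, can be sketched with breadth-first search over an adjacency list. The graph data and function names are made up for the example, and edges are treated as directed and unweighted.

```python
from collections import deque

ADJ = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Dave"],
    "Carol": ["Dave"],
    "Dave": ["Erin"],
    "Erin": [],
}


def shortest_path(adj, start, goal):
    """BFS over directed edges; returns the node list or None if unreachable."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:      # walk parent pointers back to start
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


def neighborhood(adj, start, max_hops):
    """Map every node reachable within max_hops to its hop distance."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:       # do not expand past the hop limit
            continue
        for nxt in adj.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist
```

BFS visits nodes in order of hop count, so the first time it reaches the goal it has found a path with the fewest edges; weighted graphs would need Dijkstra's algorithm or similar instead.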
Graph Databases
| Database | Type | Query Language | Best For | Managed Options |
|---|---|---|---|---|
| Neo4j | Native Graph | Cypher | General purpose, fraud detection, recommendations | Neo4j Aura, Self-hosted |
| Amazon Neptune | Native Graph | Gremlin, SPARQL, openCypher | AWS ecosystem, RDF/Property graphs | Fully managed (AWS) |
| Azure Cosmos DB | Multi-model | Gremlin | Azure ecosystem, global distribution | Fully managed (Azure) |
| TigerGraph | Native Graph | GSQL | Deep link analytics, real-time ML | TigerGraph Cloud |
| JanusGraph | Distributed (pluggable storage backends) | Gremlin | Scalable, open-source, pluggable backends | Self-hosted, IBM Compose |
| ArangoDB | Multi-model | AQL | Document + Graph hybrid, flexibility | ArangoGraph (formerly ArangoDB Oasis) |
| Dgraph | Native Graph | GraphQL, DQL | GraphQL-native, horizontal scaling | Dgraph Cloud |
Query Languages
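Cypher is Neo4j's declarative graph query language. Its pattern syntax resembles ASCII art, so a query visually mirrors the graph pattern it matches. It is among the most widely used property graph query languages and is published as an open specification through the openCypher project.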
- ASCII-art pattern matching: (a)-[r]->(b)
- Declarative and readable
- MATCH, CREATE, MERGE operations
- Aggregations and filtering
- openCypher open specification
SPARQL is the W3C standard query language for RDF graphs. It matches patterns against triples and supports federated queries across endpoints; combined with an RDFS/OWL-aware store, queries can also see inferred triples. Essential for semantic web and linked data applications.
- Triple pattern matching
- Federated queries across endpoints
- CONSTRUCT for graph creation
- Inference support (RDFS/OWL)
- Standard for RDF databases
Gremlin is Apache TinkerPop's graph traversal language. It takes a functional, step-based approach to navigating graphs and is supported by many graph databases, including Neptune, JanusGraph, and Cosmos DB.
- Traversal-based queries
- Functional composition
- Turing-complete language
- Wide database support
- Imperative and declarative styles
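For comparison, here is the same question, "who does Alice know?", written in each of the three languages above and held as Python strings. The `Person` label, the `KNOWS`/`knows` relationship, and the `http://example.org/` namespace are hypothetical; how the strings are sent to a database depends on the driver and is not shown.

```python
# Cypher: the pattern (a)-[:KNOWS]->(b) mirrors the shape of the graph.
# $name is a query parameter supplied at execution time.
CYPHER = """
MATCH (a:Person {name: $name})-[:KNOWS]->(b:Person)
RETURN b.name AS friend
"""

# SPARQL: a single triple pattern with ?friend as the variable.
SPARQL = """
PREFIX ex: <http://example.org/>
SELECT ?friend
WHERE { ex:Alice ex:knows ?friend . }
"""

# Gremlin: a chain of traversal steps, read left to right.
GREMLIN = "g.V().has('Person', 'name', 'Alice').out('KNOWS').values('name')"

QUERIES = {
    "cypher": CYPHER.strip(),
    "sparql": SPARQL.strip(),
    "gremlin": GREMLIN,
}
```

Cypher and SPARQL describe the pattern to find and leave execution order to the engine, while the Gremlin version spells out the steps of the walk.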
GraphRAG: Knowledge Graphs + LLMs
Use LLMs to extract entities and relationships from unstructured text, then link them to existing knowledge graph nodes. Enables automatic knowledge graph construction and enrichment from documents.
- Named Entity Recognition (NER)
- Relationship extraction
- Entity disambiguation
- Link to existing graph nodes
- Incremental graph building
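The linking and incremental-building steps can be sketched as follows. The LLM extraction itself is not shown: `doc1` and `doc2` stand in for hypothetical extractor output, and matching on a normalized name is a deliberately crude substitute for real entity disambiguation.

```python
def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.lower().split())


class KnowledgeGraph:
    """Hypothetical in-memory graph that grows as documents are processed."""

    def __init__(self) -> None:
        self.nodes: dict[str, str] = {}                   # normalized key -> display name
        self.edges: set[tuple[str, str, str]] = set()     # (source key, relation, target key)

    def link(self, mention: str) -> str:
        """Return the key of the existing node for a mention, creating it if unseen."""
        key = normalize(mention)
        self.nodes.setdefault(key, mention.strip())
        return key

    def add_extraction(self, extracted) -> None:
        # Using a set for edges makes repeated facts idempotent.
        for subject, relation, obj in extracted:
            self.edges.add((self.link(subject), relation, self.link(obj)))


# Assumed extractor output for two documents (an LLM would produce these).
doc1 = [("Marie Curie", "BORN_IN", "Warsaw")]
doc2 = [("marie  curie", "WON", "Nobel Prize"), ("Marie Curie", "BORN_IN", "Warsaw")]

kg = KnowledgeGraph()
kg.add_extraction(doc1)
kg.add_extraction(doc2)
```

After both documents are processed the graph holds three nodes and two edges: the second mention of Marie Curie is linked to the existing node rather than duplicated, and the repeated BORN_IN fact is stored once.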
Combine vector similarity search with graph traversal. Find relevant documents via embeddings, then traverse the knowledge graph to find connected context. Richer context than pure vector RAG.
- Vector search for initial retrieval
- Graph traversal for context expansion
- Multi-hop relationship discovery
- Structured + unstructured fusion
- Better for complex queries
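A toy version of the two-stage retrieval described above: rank chunks by cosine similarity, then expand from the entities those chunks mention. The three-dimensional vectors, document names, and entity links are invented for illustration; a real system would use an embedding model and a graph store.

```python
import math

EMBEDDINGS = {                      # toy 3-d vectors standing in for real embeddings
    "doc_graphs": [0.9, 0.1, 0.0],
    "doc_cooking": [0.0, 0.2, 0.9],
    "doc_databases": [0.7, 0.6, 0.1],
}
MENTIONS = {                        # chunk -> entities it mentions
    "doc_graphs": ["Neo4j"],
    "doc_cooking": ["Pasta"],
    "doc_databases": ["PostgreSQL"],
}
RELATED = {                         # entity -> neighboring entities in the graph
    "Neo4j": ["Cypher", "Graph database"],
    "Cypher": ["openCypher"],
    "PostgreSQL": ["SQL"],
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve(query_vec, top_k=1, hops=1):
    """Vector search for the initial chunks, then graph expansion for context."""
    ranked = sorted(EMBEDDINGS, key=lambda d: cosine(query_vec, EMBEDDINGS[d]), reverse=True)
    docs = ranked[:top_k]
    frontier = {e for d in docs for e in MENTIONS.get(d, [])}
    entities = set(frontier)
    for _ in range(hops):
        # Step one hop outward, keeping only entities not seen yet.
        frontier = {n for e in frontier for n in RELATED.get(e, [])} - entities
        entities |= frontier
    return docs, sorted(entities)
```

The `hops` parameter controls how much connected context is pulled in: larger values surface more distant relationships at the cost of more noise in the prompt.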
Identify clusters of related entities in the knowledge graph. Use community summaries for high-level context in RAG. Microsoft's GraphRAG uses this for hierarchical summarization.
- Leiden/Louvain clustering
- Community summarization
- Hierarchical abstraction
- Global query answering
- Theme identification
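Louvain searches for a partition that scores well on modularity, and Leiden refines the same idea with other quality functions also available. The sketch below does not implement either algorithm; it only computes the modularity score of a given partition, which is enough to see why a sensible grouping beats a trivial one. The graph (two triangles joined by one edge) and the function name are assumptions for the example.

```python
def modularity(edges, community_of):
    """Modularity of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m  -  (d_c / (2 * m)) ** 2
    where m is the edge count, L_c the edges inside c, and d_c the
    summed degree of the nodes in c.
    """
    m = len(edges)
    internal = {}
    degree = {}
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


# Two triangles (a, b, c) and (d, e, f) connected by the single edge c-d.
EDGES = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}
```

Splitting the graph into its two triangles gives Q = 5/14 (about 0.357), while putting every node in one community gives Q = 0. Each detected community would then be summarized for use as high-level context.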
Convert natural language questions to graph queries. An LLM generates a Cypher or SPARQL query from the user's question; the query is then executed against the knowledge graph and the results are formatted into an answer. Precise answers from structured data.
- Natural language to Cypher/SPARQL
- Schema-aware generation
- Query validation
- Result formatting
- Explainable answers
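The schema-aware generation and query validation steps can be sketched without calling a model. Below, `build_prompt` assembles a prompt from a hypothetical schema, and `validate` applies two crude regex checks to whatever Cypher comes back. Everything here (the schema, the prompt wording, the keyword list) is an assumption for illustration; regex checks are no substitute for a real parser or for read-only database credentials.

```python
import re

SCHEMA = {
    "nodes": {"Person": ["name"], "Company": ["name"]},
    "relationships": [("Person", "WORKS_AT", "Company")],
}

# Clauses that modify data; a generated query containing one is rejected.
WRITE_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|LOAD)\b", re.IGNORECASE
)


def build_prompt(question: str, schema: dict) -> str:
    """Describe the graph schema so the model only uses names that exist."""
    node_lines = [f"(:{label} {{{', '.join(props)}}})" for label, props in schema["nodes"].items()]
    rel_lines = [f"(:{s})-[:{r}]->(:{t})" for s, r, t in schema["relationships"]]
    return (
        "Write one read-only Cypher query for the question.\n"
        "Node types: " + "; ".join(node_lines) + "\n"
        "Relationships: " + "; ".join(rel_lines) + "\n"
        f"Question: {question}"
    )


def validate(query: str, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    if WRITE_CLAUSES.search(query):
        problems.append("query contains a write clause")
    known = set(schema["nodes"]) | {r for _, r, _ in schema["relationships"]}
    # Crude: treat any identifier directly after ':' as a label or relationship type.
    for name in re.findall(r":([A-Za-z_]\w*)", query):
        if name not in known:
            problems.append(f"unknown label or relationship type: {name}")
    return problems
```

Only queries for which `validate` returns an empty list would be executed; rejected queries can be fed back to the model along with the problem list for another attempt.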
Ontology & Schema Design
| Standard | Description | Use Case | Complexity |
|---|---|---|---|
| RDF (Resource Description Framework) | W3C standard for representing data as subject-predicate-object triples | Semantic web, linked data, interoperability | Medium |
| OWL (Web Ontology Language) | Expressive ontology language built on RDF for complex reasoning | Formal reasoning, inference, domain modeling | High |
| RDFS (RDF Schema) | Lightweight schema vocabulary for RDF class/property hierarchies | Simple taxonomies, basic inference | Low |
| Property Graph Model | Nodes and edges with properties (key-value pairs) | Application data, flexible schemas | Low |
| Schema.org | Shared vocabulary for structured data on web pages | SEO, web data extraction, common entities | Low |
Entity Resolution & Linking
Identify and merge duplicate entities that refer to the same real-world object. Use string similarity, embeddings, and rule-based matching. Critical for data quality in knowledge graphs.
- String similarity (Levenshtein, Jaro-Winkler)
- Embedding-based matching
- Blocking for scalability
- Merge strategies
- Conflict resolution
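A small sketch of string-similarity matching with blocking. Levenshtein distance is implemented directly; the blocking key (first character), the 0.8 threshold, and the sample names are arbitrary choices for the example and would need tuning on real data.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,                 # deletion
                cur[j - 1] + 1,              # insertion
                prev[j - 1] + (ca != cb),    # substitution (free if equal)
            ))
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalize edit distance to the range 0..1 (1 means identical)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def find_duplicates(names, threshold=0.8):
    """Compare names only within blocks to avoid all-pairs comparison."""
    blocks = {}
    for name in names:
        key = name.strip().lower()[:1]       # blocking key: first character
        blocks.setdefault(key, []).append(name)
    pairs = []
    for block in blocks.values():
        for a, b in combinations(block, 2):
            if similarity(a.lower(), b.lower()) >= threshold:
                pairs.append((a, b))
    return pairs


NAMES = ["Jon Smith", "John Smith", "Jane Doe", "Acme Corp", "Acme Corp."]
```

Blocking trades recall for speed: a typo in the first character would put two variants in different blocks, so production systems often combine several blocking keys.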
Connect mentions in text to entities in a knowledge base (e.g., Wikipedia, Wikidata). Disambiguation based on context. Essential for building knowledge from unstructured sources.
- Candidate generation
- Context-based disambiguation
- Wikidata/DBpedia linking
- NIL detection (new entities)
- Cross-lingual linking
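Candidate generation, context-based disambiguation, and NIL detection can be illustrated with a two-entry toy knowledge base. The identifiers `E1` and `E2`, the descriptions, and the word-overlap score are all assumptions for the example; real linkers use learned representations and entity popularity priors rather than raw token overlap.

```python
import re

KB = {
    "E1": {
        "label": "Paris",
        "aliases": {"paris"},
        "description": "capital city of France on the Seine",
    },
    "E2": {
        "label": "Paris Hilton",
        "aliases": {"paris", "paris hilton"},
        "description": "American media personality and businesswoman",
    },
}


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def link(mention: str, context: str, min_overlap: int = 1):
    """Return the best-matching entity id, or None for NIL (no confident match)."""
    # Candidate generation: every entity that lists the mention as an alias.
    candidates = [eid for eid, e in KB.items() if mention.lower() in e["aliases"]]
    if not candidates:
        return None
    # Disambiguation: score candidates by word overlap with the context.
    ctx = tokens(context)
    scored = [(len(ctx & tokens(KB[eid]["description"])), eid) for eid in candidates]
    best_score, best = max(scored)
    return best if best_score >= min_overlap else None
```

Returning `None` covers both NIL cases: a mention with no candidates at all, and a mention whose candidates have no support in the surrounding context. Either may signal a new entity to add to the graph.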
Establish canonical (preferred) forms for entities and relationships. Handle aliases, abbreviations, and alternative names. Enables consistent querying and data integration.
- Primary identifier selection
- Alias management
- Preferred label handling
- Cross-reference maintenance
- URI/IRI standards
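One way to picture canonicalization is an alias index that maps every known surface form to a primary identifier and preferred label. The `ex:` identifiers and the alias lists below are hypothetical, and case-folding is the only normalization applied.

```python
CANONICAL = {
    "ex:IBM": {
        "preferred": "IBM",
        "aliases": ["International Business Machines", "I.B.M.", "Big Blue"],
    },
    "ex:USA": {
        "preferred": "United States",
        "aliases": ["USA", "U.S.", "United States of America"],
    },
}


def build_alias_index(entities: dict) -> dict[str, str]:
    """Map every preferred label and alias (case-folded) to its identifier."""
    index: dict[str, str] = {}
    for ident, record in entities.items():
        for name in [record["preferred"], *record["aliases"]]:
            key = name.casefold()
            # An alias claimed by two entities would make lookups ambiguous.
            if index.setdefault(key, ident) != ident:
                raise ValueError(f"alias {name!r} is claimed by two entities")
    return index


ALIAS_INDEX = build_alias_index(CANONICAL)


def canonicalize(name: str):
    """Return (identifier, preferred label) for a known name, else None."""
    ident = ALIAS_INDEX.get(name.strip().casefold())
    return None if ident is None else (ident, CANONICAL[ident]["preferred"])
```

Queries and ingestion pipelines can then store and compare identifiers only, so "Big Blue" and "I.B.M." resolve to the same node.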
Knowledge Graph Tools
GraphRAG is Microsoft's open-source implementation of graph-based RAG. It builds knowledge graphs from documents using LLMs, performs community detection, and supports both local and global queries.
- Automatic graph construction
- Community detection & summarization
- Local + global query modes
- Hierarchical indexing
- Open source (Python)
LlamaIndex's knowledge graph index for RAG. Extracts triples from documents, stores in graph, and combines graph traversal with vector retrieval for enhanced context.
- Triple extraction from docs
- Multiple graph store backends
- Hybrid retrieval
- Natural language querying
- LlamaIndex integration
Protégé is Stanford's open-source ontology editor. It provides a visual interface for creating OWL ontologies and is widely used for ontology development, with reasoning and visualization capabilities.
- Visual ontology editing
- OWL 2 support
- Reasoner integration (HermiT, Pellet)
- Plugin ecosystem
- Collaborative editing
spaCy is an NLP library with entity linking capabilities. It extracts entities from text and links them to knowledge bases, making it a foundation for building knowledge graph pipelines from unstructured data.
- Named Entity Recognition
- Entity linking to Wikidata/custom KB
- Relation extraction (via extensions)
- Fast processing
- Python ecosystem
RDFLib is a Python library for working with RDF. It parses, serializes, and queries RDF data, and lets you build knowledge graphs programmatically with support for multiple serialization formats.
- RDF parsing/serialization
- SPARQL queries
- Multiple formats (Turtle, N-Triples, JSON-LD)
- Graph operations
- OWL-RL inference (via the companion owlrl package)
NetworkX is a Python library for graph analysis. It is not a database, but it is essential for graph algorithms, analysis, and visualization, and useful for prototyping and analyzing knowledge graph structure.
- Graph algorithms (centrality, paths)
- Community detection
- Visualization integration
- In-memory processing
- Scientific computing
