
Codex-V Knowledge Engine

A local-first, privacy-centric knowledge management system that transforms scattered technical content into an intelligent, searchable second brain.


Your Personal Technical Knowledge Engine

Codex-V is a local-first knowledge management system designed for software engineers and technical professionals. It automatically ingests, analyzes, and organizes technical content from diverse sources into a searchable knowledge base with AI-powered chat capabilities.

The Problem We Solve

Information Overload

Technical professionals accumulate knowledge across dozens of sources: bookmarks, self-sent emails, saved videos, research papers, GitHub repos. This scattered information becomes increasingly difficult to find and leverage.
  • Bookmarks saved but never revisited
  • Links emailed to yourself, buried in your inbox
  • YouTube tutorials watched but details forgotten
  • Research papers downloaded but never searchable
  • GitHub repos starred but their purpose forgotten

Codex-V transforms this chaos into an intelligent, queryable knowledge base.


How It Works


flowchart LR
    subgraph Discovery["1. Discovery"]
        A[Email with URLs] --> B[IMAP Polling]
        C[Manual URL] --> B
        B --> D[URL Classification]
    end

    subgraph Extraction["2. Extraction"]
        D --> E{Source Type}
        E -->|GitHub| F[API + README]
        E -->|YouTube| G[Transcript]
        E -->|ArXiv| H[LaTeX/PDF]
        E -->|Blog| I[Readability]
        E -->|Reddit| J[Post + Comments]
    end

    subgraph Analysis["3. AI Analysis"]
        F & G & H & I & J --> K[Local LLM]
        K --> L[Summary]
        K --> M[Concepts]
        K --> N[Relevance]
    end

    subgraph Storage["4. Knowledge Base"]
        L & M & N --> O[PostgreSQL]
        O --> P[pgvector]
        P --> Q[Semantic Search]
    end

    style Discovery fill:#e0f2fe,stroke:#0284c7
    style Extraction fill:#fef3c7,stroke:#d97706
    style Analysis fill:#dcfce7,stroke:#16a34a
    style Storage fill:#f3e8ff,stroke:#9333ea

    

The Workflow

  1. Send yourself an email with links to interesting content
  2. Codex-V polls your inbox and discovers new URLs
  3. Specialized extractors fetch content based on source type
  4. Local LLM analyzes and generates summaries, concepts, relevance scores
  5. Content is embedded and stored in a vector database
  6. Search semantically or chat with your knowledge base
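Steps 1–2 above can be sketched in a few lines. This is a minimal illustration of URL discovery from a polled email body, not the actual Codex-V parser (which would also walk MIME parts); the function and regex names are hypothetical.

```python
import re

# Simplified http(s) URL matcher for a plain-text email body.
URL_RE = re.compile(r"https?://[^\s<>\")]+")

def discover_urls(email_body: str) -> list[str]:
    """Return unique URLs from an email body, preserving first-seen order."""
    seen: dict[str, None] = {}
    for url in URL_RE.findall(email_body):
        # Trim trailing punctuation that sentence context attaches to links.
        seen.setdefault(url.rstrip(".,;"), None)
    return list(seen)
```

Deduplication matters here: the same link often arrives in both the text and HTML parts of one email.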

Key Features

Multi-Source Ingestion

Supported Sources

Codex-V automatically detects and uses the optimal extraction strategy for each source type.
| Source | Extraction Method | Special Features |
| --- | --- | --- |
| YouTube | Transcript API / Whisper | Timestamped chunks, deep links to moments |
| GitHub | API + selective cloning | README, stars, language, file analysis |
| ArXiv | LaTeX source or PDF | Math preservation, author/abstract extraction |
| Reddit | JSON API | Post + top comments with consensus analysis |
| Blogs | Playwright + Readability | Cookie wall bypass, paywall support via cookies |
| Documentation | Browser rendering | Full page extraction with code blocks |
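The source-type routing could be sketched as a hostname lookup. This is an illustrative simplification, not the production classifier (which presumably also inspects URL paths, e.g. to tell a repo from a gist); the rule table and function names are assumptions.

```python
from urllib.parse import urlparse

# Hostname-to-extractor routing (illustrative only).
HOST_RULES = {
    "github.com": "github",
    "youtube.com": "youtube",
    "youtu.be": "youtube",
    "arxiv.org": "arxiv",
    "reddit.com": "reddit",
}

def classify_url(url: str) -> str:
    """Map a URL to a source type; unknown hosts fall back to the blog extractor."""
    host = urlparse(url).hostname or ""
    return HOST_RULES.get(host.removeprefix("www."), "blog")
```

The fallback case is what makes the system robust: any unrecognized site still gets generic Readability extraction rather than being dropped.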

AI-Powered Analysis

Every piece of content is analyzed by a local LLM to generate:

  • Concise Summary: 2-3 sentences capturing the key value
  • Concept Tags: Automatically extracted topics and technologies
  • Relevance Score: How well it matches your interests
  • Key Insights: Actionable takeaways
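The analysis step amounts to parsing the LLM's structured output into a typed record. A minimal sketch, assuming the model is prompted to return JSON; the field names and the `Analysis` type are illustrative, not taken from the codebase.

```python
import json
from dataclasses import dataclass

@dataclass
class Analysis:
    summary: str
    concepts: list[str]
    relevance: float  # 0.0-1.0: how well the item matches your interests
    insights: list[str]

def parse_analysis(raw: str) -> Analysis:
    """Parse the LLM's JSON response, clamping relevance to a valid range."""
    data = json.loads(raw)
    return Analysis(
        summary=data.get("summary", ""),
        concepts=data.get("concepts", []),
        relevance=min(max(float(data.get("relevance", 0.0)), 0.0), 1.0),
        insights=data.get("insights", []),
    )
```

Defensive defaults and clamping are worth the extra lines: local models occasionally emit out-of-range scores or omit fields.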

Semantic Search & RAG Chat

Find by Meaning, Not Keywords

Vector embeddings enable searching by concept rather than exact text matches. Ask questions in natural language and get answers with source citations.

Search: “How do I implement mutex locking in Go?”

Chat Response: “Based on your saved YouTube video ‘Concurrency in Go’ and the GitHub repo ‘awesome-go-patterns’, mutex locking involves…”

[Sources cited with links and timestamps]
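Under the hood, semantic search ranks stored embeddings by cosine similarity to the query embedding. In production this is delegated to pgvector; the toy in-memory version below (with 2-dimensional vectors standing in for 384-dimensional ones) just shows the ranking logic.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either is zero-length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def search(query_vec: list[float], items: list[dict], top_k: int = 3) -> list[dict]:
    """Return the stored items most similar to the query embedding."""
    ranked = sorted(items, key=lambda it: cosine(query_vec, it["embedding"]),
                    reverse=True)
    return ranked[:top_k]
```

With pgvector the same ranking is a single `ORDER BY embedding <=> query` clause, with an index doing the heavy lifting.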

Trend Detection & Insights

Codex-V analyzes your ingestion patterns to identify:

  • Emerging Interests: Topics appearing more frequently
  • Key Themes: Dominant concepts in your recent reading
  • Learning Patterns: How your focus evolves over time
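One way to sketch the "emerging interests" signal: compare concept-tag frequencies between a recent window and an older one. This is an assumed simplification of whatever heuristic Codex-V actually uses; the function name and threshold are hypothetical.

```python
from collections import Counter

def emerging_topics(recent_tags: list[str], older_tags: list[str],
                    min_count: int = 2) -> list[str]:
    """Concepts whose frequency grew from the older window to the recent one,
    sorted by how much they grew."""
    recent, older = Counter(recent_tags), Counter(older_tags)
    return sorted(
        (tag for tag, n in recent.items()
         if n >= min_count and n > older.get(tag, 0)),
        key=lambda t: recent[t] - older.get(t, 0),
        reverse=True,
    )
```

The `min_count` floor keeps one-off saves from registering as a trend.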

Email Group Detection

When an email contains multiple related URLs (paper + code + demo), Codex-V automatically groups them together, making it easy to see the full context of related resources.
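Grouping is straightforward once each discovered URL retains the message it arrived in. A minimal sketch, assuming `(message_id, url)` pairs from the discovery step; the data shape is illustrative.

```python
def group_by_email(discovered: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group discovered URLs by source message; only messages carrying more
    than one URL form a related-resource group."""
    groups: dict[str, list[str]] = {}
    for message_id, url in discovered:
        groups.setdefault(message_id, []).append(url)
    return {mid: urls for mid, urls in groups.items() if len(urls) > 1}
```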


Privacy-First Architecture

Your Data Never Leaves Your Machine

Unlike cloud-based solutions, Codex-V runs entirely locally. Your knowledge base, embeddings, and LLM interactions stay on your hardware.

Local-Only Components

  • PostgreSQL + pgvector: All data stored locally
  • Local LLM via Ollama/LM Studio: No API calls to external services
  • Local Embeddings: sentence-transformers running on your machine
  • Desktop App: Native Wails application, no browser required

Optional External Services

  • Authenticated scraping: Uses your browser cookies for paywalled content
  • Publishing LLM: Optional high-quality model for content generation

Technology Stack

| Component | Technology | Purpose |
| --- | --- | --- |
| Frontend | Wails v2 + Vue 3 + TailwindCSS | Native desktop application |
| Backend | Python FastAPI | API, content extraction, LLM orchestration |
| Database | PostgreSQL 17 + pgvector | Storage and vector search |
| LLM | Ollama / LM Studio | Local inference (Nemotron, Qwen, Llama) |
| Embeddings | sentence-transformers | all-MiniLM-L6-v2 (384 dimensions) |
| Transcription | faster-whisper | Local GPU-accelerated transcription |
| Browser | Playwright | Authenticated and JavaScript-rendered content |

Content Studio

Beyond knowledge management, Codex-V includes a Content Studio for technical content creators:

  1. Topic Proposals: AI analyzes recent ingestions and suggests blog/post topics
  2. Content Generation: Creates publish-ready blog posts and social threads
  3. Source Attribution: Links back to the knowledge that inspired the content
  4. Publishing Pipeline: Integration with Hugo, social platforms, and more

Use Cases

For Software Engineers

  • Capture interesting repos, tutorials, and documentation
  • Search your personal knowledge during development
  • Track emerging technologies in your field

For Researchers

  • Organize papers, preprints, and technical blogs
  • Query across your reading history
  • Generate literature review summaries

For Technical Writers

  • Build a knowledge base of source material
  • Generate content ideas from recent reading
  • Maintain attribution to original sources

For Team Leads

  • Curate learning resources for your team
  • Track industry trends and emerging technologies
  • Build institutional knowledge repositories

Getting Started

Quick Setup

Codex-V requires Docker (for PostgreSQL), Ollama or LM Studio (for LLM), and the desktop application.

Prerequisites

  1. Docker Desktop - For PostgreSQL with pgvector
  2. Ollama - ollama pull nemotron or your preferred model
  3. Python 3.11+ - For the backend services

Installation

# Clone the repository
git clone https://github.com/smarttechlabs-projects/codex-v

# Start PostgreSQL
docker-compose up -d

# Install Python dependencies
cd backend && pip install -r requirements.txt

# Run database migrations
alembic upgrade head

# Start the application
./scripts/start_dev.sh

Configuration

  1. Add your email source in Settings > Email Sources
  2. Configure LLM endpoint (Ollama default: http://localhost:11434/v1)
  3. Send yourself an email with interesting URLs
  4. Watch Codex-V build your knowledge base
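For step 2, Ollama exposes an OpenAI-compatible API, so a RAG chat call is an ordinary chat-completions request with retrieved context injected into the system message. A sketch of the request body only (nothing is sent here); the prompt wording and default model are illustrative assumptions.

```python
import json

OLLAMA_BASE = "http://localhost:11434/v1"  # OpenAI-compatible endpoint

def build_chat_request(question: str, context_chunks: list[str],
                       model: str = "nemotron") -> str:
    """Assemble a chat-completions payload that grounds the answer in
    retrieved context. Would be POSTed to f"{OLLAMA_BASE}/chat/completions"."""
    context = "\n\n".join(context_chunks)
    return json.dumps({
        "model": model,
        "messages": [
            {"role": "system",
             "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    })
```

Keeping the context in the system message (rather than the user turn) makes it easy to cite which chunks were supplied when rendering source links.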

Roadmap

Current Features

  • Multi-source ingestion (YouTube, GitHub, ArXiv, Reddit, blogs)
  • Local LLM analysis and summarization
  • Semantic search with pgvector
  • RAG chat with source citations
  • Trend detection and daily insights
  • Content Studio for topic proposals

Planned Features

  • Browser extension for one-click capture
  • Mobile companion app for search
  • Team collaboration features
  • Export to Obsidian/Notion formats
  • Advanced graph visualization

Ready to build your second brain? Contact us for a demo or check out the GitHub repository.