CortexOS — Complete Technical Documentation

🤖

What is CortexOS?

CortexOS is a personal AI operating system that acts as your intelligent assistant across every communication channel. Unlike simple chatbots, CortexOS can see, hear, speak, remember, and take action — managing your email, calendar, files, messages, and more.

📧

Email Management

"Read my latest emails and reply to the one from Marco"

📅

Calendar Control

"Schedule a meeting with the team for tomorrow at 3 PM"

👁️

Vision AI

"What do you see in front of me?" — using your live camera

🔍

Web Research

"Research the latest AI news and summarize the top 5 articles"

💳

Billing & Invoices

"Send an invoice for $500 to the client"

🧠

Long-Term Memory

"Remember that I prefer meetings in the afternoon"

🌐

6 Communication Channels

Talk to CortexOS however you want — by text, voice, video, or even a phone call. Every channel has access to the same brain, the same tools, and the same memory.

💬

Web Chat

Browser-based chat at /chat

✈️

Telegram

Full bot with photo & voice support

📱

WhatsApp

Via Evolution API — text and media

📞

Phone Call

Twilio SIP — natural voice conversation

🗣️

Voice Agent

WebRTC real-time voice via Daily

👁️

Vision Agent

Live camera + voice — CortexOS can see

🛠️

Complete Tool Suite

Over 25 built-in tools, plus the ability to create new ones on the fly. CortexOS doesn't just talk — it takes action.

📋 Productivity & Google Workspace

📧

Gmail Manager

Read, search, and send emails. Shows the primary inbox by default. Supports full Gmail search queries: from, to, labels, unread, attachments.

list_emails · read_email · send_email

📅

Calendar Manager

View upcoming events, create meetings with timezone support, delete events. Multi-calendar support with custom date ranges.

list_events · create_event · delete_event

✅

Tasks Manager

Full Google Tasks CRUD — create, list, complete, and delete tasks. Supports multiple task lists and due dates.

list_tasks · create_task · complete_task · delete_task

📁

Google Drive

Create, read, update, search, and list files on Google Drive. Save reports, notes, and documents directly from a conversation.

drive_create · drive_read · drive_update · drive_list

👥

Google Contacts

Save new contacts with full details (name, email, phone, company, title) and search existing contacts by any field.

save_contact · search_contacts

📇

Business Card Scanner

Take a photo of a business card — CortexOS reads it with vision AI and automatically saves the contact to Google Contacts.

scan_business_card

💬 Communication

📱

WhatsApp Messages

Send WhatsApp text messages to any phone number via the Evolution API integration. Phone numbers must be in international format.

send_whatsapp

📲

SMS via ClickSend

Send professional SMS text messages worldwide. Supports a customizable sender ID and international phone format, with a 480-character limit per message.

send_sms

🧠 Intelligence & Research

🔍

Web Search

Real-time internet search via DuckDuckGo + Tavily. Research topics, check news, find factual data that CortexOS doesn't have locally.

web_search · tavily_search · tavily_research

🌐

Browser Automation

Full headless browser: navigate, click, type, fill forms, take screenshots, extract text and links. Multi-step web interactions.

browse_web · browser_action

🧠

Archival Memory

Semantic long-term memory powered by ChromaDB. Stores and recalls preferences, facts, and conversation history across sessions. Per-user isolation.

store_memory · search_archival_memory

⏰

Time & Date

Current time awareness for scheduling, timezone conversions, and time-sensitive operations.

get_current_time

💰 Business & Billing

💳

Square Billing

Create invoices, process payments, manage customers via Square POS integration. Supports sandbox and production environments.

square_billing

⚙️

Config Manager

CortexOS can modify its own instructions, daily briefing routines, and user profiles at runtime. Self-improving system.

manage_config

🧬 Self-Evolution

🔧

Dynamic Tool Creator

CortexOS can create new tools on its own. Need a tool that doesn't exist? Just describe it: CortexOS writes the Python code, tests it in a sandbox, registers it, and makes it available instantly. If any step fails, the change is rolled back automatically.

create_tool · delete_tool

🔌

MCP Integration

Model Context Protocol (MCP) server support for extending capabilities. Currently connected to Tavily for advanced web research with search, extract, crawl, and map tools.

tavily_search · tavily_extract · tavily_crawl · tavily_map

⚡

How It Works

CortexOS uses a "Think → Act → Observe" loop to solve problems step by step, just like a human assistant would.

💭 You ask a question → 🧠 CortexOS thinks → 🛠️ Uses tools if needed → 👀 Observes result → 💬 Responds naturally

💡 Example: "Check what's on my schedule today and email the summary to Marco"

Step 1 — Think: I need to check the calendar and then compose an email.
Step 2 — Act: Calls list_events for today's date.
Step 3 — Observe: Gets 3 events: standup at 9, client call at 11, team review at 3.
Step 4 — Act: Calls search_contacts for "Marco" to find his email.
Step 5 — Observe: Marco's email is marco@company.com.
Step 6 — Act: Calls send_email with a formatted summary.
Step 7 — Reply: "Done! I sent Marco your schedule for today."
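
The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the code in agent_core.py: the `decide` callback stands in for the LLM, and `Step`, `run_agent`, `scripted_decide`, and the fake tools are hypothetical names introduced for the example. The 10-iteration limit mirrors the safety cap listed under Technical Specifications.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_ITERATIONS = 10  # safety cap, as listed in the technical specifications


@dataclass
class Step:
    """One decision from the model: call a tool or give the final reply."""
    tool: Optional[str] = None    # name of the tool to call, if any
    args: Optional[dict] = None   # keyword arguments for that tool
    reply: Optional[str] = None   # final answer, set when no tool is needed


def run_agent(
    question: str,
    decide: Callable[[str, list], Step],  # hypothetical stand-in for the LLM
    tools: dict[str, Callable[..., str]],
) -> str:
    observations: list = []  # (tool name, result) pairs seen so far
    for _ in range(MAX_ITERATIONS):
        step = decide(question, observations)            # Think
        if step.reply is not None:
            return step.reply                            # Respond
        result = tools[step.tool](**(step.args or {}))   # Act
        observations.append((step.tool, result))         # Observe
    return "Stopped: iteration limit reached."


# Scripted "model" that replays the schedule example one step at a time.
def scripted_decide(question: str, observations: list) -> Step:
    if len(observations) == 0:
        return Step(tool="list_events", args={"date": "today"})
    if len(observations) == 1:
        return Step(tool="search_contacts", args={"name": "Marco"})
    if len(observations) == 2:
        events, email = observations[0][1], observations[1][1]
        return Step(tool="send_email", args={"to": email, "body": events})
    return Step(reply="Done! I sent Marco your schedule for today.")


sent = []  # records what the fake send_email tool was asked to send
fake_tools = {
    "list_events": lambda date: "standup at 9, client call at 11, team review at 3",
    "search_contacts": lambda name: "marco@company.com",
    "send_email": lambda to, body: sent.append((to, body)) or "sent",
}

answer = run_agent("What's on my schedule today?", scripted_decide, fake_tools)
```

Keeping the model behind a plain callback makes the loop easy to test with a scripted decision function, and the iteration cap guarantees the loop ends even if the model never produces a final reply.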

🏗️

Technical Architecture

For developers and technical users — how CortexOS is built under the hood.

Communication Layer: 💬 Web Chat · ✈️ Telegram Bot · 📱 WhatsApp · 📞 Twilio Voice · 🗣️ Voice (Daily) · 👁️ Vision (WebRTC)

▼

Core Engine: 🧠 Agent Core (ReAct Loop) · 🔀 LLM Router (Failover) · 🔐 Auth Manager (OAuth2)

▼

Tool Layer: 📧 Gmail · 📅 Calendar · 📁 Drive · 🔍 Web Search · 💳 Square · 🔧 Dynamic Tools

▼

Data Layer: 🗄️ SQLite (Conversations, Logs) · 🧠 ChromaDB (Vector Memory) · 📄 Config Files (.env, Identity)

⚙️

Technical Specifications

🧠 AI Models & Routing

Primary: Google Gemini 2.0 Flash (generativelanguage API)
Fallback: Anthropic Claude Sonnet 4
Local: Ollama / vLLM (OpenAI-compatible)
Voice/Vision: Gemini 2.5 Flash Native Audio (bidi)
Automatic failover: Primary → Fallback → Local
ReAct loop with max 10 iterations safety cap
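
The Primary → Fallback → Local failover order can be illustrated with a small router. This is a sketch under assumptions, not the contents of llm_router.py: the provider functions are stubs rather than real API clients, and `complete_with_failover` and `AllProvidersFailed` are hypothetical names.

```python
from typing import Callable, Sequence


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has failed."""


def complete_with_failover(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try providers in order; return (name, completion) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real router would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stub providers standing in for the real model clients (hypothetical).
def primary(prompt: str) -> str:
    raise TimeoutError("primary timed out")


def fallback(prompt: str) -> str:
    return f"fallback answer to: {prompt}"


def local(prompt: str) -> str:
    return f"local answer to: {prompt}"


chain = [("primary", primary), ("fallback", fallback), ("local", local)]
used, text = complete_with_failover("hello", chain)
```

Here the primary stub fails, so the call is answered by the fallback stub. Collecting the error messages along the way keeps the final exception informative when every provider is down.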

🏛️ Framework & Runtime

Backend: FastAPI (Python 3.12, async)
Voice/Vision: Pipecat (WebRTC pipelines)
WebRTC: Daily.co transport
Container: Docker + Docker Compose
Reverse proxy: Traefik (auto SSL)
Config: Pydantic-Settings with .env

🗄️ Databases

SQLite: Conversations, logs, audit trail (WAL mode)
ChromaDB: Semantic vector memory (per-user)
Tables: conversations, logs, agent_state, tool_executions
Collections: agent_memory, tool_docs
Automatic daily backups to Google Drive
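
As a rough illustration of the SQLite setup, the sketch below opens a database in WAL mode and creates the four tables named above. Only the table names come from this page; every column, the `open_database` helper, and the demo file path are assumptions made for the example.

```python
import os
import sqlite3
import tempfile

# Column layouts are assumptions; only the table names come from the spec.
SCHEMA = """
CREATE TABLE IF NOT EXISTS conversations (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id TEXT NOT NULL,
    role TEXT NOT NULL,
    content TEXT NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    level TEXT NOT NULL,
    message TEXT NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS agent_state (
    key TEXT PRIMARY KEY,
    value TEXT
);
CREATE TABLE IF NOT EXISTS tool_executions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    tool_name TEXT NOT NULL,
    arguments TEXT,
    success INTEGER NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def open_database(path: str) -> sqlite3.Connection:
    """Open (or create) the database file, switch it to WAL, and apply the schema."""
    conn = sqlite3.connect(path)
    # WAL only applies to file-backed databases, not to ":memory:".
    conn.execute("PRAGMA journal_mode=WAL")
    conn.executescript(SCHEMA)
    return conn


db_path = os.path.join(tempfile.mkdtemp(), "cortex_demo.db")
conn = open_database(db_path)
journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
tables = sorted(
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    )
)
```

WAL mode lets readers keep working while a write is in progress, which suits an async backend where several channels may touch the database at once.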

🔐 Security & Authentication

OAuth2: Google authorization code flow
Scopes: Gmail, Calendar, Tasks, Drive, Contacts
Token auto-refresh via token_manager.py
Web login: Google Sign-In with signed session cookies
Role-based access: admin vs user dashboard
Destructive actions require user confirmation
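
This page does not say which library signs the session cookies, so the following is only a sketch of the general idea using Python's standard `hmac` module. The `sign_session` and `verify_session` helpers, the cookie layout, and the placeholder key are all assumptions.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

SECRET_KEY = b"change-me"  # placeholder; a real deployment would load this from .env


def sign_session(data: dict, key: bytes = SECRET_KEY) -> str:
    """Serialize session data and append an HMAC-SHA256 signature."""
    payload = base64.urlsafe_b64encode(json.dumps(data, sort_keys=True).encode())
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return f"{payload.decode()}.{signature}"


def verify_session(cookie: str, key: bytes = SECRET_KEY) -> Optional[dict]:
    """Return the session data if the signature matches, otherwise None."""
    payload, sep, signature = cookie.rpartition(".")
    if not sep:
        return None  # no signature part at all
    expected = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many characters matched via timing.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))


cookie = sign_session({"email": "user@example.com", "role": "admin"})
tampered = "x" + cookie  # any change to the payload invalidates the signature
```

Signing does not hide the cookie's contents; it only lets the server detect tampering, for example a user trying to change their role from user to admin.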

🧠 Memory Architecture

Working Memory: Identity (IDENTITY.md) + User Profile (USER_PROFILE.md)
Archival Memory: ChromaDB semantic search
Conversation History: SQLite (last 20 messages)
Per-user memory isolation (by telegram_id)
Agent can store and recall memories via tools
Self-modifying config: EXTRA_INSTRUCTIONS.md
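
One plausible way to combine working memory with the last 20 messages is shown below. It is a sketch only: the `conversations` columns, the `user_id` key (the page says isolation is by telegram_id), and the `recent_history` and `build_prompt` helpers are assumptions rather than the actual schema or API.

```python
import sqlite3

HISTORY_LIMIT = 20  # "last 20 messages" from the memory architecture


def recent_history(conn: sqlite3.Connection, user_id: str, limit: int = HISTORY_LIMIT):
    """Return one user's most recent messages, oldest first, as (role, content)."""
    rows = conn.execute(
        "SELECT role, content FROM conversations "
        "WHERE user_id = ? ORDER BY id DESC LIMIT ?",
        (user_id, limit),
    ).fetchall()
    return rows[::-1]  # flip so the prompt reads in chronological order


def build_prompt(identity: str, profile: str, history) -> str:
    """Combine working memory (identity + profile) with recent history."""
    lines = [identity, profile]
    lines += [f"{role}: {content}" for role, content in history]
    return "\n".join(lines)


# Demo data: 25 messages for one user and one message for another.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE conversations ("
    "id INTEGER PRIMARY KEY, user_id TEXT, role TEXT, content TEXT)"
)
for i in range(25):
    conn.execute(
        "INSERT INTO conversations (user_id, role, content) VALUES (?, ?, ?)",
        ("alice", "user", f"message {i}"),
    )
conn.execute(
    "INSERT INTO conversations (user_id, role, content) VALUES (?, ?, ?)",
    ("bob", "user", "bob's message"),
)

history = recent_history(conn, "alice")
```

Filtering by user in the query itself, rather than after loading rows, is what keeps one user's history out of another user's prompt.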

🔧 Dynamic Tool System

Agent generates Python code from description
Sandbox testing via py_compile (a compile-time syntax check; the code is not executed at this step)
Auto-registration in available_tools.json
Timestamped backups before changes
Automatic rollback on failure
Dynamic import via importlib at runtime
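
The steps above might fit together roughly as follows. This is a sketch, not tool_creator.py: the `install_tool` function, the `run` entry-point convention, the registry entry format, and the backup file naming are assumptions. Note that `exec_module` runs the tool's top-level code in the current process, so this example provides no isolation of its own.

```python
import importlib.util
import json
import py_compile
import shutil
import tempfile
import time
from pathlib import Path


def install_tool(name: str, source: str, tools_dir: Path, registry: Path) -> bool:
    """Write, syntax-check, import, and register a tool; roll back on any failure."""
    tool_path = tools_dir / f"{name}.py"
    backup = registry.with_name(f"{registry.name}.{int(time.time())}.bak")
    shutil.copy2(registry, backup)  # timestamped backup before any change
    try:
        tool_path.write_text(source)
        py_compile.compile(str(tool_path), doraise=True)  # syntax check only
        spec = importlib.util.spec_from_file_location(name, tool_path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # dynamic import at runtime
        # Requiring a callable named "run" is an assumption of this sketch.
        if not callable(getattr(module, "run", None)):
            raise ValueError("tool must define a callable named run")
        tools = json.loads(registry.read_text())
        tools[name] = {"path": str(tool_path)}
        registry.write_text(json.dumps(tools, indent=2))
        return True
    except Exception:
        shutil.copy2(backup, registry)     # restore the registry from backup
        tool_path.unlink(missing_ok=True)  # remove the broken tool file
        return False


# Demo in a temporary directory: one valid tool, one with a syntax error.
workdir = Path(tempfile.mkdtemp())
registry_file = workdir / "available_tools.json"
registry_file.write_text("{}")

ok = install_tool("greet", "def run(name):\n    return f'hello {name}'\n", workdir, registry_file)
bad = install_tool("broken", "def run(:\n", workdir, registry_file)
registered = json.loads(registry_file.read_text())
```

After both calls the registry contains only the valid tool, and the broken file has been removed, which is the behavior the rollback step is meant to guarantee.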

📡 Channel Integrations

Telegram: python-telegram-bot (async webhook)
WhatsApp: Evolution API (self-hosted)
Voice: Pipecat + Daily.co + Gemini bidi API
Vision: Camera → 1 FPS JPEG → Gemini realtime_input
Phone: Twilio SIP + Pipecat pipeline
Web: Jinja2 templates + WebSocket

📦 Infrastructure

Server: VPS (96 GB disk, Ubuntu)
Containers: CortexOS, n8n, Evolution API, PostgreSQL, Traefik
CI/CD: Git push → deploy.sh → Docker rebuild
Backups: Daily at 2 AM → Google Drive (14 retained)
Maintenance: Weekly Docker prune (Sundays 3 AM)
Monitoring: /health endpoint + structured JSON logs
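
The "14 retained" rule amounts to deleting everything but the newest 14 archives. The sketch below applies that rule to a local directory purely for illustration; the real backups live on Google Drive, and the `prune_backups` helper and the `cortexos-YYYY-MM-DD.tar.gz` naming are assumptions.

```python
import tempfile
from pathlib import Path

RETAINED_BACKUPS = 14  # from the infrastructure list


def prune_backups(backup_dir: Path, keep: int = RETAINED_BACKUPS) -> list[str]:
    """Delete all but the newest `keep` archives; return the names removed."""
    # ISO-dated names sort chronologically, so a plain name sort is enough.
    archives = sorted(backup_dir.glob("cortexos-*.tar.gz"), key=lambda p: p.name)
    stale = archives[:-keep] if keep > 0 else archives
    for path in stale:
        path.unlink()
    return [path.name for path in stale]


# Demo: 16 daily archives, so the 2 oldest should be removed.
demo_dir = Path(tempfile.mkdtemp())
for day in range(1, 17):
    (demo_dir / f"cortexos-2025-01-{day:02d}.tar.gz").touch()

removed = prune_backups(demo_dir)
remaining = sorted(p.name for p in demo_dir.iterdir())
```

Sorting by an ISO date in the file name avoids relying on file modification times, which can change when archives are copied or restored.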

👁️

Vision + Voice Pipeline

The most advanced feature — real-time video and voice conversation with AI that can see your camera.

📷 Camera → Daily WebRTC → JPEG 1 FPS → Gemini bidi API → Native Audio → 🔊 Speaker

🔬 How the Vision Pipeline Works

User clicks "Start Session" — creates a Daily.co WebRTC room
Bot joins the room as "CortexOS" with audio output enabled
User joins → capture_participant_video() starts decoding their camera frames
Frames flow through Pipecat pipeline as InputImageRawFrame objects (1 FPS)
Each frame is JPEG-encoded and sent to Gemini via send_realtime_input(video=Blob)
Gemini processes audio + video simultaneously via the bidirectional streaming API
Bot responds with native audio (speech generated directly by the model, with no separate TTS stage)
All 25+ tools are available during the vision session — CortexOS can see AND act
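
Reducing a camera stream to 1 FPS comes down to dropping frames that arrive too soon after the last one forwarded. The class below sketches that idea in plain Python; it is not Pipecat's API, and `FrameThrottle` is a hypothetical name. In the actual pipeline the frame rate is presumably set through the transport or capture configuration.

```python
from typing import Optional


class FrameThrottle:
    """Pass through at most `fps` frames per second and drop the rest."""

    def __init__(self, fps: float = 1.0) -> None:
        self.min_interval = 1.0 / fps
        self.last_sent: Optional[float] = None  # timestamp of the last forwarded frame

    def should_send(self, timestamp: float) -> bool:
        """Return True if a frame captured at `timestamp` (seconds) should be forwarded."""
        if self.last_sent is None or timestamp - self.last_sent >= self.min_interval:
            self.last_sent = timestamp
            return True
        return False


# Simulate a 30 FPS camera for three seconds using integer frame indices.
throttle = FrameThrottle(fps=1.0)
forwarded = [i for i in range(90) if throttle.should_send(i / 30)]
```

With a 30 FPS source, only frames 0, 30, and 60 are forwarded. Sending one frame per second keeps bandwidth and model input small while still giving the model a current view of the scene.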

📂

Project Structure

CortexOS/
├── app/
│   ├── main.py              ← FastAPI entry point, routing
│   ├── agent_core.py        ← ReAct brain (1440 lines)
│   ├── llm_router.py        ← Multi-model failover
│   ├── config_manager.py    ← Pydantic-Settings config
│   ├── telegram_bot.py      ← Telegram webhook handler
│   ├── whatsapp_handler.py  ← Evolution API handler
│   ├── voice_agent.py       ← Pipecat voice pipeline
│   ├── vision_agent.py      ← Pipecat vision+voice pipeline
│   ├── mcp_client.py        ← MCP server connector
│   ├── auth/
│   │   ├── oauth_handler.py ← Google OAuth2 flow
│   │   └── token_manager.py ← Token refresh & persistence
│   ├── db/
│   │   ├── sqlite_manager.py ← SQLite schema & queries
│   │   └── chroma_manager.py ← ChromaDB vector store
│   └── tools/
│       ├── google_gmail.py  ← Gmail API integration
│       ├── google_calendar.py
│       ├── google_tasks.py
│       ├── google_drive.py
│       ├── web_search.py    ← DuckDuckGo search
│       ├── browser_tool.py  ← Playwright headless browser
│       ├── square_billing.py
│       ├── clicksend_sms.py
│       ├── business_card_scanner.py
│       ├── tool_creator.py  ← Dynamic tool generation
│       └── dynamic_tools/   ← Auto-generated tools
├── config/
│   ├── .env                 ← API keys (Setup Wizard)
│   ├── core_instructions.md ← System prompt
│   └── available_tools.json ← Tool registry
├── templates/              ← Jinja2 HTML templates
├── static/                 ← CSS, JS, icons
├── scripts/
│   ├── backup.sh            ← Daily backup → Google Drive
│   └── gdrive_upload.py     ← Drive upload utility
├── docker-compose.prod.yml
├── Dockerfile
└── requirements.txt