CortexOS Logo

CortexOS

Your AI Operating System — Voice, Vision, and Total Control of Your Digital Life

🧠 AI Agent 🗣️ Voice + Vision ⚡ 25+ Tools
🤖

What is CortexOS?

CortexOS is a personal AI operating system that acts as your intelligent assistant across every communication channel. Unlike simple chatbots, CortexOS can see, hear, speak, remember, and take action — managing your email, calendar, files, messages, and more.

📧

Email Management

"Read my latest emails and reply to the one from Marco"

📅

Calendar Control

"Schedule a meeting with the team for tomorrow at 3 PM"

👁️

Vision AI

"What do you see in front of me?" — using your live camera

🔍

Web Research

"Research the latest AI news and summarize the top 5 articles"

💳

Billing & Invoices

"Send an invoice for $500 to the client"

🧠

Long-Term Memory

"Remember that I prefer meetings in the afternoon"

🌐

6 Communication Channels

Talk to CortexOS however you want — by text, voice, video, or even a phone call. Every channel has access to the same brain, the same tools, and the same memory.

💬

Web Chat

Browser-based chat at /chat

✈️

Telegram

Full bot with photo & voice support

📱

WhatsApp

Via Evolution API — text and media

📞

Phone Call

Twilio SIP — natural voice conversation

🗣️

Voice Agent

WebRTC real-time voice via Daily

👁️

Vision Agent

Live camera + voice — CortexOS can see

🛠️

Complete Tool Suite

Over 25 built-in tools, plus the ability to create new ones on the fly. CortexOS doesn't just talk — it takes action.

📋 Productivity & Google Workspace

📧

Gmail Manager

Read, search, and send emails. Filters by primary inbox by default. Supports full Gmail search queries — from, to, labels, unread, attachments.

list_emails · read_email · send_email
📅

Calendar Manager

View upcoming events, create meetings with timezone support, delete events. Multi-calendar support with custom date ranges.

list_events · create_event · delete_event

Tasks Manager

Full Google Tasks CRUD — create, list, complete, and delete tasks. Supports multiple task lists and due dates.

list_tasks · create_task · complete_task · delete_task
📁

Google Drive

Create, read, update, search, and list files on Google Drive. Save reports, notes, documents directly from conversation.

drive_create · drive_read · drive_update · drive_list
👥

Google Contacts

Save new contacts with full details (name, email, phone, company, title) and search existing contacts by any field.

save_contact · search_contacts
📇

Business Card Scanner

Take a photo of a business card — CortexOS reads it with vision AI and automatically saves the contact to Google Contacts.

scan_business_card

💬 Communication

📱

WhatsApp Messages

Send WhatsApp messages to any phone number via the Evolution API integration. International format, text messages.

send_whatsapp
📲

SMS via ClickSend

Send professional SMS text messages worldwide. Customizable sender ID, international phone format, 480 character limit.

send_sms

🧠 Intelligence & Research

🔍

Web Search

Real-time internet search via DuckDuckGo + Tavily. Research topics, check news, find factual data that CortexOS doesn't have locally.

web_search · tavily_search · tavily_research
🌐

Browser Automation

Full headless browser: navigate, click, type, fill forms, take screenshots, extract text and links. Multi-step web interactions.

browse_web · browser_action
🧠

Archival Memory

Semantic long-term memory powered by ChromaDB. Stores and recalls preferences, facts, and conversation history across sessions. Per-user isolation.

store_memory · search_archival_memory

Time & Date

Current time awareness for scheduling, timezone conversions, and time-sensitive operations.

get_current_time

💰 Business & Billing

💳

Square Billing

Create invoices, process payments, manage customers via Square POS integration. Supports sandbox and production environments.

square_billing
⚙️

Config Manager

CortexOS can modify its own instructions, daily briefing routines, and user profiles at runtime. Self-improving system.

manage_config

🧬 Self-Evolution

🔧

Dynamic Tool Creator

CortexOS can create new tools on its own. Need a tool that doesn't exist? Just describe it — CortexOS writes the Python code, tests it in a sandbox, registers it, and makes it available instantly. With automatic rollback if anything fails.

create_tool · delete_tool
🔌

MCP Integration

Model Context Protocol (MCP) server support for extending capabilities. Currently connected to Tavily for advanced web research with search, extract, crawl, and map tools.

tavily_search · tavily_extract · tavily_crawl · tavily_map

How It Works

CortexOS uses a "Think → Act → Observe" loop to solve problems step by step, just like a human assistant would.

💭 You ask a question
🧠 CortexOS thinks
🛠️ Uses tools if needed
👀 Observes result
💬 Responds naturally

💡 Example: "What's on my schedule today and email the summary to Marco"

Step 1 — Think: I need to check the calendar and then compose an email.
Step 2 — Act: Calls list_events for today's date.
Step 3 — Observe: Gets 3 events: standup at 9, client call at 11, team review at 3.
Step 4 — Act: Calls search_contacts for "Marco" to find his email.
Step 5 — Observe: Marco's email is marco@company.com.
Step 6 — Act: Calls send_email with a formatted summary.
Step 7 — Reply: "Done! I sent Marco your schedule for today."
🏗️

Technical Architecture

For developers and technical users — how CortexOS is built under the hood.

Communication Layer
💬 Web Chat
✈️ Telegram Bot
📱 WhatsApp
📞 Twilio Voice
🗣️ Voice (Daily)
👁️ Vision (WebRTC)
Core Engine
🧠 Agent Core
ReAct Loop
🔀 LLM Router
Failover
🔐 Auth Manager
OAuth2
Tool Layer
📧 Gmail
📅 Calendar
📁 Drive
🔍 Web Search
💳 Square
🔧 Dynamic Tools
Data Layer
🗄️ SQLite
Conversations, Logs
🧠 ChromaDB
Vector Memory
📄 Config Files
.env, Identity
⚙️

Technical Specifications

🧠 AI Models & Routing

  • Primary: Google Gemini 2.0 Flash (generativelanguage API)
  • Fallback: Anthropic Claude Sonnet 4
  • Local: Ollama / vLLM (OpenAI-compatible)
  • Voice/Vision: Gemini 2.5 Flash Native Audio (bidi)
  • Automatic failover: Primary → Fallback → Local
  • ReAct loop with max 10 iterations safety cap

🏛️ Framework & Runtime

  • Backend: FastAPI (Python 3.12, async)
  • Voice/Vision: Pipecat (WebRTC pipelines)
  • WebRTC: Daily.co transport
  • Container: Docker + Docker Compose
  • Reverse proxy: Traefik (auto SSL)
  • Config: Pydantic-Settings with .env

🗄️ Databases

  • SQLite: Conversations, logs, audit trail (WAL mode)
  • ChromaDB: Semantic vector memory (per-user)
  • Tables: conversations, logs, agent_state, tool_executions
  • Collections: agent_memory, tool_docs
  • Automatic daily backups to Google Drive

🔐 Security & Authentication

  • OAuth2: Google authorization code flow
  • Scopes: Gmail, Calendar, Tasks, Drive, Contacts
  • Token auto-refresh via token_manager.py
  • Web login: Google Sign-In with signed session cookies
  • Role-based access: admin vs user dashboard
  • Destructive actions require user confirmation

🧠 Memory Architecture

  • Working Memory: Identity (IDENTITY.md) + User Profile (USER_PROFILE.md)
  • Archival Memory: ChromaDB semantic search
  • Conversation History: SQLite (last 20 messages)
  • Per-user memory isolation (by telegram_id)
  • Agent can store and recall memories via tools
  • Self-modifying config: EXTRA_INSTRUCTIONS.md

🔧 Dynamic Tool System

  • Agent generates Python code from description
  • Sandbox testing via py_compile
  • Auto-registration in available_tools.json
  • Timestamped backups before changes
  • Automatic rollback on failure
  • Dynamic import via importlib at runtime

📡 Channel Integrations

  • Telegram: python-telegram-bot (async webhook)
  • WhatsApp: Evolution API (self-hosted)
  • Voice: Pipecat + Daily.co + Gemini bidi API
  • Vision: Camera → 1 FPS JPEG → Gemini realtime_input
  • Phone: Twilio SIP + Pipecat pipeline
  • Web: Jinja2 templates + WebSocket

📦 Infrastructure

  • Server: VPS (96 GB disk, Ubuntu)
  • Containers: CortexOS, n8n, Evolution API, PostgreSQL, Traefik
  • CI/CD: Git push → deploy.sh → Docker rebuild
  • Backups: Daily at 2 AM → Google Drive (14 retained)
  • Maintenance: Weekly Docker prune (Sundays 3 AM)
  • Monitoring: /health endpoint + structured JSON logs
👁️

Vision + Voice Pipeline

The most advanced feature — real-time video and voice conversation with AI that can see your camera.

📷 Camera
Daily WebRTC
JPEG 1 FPS
Gemini bidi API
Native Audio
🔊 Speaker

🔬 How the Vision Pipeline Works

📂

Project Structure

CortexOS/
├── app/
│   ├── main.py              ← FastAPI entry point, routing
│   ├── agent_core.py        ← ReAct brain (1440 lines)
│   ├── llm_router.py        ← Multi-model failover
│   ├── config_manager.py    ← Pydantic-Settings config
│   ├── telegram_bot.py      ← Telegram webhook handler
│   ├── whatsapp_handler.py  ← Evolution API handler
│   ├── voice_agent.py       ← Pipecat voice pipeline
│   ├── vision_agent.py      ← Pipecat vision+voice pipeline
│   ├── mcp_client.py        ← MCP server connector
│   ├── auth/
│   │   ├── oauth_handler.py ← Google OAuth2 flow
│   │   └── token_manager.py ← Token refresh & persistence
│   ├── db/
│   │   ├── sqlite_manager.py← SQLite schema & queries
│   │   └── chroma_manager.py← ChromaDB vector store
│   └── tools/
│       ├── google_gmail.py  ← Gmail API integration
│       ├── google_calendar.py
│       ├── google_tasks.py
│       ├── google_drive.py
│       ├── web_search.py    ← DuckDuckGo search
│       ├── browser_tool.py  ← Playwright headless browser
│       ├── square_billing.py
│       ├── clicksend_sms.py
│       ├── business_card_scanner.py
│       ├── tool_creator.py  ← Dynamic tool generation
│       └── dynamic_tools/   ← Auto-generated tools
├── config/
│   ├── .env                 ← API keys (Setup Wizard)
│   ├── core_instructions.md ← System prompt
│   └── available_tools.json ← Tool registry
├── templates/              ← Jinja2 HTML templates
├── static/                 ← CSS, JS, icons
├── scripts/
│   ├── backup.sh            ← Daily backup → Google Drive
│   └── gdrive_upload.py     ← Drive upload utility
├── docker-compose.prod.yml
├── Dockerfile
└── requirements.txt