perseus

LLM relay. Chains Ollama (local) → OllamaFreeAPI → Gemini → OpenAI-compatible endpoints → OpenAI → local fallback. Pure httpx; no pydantic, no Rust dependency, so it runs on Python 3.15.

Source: llamaclaw/perseus.

Perseus — LLM relay for the llamaclaw ecosystem.

perseus.agent_available() bool[source]

Return True when at least one live LLM provider is available.

Returns:

True if a live provider is detected, False if only local fallback is available.

Return type:

bool

Examples

>>> isinstance(agent_available(), bool)
True
perseus.ask_percy(question: str, *, context: str | None = None, model: str | None = None, system_prompt: str = 'You are Perseus, the ESML agent for epidemiological semiparametric machine learning.\nHelp users understand datasets, methods, debugging steps, testing strategy, and interpretation.\nBe explicit about assumptions, limitations, missing data concerns, and reproducibility risks.\nDo not invent data access or approval status for restricted datasets.', allow_fallback: bool = True, stream: bool = False, use_agent: bool = True) dict[str, Any][source]

Query Perseus via the LLM provider chain.

When use_agent=True (default) and Ollama is available, Perseus uses the full agentic loop with 13 tools (search, execute, read/write, shell, data). Falls back to simple LLM chat or static text when tools are unavailable.

Returns a dict with mode, model, and either output_text (str) or output_stream (Iterator[str]).
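A minimal sketch of consuming that result dict in either shape; the dicts below are hand-built stand-ins, not live calls:

```python
from typing import Any


def render_result(result: dict[str, Any]) -> str:
    """Collapse an ask_percy-style result dict to plain text.

    Handles both the non-streaming shape (output_text) and the
    streaming shape (output_stream, an iterator of str chunks).
    """
    if "output_text" in result:
        return result["output_text"]
    return "".join(result["output_stream"])


# Stand-in results mimicking the two documented shapes.
plain = {"mode": "llm", "model": "example", "output_text": "AIPW combines ..."}
streamed = {"mode": "llm", "model": "example",
            "output_stream": iter(["AIPW ", "combines ", "..."])}

assert render_result(plain) == "AIPW combines ..."
assert render_result(streamed) == "AIPW combines ..."
```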

perseus.build_prompt(question: str, context: str | None = None) str[source]

Build a prompt from a user question and optional context.

Pure-httpx OllamaFreeAPI client — no pydantic/Rust dependencies.

Drop-in replacement for the ollamafreeapi pip package. Uses the same bundled JSON model registry and the standard Ollama /api/generate HTTP endpoint, but relies only on httpx (already a core dependency) instead of the ollama SDK (which pulls in pydantic-core via Rust/PyO3).

This allows ESML to run on Python 3.15+ where PyO3 doesn’t yet have pre-built wheels.

class perseus.fam.OllamaFreeAPI[source]

Bases: object

Lightweight client for free community Ollama servers.

chat(prompt: str, model: str | None = None, **kwargs: Any) str[source]
chat_messages(messages: list[dict[str, str]], model: str | None = None, **kwargs: Any) str[source]

Send multi-turn messages via /api/chat (Ollama chat format).

get_model_info(model: str) dict[str, Any][source]
get_model_servers(model: str) list[dict[str, Any]][source]
list_families() list[str][source]
list_models(family: str | None = None) list[str][source]
stream_chat(prompt: str, model: str | None = None, **kwargs: Any) Iterator[str][source]
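Because the client speaks the standard Ollama HTTP protocol, the request bodies it sends can be sketched as plain dicts (a simplified illustration; the real client may add extra options before POSTing via httpx):

```python
def generate_payload(prompt: str, model: str, stream: bool = False) -> dict:
    """Request body for the standard Ollama /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": stream}


def chat_payload(messages: list[dict[str, str]], model: str,
                 stream: bool = False) -> dict:
    """Request body for /api/chat (Ollama chat format, as used by chat_messages)."""
    return {"model": model, "messages": messages, "stream": stream}


# The client then POSTs these as JSON, e.g.
# httpx.post(f"{server}/api/generate", json=generate_payload(...))
```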

Ollama-first LLM integration layer for the ESML package.

Provides a provider chain that attempts local Ollama inference first, then OllamaFreeAPI (free remote models, no API key), then Gemini (Google), then a generic OpenAI-compatible endpoint (e.g. Qwen via OpenRouter, GPT-OSS models via Together/Groq), then the official OpenAI API, and finally a local help-text fallback that requires no network access.

HTTP-based providers use httpx against OpenAI-compatible endpoints. OllamaFreeAPI access goes through the bundled pure-httpx client (perseus.fam), so free remote models work without an API key or extra dependencies.

Environment Variables

OLLAMA_BASE_URL : str

Base URL for a running Ollama instance. Default: http://localhost:11434

esmlfam : str

Override the OllamaFreeAPI model (esml free api model). Default: mistral-nemo:custom.

GEMINI_API_KEY : str

Google AI Studio API key. Free-tier keys work for development. Model defaults to gemini-2.0-flash.

GEMINI_MODEL : str

Override the Gemini model (e.g. gemini-1.5-pro). Optional.

LLM_API_BASE_URL : str

Base URL for any OpenAI-compatible API (e.g., OpenRouter, Together, Groq). Use this to point at Qwen, Mistral, GPT-OSS, or any hosted model.

LLM_API_KEY : str

API key for the endpoint at LLM_API_BASE_URL.

OPENAI_API_KEY : str

API key for the official OpenAI API at https://api.openai.com.

Provider priority (auto-detected at runtime):
  1. Ollama — local, private, no API key needed

  2. FreeAPI — OllamaFreeAPI, free remote models, no API key

  3. Gemini — Google AI, generous free tier

  4. API — generic OpenAI-compatible (Qwen, GPT-OSS, Groq, etc.)

  5. OpenAI — official OpenAI API

  6. local — static help text, no network required
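The fallback behavior behind that priority list can be sketched as a simple try-each loop; the provider callables below are stubs standing in for the real backends:

```python
from typing import Callable, Iterable


def first_working(providers: Iterable[tuple[str, Callable[[str], str]]],
                  prompt: str) -> tuple[str, str]:
    """Try providers in priority order; any exception falls through
    to the next one, ending at the static local fallback."""
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception:
            continue
    return "local", "No live provider; see the static help text."


def down(prompt: str) -> str:
    raise ConnectionError("unreachable")


# Ollama and FreeAPI are down; Gemini answers.
chain = [("ollama", down), ("freeapi", down),
         ("gemini", lambda p: f"answer to: {p}")]
assert first_working(chain, "hi") == ("gemini", "answer to: hi")
```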

perseus.llm.agent_available() bool[source]

Return True when at least one live LLM provider is available.

Returns:

True if a live provider is detected, False if only local fallback is available.

Return type:

bool

Examples

>>> isinstance(agent_available(), bool)
True
perseus.llm.ask(prompt: str, context: dict[str, Any] | None = None, *, stream: bool = False, model: str | None = None, provider: str | None = None, system_prompt: str | None = None, timeout: float = 120.0) str | Iterator[str][source]

Send a prompt to the best available LLM provider and return the response.

The provider chain is: Ollama (local) -> FreeAPI -> Gemini -> OpenAI-compatible API -> OpenAI direct -> local fallback. Each provider is tried in order; on failure the next is attempted.

Parameters:
  • prompt (str) – The user’s question or instruction.

  • context (dict[str, Any] | None) – Optional context dictionary (e.g., from build_esml_context()). Injected into the system prompt to give the LLM awareness of available modules, CPADS schema, and the user’s working directory.

  • stream (bool) – If True, return an iterator of string chunks for streaming output. If False (default), return the full response as a single string.

  • model (str | None) – Override the model identifier. When None, a sensible default is chosen per provider.

  • provider (str | None) – Force a specific provider ("ollama", "api", "openai", "local"). When None, detect_available_provider() is used to auto-detect.

  • system_prompt (str | None) – Override the entire system prompt. When None, the standard ESML system prompt is built from the context parameter.

  • timeout (float) – HTTP request timeout in seconds.

Returns:

The LLM response text (or a streaming iterator of text chunks). When all providers fail, returns a local fallback help string.

Return type:

str | Iterator[str]

Examples

>>> # Non-streaming (returns full text)
>>> response = ask("What is AIPW?")
>>> isinstance(response, str)
True
>>> # Streaming
>>> for chunk in ask("Explain TMLE", stream=True):
...     print(chunk, end="")
perseus.llm.ask_multi(messages: list[dict[str, str]], *, stream: bool = False, model: str | None = None, provider: str | None = None, timeout: float = 120.0) str | Iterator[str][source]

Send a pre-built messages array to the best available LLM provider.

Unlike ask(), this accepts the full messages array directly, enabling multi-turn conversation support. The caller is responsible for constructing the system and user messages.

Parameters:
  • messages (list[dict[str, str]]) – The chat messages array (system, user, assistant turns).

  • stream (bool) – If True, return an iterator of string chunks.

  • model (str | None) – Override the model identifier.

  • provider (str | None) – Force a specific provider. Auto-detected when None.

  • timeout (float) – HTTP request timeout in seconds.

Returns:

The LLM response text (or a streaming iterator).

Return type:

str | Iterator[str]
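Since the caller owns the messages array, a small helper for assembling multi-turn input might look like this (a sketch; the role/content dict shape is the one documented above):

```python
def build_messages(system: str,
                   turns: list[tuple[str, str]]) -> list[dict[str, str]]:
    """Assemble a chat messages array: one system message, then the
    caller's alternating (role, content) turns."""
    messages = [{"role": "system", "content": system}]
    messages += [{"role": role, "content": content} for role, content in turns]
    return messages


msgs = build_messages(
    "You are Perseus.",
    [("user", "What is TMLE?"),
     ("assistant", "Targeted maximum likelihood estimation ..."),
     ("user", "How does it differ from AIPW?")],
)
# reply = ask_multi(msgs)  # would go to the best available provider
```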

perseus.llm.assistant_available() bool

Return True when at least one live LLM provider is available.

Returns:

True if a live provider is detected, False if only local fallback is available.

Return type:

bool

Examples

>>> isinstance(assistant_available(), bool)
True
perseus.llm.build_esml_context(repo_root: str | Path | None = None) dict[str, Any][source]

Build an LLM-friendly context dictionary from the ESML package state.

The returned dictionary is designed to be injected into the system prompt so the LLM is aware of the available modules, the CPADS data contract, and the current working directory.

Parameters:

repo_root (str | Path | None) – Path to the ESML repository root. When None the function attempts to resolve the root from this file’s location.

Returns:

A dictionary with keys:

  • module_list – list of module name/description pairs.

  • cpads_schema – the CPADS data contract dictionary.

  • cwd – the current working directory as a string.

  • repo_root – the resolved repository root, or "unknown".

Return type:

dict[str, Any]

Examples

>>> ctx = build_esml_context()
>>> "module_list" in ctx and "cpads_schema" in ctx
True
perseus.llm.detect_available_provider() str[source]

Detect which LLM provider is currently available.

The detection order mirrors the provider chain priority:

  1. ollama – a local Ollama instance is reachable (probed via HTTP).

  2. freeapi – the bundled OllamaFreeAPI client's servers respond (no API key needed).

  3. gemini – GEMINI_API_KEY is set.

  4. api – LLM_API_BASE_URL and LLM_API_KEY are set.

  5. openai – OPENAI_API_KEY is set.

  6. local – no live provider; ESML will return static help text.

Returns:

One of "ollama", "freeapi", "gemini", "api", "openai", or "local".

Return type:

str

Examples

>>> provider = detect_available_provider()
>>> provider in ("ollama", "freeapi", "gemini", "api", "openai", "local")
True
perseus.llm.detect_model_display() dict[str, str][source]

Return display info with inner (family:size) and outer (model name).

Returns:

Keys: inner, outer, model, provider. HomeScreen format: LLM: {inner} [{outer}]

Return type:

dict[str, str]
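A quick sketch of producing the documented HomeScreen string from that dict; the model names below are invented placeholders:

```python
def home_screen_line(info: dict[str, str]) -> str:
    """Format the documented 'LLM: {inner} [{outer}]' display string."""
    return f"LLM: {info['inner']} [{info['outer']}]"


# Stand-in for a detect_model_display() result (keys per the docs above).
info = {"inner": "gemma3:4b", "outer": "Gemma3",
        "model": "gemma3:4b", "provider": "ollama"}
assert home_screen_line(info) == "LLM: gemma3:4b [Gemma3]"
```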

perseus.llm.detect_provider_and_model() tuple[str, str][source]

Detect LLM provider and return (provider, human-readable model label).

Returns:

(provider_key, display_label) — e.g. ("freeapi", "Gemma3:4.3b").

Return type:

tuple[str, str]

perseus.llm.get_last_traceback() str[source]

Return the last Python traceback, if any, for error-context injection.

perseus.llm.list_freeapi_models() list[dict[str, str]][source]

List all available OllamaFreeAPI models from vendored JSONs.

Returns:

Each dict has keys: model, family, size, label, alias.

Return type:

list[dict[str, str]]
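With that return shape, filtering the catalog by family is a one-liner; the rows below are stand-ins using the documented keys:

```python
def models_in_family(models: list[dict[str, str]], family: str) -> list[str]:
    """Return the model identifiers belonging to one family."""
    return [m["model"] for m in models if m["family"] == family]


catalog = [  # stand-in rows with the documented keys
    {"model": "mistral-nemo:custom", "family": "mistral", "size": "12b",
     "label": "Mistral Nemo 12b", "alias": "nemo"},
    {"model": "gemma3:4b", "family": "gemma", "size": "4b",
     "label": "Gemma3 4b", "alias": "gemma"},
]
assert models_in_family(catalog, "gemma") == ["gemma3:4b"]
```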

perseus.llm.pick_thinking_word(query: str) str[source]

Pick a context-aware thinking word based on the query, or a random one.

Local Ollama client for ESML.

Pure httpx-based client for a local Ollama instance running at localhost:11434. Provides model management (pull, list, remove), chat, and streaming — no external deps beyond httpx.

This module backs the ollama provider slot in esml.llm.

Environment Variables

OLLAMA_BASE_URLstr

Override the Ollama endpoint. Default: http://localhost:11434.

ESML_OLLAMA_MODELstr

Override the default local model. Default: gemma4:e2b.

class perseus.loc.LocalOllama(base_url: str | None = None, model: str | None = None, timeout: float = 300.0)[source]

Bases: object

Client for a local Ollama instance.

Parameters:
  • base_url (str, optional) – Override OLLAMA_BASE_URL.

  • model (str, optional) – Override ESML_OLLAMA_MODEL.

  • timeout (float, optional) – Request timeout in seconds (default 300).

Examples

>>> client = LocalOllama()
>>> client.is_running()
True
>>> models = client.list_models()
>>> response = client.chat("What is IPW?")
chat(prompt: str, *, model: str | None = None, system: str | None = None, context: list[dict[str, str]] | None = None, temperature: float = 0.1, num_predict: int = 4096) str[source]

Send a chat message and return the full response.

Parameters:
  • prompt (str) – User message.

  • model (str, optional) – Override the default model.

  • system (str, optional) – System prompt.

  • context (list, optional) – Prior messages as [{"role": "user", "content": "..."}, ...].

  • temperature (float) – Sampling temperature.

  • num_predict (int) – Max tokens to generate.

Returns:

The assistant’s response text.

Return type:

str

generate(prompt: str, *, model: str | None = None, system: str | None = None, stream: bool = False, temperature: float = 0.1, num_predict: int = 4096) str | Iterator[str][source]

Raw generation endpoint (non-chat). Returns full text or stream.

has_model(name: str) bool[source]

Check if a specific model is available locally.

is_running(timeout: float = 2.0) bool[source]

Check if Ollama is reachable.

list_models() list[ModelInfo][source]

List locally available models.

property model: str

Active model — auto-detected from Ollama if not explicitly set.

model_names() list[str][source]

Return just the model name strings.

pull(name: str, *, stream: bool = True, timeout: float = 600.0) Iterator[dict[str, Any]] | dict[str, Any][source]

Pull (download) a model.

Parameters:
  • name (str) – Model name, e.g. gemma3:4b or llama3.2:3b.

  • stream (bool) – If True, yield progress dicts as they arrive.

  • timeout (float) – Download timeout (default 600s).

Yields:

dict – Progress updates with status, digest, total, completed.

remove(name: str) bool[source]

Delete a local model. Returns True on success.

show(name: str | None = None) dict[str, Any][source]

Get model details (parameters, template, license).

stream_chat(prompt: str, *, model: str | None = None, system: str | None = None, context: list[dict[str, str]] | None = None, temperature: float = 0.1, num_predict: int = 4096) Iterator[str][source]

Stream chat response chunks.

Yields:

str – Content chunks as they arrive from the model.

class perseus.loc.ModelInfo(name: str, size: int = 0, parameter_size: str = '', family: str = '', quantization: str = '', modified_at: str = '')[source]

Bases: object

Metadata for a locally available Ollama model.

family: str = ''
property label: str
modified_at: str = ''
name: str
parameter_size: str = ''
quantization: str = ''
size: int = 0
property size_gb: float

Perseus — the ESML resident AI agent.

Delegates to the provider-chain in esml.llm. The LLM module handles Ollama, Gemini, OpenAI-compatible APIs, direct OpenAI, and a local static fallback.

When stream=True is passed to ask_percy(), the returned dictionary contains an output_stream key (an iterator of string chunks) instead of output_text.

perseus.perseus.ask_esml_assistant(question: str, *, context: str | None = None, model: str | None = None, system_prompt: str = 'You are Perseus, the ESML agent for epidemiological semiparametric machine learning.\nHelp users understand datasets, methods, debugging steps, testing strategy, and interpretation.\nBe explicit about assumptions, limitations, missing data concerns, and reproducibility risks.\nDo not invent data access or approval status for restricted datasets.', allow_fallback: bool = True, stream: bool = False, use_agent: bool = True) dict[str, Any]

Query Perseus via the LLM provider chain.

When use_agent=True (default) and Ollama is available, Perseus uses the full agentic loop with 13 tools (search, execute, read/write, shell, data). Falls back to simple LLM chat or static text when tools are unavailable.

Returns a dict with mode, model, and either output_text (str) or output_stream (Iterator[str]).

perseus.perseus.ask_percy(question: str, *, context: str | None = None, model: str | None = None, system_prompt: str = 'You are Perseus, the ESML agent for epidemiological semiparametric machine learning.\nHelp users understand datasets, methods, debugging steps, testing strategy, and interpretation.\nBe explicit about assumptions, limitations, missing data concerns, and reproducibility risks.\nDo not invent data access or approval status for restricted datasets.', allow_fallback: bool = True, stream: bool = False, use_agent: bool = True) dict[str, Any][source]

Query Perseus via the LLM provider chain.

When use_agent=True (default) and Ollama is available, Perseus uses the full agentic loop with 13 tools (search, execute, read/write, shell, data). Falls back to simple LLM chat or static text when tools are unavailable.

Returns a dict with mode, model, and either output_text (str) or output_stream (Iterator[str]).

perseus.perseus.build_assistant_prompt(question: str, context: str | None = None) str

Build a prompt from a user question and optional context.

perseus.perseus.build_prompt(question: str, context: str | None = None) str[source]

Build a prompt from a user question and optional context.

Perseus Relay — serve Perseus as a cloud API endpoint.

Run on Pi (or any machine with Ollama) to let remote users access Perseus with full tool-calling capabilities over the internet.

Usage:

python -m esml.perseus_relay                   # default port :8421
python -m esml.perseus_relay --port 9000       # custom port
python -m esml.perseus_relay --token mysecret  # require auth token

Then from any machine:

esml percy --cloud https://your-server:8421 "What is Moran's I?"

Or set PERSEUS_CLOUD_URL in .env and it auto-connects.

Security: The relay only exposes Perseus agent capabilities (search, run functions, read files within sandbox). No shell access, no filesystem writes outside the project. Optional token auth for production use.

class perseus.perseus_relay.PerseusCloudClient(url: str, token: str | None = None)[source]

Bases: object

Client for connecting to a remote Perseus relay.

ask(question: str, *, timeout: float = 120.0) dict[str, Any][source]
health(timeout: float = 5.0) dict[str, Any][source]
is_available() bool[source]
class perseus.perseus_relay.PerseusRelayHandler(request, client_address, server)[source]

Bases: BaseHTTPRequestHandler

agent = None
auth_token = None
do_GET()[source]
do_POST()[source]
log_message(format, *args)[source]

Log an arbitrary message.

This is used by all other logging functions. Override it if you have specific logging wishes.

The first argument, FORMAT, is a format string for the message to be logged. If the format string contains any % escapes requiring parameters, they should be specified as subsequent arguments (it’s just like printf!).

The client ip and current date/time are prefixed to every message.

Unicode control characters are replaced with escaped hex before writing the output to stderr.

perseus.perseus_relay.main()[source]
perseus.perseus_relay.serve(port: int = 8421, token: str | None = None, bind: str = '0.0.0.0')[source]