Model Layer

Protean Labs uses models as routed capabilities inside a controlled runtime. They support proposal, extraction, embedding, explanation, and review context. They do not override validators, scoring contracts, or scientific review.

The model layer is local-first and task-specific. Each route has a role, a fallback, and a boundary.
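One way to picture a route with a role, a fallback, and a boundary is as a small immutable record. The sketch below is illustrative only: the `Role` values mirror the capabilities listed above, while `Route`, `resolve`, the `may_accept` flag, and the route names are hypothetical and are not Protean's actual routing policy.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    PROPOSAL = "proposal"
    EXTRACTION = "extraction"
    EMBEDDING = "embedding"
    EXPLANATION = "explanation"
    REVIEW_CONTEXT = "review_context"


@dataclass(frozen=True)
class Route:
    name: str                 # hypothetical identifier for the model route
    role: Role                # the single task this route is allowed to serve
    fallback: str             # deterministic path used when the model is unavailable
    may_accept: bool = False  # boundary: a model route never accepts candidates


def resolve(route: Route, model_available: bool) -> str:
    """Return the name of the path that will actually handle the task."""
    return route.name if model_available else route.fallback
```

Keeping the fallback on the route itself means a missing model changes which path runs, not whether the task runs.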

Model Classes

Protean’s platform currently organizes model capabilities into four classes:

  • Protein sequence embeddings for peptide vectorization, similarity, novelty context, and comparison against stable or failed examples.
  • Local reasoning models for proposal support, evidence extraction, candidate explanation, and failure reasoning.
  • Text embedding and reranking models for source context, document similarity, and evidence organization.
  • Deterministic heuristics for baseline features, validation, scoring, ranking, and degraded operation.

Sequence Embeddings

Protean supports ESM-family protein language models for peptide sequence embeddings. The local default is an ESM-2-style model suited to CPU-oriented development and controlled sequence feature generation.

Embeddings can support:

  • Sequence vectorization.
  • Similarity to stable examples.
  • Similarity to failure examples.
  • Novelty context.
  • Candidate clustering.
  • Additional review features for ranking explanations.

Embeddings do not prove stability, permeability, toxicity, activity, oral bioavailability, or synthesizability. They are decision-support signals layered underneath deterministic validation and scientific review.
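As a minimal sketch of how embedding vectors might become review signals, the example below computes cosine similarity against stable and failed examples. It assumes embeddings arrive as plain lists of floats. The function names and the novelty definition (one minus the highest similarity to any known example) are assumptions for illustration, not the platform's scoring contract.

```python
import math
from typing import Dict, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_signals(candidate: Vector,
                       stable: Sequence[Vector],
                       failed: Sequence[Vector]) -> Dict[str, float]:
    """Summarize how close a candidate sits to known stable and failed examples."""
    max_stable = max((cosine(candidate, v) for v in stable), default=0.0)
    max_failed = max((cosine(candidate, v) for v in failed), default=0.0)
    return {
        "max_stable_similarity": max_stable,
        "max_failure_similarity": max_failed,
        # Assumed definition: novelty is distance from the nearest known example.
        "novelty": 1.0 - max(max_stable, max_failed),
    }
```

The output is a set of features for reviewers and ranking explanations. Nothing in it accepts or rejects a candidate.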

Local Reasoning Routes

Local reasoning models serve as proposal and interpretation surfaces. When a route is configured and its model is available, Qwen-family and similar local instruction models can support peptide proposal, literature extraction, failure reasoning, and candidate explanation.

The route matters. A model used for candidate explanation is not automatically trusted for candidate acceptance. A model used for proposal is not allowed to accept its own outputs.

proposal route
-> constrained prompt context
-> candidate suggestions
-> deterministic validation
-> scoring and failure checks
-> reviewable rationale
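The flow above can be sketched as a small orchestration function. This is a sketch under stated assumptions: `propose`, `validate`, and `score` are hypothetical callables supplied by the caller, and `ReviewRecord` is an illustrative record type. The point it shows is that the proposing model only supplies sequences, while eligibility and score come from deterministic steps.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class ReviewRecord:
    sequence: str
    problems: List[str]     # output of deterministic validation
    score: Optional[float]  # None when validation failed
    rationale: str          # human-readable trail for reviewers

    @property
    def eligible(self) -> bool:
        # Eligibility derives only from validation, never from the model.
        return not self.problems


def run_proposal_route(context: str,
                       propose: Callable[[str], List[str]],
                       validate: Callable[[str], List[str]],
                       score: Callable[[str], float]) -> List[ReviewRecord]:
    """Run proposal -> validation -> scoring and keep a rationale per candidate."""
    records = []
    for sequence in propose(context):
        problems = validate(sequence)
        value = None if problems else score(sequence)
        if problems:
            rationale = "rejected by validation: " + "; ".join(problems)
        else:
            rationale = f"passed validation; deterministic score {value:.2f}"
        records.append(ReviewRecord(sequence, problems, value, rationale))
    return records
```

Rejected candidates stay in the output with their reasons, so reviewers can see what the model proposed and why it was turned down.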

Text Embeddings And Reranking

Text embedding and reranking models can support evidence organization and document-level context. Their role is to help the platform retrieve and structure relevant source material without turning retrieval into a biological claim.

Typical roles include:

  • Grouping related evidence records.
  • Supporting literature context retrieval.
  • Improving source ranking for review.
  • Helping distinguish repeated source material from genuinely new signal.
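The roles above can be illustrated with a short ranking pass that also flags repeated material. This is a hedged sketch: the `similarity` callable, the record shape, and the `repeat_threshold` default of 0.95 are assumptions chosen for the example, not documented platform values.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def rank_and_flag_repeats(query: Vector,
                          records: List[Tuple[str, Vector]],
                          similarity: Callable[[Vector, Vector], float],
                          repeat_threshold: float = 0.95) -> List[Dict]:
    """Order evidence by relevance and mark near-duplicates of earlier records."""
    ranked = sorted(records, key=lambda r: similarity(query, r[1]), reverse=True)
    kept: List[Vector] = []  # vectors of records treated as distinct so far
    out = []
    for record_id, vec in ranked:
        is_repeat = any(similarity(vec, k) >= repeat_threshold for k in kept)
        if not is_repeat:
            kept.append(vec)
        out.append({
            "id": record_id,
            "relevance": similarity(query, vec),
            "repeat": is_repeat,
        })
    return out
```

Repeats are flagged rather than dropped, which keeps retrieval an organizing step and leaves the judgment about new signal to review.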

Deterministic Authority

Models sit inside the runtime. They do not own it.

Deterministic systems remain authoritative for:

  • Residue validity.
  • Length and format checks.
  • Low-complexity rejection.
  • Cleavage and failure motif warnings.
  • Scoring caps.
  • Ranking order when model routes are unavailable.
  • Bounded learning limits.

This structure lets Protean use strong local models without letting model behavior dictate infrastructure behavior.
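A few of the deterministic checks listed above can be sketched in plain Python. The thresholds here (`min_len`, `max_len`, `max_residue_fraction`, `cap`) are placeholder assumptions, the low-complexity rule is a deliberately simple stand-in, and the function names are hypothetical. The residue alphabet is the 20 canonical amino acids in one-letter code.

```python
from collections import Counter
from typing import List

CANONICAL = frozenset("ACDEFGHIKLMNPQRSTVWY")


def check_sequence(seq: str,
                   min_len: int = 5,
                   max_len: int = 40,
                   max_residue_fraction: float = 0.5) -> List[str]:
    """Return a list of issues; an empty list means the sequence passed."""
    issues = []
    invalid = sorted(set(seq) - CANONICAL)
    if invalid:
        issues.append(f"invalid residues: {''.join(invalid)}")
    if not min_len <= len(seq) <= max_len:
        issues.append(f"length {len(seq)} outside [{min_len}, {max_len}]")
    if seq:
        # Simple low-complexity rule: one residue dominates the sequence.
        top_count = Counter(seq).most_common(1)[0][1]
        if top_count / len(seq) > max_residue_fraction:
            issues.append("low complexity")
    return issues


def capped_score(raw: float, issues: List[str], cap: float = 1.0) -> float:
    """Clamp a score to [0, cap]; any validation issue forces it to zero."""
    if issues:
        return 0.0
    return max(0.0, min(raw, cap))
```

Because these functions take no model input, their results are the same on every run, which is what lets them stay authoritative over model output.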

Degraded Operation

If an embedding model or reasoning route is unavailable, the system remains operational through deterministic descriptors, k-mer similarity, rule-based extraction, heuristic scoring, and structured explanations.

Degraded operation is intentional. It ensures the platform can continue to produce reproducible candidate state while clearly marking where richer model support was unavailable.
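A minimal sketch of the fallback idea, assuming a hypothetical `model_similarity` callable that may be absent or may raise: when it cannot be used, the function falls back to Jaccard similarity over k-mers and marks the result as degraded. The result keys and the choice of Jaccard are illustrative assumptions.

```python
from typing import Callable, Dict, Optional, Set


def kmers(seq: str, k: int = 3) -> Set[str]:
    """All overlapping substrings of length k; empty if seq is shorter than k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_jaccard(a: str, b: str, k: int = 3) -> float:
    """Jaccard overlap of the two k-mer sets, in [0, 1]."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def similarity(a: str, b: str,
               model_similarity: Optional[Callable[[str, str], float]] = None
               ) -> Dict[str, object]:
    """Prefer the model route; fall back to k-mers and record that it happened."""
    if model_similarity is not None:
        try:
            return {"value": model_similarity(a, b),
                    "method": "model", "degraded": False}
        except Exception:
            # Any model failure is treated as "route unavailable" in this sketch.
            pass
    return {"value": kmer_jaccard(a, b),
            "method": "kmer_jaccard", "degraded": True}
```

Carrying the `degraded` flag alongside the value is what allows downstream records to show where richer model support was missing.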

What Is Public

The public documentation describes the model roles, control boundaries, and safety posture. The exact routing policy, prompt design, weighting strategy, and accumulated failure memory remain part of Protean’s proprietary discovery infrastructure.