Pipeline placement
This page explains where SotsAI should live in an LLM pipeline.
The mental model
Think of your pipeline as three distinct concerns:
- Facts — what is true? (RAG, databases, APIs)
- Behavior — how should this be communicated?
- Language — how should this be phrased?
SotsAI owns #2.
Your LLM owns #3.
Recommended high-level pipeline
A typical production flow looks like this:
1. User submits a request
2. Orchestrator classifies intent (optional but recommended)
3. Orchestrator gathers context (facts, constraints, history)
4. Call SotsAI for behavioral reasoning (when a user psychometric profile is available)
5. LLM generates the final response using SotsAI output
6. Response is returned to the user

SotsAI should be called before final text generation.
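A minimal orchestrator sketch of this flow in Python. Every helper here (`classify_intent`, `gather_context`, `call_sotsai`, `llm_generate`) is a hypothetical placeholder for your own components, not part of any real SotsAI or LLM SDK:

```python
# Hypothetical orchestrator sketch. Every helper below is a placeholder for
# your own intent classifier, retrieval layer, SotsAI client, and LLM call;
# none of these names come from a real SDK.

def classify_intent(message: str) -> str:
    return "feedback_request"  # stand-in for a real classifier (step 2)

def gather_context(message: str, intent: str) -> dict:
    # Step 3: facts, constraints, history assembled by the orchestrator.
    return {"summary": "Quarterly review: delivery slipped two weeks.", "intent": intent}

def call_sotsai(situation: str, profiles: list[dict]) -> dict:
    # Step 4: placeholder for the actual SotsAI call (behavioral reasoning).
    return {"tone": "direct but supportive", "avoid": ["blame framing"]}

def llm_generate(facts: dict, guidance: dict | None) -> str:
    # Step 5: placeholder for the LLM call; it consumes SotsAI output as data.
    return f"[response built from {facts['summary']!r} with guidance {guidance}]"

def handle_request(user_message: str, user_profile: dict | None) -> str:
    intent = classify_intent(user_message)
    context = gather_context(user_message, intent)
    guidance = call_sotsai(context["summary"], [user_profile]) if user_profile else None
    return llm_generate(context, guidance)  # step 6: return to the user
```

The ordering is the point: behavioral reasoning runs once, after context assembly and before final text generation.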
Before or after RAG?
Call SotsAI after RAG
Recommended order:
1. RAG retrieves factual context
2. Orchestrator summarizes relevant facts
3. Call SotsAI with:
   - situation context
   - profiles
4. LLM generates the final response

Why this works:
- SotsAI reasons on meaning, not raw documents
- You avoid leaking sensitive documents
- Behavioral guidance stays focused
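A short sketch of this ordering, where `retrieve`, `summarize_facts`, and `call_sotsai` are hypothetical placeholders for your retrieval layer and the actual SotsAI client:

```python
# Sketch of the RAG-then-SotsAI ordering. All helpers are hypothetical
# placeholders, not real SDK functions.

def retrieve(query: str) -> list[str]:
    return ["doc: refund policy ...", "doc: escalation process ..."]  # 1. RAG

def summarize_facts(documents: list[str]) -> str:
    # 2. Distill raw documents into meaning before any behavioral call.
    return "Customer is eligible for a refund; no escalation is required."

def call_sotsai(situation: str, profiles: list[dict]) -> dict:
    # 3. Behavioral reasoning on the summary, never on raw documents.
    return {"tone": "reassuring", "structure": "lead with the resolution"}

docs = retrieve("refund request from a long-term customer")
situation = summarize_facts(docs)
guidance = call_sotsai(situation, profiles=[{"id": "user-123"}])
# 4. Pass `situation` and `guidance` to the LLM for final generation.
```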
Don’t call SotsAI before RAG
Avoid:
- calling SotsAI before you have the relevant facts (when factual context matters)
- passing raw documents or embeddings
- asking SotsAI to interpret source material
SotsAI is not a knowledge engine.
Tool-calling vs direct calls
Tool-calling (most common)
In this setup:
- the LLM decides when to call SotsAI, choosing among the tools exposed by the orchestrator
- your backend executes the call
- the LLM consumes the structured output
This works well when:
- you already use tools
- you want conditional activation
- multiple tools coexist
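For illustration, an OpenAI-style tool definition that exposes SotsAI to the model. The tool name and parameter schema here are assumptions; align them with the actual SotsAI request shape:

```python
# Illustrative OpenAI-style tool definition exposing SotsAI to the model.
# The tool name and parameter schema are assumptions, not the real API.

sotsai_tool = {
    "type": "function",
    "function": {
        "name": "get_behavioral_guidance",
        "description": "Get SotsAI behavioral guidance when a user "
                       "psychometric profile is available.",
        "parameters": {
            "type": "object",
            "properties": {
                "situation": {
                    "type": "string",
                    "description": "Summarized situation context",
                },
                "profile_ids": {
                    "type": "array",
                    "items": {"type": "string"},
                },
            },
            "required": ["situation", "profile_ids"],
        },
    },
}

# The LLM decides when to emit a call to this tool; your backend executes the
# real SotsAI request and returns the structured result as the tool response.
```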
Direct orchestration calls
In this setup:
- your backend decides when to call SotsAI
- the LLM never sees the decision logic
- the LLM only sees the result
This works well when:
- rules are explicit
- behavioral reasoning is mandatory
- you want full determinism
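A sketch of that decision logic living in the backend, with `call_sotsai` as a hypothetical placeholder for the real client:

```python
# Direct orchestration sketch: the backend, not the LLM, decides when SotsAI
# runs. `call_sotsai` is a hypothetical placeholder for the real client.

SOTSAI_INTENTS = {"feedback", "conflict", "negotiation"}  # explicit rules

def call_sotsai(situation: str, profiles: list[dict]) -> dict:
    return {"tone": "calm", "pace": "acknowledge first, then propose next steps"}

def behavioral_guidance(intent: str, situation: str, profile: dict | None) -> dict | None:
    if profile is None or intent not in SOTSAI_INTENTS:
        return None  # deterministic: no profile or out-of-scope intent means no call
    return call_sotsai(situation, [profile])

# The LLM only ever sees the returned guidance; the decision logic stays in code.
```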
Multiple calls per interaction?
Usually no. One SotsAI call per interaction is enough.
Avoid:
- calling SotsAI per message chunk
- calling it inside streaming loops
- chaining multiple behavioral calls
If you need multiple calls, it usually means:
- profiles are missing
- intent classification is unclear
- orchestration logic is too implicit
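If you stream the response, resolve guidance once before the loop and reuse it, as in this sketch (`call_sotsai` and `stream_llm` are placeholders):

```python
# Sketch: resolve guidance once per interaction, then reuse it across the
# streamed response. `call_sotsai` and `stream_llm` are placeholders.

def call_sotsai(situation: str, profiles: list[dict]) -> dict:
    return {"tone": "concise"}  # placeholder response

def stream_llm(situation: str, guidance: dict):
    yield from ("Here is ", "the short ", "version.")  # placeholder token stream

def respond(situation: str, profile: dict) -> str:
    guidance = call_sotsai(situation, [profile])  # exactly one behavioral call
    chunks = [chunk for chunk in stream_llm(situation, guidance)]  # no SotsAI calls in the loop
    return "".join(chunks)
```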
Stateless by design
SotsAI is stateless.
This means:
- no conversation memory
- no session tracking
- no implicit context
You must provide:
- the situation context
- relevant profiles
- any constraints
This makes behavior:
- predictable
- auditable
- cacheable
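Statelessness is also what makes caching safe: identical inputs yield identical guidance. A sketch, assuming an illustrative payload shape and a placeholder `call_sotsai`:

```python
# Because every SotsAI request carries its full context, identical inputs can
# be cached safely. The payload fields are illustrative, not the real request
# schema, and `call_sotsai` is a placeholder.

import json
from functools import lru_cache

def call_sotsai(payload_json: str) -> dict:
    return {"tone": "supportive"}  # placeholder response

@lru_cache(maxsize=1024)
def cached_guidance(payload_json: str) -> dict:
    return call_sotsai(payload_json)

payload = {
    "situation": "Announcing a deadline slip to the project sponsor.",
    "profiles": [{"id": "user-123"}],
    "constraints": {"channel": "email", "length": "short"},
}
key = json.dumps(payload, sort_keys=True)  # same explicit inputs -> same key -> same guidance
guidance = cached_guidance(key)
```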
What NOT to put in the prompt
Do not embed behavioral logic like:
- “adapt tone to personality”
- “be careful with sensitive people”
- “this person doesn’t like feedback”
That logic belongs in data, not prompts.
Your prompt should consume SotsAI output, not recreate it.
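For example, a prompt that consumes SotsAI output as data instead of restating behavioral rules (the guidance fields shown are illustrative, not the real response schema):

```python
# A prompt that consumes SotsAI output as data. The guidance fields are
# illustrative, not the real response schema.

guidance = {
    "tone": "direct but supportive",
    "structure": "state the decision first, then the reasoning",
    "avoid": ["blame framing", "hedging"],
}

system_prompt = f"""You are a workplace communication assistant.
Follow this behavioral guidance exactly:
- Tone: {guidance['tone']}
- Structure: {guidance['structure']}
- Avoid: {', '.join(guidance['avoid'])}
Do not infer personality or adjust tone on your own."""
```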
Minimal example (conceptual)
User input
  ↓
Intent classification
  ↓
Context assembly (facts + situation)
  ↓
SotsAI call (behavioral reasoning)
  ↓
LLM generation (language + tone)
  ↓
Final response