Pipeline placement
This page explains where SotsAI should live in an LLM pipeline.
The mental model
Think of your pipeline as three distinct concerns:
- Facts — what is true? (RAG, databases, APIs)
- Behavior — how should this be communicated?
- Language — how should this be phrased?
SotsAI owns #2.
Your LLM owns #3.
Recommended high-level pipeline
A typical production flow looks like this:
1. User submits a request
2. Orchestrator classifies intent (optional but recommended)
3. Orchestrator gathers context (facts, constraints, history)
4. Call SotsAI for behavioral reasoning (when a user psychometric profile is available)
5. LLM generates the final response using SotsAI output
6. Response is returned to the user

SotsAI should be called before final text generation.
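A minimal orchestrator sketch of this flow in Python. Every helper here (`classify_intent`, `gather_context`, `call_sotsai`, `llm_generate`) is a hypothetical placeholder for your own components, not part of any real SotsAI or LLM SDK:

```python
# Hypothetical orchestrator sketch. Every helper below is a placeholder for
# your own intent classifier, retrieval layer, SotsAI client, and LLM call;
# none of these names come from a real SDK.

def classify_intent(message: str) -> str:
    return "feedback_request"  # stand-in for a real classifier (step 2)

def gather_context(message: str, intent: str) -> dict:
    # Step 3: facts, constraints, history assembled by the orchestrator.
    return {"summary": "Quarterly review: delivery slipped two weeks.", "intent": intent}

def call_sotsai(situation: str, profiles: list[dict]) -> dict:
    # Step 4: placeholder for the actual SotsAI call (behavioral reasoning).
    return {"tone": "direct but supportive", "avoid": ["blame framing"]}

def llm_generate(facts: dict, guidance: dict | None) -> str:
    # Step 5: placeholder for the LLM call; it consumes SotsAI output as data.
    return f"[response built from {facts['summary']!r} with guidance {guidance}]"

def handle_request(user_message: str, user_profile: dict | None) -> str:
    intent = classify_intent(user_message)
    context = gather_context(user_message, intent)
    guidance = call_sotsai(context["summary"], [user_profile]) if user_profile else None
    return llm_generate(context, guidance)  # step 6: return to the user
```

The ordering is the point: behavioral reasoning runs once, after context assembly and before final text generation.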
Before or after RAG?
Call SotsAI after RAG
Recommended order:
1. RAG retrieves factual context
2. Orchestrator summarizes relevant facts
3. Call SotsAI with:
   - situation context
   - profiles
4. LLM generates the final response

Why this works:
- SotsAI reasons on meaning, not raw documents
- You avoid leaking sensitive documents
- Behavioral guidance stays focused
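A short sketch of this ordering, where `retrieve`, `summarize_facts`, and `call_sotsai` are hypothetical placeholders for your retrieval layer and the actual SotsAI client:

```python
# Sketch of the RAG-then-SotsAI ordering. All helpers are hypothetical
# placeholders, not real SDK functions.

def retrieve(query: str) -> list[str]:
    return ["doc: refund policy ...", "doc: escalation process ..."]  # 1. RAG

def summarize_facts(documents: list[str]) -> str:
    # 2. Distill raw documents into meaning before any behavioral call.
    return "Customer is eligible for a refund; no escalation is required."

def call_sotsai(situation: str, profiles: list[dict]) -> dict:
    # 3. Behavioral reasoning on the summary, never on raw documents.
    return {"tone": "reassuring", "structure": "lead with the resolution"}

docs = retrieve("refund request from a long-term customer")
situation = summarize_facts(docs)
guidance = call_sotsai(situation, profiles=[{"id": "user-123"}])
# 4. Pass `situation` and `guidance` to the LLM for final generation.
```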
Don’t call SotsAI before RAG
Avoid:
- calling SotsAI before you have the relevant facts (when factual context matters)
- passing raw documents or embeddings
- asking SotsAI to interpret source material
SotsAI is not a knowledge engine.
Tool-calling vs direct calls
Tool-calling (most common)
In this setup:
- the LLM decides when to call SotsAI, choosing among the tools exposed by the orchestrator
- your backend executes the call
- the LLM consumes the structured output
This works well when:
- you already use tools
- you want conditional activation
- multiple tools coexist
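For illustration, an OpenAI-style tool definition that exposes SotsAI to the model. The tool name and parameter schema here are assumptions; align them with the actual SotsAI request shape:

```python
# Illustrative OpenAI-style tool definition exposing SotsAI to the model.
# The tool name and parameter schema are assumptions, not the real API.

sotsai_tool = {
    "type": "function",
    "function": {
        "name": "get_behavioral_guidance",
        "description": "Get SotsAI behavioral guidance when a user "
                       "psychometric profile is available.",
        "parameters": {
            "type": "object",
            "properties": {
                "situation": {
                    "type": "string",
                    "description": "Summarized situation context",
                },
                "profile_ids": {
                    "type": "array",
                    "items": {"type": "string"},
                },
            },
            "required": ["situation", "profile_ids"],
        },
    },
}

# The LLM decides when to emit a call to this tool; your backend executes the
# real SotsAI request and returns the structured result as the tool response.
```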
Direct orchestration calls
In this setup:
- your backend decides when to call SotsAI
- the LLM never sees the decision logic
- the LLM only sees the result
This works well when:
- rules are explicit
- behavioral reasoning is mandatory
- you want full determinism
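A sketch of that decision logic living in the backend, with `call_sotsai` as a hypothetical placeholder for the real client:

```python
# Direct orchestration sketch: the backend, not the LLM, decides when SotsAI
# runs. `call_sotsai` is a hypothetical placeholder for the real client.

SOTSAI_INTENTS = {"feedback", "conflict", "negotiation"}  # explicit rules

def call_sotsai(situation: str, profiles: list[dict]) -> dict:
    return {"tone": "calm", "pace": "acknowledge first, then propose next steps"}

def behavioral_guidance(intent: str, situation: str, profile: dict | None) -> dict | None:
    if profile is None or intent not in SOTSAI_INTENTS:
        return None  # deterministic: no profile or out-of-scope intent means no call
    return call_sotsai(situation, [profile])

# The LLM only ever sees the returned guidance; the decision logic stays in code.
```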
Multiple calls per interaction?
Usually no. One SotsAI call per interaction is enough.
Avoid:
- calling SotsAI per message chunk
- calling it inside streaming loops
- chaining multiple behavioral calls
If you need multiple calls, it usually means:
- profiles are missing
- intent classification is unclear
- orchestration logic is too implicit
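If you stream the response, resolve guidance once before the loop and reuse it, as in this sketch (`call_sotsai` and `stream_llm` are placeholders):

```python
# Sketch: resolve guidance once per interaction, then reuse it across the
# streamed response. `call_sotsai` and `stream_llm` are placeholders.

def call_sotsai(situation: str, profiles: list[dict]) -> dict:
    return {"tone": "concise"}  # placeholder response

def stream_llm(situation: str, guidance: dict):
    yield from ("Here is ", "the short ", "version.")  # placeholder token stream

def respond(situation: str, profile: dict) -> str:
    guidance = call_sotsai(situation, [profile])  # exactly one behavioral call
    chunks = [chunk for chunk in stream_llm(situation, guidance)]  # no SotsAI calls in the loop
    return "".join(chunks)
```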
Stateless by design
SotsAI is stateless.
This means:
- no conversation memory
- no session tracking
- no implicit context
You must provide:
- the situation context
- relevant profiles
- any constraints
This makes behavior:
- predictable
- auditable
- cacheable
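Statelessness is also what makes caching safe: identical inputs yield identical guidance. A sketch, assuming an illustrative payload shape and a placeholder `call_sotsai`:

```python
# Because every SotsAI request carries its full context, identical inputs can
# be cached safely. The payload fields are illustrative, not the real request
# schema, and `call_sotsai` is a placeholder.

import json
from functools import lru_cache

def call_sotsai(payload_json: str) -> dict:
    return {"tone": "supportive"}  # placeholder response

@lru_cache(maxsize=1024)
def cached_guidance(payload_json: str) -> dict:
    return call_sotsai(payload_json)

payload = {
    "situation": "Announcing a deadline slip to the project sponsor.",
    "profiles": [{"id": "user-123"}],
    "constraints": {"channel": "email", "length": "short"},
}
key = json.dumps(payload, sort_keys=True)  # same explicit inputs -> same key -> same guidance
guidance = cached_guidance(key)
```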
What NOT to put in the prompt
Do not embed behavioral logic like:
- “adapt tone to personality”
- “be careful with sensitive people”
- “this person doesn’t like feedback”
That logic belongs in data, not prompts.
Your prompt should consume SotsAI output, not recreate it.
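For example, a prompt that consumes SotsAI output as data instead of restating behavioral rules (the guidance fields shown are illustrative, not the real response schema):

```python
# A prompt that consumes SotsAI output as data. The guidance fields are
# illustrative, not the real response schema.

guidance = {
    "tone": "direct but supportive",
    "structure": "state the decision first, then the reasoning",
    "avoid": ["blame framing", "hedging"],
}

system_prompt = f"""You are a workplace communication assistant.
Follow this behavioral guidance exactly:
- Tone: {guidance['tone']}
- Structure: {guidance['structure']}
- Avoid: {', '.join(guidance['avoid'])}
Do not infer personality or adjust tone on your own."""
```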
Minimal example (conceptual)
User input
  ↓
Intent classification
  ↓
Context assembly (facts + situation)
  ↓
SotsAI call (behavioral reasoning)
  ↓
LLM generation (language + tone)
  ↓
Final response