Auto-Analyst 2.0 — Key Take-aways
Source: “Auto-Analyst 2.0 — The AI data analytics system” by Arslan Shahid, FireBird Technologies, on Medium.
Area | Highlights |
---|---|
Project scope & license | Second major release of Auto-Analyst, now fully open-sourced under the MIT licence with a public demo at autoanalyst.ai. |
User interface | Built in Streamlit for rapid iteration. Users can chat with the system or address individual agents, upload a CSV (or use sample data), and see Python stdout plus Plotly charts, tables and text in-line. |
Agent architecture | Doer agents (currently four: data pre-processing, statistical-analytics, machine-learning, data-viz) each have a DSPy signature and always return code + commentary (see the sketch below this table). Helper agents (e.g. planner for routing, code_fix for error repair) augment the doers rather than producing analytics themselves. |
Query-routing logic | 1) If a prompt explicitly names an agent (@data_viz_agent) it is routed directly. 2) Otherwise, the planner agent decomposes the request and delegates to the right doer agent(s). |
Backend design principles | • Clear separation of concerns (code generation vs. orchestration). • Use of retrievers for context (dataset & styling index) inside each doer agent. • Stream-captured execution so users see exactly what ran. |
Immediate roadmap | • Automated prompt optimisation with DSPy • Stronger code-fix pipeline using RAG over a library of common errors • Faster or alternative UI to reduce Streamlit latency • Add more specialised agents for new tasks. |
Open questions (long-term) | 1) What is the optimal agent granularity? 2) Should industry-specific agents/systems be split or combined? 3) What UX (chat vs. dashboards) best serves data-analytics use-cases? |
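To make the agent-architecture row concrete, here is a minimal sketch of a doer agent as a DSPy signature. The field names (`dataset_context`, `styling_context`, `goal`) and docstring are illustrative assumptions, not the project's actual prompts:

```python
import dspy

class DataVizSignature(dspy.Signature):
    """Generate Plotly code plus commentary for a visualisation goal."""
    # Context supplied by the retrievers mentioned under backend design
    dataset_context = dspy.InputField(desc="Schema and sample rows of the uploaded CSV")
    styling_context = dspy.InputField(desc="Chart styling guidelines from the styling index")
    goal = dspy.InputField(desc="What the user wants visualised")
    code = dspy.OutputField(desc="Executable Python (Plotly) code")
    commentary = dspy.OutputField(desc="Plain-language explanation of the chart")

# A doer agent is then just a DSPy module built over its signature
data_viz_agent = dspy.ChainOfThought(DataVizSignature)
```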
Bottom line: Auto-Analyst 2.0 packages a modular, agentic data-analytics workflow (built with DSPy) behind a low-code Streamlit UI, and invites community contribution to refine prompts, error-handling and domain-specific extensions.
Helper agents vs. doer agents
1. Two layers of agents inside Auto-Analyst 2.0
Layer | Purpose | Typical Output |
---|---|---|
Doer agents | Perform the analytics work (pre-processing, statistics, ML, visualisation) | Python code + human-readable commentary (e.g., a Plotly chart and a textual explanation) |
Helper agents | Support the doers so they can finish their job reliably | Plans, repaired code, memory look-ups, etc. – never the final analytical result |
Helper agents are therefore “meta-agents”: they coordinate, heal or enrich the workflow but don’t analyse the data themselves.
2. Planner agent – the traffic controller
Why it exists
- Natural-language prompts are ambiguous. The system needs an LLM that can look at the user’s request and decide which analytic steps are required.
- Keeps doers single-purpose. Each doer stays small and focused (e.g., visualise or run regression) while the planner stitches them together.
How it works
Stage | What the planner does | Example |
---|---|---|
Parse | Reads the raw user query plus any chat context. | “Cluster customers by RFM score and draw a scatter plot.” |
Decompose | Breaks the job into ordered sub-tasks. | ① call sk_learn_agent for K-Means, ② call data_viz_agent for scatter |
Route | Emits a JSON/YAML “plan” that names the target doer(s) and passes the right sub-prompts. | YAML: `- agent: sk_learn_agent, goal: cluster …` `- agent: data_viz_agent, goal: scatter …` |
Dispatch | Orchestrator executes the plan step-by-step; each doer receives only what it needs. | — |
If the user explicitly writes @data_viz_agent …, the router bypasses the planner and calls that agent directly.
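A minimal routing sketch under those two rules (the `Plan` signature, the agent names beyond sk_learn_agent/data_viz_agent, and the plan format are assumptions, not the project's actual code):

```python
import re
import dspy

class Plan(dspy.Signature):
    """Decompose an analytics request into ordered agent/goal steps."""
    user_query = dspy.InputField()
    plan = dspy.OutputField(desc="Ordered steps, one 'agent: goal' pair per line")

planner = dspy.ChainOfThought(Plan)
AGENT_NAMES = {"preprocessing_agent", "statistical_analytics_agent",
               "sk_learn_agent", "data_viz_agent"}

def route(query: str) -> str:
    # Rule 1: an explicit @agent mention bypasses the planner entirely
    m = re.search(r"@(\w+)", query)
    if m and m.group(1) in AGENT_NAMES:
        return f"{m.group(1)}: {query}"
    # Rule 2: otherwise the planner decomposes the request and delegates
    return planner(user_query=query).plan  # raw plan text; parsing elided
```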
3. Code-fix agent – the automatic repair shop
Trigger: Any Python exception raised while executing the doer’s code.
Inputs (DSPy signature)

```
faulty_code      # entire cell that crashed
error            # full traceback
previous_fixes   # optional user or system hints
```

Outputs

```
fixed_code       # only the patched section(s)
```
Behaviour
- Reads the stack-trace, locates the failure line(s).
- Generates a minimal patch – it must not rewrite the whole script, only the broken part.
- Returns the patch to the orchestrator, which re-runs the cell.
- If it still fails, the agent can loop or escalate to the user.
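Expressed as a DSPy signature, the contract above might look like this sketch (the field names mirror the inputs/outputs listed earlier; the ChainOfThought wrapper is an assumption):

```python
import dspy

class CodeFix(dspy.Signature):
    """Produce a minimal patch for a crashed analytics cell."""
    faulty_code = dspy.InputField(desc="Entire cell that crashed")
    error = dspy.InputField(desc="Full traceback")
    previous_fixes = dspy.InputField(desc="Optional user or system hints")
    fixed_code = dspy.OutputField(desc="Only the patched section(s)")

code_fix = dspy.ChainOfThought(CodeFix)
```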
4. End-to-end call sequence
```mermaid
sequenceDiagram
    participant U as User
    participant P as Planner
    participant D1 as Doer A (sk_learn_agent)
    participant D2 as Doer B (data_viz_agent)
    participant O as Orchestrator
    participant F as Code_fix
    Note over U: "Cluster customers and plot"
    U->>P: natural-language query
    P->>D1: sub-prompt "cluster customers"
    D1->>O: code + commentary
    Note over O,D1: run code
    O->>F: traceback (if error)
    F->>O: patched code
    O->>D2: sub-prompt "plot clusters"
    D2->>O: code + commentary
    O->>U: Streamlit UI (chart, text)
```
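In code, the run/repair loop in the diagram might look like the sketch below. It reuses the `code_fix` module from the previous block; the retry budget and the stdout capture via `redirect_stdout` are assumptions about the orchestrator, not the project's actual implementation:

```python
import io
import traceback
from contextlib import redirect_stdout

MAX_ATTEMPTS = 3  # illustrative retry budget before escalating to the user

def run_step(code: str) -> str:
    """Execute one doer's code, capturing stdout so the UI can show what ran."""
    attempts, hints = 0, ""
    while True:
        buf = io.StringIO()
        try:
            with redirect_stdout(buf):
                exec(code, {})  # doer code is assumed self-contained
            return buf.getvalue()
        except Exception:
            attempts += 1
            if attempts >= MAX_ATTEMPTS:
                raise  # escalate: surface the traceback to the user
            tb = traceback.format_exc()
            patch = code_fix(faulty_code=code, error=tb, previous_fixes=hints)
            # Simplification: treat the patch as a drop-in replacement; the real
            # agent returns only the broken section(s), which must be spliced in.
            code, hints = patch.fixed_code, hints + tb
```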
5. Why this separation matters
Benefit | Explanation |
---|---|
Reliability | Code-fix lowers crash probability without complicating every doer’s prompt. |
Composability | New helpers (e.g., memory, style-enforcer, unit-tester) can be added without touching existing analytics logic. |
Prompt hygiene | Doers stay small – easier to train, benchmark and debug. |
Observability | Users see the plan, the code, and any fixes as discrete steps, which builds trust. |
6. Extending the helper layer
- memory_agent – retrieves prior conversations or dataset glossaries for context.
- doc_agent – turns code & commentary into docs automatically.
- eval_agent – runs unit tests or statistical diagnostics on doer output.
Because helpers are just DSPy signatures wired into the orchestrator, adding one is mostly prompt-engineering plus a short routing rule.
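For instance, a hypothetical memory_agent could be wired in with one new signature and one routing rule; every name in this sketch (`MemoryLookup`, `enrich`) is illustrative:

```python
import dspy

class MemoryLookup(dspy.Signature):
    """Retrieve prior-conversation snippets relevant to the current request."""
    query = dspy.InputField(desc="Current user request")
    history = dspy.InputField(desc="Serialised prior chat turns and dataset glossary")
    context = dspy.OutputField(desc="Snippets worth prepending to the doer's prompt")

memory_agent = dspy.Predict(MemoryLookup)

def enrich(step: dict, history: str) -> dict:
    """Routing rule: attach retrieved context to a planned step before dispatch."""
    step["context"] = memory_agent(query=step["goal"], history=history).context
    return step
```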
In short: helper agents are the “glue code” of Auto-Analyst. They empower but never replace the specialised doers, giving the system a modular, fault-tolerant architecture that is easy to extend and reason about.