Auto-Analyst 2.0 — Key Take-aways
Source: “Auto-Analyst 2.0 — The AI data analytics system” by Arslan Shahid, FireBird Technologies, on Medium.
Area | Highlights |
---|---|
Project scope & license | Second major release of Auto-Analyst, now fully open-sourced under the MIT licence with a public demo at autoanalyst.ai. |
User interface | Built in Streamlit for rapid iteration. Users can chat with the system or address individual agents, upload a CSV (or use sample data), and see Python stdout plus Plotly charts, tables and text in-line. |
Agent architecture | Doer agents (currently four: data pre-processing, statistical-analytics, machine-learning, data-viz) each have a DSPy signature and always return code + commentary (see the sketch below this table). Helper agents (e.g. planner for routing, code_fix for error repair) augment the doers rather than producing analytics themselves. |
Query-routing logic | 1) If a prompt explicitly names an agent (@data_viz_agent) it is routed directly. 2) Otherwise, the planner agent decomposes the request and delegates to the right doer agent(s). |
Backend design principles | • Clear separation of concerns (code generation vs. orchestration). • Use of retrievers for context (dataset & styling index) inside each doer agent. • Stream-captured execution so users see exactly what ran. |
Immediate roadmap | • Automated prompt optimisation with DSPy • Stronger code-fix pipeline using RAG over a library of common errors • Faster or alternative UI to reduce Streamlit latency • Add more specialised agents for new tasks. |
Open questions (long-term) | 1) What is the optimal agent granularity? 2) Should industry-specific agents/systems be split or combined? 3) What UX (chat vs. dashboards) best serves data-analytics use-cases? |
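To make the agent-architecture row concrete, here is a minimal sketch of a doer agent as a DSPy signature. The field names (`dataset_context`, `styling_context`, `goal`) and docstring are illustrative assumptions, not the project's actual prompts:

```python
import dspy

class DataVizSignature(dspy.Signature):
    """Generate Plotly code plus commentary for a visualisation goal."""
    # Context supplied by the retrievers mentioned under backend design
    dataset_context = dspy.InputField(desc="Schema and sample rows of the uploaded CSV")
    styling_context = dspy.InputField(desc="Chart styling guidelines from the styling index")
    goal = dspy.InputField(desc="What the user wants visualised")
    code = dspy.OutputField(desc="Executable Python (Plotly) code")
    commentary = dspy.OutputField(desc="Plain-language explanation of the chart")

# A doer agent is then just a DSPy module built over its signature
data_viz_agent = dspy.ChainOfThought(DataVizSignature)
```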
Bottom line: Auto-Analyst 2.0 packages a modular, agentic data-analytics workflow (built with DSPy) behind a low-code Streamlit UI, and invites community contribution to refine prompts, error-handling and domain-specific extensions.
Helper agents vs. doer agents
1. Two layers of agents inside Auto-Analyst 2.0
Layer | Purpose | Typical Output |
---|---|---|
Doer agents | Perform the analytics work (pre-processing, statistics, ML, visualisation) | Python code + human-readable commentary (e.g., a Plotly chart and a textual explanation) |
Helper agents | Support the doers so they can finish their job reliably | Plans, repaired code, memory look-ups, etc. – never the final analytical result |
Helper agents are therefore “meta-agents”: they coordinate, heal or enrich the workflow but don’t analyse the data themselves.
2. Planner agent – the traffic controller
Why it exists
- Natural-language prompts are ambiguous. The system needs an LLM that can look at the user’s request and decide which analytic steps are required.
- Keeps doers single-purpose. Each doer stays small and focused (e.g., visualise or run regression) while the planner stitches them together.
How it works
Stage | What the planner does | Example |
---|---|---|
Parse | Reads the raw user query plus any chat context. | “Cluster customers by RFM score and draw a scatter plot.” |
Decompose | Breaks the job into ordered sub-tasks. | ① call sk_learn_agent for K-Means, ② call data_viz_agent for scatter |
Route | Emits a JSON/YAML “plan” that names the target doer(s) and passes the right sub-prompts. | YAML: `- agent: sk_learn_agent, goal: cluster …` `- agent: data_viz_agent, goal: scatter …` |
Dispatch | Orchestrator executes the plan step-by-step; each doer receives only what it needs. | — |
If the user explicitly writes @data_viz_agent …, the router bypasses the planner and calls that agent directly.
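A minimal routing sketch under those two rules (the `Plan` signature, the agent names beyond sk_learn_agent/data_viz_agent, and the plan format are assumptions, not the project's actual code):

```python
import re
import dspy

class Plan(dspy.Signature):
    """Decompose an analytics request into ordered agent/goal steps."""
    user_query = dspy.InputField()
    plan = dspy.OutputField(desc="Ordered steps, one 'agent: goal' pair per line")

planner = dspy.ChainOfThought(Plan)
AGENT_NAMES = {"preprocessing_agent", "statistical_analytics_agent",
               "sk_learn_agent", "data_viz_agent"}

def route(query: str) -> str:
    # Rule 1: an explicit @agent mention bypasses the planner entirely
    m = re.search(r"@(\w+)", query)
    if m and m.group(1) in AGENT_NAMES:
        return f"{m.group(1)}: {query}"
    # Rule 2: otherwise the planner decomposes the request and delegates
    return planner(user_query=query).plan  # raw plan text; parsing elided
```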
3. Code-fix agent – the automatic repair shop
Trigger: Any Python exception raised while executing the doer’s code.
Inputs (DSPy signature)

```
faulty_code      # entire cell that crashed
error            # full traceback
previous_fixes   # optional user or system hints
```

Outputs

```
fixed_code       # only the patched section(s)
```
Behaviour
- Reads the stack-trace, locates the failure line(s).
- Generates a minimal patch – it must not rewrite the whole script, only the broken part.
- Returns the patch to the orchestrator, which re-runs the cell.
- If it still fails, the agent can loop or escalate to the user.
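Expressed as a DSPy signature, the contract above might look like this sketch (the field names mirror the inputs/outputs listed earlier; the ChainOfThought wrapper is an assumption):

```python
import dspy

class CodeFix(dspy.Signature):
    """Produce a minimal patch for a crashed analytics cell."""
    faulty_code = dspy.InputField(desc="Entire cell that crashed")
    error = dspy.InputField(desc="Full traceback")
    previous_fixes = dspy.InputField(desc="Optional user or system hints")
    fixed_code = dspy.OutputField(desc="Only the patched section(s)")

code_fix = dspy.ChainOfThought(CodeFix)
```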
4. End-to-end call sequence
```mermaid
sequenceDiagram
    participant U as User
    participant P as Planner
    participant D1 as Doer A (sk_learn_agent)
    participant D2 as Doer B (data_viz_agent)
    participant O as Orchestrator
    participant F as Code_fix
    Note over U: "Cluster customers and plot"
    U->>P: natural-language query
    P->>D1: sub-prompt "cluster customers"
    D1->>O: code + commentary
    Note over O,D1: run code
    O->>F: traceback (if error)
    F->>O: patched code
    O->>D2: sub-prompt "plot clusters"
    D2->>O: code + commentary
    O->>U: Streamlit UI (chart, text)
```
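In code, the run/repair loop in the diagram might look like the sketch below. It reuses the `code_fix` module from the previous block; the retry budget and the stdout capture via `redirect_stdout` are assumptions about the orchestrator, not the project's actual implementation:

```python
import io
import traceback
from contextlib import redirect_stdout

MAX_ATTEMPTS = 3  # illustrative retry budget before escalating to the user

def run_step(code: str) -> str:
    """Execute one doer's code, capturing stdout so the UI can show what ran."""
    attempts, hints = 0, ""
    while True:
        buf = io.StringIO()
        try:
            with redirect_stdout(buf):
                exec(code, {})  # doer code is assumed self-contained
            return buf.getvalue()
        except Exception:
            attempts += 1
            if attempts >= MAX_ATTEMPTS:
                raise  # escalate: surface the traceback to the user
            tb = traceback.format_exc()
            patch = code_fix(faulty_code=code, error=tb, previous_fixes=hints)
            # Simplification: treat the patch as a drop-in replacement; the real
            # agent returns only the broken section(s), which must be spliced in.
            code, hints = patch.fixed_code, hints + tb
```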
5. Why this separation matters
Benefit | Explanation |
---|---|
Reliability | Code-fix lowers crash probability without complicating every doer’s prompt. |
Composability | New helpers (e.g., memory, style-enforcer, unit-tester) can be added without touching existing analytics logic. |
Prompt hygiene | Doers stay small – easier to train, benchmark and debug. |
Observability | Users see the plan, the code, and any fixes as discrete steps, which builds trust. |
6. Extending the helper layer
- memory_agent – retrieves prior conversations or dataset glossaries for context.
- doc_agent – turns code & commentary into docs automatically.
- eval_agent – runs unit tests or statistical diagnostics on doer output.
Because helpers are just DSPy signatures wired into the orchestrator, adding one is mostly prompt-engineering plus a short routing rule.
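For instance, a hypothetical memory_agent could be wired in with one new signature and one routing rule; every name in this sketch (`MemoryLookup`, `enrich`) is illustrative:

```python
import dspy

class MemoryLookup(dspy.Signature):
    """Retrieve prior-conversation snippets relevant to the current request."""
    query = dspy.InputField(desc="Current user request")
    history = dspy.InputField(desc="Serialised prior chat turns and dataset glossary")
    context = dspy.OutputField(desc="Snippets worth prepending to the doer's prompt")

memory_agent = dspy.Predict(MemoryLookup)

def enrich(step: dict, history: str) -> dict:
    """Routing rule: attach retrieved context to a planned step before dispatch."""
    step["context"] = memory_agent(query=step["goal"], history=history).context
    return step
```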
In short: helper agents are the “glue code” of Auto-Analyst. They empower but never replace the specialised doers, giving the system a modular, fault-tolerant architecture that is easy to extend and reason about.