
From Hallucinations to Execution: Building an Autonomous SQL Agent with Qwen 2.5


Category: LLM Engineering / Agents / MLOps
Author: Manuela Schrittwieser – NeuralStack | MS


The Problem: When Chatbots Can't Count

General-purpose Large Language Models (LLMs) are excellent conversationalists but often terrible database administrators. If you ask a standard model like GPT-4 or Llama 3 to "Count the active users," it might generate syntactically perfect SQL. However, without strict constraints, it frequently hallucinates the schema, inventing columns like user_status that don't exist, or it returns a Markdown code block that requires manual copy-pasting to execute.

For my latest project, I wanted to move beyond simple "Text-to-SQL" generation. I wanted to build an Autonomous Agent: a system that doesn't just write code but executes it against a live database to return actual data.

In this article, I’ll walk through how I fine-tuned a lightweight Qwen 2.5 (1.5B) model using QLoRA, transitioned the workflow from experimental notebooks to a production-grade pipeline, and deployed the final agent on Hugging Face Spaces.

1. The Brain: Efficient Fine-Tuning with QLoRA

The core of the agent is the "brain": the LLM responsible for translating natural language into SQL. I chose Qwen 2.5-1.5B-Instruct for its balance of performance and efficiency. At only 1.5 billion parameters, it is small enough to run on consumer hardware (even CPUs) while retaining strong reasoning capabilities.

To specialize the model, I used Quantized Low-Rank Adaptation (QLoRA): the base weights stay frozen, and only a small set of low-rank adapters is trained.

  • Dataset: b-mc2/sql-create-context. This was crucial because it pairs questions with the specific CREATE TABLE context. This forces the model to learn schema adherence rather than memorizing common column names.

  • Infrastructure: Training was performed on a single NVIDIA T4 GPU.

  • Optimization: 4-bit NormalFloat (NF4) quantization via bitsandbytes.

By the end of one epoch, the model shifted from being a "chatty" assistant to a concise SQL generator, achieving a Normalized Exact Match Accuracy of ~78%.
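To make the setup concrete, here is a minimal sketch of the 4-bit NF4 quantization and adapter configuration described above. The specific hyperparameters (rank, alpha, dropout, target modules) are assumptions for illustration, not values taken from the project's actual train.py.

```python
# Sketch of the QLoRA setup: NF4 quantization via bitsandbytes plus
# LoRA adapters via peft. Hyperparameters here are illustrative.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # 4-bit NormalFloat
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-1.5B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,                                   # assumed adapter rank
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()          # only adapter weights are trainable
```

Because the base model stays in 4-bit precision and only the adapters receive gradients, this configuration fits comfortably on a single T4.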

2. The Body: From Notebooks to Production Pipelines

A common pitfall in AI engineering is getting stuck in Jupyter Notebooks. To make this project production-ready, I refactored the codebase into a modular MLOps pipeline:

  • scripts/train.py: A CLI-configurable training script that handles data loading, tokenization, and W&B logging.

  • scripts/evaluate.py: An automated testing suite that normalizes SQL queries (ignoring whitespace/capitalization) to score model accuracy.

  • scripts/deploy.py: A CI/CD utility to automate the upload of adapters and merged models to the Hugging Face Hub.

This structure allows for reproducible runs where hyperparameters (batch size, learning rate) are modified via command-line arguments rather than editing code cells.
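The CLI-driven configuration pattern can be sketched as follows; the argument names mirror the training command shown later in this article, but the actual scripts/train.py may define them differently.

```python
# Minimal sketch of CLI-configurable hyperparameters via argparse,
# replacing hand-edited notebook cells with reproducible arguments.
import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Fine-tune a SQL model")
    parser.add_argument("--model_name", default="Qwen/Qwen2.5-1.5B-Instruct")
    parser.add_argument("--output_dir", default="./outputs/v1")
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--batch_size", type=int, default=4)
    parser.add_argument("--lr", type=float, default=2e-4)
    return parser.parse_args(argv)

if __name__ == "__main__":
    args = parse_args()
    print(args)
```

Every run is then fully described by its command line, which W&B can log alongside the metrics.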

3. The Agent: Closing the Loop

The true value of this project lies in the Autonomous Agent. I implemented a Python class SQLAgent that follows a "Reason-Act-Observe" loop:

  1. Ingest: The agent receives a user prompt (e.g., "Who earns the most in Sales?").

  2. Reason: The fine-tuned Qwen model generates the SQL query based on the active schema.

  3. Act: The agent connects to a local SQLite database, creates a cursor, and executes the query.

  4. Observe: It retrieves the raw data tuples and presents them to the user.

This transforms the interaction from a passive code-generation task into a dynamic data retrieval tool.
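The loop above can be sketched as a small class. This is an illustrative skeleton, not the project's actual SQLAgent: the fine-tuned Qwen model is stubbed out as an injected generate_sql callable so the Reason-Act-Observe control flow is visible (and testable) on its own.

```python
# Sketch of a Reason-Act-Observe loop over SQLite. The LLM is injected
# as a callable (question, schema) -> SQL so the flow is self-contained.
import sqlite3
from typing import Callable

class SQLAgent:
    def __init__(self, db_path: str, generate_sql: Callable[[str, str], str]):
        self.conn = sqlite3.connect(db_path)
        self.generate_sql = generate_sql

    def schema(self) -> str:
        # Read the live schema so the model grounds on real tables,
        # not hallucinated column names.
        rows = self.conn.execute(
            "SELECT sql FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
        return "\n".join(sql for (sql,) in rows if sql)

    def run(self, question: str):
        query = self.generate_sql(question, self.schema())   # Reason
        cursor = self.conn.execute(query)                    # Act
        return cursor.fetchall()                             # Observe
```

In the real agent, generate_sql would wrap a call to the fine-tuned model; a production version would also need read-only guards before executing model-written SQL against anything that matters.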

4. Deployment & Merging

For the final deployment, I merged the LoRA adapters into the base model weights. This creates a standalone artifact (Qwen2.5-SQL-Assistant-Full) that can be loaded without specific PEFT dependencies, reducing inference latency.
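The merge step follows the standard peft recipe; this is a sketch using the model names from this article, with an illustrative local save path.

```python
# Sketch of folding LoRA adapters into the base weights so the final
# artifact loads without peft. Save path is illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
model = PeftModel.from_pretrained(
    base, "manuelaschrittwieser/Qwen2.5-1.5B-SQL-Assistant-Prod"
)

merged = model.merge_and_unload()   # apply adapter deltas to base weights
merged.save_pretrained("./Qwen2.5-SQL-Assistant-Full")
AutoTokenizer.from_pretrained(
    "Qwen/Qwen2.5-1.5B-Instruct"
).save_pretrained("./Qwen2.5-SQL-Assistant-Full")
```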



Project Documentation

Autonomous SQL Agent

This section serves as the technical documentation for reproducing the SQL Assistant.

Architecture Overview

The repository is organized into distinct modules separating logic, data, and configuration:

├── agent/            # Core logic for the Autonomous Agent
├── scripts/          # MLOps pipeline (train, eval, deploy)
├── deployment/       # Gradio UI configuration for HF Spaces
└── data/             # Synthetic databases for local testing

1. Setup & Installation

Prerequisites: Python 3.10+, CUDA-enabled GPU (for training).

# Clone the repository
git clone https://github.com/MANU-de/Autonomous-SQL-Agent.git
cd Autonomous-SQL-Agent

# Install dependencies
pip install -r requirements.txt

To enable experiment tracking and model uploading, authenticate with your keys:

wandb login
huggingface-cli login

2. Running the Agent Locally

To interact with the agent from the command line, first generate the dummy data, then launch the inference script.

# 1. Generate the SQLite database (dummy_database.db)
python scripts/setup_db.py

# 2. Launch the Agent
python agent/run_agent.py --adapter "manuelaschrittwieser/Qwen2.5-1.5B-SQL-Assistant-Prod"

Example Interaction:

User: Show me all employees in the Engineering department.
Agent (Thought): SELECT name FROM employees WHERE department = 'Engineering'
Agent (Result): [('Bob Jones',), ('Diana Prince',)]

3. Reproducing the Training

To fine-tune your own version of the model, utilize the train.py script. The configuration is handled via CLI arguments.

python scripts/train.py \
    --model_name "Qwen/Qwen2.5-1.5B-Instruct" \
    --output_dir "./outputs/v2" \
    --epochs 1 \
    --batch_size 4 \
    --lr 2e-4

4. Evaluation

We evaluate the model using Normalized Exact Match. This compares the generated SQL against the ground truth after removing formatting differences.

python scripts/evaluate.py --adapter_path "./outputs/v2"
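The normalization step can be sketched as below. This is a hypothetical simplification of what evaluate.py does: it collapses whitespace and lowercases before comparing, which matches the "ignoring whitespace/capitalization" description but may differ in detail from the real script.

```python
# Sketch of Normalized Exact Match: strip formatting noise, then
# compare predicted SQL against the gold query string.
import re

def normalize_sql(query: str) -> str:
    query = query.strip().rstrip(";")
    query = re.sub(r"\s+", " ", query)   # collapse whitespace runs
    return query.lower()

def exact_match(predicted: str, gold: str) -> bool:
    return normalize_sql(predicted) == normalize_sql(gold)
```

Note that blanket lowercasing also lowercases string literals, so a stricter scorer would lowercase keywords only, or compare parsed ASTs instead of strings.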

5. Deployment (Web UI)

The web interface provided in the demo uses Gradio. You can run this interface locally before deploying to the cloud.

# Install lightweight inference dependencies
pip install -r deployment/requirements.txt

# Run the UI
python deployment/app.py

Access the UI at http://127.0.0.1:7860


Conclusion: The Future is Specialized and Autonomous

The era of relying solely on massive, trillion-parameter models for every possible task is coming to an end. This project demonstrates that a specialized 1.5B parameter model, when coupled with a robust agentic architecture, can rival generalist giants in specific domains like data retrieval at a fraction of the inference cost.

By shifting our focus from simple text generation to autonomous execution, and from monolithic notebooks to modular engineering pipelines, we unlock the true potential of AI application development. The path forward isn't just about bigger models but about smarter, well-architected agents that can interact reliably with our systems.

I invite you to clone the repository, explore the code, and start building your own specialized agents today.

