
Building an AI Chatbot in Laravel Using RAG, Ollama, and Llama 3

By Aditya Nursyahbani · 11 May 2026 · 5 min read

Building the AI Chat Endpoint

Now we are ready to connect:

Retrieval system
Vector search
AI model
Prompt engineering
Chat API

Wiring these pieces together turns the Laravel app into a working RAG chatbot.

Creating Chat Service

php
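<?php

namespace App\Services;

use Illuminate\Support\Facades\Http;

class ChatService
{
    // The retrieval service is injected by the container.
    public function __construct(private RetrievalService $retrieval)
    {
    }

    public function ask(string $question): string
    {
        // Fetch the chunks most relevant to the question.
        $documents = $this->retrieval->search($question);

        // Join the chunk texts into one context block.
        $context = collect($documents)
            ->pluck('content')
            ->join("\n\n");

        // A heredoc keeps stray indentation out of the prompt.
        $prompt = <<<PROMPT
        You are a helpful AI assistant.

        Use only the provided context.

        Context:
        {$context}

        Question:
        {$question}
        PROMPT;

        // Local models can be slow; the 120-second timeout is an example value.
        $response = Http::timeout(120)
            ->post('http://localhost:11434/api/generate', [
                'model' => 'llama3',
                'prompt' => $prompt,
                'stream' => false,
            ])
            ->throw(); // fail loudly on 4xx/5xx instead of reading a missing key

        return (string) $response->json('response');
    }
}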
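The prompt assembly in the service can also be pulled out into a small, testable function. The following is a minimal sketch rather than part of the original service: the function name `buildPrompt`, the `$maxContextBytes` budget, the fallback text for an empty result set, and the extra "say that you do not know" instruction are all assumptions added for illustration.

```php
<?php

/**
 * Build a grounded prompt from retrieved chunks (illustrative helper).
 *
 * @param array<int, array{content: string}> $documents retrieved chunks
 */
function buildPrompt(array $documents, string $question, int $maxContextBytes = 4000): string
{
    $context = '';

    foreach ($documents as $document) {
        $chunk = trim($document['content']);

        // Stop once the (hypothetical) context budget would be exceeded.
        if (strlen($context) + strlen($chunk) > $maxContextBytes) {
            break;
        }

        $context .= ($context === '' ? '' : "\n\n") . $chunk;
    }

    // Make an empty retrieval result explicit instead of sending a blank context.
    if ($context === '') {
        $context = '(no relevant documents found)';
    }

    return <<<PROMPT
    You are a helpful AI assistant.

    Use only the provided context. If the context does not contain the answer, say that you do not know.

    Context:
    {$context}

    Question:
    {$question}
    PROMPT;
}
```

Capping the context size keeps the prompt within the model's context window. The explicit "do not know" instruction is one common way to reduce answers that are not grounded in the retrieved documents.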

Creating Chat API

Routes:

php
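<?php

use App\Http\Controllers\ChatController;
use Illuminate\Support\Facades\Route;

// Single-action controller: POST /chat is handled by ChatController::__invoke().
Route::post('/chat', ChatController::class);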

Controller:

php
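<?php

namespace App\Http\Controllers;

use App\Services\ChatService;
use Illuminate\Http\JsonResponse;
use Illuminate\Http\Request;

class ChatController extends Controller
{
    public function __invoke(Request $request, ChatService $chat): JsonResponse
    {
        // Reject missing or non-string input before it reaches the model.
        $validated = $request->validate([
            'message' => ['required', 'string'],
        ]);

        return response()->json([
            'answer' => $chat->ask($validated['message']),
        ]);
    }
}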

Testing the Chatbot

Request:

json
{
  "message": "How does Laravel queue work?"
}

Response:

json
{
  "answer": "Laravel queues allow background job processing..."
}

Streaming Responses

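Streaming does not shorten the total generation time, but the first tokens reach the user almost immediately, so the chatbot feels significantly faster. The result is a ChatGPT-style typing effect.

Ollama supports streaming: with the stream option set to true, it sends the answer as a sequence of newline-delimited JSON objects instead of one final response.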

php
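<?php

use Illuminate\Support\Facades\Http;

// 'stream' in withOptions() tells Guzzle not to buffer the response body;
// 'stream' in the payload tells Ollama to send tokens as they are generated.
// $prompt is built the same way as in ChatService.
$response = Http::withOptions(['stream' => true])
    ->post('http://localhost:11434/api/generate', [
        'model' => 'llama3', 'prompt' => $prompt, 'stream' => true,
    ]);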
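When streaming is enabled, Ollama's generate endpoint emits one JSON object per line, each carrying a `response` fragment and a `done` flag. Network chunks do not necessarily end on line boundaries, so the consumer has to buffer partial lines. Below is a minimal sketch of that buffering in plain PHP; the class name `OllamaStreamParser` and its methods are hypothetical and not part of Laravel or Ollama.

```php
<?php

/**
 * Incrementally decodes a newline-delimited JSON stream (illustrative helper).
 * Feed it raw chunks as they arrive; it returns the text fragments completed so far.
 */
final class OllamaStreamParser
{
    private string $buffer = '';
    private bool $done = false;

    /** @return string[] text fragments decoded from the complete lines seen so far */
    public function push(string $chunk): array
    {
        $this->buffer .= $chunk;
        $tokens = [];

        // Only lines terminated by "\n" are complete; the remainder stays buffered.
        while (($pos = strpos($this->buffer, "\n")) !== false) {
            $line = trim(substr($this->buffer, 0, $pos));
            $this->buffer = (string) substr($this->buffer, $pos + 1);

            if ($line === '') {
                continue;
            }

            $event = json_decode($line, true);

            // Skip malformed lines rather than aborting the whole stream.
            if (!is_array($event)) {
                continue;
            }

            if (($event['response'] ?? '') !== '') {
                $tokens[] = $event['response'];
            }

            if (!empty($event['done'])) {
                $this->done = true;
            }
        }

        return $tokens;
    }

    public function isDone(): bool
    {
        return $this->done;
    }
}
```

With Laravel's HTTP client, the chunks could be read from the PSR-7 body, for example by calling `read()` on `$response->toPsrResponse()->getBody()` in a loop until `eof()`, and each returned fragment forwarded to the browser as it arrives.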

Queueing Embedding Jobs

Embedding generation can be slow and resource-intensive, especially for large documents, so move it out of the request cycle and onto a queue.

Create the job:

bash
php artisan make:job GenerateEmbeddingJob

Example:

php
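<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;

class GenerateEmbeddingJob implements ShouldQueue
{
    use Dispatchable, Queueable;

    public function handle(): void
    {
        // Generate the embedding here and store it alongside the chunk.
    }
}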
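What the job's `handle()` method might call can be sketched as a plain function. This assumes Ollama's `/api/embeddings` endpoint, which accepts a `model` and a `prompt` and returns an `embedding` array. The function name `generateEmbedding` and the `$post` transport callable are hypothetical, introduced so the logic can be exercised without a running server. In Laravel the callable could wrap `Http::post($url, $payload)->json()`.

```php
<?php

/**
 * Request an embedding vector for one chunk of text (illustrative helper).
 *
 * $post is a hypothetical transport: fn (string $url, array $payload): array,
 * returning the decoded JSON response body.
 *
 * @return float[]
 */
function generateEmbedding(string $text, callable $post, string $model = 'nomic-embed-text'): array
{
    $body = $post('http://localhost:11434/api/embeddings', [
        'model' => $model,
        'prompt' => $text,
    ]);

    $vector = $body['embedding'] ?? null;

    // Fail the job explicitly so the queue can retry it.
    if (!is_array($vector) || $vector === []) {
        throw new RuntimeException('Ollama returned no embedding.');
    }

    return array_map('floatval', $vector);
}
```

Throwing on an empty result lets the queue worker mark the job as failed and retry it, instead of silently storing a chunk without a vector.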

Recommended Production Architecture

Production-ready architecture:

text
Nginx
  ↓
Laravel API
  ↓
Redis Queue
  ↓
Embedding Workers
  ↓
PostgreSQL pgvector
  ↓
Ollama GPU Server

Recommended Open Source Models

Purpose	Model
Chat	Llama 3
Embeddings	nomic-embed-text
Fast chat	Phi-3
Coding	DeepSeek Coder
Multilingual	Qwen

Scaling Strategies

As your dataset grows:

Use chunking
Add caching
Use Redis
Use background workers
Optimize vector indexes
Separate AI servers
Add GPU acceleration
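Chunking, the first item above, can be illustrated with a short sketch. It splits text into word windows that overlap, so a sentence cut at a boundary still appears intact in a neighbouring chunk. The function name `chunkText` and the default sizes are assumptions chosen for illustration, not values from this article.

```php
<?php

/**
 * Split text into overlapping chunks of up to $size words (illustrative helper).
 *
 * @return string[]
 */
function chunkText(string $text, int $size = 200, int $overlap = 40): array
{
    if ($size <= 0 || $overlap < 0 || $overlap >= $size) {
        throw new InvalidArgumentException('Require size > 0 and 0 <= overlap < size.');
    }

    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $total = count($words);
    $step = $size - $overlap; // how far each window advances
    $chunks = [];

    for ($start = 0; $start < $total; $start += $step) {
        $chunks[] = implode(' ', array_slice($words, $start, $size));

        // This window already reached the end of the text.
        if ($start + $size >= $total) {
            break;
        }
    }

    return $chunks;
}
```

Each chunk would then be embedded and stored separately. Smaller chunks tend to give more precise retrieval, while larger ones preserve more surrounding context, so the sizes are worth tuning against real queries.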

Common RAG Problems

Problem	Solution
Hallucinations	Better prompts
Slow search	Vector indexes
Poor retrieval	Better chunking
High latency	Caching
Expensive inference	Quantized models
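The caching row can be sketched as follows: repeated questions are answered from a cache instead of running retrieval and inference again. This is a minimal in-memory illustration; the class name `AnswerCache` and its `remember` method are hypothetical. In a Laravel application the same idea would more likely use `Cache::remember()` with a Redis store and a time-to-live.

```php
<?php

/**
 * Tiny in-memory answer cache keyed by the normalised question (illustrative helper).
 */
final class AnswerCache
{
    /** @var array<string, string> */
    private array $answers = [];

    /**
     * @param callable(string): string $ask computes the answer on a cache miss
     */
    public function remember(string $question, callable $ask): string
    {
        // Collapse whitespace and case so trivially different phrasings share a key.
        $normalised = strtolower(trim(preg_replace('/\s+/', ' ', $question)));
        $key = hash('sha256', $normalised);

        if (!array_key_exists($key, $this->answers)) {
            $this->answers[$key] = $ask($question);
        }

        return $this->answers[$key];
    }
}
```

Cached answers go stale when the underlying documents change, so a real deployment would need an expiry time or cache invalidation on re-indexing.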

Final Thoughts

Laravel is fully capable of powering modern AI applications.

With:

RAG
Vector search
Open source LLMs
Semantic retrieval
Local AI infrastructure

You can build:

AI assistants
Internal company chatbots
Knowledge bases
AI customer support
AI search engines
Document intelligence systems

without relying entirely on expensive external AI APIs.

This architecture also gives:

Better privacy
Lower costs
Full control
Offline capability
Custom domain knowledge

The future of Laravel applications is no longer only CRUD.

It is AI-powered software.
