
Building the AI Chat Endpoint
Now we are ready to connect:
- Retrieval system
- Vector search
- AI model
- Prompt engineering
- Chat API
Wired together, these pieces turn a Laravel app into a working RAG chatbot.
Creating the Chat Service
```php
<?php

namespace App\Services;

use Illuminate\Support\Facades\Http;

class ChatService
{
    public function ask(string $question): string
    {
        // Fetch the documents most relevant to the question.
        $documents = app(RetrievalService::class)
            ->search($question);

        // Join their contents into one context block.
        $context = collect($documents)
            ->pluck('content')
            ->join("\n\n");

        $prompt = <<<PROMPT
        You are a helpful AI assistant.
        Use only the provided context.

        Context:
        {$context}

        Question: {$question}
        PROMPT;

        // Local generation can be slow, so raise the timeout and
        // throw on HTTP errors instead of returning a broken answer.
        $response = Http::timeout(120)
            ->post('http://localhost:11434/api/generate', [
                'model' => 'llama3',
                'prompt' => $prompt,
                'stream' => false,
            ])
            ->throw();

        return (string) $response->json('response');
    }
}
```
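The prompt assembly step is easy to test on its own if it is pulled out of the service. The following is a minimal sketch in plain PHP, not part of the service above: `buildPrompt`, the `$maxContextChars` budget (counted in bytes as a rough stand-in for a token limit), and the extra "say so" instruction are all assumptions added for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Build a grounded prompt from retrieved chunks, stopping before the
 * context exceeds a byte budget (hypothetical helper, not a Laravel API).
 *
 * @param string[] $chunks Retrieved passages, most relevant first.
 */
function buildPrompt(string $question, array $chunks, int $maxContextChars = 4000): string
{
    $selected = [];
    $used = 0;

    foreach ($chunks as $chunk) {
        $chunk = trim($chunk);
        if ($chunk === '') {
            continue;
        }

        $length = strlen($chunk) + 2; // +2 for the blank-line separator
        if ($used + $length > $maxContextChars) {
            break; // budget reached: drop this and all less relevant chunks
        }

        $selected[] = $chunk;
        $used += $length;
    }

    $context = implode("\n\n", $selected);

    return implode("\n", [
        'You are a helpful AI assistant.',
        'Use only the provided context. If the context does not contain the answer, say so.',
        '',
        'Context:',
        $context,
        '',
        'Question: ' . $question,
    ]);
}
```

Capping the context keeps the prompt inside the model's window when retrieval returns many long documents; the right budget depends on the model and is something to tune rather than copy.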
Creating the Chat API
Routes:
```php
<?php

use App\Http\Controllers\ChatController;
use Illuminate\Support\Facades\Route;

Route::post('/chat', ChatController::class);
```
Controller:
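```php
<?php

namespace App\Http\Controllers;

use App\Services\ChatService;
use Illuminate\Http\JsonResponse;
use Illuminate\Http\Request;

class ChatController extends Controller
{
    public function __invoke(Request $request, ChatService $chat): JsonResponse
    {
        // Reject empty or non-string messages before calling the model.
        $validated = $request->validate([
            'message' => ['required', 'string'],
        ]);

        $answer = $chat->ask($validated['message']);

        return response()->json([
            'answer' => $answer,
        ]);
    }
}
```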
Testing the Chatbot
Request:
```json
{
    "message": "How does Laravel queue work?"
}
```
Response:
```json
{
    "answer": "Laravel queues allow background job processing..."
}
```
Streaming Responses
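Streaming makes responses feel much faster, because the user sees the first tokens while the model is still generating the rest. The result is a ChatGPT-style typing effect.
Ollama supports streaming: set `stream` to `true` in the request and tell the HTTP client not to buffer the response body.
```php
<?php

use Illuminate\Support\Facades\Http;

// 'stream' => true in withOptions() is the Guzzle option that disables
// body buffering; 'stream' => true in the payload asks Ollama to stream.
$response = Http::withOptions(['stream' => true])
    ->post('http://localhost:11434/api/generate', [
        'model' => 'llama3',
        'prompt' => $prompt,
        'stream' => true,
    ]);
```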
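When streaming is enabled, Ollama's `/api/generate` endpoint sends newline-delimited JSON: one object per line, each carrying a `response` text fragment, with `done` set to `true` on the last one. Network reads can split a line in the middle, so the client has to buffer partial lines. The sketch below shows one way to do that in plain PHP; the class name `NdjsonTokenBuffer` and its methods are hypothetical, and it is fed with strings here rather than a live connection.

```php
<?php

declare(strict_types=1);

/**
 * Accumulates raw bytes from a streamed response and returns the text
 * fragment of every complete JSON line seen so far.
 */
final class NdjsonTokenBuffer
{
    private string $buffer = '';
    private bool $done = false;

    /** @return string[] Text fragments decoded from complete lines. */
    public function push(string $bytes): array
    {
        $this->buffer .= $bytes;
        $tokens = [];

        // Only lines terminated by "\n" are complete; keep the tail buffered.
        while (($pos = strpos($this->buffer, "\n")) !== false) {
            $line = trim(substr($this->buffer, 0, $pos));
            $this->buffer = substr($this->buffer, $pos + 1);

            if ($line === '') {
                continue;
            }

            $data = json_decode($line, true);
            if (!is_array($data)) {
                continue; // skip malformed lines rather than abort the stream
            }

            if (isset($data['response']) && $data['response'] !== '') {
                $tokens[] = $data['response'];
            }
            if (!empty($data['done'])) {
                $this->done = true;
            }
        }

        return $tokens;
    }

    public function isDone(): bool
    {
        return $this->done;
    }
}

// Usage: feed each chunk read from the response body and flush the tokens.
$buffer = new NdjsonTokenBuffer();
foreach (["{\"response\":\"Hel\",\"done\":false}\n{\"respon", "se\":\"lo\",\"done\":true}\n"] as $chunk) {
    foreach ($buffer->push($chunk) as $token) {
        echo $token;
    }
}
```

In a Laravel controller the `echo` would typically sit inside a streamed response callback, with each token flushed to the browser as it arrives.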
Queueing Embedding Jobs
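Embedding generation can be expensive, so run it on a queue instead of inside the web request.
Create the job:
```bash
php artisan make:job GenerateEmbeddingJob
```
Example:
```php
<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

class GenerateEmbeddingJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public function handle(): void
    {
        // Generate embeddings
    }
}
```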
Recommended Production Architecture
Production-ready architecture:
```text
Nginx → Laravel API → Redis Queue → Embedding Workers → PostgreSQL pgvector → Ollama GPU Server
```
Recommended Open Source Models
| Purpose | Model |
|---|---|
| Chat | Llama 3 |
| Embeddings | nomic-embed-text |
| Fast chat | Phi-3 |
| Coding | DeepSeek Coder |
| Multilingual | Qwen |
Scaling Strategies
As your dataset grows:
- Use chunking
- Add caching
- Use Redis
- Use background workers
- Optimize vector indexes
- Separate AI servers
- Add GPU acceleration
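Chunking is the first item on that list and the easiest to illustrate. The sketch below splits text into word-based chunks with a fixed overlap, so text near a boundary appears in both neighbouring chunks. The function name `chunkText` and the default sizes are assumptions for illustration; real pipelines often split on sentences or tokens instead of whitespace.

```php
<?php

declare(strict_types=1);

/**
 * Split text into word-based chunks where consecutive chunks share
 * $overlap words (hypothetical helper).
 *
 * @return string[]
 */
function chunkText(string $text, int $chunkSize = 200, int $overlap = 40): array
{
    if ($chunkSize < 1 || $overlap < 0 || $overlap >= $chunkSize) {
        throw new InvalidArgumentException('Require 0 <= overlap < chunkSize.');
    }

    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $total = count($words);
    $step = $chunkSize - $overlap; // how far the window advances each time
    $chunks = [];

    for ($start = 0; $start < $total; $start += $step) {
        $chunks[] = implode(' ', array_slice($words, $start, $chunkSize));

        if ($start + $chunkSize >= $total) {
            break; // this chunk already reaches the end of the text
        }
    }

    return $chunks;
}

// Usage: each returned chunk would be embedded and stored as its own row.
print_r(chunkText('a b c d e f g h i j', 4, 1));
```

Smaller chunks tend to give more precise matches, while larger ones keep more surrounding context; the overlap is a hedge against cutting a relevant passage in half.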
Common RAG Problems
| Problem | Solution |
|---|---|
| Hallucinations | Better prompts |
| Slow search | Vector indexes |
| Poor retrieval | Better chunking |
| High latency | Caching |
| Expensive inference | Quantized models |
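To make the caching row concrete, here is a minimal sketch of an exact-match answer cache keyed by a normalized question. It is an in-memory stand-in only: the `AnswerCache` class, its method names, and the TTL are assumptions, and in a deployed app the array would be replaced by a shared store such as Redis.

```php
<?php

declare(strict_types=1);

/**
 * Tiny in-memory answer cache keyed by model name and normalized question.
 */
final class AnswerCache
{
    /** @var array<string, array{answer: string, expires: int}> */
    private array $entries = [];

    private int $ttlSeconds;

    public function __construct(int $ttlSeconds = 300)
    {
        $this->ttlSeconds = $ttlSeconds;
    }

    public static function key(string $question, string $model): string
    {
        // Collapse whitespace and case so trivial variations share one entry.
        $normalized = strtolower(trim(preg_replace('/\s+/', ' ', $question)));

        return hash('sha256', $model . '|' . $normalized);
    }

    /** Return a cached answer, or compute and store one. $now is injectable for tests. */
    public function remember(string $question, string $model, callable $compute, ?int $now = null): string
    {
        $now = $now ?? time();
        $key = self::key($question, $model);

        if (isset($this->entries[$key]) && $this->entries[$key]['expires'] > $now) {
            return $this->entries[$key]['answer'];
        }

        $answer = $compute($question);
        $this->entries[$key] = [
            'answer' => $answer,
            'expires' => $now + $this->ttlSeconds,
        ];

        return $answer;
    }
}
```

Exact-match caching only helps when the same question is asked repeatedly, and cached answers go stale when the underlying documents change, so the TTL should be short relative to how often the knowledge base is updated.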
Final Thoughts
Laravel is fully capable of powering modern AI applications.
With:
- RAG
- Vector search
- Open source LLMs
- Semantic retrieval
- Local AI infrastructure
you can build:
- AI assistants
- Internal company chatbots
- Knowledge bases
- AI customer support
- AI search engines
- Document intelligence systems
without relying entirely on expensive external AI APIs.
This architecture also gives you:
- Better privacy
- Lower costs
- Full control
- Offline capability
- Custom domain knowledge
The future of Laravel applications is no longer just CRUD.
It is AI-powered software.