Laravel RAG System in 4 Steps!

Alfred Nutile
7 min read · Jun 24, 2024


Simple Laravel RAG System

This post will show how easy it is to get going with Laravel, vectorized data, and LLM chat. It can be the foundation of a RAG system. There are links to the code and more.

Retrieval-augmented generation (RAG) is an architectural approach that can improve the efficacy of large language model (LLM) applications.

The repo is here: https://github.com/alnutile/laravelrag/tree/main. The main branch will get you going on a fresh install of Laravel. Copy .env.example to .env and you can follow along.

Follow Along or Watch the Video COMING SOON!

Make sure to follow me on YouTube https://youtube.com/@alfrednutile?si=M6jhYvFWK1YI1hK9

Step 1 Setting up Vector

NOTE: After each pull, run composer install. If that is not enough, run composer dump.

You can see the Branch here https://github.com/alnutile/laravelrag/tree/vector_setup

Once you have Laravel set up using Herd, DBEngine, or the Postgres app, use TablePlus, the command line, or whatever you prefer to create the database, in this example "laravelrag".

Now we are going to install this library, which will set up the vector extension for us:

composer require pgvector/pgvector
php artisan vendor:publish --tag="pgvector-migrations"
php artisan migrate
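For reference, the published migration is essentially a one-liner that enables the extension. A minimal sketch of what it does (assuming PostgreSQL with the pgvector extension available on the server) looks like this:

<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Support\Facades\DB;

return new class extends Migration
{
    // Enable the pgvector extension so we can add vector columns later
    public function up(): void
    {
        DB::statement('CREATE EXTENSION IF NOT EXISTS vector');
    }

    public function down(): void
    {
        DB::statement('DROP EXTENSION IF EXISTS vector');
    }
};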

Step 2 Now for the Model

The Branch is https://github.com/alnutile/laravelrag/tree/model_setup

We will keep it simple here and have a model named Chunk


This is where we store "chunks" of a document. In our example we will chunk up a long text document to keep it simple for now. But in the end all things become text: PDF, PPT, Docx, etc.

You will see in the code that it is not about pages so much as chunks of a given size with a given overlap of content.

In this code we default to 600-character chunks with a bit of overlap; you can see the code here, and a rough sketch below.
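Here is a simplified sketch of that idea (not the repo's exact implementation): split the text into 600-character chunks and start each chunk a little before the previous one ended, so nothing gets cut off mid-thought.

<?php

// Split a long text into fixed-size chunks with some overlap so that
// sentences cut at a chunk boundary still appear in the neighboring chunk.
function chunkText(string $text, int $size = 600, int $overlap = 100): array
{
    $chunks = [];
    $length = mb_strlen($text);

    for ($start = 0; $start < $length; $start += $size - $overlap) {
        $chunks[] = mb_substr($text, $start, $size);
    }

    return $chunks;
}

$chunks = chunkText(file_get_contents('laravel-docs.txt'));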

Step 3 Add the LLM Drivers

Repo Link is https://github.com/alnutile/laravelrag/tree/llm_driver

NOTE: After each pull, run composer install. If that is not enough, run composer dump.

We bring in my LLMDriver folder, which is not an official package (sorry, just too lazy), and then some other libraries:


composer require spatie/laravel-data laravel/pennant

I am going to use my LLM Driver for this and then plug in Ollama and later Claude.

But first, get Ollama going on your machine; read about it here.

We are going to pull llama3 and mxbai-embed-large (for embedding data).
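From the terminal that means pulling both models:

ollama pull llama3
ollama pull mxbai-embed-large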

Or just use your API creds for OpenAI; it should make sense when you see the code and the config file config/llmdriver.php.

Just set the key value in your .env, or check out `config/llmdriver.php` for more options.

LLM_DRIVER=ollama

Now let’s open TinkerWell (I want to avoid coding a UI so we can focus on the concept more) https://tinkerwell.app/

Load up the Provider `bootstrap/providers.php`

<?php

return [
    App\Providers\AppServiceProvider::class,
    \App\Services\LlmServices\LlmServiceProvider::class,
];

Ok, so we see it is working. Now let's chunk a document.

NOTE: Ideally this would all run in Horizon or queue jobs to deal with a ton of details like timeouts and more. We will see what happens if we just go at it this way for this demo.

Also keep an eye on the tests folder; I have some "good" examples of how to test your LLM-centric applications, like `tests/Feature/ChunkTextTest.php`. A generic version of the pattern is below.
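The general idea is to fake the HTTP calls to the LLM so the tests stay fast and offline. This is a generic sketch of that pattern, not the actual test from the repo; the class name and faked response are illustrative.

<?php

namespace Tests\Feature;

use Illuminate\Support\Facades\Http;
use Tests\TestCase;

class EmbeddingFakeTest extends TestCase
{
    public function test_it_embeds_chunks_without_hitting_a_real_llm(): void
    {
        // Fake the Ollama embedding endpoint so no real model is needed
        Http::fake([
            'localhost:11434/*' => Http::response(['embedding' => array_fill(0, 1024, 0.1)]),
        ]);

        // ... run the chunking/embedding code under test here ...

        Http::assertSent(fn ($request) => str_contains($request->url(), '/api/embeddings'));
    }
}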

Ok, now we run the command to embed the data.

And now we have a ton of chunked data!

The columns are for the different embedding sizes, depending on which embedding models you are using. I got some feedback here and went the route you see above.
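As a rough sketch, the migration adds one vector column per embedding size, using the vector column type registered by the pgvector package (the column names and dimensions here are illustrative; check the repo for the exact ones):

<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        Schema::table('chunks', function (Blueprint $table) {
            // One column per embedding dimension so different embed models can coexist
            $table->vector('embedding_1024', 1024)->nullable();
            $table->vector('embedding_4096', 4096)->nullable();
        });
    }

    public function down(): void
    {
        Schema::table('chunks', function (Blueprint $table) {
            $table->dropColumn(['embedding_1024', 'embedding_4096']);
        });
    }
};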

Now let's chat with the data!

Step 4 Chatting with your Data

Ok, we want the user to ask the LLM a question, but the LLM needs "context" and a prompt that reduces drift, which is when an LLM makes up answers. I have seen drift reduced almost entirely in these systems.

First, let's vectorize the input so we can search for related data. Since the document chunks were embedded, the question gets embedded (vectorized) the same way so we can use it for the search.

Embed the Question

So we take the text question and pass it to the embedding API (Ollama and OpenAI both offer this).

Here is the code so you can see how simple it really is with HTTP.
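For a sense of what that HTTP call looks like, here is a sketch against Ollama's stock embeddings endpoint (this is the raw API, not the repo's driver code; the question text is just an example):

<?php

use Illuminate\Support\Facades\Http;

// Ask Ollama to embed the question; mxbai-embed-large returns a
// 1024-dimension vector under the "embedding" key.
$response = Http::post('http://localhost:11434/api/embeddings', [
    'model' => 'mxbai-embed-large',
    'prompt' => 'How do I install Laravel?',
]);

$embedding = $response->json('embedding');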

Embed Data

You will see I use Laravel Data from Spatie, so no matter the LLM service it is always the same type of data in and out!

Now we use the distance query to do a few things. Let's break it down.

Distance Query from the Outside

We take the results that embedData gives us and pass them into the query, formatting them with the Pgvector\Laravel\Vector class:


use Pgvector\Laravel\Vector;
new Vector($value);

Then we use that in the distance query

use of vector
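A minimal sketch of that query using the pgvector Laravel helpers (assuming the Chunk model uses the package's HasNeighbors trait and an embedding_1024 column; the names are illustrative, not the repo's exact code):

<?php

use App\Models\Chunk;
use Pgvector\Laravel\Distance;
use Pgvector\Laravel\Vector;

$vector = new Vector($embedding);

// Order chunks by cosine distance to the question vector and keep the closest matches
$results = Chunk::query()
    ->nearestNeighbors('embedding_1024', $vector, Distance::Cosine)
    ->take(5)
    ->get();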

I used cosine since I feel the results have been a bit better. Why? I did a bit of ChatGPT work to decide which one and why. Here are some results:

The order of effectiveness for similarity metrics can vary depending on the nature of the data and the specific use case. However, here’s a general guideline based on common scenarios:

1. **Cosine Similarity**: Cosine similarity is often considered one of the most effective metrics for measuring similarity between documents, especially when dealing with high-dimensional data like text documents. It’s robust to differences in document length and is effective at capturing semantic similarity.

2. **Inner Product**: Inner product similarity is another metric that can be effective, particularly for certain types of data. It measures the alignment between vectors, which can be useful in contexts where the direction of the vectors is important.

3. **L2 (Euclidean) Distance**: L2 distance is a straightforward metric that measures the straight-line distance between vectors. While it’s commonly used and easy to understand, it may not always be the most effective for capturing complex relationships between documents, especially in high-dimensional spaces.

In summary, the order of effectiveness is typically Cosine Similarity > Inner Product > L2 Distance. However, it’s important to consider the specific characteristics of your data and experiment with different metrics to determine which one works best for your particular application.

Ok, back to the example. Now we have our question vectorized and we have search results. The code also takes a moment to knit the chunks back together with their siblings, so instead of getting just the matching chunk we also get the chunk before and after it: https://github.com/alnutile/laravelrag/blob/chat_with_data/app/Services/LlmServices/DistanceQuery.php#L33

Now that we have the results, we are going to build a prompt. This is tricky since it takes time to get right, so you might want to pull it into ChatGPT or Ollama and experiment a bit. The key here is setting the temperature to 0 to keep the system from drifting; that is not easy yet in Ollama: https://github.com/ollama/ollama/issues/2505

Ok let’s break this down.

Item 1 the Prompt

Here we define a Role, Task, and Format (JSON, Markdown, table, etc.). Check out https://www.deeplearning.ai/short-courses/chatgpt-prompt-engineering-for-developers/ for some tips!

Item 2 the Input

Here we pass the original text question to help the LLM understand the user's request.

Garbage in Garbage Out

Item 3 the Context

Context, this is key. "Garbage in, garbage out" is still the rule here as well: put good data into your RAG system. In this example I imported some Laravel docs. This is the data from the distance query! A sketch of how these pieces come together is below.
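Putting items 1 through 3 together, a simplified sketch of the prompt assembly might look like this (the wording, the content column name, and the $question variable holding the user's text are illustrative, not the repo's exact prompt):

<?php

// Join the chunks returned by the distance query into one context block
$context = $results->pluck('content')->implode("\n\n");

$prompt = <<<PROMPT
ROLE: You are an assistant answering questions about the documents provided below.
TASK: Answer the user's question using only the provided context.
FORMAT: Respond in Markdown. If the context does not contain the answer, say so.

QUESTION:
{$question}

CONTEXT:
{$context}
PROMPT;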

Item 4 LLM

We are just using a completion API here. This is not a "chat," which is an array of questions and answers, though that would work as well; this is just the prompt we built, passed in. I am using the Claude driver to show how easily we can switch systems. Also, I feel like Ollama, unless you set the temperature, is a bit trickier right now to keep on track. And Claude is FAST!
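To make that concrete, here is a rough sketch of a completion request against the Anthropic messages API with the temperature pinned to 0 (the repo routes this through its own LLM driver; the config key and model name here are assumptions for illustration):

<?php

use Illuminate\Support\Facades\Http;

// Raw Anthropic call for illustration only; the repo uses its Claude driver instead
$response = Http::withHeaders([
    'x-api-key' => config('llmdriver.drivers.claude.api_key'),
    'anthropic-version' => '2023-06-01',
])->post('https://api.anthropic.com/v1/messages', [
    'model' => 'claude-3-haiku-20240307',
    'max_tokens' => 1024,
    'temperature' => 0,
    'messages' => [
        ['role' => 'user', 'content' => $prompt],
    ],
]);

$answer = $response->json('content.0.text');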

Item 5 The Answer

As seen above, you can get a sense of the answer, but I just want to share that sometimes the LLM will point out (see below) that it does not have enough data to answer your question.

I removed a #4 from a previous edit, but this was the first answer before I cleaned up some data.

Here is another example I like to share with people

It will not hallucinate!

Just more “evidence” of what a good RAG system can do.

Wrapping it up

That is really how "easy" it is to get a RAG system going. LaraLlama.io has a ton more details you can dig into, but this is the very simple code base I share in this article.

The next post will cover tools/functions, extending this code further. There are so many ways to use this in applications; I list a bunch of use cases here: https://docs.larallama.io/use-cases.html

The code is all here: https://github.com/alnutile/laravelrag. You can work through the branches, with the last one being https://github.com/alnutile/laravelrag/tree/chat_with_data

Make sure to follow me on YouTube https://youtube.com/@alfrednutile?si=M6jhYvFWK1YI1hK9

And here is the list below for more ways to stay in touch!

📺 YouTube Channel — https://youtube.com/@alfrednutile?si=M6jhYvFWK1YI1hK9

📖 The Docs — https://docs.larallama.io/

🚀 The Site — https://www.larallama.io

🫶🏻 https://patreon.com/larallama

🧑🏻‍💻 The Code — https://github.com/LlmLaraHub/laralamma
📰 The NewsLetter — https://sundance-solutions.mailcoach.app/larallama-app

🖊️ Medium — https://medium.com/@alnutile

🤝🏻 LinkedIn — https://www.linkedin.com/in/alfrednutile/

📺 YouTube Playlist — https://www.youtube.com/watch?v=KM7AyRHx0jQ&list=PLL8JVuiFkO9I1pGpOfrl-A8-09xut-fDq

💬 Discussions — https://github.com/orgs/LlmLaraHub/discussions

Written by Alfred Nutile

DailyAi.Studio - Ai & Automations for any size company. Consulting and Training.
