PromptHub Blog: RecPrompt: A Prompt Engineering Framework for LLM Recommendations (2024)

Hey guys, how's it going? Dan here, co-founder of PromptHub, here today to talk a little bit about recommendation systems and how you can use prompt engineering to build one really quickly. The basis for what we'll be looking at today is a framework called RecPrompt from a recent research paper. It is essentially a prompt engineering framework that uses multiple LLMs to create news recommendations, but the approach could be applied to any type of entity, whether that's restaurants, music, books, etc. Shout out to the research team that put this together; it's an insightful piece of work, and we'll jump right in.

There are a few components to this prompt engineering framework: a prompt optimizer, a recommender, and a monitor. The optimizer refines the prompt, either with an LLM or, as the researchers also tried, by hand. The recommender is the component that actually makes the recommendations, and the monitor keeps track of all the different recommendations made and evaluates them against certain metrics to see how they perform quantitatively.
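
To make that division of labor concrete, here's a minimal sketch of the three pieces in Python. The function names, prompts, and the OpenAI-style `chat` helper are my own illustration, not the paper's actual code.

```python
# Minimal sketch of the three RecPrompt components (illustrative names, not the paper's code).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def chat(system: str, user: str, model: str = "gpt-4") -> str:
    """Small helper around an OpenAI-style chat completion call."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": user}],
    )
    return resp.choices[0].message.content

def recommend(prompt_template: str, history: str, candidates: str) -> str:
    """Recommender: fills the prompt template and asks the LLM to rank candidate news."""
    prompt = prompt_template.format(history=history, candidates=candidates)
    return chat("You are a news recommendation assistant.", prompt)

def optimize_prompt(system_instruction: str, prompt_template: str, samples: str) -> str:
    """Prompt optimizer: asks an LLM to rewrite the template based on observed recommendations."""
    meta_prompt = (
        f"Current prompt template:\n{prompt_template}\n\n"
        f"Example recommendations it produced:\n{samples}\n\n"
        "Based on these observations, enhance the prompt template."
    )
    return chat(system_instruction, meta_prompt)

def monitor(recommended_ids: list[str], clicked_ids: set[str]) -> float:
    """Monitor: scores a recommendation list against what the user actually clicked (simple hit rate here)."""
    hits = sum(1 for item in recommended_ids if item in clicked_ids)
    return hits / max(len(recommended_ids), 1)
```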

Here is the general flow; it might look a little complicated, but we can break it down pretty simply. We have the optimizer, the recommender, and the monitor. The recommender is the one that generates the news recommendations. You start with some initial prompt template—even on the first run, it's just something basic that says, "Based on the user's news history," and so on. That template, plus some of the recommendations it produced, goes into the prompt optimizer. The prompt optimizer also has a system instruction that is static or frozen, and a meta prompt—the prompt that tells it, "Hey, based on these observations, enhance the prompt." All of these things get packaged up and sent to the prompt optimizer.
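
Wired together, one round of that loop might look something like this, reusing the `recommend()` and `optimize_prompt()` helpers from the sketch above. The system instruction, template wording, and sample data are all made up for illustration.

```python
# One round of the RecPrompt-style loop (illustrative sketch, not the paper's code).
SYSTEM_INSTRUCTION = (
    "You are an expert prompt engineer. Improve prompts for a news recommendation task."
)

initial_template = (
    "Based on the user's news history, rank the candidate news by how likely "
    "the user is to click them.\n\n# History\n{history}\n\n# Candidate news\n{candidates}"
)

history = "1. Chiefs win the Super Bowl\n2. New iPhone release rumors"
candidates = "C1. NFL draft preview\nC2. Best budget smartphones\nC3. Local election results"

template = initial_template
for step in range(3):  # a few optimization rounds
    # 1) Recommender produces a ranking with the current template.
    samples = recommend(template, history, candidates)
    # 2) Optimizer gets the frozen system instruction, the current template,
    #    the sample output, and the meta prompt telling it to improve the template.
    template = optimize_prompt(SYSTEM_INSTRUCTION, template, samples)
    # 3) The monitor would score the new recommendations and log the result here.
    print(f"--- template after round {step + 1} ---\n{template}\n")
```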

On the other side, the prompt optimizer outputs a refined prompt that incorporates some of those samples and any updates the LLM made. That refined prompt goes back to the recommender, which generates new recommendations for the user, and everything is tracked through the monitor. The monitor measures and records the effectiveness of these newly generated recommendations across a couple of different sets of evaluation metrics (the gray letters in the diagram). The exact metrics aren't super important right now; we'll dig into the evaluation a little more later. So it's a little complicated, but also not too complicated.
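
The key idea is that every refined prompt gets scored the same way. As a concrete example, here's a sketch of two ranking metrics commonly used for news recommendation, MRR and nDCG@k; this is my illustration of the kind of scoring the monitor does, so check the paper for the exact metric set.

```python
import math

def mrr(ranked_ids: list[str], clicked_ids: set[str]) -> float:
    """Reciprocal rank for a single impression: 1 / position of the first clicked item."""
    for i, item in enumerate(ranked_ids, start=1):
        if item in clicked_ids:
            return 1.0 / i
    return 0.0

def ndcg_at_k(ranked_ids: list[str], clicked_ids: set[str], k: int = 5) -> float:
    """Normalized discounted cumulative gain at rank k, with binary relevance (clicked or not)."""
    dcg = sum(1.0 / math.log2(i + 1)
              for i, item in enumerate(ranked_ids[:k], start=1)
              if item in clicked_ids)
    ideal_hits = min(len(clicked_ids), k)
    idcg = sum(1.0 / math.log2(i + 1) for i in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0

# Example: the clicked article was ranked second.
print(mrr(["N2", "N7", "N5"], {"N7"}))        # 0.5
print(ndcg_at_k(["N2", "N7", "N5"], {"N7"}))  # ~0.63
```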

Here’s a closer look at the system instruction message from the last graphic, shown larger so you can get a better idea. It's pretty straightforward; nothing too crazy to write home about. And here's the initial prompt—the very first prompt that gets fed into the prompt optimizer. You can see we're inputting a lot of variables. They break it up using markdown headings: there's the input section, the user's history, and the candidate news (the potential articles to recommend), with slots for all these variables to get filled in. We turned this into a template that you can use directly in PromptHub, so you can grab it, and we'll link it below as well.
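
To give a sense of the shape, here's a paraphrase of that initial prompt as a Python string with slots for the variables. This is my approximation of the structure, not the paper's exact wording—grab the real template from PromptHub.

```python
# Rough paraphrase of the initial prompt template (structure only, not the paper's exact text).
INITIAL_PROMPT = """# Input
## User's History News
{history}

## Candidate News
{candidates}

# Task
Based on the user's history news, rank the candidate news by how likely the user is to click on them.

# Output
Output the ranked candidate news, from most to least likely to be clicked."""

filled = INITIAL_PROMPT.format(
    history="1. Chiefs win the Super Bowl\n2. New iPhone release rumors",
    candidates="C1. NFL draft preview\nC2. Best budget smartphones\nC3. Local election results",
)
print(filled)
```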

We looked at the optimizer before, and in that case it was LLM-based optimization: we're basically grabbing all these things—the recommendations, the initial prompt template, the system instruction—and feeding them into an LLM to get an enhanced prompt on the other side. The researchers also tested manually updating the prompts that were eventually used to make the recommendations, which set up a comparison between two kinds of prompts: ones crafted by hand and ones optimized by the LLM. We've written a lot about using LLMs to optimize LLMs, and you can read all about it on our blog. Our general position is that just using an LLM to make your prompt better doesn't get you the best results; it's iteration and some human intervention, plus using LLMs for part of the process, that does.
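
If you wanted to run that comparison yourself, the setup is simple: run both prompts over the same impressions and let the monitor score them. The sketch below reuses `recommend()` and `monitor()` from earlier; the impressions, templates, and `parse_ids()` helper are made up for illustration.

```python
# Sketch of the handcrafted-vs-LLM-optimized comparison (illustrative, not the paper's code).
import re

handcrafted_template = (
    "Rank the candidate news for this user.\n\n"
    "History:\n{history}\n\nCandidates:\n{candidates}"
)
llm_optimized_template = (
    "You are a personalization expert. Rank the candidate news for this user."  # in practice,
    "\n\nHistory:\n{history}\n\nCandidates:\n{candidates}"                       # the loop's output
)

impressions = [
    # (user history, candidate news, ids the user actually clicked)
    ("1. Chiefs win the Super Bowl", "C1. NFL draft preview\nC2. Local election results", {"C1"}),
    ("1. New iPhone release rumors", "C1. Best budget smartphones\nC2. Celebrity gossip", {"C1"}),
]

def parse_ids(ranked_output: str) -> list[str]:
    """Pull candidate ids like 'C1' out of the model's ranked output, in order."""
    return re.findall(r"C\d+", ranked_output)

def average_score(prompt_template: str) -> float:
    """Run the recommender with one prompt over all impressions and average the monitor score."""
    scores = []
    for history, candidates, clicked in impressions:
        ranked = parse_ids(recommend(prompt_template, history, candidates))
        scores.append(monitor(ranked, clicked))
    return sum(scores) / len(scores)

print("handcrafted:  ", average_score(handcrafted_template))
print("LLM-optimized:", average_score(llm_optimized_template))
```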

They ran a bunch of experiments testing these different types of methods. They used a news dataset from Microsoft (the MIND dataset) and both GPT-3.5 and GPT-4. First up are the classic news recommendation baselines: random, most-pop, and topic-pop. Random selects articles at random, most-pop ranks articles by the total number of aggregate views across the dataset, and topic-pop is tied to the specific user, favoring popular articles in the topics from their browsing history. These are not LLM-based methods, but they're still recommendation methods. They also tested a bunch of deep neural methods; I won't go too deep into those—they're just well-known deep neural models for building recommendation systems.
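
Here's roughly how I'd read those three baselines in code—a sketch of the idea; the paper's exact implementations may differ.

```python
# Illustrative versions of the non-LLM baselines: random, most-pop, and topic-pop.
import random
from collections import Counter

def random_baseline(candidates: list[str], k: int = 5) -> list[str]:
    """Random: pick k candidate articles at random."""
    return random.sample(candidates, min(k, len(candidates)))

def most_pop(candidates: list[str], view_counts: Counter, k: int = 5) -> list[str]:
    """MostPop: rank candidates by total views across all users."""
    return sorted(candidates, key=lambda c: view_counts[c], reverse=True)[:k]

def topic_pop(candidates: list[str], topic_of: dict[str, str],
              user_history: list[str], view_counts: Counter, k: int = 5) -> list[str]:
    """TopicPop: prefer popular articles whose topic appears in this user's browsing history."""
    user_topics = {topic_of[a] for a in user_history if a in topic_of}
    return sorted(
        candidates,
        key=lambda c: (topic_of.get(c) in user_topics, view_counts[c]),
        reverse=True,
    )[:k]
```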

So we have the simple recommendation methods, the deep neural ones, and the LLM-based ones. Within the LLM-based ones, we have the handcrafted prompt versus the LLM-optimized prompt, which is that bottom one. What are we looking at here? Topic-pop is the top performer in the first group of methods, which makes sense because it focuses on the user's browsing history rather than the global dataset. All the deep neural models outperform that best simple method, and coming down to the LLM-based ones, there are some interesting takeaways.

The initial prompt with GPT-4 outperforms most of the neural models. The initial prompt is just running that prompt we looked at earlier, not doing any further optimization or anything too crazy. We can see just using a really strong model like GPT-4 outperforms most of these methods but not all. There's a clear pattern that an LLM-generated prompt tends to perform better than a handcrafted prompt, which tends to perform better than just an initial static prompt. They’re close, though, and that's important to keep in mind. We'll go deeper into the percentage differences between these different LLM prompt methods. We see a clear, big distinction between GPT-4 and 3.5, which is expected. The only LLM-based recommendation method that beats all the neural models is the last row here, having the LLM generate the prompt using GPT-4.

Now, for my favorite part of this paper: digging into the trade-offs between the different LLM processes. If we hyperfocus on just GPT-3.5 and GPT-4 and pick one set of results, the pattern is pretty similar across all of them. As a reminder, the initial prompt is just a static prompt with nothing crazy going on; the handcrafted prompt uses the whole framework but has a human update the prompt based on performance; and the LLM-generated prompt is the automated process we looked at earlier. Going from the initial prompt to the handcrafted prompt, there's a 1% gain here, a 3% gain there—not crazy. You could argue that the time it would take to implement the framework may not be worth that relatively small gain, but it depends on your use case. Then, jumping from the initial prompt to automating the whole process with an LLM-generated prompt, you get a 6%, 4%, or 5% gain on average. Again, this could be significant or insignificant, and setting up the framework might be straightforward or more challenging, depending on your situation.

When reading these papers, consider your situation, how much engineering power you have, how easy it is to set up these things, and understand that the underlying base models are very strong. Having just a good initial prompt can get you far. Here's what that initial prompt looks like—pretty straightforward. You're pulling in some information and not doing anything too crazy. It all comes down to whether that percentage difference makes a big difference. If you have something at scale, it probably does; if this is just a proof of concept, the initial prompt might be good enough.

You can try this in PromptHub today. We'll have a link to it below as well. Happy prompting, and let me know what you think. If you implement it, feel free to drop us a message or comment below. See you!
