RAG Retrieval Relevance Score: How Should I Manage That While Building My App?

Disclosure: This post may contain affiliate links, meaning Chikara Houses get a commission if you decide to make a purchase through our links, at no cost to you. Please read our disclosure for more info. 
Content Disclaimer: The following content has been generated by a combination of human and an AI language model. Content is images, text and video (if any). Please note that while we strive to provide accurate and up-to-date information, the content generated by the AI may not always reflect the most current news, events, or developments.

 

Let's start straight away by showing the issue here:

 

- We asked a RAG agent LLM to answer the question:

'Will younger community assist elderly people in Japan?'

- The answer was:

"{ \"last_graph_message\": \"success_vector_retrieved_and_cached: [{'UUID': 'a80084e3-d959-4a21-a4f6-1941cb6be8a4', 'score': 0.599127516213688, 'content': 'This subreddit, r/Japan, is a community for discussing various aspects of Japan and providing information. It also serves as a guide to more specialized subreddits related to daily life, travel, or language acquisition. Users are encouraged to check the sidebar and general questions thread before posting. As with any country, Japan faces unique challenges and situations, and the subreddit aims to provide a platform for respectful and informative discussions.', 'row_data': { 'id': 'a80084e3-d959-4a21-a4f6-1941cb6be8a4', 'doc_name': 'https://www.reddit.com/r/japan/comments/1avi517/what_problems_do_you_foresee_japan_experiencing/', 'title': 'Japanese Culture & More: Your Guide to Japanese Subreddits and Our Privacy Policy', 'content': 'This subreddit, r/Japan, is a community for discussing various aspects of Japan and providing information. It also serves as a guide to more specialized subreddits related to daily life, travel, or language acquisition. Users are encouraged to check the sidebar and general questions thread before posting. As with any country, Japan faces unique challenges and situations, and the subreddit aims to provide a platform for respectful and informative discussions.' } }, { 'UUID': 'c18839ed-fcd4-4a3f-aaf0-003ecd20a424', 'score': 0.599127516213688, 'content': 'Climate change will impact Japan, but specific factors may cause unique challenges. While Japan, like many countries, faces an aging population and low birth rates, it must also address additional issues soon. 
These may include:\\\\n\\\\n- Increased frequency of typhoons and heavy rainfall\\\\n- Rising sea levels and coastal erosion\\\\n- More frequent and severe heatwaves\\\\n- Potential water shortages due to changing precipitation patterns\\\\n- Economic consequences from climate change, such as impacts on agriculture, fisheries, and tourism\\\\n\\\\nThese challenges require urgent attention and innovative solutions to ensure a sustainable future for Japan.', 'row_data': { 'id': 'c18839ed-fcd4-4a3f-aaf0-003ecd20a424', 'doc_name': 'https://www.reddit.com/r/japan/comments/1avi517/what_problems_do_you_foresee_japan_experiencing/', 'title': \\\"'Climate Change & Japan: Beyond Demographics - Preparing for Unique Challenges Ahead'\\\", 'content': 'Climate change will impact Japan, but specific factors may cause unique challenges. While Japan, like many countries, faces an aging population and low birth rates, it must also address additional issues soon. These may include:\\\\n\\\\n- Increased frequency of typhoons and heavy rainfall\\\\n- Rising sea levels and coastal erosion\\\\n- More frequent and severe heatwaves\\\\n- Potential water shortages due to changing precipitation patterns\\\\n- Economic consequences from climate change, such as impacts on agriculture, fisheries, and tourism\\\\n\\\\nThese challenges require urgent attention and innovative solutions to ensure a sustainable future for Japan.' } }"

 

Is there any relevance in this retrieved batch of data, and does it contain an answer to the question? Or was it simply a mistake, given that the required relevance score was initially set at about 50%, which is error-prone, and that the agent uses a small 7B LLM (Mistral 7B) that we don't expect to perform the best?

It looks like the response does not directly answer the question of whether the younger community will assist elderly people in Japan. Instead, the retrieved data contains unrelated material from a Reddit thread: a general description of the r/Japan subreddit, and a passage on climate change that mentions the aging population only in passing.

There is no clear relevance to the original question regarding intergenerational support in Japan. This might be due to the relevance score threshold being set at 50%: both retrieved chunks scored about 0.599, just above that threshold, so they passed even though they are only loosely related to the query. The small size of the model (7B) can also affect its ability to interpret context and retrieve precise information.

 

How to improve it?

To improve this, focusing on the relevance score, here is what can be done:

  1. Increase the relevance score threshold to filter out unrelated results.

  2. Fine-tune the model's context parsing, if possible, by giving it more examples or using better prompt engineering.

  3. Use external knowledge bases or embeddings to supplement the smaller model and improve retrieval accuracy.

The retrieved documents seem to be noise rather than a meaningful answer.
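The first suggestion above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the agent in this article: the function name `filter_by_score` is hypothetical, and it assumes each retrieved item is a dict with a numeric `score` key between 0 and 1, as in the output shown earlier.

```python
from typing import Any


def filter_by_score(results: list[dict[str, Any]], min_score: float) -> list[dict[str, Any]]:
    """Keep only results whose 'score' is at least min_score, best first."""
    # Items without a 'score' key are treated as score 0.0 (an assumption).
    kept = [r for r in results if r.get("score", 0.0) >= min_score]
    return sorted(kept, key=lambda r: r["score"], reverse=True)


# The two chunks from the example output, reduced to UUID and score.
retrieved = [
    {"UUID": "a80084e3-d959-4a21-a4f6-1941cb6be8a4", "score": 0.599127516213688},
    {"UUID": "c18839ed-fcd4-4a3f-aaf0-003ecd20a424", "score": 0.599127516213688},
]

print(len(filter_by_score(retrieved, 0.5)))  # both chunks pass at 50%
print(len(filter_by_score(retrieved, 0.7)))  # neither chunk passes at 70%
```

With the threshold at 0.7 the agent would receive an empty list, so it is worth deciding what it should do in that case (for example, say that no relevant document was found) rather than answering from noise.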

What is a good relevance score?

A good relevance score depends on the use case, but generally, here's a guideline for relevance score thresholds:

  1. 70-80% or higher: This range typically provides higher-quality and more accurate results. At this level, you can expect that most of the retrieved information will be directly relevant to the query.
  2. 60-70%: This can still be useful, but there may be some irrelevant or loosely related data included. It can work if the model has limited access to directly relevant information or if you're working with broad queries.
  3. 50-60%: This is generally too low for most tasks that require precise answers. You’re likely to retrieve results that are only marginally related or completely unrelated to the query, as in the example above.
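The bands above can be written down as a small helper for labelling scores in logs while debugging. This is only a sketch: the function name `score_band` and the label strings are hypothetical, scores are assumed to be in the 0 to 1 range, and the `below_range` label for scores under 50% is an assumption, since the guideline does not cover that range.

```python
def score_band(score: float) -> str:
    """Map a relevance score in [0, 1] to the guideline band it falls into."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    if score >= 0.7:
        return "reliable"           # 70% or higher
    if score >= 0.6:
        return "usable_with_noise"  # 60-70%
    if score >= 0.5:
        return "too_low"            # 50-60%
    return "below_range"            # under 50%: not covered by the guideline


# The score of both chunks in the example output.
print(score_band(0.599127516213688))
```

For the score in the example output this prints `too_low`, which matches what was observed: chunks that pass a 50% threshold but do not answer the question.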

    Ideal Range

    For question answering or specific retrieval tasks (like the one above), a relevance score above 70% is generally considered more reliable. This helps ensure that the information retrieved is more focused and accurate. However, you may still want to adjust it depending on how much data you have and the complexity of the task.

     

    What does @Creditizens say about that?

    "I actually use a 50% or even 40% relevance score when I start coding my RAG agents, until I have an MVP working fine. Then I increase the percentage while adjusting the prompts and refining them in the flow of instructions. I believe that when using small LLMs like in this example, 70% is fine and maybe the maximum; in my experience it is around 64%. When using bigger proprietary models like the OpenAI or Anthropic ones, 90% is a good number. It is like in statistics, where we want to be above 95% to validate the assumption, but the question is `Are today's LLMs able to do that?`.. maybe in the future."
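One way to follow that workflow is to check how many retrieved chunks would survive at each candidate threshold before raising it. The sketch below is an illustration under assumptions: the function name `survivors_per_threshold` is hypothetical, and the list of thresholds simply reuses the percentages mentioned in the quote.

```python
def survivors_per_threshold(scores: list[float], thresholds: list[float]) -> dict[float, int]:
    """Count how many scores are at or above each candidate threshold."""
    return {t: sum(1 for s in scores if s >= t) for t in thresholds}


# Scores of the two chunks retrieved in the example at the top of the article.
scores = [0.599127516213688, 0.599127516213688]

# Thresholds taken from the quote: 40%, 50%, 64%, 70% and 90%.
counts = survivors_per_threshold(scores, [0.4, 0.5, 0.64, 0.7, 0.9])
for threshold, kept in counts.items():
    print(f"threshold {threshold:.2f}: {kept} chunk(s) kept")
```

Running this over a set of test questions, rather than a single one, gives a rough picture of how far the threshold can be raised before the agent is left with nothing to work from.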

    Chikara Houses have other articles about AI and IT in general; some are a collaboration with the @Creditizens YouTube channel and include videos and example code snippets to give you an idea.

     


     

    #RAGrelevancescore #AIdocumentretrieval #retrievalaugmentedgeneration #optimizerelevancescores
