
How to import custom data/documents and ask questions based on those data? #132

Closed
revskill10 opened this issue Apr 30, 2023 · 3 comments

@revskill10

Sorry, I'm new to chat AI models.

My question is: if I have a bunch of markdown documents, could I import them and then ask questions about them using this repository and chatbot-ui?

@mudler
Owner

mudler commented May 16, 2023

@mudler mudler closed this as completed May 16, 2023
@alithechemist

Hi, I know this issue is closed, but I think it's the right place to show what happens when I try the query-data example. Apologies in advance if I misunderstood something.

```
wget https://gpt4all.io/models/ggml-gpt4all-j.bin -O models/ggml-gpt4all-j
--2023-07-24 10:51:08--  https://huggingface.co/skeskinen/ggml/resolve/main/all-MiniLM-L6-v2/ggml-model-q4_0.bin
Resolving huggingface.co (huggingface.co)... 108.138.51.8
Connecting to huggingface.co (huggingface.co)|108.138.51.8|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://cdn-lfs.huggingface.co/repos/c7/0e/c70e3621b6763f59392113dce77884a660afc618241b9f2a9497a40c86b84511/a5a174d8772c8a569faf9f3136c441f2c3855b5bf35ed32274294219533feaad?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27ggml-model-q4_0.bin%3B+filename%3D%22ggml-model-q4_0.bin%22%3B&response-content-type=application%2Foctet-stream&Expires=1690455069&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTY5MDQ1NTA2OX19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy5odWdnaW5nZmFjZS5jby9yZXBvcy9jNy8wZS9jNzBlMzYyMWI2NzYzZjU5MzkyMTEzZGNlNzc4ODRhNjYwYWZjNjE4MjQxYjlmMmE5NDk3YTQwYzg2Yjg0NTExL2E1YTE3NGQ4NzcyYzhhNTY5ZmFmOWYzMTM2YzQ0MWYyYzM4NTViNWJmMzVlZDMyMjc0Mjk0MjE5NTMzZmVhYWQ%7EcmVzcG9uc2UtY29udGVudC1kaXNwb3NpdGlvbj0qJnJlc3BvbnNlLWNvbnRlbnQtdHlwZT0qIn1dfQ__&Signature=W9me7Xle49TUswYboIoXK%7EHq7bzctGF7ZIHSLZnW07-ORddcjl0Zyjd7Nyjavs8Ev9Ws-faeIu%7EcG8t6weFYaLcEoawqrlMbV2o5kzjxxQCiCfYKHtx1LTnBAsGp4AiivgGu83f3yC%7EMgU7Rw6s%7EG4bW%7E3lA7cPnjQ3a2agZM3VzlxVtFb3QiPo3jwX%7EL%7E6QKA0NW7U83wcjRhkJlTB7lBTnJI0KTku6hTJlid6VSiR4plcij6audjKQxvd53Gh7SToWctV9TXMyjqKnnZ-YZsUrx1UWFYhllTFSMvHEWCNoSOPbB1GCDRJyPLRMR1vfoUkWyjN0c-Odg6ae5yqnhA__&Key-Pair-Id=KVTP0A1DKRTAX [following]
--2023-07-24 10:51:09--  https://cdn-lfs.huggingface.co/repos/c7/0e/c70e3621b6763f59392113dce77884a660afc618241b9f2a9497a40c86b84511/a5a174d8772c8a569faf9f3136c441f2c3855b5bf35ed32274294219533feaad?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27ggml-model-q4_0.bin%3B+filename%3D%22ggml-model-q4_0.bin%22%3B&response-content-type=application%2Foctet-stream&Expires=1690455069&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTY5MDQ1NTA2OX19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy5odWdnaW5nZmFjZS5jby9yZXBvcy9jNy8wZS9jNzBlMzYyMWI2NzYzZjU5MzkyMTEzZGNlNzc4ODRhNjYwYWZjNjE4MjQxYjlmMmE5NDk3YTQwYzg2Yjg0NTExL2E1YTE3NGQ4NzcyYzhhNTY5ZmFmOWYzMTM2YzQ0MWYyYzM4NTViNWJmMzVlZDMyMjc0Mjk0MjE5NTMzZmVhYWQ%7EcmVzcG9uc2UtY29udGVudC1kaXNwb3NpdGlvbj0qJnJlc3BvbnNlLWNvbnRlbnQtdHlwZT0qIn1dfQ__&Signature=W9me7Xle49TUswYboIoXK%7EHq7bzctGF7ZIHSLZnW07-ORddcjl0Zyjd7Nyjavs8Ev9Ws-faeIu%7EcG8t6weFYaLcEoawqrlMbV2o5kzjxxQCiCfYKHtx1LTnBAsGp4AiivgGu83f3yC%7EMgU7Rw6s%7EG4bW%7E3lA7cPnjQ3a2agZM3VzlxVtFb3QiPo3jwX%7EL%7E6QKA0NW7U83wcjRhkJlTB7lBTnJI0KTku6hTJlid6VSiR4plcij6audjKQxvd53Gh7SToWctV9TXMyjqKnnZ-YZsUrx1UWFYhllTFSMvHEWCNoSOPbB1GCDRJyPLRMR1vfoUkWyjN0c-Odg6ae5yqnhA__&Key-Pair-Id=KVTP0A1DKRTAX
Resolving cdn-lfs.huggingface.co (cdn-lfs.huggingface.co)... 18.66.147.13
Connecting to cdn-lfs.huggingface.co (cdn-lfs.huggingface.co)|18.66.147.13|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14536015 (14M) [application/octet-stream]
Saving to: ‘models/bert’

models/bert                            100%[============================================================================>]  13.86M  10.1MB/s    in 1.4s    

2023-07-24 10:51:11 (10.1 MB/s) - ‘models/bert’ saved [14536015/14536015]

--2023-07-24 10:51:11--  https://gpt4all.io/models/ggml-gpt4all-j.bin
Resolving gpt4all.io (gpt4all.io)... 104.26.0.159
Connecting to gpt4all.io (gpt4all.io)|104.26.0.159|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3785248281 (3.5G)
Saving to: ‘models/ggml-gpt4all-j’

models/ggml-gpt4all-j                  100%[============================================================================>]   3.52G  10.2MB/s    in 5m 42s  

2023-07-24 10:56:54 (10.5 MB/s) - ‘models/ggml-gpt4all-j’ saved [3785248281/3785248281]
```

Then I ran the container:

```
ERROR: Couldn't find env file: /home/g/LocalAI/examples/query_data/.env
```

So I fixed the env-file path in docker-compose:

```yaml
version: '3.6'

services:
  api:
    image: quay.io/go-skynet/local-ai:latest
    build:
      context: ../../
      dockerfile: Dockerfile
    ports:
      - 8080:8080
    env_file:
      - ../../.env
    volumes:
      - ../../models:/models:cached
    command: ["/usr/bin/local-ai"]
```
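For reference, the `.env` file that `env_file` points at might look something like this (variable names assumed from LocalAI's example configuration; adjust paths and values to your setup):

```
# Hypothetical minimal .env for the compose service above
MODELS_PATH=/models
THREADS=4
CONTEXT_SIZE=512
```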

This way the container runs. But then, when I put a document in the data directory and try to run store.py, I encounter another problem:

```
Traceback (most recent call last):
  File "/home/g/LocalAI/examples/query_data/store.py", line 6, in <module>
    from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader, LLMPredictor, PromptHelper, ServiceContext
ModuleNotFoundError: No module named 'llama_index'
```

So I had to pip install:

```
pip3 install openllama
pip3 install llama-index
```

And tried again:

```
export OPENAI_API_BASE=http://localhost:8080/v1
export OPENAI_API_KEY=sk-

python3 store.py
/home/g/.local/lib/python3.10/site-packages/langchain/llms/openai.py:172: UserWarning: You are trying to use a chat model. This way of initializing it is no longer supported. Instead, please use: `from langchain.chat_models import ChatOpenAI`
  warnings.warn(
/home/g/.local/lib/python3.10/site-packages/langchain/llms/openai.py:750: UserWarning: You are trying to use a chat model. This way of initializing it is no longer supported. Instead, please use: `from langchain.chat_models import ChatOpenAI`
  warnings.warn(
Traceback (most recent call last):
  File "/home/g/LocalAI/examples/query_data/store.py", line 20, in <module>
    prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)
  File "/home/g/.local/lib/python3.10/site-packages/llama_index/indices/prompt_helper.py", line 65, in __init__
    raise ValueError("chunk_overlap_ratio must be a float between 0. and 1.")
ValueError: chunk_overlap_ratio must be a float between 0. and 1.
```

So I did a git pull, just in case I was missing recent changes, but the result did not change.

Any advice? The document is a single PDF file...

Thanks!

@quoing
Contributor

quoing commented Sep 2, 2023

> ValueError: chunk_overlap_ratio must be a float between 0. and 1.

Just edit the code and change `chunk_overlap_ratio` to something between 0 and 1. I suppose this was changed from a 0..100 scale to 0..1 recently.
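For anyone hitting this: the traceback shows llama_index now validates the third PromptHelper argument as a ratio, so passing the old token-count style value (e.g. 20) trips the check. A minimal sketch of that validation and the fix (the function below is a hypothetical reimplementation based on the error message, not llama_index's actual code):

```python
# Hypothetical reimplementation of the check seen in the traceback;
# llama_index's PromptHelper performs an equivalent validation.
def check_chunk_overlap_ratio(chunk_overlap_ratio: float) -> float:
    if not 0.0 <= chunk_overlap_ratio <= 1.0:
        raise ValueError("chunk_overlap_ratio must be a float between 0. and 1.")
    return chunk_overlap_ratio

check_chunk_overlap_ratio(0.1)   # a ratio like 0.1 passes
# check_chunk_overlap_ratio(20)  # an old-style token count raises ValueError
```

So in store.py, replacing the overlap value with a float such as 0.1 should get past this error.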

Then you probably also have to `pip install sentence_transformers`.
