
How to import custom data/documents and ask questions based on those data? #132

Closed
revskill10 opened this issue Apr 30, 2023 · 3 comments

@revskill10

Sorry, I'm new to chat AI models.

My question is: if I have a bunch of markdown documents, could I import them and then ask questions about them using this repository and chatbot-ui?

@mudler
Owner

mudler commented May 16, 2023

@mudler mudler closed this as completed May 16, 2023
@alithechemist

Hi, I know this issue is closed, but I think it's the right place to show what happens when I try the query-data example. Apologies in advance if I misunderstood something.

```
wget https://gpt4all.io/models/ggml-gpt4all-j.bin -O models/ggml-gpt4all-j
--2023-07-24 10:51:08--  https://huggingface.co/skeskinen/ggml/resolve/main/all-MiniLM-L6-v2/ggml-model-q4_0.bin
Resolving huggingface.co (huggingface.co)... 108.138.51.8
Connecting to huggingface.co (huggingface.co)|108.138.51.8|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://cdn-lfs.huggingface.co/repos/c7/0e/c70e3621b6763f59392113dce77884a660afc618241b9f2a9497a40c86b84511/a5a174d8772c8a569faf9f3136c441f2c3855b5bf35ed32274294219533feaad?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27ggml-model-q4_0.bin%3B+filename%3D%22ggml-model-q4_0.bin%22%3B&response-content-type=application%2Foctet-stream&Expires=1690455069&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTY5MDQ1NTA2OX19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy5odWdnaW5nZmFjZS5jby9yZXBvcy9jNy8wZS9jNzBlMzYyMWI2NzYzZjU5MzkyMTEzZGNlNzc4ODRhNjYwYWZjNjE4MjQxYjlmMmE5NDk3YTQwYzg2Yjg0NTExL2E1YTE3NGQ4NzcyYzhhNTY5ZmFmOWYzMTM2YzQ0MWYyYzM4NTViNWJmMzVlZDMyMjc0Mjk0MjE5NTMzZmVhYWQ%7EcmVzcG9uc2UtY29udGVudC1kaXNwb3NpdGlvbj0qJnJlc3BvbnNlLWNvbnRlbnQtdHlwZT0qIn1dfQ__&Signature=W9me7Xle49TUswYboIoXK%7EHq7bzctGF7ZIHSLZnW07-ORddcjl0Zyjd7Nyjavs8Ev9Ws-faeIu%7EcG8t6weFYaLcEoawqrlMbV2o5kzjxxQCiCfYKHtx1LTnBAsGp4AiivgGu83f3yC%7EMgU7Rw6s%7EG4bW%7E3lA7cPnjQ3a2agZM3VzlxVtFb3QiPo3jwX%7EL%7E6QKA0NW7U83wcjRhkJlTB7lBTnJI0KTku6hTJlid6VSiR4plcij6audjKQxvd53Gh7SToWctV9TXMyjqKnnZ-YZsUrx1UWFYhllTFSMvHEWCNoSOPbB1GCDRJyPLRMR1vfoUkWyjN0c-Odg6ae5yqnhA__&Key-Pair-Id=KVTP0A1DKRTAX [following]
--2023-07-24 10:51:09--  https://cdn-lfs.huggingface.co/repos/c7/0e/c70e3621b6763f59392113dce77884a660afc618241b9f2a9497a40c86b84511/a5a174d8772c8a569faf9f3136c441f2c3855b5bf35ed32274294219533feaad?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27ggml-model-q4_0.bin%3B+filename%3D%22ggml-model-q4_0.bin%22%3B&response-content-type=application%2Foctet-stream&Expires=1690455069&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTY5MDQ1NTA2OX19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy5odWdnaW5nZmFjZS5jby9yZXBvcy9jNy8wZS9jNzBlMzYyMWI2NzYzZjU5MzkyMTEzZGNlNzc4ODRhNjYwYWZjNjE4MjQxYjlmMmE5NDk3YTQwYzg2Yjg0NTExL2E1YTE3NGQ4NzcyYzhhNTY5ZmFmOWYzMTM2YzQ0MWYyYzM4NTViNWJmMzVlZDMyMjc0Mjk0MjE5NTMzZmVhYWQ%7EcmVzcG9uc2UtY29udGVudC1kaXNwb3NpdGlvbj0qJnJlc3BvbnNlLWNvbnRlbnQtdHlwZT0qIn1dfQ__&Signature=W9me7Xle49TUswYboIoXK%7EHq7bzctGF7ZIHSLZnW07-ORddcjl0Zyjd7Nyjavs8Ev9Ws-faeIu%7EcG8t6weFYaLcEoawqrlMbV2o5kzjxxQCiCfYKHtx1LTnBAsGp4AiivgGu83f3yC%7EMgU7Rw6s%7EG4bW%7E3lA7cPnjQ3a2agZM3VzlxVtFb3QiPo3jwX%7EL%7E6QKA0NW7U83wcjRhkJlTB7lBTnJI0KTku6hTJlid6VSiR4plcij6audjKQxvd53Gh7SToWctV9TXMyjqKnnZ-YZsUrx1UWFYhllTFSMvHEWCNoSOPbB1GCDRJyPLRMR1vfoUkWyjN0c-Odg6ae5yqnhA__&Key-Pair-Id=KVTP0A1DKRTAX
Resolving cdn-lfs.huggingface.co (cdn-lfs.huggingface.co)... 18.66.147.13
Connecting to cdn-lfs.huggingface.co (cdn-lfs.huggingface.co)|18.66.147.13|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14536015 (14M) [application/octet-stream]
Saving to: ‘models/bert’

models/bert                            100%[============================================================================>]  13.86M  10.1MB/s    in 1.4s    

2023-07-24 10:51:11 (10.1 MB/s) - ‘models/bert’ saved [14536015/14536015]

--2023-07-24 10:51:11--  https://gpt4all.io/models/ggml-gpt4all-j.bin
Resolving gpt4all.io (gpt4all.io)... 104.26.0.159
Connecting to gpt4all.io (gpt4all.io)|104.26.0.159|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3785248281 (3.5G)
Saving to: ‘models/ggml-gpt4all-j’

models/ggml-gpt4all-j                  100%[============================================================================>]   3.52G  10.2MB/s    in 5m 42s  

2023-07-24 10:56:54 (10.5 MB/s) - ‘models/ggml-gpt4all-j’ saved [3785248281/3785248281]
```

Then I ran the container:

```
ERROR: Couldn't find env file: /home/g/LocalAI/examples/query_data/.env
```

So I fixed the env-file path in docker-compose:

```yaml
version: '3.6'

services:
  api:
    image: quay.io/go-skynet/local-ai:latest
    build:
      context: ../../
      dockerfile: Dockerfile
    ports:
      - 8080:8080
    env_file:
      - ../../.env
    volumes:
      - ../../models:/models:cached
    command: ["/usr/bin/local-ai"]
```
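For reference, the `.env` file that `env_file` points at might look something like this (variable names assumed from LocalAI's example configuration; adjust paths and values to your setup):

```
# Hypothetical minimal .env for the compose service above
MODELS_PATH=/models
THREADS=4
CONTEXT_SIZE=512
```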

This way the container runs. But then, when I put a document in the data directory and try to run store.py, I encounter another problem:

```
Traceback (most recent call last):
  File "/home/g/LocalAI/examples/query_data/store.py", line 6, in <module>
    from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader, LLMPredictor, PromptHelper, ServiceContext
ModuleNotFoundError: No module named 'llama_index'
```

So I had to pip install:

```
pip3 install openllama
pip3 install llama-index
```

And tried again:

```
export OPENAI_API_BASE=http://localhost:8080/v1
export OPENAI_API_KEY=sk-

python3 store.py
/home/g/.local/lib/python3.10/site-packages/langchain/llms/openai.py:172: UserWarning: You are trying to use a chat model. This way of initializing it is no longer supported. Instead, please use: `from langchain.chat_models import ChatOpenAI`
  warnings.warn(
/home/g/.local/lib/python3.10/site-packages/langchain/llms/openai.py:750: UserWarning: You are trying to use a chat model. This way of initializing it is no longer supported. Instead, please use: `from langchain.chat_models import ChatOpenAI`
  warnings.warn(
Traceback (most recent call last):
  File "/home/g/LocalAI/examples/query_data/store.py", line 20, in <module>
    prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)
  File "/home/g/.local/lib/python3.10/site-packages/llama_index/indices/prompt_helper.py", line 65, in __init__
    raise ValueError("chunk_overlap_ratio must be a float between 0. and 1.")
ValueError: chunk_overlap_ratio must be a float between 0. and 1.
```

So I did a git pull, just in case I was missing recent changes, but the result did not change.

Any advice? The document is a single PDF file...

Thanks!

@quoing
Contributor

quoing commented Sep 2, 2023

> ValueError: chunk_overlap_ratio must be a float between 0. and 1.

Just edit the code and change `chunk_overlap_ratio` to something between 0 and 1. I suppose this was changed from a 0..100 scale to 0..1 recently.
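For anyone hitting this: the traceback shows llama_index now validates the third PromptHelper argument as a ratio, so passing the old token-count style value (e.g. 20) trips the check. A minimal sketch of that validation and the fix (the function below is a hypothetical reimplementation based on the error message, not llama_index's actual code):

```python
# Hypothetical reimplementation of the check seen in the traceback;
# llama_index's PromptHelper performs an equivalent validation.
def check_chunk_overlap_ratio(chunk_overlap_ratio: float) -> float:
    if not 0.0 <= chunk_overlap_ratio <= 1.0:
        raise ValueError("chunk_overlap_ratio must be a float between 0. and 1.")
    return chunk_overlap_ratio

check_chunk_overlap_ratio(0.1)   # a ratio like 0.1 passes
# check_chunk_overlap_ratio(20)  # an old-style token count raises ValueError
```

So in store.py, replacing the overlap value with a float such as 0.1 should get past this error.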

Then you probably also have to `pip install sentence_transformers`.
