
JSONDecodeError: Expecting value: line 1 column 1 (char 0) while opening RequestQueue #968

Closed
sadaffatollahy opened this issue Feb 9, 2025 · 4 comments
Labels: bug (Something isn't working.), t-tooling (Issues with this label are in the ownership of the tooling team.)

@sadaffatollahy commented Feb 9, 2025

Issue description

Hi Crawlee team, thank you for the great work!

I encounter the following error when I run the crawler a second time:

Traceback (most recent call last):
  File "/home/sadaf/store_crawler/stores_crawler/d/dookcollection.py", line 401, in <module>
    asyncio.run(main())
  File "/usr/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/sadaf/store_crawler/stores_crawler/d/dookcollection.py", line 377, in main
    request_queue = await RequestQueue.open(name="dookcollection")
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sadaf/store_crawler/store_crawler_venv/lib/python3.11/site-packages/crawlee/storages/_request_queue.py", line 165, in open
    return await open_storage(
           ^^^^^^^^^^^^^^^^^^^
  File "/home/sadaf/store_crawler/store_crawler_venv/lib/python3.11/site-packages/crawlee/storages/_creation_management.py", line 170, in open_storage
    storage_info = await resource_collection_client.get_or_create(name=name)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sadaf/store_crawler/store_crawler_venv/lib/python3.11/site-packages/crawlee/storage_clients/_memory/_request_queue_collection_client.py", line 35, in get_or_create
    resource_client = await get_or_create_inner(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sadaf/store_crawler/store_crawler_venv/lib/python3.11/site-packages/crawlee/storage_clients/_memory/_creation_management.py", line 143, in get_or_create_inner
    found = find_or_create_client_by_id_or_name_inner(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sadaf/store_crawler/store_crawler_venv/lib/python3.11/site-packages/crawlee/storage_clients/_memory/_creation_management.py", line 102, in find_or_create_client_by_id_or_name_inner
    storage_path = _determine_storage_path(resource_client_class, memory_storage_client, id, name)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sadaf/store_crawler/store_crawler_venv/lib/python3.11/site-packages/crawlee/storage_clients/_memory/_creation_management.py", line 412, in _determine_storage_path
    metadata = json.load(metadata_file)
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/__init__.py", line 293, in load
    return loads(fp.read(),
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

I removed the related directory under storage/request_queues and re-ran the crawler, but I still get the same error.
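
Since the traceback shows json.load() failing on a metadata file, a scan like the following might locate the corrupt file. This is a minimal sketch, assuming the default ./storage layout and that the metadata files are named __metadata__.json (both assumptions on my part):

import json
from pathlib import Path

# Scan every storage metadata file (request queues, datasets, key-value
# stores) and report any that fail to parse, e.g. empty or truncated files.
for metadata_path in Path("./storage").rglob("__metadata__.json"):
    try:
        with open(metadata_path, encoding="utf-8") as metadata_file:
            json.load(metadata_file)
    except json.JSONDecodeError as exc:
        print(f"Corrupt or empty metadata file: {metadata_path} ({exc})")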

I'd appreciate any help. Thanks!

Package version

crawlee==0.5.0

@sadaffatollahy sadaffatollahy added the bug Something isn't working. label Feb 9, 2025
@github-actions github-actions bot added the t-tooling Issues with this label are in the ownership of the tooling team. label Feb 9, 2025
@B4nan B4nan transferred this issue from apify/crawlee Feb 9, 2025
@janbuchar (Collaborator)

Hi @sadaffatollahy and thank you for your interest in Crawlee! Could you please provide a short script that reproduces the issue you encountered? It would help us greatly in diagnosing the problem.

@sadaffatollahy (Author) commented Feb 11, 2025

import asyncio

from crawlee.crawlers import BeautifulSoupCrawler
from crawlee.storages import RequestQueue

# `router` holds my request handlers; it is defined elsewhere in the project.

async def main() -> None:
    # Open or create a named request queue.
    request_queue = await RequestQueue.open(name="dookcollection")

    # Initialize the crawler with the named request queue.
    crawler = BeautifulSoupCrawler(
        max_requests_per_crawl=100,
        request_handler=router,
        request_manager=request_queue,
    )

    # Start the crawler with the initial URL.
    await crawler.run(["https://dookcollection.ir/"])

    # Export the entire dataset to a JSON file.
    await crawler.export_data_json(
        path="./storage/results_dookcollection.json",
        dataset_name="dookcollection",
        ensure_ascii=False,
    )

if __name__ == "__main__":
    asyncio.run(main())

This is the main function of my crawler code; the error occurs when it opens the RequestQueue.

@janbuchar (Collaborator)

Can you also provide your router? When I ran the code you posted with a dummy request handler, it did not result in an error.
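
For reference, the dummy handler I tried was along these lines (a minimal sketch; Router and BeautifulSoupCrawlingContext are real crawlee imports, the handler body is just a placeholder):

from crawlee.crawlers import BeautifulSoupCrawlingContext
from crawlee.router import Router

router = Router[BeautifulSoupCrawlingContext]()

@router.default_handler
async def default_handler(context: BeautifulSoupCrawlingContext) -> None:
    # Do nothing beyond logging; just enough to exercise RequestQueue.open().
    context.log.info(f"Visited {context.request.url}")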

@janbuchar (Collaborator)

I'm closing this due to inactivity; feel free to let us know if you still need help!
