📚 Reader: Local Deployment Edition

This is an adapted version of Jina AI's Reader for local deployment using Docker.

🎯 What it does

It converts any URL to LLM-friendly input: prefix the target URL with http://127.0.0.1:3000/, for example http://127.0.0.1:3000/https://google.com. Get improved output for your agents and RAG systems at no cost. This tool helps you prepare web content for Large Language Models, making online information easier to process and analyze.
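The prefix convention can be wrapped in a tiny shell helper. This is a minimal sketch: `reader_url` and `READER_BASE` are hypothetical names, and the default base assumes the `-p 3000:3000` port mapping used in the Docker commands in this README.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prefix a target URL with the local Reader address.
# READER_BASE is an assumed variable; override it if you publish the
# container on a different host port.
READER_BASE="${READER_BASE:-http://127.0.0.1:3000}"

reader_url() {
  # $1 is the complete target URL, scheme included (e.g. https://google.com).
  # Strip a trailing slash from the base so the result has exactly one "/".
  printf '%s/%s\n' "${READER_BASE%/}" "$1"
}

# Usage (keep the quotes: "?" and "&" in target URLs are special to the shell):
#   curl "$(reader_url 'https://google.com')"
```

The target URL is passed through unchanged, so its own `https://` scheme stays in the path after the Reader address.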

🚀 Key Features

  • 🏠 Runs locally using Docker
  • 🔑 No API keys required; works out of the box!
  • 🖼️ Saves screenshots locally instead of uploading to Google Cloud Storage
  • 📥 Provides download URLs for saved screenshots
  • 🌐 Converts web content to LLM-friendly formats

⚠️ Limitations

  • 📄 Currently does not support parsing PDFs

🐳 Docker Deployment

Option 1: Using the pre-built image

  1. Pull the latest image:

    docker pull ghcr.io/intergalacticalvariable/reader:latest
    
  2. Run the container:

    Replace /path/to/local-storage with the directory where you want to store screenshots.

    docker run -p 3000:3000 -v /path/to/local-storage:/app/local-storage ghcr.io/intergalacticalvariable/reader:latest
    

Option 2: Building the image locally

  1. Clone the repository:

    git clone https://github.com/intergalacticalvariable/reader.git
    cd reader
    
  2. Build the Docker image:

    docker build -t reader .
    
  3. Run the container:

    docker run -p 3000:3000 -v /path/to/local-storage:/app/local-storage reader
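If the screenshot directory does not exist yet, a small helper can create it and print the `-v` argument with an absolute host path. Docker treats an absolute host path as a bind mount, whereas a bare name such as `local-storage` would create a named volume instead. This is a sketch; `reader_mount` is a hypothetical name, and `/app/local-storage` is the container path used in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the host screenshot directory (if missing) and
# print a "host:container" mapping for docker's -v flag. The container side,
# /app/local-storage, matches the docker run commands in this README.
reader_mount() {
  local dir="${1:-./local-storage}"
  mkdir -p "$dir" || return 1
  # Resolve to an absolute path; `cd ... && pwd` avoids depending on realpath.
  local abs
  abs="$(cd "$dir" && pwd)" || return 1
  printf '%s:/app/local-storage\n' "$abs"
}

# Usage:
#   docker run -p 3000:3000 -v "$(reader_mount ./local-storage)" reader
```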
    

🖥️ Usage

Once the Docker container is running, you can use curl to make requests. Here are examples for different response types:

  1. 📝 Markdown (bypasses readability processing):

    curl -H "X-Respond-With: markdown" http://127.0.0.1:3000/https://google.com
    
  2. 🌐 HTML (returns documentElement.outerHTML):

    curl -H "X-Respond-With: html" http://127.0.0.1:3000/https://google.com
    
  3. 📄 Text (returns document.body.innerText):

    curl -H "X-Respond-With: text" http://127.0.0.1:3000/https://google.com
    
  4. 📸 Screenshot (returns the URL of the webpage's screenshot):

    curl -H "X-Respond-With: screenshot" http://127.0.0.1:3000/https://google.com
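The four requests differ only in the `X-Respond-With` header, so one shell function can cover all of them. This is a sketch: `reader_fetch` and `READER_BASE` are hypothetical names, the format check simply mirrors the four values listed above, and the example calls are commented out because they need the container to be running.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the curl examples in this README.
# Usage: reader_fetch FORMAT TARGET_URL
# FORMAT is one of the X-Respond-With values listed above.
reader_fetch() {
  local format="$1" target="$2"
  case "$format" in
    markdown|html|text|screenshot) ;;
    *)
      echo "unsupported format: $format" >&2
      return 2
      ;;
  esac
  # -f: non-zero exit on HTTP errors; -sS: quiet, but still report failures.
  # The default base assumes the -p 3000:3000 mapping.
  curl -fsS -H "X-Respond-With: $format" \
    "${READER_BASE:-http://127.0.0.1:3000}/$target"
}

# Example session (requires the container to be running):
#   for f in markdown html text; do
#     reader_fetch "$f" 'https://google.com' > "google.$f"
#   done
#   # The screenshot response is the screenshot's URL; inspect it before use.
#   reader_fetch screenshot 'https://google.com'
```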
    

🙏 Acknowledgements

This project is based on the excellent work done by the Jina AI team on their Reader project. We've adapted it for local deployment and made some modifications to suit our needs.

📜 License

This project is licensed under Apache-2.0, the same license as the original Jina AI Reader project.