# 📚 Reader: Local Deployment Edition
This is an adapted version of [Jina AI's Reader](https://github.com/jina-ai/reader) for local deployment using Docker.
## 🎯 What it does
It converts any URL into LLM-friendly input. Prefix the target URL with the address of your local server, for example `http://127.0.0.1:3000/https://google.com`. The output is better suited to agent and RAG systems, at no cost, and makes web content easier for Large Language Models to process and analyze.
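
As a rough illustration of the URL scheme, the request address is the server's base address, a slash, and the full target URL. The helper below is a hypothetical sketch: the function name `reader_url` and the `READER_BASE` variable are not part of this project, and the default assumes the server listens on `http://127.0.0.1:3000` as in the example above.

```bash
# Hypothetical helper: build the local Reader address for a target URL.
# READER_BASE is an assumed variable name; it defaults to the address used in this README.
reader_url() {
  local base="${READER_BASE:-http://127.0.0.1:3000}"
  # Strip one trailing slash from the base so the result has a single separator.
  printf '%s/%s\n' "${base%/}" "$1"
}
```

For example, `reader_url "https://google.com"` prints the address you would pass to curl or open in a browser.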
## 🚀 Key Features
- 🏠 Runs locally using Docker
- 🔑 No API keys required - works out of the box!
- 🖼️ Saves screenshots locally instead of uploading to Google Cloud Storage
- 📥 Provides download URLs for saved screenshots
- 🌐 Converts web content to LLM-friendly formats
## ⚠️ Limitations
- 📄 PDF parsing is not currently supported
## 🐳 Docker Deployment
### Option 1: Using the pre-built image
1. Pull the latest image:
```bash
docker pull ghcr.io/intergalacticalvariable/reader:latest
```
2. Run the container:
Replace `/path/to/local-storage` with the directory where you want to store screenshots.
```bash
docker run -p 3000:3000 -v /path/to/local-storage:/app/local-storage ghcr.io/intergalacticalvariable/reader:latest
```
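
The run step can also be scripted. The sketch below is one possible wrapper, not part of this project: the function names `reader_storage_dir` and `reader_start`, the default directory `./reader-storage`, and the container name `reader` are assumptions chosen for illustration. It creates the host directory first and resolves it to an absolute path, since a bare relative name passed to `-v` can be interpreted as a named volume rather than a bind mount.

```bash
# Hypothetical helpers; names and defaults are illustrative assumptions.

# Create the screenshot directory if needed and print its absolute path.
reader_storage_dir() {
  local dir="${1:-./reader-storage}"
  mkdir -p "$dir" && (cd "$dir" && pwd)
}

# Start the container in the background with the directory mounted.
reader_start() {
  local storage
  storage="$(reader_storage_dir "$1")" || return 1
  docker run -d --name reader \
    -p 3000:3000 \
    -v "${storage}:/app/local-storage" \
    ghcr.io/intergalacticalvariable/reader:latest
}
```

With these defined, `reader_start /path/to/local-storage` starts the container detached, and `docker stop reader` followed by `docker rm reader` shuts it down and removes it.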
### Option 2: Building the image locally
1. Clone the repository:
```bash
git clone https://github.com/intergalacticalvariable/reader.git
cd reader
```
2. Build the Docker image:
```bash
docker build -t reader .
```
3. Run the container:
```bash
docker run -p 3000:3000 -v /path/to/local-storage:/app/local-storage reader
```
## 🖥️ Usage
Once the Docker container is running, you can use curl to make requests. Here are examples for different response types:
1. 📝 Markdown (bypasses readability processing):
```bash
curl -H "X-Respond-With: markdown" http://127.0.0.1:3000/https://google.com
```
2. 🌐 HTML (returns documentElement.outerHTML):
```bash
curl -H "X-Respond-With: html" http://127.0.0.1:3000/https://google.com
```
3. 📄 Text (returns document.body.innerText):
```bash
curl -H "X-Respond-With: text" http://127.0.0.1:3000/https://google.com
```
4. 📸 Screenshot (returns the URL of the webpage's screenshot):
```bash
curl -H "X-Respond-With: screenshot" http://127.0.0.1:3000/https://google.com
```
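
The four requests above differ only in the `X-Respond-With` header, so they can be folded into one function. This is a minimal sketch under stated assumptions: `reader_fetch` is a hypothetical name, the server address is hard-coded to `http://127.0.0.1:3000`, and the accepted values are limited to the four shown in this README (the server may accept others).

```bash
# Hypothetical wrapper around the curl calls above.
# Usage: reader_fetch <markdown|html|text|screenshot> <target-url>
reader_fetch() {
  local format="$1" target="$2"
  # Reject anything other than the four response types listed in this README.
  case "$format" in
    markdown|html|text|screenshot) ;;
    *)
      echo "unsupported format: $format" >&2
      return 2
      ;;
  esac
  # --fail makes curl exit non-zero on HTTP errors;
  # --silent with --show-error hides the progress meter but keeps error messages.
  curl --fail --silent --show-error \
    -H "X-Respond-With: ${format}" \
    "http://127.0.0.1:3000/${target}"
}
```

For example, `reader_fetch markdown https://google.com > google.md` saves the Markdown output to a file, while an unknown format returns exit status 2 without contacting the server.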
## 🙏 Acknowledgements
This project is based on the excellent work done by multiple contributors:
1. The original [Jina AI Reader project](https://github.com/jina-ai/reader), which provided the foundation for this tool.
2. [Harsh Gupta's adaptation](https://github.com/hargup/reader), which served as the immediate basis for this local deployment version.
## 📜 License
This project is licensed under Apache-2.0, the same license as the original Jina AI Reader project.