# Contributors guide:
Welcome to [Firecrawl](https://firecrawl.dev) 🔥! Here are some instructions on how to get the project running locally, so you can run it on your own (and contribute).

If you're contributing, note that the process is similar to other open source repos, i.e. fork Firecrawl, make changes, run tests, and open a PR. If you have any questions or would like help getting on board, reach out to hello@mendable.ai or submit an issue!
## Running the project locally
First, start by installing dependencies:
1. node.js [instructions](https://nodejs.org/en/learn/getting-started/how-to-install-nodejs)
2. pnpm [instructions](https://pnpm.io/installation)
3. redis [instructions](https://redis.io/docs/latest/operate/oss_and_stack/install/install-redis/)

Set environment variables in a `.env` file in the `/apps/api/` directory. You can copy over the template in `.env.example`.

To start, we won't set up authentication or any optional sub-services (PDF parsing, JS blocking support, AI features).

.env:
```
# ===== Required ENVS ======
NUM_WORKERS_PER_QUEUE=8
2024-04-22 02:19:40 +08:00
PORT=3002
HOST=0.0.0.0
REDIS_URL=redis://localhost:6379
REDIS_RATE_LIMIT_URL=redis://localhost:6379
## To turn on DB authentication, you need to set up supabase.
USE_DB_AUTHENTICATION=false
# ===== Optional ENVS ======
# Supabase Setup (used to support DB authentication, advanced logging, etc.)
SUPABASE_ANON_TOKEN=
SUPABASE_URL=
SUPABASE_SERVICE_TOKEN=
# Other Optionals
TEST_API_KEY= # use if you've set up authentication and want to test with a real API key
SCRAPING_BEE_API_KEY= # set if you'd like to use ScrapingBee to handle JS blocking
OPENAI_API_KEY= # add for LLM-dependent features (image alt generation, etc.)
BULL_AUTH_KEY= @
PLAYWRIGHT_MICROSERVICE_URL= # set if you'd like to run a playwright fallback
LLAMAPARSE_API_KEY= # set if you have a LlamaParse key you'd like to use to parse PDFs
SLACK_WEBHOOK_URL= # set if you'd like to send slack server health status messages
POSTHOG_API_KEY= # set if you'd like to send posthog events like job logs
POSTHOG_HOST= # set if you'd like to send posthog events like job logs
```
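As a quick sanity check before starting any services, the required variables from the template above can be verified with a small shell function. This is only a minimal sketch and not part of the repository: the function name `check_required_envs` is hypothetical, and it assumes the file contains simple `KEY=value` lines like the template.

```bash
# Hypothetical helper (not part of Firecrawl): report required variables
# that are missing or empty in a .env file made of simple KEY=value lines.
check_required_envs() {
  local env_file="$1" missing=0 key
  for key in NUM_WORKERS_PER_QUEUE PORT HOST REDIS_URL REDIS_RATE_LIMIT_URL USE_DB_AUTHENTICATION; do
    # A variable counts as set when a line starts with KEY= immediately
    # followed by a non-space character.
    if ! grep -Eq "^${key}=[^[:space:]]" "$env_file"; then
      echo "missing or empty: ${key}"
      missing=1
    fi
  done
  return "$missing"
}
```

From the repository root, it could be called as `check_required_envs apps/api/.env && echo "required variables look set"`.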
### Installing dependencies
First, install the dependencies using pnpm.
```bash
# cd apps/api # to make sure you're in the right folder
pnpm install # make sure you have pnpm version 9+!
```
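Since the install step expects pnpm 9 or newer, it can help to check the installed version first. The sketch below is illustrative only: `pnpm_major_ok` is a hypothetical helper name, and it assumes `pnpm --version` prints a plain version string such as `9.4.0`.

```bash
# Hypothetical helper: succeed when a version string such as "9.4.0"
# has a major component of at least 9.
pnpm_major_ok() {
  local major="${1%%.*}"   # keep everything before the first dot
  [ "$major" -ge 9 ] 2>/dev/null
}

# Only query pnpm when it is actually on the PATH.
if command -v pnpm >/dev/null 2>&1; then
  if pnpm_major_ok "$(pnpm --version)"; then
    echo "pnpm version looks fine"
  else
    echo "pnpm is older than 9, please upgrade"
  fi
fi
```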
### Running the project
You're going to need to open 3 terminals. Here is [a video guide accurate as of Oct 2024](https://youtu.be/LHqg5QNI4UY).
### Terminal 1 - setting up redis
Run the command anywhere within your project:
```bash
redis-server
```
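To confirm that Redis is accepting connections before starting the workers, `redis-cli ping` should reply with `PONG`. The following is a minimal sketch, assuming Redis listens on the default `localhost:6379` used in the `.env` above; the function name `redis_is_up` and the `REDIS_CLI` override variable are hypothetical and exist only for this example.

```bash
# Hypothetical helper: succeed when the Redis server answers PING with PONG.
# REDIS_CLI may point at a different client binary; it defaults to redis-cli.
redis_is_up() {
  [ "$("${REDIS_CLI:-redis-cli}" ping 2>/dev/null)" = "PONG" ]
}
```

Because `redis-server` keeps Terminal 1 busy, run `redis_is_up && echo "redis is ready"` from another terminal.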
### Terminal 2 - setting up workers
Now, navigate to the apps/api/ directory and run:
```bash
pnpm run workers
# if you are going to use the [llm-extract feature](https://github.com/mendableai/firecrawl/pull/586/), you should also export OPENAI_API_KEY=sk-______
```
This will start the workers, which are responsible for processing crawl jobs.
### Terminal 3 - setting up the main server
To do this, navigate to the apps/api/ directory. If you don't have pnpm installed already, install it here: https://pnpm.io/installation

Next, run your server with:
```bash
pnpm run start
```
### Terminal 3 - sending our first request.
Alright, now let's send our first request.
```bash
curl -X GET http://localhost:3002/test
```
This should return the response `Hello, world!`.

If you'd like to test the crawl endpoint, you can run this:
```bash
curl -X POST http://localhost:3002/v1/crawl \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://mendable.ai"
  }'
```
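Crawl jobs are processed by the workers, so you will likely want to pull a job identifier out of the response to follow up on it. The sketch below rests on assumptions that this guide does not state: that the response is JSON containing an `id` field, and that a status endpoint exists at `/v1/crawl/<id>`. The helper name `extract_crawl_id` and the sample response body are made up for illustration, so check the API documentation before relying on any of it.

```bash
# Hypothetical helper: pull the "id" field out of a JSON response body.
# Assumes a shape like {"success":true,"id":"..."}; that shape is an
# assumption and is not stated in this guide.
extract_crawl_id() {
  printf '%s' "$1" | sed -n 's/.*"id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example with a made-up response body:
response='{"success":true,"id":"1234-abcd"}'
job_id="$(extract_crawl_id "$response")"
echo "$job_id"

# Assumed status endpoint, left commented out on purpose:
# curl -X GET "http://localhost:3002/v1/crawl/${job_id}"
```

If `jq` is available, `jq -r '.id'` is a sturdier way to read the field than `sed`.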
## Tests:
To run the tests without authentication, run `npm run test:local-no-auth`.

If you'd like to run the tests with authentication, run `npm run test:prod`.