Install Firecrawl on a Kubernetes Cluster (Simple Version)

Before installing

  1. Fill in secret.yaml and configmap.yaml with your own values, and do not commit the secrets to version control.

    • Note: If REDIS_PASSWORD is set in the secret, update the ConfigMap so that REDIS_URL and REDIS_RATE_LIMIT_URL use the following format:
      REDIS_URL: "redis://:password@host:port"
      REDIS_RATE_LIMIT_URL: "redis://:password@host:port"

      Replace password, host, and port with the appropriate values.
  2. Build the Docker images and push them to your Docker registry (replace the target registry with your own). Run the commands from the examples/kubernetes/cluster-install directory, because the build contexts are relative paths.

    1. API (also used as the worker image)
        docker build --no-cache -t ghcr.io/winkk-dev/firecrawl:latest ../../../apps/api
        docker push ghcr.io/winkk-dev/firecrawl:latest

    2. Playwright
        docker build --no-cache -t ghcr.io/winkk-dev/firecrawl-playwright:latest ../../../apps/playwright-service
        docker push ghcr.io/winkk-dev/firecrawl-playwright:latest

  3. Update the image references in worker.yaml, api.yaml, and playwright-service.yaml to point at the images you pushed.
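If you prefer to script the preparation steps above, the following is a minimal sketch of one way to do it. It assumes a POSIX shell, that it is run from examples/kubernetes/cluster-install, and that each manifest references its image on an unquoted "image:" line. REGISTRY, build_and_push, set_image, and redis_url are hypothetical names introduced here, not part of the Firecrawl repository.

```shell
#!/bin/sh
# Sketch only: REGISTRY, build_and_push, set_image and redis_url are hypothetical
# helpers. Override REGISTRY with your own registry before calling them.
REGISTRY="${REGISTRY:-ghcr.io/winkk-dev}"

# Build and push both images; the API image is also used by the worker.
# Build contexts are relative to examples/kubernetes/cluster-install.
build_and_push() {
  docker build --no-cache -t "$REGISTRY/firecrawl:latest" ../../../apps/api &&
    docker push "$REGISTRY/firecrawl:latest" &&
    docker build --no-cache -t "$REGISTRY/firecrawl-playwright:latest" ../../../apps/playwright-service &&
    docker push "$REGISTRY/firecrawl-playwright:latest"
}

# Point every "image: <registry>/<name>:<tag>" line for one image name at $REGISTRY.
# $1 = manifest file, $2 = image name (firecrawl or firecrawl-playwright).
# Assumes unquoted image values; review the file before applying it.
set_image() {
  sed -i.bak "s#image: .*/$2:#image: $REGISTRY/$2:#" "$1" && rm -f "$1.bak"
}

# Print a REDIS_URL / REDIS_RATE_LIMIT_URL value in the password format above.
# $1 = password, $2 = host, $3 = port. Passwords containing URL-reserved
# characters (such as @, : or /) must be percent-encoded first.
redis_url() {
  printf 'redis://:%s@%s:%s\n' "$1" "$2" "$3"
}
```

For example: REGISTRY=registry.example.com/me; build_and_push && set_image api.yaml firecrawl && set_image worker.yaml firecrawl && set_image playwright-service.yaml firecrawl-playwright. The sed pattern matches on "/<name>:" so that rewriting firecrawl does not touch firecrawl-playwright.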

Install

kubectl apply -f configmap.yaml
kubectl apply -f secret.yaml
kubectl apply -f playwright-service.yaml
kubectl apply -f api.yaml
kubectl apply -f worker.yaml
kubectl apply -f redis.yaml
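The apply commands above can also be wrapped in a small helper that stops at the first failure and then waits for the workloads to come up. This is a sketch under the assumption that the manifests create Deployments in your current namespace; install_firecrawl and MANIFESTS are hypothetical names, and the 300s timeout is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: applies the manifests in the same order as the commands above.
MANIFESTS="configmap.yaml secret.yaml playwright-service.yaml api.yaml worker.yaml redis.yaml"

install_firecrawl() {
  for f in $MANIFESTS; do
    # Stop at the first manifest that fails to apply.
    kubectl apply -f "$f" || return 1
  done
  # Assumes the manifests create Deployments; wait until all of them report Available.
  kubectl wait --for=condition=Available deployment --all --timeout=300s
}
```

Running install_firecrawl from the cluster-install directory is then equivalent to the six apply commands followed by a readiness wait.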

Port Forwarding for Testing

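kubectl port-forward svc/api 3002:3002 -n dev

The -n dev flag assumes the manifests were applied in the dev namespace; change it to match the namespace you deployed into.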
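To check that the API answers through the port-forward, a request like the one sketched below can be sent from a second terminal. This assumes the API exposes the v1 scrape endpoint (POST /v1/scrape) and that authentication is disabled in your configuration; otherwise add an "Authorization: Bearer <key>" header. API_URL, scrape_body, and smoke_test are hypothetical names.

```shell
#!/bin/sh
# Hypothetical smoke test; assumes the port-forward above is running.
API_URL="${API_URL:-http://localhost:3002}"

# JSON request body asking for one URL to be returned as Markdown.
scrape_body() {
  printf '{"url":"%s","formats":["markdown"]}' "$1"
}

# POST a scrape request for $1 (default https://example.com) and print the response.
smoke_test() {
  curl -sS -X POST "$API_URL/v1/scrape" \
    -H 'Content-Type: application/json' \
    -d "$(scrape_body "${1:-https://example.com}")"
}
```

Calling smoke_test should print a JSON response; a connection error usually means the port-forward is not running.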

Delete Firecrawl

kubectl delete -f configmap.yaml
kubectl delete -f secret.yaml
kubectl delete -f playwright-service.yaml
kubectl delete -f api.yaml
kubectl delete -f worker.yaml
kubectl delete -f redis.yaml
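The delete commands can likewise be scripted. The sketch below removes the manifests in the reverse of the install order, a design choice that tears down workloads before the ConfigMap and Secret they read; --ignore-not-found keeps the loop going if a resource is already gone. MANIFESTS, reverse_words, and uninstall_firecrawl are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: same manifest list as the install order above.
MANIFESTS="configmap.yaml secret.yaml playwright-service.yaml api.yaml worker.yaml redis.yaml"

# Print the given words in reverse order, space-separated.
reverse_words() {
  out=""
  for w in "$@"; do
    out="$w${out:+ $out}"
  done
  printf '%s\n' "$out"
}

# Delete each manifest, last-applied first; missing resources are not an error.
uninstall_firecrawl() {
  for f in $(reverse_words $MANIFESTS); do
    kubectl delete -f "$f" --ignore-not-found
  done
}
```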