Gergő Móricz
8d467c8ca7
WebScraper refactor into scrapeURL (#714)
* feat: use strictNullChecking
* feat: switch logger to Winston
* feat(scrapeURL): first batch
* fix(scrapeURL): error swallow
* fix(scrapeURL): add timeout to EngineResultsTracker
* fix(scrapeURL): report unexpected error to sentry
* chore: remove unused modules
* feat(transformers/coerce): warn when a format's response is missing
* feat(scrapeURL): feature flag priorities, engine quality sorting, PDF and DOCX support
* (add note)
* feat(scrapeURL): wip readme
* feat(scrapeURL): LLM extract
* feat(scrapeURL): better warnings
* fix(scrapeURL/engines/fire-engine;playwright): fix screenshot
* feat(scrapeURL): add forceEngine internal option
* feat(scrapeURL/engines): scrapingbee
* feat(scrapeURL/transformers): uploadScreenshot
* feat(scrapeURL): more intense tests
* bunch of stuff
* get rid of WebScraper (mostly)
* adapt batch scrape
* add staging deploy workflow
* fix yaml
* fix logger issues
* fix v1 test schema
* feat(scrapeURL/fire-engine/chrome-cdp): remove wait inserts on actions
* scrapeURL: v0 backwards compat
* logger fixes
* feat(scrapeurl): v0 returnOnlyUrls support
* fix(scrapeURL/v0): URL leniency
* fix(batch-scrape): ts non-nullable
* fix(scrapeURL/fire-engine/chromecdp): fix wait action
* fix(logger): remove error debug key
* feat(requests.http): use dotenv expression
* fix(scrapeURL/extractMetadata): extract custom metadata
* fix crawl option conversion
* feat(scrapeURL): Add retry logic to robustFetch
* fix(scrapeURL): crawl stuff
* fix(scrapeURL): LLM extract
* fix(scrapeURL/v0): search fix
* fix(tests/v0): grant larger response size to v0 crawl status
* feat(scrapeURL): basic fetch engine
* feat(scrapeURL): playwright engine
* feat(scrapeURL): add url-specific parameters
* Update readme and examples
* added e2e tests for most parameters. Still a few actions, location and iframes to be done.
* fixed type
* Nick:
* Update scrape.ts
* Update index.ts
* added actions and base64 check
* Nick: skipTls feature flag?
* 403
* todo
* todo
* fixes
* yeet headers from url specific params
* add warning when final engine has feature deficit
* expose engine results tracker for ScrapeEvents implementation
* ingest scrape events
* fixed some tests
* comment
* Update index.test.ts
* fixed rawHtml
* Update index.test.ts
* update comments
* move geolocation to global f-e option, fix removeBase64Images
* Nick:
* trim url-specific params
* Update index.ts
---------
Co-authored-by: Eric Ciarla <ericciarla@yahoo.com>
Co-authored-by: rafaelmmiller <8574157+rafaelmmiller@users.noreply.github.com>
Co-authored-by: Nicolas <nicolascamara29@gmail.com>
2024-11-07 20:57:33 +01:00
Nicolas
e1d8e1584e
Update SELF_HOST.md
2024-10-21 12:23:27 -03:00
Mayur Kawale
2b0c52ff67
Update SELF_HOST.md
2024-10-20 12:33:45 +05:30
y5n
7685853d8a
[Fix] fix SELF_HOST.md kubernetes cluster-install link
2024-09-07 13:50:47 +08:00
Tadashi Shigeoka
aa2cf686f4
[Docs] upgraded the path of the self-hosted documentation URL to /v1.
2024-09-06 21:41:31 +09:00
rafaelsideguide
7a61325500
map + search + scrape markdown bug
2024-08-16 17:57:11 -03:00
Quan Ming
fe179d0cb1
Update redis troubleshooting in self host guide
2024-08-10 12:39:22 +08:00
rafaelsideguide
c7a38a4ae2
Update SELF_HOST.md
2024-07-30 18:07:36 -03:00
rafaelsideguide
2d1ab43c27
Update SELF_HOST.md
2024-07-30 15:59:42 -03:00
Nicolas
a7aaa7e57e
Update SELF_HOST.md
2024-07-04 17:49:09 -03:00
Jeff Pereira
8d09c5f9b5
(Docs) Self Host added new ts playwright service instructions
2024-07-03 12:00:44 -07:00
Lakr
3d1766ba7b
Fix Broken Link
2024-06-19 20:38:42 +08:00
Jakob Stadlhuber
078d4c8d41
Add Kubernetes configuration for Firecrawl deployment
Added new files for setting up Firecrawl on a Kubernetes cluster. The files include Kubernetes manifests for deploying the API, worker, playwright service, and Redis, along with the associated ConfigMap and Secret resources. Also updated the self-host documentation to include instructions for Kubernetes deployment.
2024-06-04 20:52:08 +02:00
Nicolas
fae8954eeb
Update SELF_HOST.md
2024-05-17 18:46:59 -07:00
Nicolas
eb36d4b3bd
Update SELF_HOST.md
2024-05-15 13:25:39 -07:00
rafaelsideguide
4737fe8711
Added missing instruction
2024-05-13 13:47:49 -03:00
rafaelsideguide
18480b2005
Removed .env.example, improved docs and docker compose envs
2024-05-10 11:38:17 -03:00
chand1012
b32057ec89
Update SELF_HOST.md
2024-05-05 12:03:42 -04:00
Nicolas
30a8482a68
Nick:
2024-04-21 11:41:34 -07:00
Nicolas
93627ae87c
Nick:
2024-04-16 12:06:46 -04:00
Nicolas
a6c2a87811
Initial commit
2024-04-15 17:01:47 -04:00