Commit Graph

1666 Commits

Author SHA1 Message Date
Nicolas
79e65f31ef Update v1.ts
2024-10-17 17:57:44 -03:00
Gergő Móricz
03b37998fd feat: bulk scrape 2024-10-17 19:40:18 +02:00
Nicolas
081d7407b3
Merge pull request #788 from mendableai/nsc/log-extractpr-options
Extractor options logging v1 fix
2024-10-16 23:51:22 -03:00
Nicolas
06b8d24a4c Update scrape.ts 2024-10-16 23:50:21 -03:00
Nicolas
a73b06589c
Merge pull request #785 from mendableai/nsc/support-for-all-metadata
Return all the website metadata
2024-10-16 23:37:26 -03:00
Nicolas
2ac50a16f5 Update metadata.ts 2024-10-16 23:37:07 -03:00
Nicolas
8974230db4 Nick: formatting + error handling 2024-10-16 23:35:03 -03:00
Nicolas
c0384ea381 Nick: added tests 2024-10-16 23:32:44 -03:00
Nicolas
417c7697c3 Update metadata.ts 2024-10-16 23:26:46 -03:00
Nicolas
ff906f7750 Update excludeTags.ts
2024-10-16 13:40:34 -03:00
Nicolas
2c1a98f019 Update excludeTags.ts 2024-10-16 13:37:40 -03:00
Nicolas
cf8fe93281 Update credit_billing.ts
2024-10-16 01:09:57 -03:00
Nicolas
e5a5ca2446 Update credit_billing.ts 2024-10-16 01:06:10 -03:00
Nicolas
027158fa44 Nick: 2024-10-15 21:47:27 -03:00
Nicolas
795e5a9228 Update metadata.ts 2024-10-15 21:36:13 -03:00
Nicolas
b4f6a0f919 Nick: geolocation 2024-10-15 21:12:33 -03:00
Nicolas
54a54b9f33 Nick: admin init 2024-10-15 17:28:28 -03:00
rafaelsideguide
4afcd16e02 performance improv for ws 2024-10-15 10:12:27 -03:00
rafaelsideguide
3afaab13d9 feat/improv-crawl-status-filters 2024-10-14 18:14:00 -03:00
rafaelsideguide
180801225b fix/check files on crawl 2024-10-14 15:44:45 -03:00
Nicolas
e40036caf7 Merge branch 'main' of https://github.com/mendableai/firecrawl
2024-10-14 12:24:45 -03:00
Nicolas
c3a9630e33 Reapply "Merge pull request #773 from mendableai/nsc/retries-acuc-price-credits-fallback"
This reverts commit a6888ce17b.
2024-10-14 12:24:34 -03:00
rafaelsideguide
2bf7b433e2 fixed file blocking process 2024-10-14 12:18:26 -03:00
rafaelsideguide
a6888ce17b Revert "Merge pull request #773 from mendableai/nsc/retries-acuc-price-credits-fallback"
This reverts commit ba9ad1ef7f, reversing
changes made to 666082a7dd.
2024-10-14 10:32:09 -03:00
Nicolas
821c62c575 Update credit_billing.ts 2024-10-13 22:30:11 -03:00
Nicolas
78b6127d88 Nick: retries for acuc 2024-10-13 22:27:38 -03:00
Nicolas
666082a7dd Nick: bump python patch to 1.3.1 2024-10-13 14:03:19 -03:00
Nicolas
ec238a8349 Update firecrawl.py 2024-10-13 14:01:25 -03:00
Nicolas
03287821c2 Update index.ts
2024-10-12 19:49:37 -03:00
Nicolas
d3856371c9 Update index.ts
2024-10-12 19:36:49 -03:00
Nicolas
af06b42cb2 Update fireEngine.ts 2024-10-12 18:18:38 -03:00
Nicolas
35b15f1ee6 Update fireEngine.ts 2024-10-12 17:59:50 -03:00
Nicolas
961b1010cf Nick: rm the cache for map for 24hrs 2024-10-12 17:48:37 -03:00
Nicolas
5ab52854b9
Merge pull request #757 from busaud/patch-1
Update README.md
2024-10-11 15:50:06 -03:00
Nicolas
5f16688bd4
Merge pull request #766 from mendableai/doc/issue-764
[Doc] Better explained how includePaths and excludePaths work
2024-10-11 15:48:19 -03:00
rafaelsideguide
2d3d7c827a fix/added unkwown status to job filter 2024-10-11 15:40:29 -03:00
Rafael Miller
ca51521625
Merge pull request #761 from mendableai/fix/filter-status-unknown-jobs
[BUG] filters failed and unknown jobs now
2024-10-11 15:36:16 -03:00
Nicolas
0bff5b1a24 Update auth.ts 2024-10-11 15:29:25 -03:00
Nicolas
257a951132 Update auth.ts 2024-10-11 14:21:04 -03:00
rafaelsideguide
e916ea7e1a updated openapi.json 2024-10-11 13:55:15 -03:00
rafaelsideguide
e57a8e9d45 better explain how includePaths and excludePaths work 2024-10-11 13:52:18 -03:00
rafaelsideguide
c1f98d0371 fixed developer.notion special case 2024-10-11 10:54:59 -03:00
rafaelsideguide
8cbd94ed2d fix/filters failed and unknown jobs now 2024-10-11 09:45:51 -03:00
Nicolas
bfed65d443 Update package.json 2024-10-10 17:46:49 -03:00
Nicolas
2cde877342 Nick: version bump 2024-10-10 17:44:27 -03:00
Nicolas
4960b2b0c2 Merge branch 'main' into feat-sdks/cancel-crawl 2024-10-10 17:43:20 -03:00
rafaelsideguide
2689ffa748 feat-sdk/cancel-crawl 2024-10-10 17:08:08 -03:00
rafaelsideguide
68a4c2e402 Fixed missing error handling in JS-SDK 2024-10-10 16:29:35 -03:00
rafaelsideguide
f113222829 fix: removing test teams concurrency limit
2024-10-10 09:46:25 -03:00
busaud
0934dd88d3
Update README.md
I believe wait_until_done was removed as of v1?
2024-10-10 09:35:12 +03:00
Nicolas
d410804348
Merge pull request #755 from busaud/main
bugfix: self-host crawling doesnt respect limit
2024-10-09 22:56:44 -03:00
Nicolas
abb5ec7439 Update playwright.ts 2024-10-09 22:55:01 -03:00
Nicolas
f6ec45f046
Merge pull request #747 from Harsh0707005/timeout-parameter-not-passed
Fixed Issue #734
2024-10-09 22:53:26 -03:00
Nicolas
222a34cae8 Update playwright.ts 2024-10-09 22:53:03 -03:00
busaud
c6ebbc6f6a bugfix: self-host crawling doesnt respect limit 2024-10-09 22:52:49 +00:00
Nicolas
52ec43aac3 Update index.ts 2024-10-09 19:42:25 -03:00
Nicolas
5ff6c64d77 Update index.ts 2024-10-09 19:30:14 -03:00
Gergő Móricz
17d0ed061e push 2024-10-09 23:13:26 +02:00
Gergő Móricz
b2ae1a52d5 fix(Dockerfile): remove chromium 2024-10-09 23:13:13 +02:00
Gergő Móricz
2d365ebc6d fix(redis): protected mode off 2024-10-09 22:09:08 +02:00
busaud
237442fabb Make sure the entrypoint script has the correct line endings 2024-10-09 20:58:37 +02:00
rafaelsideguide
ae464ada60 tests: teamIds
2024-10-09 15:06:29 -03:00
Nicolas
1cd49a0a95 Merge branch 'main' of https://github.com/mendableai/firecrawl 2024-10-09 14:41:25 -03:00
Nicolas
064ce482c2 Update blocklist.ts 2024-10-09 14:41:23 -03:00
rafaelsideguide
4020a7d781 test: added test suite tokens
2024-10-08 15:11:08 -03:00
Gergő Móricz
075b63b57b feat(redis): add memory calcualtion when not running on fly 2024-10-08 17:03:37 +02:00
Harsh Master
aa3d4b8d6c Fixed Issue #734 2024-10-08 11:36:12 +05:30
Nicolas
5c0c952a27 Update website_params.ts
2024-10-07 14:51:05 -03:00
Nicolas
1f1afeaac4 Update system-monitor.ts 2024-10-04 15:15:04 -03:00
Nicolas
dba96998e3 Update fetch.ts 2024-10-03 18:56:51 -03:00
Nicolas
668ff3c71b Update fetch.ts 2024-10-03 18:55:39 -03:00
Nicolas
25dd16bf2a Nick: removed 401 2024-10-03 18:52:17 -03:00
Nicolas
93657f6a44 Update queue-worker.ts 2024-10-03 18:44:40 -03:00
Thomas Kosmas
28b64fc704 Change the gracefull shutdown signal 2024-10-04 00:40:09 +03:00
Nicolas
497ac3328b
Merge pull request #732 from mendableai/fix/url-validation-params
[BUG] Fixed URLs with params
2024-10-03 17:43:37 -03:00
rafaelsideguide
cfd776a5de fix: now urls with params are passing validation
example: https://www.granitecreek.com?asljhda=akjshd
2024-10-03 17:37:04 -03:00
Nicolas
99ca852e5d
Merge pull request #731 from mendableai/nsc/crawl-fixes
Fixes crawl failed and webhooks not working properly
2024-10-03 17:37:03 -03:00
Nicolas
85e9f7b9b9
Merge pull request #727 from mendableai/nsc/error-js-sdk-improv
Improves error handler in Node SDK to return the status code
2024-10-03 17:36:31 -03:00
Nicolas
4f7608821f Update package.json 2024-10-03 17:36:20 -03:00
Nicolas
f743f2b922 Update index.ts 2024-10-03 17:34:29 -03:00
Nicolas
c6a29efbed Update crawl-status.ts 2024-10-03 17:33:38 -03:00
Nicolas
ddd774ed68 Nick: 2024-10-03 17:20:57 -03:00
Nicolas
82551bb6bc Update index.test.ts 2024-10-03 17:13:30 -03:00
Nicolas
49bd95327e Update types.ts 2024-10-03 17:00:33 -03:00
Nicolas
1a1ac9fd60 Nick: 2024-10-03 16:37:58 -03:00
Nicolas
a150aa820c Nick: shouldnt fallback on a 400 + error code should be correct on page status code
2024-10-03 15:21:42 -03:00
Nicolas
489a643391 Update index.ts 2024-10-02 20:25:52 -03:00
Gergő Móricz
26771e2e71 debug(zod): log unsupported protocol errors
2024-10-01 22:13:28 +02:00
Nicolas
d1b838322d
Merge pull request #721 from mendableai/feat/concurrency-limit
Concurrency limits
2024-10-01 16:15:05 -03:00
Nicolas
ac5e1fc194 Update sitemap.ts 2024-10-01 16:14:43 -03:00
Nicolas
c6717fecaa Nick: got rid of job interval sleep and math.min 2024-10-01 16:11:12 -03:00
Nicolas
18f9cd09e1 Nick: fixed more stuff 2024-10-01 16:04:39 -03:00
Gergő Móricz
fe721fffbe fix(crawl-redis): normalize URL before locking 2024-10-01 20:59:50 +02:00
Nicolas
c0541cc990 Update queue-worker.ts 2024-10-01 15:38:24 -03:00
Nicolas
37299fc035 Update types.ts 2024-10-01 15:18:11 -03:00
Nicolas
8aa07afb6d Nick: fixes 2024-10-01 15:15:49 -03:00
Nicolas
92dbd33e57 Update queue-worker.ts 2024-10-01 14:53:26 -03:00
Nicolas
4d5477f357 Nick: resolved conflicts 2024-10-01 14:39:57 -03:00
Nicolas
96245e387d Update crawl.ts 2024-10-01 14:29:53 -03:00
Nicolas
258c67ce67 Revert "feat(queue-worker): always crawl links from content even if sitemapped"
This reverts commit 3c045c43a4.
2024-10-01 14:20:23 -03:00
Nicolas
445fc432e9 Reapply "fix(v1/crawl): always use sitemap"
This reverts commit 339b19ce9d.
2024-10-01 14:03:07 -03:00
Nicolas
339b19ce9d Revert "fix(v1/crawl): always use sitemap"
This reverts commit 5dc0fcf644.
2024-10-01 13:59:49 -03:00
Gergő Móricz
5dc0fcf644 fix(v1/crawl): always use sitemap 2024-10-01 18:49:44 +02:00
Gergő Móricz
3c045c43a4 feat(queue-worker): always crawl links from content even if sitemapped 2024-10-01 18:32:53 +02:00
Nicolas
1af26fe1b4 Nick: sitemap fix 2024-10-01 12:38:48 -03:00
Nicolas
ff4b7a835b
Merge pull request #685 from devflowinc/main
bugfix: using onlyIncludeTags and removeTags together
2024-09-30 17:18:30 -03:00
Nicolas
986262e1d4 Update search.ts 2024-09-30 15:23:43 -03:00
Gergő Móricz
0dd06d33ef fix(v0/search): pass job priority 2024-09-30 19:20:24 +02:00
Gergő Móricz
20ffdbd15c hotfix 2024-09-30 19:17:52 +02:00
Gergő Móricz
a8df85fd9b fix(acuc): remove sentry capture 2024-09-30 19:10:24 +02:00
Gergő Móricz
3621e191bd feat(concurrency-limit): set limit based on plan 2024-09-28 00:19:54 +02:00
Gergő Móricz
c6a83ab92c fix(api): entrypoint
2024-09-27 22:16:27 +02:00
Gergő Móricz
e44bdf7a54 bad dockerfile 2024-09-27 21:07:11 +02:00
Gergő Móricz
f0a1a2e45b fix: increase ulimit -n in docker 2024-09-27 20:44:52 +02:00
Gergő Móricz
d5e2a80e4a fix(crawl-status): keep 10 megabyte pages if they're the only thing in the output
2024-09-27 20:41:41 +02:00
Nicolas
975f0575b4 Nick: max retries with axios-retry 2024-09-27 12:58:57 -04:00
Nicolas
92961cf74f Merge branch 'main' of https://github.com/mendableai/firecrawl 2024-09-27 12:23:45 -04:00
Nicolas
1fdff87b3e Update single_url.ts 2024-09-27 12:23:44 -04:00
Gergő Móricz
6283e8fc47 fix(logger): set default level to trace 2024-09-27 17:46:43 +02:00
Gergő Móricz
5e8ef4954e feat(auth): log cache key in acuc update error 2024-09-27 17:13:10 +02:00
Gergő Móricz
e98f858eb6 fix(api): playground scrape errors
2024-09-26 22:28:14 +02:00
Nicolas
8d44cb33bb Nick: fixed error message 2024-09-26 22:15:15 +02:00
Gergő Móricz
2cb493321a fix(ACUC): do not refresh cache every set 2024-09-26 22:15:15 +02:00
Gergő Móricz
9bdd344b36 fix(redlock): use redlock.using for stability 2024-09-26 22:15:15 +02:00
Gergő Móricz
250c3bb5c6 fix(auth): move redlock settings 2024-09-26 22:15:15 +02:00
Gergő Móricz
81245e68fa fix(auth/redlock): retry cached ACUC lock for 20 seconds 2024-09-26 22:15:15 +02:00
Gergő Móricz
0f89f5e7cb fix(billTeam): cache update race condition 2024-09-26 22:15:15 +02:00
Gergő Móricz
d13a97f979 fix(credit_billing): allow spending of exact credits 2024-09-26 22:15:15 +02:00
Gergő Móricz
84bff8add8 fix(billTeam): update cached ACUC after billing 2024-09-26 22:15:15 +02:00
Gergő Móricz
f22ab5ffaf feat(db): implement bill_team RPC 2024-09-26 22:15:15 +02:00
Gergő Móricz
c1f68c3e0a fix(credit_billing): return chunk.remaining_credits 2024-09-26 22:15:15 +02:00
Gergő Móricz
2073063fb7 fix(db): fix caching and rpc error 2024-09-26 22:15:15 +02:00
Gergő Móricz
f8c70fe5dd feat(db): implement auth_credit_usage_chunk RPC 2024-09-26 22:15:15 +02:00
Gergő Móricz
29815e084b feat(v1/Document): add warning field 2024-09-26 21:19:05 +02:00
Gergő Móricz
095babe70b fix(queue-jobs): jobs with concurrency fails may vanish 2024-09-26 21:18:56 +02:00
Gergő Móricz
b696bfc854 fix(crawl-status): avoid race conditions where crawl may be deemed failed 2024-09-26 21:00:27 +02:00
Gergő Móricz
dec4171937 fix(queue-worker, queue-jobs): logic fixes 2024-09-26 20:39:19 +02:00
Gergő Móricz
d2881927c1 fix(queue-worker): remove concurrency entries when done in sentry-less branch 2024-09-26 20:29:17 +02:00
Gergő Móricz
53fce67ca1 feat(queue-worker): PoC of concurrency limits 2024-09-26 20:24:34 +02:00
Nicolas
30058b1da0 Nick: increased timeout for chrome-cdp due to smart wait 2024-09-26 20:24:34 +02:00
Nicolas
a9773a24a3 Nick: increased timeout for chrome-cdp due to smart wait
2024-09-25 19:27:02 -04:00
Gergő Móricz
953d4fb197 fix(redlock): use redlock.using for stability 2024-09-25 22:47:42 +02:00
Gergő Móricz
eef116bef8 fix(auth): move redlock settings 2024-09-25 22:27:51 +02:00
Gergő Móricz
2c96d2eef6 fix(auth/redlock): retry cached ACUC lock for 20 seconds 2024-09-25 22:25:13 +02:00
Gergő Móricz
1cca9b8ae6 fix(billTeam): cache update race condition 2024-09-25 22:15:02 +02:00
Gergő Móricz
eb7317c08a fix(credit_billing): allow spending of exact credits 2024-09-25 21:44:05 +02:00
Gergő Móricz
e67cbc2ca1 fix(billTeam): update cached ACUC after billing 2024-09-25 21:37:01 +02:00
Gergő Móricz
5a8eb17a82 feat(db): implement bill_team RPC 2024-09-25 20:57:45 +02:00
Gergő Móricz
415fd9f333 fix(credit_billing): return chunk.remaining_credits 2024-09-25 20:37:35 +02:00
Gergő Móricz
417adf8e96 fix(db): fix caching and rpc error 2024-09-25 19:42:45 +02:00