Commit Graph

873 Commits

Author SHA1 Message Date
Gergo Moricz
8e0aa69603 fix(crawl-status): partial_data 2024-08-06 17:06:21 +02:00
Gergo Moricz
1ab119c874 fix(scrape): don't double-bill for scrape 2024-08-06 16:57:23 +02:00
Gergo Moricz
7c5cda7b45 fix(queue-worker): concurrency 2024-08-06 16:57:00 +02:00
Gergo Moricz
d7d63790e5 fix(crawl-status): isCancelled should be status failed 2024-08-06 16:35:55 +02:00
Gergo Moricz
03c84a9372 cleanup and fix cancelling 2024-08-06 16:26:46 +02:00
rafaelsideguide
4d24a99d50 fix params 2024-08-06 09:34:43 -03:00
Nicolas
e195ddbef4 Merge branch 'main' into nsc/hyper-v81 2024-08-05 20:47:39 -04:00
rafaelsideguide
3edc3a3d15 added fullpagescreenshot capabilities, wip on fire-engine side 2024-08-05 18:17:37 -03:00
rafaelsideguide
f32e8de156 fixes the empty excludes.filter undefined bug 2024-08-05 18:13:31 -03:00
tak-s
af9bc5c8bb Suppressed repetitive logs 2024-08-04 15:09:36 +09:00
Nicolas
1742e4ceae Nick: 2024-08-02 19:25:15 -04:00
Nicolas
39aecd974b Update redis-health.ts 2024-08-02 17:43:45 -04:00
Nicolas
b448e3c3ad Update website_params.ts 2024-08-02 14:26:35 -04:00
rafaelsideguide
4051630632 Update sitemap.ts 2024-08-02 11:32:48 -03:00
rafaelsideguide
8568b61015 bugfix for sitemaps 2024-08-02 11:03:01 -03:00
Nicolas
af68b7a785
Merge pull request #475 from mendableai/bugfix/issue-466
[Bug] pdfs and logging pdf events, also added trycatchs for docx
2024-08-01 22:05:26 -04:00
rafaelsideguide
f48ff36b32 added .inc files and forced lower case comparison 2024-07-31 09:28:43 -03:00
Nicolas
ad6f6eff4b Update fireEngine.ts 2024-07-30 19:15:54 -04:00
Nicolas
f9827b2151 Update credit_billing.ts 2024-07-30 19:13:17 -04:00
Nicolas
6d99dedd3c Nick: fixed tests 2024-07-30 19:11:01 -04:00
Nicolas
a28ecc1f61 Nick: caching 2024-07-30 18:59:35 -04:00
Nicolas
52198f2991 Nick: 2024-07-30 16:15:08 -04:00
Nicolas
f43d5e7895 Nick: scrape queue 2024-07-30 14:44:13 -04:00
Nicolas
7e002a8b06 Nick: bull mq 2024-07-30 13:27:23 -04:00
Nicolas
46bcbd931f Merge branch 'main' into feat/queue-scrapes 2024-07-30 12:44:07 -04:00
Nicolas
fd2452ec9c Update scrape.ts 2024-07-30 12:42:12 -04:00
rafaelsideguide
8f5174ffc7 Update auth.ts 2024-07-30 10:37:33 -03:00
rafaelsideguide
d25d7e7244 special case: developer.apple.com
2024-07-30 10:13:09 -03:00
Nicolas
c446942306 Nick: 2024-07-29 21:28:29 -04:00
Nicolas
5e8ffcf505 Update website_params.ts 2024-07-29 20:43:47 -04:00
Nicolas
7b813883ef Nick: first layer 2024-07-29 20:31:51 -04:00
Nicolas
e99c2568f4 Update auth.ts 2024-07-29 18:44:18 -04:00
Nicolas
968a2dc753 Nick: 2024-07-29 18:37:09 -04:00
Nicolas
04942bb9de Nick: 2024-07-29 18:31:43 -04:00
Nicolas
267d4681bf Merge branch 'main' of https://github.com/mendableai/firecrawl 2024-07-29 17:21:15 -04:00
Nicolas
b4833c1694 Nick: increasing default timeout to 45s 2024-07-29 17:21:11 -04:00
Nicolas
7fa08100bf
Merge pull request #414 from NiuBlibing/support_model_name
support custom models
2024-07-29 13:21:29 -04:00
rafaelsideguide
49e3e64787 bugfix for pdfs and logging pdf events, also added trycatchs for docx 2024-07-29 14:13:46 -03:00
Nicolas
4c9d62f6d3 Nick: fixing sitemap fallback 2024-07-26 18:25:44 -04:00
Nicolas
091924a636 Nick: moving machines from mia to virginia 2024-07-26 17:37:46 -04:00
Nicolas
cb97871ff9 Merge branch 'main' of https://github.com/mendableai/firecrawl 2024-07-26 17:21:11 -04:00
Nicolas
ff4266f09e Update pdfProcessor.ts 2024-07-26 17:21:09 -04:00
Nicolas
0c2e3a72cc
Merge pull request #460 from mendableai/nsc/admin-router
Admin router + Improve redis notifications
2024-07-26 12:16:14 -04:00
rafaelsideguide
96cec2a673 fix checking scrape log success content length 2024-07-26 12:00:52 -03:00
Nicolas
542270f4c2
Merge pull request #461 from mendableai/nsc/small-handle-for-client-side-errors
Client side error handling
2024-07-25 20:53:10 -04:00
Nicolas
dc6f825270 Update email_notification.ts 2024-07-25 20:43:50 -04:00
Nicolas
f82ca3be17 Nick: 2024-07-25 19:53:29 -04:00
Nicolas
01fab6e036 Update single_url.ts 2024-07-25 17:51:41 -04:00
Nicolas
56042d090c Update single_url.ts 2024-07-25 17:48:44 -04:00
Nicolas
88f5efce8f Merge branch 'feat/scrape-monitoring' 2024-07-25 17:44:21 -04:00
Nicolas
3242872503 Update single_url.ts 2024-07-25 17:43:55 -04:00
Nicolas
ffd430f198
Merge pull request #457 from JakobStadlhuber/Readiness-Liveness-Probes
Readiness liveness probes
2024-07-25 17:20:31 -04:00
Nicolas
7129d7993e Update v0.ts 2024-07-25 17:19:45 -04:00
Nicolas
10e80f00cf Merge branch 'main' into nsc/admin-router 2024-07-25 16:46:38 -04:00
Nicolas
e5b797549e Merge branch 'main' into feat/scrape-monitoring 2024-07-25 16:21:02 -04:00
Nicolas
50d2426fc4 Update scrape-events.ts 2024-07-25 16:20:29 -04:00
Nicolas
28a8a98491 Update admin.ts 2024-07-25 14:58:14 -04:00
Nicolas
2014d9dd2e Nick: admin router 2024-07-25 14:54:20 -04:00
rafaelsideguide
1f1c068eea changing from error to debug 2024-07-25 10:00:50 -03:00
rafaelsideguide
e720e1bacf Merge remote-tracking branch 'origin/main' into feat/logger 2024-07-25 09:49:27 -03:00
rafaelsideguide
309728a482 updated logs 2024-07-25 09:48:06 -03:00
Nicolas
2c1221750b
Merge pull request #449 from mendableai/bugfix/malformed-url-sitemap
Added regex for links in sitemap
2024-07-24 20:37:35 -04:00
Gergő Móricz
d1a3df6d08 fix: aaaaahhh 2024-07-25 00:50:03 +02:00
Gergő Móricz
6798695ee4 feat: move scraper to queue 2024-07-25 00:14:25 +02:00
Nicolas
92843a356d Merge branch 'main' of https://github.com/mendableai/firecrawl 2024-07-24 18:13:36 -04:00
Nicolas
1e13ddbe8e Nick: changes to the ui component 2024-07-24 18:13:34 -04:00
Gergő Móricz
623b547292 fix(fly.toml): scale up memory limit 2024-07-24 23:39:00 +02:00
Nicolas
15890772be Scale bump 2024-07-24 16:56:19 -04:00
rafaelsideguide
cc98f83fda added failed and completed log events 2024-07-24 15:25:36 -03:00
Jakob Stadlhuber
be9e7f9edf Update Kubernetes configs for playwright-service, api, and worker
Added new ConfigMap for playwright-service and adjusted existing references.
Applied imagePullPolicy: Always to ensure all images are updated promptly.
Updated README to include --no-cache for Docker build instructions.
2024-07-24 18:54:16 +02:00
Gergo Moricz
60c74357df feat(ScrapeEvents): log queue events 2024-07-24 18:44:14 +02:00
rafaelsideguide
4eca6bd301 fix/check-for-auth-on-scrape-log 2024-07-24 12:54:14 -03:00
Nicolas
3a1b8a9797 Update website_params.ts 2024-07-24 11:04:47 -04:00
Nicolas
8b48ec8d30 Update website_params.ts 2024-07-24 11:02:20 -04:00
Gergo Moricz
4d35ad073c feat(monitoring/scrape): include url, worker, response_size 2024-07-24 16:43:39 +02:00
Gergo Moricz
64bcedeefc fix(monitoring): bad success check on scrape 2024-07-24 16:21:59 +02:00
Gergo Moricz
d57dbbd0c6 fix: add jobId for scrape 2024-07-24 15:18:12 +02:00
Gergo Moricz
71072fef3b fix(scrape-events): bad logic 2024-07-24 14:46:41 +02:00
Gergo Moricz
7cd9bf92e3 feat: scrape event logging to DB 2024-07-24 14:31:25 +02:00
Rafael Miller
5e728c1a4d
Update apps/api/src/scraper/WebScraper/crawler.ts
no need for regex

Co-authored-by: Gergő Móricz <mo.geryy@gmail.com>
2024-07-24 08:33:00 -03:00
rafaelsideguide
6208ecdbc0 added logger 2024-07-23 17:30:46 -03:00
Nicolas
f0b07b509b Update index.ts 2024-07-23 15:15:56 -04:00
rafaelsideguide
a684bd3c5d added regex for links in sitemap 2024-07-23 09:07:23 -03:00
Nicolas
30e706b43f Update scrape.ts 2024-07-22 19:15:24 -04:00
Nicolas
8916fec66c Update index.ts 2024-07-22 19:14:53 -04:00
Nicolas
575ddc9e6e Update scrape.ts 2024-07-22 19:12:51 -04:00
Nicolas
e31a5007d5 Nick: speed improvements 2024-07-22 18:30:58 -04:00
Nicolas
b229fbebd8 Update scrape_log.ts 2024-07-19 12:53:26 -04:00
rafaelsideguide
5c02dbe20c fix(isFile): added .tiff extension 2024-07-18 17:07:21 -03:00
Gergo Moricz
f0e95ce399 fix(WebCrawler): filter out file URLs when taking URLs from sitemap 2024-07-18 21:49:37 +02:00
Gergo Moricz
95c6c63b85 fix(fly): raise heap limit to 4G per process 2024-07-18 20:56:54 +02:00
Nicolas
5f14f4f788 Update blocklist.ts 2024-07-18 14:20:19 -04:00
Nicolas
6161b83890 Update scrape_log.ts 2024-07-18 14:17:08 -04:00
Nicolas
2dd7398aad Update scrape_log.ts 2024-07-18 14:16:46 -04:00
Nicolas
f10f3f886b
Merge pull request #410 from mendableai/feat/fire-engine-chrome-cdp
Support chrome-cdp and restructure sitemap fire-engine support.
2024-07-18 13:52:08 -04:00
Nicolas
9a1a227797 Update crawl-cancel.ts 2024-07-18 13:49:51 -04:00
Nicolas
11768571ed Update crawl-cancel.ts 2024-07-18 13:43:03 -04:00
Nicolas
ce804d3c20 Update crawl-cancel.ts 2024-07-18 13:40:24 -04:00
Nicolas
d2de01d342 Nick: fixes 2024-07-18 13:19:44 -04:00
Gergo Moricz
0b8047c7a0 fix(WebScraper): infinite regex leading to fly.io instance hangs 2024-07-18 19:13:43 +02:00
Nicolas
f11137352c Merge branch 'main' into feat/fire-engine-chrome-cdp 2024-07-18 12:48:42 -04:00
Nicolas
01b5e8fc73
Merge pull request #429 from mendableai/mog/fix-job-stuck-2
Fix queue stuck bug via lock settings changes
2024-07-18 12:39:21 -04:00
Nicolas
b134ba92bc
Merge pull request #427 from mendableai/docs/update-docs
[Docs] Updating docs
2024-07-18 11:49:08 -04:00
rafaelsideguide
f13ef02a08 Update openapi.json 2024-07-18 10:34:03 -03:00
Nicolas
2fab2d8d29 Update scrape.ts 2024-07-17 20:44:34 -04:00
Nicolas
6609c1b6e5 Update .env.local 2024-07-17 16:22:27 -04:00
Nicolas
17a1f9b55f Update .env.example 2024-07-17 16:22:04 -04:00
rafaelsideguide
eda616d728 Merge remote-tracking branch 'origin/main' into docs/update-docs 2024-07-17 16:44:51 -03:00
rafaelsideguide
2b4ce12097 Update openapi.json 2024-07-17 16:43:22 -03:00
Gergo Moricz
8160c311c0 fix queue stuck bug via lock setting changes 2024-07-17 21:31:25 +02:00
Caleb Peffer
8d5ebc9b9f
Merge pull request #423 from mendableai/cjp/linksOnPage
Caleb: Return a list of links on a page by default
2024-07-17 12:36:07 -06:00
Caleb Peffer
5b24d26c84 Caleb; fixed test 2024-07-17 11:33:12 -07:00
Caleb Peffer
c5d1e7260d Caleb: made changes per Rafaels requests 2024-07-17 11:29:05 -07:00
rafaelsideguide
205cd63c2f Update openapi.json 2024-07-17 15:07:06 -03:00
Rafael Miller
f020048a46
Merge pull request #420 from mendableai/bugfix/empty-tags
Small fix for empty pageOptions
2024-07-17 10:10:24 -03:00
Caleb Peffer
da3c6bca37 Caleb: added a simple test 2024-07-16 21:23:22 -07:00
Caleb Peffer
0b3c0ede49 Added tests per @nicks request 2024-07-16 21:15:59 -07:00
Caleb Peffer
98c788ca7a Caleb: added a test to ensure links on page exists and isn't zero on mendable 2024-07-16 21:13:52 -07:00
Nicolas
3c3412e893 Update rate-limiter.test.ts 2024-07-16 22:45:12 -04:00
Nicolas
ffc3b7c5fb Update index.ts 2024-07-16 22:42:40 -04:00
Nicolas
c9073a747c Nick: 2024-07-16 22:41:13 -04:00
Caleb Peffer
d39d3be649 Caleb: now extracting and returning a list of all links on the page for a customer 2024-07-16 18:38:03 -07:00
rafaelsideguide
dba1fb2dc8 Update removeUnwantedElements.ts 2024-07-16 18:22:56 -03:00
Nicolas
92202de12b Update rate-limiter.ts 2024-07-16 10:09:49 -04:00
Nicolas
4ef47f7765 Update models.ts 2024-07-15 22:52:17 -04:00
rentianyue-jk
1b7ae5457f support custom models 2024-07-16 10:22:54 +08:00
Thomas Kosmas
5c65ec58e5 Support chrome-cdp and restructure sitemap fire-engine support. 2024-07-15 18:40:43 +03:00
Nicolas
949791049f Nick: 2024-07-12 23:20:26 -04:00
Nicolas
d0c8d3ecde Merge branch 'main' into nsc/sitemap-fix-fire-engine 2024-07-12 22:15:06 -04:00
Nicolas
a3b1703b68 Update fireEngine.ts 2024-07-12 22:15:00 -04:00
Nicolas
09bc2c7a9c
Merge pull request #394 from mendableai/nsc/small-fe-print
Log Fire-engine page errors
2024-07-12 22:14:04 -04:00
Nicolas
e098e88ea7 Nick: 2024-07-12 22:02:08 -04:00
Nicolas
bfc7f5882e Update index.ts 2024-07-12 19:57:12 -04:00
Nicolas
436e8922a7 Nick: doing on the ci instead 2024-07-12 19:49:38 -04:00
Nicolas
fc3328f3d1 Update index.ts 2024-07-12 19:12:56 -04:00
Nicolas
fd18f2269b Nick: slack alerts 2024-07-12 19:07:59 -04:00
rafaelsideguide
f453bcf17c bugfix docker self hosting 2024-07-12 16:51:20 -03:00
Nicolas
0ddaac6ae0 Nick: fixed the other instances as well 2024-07-12 15:39:10 -04:00
Nicolas
5da03a8fbd Update fireEngine.ts 2024-07-12 14:59:49 -04:00
Kuniaki Shimizu
bd986a453c fix USE_DB_AUTHENTICATION checks 2024-07-13 03:50:46 +09:00
Nicolas
b5b75086c1 Update index.ts 2024-07-12 10:44:14 -04:00
Gergo Moricz
0d3e09e798 fix: try-catch job removal 2024-07-12 16:35:50 +02:00
Gergő Móricz
69d724714f Merge branch 'main' into mog/job-stuck-fix 2024-07-12 16:33:34 +02:00
Nicolas
c3eecf7b9f Update index.ts 2024-07-12 10:22:06 -04:00
Gergo Moricz
10957b748b fix(bull): requeue jobs after restart 2024-07-12 13:55:53 +02:00
Nicolas
961b27811d
Merge pull request #386 from mendableai/feat/fire-engine-fallback-for-sitemap
[Feat] Added fire-engine fallback for getting sitemaps
2024-07-11 20:38:01 -04:00
Nicolas
84de63dbeb
Merge pull request #375 from StefanTerdell/self-host-qol
Self-hosting quality of life fixes
2024-07-11 20:37:39 -04:00
Nicolas
30c1118713
Merge pull request #326 from mendableai/feat/save-docs-on-supabase
[Feat] Added implementation for saving docs on supabase
2024-07-11 20:27:41 -04:00
Gergo Moricz
7e3a368684 fix: unpause globally 2024-07-12 00:05:35 +02:00
Gergo Moricz
ee1d41406e feat: unpause by http request 2024-07-11 23:56:36 +02:00
Gergo Moricz
f64a2d8668 fix: rename fly tomls to original 2024-07-11 23:21:02 +02:00
Gergo Moricz
bd84290b9e fix: reenable hyperdx 2024-07-11 23:20:51 +02:00
Gergo Moricz
09bca05b20 feat: fix iteration 3 (actually works) 2024-07-11 23:14:15 +02:00
Gergo Moricz
9cd7d79b64 feat: avoid double SIGINT crashing 2024-07-11 20:35:15 +02:00
Gergo Moricz
eaa8db4b19 fix(fly): raise kill timeout for graceful shutdown 2024-07-11 20:09:06 +02:00
Gergo Moricz
bffb9f8fd0 feat: stuck job restoration iteration 2 2024-07-11 20:08:21 +02:00
rafaelsideguide
86d0e88a91 removed hyperdx (they also have graceful shutdown) and tried to change the process for running on server. It didn't work. 2024-07-10 18:29:55 -03:00
rafaelsideguide
9ad06fdf56 added fire-engine fallback for getting sitemaps 2024-07-09 16:07:53 -03:00
Gergo Moricz
1a07e9d23b feat: pick up and commit interrupted jobs from/to DB 2024-07-09 15:57:38 +02:00
Gergo Moricz
77aa46588f feat: graceful exit handler 2024-07-09 14:29:32 +02:00
Stefan Terdell
188fe56203 Optional jobId webhook URL templating 2024-07-07 15:11:45 +02:00
Stefan Terdell
a2ae5f81d9 Only check Supabase if configured to 2024-07-07 15:06:31 +02:00
rafaelsideguide
c2bba54b4f Added veeva to special case params 2024-07-05 16:58:07 -03:00
rafaelsideguide
0ab6cef471 Merge remote-tracking branch 'origin/main' into dependabot/npm_and_yarn/apps/api/prod-deps-5b38a50718 2024-07-05 14:00:10 -03:00
Nicolas
914897c9d2 Merge branch 'main' into feat/save-docs-on-supabase 2024-07-05 12:27:22 -03:00
rafaelsideguide
538dc63035 Fixing rate-limiter-flexible package version
Redis version <3.0.2 throws TS bug:
https://github.com/animir/node-rate-limiter-flexible/issues/228
2024-07-05 12:12:00 -03:00
Nicolas
32849b017f Nick: 2024-07-03 20:18:11 -03:00
Nicolas
066d92f643 Update single_url.ts 2024-07-03 18:38:17 -03:00
Nicolas
f5b2fbd7e8 Nick: revision 2024-07-03 18:06:53 -03:00
Nicolas
2d30cc6117 Nick: comments 2024-07-03 18:01:54 -03:00
Nicolas
90c54c32fd Nick: refactor 2024-07-03 18:01:17 -03:00
Nicolas
90cf799a3c Update single_url.ts 2024-07-03 17:56:21 -03:00
Nicolas
b36406e465 Nick: log scrpaers 2024-07-03 17:28:53 -03:00
Eric Ciarla
2d0d5ac392 Update for llm-extraction-from-raw-html 2024-07-02 14:05:42 -04:00
rafaelsideguide
0175152577 Fixed PDF match custom scraping
Now it's working for both `https://getgc.ai/privacy` and `https://prairie.cards/products/wood-designs` usecases.
2024-07-02 11:25:17 -03:00
rafaelsideguide
96de948d6b Update index.test.ts 2024-07-02 11:04:09 -03:00
rafaelsideguide
7b7154ba1e bugfixed pageStatusCode 2024-07-02 10:51:35 -03:00
dependabot[bot]
c2e00d1998
apps/api(deps): bump the prod-deps group in /apps/api with 28 updates
Bumps the prod-deps group in /apps/api with 28 updates:

| Package | From | To |
| --- | --- | --- |
| [@anthropic-ai/sdk](https://github.com/anthropics/anthropic-sdk-typescript) | `0.20.9` | `0.24.3` |
| [@bull-board/api](https://github.com/felixmosh/bull-board/tree/HEAD/packages/api) | `5.19.2` | `5.20.5` |
| [@bull-board/express](https://github.com/felixmosh/bull-board/tree/HEAD/packages/express) | `5.19.2` | `5.20.5` |
| [@hyperdx/node-opentelemetry](https://github.com/hyperdxio/hyperdx-js) | `0.7.0` | `0.8.0` |
| [@nangohq/node](https://github.com/NangoHQ/nango/tree/HEAD/packages/node-client) | `0.36.101` | `0.40.8` |
| [@sentry/node](https://github.com/getsentry/sentry-javascript) | `7.116.0` | `8.13.0` |
| [@supabase/supabase-js](https://github.com/supabase/supabase-js) | `2.43.4` | `2.44.2` |
| [ajv](https://github.com/ajv-validator/ajv) | `8.15.0` | `8.16.0` |
| [async-mutex](https://github.com/DirtyHairy/async-mutex) | `0.4.1` | `0.5.0` |
| [bull](https://github.com/OptimalBits/bull) | `4.12.9` | `4.15.0` |
| [date-fns](https://github.com/date-fns/date-fns) | `2.30.0` | `3.6.0` |
| [express-rate-limit](https://github.com/express-rate-limit/express-rate-limit) | `6.11.2` | `7.3.1` |
| [glob](https://github.com/isaacs/node-glob) | `10.4.1` | `10.4.2` |
| [json-schema-to-zod](https://github.com/StefanTerdell/json-schema-to-zod) | `2.1.0` | `2.3.0` |
| [keyword-extractor](https://github.com/michaeldelorenzo/keyword-extractor) | `0.0.25` | `0.0.28` |
| [langchain](https://github.com/langchain-ai/langchainjs) | `0.1.37` | `0.2.8` |
| [logsnag](https://github.com/LogSnag/logsnag.js) | `0.1.8` | `1.0.0` |
| [mongoose](https://github.com/Automattic/mongoose) | `8.4.1` | `8.4.4` |
| [natural](https://github.com/NaturalNode/natural) | `6.12.0` | `7.0.7` |
| [openai](https://github.com/openai/openai-node) | `4.47.3` | `4.52.2` |
| [promptable](https://github.com/promptable/Promptable.js) | `0.0.9` | `0.0.10` |
| [puppeteer](https://github.com/puppeteer/puppeteer) | `22.10.0` | `22.12.1` |
| [rate-limiter-flexible](https://github.com/animir/node-rate-limiter-flexible) | `2.4.2` | `5.0.3` |
| [resend](https://github.com/resendlabs/resend-node) | `3.2.0` | `3.4.0` |
| [stripe](https://github.com/stripe/stripe-node) | `12.18.0` | `16.1.0` |
| [unstructured-client](https://github.com/Unstructured-IO/unstructured-js-client) | `0.9.4` | `0.11.3` |
| [uuid](https://github.com/uuidjs/uuid) | `9.0.1` | `10.0.0` |
| [zod-to-json-schema](https://github.com/StefanTerdell/zod-to-json-schema) | `3.23.0` | `3.23.1` |


Updates `@anthropic-ai/sdk` from 0.20.9 to 0.24.3
- [Release notes](https://github.com/anthropics/anthropic-sdk-typescript/releases)
- [Changelog](https://github.com/anthropics/anthropic-sdk-typescript/blob/main/CHANGELOG.md)
- [Commits](https://github.com/anthropics/anthropic-sdk-typescript/compare/sdk-v0.20.9...sdk-v0.24.3)

Updates `@bull-board/api` from 5.19.2 to 5.20.5
- [Release notes](https://github.com/felixmosh/bull-board/releases)
- [Changelog](https://github.com/felixmosh/bull-board/blob/master/CHANGELOG.md)
- [Commits](https://github.com/felixmosh/bull-board/commits/v5.20.5/packages/api)

Updates `@bull-board/express` from 5.19.2 to 5.20.5
- [Release notes](https://github.com/felixmosh/bull-board/releases)
- [Changelog](https://github.com/felixmosh/bull-board/blob/master/CHANGELOG.md)
- [Commits](https://github.com/felixmosh/bull-board/commits/v5.20.5/packages/express)

Updates `@hyperdx/node-opentelemetry` from 0.7.0 to 0.8.0
- [Release notes](https://github.com/hyperdxio/hyperdx-js/releases)
- [Commits](https://github.com/hyperdxio/hyperdx-js/compare/@hyperdx/node-opentelemetry@0.7.0...@hyperdx/node-opentelemetry@0.8.0)

Updates `@nangohq/node` from 0.36.101 to 0.40.8
- [Release notes](https://github.com/NangoHQ/nango/releases)
- [Changelog](https://github.com/NangoHQ/nango/blob/master/CHANGELOG.md)
- [Commits](https://github.com/NangoHQ/nango/commits/v0.40.8/packages/node-client)

Updates `@sentry/node` from 7.116.0 to 8.13.0
- [Release notes](https://github.com/getsentry/sentry-javascript/releases)
- [Changelog](https://github.com/getsentry/sentry-javascript/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/getsentry/sentry-javascript/compare/7.116.0...8.13.0)

Updates `@supabase/supabase-js` from 2.43.4 to 2.44.2
- [Release notes](https://github.com/supabase/supabase-js/releases)
- [Changelog](https://github.com/supabase/supabase-js/blob/master/RELEASE.md)
- [Commits](https://github.com/supabase/supabase-js/compare/v2.43.4...v2.44.2)

Updates `ajv` from 8.15.0 to 8.16.0
- [Release notes](https://github.com/ajv-validator/ajv/releases)
- [Commits](https://github.com/ajv-validator/ajv/compare/v8.15.0...v8.16.0)

Updates `async-mutex` from 0.4.1 to 0.5.0
- [Changelog](https://github.com/DirtyHairy/async-mutex/blob/master/CHANGELOG.md)
- [Commits](https://github.com/DirtyHairy/async-mutex/compare/v0.4.1...v0.5.0)

Updates `bull` from 4.12.9 to 4.15.0
- [Release notes](https://github.com/OptimalBits/bull/releases)
- [Changelog](https://github.com/OptimalBits/bull/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/OptimalBits/bull/compare/v4.12.9...v4.15.0)

Updates `date-fns` from 2.30.0 to 3.6.0
- [Release notes](https://github.com/date-fns/date-fns/releases)
- [Changelog](https://github.com/date-fns/date-fns/blob/main/CHANGELOG.md)
- [Commits](https://github.com/date-fns/date-fns/compare/v2.30.0...v3.6.0)

Updates `express-rate-limit` from 6.11.2 to 7.3.1
- [Release notes](https://github.com/express-rate-limit/express-rate-limit/releases)
- [Commits](https://github.com/express-rate-limit/express-rate-limit/compare/v6.11.2...v7.3.1)

Updates `glob` from 10.4.1 to 10.4.2
- [Changelog](https://github.com/isaacs/node-glob/blob/main/changelog.md)
- [Commits](https://github.com/isaacs/node-glob/compare/v10.4.1...v10.4.2)

Updates `json-schema-to-zod` from 2.1.0 to 2.3.0
- [Commits](https://github.com/StefanTerdell/json-schema-to-zod/commits)

Updates `keyword-extractor` from 0.0.25 to 0.0.28
- [Release notes](https://github.com/michaeldelorenzo/keyword-extractor/releases)
- [Commits](https://github.com/michaeldelorenzo/keyword-extractor/compare/0.0.25...0.0.28)

Updates `langchain` from 0.1.37 to 0.2.8
- [Release notes](https://github.com/langchain-ai/langchainjs/releases)
- [Changelog](https://github.com/langchain-ai/langchainjs/blob/main/release_workspace.js)
- [Commits](https://github.com/langchain-ai/langchainjs/compare/0.1.37...0.2.8)

Updates `logsnag` from 0.1.8 to 1.0.0
- [Commits](https://github.com/LogSnag/logsnag.js/compare/v0.1.8...v1.0.0)

Updates `mongoose` from 8.4.1 to 8.4.4
- [Release notes](https://github.com/Automattic/mongoose/releases)
- [Changelog](https://github.com/Automattic/mongoose/blob/master/CHANGELOG.md)
- [Commits](https://github.com/Automattic/mongoose/compare/8.4.1...8.4.4)

Updates `natural` from 6.12.0 to 7.0.7
- [Release notes](https://github.com/NaturalNode/natural/releases)
- [Commits](https://github.com/NaturalNode/natural/compare/v6.12.0...v7.0.7)

Updates `openai` from 4.47.3 to 4.52.2
- [Release notes](https://github.com/openai/openai-node/releases)
- [Changelog](https://github.com/openai/openai-node/blob/master/CHANGELOG.md)
- [Commits](https://github.com/openai/openai-node/compare/v4.47.3...v4.52.2)

Updates `promptable` from 0.0.9 to 0.0.10
- [Commits](https://github.com/promptable/Promptable.js/commits)

Updates `puppeteer` from 22.10.0 to 22.12.1
- [Release notes](https://github.com/puppeteer/puppeteer/releases)
- [Changelog](https://github.com/puppeteer/puppeteer/blob/main/release-please-config.json)
- [Commits](https://github.com/puppeteer/puppeteer/compare/puppeteer-v22.10.0...puppeteer-v22.12.1)

Updates `rate-limiter-flexible` from 2.4.2 to 5.0.3
- [Release notes](https://github.com/animir/node-rate-limiter-flexible/releases)
- [Commits](https://github.com/animir/node-rate-limiter-flexible/commits/v5.0.3)

Updates `resend` from 3.2.0 to 3.4.0
- [Release notes](https://github.com/resendlabs/resend-node/releases)
- [Commits](https://github.com/resendlabs/resend-node/compare/v3.2.0...v3.4.0)

Updates `stripe` from 12.18.0 to 16.1.0
- [Release notes](https://github.com/stripe/stripe-node/releases)
- [Changelog](https://github.com/stripe/stripe-node/blob/master/CHANGELOG.md)
- [Commits](https://github.com/stripe/stripe-node/compare/v12.18.0...v16.1.0)

Updates `unstructured-client` from 0.9.4 to 0.11.3
- [Release notes](https://github.com/Unstructured-IO/unstructured-js-client/releases)
- [Changelog](https://github.com/Unstructured-IO/unstructured-js-client/blob/main/RELEASES.md)
- [Commits](https://github.com/Unstructured-IO/unstructured-js-client/compare/v0.9.4...v0.11.3)

Updates `uuid` from 9.0.1 to 10.0.0
- [Changelog](https://github.com/uuidjs/uuid/blob/main/CHANGELOG.md)
- [Commits](https://github.com/uuidjs/uuid/compare/v9.0.1...v10.0.0)

Updates `zod-to-json-schema` from 3.23.0 to 3.23.1
- [Release notes](https://github.com/StefanTerdell/zod-to-json-schema/releases)
- [Changelog](https://github.com/StefanTerdell/zod-to-json-schema/blob/master/changelog.md)
- [Commits](https://github.com/StefanTerdell/zod-to-json-schema/commits)

---
updated-dependencies:
- dependency-name: "@anthropic-ai/sdk"
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: "@bull-board/api"
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: "@bull-board/express"
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: "@hyperdx/node-opentelemetry"
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: "@nangohq/node"
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: "@sentry/node"
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: prod-deps
- dependency-name: "@supabase/supabase-js"
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: ajv
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: async-mutex
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: bull
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: date-fns
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: prod-deps
- dependency-name: express-rate-limit
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: prod-deps
- dependency-name: glob
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: prod-deps
- dependency-name: json-schema-to-zod
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: keyword-extractor
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: prod-deps
- dependency-name: langchain
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: logsnag
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: prod-deps
- dependency-name: mongoose
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: prod-deps
- dependency-name: natural
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: prod-deps
- dependency-name: openai
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: promptable
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: prod-deps
- dependency-name: puppeteer
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: rate-limiter-flexible
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: prod-deps
- dependency-name: resend
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: stripe
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: prod-deps
- dependency-name: unstructured-client
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: uuid
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: prod-deps
- dependency-name: zod-to-json-schema
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: prod-deps
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-07-02 12:52:43 +00:00
Rafael Miller
f0f449fe51
Merge pull request #336 from snippet/allow-external-content-links
[Proposal] new feature allowExternalContentLinks
2024-07-02 09:45:21 -03:00
rafaelsideguide
db4a743365 Added e2e test 2024-07-02 09:44:08 -03:00
Nicolas
42cd58a679
Merge pull request #332 from mendableai/feat/rawHtmlExtraction
Adds pageOptions.includeRawHtml and new extraction mode "llm-extraction-from-raw-html"
2024-07-01 18:23:26 -03:00
Nicolas
c4f423981f Update pnpm-lock.yaml 2024-07-01 18:22:22 -03:00
rafaelsideguide
16aac7f8c5 Update single_url.ts 2024-07-01 18:21:15 -03:00
Nicolas
6d0c7a9ccd
Merge pull request #323 from mendableai/tests/crawl-limit-unit-tests
[Tests] Added crawl limit unit test
2024-07-01 17:56:04 -03:00
rafaelsideguide
4d6e25619b minor spacing and comment stuff 2024-07-01 16:05:34 -03:00
Eric Ciarla
e1af815f8c Update scrape.ts 2024-07-01 08:48:21 -04:00
Eric Ciarla
7ae195bacc Update index.test.ts 2024-06-29 10:13:12 -04:00
Eric Ciarla
837b446390 Update index.test.ts 2024-06-29 08:48:42 -04:00
Eric Ciarla
fe6e3aeadc Update index.test.ts 2024-06-29 08:44:21 -04:00
Eric Ciarla
6c9f0dfc91 Add tests 2024-06-29 08:32:20 -04:00
Jeff Pereira
a5fb45988c new feature allowExternalContentLinks 2024-06-28 17:23:40 -07:00
Eric Ciarla
87b54488d3 update to includeRawHtml 2024-06-28 17:07:47 -04:00
Eric Ciarla
70fcf2ce03 init 2024-06-28 16:39:09 -04:00
Nicolas
9bf74bc774 Update single_url.ts 2024-06-28 15:51:18 -03:00
Nicolas
7e17498bcf Update single_url.ts 2024-06-28 15:45:16 -03:00
rafaelsideguide
d66e1f7846 looking good 2024-06-27 16:00:45 -03:00
Nicolas
9e7298945c Update openapi.json 2024-06-26 21:25:38 -03:00
Nicolas
1ec0bf8adf Update openapi.json 2024-06-26 21:22:46 -03:00
Nicolas
042f81ddf2 Update removeUnwantedElements.test.ts 2024-06-26 21:20:11 -03:00
Nicolas
388ce3cbce Nick: small changes 2024-06-26 21:15:42 -03:00
Nicolas
1d4907acc9 Nick: 2024-06-26 21:02:58 -03:00
rafaelsideguide
c40da77be0 Added implementation for saving docs on supabase
- TODO: remove the comments on `log_job.ts` before deploying to prod
2024-06-26 18:23:28 -03:00
Nicolas
3b92fb8433
Merge pull request #322 from mendableai/tests/metadata
[Test] Added E2E tests for checking metadata values
2024-06-26 12:09:18 -03:00
rafaelsideguide
67d7650cf3 Added to e2e_noAuth 2024-06-26 12:07:55 -03:00
rafaelsideguide
009df6c930 Added crawl limit unit test
I think this test is over relying on mocks but I have no idea on how to fix this without changing the code arch structure
2024-06-26 09:54:25 -03:00
rafaelsideguide
05eaa3c68d Update index.test.ts 2024-06-26 09:32:02 -03:00
rafaelsideguide
4381109dd8 added default values and fixed pdf bug 2024-06-26 09:00:54 -03:00
Nicolas
45f2765601
Merge pull request #316 from snippet/types-webscraper
add some types
2024-06-25 22:03:21 -03:00
Nicolas
768a131b5c
Merge pull request #318 from mendableai/bug/fix-custom-scrape-pdf-google-drive
[Bug] Fixed the regex test for google drive pdf files
2024-06-25 18:27:11 -03:00
rafaelsideguide
5f69fc7677 Fixed the regex test 2024-06-25 18:24:01 -03:00
rafaelsideguide
d02829d335 fixed clean jobs 2024-06-25 17:49:29 -03:00
Jeff Pereira
199cbe8bcb add some types 2024-06-25 12:20:25 -07:00
Nicolas
749b0c05dc Merge branch 'main' of https://github.com/mendableai/firecrawl 2024-06-25 15:21:15 -03:00
Nicolas
e7be17db92 Nick: metadata fixes and lock duration for bull decreased to 2 hrs 2024-06-25 15:21:14 -03:00
Nicolas
f84fb4b331
Merge pull request #313 from snippet/google-search-term-fix
fix multi-word search term issue: /search (w/o Serp)
2024-06-24 19:24:58 -03:00
Jeff Pereira
6ddf3a58a1 fix multi-word search term issue: /search (w/o Serp) 2024-06-24 14:21:52 -07:00
Nicolas
90b7fff366
Update crawler.ts 2024-06-24 16:52:01 -03:00
Nicolas
08c1fa799b
Update queue-worker.ts 2024-06-24 16:51:32 -03:00
rafaelsideguide
3ebdf93342 removed console.logs 2024-06-24 16:43:12 -03:00
Nicolas
56d42d9c9b Nick: 2024-06-24 16:33:07 -03:00
rafaelsideguide
21d29de819 testing crawl with new.abb.com case
many unnecessary console.logs for tracing the code execution
2024-06-24 16:25:07 -03:00
Nicolas
3c7b7e7242 Nick: fixes fallback 2024-06-23 18:59:08 -03:00
Caleb Peffer
e59ba758f5 Caleb: changed posthog logging so that It associates jobs with a group. No 2024-06-18 17:42:21 -07:00
Caleb Peffer
5a91d8425f Caleb: solve for typechecking on idempotencyKey on my machine 2024-06-18 17:07:38 -07:00
rafaelsideguide
9c539e9113 Fixed includeHTML to use cleanedHtml as response 2024-06-18 16:26:54 -03:00
Rafael Miller
f5a9acc4c6
Merge branch 'main' into feat/removeTags-regex 2024-06-18 14:39:59 -03:00
rafaelsideguide
9f7afd1e88 fix for some complex cases 2024-06-18 14:36:51 -03:00
Nicolas
d0c05accf6 Nick: 2024-06-18 13:21:50 -04:00
Nicolas
818751a256
Merge pull request #294 from mendableai/tests/e2e-to-unit
[Test] Transcribed from e2e to unit tests for many cases
2024-06-18 13:09:22 -04:00
rafaelsideguide
727e5de8c5 Update index.test.ts 2024-06-18 11:54:10 -03:00
rafaelsideguide
c54e797eb1 (╯°□°)╯︵ ┻━┻ 2024-06-18 11:51:28 -03:00
rafaelsideguide
20f14bcf7f Added some types 2024-06-18 10:55:07 -03:00
rafaelsideguide
c2fc69af1c removed some e2e tests that are making the ci get stuck 2024-06-18 09:57:05 -03:00
rafaelsideguide
6c726a02eb Moved to utils/removeUnwantedElements, added unit tests 2024-06-18 09:46:42 -03:00
AndyMik90
8b3c3aae91 Added support for RegEx in removeTags 2024-06-18 07:31:46 +02:00
rafaelsideguide
b2bd562bb2 transcribed from e2e to unit tests for many cases 2024-06-17 17:09:44 -03:00
Nicolas
ab038051e9 Merge branch 'main' into nsc/rate-limiter-tests 2024-06-17 15:06:12 -04:00
Eric Ciarla
519ab1aecb Update unit tests 2024-06-15 17:14:09 -04:00
Eric Ciarla
f0d4146b42 Merge branch 'feat/maxDepthRelative' of https://github.com/mendableai/firecrawl into feat/maxDepthRelative 2024-06-15 16:52:00 -04:00
Eric Ciarla
ff7b52cab1 Delete one more e2e test 2024-06-15 16:51:50 -04:00
Eric Ciarla
b1eb608295
Merge branch 'main' into feat/maxDepthRelative 2024-06-15 16:50:27 -04:00
Eric Ciarla
34e37c5671 Add unit tests to replace e2e 2024-06-15 16:43:37 -04:00
Eric Ciarla
2b40729cc2 Update index.test.ts 2024-06-15 08:56:32 -04:00
Eric Ciarla
f22759b2e7 Update index.test.ts 2024-06-14 19:42:11 -04:00
Eric Ciarla
a6b7197737 Fix for maxDepth 2024-06-14 19:40:37 -04:00
Nicolas
4ec863718b
Merge pull request #283 from mendableai/nsc/crawler-fixes
Fixes crawler getting confused with base paths that contain www.
2024-06-14 13:50:32 -07:00
Nicolas
43767360d8 Merge branch 'main' into nsc/rate-limiter-tests 2024-06-14 13:50:21 -07:00
Nicolas
e88cb314c8 Update crawler.ts 2024-06-14 13:44:54 -07:00
Rafael Miller
361cba4119
Merge pull request #175 from mendableai/test/load-testing
Test/load testing
2024-06-14 17:39:01 -03:00
Nicolas
7b11ace87d Create rate-limiter.test.ts 2024-06-14 12:31:42 -07:00