rafaelsideguide
49e3e64787
bugfix for PDFs and logging PDF events; also added try/catch blocks for docx
2024-07-29 14:13:46 -03:00
Nicolas
4c9d62f6d3
Nick: fixing sitemap fallback
2024-07-26 18:25:44 -04:00
Nicolas
091924a636
Nick: moving machines from mia to virginia
2024-07-26 17:37:46 -04:00
Nicolas
cb97871ff9
Merge branch 'main' of https://github.com/mendableai/firecrawl
2024-07-26 17:21:11 -04:00
Nicolas
ff4266f09e
Update pdfProcessor.ts
2024-07-26 17:21:09 -04:00
Nicolas
0c2e3a72cc
Merge pull request #460 from mendableai/nsc/admin-router
...
Admin router + Improve redis notifications
2024-07-26 12:16:14 -04:00
rafaelsideguide
96cec2a673
fix checking scrape log success content length
2024-07-26 12:00:52 -03:00
Nicolas
542270f4c2
Merge pull request #461 from mendableai/nsc/small-handle-for-client-side-errors
...
Client side error handling
2024-07-25 20:53:10 -04:00
Nicolas
dc6f825270
Update email_notification.ts
2024-07-25 20:43:50 -04:00
Nicolas
f82ca3be17
Nick:
2024-07-25 19:53:29 -04:00
Nicolas
01fab6e036
Update single_url.ts
2024-07-25 17:51:41 -04:00
Nicolas
56042d090c
Update single_url.ts
2024-07-25 17:48:44 -04:00
Nicolas
88f5efce8f
Merge branch 'feat/scrape-monitoring'
2024-07-25 17:44:21 -04:00
Nicolas
3242872503
Update single_url.ts
2024-07-25 17:43:55 -04:00
Nicolas
ffd430f198
Merge pull request #457 from JakobStadlhuber/Readiness-Liveness-Probes
...
Readiness liveness probes
2024-07-25 17:20:31 -04:00
Nicolas
7129d7993e
Update v0.ts
2024-07-25 17:19:45 -04:00
Nicolas
10e80f00cf
Merge branch 'main' into nsc/admin-router
2024-07-25 16:46:38 -04:00
Nicolas
e5b797549e
Merge branch 'main' into feat/scrape-monitoring
2024-07-25 16:21:02 -04:00
Nicolas
50d2426fc4
Update scrape-events.ts
2024-07-25 16:20:29 -04:00
Nicolas
28a8a98491
Update admin.ts
2024-07-25 14:58:14 -04:00
Nicolas
2014d9dd2e
Nick: admin router
2024-07-25 14:54:20 -04:00
rafaelsideguide
1f1c068eea
changing from error to debug
2024-07-25 10:00:50 -03:00
rafaelsideguide
e720e1bacf
Merge remote-tracking branch 'origin/main' into feat/logger
2024-07-25 09:49:27 -03:00
rafaelsideguide
309728a482
updated logs
2024-07-25 09:48:06 -03:00
Nicolas
2c1221750b
Merge pull request #449 from mendableai/bugfix/malformed-url-sitemap
...
Added regex for links in sitemap
2024-07-24 20:37:35 -04:00
Gergő Móricz
d1a3df6d08
fix: aaaaahhh
2024-07-25 00:50:03 +02:00
Gergő Móricz
6798695ee4
feat: move scraper to queue
2024-07-25 00:14:25 +02:00
Nicolas
92843a356d
Merge branch 'main' of https://github.com/mendableai/firecrawl
2024-07-24 18:13:36 -04:00
Nicolas
1e13ddbe8e
Nick: changes to the ui component
2024-07-24 18:13:34 -04:00
Gergő Móricz
623b547292
fix(fly.toml): scale up memory limit
2024-07-24 23:39:00 +02:00
Nicolas
15890772be
Scale bump
2024-07-24 16:56:19 -04:00
rafaelsideguide
cc98f83fda
added failed and completed log events
2024-07-24 15:25:36 -03:00
Jakob Stadlhuber
be9e7f9edf
Update Kubernetes configs for playwright-service, api, and worker
...
Added new ConfigMap for playwright-service and adjusted existing references.
Applied imagePullPolicy: Always to ensure all images are updated promptly.
Updated README to include --no-cache for Docker build instructions.
2024-07-24 18:54:16 +02:00
Gergo Moricz
60c74357df
feat(ScrapeEvents): log queue events
2024-07-24 18:44:14 +02:00
rafaelsideguide
4eca6bd301
fix/check-for-auth-on-scrape-log
2024-07-24 12:54:14 -03:00
Nicolas
3a1b8a9797
Update website_params.ts
2024-07-24 11:04:47 -04:00
Nicolas
8b48ec8d30
Update website_params.ts
2024-07-24 11:02:20 -04:00
Gergo Moricz
4d35ad073c
feat(monitoring/scrape): include url, worker, response_size
2024-07-24 16:43:39 +02:00
Gergo Moricz
64bcedeefc
fix(monitoring): bad success check on scrape
2024-07-24 16:21:59 +02:00
Gergo Moricz
d57dbbd0c6
fix: add jobId for scrape
2024-07-24 15:18:12 +02:00
Gergo Moricz
71072fef3b
fix(scrape-events): bad logic
2024-07-24 14:46:41 +02:00
Gergo Moricz
7cd9bf92e3
feat: scrape event logging to DB
2024-07-24 14:31:25 +02:00
Rafael Miller
5e728c1a4d
Update apps/api/src/scraper/WebScraper/crawler.ts
...
no need for regex
Co-authored-by: Gergő Móricz <mo.geryy@gmail.com>
2024-07-24 08:33:00 -03:00
rafaelsideguide
6208ecdbc0
added logger
2024-07-23 17:30:46 -03:00
Nicolas
f0b07b509b
Update index.ts
2024-07-23 15:15:56 -04:00
rafaelsideguide
a684bd3c5d
added regex for links in sitemap
2024-07-23 09:07:23 -03:00
Nicolas
30e706b43f
Update scrape.ts
2024-07-22 19:15:24 -04:00
Nicolas
8916fec66c
Update index.ts
2024-07-22 19:14:53 -04:00
Nicolas
575ddc9e6e
Update scrape.ts
2024-07-22 19:12:51 -04:00
Nicolas
e31a5007d5
Nick: speed improvements
2024-07-22 18:30:58 -04:00
Nicolas
b229fbebd8
Update scrape_log.ts
2024-07-19 12:53:26 -04:00
rafaelsideguide
5c02dbe20c
fix(isFile): added .tiff extension
2024-07-18 17:07:21 -03:00
Gergo Moricz
f0e95ce399
fix(WebCrawler): filter out file URLs when taking URLs from sitemap
2024-07-18 21:49:37 +02:00
Gergo Moricz
95c6c63b85
fix(fly): raise heap limit to 4G per process
2024-07-18 20:56:54 +02:00
Nicolas
5f14f4f788
Update blocklist.ts
2024-07-18 14:20:19 -04:00
Nicolas
6161b83890
Update scrape_log.ts
2024-07-18 14:17:08 -04:00
Nicolas
2dd7398aad
Update scrape_log.ts
2024-07-18 14:16:46 -04:00
Nicolas
f10f3f886b
Merge pull request #410 from mendableai/feat/fire-engine-chrome-cdp
...
Support chrome-cdp and restructure sitemap fire-engine support.
2024-07-18 13:52:08 -04:00
Nicolas
9a1a227797
Update crawl-cancel.ts
2024-07-18 13:49:51 -04:00
Nicolas
11768571ed
Update crawl-cancel.ts
2024-07-18 13:43:03 -04:00
Nicolas
ce804d3c20
Update crawl-cancel.ts
2024-07-18 13:40:24 -04:00
Nicolas
d2de01d342
Nick: fixes
2024-07-18 13:19:44 -04:00
Gergo Moricz
0b8047c7a0
fix(WebScraper): infinite regex leading to fly.io instance hangs
2024-07-18 19:13:43 +02:00
Nicolas
f11137352c
Merge branch 'main' into feat/fire-engine-chrome-cdp
2024-07-18 12:48:42 -04:00
Nicolas
01b5e8fc73
Merge pull request #429 from mendableai/mog/fix-job-stuck-2
...
Fix queue stuck bug via lock settings changes
2024-07-18 12:39:21 -04:00
Nicolas
b134ba92bc
Merge pull request #427 from mendableai/docs/update-docs
...
[Docs] Updating docs
2024-07-18 11:49:08 -04:00
rafaelsideguide
f13ef02a08
Update openapi.json
2024-07-18 10:34:03 -03:00
Nicolas
2fab2d8d29
Update scrape.ts
2024-07-17 20:44:34 -04:00
Nicolas
6609c1b6e5
Update .env.local
2024-07-17 16:22:27 -04:00
Nicolas
17a1f9b55f
Update .env.example
2024-07-17 16:22:04 -04:00
rafaelsideguide
eda616d728
Merge remote-tracking branch 'origin/main' into docs/update-docs
2024-07-17 16:44:51 -03:00
rafaelsideguide
2b4ce12097
Update openapi.json
2024-07-17 16:43:22 -03:00
Gergo Moricz
8160c311c0
fix queue stuck bug via lock setting changes
2024-07-17 21:31:25 +02:00
Caleb Peffer
8d5ebc9b9f
Merge pull request #423 from mendableai/cjp/linksOnPage
...
Caleb: Return a list of links on a page by default
2024-07-17 12:36:07 -06:00
Caleb Peffer
5b24d26c84
Caleb: fixed test
2024-07-17 11:33:12 -07:00
Caleb Peffer
c5d1e7260d
Caleb: made changes per Rafael's requests
2024-07-17 11:29:05 -07:00
rafaelsideguide
205cd63c2f
Update openapi.json
2024-07-17 15:07:06 -03:00
Rafael Miller
f020048a46
Merge pull request #420 from mendableai/bugfix/empty-tags
...
Small fix for empty pageOptions
2024-07-17 10:10:24 -03:00
Caleb Peffer
da3c6bca37
Caleb: added a simple test
2024-07-16 21:23:22 -07:00
Caleb Peffer
0b3c0ede49
Added tests per @nicks request
2024-07-16 21:15:59 -07:00
Caleb Peffer
98c788ca7a
Caleb: added a test to ensure links on page exists and isn't zero on mendable
2024-07-16 21:13:52 -07:00
Nicolas
3c3412e893
Update rate-limiter.test.ts
2024-07-16 22:45:12 -04:00
Nicolas
ffc3b7c5fb
Update index.ts
2024-07-16 22:42:40 -04:00
Nicolas
c9073a747c
Nick:
2024-07-16 22:41:13 -04:00
Caleb Peffer
d39d3be649
Caleb: now extracting and returning a list of all links on the page for a customer
2024-07-16 18:38:03 -07:00
rafaelsideguide
dba1fb2dc8
Update removeUnwantedElements.ts
2024-07-16 18:22:56 -03:00
Nicolas
92202de12b
Update rate-limiter.ts
2024-07-16 10:09:49 -04:00
Nicolas
4ef47f7765
Update models.ts
2024-07-15 22:52:17 -04:00
rentianyue-jk
1b7ae5457f
support custom models
2024-07-16 10:22:54 +08:00
Thomas Kosmas
5c65ec58e5
Support chrome-cdp and restructure sitemap fire-engine support.
2024-07-15 18:40:43 +03:00
Nicolas
949791049f
Nick:
2024-07-12 23:20:26 -04:00
Nicolas
d0c8d3ecde
Merge branch 'main' into nsc/sitemap-fix-fire-engine
2024-07-12 22:15:06 -04:00
Nicolas
a3b1703b68
Update fireEngine.ts
2024-07-12 22:15:00 -04:00
Nicolas
09bc2c7a9c
Merge pull request #394 from mendableai/nsc/small-fe-print
...
Log Fire-engine page errors
2024-07-12 22:14:04 -04:00
Nicolas
e098e88ea7
Nick:
2024-07-12 22:02:08 -04:00
Nicolas
bfc7f5882e
Update index.ts
2024-07-12 19:57:12 -04:00
Nicolas
436e8922a7
Nick: doing on the ci instead
2024-07-12 19:49:38 -04:00
Nicolas
fc3328f3d1
Update index.ts
2024-07-12 19:12:56 -04:00
Nicolas
fd18f2269b
Nick: slack alerts
2024-07-12 19:07:59 -04:00
rafaelsideguide
f453bcf17c
bugfix docker self hosting
2024-07-12 16:51:20 -03:00
Nicolas
0ddaac6ae0
Nick: fixed the other instances as well
2024-07-12 15:39:10 -04:00
Nicolas
5da03a8fbd
Update fireEngine.ts
2024-07-12 14:59:49 -04:00
Kuniaki Shimizu
bd986a453c
fix USE_DB_AUTHENTICATION checks
2024-07-13 03:50:46 +09:00
Nicolas
b5b75086c1
Update index.ts
2024-07-12 10:44:14 -04:00
Gergo Moricz
0d3e09e798
fix: try-catch job removal
2024-07-12 16:35:50 +02:00
Gergő Móricz
69d724714f
Merge branch 'main' into mog/job-stuck-fix
2024-07-12 16:33:34 +02:00
Nicolas
c3eecf7b9f
Update index.ts
2024-07-12 10:22:06 -04:00
Gergo Moricz
10957b748b
fix(bull): requeue jobs after restart
2024-07-12 13:55:53 +02:00
Nicolas
961b27811d
Merge pull request #386 from mendableai/feat/fire-engine-fallback-for-sitemap
...
[Feat] Added fire-engine fallback for getting sitemaps
2024-07-11 20:38:01 -04:00
Nicolas
84de63dbeb
Merge pull request #375 from StefanTerdell/self-host-qol
...
Self-hosting quality of life fixes
2024-07-11 20:37:39 -04:00
Nicolas
30c1118713
Merge pull request #326 from mendableai/feat/save-docs-on-supabase
...
[Feat] Added implementation for saving docs on supabase
2024-07-11 20:27:41 -04:00
Gergo Moricz
7e3a368684
fix: unpause globally
2024-07-12 00:05:35 +02:00
Gergo Moricz
ee1d41406e
feat: unpause by http request
2024-07-11 23:56:36 +02:00
Gergo Moricz
f64a2d8668
fix: rename fly tomls to original
2024-07-11 23:21:02 +02:00
Gergo Moricz
bd84290b9e
fix: reenable hyperdx
2024-07-11 23:20:51 +02:00
Gergo Moricz
09bca05b20
feat: fix iteration 3 (actually works)
2024-07-11 23:14:15 +02:00
Gergo Moricz
9cd7d79b64
feat: avoid double SIGINT crashing
2024-07-11 20:35:15 +02:00
Gergo Moricz
eaa8db4b19
fix(fly): raise kill timeout for graceful shutdown
2024-07-11 20:09:06 +02:00
Gergo Moricz
bffb9f8fd0
feat: stuck job restoration iteration 2
2024-07-11 20:08:21 +02:00
rafaelsideguide
86d0e88a91
removed hyperdx (they also have graceful shutdown) and tried to change the process for running on server. It didn't work.
2024-07-10 18:29:55 -03:00
rafaelsideguide
9ad06fdf56
added fire-engine fallback for getting sitemaps
2024-07-09 16:07:53 -03:00
Gergo Moricz
1a07e9d23b
feat: pick up and commit interrupted jobs from/to DB
2024-07-09 15:57:38 +02:00
Gergo Moricz
77aa46588f
feat: graceful exit handler
2024-07-09 14:29:32 +02:00
Stefan Terdell
188fe56203
Optional jobId webhook URL templating
2024-07-07 15:11:45 +02:00
Stefan Terdell
a2ae5f81d9
Only check Supabase if configured to
2024-07-07 15:06:31 +02:00
rafaelsideguide
c2bba54b4f
Added veeva to special case params
2024-07-05 16:58:07 -03:00
rafaelsideguide
0ab6cef471
Merge remote-tracking branch 'origin/main' into dependabot/npm_and_yarn/apps/api/prod-deps-5b38a50718
2024-07-05 14:00:10 -03:00
Nicolas
914897c9d2
Merge branch 'main' into feat/save-docs-on-supabase
2024-07-05 12:27:22 -03:00
rafaelsideguide
538dc63035
Fixing rate-limiter-flexible package version
...
Redis version <3.0.2 throws TS bug:
https://github.com/animir/node-rate-limiter-flexible/issues/228
2024-07-05 12:12:00 -03:00
Nicolas
32849b017f
Nick:
2024-07-03 20:18:11 -03:00
Nicolas
066d92f643
Update single_url.ts
2024-07-03 18:38:17 -03:00
Nicolas
f5b2fbd7e8
Nick: revision
2024-07-03 18:06:53 -03:00
Nicolas
2d30cc6117
Nick: comments
2024-07-03 18:01:54 -03:00
Nicolas
90c54c32fd
Nick: refactor
2024-07-03 18:01:17 -03:00
Nicolas
90cf799a3c
Update single_url.ts
2024-07-03 17:56:21 -03:00
Nicolas
b36406e465
Nick: log scrapers
2024-07-03 17:28:53 -03:00
Eric Ciarla
2d0d5ac392
Update for llm-extraction-from-raw-html
2024-07-02 14:05:42 -04:00
rafaelsideguide
0175152577
Fixed PDF match custom scraping
...
Now it's working for both the `https://getgc.ai/privacy` and `https://prairie.cards/products/wood-designs` use cases.
2024-07-02 11:25:17 -03:00
rafaelsideguide
96de948d6b
Update index.test.ts
2024-07-02 11:04:09 -03:00
rafaelsideguide
7b7154ba1e
bugfixed pageStatusCode
2024-07-02 10:51:35 -03:00
dependabot[bot]
c2e00d1998
apps/api(deps): bump the prod-deps group in /apps/api with 28 updates
...
Bumps the prod-deps group in /apps/api with 28 updates:
| Package | From | To |
| --- | --- | --- |
| [@anthropic-ai/sdk](https://github.com/anthropics/anthropic-sdk-typescript) | `0.20.9` | `0.24.3` |
| [@bull-board/api](https://github.com/felixmosh/bull-board/tree/HEAD/packages/api) | `5.19.2` | `5.20.5` |
| [@bull-board/express](https://github.com/felixmosh/bull-board/tree/HEAD/packages/express) | `5.19.2` | `5.20.5` |
| [@hyperdx/node-opentelemetry](https://github.com/hyperdxio/hyperdx-js) | `0.7.0` | `0.8.0` |
| [@nangohq/node](https://github.com/NangoHQ/nango/tree/HEAD/packages/node-client) | `0.36.101` | `0.40.8` |
| [@sentry/node](https://github.com/getsentry/sentry-javascript) | `7.116.0` | `8.13.0` |
| [@supabase/supabase-js](https://github.com/supabase/supabase-js) | `2.43.4` | `2.44.2` |
| [ajv](https://github.com/ajv-validator/ajv) | `8.15.0` | `8.16.0` |
| [async-mutex](https://github.com/DirtyHairy/async-mutex) | `0.4.1` | `0.5.0` |
| [bull](https://github.com/OptimalBits/bull) | `4.12.9` | `4.15.0` |
| [date-fns](https://github.com/date-fns/date-fns) | `2.30.0` | `3.6.0` |
| [express-rate-limit](https://github.com/express-rate-limit/express-rate-limit) | `6.11.2` | `7.3.1` |
| [glob](https://github.com/isaacs/node-glob) | `10.4.1` | `10.4.2` |
| [json-schema-to-zod](https://github.com/StefanTerdell/json-schema-to-zod) | `2.1.0` | `2.3.0` |
| [keyword-extractor](https://github.com/michaeldelorenzo/keyword-extractor) | `0.0.25` | `0.0.28` |
| [langchain](https://github.com/langchain-ai/langchainjs) | `0.1.37` | `0.2.8` |
| [logsnag](https://github.com/LogSnag/logsnag.js) | `0.1.8` | `1.0.0` |
| [mongoose](https://github.com/Automattic/mongoose) | `8.4.1` | `8.4.4` |
| [natural](https://github.com/NaturalNode/natural) | `6.12.0` | `7.0.7` |
| [openai](https://github.com/openai/openai-node) | `4.47.3` | `4.52.2` |
| [promptable](https://github.com/promptable/Promptable.js) | `0.0.9` | `0.0.10` |
| [puppeteer](https://github.com/puppeteer/puppeteer) | `22.10.0` | `22.12.1` |
| [rate-limiter-flexible](https://github.com/animir/node-rate-limiter-flexible) | `2.4.2` | `5.0.3` |
| [resend](https://github.com/resendlabs/resend-node) | `3.2.0` | `3.4.0` |
| [stripe](https://github.com/stripe/stripe-node) | `12.18.0` | `16.1.0` |
| [unstructured-client](https://github.com/Unstructured-IO/unstructured-js-client) | `0.9.4` | `0.11.3` |
| [uuid](https://github.com/uuidjs/uuid) | `9.0.1` | `10.0.0` |
| [zod-to-json-schema](https://github.com/StefanTerdell/zod-to-json-schema) | `3.23.0` | `3.23.1` |
Updates `@anthropic-ai/sdk` from 0.20.9 to 0.24.3
- [Release notes](https://github.com/anthropics/anthropic-sdk-typescript/releases)
- [Changelog](https://github.com/anthropics/anthropic-sdk-typescript/blob/main/CHANGELOG.md)
- [Commits](https://github.com/anthropics/anthropic-sdk-typescript/compare/sdk-v0.20.9...sdk-v0.24.3)
Updates `@bull-board/api` from 5.19.2 to 5.20.5
- [Release notes](https://github.com/felixmosh/bull-board/releases)
- [Changelog](https://github.com/felixmosh/bull-board/blob/master/CHANGELOG.md)
- [Commits](https://github.com/felixmosh/bull-board/commits/v5.20.5/packages/api)
Updates `@bull-board/express` from 5.19.2 to 5.20.5
- [Release notes](https://github.com/felixmosh/bull-board/releases)
- [Changelog](https://github.com/felixmosh/bull-board/blob/master/CHANGELOG.md)
- [Commits](https://github.com/felixmosh/bull-board/commits/v5.20.5/packages/express)
Updates `@hyperdx/node-opentelemetry` from 0.7.0 to 0.8.0
- [Release notes](https://github.com/hyperdxio/hyperdx-js/releases)
- [Commits](https://github.com/hyperdxio/hyperdx-js/compare/@hyperdx/node-opentelemetry@0.7.0...@hyperdx/node-opentelemetry@0.8.0)
Updates `@nangohq/node` from 0.36.101 to 0.40.8
- [Release notes](https://github.com/NangoHQ/nango/releases)
- [Changelog](https://github.com/NangoHQ/nango/blob/master/CHANGELOG.md)
- [Commits](https://github.com/NangoHQ/nango/commits/v0.40.8/packages/node-client)
Updates `@sentry/node` from 7.116.0 to 8.13.0
- [Release notes](https://github.com/getsentry/sentry-javascript/releases)
- [Changelog](https://github.com/getsentry/sentry-javascript/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/getsentry/sentry-javascript/compare/7.116.0...8.13.0)
Updates `@supabase/supabase-js` from 2.43.4 to 2.44.2
- [Release notes](https://github.com/supabase/supabase-js/releases)
- [Changelog](https://github.com/supabase/supabase-js/blob/master/RELEASE.md)
- [Commits](https://github.com/supabase/supabase-js/compare/v2.43.4...v2.44.2)
Updates `ajv` from 8.15.0 to 8.16.0
- [Release notes](https://github.com/ajv-validator/ajv/releases)
- [Commits](https://github.com/ajv-validator/ajv/compare/v8.15.0...v8.16.0)
Updates `async-mutex` from 0.4.1 to 0.5.0
- [Changelog](https://github.com/DirtyHairy/async-mutex/blob/master/CHANGELOG.md)
- [Commits](https://github.com/DirtyHairy/async-mutex/compare/v0.4.1...v0.5.0)
Updates `bull` from 4.12.9 to 4.15.0
- [Release notes](https://github.com/OptimalBits/bull/releases)
- [Changelog](https://github.com/OptimalBits/bull/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/OptimalBits/bull/compare/v4.12.9...v4.15.0)
Updates `date-fns` from 2.30.0 to 3.6.0
- [Release notes](https://github.com/date-fns/date-fns/releases)
- [Changelog](https://github.com/date-fns/date-fns/blob/main/CHANGELOG.md)
- [Commits](https://github.com/date-fns/date-fns/compare/v2.30.0...v3.6.0)
Updates `express-rate-limit` from 6.11.2 to 7.3.1
- [Release notes](https://github.com/express-rate-limit/express-rate-limit/releases)
- [Commits](https://github.com/express-rate-limit/express-rate-limit/compare/v6.11.2...v7.3.1)
Updates `glob` from 10.4.1 to 10.4.2
- [Changelog](https://github.com/isaacs/node-glob/blob/main/changelog.md)
- [Commits](https://github.com/isaacs/node-glob/compare/v10.4.1...v10.4.2)
Updates `json-schema-to-zod` from 2.1.0 to 2.3.0
- [Commits](https://github.com/StefanTerdell/json-schema-to-zod/commits)
Updates `keyword-extractor` from 0.0.25 to 0.0.28
- [Release notes](https://github.com/michaeldelorenzo/keyword-extractor/releases)
- [Commits](https://github.com/michaeldelorenzo/keyword-extractor/compare/0.0.25...0.0.28)
Updates `langchain` from 0.1.37 to 0.2.8
- [Release notes](https://github.com/langchain-ai/langchainjs/releases)
- [Changelog](https://github.com/langchain-ai/langchainjs/blob/main/release_workspace.js)
- [Commits](https://github.com/langchain-ai/langchainjs/compare/0.1.37...0.2.8)
Updates `logsnag` from 0.1.8 to 1.0.0
- [Commits](https://github.com/LogSnag/logsnag.js/compare/v0.1.8...v1.0.0)
Updates `mongoose` from 8.4.1 to 8.4.4
- [Release notes](https://github.com/Automattic/mongoose/releases)
- [Changelog](https://github.com/Automattic/mongoose/blob/master/CHANGELOG.md)
- [Commits](https://github.com/Automattic/mongoose/compare/8.4.1...8.4.4)
Updates `natural` from 6.12.0 to 7.0.7
- [Release notes](https://github.com/NaturalNode/natural/releases)
- [Commits](https://github.com/NaturalNode/natural/compare/v6.12.0...v7.0.7)
Updates `openai` from 4.47.3 to 4.52.2
- [Release notes](https://github.com/openai/openai-node/releases)
- [Changelog](https://github.com/openai/openai-node/blob/master/CHANGELOG.md)
- [Commits](https://github.com/openai/openai-node/compare/v4.47.3...v4.52.2)
Updates `promptable` from 0.0.9 to 0.0.10
- [Commits](https://github.com/promptable/Promptable.js/commits)
Updates `puppeteer` from 22.10.0 to 22.12.1
- [Release notes](https://github.com/puppeteer/puppeteer/releases)
- [Changelog](https://github.com/puppeteer/puppeteer/blob/main/release-please-config.json)
- [Commits](https://github.com/puppeteer/puppeteer/compare/puppeteer-v22.10.0...puppeteer-v22.12.1)
Updates `rate-limiter-flexible` from 2.4.2 to 5.0.3
- [Release notes](https://github.com/animir/node-rate-limiter-flexible/releases)
- [Commits](https://github.com/animir/node-rate-limiter-flexible/commits/v5.0.3)
Updates `resend` from 3.2.0 to 3.4.0
- [Release notes](https://github.com/resendlabs/resend-node/releases)
- [Commits](https://github.com/resendlabs/resend-node/compare/v3.2.0...v3.4.0)
Updates `stripe` from 12.18.0 to 16.1.0
- [Release notes](https://github.com/stripe/stripe-node/releases)
- [Changelog](https://github.com/stripe/stripe-node/blob/master/CHANGELOG.md)
- [Commits](https://github.com/stripe/stripe-node/compare/v12.18.0...v16.1.0)
Updates `unstructured-client` from 0.9.4 to 0.11.3
- [Release notes](https://github.com/Unstructured-IO/unstructured-js-client/releases)
- [Changelog](https://github.com/Unstructured-IO/unstructured-js-client/blob/main/RELEASES.md)
- [Commits](https://github.com/Unstructured-IO/unstructured-js-client/compare/v0.9.4...v0.11.3)
Updates `uuid` from 9.0.1 to 10.0.0
- [Changelog](https://github.com/uuidjs/uuid/blob/main/CHANGELOG.md)
- [Commits](https://github.com/uuidjs/uuid/compare/v9.0.1...v10.0.0)
Updates `zod-to-json-schema` from 3.23.0 to 3.23.1
- [Release notes](https://github.com/StefanTerdell/zod-to-json-schema/releases)
- [Changelog](https://github.com/StefanTerdell/zod-to-json-schema/blob/master/changelog.md)
- [Commits](https://github.com/StefanTerdell/zod-to-json-schema/commits)
---
updated-dependencies:
- dependency-name: "@anthropic-ai/sdk"
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: "@bull-board/api"
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: "@bull-board/express"
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: "@hyperdx/node-opentelemetry"
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: "@nangohq/node"
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: "@sentry/node"
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: prod-deps
- dependency-name: "@supabase/supabase-js"
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: ajv
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: async-mutex
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: bull
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: date-fns
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: prod-deps
- dependency-name: express-rate-limit
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: prod-deps
- dependency-name: glob
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: prod-deps
- dependency-name: json-schema-to-zod
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: keyword-extractor
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: prod-deps
- dependency-name: langchain
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: logsnag
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: prod-deps
- dependency-name: mongoose
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: prod-deps
- dependency-name: natural
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: prod-deps
- dependency-name: openai
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: promptable
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: prod-deps
- dependency-name: puppeteer
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: rate-limiter-flexible
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: prod-deps
- dependency-name: resend
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: stripe
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: prod-deps
- dependency-name: unstructured-client
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: uuid
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: prod-deps
- dependency-name: zod-to-json-schema
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: prod-deps
...
Signed-off-by: dependabot[bot] <support@github.com>
2024-07-02 12:52:43 +00:00
Rafael Miller
f0f449fe51
Merge pull request #336 from snippet/allow-external-content-links
...
[Proposal] new feature allowExternalContentLinks
2024-07-02 09:45:21 -03:00
rafaelsideguide
db4a743365
Added e2e test
2024-07-02 09:44:08 -03:00
Nicolas
42cd58a679
Merge pull request #332 from mendableai/feat/rawHtmlExtraction
...
Adds pageOptions.includeRawHtml and new extraction mode "llm-extraction-from-raw-html"
2024-07-01 18:23:26 -03:00
Nicolas
c4f423981f
Update pnpm-lock.yaml
2024-07-01 18:22:22 -03:00
rafaelsideguide
16aac7f8c5
Update single_url.ts
2024-07-01 18:21:15 -03:00
Nicolas
6d0c7a9ccd
Merge pull request #323 from mendableai/tests/crawl-limit-unit-tests
...
[Tests] Added crawl limit unit test
2024-07-01 17:56:04 -03:00
rafaelsideguide
4d6e25619b
minor spacing and comment stuff
2024-07-01 16:05:34 -03:00
Eric Ciarla
e1af815f8c
Update scrape.ts
2024-07-01 08:48:21 -04:00
Eric Ciarla
7ae195bacc
Update index.test.ts
2024-06-29 10:13:12 -04:00
Eric Ciarla
837b446390
Update index.test.ts
2024-06-29 08:48:42 -04:00
Eric Ciarla
fe6e3aeadc
Update index.test.ts
2024-06-29 08:44:21 -04:00
Eric Ciarla
6c9f0dfc91
Add tests
2024-06-29 08:32:20 -04:00
Jeff Pereira
a5fb45988c
new feature allowExternalContentLinks
2024-06-28 17:23:40 -07:00
Eric Ciarla
87b54488d3
update to includeRawHtml
2024-06-28 17:07:47 -04:00
Eric Ciarla
70fcf2ce03
init
2024-06-28 16:39:09 -04:00
Nicolas
9bf74bc774
Update single_url.ts
2024-06-28 15:51:18 -03:00
Nicolas
7e17498bcf
Update single_url.ts
2024-06-28 15:45:16 -03:00
rafaelsideguide
d66e1f7846
looking good
2024-06-27 16:00:45 -03:00
Nicolas
9e7298945c
Update openapi.json
2024-06-26 21:25:38 -03:00
Nicolas
1ec0bf8adf
Update openapi.json
2024-06-26 21:22:46 -03:00
Nicolas
042f81ddf2
Update removeUnwantedElements.test.ts
2024-06-26 21:20:11 -03:00
Nicolas
388ce3cbce
Nick: small changes
2024-06-26 21:15:42 -03:00
Nicolas
1d4907acc9
Nick:
2024-06-26 21:02:58 -03:00
rafaelsideguide
c40da77be0
Added implementation for saving docs on supabase
...
- TODO: remove the comments on `log_job.ts` before deploying to prod
2024-06-26 18:23:28 -03:00
Nicolas
3b92fb8433
Merge pull request #322 from mendableai/tests/metadata
...
[Test] Added E2E tests for checking metadata values
2024-06-26 12:09:18 -03:00
rafaelsideguide
67d7650cf3
Added to e2e_noAuth
2024-06-26 12:07:55 -03:00
rafaelsideguide
009df6c930
Added crawl limit unit test
...
I think this test over-relies on mocks, but I have no idea how to fix this without changing the code architecture
2024-06-26 09:54:25 -03:00
rafaelsideguide
05eaa3c68d
Update index.test.ts
2024-06-26 09:32:02 -03:00
rafaelsideguide
4381109dd8
added default values and fixed pdf bug
2024-06-26 09:00:54 -03:00
Nicolas
45f2765601
Merge pull request #316 from snippet/types-webscraper
...
add some types
2024-06-25 22:03:21 -03:00
Nicolas
768a131b5c
Merge pull request #318 from mendableai/bug/fix-custom-scrape-pdf-google-drive
...
[Bug] Fixed the regex test for google drive pdf files
2024-06-25 18:27:11 -03:00
rafaelsideguide
5f69fc7677
Fixed the regex test
2024-06-25 18:24:01 -03:00
rafaelsideguide
d02829d335
fixed clean jobs
2024-06-25 17:49:29 -03:00
Jeff Pereira
199cbe8bcb
add some types
2024-06-25 12:20:25 -07:00
Nicolas
749b0c05dc
Merge branch 'main' of https://github.com/mendableai/firecrawl
2024-06-25 15:21:15 -03:00
Nicolas
e7be17db92
Nick: metadata fixes and lock duration for bull decreased to 2 hrs
2024-06-25 15:21:14 -03:00
Nicolas
f84fb4b331
Merge pull request #313 from snippet/google-search-term-fix
...
fix multi-word search term issue: /search (w/o Serp)
2024-06-24 19:24:58 -03:00
Jeff Pereira
6ddf3a58a1
fix multi-word search term issue: /search (w/o Serp)
2024-06-24 14:21:52 -07:00
Nicolas
90b7fff366
Update crawler.ts
2024-06-24 16:52:01 -03:00
Nicolas
08c1fa799b
Update queue-worker.ts
2024-06-24 16:51:32 -03:00
rafaelsideguide
3ebdf93342
removed console.logs
2024-06-24 16:43:12 -03:00
Nicolas
56d42d9c9b
Nick:
2024-06-24 16:33:07 -03:00
rafaelsideguide
21d29de819
testing crawl with new.abb.com case
...
many unnecessary console.logs for tracing the code execution
2024-06-24 16:25:07 -03:00
Nicolas
3c7b7e7242
Nick: fixes fallback
2024-06-23 18:59:08 -03:00
Caleb Peffer
e59ba758f5
Caleb: changed posthog logging so that it associates jobs with a group. No
2024-06-18 17:42:21 -07:00
Caleb Peffer
5a91d8425f
Caleb: solve for typechecking on idempotencyKey on my machine
2024-06-18 17:07:38 -07:00
rafaelsideguide
9c539e9113
Fixed includeHTML to use cleanedHtml as response
2024-06-18 16:26:54 -03:00
Rafael Miller
f5a9acc4c6
Merge branch 'main' into feat/removeTags-regex
2024-06-18 14:39:59 -03:00
rafaelsideguide
9f7afd1e88
fix for some complex cases
2024-06-18 14:36:51 -03:00
Nicolas
d0c05accf6
Nick:
2024-06-18 13:21:50 -04:00
Nicolas
818751a256
Merge pull request #294 from mendableai/tests/e2e-to-unit
...
[Test] Transcribed from e2e to unit tests for many cases
2024-06-18 13:09:22 -04:00
rafaelsideguide
727e5de8c5
Update index.test.ts
2024-06-18 11:54:10 -03:00
rafaelsideguide
c54e797eb1
(╯°□°)╯︵ ┻━┻
2024-06-18 11:51:28 -03:00
rafaelsideguide
20f14bcf7f
Added some types
2024-06-18 10:55:07 -03:00
rafaelsideguide
c2fc69af1c
removed some e2e tests that are making the ci get stuck
2024-06-18 09:57:05 -03:00
rafaelsideguide
6c726a02eb
Moved to utils/removeUnwantedElements, added unit tests
2024-06-18 09:46:42 -03:00
AndyMik90
8b3c3aae91
Added support for RegEx in removeTags
2024-06-18 07:31:46 +02:00
rafaelsideguide
b2bd562bb2
transcribed from e2e to unit tests for many cases
2024-06-17 17:09:44 -03:00
Nicolas
ab038051e9
Merge branch 'main' into nsc/rate-limiter-tests
2024-06-17 15:06:12 -04:00