Nicolas
|
8f46b8218a
|
Merge pull request #361 from snippet/ts-playwright-service-docker
setting up docker to ts playwright service
|
2024-07-04 17:47:41 -03:00 |
|
Nicolas
|
32849b017f
|
Nick:
|
2024-07-03 20:18:11 -03:00 |
|
Nicolas
|
5ecd9cb6f5
|
Merge pull request #363 from mendableai/nsc/logging-scrapers
Logging for all scraper methods
|
2024-07-03 18:47:22 -03:00 |
|
Nicolas
|
066d92f643
|
Update single_url.ts
|
2024-07-03 18:38:17 -03:00 |
|
Nicolas
|
f5b2fbd7e8
|
Nick: revision
|
2024-07-03 18:06:53 -03:00 |
|
Nicolas
|
2d30cc6117
|
Nick: comments
|
2024-07-03 18:01:54 -03:00 |
|
Nicolas
|
90c54c32fd
|
Nick: refactor
|
2024-07-03 18:01:17 -03:00 |
|
Nicolas
|
90cf799a3c
|
Update single_url.ts
|
2024-07-03 17:56:21 -03:00 |
|
Nicolas
|
b36406e465
|
Nick: log scrapers
|
2024-07-03 17:28:53 -03:00 |
|
Jeff Pereira
|
b4292c1ea3
|
setting up docker to ts playwright service
|
2024-07-03 11:55:39 -07:00 |
|
Nicolas
|
abb44bb112
|
Merge pull request #346 from mendableai/dependabot/pip/apps/playwright-service/prod-deps-8f04296377
apps/playwright-service(deps): bump the prod-deps group in /apps/playwright-service with 3 updates
|
2024-07-03 01:07:09 -03:00 |
|
Nicolas
|
f967daddcb
|
Merge pull request #325 from snippet/playwright-scraper-api
new playwright service
|
2024-07-03 01:04:52 -03:00 |
|
Eric Ciarla
|
2d0d5ac392
|
Update for llm-extraction-from-raw-html
|
2024-07-02 14:05:42 -04:00 |
|
rafaelsideguide
|
0175152577
|
Fixed PDF match custom scraping
Now it works for both the `https://getgc.ai/privacy` and `https://prairie.cards/products/wood-designs` use cases.
|
2024-07-02 11:25:17 -03:00 |
|
rafaelsideguide
|
96de948d6b
|
Update index.test.ts
|
2024-07-02 11:04:09 -03:00 |
|
rafaelsideguide
|
7b7154ba1e
|
bugfixed pageStatusCode
|
2024-07-02 10:51:35 -03:00 |
|
Rafael Miller
|
50eecf04a9
|
Update licence pyproject.toml
Closes #345
|
2024-07-02 10:01:49 -03:00 |
|
dependabot[bot]
|
c2e00d1998
|
apps/api(deps): bump the prod-deps group in /apps/api with 28 updates
Bumps the prod-deps group in /apps/api with 28 updates:
| Package | From | To |
| --- | --- | --- |
| [@anthropic-ai/sdk](https://github.com/anthropics/anthropic-sdk-typescript) | `0.20.9` | `0.24.3` |
| [@bull-board/api](https://github.com/felixmosh/bull-board/tree/HEAD/packages/api) | `5.19.2` | `5.20.5` |
| [@bull-board/express](https://github.com/felixmosh/bull-board/tree/HEAD/packages/express) | `5.19.2` | `5.20.5` |
| [@hyperdx/node-opentelemetry](https://github.com/hyperdxio/hyperdx-js) | `0.7.0` | `0.8.0` |
| [@nangohq/node](https://github.com/NangoHQ/nango/tree/HEAD/packages/node-client) | `0.36.101` | `0.40.8` |
| [@sentry/node](https://github.com/getsentry/sentry-javascript) | `7.116.0` | `8.13.0` |
| [@supabase/supabase-js](https://github.com/supabase/supabase-js) | `2.43.4` | `2.44.2` |
| [ajv](https://github.com/ajv-validator/ajv) | `8.15.0` | `8.16.0` |
| [async-mutex](https://github.com/DirtyHairy/async-mutex) | `0.4.1` | `0.5.0` |
| [bull](https://github.com/OptimalBits/bull) | `4.12.9` | `4.15.0` |
| [date-fns](https://github.com/date-fns/date-fns) | `2.30.0` | `3.6.0` |
| [express-rate-limit](https://github.com/express-rate-limit/express-rate-limit) | `6.11.2` | `7.3.1` |
| [glob](https://github.com/isaacs/node-glob) | `10.4.1` | `10.4.2` |
| [json-schema-to-zod](https://github.com/StefanTerdell/json-schema-to-zod) | `2.1.0` | `2.3.0` |
| [keyword-extractor](https://github.com/michaeldelorenzo/keyword-extractor) | `0.0.25` | `0.0.28` |
| [langchain](https://github.com/langchain-ai/langchainjs) | `0.1.37` | `0.2.8` |
| [logsnag](https://github.com/LogSnag/logsnag.js) | `0.1.8` | `1.0.0` |
| [mongoose](https://github.com/Automattic/mongoose) | `8.4.1` | `8.4.4` |
| [natural](https://github.com/NaturalNode/natural) | `6.12.0` | `7.0.7` |
| [openai](https://github.com/openai/openai-node) | `4.47.3` | `4.52.2` |
| [promptable](https://github.com/promptable/Promptable.js) | `0.0.9` | `0.0.10` |
| [puppeteer](https://github.com/puppeteer/puppeteer) | `22.10.0` | `22.12.1` |
| [rate-limiter-flexible](https://github.com/animir/node-rate-limiter-flexible) | `2.4.2` | `5.0.3` |
| [resend](https://github.com/resendlabs/resend-node) | `3.2.0` | `3.4.0` |
| [stripe](https://github.com/stripe/stripe-node) | `12.18.0` | `16.1.0` |
| [unstructured-client](https://github.com/Unstructured-IO/unstructured-js-client) | `0.9.4` | `0.11.3` |
| [uuid](https://github.com/uuidjs/uuid) | `9.0.1` | `10.0.0` |
| [zod-to-json-schema](https://github.com/StefanTerdell/zod-to-json-schema) | `3.23.0` | `3.23.1` |
Updates `@anthropic-ai/sdk` from 0.20.9 to 0.24.3
- [Release notes](https://github.com/anthropics/anthropic-sdk-typescript/releases)
- [Changelog](https://github.com/anthropics/anthropic-sdk-typescript/blob/main/CHANGELOG.md)
- [Commits](https://github.com/anthropics/anthropic-sdk-typescript/compare/sdk-v0.20.9...sdk-v0.24.3)
Updates `@bull-board/api` from 5.19.2 to 5.20.5
- [Release notes](https://github.com/felixmosh/bull-board/releases)
- [Changelog](https://github.com/felixmosh/bull-board/blob/master/CHANGELOG.md)
- [Commits](https://github.com/felixmosh/bull-board/commits/v5.20.5/packages/api)
Updates `@bull-board/express` from 5.19.2 to 5.20.5
- [Release notes](https://github.com/felixmosh/bull-board/releases)
- [Changelog](https://github.com/felixmosh/bull-board/blob/master/CHANGELOG.md)
- [Commits](https://github.com/felixmosh/bull-board/commits/v5.20.5/packages/express)
Updates `@hyperdx/node-opentelemetry` from 0.7.0 to 0.8.0
- [Release notes](https://github.com/hyperdxio/hyperdx-js/releases)
- [Commits](https://github.com/hyperdxio/hyperdx-js/compare/@hyperdx/node-opentelemetry@0.7.0...@hyperdx/node-opentelemetry@0.8.0)
Updates `@nangohq/node` from 0.36.101 to 0.40.8
- [Release notes](https://github.com/NangoHQ/nango/releases)
- [Changelog](https://github.com/NangoHQ/nango/blob/master/CHANGELOG.md)
- [Commits](https://github.com/NangoHQ/nango/commits/v0.40.8/packages/node-client)
Updates `@sentry/node` from 7.116.0 to 8.13.0
- [Release notes](https://github.com/getsentry/sentry-javascript/releases)
- [Changelog](https://github.com/getsentry/sentry-javascript/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/getsentry/sentry-javascript/compare/7.116.0...8.13.0)
Updates `@supabase/supabase-js` from 2.43.4 to 2.44.2
- [Release notes](https://github.com/supabase/supabase-js/releases)
- [Changelog](https://github.com/supabase/supabase-js/blob/master/RELEASE.md)
- [Commits](https://github.com/supabase/supabase-js/compare/v2.43.4...v2.44.2)
Updates `ajv` from 8.15.0 to 8.16.0
- [Release notes](https://github.com/ajv-validator/ajv/releases)
- [Commits](https://github.com/ajv-validator/ajv/compare/v8.15.0...v8.16.0)
Updates `async-mutex` from 0.4.1 to 0.5.0
- [Changelog](https://github.com/DirtyHairy/async-mutex/blob/master/CHANGELOG.md)
- [Commits](https://github.com/DirtyHairy/async-mutex/compare/v0.4.1...v0.5.0)
Updates `bull` from 4.12.9 to 4.15.0
- [Release notes](https://github.com/OptimalBits/bull/releases)
- [Changelog](https://github.com/OptimalBits/bull/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/OptimalBits/bull/compare/v4.12.9...v4.15.0)
Updates `date-fns` from 2.30.0 to 3.6.0
- [Release notes](https://github.com/date-fns/date-fns/releases)
- [Changelog](https://github.com/date-fns/date-fns/blob/main/CHANGELOG.md)
- [Commits](https://github.com/date-fns/date-fns/compare/v2.30.0...v3.6.0)
Updates `express-rate-limit` from 6.11.2 to 7.3.1
- [Release notes](https://github.com/express-rate-limit/express-rate-limit/releases)
- [Commits](https://github.com/express-rate-limit/express-rate-limit/compare/v6.11.2...v7.3.1)
Updates `glob` from 10.4.1 to 10.4.2
- [Changelog](https://github.com/isaacs/node-glob/blob/main/changelog.md)
- [Commits](https://github.com/isaacs/node-glob/compare/v10.4.1...v10.4.2)
Updates `json-schema-to-zod` from 2.1.0 to 2.3.0
- [Commits](https://github.com/StefanTerdell/json-schema-to-zod/commits)
Updates `keyword-extractor` from 0.0.25 to 0.0.28
- [Release notes](https://github.com/michaeldelorenzo/keyword-extractor/releases)
- [Commits](https://github.com/michaeldelorenzo/keyword-extractor/compare/0.0.25...0.0.28)
Updates `langchain` from 0.1.37 to 0.2.8
- [Release notes](https://github.com/langchain-ai/langchainjs/releases)
- [Changelog](https://github.com/langchain-ai/langchainjs/blob/main/release_workspace.js)
- [Commits](https://github.com/langchain-ai/langchainjs/compare/0.1.37...0.2.8)
Updates `logsnag` from 0.1.8 to 1.0.0
- [Commits](https://github.com/LogSnag/logsnag.js/compare/v0.1.8...v1.0.0)
Updates `mongoose` from 8.4.1 to 8.4.4
- [Release notes](https://github.com/Automattic/mongoose/releases)
- [Changelog](https://github.com/Automattic/mongoose/blob/master/CHANGELOG.md)
- [Commits](https://github.com/Automattic/mongoose/compare/8.4.1...8.4.4)
Updates `natural` from 6.12.0 to 7.0.7
- [Release notes](https://github.com/NaturalNode/natural/releases)
- [Commits](https://github.com/NaturalNode/natural/compare/v6.12.0...v7.0.7)
Updates `openai` from 4.47.3 to 4.52.2
- [Release notes](https://github.com/openai/openai-node/releases)
- [Changelog](https://github.com/openai/openai-node/blob/master/CHANGELOG.md)
- [Commits](https://github.com/openai/openai-node/compare/v4.47.3...v4.52.2)
Updates `promptable` from 0.0.9 to 0.0.10
- [Commits](https://github.com/promptable/Promptable.js/commits)
Updates `puppeteer` from 22.10.0 to 22.12.1
- [Release notes](https://github.com/puppeteer/puppeteer/releases)
- [Changelog](https://github.com/puppeteer/puppeteer/blob/main/release-please-config.json)
- [Commits](https://github.com/puppeteer/puppeteer/compare/puppeteer-v22.10.0...puppeteer-v22.12.1)
Updates `rate-limiter-flexible` from 2.4.2 to 5.0.3
- [Release notes](https://github.com/animir/node-rate-limiter-flexible/releases)
- [Commits](https://github.com/animir/node-rate-limiter-flexible/commits/v5.0.3)
Updates `resend` from 3.2.0 to 3.4.0
- [Release notes](https://github.com/resendlabs/resend-node/releases)
- [Commits](https://github.com/resendlabs/resend-node/compare/v3.2.0...v3.4.0)
Updates `stripe` from 12.18.0 to 16.1.0
- [Release notes](https://github.com/stripe/stripe-node/releases)
- [Changelog](https://github.com/stripe/stripe-node/blob/master/CHANGELOG.md)
- [Commits](https://github.com/stripe/stripe-node/compare/v12.18.0...v16.1.0)
Updates `unstructured-client` from 0.9.4 to 0.11.3
- [Release notes](https://github.com/Unstructured-IO/unstructured-js-client/releases)
- [Changelog](https://github.com/Unstructured-IO/unstructured-js-client/blob/main/RELEASES.md)
- [Commits](https://github.com/Unstructured-IO/unstructured-js-client/compare/v0.9.4...v0.11.3)
Updates `uuid` from 9.0.1 to 10.0.0
- [Changelog](https://github.com/uuidjs/uuid/blob/main/CHANGELOG.md)
- [Commits](https://github.com/uuidjs/uuid/compare/v9.0.1...v10.0.0)
Updates `zod-to-json-schema` from 3.23.0 to 3.23.1
- [Release notes](https://github.com/StefanTerdell/zod-to-json-schema/releases)
- [Changelog](https://github.com/StefanTerdell/zod-to-json-schema/blob/master/changelog.md)
- [Commits](https://github.com/StefanTerdell/zod-to-json-schema/commits)
---
updated-dependencies:
- dependency-name: "@anthropic-ai/sdk"
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: "@bull-board/api"
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: "@bull-board/express"
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: "@hyperdx/node-opentelemetry"
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: "@nangohq/node"
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: "@sentry/node"
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: prod-deps
- dependency-name: "@supabase/supabase-js"
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: ajv
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: async-mutex
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: bull
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: date-fns
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: prod-deps
- dependency-name: express-rate-limit
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: prod-deps
- dependency-name: glob
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: prod-deps
- dependency-name: json-schema-to-zod
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: keyword-extractor
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: prod-deps
- dependency-name: langchain
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: logsnag
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: prod-deps
- dependency-name: mongoose
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: prod-deps
- dependency-name: natural
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: prod-deps
- dependency-name: openai
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: promptable
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: prod-deps
- dependency-name: puppeteer
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: rate-limiter-flexible
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: prod-deps
- dependency-name: resend
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: stripe
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: prod-deps
- dependency-name: unstructured-client
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: uuid
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: prod-deps
- dependency-name: zod-to-json-schema
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: prod-deps
...
Signed-off-by: dependabot[bot] <support@github.com>
|
2024-07-02 12:52:43 +00:00 |
|
dependabot[bot]
|
ad3e73b445
|
apps/test-suite(deps): bump the prod-deps group
Bumps the prod-deps group in /apps/test-suite with 6 updates:
| Package | From | To |
| --- | --- | --- |
| [@anthropic-ai/sdk](https://github.com/anthropics/anthropic-sdk-typescript) | `0.20.8` | `0.24.3` |
| [@dqbd/tiktoken](https://github.com/dqbd/tiktoken) | `1.0.14` | `1.0.15` |
| [@supabase/supabase-js](https://github.com/supabase/supabase-js) | `2.43.1` | `2.44.2` |
| [openai](https://github.com/openai/openai-node) | `4.40.2` | `4.52.2` |
| [playwright](https://github.com/microsoft/playwright) | `1.43.1` | `1.45.0` |
| [ts-jest](https://github.com/kulshekhar/ts-jest) | `29.1.2` | `29.1.5` |
Updates `@anthropic-ai/sdk` from 0.20.8 to 0.24.3
- [Release notes](https://github.com/anthropics/anthropic-sdk-typescript/releases)
- [Changelog](https://github.com/anthropics/anthropic-sdk-typescript/blob/main/CHANGELOG.md)
- [Commits](https://github.com/anthropics/anthropic-sdk-typescript/compare/sdk-v0.20.8...sdk-v0.24.3)
Updates `@dqbd/tiktoken` from 1.0.14 to 1.0.15
- [Release notes](https://github.com/dqbd/tiktoken/releases)
- [Changelog](https://github.com/dqbd/tiktoken/blob/main/CHANGELOG.md)
- [Commits](https://github.com/dqbd/tiktoken/compare/@dqbd/tiktoken@1.0.14...@dqbd/tiktoken@1.0.15)
Updates `@supabase/supabase-js` from 2.43.1 to 2.44.2
- [Release notes](https://github.com/supabase/supabase-js/releases)
- [Changelog](https://github.com/supabase/supabase-js/blob/master/RELEASE.md)
- [Commits](https://github.com/supabase/supabase-js/compare/v2.43.1...v2.44.2)
Updates `openai` from 4.40.2 to 4.52.2
- [Release notes](https://github.com/openai/openai-node/releases)
- [Changelog](https://github.com/openai/openai-node/blob/master/CHANGELOG.md)
- [Commits](https://github.com/openai/openai-node/compare/v4.40.2...v4.52.2)
Updates `playwright` from 1.43.1 to 1.45.0
- [Release notes](https://github.com/microsoft/playwright/releases)
- [Commits](https://github.com/microsoft/playwright/compare/v1.43.1...v1.45.0)
Updates `ts-jest` from 29.1.2 to 29.1.5
- [Release notes](https://github.com/kulshekhar/ts-jest/releases)
- [Changelog](https://github.com/kulshekhar/ts-jest/blob/main/CHANGELOG.md)
- [Commits](https://github.com/kulshekhar/ts-jest/compare/v29.1.2...v29.1.5)
---
updated-dependencies:
- dependency-name: "@anthropic-ai/sdk"
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: "@dqbd/tiktoken"
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: prod-deps
- dependency-name: "@supabase/supabase-js"
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: openai
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: playwright
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: ts-jest
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: prod-deps
...
Signed-off-by: dependabot[bot] <support@github.com>
|
2024-07-02 12:47:58 +00:00 |
|
dependabot[bot]
|
60de6bb6e3
|
apps/playwright-service(deps): bump the prod-deps group
Bumps the prod-deps group in /apps/playwright-service with 3 updates: [hypercorn](https://github.com/pgjones/hypercorn), [fastapi](https://github.com/tiangolo/fastapi) and [playwright](https://github.com/Microsoft/playwright-python).
Updates `hypercorn` from 0.16.0 to 0.17.3
- [Changelog](https://github.com/pgjones/hypercorn/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pgjones/hypercorn/compare/0.16.0...0.17.3)
Updates `fastapi` from 0.110.0 to 0.111.0
- [Release notes](https://github.com/tiangolo/fastapi/releases)
- [Commits](https://github.com/tiangolo/fastapi/compare/0.110.0...0.111.0)
Updates `playwright` from 1.42.0 to 1.44.0
- [Release notes](https://github.com/Microsoft/playwright-python/releases)
- [Commits](https://github.com/Microsoft/playwright-python/compare/v1.42.0...v1.44.0)
---
updated-dependencies:
- dependency-name: hypercorn
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: fastapi
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
- dependency-name: playwright
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod-deps
...
Signed-off-by: dependabot[bot] <support@github.com>
|
2024-07-02 12:47:09 +00:00 |
|
Rafael Miller
|
f0f449fe51
|
Merge pull request #336 from snippet/allow-external-content-links
[Proposal] new feature allowExternalContentLinks
|
2024-07-02 09:45:21 -03:00 |
|
rafaelsideguide
|
db4a743365
|
Added e2e test
|
2024-07-02 09:44:08 -03:00 |
|
Eric Ciarla
|
0821017f5b
|
Update README.md
|
2024-07-02 07:08:46 -04:00 |
|
Nicolas
|
42cd58a679
|
Merge pull request #332 from mendableai/feat/rawHtmlExtraction
Adds pageOptions.includeRawHtml and new extraction mode "llm-extraction-from-raw-html"
|
2024-07-01 18:23:26 -03:00 |
|
Nicolas
|
c4f423981f
|
Update pnpm-lock.yaml
|
2024-07-01 18:22:22 -03:00 |
|
rafaelsideguide
|
16aac7f8c5
|
Update single_url.ts
|
2024-07-01 18:21:15 -03:00 |
|
Nicolas
|
6d0c7a9ccd
|
Merge pull request #323 from mendableai/tests/crawl-limit-unit-tests
[Tests] Added crawl limit unit test
|
2024-07-01 17:56:04 -03:00 |
|
rafaelsideguide
|
4d6e25619b
|
minor spacing and comment stuff
|
2024-07-01 16:05:34 -03:00 |
|
Eric Ciarla
|
e1af815f8c
|
Update scrape.ts
|
2024-07-01 08:48:21 -04:00 |
|
Eric Ciarla
|
7ae195bacc
|
Update index.test.ts
|
2024-06-29 10:13:12 -04:00 |
|
Eric Ciarla
|
837b446390
|
Update index.test.ts
|
2024-06-29 08:48:42 -04:00 |
|
Eric Ciarla
|
fe6e3aeadc
|
Update index.test.ts
|
2024-06-29 08:44:21 -04:00 |
|
Eric Ciarla
|
6c9f0dfc91
|
Add tests
|
2024-06-29 08:32:20 -04:00 |
|
Jeff Pereira
|
a5fb45988c
|
new feature allowExternalContentLinks
|
2024-06-28 17:23:40 -07:00 |
|
Eric Ciarla
|
87b54488d3
|
update to includeRawHtml
|
2024-06-28 17:07:47 -04:00 |
|
Eric Ciarla
|
70fcf2ce03
|
init
|
2024-06-28 16:39:09 -04:00 |
|
Nicolas
|
9bf74bc774
|
Update single_url.ts
|
2024-06-28 15:51:18 -03:00 |
|
Nicolas
|
7e17498bcf
|
Update single_url.ts
|
2024-06-28 15:45:16 -03:00 |
|
rafaelsideguide
|
7dffaaa3e2
|
Changed port and added "using with firecrawl" section on readme
|
2024-06-28 11:51:24 -03:00 |
|
rafaelsideguide
|
d66e1f7846
|
looking good
|
2024-06-27 16:00:45 -03:00 |
|
Nicolas
|
9e7298945c
|
Update openapi.json
|
2024-06-26 21:25:38 -03:00 |
|
Nicolas
|
1ec0bf8adf
|
Update openapi.json
|
2024-06-26 21:22:46 -03:00 |
|
Nicolas
|
042f81ddf2
|
Update removeUnwantedElements.test.ts
|
2024-06-26 21:20:11 -03:00 |
|
Nicolas
|
388ce3cbce
|
Nick: small changes
|
2024-06-26 21:15:42 -03:00 |
|
Nicolas
|
1d4907acc9
|
Nick:
|
2024-06-26 21:02:58 -03:00 |
|
rafaelsideguide
|
c40da77be0
|
Added implementation for saving docs on supabase
- TODO: remove the comments on `log_job.ts` before deploying to prod
|
2024-06-26 18:23:28 -03:00 |
|
Jeff Pereira
|
d833a132a5
|
new playwright service
|
2024-06-26 12:32:30 -07:00 |
|
Nicolas
|
3b92fb8433
|
Merge pull request #322 from mendableai/tests/metadata
[Test] Added E2E tests for checking metadata values
|
2024-06-26 12:09:18 -03:00 |
|
rafaelsideguide
|
67d7650cf3
|
Added to e2e_noAuth
|
2024-06-26 12:07:55 -03:00 |
|
rafaelsideguide
|
009df6c930
|
Added crawl limit unit test
I think this test over-relies on mocks, but I don't see how to fix that without changing the code architecture.
|
2024-06-26 09:54:25 -03:00 |
|
rafaelsideguide
|
05eaa3c68d
|
Update index.test.ts
|
2024-06-26 09:32:02 -03:00 |
|
rafaelsideguide
|
4381109dd8
|
added default values and fixed pdf bug
|
2024-06-26 09:00:54 -03:00 |
|
Nicolas
|
45f2765601
|
Merge pull request #316 from snippet/types-webscraper
add some types
|
2024-06-25 22:03:21 -03:00 |
|
Nicolas
|
768a131b5c
|
Merge pull request #318 from mendableai/bug/fix-custom-scrape-pdf-google-drive
[Bug] Fixed the regex test for google drive pdf files
|
2024-06-25 18:27:11 -03:00 |
|
rafaelsideguide
|
5f69fc7677
|
Fixed the regex test
|
2024-06-25 18:24:01 -03:00 |
|
rafaelsideguide
|
d02829d335
|
fixed clean jobs
|
2024-06-25 17:49:29 -03:00 |
|
Jeff Pereira
|
199cbe8bcb
|
add some types
|
2024-06-25 12:20:25 -07:00 |
|
Nicolas
|
749b0c05dc
|
Merge branch 'main' of https://github.com/mendableai/firecrawl
|
2024-06-25 15:21:15 -03:00 |
|
Nicolas
|
e7be17db92
|
Nick: metadata fixes and lock duration for bull decreased to 2 hrs
|
2024-06-25 15:21:14 -03:00 |
|
Nicolas
|
f84fb4b331
|
Merge pull request #313 from snippet/google-search-term-fix
fix multi-word search term issue: /search (w/o Serp)
|
2024-06-24 19:24:58 -03:00 |
|
Jeff Pereira
|
6ddf3a58a1
|
fix multi-word search term issue: /search (w/o Serp)
|
2024-06-24 14:21:52 -07:00 |
|
Nicolas
|
90b7fff366
|
Update crawler.ts
|
2024-06-24 16:52:01 -03:00 |
|
Nicolas
|
08c1fa799b
|
Update queue-worker.ts
|
2024-06-24 16:51:32 -03:00 |
|
rafaelsideguide
|
3ebdf93342
|
removed console.logs
|
2024-06-24 16:43:12 -03:00 |
|
Nicolas
|
56d42d9c9b
|
Nick:
|
2024-06-24 16:33:07 -03:00 |
|
rafaelsideguide
|
21d29de819
|
testing crawl with new.abb.com case
many unnecessary console.logs for tracing the code execution
|
2024-06-24 16:25:07 -03:00 |
|
Nicolas
|
3c7b7e7242
|
Nick: fixes fallback
|
2024-06-23 18:59:08 -03:00 |
|
Caleb Peffer
|
e59ba758f5
|
Caleb: changed posthog logging so that It associates jobs with a group. No
|
2024-06-18 17:42:21 -07:00 |
|
Caleb Peffer
|
5a91d8425f
|
Caleb: solve for typechecking on idempotencyKey on my machine
|
2024-06-18 17:07:38 -07:00 |
|
rafaelsideguide
|
9c539e9113
|
Fixed includeHTML to use cleanedHtml as response
|
2024-06-18 16:26:54 -03:00 |
|
Rafael Miller
|
f5a9acc4c6
|
Merge branch 'main' into feat/removeTags-regex
|
2024-06-18 14:39:59 -03:00 |
|
rafaelsideguide
|
9f7afd1e88
|
fix for some complex cases
|
2024-06-18 14:36:51 -03:00 |
|
Nicolas
|
d0c05accf6
|
Nick:
|
2024-06-18 13:21:50 -04:00 |
|
Nicolas
|
818751a256
|
Merge pull request #294 from mendableai/tests/e2e-to-unit
[Test] Transcribed from e2e to unit tests for many cases
|
2024-06-18 13:09:22 -04:00 |
|
Nicolas
|
754c9fa08d
|
Update package.json
|
2024-06-18 12:58:57 -04:00 |
|
Nicolas
|
90a807c547
|
Update index.ts
|
2024-06-18 12:56:13 -04:00 |
|
Nicolas
|
26e8bfc23a
|
Merge branch 'main' into pr/296
|
2024-06-18 12:55:45 -04:00 |
|
Nicolas
|
b53ba58bc0
|
Merge pull request #282 from mendableai/nsc/rate-limiter-tests
test: Rate Limit Unit Tests
|
2024-06-18 11:01:28 -04:00 |
|
rafaelsideguide
|
727e5de8c5
|
Update index.test.ts
|
2024-06-18 11:54:10 -03:00 |
|
rafaelsideguide
|
c54e797eb1
|
(╯°□°)╯︵ ┻━┻
|
2024-06-18 11:51:28 -03:00 |
|
rafaelsideguide
|
6e32522fa2
|
Improvements on response document types
|
2024-06-18 11:43:06 -03:00 |
|
rafaelsideguide
|
20f14bcf7f
|
Added some types
|
2024-06-18 10:55:07 -03:00 |
|
rafaelsideguide
|
c2fc69af1c
|
removed some e2e tests that are making the ci get stuck
|
2024-06-18 09:57:05 -03:00 |
|
rafaelsideguide
|
6c726a02eb
|
Moved to utils/removeUnwantedElements, added unit tests
|
2024-06-18 09:46:42 -03:00 |
|
AndyMik90
|
8b3c3aae91
|
Added support for RegEx in removeTags
|
2024-06-18 07:31:46 +02:00 |
|
neev jewalkar
|
e5ffda1eec
|
Added local host support for the javascript SDK
|
2024-06-18 05:42:25 +05:30 |
|
rafaelsideguide
|
b2bd562bb2
|
transcribed from e2e to unit tests for many cases
|
2024-06-17 17:09:44 -03:00 |
|
Nicolas
|
ab038051e9
|
Merge branch 'main' into nsc/rate-limiter-tests
|
2024-06-17 15:06:12 -04:00 |
|
rafaelsideguide
|
a20d002a6b
|
Delete test-run-report.json
|
2024-06-17 09:25:29 -03:00 |
|
Eric Ciarla
|
519ab1aecb
|
Update unit tests
|
2024-06-15 17:14:09 -04:00 |
|
Eric Ciarla
|
f0d4146b42
|
Merge branch 'feat/maxDepthRelative' of https://github.com/mendableai/firecrawl into feat/maxDepthRelative
|
2024-06-15 16:52:00 -04:00 |
|
Eric Ciarla
|
ff7b52cab1
|
Delete one more e2e test
|
2024-06-15 16:51:50 -04:00 |
|
Eric Ciarla
|
b1eb608295
|
Merge branch 'main' into feat/maxDepthRelative
|
2024-06-15 16:50:27 -04:00 |
|
Eric Ciarla
|
34e37c5671
|
Add unit tests to replace e2e
|
2024-06-15 16:43:37 -04:00 |
|
Eric Ciarla
|
2b40729cc2
|
Update index.test.ts
|
2024-06-15 08:56:32 -04:00 |
|
Eric Ciarla
|
f22759b2e7
|
Update index.test.ts
|
2024-06-14 19:42:11 -04:00 |
|
Eric Ciarla
|
a6b7197737
|
Fix for maxDepth
|
2024-06-14 19:40:37 -04:00 |
|
Nicolas
|
4ec863718b
|
Merge pull request #283 from mendableai/nsc/crawler-fixes
Fixes crawler getting confused with base paths that contain www.
|
2024-06-14 13:50:32 -07:00 |
|
Nicolas
|
43767360d8
|
Merge branch 'main' into nsc/rate-limiter-tests
|
2024-06-14 13:50:21 -07:00 |
|
Nicolas
|
e88cb314c8
|
Update crawler.ts
|
2024-06-14 13:44:54 -07:00 |
|
Rafael Miller
|
361cba4119
|
Merge pull request #175 from mendableai/test/load-testing
Test/load testing
|
2024-06-14 17:39:01 -03:00 |
|
Nicolas
|
7b11ace87d
|
Create rate-limiter.test.ts
|
2024-06-14 12:31:42 -07:00 |
|
rafaelsideguide
|
e369d1dd0e
|
Update index.test.ts
|
2024-06-14 16:17:54 -03:00 |
|
Nicolas
|
e37aa3db57
|
Nick: fixed rate limit on status
|
2024-06-14 12:13:02 -07:00 |
|
rafaelsideguide
|
a6ed2e693f
|
Update index.test.ts
|
2024-06-14 15:22:52 -03:00 |
|
rafaelsideguide
|
ad7795f973
|
Merge remote-tracking branch 'origin/main' into test/load-testing
|
2024-06-14 15:14:01 -03:00 |
|
rafaelsideguide
|
354712a8a3
|
just changed the name for the test?
|
2024-06-14 13:02:04 -03:00 |
|
Eric Ciarla
|
2c5f5c0ea2
|
Merge branch 'main' into feat/maxDepthRelative
|
2024-06-14 11:49:12 -04:00 |
|
Eric Ciarla
|
80c10393b4
|
Update index.test.ts
|
2024-06-14 11:32:30 -04:00 |
|
Eric Ciarla
|
42ed1f4479
|
Update index.test.ts
|
2024-06-14 11:20:24 -04:00 |
|
Eric Ciarla
|
8830acce07
|
Update index.test.ts
|
2024-06-14 11:11:58 -04:00 |
|
Eric Ciarla
|
278bb311cb
|
Update index.test.ts
|
2024-06-14 11:02:39 -04:00 |
|
Eric Ciarla
|
36a62727b8
|
Update index.test.ts
|
2024-06-14 10:52:43 -04:00 |
|
Rafael Miller
|
f9c7ca9388
|
Merge branch 'main' into feat/issue-266
|
2024-06-14 11:47:58 -03:00 |
|
Rafael Miller
|
3e2e76311c
|
Merge branch 'main' into feat/issue-205
|
2024-06-14 11:25:20 -03:00 |
|
Eric Ciarla
|
59451754f5
|
Add tests
|
2024-06-14 10:14:07 -04:00 |
|
rafaelsideguide
|
afee5684a3
|
Fixed tests' message and updated version
|
2024-06-14 11:05:19 -03:00 |
|
Eric Ciarla
|
9b254c1cd0
|
Update index.test.ts
|
2024-06-14 09:48:14 -04:00 |
|
Rafael Miller
|
5a5c532bea
|
Merge branch 'main' into py-sdk-improve-response-handling
|
2024-06-14 10:42:51 -03:00 |
|
Eric Ciarla
|
9aba451b18
|
Update index.test.ts
|
2024-06-14 09:33:43 -04:00 |
|
Rafael Miller
|
cc2e3f05b0
|
Merge pull request #256 from mattjoyce/feat-254-sdk-py-logging
Added logging to python sdk FIRECRAWL_LOGGING_LEVEL
|
2024-06-14 10:22:40 -03:00 |
|
rafaelsideguide
|
6963a490f1
|
Updated version
|
2024-06-14 10:21:44 -03:00 |
|
rafaelsideguide
|
5dd18ca79b
|
fixed edge cases
|
2024-06-14 09:46:55 -03:00 |
|
Eric Ciarla
|
ab9de0f5ab
|
Update maxDepth tests
|
2024-06-13 18:46:30 -04:00 |
|
Eric Ciarla
|
393bd45237
|
Update index.test.ts
|
2024-06-13 18:13:15 -04:00 |
|
Eric Ciarla
|
71c98d8b80
|
Update logic
|
2024-06-13 18:00:52 -04:00 |
|
Eric Ciarla
|
095951aa4d
|
Update test
|
2024-06-13 17:40:00 -04:00 |
|
Eric Ciarla
|
5e8aa92788
|
Update index.ts
|
2024-06-13 17:33:13 -04:00 |
|
Eric Ciarla
|
bf10e9d392
|
Update index.test.ts
|
2024-06-13 17:28:59 -04:00 |
|
Eric Ciarla
|
65d63bae45
|
Update index.ts
|
2024-06-13 17:17:44 -04:00 |
|
Eric Ciarla
|
32e814bedc
|
Update index.ts
|
2024-06-13 17:02:30 -04:00 |
|
Nicolas
|
6fc1ee32fd
|
Merge pull request #275 from mendableai/feat/issue-273
Added pageOptions.removeTags
|
2024-06-13 13:27:01 -07:00 |
|
rafaelsideguide
|
bb859ae9a7
|
Added metadata.pageStatusCode and metadata.pageError properties to the responses
|
2024-06-13 17:08:40 -03:00 |
|
rafaelsideguide
|
676d6e8ab5
|
Added pageOptions.removeTags
|
2024-06-13 10:51:05 -03:00 |
|
Nicolas
|
182f8d4d6c
|
Update index.ts
|
2024-06-12 18:07:05 -07:00 |
|
Nicolas
|
11b6d5afa5
|
Update fly.toml
|
2024-06-12 18:00:22 -07:00 |
|
Nicolas
|
67dc46b454
|
Nick: clusters
|
2024-06-12 17:53:04 -07:00 |
|
rafaelsideguide
|
d20af257ba
|
Added jobId to webhook data
|
2024-06-12 15:38:41 -03:00 |
|
rafaelsideguide
|
e37d151404
|
added parsePDF option to pageOptions
Users can decide whether to let us parse the PDF or to parse it themselves.
|
2024-06-12 15:06:47 -03:00 |
|
rafaelsideguide
|
01c9f071fa
|
fixed
|
2024-06-12 11:27:06 -03:00 |
|
rafaelsideguide
|
dc6acbf1f0
|
Merge remote-tracking branch 'origin/main' into feat/allowbackwardcrawling-option
|
2024-06-12 11:01:05 -03:00 |
|
Nicolas
|
f93231499f
|
Merge pull request #265 from mendableai/feat/issue-264
[Feat] Added route to clean completed jobs and a github action cron that triggers every 24h
|
2024-06-11 21:33:52 -07:00 |
|
Nicolas
|
45dee63943
|
Merge pull request #262 from mendableai/nsc/webhook-self-host-fix
Only fetch webhook from db if self host webhook not set and using db auth
|
2024-06-11 15:46:57 -07:00 |
|
rafaelsideguide
|
157fbe4a1e
|
added bull auth key
|
2024-06-11 17:52:01 -03:00 |
|
rafaelsideguide
|
df3a678cf4
|
getting back the cancel test, this should work
|
2024-06-11 17:46:56 -03:00 |
|
rafaelsideguide
|
def2ba9987
|
added tests
|
2024-06-11 17:46:25 -03:00 |
|
Nicolas
|
1e3e06a1d5
|
Update replacePaths.test.ts
|
2024-06-11 13:02:39 -07:00 |
|
Nicolas
|
2239e03269
|
Update replacePaths.test.ts
|
2024-06-11 12:54:02 -07:00 |
|
Nicolas
|
520739c9f4
|
Nick: fixed bugs associated with absolute path replacements
|
2024-06-11 12:43:16 -07:00 |
|
Nicolas
|
b87725c683
|
Update openapi.json
|
2024-06-11 12:08:49 -07:00 |
|
rafaelsideguide
|
ee282c3d55
|
Added allowBackwardCrawling option
|
2024-06-11 15:24:39 -03:00 |
|
rafaelsideguide
|
a9f93c2f1e
|
Added route to clean completed jobs and a github action cron that triggers every 24h
|
2024-06-11 14:18:05 -03:00 |
|
Nicolas
|
da38dad9a7
|
Merge branch 'main' of https://github.com/mendableai/firecrawl
|
2024-06-10 18:26:31 -07:00 |
|
Nicolas
|
9390816c1b
|
Update openapi.json
|
2024-06-10 18:26:25 -07:00 |
|
Nicolas
|
f6b06ac27a
|
Nick: ignoreSitemap, better crawling algo
|
2024-06-10 18:12:41 -07:00 |
|
Nicolas
|
1bd0327e1a
|
Merge branch 'main' into nsc/pageoptions-crawler
|
2024-06-10 17:15:10 -07:00 |
|
Nicolas
|
99f2ffd6d5
|
Update webhook.ts
|
2024-06-10 17:03:10 -07:00 |
|
Nicolas
|
7ae9778642
|
Update single_url.ts
|
2024-06-10 16:57:31 -07:00 |
|
Nicolas
|
913c1dd568
|
Nick: fetch -> axios and fix timeouts
|
2024-06-10 16:49:03 -07:00 |
|
Nicolas
|
3091f0134c
|
Nick:
|
2024-06-10 16:27:10 -07:00 |
|
Matt Joyce
|
827354a116
|
Added logging to python sdk FIRECRAWL_LOGGING_LEVEL
Instantiates the logger early and depends on env to set.
|
2024-06-10 21:21:23 +10:00 |
|
Nicolas
|
aafd23fa8a
|
Merge pull request #252 from mattjoyce/fix-208-py-sdk-interval-poll-name
Fix 208 py sdk interval poll name
|
2024-06-08 21:33:17 -07:00 |
|
Matt Joyce
|
6fd9ce1c89
|
type hints and linting
|
2024-06-08 11:46:52 +10:00 |
|
Matt Joyce
|
7477c5e5bd
|
Use error handler consistently
|
2024-06-08 11:28:51 +10:00 |
|
Matt Joyce
|
9f306736af
|
More detailed error handling
|
2024-06-08 11:18:30 +10:00 |
|
Matt Joyce
|
c71ea7a795
|
Prepare headers consistently
|
2024-06-08 11:08:26 +10:00 |
|
Matt Joyce
|
8f9a165c2f
|
Lint - whitespace
|
2024-06-08 08:03:02 +10:00 |
|
Matt Joyce
|
5f0df596ec
|
Align param name with JS SDK
timeout becomes poll_interval
|
2024-06-08 07:37:08 +10:00 |
|
Nicolas
|
f24ca76618
|
Nick: removing rate limit emails for now
|
2024-06-07 10:39:11 -07:00 |
|
Nicolas
|
98d82c4cec
|
Update search.ts
|
2024-06-06 20:02:21 -07:00 |
|
Nicolas
|
5e80f8af87
|
Nick: llm extract 50
|
2024-06-06 18:35:44 -07:00 |
|
rafaelsideguide
|
7b7a6f8a39
|
Merge branch 'main' of https://github.com/mendableai/firecrawl
|
2024-06-06 17:51:28 -03:00 |
|
rafaelsideguide
|
f2695df215
|
updated sdk versions
|
2024-06-06 17:51:12 -03:00 |
|
rafaelsideguide
|
560f256a35
|
fixing minor problems on workflow
|
2024-06-06 17:36:48 -03:00 |
|
rafaelsideguide
|
f5318ea7d7
|
Update index.test.ts
|
2024-06-06 16:50:20 -03:00 |
|
rafaelsideguide
|
cd7f9abcec
|
Update index.test.ts
|
2024-06-06 16:44:46 -03:00 |
|
rafaelsideguide
|
7b9b668b95
|
Update index.test.ts
|
2024-06-06 16:36:51 -03:00 |
|
rafaelsideguide
|
82e0ed4cd3
|
Update index.test.ts
|
2024-06-06 16:33:27 -03:00 |
|
rafaelsideguide
|
dac7612be2
|
Merge branch 'main' of https://github.com/mendableai/firecrawl into 194-sdk-ci-pipeline-for-publishing-pythonnode-sdk
|
2024-06-06 16:07:25 -03:00 |
|
Nicolas
|
c2ad358390
|
Nick:
|
2024-06-06 12:05:20 -07:00 |
|
rafaelsideguide
|
79ec9f04dc
|
Merge branch 'main' of https://github.com/mendableai/firecrawl into 194-sdk-ci-pipeline-for-publishing-pythonnode-sdk
|
2024-06-06 15:58:14 -03:00 |
|
Nicolas
|
de06b13deb
|
Update rate-limiter.ts
|
2024-06-06 11:56:22 -07:00 |
|
Nicolas
|
27a8fd0c3c
|
Update rate-limiter.ts
|
2024-06-06 11:56:00 -07:00 |
|
Nicolas
|
1129d33321
|
Update rate-limiter.ts
|
2024-06-06 11:53:12 -07:00 |
|
rafaelsideguide
|
b234b4be5a
|
Merge branch 'main' into 194-sdk-ci-pipeline-for-publishing-pythonnode-sdk
|
2024-06-06 15:44:29 -03:00 |
|
rafaelsideguide
|
af0bfca847
|
Merge branch 'main' into 194-sdk-ci-pipeline-for-publishing-pythonnode-sdk
|
2024-06-06 15:36:28 -03:00 |
|
rafaelsideguide
|
8132f22c73
|
nice
|
2024-06-06 15:36:20 -03:00 |
|
Nicolas
|
f1b5ec8517
|
Nick: fixes
|
2024-06-06 11:23:10 -07:00 |
|
Nicolas
|
deae7dcd61
|
Update email_notification.ts
|
2024-06-06 10:41:54 -07:00 |
|
Nicolas
|
f725fa5a97
|
Update email_notification.ts
|
2024-06-06 10:41:23 -07:00 |
|
rafaelsideguide
|
fb758fa05e
|
go
|
2024-06-06 14:01:16 -03:00 |
|
Nicolas
|
0310da6729
|
Update rate-limiter.ts
|
2024-06-06 09:31:44 -07:00 |
|
Nicolas
|
01503c1fbf
|
Nick:
|
2024-06-06 09:29:25 -07:00 |
|
rafaelsideguide
|
b3cae4c858
|
adding js and testing twine
|
2024-06-06 13:27:31 -03:00 |
|
rafaelsideguide
|
bc1c1e5053
|
updating version to check if it runs
|
2024-06-06 11:41:01 -03:00 |
|
Rafael Miller
|
7686ad5702
|
Merge pull request #196 from mattjoyce/main
Python-SDK transitional build setup for pyproject.toml
|
2024-06-06 10:26:16 -03:00 |
|
Nicolas
|
525b4f2a83
|
Update rate-limiter.ts
|
2024-06-05 14:38:10 -07:00 |
|
Nicolas
|
d7f8208cdb
|
Update email_notification.ts
|
2024-06-05 13:53:31 -07:00 |
|
Nicolas
|
ec10eb09f3
|
Update credit_billing.ts
|
2024-06-05 13:22:03 -07:00 |
|
Nicolas
|
5991000d2b
|
Update credit_billing.ts
|
2024-06-05 13:21:15 -07:00 |
|