Nicolas
5205c5f005
Update map.ts
2024-08-16 19:37:00 -04:00
Nicolas
0c05d096a9
Merge branch 'v1-webscraper' of https://github.com/mendableai/firecrawl into v1-webscraper
2024-08-16 19:33:58 -04:00
Nicolas
ab48353226
Nick: /map almost good
2024-08-16 19:33:57 -04:00
Gergő Móricz
eb84673b06
feat: crawl status websocket WIP
2024-08-17 01:04:14 +02:00
Gergő Móricz
e2a6ef26d3
mount v1Router under v1 path
2024-08-16 23:48:50 +02:00
Gergő Móricz
4c1b74dab3
fix(map): remove robots.txt
2024-08-16 23:46:10 +02:00
Gergő Móricz
c281fe62c0
fix(crawl): propagate db fix to preview endpoint
2024-08-16 23:43:54 +02:00
Gergő Móricz
803577eeba
feat(crawl): webhook
2024-08-16 23:42:48 +02:00
Gergő Móricz
e6738abf96
fix(crawl-status): retrieve from DB in bulk
2024-08-16 23:39:39 +02:00
rafaelsideguide
086ba6280b
fixed markdown format
2024-08-16 18:39:13 -03:00
Gergő Móricz
aabfaf0ac5
clean up crawl-status, fix db ddos
2024-08-16 23:29:39 +02:00
rafaelsideguide
e5b807ccc4
Merge branch 'v1-webscraper' of https://github.com/mendableai/firecrawl into v1-webscraper
2024-08-16 17:57:31 -03:00
rafaelsideguide
7a61325500
map + search + scrape markdown bug
2024-08-16 17:57:11 -03:00
Gergő Móricz
5896153d19
fix: crawl status and redis fixes
2024-08-16 22:52:48 +02:00
Gergő Móricz
3fcb21930e
remove log
2024-08-16 22:48:23 +02:00
Gergő Móricz
f20328bdbb
crawl status and document stuff
2024-08-16 22:48:05 +02:00
Nicolas
0c057bb649
Update index.test.ts
2024-08-16 16:45:10 -04:00
Nicolas
b32464558a
Update index.test.ts
2024-08-16 16:41:09 -04:00
Nicolas
5bac7988a6
Update index.test.ts
2024-08-16 16:08:38 -04:00
Nicolas
290c7ee936
Update index.test.ts
2024-08-16 16:06:46 -04:00
Nicolas
23a033fe61
Nick: fixes and more e2e tests
2024-08-16 16:03:35 -04:00
Nicolas
78ca94251c
Merge pull request #480 from mendableai/nsc/hyper-v81
...
Reduce metrics ingestion w/ HyperDX v0.8.1
2024-08-16 14:34:14 -04:00
Nicolas
37ae9a9043
Update index.test.ts
2024-08-16 14:17:43 -04:00
Nicolas
200ce8e2ce
Merge branch 'v1-webscraper' of https://github.com/mendableai/firecrawl into v1-webscraper
2024-08-16 14:16:35 -04:00
Nicolas
21d3798e49
Nick: initial e2e v1 tests for /scrape
2024-08-16 14:16:30 -04:00
rafaelsideguide
3f998b688d
scrape ready
2024-08-16 15:14:37 -03:00
Nicolas
b0d211ecc1
Merge branch 'main' into v1-webscraper
2024-08-16 13:43:28 -04:00
Gergő Móricz
fd6432e7fd
fix(queue-worker): correct job success
2024-08-16 19:16:08 +02:00
Gergő Móricz
6e54942265
fix(queue-worker): add cancelled to crawl log
2024-08-16 19:11:53 +02:00
rafaelsideguide
9b1cb266a0
added origin to request types
2024-08-16 13:49:50 -03:00
Gergő Móricz
d0a8382a5b
fix(queue-worker): crawl finishing race condition
2024-08-16 18:48:52 +02:00
Gergő Móricz
6bd52e63bf
fix(queue-worker): fix linksOnPage undefined error
2024-08-16 18:42:24 +02:00
Gergő Móricz
5a6570cba2
fix(webhooks): call back with parent crawl ID
2024-08-16 17:42:42 +02:00
rafaelsideguide
7d324bd2c8
Create checkCredits.ts
2024-08-16 11:21:52 -03:00
Nicolas
ec361609d2
Nick: added growth-2x plan
2024-08-15 18:37:19 -04:00
Gergő Móricz
8b7569f8f3
add zod, create middleware, update openapi declaration, add crawl logic
2024-08-15 23:30:33 +02:00
Gergő Móricz
4165de1773
v1 restructure
2024-08-15 21:51:59 +02:00
Gergő Móricz
af08ab0b1a
fix bad module resolution
2024-08-15 21:51:52 +02:00
Nicolas
c917c8fbcd
Merge branch 'main' into v1-webscraper
2024-08-15 15:14:29 -04:00
Nicolas
32c6b1f136
Nick: remove active job alerts
2024-08-15 14:50:30 -04:00
Gergő Móricz
0c14366720
fix: add checkandupdateurl to crawlPreview
2024-08-15 20:30:25 +02:00
Nicolas
81b2479db3
Merge pull request #459 from mendableai/feat/queue-scrapes
...
feat: Move scraper to queue
2024-08-15 14:19:55 -04:00
Gergő Móricz
fc08ff450d
search port
2024-08-15 20:10:59 +02:00
Nicolas
86326f34e9
Update single_url.test.ts
2024-08-15 13:48:42 -04:00
Gergő Móricz
129a882bcc
fix(scrape): give scrapes their real job id
2024-08-15 19:29:47 +02:00
Gergő Móricz
965a5817d1
fix(queue-worker): log jobs correctly
2024-08-15 19:27:15 +02:00
Gergő Móricz
dad9d353d9
use thomas's url validation
2024-08-15 19:19:02 +02:00
Gergő Móricz
e3279274f1
fix: make playground crawl work
2024-08-15 19:14:32 +02:00
Gergő Móricz
c5597bc722
fix: robots.txt laoding
2024-08-15 19:11:07 +02:00
Gergő Móricz
29f0d9ec94
propagate priority to fire-engine
2024-08-15 19:04:46 +02:00
Gergő Móricz
b79d3d1754
fix
2024-08-15 19:02:05 +02:00
Gergő Móricz
57730f6a35
priority changes
2024-08-15 18:58:07 +02:00
Gergő Móricz
846610681b
fix: fix posthog, add dummy crawl DB items
2024-08-15 18:55:18 +02:00
rafaelsideguide
81066cf90a
updating cargo pckg name n version
2024-08-15 10:11:27 -03:00
Nicolas
6e1074cdd1
Update website_params.ts
2024-08-14 17:39:54 -04:00
Thomas Kosmas
6410e1a81d
Update params
2024-08-15 00:10:14 +03:00
rafaelsideguide
697501cc8a
Merge remote-tracking branch 'origin/main' into f/rust-sdk
2024-08-14 17:30:43 -03:00
Gergő Móricz
8a5cad72f6
fix(queue-worker): variable name collision
2024-08-14 22:02:05 +02:00
Gergő Móricz
b8ec40dd72
fix(crawl): submit sitemapped jobs in bulk
2024-08-14 20:34:19 +02:00
Gergő Móricz
2ca1017fc3
fix(crawl): make request 0 of crawl jobs higher priority
2024-08-14 19:34:18 +02:00
Gergő Móricz
f4466f6bb0
fix(test-suite): add artillery
2024-08-14 19:33:09 +02:00
Gergő Móricz
cfad067a63
fix(fly): change proxy limits
2024-08-14 18:52:40 +02:00
Gergő Móricz
a6c81f9d62
fix: return all data when calling webhook
2024-08-14 17:53:47 +02:00
rafaelsideguide
f86d2bb291
added go-sdk as submodule
2024-08-13 18:17:35 -03:00
Nicolas
e2472b9b0d
Merge remote-tracking branch 'origin/v1/mockup-controllers' into v1-webscraper
2024-08-13 16:34:57 -04:00
Gergo Moricz
2e5e480cc2
fix(crawl): call webhooks
2024-08-13 22:10:17 +02:00
Gergo Moricz
a33596de3c
fix(log_job): add crawl_id
2024-08-13 22:03:46 +02:00
Gergo Moricz
9252940b52
fix(crawl-status): sort data
2024-08-13 21:55:13 +02:00
Gergo Moricz
8dbac0268c
feat: offload crawl results to the DB
2024-08-13 21:40:59 +02:00
Gergo Moricz
4bbc9db1df
fix: prioritize scrape jobs over crawl jobs
2024-08-13 21:31:34 +02:00
Gergo Moricz
5f2af37880
fix(scrape): remove scrape job from queue after the job is done
2024-08-13 21:26:41 +02:00
Gergo Moricz
2413e33359
fix(queue-worker): remove console.log
2024-08-13 21:07:36 +02:00
Gergo Moricz
d7549d4dc5
feat: remove webScraperQueue
2024-08-13 21:03:24 +02:00
Gergő Móricz
4a2c37dcf5
Merge branch 'main' into feat/queue-scrapes
2024-08-13 20:53:49 +02:00
Gergo Moricz
86e136beca
feat: crawl to scrape conversion
2024-08-13 20:51:43 +02:00
rafaelsideguide
a4be95ac27
fixed tests
2024-08-13 13:42:26 -03:00
KentHsu
fd060c7ef1
fix: go-sdk module name
2024-08-13 10:10:45 +08:00
Nicolas
09ca165d2e
Merge pull request #531 from kevinswiber/fix/respect-docker-env-file-comments
...
Self-host fix: Moving comments of .env.example values from end-of-line to above-line.
2024-08-12 16:54:56 -04:00
Nicolas
d06f40810c
Merge pull request #515 from wahpiangle/main
...
Update redis urls in example .env
2024-08-12 16:51:59 -04:00
Nicolas
6810338271
Update search.ts
2024-08-12 16:51:43 -04:00
Thomas Kosmas
98be29c963
Update parameters for platform.openai.com
2024-08-12 22:49:28 +03:00
Kevin Swiber
33aa5cf0de
Moving comments of .env.example values from end-of-line to above-line. Self-host docs suggest using .env.example as a base. However, Docker doesn't respect end-of-line comments. It sets the comment as the actual value of the variable. This fix prevents that.
2024-08-12 12:24:46 -07:00
Nicolas
74a5125185
Nick: removed redlock
2024-08-12 15:07:30 -04:00
Nicolas
0bd1a820ee
Update auth.ts
2024-08-12 13:42:09 -04:00
Nicolas
25a899eae3
Nick: redlock cache in auth
2024-08-12 13:37:47 -04:00
Rafael Miller
36e4b2cf49
Update .env.example
2024-08-12 10:37:00 -03:00
Quan Ming
a96ad4b0e2
Update redis url to use comment
2024-08-10 12:33:26 +08:00
Nicolas
e28c415cf4
Nick:
2024-08-09 14:07:46 -04:00
Gergo Moricz
5a778f2c22
fix(js-sdk): add type metadata to exports
2024-08-09 20:05:36 +02:00
Rafael Miller
6a78f6fe78
Merge pull request #497 from KentHsu/feat/add-go-sdk
...
[Feat] Add Go SDK implementation
2024-08-09 14:58:20 -03:00
rafaelsideguide
0591000b64
bugfix includes excludes
2024-08-09 14:30:41 -03:00
Kent (Chia-Hao), Hsu
1fda882983
Merge branch 'mendableai:main' into feat/add-go-sdk
2024-08-10 00:46:15 +08:00
Quan Ming
0221872a70
Update redis urls in example .env
2024-08-10 00:40:11 +08:00
rafaelsideguide
b802ea02a1
small improvements
...
- wait for getting results on crawl: sometimes the crawl takes a second to save the data to the DB, which causes response.data to be empty
- added timeout value to test script
- increased http client timeout (llm extract was failing on e2e tests)
- fixed env path on test script
2024-08-09 11:13:14 -03:00
rafaelsideguide
0b8df5e264
python sdk and tests
2024-08-08 14:25:09 -03:00
Nicolas
f1f5605010
Update website_params.ts
2024-08-08 12:31:58 -04:00
rafaelsideguide
cf9d77d889
typescript fixes
2024-08-08 11:41:13 -03:00
Nicolas
b0abad07da
Merge pull request #496 from tak-s/improve-logging-level
...
Improve logs
2024-08-07 22:01:12 -04:00
rafaelsideguide
c16437e933
fixed bunch of types
2024-08-07 17:05:18 -03:00
Gergo Moricz
920b7f2f44
fix(runWebScraper): don't filter empty docs
2024-08-07 21:00:22 +02:00
Gergo Moricz
55ec96c23f
fix(queue-worker): bad job lock extension time
2024-08-07 20:24:16 +02:00
Gergo Moricz
ab7a35c581
fix(queue-worker): log lock extensions
2024-08-07 19:49:48 +02:00
Gergo Moricz
a1c2ee5aa9
fix: always complete job, no try
2024-08-07 19:39:09 +02:00
Gergo Moricz
191dfbd9ca
fix: move to completed in one place
2024-08-07 18:49:58 +02:00
Nicolas
457c082ba1
Nick: fixed tests
2024-08-07 11:08:53 -04:00
Nicolas
8a992b1596
Merge branch 'main' of https://github.com/mendableai/firecrawl
2024-08-07 10:40:06 -04:00
Nicolas
b12e1157cc
Nick: v35 bump
2024-08-07 10:40:00 -04:00
Gergő Móricz
5fc7fcb77c
Merge branch 'main' into feat/queue-scrapes
2024-08-07 16:35:44 +02:00
Gergo Moricz
fe9fdb578b
revert bad hotfixes
2024-08-07 16:34:25 +02:00
Gergo Moricz
b7c01dcb9b
fix(webScraperQueue): reduce retries to 2
2024-08-07 16:31:50 +02:00
Gergo Moricz
cdf7bad5b4
fix(runWebScraper): don't move to completed
2024-08-07 15:20:56 +02:00
Gergo Moricz
9df8719efa
fix(queue-worker): raise queue log level to info
2024-08-07 14:56:04 +02:00
Gergo Moricz
7bb922071c
fix(queue-worker): manually renew lock (testing)
2024-08-07 14:35:20 +02:00
Gergo Moricz
8216266d16
fix(scrape_log): display error properly
2024-08-07 14:19:20 +02:00
Gergo Moricz
2e2e80d679
fix(scrape-events): updateScrapeResult fix
2024-08-07 14:17:50 +02:00
Gergo Moricz
b5ec47fd96
fix(runWebScraper): don't fetch next job
2024-08-07 13:53:04 +02:00
Gergo Moricz
020a5efdb7
Revert "Revert "Merge pull request #432 from mendableai/mog/js-sdk-cjs""
...
This reverts commit 5da4472842.
2024-08-07 01:27:26 +02:00
Gergő Móricz
7380d7799f
Merge branch 'main' into mog/js-sdk-cjs
2024-08-07 01:12:36 +02:00
Gergo Moricz
5f7724205f
fix(js-sdk): re-add types
2024-08-07 01:06:21 +02:00
Nicolas
f294d3922c
Nick: revert
2024-08-06 18:44:45 -04:00
Nicolas
5da4472842
Revert "Merge pull request #432 from mendableai/mog/js-sdk-cjs"
...
This reverts commit bb90e03dea, reversing
changes made to 3321ca9398.
2024-08-06 18:41:06 -04:00
Nicolas
a67a5c04c9
Revert "Merge pull request #432 from mendableai/mog/js-sdk-cjs"
...
This reverts commit bb90e03dea, reversing
changes made to 3321ca9398.
2024-08-06 18:02:56 -04:00
Nicolas
bb90e03dea
Merge pull request #432 from mendableai/mog/js-sdk-cjs
...
fix(js-sdk): build both CommonJS and ESM versions
2024-08-06 17:38:57 -04:00
rafaelsideguide
3fb2307010
Update index.ts
2024-08-06 17:34:13 -03:00
rafaelsideguide
d599d31e63
wip
2024-08-06 17:33:39 -03:00
rafaelsideguide
6cdf4c68ec
wip: map, crawl, scrape mockups
2024-08-06 15:24:45 -03:00
Nicolas
3321ca9398
Merge pull request #504 from mendableai/feat/fullpage-screenshot
...
[Feat] Added fullpagescreenshot capabilities
2024-08-06 13:52:29 -04:00
Gergo Moricz
b60ee30dba
fix(single_url): accept 500
2024-08-06 18:00:56 +02:00
Gergo Moricz
06751a8e21
fix(crawl-status): missing partial data after cancel
2024-08-06 17:31:20 +02:00
Gergo Moricz
810b98ec38
fix(scrape): fix timeout error code
2024-08-06 17:30:01 +02:00
Gergo Moricz
3ae95a2740
fix(scrape): consider timeout property
2024-08-06 17:25:58 +02:00
Gergo Moricz
8566ece700
fix(scrape): pass extractorOptions
2024-08-06 17:15:19 +02:00
Gergo Moricz
8e0aa69603
fix(crawl-status): partial_data
2024-08-06 17:06:21 +02:00
Gergo Moricz
1ab119c874
fix(scrape): don't double-bill for scrape
2024-08-06 16:57:23 +02:00
Gergo Moricz
7c5cda7b45
fix(queue-worker): concurrency
2024-08-06 16:57:00 +02:00
Gergo Moricz
d7d63790e5
fix(crawl-status): isCancelled should be status failed
2024-08-06 16:35:55 +02:00
Gergo Moricz
03c84a9372
cleanup and fix cancelling
2024-08-06 16:26:46 +02:00
rafaelsideguide
4d24a99d50
fix params
2024-08-06 09:34:43 -03:00
Nicolas
e195ddbef4
Merge branch 'main' into nsc/hyper-v81
2024-08-05 20:47:39 -04:00
rafaelsideguide
3edc3a3d15
added fullpagescreenshot capabilities, wip on fire-engine side
2024-08-05 18:17:37 -03:00
rafaelsideguide
f32e8de156
fixes the empty excludes.filter undefined bug
2024-08-05 18:13:31 -03:00
KentHsu
1378ffc138
feat: add go-sdk
2024-08-04 17:33:33 +08:00
tak-s
af9bc5c8bb
Suppressed repetitive logs
2024-08-04 15:09:36 +09:00
Nicolas
1742e4ceae
Nick:
2024-08-02 19:25:15 -04:00
Nicolas
39aecd974b
Update redis-health.ts
2024-08-02 17:43:45 -04:00
Nicolas
b448e3c3ad
Update website_params.ts
2024-08-02 14:26:35 -04:00
rafaelsideguide
4051630632
Update sitemap.ts
2024-08-02 11:32:48 -03:00
rafaelsideguide
8568b61015
bugfix for sitemaps
2024-08-02 11:03:01 -03:00
Nicolas
af68b7a785
Merge pull request #475 from mendableai/bugfix/issue-466
...
[Bug] pdfs and logging pdf events, also added trycatchs for docx
2024-08-01 22:05:26 -04:00
rafaelsideguide
f48ff36b32
added .inc files and forced lower case comparison
2024-07-31 09:28:43 -03:00
Nicolas
ad6f6eff4b
Update fireEngine.ts
2024-07-30 19:15:54 -04:00
Nicolas
f9827b2151
Update credit_billing.ts
2024-07-30 19:13:17 -04:00
Nicolas
6d99dedd3c
Nick: fixed tests
2024-07-30 19:11:01 -04:00
Nicolas
a28ecc1f61
Nick: caching
2024-07-30 18:59:35 -04:00
Nicolas
52198f2991
Nick:
2024-07-30 16:15:08 -04:00
Nicolas
f43d5e7895
Nick: scrape queue
2024-07-30 14:44:13 -04:00
Nicolas
7e002a8b06
Nick: bull mq
2024-07-30 13:27:23 -04:00
Nicolas
46bcbd931f
Merge branch 'main' into feat/queue-scrapes
2024-07-30 12:44:07 -04:00
Nicolas
fd2452ec9c
Update scrape.ts
2024-07-30 12:42:12 -04:00
rafaelsideguide
8f5174ffc7
Update auth.ts
2024-07-30 10:37:33 -03:00
rafaelsideguide
d25d7e7244
special case: developer.apple.com
2024-07-30 10:13:09 -03:00
Nicolas
c446942306
Nick:
2024-07-29 21:28:29 -04:00
Nicolas
5e8ffcf505
Update website_params.ts
2024-07-29 20:43:47 -04:00
Nicolas
7b813883ef
Nick: first layer
2024-07-29 20:31:51 -04:00
Nicolas
e99c2568f4
Update auth.ts
2024-07-29 18:44:18 -04:00
Nicolas
968a2dc753
Nick:
2024-07-29 18:37:09 -04:00
Nicolas
04942bb9de
Nick:
2024-07-29 18:31:43 -04:00
Nicolas
267d4681bf
Merge branch 'main' of https://github.com/mendableai/firecrawl
2024-07-29 17:21:15 -04:00
Nicolas
b4833c1694
Nick: increasing default timeout to 45s
2024-07-29 17:21:11 -04:00
Nicolas
7fa08100bf
Merge pull request #414 from NiuBlibing/support_model_name
...
support custom models
2024-07-29 13:21:29 -04:00
rafaelsideguide
49e3e64787
bugfix for pdfs and logging pdf events, also added trycatchs for docx
2024-07-29 14:13:46 -03:00
Nicolas
4c9d62f6d3
Nick: fixing sitemap fallback
2024-07-26 18:25:44 -04:00
Nicolas
091924a636
Nick: moving machines from mia to virginia
2024-07-26 17:37:46 -04:00
Nicolas
cb97871ff9
Merge branch 'main' of https://github.com/mendableai/firecrawl
2024-07-26 17:21:11 -04:00
Nicolas
ff4266f09e
Update pdfProcessor.ts
2024-07-26 17:21:09 -04:00
Nicolas
0c2e3a72cc
Merge pull request #460 from mendableai/nsc/admin-router
...
Admin router + Improve redis notifications
2024-07-26 12:16:14 -04:00
rafaelsideguide
96cec2a673
fix checking scrape log success content length
2024-07-26 12:00:52 -03:00
Nicolas
542270f4c2
Merge pull request #461 from mendableai/nsc/small-handle-for-client-side-errors
...
Client side error handling
2024-07-25 20:53:10 -04:00
Nicolas
dc6f825270
Update email_notification.ts
2024-07-25 20:43:50 -04:00
Nicolas
f82ca3be17
Nick:
2024-07-25 19:53:29 -04:00
Nicolas
01fab6e036
Update single_url.ts
2024-07-25 17:51:41 -04:00
Nicolas
56042d090c
Update single_url.ts
2024-07-25 17:48:44 -04:00
Nicolas
88f5efce8f
Merge branch 'feat/scrape-monitoring'
2024-07-25 17:44:21 -04:00
Nicolas
3242872503
Update single_url.ts
2024-07-25 17:43:55 -04:00
Nicolas
ffd430f198
Merge pull request #457 from JakobStadlhuber/Readiness-Liveness-Probes
...
Readiness liveness probes
2024-07-25 17:20:31 -04:00
Nicolas
7129d7993e
Update v0.ts
2024-07-25 17:19:45 -04:00
rafaelsideguide
e0954d7f59
Merge branch 'main' of https://github.com/mendableai/firecrawl
2024-07-25 17:48:43 -03:00
rafaelsideguide
81aa919262
fix
2024-07-25 17:47:43 -03:00
Nicolas
10e80f00cf
Merge branch 'main' into nsc/admin-router
2024-07-25 16:46:38 -04:00
Nicolas
e5b797549e
Merge branch 'main' into feat/scrape-monitoring
2024-07-25 16:21:02 -04:00
Nicolas
50d2426fc4
Update scrape-events.ts
2024-07-25 16:20:29 -04:00
Nicolas
28a8a98491
Update admin.ts
2024-07-25 14:58:14 -04:00
Nicolas
2014d9dd2e
Nick: admin router
2024-07-25 14:54:20 -04:00
rafaelsideguide
1f1c068eea
changing from error to debug
2024-07-25 10:00:50 -03:00
rafaelsideguide
e720e1bacf
Merge remote-tracking branch 'origin/main' into feat/logger
2024-07-25 09:49:27 -03:00
rafaelsideguide
309728a482
updated logs
2024-07-25 09:48:06 -03:00
Nicolas
2c1221750b
Merge pull request #449 from mendableai/bugfix/malformed-url-sitemap
...
Added regex for links in sitemap
2024-07-24 20:37:35 -04:00
Gergő Móricz
d1a3df6d08
fix: aaaaahhh
2024-07-25 00:50:03 +02:00
Nicolas
6ad7e24403
Update ingestion.tsx
2024-07-24 18:15:51 -04:00
Gergő Móricz
6798695ee4
feat: move scraper to queue
2024-07-25 00:14:25 +02:00
Nicolas
92843a356d
Merge branch 'main' of https://github.com/mendableai/firecrawl
2024-07-24 18:13:36 -04:00
Nicolas
1e13ddbe8e
Nick: changes to the ui component
2024-07-24 18:13:34 -04:00
Gergő Móricz
623b547292
fix(fly.toml): scale up memory limit
2024-07-24 23:39:00 +02:00
Nicolas
15890772be
Scale bump
2024-07-24 16:56:19 -04:00
Eric Ciarla
a4bccbe3bb
Firecrawl UI Template
...
Firecrawl UI template
2024-07-24 15:05:55 -04:00
Eric Ciarla
4596d0b2e6
Add ReadMe and LICENSE
2024-07-24 14:56:53 -04:00
Eric Ciarla
9654721bf2
Vite commit
2024-07-24 14:27:50 -04:00
rafaelsideguide
cc98f83fda
added failed and completed log events
2024-07-24 15:25:36 -03:00
Jakob Stadlhuber
be9e7f9edf
Update Kubernetes configs for playwright-service, api, and worker
...
Added new ConfigMap for playwright-service and adjusted existing references.
Applied imagePullPolicy: Always to ensure all images are updated promptly.
Updated README to include --no-cache for Docker build instructions.
2024-07-24 18:54:16 +02:00
Gergo Moricz
60c74357df
feat(ScrapeEvents): log queue events
2024-07-24 18:44:14 +02:00
rafaelsideguide
4eca6bd301
fix/check-for-auth-on-scrape-log
2024-07-24 12:54:14 -03:00
Nicolas
3a1b8a9797
Update website_params.ts
2024-07-24 11:04:47 -04:00
Nicolas
8b48ec8d30
Update website_params.ts
2024-07-24 11:02:20 -04:00
Gergo Moricz
4d35ad073c
feat(monitoring/scrape): include url, worker, response_size
2024-07-24 16:43:39 +02:00
Gergo Moricz
64bcedeefc
fix(monitoring): bad success check on scrape
2024-07-24 16:21:59 +02:00
Gergo Moricz
d57dbbd0c6
fix: add jobId for scrape
2024-07-24 15:18:12 +02:00
Gergo Moricz
71072fef3b
fix(scrape-events): bad logic
2024-07-24 14:46:41 +02:00
Gergo Moricz
7cd9bf92e3
feat: scrape event logging to DB
2024-07-24 14:31:25 +02:00
Rafael Miller
5e728c1a4d
Update apps/api/src/scraper/WebScraper/crawler.ts
...
no need for regex
Co-authored-by: Gergő Móricz <mo.geryy@gmail.com>
2024-07-24 08:33:00 -03:00
Eric Ciarla
1b7a00624d
Delete old comp
2024-07-23 21:51:08 -04:00
Eric Ciarla
565bc09439
Basic react app
2024-07-23 21:48:11 -04:00
rafaelsideguide
6208ecdbc0
added logger
2024-07-23 17:30:46 -03:00
Eric Ciarla
a0d89169ed
init
2024-07-23 15:48:12 -04:00
Nicolas
f0b07b509b
Update index.ts
2024-07-23 15:15:56 -04:00
rafaelsideguide
a684bd3c5d
added regex for links in sitemap
2024-07-23 09:07:23 -03:00
Nicolas
30e706b43f
Update scrape.ts
2024-07-22 19:15:24 -04:00
Nicolas
8916fec66c
Update index.ts
2024-07-22 19:14:53 -04:00
Nicolas
575ddc9e6e
Update scrape.ts
2024-07-22 19:12:51 -04:00
Nicolas
e31a5007d5
Nick: speed improvements
2024-07-22 18:30:58 -04:00
Nicolas
b229fbebd8
Update scrape_log.ts
2024-07-19 12:53:26 -04:00
rafaelsideguide
5c02dbe20c
fix(isFile): added .tiff extension
2024-07-18 17:07:21 -03:00
Gergo Moricz
f0e95ce399
fix(WebCrawler): filter out file URLs when taking URLs from sitemap
2024-07-18 21:49:37 +02:00
Gergo Moricz
95c6c63b85
fix(fly): raise heap limit to 4G per process
2024-07-18 20:56:54 +02:00
Nicolas
5f14f4f788
Update blocklist.ts
2024-07-18 14:20:19 -04:00
Nicolas
6161b83890
Update scrape_log.ts
2024-07-18 14:17:08 -04:00
Nicolas
2dd7398aad
Update scrape_log.ts
2024-07-18 14:16:46 -04:00
Nicolas
f10f3f886b
Merge pull request #410 from mendableai/feat/fire-engine-chrome-cdp
...
Support chrome-cdp and restructure sitemap fire-engine support.
2024-07-18 13:52:08 -04:00
Nicolas
9a1a227797
Update crawl-cancel.ts
2024-07-18 13:49:51 -04:00
Nicolas
11768571ed
Update crawl-cancel.ts
2024-07-18 13:43:03 -04:00
Nicolas
ce804d3c20
Update crawl-cancel.ts
2024-07-18 13:40:24 -04:00
Nicolas
d2de01d342
Nick: fixes
2024-07-18 13:19:44 -04:00
Gergo Moricz
0b8047c7a0
fix(WebScraper): infinite regex leading to fly.io instance hangs
2024-07-18 19:13:43 +02:00
Nicolas
f11137352c
Merge branch 'main' into feat/fire-engine-chrome-cdp
2024-07-18 12:48:42 -04:00
Nicolas
6d1d46a987
Merge pull request #433 from mendableai/mog/js-sdk-tests-fix
...
fix(js-sdk): transform tests with ts-jest and configure node
2024-07-18 12:40:59 -04:00
Nicolas
01b5e8fc73
Merge pull request #429 from mendableai/mog/fix-job-stuck-2
...
Fix queue stuck bug via lock settings changes
2024-07-18 12:39:21 -04:00
Nicolas
b134ba92bc
Merge pull request #427 from mendableai/docs/update-docs
...
[Docs] Updating docs
2024-07-18 11:49:08 -04:00
rafaelsideguide
f13ef02a08
Update openapi.json
2024-07-18 10:34:03 -03:00
Gergo Moricz
a23b125471
fix(js-sdk): transform tests with ts-jest and configure node
2024-07-18 14:20:51 +02:00
Gergo Moricz
361269974e
fix(js-sdk): remove autogenerated index.d.ts from git and add to gitignore
2024-07-18 13:48:39 +02:00
Gergo Moricz
2e62de4f8b
fix(js-sdk): remove built files from repo and add to gitignore
2024-07-18 13:45:51 +02:00