Commit Graph

788 Commits

Author SHA1 Message Date
Gergő Móricz
fc08ff450d search port 2024-08-15 20:10:59 +02:00
Nicolas
86326f34e9 Update single_url.test.ts 2024-08-15 13:48:42 -04:00
Gergő Móricz
129a882bcc fix(scrape): give scrapes their real job id 2024-08-15 19:29:47 +02:00
Gergő Móricz
965a5817d1 fix(queue-worker): log jobs correctly 2024-08-15 19:27:15 +02:00
Gergő Móricz
dad9d353d9 use thomas's url validation 2024-08-15 19:19:02 +02:00
Gergő Móricz
e3279274f1 fix: make playground crawl work 2024-08-15 19:14:32 +02:00
Gergő Móricz
c5597bc722 fix: robots.txt loading 2024-08-15 19:11:07 +02:00
Gergő Móricz
29f0d9ec94 propagate priority to fire-engine 2024-08-15 19:04:46 +02:00
Gergő Móricz
b79d3d1754 fix 2024-08-15 19:02:05 +02:00
Gergő Móricz
57730f6a35 priority changes 2024-08-15 18:58:07 +02:00
Gergő Móricz
846610681b fix: fix posthog, add dummy crawl DB items 2024-08-15 18:55:18 +02:00
Nicolas
6e1074cdd1 Update website_params.ts 2024-08-14 17:39:54 -04:00
Thomas Kosmas
6410e1a81d Update params 2024-08-15 00:10:14 +03:00
Gergő Móricz
8a5cad72f6 fix(queue-worker): variable name collision 2024-08-14 22:02:05 +02:00
Gergő Móricz
b8ec40dd72 fix(crawl): submit sitemapped jobs in bulk 2024-08-14 20:34:19 +02:00
Gergő Móricz
2ca1017fc3 fix(crawl): make request 0 of crawl jobs higher priority 2024-08-14 19:34:18 +02:00
Gergő Móricz
cfad067a63 fix(fly): change proxy limits 2024-08-14 18:52:40 +02:00
Gergő Móricz
a6c81f9d62 fix: return all data when calling webhook 2024-08-14 17:53:47 +02:00
Nicolas
e2472b9b0d Merge remote-tracking branch 'origin/v1/mockup-controllers' into v1-webscraper 2024-08-13 16:34:57 -04:00
Gergo Moricz
2e5e480cc2 fix(crawl): call webhooks 2024-08-13 22:10:17 +02:00
Gergo Moricz
a33596de3c fix(log_job): add crawl_id 2024-08-13 22:03:46 +02:00
Gergo Moricz
9252940b52 fix(crawl-status): sort data 2024-08-13 21:55:13 +02:00
Gergo Moricz
8dbac0268c feat: offload crawl results to the DB 2024-08-13 21:40:59 +02:00
Gergo Moricz
4bbc9db1df fix: prioritize scrape jobs over crawl jobs 2024-08-13 21:31:34 +02:00
Gergo Moricz
5f2af37880 fix(scrape): remove scrape job from queue after the job is done 2024-08-13 21:26:41 +02:00
Gergo Moricz
2413e33359 fix(queue-worker): remove console.log 2024-08-13 21:07:36 +02:00
Gergo Moricz
d7549d4dc5 feat: remove webScraperQueue 2024-08-13 21:03:24 +02:00
Gergő Móricz
4a2c37dcf5 Merge branch 'main' into feat/queue-scrapes 2024-08-13 20:53:49 +02:00
Gergo Moricz
86e136beca feat: crawl to scrape conversion 2024-08-13 20:51:43 +02:00
rafaelsideguide
a4be95ac27 fixed tests 2024-08-13 13:42:26 -03:00
Nicolas
09ca165d2e Merge pull request #531 from kevinswiber/fix/respect-docker-env-file-comments 2024-08-12 16:54:56 -04:00
Self-host fix: Moving comments of .env.example values from end-of-line to above-line.
Nicolas
d06f40810c Merge pull request #515 from wahpiangle/main 2024-08-12 16:51:59 -04:00
Update redis urls in example .env
Nicolas
6810338271 Update search.ts 2024-08-12 16:51:43 -04:00
Thomas Kosmas
98be29c963 Update parameters for platform.openai.com 2024-08-12 22:49:28 +03:00
Kevin Swiber
33aa5cf0de Moving comments of .env.example values from end-of-line to above-line. Self-host docs suggest using .env.example as a base. However, Docker doesn't respect end-of-line comments. It sets the comment as the actual value of the variable. This fix prevents that. 2024-08-12 12:24:46 -07:00
Nicolas
74a5125185 Nick: removed redlock 2024-08-12 15:07:30 -04:00
Nicolas
0bd1a820ee Update auth.ts 2024-08-12 13:42:09 -04:00
Nicolas
25a899eae3 Nick: redlock cache in auth 2024-08-12 13:37:47 -04:00
Rafael Miller
36e4b2cf49 Update .env.example 2024-08-12 10:37:00 -03:00
Quan Ming
a96ad4b0e2 Update redis url to use comment 2024-08-10 12:33:26 +08:00
Nicolas
e28c415cf4 Nick: 2024-08-09 14:07:46 -04:00
rafaelsideguide
0591000b64 bugfix includes excludes 2024-08-09 14:30:41 -03:00
Quan Ming
0221872a70 Update redis urls in example .env 2024-08-10 00:40:11 +08:00
Nicolas
f1f5605010 Update website_params.ts 2024-08-08 12:31:58 -04:00
Nicolas
b0abad07da Merge pull request #496 from tak-s/improve-logging-level 2024-08-07 22:01:12 -04:00
Improve logs
Gergo Moricz
920b7f2f44 fix(runWebScraper): don't filter empty docs 2024-08-07 21:00:22 +02:00
Gergo Moricz
55ec96c23f fix(queue-worker): bad job lock extension time 2024-08-07 20:24:16 +02:00
Gergo Moricz
ab7a35c581 fix(queue-worker): log lock extensions 2024-08-07 19:49:48 +02:00
Gergo Moricz
a1c2ee5aa9 fix: always complete job, no try 2024-08-07 19:39:09 +02:00
Gergo Moricz
191dfbd9ca fix: move to completed in one place 2024-08-07 18:49:58 +02:00
Nicolas
457c082ba1 Nick: fixed tests 2024-08-07 11:08:53 -04:00
Gergő Móricz
5fc7fcb77c Merge branch 'main' into feat/queue-scrapes 2024-08-07 16:35:44 +02:00
Gergo Moricz
fe9fdb578b revert bad hotfixes 2024-08-07 16:34:25 +02:00
Gergo Moricz
b7c01dcb9b fix(webScraperQueue): reduce retries to 2 2024-08-07 16:31:50 +02:00
Gergo Moricz
cdf7bad5b4 fix(runWebScraper): don't move to completed 2024-08-07 15:20:56 +02:00
Gergo Moricz
9df8719efa fix(queue-worker): raise queue log level to info 2024-08-07 14:56:04 +02:00
Gergo Moricz
7bb922071c fix(queue-worker): manually renew lock (testing) 2024-08-07 14:35:20 +02:00
Gergo Moricz
8216266d16 fix(scrape_log): display error properly 2024-08-07 14:19:20 +02:00
Gergo Moricz
2e2e80d679 fix(scrape-events): updateScrapeResult fix 2024-08-07 14:17:50 +02:00
Gergo Moricz
b5ec47fd96 fix(runWebScraper): don't fetch next job 2024-08-07 13:53:04 +02:00
rafaelsideguide
6cdf4c68ec wip: map, crawl, scrape mockups 2024-08-06 15:24:45 -03:00
Nicolas
3321ca9398 Merge pull request #504 from mendableai/feat/fullpage-screenshot 2024-08-06 13:52:29 -04:00
[Feat] Added fullpagescreenshot capabilities
Gergo Moricz
b60ee30dba fix(single_url): accept 500 2024-08-06 18:00:56 +02:00
Gergo Moricz
06751a8e21 fix(crawl-status): missing partial data after cancel 2024-08-06 17:31:20 +02:00
Gergo Moricz
810b98ec38 fix(scrape): fix timeout error code 2024-08-06 17:30:01 +02:00
Gergo Moricz
3ae95a2740 fix(scrape): consider timeout property 2024-08-06 17:25:58 +02:00
Gergo Moricz
8566ece700 fix(scrape): pass extractorOptions 2024-08-06 17:15:19 +02:00
Gergo Moricz
8e0aa69603 fix(crawl-status): partial_data 2024-08-06 17:06:21 +02:00
Gergo Moricz
1ab119c874 fix(scrape): don't double-bill for scrape 2024-08-06 16:57:23 +02:00
Gergo Moricz
7c5cda7b45 fix(queue-worker): concurrency 2024-08-06 16:57:00 +02:00
Gergo Moricz
d7d63790e5 fix(crawl-status): isCancelled should be status failed 2024-08-06 16:35:55 +02:00
Gergo Moricz
03c84a9372 cleanup and fix cancelling 2024-08-06 16:26:46 +02:00
rafaelsideguide
4d24a99d50 fix params 2024-08-06 09:34:43 -03:00
rafaelsideguide
3edc3a3d15 added fullpagescreenshot capabilities, wip on fire-engine side 2024-08-05 18:17:37 -03:00
rafaelsideguide
f32e8de156 fixes the empty excludes.filter undefined bug 2024-08-05 18:13:31 -03:00
tak-s
af9bc5c8bb Suppressed repetitive logs 2024-08-04 15:09:36 +09:00
Nicolas
1742e4ceae Nick: 2024-08-02 19:25:15 -04:00
Nicolas
39aecd974b Update redis-health.ts 2024-08-02 17:43:45 -04:00
Nicolas
b448e3c3ad Update website_params.ts 2024-08-02 14:26:35 -04:00
rafaelsideguide
4051630632 Update sitemap.ts 2024-08-02 11:32:48 -03:00
rafaelsideguide
8568b61015 bugfix for sitemaps 2024-08-02 11:03:01 -03:00
Nicolas
af68b7a785 Merge pull request #475 from mendableai/bugfix/issue-466 2024-08-01 22:05:26 -04:00
[Bug] pdfs and logging pdf events, also added trycatchs for docx
rafaelsideguide
f48ff36b32 added .inc files and forced lower case comparison 2024-07-31 09:28:43 -03:00
Nicolas
ad6f6eff4b Update fireEngine.ts 2024-07-30 19:15:54 -04:00
Nicolas
f9827b2151 Update credit_billing.ts 2024-07-30 19:13:17 -04:00
Nicolas
6d99dedd3c Nick: fixed tests 2024-07-30 19:11:01 -04:00
Nicolas
a28ecc1f61 Nick: caching 2024-07-30 18:59:35 -04:00
Nicolas
52198f2991 Nick: 2024-07-30 16:15:08 -04:00
Nicolas
f43d5e7895 Nick: scrape queue 2024-07-30 14:44:13 -04:00
Nicolas
7e002a8b06 Nick: bull mq 2024-07-30 13:27:23 -04:00
Nicolas
46bcbd931f Merge branch 'main' into feat/queue-scrapes 2024-07-30 12:44:07 -04:00
Nicolas
fd2452ec9c Update scrape.ts 2024-07-30 12:42:12 -04:00
rafaelsideguide
8f5174ffc7 Update auth.ts 2024-07-30 10:37:33 -03:00
rafaelsideguide
d25d7e7244 special case: developer.apple.com 2024-07-30 10:13:09 -03:00
Nicolas
5e8ffcf505 Update website_params.ts 2024-07-29 20:43:47 -04:00
Nicolas
7b813883ef Nick: first layer 2024-07-29 20:31:51 -04:00
Nicolas
e99c2568f4 Update auth.ts 2024-07-29 18:44:18 -04:00
Nicolas
968a2dc753 Nick: 2024-07-29 18:37:09 -04:00
Nicolas
04942bb9de Nick: 2024-07-29 18:31:43 -04:00
Nicolas
267d4681bf Merge branch 'main' of https://github.com/mendableai/firecrawl 2024-07-29 17:21:15 -04:00