Gergő Móricz
623b547292
fix(fly.toml): scale up memory limit
2024-07-24 23:39:00 +02:00
Nicolas
15890772be
Scale bump
2024-07-24 16:56:19 -04:00
Eric Ciarla
a4bccbe3bb
Firecrawl UI Template
Firecrawl UI template
2024-07-24 15:05:55 -04:00
Eric Ciarla
4596d0b2e6
Add ReadMe and LICENSE
2024-07-24 14:56:53 -04:00
Eric Ciarla
9654721bf2
Vite commit
2024-07-24 14:27:50 -04:00
rafaelsideguide
cc98f83fda
added failed and completed log events
2024-07-24 15:25:36 -03:00
Jakob Stadlhuber
be9e7f9edf
Update Kubernetes configs for playwright-service, api, and worker
Added new ConfigMap for playwright-service and adjusted existing references.
Applied imagePullPolicy: Always to ensure all images are updated promptly.
Updated README to include --no-cache for Docker build instructions.
2024-07-24 18:54:16 +02:00
Gergo Moricz
60c74357df
feat(ScrapeEvents): log queue events
2024-07-24 18:44:14 +02:00
rafaelsideguide
4eca6bd301
fix/check-for-auth-on-scrape-log
2024-07-24 12:54:14 -03:00
Nicolas
3a1b8a9797
Update website_params.ts
2024-07-24 11:04:47 -04:00
Nicolas
8b48ec8d30
Update website_params.ts
2024-07-24 11:02:20 -04:00
Gergo Moricz
4d35ad073c
feat(monitoring/scrape): include url, worker, response_size
2024-07-24 16:43:39 +02:00
Gergo Moricz
64bcedeefc
fix(monitoring): bad success check on scrape
2024-07-24 16:21:59 +02:00
Gergo Moricz
d57dbbd0c6
fix: add jobId for scrape
2024-07-24 15:18:12 +02:00
Gergo Moricz
71072fef3b
fix(scrape-events): bad logic
2024-07-24 14:46:41 +02:00
Gergo Moricz
7cd9bf92e3
feat: scrape event logging to DB
2024-07-24 14:31:25 +02:00
Rafael Miller
5e728c1a4d
Update apps/api/src/scraper/WebScraper/crawler.ts
no need for regex
Co-authored-by: Gergő Móricz <mo.geryy@gmail.com>
2024-07-24 08:33:00 -03:00
Eric Ciarla
1b7a00624d
Delete old comp
2024-07-23 21:51:08 -04:00
Eric Ciarla
565bc09439
Basic react app
2024-07-23 21:48:11 -04:00
rafaelsideguide
6208ecdbc0
added logger
2024-07-23 17:30:46 -03:00
Eric Ciarla
a0d89169ed
init
2024-07-23 15:48:12 -04:00
Nicolas
f0b07b509b
Update index.ts
2024-07-23 15:15:56 -04:00
rafaelsideguide
a684bd3c5d
added regex for links in sitemap
2024-07-23 09:07:23 -03:00
Nicolas
30e706b43f
Update scrape.ts
2024-07-22 19:15:24 -04:00
Nicolas
8916fec66c
Update index.ts
2024-07-22 19:14:53 -04:00
Nicolas
575ddc9e6e
Update scrape.ts
2024-07-22 19:12:51 -04:00
Nicolas
e31a5007d5
Nick: speed improvements
2024-07-22 18:30:58 -04:00
Nicolas
b229fbebd8
Update scrape_log.ts
2024-07-19 12:53:26 -04:00
rafaelsideguide
5c02dbe20c
fix(isFile): added .tiff extension
2024-07-18 17:07:21 -03:00
Gergo Moricz
f0e95ce399
fix(WebCrawler): filter out file URLs when taking URLs from sitemap
2024-07-18 21:49:37 +02:00
Gergo Moricz
95c6c63b85
fix(fly): raise heap limit to 4G per process
2024-07-18 20:56:54 +02:00
Nicolas
5f14f4f788
Update blocklist.ts
2024-07-18 14:20:19 -04:00
Nicolas
6161b83890
Update scrape_log.ts
2024-07-18 14:17:08 -04:00
Nicolas
2dd7398aad
Update scrape_log.ts
2024-07-18 14:16:46 -04:00
Nicolas
f10f3f886b
Merge pull request #410 from mendableai/feat/fire-engine-chrome-cdp
Support chrome-cdp and restructure sitemap fire-engine support.
2024-07-18 13:52:08 -04:00
Nicolas
9a1a227797
Update crawl-cancel.ts
2024-07-18 13:49:51 -04:00
Nicolas
11768571ed
Update crawl-cancel.ts
2024-07-18 13:43:03 -04:00
Nicolas
ce804d3c20
Update crawl-cancel.ts
2024-07-18 13:40:24 -04:00
Nicolas
d2de01d342
Nick: fixes
2024-07-18 13:19:44 -04:00
Gergo Moricz
0b8047c7a0
fix(WebScraper): infinite regex leading to fly.io instance hangs
2024-07-18 19:13:43 +02:00
Nicolas
f11137352c
Merge branch 'main' into feat/fire-engine-chrome-cdp
2024-07-18 12:48:42 -04:00
Nicolas
6d1d46a987
Merge pull request #433 from mendableai/mog/js-sdk-tests-fix
fix(js-sdk): transform tests with ts-jest and configure node
2024-07-18 12:40:59 -04:00
Nicolas
01b5e8fc73
Merge pull request #429 from mendableai/mog/fix-job-stuck-2
Fix queue stuck bug via lock settings changes
2024-07-18 12:39:21 -04:00
Nicolas
b134ba92bc
Merge pull request #427 from mendableai/docs/update-docs
[Docs] Updating docs
2024-07-18 11:49:08 -04:00
rafaelsideguide
f13ef02a08
Update openapi.json
2024-07-18 10:34:03 -03:00
Gergo Moricz
a23b125471
fix(js-sdk): transform tests with ts-jest and configure node
2024-07-18 14:20:51 +02:00
Nicolas
2fab2d8d29
Update scrape.ts
2024-07-17 20:44:34 -04:00
Nicolas
6609c1b6e5
Update .env.local
2024-07-17 16:22:27 -04:00
Nicolas
17a1f9b55f
Update .env.example
2024-07-17 16:22:04 -04:00
rafaelsideguide
eda616d728
Merge remote-tracking branch 'origin/main' into docs/update-docs
2024-07-17 16:44:51 -03:00
rafaelsideguide
2b4ce12097
Update openapi.json
2024-07-17 16:43:22 -03:00
Gergo Moricz
8160c311c0
fix queue stuck bug via lock setting changes
2024-07-17 21:31:25 +02:00
Caleb Peffer
8d5ebc9b9f
Merge pull request #423 from mendableai/cjp/linksOnPage
Caleb: Return a list of links on a page by default
2024-07-17 12:36:07 -06:00
Caleb Peffer
5b24d26c84
Caleb; fixed test
2024-07-17 11:33:12 -07:00
Caleb Peffer
c5d1e7260d
Caleb: made changes per Rafaels requests
2024-07-17 11:29:05 -07:00
rafaelsideguide
205cd63c2f
Update openapi.json
2024-07-17 15:07:06 -03:00
Rafael Miller
f020048a46
Merge pull request #420 from mendableai/bugfix/empty-tags
Small fix for empty pageOptions
2024-07-17 10:10:24 -03:00
Caleb Peffer
da3c6bca37
Caleb: added a simple test
2024-07-16 21:23:22 -07:00
Caleb Peffer
0b3c0ede49
Added tests per @nicks request
2024-07-16 21:15:59 -07:00
Caleb Peffer
98c788ca7a
Caleb: added a test to ensure links on page exists and isn't zero on mendable
2024-07-16 21:13:52 -07:00
Nicolas
d7f185428f
Merge pull request #424 from mendableai/nsc/seperate-rate-limit
Redis Health Checks
2024-07-16 22:53:28 -04:00
Nicolas
3c3412e893
Update rate-limiter.test.ts
2024-07-16 22:45:12 -04:00
Nicolas
ffc3b7c5fb
Update index.ts
2024-07-16 22:42:40 -04:00
Nicolas
c9073a747c
Nick:
2024-07-16 22:41:13 -04:00
Caleb Peffer
d39d3be649
Caleb: now extracting and returning a list of all links on the page for a customer
2024-07-16 18:38:03 -07:00
rafaelsideguide
dba1fb2dc8
Update removeUnwantedElements.ts
2024-07-16 18:22:56 -03:00
Rafael Miller
db0545014f
Merge pull request #391 from jhoseph88/feat/issue-387
[Feat] Pass along current, total, current_step, and current_url in js sdk
2024-07-16 15:56:42 -03:00
Nicolas
92202de12b
Update rate-limiter.ts
2024-07-16 10:09:49 -04:00
Nicolas
4ef47f7765
Update models.ts
2024-07-15 22:52:17 -04:00
rentianyue-jk
1b7ae5457f
support custom models
2024-07-16 10:22:54 +08:00
Thomas Kosmas
5c65ec58e5
Support chrome-cdp and restructure sitemap fire-engine support.
2024-07-15 18:40:43 +03:00
Nicolas
949791049f
Nick:
2024-07-12 23:20:26 -04:00
Nicolas
d0c8d3ecde
Merge branch 'main' into nsc/sitemap-fix-fire-engine
2024-07-12 22:15:06 -04:00
Nicolas
a3b1703b68
Update fireEngine.ts
2024-07-12 22:15:00 -04:00
Nicolas
09bc2c7a9c
Merge pull request #394 from mendableai/nsc/small-fe-print
Log Fire-engine page errors
2024-07-12 22:14:04 -04:00
Nicolas
e098e88ea7
Nick:
2024-07-12 22:02:08 -04:00
Nicolas
bfc7f5882e
Update index.ts
2024-07-12 19:57:12 -04:00
Nicolas
436e8922a7
Nick: doing on the ci instead
2024-07-12 19:49:38 -04:00
Nicolas
fc3328f3d1
Update index.ts
2024-07-12 19:12:56 -04:00
Nicolas
fd18f2269b
Nick: slack alerts
2024-07-12 19:07:59 -04:00
rafaelsideguide
f453bcf17c
bugfix docker self hosting
2024-07-12 16:51:20 -03:00
Nicolas
0ddaac6ae0
Nick: fixed the other instances as well
2024-07-12 15:39:10 -04:00
Nicolas
5da03a8fbd
Update fireEngine.ts
2024-07-12 14:59:49 -04:00
Kuniaki Shimizu
bd986a453c
fix USE_DB_AUTHENTICATION checks
2024-07-13 03:50:46 +09:00
Nicolas
b5b75086c1
Update index.ts
2024-07-12 10:44:14 -04:00
Gergo Moricz
0d3e09e798
fix: try-catch job removal
2024-07-12 16:35:50 +02:00
Gergő Móricz
69d724714f
Merge branch 'main' into mog/job-stuck-fix
2024-07-12 16:33:34 +02:00
Nicolas
c3eecf7b9f
Update index.ts
2024-07-12 10:22:06 -04:00
Gergo Moricz
10957b748b
fix(bull): requeue jobs after restart
2024-07-12 13:55:53 +02:00
Nicolas
961b27811d
Merge pull request #386 from mendableai/feat/fire-engine-fallback-for-sitemap
[Feat] Added fire-engine fallback for getting sitemaps
2024-07-11 20:38:01 -04:00
Nicolas
84de63dbeb
Merge pull request #375 from StefanTerdell/self-host-qol
Self-hosting quality of life fixes
2024-07-11 20:37:39 -04:00
Nicolas
30c1118713
Merge pull request #326 from mendableai/feat/save-docs-on-supabase
[Feat] Added implementation for saving docs on supabase
2024-07-11 20:27:41 -04:00
jhoseph88
68828a5b5c
Pass along current, total, current_step, and current_url in js sdk
2024-07-11 19:37:09 -04:00
Gergo Moricz
7e3a368684
fix: unpause globally
2024-07-12 00:05:35 +02:00
Gergo Moricz
ee1d41406e
feat: unpause by http request
2024-07-11 23:56:36 +02:00
Gergo Moricz
f64a2d8668
fix: rename fly tomls to original
2024-07-11 23:21:02 +02:00
Gergo Moricz
bd84290b9e
fix: reenable hyperdx
2024-07-11 23:20:51 +02:00
Gergo Moricz
09bca05b20
feat: fix iteration 3 (actually works)
2024-07-11 23:14:15 +02:00
Gergo Moricz
9cd7d79b64
feat: avoid double SIGINT crashing
2024-07-11 20:35:15 +02:00
Gergo Moricz
eaa8db4b19
fix(fly): raise kill timeout for graceful shutdown
2024-07-11 20:09:06 +02:00