Commit Graph

1320 Commits

Author SHA1 Message Date
Eric Ciarla
a62c0730c1 Delete package-lock.json 2024-07-24 15:00:19 -04:00
Eric Ciarla
4cb091ad05 Update .gitignore 2024-07-24 14:59:34 -04:00
Eric Ciarla
4596d0b2e6 Add ReadMe and LICENSE 2024-07-24 14:56:53 -04:00
Eric Ciarla
9654721bf2 Vite commit 2024-07-24 14:27:50 -04:00
rafaelsideguide
cc98f83fda added failed and completed log events 2024-07-24 15:25:36 -03:00
Jakob Stadlhuber
2dc7be3869 Remove liveness and readiness probes from worker.yaml
This commit removes the liveness and readiness probes configuration from the Kubernetes worker manifest. Additionally, a Service definition for the worker application has been removed. These changes might be necessary to update the deployment strategy or simplify the configuration.
2024-07-24 19:38:54 +02:00
Jakob Stadlhuber
d68f349109 Update Kubernetes YAMLs and add worker service
Refactored container configurations in worker, api, and playwright-service YAMLs to streamline syntax and add missing fields. Added a service definition for the worker component and included a new environment variable in the configmap for rate-limiting. These changes enhance configuration clarity and ensure proper resource definitions.
2024-07-24 19:31:37 +02:00
Jakob Stadlhuber
f26bda2477 Update Docker build paths in Kubernetes setup README
Corrected relative paths for Docker build commands to ensure the appropriate directories are targeted. This fix is crucial for successful image builds and deployment consistency in the Kubernetes cluster setup.
2024-07-24 19:06:19 +02:00
Jakob Stadlhuber
895e80caa4 Add liveness and readiness probes to Kubernetes configs
Introduced liveness and readiness probes for the Playwright service, API, and worker components. This ensures that Kubernetes can better manage the health and availability of these services by periodically checking their endpoints. This enhancement will improve the robustness and reliability of the deployed applications.
2024-07-24 19:00:23 +02:00
Jakob Stadlhuber
be9e7f9edf Update Kubernetes configs for playwright-service, api, and worker
Added new ConfigMap for playwright-service and adjusted existing references.
Applied imagePullPolicy: Always to ensure all images are updated promptly.
Updated README to include --no-cache for Docker build instructions.
2024-07-24 18:54:16 +02:00
Gergo Moricz
60c74357df feat(ScrapeEvents): log queue events 2024-07-24 18:44:14 +02:00
Jakob Stadlhuber
497aa5d25e Update Kubernetes configs for playwright-service, api, and worker
Added new ConfigMap for playwright-service and adjusted existing references.
Applied imagePullPolicy: Always to ensure all images are updated promptly.
Updated README to include --no-cache for Docker build instructions.
2024-07-24 17:55:45 +02:00
rafaelsideguide
4eca6bd301 fix/check-for-auth-on-scrape-log 2024-07-24 12:54:14 -03:00
Nicolas
4ead89f983 Merge pull request #453 from mendableai/nsc/notion-fix
Notion Website Fixes
2024-07-24 11:40:19 -04:00
Nicolas
3a1b8a9797 Update website_params.ts 2024-07-24 11:04:47 -04:00
Nicolas
8b48ec8d30 Update website_params.ts 2024-07-24 11:02:20 -04:00
Gergo Moricz
4d35ad073c feat(monitoring/scrape): include url, worker, response_size 2024-07-24 16:43:39 +02:00
Gergo Moricz
64bcedeefc fix(monitoring): bad success check on scrape 2024-07-24 16:21:59 +02:00
Gergo Moricz
d57dbbd0c6 fix: add jobId for scrape 2024-07-24 15:18:12 +02:00
Gergo Moricz
71072fef3b fix(scrape-events): bad logic 2024-07-24 14:46:41 +02:00
Gergo Moricz
7cd9bf92e3 feat: scrape event logging to DB 2024-07-24 14:31:25 +02:00
Rafael Miller
5e728c1a4d Update apps/api/src/scraper/WebScraper/crawler.ts
no need for regex

Co-authored-by: Gergő Móricz <mo.geryy@gmail.com>
2024-07-24 08:33:00 -03:00
Eric Ciarla
1b7a00624d Delete old comp 2024-07-23 21:51:08 -04:00
Eric Ciarla
565bc09439 Basic react app 2024-07-23 21:48:11 -04:00
rafaelsideguide
6208ecdbc0 added logger 2024-07-23 17:30:46 -03:00
Eric Ciarla
a0d89169ed init 2024-07-23 15:48:12 -04:00
Nicolas
f0b07b509b Update index.ts 2024-07-23 15:15:56 -04:00
rafaelsideguide
a684bd3c5d added regex for links in sitemap 2024-07-23 09:07:23 -03:00
Nicolas
252bc09ee2 Merge pull request #447 from mendableai/nsc/speed-improvements
/scrape should now be 600ms-900ms faster
2024-07-22 19:18:24 -04:00
Nicolas
ac692ef09c Update CONTRIBUTING.md 2024-07-22 19:17:53 -04:00
Nicolas
30e706b43f Update scrape.ts 2024-07-22 19:15:24 -04:00
Nicolas
8916fec66c Update index.ts 2024-07-22 19:14:53 -04:00
Nicolas
575ddc9e6e Update scrape.ts 2024-07-22 19:12:51 -04:00
Nicolas
e31a5007d5 Nick: speed improvements 2024-07-22 18:30:58 -04:00
Nicolas
1bc36e1a56 Update fly-direct.yml 2024-07-22 14:12:55 -04:00
Nicolas
b229fbebd8 Update scrape_log.ts 2024-07-19 12:53:26 -04:00
rafaelsideguide
5c02dbe20c fix(isFile): added .tiff extension 2024-07-18 17:07:21 -03:00
Gergo Moricz
f0e95ce399 fix(WebCrawler): filter out file URLs when taking URLs from sitemap 2024-07-18 21:49:37 +02:00
Gergo Moricz
95c6c63b85 fix(fly): raise heap limit to 4G per process 2024-07-18 20:56:54 +02:00
Nicolas
5f14f4f788 Update blocklist.ts 2024-07-18 14:20:19 -04:00
Nicolas
6161b83890 Update scrape_log.ts 2024-07-18 14:17:08 -04:00
Nicolas
c402c85346 Merge branch 'main' of https://github.com/mendableai/firecrawl 2024-07-18 14:16:51 -04:00
Nicolas
2dd7398aad Update scrape_log.ts 2024-07-18 14:16:46 -04:00
Gergo Moricz
791e6b2047 fix action 2024-07-18 19:59:33 +02:00
Nicolas
f10f3f886b Merge pull request #410 from mendableai/feat/fire-engine-chrome-cdp
Support chrome-cdp and restructure sitemap fire-engine support.
2024-07-18 13:52:08 -04:00
Nicolas
9a1a227797 Update crawl-cancel.ts 2024-07-18 13:49:51 -04:00
Nicolas
11768571ed Update crawl-cancel.ts 2024-07-18 13:43:03 -04:00
Nicolas
ce804d3c20 Update crawl-cancel.ts 2024-07-18 13:40:24 -04:00
Nicolas
d338b05446 Merge pull request #436 from mendableai/mog/fix-infinite-regex
fix(WebScraper): infinite regex leading to fly.io instance hangs
2024-07-18 13:32:44 -04:00
Nicolas
d2de01d342 Nick: fixes 2024-07-18 13:19:44 -04:00