Commit Graph

518 Commits

Author SHA1 Message Date
Gergo Moricz
7e3a368684 fix: unpause globally 2024-07-12 00:05:35 +02:00
Gergo Moricz
ee1d41406e feat: unpause by http request 2024-07-11 23:56:36 +02:00
Gergo Moricz
f64a2d8668 fix: rename fly tomls to original 2024-07-11 23:21:02 +02:00
Gergo Moricz
bd84290b9e fix: reenable hyperdx 2024-07-11 23:20:51 +02:00
Gergo Moricz
09bca05b20 feat: fix iteration 3 (actually works) 2024-07-11 23:14:15 +02:00
Gergo Moricz
9cd7d79b64 feat: avoid double SIGINT crashing 2024-07-11 20:35:15 +02:00
Gergo Moricz
eaa8db4b19 fix(fly): raise kill timeout for graceful shutdown 2024-07-11 20:09:06 +02:00
Gergo Moricz
bffb9f8fd0 feat: stuck job restoration iteration 2 2024-07-11 20:08:21 +02:00
rafaelsideguide
86d0e88a91 removed hyperdx (they also have graceful shutdown) and tried to change the process for running on server. It didn't work. 2024-07-10 18:29:55 -03:00
Gergo Moricz
1a07e9d23b feat: pick up and commit interrupted jobs from/to DB 2024-07-09 15:57:38 +02:00
Gergo Moricz
77aa46588f feat: graceful exit handler 2024-07-09 14:29:32 +02:00
Nicolas
914897c9d2 Merge branch 'main' into feat/save-docs-on-supabase 2024-07-05 12:27:22 -03:00
Nicolas
32849b017f Nick: 2024-07-03 20:18:11 -03:00
Nicolas
066d92f643 Update single_url.ts 2024-07-03 18:38:17 -03:00
Nicolas
f5b2fbd7e8 Nick: revision 2024-07-03 18:06:53 -03:00
Nicolas
2d30cc6117 Nick: comments 2024-07-03 18:01:54 -03:00
Nicolas
90c54c32fd Nick: refactor 2024-07-03 18:01:17 -03:00
Nicolas
90cf799a3c Update single_url.ts 2024-07-03 17:56:21 -03:00
Nicolas
b36406e465 Nick: log scrapers 2024-07-03 17:28:53 -03:00
Eric Ciarla
2d0d5ac392 Update for llm-extraction-from-raw-html 2024-07-02 14:05:42 -04:00
rafaelsideguide
0175152577 Fixed PDF match custom scraping 2024-07-02 11:25:17 -03:00
    Now it's working for both `https://getgc.ai/privacy` and `https://prairie.cards/products/wood-designs` usecases.
rafaelsideguide
96de948d6b Update index.test.ts 2024-07-02 11:04:09 -03:00
rafaelsideguide
7b7154ba1e bugfixed pageStatusCode 2024-07-02 10:51:35 -03:00
Rafael Miller
f0f449fe51 Merge pull request #336 from snippet/allow-external-content-links 2024-07-02 09:45:21 -03:00
    [Proposal] new feature allowExternalContentLinks
rafaelsideguide
db4a743365 Added e2e test 2024-07-02 09:44:08 -03:00
Nicolas
42cd58a679 Merge pull request #332 from mendableai/feat/rawHtmlExtraction 2024-07-01 18:23:26 -03:00
    Adds pageOptions.includeRawHtml and new extraction mode "llm-extraction-from-raw-html"
Nicolas
c4f423981f Update pnpm-lock.yaml 2024-07-01 18:22:22 -03:00
rafaelsideguide
16aac7f8c5 Update single_url.ts 2024-07-01 18:21:15 -03:00
Nicolas
6d0c7a9ccd Merge pull request #323 from mendableai/tests/crawl-limit-unit-tests 2024-07-01 17:56:04 -03:00
    [Tests] Added crawl limit unit test
rafaelsideguide
4d6e25619b minor spacing and comment stuff 2024-07-01 16:05:34 -03:00
Eric Ciarla
e1af815f8c Update scrape.ts 2024-07-01 08:48:21 -04:00
Eric Ciarla
7ae195bacc Update index.test.ts 2024-06-29 10:13:12 -04:00
Eric Ciarla
837b446390 Update index.test.ts 2024-06-29 08:48:42 -04:00
Eric Ciarla
fe6e3aeadc Update index.test.ts 2024-06-29 08:44:21 -04:00
Eric Ciarla
6c9f0dfc91 Add tests 2024-06-29 08:32:20 -04:00
Jeff Pereira
a5fb45988c new feature allowExternalContentLinks 2024-06-28 17:23:40 -07:00
Eric Ciarla
87b54488d3 update to includeRawHtml 2024-06-28 17:07:47 -04:00
Eric Ciarla
70fcf2ce03 init 2024-06-28 16:39:09 -04:00
Nicolas
9bf74bc774 Update single_url.ts 2024-06-28 15:51:18 -03:00
Nicolas
7e17498bcf Update single_url.ts 2024-06-28 15:45:16 -03:00
rafaelsideguide
d66e1f7846 looking good 2024-06-27 16:00:45 -03:00
Nicolas
9e7298945c Update openapi.json 2024-06-26 21:25:38 -03:00
Nicolas
1ec0bf8adf Update openapi.json 2024-06-26 21:22:46 -03:00
Nicolas
042f81ddf2 Update removeUnwantedElements.test.ts 2024-06-26 21:20:11 -03:00
Nicolas
388ce3cbce Nick: small changes 2024-06-26 21:15:42 -03:00
Nicolas
1d4907acc9 Nick: 2024-06-26 21:02:58 -03:00
rafaelsideguide
c40da77be0 Added implementation for saving docs on supabase 2024-06-26 18:23:28 -03:00
    - TODO: remove the comments on `log_job.ts` before deploying to prod
Nicolas
3b92fb8433 Merge pull request #322 from mendableai/tests/metadata 2024-06-26 12:09:18 -03:00
    [Test] Added E2E tests for checking metadata values
rafaelsideguide
67d7650cf3 Added to e2e_noAuth 2024-06-26 12:07:55 -03:00
rafaelsideguide
009df6c930 Added crawl limit unit test 2024-06-26 09:54:25 -03:00
    I think this test is over relying on mocks but I have no idea on how to fix this without changing the code arch structure