Gergo Moricz
|
bd84290b9e
|
fix: reenable hyperdx
|
2024-07-11 23:20:51 +02:00 |
|
Gergo Moricz
|
09bca05b20
|
feat: fix iteration 3 (actually works)
|
2024-07-11 23:14:15 +02:00 |
|
Gergo Moricz
|
9cd7d79b64
|
feat: avoid double SIGINT crashing
|
2024-07-11 20:35:15 +02:00 |
|
Gergo Moricz
|
eaa8db4b19
|
fix(fly): raise kill timeout for graceful shutdown
|
2024-07-11 20:09:06 +02:00 |
|
Gergo Moricz
|
bffb9f8fd0
|
feat: stuck job restoration iteration 2
|
2024-07-11 20:08:21 +02:00 |
|
rafaelsideguide
|
86d0e88a91
|
Removed hyperdx (they also have graceful shutdown) and tried to change the process for running on the server. It didn't work.
|
2024-07-10 18:29:55 -03:00 |
|
Gergo Moricz
|
1a07e9d23b
|
feat: pick up and commit interrupted jobs from/to DB
|
2024-07-09 15:57:38 +02:00 |
|
Gergo Moricz
|
77aa46588f
|
feat: graceful exit handler
|
2024-07-09 14:29:32 +02:00 |
|
Nicolas
|
914897c9d2
|
Merge branch 'main' into feat/save-docs-on-supabase
|
2024-07-05 12:27:22 -03:00 |
|
Nicolas
|
32849b017f
|
Nick:
|
2024-07-03 20:18:11 -03:00 |
|
Nicolas
|
066d92f643
|
Update single_url.ts
|
2024-07-03 18:38:17 -03:00 |
|
Nicolas
|
f5b2fbd7e8
|
Nick: revision
|
2024-07-03 18:06:53 -03:00 |
|
Nicolas
|
2d30cc6117
|
Nick: comments
|
2024-07-03 18:01:54 -03:00 |
|
Nicolas
|
90c54c32fd
|
Nick: refactor
|
2024-07-03 18:01:17 -03:00 |
|
Nicolas
|
90cf799a3c
|
Update single_url.ts
|
2024-07-03 17:56:21 -03:00 |
|
Nicolas
|
b36406e465
|
Nick: log scrapers
|
2024-07-03 17:28:53 -03:00 |
|
Eric Ciarla
|
2d0d5ac392
|
Update for llm-extraction-from-raw-html
|
2024-07-02 14:05:42 -04:00 |
|
rafaelsideguide
|
0175152577
|
Fixed PDF match custom scraping
Now it's working for both the `https://getgc.ai/privacy` and `https://prairie.cards/products/wood-designs` use cases.
|
2024-07-02 11:25:17 -03:00 |
|
rafaelsideguide
|
96de948d6b
|
Update index.test.ts
|
2024-07-02 11:04:09 -03:00 |
|
rafaelsideguide
|
7b7154ba1e
|
bugfixed pageStatusCode
|
2024-07-02 10:51:35 -03:00 |
|
Rafael Miller
|
f0f449fe51
|
Merge pull request #336 from snippet/allow-external-content-links
[Proposal] new feature allowExternalContentLinks
|
2024-07-02 09:45:21 -03:00 |
|
rafaelsideguide
|
db4a743365
|
Added e2e test
|
2024-07-02 09:44:08 -03:00 |
|
Nicolas
|
42cd58a679
|
Merge pull request #332 from mendableai/feat/rawHtmlExtraction
Adds pageOptions.includeRawHtml and new extraction mode "llm-extraction-from-raw-html"
|
2024-07-01 18:23:26 -03:00 |
|
Nicolas
|
c4f423981f
|
Update pnpm-lock.yaml
|
2024-07-01 18:22:22 -03:00 |
|
rafaelsideguide
|
16aac7f8c5
|
Update single_url.ts
|
2024-07-01 18:21:15 -03:00 |
|
Nicolas
|
6d0c7a9ccd
|
Merge pull request #323 from mendableai/tests/crawl-limit-unit-tests
[Tests] Added crawl limit unit test
|
2024-07-01 17:56:04 -03:00 |
|
rafaelsideguide
|
4d6e25619b
|
minor spacing and comment stuff
|
2024-07-01 16:05:34 -03:00 |
|
Eric Ciarla
|
e1af815f8c
|
Update scrape.ts
|
2024-07-01 08:48:21 -04:00 |
|
Eric Ciarla
|
7ae195bacc
|
Update index.test.ts
|
2024-06-29 10:13:12 -04:00 |
|
Eric Ciarla
|
837b446390
|
Update index.test.ts
|
2024-06-29 08:48:42 -04:00 |
|
Eric Ciarla
|
fe6e3aeadc
|
Update index.test.ts
|
2024-06-29 08:44:21 -04:00 |
|
Eric Ciarla
|
6c9f0dfc91
|
Add tests
|
2024-06-29 08:32:20 -04:00 |
|
Jeff Pereira
|
a5fb45988c
|
new feature allowExternalContentLinks
|
2024-06-28 17:23:40 -07:00 |
|
Eric Ciarla
|
87b54488d3
|
update to includeRawHtml
|
2024-06-28 17:07:47 -04:00 |
|
Eric Ciarla
|
70fcf2ce03
|
init
|
2024-06-28 16:39:09 -04:00 |
|
Nicolas
|
9bf74bc774
|
Update single_url.ts
|
2024-06-28 15:51:18 -03:00 |
|
Nicolas
|
7e17498bcf
|
Update single_url.ts
|
2024-06-28 15:45:16 -03:00 |
|
rafaelsideguide
|
d66e1f7846
|
looking good
|
2024-06-27 16:00:45 -03:00 |
|
Nicolas
|
9e7298945c
|
Update openapi.json
|
2024-06-26 21:25:38 -03:00 |
|
Nicolas
|
1ec0bf8adf
|
Update openapi.json
|
2024-06-26 21:22:46 -03:00 |
|
Nicolas
|
042f81ddf2
|
Update removeUnwantedElements.test.ts
|
2024-06-26 21:20:11 -03:00 |
|
Nicolas
|
388ce3cbce
|
Nick: small changes
|
2024-06-26 21:15:42 -03:00 |
|
Nicolas
|
1d4907acc9
|
Nick:
|
2024-06-26 21:02:58 -03:00 |
|
rafaelsideguide
|
c40da77be0
|
Added implementation for saving docs on supabase
- TODO: remove the comments on `log_job.ts` before deploying to prod
|
2024-06-26 18:23:28 -03:00 |
|
Nicolas
|
3b92fb8433
|
Merge pull request #322 from mendableai/tests/metadata
[Test] Added E2E tests for checking metadata values
|
2024-06-26 12:09:18 -03:00 |
|
rafaelsideguide
|
67d7650cf3
|
Added to e2e_noAuth
|
2024-06-26 12:07:55 -03:00 |
|
rafaelsideguide
|
009df6c930
|
Added crawl limit unit test
I think this test over-relies on mocks, but I have no idea how to fix this without changing the code architecture.
|
2024-06-26 09:54:25 -03:00 |
|
rafaelsideguide
|
05eaa3c68d
|
Update index.test.ts
|
2024-06-26 09:32:02 -03:00 |
|
rafaelsideguide
|
4381109dd8
|
added default values and fixed pdf bug
|
2024-06-26 09:00:54 -03:00 |
|
Nicolas
|
45f2765601
|
Merge pull request #316 from snippet/types-webscraper
add some types
|
2024-06-25 22:03:21 -03:00 |
|