604 Commits

Author SHA1 Message Date
Nicolas
96257b7b17 Update handleCustomScraping.ts 2024-06-04 12:22:46 -07:00
Nicolas
674500affa Nick: 2024-06-04 12:15:39 -07:00
Nicolas
fc04d5b033
Merge pull request #235 from mendableai/feat/gdrive-pdfs
[Feat] Added custom scraping for google-drive pdf usecase
2024-06-04 11:31:53 -07:00
rafaelsideguide
5ae4d1caf5 Update single_url.ts 2024-06-04 15:28:09 -03:00
rafaelsideguide
93f3098672 build files 2024-06-04 14:54:54 -03:00
rafaelsideguide
64a4338ff0 Update single_url.ts 2024-06-04 14:40:05 -03:00
Rafael Miller
02fe470e20
Merge pull request #148 from mendableai/nsc/improvemnts-fixes-misc
Better fallbacks for initial crawl start
2024-06-04 14:31:10 -03:00
Rafael Miller
665a40d9f4
Merge pull request #212 from mendableai/bugfix/partial-data-js-sdk
[Bug] Improved js response and test for getting partial_data
2024-06-04 14:05:23 -03:00
rafaelsideguide
1f4c6b7a87 Update package.json 2024-06-04 13:59:48 -03:00
Rafael Miller
19c67916d4
Merge pull request #211 from mendableai/fix/rename-variables
[Fix] Changed timeout parameter name on js sdk
2024-06-04 13:57:58 -03:00
Rafael Miller
f4f87b5374 Merge branch 'main' into bugfix/partial-data-js-sdk 2024-06-04 13:40:42 -03:00
Rafael Miller
f17cb1a0d4
Merge pull request #224 from mattjoyce/playwright-service-bug-222
Playwright service bugs #222 #179 #197
2024-06-04 12:05:56 -03:00
rafaelsideguide
4e3a0495d7 updated version 0.0.12 -> 0.0.13
- [ ] publish
2024-06-04 12:03:55 -03:00
Rafael Miller
b80fb374e5 Merge branch 'main' into playwright-service-bug-222 2024-06-04 11:57:17 -03:00
rafaelsideguide
6920ec8a61 bugfixing. already on main 2024-06-04 11:05:50 -03:00
Nicolas
d6762386f8 Update fly-direct.yml 2024-06-04 01:09:04 -07:00
Nicolas
d10c0839b0 Update fly-direct.yml 2024-06-04 01:04:06 -07:00
Nicolas
5d50b259b7 Create fly-direct.yml 2024-06-04 00:42:07 -07:00
Nicolas
d91b725c6f Update fly.toml 2024-06-04 00:41:15 -07:00
Nicolas
cbf8d79cce Update pdfProcessor.ts 2024-06-04 00:13:37 -07:00
Nicolas
3fc9004ba8 Update fly.toml 2024-06-03 23:49:46 -07:00
Nicolas
0cc7031acb Update fly.yml 2024-06-03 23:47:10 -07:00
Nicolas
2ea01f1456 Update single_url.ts 2024-06-03 23:42:39 -07:00
Nicolas
3563e3ae45 Update fly.yml 2024-06-03 23:34:52 -07:00
Nicolas
854d5b3cb3 Update single_url.ts 2024-06-03 23:32:55 -07:00
Nicolas
99059814a8 Nick: 2024-06-03 21:32:48 -07:00
Nicolas
918059ee9e Merge branch 'main' into nsc/improvemnts-fixes-misc 2024-06-03 16:46:02 -07:00
Nicolas
93bb53271e Merge branch 'nsc/improved-blocklist' 2024-06-03 16:44:33 -07:00
Nicolas
38e583f66c Update socialBlockList.test.ts 2024-06-03 16:44:23 -07:00
Nicolas
b26c5f1588
Merge pull request #185 from mendableai/nsc/improved-blocklist
Improvements to the blocklist regex
2024-06-03 16:43:34 -07:00
Nicolas
c69c89f838 Nick: 2024-06-03 16:42:42 -07:00
Nicolas
48d1ec05b2 Merge branch 'main' into nsc/improved-blocklist 2024-06-03 16:38:03 -07:00
Nicolas
d30ced4394
Merge pull request #221 from mendableai/nsc/fwd-header-auth
feat: Ability to forward headers to reliable providers for auth etc...
2024-06-03 16:33:40 -07:00
Nicolas
d865b0c5c8
Merge pull request #229 from rombru/main
Use @ instead of # for default BULL_AUTH_KEY. Hash mark is reserved for URI fragments.
2024-06-03 12:38:34 -07:00
Romain Bruyère
4987f901d1 Merge branch 'mendableai:main' into main 2024-06-03 21:29:33 +02:00
rafaelsideguide
4100cc9223 Update index.test.ts 2024-06-03 16:29:16 -03:00
rombru
3ff91ddd1f fix: use @ instead of # for default BULL_AUTH_KEY. hash mark is reserved for URI fragments. 2024-06-03 21:28:25 +02:00
rafaelsideguide
c1aed1360e Update index.test.ts 2024-06-03 15:51:07 -03:00
Nicolas
30a0c5de1a
Merge pull request #228 from mendableai/bugfix/fire-engine-content
Fixed fire-engine content bug
2024-06-03 11:42:03 -07:00
rafaelsideguide
1fc3a15149 Update single_url.ts 2024-06-03 15:24:40 -03:00
Eric Ciarla
3ea801d9dd Commit Roast My Website 2024-06-02 20:40:19 -07:00
Eric Ciarla
ea04fe2e3f Add Roast My Website Example 2024-06-02 20:38:05 -07:00
Nicolas
fde522c3e1 Update single_url.ts 2024-06-02 20:23:45 -07:00
Matt Joyce
deefe65cbe Change the way the playwright response is parsed
Was failing with a Type Error, but actually looked ok.
This fixes the type error, and stop scraper fallback.
2024-06-01 19:16:56 +10:00
Matt Joyce
14896a9fdd Fix PLAYWRIGHT_MICROSERVICE_URL
It needs to end in html, otherwise scrape will 404
2024-06-01 19:03:16 +10:00
Matt Joyce
1eacad4ef3 Clarifying wait type and name 2024-06-01 18:53:03 +10:00
Matt Joyce
c516140bfb Various Linting
Pylint
C0114: Missing module docstring
C0115: Missing class docstring
C0116: Missing function or method docstring
C0303: Trailing whitespace
Import ordering
2024-06-01 18:53:03 +10:00
Matt Joyce
2a39b5382b Add timeout to class and provide default. 2024-06-01 18:52:42 +10:00
Nicolas
c7d5a9ad48 Merge branch 'main' into nsc/fwd-header-auth 2024-05-31 18:19:20 -07:00
Nicolas
8cb62dde92 Update website_params.ts 2024-05-31 16:09:39 -07:00