# Playwright Scrape API

This is a simple web scraping service built with Express and Playwright.

## Features

- Scrapes HTML content from specified URLs.
- Blocks requests to known ad-serving domains.
- Blocks media files to reduce bandwidth usage.
- Uses random user-agent strings to avoid detection.
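- Waits for the page to finish rendering before returning its HTML; the optional `wait_after_load` and `check_selector` request fields can give slow pages extra time.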

## Install
```bash
npm install
npx playwright install
```

## RUN
```bash
npm run build
npm start
```
OR
```bash
npm run dev
```

## USE

```bash
curl -X POST http://localhost:3003/scrape \
-H "Content-Type: application/json" \
-d '{
  "url": "https://example.com",
  "wait_after_load": 1000,
  "timeout": 15000,
  "headers": {
    "Custom-Header": "value"
  },
  "check_selector": "#content"
}'
```
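For scripts, the request above can be wrapped in a small bash helper. This is a minimal sketch, not part of the service: `json_escape`, `build_scrape_payload`, `scrape` and the `SCRAPE_URL` override are hypothetical names. The default endpoint assumes port 3003 (the port used in the Firecrawl section below), the defaults of 1000 ms and 15000 ms simply mirror the example request, and the `headers` field is left out for brevity.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for calling the /scrape endpoint; not shipped with the service.

# json_escape VALUE: escape backslashes and double quotes so VALUE can sit
# inside a JSON string. Control characters are not handled.
json_escape() {
  local s=$1 bs='\' dq='"'
  s=${s//"$bs"/"$bs$bs"}   # \  -> \\
  s=${s//"$dq"/"$bs$dq"}   # "  -> \"
  printf '%s' "$s"
}

# build_scrape_payload URL [WAIT_MS] [TIMEOUT_MS] [SELECTOR]
# Defaults mirror the example request above, not the service's own defaults.
build_scrape_payload() {
  local url wait timeout selector
  url=$(json_escape "$1")
  wait=${2:-1000}
  timeout=${3:-15000}
  printf '{"url":"%s","wait_after_load":%d,"timeout":%d' "$url" "$wait" "$timeout"
  if [ -n "${4:-}" ]; then
    selector=$(json_escape "$4")
    printf ',"check_selector":"%s"' "$selector"
  fi
  printf '}'
}

# scrape URL [WAIT_MS] [TIMEOUT_MS] [SELECTOR]: POST the payload (read from
# stdin by curl) and print the response body. Port 3003 is an assumption.
scrape() {
  build_scrape_payload "$@" |
    curl -sS -X POST "${SCRAPE_URL:-http://localhost:3003/scrape}" \
      -H "Content-Type: application/json" \
      --data-binary @-
}
```

After sourcing the file, `scrape https://example.com 1000 15000 '#content'` sends the same body as the curl example, minus `headers`.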

## USING WITH FIRECRAWL

Add `PLAYWRIGHT_MICROSERVICE_URL=http://localhost:3003/scrape` to `apps/api/.env` (relative to the repository root) so that the Firecrawl API uses this Playwright microservice for scraping.
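If `apps/api/.env` already exists, the variable can be set without creating duplicate entries. A minimal sketch, assuming it is run from the repository root; `set_env_var` is a hypothetical helper, not part of Firecrawl, and it only recognizes existing entries written as `KEY=...`.

```bash
#!/usr/bin/env bash
# set_env_var FILE KEY VALUE: write KEY=VALUE into a dotenv file, replacing any
# existing "KEY=..." line so repeated runs do not create duplicates.
set_env_var() {
  local file=$1 key=$2 value=$3 tmp
  touch "$file" || return 1
  # Temp file in the same directory so the final mv is a simple rename.
  tmp=$(mktemp "${file}.XXXXXX") || return 1
  # Copy every line except an existing entry for KEY (grep exits 1 if nothing is left).
  grep -v "^${key}=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  mv "$tmp" "$file"
}

# Example, run from the repository root:
# set_env_var apps/api/.env PLAYWRIGHT_MICROSERVICE_URL http://localhost:3003/scrape
```

Other lines in the file are kept; only the old `PLAYWRIGHT_MICROSERVICE_URL=` line, if any, is replaced.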