# π₯ Firecrawl
Empower your AI apps with clean data from any website. Featuring advanced scraping, crawling, and data extraction capabilities.
_This repository is in development, and weβre still integrating custom modules into the mono repo. It's not fully ready for self-hosted deployment yet, but you can run it locally._
## What is Firecrawl?
[Firecrawl](https://firecrawl.dev?ref=github) is an API service that takes a URL, crawls it, and converts it into clean markdown or structured data. We crawl all accessible subpages and give you clean data for each. No sitemap required. Check out our [documentation](https://docs.firecrawl.dev).
_Pst. hey, you, join our stargazers :)_
## How to use it?
We provide an easy to use API with our hosted version. You can find the playground and documentation [here](https://firecrawl.dev/playground). You can also self host the backend if you'd like.
Check out the following resources to get started:
- [x] [API](https://docs.firecrawl.dev/api-reference/introduction)
- [x] [Python SDK](https://docs.firecrawl.dev/sdks/python)
- [x] [Node SDK](https://docs.firecrawl.dev/sdks/node)
- [x] [Go SDK](https://docs.firecrawl.dev/sdks/go)
- [x] [Rust SDK](https://docs.firecrawl.dev/sdks/rust)
- [x] [Langchain Integration π¦π](https://python.langchain.com/docs/integrations/document_loaders/firecrawl/)
- [x] [Langchain JS Integration π¦π](https://js.langchain.com/docs/integrations/document_loaders/web_loaders/firecrawl)
- [x] [Llama Index Integration π¦](https://docs.llamaindex.ai/en/latest/examples/data_connectors/WebPageDemo/#using-firecrawl-reader)
- [x] [Dify Integration](https://dify.ai/blog/dify-ai-blog-integrated-with-firecrawl)
- [x] [Langflow Integration](https://docs.langflow.org/)
- [x] [Crew.ai Integration](https://docs.crewai.com/)
- [x] [Flowise AI Integration](https://docs.flowiseai.com/integrations/langchain/document-loaders/firecrawl)
- [x] [Composio Integration](https://composio.dev/tools/firecrawl/all)
- [x] [PraisonAI Integration](https://docs.praison.ai/firecrawl/)
- [x] [Zapier Integration](https://zapier.com/apps/firecrawl/integrations)
- [x] [Cargo Integration](https://docs.getcargo.io/integration/firecrawl)
- [x] [Pipedream Integration](https://pipedream.com/apps/firecrawl/)
- [x] [Pabbly Connect Integration](https://www.pabbly.com/connect/integrations/firecrawl/)
- [ ] Want an SDK or Integration? Let us know by opening an issue.
To run locally, refer to guide [here](https://github.com/mendableai/firecrawl/blob/main/CONTRIBUTING.md).
### API Key
To use the API, you need to sign up on [Firecrawl](https://firecrawl.dev) and get an API key.
### Crawling
Used to crawl a URL and all accessible subpages. This submits a crawl job and returns a job ID to check the status of the crawl.
```bash
curl -X POST https://api.firecrawl.dev/v1/crawl \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer fc-YOUR_API_KEY' \
-d '{
"url": "https://docs.firecrawl.dev",
"limit": 100,
"scrapeOptions": {
"formats": ["markdown", "html"]
}
}'
```
Returns a crawl job id and the url to check the status of the crawl.
```json
{
"success": true,
"id": "123-456-789",
"url": "https://api.firecrawl.dev/v1/crawl/123-456-789"
}
```
### Check Crawl Job
Used to check the status of a crawl job and get its result.
```bash
curl -X GET https://api.firecrawl.dev/v1/crawl/123-456-789 \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_API_KEY'
```
```json
{
"status": "completed",
"total": 36,
"creditsUsed": 36,
"expiresAt": "2024-00-00T00:00:00.000Z",
"data": [
{
"markdown": "[Firecrawl Docs home page![light logo](https://mintlify.s3-us-west-1.amazonaws.com/firecrawl/logo/light.svg)!...",
"html": "...",
"metadata": {
"title": "Build a 'Chat with website' using Groq Llama 3 | Firecrawl",
"language": "en",
"sourceURL": "https://docs.firecrawl.dev/learn/rag-llama3",
"description": "Learn how to use Firecrawl, Groq Llama 3, and Langchain to build a 'Chat with your website' bot.",
"ogLocaleAlternate": [],
"statusCode": 200
}
}
]
}
```
### Scraping
Used to scrape a URL and get its content in the specified formats.
```bash
curl -X POST https://api.firecrawl.dev/v1/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-d '{
"url": "https://docs.firecrawl.dev",
"formats" : ["markdown", "html"]
}'
```
Response:
```json
{
"success": true,
"data": {
"markdown": "Launch Week I is here! [See our Day 2 Release π](https://www.firecrawl.dev/blog/launch-week-i-day-2-doubled-rate-limits)[π₯ Get 2 months free...",
"html": "