***

title: Contents API Overview
subtitle: Extract clean HTML or Markdown content from any webpage on demand.
og:title: You.com Contents API | Clean HTML & Markdown from Any URL
og:description: Extract clean HTML or Markdown content from any webpage on demand. Perfect for competitive intelligence, knowledge base ingestion, and web scraping without the mess.
---------------------

For clean Markdown of any page, append .md to the page URL. For a complete documentation index, see https://you.com/docs/contents/llms.txt. For full documentation content, see https://you.com/docs/contents/llms-full.txt.

## What is the Contents API?

The Contents API extracts clean HTML or Markdown content from a given URL. Pass it a list of URLs and get back the full page content for each, ready for LLM consumption—no parsing, no HTML noise, no browser automation required.

***

## How it's different from livecrawl

The Contents API and the `livecrawl` parameter in the Search API both extract full page content, but they serve different workflows:

|                    | Contents API                | Search API + livecrawl                  |
| ------------------ | --------------------------- | --------------------------------------- |
| **Starting point** | You already know the URLs   | You have a query, not URLs              |
| **Use case**       | Fetch known pages on demand | Enrich search results with full content |
| **URL source**     | You provide them            | You.com search discovers them           |
| **Batch size**     | 10 URLs per request         | Up to 100 results per search            |

Use the Contents API when you have a list of specific URLs you want to read. Use `livecrawl` when you want full content returned alongside search results.

***

## What you get

Each URL in your request returns a structured object:

```json
[
  {
    "url": "https://competitor.com/pricing",
    "title": "Pricing — Competitor Inc.",
    "markdown": "# Pricing\n\n## Starter Plan\n$49/month...",
    "html": "<html>...</html>",
    "metadata": {
      "site_name": "Competitor Inc.",
      "favicon_url": "https://ydc-index.io/favicon?domain=competitor.com&size=128"
    }
  }
]
```

You control which formats are returned via the `formats` parameter—request `markdown`, `html`, and/or `metadata` in any combination.

***

## Key features

### Any URL, on demand

Pass up to 10 URLs in a single request. The API crawls them all in parallel and returns the content. No need to manage a headless browser or deal with raw HTML yourself.

### LLM-ready Markdown

The `markdown` format strips navigation menus, ads, footers, and other boilerplate. You get actual content of the page—ready to drop into a prompt.

### Configurable timeout

Use `crawl_timeout` (1–60 seconds) to balance speed vs. completeness. For fast pages: 5–10 seconds. For heavy JavaScript-rendered pages: 20–30 seconds.

### Metadata extraction

Request `metadata` alongside content to get the page's site name and favicon URL—useful for building UIs that display source attribution.

***

## Quickstart

<CodeBlock>
  ```python
  import os
  from youdotcom import You
  from youdotcom.models import ContentsFormats

  with You(api_key_auth="api_key") as you:
      pages = you.contents.generate(
          urls=["https://you.com/about"],
          formats=[ContentsFormats.MARKDOWN],
      )

      for page in pages:
          print(f"Title: {page.title}")
          print(f"Content preview: {page.markdown[:300]}\n")
  ```

  ```typescript
  import { You } from "@youdotcom-oss/sdk";
  import { ContentsFormats } from "@youdotcom-oss/sdk/models";

  const you = new You({ apiKeyAuth: "api_key" });

  const pages = await you.contents({
    urls: ["https://you.com/about"],
    formats: [ContentsFormats.Markdown],
  });

  for (const page of pages) {
    console.log(`Title: ${page.title}`);
    console.log(`Preview: ${page.markdown?.slice(0, 300)}\n`);
  }
  ```

  ```curl
  curl -X POST https://ydc-index.io/v1/contents \
    -H "X-API-Key: api_key" \
    -H "Content-Type: application/json" \
    -d '{
      "urls": ["https://you.com/about"],
      "formats": ["markdown"]
    }'
  ```
</CodeBlock>

<Card title="Try in Postman" icon="fa-regular fa-rocket" href="https://www.postman.com/youdotcom/you-com-api-workspace/collection/46015159-037d5564-c4b5-4e11-9cea-1e41a5eba4aa">
  Fork the Contents API collection, add your API key to the `production` environment, and hit Send.
</Card>

***

## Parameters

| Parameter       | Type             | Required | Description                                                                     |
| --------------- | ---------------- | -------- | ------------------------------------------------------------------------------- |
| `urls`          | array of strings | Yes      | The URLs to fetch content from                                                  |
| `formats`       | array of strings | No       | Content formats to return: `markdown`, `html`, `metadata` (default: `markdown`) |
| `crawl_timeout` | number           | No       | Per-URL timeout in seconds, between 1 and 60 (default: 10)                      |

[View full API reference](/docs/api-reference/contents)

***

## Common use cases

### Competitive intelligence

Monitor competitor pricing, feature, or blog pages. Fetch the content on a schedule, feed it to an LLM, and surface meaningful changes—without manual checking.

```python
from youdotcom import You
from youdotcom.models import ContentsFormats

competitor_pages = [
    "https://competitor-a.com/pricing",
    "https://competitor-b.com/pricing",
    "https://competitor-c.com/features",
]

with You(api_key_auth="api_key") as you:
    pages = you.contents.generate(
        urls=competitor_pages,
        formats=[ContentsFormats.MARKDOWN],
        crawl_timeout=15,
    )

    for page in pages:
        print(f"\n--- {page.title} ---")
        # Feed page.markdown into your LLM for summarization or diff
        print(page.markdown[:500])
```

### Knowledge base ingestion

You have a list of authoritative sources—documentation pages, whitepapers, internal wikis. Fetch them all, convert to clean Markdown, and index into your vector store.

```python
from youdotcom import You
from youdotcom.models import ContentsFormats

# Known authoritative sources to index
source_urls = [
    "https://docs.example.com/api-reference",
    "https://docs.example.com/authentication",
    "https://docs.example.com/rate-limits",
]

with You(api_key_auth="api_key") as you:
    pages = you.contents.generate(
        urls=source_urls,
        formats=[ContentsFormats.MARKDOWN, ContentsFormats.METADATA],
    )

    documents = []
    for page in pages:
        documents.append({
            "source": page.url,
            "title": page.title,
            "content": page.markdown,
        })
        # Index document into your vector store here
```

### Research assistant

Give users the ability to ask questions about specific URLs. Fetch the page content on the fly and feed it as context into your LLM—turning any URL into a searchable document.

```python
from youdotcom import You
from youdotcom.models import ContentsFormats

def fetch_url_context(url: str) -> str:
    with You(api_key_auth="api_key") as you:
        pages = you.contents.generate(urls=[url], formats=[ContentsFormats.MARKDOWN])
        return pages[0].markdown if pages else ""

# User asks: "Summarize this page for me"
url = "https://example.com/long-report"
context = fetch_url_context(url)

prompt = f"Summarize the following page content:\n\n{context}"
# Pass prompt to your LLM
```

***

## Best practices

### Request only the formats you need

Each format adds processing time. If you only need Markdown for LLM consumption, don't request `html`. If you don't need site metadata for your UI, skip `metadata`.

### Batch your URLs

A single request with 10 URLs is faster than 10 separate requests. The API processes them in parallel.

### Set `crawl_timeout` based on the target site

For simple static pages, 5–10 seconds is usually enough. For JavaScript-heavy pages (SPAs, dashboards), increase to 20–30 seconds to give the renderer time to complete.

### Handle partial failures gracefully

If one URL in a batch fails to crawl (e.g., it's behind a login wall or returns a 404), the API returns `null` for its `markdown` and `html` fields. Always check before processing:

```python
for page in pages:
    if page.markdown:
        # Process content
        pass
    else:
        print(f"Failed to fetch: {page.url}")
```

***

## Pricing

**\$1.00 per 1,000 pages**

All new accounts receive \$100 in free credits to get started. Pricing is simple and based on the number of pages you fetch.

**What's included:**

* Batch multiple URLs per request
* Clean Markdown or raw HTML output
* Python SDK, MCP Server, and REST API access
* No browser automation or HTML parsing required

For volume discounts, annual pricing, or enterprise features, visit [you.com/pricing](https://you.com/pricing) or contact [api@you.com](mailto:api@you.com).

***

## Next steps

<CardGroup cols={2}>
  <Card title="API Reference" icon="fa-regular fa-code" href="/docs/api-reference/contents">
    Full parameter reference, request/response schemas, and error codes
  </Card>

  <Card title="Search API Overview" icon="fa-regular fa-search" href="/docs/search/overview">
    Pair search results with livecrawl to get full content alongside real-time web data
  </Card>

  <Card title="Quickstart" icon="fa-regular fa-play" href="/docs/quickstart">
    Get your API key and make your first call in under five minutes
  </Card>

  <Card title="Python SDK" icon="fa-brands fa-python" href="/docs/sdks/python-sdk">
    Use the official SDK for cleaner integration
  </Card>
</CardGroup>