> For clean Markdown of any page, append .md to the page URL.
> For a complete documentation index, see https://you.com/docs/llms.txt.
> For full documentation content, see https://you.com/docs/llms-full.txt.
> For AI client integration (Claude Code, Cursor, etc.), connect to the MCP server at https://you.com/docs/_mcp/server.

# Retrieve page content

## Overview

By default, search results include snippets — 100–200 words of extracted text per result. Enable **live crawling** to get the full page content: typically 2,000–10,000 words of clean HTML or Markdown per result.

This is what enables:

* Deep RAG with full document context
* Knowledge base construction from live web data
* Comprehensive content synthesis across sources
* Full article bodies for news results

## How it works

Add `livecrawl` to any search request. The API fetches each matching result's page in real time and attaches a `contents` object to it. You choose which result types to crawl and what format to return.

| Parameter           | Type    | Options                 | Description                                                                                                                         |
| ------------------- | ------- | ----------------------- | ----------------------------------------------------------------------------------------------------------------------------------- |
| `livecrawl`         | string  | `web`, `news`, `all`    | Which result types to crawl                                                                                                         |
| `livecrawl_formats` | string  | `html`, `markdown`      | Format for returned content. Repeat to get both: `?livecrawl_formats=html&livecrawl_formats=markdown` (GET) or pass an array (POST) |
| `crawl_timeout`     | integer | `1`–`60` (default `10`) | Max seconds to wait per page                                                                                                        |

`markdown` is recommended for LLM use cases — it strips navigation, ads, and boilerplate HTML, leaving only the core content.

**Livecrawl is billed separately from the base Search API rate.** Each page crawled costs \$1.00 per 1,000 pages — the same rate as the [Contents API](/docs/contents/overview). A single call with `count=10` and `livecrawl=all` crawls up to 20 pages (10 web + 10 news), adding \$0.02 to the \$0.005 base call cost.

## Crawl web results

Set `livecrawl=web` to attach full page content to web results. The `contents.markdown` (or `contents.html`) field is added to each result that was successfully crawled.

```python
from youdotcom import You
from youdotcom.models import LiveCrawl, LiveCrawlFormats

with You(api_key_auth="api_key") as you:
  res = you.search.unified(
    query="transformer architecture explained",
    count=5,
    livecrawl=LiveCrawl.WEB,
    livecrawl_formats=LiveCrawlFormats.MARKDOWN,
  )

  if res.results and res.results.web:
    for result in res.results.web:
      print(f"{result.title}")
      print(f"  URL: {result.url}")
      if result.contents:
        print(f"  Content ({len(result.contents.markdown)} chars)")
        print(f"  Preview: {result.contents.markdown[:200]}...\n")
      else:
        print("  (No content retrieved)\n")
```

```typescript
import { You } from "@youdotcom-oss/sdk";
import { LiveCrawl, LiveCrawlFormats } from "@youdotcom-oss/sdk/models";

const you = new You({
  apiKeyAuth: "api_key",
});

async function run() {
  const result = await you.search({
    query: "transformer architecture explained",
    count: 5,
    livecrawl: LiveCrawl.Web,
    livecrawlFormats: LiveCrawlFormats.Markdown,
  });

  result.results?.web?.forEach((r) => {
    console.log(r.title);
    console.log(`  URL: ${r.url}`);
    if (r.contents?.markdown) {
      console.log(`  Content (${r.contents.markdown.length} chars)`);
      console.log(`  Preview: ${r.contents.markdown.slice(0, 200)}...\n`);
    } else {
      console.log("  (No content retrieved)\n");
    }
  });
}

run();
```

```curl
curl -G https://ydc-index.io/v1/search \
  -H "X-API-Key: api_key" \
  --data-urlencode "query=transformer architecture explained" \
  -d count=5 \
  -d livecrawl=web \
  -d livecrawl_formats=markdown
```

## Crawl news results

Set `livecrawl=news` to get full article bodies for news results. Combine with `freshness` for breaking news pipelines.

```python
from youdotcom import You
from youdotcom.models import Freshness, LiveCrawl, LiveCrawlFormats

with You(api_key_auth="api_key") as you:
  res = you.search.unified(
    query="semiconductor supply chain",
    freshness=Freshness.WEEK,
    count=5,
    livecrawl=LiveCrawl.NEWS,
    livecrawl_formats=LiveCrawlFormats.MARKDOWN,
  )

  if res.results and res.results.news:
    for article in res.results.news:
      print(f"{article.title}")
      if article.contents:
        print(article.contents.markdown[:400])
      print()
```

```typescript
import { You } from "@youdotcom-oss/sdk";
import { Freshness, LiveCrawl, LiveCrawlFormats } from "@youdotcom-oss/sdk/models";

const you = new You({
  apiKeyAuth: "api_key",
});

async function run() {
  const result = await you.search({
    query: "semiconductor supply chain",
    freshness: Freshness.Week,
    count: 5,
    livecrawl: LiveCrawl.News,
    livecrawlFormats: LiveCrawlFormats.Markdown,
  });

  result.results?.news?.forEach((article) => {
    console.log(article.title);
    if (article.contents?.markdown) {
      console.log(article.contents.markdown.slice(0, 400));
    }
    console.log();
  });
}

run();
```

```curl
curl -G https://ydc-index.io/v1/search \
  -H "X-API-Key: api_key" \
  --data-urlencode "query=semiconductor supply chain" \
  -d freshness=week \
  -d count=5 \
  -d livecrawl=news \
  -d livecrawl_formats=markdown
```

## Crawl both web and news

Use `livecrawl=all` to crawl every result type in one request.

```python
from youdotcom import You
from youdotcom.models import LiveCrawl, LiveCrawlFormats

with You(api_key_auth="api_key") as you:
  res = you.search.unified(
    query="quantum computing breakthroughs",
    count=5,
    livecrawl=LiveCrawl.ALL,
    livecrawl_formats=LiveCrawlFormats.MARKDOWN,
  )

  if res.results:
    for result in (res.results.web or []):
      if result.contents:
        print(f"[WEB] {result.title}: {len(result.contents.markdown)} chars")

    for result in (res.results.news or []):
      if result.contents:
        print(f"[NEWS] {result.title}: {len(result.contents.markdown)} chars")
```

```typescript
import { You } from "@youdotcom-oss/sdk";
import { LiveCrawl, LiveCrawlFormats } from "@youdotcom-oss/sdk/models";

const you = new You({
  apiKeyAuth: "api_key",
});

async function run() {
  const result = await you.search({
    query: "quantum computing breakthroughs",
    count: 5,
    livecrawl: LiveCrawl.All,
    livecrawlFormats: LiveCrawlFormats.Markdown,
  });

  result.results?.web?.forEach((r) => {
    if (r.contents?.markdown) {
      console.log(`[WEB] ${r.title}: ${r.contents.markdown.length} chars`);
    }
  });

  result.results?.news?.forEach((r) => {
    if (r.contents?.markdown) {
      console.log(`[NEWS] ${r.title}: ${r.contents.markdown.length} chars`);
    }
  });
}

run();
```

```curl
curl -G https://ydc-index.io/v1/search \
  -H "X-API-Key: api_key" \
  --data-urlencode "query=quantum computing breakthroughs" \
  -d count=5 \
  -d livecrawl=all \
  -d livecrawl_formats=markdown
```

## Control crawl timeout

By default the crawler waits up to 10 seconds per page. For latency-sensitive applications, reduce `crawl_timeout`. For complex or slow-loading pages, increase it (up to 60 seconds).

```python
from youdotcom import You
from youdotcom.models import LiveCrawl, LiveCrawlFormats

with You(api_key_auth="api_key") as you:
  # Low-latency pipeline: only wait 3 seconds per page
  res = you.search.unified(
    query="latest Python releases",
    count=5,
    livecrawl=LiveCrawl.WEB,
    livecrawl_formats=LiveCrawlFormats.MARKDOWN,
    crawl_timeout=3,
  )

  if res.results and res.results.web:
    for result in res.results.web:
      status = "crawled" if result.contents else "skipped (timeout)"
      print(f"{result.title} — {status}")
```

```typescript
import { You } from "@youdotcom-oss/sdk";
import { LiveCrawl, LiveCrawlFormats } from "@youdotcom-oss/sdk/models";

const you = new You({
  apiKeyAuth: "api_key",
});

async function run() {
  // Low-latency pipeline: only wait 3 seconds per page
  const result = await you.search({
    query: "latest Python releases",
    count: 5,
    livecrawl: LiveCrawl.Web,
    livecrawlFormats: LiveCrawlFormats.Markdown,
    crawlTimeout: 3,
  });

  result.results?.web?.forEach((r) => {
    const status = r.contents ? "crawled" : "skipped (timeout)";
    console.log(`${r.title} — ${status}`);
  });
}

run();
```

```curl
curl -G https://ydc-index.io/v1/search \
  -H "X-API-Key: api_key" \
  --data-urlencode "query=latest Python releases" \
  -d count=5 \
  -d livecrawl=web \
  -d livecrawl_formats=markdown \
  -d crawl_timeout=3
```

## HTML vs Markdown

| Format     | Best for                                                    |
| ---------- | ----------------------------------------------------------- |
| `markdown` | LLM prompts, RAG, text analysis — clean, no boilerplate     |
| `html`     | Rendering, scraping structured data, preserving page layout |

## Already have URLs?

If you have a list of URLs and don't need to search first, use the [Contents API](/docs/contents/overview) directly. It accepts URLs without a query and returns the same `markdown` or `html` content.

***

## Next steps

Fetch page content directly from URLs, no search query needed

Retrieve and filter real-time news results

View all parameters and response schemas

Back to Search API overview