Source policy

Tell Scout which domains and dates to keep, and which to drop.

advanced_settings.source_policy lets you constrain a search or a research task to the sources you trust. It applies to /v1/search, to /v1/task (through the searches it runs underneath), and to any /v1/monitors monitor that runs with depth: deep.

Fields

| Field | Type | Default | What it does |
| --- | --- | --- | --- |
| include_domains | string[] | null | Keep only results whose host is on this list. Subdomains match: "openai.com" keeps platform.openai.com. |
| exclude_domains | string[] | null | Drop results whose host is on this list. Applied after include_domains. |
| after_date | YYYY-MM-DD | null | Keep only results published on or after this date. Filtered against the page’s detected publish date. This is best-effort, since not every site exposes one. |
| before_date | YYYY-MM-DD | null | Keep only results published on or before this date. |

Filtering happens after the SERP fetch and before any LLM-based re-ranking or content extraction. Results dropped here do not consume SKU credits past the initial search.
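To make the field semantics concrete, here is a minimal client-side sketch of a filter that behaves the way the table describes. It is not Scout's implementation: the function names are hypothetical, and it assumes each result is a dict with a url key and an optional publish_date in YYYY-MM-DD form.

```python
from datetime import date
from urllib.parse import urlsplit


def host_matches(host: str, domain: str) -> bool:
    """True if host equals domain or is a subdomain of it."""
    host, domain = host.lower(), domain.lower()
    return host == domain or host.endswith("." + domain)


def apply_source_policy(results: list[dict], policy: dict) -> list[dict]:
    """Illustrative filter: include list, then exclude list, then date window."""
    include = policy.get("include_domains") or []
    exclude = policy.get("exclude_domains") or []
    after = policy.get("after_date")
    before = policy.get("before_date")
    after_d = date.fromisoformat(after) if after else None
    before_d = date.fromisoformat(before) if before else None

    kept = []
    for row in results:
        host = urlsplit(row["url"]).hostname or ""
        # Allowlist first: only active when at least one host is listed.
        if include and not any(host_matches(host, dom) for dom in include):
            continue
        # Denylist second, so an excluded host is dropped even if allowlisted.
        if any(host_matches(host, dom) for dom in exclude):
            continue
        # Date bounds are inclusive; rows without a publish date are kept.
        raw = row.get("publish_date")
        if raw:
            published = date.fromisoformat(raw)
            if after_d and published < after_d:
                continue
            if before_d and published > before_d:
                continue
        kept.append(row)
    return kept
```

Note that the suffix check includes the leading dot, so a host such as notopenai.com does not match openai.com.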

Examples

Trust a short list of sources

{
  "queries": ["climate policy"],
  "advanced_settings": {
    "source_policy": {
      "include_domains": ["nature.com", "science.org", "nytimes.com"]
    }
  }
}

Drop common low-signal hosts

{
  "queries": ["best vector database 2026"],
  "objective": "comparison of features and pricing",
  "advanced_settings": {
    "source_policy": {
      "exclude_domains": ["reddit.com", "quora.com", "medium.com"]
    }
  }
}

Window by recency

{
  "queries": ["LLM safety incidents"],
  "category": "news",
  "advanced_settings": {
    "source_policy": {
      "after_date": "2026-01-01"
    }
  }
}

You can combine all four fields in one policy. The order in which the fields appear in the request does not matter: filtering is deterministic.
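As a sketch of combining the fields, the helper below assembles a request body with all four set. build_search_body is a hypothetical helper, not part of any Scout SDK; it only builds the JSON and sends nothing. The check for an inverted date window is a client-side sanity check added here, not documented API behavior.

```python
import json
from datetime import date


def build_search_body(queries, include=None, exclude=None, after=None, before=None):
    """Return a request body containing only the source_policy fields that were set."""
    policy = {}
    if include:
        policy["include_domains"] = list(include)
    if exclude:
        policy["exclude_domains"] = list(exclude)
    if after:
        # fromisoformat raises ValueError on a malformed date.
        policy["after_date"] = date.fromisoformat(after).isoformat()
    if before:
        policy["before_date"] = date.fromisoformat(before).isoformat()
    # Client-side check (an assumption): an inverted window can never match.
    if after and before and policy["after_date"] > policy["before_date"]:
        raise ValueError("after_date is later than before_date")

    body = {"queries": list(queries)}
    if policy:
        body["advanced_settings"] = {"source_policy": policy}
    return body


body = build_search_body(
    ["climate policy"],
    include=["nature.com", "science.org"],
    exclude=["blogs.nature.com"],  # hypothetical host, for illustration only
    after="2025-01-01",
    before="2025-12-31",
)
print(json.dumps(body, indent=2))
```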

How it interacts with other settings

  • freshness (hour, day, week, month, year) is a Google query-time filter that runs before the SERP returns. Use it for rough recency. Use after_date / before_date for precise windows.
  • category (news, research_paper, personal_site, …) adds Google’s tbm filter or refines the query string. It is unaffected by source_policy.
  • objective drives re-ranking. Re-ranking sees only the rows that survived source_policy, so the policy is a strict pre-filter.
  • limit is applied last. If source_policy drops every result, you get an empty array, not a 503.
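The ordering described in the list above can be pictured as three stages that run after the SERP returns. This is only a conceptual sketch: finish_results, keep, and score are hypothetical stand-ins for the source_policy filter and the objective-driven re-ranker, not Scout internals.

```python
from typing import Callable


def finish_results(
    rows: list[dict],
    keep: Callable[[dict], bool],
    score: Callable[[dict], float],
    limit: int,
) -> list[dict]:
    """Apply the policy filter, then re-rank the survivors, then truncate."""
    survivors = [row for row in rows if keep(row)]        # strict pre-filter
    ranked = sorted(survivors, key=score, reverse=True)   # re-ranker sees survivors only
    return ranked[:limit]                                 # limit is applied last
```

If keep rejects every row, the function returns an empty list rather than raising, which mirrors the empty-array behavior described above.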

When publish dates are missing

after_date and before_date look at the publish_date field on each result. Scout extracts that from the page’s meta tags and structured data. Some sites do not publish a usable date; those results are kept by default. If you want to be strict, drop the host from include_domains or post-filter on your side.
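If undated results are not acceptable, one way to post-filter on your side is sketched below. It assumes each result is a dict whose publish_date, when present, is an ISO 8601 string starting with YYYY-MM-DD; the exact response shape is an assumption here, so adjust the key and parsing to what you actually receive.

```python
from datetime import date


def strict_date_window(results, after=None, before=None):
    """Keep only results with a parseable publish_date inside [after, before]."""
    lo = date.fromisoformat(after) if after else date.min
    hi = date.fromisoformat(before) if before else date.max
    kept = []
    for row in results:
        raw = row.get("publish_date")
        if not raw:
            continue  # strict mode: undated rows are dropped
        try:
            # Take the date part so full timestamps also parse.
            published = date.fromisoformat(str(raw)[:10])
        except ValueError:
            continue  # unparseable dates count as missing
        if lo <= published <= hi:
            kept.append(row)
    return kept
```

Both bounds are inclusive, matching the "on or after" and "on or before" wording of the fields.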

Edge cases

  • An empty include_domains: [] is treated the same as omitting the field — no filter is applied. Pass at least one host to enable the allowlist.
  • Hosts are matched case-insensitively, and a leading www. is ignored.
  • A leading dot (.openai.com) is accepted and means the same thing as openai.com.
  • Conflicting policy (include_domains: ["openai.com"] and exclude_domains: ["openai.com"]) results in zero rows — the exclude wins.
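Because an empty allowlist and a conflicting policy both fail silently, it can help to lint a policy before sending it. The sketch below is a hypothetical client-side check, not part of the API; normalize_host mirrors the matching rules listed above (case, leading www., leading dot) as understood from this page.

```python
def normalize_host(host: str) -> str:
    """Lowercase, then strip one leading dot and a leading 'www.'."""
    host = host.strip().lower()
    host = host.removeprefix(".")
    host = host.removeprefix("www.")
    return host


def lint_policy(policy: dict) -> list[str]:
    """Return human-readable warnings for policies that are likely mistakes."""
    warnings = []
    include_raw = policy.get("include_domains")
    include = {normalize_host(h) for h in include_raw or []}
    exclude = {normalize_host(h) for h in policy.get("exclude_domains") or []}

    # An explicit empty list behaves like an omitted field.
    if include_raw is not None and not include_raw:
        warnings.append("include_domains is empty: no allowlist is applied")
    # The exclude wins, so a host on both lists contributes no rows.
    for host in sorted(include & exclude):
        warnings.append(f"{host} is both included and excluded: exclude wins")
    return warnings


print(lint_policy({"include_domains": ["WWW.OpenAI.com"],
                   "exclude_domains": [".openai.com"]}))
```

This requires Python 3.9 or later for str.removeprefix.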