Source policy
advanced_settings.source_policy lets you constrain a search or a
research task to the sources you trust. It applies to /v1/search,
/v1/task (via the underlying searches it runs), and any
/v1/monitors monitor that runs a depth: deep search underneath.
Fields
Filtering happens after the SERP fetch and before any LLM-based re-ranking or content extraction. Results dropped here do not consume SKU credits past the initial search.
Examples
Trust a short list of sources
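A minimal sketch of an allowlist, built as a Python dict. Only the advanced_settings.source_policy.include_domains path comes from this page. The host names are illustrative, and the rest of the request body (query text, authentication) is omitted because it is not described here.

```python
import json

# Allowlist: only results from these hosts survive the filter.
# The hosts below are illustrative placeholders, not recommendations.
allowlist_body = {
    "advanced_settings": {
        "source_policy": {
            "include_domains": ["openai.com", "arxiv.org"],
        }
    }
}

# Merge this fragment into the body you send to /v1/search.
print(json.dumps(allowlist_body, indent=2))
```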
Drop common low-signal hosts
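A minimal sketch of a blocklist, under the same assumptions: only the exclude_domains field name comes from this page, the hosts are illustrative, and the remainder of the request body is left out.

```python
import json

# Blocklist: results from these hosts are dropped before re-ranking.
# The hosts below are illustrative placeholders.
blocklist_body = {
    "advanced_settings": {
        "source_policy": {
            "exclude_domains": ["pinterest.com", "quora.com"],
        }
    }
}

print(json.dumps(blocklist_body, indent=2))
```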
Window by recency
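A minimal sketch of a date window. The after_date and before_date field names come from this page. The YYYY-MM-DD string format is an assumption, since the accepted date format is not stated here.

```python
import json
from datetime import date

# Date window: keep results published inside this range.
# ASSUMPTION: dates are sent as ISO 8601 calendar dates (YYYY-MM-DD).
window_body = {
    "advanced_settings": {
        "source_policy": {
            "after_date": date(2024, 1, 1).isoformat(),
            "before_date": date(2024, 6, 30).isoformat(),
        }
    }
}

print(json.dumps(window_body, indent=2))
```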
You can combine all four fields (include_domains, exclude_domains, after_date, before_date). Order does not matter; filtering is deterministic.
How it interacts with other settings
- freshness (hour, day, week, month, year) is a Google query-time filter that runs before the SERP returns. Use it for rough recency. Use after_date/before_date for precise windows.
- category (news, research_paper, personal_site, …) adds Google’s tbm filter or refines the query string. It is unaffected by source_policy.
- objective drives re-ranking. Re-ranking sees only the rows that survived source_policy, so the policy is a strict pre-filter.
- limit is applied last. If source_policy drops every result, you get an empty array, not a 503.
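The ordering described above (policy filter, then re-ranking, then limit) can be sketched locally. This is an illustration of the ordering only, not Scout's implementation: run_pipeline, the keep predicate, the score function, and the row shape are all hypothetical stand-ins.

```python
from typing import Callable

Row = dict  # hypothetical result row, e.g. {"host": "a.com", "score": 5}

def run_pipeline(
    serp_rows: list[Row],
    keep: Callable[[Row], bool],    # stands in for source_policy
    score: Callable[[Row], float],  # stands in for objective-driven re-ranking
    limit: int,
) -> list[Row]:
    survivors = [row for row in serp_rows if keep(row)]  # strict pre-filter
    ranked = sorted(survivors, key=score, reverse=True)  # sees survivors only
    return ranked[:limit]                                # limit is applied last

rows = [
    {"host": "a.com", "score": 1},
    {"host": "b.com", "score": 9},
    {"host": "a.com", "score": 5},
    {"host": "a.com", "score": 3},
]

top = run_pipeline(rows, lambda r: r["host"] == "a.com", lambda r: r["score"], limit=2)
none_left = run_pipeline(rows, lambda r: False, lambda r: r["score"], limit=2)
print(top)        # the b.com row never reaches re-ranking, despite its high score
print(none_left)  # an empty list, mirroring the empty-array behavior
```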
When publish dates are missing
after_date and before_date look at the publish_date field on each
result. Scout extracts that from the page’s meta tags and structured
data. Some sites do not publish a usable date; those results are kept
by default. If you want to be strict, drop the host from
include_domains or post-filter on your side.
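If you choose to post-filter on your side, one possible sketch follows. It assumes each result is a dict whose publish_date is either missing, null, or an ISO 8601 string. It also treats both bounds as inclusive, which this page does not specify. strict_date_filter is a hypothetical helper name.

```python
from datetime import date
from typing import Optional

def strict_date_filter(
    results: list[dict],
    after: Optional[date] = None,
    before: Optional[date] = None,
) -> list[dict]:
    """Drop results with no publish_date, then apply the window client-side."""
    kept = []
    for row in results:
        raw = row.get("publish_date")
        if not raw:
            continue  # strict mode: undated results are discarded
        # ASSUMPTION: ISO 8601 string; only the leading YYYY-MM-DD is used.
        published = date.fromisoformat(raw[:10])
        if after is not None and published < after:
            continue
        if before is not None and published > before:
            continue
        kept.append(row)
    return kept

results = [
    {"url": "a", "publish_date": "2024-03-10"},
    {"url": "b", "publish_date": None},
    {"url": "c"},
    {"url": "d", "publish_date": "2023-12-31T08:00:00Z"},
    {"url": "e", "publish_date": "2024-07-01"},
]
dated = strict_date_filter(results, after=date(2024, 1, 1), before=date(2024, 6, 30))
print([row["url"] for row in dated])
```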
Edge cases
- An empty include_domains: [] is treated the same as omitting the field: no filter is applied. Pass at least one host to enable the allowlist.
- Hosts are matched case-insensitively, and a leading www. is ignored.
- A leading dot (.openai.com) is accepted and means the same thing as openai.com.
- Conflicting policy (include_domains: ["openai.com"] and exclude_domains: ["openai.com"]) results in zero rows; the exclude wins.
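The edge cases above can be mirrored in a small local sketch, which is useful for checking a policy before sending it. normalize_host and host_allowed are hypothetical helpers, not part of the API. Subdomain handling is not described on this page, so the sketch assumes exact matching after normalization.

```python
from typing import Optional

def normalize_host(host: str) -> str:
    """Lowercase, then drop a leading dot and a leading 'www.'."""
    host = host.strip().lower()
    if host.startswith("."):
        host = host[1:]
    if host.startswith("www."):
        host = host[4:]
    return host

def host_allowed(
    host: str,
    include_domains: Optional[list[str]] = None,
    exclude_domains: Optional[list[str]] = None,
) -> bool:
    target = normalize_host(host)
    include = {normalize_host(h) for h in (include_domains or [])}
    exclude = {normalize_host(h) for h in (exclude_domains or [])}
    if target in exclude:
        return False  # the exclude wins over any include
    if include and target not in include:
        return False  # a non-empty allowlist is enforced
    return True       # an empty or omitted allowlist applies no filter

print(host_allowed("WWW.OpenAI.com", include_domains=[".openai.com"]))
print(host_allowed("example.org", include_domains=[]))
print(host_allowed("openai.com", ["openai.com"], ["openai.com"]))
```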