Base filters
Base filters are the field/value pairs at the very front of a
search, before the first pipe (|). They are the arguments to the
implied search command that starts every search, and they decide which
events Splunk pulls off disk. Making them specific is one of the
highest-leverage things you can do for performance.
| Field | What it selects |
|---|---|
| index | Which index to read from. The biggest performance lever. |
| sourcetype | The format/type of the data (e.g. access_combined). |
| source | The file, directory, or input the event came from. |
| host | The device the event originated on. |
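As a minimal sketch of where base filters sit in a complete search: the first line below is the base filter, and everything after the pipe operates only on the events it returned. The index and sourcetype values are illustrative, and the stats clause is just a placeholder for whatever processing follows.

```spl
index=web sourcetype=access_combined
| stats count by host
```

In the search bar the leading search command is implied, so this should behave the same as writing search index=web sourcetype=access_combined | stats count by host.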
index
Data lives in indexes. By default everything goes to main, but
well-run deployments partition data — web logs in one index, firewall
logs in another. Naming the index means Splunk opens only that
index's buckets:
index=web
index=security
If you don't specify an index, Splunk searches the default indexes
configured for your role, which is almost always more data than you
need. Make index= the first thing you type.
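If you are not sure which index holds the data you want, a quick inventory can help. The sketch below assumes your role is allowed to search the indexes in question and that you run it over a short time range. It counts events per index and source type using indexed fields rather than reading raw events.

```spl
| tstats count where index=* by index, sourcetype
```

Note that index=* does not match internal indexes whose names start with an underscore, so this lists only your regular event indexes.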
sourcetype
The source type classifies the data format. Events from different
sources often share a source type — source=/var/log/messages and a
syslog input on source=UDP:514 can both be
sourcetype=linux_syslog.
index=web sourcetype=access_combined
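To see how several sources can feed one source type in your own data, a search along these lines lists each source that contributes to it. The index name os is an assumption for illustration; substitute whichever index holds your syslog data.

```spl
index=os sourcetype=linux_syslog
| stats count by source
```

If both inputs described above feed this source type, the results would contain one row for /var/log/messages and one for the UDP input.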
source and host
source is the specific input path; host is the originating device.
Use them to drill into a single file or machine:
index=os host=web-prod-03 source=/var/log/secure
Combining them
Base filters are AND-ed together implicitly. Stack them to pin down
exactly the data you want:
index=web sourcetype=access_combined host=web-prod-*
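As a sketch of how the implicit AND works, the first search below spells out the operators that the stacked form leaves implied, so the two should match the same events. The second shows one way to allow alternatives for a single field while the other filters stay AND-ed. The host names web-prod-01 and web-prod-02 are hypothetical.

```spl
index=web AND sourcetype=access_combined AND host=web-prod-*

index=web sourcetype=access_combined (host=web-prod-01 OR host=web-prod-02)
```

Boolean operators in SPL must be uppercase; a lowercase and or or is treated as a literal search term.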
Why this matters
From Splunk's optimization guidance: partition data into separate
indexes if you'll rarely search across them, and search as
specifically as you can. The base filter stage is where both of those
pay off — narrow index + sourcetype means later commands work on a
fraction of the data.
Next: time modifiers — often an even bigger lever than the index.