Cypher Language Reference

Graph schema, Cypher syntax, functions, procedures, threat intelligence, and best practices for querying WhisperGraph.

Updated April 2026

Cypher Language Reference Documentation

Schema guide

Node labels

LabelDescriptionExample values
HOSTNAMEFully-qualified domain names, subdomains, mail server nameswww.google.com, ns1.cloudflare.com
IPV4IPv4 addresses1.1.1.1, 142.250.64.100
IPV6IPv6 addresses2606:4700::6810:84e5
PREFIXIP CIDR blocks142.250.64.0/24
REGISTERED_PREFIXRIR-allocated IP blocks (virtual, resolved at query time)1.1.1.0/24
ANNOUNCED_PREFIXBGP-announced prefixes (virtual, resolved at query time)104.16.128.0/20
ASNAutonomous system numbersAS13335, AS15169
ASN_NAMEHuman-readable AS organization namesCLOUDFLARENET - Cloudflare, Inc.
TLDTop-level domainscom, net, org, io
TLD_OPERATORTLD registry operatorsVeriSign, Inc.
REGISTRARDomain registrars (IANA ID format)iana:292 (MarkMonitor)
EMAILWHOIS contact email addressesdomains@cloudflare.com
PHONEWHOIS contact phone numbers (E.164)+14158675825
ORGANIZATIONOrganizations from WHOIS recordscloudflare hostmaster
CITYGeoIP city with country codeMountain View, US
COUNTRYISO 3166-1 alpha-2 country codesUS, DE, AU
RIRRegional Internet RegistriesARIN, RIPENCC, APNIC, LACNIC, AFRINIC
DNSSEC_ALGORITHMDNSSEC signing algorithmsECDSAP256SHA256, RSASHA256
FEED_SOURCEThreat intelligence feed sources (virtual)Spamhaus DROP, Feodo Tracker
CATEGORYThreat feed categories (virtual)C2 Servers, Phishing

Edge types

DNS resolution

Edge typeFromToDescription
RESOLVES_TOHOSTNAMEIPV4/IPV6DNS A/AAAA records
CHILD_OFHOSTNAMEHOSTNAME/TLDDomain hierarchy (sub.example.com -> example.com -> com)
ALIAS_OFHOSTNAMEHOSTNAMECNAME records
NAMESERVER_FORHOSTNAMEHOSTNAMENS delegation (nameserver serves the target domain)
MAIL_FORHOSTNAMEHOSTNAMEMX records (mail server handles mail for the target domain)
SIGNED_WITHHOSTNAMEDNSSEC_ALGORITHMDNSSEC signing algorithm

BGP and routing

Edge typeFromToDescription
ANNOUNCED_BYIPV4/PREFIXANNOUNCED_PREFIXBGP announcement (virtual, resolved at query time)
ROUTESASNANNOUNCED_PREFIXASN routes this prefix (virtual)
BELONGS_TOIPV4PREFIX/REGISTERED_PREFIX/ANNOUNCED_PREFIXIP membership in a prefix block
PEERS_WITHASNASNBGP peering session (virtual)
HAS_NAMEASNASN_NAMENetwork operator name (virtual)
HAS_COUNTRYASN/PREFIXCOUNTRYCountry assignment

WHOIS and registration

Edge typeFromToDescription
HAS_REGISTRARHOSTNAMEREGISTRARCurrent domain registrar
PREV_REGISTRARHOSTNAMEREGISTRARPrevious domain registrar
REGISTERED_BYHOSTNAME/REGISTERED_PREFIXORGANIZATIONWHOIS registrant organization
HAS_EMAILHOSTNAMEEMAILWHOIS contact email
HAS_PHONEHOSTNAMEPHONEWHOIS contact phone

GeoIP

Edge typeFromToDescription
LOCATED_INIPV4CITYGeoIP city location
LOCATED_INCITYCOUNTRYCity to country mapping

Threat intelligence

Edge typeFromToDescription
LISTED_INIPV4/HOSTNAMEFEED_SOURCEIP or hostname appears in this threat feed (virtual)
BELONGS_TOFEED_SOURCECATEGORYFeed classified under this category

Web

Edge typeFromToDescription
LINKS_TOHOSTNAMEHOSTNAMEHyperlink between hostnames (from web crawl data)

SPF

Edge typeFromToDescription
SPF_INCLUDEHOSTNAMEHOSTNAMESPF include: mechanism
SPF_IPHOSTNAMEPREFIXSPF ip4: / ip6: mechanism
SPF_AHOSTNAMEHOSTNAMESPF a: mechanism
SPF_MXHOSTNAMEHOSTNAMESPF mx: mechanism
SPF_REDIRECTHOSTNAMEHOSTNAMESPF redirect= modifier
SPF_EXISTSHOSTNAMEHOSTNAMESPF exists: mechanism

Other

Edge typeFromToDescription
OPERATESTLD_OPERATORTLDRegistry operator manages this TLD (virtual)

Entity relationship diagram

Entity Relationship DiagramEntity Relationship Diagram

Solid lines are physical edges stored on disk. Dashed lines are virtual edges computed at query time from live infrastructure and threat intelligence data.

Solid lines are physical edges stored on disk. Dashed lines are virtual edges computed at query time from live infrastructure and threat intelligence data.

Multi-hop path patterns

These are the most common traversal chains through the graph.

Domain to network owner:

HOSTNAME -[:RESOLVES_TO]-> IPV4 -[:ANNOUNCED_BY]-> ANNOUNCED_PREFIX -[:ROUTES]-> ASN -[:HAS_NAME]-> ASN_NAME

Domain to nameservers:

HOSTNAME(ns) -[:NAMESERVER_FOR]-> HOSTNAME(domain)

Domain to mail servers:

HOSTNAME(mx) -[:MAIL_FOR]-> HOSTNAME(domain)

IP to GeoIP location:

IPV4 -[:LOCATED_IN]-> CITY -[:LOCATED_IN]-> COUNTRY

IP to threat feeds:

IPV4 -[:LISTED_IN]-> FEED_SOURCE -[:BELONGS_TO]-> CATEGORY

Domain WHOIS chain:

HOSTNAME -[:HAS_REGISTRAR]-> REGISTRAR
HOSTNAME -[:HAS_EMAIL]-> EMAIL
HOSTNAME -[:REGISTERED_BY]-> ORGANIZATION

DNS hierarchy:

HOSTNAME -[:CHILD_OF]-> HOSTNAME(parent) -[:CHILD_OF]-> TLD

RIR allocation chain:

IPV4 -[:BELONGS_TO]-> REGISTERED_PREFIX -[:REGISTERED_BY]-> ORGANIZATION

Language reference

MATCH

Basic pattern matching. Always anchor your starting node with {name: "value"} on large labels like HOSTNAME and IPV4.

MATCH (n:HOSTNAME {name: "www.google.com"}) RETURN n.name
MATCH (h:HOSTNAME {name: "www.google.com"})-[:RESOLVES_TO]->(ip:IPV4)
RETURN h.name, ip.name LIMIT 10

OPTIONAL MATCH

Returns null for unmatched patterns instead of dropping the row. Use this for WHOIS fields and other sparse data where not every node has every relationship.

MATCH (h:HOSTNAME {name: "www.google.com"})
OPTIONAL MATCH (h)-[:HAS_CERTIFICATE]->(c)
RETURN h.name, c.name

This returns one row with c.name as null, rather than zero rows.

WHERE clause

Comparison operators:

=, <>, <, >, <=, >=

MATCH (n:TLD) WHERE n.name <> "com" RETURN n.name LIMIT 3

String predicates:

-- Prefix search (fast, uses index)
MATCH (n:HOSTNAME) WHERE n.name STARTS WITH "www.googl" RETURN n.name LIMIT 5

-- Substring search (fast)
MATCH (n:HOSTNAME) WHERE n.name CONTAINS "cloudflare" RETURN n.name LIMIT 5

-- Suffix search (fast on HOSTNAME and TLD, slow on other labels)
MATCH (n:HOSTNAME) WHERE n.name ENDS WITH ".google.com" RETURN n.name LIMIT 5

Regular expressions:

MATCH (n:HOSTNAME) WHERE n.name =~ "www\\.google\\.com" RETURN n.name LIMIT 1

Regex uses full-match semantics (not substring). Nested quantifiers are rejected for safety. Maximum pattern length is 1000 characters. Prefer STARTS WITH, ENDS WITH, or CONTAINS instead of regex when possible.

Logical operators: AND, OR, NOT, XOR

WITH 5 AS x WHERE x > 3 AND x < 10 RETURN x

NULL checks: IS NULL, IS NOT NULL

WITH null AS x RETURN x IS NULL AS result

List membership: IN

MATCH (n:ASN) WHERE n.name IN ["AS13335", "AS15169"] RETURN n.name

RETURN

Projects columns from matched patterns. Supports aliases with AS and DISTINCT.

MATCH (h:HOSTNAME {name: "www.google.com"})-[:RESOLVES_TO]->(ip)
RETURN DISTINCT ip.name AS address
MATCH (h:HOSTNAME {name: "www.google.com"})-[:RESOLVES_TO]->(ip)
WITH h.name AS host, collect(ip.name) AS ips
RETURN host, ips

WITH

Pipes results between query stages. You can aggregate, filter, and reshape data mid-query.

MATCH (h:HOSTNAME {name: "www.google.com"})-[:RESOLVES_TO]->(ip)
WITH h.name AS host, count(ip) AS ipCount
RETURN host, ipCount
MATCH (sub:HOSTNAME)-[:CHILD_OF]->(h:HOSTNAME {name: "github.com"})
WITH sub LIMIT 5000
MATCH (sub)-[:RESOLVES_TO]->(ip:IPV4)
RETURN DISTINCT ip.name LIMIT 30

ORDER BY, LIMIT, SKIP

Sort results and paginate. Always use LIMIT on queries against large labels.

MATCH (n:TLD) RETURN n.name ORDER BY n.name ASC LIMIT 5
MATCH (n:COUNTRY) RETURN n.name ORDER BY n.name SKIP 2 LIMIT 3

Both SKIP N LIMIT M and LIMIT M SKIP N orderings are supported.

UNWIND

Turns a list into individual rows. This is the batch lookup pattern.

UNWIND ["google.com", "cloudflare.com", "microsoft.com"] AS domain
MATCH (h:HOSTNAME {name: domain})
OPTIONAL MATCH (h)-[:HAS_REGISTRAR]->(r:REGISTRAR)
RETURN domain, collect(DISTINCT r.name) AS registrars

You can pass dozens or hundreds of indicators in a single UNWIND list.

UNION

Combines results from multiple MATCH clauses. UNION deduplicates; UNION ALL keeps duplicates.

MATCH (n:TLD {name: "com"}) RETURN n.name
UNION
MATCH (n:TLD {name: "net"}) RETURN n.name

CALL subqueries

Correlated subqueries import variables with WITH.

MATCH (h:HOSTNAME {name: "www.google.com"})
CALL { WITH h MATCH (h)-[:RESOLVES_TO]->(ip) RETURN count(ip) AS ipCount }
RETURN h.name, ipCount

CALL is also used to invoke procedures:

CALL explain("185.220.101.1")
CALL whisper.history("cloudflare.com")
CALL whisper.quota()
CALL db.labels()
CALL db.relationshipTypes()

EXPLAIN

Shows the query plan without executing the query.

EXPLAIN MATCH (h:HOSTNAME {name: "www.google.com"})-[:RESOLVES_TO]->(ip) RETURN ip.name
ProduceResult([ip.name])
  Project([ip.name])
    Expand(h-[RESOLVES_TO]->ip)
      NodeLookup(h={name:'www.google.com'}:HOSTNAME)

Use EXPLAIN to verify the engine is using an indexed lookup rather than a label scan. PROFILE returns the same plan plus row counts per operator.

CASE / WHEN

Conditional expressions.

RETURN CASE WHEN 1 > 0 THEN "positive" ELSE "negative" END AS result
MATCH (h:HOSTNAME {name: "cloudflare.com"})-[:RESOLVES_TO]->(ip:IPV4)
OPTIONAL MATCH (ip)-[:LISTED_IN]->(f:FEED_SOURCE)
RETURN ip.name,
       CASE WHEN f IS NOT NULL THEN "listed" ELSE "clean" END AS status
LIMIT 5

Nested CASE expressions are supported.

shortestPath

Finds the minimum-hop path between two nodes.

MATCH (a:HOSTNAME {name: "cloudflare.com"}), (b:HOSTNAME {name: "google.com"})
MATCH p = shortestPath((a)-[*1..6]-(b))
RETURN length(p) AS hops, [n IN nodes(p) | n.name] AS path

Always specify a bounded path length like [*1..6]. Unbounded paths may be slow or time out. allShortestPaths() is also supported and returns all paths of the shortest length.

EXISTS subqueries

MATCH (h:HOSTNAME {name: "www.google.com"})
WHERE EXISTS { (h)-[:RESOLVES_TO]->() }
RETURN h.name

The function-style EXISTS(expr) predicate is also supported. It returns true if the expression is non-null:

MATCH (h:HOSTNAME {name: "google.com"}) RETURN EXISTS(h.name) AS has_name

COUNT subqueries

MATCH (h:HOSTNAME {name: "www.google.com"})
RETURN h.name, COUNT { (h)-[:RESOLVES_TO]->() } AS ipCount

Note: COUNT{}, COLLECT{}, and EXISTS{} subqueries cannot be used directly in ORDER BY. Pre-compute in a WITH clause, then sort by the alias.

COLLECT subqueries

MATCH (h:HOSTNAME {name: "www.google.com"})
RETURN h.name, COLLECT { MATCH (h)-[:RESOLVES_TO]->(ip) RETURN ip.name } AS ips

List comprehensions

RETURN [x IN range(1, 10) WHERE x % 2 = 0] AS evens
WITH [1, 2, 3, 4, 5] AS nums
RETURN [x IN nums WHERE x > 2 | x * 10] AS filtered

Pattern comprehensions

MATCH (h:HOSTNAME {name: "www.google.com"})
RETURN h.name, [(h)-[:RESOLVES_TO]->(ip) | ip.name] AS ips

Function reference

Aggregation

FunctionDescriptionExample
count(*)Count rowsMATCH (n:TLD) RETURN count(*)
count(DISTINCT x)Count unique valuesMATCH (n:COUNTRY) RETURN count(DISTINCT n.name)
collect(x)Collect values into a listMATCH (n:RIR) RETURN collect(n.name)
sum(x)Sum numeric valuesUNWIND [1,2,3] AS x RETURN sum(x)
avg(x)AverageUNWIND [1,2,3] AS x RETURN avg(x)
min(x)MinimumUNWIND [3,1,4] AS x RETURN min(x)
max(x)MaximumUNWIND [3,1,4] AS x RETURN max(x)
stdev(x)Sample standard deviationUNWIND [1,2,3,4,5] AS x RETURN stdev(x)
stdevp(x)Population standard deviationUNWIND [1,2,3,4,5] AS x RETURN stdevp(x)
percentileDisc(x, p)Discrete percentileUNWIND [1,2,3,4,5] AS x RETURN percentileDisc(x, 0.5)
percentileCont(x, p)Continuous percentileUNWIND [1,2,3,4,5] AS x RETURN percentileCont(x, 0.9)

Note: min() and max() work on both aggregated rows and list literals.

String

FunctionExampleResult
toUpper("hello")"HELLO"
toLower("HELLO")"hello"
trim(" hello ")"hello"
lTrim(" hello")"hello"
rTrim("hello ")"hello"
replace("hello world", "world", "cypher")"hello cypher"
substring("hello world", 6)"world"
substring("hello world", 0, 5)"hello"
split("a.b.c", ".")["a", "b", "c"]
left("hello", 3)"hel"
right("hello", 3)"llo"
reverse("hello")"olleh"
size("hello")5

Numeric

FunctionExampleResult
abs(-42)42
ceil(3.14)4.0
floor(3.99)3.0
round(3.5)4
sign(-42)-1
rand()Random float 0..1
log(e())1.0
log10(100)2.0
exp(1)2.718...
sqrt(144)12.0
e()2.718...
pi()3.14159...

Arithmetic operators: +, -, *, /, %, ^ (exponent), + (string concatenation).

Trigonometric

sin(), cos(), tan(), asin(), acos(), atan(), atan2(), degrees(), radians()

All accept and return radians, except degrees() which converts radians to degrees, and radians() which converts degrees to radians.

Geospatial

FunctionDescription
point({latitude: 37.77, longitude: -122.42})Create a geographic point
point.distance(p1, p2)Haversine distance in meters between two points

Collection

FunctionExampleResult
head([1,2,3])1
last([1,2,3])3
tail([1,2,3])[2,3]
size([1,2,3])3
reverse([1,2,3])[3,2,1]
range(1, 5)[1,2,3,4,5]
range(0, 10, 2)[0,2,4,6,8,10]
keys({a: 1, b: 2})["a", "b"]

List predicates

FunctionExampleResult
all(x IN list WHERE pred)all(x IN [2,4,6] WHERE x % 2 = 0)true
any(x IN list WHERE pred)any(x IN [1,2,3] WHERE x > 2)true
none(x IN list WHERE pred)none(x IN [1,2,3] WHERE x > 5)true
single(x IN list WHERE pred)single(x IN [1,2,3] WHERE x = 2)true
reduce(acc=init, x IN list | expr)reduce(t=0, x IN [1,2,3,4,5] | t+x)15

Node and relationship

FunctionDescription
labels(n)Returns the list of labels on a node, e.g. ["HOSTNAME"]
type(r)Returns edge type name, e.g. "RESOLVES_TO"
id(n)Returns node ID
properties(n)Returns all properties as a map
startNode(r)Source node of a relationship
endNode(r)Target node of a relationship
nodes(p)All nodes in a path
relationships(p)All relationships in a path
length(p)Number of edges in a path

Type conversion

FunctionExampleResult
toString(42)"42"
toInteger("42")42
toFloat("3.14")3.14
toBoolean("true")true
toIntegerList(["1","2"])[1, 2]
toFloatList(["1.1","2.2"])[1.1, 2.2]
toStringList([1,2])["1", "2"]
toBooleanList(["true","false"])[true, false]
isEmpty("")true
coalesce(null, null, 42)42
randomUUID()UUID string

Date and time

FunctionDescription
datetime()Current UTC datetime
date()Current date
timestamp()Current epoch milliseconds
datetime("2024-01-15T10:30:00Z")Parse an ISO 8601 datetime string
duration("P1Y2M3D")Create a duration from ISO 8601
duration.between(dt1, dt2)Duration between two datetimes

Property access on datetime values: .year, .month, .day, .hour, .minute, .second, .epochMillis, .epochSeconds.


Features

Threat intelligence

Whisper Graph indexes 40+ threat intelligence feeds across 18 categories. This data comes from whisper-feeds, a companion service that aggregates live threat feed data from public and commercial sources, with hourly incremental and daily full refresh cycles.

The threat intelligence layer works like this:

  • IPs and hostnames that appear in threat feeds are connected to FEED_SOURCE nodes via LISTED_IN virtual edges.
  • Each FEED_SOURCE belongs to one or more CATEGORYs (e.g., "C2 Servers", "Phishing", "General Blacklists").
  • ASN nodes may carry threat properties like maxThreatScore and overallThreatLevel.
  • The explain() procedure computes a composite score that factors in feed count, feed weights, recency, and network density.

Querying threat feeds directly:

MATCH (ip:IPV4 {name: "185.220.101.1"})-[:LISTED_IN]->(f:FEED_SOURCE)
RETURN ip.name, f.name
[
  {"ip.name": "185.220.101.1", "f.name": "Dan Tor Exit"},
  {"ip.name": "185.220.101.1", "f.name": "FireHOL Level 2"},
  {"ip.name": "185.220.101.1", "f.name": "Spamhaus DROP"}
]

Scored threat assessment with explain():

explain() accepts IPs, domains, ASNs, and CIDR ranges.

CALL explain("185.220.101.1")
[{
  "indicator": "185.220.101.1",
  "type": "ip",
  "found": true,
  "score": 5.28,
  "level": "INFO",
  "explanation": "185.220.101.1 is listed in 3 threat feed(s). Score 5.3 (Informational - minimal risk).",
  "factors": [
    "Listed in 3 source(s) with combined weight 2.20",
    "Base score: 2.20 x log2(3 + 1) = 4.40",
    "Recency boost: x1.2 (last seen just now)"
  ]
}]
CALL explain("cloudflare.com")
[{
  "indicator": "cloudflare.com",
  "type": "domain",
  "found": true,
  "score": 0.0,
  "level": "NONE",
  "explanation": "Not listed in any threat intelligence feed"
}]
CALL explain("AS60729")
[{
  "indicator": "AS60729",
  "type": "asn",
  "found": true,
  "score": 0.0,
  "level": "NONE",
  "explanation": "AS60729 (TORSERVERS-NET, DE) has a reputation score of 48.5 (Suspicious)."
}]
CALL explain("185.220.101.0/24")
[{
  "indicator": "185.220.101.0/24",
  "type": "network",
  "found": true,
  "score": 0.0,
  "level": "MEDIUM",
  "explanation": "Network 185.220.101.0/24 contains 167 listed IPs across multiple threat feeds."
}]

The level field returns one of: NONE, INFO, LOW, MEDIUM, HIGH, CRITICAL.

Feed sources (all 40):

FeedCategory
AlienVault ReputationReputation
Binary Defense BanlistGeneral Blacklists
Blocklist.de AllGeneral Blacklists
Blocklist.de MailSpam
Blocklist.de SSHBrute Force
Botvrij DomainsMalicious Domains
Botvrij Dst IPsC2 Servers
Brute Force BlockerBrute Force
C2 Intel 30dC2 Servers
C2 TrackerC2 Servers
CERT.pl DomainsMalicious Domains
CINS ScoreGeneral Blacklists
Cloudflare Radar Top 1MPopularity/Trust
DNS RD AbuseGeneral Blacklists
Dan Tor ExitTOR Network
ET Compromised IPsGeneral Blacklists
Feodo TrackerC2 Servers
FireHOL Abusers 1dGeneral Blacklists
FireHOL AnonymousProxies
FireHOL Level 1General Blacklists
FireHOL Level 2General Blacklists
FireHOL Level 3General Blacklists
FireHOL WebClientGeneral Blacklists
GreenSnow BlacklistGeneral Blacklists
Hagezi LightAd/Tracking Blocklists
Hagezi ProAd/Tracking Blocklists
IPsumGeneral Blacklists
InterServer RBLGeneral Blacklists
MalwareBazaar RecentMalware Distribution
OpenPhish FeedPhishing
SSH Client AttacksBrute Force
SSH Password AuthBrute Force
SSL IP BlacklistGeneral Blacklists
Spamhaus DROPGeneral Blacklists
Spamhaus EDROPGeneral Blacklists
StevenBlack HostsAd/Tracking Blocklists
ThreatFox IOCsC2 Servers
Tor Exit NodesTOR Network
Tranco Top 1MPopularity/Trust
URLhaus RecentMalware Distribution

Categories (all 18):

Ad/Tracking Blocklists, Anonymization Infrastructure, Attack Sources, Brute Force, C2 Servers, General Blacklists, Malicious Domains, Malicious Infrastructure, Malware Distribution, Phishing, Popularity/Trust, Proxies, Reference Data, Reputation, Spam, TOR Network, Threat Intelligence, VPNs.

Domain history

The whisper.history() procedure returns timestamped WHOIS and BGP snapshots for an indicator.

CALL whisper.history("cloudflare.com")
[
  {
    "indicator": "cloudflare.com",
    "type": "domain",
    "queryTime": "2024-06-13 18:31:16",
    "createDate": "2009-02-17",
    "updateDate": "2024-01-09",
    "expiryDate": "2033-02-17",
    "registrar": "CloudFlare, Inc.",
    "nameServers": "..."
  },
  {
    "indicator": "cloudflare.com",
    "type": "domain",
    "queryTime": "2020-05-01 07:03:56",
    "createDate": "2009-02-17",
    "registrar": "Cloudflare, Inc."
  }
]

Supported indicator types: domain, IP, ASN, CIDR, hash.

History works on registrable domains (e.g., google.com), not subdomains (www.google.com). Subdomains return empty results.

Quota and plan info

CALL whisper.quota()

Returns your current plan tier, hourly query limit, usage count, maximum traversal depth, and concurrent query count.

Schema introspection

Discover what is in the graph without knowing the schema up front.

-- All node labels with counts
CALL db.labels()

Returns every label in the graph with its count, including virtual labels (REGISTERED_PREFIX, ANNOUNCED_PREFIX, FEED_SOURCE, CATEGORY).

-- All edge types with counts
CALL db.relationshipTypes()

Returns every edge type with its count, including virtual edge types (LISTED_IN, ANNOUNCED_BY, OPERATES).

-- All property keys
CALL db.propertyKeys()

Returns all known property keys in the graph.

-- Full schema summary
CALL db.schema()

Returns labels, relationship types, and property keys in a single response.

These are all instant lookups, regardless of graph size.


Best practices

General rules

  • Anchor your starting node. MATCH (n:HOSTNAME {name: "example.com"}) does an indexed lookup. MATCH (n:HOSTNAME) scans billions of nodes.
  • Always use LIMIT. Especially on traversals that could fan out (LINKS_TO, RESOLVES_TO on CDN IPs). Start small and increase if needed.
  • Use OPTIONAL MATCH for WHOIS fields. Not every domain has a registrar, email, or phone. A required MATCH on a missing field returns zero rows and hides other results.
  • Use count() before pulling large result sets. Check cardinality first to avoid unexpectedly large responses.
  • Use UNWIND for batch lookups. Pass lists of indicators in a single request rather than making one request per indicator.
  • Specify edge types explicitly. [:RESOLVES_TO] is faster than [r] because the engine does not need to check all edge types.
  • Use ANNOUNCED_BY for current BGP routing. Use BELONGS_TO for the registered RIR allocation. They return different prefix types and may give different results.
  • Anchor LINKS_TO queries. The web link graph is one of the largest datasets. Queries without an anchored starting node will time out.
  • Avoid CONTAINS with special characters. Characters like & cause slow full-text scans. Use STARTS WITH or exact match when possible.

Do this / not that

Do thisNot thatWhy
MATCH (h:HOSTNAME {name: "example.com"})MATCH (h:HOSTNAME) WHERE h.name = "example.com"Inline property gets an indexed lookup
Always add LIMITOpen-ended traversalsPrevents timeout on billion-scale labels
OPTIONAL MATCH for WHOIS fieldsMATCH for sparse relationshipsAvoids losing rows when fields are missing
MATCH (sub)-[:CHILD_OF]->(h {name: "x.com"})WHERE h.name ENDS WITH ".x.com"CHILD_OF uses an indexed edge; ENDS WITH scans
STARTS WITH "www.goo"=~ "www\\.goo.*"STARTS WITH uses the FST index; regex does not
Pre-compute in WITH, then ORDER BY aliasCOUNT{} or COLLECT{} in ORDER BYSubquery expressions in ORDER BY are not supported