Search & Indexing - Domain Specification

First Introduced: V1.2
Status: Specification Complete
Last Updated: 2025-10-25

Overview

Search & Indexing provides the query and ranking infrastructure that lets users, partners, and internal tools find relevant listings efficiently and accurately. This domain transforms operational data (Spaces, Attributes, Availability, Pricing, Tags) into materialized indexes designed for fast, filterable retrieval, whether searching for "4-bedroom oceanfront villas in Barbados" or syncing results for a partner channel feed.

The domain separates the concerns of search data preparation (indexing) from search execution (querying), enabling independent scaling of write-heavy reindexing operations and read-heavy query workloads.

Responsibilities

This domain IS responsible for:

Maintaining denormalized search indexes from authoritative domain data
Processing search queries with filters, facets, and sorting
Managing search relevance and ranking algorithms
Tracking search query logs for analytics and optimization
Supporting geospatial and full-text search capabilities
Ensuring search index freshness through event-driven updates
Providing faceted filtering and aggregation support

This domain is NOT responsible for:

Authoritative storage of Space, Pricing, or Availability data (→ respective domains)
User preference tracking or personalization (→ Analytics domain)
Content generation or description text (→ Content domain)
Access control enforcement (→ Authorization domain)
Search UI rendering (→ Frontend applications)

Relationships

Depends On:

Supply - Space and Unit entities to index
Content & Metadata - Descriptions, amenities, attributes, tags
Pricing - Price ranges for filtering
Availability - Calendar data for availability filters
Identity & Tenancy - Org/Account scoping

Depended On By:

Channels - Search results syndicated to partner feeds
Analytics - Query logs feed behavioral analytics
Public search APIs and internal admin tools

Related Domains:

System Architecture - Index storage technology choices (PostgreSQL FTS vs. Elasticsearch)

Core Concepts

Entity: SearchIndex

Purpose: Flattened, query-optimized representation of a Space or Unit designed for fast keyword and filter-based retrieval.

Key Attributes:

id (UUID, primary key)
space_id (UUID, foreign key → spaces.id, unique)
org_id (UUID, foreign key → organizations.id)
account_id (UUID, foreign key → accounts.id)
name (TEXT) - Space name for text search
headline (TEXT) - Short description
description_text (TEXT) - Full searchable description content
location_lat, location_lng (DECIMAL) - Geospatial coordinates
country (VARCHAR) - ISO country code
region (VARCHAR) - State/province/region
city (VARCHAR) - City name
tags (TEXT[]) - Array of tag keys
amenities (TEXT[]) - Array of amenity keys
price_min (INTEGER) - Minimum nightly rate in minor units (cents)
price_max (INTEGER) - Maximum nightly rate in minor units (cents)
capacity (INTEGER) - Maximum guest capacity
bedrooms (INTEGER) - Number of bedrooms
bathrooms (DECIMAL) - Number of bathrooms
property_type (VARCHAR) - villa | apartment | house | etc.
status (ENUM) - active | draft | inactive | deleted
featured (BOOLEAN) - Featured listing flag
search_vector (TSVECTOR) - PostgreSQL full-text search vector
last_indexed_at (TIMESTAMP) - When this record was last updated
source_version (INTEGER) - Version of source data indexed
created_at, updated_at (timestamps)

Relationships:

SearchIndex → Space (1:1, one index record per space)
SearchIndex → Org/Account (many-to-one for tenancy)

Lifecycle:

Created: When Space becomes active or searchable
Updated: On Space, Content, Pricing, or Availability changes (event-driven)
Deleted: When Space deleted or status changed to non-searchable

Business Rules:

Only Spaces with status='active' appear in public search results
Index must include org_id and account_id for authorization filtering
Full reindex required when schema changes
Stale indexes (>24h old) flagged for refresh

Entity: DiscoveryIndex

Purpose: Aggregated, query-ready snapshot combining availability, pricing calendars, and dynamic metadata for advanced filtering.

Key Attributes:

id (UUID, primary key)
space_id (UUID, foreign key → spaces.id, unique)
org_id (UUID, foreign key → organizations.id)
account_id (UUID, foreign key → accounts.id)
available_dates (DATE[]) - Array of available dates (next 180 days)
price_calendar (JSONB) - Date → price mapping: {"2025-11-01": 25000, "2025-11-02": 28000}
availability_score (DECIMAL) - Percentage available (0.0-1.0)
instant_bookable (BOOLEAN) - No owner approval required
min_stay_nights (INTEGER) - Minimum length of stay
max_stay_nights (INTEGER) - Maximum length of stay
check_in_days (INTEGER[]) - Day of week codes (0=Sunday, 6=Saturday)
attributes (JSONB) - Structured property attributes from Content domain
facets (JSONB) - Pre-computed facet values for filtering
rank_score (DECIMAL) - Computed relevance/quality score
last_availability_sync (TIMESTAMP) - When availability was last computed
last_pricing_sync (TIMESTAMP) - When pricing was last computed
created_at, updated_at (timestamps)

Relationships:

DiscoveryIndex → Space (1:1)
DiscoveryIndex → AvailabilityCalendar (derived, many-to-one)
DiscoveryIndex → RatePlan (derived, many-to-one)

Lifecycle:

Created: When Space activated for discovery
Updated: Nightly batch for availability; event-driven for pricing changes
Deleted: When Space removed from search

Business Rules:

available_dates refreshed daily for next 180-day window
price_calendar includes only base nightly rates (excludes fees)
availability_score computed as: (available days / 180)
Stale availability data (>48h) triggers warning flag
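
The availability_score rule above can be illustrated with a minimal Python sketch. It assumes the 180-day window starts on the current day (inclusive) and that duplicate or out-of-window dates are ignored; the function name and signature are illustrative, not part of the specification.

```python
from datetime import date, timedelta

WINDOW_DAYS = 180  # window length stated in the business rules


def availability_score(available_dates, today):
    """Return the fraction of the next WINDOW_DAYS days that are available.

    Assumption: the window is [today, today + 180 days).
    """
    window_end = today + timedelta(days=WINDOW_DAYS)
    in_window = {d for d in available_dates if today <= d < window_end}
    return len(in_window) / WINDOW_DAYS


if __name__ == "__main__":
    start = date(2025, 11, 1)
    first_90_days = [start + timedelta(days=i) for i in range(90)]
    print(availability_score(first_90_days, start))  # 0.5
```

Using a set keeps the score within 0.0-1.0 even if the upstream calendar reports the same date twice.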

Entity: SearchQuery

Purpose: Log of executed search queries for analytics, optimization, and debugging.

Key Attributes:

id (UUID, primary key)
org_id (UUID, foreign key → organizations.id, nullable for public searches)
user_id (UUID, foreign key → users.id, nullable for anonymous)
session_id (VARCHAR) - Session identifier for tracking
query_text (TEXT) - Freeform search text (if any)
filters (JSONB) - Applied filters: {"country": "MX", "bedrooms": {">=": 3}}
sort_by (VARCHAR) - price | capacity | relevance | featured
sort_order (ENUM) - asc | desc
page (INTEGER) - Pagination page number
page_size (INTEGER) - Results per page
result_count (INTEGER) - Total matching results
result_ids (UUID[]) - Space IDs returned (first page)
latency_ms (INTEGER) - Query execution time
clicked_result_id (UUID, nullable) - Which result user clicked
click_position (INTEGER, nullable) - Position in results (1-indexed)
converted_to_booking (BOOLEAN) - Whether search led to booking
search_source (VARCHAR) - web | mobile | api | partner
executed_at (TIMESTAMP)
created_at (timestamp)

Relationships:

SearchQuery → User (many-to-one, optional)
SearchQuery → Org (many-to-one, optional for scoped searches)
SearchQuery → Space (many-to-many via result_ids)

Lifecycle:

Created: On every search query execution
Updated: When user clicks result or converts to booking
Retained: 90 days for analytics; aggregated to metrics afterward

Business Rules:

Log all queries regardless of result count (including zero results)
Store click and conversion events for relevance tuning
Anonymize user_id after 30 days (see Data Retention); raw queries are retained for 90 days
Index performance data used for query optimization

Entity: SearchSynonym

Purpose: Manages query expansion rules to improve search recall by mapping alternative terms.

Key Attributes:

id (UUID, primary key)
org_id (UUID, foreign key → organizations.id, nullable for global)
term (VARCHAR, required) - Source search term
synonyms (TEXT[]) - Equivalent terms: ["pool", "swimming pool", "private pool"]
type (ENUM) - synonym | expansion | spelling_correction
language_code (VARCHAR, default 'en') - Language scope
is_bidirectional (BOOLEAN) - Apply synonym in both directions
is_active (BOOLEAN) - Enable/disable rule
created_by (UUID, foreign key → users.id)
created_at, updated_at (timestamps)

Relationships:

SearchSynonym → Org (many-to-one, optional for global rules)
SearchSynonym → User (many-to-one for audit)

Lifecycle:

Created: By admin or data team based on query analysis
Updated: When refining synonym sets or enabling/disabling
Deleted: Soft delete via is_active=false

Business Rules:

Global synonyms apply to all searches; org-specific override globals
Bidirectional synonyms create symmetric expansion
Maximum 20 synonyms per term to prevent explosion
Applied during query parsing before index lookup
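
As a rough illustration of these rules, the sketch below expands query tokens in Python. It makes several assumptions not stated in the specification: "override" is read as an org rule replacing the global rule for the same term, rules are plain dicts keyed by the SearchSynonym attribute names, and the query is already tokenized. The function names are hypothetical.

```python
MAX_SYNONYMS = 20  # cap from the business rules


def _index(rules):
    """Build term -> synonyms, adding reverse entries for bidirectional rules."""
    table = {}
    for rule in rules:
        if not rule["is_active"]:
            continue
        synonyms = rule["synonyms"][:MAX_SYNONYMS]
        table.setdefault(rule["term"], []).extend(synonyms)
        if rule["is_bidirectional"]:
            for synonym in synonyms:
                table.setdefault(synonym, []).append(rule["term"])
    return table


def expand(tokens, rules, org_id=None):
    """Return tokens plus their synonyms, preserving order without duplicates."""
    global_table = _index([r for r in rules if r["org_id"] is None])
    org_table = _index(
        [r for r in rules if org_id is not None and r["org_id"] == org_id]
    )
    table = {**global_table, **org_table}  # org entries replace global ones
    expanded = []
    for token in tokens:
        for candidate in [token, *table.get(token, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


RULES = [
    {"org_id": None, "term": "pool", "synonyms": ["swimming pool", "private pool"],
     "is_bidirectional": False, "is_active": True},
    {"org_id": None, "term": "wifi", "synonyms": ["internet"],
     "is_bidirectional": True, "is_active": True},
    {"org_id": "org-1", "term": "pool", "synonyms": ["plunge pool"],
     "is_bidirectional": False, "is_active": True},
]

if __name__ == "__main__":
    print(expand(["villa", "pool"], RULES))
    print(expand(["villa", "pool"], RULES, org_id="org-1"))
```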

Entity: Facet

Purpose: Defines filterable dimensions and their available values for faceted search interfaces.

Key Attributes:

id (UUID, primary key)
facet_key (VARCHAR, unique) - System identifier: country | bedrooms | amenities
display_name (VARCHAR) - Human-readable label
facet_type (ENUM) - categorical | numeric_range | boolean | date_range
source_field (VARCHAR) - SearchIndex field name
value_source (ENUM) - static | dynamic | computed
allowed_values (JSONB) - Predefined values for categorical facets
sort_order (INTEGER) - Display order in UI
is_filterable (BOOLEAN) - Appears in filter UI
is_sortable (BOOLEAN) - Can be used for sorting
aggregation_type (VARCHAR) - count | sum | avg | min | max
created_at, updated_at (timestamps)

Relationships:

Facet → SearchIndex (defines which fields are faceted)
Facet values computed dynamically from SearchIndex

Lifecycle:

Created: By system configuration or admin
Updated: When adding new facet dimensions
Static: Core facets (country, bedrooms) seeded at deployment

Business Rules:

Categorical facets show available value counts
Numeric range facets show min/max bounds
Facet values filtered by current query scope
Disabled facets are hidden from the UI but their data is preserved
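
The first two rules can be sketched in Python over an in-memory list of index documents. This is only an illustration of the counting logic; a real implementation would aggregate in the search backend. The helper names are assumptions, and documents are represented as plain dicts using SearchIndex field names.

```python
from collections import Counter


def categorical_counts(docs, field):
    """Count documents per value; array fields (tags, amenities) count each element."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue
        counts.update(value if isinstance(value, list) else [value])
    return dict(counts)


def numeric_bounds(docs, field):
    """Return (min, max) for a numeric field, or None when no document has it."""
    values = [doc[field] for doc in docs if doc.get(field) is not None]
    return (min(values), max(values)) if values else None


DOCS = [
    {"country": "BB", "bedrooms": 4, "amenities": ["pool", "wifi"]},
    {"country": "BB", "bedrooms": 2, "amenities": ["wifi"]},
    {"country": "MX", "bedrooms": 3, "amenities": []},
]

if __name__ == "__main__":
    print(categorical_counts(DOCS, "country"))
    print(numeric_bounds(DOCS, "bedrooms"))
```

Passing only the documents that match the current filters gives facet values scoped to the current query, as the third rule requires.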

Workflows

Workflow: Reindex Space

Trigger: Space, Content, Pricing, or Availability update event

Receive event from domain event bus
Fetch source data:
- Space entity (name, location, status, capacity)
- Description text from Content domain
- Amenities and Attributes from Content domain
- Price min/max from active RatePlan
- Tags from Space-Tag relationships
Transform to search document:
- Flatten nested structures
- Generate full-text search vector
- Compute derived fields (price range, availability score)
Upsert to SearchIndex:
- Update if exists, insert if new
- Set last_indexed_at timestamp
- Increment source_version
Update DiscoveryIndex (if applicable):
- Refresh availability flags if calendar changed
- Update price calendar if rates changed
Emit reindex.completed event
Log to ReindexTask table with status and duration

Postconditions:

SearchIndex record reflects current source data state
Index marked with timestamp for freshness monitoring
Changes queryable within 5 seconds of the source update (eventual consistency)
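
The transform and upsert steps of this workflow might look roughly like the Python sketch below. The shapes of the source payloads (space, content, rate_plans) are hypothetical, the index is modeled as a dict keyed by space_id, and generation of the full-text search vector is omitted because it happens in PostgreSQL.

```python
from datetime import datetime, timezone


def build_search_document(space, content, rate_plans, tags):
    """Flatten source data into a SearchIndex-shaped dict (payload shapes assumed)."""
    prices = [plan["nightly_rate"] for plan in rate_plans if plan.get("active")]
    location = space.get("location", {})
    return {
        "space_id": space["id"],
        "name": space["name"],
        "status": space["status"],
        "capacity": space.get("capacity"),
        "location_lat": location.get("lat"),
        "location_lng": location.get("lng"),
        "country": location.get("country"),
        "description_text": content.get("description", ""),
        "amenities": sorted(content.get("amenities", [])),
        "tags": sorted(tags),
        "price_min": min(prices) if prices else None,
        "price_max": max(prices) if prices else None,
    }


def upsert(index, doc, now=None):
    """Insert or replace the record, bumping source_version and last_indexed_at."""
    now = now or datetime.now(timezone.utc)
    previous = index.get(doc["space_id"])
    version = previous["source_version"] + 1 if previous else 1
    record = {**doc, "source_version": version, "last_indexed_at": now}
    index[doc["space_id"]] = record
    return record
```

Keying the upsert on space_id preserves the one-record-per-space rule, so retrying the same event replaces the record rather than duplicating it.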

Workflow: Execute Search Query

Trigger: API request with search parameters

Parse and validate request:
- Extract query text, filters, sort, pagination
- Validate filter syntax and values
- Apply authorization scope (org_id, account_id)
Apply query expansion:
- Look up synonyms for query terms
- Expand to include equivalent terms
Build search query:
- Full-text match on search_vector (if query text provided)
- Apply filters: country, region, price range, bedrooms, amenities, tags
- Apply availability filter (if date range specified)
- Apply sort order
Execute against SearchIndex + DiscoveryIndex:
- PostgreSQL: Use FTS + GIN indexes + JSONB operators
- Elasticsearch: Use query DSL with filters and aggregations
Compute facets:
- Aggregate available values for each facet dimension
- Count results per facet value
Apply pagination:
- Offset and limit results
Return results:
- Space IDs, names, hero images, summary data
- Facet counts for UI
- Total result count
Log query to SearchQuery table:
- Capture filters, result count, latency
- Associate with user/session if available

Postconditions:

Results returned to caller within SLA (<200ms P95)
Query logged for analytics
Facets reflect current data state
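
To make the filter, sort, and pagination steps concrete, here is a small in-memory Python sketch. It interprets the filters JSON shown for SearchQuery ({"country": "MX", "bedrooms": {">=": 3}}). Only the ">=" operator appears in this specification; the other operators, the "all listed values must be present" rule for array fields, and the function names are assumptions. A production query would run against the index backend instead.

```python
import operator

# Only ">=" is shown in the spec; the remaining operators are assumed.
OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt,
       "<": operator.lt, "=": operator.eq}


def matches(doc, filters):
    """Return True when the document satisfies every filter."""
    for field, condition in filters.items():
        value = doc.get(field)
        if value is None:
            return False
        if isinstance(condition, dict):  # range filter, e.g. {">=": 3}
            if not all(OPS[op](value, bound) for op, bound in condition.items()):
                return False
        elif isinstance(value, list):  # array field: require all wanted values
            wanted = condition if isinstance(condition, list) else [condition]
            if not set(wanted) <= set(value):
                return False
        elif value != condition:
            return False
    return True


def search(docs, filters, sort_by="price_min", descending=False, page=1, page_size=20):
    """Filter active documents, sort them, and return one page plus the total."""
    hits = [d for d in docs if d.get("status") == "active" and matches(d, filters)]
    hits.sort(key=lambda d: d[sort_by], reverse=descending)
    start = (page - 1) * page_size
    return {"total": len(hits), "results": hits[start:start + page_size]}


DOCS = [
    {"space_id": "a", "status": "active", "country": "MX", "bedrooms": 4,
     "price_min": 30000, "amenities": ["pool", "wifi"]},
    {"space_id": "b", "status": "active", "country": "MX", "bedrooms": 2,
     "price_min": 15000, "amenities": ["wifi"]},
    {"space_id": "c", "status": "draft", "country": "MX", "bedrooms": 5,
     "price_min": 50000, "amenities": ["pool"]},
    {"space_id": "d", "status": "active", "country": "MX", "bedrooms": 3,
     "price_min": 22000, "amenities": ["pool"]},
]
```

For example, search(DOCS, {"country": "MX", "bedrooms": {">=": 3}}) returns spaces d and a in ascending price order; the draft space c is excluded by the status check.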

Workflow: Nightly Availability Sync

Trigger: Scheduled job (runs daily at 2:00 AM UTC)

Select all Spaces with status='active'
For each Space:
- Query AvailabilityCalendar for next 180 days
- Identify available dates (no Bookings, Holds, or Blocks)
- Compute availability_score (available days / 180)
- Update DiscoveryIndex.available_dates array
- Set last_availability_sync timestamp
Batch commit updates (1000 spaces per transaction)
Emit availability_sync.completed event
Log metrics:
- Total spaces processed
- Duration
- Errors (if any)

Postconditions:

DiscoveryIndex reflects current availability state
Date-based availability filters accurate for next 180 days
Monitoring alerted if sync exceeds SLA (>30 minutes)
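
The batching and metrics steps of the sync can be sketched in Python as follows. The sync_one and commit callables are hypothetical stand-ins for the per-Space computation and the database transaction. Per-Space errors are collected rather than aborting the batch, which is an assumption about the desired failure behavior.

```python
from itertools import islice

BATCH_SIZE = 1000  # spaces per transaction, as stated in the workflow


def batches(items, size=BATCH_SIZE):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while chunk := list(islice(iterator, size)):
        yield chunk


def run_sync(space_ids, sync_one, commit):
    """Process spaces batch by batch and return simple run metrics."""
    processed, errors = 0, []
    for batch in batches(space_ids):
        updates = []
        for space_id in batch:
            try:
                updates.append(sync_one(space_id))
            except Exception as exc:  # record the failure and keep going
                errors.append((space_id, str(exc)))
        commit(updates)  # one transaction per batch
        processed += len(batch)
    return {"processed": processed, "errors": errors}
```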

Business Rules

Index Freshness: SearchIndex updates must complete within 5 seconds of source data change (P95)
Availability Window: DiscoveryIndex tracks the next 180 days; past dates are pruned
Tenant Isolation: All search queries filtered by authorized org_id/account_id scope
Public vs. Private: Only Spaces with status='active' appear in public search
Result Limits: Maximum 1000 results per query; use pagination for larger sets
Facet Limits: Maximum 100 unique values per facet dimension shown in UI
Query Timeout: Search queries must complete within 2 seconds or return partial results
Reindex Priority: Price/availability changes processed within 1 minute; content changes within 5 minutes
Synonym Precedence: Org-specific synonyms override global synonyms
Zero Results: Log all zero-result queries for synonym/expansion tuning

Version Progression

MVP.0: OUT OF SCOPE (Deferred)

Status: Search functionality deferred to post-launch

Rationale:

Initial launch supports direct listing URLs and admin panel
Manual curation replaces algorithmic discovery
Reduces MVP complexity and database load

Workaround:

Admin lists all properties in dashboard
Shared listing links via direct Space URLs
Manual recommendation by TVL team

V1.2: Full-Text Search with PostgreSQL

Status: Initial Search Implementation

Features:

PostgreSQL full-text search (FTS) using tsvector and GIN indexes
Basic filters: country, region, price range, bedrooms, bathrooms, capacity
Tag-based filtering (e.g., "Oceanfront", "Private Chef")
Amenity filtering (e.g., pool, wifi, parking)
Simple text search on name, headline, description
Event-driven reindexing from domain events
Query logging to SearchQuery table
Nightly availability sync to DiscoveryIndex

Entities Implemented:

search_indexes (core denormalized search table)
discovery_indexes (availability and pricing snapshots)
search_queries (query logs for analytics)

Search Backend:

PostgreSQL FTS for text search
JSONB and array operators for filters
Option to migrate to Elasticsearch if performance requires

Indexing Strategy:

Event-driven updates via domain event bus
Nightly batch sync for availability
Incremental updates for price changes

API Endpoints:

GET /api/search - Execute search with filters
GET /api/search/facets - Get available facet values
POST /admin/search/reindex - Trigger manual reindex

Performance Targets:

P50 latency: <100ms
P95 latency: <200ms
Index lag: <5 seconds for critical updates

V2.0: Faceted Search & Geospatial Queries

Status: Advanced Search Features

New Features:

Geospatial search:
- Radius-based queries ("within 10km of coordinates")
- Polygon boundary searches ("listings in this area")
- Geohash indexing for spatial performance
- PostGIS or Elasticsearch geo queries
Faceted filtering:
- Pre-computed facet counts (avoid expensive aggregations)
- Facet values scoped to current filter context
- Multi-select facets with AND/OR logic
- Numeric range sliders (price, capacity)
Advanced text search:
- Multi-language stemming and tokenization
- Phrase matching and proximity search
- Boosted fields (name > headline > description)
- Fuzzy matching for typo tolerance
Search relevance tuning:
- Configurable field weights
- Freshness boosting (recently updated listings)
- Quality signals (booking rate, reviews)
- Featured listing promotion
Synonym management:
- search_synonyms table populated with common variants
- Admin UI for synonym editing
- A/B testing synonym rule impact
Real-time availability:
- Per-day availability and pricing in DiscoveryIndex
- Instant calendar-aware search (no nightly delay)
- Blocking rule awareness (min/max stay, check-in days)

New Entities:

facets (facet dimension definitions)
search_synonyms (query expansion rules)
search_index_queue (prioritized reindex job queue)

Performance Targets:

P95 latency: <150ms (with facets)
Support 10,000+ active listings
Real-time index updates (<1 second lag)

Migration to Elasticsearch:

Full index rebuild with Elasticsearch schema
Dual-write strategy during migration
A/B testing PostgreSQL vs. Elasticsearch performance
Rollback plan if issues detected

Future Enhancements

V2.1: Personalized Search

User preference tracking (past searches, bookings)
Collaborative filtering recommendations
Behavioral ranking adjustments
Saved searches and alerts

V2.2: Vector Search & Embeddings

Semantic search using embeddings ("luxury modern villa")
Image similarity search
AI-generated search suggestions
Natural language query parsing

V2.3: Multi-Channel Search

Channel-specific ranking and filtering
Partner API endpoints with custom facets
White-label search configuration per brand
Cross-org federated search (optional)

V3.0: Predictive & Analytics-Driven

Predicted availability based on booking patterns
Dynamic pricing integration (show real-time rates)
Search-to-book conversion optimization
Automated A/B testing of ranking algorithms

Operational Notes

Indexing Strategy

Event-Driven Updates:

Subscribe to domain events: space.updated, pricing.changed, availability.blocked, content.published
Queue reindex jobs with priority (availability > pricing > content)
Batch process jobs to reduce database load
Idempotent reindex operations (safe to retry)
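
The priority ordering above (availability > pricing > content) can be sketched with a binary heap in Python. The class name and the numeric priority values are illustrative assumptions. Jobs of equal priority are served in arrival order through a monotonically increasing counter.

```python
import heapq
import itertools

# Lower number = processed first; ordering follows availability > pricing > content.
PRIORITY = {"availability": 0, "pricing": 1, "content": 2}


class ReindexQueue:
    """In-memory priority queue for reindex jobs (sketch only)."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order

    def push(self, space_id, kind):
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._counter), space_id, kind))

    def pop(self):
        _, _, space_id, kind = heapq.heappop(self._heap)
        return space_id, kind

    def __len__(self):
        return len(self._heap)
```

Because reindex operations are idempotent, a job that fails after being popped can simply be pushed again.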

Batch Sync:

Nightly full sync of availability (DiscoveryIndex)
Weekly full reindex verification (detect drift)
Monthly search index rebuild (schema evolution)

Monitoring:

Track reindex queue depth and lag
Alert on failed reindex jobs
Monitor index freshness per Space
Dashboard: avg index age, reindex throughput

Search Performance

Database Indexes (PostgreSQL):

CREATE INDEX idx_search_indexes_org_account ON search_indexes(org_id, account_id);
CREATE INDEX idx_search_indexes_status ON search_indexes(status) WHERE status = 'active';
-- Requires the cube and earthdistance extensions, which provide ll_to_earth()
CREATE INDEX idx_search_indexes_location ON search_indexes USING gist(ll_to_earth(location_lat, location_lng));
CREATE INDEX idx_search_indexes_fts ON search_indexes USING gin(search_vector);
CREATE INDEX idx_search_indexes_price ON search_indexes(price_min, price_max);
CREATE INDEX idx_search_indexes_capacity ON search_indexes(bedrooms, capacity);
CREATE INDEX idx_search_indexes_tags ON search_indexes USING gin(tags);
CREATE INDEX idx_search_indexes_amenities ON search_indexes USING gin(amenities);

CREATE INDEX idx_discovery_indexes_space ON discovery_indexes(space_id);
CREATE INDEX idx_discovery_indexes_available_dates ON discovery_indexes USING gin(available_dates);
CREATE INDEX idx_discovery_indexes_facets ON discovery_indexes USING gin(facets);

CREATE INDEX idx_search_queries_executed ON search_queries(executed_at DESC);
CREATE INDEX idx_search_queries_user_session ON search_queries(user_id, session_id);

Caching:

Cache popular searches in Redis (TTL: 5 minutes)
Cache facet counts for common filter combinations
Cache geo-boundary queries (city, region)
Invalidate cache on index updates
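
Caching popular searches depends on equivalent requests mapping to the same key. The Python sketch below derives a deterministic key from the request parameters. The key layout, the "search:" prefix, and the text normalization are assumptions. Including org_id in the key is one way to keep cached results tenant-scoped.

```python
import hashlib
import json

CACHE_TTL_SECONDS = 300  # 5-minute TTL from the caching notes


def cache_key(org_id, query_text, filters, sort_by, page):
    """Build a stable cache key; dict key order and text case do not matter."""
    payload = {
        "org": org_id,
        "q": (query_text or "").strip().lower(),
        "filters": filters,
        "sort": sort_by,
        "page": page,
    }
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return "search:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

The resulting string could be used as a Redis key with an expiry of CACHE_TTL_SECONDS; the Redis call itself is left out to keep the sketch dependency-free.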

Query Optimization:

Limit full-text search to top 1000 results
Use cursor-based (keyset) pagination instead of large offsets for deep pages
Pre-filter by indexed columns before FTS
Avoid expensive JSONB queries in hot path
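
Cursor-based pagination for deep pages can be illustrated in Python. The sketch assumes results are ordered by (price_min, space_id) and that the cursor is an opaque base64-encoded JSON pair; neither detail comes from the specification. The list filtering stands in for a row-value comparison in the database query.

```python
import base64
import json


def encode_cursor(sort_value, space_id):
    raw = json.dumps([sort_value, space_id]).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor):
    sort_value, space_id = json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))
    return sort_value, space_id


def next_page(rows, cursor=None, page_size=2):
    """Return (page, next_cursor); rows must be sorted by (price_min, space_id)."""
    if cursor is not None:
        after = decode_cursor(cursor)
        rows = [r for r in rows if (r["price_min"], r["space_id"]) > after]
    page = rows[:page_size]
    has_more = len(rows) > page_size
    last = page[-1] if page else None
    cursor_out = encode_cursor(last["price_min"], last["space_id"]) if has_more else None
    return page, cursor_out


def collect_all(rows, page_size=2):
    """Walk every page and return the space IDs in the order they were served."""
    seen, cursor = [], None
    while True:
        page, cursor = next_page(rows, cursor, page_size)
        seen.extend(r["space_id"] for r in page)
        if cursor is None:
            return seen
```

Including space_id as a tie-breaker keeps the order total, so rows with equal prices are neither skipped nor repeated across pages.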

Data Retention

SearchQuery Logs:

Retain raw queries: 90 days
Aggregate to daily metrics: indefinite
Anonymize user_id after 30 days (GDPR compliance)
Export to data warehouse monthly
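
A minimal Python sketch of the two time thresholds above: rows older than 30 days lose their user_id, and rows older than 90 days are dropped from the raw log. It assumes aggregation to daily metrics has already happened before the drop and that anonymization means nulling user_id; treatment of session_id is not specified and is left untouched here.

```python
from datetime import datetime, timedelta, timezone

ANONYMIZE_AFTER = timedelta(days=30)
PURGE_AFTER = timedelta(days=90)


def apply_retention(rows, now):
    """Return the rows that remain in the raw log, anonymized where required."""
    kept = []
    for row in rows:
        age = now - row["executed_at"]
        if age > PURGE_AFTER:
            continue  # past the raw-retention period
        if age > ANONYMIZE_AFTER:
            row = {**row, "user_id": None}  # copy, so the input is not mutated
        kept.append(row)
    return kept
```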

Index History:

Keep source_version for last 7 versions
Archive old indexes on schema migration
Retain reindex job logs: 30 days

Security

Authorization:

All queries filtered by org_id and account_id scope
Respect user Membership permissions
Public search shows only status='active' Spaces
Admin search shows all statuses (draft, inactive)

Data Exposure:

Sanitize error messages (no SQL leakage)
Rate limit search API (100 requests/minute per IP)
Log suspicious query patterns (SQL injection attempts)
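
The per-IP rate limit could be enforced with a sliding-window counter, sketched below in Python. The specification states only the limit (100 requests/minute per IP); the sliding-window algorithm, the class name, and the in-process storage are assumptions. A multi-instance deployment would need shared state.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window_seconds` for each client IP."""

    def __init__(self, limit=100, window_seconds=60.0):
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip, now):
        hits = self._hits[ip]
        # Drop timestamps that have left the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Passing the clock value in as `now` keeps the limiter deterministic in tests; callers would typically supply time.monotonic().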

Related Documents

MVP Mapping - Which MVP versions use this domain
Supply Domain - Source data for indexing
Content Domain - Descriptions and attributes
Availability Domain - Calendar data
Pricing Domain - Rate data
System Architecture - Technology choices
V1.2 Product Vision
V2.0 Roadmap
