
Search & Indexing - Domain Specification

First Introduced: V1.2
Status: Specification Complete
Last Updated: 2025-10-25


Overview

Search & Indexing provides the query and ranking infrastructure that lets users, partners, and internal tools find relevant listings efficiently and accurately. This domain transforms operational data (Spaces, Attributes, Availability, Pricing, Tags) into materialized indexes designed for fast, filterable retrieval, whether searching for "4-bedroom oceanfront villas in Barbados" or syncing results for a partner channel feed.

The domain separates the concerns of search data preparation (indexing) from search execution (querying), enabling independent scaling of write-heavy reindexing operations and read-heavy query workloads.


Responsibilities

This domain IS responsible for:

  • Maintaining denormalized search indexes from authoritative domain data
  • Processing search queries with filters, facets, and sorting
  • Managing search relevance and ranking algorithms
  • Tracking search query logs for analytics and optimization
  • Supporting geospatial and full-text search capabilities
  • Ensuring search index freshness through event-driven updates
  • Providing faceted filtering and aggregation support

This domain is NOT responsible for:

  • Authoritative storage of Space, Pricing, or Availability data (→ respective domains)
  • User preference tracking or personalization (→ Analytics domain)
  • Content generation or description text (→ Content domain)
  • Access control enforcement (→ Authorization domain)
  • Search UI rendering (→ Frontend applications)

Relationships

Depends On:

Depended On By:

  • Channels - Search results syndicated to partner feeds
  • Analytics - Query logs feed behavioral analytics
  • Public search APIs and internal admin tools

Related Domains:


Core Concepts

Entity: SearchIndex

Purpose: Flattened, query-optimized representation of a Space or Unit designed for fast keyword and filter-based retrieval.

Key Attributes:

  • id (UUID, primary key)
  • space_id (UUID, foreign key → spaces.id, unique)
  • org_id (UUID, foreign key → organizations.id)
  • account_id (UUID, foreign key → accounts.id)
  • name (TEXT) - Space name for text search
  • headline (TEXT) - Short description
  • description_text (TEXT) - Full searchable description content
  • location_lat, location_lng (DECIMAL) - Geospatial coordinates
  • country (VARCHAR) - ISO country code
  • region (VARCHAR) - State/province/region
  • city (VARCHAR) - City name
  • tags (TEXT[]) - Array of tag keys
  • amenities (TEXT[]) - Array of amenity keys
  • price_min (INTEGER) - Minimum nightly rate in minor units (cents)
  • price_max (INTEGER) - Maximum nightly rate in minor units (cents)
  • capacity (INTEGER) - Maximum guest capacity
  • bedrooms (INTEGER) - Number of bedrooms
  • bathrooms (DECIMAL) - Number of bathrooms
  • property_type (VARCHAR) - villa | apartment | house | etc.
  • status (ENUM) - active | draft | inactive | deleted
  • featured (BOOLEAN) - Featured listing flag
  • search_vector (TSVECTOR) - PostgreSQL full-text search vector
  • last_indexed_at (TIMESTAMP) - When this record was last updated
  • source_version (INTEGER) - Version of source data indexed
  • created_at, updated_at (timestamps)

Relationships:

  • SearchIndex → Space (1:1, one index record per space)
  • SearchIndex → Org/Account (many-to-one for tenancy)

Lifecycle:

  • Created: When Space becomes active or searchable
  • Updated: On Space, Content, Pricing, or Availability changes (event-driven)
  • Deleted: When Space deleted or status changed to non-searchable

Business Rules:

  • Only Spaces with status='active' appear in public search results
  • Index must include org_id and account_id for authorization filtering
  • Full reindex required when schema changes
  • Stale indexes (>24h old) flagged for refresh
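
The freshness rule above can be sketched as a small check. This is a minimal illustration, not the domain's implementation; the function name is_stale is hypothetical, and timestamps are assumed to be timezone-aware UTC values.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Threshold taken from the business rule: records older than 24 hours are stale.
STALE_AFTER = timedelta(hours=24)


def is_stale(last_indexed_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True when a SearchIndex record should be flagged for refresh."""
    now = now or datetime.now(timezone.utc)
    return now - last_indexed_at > STALE_AFTER
```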

Entity: DiscoveryIndex

Purpose: Aggregated, query-ready snapshot combining availability, pricing calendars, and dynamic metadata for advanced filtering.

Key Attributes:

  • id (UUID, primary key)
  • space_id (UUID, foreign key → spaces.id, unique)
  • org_id (UUID, foreign key → organizations.id)
  • account_id (UUID, foreign key → accounts.id)
  • available_dates (DATE[]) - Array of available dates (next 180 days)
  • price_calendar (JSONB) - Date → price mapping: {"2025-11-01": 25000, "2025-11-02": 28000}
  • availability_score (DECIMAL) - Fraction of the 180-day window that is available (0.0-1.0)
  • instant_bookable (BOOLEAN) - No owner approval required
  • min_stay_nights (INTEGER) - Minimum length of stay
  • max_stay_nights (INTEGER) - Maximum length of stay
  • check_in_days (INTEGER[]) - Day of week codes (0=Sunday, 6=Saturday)
  • attributes (JSONB) - Structured property attributes from Content domain
  • facets (JSONB) - Pre-computed facet values for filtering
  • rank_score (DECIMAL) - Computed relevance/quality score
  • last_availability_sync (TIMESTAMP) - When availability was last computed
  • last_pricing_sync (TIMESTAMP) - When pricing was last computed
  • created_at, updated_at (timestamps)

Relationships:

  • DiscoveryIndex → Space (1:1)
  • DiscoveryIndex → AvailabilityCalendar (derived, many-to-one)
  • DiscoveryIndex → RatePlan (derived, many-to-one)

Lifecycle:

  • Created: When Space activated for discovery
  • Updated: Nightly batch for availability; event-driven for pricing changes
  • Deleted: When Space removed from search

Business Rules:

  • available_dates refreshed daily for next 180-day window
  • price_calendar includes only base nightly rates (excludes fees)
  • availability_score computed as: (available days / 180)
  • Stale availability data (>48h) triggers warning flag
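
As an illustration of the availability_score rule, the sketch below counts available dates inside the 180-day window and divides by 180. It assumes the window starts on the current day (inclusive); the spec does not say whether "next 180 days" includes today, so treat that as an assumption.

```python
from datetime import date, timedelta
from typing import Set

WINDOW_DAYS = 180  # rolling window from the spec


def availability_score(available_dates: Set[date], today: date) -> float:
    """Fraction of the next WINDOW_DAYS days that are available (0.0-1.0)."""
    # Assumption: the window is [today, today + 179 days].
    window = {today + timedelta(days=offset) for offset in range(WINDOW_DAYS)}
    # Dates outside the window are ignored rather than counted.
    return len(available_dates & window) / WINDOW_DAYS
```

For example, a Space that is free on 90 of the next 180 days scores 0.5.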

Entity: SearchQuery

Purpose: Log of executed search queries for analytics, optimization, and debugging.

Key Attributes:

  • id (UUID, primary key)
  • org_id (UUID, foreign key → organizations.id, nullable for public searches)
  • user_id (UUID, foreign key → users.id, nullable for anonymous)
  • session_id (VARCHAR) - Session identifier for tracking
  • query_text (TEXT) - Freeform search text (if any)
  • filters (JSONB) - Applied filters: {"country": "MX", "bedrooms": {">=": 3}}
  • sort_by (VARCHAR) - price | capacity | relevance | featured
  • sort_order (ENUM) - asc | desc
  • page (INTEGER) - Pagination page number
  • page_size (INTEGER) - Results per page
  • result_count (INTEGER) - Total matching results
  • result_ids (UUID[]) - Space IDs returned (first page)
  • latency_ms (INTEGER) - Query execution time
  • clicked_result_id (UUID, nullable) - Which result user clicked
  • click_position (INTEGER, nullable) - Position in results (1-indexed)
  • converted_to_booking (BOOLEAN) - Whether search led to booking
  • search_source (VARCHAR) - web | mobile | api | partner
  • executed_at (TIMESTAMP)
  • created_at (timestamp)

Relationships:

  • SearchQuery → User (many-to-one, optional)
  • SearchQuery → Org (many-to-one, optional for scoped searches)
  • SearchQuery → Space (many-to-many via result_ids)

Lifecycle:

  • Created: On every search query execution
  • Updated: When user clicks result or converts to booking
  • Retained: 90 days for analytics; aggregated to metrics afterward

Business Rules:

  • Log all queries regardless of result count (including zero results)
  • Store click and conversion events for relevance tuning
  • Anonymize personal data after retention period
  • Logged performance data (latency_ms, result_count) informs query and index optimization
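
The filters attribute stores conditions in the shape shown above ({"country": "MX", "bedrooms": {">=": 3}}). The following sketch shows one way such a filter could be evaluated against a flat document, for example when replaying logged queries. Only ">=" appears in the spec; the other operators and the function name matches are assumptions made for illustration.

```python
import operator
from typing import Any, Dict

# Only ">=" appears in the spec's example; the rest are assumed for illustration.
OPERATORS = {
    "=": operator.eq,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def matches(document: Dict[str, Any], filters: Dict[str, Any]) -> bool:
    """Return True when the document satisfies every filter condition."""
    for field, condition in filters.items():
        value = document.get(field)
        if value is None:
            return False
        if isinstance(condition, dict):
            # Operator form, e.g. {"bedrooms": {">=": 3}}
            for op, operand in condition.items():
                if op not in OPERATORS:
                    raise ValueError(f"unsupported operator: {op}")
                if not OPERATORS[op](value, operand):
                    return False
        elif value != condition:
            # Scalar form, e.g. {"country": "MX"}, means equality
            return False
    return True
```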

Entity: SearchSynonym

Purpose: Manages query expansion rules to improve search recall by mapping alternative terms.

Key Attributes:

  • id (UUID, primary key)
  • org_id (UUID, foreign key → organizations.id, nullable for global)
  • term (VARCHAR, required) - Source search term
  • synonyms (TEXT[]) - Equivalent terms: ["pool", "swimming pool", "private pool"]
  • type (ENUM) - synonym | expansion | spelling_correction
  • language_code (VARCHAR, default 'en') - Language scope
  • is_bidirectional (BOOLEAN) - Apply synonym in both directions
  • is_active (BOOLEAN) - Enable/disable rule
  • created_by (UUID, foreign key → users.id)
  • created_at, updated_at (timestamps)

Relationships:

  • SearchSynonym → Org (many-to-one, optional for global rules)
  • SearchSynonym → User (many-to-one for audit)

Lifecycle:

  • Created: By admin or data team based on query analysis
  • Updated: When refining synonym sets or enabling/disabling
  • Deleted: Soft delete via is_active=false

Business Rules:

  • Global synonyms apply to all searches; org-specific synonyms override global ones
  • Bidirectional synonyms create symmetric expansion
  • Maximum 20 synonyms per term to prevent explosion
  • Applied during query parsing before index lookup
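
A minimal sketch of these rules follows. It assumes that "override" means an org-specific rule for a term replaces the global rules for that term entirely, and that the 20-synonym cap is enforced by truncation; both are interpretations, and SynonymRule and expand_term are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

MAX_SYNONYMS = 20  # cap from the business rule


@dataclass
class SynonymRule:
    term: str
    synonyms: List[str]
    org_id: Optional[str] = None  # None means a global rule
    is_bidirectional: bool = False
    is_active: bool = True


def _expand_in_scope(term: str, rules: List[SynonymRule], scope: Optional[str]) -> Set[str]:
    expanded: Set[str] = set()
    for rule in rules:
        if not rule.is_active or rule.org_id != scope:
            continue
        synonyms = rule.synonyms[:MAX_SYNONYMS]
        if rule.term == term:
            expanded.update(synonyms)
        elif rule.is_bidirectional and term in synonyms:
            # Symmetric expansion: a synonym also maps back to the source term.
            expanded.add(rule.term)
            expanded.update(synonyms)
    return expanded


def expand_term(term: str, rules: List[SynonymRule], org_id: Optional[str] = None) -> Set[str]:
    """Return the term plus its synonyms, preferring org-specific rules."""
    expanded: Set[str] = set()
    if org_id is not None:
        expanded = _expand_in_scope(term, rules, org_id)
    if not expanded:
        # Fall back to global rules when the org defines nothing for this term.
        expanded = _expand_in_scope(term, rules, None)
    return {term} | expanded
```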

Entity: Facet

Purpose: Defines filterable dimensions and their available values for faceted search interfaces.

Key Attributes:

  • id (UUID, primary key)
  • facet_key (VARCHAR, unique) - System identifier: country | bedrooms | amenities
  • display_name (VARCHAR) - Human-readable label
  • facet_type (ENUM) - categorical | numeric_range | boolean | date_range
  • source_field (VARCHAR) - SearchIndex field name
  • value_source (ENUM) - static | dynamic | computed
  • allowed_values (JSONB) - Predefined values for categorical facets
  • sort_order (INTEGER) - Display order in UI
  • is_filterable (BOOLEAN) - Appears in filter UI
  • is_sortable (BOOLEAN) - Can be used for sorting
  • aggregation_type (VARCHAR) - count | sum | avg | min | max
  • created_at, updated_at (timestamps)

Relationships:

  • Facet → SearchIndex (defines which fields are faceted)
  • Facet values computed dynamically from SearchIndex

Lifecycle:

  • Created: By system configuration or admin
  • Updated: When adding new facet dimensions
  • Static: Core facets (country, bedrooms) seeded at deployment

Business Rules:

  • Categorical facets show available value counts
  • Numeric range facets show min/max bounds
  • Facet values filtered by current query scope
  • Disabled facets hidden from UI but preserve data
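
To illustrate the first three rules, the sketch below computes categorical counts and numeric bounds over the documents that match the current query. It works on in-memory dictionaries purely for illustration; a real backend would aggregate in PostgreSQL or Elasticsearch. The function names are hypothetical, and the default limit of 100 mirrors the facet limit stated under Business Rules.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Optional, Tuple


def categorical_facet_counts(
    documents: Iterable[Dict[str, Any]], field: str, limit: int = 100
) -> Dict[Any, int]:
    """Count matching documents per facet value, keeping the most common values."""
    counts: Counter = Counter()
    for doc in documents:
        value = doc.get(field)
        if value is None:
            continue
        values = value if isinstance(value, (list, tuple)) else [value]
        # De-duplicate so a document is counted once per value.
        counts.update(set(values))
    return dict(counts.most_common(limit))


def numeric_range_bounds(
    documents: Iterable[Dict[str, Any]], field: str
) -> Optional[Tuple[Any, Any]]:
    """Return (min, max) for a numeric facet, or None when no document has a value."""
    values: List[Any] = [doc[field] for doc in documents if doc.get(field) is not None]
    return (min(values), max(values)) if values else None
```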

Workflows

Workflow: Reindex Space

Trigger: Space, Content, Pricing, or Availability update event

  1. Receive event from domain event bus
  2. Fetch source data:
    • Space entity (name, location, status, capacity)
    • Description text from Content domain
    • Amenities and Attributes from Content domain
    • Price min/max from active RatePlan
    • Tags from Space-Tag relationships
  3. Transform to search document:
    • Flatten nested structures
    • Generate full-text search vector
    • Compute derived fields (price range, availability score)
  4. Upsert to SearchIndex:
    • Update if exists, insert if new
    • Set last_indexed_at timestamp
    • Increment source_version
  5. Update DiscoveryIndex (if applicable):
    • Refresh availability flags if calendar changed
    • Update price calendar if rates changed
  6. Emit reindex.completed event
  7. Log to ReindexTask table with status and duration

Postconditions:

  • SearchIndex record reflects current source data state
  • Index marked with timestamp for freshness monitoring
  • Changes queryable within 5 seconds of the source update (eventual consistency)
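
Steps 3 and 4 of this workflow can be sketched as follows. The example uses an in-memory dictionary as a stand-in for the search_indexes table, and the helper names flatten and upsert_search_document are hypothetical. It is a sketch of the flatten-then-upsert idea under those assumptions, not the actual indexer.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def flatten(nested: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into underscore-joined keys (location.lat -> location_lat)."""
    flat: Dict[str, Any] = {}
    for key, value in nested.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


def upsert_search_document(
    index: Dict[str, Dict[str, Any]],
    space_id: str,
    source: Dict[str, Any],
    now: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Insert or replace the document for space_id and bump source_version."""
    now = now or datetime.now(timezone.utc)
    existing = index.get(space_id)
    version = existing["source_version"] + 1 if existing else 1
    document = flatten(source)
    document.update(space_id=space_id, source_version=version, last_indexed_at=now)
    index[space_id] = document  # one record per Space, matching the 1:1 relationship
    return document
```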

Workflow: Execute Search Query

Trigger: API request with search parameters

  1. Parse and validate request:
    • Extract query text, filters, sort, pagination
    • Validate filter syntax and values
    • Apply authorization scope (org_id, account_id)
  2. Apply query expansion:
    • Look up synonyms for query terms
    • Expand to include equivalent terms
  3. Build search query:
    • Full-text match on search_vector (if query text provided)
    • Apply filters: country, region, price range, bedrooms, amenities, tags
    • Apply availability filter (if date range specified)
    • Apply sort order
  4. Execute against SearchIndex + DiscoveryIndex:
    • PostgreSQL: Use FTS + GIN indexes + JSONB operators
    • Elasticsearch: Use query DSL with filters and aggregations
  5. Compute facets:
    • Aggregate available values for each facet dimension
    • Count results per facet value
  6. Apply pagination:
    • Offset and limit results
  7. Return results:
    • Space IDs, names, hero images, summary data
    • Facet counts for UI
    • Total result count
  8. Log query to SearchQuery table:
    • Capture filters, result count, latency
    • Associate with user/session if available

Postconditions:

  • Results returned to caller within SLA (<200ms P95)
  • Query logged for analytics
  • Facets reflect current data state
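
Step 3 of this workflow, for the PostgreSQL backend, might look like the sketch below. It builds a parameterized statement using DB-API "format" placeholders (%s), as accepted by drivers such as psycopg. The function name, the selected columns, the sort whitelist, and the page-size cap of 100 are assumptions for illustration; relevance sorting and the availability filter are left out to keep the example short.

```python
from typing import Any, List, Optional, Sequence, Tuple

# Whitelist mapping of sort_by values to columns; never interpolate user input directly.
SORT_COLUMNS = {"price": "price_min", "capacity": "capacity", "featured": "featured"}
MAX_PAGE_SIZE = 100  # hypothetical cap, not stated in the spec


def build_search_sql(
    org_id: str,
    query_text: Optional[str] = None,
    country: Optional[str] = None,
    min_bedrooms: Optional[int] = None,
    amenities: Optional[Sequence[str]] = None,
    sort_by: str = "price",
    sort_order: str = "asc",
    page: int = 1,
    page_size: int = 20,
) -> Tuple[str, List[Any]]:
    """Return (sql, params) for a filtered, paginated search over search_indexes."""
    clauses = ["status = 'active'", "org_id = %s"]  # public visibility + tenant scope
    params: List[Any] = [org_id]
    if query_text:
        clauses.append("search_vector @@ plainto_tsquery('english', %s)")
        params.append(query_text)
    if country:
        clauses.append("country = %s")
        params.append(country)
    if min_bedrooms is not None:
        clauses.append("bedrooms >= %s")
        params.append(min_bedrooms)
    if amenities:
        clauses.append("amenities @> %s::text[]")  # listing must have all amenities
        params.append(list(amenities))
    column = SORT_COLUMNS.get(sort_by)
    if column is None:
        raise ValueError(f"unsupported sort_by: {sort_by}")
    direction = "DESC" if sort_order.lower() == "desc" else "ASC"
    size = max(1, min(page_size, MAX_PAGE_SIZE))
    offset = (max(page, 1) - 1) * size
    sql = (
        "SELECT space_id, name, headline FROM search_indexes WHERE "
        + " AND ".join(clauses)
        + f" ORDER BY {column} {direction} LIMIT %s OFFSET %s"
    )
    params.extend([size, offset])
    return sql, params
```

Keeping values in params rather than in the SQL text is what prevents injection through filter values; the sort column is the one place where text is interpolated, which is why it goes through a whitelist.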

Workflow: Nightly Availability Sync

Trigger: Scheduled job (runs daily at 2:00 AM UTC)

  1. Select all Spaces with status='active'
  2. For each Space:
    • Query AvailabilityCalendar for next 180 days
    • Identify available dates (no Bookings, Holds, or Blocks)
    • Compute availability_score (available days / 180)
    • Update DiscoveryIndex.available_dates array
    • Set last_availability_sync timestamp
  3. Batch commit updates (1000 spaces per transaction)
  4. Emit availability_sync.completed event
  5. Log metrics:
    • Total spaces processed
    • Duration
    • Errors (if any)

Postconditions:

  • DiscoveryIndex reflects current availability state
  • Date-based availability filters accurate for next 180 days
  • Monitoring alerted if sync exceeds SLA (>30 minutes)
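
Step 3 commits in batches of 1000 Spaces. A minimal sketch of the batching itself is shown below; the helper name batched is hypothetical, and the commit logic is intentionally omitted.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")
BATCH_SIZE = 1000  # spaces per transaction, from step 3


def batched(items: Iterable[T], size: int = BATCH_SIZE) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

A sync job could then loop over batched(space_ids), compute the availability fields for each Space in the batch, and commit once per batch so that a failure only rolls back the current 1000 Spaces.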

Business Rules

  1. Index Freshness: SearchIndex updates must complete within 5 seconds of source data change (P95)
  2. Availability Window: DiscoveryIndex tracks the next 180 days; past dates are pruned
  3. Tenant Isolation: All search queries filtered by authorized org_id/account_id scope
  4. Public vs. Private: Only Spaces with status='active' appear in public search
  5. Result Limits: Maximum 1000 results per query; use pagination for larger sets
  6. Facet Limits: Maximum 100 unique values per facet dimension shown in UI
  7. Query Timeout: Search queries must complete within 2 seconds or return partial results
  8. Reindex Priority: Price/availability changes processed within 1 minute; content changes within 5 minutes
  9. Synonym Precedence: Org-specific synonyms override global synonyms
  10. Zero Results: Log all zero-result queries for synonym/expansion tuning
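
Rule 5 can be enforced at the pagination layer. The sketch below assumes the 1000-result limit applies to how deep a caller may page (offset plus limit), which is one reading of the rule; the function name page_window is hypothetical.

```python
from typing import Tuple

MAX_RESULTS = 1000  # from rule 5


def page_window(page: int, page_size: int) -> Tuple[int, int]:
    """Return (offset, limit), truncated so no row past MAX_RESULTS is served."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    offset = (page - 1) * page_size
    if offset >= MAX_RESULTS:
        raise ValueError("requested page is beyond the result limit")
    # Shrink the last reachable page instead of crossing the limit.
    limit = min(page_size, MAX_RESULTS - offset)
    return offset, limit
```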

Version Progression

MVP.0: OUT OF SCOPE (Deferred)

Status: Search functionality deferred to post-launch

Rationale:

  • Initial launch supports direct listing URLs and admin panel
  • Manual curation replaces algorithmic discovery
  • Reduces MVP complexity and database load

Workaround:

  • Admin lists all properties in dashboard
  • Listings shared via direct Space URLs
  • Manual recommendation by TVL team

V1.2: Full-Text Search with PostgreSQL

Status: Initial Search Implementation

Features:

  • PostgreSQL full-text search (FTS) using tsvector and GIN indexes
  • Basic filters: country, region, price range, bedrooms, bathrooms, capacity
  • Tag-based filtering (e.g., "Oceanfront", "Private Chef")
  • Amenity filtering (e.g., pool, wifi, parking)
  • Simple text search on name, headline, description
  • Event-driven reindexing from domain events
  • Query logging to SearchQuery table
  • Nightly availability sync to DiscoveryIndex

Entities Implemented:

  • search_indexes (core denormalized search table)
  • discovery_indexes (availability and pricing snapshots)
  • search_queries (query logs for analytics)

Search Backend:

  • PostgreSQL FTS for text search
  • JSONB and array operators for filters
  • Option to migrate to Elasticsearch if performance requires

Indexing Strategy:

  • Event-driven updates via domain event bus
  • Nightly batch sync for availability
  • Incremental updates for price changes

API Endpoints:

  • GET /api/search - Execute search with filters
  • GET /api/search/facets - Get available facet values
  • POST /admin/search/reindex - Trigger manual reindex

Performance Targets:

  • P50 latency: <100ms
  • P95 latency: <200ms
  • Index lag: <5 seconds for critical updates

V2.0: Faceted Search & Geospatial Queries

Status: Advanced Search Features

New Features:

  • Geospatial search:
    • Radius-based queries ("within 10km of coordinates")
    • Polygon boundary searches ("listings in this area")
    • Geohash indexing for spatial performance
    • PostGIS or Elasticsearch geo queries
  • Faceted filtering:
    • Pre-computed facet counts (avoid expensive aggregations)
    • Facet values scoped to current filter context
    • Multi-select facets with AND/OR logic
    • Numeric range sliders (price, capacity)
  • Advanced text search:
    • Multi-language stemming and tokenization
    • Phrase matching and proximity search
    • Boosted fields (name > headline > description)
    • Fuzzy matching for typo tolerance
  • Search relevance tuning:
    • Configurable field weights
    • Freshness boosting (recently updated listings)
    • Quality signals (booking rate, reviews)
    • Featured listing promotion
  • Synonym management:
    • search_synonyms table populated with common variants
    • Admin UI for synonym editing
    • A/B testing synonym rule impact
  • Real-time availability:
    • Per-day availability and pricing in DiscoveryIndex
    • Instant calendar-aware search (no nightly delay)
    • Blocking rule awareness (min/max stay, check-in days)

New Entities:

  • facets (facet dimension definitions)
  • search_synonyms (query expansion rules)
  • search_index_queue (prioritized reindex job queue)

Performance Targets:

  • P95 latency: <150ms (with facets)
  • Support 10,000+ active listings
  • Real-time index updates (<1 second lag)

Migration to Elasticsearch:

  • Full index rebuild with Elasticsearch schema
  • Dual-write strategy during migration
  • A/B testing PostgreSQL vs. Elasticsearch performance
  • Rollback plan if issues detected

Future Enhancements

  • User preference tracking (past searches, bookings)
  • Collaborative filtering recommendations
  • Behavioral ranking adjustments
  • Saved searches and alerts

V2.2: Vector Search & Embeddings

  • Semantic search using embeddings ("luxury modern villa")
  • Image similarity search
  • AI-generated search suggestions
  • Natural language query parsing
  • Channel-specific ranking and filtering
  • Partner API endpoints with custom facets
  • White-label search configuration per brand
  • Cross-org federated search (optional)

V3.0: Predictive & Analytics-Driven

  • Predicted availability based on booking patterns
  • Dynamic pricing integration (show real-time rates)
  • Search-to-book conversion optimization
  • Automated A/B testing of ranking algorithms

Operational Notes

Indexing Strategy

Event-Driven Updates:

  • Subscribe to domain events: space.updated, pricing.changed, availability.blocked, content.published
  • Queue reindex jobs with priority (availability > pricing > content)
  • Batch process jobs to reduce database load
  • Idempotent reindex operations (safe to retry)
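
The priority ordering above (availability > pricing > content) can be sketched with a binary heap. The class name ReindexQueue and the change-type labels are hypothetical; how each domain event maps to a change type (for example, space.updated to "content") is an assumption the spec does not spell out.

```python
import heapq
import itertools
from typing import List, Tuple

# Lower number = processed first (availability > pricing > content).
PRIORITY = {"availability": 0, "pricing": 1, "content": 2}


class ReindexQueue:
    """Priority queue of reindex jobs, FIFO within the same priority."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._counter = itertools.count()  # tie-breaker preserving insertion order

    def push(self, space_id: str, change_type: str) -> None:
        heapq.heappush(self._heap, (PRIORITY[change_type], next(self._counter), space_id))

    def pop(self) -> str:
        _priority, _seq, space_id = heapq.heappop(self._heap)
        return space_id

    def __len__(self) -> int:
        return len(self._heap)
```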

Batch Sync:

  • Nightly full sync of availability (DiscoveryIndex)
  • Weekly full reindex verification (detect drift)
  • Monthly search index rebuild (schema evolution)

Monitoring:

  • Track reindex queue depth and lag
  • Alert on failed reindex jobs
  • Monitor index freshness per Space
  • Dashboard: avg index age, reindex throughput

Search Performance

Database Indexes (PostgreSQL):

CREATE INDEX idx_search_indexes_org_account ON search_indexes(org_id, account_id);
CREATE INDEX idx_search_indexes_status ON search_indexes(status) WHERE status = 'active';
-- ll_to_earth() requires the cube and earthdistance extensions
CREATE INDEX idx_search_indexes_location ON search_indexes USING gist(ll_to_earth(location_lat, location_lng));
CREATE INDEX idx_search_indexes_fts ON search_indexes USING gin(search_vector);
CREATE INDEX idx_search_indexes_price ON search_indexes(price_min, price_max);
CREATE INDEX idx_search_indexes_capacity ON search_indexes(bedrooms, capacity);
CREATE INDEX idx_search_indexes_tags ON search_indexes USING gin(tags);
CREATE INDEX idx_search_indexes_amenities ON search_indexes USING gin(amenities);

CREATE INDEX idx_discovery_indexes_space ON discovery_indexes(space_id);
CREATE INDEX idx_discovery_indexes_available_dates ON discovery_indexes USING gin(available_dates);
CREATE INDEX idx_discovery_indexes_facets ON discovery_indexes USING gin(facets);

CREATE INDEX idx_search_queries_executed ON search_queries(executed_at DESC);
CREATE INDEX idx_search_queries_user_session ON search_queries(user_id, session_id);

Caching:

  • Cache popular searches in Redis (TTL: 5 minutes)
  • Cache facet counts for common filter combinations
  • Cache geo-boundary queries (city, region)
  • Invalidate cache on index updates
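
A cache for popular searches needs a key that is stable across equivalent requests. The sketch below normalizes the request before hashing and pairs it with a small in-memory TTL cache that stands in for Redis. The key layout, the "search:" prefix, and both names (cache_key, TTLCache) are assumptions for illustration only.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple


def cache_key(query_text: str, filters: Dict[str, Any], sort_by: str, page: int) -> str:
    """Build a deterministic key: equivalent requests hash to the same value."""
    payload = json.dumps(
        {"q": query_text.strip().lower(), "filters": filters, "sort": sort_by, "page": page},
        sort_keys=True,  # filter order must not change the key
        separators=(",", ":"),
    )
    return "search:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


class TTLCache:
    """In-memory stand-in for Redis with a fixed TTL (default 5 minutes)."""

    def __init__(self, ttl_seconds: float = 300.0, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def set(self, key: str, value: Any) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # lazily evict expired entries
            return None
        return value

    def clear(self) -> None:
        """Drop everything, e.g. after an index update."""
        self._entries.clear()
```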

Query Optimization:

  • Limit full-text search to top 1000 results
  • Use cursor-based (keyset) pagination instead of large offsets for deep pages
  • Pre-filter by indexed columns before FTS
  • Avoid expensive JSONB queries in hot path
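
Cursor-based pagination avoids scanning and discarding skipped rows, which is what makes large offsets slow. The sketch below shows the idea on in-memory rows sorted by (price_min, space_id); the sort key and the function name fetch_page are assumptions. In SQL the same comparison would typically be a row-value predicate in the WHERE clause.

```python
from typing import Any, Dict, List, Optional, Tuple

Cursor = Tuple[int, str]  # (price_min, space_id) of the last row already returned


def fetch_page(
    rows: List[Dict[str, Any]], page_size: int, cursor: Optional[Cursor] = None
) -> Tuple[List[Dict[str, Any]], Optional[Cursor]]:
    """Return one page after `cursor` plus the cursor for the next page (or None)."""
    # space_id breaks ties so the ordering is total and pages never overlap.
    ordered = sorted(rows, key=lambda r: (r["price_min"], r["space_id"]))
    if cursor is not None:
        ordered = [r for r in ordered if (r["price_min"], r["space_id"]) > cursor]
    page = ordered[:page_size]
    has_more = len(ordered) > page_size
    next_cursor = (page[-1]["price_min"], page[-1]["space_id"]) if has_more else None
    return page, next_cursor
```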

Data Retention

SearchQuery Logs:

  • Retain raw queries: 90 days
  • Aggregate to daily metrics: indefinite
  • Anonymize user_id after 30 days (GDPR compliance)
  • Export to data warehouse monthly
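
The two time thresholds above can be expressed as a single retention pass over each log record. This sketch assumes records are dictionaries with executed_at and user_id keys, and that "anonymize" means clearing user_id; the function name apply_retention is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, Optional

ANONYMIZE_AFTER = timedelta(days=30)  # user_id cleared after 30 days
DELETE_AFTER = timedelta(days=90)     # raw query dropped after 90 days


def apply_retention(record: Dict[str, Any], now: datetime) -> Optional[Dict[str, Any]]:
    """Return the record to keep, an anonymized copy, or None if it has expired."""
    age = now - record["executed_at"]
    if age > DELETE_AFTER:
        return None
    if age > ANONYMIZE_AFTER:
        return {**record, "user_id": None}
    return record
```

Aggregation to daily metrics would need to run before a record is dropped; that step is outside this sketch.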

Index History:

  • Keep source_version for last 7 versions
  • Archive old indexes on schema migration
  • Retain reindex job logs: 30 days

Security

Authorization:

  • All queries filtered by org_id and account_id scope
  • Respect user Membership permissions
  • Public search shows only status='active' Spaces
  • Admin search shows all statuses (draft, inactive)

Data Exposure:

  • Sanitize error messages (no SQL leakage)
  • Rate limit search API (100 requests/minute per IP)
  • Log suspicious query patterns (SQL injection attempts)
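
The per-IP rate limit could be implemented in several ways; the sketch below uses a sliding window over request timestamps. It is an in-memory illustration only (a multi-instance deployment would need shared state), and the class name and the explicit now argument are assumptions made to keep the example deterministic.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class SlidingWindowRateLimiter:
    """Allow at most `limit` requests per `window_seconds` for each client IP."""

    def __init__(self, limit: int = 100, window_seconds: float = 60.0) -> None:
        self._limit = limit
        self._window = window_seconds
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self._window:
            hits.popleft()
        if len(hits) >= self._limit:
            return False
        hits.append(now)
        return True
```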