Provider pricing management

Claude Code Hub includes a comprehensive pricing system that tracks API usage costs across different models and providers. This system enables accurate cost calculation, spending limits, and billing transparency for administrators.

Overview

The pricing system serves several key purposes:

Cost tracking: Calculate and track API usage costs for each request across different models and providers
Rate limiting: Enable spending limits at provider and system levels based on calculated costs
Billing transparency: Provide clear visibility into pricing for administrators
Multi-provider support: Handle different pricing models from various providers (Anthropic, OpenAI, Gemini, and others)
Custom pricing: Allow administrators to set custom prices through manual price entries that override cloud-synced prices

Pricing models supported

The system handles multiple pricing structures:

Pricing model	Description	Example use case
Per-token pricing	Input and output tokens billed separately	Standard LLM API pricing
Per-request fixed fee	Flat fee per request regardless of tokens	Embedding models
Tiered pricing	Different rates above token thresholds	Gemini 200K+ context
Cache pricing	Separate rates for cache creation and reads	Anthropic prompt caching
Image generation	Per-image or per-token pricing	DALL-E, image models
Provider multipliers	Scale costs up or down per provider	Markup or discount pricing

Core architecture

Model price storage

Model prices are stored in the model_prices database table with the following structure:

export const modelPrices = pgTable('model_prices', {
  id: serial('id').primaryKey(),
  modelName: varchar('model_name').notNull(),
  priceData: jsonb('price_data').notNull(),
  // Price source: 'litellm' = synced from cloud, 'manual' = manually added
  source: varchar('source', { length: 20 }).notNull().default('litellm'),
  createdAt: timestamp('created_at', { withTimezone: true }).defaultNow(),
  updatedAt: timestamp('updated_at', { withTimezone: true }).defaultNow(),
});

Key design decisions:

JSONB price_data: Flexible schema accommodates different pricing models without schema migrations
Source tracking: Distinguishes between cloud-synced (litellm) and manually-added (manual) prices
Time-based versioning: Multiple records per model allow price history tracking
Manual priority: When fetching latest prices, manual entries take precedence over cloud-synced prices

Price data structure

The ModelPriceData interface defines all supported pricing fields:

export interface ModelPriceData {
  // Base pricing
  input_cost_per_token?: number;
  output_cost_per_token?: number;
  input_cost_per_request?: number; // Fixed fee per request

  // Cache pricing
  cache_creation_input_token_cost?: number;
  cache_creation_input_token_cost_above_1hr?: number;
  cache_read_input_token_cost?: number;

  // 200K tiered pricing (for Gemini, etc.)
  input_cost_per_token_above_200k_tokens?: number;
  output_cost_per_token_above_200k_tokens?: number;
  cache_creation_input_token_cost_above_200k_tokens?: number;
  cache_read_input_token_cost_above_200k_tokens?: number;

  // Image generation pricing
  output_cost_per_image?: number;
  output_cost_per_image_token?: number;
  input_cost_per_image?: number;
  input_cost_per_image_token?: number;

  // Search context pricing (for search-enabled models)
  search_context_cost_per_query?: {
    search_context_size_low?: number;
    search_context_size_medium?: number;
    search_context_size_high?: number;
  };

  // Model metadata
  display_name?: string;
  litellm_provider?: string;
  providers?: string[];
  max_input_tokens?: number;
  max_output_tokens?: number;
  max_tokens?: number;
  mode?: "chat" | "image_generation" | "completion";

  // Capability flags
  supports_assistant_prefill?: boolean;
  supports_computer_use?: boolean;
  supports_function_calling?: boolean;
  supports_pdf_input?: boolean;
  supports_prompt_caching?: boolean;
  supports_reasoning?: boolean;
  supports_response_schema?: boolean;
  supports_tool_choice?: boolean;
  supports_vision?: boolean;

  // Other
  tool_use_system_prompt_tokens?: number;
  [key: string]: unknown;
}

Price retrieval logic

The system uses PostgreSQL's DISTINCT ON to get the latest price for each model, with manual prices taking precedence:

const query = sql`
  SELECT DISTINCT ON (model_name)
    id,
    model_name as "modelName",
    price_data as "priceData",
    source,
    created_at as "createdAt",
    updated_at as "updatedAt"
  FROM model_prices
  ORDER BY
    model_name,
    (source = 'manual') DESC,  -- Manual prices first
    created_at DESC NULLS LAST,
    id DESC
`;

The (source = 'manual') DESC clause ensures that manual prices always take precedence over cloud-synced prices, even if the cloud price is newer.
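
The same ordering can be restated in memory, which may help when reasoning about which row wins for a model. This is a minimal sketch and is not taken from the codebase: PriceRow, outranks, and pickEffectivePrices are hypothetical names, and createdAt is simplified to epoch milliseconds.

```typescript
// Hypothetical row shape mirroring the columns selected above.
type PriceRow = {
  id: number;
  modelName: string;
  source: "litellm" | "manual";
  createdAt: number | null; // epoch milliseconds; null sorts last
};

// Returns true when `a` should win over `b` under the ORDER BY shown above.
function outranks(a: PriceRow, b: PriceRow): boolean {
  const aManual = a.source === "manual";
  const bManual = b.source === "manual";
  if (aManual !== bManual) return aManual; // (source = 'manual') DESC
  const aTime = a.createdAt ?? Number.NEGATIVE_INFINITY; // NULLS LAST
  const bTime = b.createdAt ?? Number.NEGATIVE_INFINITY;
  if (aTime !== bTime) return aTime > bTime; // created_at DESC
  return a.id > b.id; // id DESC
}

// Keeps one row per model, like DISTINCT ON (model_name).
function pickEffectivePrices(rows: PriceRow[]): Map<string, PriceRow> {
  const winners = new Map<string, PriceRow>();
  for (const row of rows) {
    const current = winners.get(row.modelName);
    if (current === undefined || outranks(row, current)) {
      winners.set(row.modelName, row);
    }
  }
  return winners;
}

const sampleRows: PriceRow[] = [
  { id: 1, modelName: "model-a", source: "manual", createdAt: 1000 },
  { id: 2, modelName: "model-a", source: "litellm", createdAt: 2000 },
  { id: 3, modelName: "model-b", source: "litellm", createdAt: 1000 },
  { id: 4, modelName: "model-b", source: "litellm", createdAt: 3000 },
];
const effective = pickEffectivePrices(sampleRows);
```

For model-a the older manual row (id 1) is selected even though a newer cloud row exists; for model-b, which has no manual row, the newest cloud row (id 4) is selected.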

Cost calculation

Request cost calculation

The calculateRequestCost function computes the total cost for a request by summing multiple cost segments:

export function calculateRequestCost(
  usage: UsageMetrics,
  priceData: ModelPriceData,
  multiplier: number = 1.0,
  context1mApplied: boolean = false
): Decimal {
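  // Local aliases for the price fields used below
  const inputCostPerRequest = priceData.input_cost_per_request;
  const inputCostPerToken = priceData.input_cost_per_token;
  const inputAbove200k = priceData.input_cost_per_token_above_200k_tokens;
  const segments: Decimal[] = [];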

  // 1. Fixed per-request fee
  if (typeof inputCostPerRequest === "number") {
    segments.push(toDecimal(inputCostPerRequest));
  }

  // 2. Input tokens (with tiered pricing support)
  if (context1mApplied && inputCostPerToken != null) {
    // Claude 1M context: use multiplier-based tiering
    segments.push(calculateTieredCost(
      usage.input_tokens,
      inputCostPerToken,
      CONTEXT_1M_INPUT_PREMIUM_MULTIPLIER
    ));
  } else if (inputAbove200k != null) {
    // Gemini: use separate price fields
    segments.push(calculateTieredCostWithSeparatePrices(
      usage.input_tokens,
      inputCostPerToken,
      inputAbove200k
    ));
  } else {
    // Standard calculation
    segments.push(multiplyCost(usage.input_tokens, inputCostPerToken));
  }

  // 3. Output tokens (similar tiered logic)
  // ...

  // 4. Cache creation (5min and 1hour TTL)
  // ...

  // 5. Cache read
  // ...

  // 6. Image tokens
  // ...

  // Apply provider multiplier
  const total = segments.reduce((acc, seg) => acc.plus(seg), new Decimal(0));
  return total.mul(multiplier).toDecimalPlaces(COST_SCALE);
}
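
To make the segment-summing idea concrete, here is a simplified worked example. It is only a sketch: it uses plain numbers instead of Decimal, covers just the fixed fee, input, and output segments, and the names SimpleUsage, SimplePrice, and estimateRequestCost as well as the prices are illustrative assumptions.

```typescript
// Simplified sketch: plain numbers stand in for the Decimal type used above,
// so results are subject to ordinary floating-point rounding.
type SimpleUsage = { input_tokens?: number; output_tokens?: number };
type SimplePrice = {
  input_cost_per_token?: number;
  output_cost_per_token?: number;
  input_cost_per_request?: number;
};

function estimateRequestCost(
  usage: SimpleUsage,
  price: SimplePrice,
  multiplier: number = 1.0
): number {
  const segments: number[] = [];
  if (typeof price.input_cost_per_request === "number") {
    segments.push(price.input_cost_per_request); // fixed per-request fee
  }
  segments.push((usage.input_tokens ?? 0) * (price.input_cost_per_token ?? 0));
  segments.push((usage.output_tokens ?? 0) * (price.output_cost_per_token ?? 0));
  const total = segments.reduce((acc, seg) => acc + seg, 0);
  return total * multiplier; // provider multiplier is applied last
}

// Illustrative prices only: 3 USD and 15 USD per million tokens.
const exampleCost = estimateRequestCost(
  { input_tokens: 1000, output_tokens: 500 },
  { input_cost_per_token: 3e-6, output_cost_per_token: 15e-6 },
  1.2
);
// 1000 * 3e-6 + 500 * 15e-6 = 0.0105, times 1.2 = 0.0126 (up to float rounding)
```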

Usage metrics

The cost calculation accepts a UsageMetrics type that tracks all token usage categories:

type UsageMetrics = {
  input_tokens?: number;
  output_tokens?: number;
  cache_creation_input_tokens?: number;      // Generic cache creation (legacy)
  cache_creation_5m_input_tokens?: number;   // 5-minute TTL cache
  cache_creation_1h_input_tokens?: number;   // 1-hour TTL cache
  cache_ttl?: "5m" | "1h" | "mixed";
  cache_read_input_tokens?: number;
  input_image_tokens?: number;   // Image modality input tokens
  output_image_tokens?: number;  // Image modality output tokens
};

Tiered pricing

For Claude 1M context models, usage beyond a 200,000-token threshold is priced by applying a premium multiplier to the base per-token rate:

export const CONTEXT_1M_TOKEN_THRESHOLD = 200000;
export const CONTEXT_1M_INPUT_PREMIUM_MULTIPLIER = 2.0;   // 2x for >200k
export const CONTEXT_1M_OUTPUT_PREMIUM_MULTIPLIER = 1.5;  // 1.5x for >200k

For Gemini models, separate price fields are used:

// input_cost_per_token for <=200k
// input_cost_per_token_above_200k_tokens for >200k
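
The bodies of calculateTieredCost and calculateTieredCostWithSeparatePrices are not shown here, so the following is only a sketch of how the two styles could work. It assumes marginal tiering, meaning only the tokens beyond the threshold pay the higher rate; the function names and the rates in the examples are hypothetical.

```typescript
const TOKEN_THRESHOLD = 200000; // same value as CONTEXT_1M_TOKEN_THRESHOLD

// ASSUMPTION: marginal tiering, i.e. only tokens beyond the threshold pay the
// premium rate. The real helpers may bill differently.
function tieredCostWithMultiplier(
  tokens: number,
  baseRate: number,
  premiumMultiplier: number
): number {
  const baseTokens = Math.min(tokens, TOKEN_THRESHOLD);
  const premiumTokens = Math.max(tokens - TOKEN_THRESHOLD, 0);
  return baseTokens * baseRate + premiumTokens * baseRate * premiumMultiplier;
}

function tieredCostWithSeparateRates(
  tokens: number,
  baseRate: number,
  rateAboveThreshold: number
): number {
  const baseTokens = Math.min(tokens, TOKEN_THRESHOLD);
  const premiumTokens = Math.max(tokens - TOKEN_THRESHOLD, 0);
  return baseTokens * baseRate + premiumTokens * rateAboveThreshold;
}

// 250,000 input tokens at an illustrative base rate of 3e-6 with a 2x premium:
// 200,000 * 3e-6 + 50,000 * 6e-6 = 0.6 + 0.3 = 0.9
const multiplierStyleCost = tieredCostWithMultiplier(250000, 3e-6, 2.0);

// Same token count with an explicit above-threshold rate:
// 200,000 * 1.25e-6 + 50,000 * 2.5e-6 = 0.25 + 0.125 = 0.375
const separateRateStyleCost = tieredCostWithSeparateRates(250000, 1.25e-6, 2.5e-6);
```

Below the threshold both functions reduce to tokens times the base rate, so standard requests are unaffected by either tiering style.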

Cache price fallbacks

When cache prices are not explicitly set, the system uses fallback calculations:

const cacheCreation5mCost =
  priceData.cache_creation_input_token_cost ??
  (inputCostPerToken != null ? inputCostPerToken * 1.25 : undefined);

const cacheCreation1hCost =
  priceData.cache_creation_input_token_cost_above_1hr ??
  (inputCostPerToken != null ? inputCostPerToken * 2 : undefined) ??
  cacheCreation5mCost;

const cacheReadCost =
  priceData.cache_read_input_token_cost ??
  (inputCostPerToken != null
    ? inputCostPerToken * 0.1
    : outputCostPerToken != null
      ? outputCostPerToken * 0.1
      : undefined);
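
As a worked example of the fallback chain, the sketch below wraps the same expressions in a function and evaluates them for two sparse price records. The function name resolveCacheRates, the CacheRates type, and the sample rates are hypothetical.

```typescript
type CacheRates = {
  creation5m: number | undefined;
  creation1h: number | undefined;
  read: number | undefined;
};

// Mirrors the fallback chain above for a price record that may omit cache fields.
function resolveCacheRates(priceFields: {
  input_cost_per_token?: number;
  output_cost_per_token?: number;
  cache_creation_input_token_cost?: number;
  cache_creation_input_token_cost_above_1hr?: number;
  cache_read_input_token_cost?: number;
}): CacheRates {
  const input = priceFields.input_cost_per_token;
  const output = priceFields.output_cost_per_token;
  const creation5m =
    priceFields.cache_creation_input_token_cost ??
    (input != null ? input * 1.25 : undefined);
  const creation1h =
    priceFields.cache_creation_input_token_cost_above_1hr ??
    (input != null ? input * 2 : undefined) ??
    creation5m;
  const read =
    priceFields.cache_read_input_token_cost ??
    (input != null ? input * 0.1 : output != null ? output * 0.1 : undefined);
  return { creation5m, creation1h, read };
}

// Only the input rate is known: 5m = 1.25x, 1h = 2x, read = 0.1x of 4e-6.
const derivedRates = resolveCacheRates({ input_cost_per_token: 4e-6 });

// Only the 5-minute creation rate is known: the 1-hour rate falls back to it,
// and the read rate stays undefined.
const explicitOnlyRates = resolveCacheRates({ cache_creation_input_token_cost: 5e-6 });
```

Note that the 1-hour rate only falls back to the 5-minute rate when no input rate is available; otherwise the 2x input rate takes priority.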

Provider cost multiplier

Each provider can have a costMultiplier that scales the final cost:

export const providers = pgTable('providers', {
  // ...
  costMultiplier: numeric('cost_multiplier', { precision: 10, scale: 4 })
    .default('1.0'),
  // ...
});

This multiplier is applied at the end of cost calculation, allowing:

Markup pricing (multiplier > 1.0)
Discount pricing (multiplier < 1.0)
Pass-through pricing (multiplier = 1.0)

Price synchronization

Cloud price table

The system syncs prices from a cloud-hosted TOML file:

export const CLOUD_PRICE_TABLE_URL =
  "https://claude-code-hub.app/config/prices-base.toml";

The fetch applies a timeout and rejects any response whose final URL differs from the requested one in protocol, host, or path, which guards against unexpected redirects:

export async function fetchCloudPriceTableToml(
  url: string = CLOUD_PRICE_TABLE_URL
): Promise<CloudPriceTableResult<string>> {
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), FETCH_TIMEOUT_MS);

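  try {
    // Parsed form of the requested URL, compared with the final response URL below
    const expectedUrl = new URL(url);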
    const response = await fetch(url, {
      signal: controller.signal,
      headers: { Accept: "text/plain" },
      cache: "no-store",
    });

    // Security: Verify no unexpected redirects
    if (expectedUrl && typeof response.url === "string" && response.url) {
      const finalUrl = new URL(response.url);
      if (
        finalUrl.protocol !== expectedUrl.protocol ||
        finalUrl.host !== expectedUrl.host ||
        finalUrl.pathname !== expectedUrl.pathname
      ) {
        return { ok: false, error: "Cloud price table fetch failed: unexpected redirect" };
      }
    }

    const tomlText = await response.text();
    return { ok: true, data: tomlText };
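  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    return { ok: false, error: `Fetch failed: ${message}` };
  } finally {
    clearTimeout(timeoutId); // Always clear the abort timer
  }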
}
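
The redirect check can be isolated into a pure function, which makes the comparison easy to test without a network call. This is a sketch rather than the project's code; isSameEndpoint is a hypothetical name, and the second URL is a made-up redirect target.

```typescript
// Returns true when the final response URL matches the requested URL on
// protocol, host, and pathname. Query string and fragment are ignored.
function isSameEndpoint(requestedUrl: string, finalUrl: string): boolean {
  let expected: URL;
  let actual: URL;
  try {
    expected = new URL(requestedUrl);
    actual = new URL(finalUrl);
  } catch {
    return false; // an unparsable URL is treated as a mismatch
  }
  return (
    actual.protocol === expected.protocol &&
    actual.host === expected.host &&
    actual.pathname === expected.pathname
  );
}

const priceTableUrl = "https://claude-code-hub.app/config/prices-base.toml";
const sameEndpoint = isSameEndpoint(priceTableUrl, priceTableUrl);
const redirectedElsewhere = isSameEndpoint(
  priceTableUrl,
  "https://example.com/config/prices-base.toml"
);
```

Here sameEndpoint is true and redirectedElsewhere is false, because the host differs even though the path is identical.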

Processing price updates

The processPriceTableInternal function handles bulk price updates:

export async function processPriceTableInternal(
  jsonContent: string,
  overwriteManual?: string[]
): Promise<ActionResult<PriceUpdateResult>> {
  const priceTable: PriceTableJson = JSON.parse(jsonContent);

  // Get existing manual prices for conflict detection
  const manualPrices = await findAllManualPrices();
  const overwriteSet = new Set(overwriteManual ?? []);

  for (const [modelName, priceData] of entries) {
    // Skip manual prices not in overwrite list
    const isManualPrice = manualPrices.has(modelName);
    if (isManualPrice && !overwriteSet.has(modelName)) {
      result.skippedConflicts?.push(modelName);
      result.unchanged.push(modelName);
      continue;
    }

    if (!existingPrice) {
      await createModelPrice(modelName, priceData, "litellm");
      result.added.push(modelName);
    } else if (!isPriceDataEqual(existingPrice.priceData, priceData)) {
      // Delete old record first if overwriting manual
      if (isManualPrice && overwriteSet.has(modelName)) {
        await deleteModelPriceByName(modelName);
      }
      await createModelPrice(modelName, priceData, "litellm");
      result.updated.push(modelName);
    } else {
      result.unchanged.push(modelName);
    }
  }
}
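
The branching above can be summarized as a pure planning step that decides, per model, whether an incoming price is added, updated, left unchanged, or skipped because a manual price is protected. The sketch below works on in-memory data only; planSync, StoredPrice, and SyncPlan are hypothetical names, and the equality check is a simplified stand-in for isPriceDataEqual.

```typescript
type PriceFields = Record<string, unknown>;
type StoredPrice = { source: "litellm" | "manual"; priceData: PriceFields };
type SyncPlan = {
  added: string[];
  updated: string[];
  unchanged: string[];
  skippedConflicts: string[];
};

// ASSUMPTION: `stored` maps each model name to its current effective record.
function planSync(
  incoming: Record<string, PriceFields>,
  stored: Map<string, StoredPrice>,
  overwriteManual: string[] = []
): SyncPlan {
  const overwriteSet = new Set(overwriteManual);
  const plan: SyncPlan = { added: [], updated: [], unchanged: [], skippedConflicts: [] };
  for (const [modelName, priceData] of Object.entries(incoming)) {
    const existing = stored.get(modelName);
    if (existing?.source === "manual" && !overwriteSet.has(modelName)) {
      plan.skippedConflicts.push(modelName); // protected manual price
      plan.unchanged.push(modelName);
    } else if (existing === undefined) {
      plan.added.push(modelName);
    } else if (JSON.stringify(existing.priceData) !== JSON.stringify(priceData)) {
      plan.updated.push(modelName); // simplified comparison, key-order sensitive
    } else {
      plan.unchanged.push(modelName);
    }
  }
  return plan;
}

const storedPrices = new Map<string, StoredPrice>([
  ["model-a", { source: "manual", priceData: { input_cost_per_token: 1e-6 } }],
  ["model-b", { source: "litellm", priceData: { input_cost_per_token: 2e-6 } }],
]);
const incomingPrices = {
  "model-a": { input_cost_per_token: 3e-6 },
  "model-b": { input_cost_per_token: 2e-6 },
  "model-c": { input_cost_per_token: 4e-6 },
};
const defaultPlan = planSync(incomingPrices, storedPrices);
const overwritePlan = planSync(incomingPrices, storedPrices, ["model-a"]);
```

With no overwrite list, model-a is skipped as a conflict; when model-a is listed in overwriteManual, it moves to the updated group instead.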

Sync throttling

To prevent excessive sync requests, the system implements throttling:

const DEFAULT_THROTTLE_MS = 5 * 60 * 1000; // 5 minutes

export function requestCloudPriceTableSync(options: {
  reason: "missing-model" | "scheduled" | "manual";
  throttleMs?: number;
}): void {
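  // g is assumed to be the process-wide global object (for example, globalThis)
  const throttleMs = options.throttleMs ?? DEFAULT_THROTTLE_MS;
  const lastAt = g.__CCH_CLOUD_PRICE_SYNC_LAST_AT__ ?? 0;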
  const now = Date.now();
  if (now - lastAt < throttleMs) {
    return; // Skip if within throttle period
  }

  // Deduplication: Check if task already running
  if (g.__CCH_CLOUD_PRICE_SYNC_SCHEDULING__) {
    return;
  }
  // ...
}
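
The throttle and deduplication checks can be exercised in isolation with an injectable clock. The following is a sketch under stated assumptions: createSyncGate is a hypothetical factory, state lives in a closure instead of on a global object, and the timestamp is assumed to be recorded when a sync starts (the excerpt above elides that step).

```typescript
// Hypothetical factory; the excerpt above stores equivalent state on a global object.
function createSyncGate(throttleMs: number, now: () => number) {
  let lastAt = Number.NEGATIVE_INFINITY; // no sync has started yet
  let running = false;

  return {
    // Returns true when a sync may start, false when the request should be skipped.
    tryAcquire(): boolean {
      const current = now();
      if (current - lastAt < throttleMs) return false; // within throttle period
      if (running) return false; // deduplication: a sync is already in flight
      running = true;
      lastAt = current; // ASSUMPTION: the timestamp is recorded when a sync starts
      return true;
    },
    // Marks the in-flight sync as finished.
    release(): void {
      running = false;
    },
  };
}

let fakeNow = 1000000;
const gate = createSyncGate(5 * 60 * 1000, () => fakeNow);
const firstAttempt = gate.tryAcquire(); // allowed
const secondAttempt = gate.tryAcquire(); // skipped: inside the throttle window
```

Injecting the clock keeps the gate deterministic in tests; production code would pass Date.now.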

Manual price management

Creating and updating manual prices

Administrators can manually add or update prices via the UI. The system uses a delete-then-insert pattern:

export async function upsertModelPrice(
  modelName: string,
  priceData: ModelPriceData
): Promise<ModelPrice> {
  return await db.transaction(async (tx) => {
    // Delete all old records for this model
    await tx.delete(modelPrices)
      .where(eq(modelPrices.modelName, modelName));

    // Insert new record with source='manual'
    const [price] = await tx
      .insert(modelPrices)
      .values({
        modelName: modelName,
        priceData: priceData,
        source: "manual",
      })
      .returning();
    return toModelPrice(price);
  });
}

Important: This removes ALL existing records for the model, including cloud-synced ones and any price history, and creates a single new manual record.

Conflict detection

Before syncing from the cloud, the system checks for conflicts with manual prices:

export async function checkLiteLLMSyncConflicts(): Promise<
  ActionResult<SyncConflictCheckResult>
> {
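  const tomlResult = await fetchCloudPriceTableToml();
  if (!tomlResult.ok) {
    return { ok: false, error: tomlResult.error };
  }
  // Assumes the parse result uses the same ok/error shape as the fetch result
  const parseResult = parseCloudPriceTableToml(tomlResult.data);
  if (!parseResult.ok) {
    return { ok: false, error: parseResult.error };
  }
  const priceTable: PriceTableJson = parseResult.data.models;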

  // Get all manual prices from database
  const manualPrices = await findAllManualPrices();

  // Build conflict list
  const conflicts: SyncConflict[] = [];
  for (const [modelName, manualPrice] of manualPrices) {
    const litellmPrice = priceTable[modelName];
    if (litellmPrice && typeof litellmPrice === "object" &&
        "mode" in litellmPrice) {
      conflicts.push({
        modelName,
        manualPrice: manualPrice.priceData,
        litellmPrice: litellmPrice as ModelPriceData,
      });
    }
  }

  return {
    ok: true,
    data: {
      hasConflicts: conflicts.length > 0,
      conflicts,
    },
  };
}

Price comparison logic

When checking whether a price has changed during sync, the system compares the stored and incoming records key by key, treating two numbers as equal when they differ by no more than 1e-15:

function isPriceDataEqual(
  existing: ModelPriceData,
  incoming: ModelPriceData
): boolean {
  const keys = new Set([
    ...Object.keys(existing),
    ...Object.keys(incoming),
  ]);

  for (const key of keys) {
    const existingVal = existing[key];
    const incomingVal = incoming[key];

    // Handle numeric comparisons with precision
    if (typeof existingVal === "number" && typeof incomingVal === "number") {
      if (Math.abs(existingVal - incomingVal) > 1e-15) {
        return false;
      }
      continue;
    }

    // Direct comparison for non-numeric values
    if (existingVal !== incomingVal) {
      return false;
    }
  }

  return true;
}

This ensures that minor floating-point differences don't trigger unnecessary updates while still detecting actual price changes.
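
One caveat with the excerpt as shown: non-numeric values are compared with !==, so object or array fields such as providers or search_context_cost_per_query are only equal when they are the same reference. A structural variant could look like the sketch below; pricesMatch and isPlainRecord are hypothetical helpers, not the project's implementation.

```typescript
// Structural comparison with a numeric tolerance applied at every depth.
const NUMERIC_TOLERANCE = 1e-15;

function isPlainRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function pricesMatch(a: unknown, b: unknown): boolean {
  if (typeof a === "number" && typeof b === "number") {
    return Math.abs(a - b) <= NUMERIC_TOLERANCE;
  }
  if (Array.isArray(a) || Array.isArray(b)) {
    if (!Array.isArray(a) || !Array.isArray(b) || a.length !== b.length) return false;
    return a.every((item, index) => pricesMatch(item, b[index]));
  }
  if (isPlainRecord(a) && isPlainRecord(b)) {
    const keys = new Set([...Object.keys(a), ...Object.keys(b)]);
    for (const key of keys) {
      if (!pricesMatch(a[key], b[key])) return false; // missing keys compare as undefined
    }
    return true;
  }
  return a === b; // strings, booleans, null, undefined
}

const sameNested = pricesMatch(
  { providers: ["anthropic"], search_context_cost_per_query: { search_context_size_low: 0.01 } },
  { providers: ["anthropic"], search_context_cost_per_query: { search_context_size_low: 0.01 } }
);
const changedRate = pricesMatch({ input_cost_per_token: 3e-6 }, { input_cost_per_token: 4e-6 });
```

Here sameNested is true even though the two records are distinct objects, and changedRate is false because the rates differ by far more than the tolerance.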

Provider rate limits

Providers can have spending limits configured:

export const providers = pgTable('providers', {
  // ...
  limit5hUsd: numeric('limit_5h_usd', { precision: 10, scale: 2 }),
  limitDailyUsd: numeric('limit_daily_usd', { precision: 10, scale: 2 }),
  dailyResetMode: dailyResetModeEnum('daily_reset_mode')
    .default('fixed')
    .notNull(), // 'fixed' or 'rolling'
  dailyResetTime: varchar('daily_reset_time', { length: 5 })
    .default('00:00')
    .notNull(), // HH:mm format (only used in fixed mode)
  limitWeeklyUsd: numeric('limit_weekly_usd', { precision: 10, scale: 2 }),
  limitMonthlyUsd: numeric('limit_monthly_usd', { precision: 10, scale: 2 }),
  limitTotalUsd: numeric('limit_total_usd', { precision: 10, scale: 2 }),
  totalCostResetAt: timestamp('total_cost_reset_at', { withTimezone: true }),
  limitConcurrentSessions: integer('limit_concurrent_sessions')
    .default(0),
  // ...
});

Daily reset modes

Fixed: Resets at a specific time every day (configured by dailyResetTime)
Rolling: Uses a 24-hour sliding window
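
To illustrate the difference between the two modes, the sketch below computes where the current daily spending window starts. It is an assumption-laden example: dailyWindowStart is a hypothetical function, and dailyResetTime is interpreted in UTC for simplicity, whereas a real deployment would likely use a configured time zone.

```typescript
type DailyResetMode = "fixed" | "rolling";
const DAY_MS = 24 * 60 * 60 * 1000;

// Returns the start of the current daily spending window as epoch milliseconds.
// ASSUMPTION: dailyResetTime ("HH:mm") is interpreted in UTC.
function dailyWindowStart(
  mode: DailyResetMode,
  dailyResetTime: string,
  nowMs: number
): number {
  if (mode === "rolling") {
    return nowMs - DAY_MS; // sliding 24-hour window
  }
  const parts = dailyResetTime.split(":");
  const hours = Number(parts[0] ?? "0");
  const minutes = Number(parts[1] ?? "0");
  const today = new Date(nowMs);
  const resetToday = Date.UTC(
    today.getUTCFullYear(),
    today.getUTCMonth(),
    today.getUTCDate(),
    hours,
    minutes
  );
  // Before today's reset time, the window started at yesterday's reset.
  return nowMs >= resetToday ? resetToday : resetToday - DAY_MS;
}

const nowUtc = Date.UTC(2024, 0, 15, 10, 30); // 2024-01-15 10:30 UTC
const fixedAfterReset = dailyWindowStart("fixed", "04:00", nowUtc); // 2024-01-15 04:00 UTC
const fixedBeforeReset = dailyWindowStart("fixed", "18:00", nowUtc); // 2024-01-14 18:00 UTC
const rollingStart = dailyWindowStart("rolling", "00:00", nowUtc); // 2024-01-14 10:30 UTC
```

In fixed mode spending accumulated since the most recent reset time counts toward the limit; in rolling mode spending from the previous 24 hours counts, regardless of dailyResetTime.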

Edge cases and handling

Missing price data

When a model has no price data:

Cost calculation returns 0
Request is still processed
Admin can trigger async price sync via requestCloudPriceTableSync({ reason: "missing-model" })

Cache token derivation

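When cache tokens are reported without TTL separation, any tokens not already attributed to a TTL bucket are assigned to the 1-hour bucket if cache_ttl is "1h" and to the 5-minute bucket otherwise:

// Start from the TTL-specific counts, if the provider reported them
let cache5mTokens = usage.cache_creation_5m_input_tokens;
let cache1hTokens = usage.cache_creation_1h_input_tokens;

if (typeof usage.cache_creation_input_tokens === "number") {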
  const remaining = usage.cache_creation_input_tokens -
    (cache5mTokens ?? 0) - (cache1hTokens ?? 0);

  if (remaining > 0) {
    const target = usage.cache_ttl === "1h" ? "1h" : "5m";
    if (target === "1h") {
      cache1hTokens = (cache1hTokens ?? 0) + remaining;
    } else {
      cache5mTokens = (cache5mTokens ?? 0) + remaining;
    }
  }
}
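
Wrapped in a function, the derivation can be checked against a few inputs. This is a sketch following the logic above; splitCacheCreationTokens and CacheUsage are hypothetical names, and missing bucket counts are treated as zero.

```typescript
type CacheUsage = {
  cache_creation_input_tokens?: number;
  cache_creation_5m_input_tokens?: number;
  cache_creation_1h_input_tokens?: number;
  cache_ttl?: "5m" | "1h" | "mixed";
};

// Splits cache creation tokens into 5-minute and 1-hour buckets, assigning any
// tokens not already attributed to a bucket according to cache_ttl.
function splitCacheCreationTokens(usage: CacheUsage): { tokens5m: number; tokens1h: number } {
  let tokens5m = usage.cache_creation_5m_input_tokens ?? 0;
  let tokens1h = usage.cache_creation_1h_input_tokens ?? 0;
  if (typeof usage.cache_creation_input_tokens === "number") {
    const remaining = usage.cache_creation_input_tokens - tokens5m - tokens1h;
    if (remaining > 0) {
      if (usage.cache_ttl === "1h") {
        tokens1h += remaining;
      } else {
        tokens5m += remaining; // "5m", "mixed", and undefined all land here
      }
    }
  }
  return { tokens5m, tokens1h };
}

// Legacy total only: everything is billed at the 5-minute rate.
const legacyOnly = splitCacheCreationTokens({ cache_creation_input_tokens: 1000 });

// 400 tokens already attributed to 1h; the other 600 default to the 5m bucket.
const partlyAttributed = splitCacheCreationTokens({
  cache_creation_input_tokens: 1000,
  cache_creation_1h_input_tokens: 400,
  cache_ttl: "mixed",
});
```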

Image token pricing

Image tokens have fallback logic:

// Output image tokens
if (usage.output_image_tokens != null && usage.output_image_tokens > 0) {
  const imageCostPerToken =
    priceData.output_cost_per_image_token ??
    priceData.output_cost_per_token;
  segments.push(multiplyCost(usage.output_image_tokens, imageCostPerToken));
}

// Input image tokens
if (usage.input_image_tokens != null && usage.input_image_tokens > 0) {
  const imageCostPerToken =
    priceData.input_cost_per_image_token ??
    priceData.input_cost_per_token;
  segments.push(multiplyCost(usage.input_image_tokens, imageCostPerToken));
}
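
A small worked example may clarify the fallback. The sketch below uses plain numbers rather than Decimal, and the imageTokenCost function, its types, and the rates are illustrative assumptions.

```typescript
type ImageUsage = { input_image_tokens?: number; output_image_tokens?: number };
type ImagePriceFields = {
  input_cost_per_token?: number;
  output_cost_per_token?: number;
  input_cost_per_image_token?: number;
  output_cost_per_image_token?: number;
};

// Sums the image-token segments, falling back to the text rate when no
// image-specific rate is configured. A missing rate contributes zero.
function imageTokenCost(usage: ImageUsage, price: ImagePriceFields): number {
  const outputRate = price.output_cost_per_image_token ?? price.output_cost_per_token ?? 0;
  const inputRate = price.input_cost_per_image_token ?? price.input_cost_per_token ?? 0;
  return (
    (usage.output_image_tokens ?? 0) * outputRate +
    (usage.input_image_tokens ?? 0) * inputRate
  );
}

// Output uses the image-specific rate; input falls back to the text rate:
// 1000 * 4e-5 + 200 * 3e-6 = 0.04 + 0.0006 = 0.0406
const imageCost = imageTokenCost(
  { output_image_tokens: 1000, input_image_tokens: 200 },
  { output_cost_per_image_token: 4e-5, output_cost_per_token: 1.5e-5, input_cost_per_token: 3e-6 }
);
```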

Cost precision

Costs are calculated and stored with 15 decimal places (COST_SCALE) for accurate billing, and rounded to 6 decimal places (COST_DISPLAY_SCALE) for display:

export const COST_SCALE = 15;
export const COST_DISPLAY_SCALE = 6;
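
The gap between the two scales matters for very small per-request costs, which would vanish if stored at display precision. The sketch below shows the effect with plain numbers; formatCostForDisplay and perMillionTokens are hypothetical helpers, and the two-decimal per-million format is an assumption about presentation.

```typescript
const DISPLAY_SCALE = 6; // mirrors COST_DISPLAY_SCALE

// Formats a stored cost for display, rounding to the display scale.
function formatCostForDisplay(cost: number): string {
  return cost.toFixed(DISPLAY_SCALE);
}

// Converts a per-token price to a per-million-token figure.
function perMillionTokens(costPerToken: number): string {
  return (costPerToken * 1000000).toFixed(2);
}

const displayedCost = formatCostForDisplay(0.0012345678); // "0.001235"
const tinyCost = formatCostForDisplay(3e-7); // "0.000000": invisible at display scale
const inputPerMillion = perMillionTokens(3e-6); // "3.00"
```

A cost of 3e-7 rounds to zero at six decimal places, yet a few thousand such requests add up to a visible amount, which is why the stored value keeps the extra digits.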

UI capabilities

Price list

The price list component provides extensive functionality:

Column display:

Model name with capability icons
Input/output prices (displayed per million tokens)
Cache read and creation prices
Last updated timestamp
Source badge (Local/Cloud)

Capability icons (9 types):

Function calling
Tool choice
Response schema
Prompt caching
Vision
PDF input
Reasoning
Computer use
Assistant prefill

Quick filters:

All prices
Local (manual) only
Anthropic models
OpenAI models
Vertex AI models

Pagination options: 20, 50, 100, or 200 items per page

Conflict resolution

The sync conflict dialog allows administrators to:

Review conflicts: See side-by-side comparison of manual vs cloud prices
Search conflicts: Filter by model name
Selective overwrite: Choose which models to update
Price diff view: Detailed comparison showing:
- Input price changes
- Output price changes
- Image price changes
- Provider changes
- Mode changes

Color coding:

Red: Current manual price
Green: Cloud price that would replace it

API endpoints

The pricing system exposes the following API endpoints (admin-only, GET only):

Endpoint	Description
`/api/prices`	Price list endpoint
`/api/prices/cloud-model-count`	Cloud price table model count

Best practices

Regular syncs: Schedule periodic price syncs to stay current with provider pricing changes
Manual overrides: Use manual prices sparingly and document why they are needed
Conflict review: Always review conflicts before overwriting manual prices
Provider multipliers: Use multipliers for consistent markup/discount across all models from a provider
Monitoring: Monitor cost calculations for anomalies that might indicate pricing errors

Related documentation

Provider management: Configure providers and their settings
Rate limiting: Understand spending limits and quotas
Session management: Track usage and costs per session