Expense report validation with AI using structured outputs and business rules

Learn how to use AI to transform unstructured documents into reliable, structured data and apply business rules to automate expense report validation and approval at scale.

Overview

This guide shows you how to build an automated expense report validation system that uses AI to extract data from PDF files and applies business rules to decide how each report is approved.

By the end, you'll have a pipeline that:

Receives a document via API.
Extracts structured data using AI.
Applies deterministic business rules.
Routes reports based on predefined thresholds.

Related patterns

The same extraction pattern applies to many document processing scenarios:

Invoice processing: Extract vendor, amount, and due date, then route the information to accounting.
Contract review: Extract key terms, relevant dates, and involved parties, and send them for legal review.
CV screening: Extract skills, professional experience, and education, and forward the profile to the hiring manager.
Insurance claims: Extract the policy number and incident details, and route the claim to an adjuster.

The core is always the same: AI extracts structured data, and business rules determine what happens next.

Pipeline setup

Configuring the trigger

The pipeline starts with an HTTP File Trigger that accepts PDF uploads. When a file is uploaded, it becomes available in the pipeline through the files[] array.

Validating the request

Validate the request before the AI step. This ensures that tokens are used only when the required file is present.

You can validate the payload in several ways.

For this scenario, use the Validator v2 connector. Configure the Validator v2 with a JSON schema that requires a file name matching the pattern ^.+\\.(pdf)$.

If validation fails, use a Choice connector to route to the error path.
If validation succeeds, proceed to AI extraction.
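
To see what the file name pattern accepts and rejects, here is a minimal Python sketch of the same check run outside the platform. It assumes that the doubled backslash in the pattern above is JSON string escaping for a single literal dot. The helper name is_pdf_upload is hypothetical and is not part of the Validator v2 connector.

```python
import re

# Assumed to be the same pattern as in the Validator v2 schema. Inside a JSON
# string the backslash is written as "\\.", which the regex engine reads as "\.".
PDF_NAME = re.compile(r"^.+\.(pdf)$")


def is_pdf_upload(file_name: str) -> bool:
    """Return True when the name has at least one character before '.pdf'."""
    return PDF_NAME.match(file_name) is not None


print(is_pdf_upload("report-2024-11.pdf"))  # True
print(is_pdf_upload("report.PDF"))          # False: the pattern is case-sensitive
print(is_pdf_upload(".pdf"))                # False: nothing before the extension
```

Note that the pattern is case-sensitive, so an upload named report.PDF would take the error path unless the pattern is widened.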

AI extraction

Understanding structured outputs

Without structured outputs, AI might return text like:

The employee John Smith (ID: EMP001) from Sales submitted an expense report on November 15, 2024 for a domestic trip totaling $1,250.00...

This format is hard to work with. You would need complex parsing logic that handles many variations and recovers from errors.

With structured outputs, you define a JSON schema and the AI returns:

{
  "employee_id": "EMP001",
  "employee_name": "John Smith",
  "department": "Sales",
  "total": 1250.00
}

This is immediately usable for routing, validation, and storage. No parsing needed.

Configuring the Agent Component

When configuring the Agent Component, focus on the following settings:

Model selection

- Model: Select the model you want to use.
- Account: Choose the account to be used by the LLM.
- Temperature: 0.3 (A low temperature helps ensure consistent and predictable extraction.)
- Max Output Tokens: 1024 (Enough for most expense reports. Adjust if processing very detailed reports.)
- Other parameters: Keep Top-P, Top-K, and penalties at their default values.

Files

Files: {{ message.fileName }} (This field accepts Double Braces, so you can reference your file dynamically.)
Guardrails:
- Enable PII (Personally Identifiable Information) protection: Detects and masks sensitive personal data.
- Enable JSON Schema: Forces the LLM to return structured data in the expected structure (highly recommended).
- Enable Regex Patterns: Validates specific field formats.

For this expense extraction use case, enable JSON Schema so that the Agent returns structured output that can be used directly in routing decisions. The schema definition is shown in the next section.

Messages

System Message (defines the AI role):

You are an expense report extraction agent.
Your task is to extract structured data from the provided document.
Do not make decisions.
Do not apply business rules.
Return only the extracted data following the provided JSON schema.

User Message (the specific request):

Extract the expense report data from the attached PDF.

Defining the JSON Schema

Structured outputs add value because they keep the data consistent. Here, use the Agent as a data normalizer, not as a reasoning tool.

JSON Schema:

{
  "type": "object",
  "required": ["employee_name", "department", "total", "trip_type", "itemized_expenses"],
  "properties": {
    "employee_name": {
      "type": "string",
      "description": "Full name of the employee"
    },
    "department": {
      "type": "string",
      "description": "Department name"
    },
    "total": {
      "type": "number",
      "description": "Total expense amount"
    },
    "trip_type": {
      "type": "string",
      "enum": ["Domestic", "International"],
      "description": "Type of trip"
    },
    "itemized_expenses": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["date", "description", "category", "amount"],
        "properties": {
          "date": {"type": "string"},
          "description": {"type": "string"},
          "category": {"type": "string"},
          "amount": {"type": "number"}
        }
      }
    }
  }
}

Example output:

{
  "employee_name": "John Smith",
  "department": "Sales",
  "total": 1250.00,
  "trip_type": "Domestic",
  "itemized_expenses": [
    {
      "date": "11/05/2024",
      "description": "Hotel - Downtown (3 nights)",
      "category": "Lodging",
      "amount": 600.00
    },
    {
      "date": "11/05/2024",
      "description": "Team dinner",
      "category": "Meals",
      "amount": 150.00
    }
  ]
}

Related quickstart: Turn AI responses into a structured JSON output

The extracted data is now available in message.body and ready for routing.
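
Before routing, a downstream step could re-check the Agent's output against the schema as a defensive measure. The following Python sketch uses only the standard library and hand-codes the required fields, types, and enum from the schema above. It is an illustration, not a full JSON Schema validator, and the names check_report and sample are hypothetical.

```python
import json

# Required fields and types, copied by hand from the JSON Schema above.
REQUIRED = {
    "employee_name": str,
    "department": str,
    "total": (int, float),
    "trip_type": str,
    "itemized_expenses": list,
}
ITEM_REQUIRED = {
    "date": str,
    "description": str,
    "category": str,
    "amount": (int, float),
}
TRIP_TYPES = {"Domestic", "International"}


def _type_ok(value, expected) -> bool:
    # bool is a subclass of int in Python, so reject it explicitly.
    return not isinstance(value, bool) and isinstance(value, expected)


def _check_fields(obj: dict, spec: dict, prefix: str) -> list:
    problems = []
    for field, expected in spec.items():
        if field not in obj:
            problems.append(f"{prefix}missing field: {field}")
        elif not _type_ok(obj[field], expected):
            problems.append(f"{prefix}wrong type: {field}")
    return problems


def check_report(report: dict) -> list:
    """Return a list of problems; an empty list means the report conforms."""
    problems = _check_fields(report, REQUIRED, "")
    trip_type = report.get("trip_type")
    if isinstance(trip_type, str) and trip_type not in TRIP_TYPES:
        problems.append("trip_type not in enum")
    items = report.get("itemized_expenses")
    if isinstance(items, list):
        for i, item in enumerate(items):
            if isinstance(item, dict):
                problems += _check_fields(item, ITEM_REQUIRED, f"item {i}: ")
            else:
                problems.append(f"item {i}: not an object")
    return problems


sample = json.loads("""
{
  "employee_name": "John Smith",
  "department": "Sales",
  "total": 1250.00,
  "trip_type": "Domestic",
  "itemized_expenses": [
    {"date": "11/05/2024", "description": "Team dinner",
     "category": "Meals", "amount": 150.00}
  ]
}
""")
print(check_report(sample))             # []
print(check_report({"total": "1250"}))  # lists four missing fields and one wrong type
```

Keeping this check deterministic means a malformed extraction is caught by ordinary code rather than by another AI call.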

Tracking token usage (Optional)

Tracking AI usage is a good practice to monitor costs. In this example, an Event Publisher was added after the Agent for consumption tracking:

{
  "timestamp": {{ NOW() }},
  "step_name": "expense-extraction",
  "model": "gpt-4o",
  "tokens": {
    "input": {{ message.tokenUsage.inputTokenCount }},
    "output": {{ message.tokenUsage.outputTokenCount }},
    "total": {{ message.tokenUsage.totalTokenCount }}
  },
  "cost_calc": {{ COST_CALCULATION }}
}

This publishes token usage to an event pipeline for cost analysis.
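
The COST_CALCULATION placeholder above is not defined in this guide. One possible way to compute it, shown here as a Python sketch, is to multiply the token counts by per-token prices. The prices below are hypothetical values chosen for illustration; replace them with your provider's current rates. The function name estimate_cost is also hypothetical.

```python
from decimal import Decimal

# Hypothetical prices in USD per 1 million tokens. These are NOT real rates.
INPUT_PRICE_PER_MILLION = Decimal("2.50")
OUTPUT_PRICE_PER_MILLION = Decimal("10.00")


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one Agent call from its token counts."""
    million = Decimal(1_000_000)
    cost = (
        Decimal(input_tokens) * INPUT_PRICE_PER_MILLION
        + Decimal(output_tokens) * OUTPUT_PRICE_PER_MILLION
    ) / million
    # Round to six decimal places so small per-call costs stay visible.
    return cost.quantize(Decimal("0.000001"))


print(estimate_cost(1800, 350))  # 0.008000
```

Decimal is used instead of float so that summing many small per-call costs does not accumulate binary rounding error.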

Business logic

Now that you have structured data from the AI, you can route reports based on business rules.

Simple routing by total amount

In most cases, routing depends on the total expense amount:

Under $500: Auto-approve
$500 to under $2000: Manager approval required
$2000 or more: Director approval required

For this exercise, we'll use a Choice connector since we're routing based on the total field with no calculations or aggregations required.

{
  "when": [
    {
      "path": "auto-approve",
      "jsonPath": "$.[?(@.body.total < 500)]"
    },
    {
      "path": "manager-approval",
      "jsonPath": "$.[?(@.body.total >= 500 && @.body.total < 2000)]"
    },
    {
      "path": "director-approval",
      "jsonPath": "$.[?(@.body.total >= 2000)]"
    }
  ],
  "otherwise": "error"
}
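
The same decision logic can be expressed as a small Python function, which is handy for unit-testing the thresholds outside the pipeline. This is a sketch that mirrors the three conditions and the fallback in the Choice configuration above; route_by_total is a hypothetical name.

```python
import math


def route_by_total(report: dict) -> str:
    """Return the path name the Choice configuration would select."""
    total = report.get("total")
    # Mirror the "otherwise" branch: a missing or non-numeric total is an error.
    if isinstance(total, bool) or not isinstance(total, (int, float)):
        return "error"
    if math.isnan(total):
        return "error"
    if total < 500:
        return "auto-approve"
    if total < 2000:
        return "manager-approval"
    return "director-approval"


print(route_by_total({"total": 499.99}))   # auto-approve
print(route_by_total({"total": 500}))      # manager-approval
print(route_by_total({"total": 2000}))     # director-approval
print(route_by_total({"total": "1250"}))   # error
```

The boundary values matter: a total of exactly 500 goes to the manager and exactly 2000 goes to the director, because the lower bound of each range is inclusive.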

After routing, each path generates an appropriate response.

Example: Auto-approve response

JSON:

{
  "status": 200,
  "decision": "APPROVED",
  "employee": {{ message.body.employee_name }},
  "department": {{ message.body.department }},
  "total": {{ message.body.total }},
  "message": "Expense report automatically approved - under $500 threshold"
}

Extensions

You can extend this solution in several ways:

Email notifications: Send emails to the correct approver after routing.
Department-specific thresholds: Apply different approval rules for different departments.
Database storage: Add a database connector after the response to log expense decisions.

This creates an audit trail and enables analytics on expense patterns.
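
As an illustration of the department-specific thresholds extension, the Python sketch below looks up a pair of limits per department and falls back to the default limits used in this guide. The department limits and the name route_for_department are hypothetical examples, not recommended values.

```python
# Default limits from this guide: auto-approve below 500, manager below 2000.
DEFAULT_LIMITS = (500, 2000)

# Hypothetical per-department overrides: (auto-approve below, manager below).
DEPARTMENT_LIMITS = {
    "Sales": (750, 3000),
    "Engineering": (500, 2500),
}


def route_for_department(department: str, total: float) -> str:
    """Pick the approval path using the department's limits, if any."""
    auto_limit, manager_limit = DEPARTMENT_LIMITS.get(department, DEFAULT_LIMITS)
    if total < auto_limit:
        return "auto-approve"
    if total < manager_limit:
        return "manager-approval"
    return "director-approval"


print(route_for_department("Sales", 600))      # auto-approve
print(route_for_department("Marketing", 600))  # manager-approval (default limits)
print(route_for_department("Sales", 3000))     # director-approval
```

Keeping the limits in a lookup table rather than in the conditions means a threshold change is a data change, not a logic change.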

Key takeaways

Structured outputs are essential: JSON Schema ensures the AI returns reliable and predictable data, not free text.
Monitor costs: Always track token usage when using LLMs. Even a small PDF file can consume hundreds of tokens, and that adds up quickly when reports are processed at scale.
Fail fast: Validate inputs (file existence, file type) early to avoid wasting AI tokens. The Validator v2 connector is your first line of defense.
Combine AI and rules: Use AI to extract data (unstructured to structured), then apply deterministic business logic to make decisions. This gives you flexibility and control.
