Data Import - Infrahub

Infrahub supports multiple methods for importing data: YAML files, GraphQL mutations, and the Python SDK. Choose the method that best fits your workflow.

YAML Import

YAML is the recommended format for importing large datasets or initial infrastructure definitions.

YAML Structure

Define objects in YAML with their attributes and relationships:

---
# Define locations
- kind: LocationSite
  name: atl1
  description: Atlanta Datacenter 1
  timezone: America/New_York
  region: us-east

- kind: LocationSite
  name: dfw1
  description: Dallas Datacenter 1
  timezone: America/Chicago
  region: us-central

# Define devices
- kind: InfraDevice
  name: atl1-edge-01
  description: Edge router
  site: atl1  # Reference by name
  device_type: cisco-asr9k
  role: edge-router
  status: active

- kind: InfraDevice
  name: atl1-edge-02
  site:
    name: atl1  # Alternative reference syntax
  device_type:
    name: cisco-asr9k
  status: active

Import Command

Use infrahubctl to import YAML files:

infrahubctl load data.yaml

With branch support:

infrahubctl load --branch feature-branch data.yaml

Import Options

Option	Description
`--branch`	Target branch (defaults to main)
`--update`	Update existing objects instead of creating new ones
`--validate-only`	Validate without importing
`--dry-run`	Show what would be imported
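
When imports are driven from a script or CI job, the options in the table above can be assembled programmatically. The following is a minimal sketch: `build_load_command` is a hypothetical helper, not part of infrahubctl, and it only uses the flags listed on this page.

```python
import shlex
from typing import List, Optional


def build_load_command(
    path: str,
    branch: Optional[str] = None,
    update: bool = False,
    validate_only: bool = False,
    dry_run: bool = False,
) -> List[str]:
    """Assemble an `infrahubctl load` argument list (hypothetical helper)."""
    cmd = ["infrahubctl", "load"]
    if branch:
        cmd += ["--branch", branch]
    if update:
        cmd.append("--update")
    if validate_only:
        cmd.append("--validate-only")
    if dry_run:
        cmd.append("--dry-run")
    cmd.append(path)
    return cmd


command = build_load_command("data.yaml", branch="feature-branch", dry_run=True)
# shlex.join quotes arguments safely for display or logging
print(shlex.join(command))
```

The list form can be passed directly to `subprocess.run(command, check=True)`, which avoids shell quoting problems with file names.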

Handling Relationships

You can reference related objects in several ways.

Name reference (if unique):

site: atl1

ID reference:

site:
  id: "<uuid>"

HFID reference:

site:
  hfid: ["atl1"]

Nested creation:

device_type:
  name: cisco-asr9k
  manufacturer: cisco
  model: ASR-9010

Many Relationships

For cardinality MANY relationships, use a list:

tags:
  - production
  - critical
  - edge

Or with explicit IDs:

tags:
  - id: "<tag-1-uuid>"
  - id: "<tag-2-uuid>"
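
To make the reference forms above concrete, here is a small sketch that classifies a relationship value once the YAML has been parsed into Python objects. `classify_reference` and `classify_many` are hypothetical helpers for illustration; the rule that a mapping containing only `name` counts as a name reference is an assumption, and none of this is Infrahub's actual loader logic.

```python
from typing import Any, Dict, List, Union

Reference = Union[str, Dict[str, Any]]


def classify_reference(value: Reference) -> str:
    """Return which reference form a parsed relationship value uses."""
    if isinstance(value, str):
        return "name"  # site: atl1
    if not isinstance(value, dict):
        raise TypeError(f"unsupported reference: {value!r}")
    if "id" in value:
        return "id"  # site: {id: "<uuid>"}
    if "hfid" in value:
        return "hfid"  # site: {hfid: ["atl1"]}
    if set(value) == {"name"}:
        return "name"  # site: {name: atl1}
    return "nested"  # extra attributes: treated here as nested creation


def classify_many(values: List[Reference]) -> List[str]:
    """Classify each entry of a cardinality-many relationship list."""
    return [classify_reference(value) for value in values]


print(classify_reference({"hfid": ["atl1"]}))
print(classify_many(["production", {"id": "<tag-1-uuid>"}]))
```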

GraphQL Import

For programmatic imports or integration with external systems, use GraphQL mutations.

Single Object Import

mutation {
  LocationSiteCreate(
    data: {
      name: { value: "atl1" }
      description: { value: "Atlanta Datacenter 1" }
      timezone: { value: "America/New_York" }
      region: { value: "us-east" }
    }
  ) {
    ok
    object {
      id
      display_label
    }
  }
}

Batch Import with Aliases

Import multiple objects in one request using aliases:

mutation {
  site1: LocationSiteCreate(
    data: {
      name: { value: "atl1" }
      description: { value: "Atlanta DC" }
    }
  ) {
    ok
    object { id }
  }

  site2: LocationSiteCreate(
    data: {
      name: { value: "dfw1" }
      description: { value: "Dallas DC" }
    }
  ) {
    ok
    object { id }
  }

  device1: InfraDeviceCreate(
    data: {
      name: { value: "atl1-edge-01" }
      site: { hfid: ["atl1"] }
    }
  ) {
    ok
    object { id }
  }
}
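
Writing aliased mutations by hand becomes tedious beyond a few objects. The sketch below generates one from a list of dictionaries. `build_batch_mutation` and the `itemN` alias names are hypothetical, and the helper only handles plain string attributes (no relationships), so treat it as a starting point rather than a complete generator.

```python
import json
from typing import Dict, List


def build_batch_mutation(kind: str, items: List[Dict[str, str]]) -> str:
    """Render one aliased `<kind>Create` call per item in a single mutation."""
    calls = []
    for index, attributes in enumerate(items, start=1):
        # json.dumps produces a double-quoted, escaped literal, which is
        # also a valid GraphQL string literal for ordinary text values.
        fields = ", ".join(
            f"{name}: {{ value: {json.dumps(value)} }}"
            for name, value in attributes.items()
        )
        calls.append(
            f"  item{index}: {kind}Create(data: {{ {fields} }}) "
            "{ ok object { id } }"
        )
    return "mutation {\n" + "\n".join(calls) + "\n}"


mutation = build_batch_mutation(
    "LocationSite",
    [
        {"name": "atl1", "description": "Atlanta DC"},
        {"name": "dfw1", "description": "Dallas DC"},
    ],
)
print(mutation)
```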

Variables for Dynamic Import

Use variables for reusable import scripts:

mutation CreateDevice($name: String!, $siteId: String!) {
  InfraDeviceCreate(
    data: {
      name: { value: $name }
      site: { id: $siteId }
      status: { value: "active" }
    }
  ) {
    ok
    object {
      id
      display_label
    }
  }
}

With variables:

{
  "name": "edge-router-01",
  "siteId": "<site-uuid>"
}
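
A GraphQL request with variables is a JSON body with `query` and `variables` keys sent over HTTP POST. The sketch below builds such a request with the standard library only. The endpoint URL, the `X-INFRAHUB-KEY` header name, and the `build_graphql_request` helper are assumptions for illustration; check them against your deployment before use.

```python
import json
import urllib.request
from typing import Any, Dict

CREATE_DEVICE = """
mutation CreateDevice($name: String!, $siteId: String!) {
  InfraDeviceCreate(
    data: {
      name: { value: $name }
      site: { id: $siteId }
      status: { value: "active" }
    }
  ) {
    ok
    object { id display_label }
  }
}
"""


def build_graphql_request(
    url: str, query: str, variables: Dict[str, Any], token: str
) -> urllib.request.Request:
    """Package a query and its variables as a JSON POST request."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-INFRAHUB-KEY": token,  # assumed header name for the API token
        },
    )


request = build_graphql_request(
    "http://localhost:8000/graphql/main",  # assumed endpoint for the main branch
    CREATE_DEVICE,
    {"name": "edge-router-01", "siteId": "<site-uuid>"},
    token="<api-token>",
)
# Sending is left commented out so the sketch has no side effects:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Keeping the mutation text constant and changing only the variables means the same script can be reused for every device.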

Python SDK Import

The Python SDK provides a programmatic interface for importing data from external sources.

Basic Import

import asyncio

from infrahub_sdk import InfrahubClient

async def import_sites(sites_data):
    # Create the client once and reuse it for every object
    client = InfrahubClient()

    for site_data in sites_data:
        site = await client.create(
            kind="LocationSite",
            **site_data
        )
        await site.save()
        print(f"Created site: {site.name.value}")

# Example data
sites = [
    {"name": "atl1", "description": "Atlanta DC", "timezone": "America/New_York"},
    {"name": "dfw1", "description": "Dallas DC", "timezone": "America/Chicago"},
]

# A bare top-level await only works in an async REPL or notebook;
# in a script, run the coroutine with asyncio.run()
asyncio.run(import_sites(sites))

Import with Relationships

import asyncio

from infrahub_sdk import InfrahubClient

async def import_devices():
    client = InfrahubClient()

    # Fetch existing site
    site = await client.get(
        kind="LocationSite",
        name__value="atl1"
    )

    # Create device with relationship
    device = await client.create(
        kind="InfraDevice",
        name="atl1-edge-01",
        site=site.id,  # Use the site's ID
        status="active"
    )
    await device.save()

# In a script, run the coroutine with asyncio.run()
asyncio.run(import_devices())

Batch Import with Error Handling

import asyncio

from infrahub_sdk import InfrahubClient
from infrahub_sdk.exceptions import ValidationError

async def batch_import(objects_data):
    client = InfrahubClient()
    results = {"success": [], "failed": []}

    for obj_data in objects_data:
        try:
            obj = await client.create(
                kind=obj_data["kind"],
                **obj_data["attributes"]
            )
            await obj.save()
            results["success"].append(obj.id)
        except ValidationError as e:
            # Record the failure and continue with the remaining objects
            results["failed"].append({
                "data": obj_data,
                "error": str(e)
            })

    return results

# Example data
devices = [
    {
        "kind": "InfraDevice",
        "attributes": {"name": "router-01", "site": "<site-uuid>"}
    },
    {
        "kind": "InfraDevice",
        "attributes": {"name": "router-02", "site": "<site-uuid>"}
    },
]

# In a script, run the coroutine with asyncio.run()
results = asyncio.run(batch_import(devices))
print(f"Imported {len(results['success'])} objects")
print(f"Failed: {len(results['failed'])}")

Import from CSV

import asyncio
import csv

from infrahub_sdk import InfrahubClient

async def import_from_csv(csv_file):
    client = InfrahubClient()

    # newline="" lets the csv module handle line endings itself
    with open(csv_file, "r", newline="") as f:
        reader = csv.DictReader(f)
        for row in reader:
            device = await client.create(
                kind="InfraDevice",
                name=row["name"],
                site=row["site_id"],
                device_type=row["device_type_id"],
                description=row.get("description", "")
            )
            await device.save()

# In a script, run the coroutine with asyncio.run()
asyncio.run(import_from_csv("devices.csv"))
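
The CSV import above reads the columns `name`, `site_id`, `device_type_id`, and an optional `description`. The sketch below shows a file of that shape and a check for missing columns before anything is created. `read_device_rows` and the sample data are hypothetical; the sample is parsed from a string so the example runs without a file.

```python
import csv
import io
from typing import Dict, List, TextIO

REQUIRED_COLUMNS = ("name", "site_id", "device_type_id")

SAMPLE_CSV = """name,site_id,device_type_id,description
atl1-edge-01,<site-uuid>,<device-type-uuid>,Edge router
atl1-edge-02,<site-uuid>,<device-type-uuid>,
"""


def read_device_rows(handle: TextIO) -> List[Dict[str, str]]:
    """Parse device rows, failing early if a required column is absent."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [column for column in REQUIRED_COLUMNS if column not in header]
    if missing:
        raise ValueError(f"CSV is missing required columns: {', '.join(missing)}")
    return list(reader)


rows = read_device_rows(io.StringIO(SAMPLE_CSV))
for row in rows:
    print(row["name"], repr(row["description"]))
```

Checking the header first means a malformed file is rejected before a partial import has been written.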

Import from JSON

import asyncio
import json

from infrahub_sdk import InfrahubClient

async def import_from_json(json_file):
    client = InfrahubClient()

    with open(json_file, "r") as f:
        data = json.load(f)

    for obj in data["objects"]:
        # "kind" selects the schema node; the remaining keys are attributes
        kind = obj.pop("kind")
        instance = await client.create(kind=kind, **obj)
        await instance.save()

# In a script, run the coroutine with asyncio.run()
asyncio.run(import_from_json("infrastructure.json"))
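
The JSON import above expects a top-level `objects` list in which every entry carries a `kind` key alongside its attributes. The sketch below shows a document of that shape and a way to separate `kind` from the attributes without mutating the loaded data. `iter_objects` and the sample document are hypothetical.

```python
import json
from typing import Any, Dict, Iterator, Tuple

SAMPLE_JSON = """
{
  "objects": [
    {"kind": "LocationSite", "name": "atl1", "description": "Atlanta DC"},
    {"kind": "InfraDevice", "name": "atl1-edge-01", "site": "<site-uuid>"}
  ]
}
"""


def iter_objects(document: Dict[str, Any]) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (kind, attributes) pairs, leaving the input entries untouched."""
    for entry in document["objects"]:
        attributes = {key: value for key, value in entry.items() if key != "kind"}
        yield entry["kind"], attributes


document = json.loads(SAMPLE_JSON)
pairs = list(iter_objects(document))
for kind, attributes in pairs:
    print(kind, attributes)
```

Leaving the input unmodified makes it possible to retry or log a failed entry with its original `kind` still attached.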

Import from External Systems

Import from NetBox

import asyncio

import pynetbox
from infrahub_sdk import InfrahubClient

async def import_from_netbox(netbox_url, netbox_token):
    # Connect to NetBox
    nb = pynetbox.api(netbox_url, token=netbox_token)

    # Connect to Infrahub
    client = InfrahubClient()

    # Import sites: the NetBox slug becomes the name, the NetBox name the description
    for nb_site in nb.dcim.sites.all():
        site = await client.create(
            kind="LocationSite",
            name=nb_site.slug,
            description=nb_site.name,
            region=nb_site.region.slug if nb_site.region else None
        )
        await site.save()
        print(f"Imported site: {site.name.value}")

# In a script, run the coroutine with asyncio.run()
asyncio.run(import_from_netbox(
    "https://netbox.example.com",
    "your-netbox-token"
))
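
Separating the field mapping from the API calls makes it testable without a NetBox or Infrahub instance. The sketch below isolates the mapping used in the NetBox example. `netbox_site_to_infrahub` is a hypothetical helper, and `SimpleNamespace` objects stand in for pynetbox records.

```python
from types import SimpleNamespace
from typing import Any, Dict


def netbox_site_to_infrahub(nb_site: Any) -> Dict[str, Any]:
    """Map a NetBox site record to LocationSite attributes."""
    return {
        "name": nb_site.slug,
        "description": nb_site.name,
        # A site without a region maps to None rather than raising
        "region": nb_site.region.slug if nb_site.region else None,
    }


# Stand-ins for pynetbox records, for illustration only
with_region = SimpleNamespace(
    slug="atl1", name="Atlanta Datacenter 1", region=SimpleNamespace(slug="us-east")
)
without_region = SimpleNamespace(slug="dfw1", name="Dallas Datacenter 1", region=None)

print(netbox_site_to_infrahub(with_region))
print(netbox_site_to_infrahub(without_region))
```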

Import from Git Repository

import asyncio
from pathlib import Path

import yaml
from infrahub_sdk import InfrahubClient

async def import_from_git_repo(repo_path):
    client = InfrahubClient()

    # Find all YAML files
    yaml_files = Path(repo_path).glob("**/*.yaml")

    for yaml_file in yaml_files:
        with open(yaml_file, "r") as f:
            data = yaml.safe_load(f)

        # safe_load returns None for an empty file, so fall back to an empty list
        for obj_data in data or []:
            kind = obj_data.pop("kind")
            obj = await client.create(kind=kind, **obj_data)
            await obj.save()

# In a script, run the coroutine with asyncio.run()
asyncio.run(import_from_git_repo("/path/to/repo/data"))
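
File discovery in a repository can be made a little more robust than a single glob. The sketch below also accepts the `.yml` extension, skips the `.git` directory, and returns files in a stable sorted order so repeated imports process them the same way. `find_yaml_files` is a hypothetical helper, and these three choices are assumptions about what a repository import typically wants.

```python
import tempfile
from pathlib import Path
from typing import List


def find_yaml_files(repo_path: str) -> List[Path]:
    """Return .yaml/.yml files under repo_path, sorted, ignoring .git."""
    root = Path(repo_path)
    files = [
        path
        for path in root.rglob("*")
        if path.is_file()
        and path.suffix in {".yaml", ".yml"}
        and ".git" not in path.relative_to(root).parts
    ]
    return sorted(files)


# Demonstration against a throwaway directory tree
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sites").mkdir()
    (root / ".git").mkdir()
    (root / "sites" / "atl1.yaml").write_text("[]\n")
    (root / "devices.yml").write_text("[]\n")
    (root / "README.md").write_text("not data\n")
    (root / ".git" / "config.yaml").write_text("[]\n")
    found = [path.relative_to(root).as_posix() for path in find_yaml_files(tmp)]

print(found)
```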

Validation During Import

Infrahub validates all imported data against schema constraints:

Required attributes must have values
Unique attributes are checked for duplicates
Regex patterns validate string formats
Relationship cardinality checks ensure each relationship has an allowed number of peers
Type validation ensures values match attribute types
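
Some of the checks listed above can be approximated locally before sending data, which shortens the feedback loop on large files. The sketch below covers required, unique, regex, and string-type checks. The `CONSTRAINTS` dictionary and `precheck` function are hypothetical and do not read the Infrahub schema; relationship cardinality is not covered, and the server remains the authority on validity.

```python
import re
from typing import Any, Dict, List

# Hypothetical constraints, written by hand for this example
CONSTRAINTS: Dict[str, Dict[str, Any]] = {
    "name": {"required": True, "unique": True, "regex": r"^[a-z0-9-]+$"},
    "description": {"required": False},
}


def precheck(objects: List[Dict[str, Any]], constraints: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return human-readable problems found in the objects, in input order."""
    errors: List[str] = []
    seen = {field: set() for field, rule in constraints.items() if rule.get("unique")}
    for index, obj in enumerate(objects):
        for field, rule in constraints.items():
            where = f"objects[{index}].{field}"
            value = obj.get(field)
            if value in (None, ""):
                if rule.get("required"):
                    errors.append(f"{where}: required value is missing")
                continue
            if not isinstance(value, str):
                errors.append(f"{where}: expected a string")
                continue
            pattern = rule.get("regex")
            if pattern and not re.fullmatch(pattern, value):
                errors.append(f"{where}: {value!r} does not match {pattern}")
            if field in seen:
                if value in seen[field]:
                    errors.append(f"{where}: duplicate value {value!r}")
                seen[field].add(value)
    return errors


problems = precheck(
    [{"name": "atl1"}, {"name": "atl1"}, {"name": "DFW 1"}, {"description": "no name"}],
    CONSTRAINTS,
)
for problem in problems:
    print(problem)
```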

Handling Validation Errors

Validation errors include the field path and error message:

try:
    device = await client.create(
        kind="InfraDevice",
        name="",  # Invalid: empty name
    )
    await device.save()
except ValidationError as e:
    print(f"Validation failed: {e.message}")
    print(f"Field: {e.path}")

Performance Tips

Batch mutations: Use GraphQL aliases to create multiple objects in one request
Reuse client connections: Don’t create a new client for each object
Parallel imports: Use asyncio.gather() for concurrent SDK imports
Use YAML for bulk: YAML import is optimized for large datasets
Fetch references once: Cache frequently referenced objects (sites, types, etc.)
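
The parallel-import tip can be sketched with `asyncio.gather()` and a semaphore that caps the number of requests in flight. `import_concurrently` is a hypothetical helper, and `fake_create` stands in for an SDK create-and-save call so the example runs without a server; the limit of 2 is arbitrary.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

Item = Dict[str, Any]


async def import_concurrently(
    items: List[Item],
    create_one: Callable[[Item], Awaitable[Any]],
    limit: int = 5,
) -> List[Any]:
    """Run create_one for every item, with at most `limit` running at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item: Item) -> Any:
        async with semaphore:
            return await create_one(item)

    # return_exceptions=True keeps one failure from cancelling the rest;
    # results come back in the same order as the input items
    return await asyncio.gather(*(guarded(item) for item in items), return_exceptions=True)


async def fake_create(item: Item) -> str:
    """Stand-in for an SDK call: rejects empty names, otherwise succeeds."""
    await asyncio.sleep(0)
    if not item.get("name"):
        raise ValueError("name is required")
    return item["name"]


results = asyncio.run(
    import_concurrently(
        [{"name": "router-01"}, {"name": ""}, {"name": "router-02"}],
        fake_create,
        limit=2,
    )
)
for result in results:
    print("failed:" if isinstance(result, Exception) else "created:", result)
```

Bounding concurrency matters because an unbounded `gather()` over thousands of objects can overwhelm the server or exhaust client connections.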

Next Steps

Use Resource Manager for IP and VLAN allocation
Organize objects with Groups
Apply common settings with Profiles
