
Model Reuse and Deduplication

When generating models from schemas, you may encounter duplicate model definitions. datamodel-code-generator provides options to deduplicate models and share them across multiple files, improving output structure, reducing diff sizes, and enhancing performance.

Quick Overview

Option                   Description
--reuse-model            Deduplicate identical model/enum definitions
--reuse-scope            Control scope of deduplication (root or tree)
--shared-module-name     Name for shared module in multi-file output
--collapse-root-models   Inline root models instead of creating wrappers
--use-type-alias         Create TypeAlias for reusable field types (see Reducing Duplicate Field Types)

--reuse-model

The --reuse-model flag detects identical enum or model definitions and generates a single shared definition instead of duplicates.

Without --reuse-model

datamodel-codegen --input schema.json --output model.py

# Duplicate enums for animal and pet fields
class Animal(Enum):
    dog = 'dog'
    cat = 'cat'

class Pet(Enum):  # Duplicate!
    dog = 'dog'
    cat = 'cat'

class User(BaseModel):
    animal: Optional[Animal] = None
    pet: Optional[Pet] = None

With --reuse-model

datamodel-codegen --input schema.json --output model.py --reuse-model

# Single shared enum
class Animal(Enum):
    dog = 'dog'
    cat = 'cat'

class User(BaseModel):
    animal: Optional[Animal] = None
    pet: Optional[Animal] = None  # Reuses Animal
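
One way to picture the detection step is as a comparison of canonical forms. The sketch below is an illustration only, not datamodel-code-generator's implementation: the helper names fingerprint and plan_reuse and the sample definitions are hypothetical, and it assumes two definitions count as identical when their JSON bodies match after key sorting.

```python
import hashlib
import json


def fingerprint(definition: dict) -> str:
    """Hash a schema definition by its canonical JSON form (sorted keys)."""
    canonical = json.dumps(definition, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def plan_reuse(definitions: dict[str, dict]) -> dict[str, str]:
    """Map each definition name to the first name that has an identical body."""
    first_seen: dict[str, str] = {}  # fingerprint -> first name seen with it
    return {
        name: first_seen.setdefault(fingerprint(body), name)
        for name, body in definitions.items()
    }


# Hypothetical definitions mirroring the Animal/Pet example above.
definitions = {
    "Animal": {"type": "string", "enum": ["dog", "cat"]},
    "Pet": {"enum": ["dog", "cat"], "type": "string"},  # same body, different key order
    "Size": {"type": "string", "enum": ["small", "large"]},
}

print(plan_reuse(definitions))
# {'Animal': 'Animal', 'Pet': 'Animal', 'Size': 'Size'}
```

Here Pet maps to Animal, which corresponds to the pet field reusing the Animal enum in the generated output.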

Benefits

  • Smaller output - Less generated code
  • Cleaner diffs - Changes to shared types only appear once
  • Better performance - Faster generation for large schemas
  • Type consistency - Same types are truly the same
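
The type-consistency point can be checked with the standard library alone. This small sketch reuses the Animal and Pet enums from the example above and shows why two structurally identical enums are still different types at runtime.

```python
from enum import Enum


class Animal(Enum):
    dog = "dog"
    cat = "cat"


class Pet(Enum):  # structurally identical, but a separate class
    dog = "dog"
    cat = "cat"


# Members of distinct Enum classes do not compare equal, even with equal values.
print(Animal.dog == Pet.dog)              # False
print(Animal.dog.value == Pet.dog.value)  # True

# With one shared enum, every field that uses it holds the same member object.
print(Animal("dog") is Animal.dog)        # True
```

Without deduplication, code that compares user.animal with user.pet would need to compare .value; with a shared enum the members can be compared directly.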

--reuse-scope

Controls the scope for model reuse detection when processing multiple input files.

Value   Description
root    Detect duplicates only within each input file (default)
tree    Detect duplicates across all input files

Single-file input

For single-file input, --reuse-scope has no effect. Use --reuse-model alone.
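
The difference between the two scopes can be sketched as a choice of grouping key. This is a conceptual model under stated assumptions, not the tool's code: duplicate_groups is a hypothetical helper, the scope names follow the table above, and definitions are treated as identical when their key-sorted JSON matches.

```python
import json
from collections import defaultdict


def duplicate_groups(files: dict[str, dict[str, dict]], scope: str = "root") -> list[list[str]]:
    """Return groups of 'file:Name' entries whose definitions are identical.

    scope="root" compares definitions only within the same file;
    scope="tree" compares them across every file.
    """
    if scope not in ("root", "tree"):
        raise ValueError(f"unknown scope: {scope!r}")
    groups: dict[tuple, list[str]] = defaultdict(list)
    for filename, definitions in files.items():
        for name, body in definitions.items():
            canonical = json.dumps(body, sort_keys=True)
            # Including the filename in the key keeps matches inside one file.
            bucket = (filename, canonical) if scope == "root" else (canonical,)
            groups[bucket].append(f"{filename}:{name}")
    return [members for members in groups.values() if len(members) > 1]


shared = {"type": "object", "properties": {"id": {"type": "integer"}}}
files = {
    "user.json": {"SharedModel": shared},
    "order.json": {"SharedModel": shared},
}
print(duplicate_groups(files, "root"))  # []
print(duplicate_groups(files, "tree"))
# [['user.json:SharedModel', 'order.json:SharedModel']]

# With a single input file, both scopes find the same duplicates.
single = {"schema.json": {"Animal": {"enum": ["dog", "cat"]}, "Pet": {"enum": ["dog", "cat"]}}}
print(duplicate_groups(single, "root") == duplicate_groups(single, "tree"))  # True
```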

Multi-file input with tree scope

When generating from multiple schema files to a directory:

datamodel-codegen --input schemas/ --output models/ --reuse-model --reuse-scope tree

Input files:

schemas/
├── user.json      # defines SharedModel
└── order.json     # also defines identical SharedModel

Output with --reuse-scope tree:

models/
├── __init__.py
├── user.py        # imports from shared
├── order.py       # imports from shared
└── shared.py      # SharedModel defined once

# models/user.py
from .shared import SharedModel

class User(BaseModel):
    data: Optional[SharedModel] = None

# models/shared.py
class SharedModel(BaseModel):
    id: Optional[int] = None
    name: Optional[str] = None

--shared-module-name

Customize the name of the shared module when using --reuse-scope tree.

datamodel-codegen --input schemas/ --output models/ \
  --reuse-model --reuse-scope tree --shared-module-name common

Output:

models/
├── __init__.py
├── user.py
├── order.py
└── common.py      # Instead of shared.py
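
The module name also determines the import line that sibling modules use. As a rough sketch, assuming the generated imports keep the relative form shown in models/user.py above, the hypothetical helper shared_import below builds that line for a given module name.

```python
def shared_import(names: list[str], shared_module_name: str = "shared") -> str:
    """Build the relative import a sibling module would need for shared models."""
    if not shared_module_name.isidentifier():
        raise ValueError(f"not a valid module name: {shared_module_name!r}")
    return f"from .{shared_module_name} import {', '.join(sorted(names))}"


print(shared_import(["SharedModel"]))            # from .shared import SharedModel
print(shared_import(["SharedModel"], "common"))  # from .common import SharedModel
```

A name that is not a valid Python identifier could not be imported this way, which is why the sketch rejects it.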


--collapse-root-models

Inline root model definitions instead of creating separate wrapper classes.

Without --collapse-root-models

class UserId(BaseModel):
    __root__: str

class User(BaseModel):
    id: UserId

With --collapse-root-models

class User(BaseModel):
    id: str  # Inlined

When to use

  • Simpler output when wrapper classes aren't needed
  • Reducing the number of generated classes
  • When root models are just type aliases
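
At the schema level, collapsing can be thought of as replacing a reference to a primitive-typed definition with the definition itself. The sketch below is a conceptual illustration, not how the generator works internally: collapse_root_refs is a hypothetical helper, it handles only direct properties under $defs, and it treats a definition as a simple root type when its type is a primitive and it is not an enum.

```python
import copy

PRIMITIVES = {"string", "integer", "number", "boolean"}


def collapse_root_refs(schema: dict) -> dict:
    """Return a copy of the schema with refs to primitive definitions inlined."""
    result = copy.deepcopy(schema)  # leave the caller's schema untouched
    defs = result.get("$defs", {})
    simple = {
        name: body
        for name, body in defs.items()
        if body.get("type") in PRIMITIVES and "enum" not in body
    }
    for body in defs.values():
        for prop, value in body.get("properties", {}).items():
            target = value.get("$ref", "").removeprefix("#/$defs/")
            if target in simple:
                body["properties"][prop] = dict(simple[target])
    for name in simple:
        del defs[name]  # the wrapper definition is no longer referenced
    return result


schema = {
    "$defs": {
        "UserId": {"type": "string"},
        "User": {"type": "object", "properties": {"id": {"$ref": "#/$defs/UserId"}}},
    }
}
print(collapse_root_refs(schema)["$defs"])
# {'User': {'type': 'object', 'properties': {'id': {'type': 'string'}}}}
```

The result corresponds to the generated User class with id: str and no UserId wrapper.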

Combining Options

datamodel-codegen \
  --input schemas/ \
  --output models/ \
  --reuse-model \
  --reuse-scope tree \
  --shared-module-name common \
  --collapse-root-models

This produces:

  • Deduplicated models across all files
  • Shared types in a common.py module
  • Inlined simple root models
  • Minimal, clean output

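For single-file input, where --reuse-scope has no effect, the scope and shared-module options can be dropped:

datamodel-codegen \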
  --input schema.json \
  --output model.py \
  --reuse-model \
  --collapse-root-models

Performance Impact

For large schemas with many models:

Scenario                      Without reuse    With reuse
100 schemas, 50% duplicates   100 models       ~50 models
Generation time               Baseline         Faster (less to generate)
Output size                   Large            Smaller
Git diff on type change       Multiple files   Single location

Performance tip

For very large schemas, combine --reuse-model with --disable-warnings to speed up generation:

datamodel-codegen --reuse-model --disable-warnings --input large-schema.json

Output Structure Comparison

Without deduplication

models/
├── user.py         # UserStatus enum
├── order.py        # OrderStatus enum (duplicate of UserStatus!)
└── product.py      # ProductStatus enum (duplicate!)

With --reuse-model --reuse-scope tree

models/
├── __init__.py
├── user.py         # imports Status from shared
├── order.py        # imports Status from shared
├── product.py      # imports Status from shared
└── shared.py       # Status enum defined once

Reducing Duplicate Field Types

When multiple classes share the same field type with identical constraints or metadata, you can reduce duplication by defining the type once in $defs and referencing it with $ref. Combined with --use-type-alias, this creates a single TypeAlias that's reused across all classes.

Problem: Duplicate Annotated Fields

Without using $ref, each class gets its own inline field definition:

class ClassA(BaseModel):
    place_name: Annotated[str, Field(alias='placeName')]  # Duplicate!

class ClassB(BaseModel):
    place_name: Annotated[str, Field(alias='placeName')]  # Duplicate!

Solution: Use $defs with --use-type-alias

Step 1: Define the shared type in $defs

{
  "$defs": {
    "PlaceName": {
      "type": "string",
      "title": "PlaceName",
      "description": "A place name"
    },
    "ClassA": {
      "type": "object",
      "properties": {
        "place_name": { "$ref": "#/$defs/PlaceName" }
      }
    },
    "ClassB": {
      "type": "object",
      "properties": {
        "place_name": { "$ref": "#/$defs/PlaceName" }
      }
    }
  }
}

Step 2: Generate with --use-type-alias

datamodel-codegen \
  --input schema.json \
  --output model.py \
  --use-type-alias

Result: Single TypeAlias reused across classes

PlaceName = TypeAliasType(
    "PlaceName",
    Annotated[str, Field(..., description='A place name', title='PlaceName')],
)


class ClassA(BaseModel):
    place_name: PlaceName  # Reuses the TypeAlias


class ClassB(BaseModel):
    place_name: PlaceName  # Reuses the TypeAlias

Benefits

  • Single source of truth - Field type is defined once
  • Easier maintenance - Change the type in one place
  • Cleaner generated code - No redundant annotations
  • Type safety - All fields share the exact same type
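
The "single source of truth" property can be seen with the standard library alone. This sketch makes two simplifying assumptions: a plain Annotated alias stands in for the TypeAliasType shown above, and a string stands in for pydantic's Field(...) metadata, so no third-party package is needed.

```python
from typing import Annotated, get_args

# Defined once; the string metadata is a stand-in for Field(description=...).
PlaceName = Annotated[str, "A place name"]


class ClassA:
    place_name: PlaceName


class ClassB:
    place_name: PlaceName


# Both annotations refer to the same alias object, so they cannot drift apart.
print(ClassA.__annotations__["place_name"] is ClassB.__annotations__["place_name"])  # True

# The base type and its metadata are recoverable from the alias.
print(get_args(PlaceName))  # (<class 'str'>, 'A place name')
```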

When to Use This Pattern

This pattern is ideal when:

  • Multiple classes share fields with the same constraints (e.g., minLength, pattern)
  • Fields have identical metadata (e.g., description, examples)
  • You want to ensure type consistency across your schema
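
To find candidates for this pattern in an existing schema, one option is to look for inline property schemas that appear more than once. The helper below is a hypothetical sketch, not part of datamodel-code-generator; it assumes classes live under $defs and compares property schemas by their key-sorted JSON.

```python
import json
from collections import defaultdict


def repeated_property_schemas(schema: dict) -> dict[str, list[str]]:
    """Map each inline property schema used more than once to its 'Class.field' uses."""
    uses: dict[str, list[str]] = defaultdict(list)
    for class_name, body in schema.get("$defs", {}).items():
        for field, value in body.get("properties", {}).items():
            if "$ref" not in value:  # properties that use $ref are already shared
                uses[json.dumps(value, sort_keys=True)].append(f"{class_name}.{field}")
    return {key: where for key, where in uses.items() if len(where) > 1}


# Hypothetical schema with the same constrained string inlined in two classes.
inline = {"type": "string", "minLength": 1, "description": "A place name"}
schema = {
    "$defs": {
        "ClassA": {"type": "object", "properties": {"place_name": inline}},
        "ClassB": {
            "type": "object",
            "properties": {"place_name": inline, "count": {"type": "integer"}},
        },
    }
}

for canonical, where in repeated_property_schemas(schema).items():
    print(where, "->", canonical)
```

Each reported schema is a candidate to move into $defs and reference with $ref.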
