nadap: Namespace-Aware Data Validation and Pre-Processing
This Python module validates data against a data schema. The schema describes the structure, the data types, and all value restrictions that the given data must match.
In addition, data values at defined points within the schema can reference each other. They can be tested for uniqueness, or for whether a value at one point in the data (consumer) matches a value located at another point in the data (producer). For more details, see the Reference Feature Documentation.
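To make the two reference checks concrete, here is a minimal plain-Python sketch of the idea. It does not use nadap and is not a description of how nadap implements references. The function names `find_duplicates` and `find_unresolved`, the `(path, value)` tuple layout, and the `friend` key are all assumptions made for illustration.

```python
def find_duplicates(producers):
    """Return (path, first_path) pairs for values that were already defined."""
    seen = {}  # value -> path where it was first defined
    duplicates = []
    for path, value in producers:
        if value in seen:
            duplicates.append((path, seen[value]))
        else:
            seen[value] = path
    return duplicates


def find_unresolved(producers, consumers):
    """Return the paths of consumer values that no producer defines."""
    produced = {value for _, value in producers}
    return [path for path, value in consumers if value not in produced]


# Hypothetical sample data: ids act as producers, 'friend' values as consumers.
producers = [("[0].id", 1), ("[1].id", 1), ("[2].id", 3)]
consumers = [("[0].friend", 3), ("[1].friend", 7)]

duplicates = find_duplicates(producers)
unresolved = find_unresolved(producers, consumers)
print(duplicates)
print(unresolved)
```

In this sketch the second `id` is reported as a duplicate of the first, and the consumer value `7` is reported because no producer defines it.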
Furthermore, input data can be enriched with default values, or values can be converted (e.g. into another data type). For more details, see the Conversion Feature Documentation.
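As a rough illustration of what enrichment and conversion mean, the following plain-Python sketch fills in missing keys and converts selected values. It is not nadap's API: the `preprocess` function and its `defaults` and `converters` parameters are hypothetical names used only for this example.

```python
def preprocess(record, defaults, converters):
    """Return a copy of record with defaults filled in and values converted."""
    # Keys present in the record win over the defaults.
    result = {**defaults, **record}
    for key, convert in converters.items():
        if key in result:
            result[key] = convert(result[key])
    return result


# Hypothetical input: 'id' arrives as a string and 'healthy' is missing.
record = {"id": "7", "name": "Nadap"}
processed = preprocess(record, defaults={"healthy": True}, converters={"id": int})
print(processed)
```

The input record is left unchanged; the returned copy carries the default for `healthy` and an integer `id`.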
Code Example
import yaml
import nadap
schema_definition_yaml = """
root:
  type: list
  elements:
    type: dict
    restrictions:
      required: ["id", "name"]
    keys:
      id:
        type: int
        reference: person_id
      name: str
      healthy: bool
"""
# Correct data
data1_yaml = """
- id: 1
  name: Nadap
  healthy: true
- id: 2
  name: Other
  healthy: false
- id: 3
  name: Unknown
"""
# Wrong type for 'name'
data2_yaml = """
- id: 1
  name: 1
"""
# 'id' of 'Other' is not unique; used by 'Nadap'
data3_yaml = """
- id: 1
  name: Nadap
  healthy: true
- id: 1
  name: Other
  healthy: false
"""
schema_def = yaml.load(schema_definition_yaml, Loader=yaml.SafeLoader)
n = nadap.Nadap()
n.schema = schema_def
data1 = yaml.load(data1_yaml, Loader=yaml.SafeLoader)
try:
    n.validate(data1)
except nadap.DataValidationError:
    print("Data1 fails:")
    for finding in n.findings:
        print(finding)
# Recreate a Nadap instance to clear referencing cache
n = nadap.Nadap()
n.schema = schema_def
data2 = yaml.load(data2_yaml, Loader=yaml.SafeLoader)
try:
    n.validate(data2)
except nadap.DataValidationError:
    print("Data2 fails:")
    for finding in n.findings:
        print(finding)
# Recreate a Nadap instance to clear referencing cache
n = nadap.Nadap()
n.schema = schema_def
data3 = yaml.load(data3_yaml, Loader=yaml.SafeLoader)
try:
    n.validate(data3)
except nadap.DataValidationError:
    print("Data3 fails:")
    for finding in n.findings:
        print(finding)
else:
    # Validation passed, but reference checks may still report findings
    if n.findings:
        print("Data3 referencing fails:")
        for finding in n.findings:
            print(finding)
Running this example prints the following output:
Data2 fails:
[0].name: Data is not an instance of 'str'
Data3 referencing fails:
[1].id: Reference already defined at [0].id