Zalando Product Taxonomy
This page details how we can download the different categories and their respective taxonomy from Zalando.
Taxonomy on Zalando works in a fairly standard way. Zalando uses the term “Outlines” for Categories and “Attributes” for everything that describes a product inside a category. “Attributes” can be anything from Title and Description through to the IS and VS values from Hemisphere. In practice, the “Attributes” are the “fields” we can use in a category.
To receive all necessary data we first need to call for the Outlines (or a single outline), which gives us all possible attributes associated with that category, and then call each attribute to gather more data on its type, enum values, potential sub attributes, etc.
Attributes (as anything else Product related in Zalando) can be on Model, Config or Simple level
Please note that there are attributes that overlap across multiple different Outlines. This is important for the way we are storing them and the decision we want to take with this. Example: If we are calling all categories at once and find the attribute “color_code.primary” in category 1 we can then download all its properties and store them against the attribute and when we get to category 2 we don’t need to download it again.
To make things more interesting, there are also “restricted attributes”. This means that a single attribute can have 100 options but only a few are available for a specific category. The restricted allowed values are part of the outline download, meaning we can still download all values and specifications only once and then store only the available ones against the specific category. Example: If we are calling all categories at once and find the restricted attribute “washing_instructions” in category 1, we can download all its properties and also store the restricted values for category 1. When we then call category 2 and find it again with different restrictions, we don’t have to download the whole attribute, just store the restricted options against category 2.
When listing in category X, the end user will need to know the attribute specifics and available options for category X only (same for category N, etc.). It is up to the dev team to decide on the best route to store and work with Zalando’s API so that the functionality works as expected and provides the needed information.
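One possible way to store this, as a minimal sketch only: keep a single shared cache of attribute definitions and a separate per-outline map of restricted values. The function name build_taxonomy and the injected fetch_attribute_type callable are assumptions made for illustration; the real download and storage layer is up to the dev team.

```python
from typing import Callable, Dict, List, Tuple


def build_taxonomy(
    outlines: List[dict],
    fetch_attribute_type: Callable[[str], dict],
) -> Tuple[Dict[str, dict], Dict[str, Dict[str, List[str]]]]:
    """Download every attribute once and keep restricted values per outline.

    fetch_attribute_type is a hypothetical callable that performs the
    attribute call for one label and returns the parsed JSON.
    """
    attribute_cache: Dict[str, dict] = {}  # shared across all outlines
    restrictions: Dict[str, Dict[str, List[str]]] = {}  # outline -> attribute -> values

    for outline in outlines:
        per_outline = restrictions.setdefault(outline["label"], {})
        for tier in outline["tiers"].values():
            labels = tier["mandatory_types"] + tier["optional_types"]
            for label in labels:
                # Only download attributes we have not seen in an earlier outline.
                if label not in attribute_cache:
                    attribute_cache[label] = fetch_attribute_type(label)
            # Restricted values come with the outline itself, so no extra call.
            for restricted in tier["restricted_attributes"]:
                per_outline[restricted["type"]["label"]] = list(restricted["values"])

    return attribute_cache, restrictions
```

With this split, the full option list lives once in the cache, while the per-outline map answers "which options are allowed in category X".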
Get Outlines
API Docs: https://developers.merchants.zalando.com/docs/onboarding-new-products.html
API Call: GET /merchants/{merchant_id}/outlines
The above will return all Outlines available for the Merchant ID. This means that different merchants will have access to different categories and we should be ready to handle this both on a VM instance level and later in a generic taxonomy level
Example call: GET https://api-sandbox.merchants.zalando.com/merchants/$YOUR_MERCHANT_ID/outlines with the header "Authorization: Bearer $YOUR_ACCESS_TOKEN"
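As a rough sketch, the same call could be prepared in Python with the standard library. The helper name build_outlines_request is an assumption for illustration; the request is only constructed here, not sent.

```python
import urllib.request

# Sandbox host taken from the example call above.
SANDBOX_BASE_URL = "https://api-sandbox.merchants.zalando.com"


def build_outlines_request(merchant_id: str, access_token: str) -> urllib.request.Request:
    """Prepare (but do not send) the GET request for all outlines of a merchant."""
    url = f"{SANDBOX_BASE_URL}/merchants/{merchant_id}/outlines"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )
```

Sending it would then be a matter of passing the request to urllib.request.urlopen and parsing the body as JSON.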
Example response (of a single outline specified in a call)
{
"label": "sandals",
"tiers": {
"model": {
"optional_types": [],
"mandatory_types": [
"target_age_groups",
"target_genders",
"name",
"brand_code",
"size_group"
],
"restricted_attributes": []
},
"config": {
"optional_types": [
"condition",
"color_code.secondary",
"color_code.tertiary",
"width",
"shape",
"decksohle",
"material.filling",
"futter",
"heel_form",
"insole_technology",
"material_construction",
"non_textile_but_animal_parts",
"occasion",
"padding_type",
"pattern",
"shoe_detail",
"shoe_toecap",
"shoe_width",
"size_fits",
"sole_material",
"sport_qualities",
"sport_shoe_details",
"sport_shoe_insole",
"sport_shoe_outer_sole",
"sport_shoe_outer_sole_technology",
"sport_type",
"upper_material",
"washing_instructions"
],
"mandatory_types": [
"color_code.primary",
"media",
"season_code",
"supplier_color",
"description"
],
"restricted_attributes": [
{
"type": {
"label": "washing_instructions",
"version": "1.0.0"
},
"values": [
"washing_instructions_4106",
"washing_instructions_4107",
"washing_instructions_6999",
"washing_instructions_7858"
]
},
...
]
},
"simple": {
"optional_types": [
"metric.heel_height",
"metric.platform_height",
"metric.schafthoehe",
"metric.schaftweite"
],
"mandatory_types": [
"ean",
"size_codes"
],
"restricted_attributes": []
}
}
}
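To work with a response like the one above, it can help to flatten it into a lookup that tells us, for each attribute label, which tier it belongs to and whether it is mandatory. The sketch below assumes the outline has already been parsed into a dict; index_outline is a hypothetical helper name.

```python
def index_outline(outline: dict) -> dict:
    """Map each attribute label to its tier and mandatory flag."""
    index = {}
    for tier_name, tier in outline["tiers"].items():
        for label in tier.get("mandatory_types", []):
            index[label] = {"tier": tier_name, "mandatory": True}
        for label in tier.get("optional_types", []):
            index[label] = {"tier": tier_name, "mandatory": False}
    return index
```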
As the example shows, besides the structure being split into Model, Config and Simple, each of these is further separated into Optional and Mandatory attributes. Apart from the specifically mapped values (“name”, “brand_code”, “condition”, “media”, “description”, “ean”), all other attributes are expected to be in the IS or VS fields of the Hemisphere Product Account. The general expectation is that Model level attributes will be part of the IS section. Most Config level attributes will be part of IS as well, BUT a “Config” can vary by pretty much any field. All fields besides “size” will be part of IS; size is part of VS.
The point here is that, due to the nature of Config, we should be ready to find its attributes filled in either IS or VS fields (still, only VS values should change within a whole Variation).
Get Attributes
API Docs: https://developers.merchants.zalando.com/docs/onboarding-new-products.html
API Calls:
/merchants/{merchant_id}/attribute-types/{type_label}
- for getting the specific attribute options
/merchants/{merchant_id}/attribute-types/{type_label}/attributes
- for getting the specific attribute enum options
With the first call we go over each attribute to build our first layer of understanding of what those attributes can be. We can generally put them in 3 categories, where for each category the required value can be free text or an enum value.
To begin understanding the attributes, let’s start with the easiest one, “name”. Below is the example response when we call for the attribute “name”:
{
"label": "name",
"name": {
"en-gb": "name"
},
"type_variants": [],
"cardinality": "one",
"definition": {
"type": "StringDefinition"
},
"usage": "literal"
}
We can define the following fields:
Zalando field | WAP Note |
---|---|
label | This is the ID we should use in any API calls |
name | Human readable representation of the field |
type_variants | Lists the variants if the attribute has multiple options (at which point those are created with a “.” between the parent and the additional internal option, e.g. color_code.primary) |
cardinality | How many times a value can be present. The available options seem to be “one” and “many” |
definition | Additional information regarding the attribute. The options seem to be StringDefinition & StructuredDefinition (the examples on this page also show LocalizedStringDefinition and DecimalDefinition). The first is a simple value in the payload, either free text or picked from an enum. The second is much more complicated - it introduces sub attributes within an attribute, hence “structure”. See more info in the detailed explanation below |
usage | The type of the attribute (string vs enum). The available options seem to be literal & reference_by_label respectively |
Based on the above, “name” is a field that takes a single value, which is a free text string, and that’s it.
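The field meanings above can be condensed into a small helper that summarises what an attribute response tells us. This is only a sketch: describe_attribute_type and the keys of the returned dict are names made up for illustration, and the value checks rely on the options observed on this page.

```python
def describe_attribute_type(attribute_type: dict) -> dict:
    """Summarise an attribute response using the fields described above."""
    return {
        "label": attribute_type["label"],
        # "many" means the payload takes a list of values.
        "multi_valued": attribute_type["cardinality"] == "many",
        # reference_by_label means the value must be picked from an enum,
        # so the second call is needed to fetch the options.
        "needs_enum_lookup": attribute_type["usage"] == "reference_by_label",
        # StructuredDefinition marks an attribute that carries sub attributes.
        "structured": attribute_type["definition"]["type"] == "StructuredDefinition",
    }
```

For the “name” response this yields a single valued, free text, non structured attribute.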
The example below shows a simple attribute that takes an enum value, returned when we call for the attribute “brand_code”:
{
"label": "brand_code",
"name": {
"en": "brand code"
},
"description": {
"en": "Brand code is used to identify the Brand of the product"
},
"type_variants": [],
"cardinality": "one",
"definition": {
"type": "StringDefinition"
},
"usage": "reference_by_label"
}
We can see again that the attribute takes a single value, but this time its “usage” marks it as an enum. This means we need to call this attribute further to get its “inner attributes” (the enum options), which we do with the second call from above.
Example call: GET https://api-sandbox.merchants.zalando.com/merchants/$YOUR_MERCHANT_ID/attribute-types/brand_code/attributes with the header "Authorization: Bearer $YOUR_ACCESS_TOKEN"
Example (limited) response:
{
"items":[
{
"label":"ns1",
"name":{
"en":"Nike Sportswear"
},
"value":{
"string":"ns1"
}
},
{
"label":"A55",
"name":{
"en":"adidas Consortium by Y's"
},
"value":{
"string":"A55"
}
},
{
"label":"A56",
"name":{
"en":"adidas by Raf Simons"
},
"value":{
"string":"A56"
}
},
...
]
}
As with attributes in general, for the values we have to use the “label” in any API communication and we can use the “name” for display purposes.
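A minimal sketch of turning such a response into a label to display name lookup. The helper name enum_options is hypothetical, and falling back to the label when no name exists in the requested language is an assumption, not documented Zalando behaviour.

```python
def enum_options(attributes_response: dict, language: str = "en") -> dict:
    """Build {label: display name} from an enum options response."""
    options = {}
    for item in attributes_response["items"]:
        names = item.get("name", {})
        # Assumption: show the label itself if the language is missing.
        options[item["label"]] = names.get(language, item["label"])
    return options
```

The keys are what we send to the API, while the values are only for showing to users.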
The next example is an attribute that allows more than one occurrence. We will use the attribute “target_genders”. Example response below:
{
"label": "target_genders",
"name": {
"en": "target genders"
},
"description": {
"en": "Target genders"
},
"type_variants": [],
"cardinality": "many",
"definition": {
"type": "StringDefinition"
},
"usage": "reference_by_label"
}
Based on what we already know, we should call for its enum values to get the options. Once we have them, putting multiple values in the payload simply looks like this:
"target_genders": [
"target_gender_male",
"target_gender_female"
]
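As a sketch of how “cardinality” could drive the payload shape: a “many” attribute is sent as a list, a “one” attribute as a single value. The helper name payload_value and the choice to raise an error on too many values are assumptions for illustration.

```python
def payload_value(attribute_type: dict, values: list):
    """Shape the selected values according to the attribute's cardinality."""
    if attribute_type["cardinality"] == "many":
        return list(values)
    # Cardinality "one": exactly one value is expected.
    if len(values) != 1:
        raise ValueError(
            f"{attribute_type['label']} takes exactly one value, got {len(values)}"
        )
    return values[0]
```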
To see how an attribute with multiple type variants behaves, we can take a look at the simple color_code attribute. In the Outline response we can see that it comes back in the following forms:
color_code.primary
color_code.secondary
color_code.tertiary
“primary”, “secondary” and “tertiary” are all type variants of the attribute “color_code”. For such attributes it generally doesn’t matter whether we call for “color_code” or for “color_code.primary”; we always get the parent structure, as below:
Example call: GET https://api-sandbox.merchants.zalando.com/merchants/$YOUR_MERCHANT_ID/attribute-types/color_code.primary with the header "Authorization: Bearer $YOUR_ACCESS_TOKEN"
Example response:
{
"label": "color_code",
"name": {
"en-gb": "color_code"
},
"type_variants": [
{
"label": "primary",
"name": {}
},
{
"label": "secondary",
"name": {}
},
{
"label": "tertiary",
"name": {}
}
],
"cardinality": "one",
"definition": {
"type": "LocalizedStringDefinition"
},
"usage": "reference_by_label"
}
color_code is also an enum attribute, which means we have to fetch its values separately. We do this once for the general “color_code” attribute, and each of the available enum values can apply to all type variants.
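Since the enum values belong to the parent, it can be useful to split outline labels into parent and variant before deciding what to download. The sketch below assumes that a “.” in an outline label always separates parent from type variant, which matches the examples on this page but is not confirmed for every attribute; both function names are hypothetical.

```python
def split_type_label(type_label: str):
    """Split 'color_code.primary' into ('color_code', 'primary')."""
    parent, _, variant = type_label.partition(".")
    return parent, (variant or None)


def variants_by_parent(type_labels: list) -> dict:
    """Group outline labels by parent so each parent is fetched only once."""
    grouped = {}
    for type_label in type_labels:
        parent, variant = split_type_label(type_label)
        variants = grouped.setdefault(parent, [])
        if variant is not None:
            variants.append(variant)
    return grouped
```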
So far so good. We have simple attributes, enum attributes and type_variant attributes, allowing single or multiple values. This is where it gets interesting 🙂
Complex Attributes - these are attributes that we also call “SubAttributes” in our tool, meaning a single attribute holds additional internal attributes that have their own values. Such “complex” attributes can be simple or include type_variants as well. To recognise a complex attribute we need to read the “definition” field and find StructuredDefinition as its type. The definition then lists the subAttributes available for this specific complex attribute, each with its own “optional” flag (“false” meaning this subAttribute MUST be present if the complex attribute is to be used).
To understand more about a subAttribute we basically treat the “complex attribute” as its outline and then we have to call the attribute itself for more information. Let’s take material as one such complex attribute and examine what happens when we call it
First we need to call the attribute from the outline as we’ve seen “material.upper_material_clothing” (no such value in the current example outline above, just using the Zalando docs examples)
Example call: GET https://api-sandbox.merchants.zalando.com/merchants/$YOUR_MERCHANT_ID/attribute-types/material.upper_material_clothing with the header "Authorization: Bearer $YOUR_ACCESS_TOKEN"
To which we will receive the full “material” response:
{
"cardinality": "many",
"definition": {
"type": "StructuredDefinition",
"types": [
{
"label": "material_percentage",
"optional": false
},
{
"label": "material_code",
"optional": false
}
]
},
"label": "material",
"name": {
"en-gb": "material"
},
"type_variants": [
...
{
"label": "upper_material_clothing",
"name": {
"cs": "Materiál vnější látky",
"da": "Materiale",
"de": "Material Oberstoff",
"en": "Outer fabric material",
"es": "Material exterior",
"fi": "päällikankaan materiaali",
"fr": "Composition",
"it": "Composizione",
"nl": "Materiaal buitenlaag",
"no": "Overmateriale",
"pl": "Materiał",
"pt": "material do tecido exterior",
"ru": "",
"sv": "Material",
"tr": ""
}
},
...
}
],
"usage": "literal"
}
We can see there are multiple type variants, meaning different Outlines can require different options: material.upper_material_clothing in this case or, as in our example outline above, material.filling.
For both of these we still have to pass what we see in the definition, which is “material_percentage” and “material_code”.
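A small sketch of reading the subAttributes out of a complex attribute response. The helper name sub_attributes is an assumption; it returns an empty list for anything that is not a StructuredDefinition.

```python
def sub_attributes(attribute_type: dict) -> list:
    """Return (label, required) pairs for a complex attribute."""
    definition = attribute_type["definition"]
    if definition["type"] != "StructuredDefinition":
        return []  # not a complex attribute, nothing to expand
    # "optional": false means the sub attribute must be present.
    return [(entry["label"], not entry["optional"]) for entry in definition.get("types", [])]
```

Each returned label then needs its own attribute call, exactly like a top level attribute.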
To see what those are, we need to make another call for each of them, as Zalando treats them almost as just another attribute with its own schema.
Example call: GET https://api-sandbox.merchants.zalando.com/merchants/$YOUR_MERCHANT_ID/attribute-types/material_percentage with the header "Authorization: Bearer $YOUR_ACCESS_TOKEN"
Example response:
{
"label": "material_percentage",
"name": {
"en-gb": "material_percentage"
},
"type_variants": [],
"cardinality": "one",
"definition": {
"type": "DecimalDefinition"
},
"usage": "literal"
}
We will do the same for “material_code”
Example call: GET https://api-sandbox.merchants.zalando.com/merchants/$YOUR_MERCHANT_ID/attribute-types/material_code with the header "Authorization: Bearer $YOUR_ACCESS_TOKEN"
Example response:
{
"label": "material_code",
"name": {
"en-gb": "material_code"
},
"type_variants": [],
"cardinality": "one",
"definition": {
"type": "LocalizedStringDefinition"
},
"usage": "reference_by_label"
}
We can see that material_percentage is a simple decimal with literal usage, whereas material_code is a string bound to enum values. This means we need to call for its enumerations so we can store them against the taxonomy, and in the end send the values in a payload like this:
"material.upper_material_clothing": [
{
"material_code": "li",
"material_percentage": 97.50
},
{
"material_code": "el",
"material_percentage": 2.50
}
]
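Before sending a payload like the one above, each entry could be checked against the required subAttributes from the definition. This is a sketch under the assumption that the required labels were already collected from the complex attribute; missing_sub_attributes is a hypothetical helper name and it only checks presence, not the values themselves.

```python
def missing_sub_attributes(entries: list, required_labels: list) -> list:
    """Return (entry index, missing labels) for every incomplete entry."""
    problems = []
    for index, entry in enumerate(entries):
        missing = [label for label in required_labels if label not in entry]
        if missing:
            problems.append((index, missing))
    return problems
```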
We have to use the already established structure for subAttributes in Hemisphere, with one note to be checked (TBD): will there be an issue with a type_variant attribute that is also a complex attribute, like the above mentioned “material”, given that at this point we will have “material.upper_material_clothing.material_code” in Hemisphere?
Last but not least are Sizes
Sizes follow the same structure as the generic attribute fields above BUT they work together. We have 2 size attributes: size_group (model level attribute) and size_code (simple level attribute, listed as “size_codes” in the outline above).
The size_group, as the “unofficial parent” of the pair, specifies what type the size_code should be. Just as an example - sizes for shirts and shoes are completely different. Sometimes they even differ between shoes and shoes 🙂
Size is almost like a taxonomy of its own (TBD - should we split it?). The group controls what is to come for the code in the Simple layer of the JSON.
Example call for getting sizes:
GET https://api-sandbox.merchants.zalando.com/merchants/$YOUR_MERCHANT_ID/attribute-types/size/attributes with the header "Authorization: Bearer $YOUR_ACCESS_TOKEN"
Example (limited) response:
{
"items": [
{
"_meta": {
"active": true,
"dimension": {
"value": "Shoe size",
"type": "size",
"name": "Shoes",
"group": "Female",
"comment": "Shoes, socks, knee socks, shoe trees",
"category": "Shoes"
},
"sizes": [
{
"conversions": [
{
"cluster": "eu",
"raw": "42"
},
{
"cluster": "us",
"raw": "11"
}
],
"supplier_size": "42",
"sort_key": 18
},
{
"conversions": [
{
"cluster": "eu",
"raw": "44.5"
},
{
"cluster": "us",
"raw": "13"
}
],
"supplier_size": "44.5",
"sort_key": 22
},
...
]
},
"label": "4FE1000E0A",
"name": {
"en-gb": "4FE1000E0A"
},
"value": {
"string": "4FE1000E0A"
}
},
{
"_meta": {
"active": true,
"dimension": {
"category": "Clothing",
"comment": "Pants, leggings, jeans, ...",
"group": "Male",
"name": "Clothing",
"type": "size",
"value": "Confection"
},
"sizes": [
{
"conversions": [
{
"cluster": "eu",
"raw": "M"
},
{
"cluster": "us",
"raw": "M"
}
],
"sort_key": 7,
"supplier_size": "M"
},
{
"conversions": [
{
"cluster": "eu",
"raw": "L"
},
{
"cluster": "us",
"raw": "L"
}
],
"sort_key": 9,
"supplier_size": "L"
},
...
]
},
"label": "4MU1000E2A",
"name": {
"en-gb": "4MU1000E2A"
},
"value": {
"string": "4MU1000E2A"
}
},
...
]
}
It seems that, even though the size_group value should be somewhat limited and follow the Outline, Zalando relies on the users preparing the data to choose the right sizing group for their products. Zalando provides information on the different group options so they can be displayed to an end user or understood better for automation purposes. As always, we need to send the label through the API and use a specification so people can see and pick the right value. In this case we can’t simply use the “name” (in the example above it just repeats the label), so while we should store all other values to help people pick the right label, we will require them to provide the label itself directly in the IS of a product.
Once a user has decided what is going to be the size_group the validation for the size_code becomes quite simple, just following the available “sizes”.
For size_code we need to always send the “supplier_size” value
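A minimal sketch of that validation, assuming the sizes response has been parsed into a dict with the shape shown above. Both helper names are hypothetical.

```python
def allowed_supplier_sizes(sizes_response: dict, size_group_label: str) -> list:
    """List the supplier_size values available for one size_group label."""
    for item in sizes_response["items"]:
        if item["label"] == size_group_label:
            return [size["supplier_size"] for size in item["_meta"]["sizes"]]
    raise KeyError(f"unknown size_group: {size_group_label}")


def is_valid_size_code(sizes_response: dict, size_group_label: str, size_code: str) -> bool:
    """Check a size_code against the sizes of the chosen size_group."""
    return size_code in allowed_supplier_sizes(sizes_response, size_group_label)
```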
The last thing on sizes is the fact that they are actually complex attributes. This means they can hold multiple values: “size” and “length”. These are distinguished by the “type” field of the specific size_group. Some Categories allow (or probably even require) length sizes as well. For any such category that Zalando has made available we will see 2 size_groups, differentiated by their type and of course having different labels to be used in the API. The “length” size_group will have its own sizes, the same way the “size” size_groups do. In a payload to Zalando this should look like this:
"size_group": {
"size": "4MU1000E2A",
"length": "5AAU000012"
}
and
"size_codes": {
"size": "42",
"length": "52"
}
We should address these in Hemisphere as we would any other complex attribute. We should expect “size_group.size” as IS and “size_code.size” as variant. The only special case we’d like to allow is for plain “size_group” and “size_code” to be used when the product uses only “size” type values (TBD). If they are to use both “size” and “length”, both fields will need to have their respective subAttribute appended.
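One way this mapping could look, purely as a sketch: flat Hemisphere style field names are nested into the payload shape shown above. The helper name nest_size_fields is hypothetical, treating a bare “size_group” or “size_code” as the “size” type reflects the TBD proposal rather than a decided behaviour, and the “size_code” to “size_codes” key mapping is taken from the payload example.

```python
# Hemisphere field prefix -> key used in the Zalando payload example above.
PAYLOAD_KEYS = {"size_group": "size_group", "size_code": "size_codes"}


def nest_size_fields(flat_fields: dict) -> dict:
    """Turn {'size_group.size': ...} style fields into the nested payload."""
    payload = {}
    for field, value in flat_fields.items():
        parent, _, sub_attribute = field.partition(".")
        if parent not in PAYLOAD_KEYS:
            continue  # not a size field, ignore it here
        # Assumption (TBD above): no suffix means the "size" type.
        payload.setdefault(PAYLOAD_KEYS[parent], {})[sub_attribute or "size"] = value
    return payload
```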