Product and Groups Ingestion

Product Ingestion

Introduction

Eagle Eye AIR allows for a list of Retailer’s Products and Product Groups to be imported
automatically. This can be a one-off import during the on-boarding process or a regular import
to ensure the information within the AIR platform is up-to-date.

Initial Setup

The initial setup of the unit structure needs to be completed in Air before products can be
ingested.

Ingestion Overview

The product ingestion uses two separate files: one that represents all the products sold by
the retailer, and a second that represents the hierarchy within which those products exist.
Both files are required for the ingestion to run.

Each run completely replaces the product data for the given unit. There is no delta ingestion
for products, as this can easily lead to data inconsistency.

In a basic unit structure (see the Store Ingestion guide for details), the product ingestion is
run against the top-level unit. For an advanced structure, the ingestion is run at the banner
level, making it possible to have different product lists for different company units.

File Formats

The products file and the groups file share the same file format requirements: encoding,
delimiter, encapsulation and forbidden characters, as described below.

CSV files must have a list of field names as the file header (the very first line of the
file). All other records (lines) should contain the data to be imported. Data files may
include additional columns (with empty or non-empty values); these are ignored during
import. For each supported import mechanism, the mandatory columns must be present in
the file for the ingestion process to start.

File Encoding

All files must use Linux (LF) line endings rather than Windows (CRLF) line endings.

Content Encoding

The CSV files must be encoded as UTF-8 (ASCII is a subset of UTF-8). Non-UTF-8 files, and
rows containing forbidden characters, will not be imported.

Files must not contain a UTF-8 BOM.
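
The encoding rules above can be checked before a file is uploaded. The following is a minimal sketch, not part of the AIR platform; the function name check_file_bytes is hypothetical, and the checks only cover the three rules stated in this guide (UTF-8, no BOM, no Windows line endings).

```python
UTF8_BOM = b"\xef\xbb\xbf"


def check_file_bytes(data: bytes) -> list[str]:
    """Return a list of encoding problems found in the raw bytes of a CSV file."""
    problems = []
    if data.startswith(UTF8_BOM):
        problems.append("file starts with a UTF-8 BOM")
    if b"\r\n" in data:
        problems.append("file contains Windows (CRLF) line endings")
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        problems.append(f"file is not valid UTF-8 (byte offset {exc.start})")
    return problems
```

An empty list means none of the three problems was detected; it does not guarantee that the file will be ingested.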

Field Delimiter

By default, the CSV field delimiter must be a comma (, or U+002C) and the CSV text
delimiter must be a double quote (" or U+0022). We recommend using the text delimiter
for all field values, not just for string fields.

Data Escaping

For product files, special characters (double quote, slash, backslash, apostrophe,
comma, etc.) are allowed as long as they are escaped as follows:

  • a quotation mark (" or U+0022) should be represented by a pair of consecutive double quotes or prefixed with a backslash in the CSV content,
    e.g. "" or \"
  • other allowed special characters should be prefixed with a backslash (\ or U+005C).
  • additional whitespace outside field values may corrupt the CSV file, in which case it cannot be read or processed.
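
As an illustration of the doubled-quote form, the sketch below uses Python's standard csv module to encapsulate every field in double quotes and to double any quote inside a value. The function name format_row is hypothetical. Backslash-prefixing of the other special characters is not shown here; confirm with the on-boarding team which form your environment expects.

```python
import csv
import io


def format_row(values: list[str]) -> str:
    """Format one CSV row with every field quoted and inner quotes doubled."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=",",
        quotechar='"',
        quoting=csv.QUOTE_ALL,   # encapsulate all fields, not just strings
        doublequote=True,        # represent " inside a value as ""
        lineterminator="\n",     # Linux line ending
    )
    writer.writerow(values)
    return buffer.getvalue()


# A product name that uses a double quote to denote inches.
print(format_row(['12" Pizza', "8726536"]), end="")
```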

Forbidden Characters

We will not be able to process rows containing special, non-visual characters in
field values, examples include:

  • NUL (\0 or U+0000)
  • TAB (\t or U+0009)
  • FORM FEED (\f or U+000C)
  • LINE FEED / NEWLINE (\n or U+000A)
  • CARRIAGE RETURN (\r or U+000D)

Warning: these characters may be added by your CSV editor when compiling a list
of products.
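
Field values can be scanned for these characters before the file is written. This is a minimal sketch; the function name find_forbidden is hypothetical, and the set covers only the escape sequences named in the list above (\0, \t, \f, \n, \r), which is given as a list of examples rather than a complete specification.

```python
# Escape sequences named in this guide, mapped to a readable label.
FORBIDDEN = {
    "\0": "NUL",
    "\t": "TAB",
    "\n": "LINE FEED",
    "\f": "FORM FEED",
    "\r": "CARRIAGE RETURN",
}


def find_forbidden(value: str) -> list[str]:
    """Return the labels of any forbidden characters present in a field value."""
    return [label for char, label in FORBIDDEN.items() if char in value]
```

A row is safe with respect to this check when find_forbidden returns an empty list for every field.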

File Names

The file naming format is configurable in the Air platform. Once configured, only files that
match the configured file naming format will be ingested. Air also checks for files of the
same name having already been processed and does not process the same file more than once.

Although this is configurable, Eagle Eye proposes formats similar to the following:

  • Product Master: <UnitNameWithNoSpaces>-product_master-<dateCreated>.csv
  • Product Groups: <UnitNameWithNoSpaces>-product_groups-<dateCreated>.csv

Where <dateCreated> is in the format YYYYMMDDHHMMSS. This keeps each file name unique.
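
A file name in the proposed format can be generated as in the sketch below. The function name build_file_name is hypothetical, and the format itself is only the proposal above; if a different naming format has been configured in Air, adjust accordingly.

```python
from datetime import datetime


def build_file_name(unit_name: str, kind: str, created: datetime) -> str:
    """Build a file name such as MyRetailer-product_master-20240131023000.csv."""
    if kind not in ("product_master", "product_groups"):
        raise ValueError(f"unknown file kind: {kind}")
    unit = "".join(unit_name.split())          # remove all whitespace
    stamp = created.strftime("%Y%m%d%H%M%S")   # YYYYMMDDHHMMSS
    return f"{unit}-{kind}-{stamp}.csv"
```

Using the same timestamp for both files of one extraction makes it easier to see which product master and product groups files belong together.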

Group/Hierarchy File

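The following fields can be specified in the group ingestion file:

| Column           | Type   | Mandatory | Value Uniqueness Required | Description                    | Maximum String Length |
|------------------|--------|-----------|---------------------------|--------------------------------|-----------------------|
| group_reference  | String | Y         | Y                         | Product Group Reference        | 100                   |
| type             | String | Y         | N                         | Product Group Type             | 100                   |
| name             | String | N         | N                         | Product Group Name             | 255                   |
| description      | String | N         | N                         | Product Group Description      | 255                   |
| parent_reference | String | N         | N                         | Product Group Parent Reference | 100                   |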

Please note: where uniqueness is required, only the first occurrence of a record will be ingested.

Example File

The first row of the CSV file is the header and must contain the column names. All other rows
in the file must contain the data to be ingested.

group_reference,type,name,description,parent_reference
"2","EE","Confectionary","Confectionary",
"201","EE","Confectionary Multi Packs","Confectionary Multi Packs","2"
"202","EE","Chocolate Bars","Chocolate Bars","2"
"203","EE","Sharing Bags","Sharing Bags","2"
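
Because only the first occurrence of a group_reference is ingested, it can be useful to find duplicates before sending the file. The sketch below is an illustration only; the function name first_occurrences is hypothetical and it mirrors the uniqueness rule as described in this guide, not the platform's actual implementation.

```python
import csv
import io


def first_occurrences(csv_text: str, key: str = "group_reference"):
    """Split rows into those kept (first occurrence of key) and later duplicates."""
    seen = set()
    kept, duplicates = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row[key] in seen:
            duplicates.append(row)
        else:
            seen.add(row[key])
            kept.append(row)
    return kept, duplicates
```

Any rows returned in the second list would be worth correcting at source rather than relying on the first-occurrence rule.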

Product Master File

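The following fields can be specified in the product master ingestion file:

| Column          | Type   | Mandatory | Value Uniqueness Required | Description                                  |
|-----------------|--------|-----------|---------------------------|----------------------------------------------|
| short_name      | String | N         | N                         | Product Short Name                           |
| long_name       | String | N         | N                         | Product Full Name                            |
| upc             | String | Y*        | Y*                        | Product UPC                                  |
| sku             | String | Y*        | Y*                        | Product SKU                                  |
| group_reference | String | Y         | N                         | Group Reference as defined in the Groups File |
| tag             | String | N         | N                         | Product Tags (pipe separated)                |
| image           | String | N         | N                         | Product Image URL                            |
| brand           | String | N         | N                         | The name of the brand for this product       |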

* One of UPC or SKU must be provided depending on the configuration of your environment

Example File

The first row of the CSV file is the header and must contain the column names. All other
rows must contain the data to be imported. Mandatory column headers must be present in the
file to start the ingestion process. Any additional columns in the file will be ignored
during the import.

short_name,long_name,upc,group_reference,tag,image,brand
"5x Milk Chocolate","5x Milk Chocolate Bars","8726536","201","Meal Deal|Confectionary|Chocolate","https://image3.com/image3.jpg","Brand 1"
"Dark Chocolate Bar","Dark Chocolate Bar","8276345","202","Meal Deal|Confectionary|Chocolate","https://image1.com/image1.jpg","Brand 2"
"Milk Chocolate Sharer","Milk Chocolate Sharing bag","7637265","203","Meal Deal|Confectionary|Chocolate","https://image2.com/image2.jpg",
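
The mandatory columns can be checked against the header row before upload. This is a minimal sketch under the rules stated in the table above (group_reference is mandatory, and one of upc or sku must be present); the function name check_product_header is hypothetical, and which of upc or sku applies depends on how your environment is configured.

```python
import csv
import io


def check_product_header(csv_text: str) -> list[str]:
    """Return a list of problems with the header row of a product master file."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    problems = []
    if "group_reference" not in header:
        problems.append("missing mandatory column: group_reference")
    if "upc" not in header and "sku" not in header:
        problems.append("missing identifier column: upc or sku")
    return problems
```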

Common Ingestion Issues

Due to the complexity of the data in these files, the first attempt at extraction and
ingestion frequently does not work exactly as expected. The sections below outline some
common cases where issues arise.

Unescaped Characters

Product names frequently use the double quote character (") to denote inches. Because the
double quote is also the encapsulation character, an unescaped quote breaks the file structure.
Escape it as described under Data Escaping, e.g. "" or \".

UTF-8 BOM Header

Some applications save a UTF-8 byte order mark (BOM) at the start of a CSV file. This makes
the file unreadable and is not supported. Please make sure the file encoding follows the
specifications above.

Empty Group References or Unknown Groups

All products in the file need to be associated with a group defined in the product hierarchy
file. Files are often received with no group reference, or with a reference that does not
exist in the hierarchy file.

In these cases, the product row is skipped and not ingested.
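
Rows that would be skipped for this reason can be identified in advance by comparing the two files. The sketch below is an illustration only; the function name split_by_known_group is hypothetical and it simply applies the rule described above to CSV text held in memory.

```python
import csv
import io


def split_by_known_group(products_csv: str, groups_csv: str):
    """Split product rows into those with a known group and those that would be skipped."""
    known = {
        row["group_reference"]
        for row in csv.DictReader(io.StringIO(groups_csv))
    }
    accepted, skipped = [], []
    for row in csv.DictReader(io.StringIO(products_csv)):
        reference = (row.get("group_reference") or "").strip()
        # An empty reference or one missing from the groups file is skipped.
        if reference and reference in known:
            accepted.append(row)
        else:
            skipped.append(row)
    return accepted, skipped
```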

Missing/Duplicate Files

The ingestion of products requires both a product file and a group file. If either file is
missing, the ingestion cannot be processed.

As noted above, a file with the same name cannot be processed more than once. Including a
unique date/time stamp in the file name usually solves this.
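
Both conditions can be checked on the sending side before a run is expected. This is a rough sketch that assumes the proposed naming format from the File Names section (the -product_master- and -product_groups- markers) and a locally kept record of file names already sent; the function name ready_to_ingest is hypothetical and Air's own checks may differ.

```python
def ready_to_ingest(available: list[str], processed: set[str]) -> bool:
    """Return True if a new product master file and a new groups file are both present."""
    # Ignore any file whose name has already been processed.
    fresh = [name for name in available if name not in processed]
    has_master = any("-product_master-" in name for name in fresh)
    has_groups = any("-product_groups-" in name for name in fresh)
    return has_master and has_groups
```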

Ingestion Timing

When working with the on-boarding team, agree a schedule for when the ingestion process
should run. We suggest a daily refresh of product data, run overnight. It is worth leaving
a gap between the scheduled extraction from the retailer systems and the ingestion into Air:
if the extraction is delayed and the ingestion job runs first, no files will be found.