Batch Data Sent through SFTP to the DMP (DATA IN)
Updated by bruno.morini@retargetly.com
Note: Before web user data is sent to the SFTP repository, a user synchronization must have taken place.
This document describes how to integrate data transmission through SFTP so that the client can enrich the information of its users within the DMP platform. The integration consists of the following steps:
1. SFTP repository configuration
2. File Generation with a specific format
3. Batch File Sent to the SFTP repository
4. Obtaining the status of each uploaded file
5. Taxonomy generation
- SFTP Repository Configuration
The client must configure an SFTP repository from which Retargetly will read the files containing the user data. The client must also ensure that the repository has enough space to host all the files that are uploaded periodically, and that each file is kept for at least 30 days from its creation date. This ensures that, if a failure occurs, the system can re-extract the files without any loss of information.
Retargetly will send the following information so that the client can create the SFTP repository:
retargetly.pub -> [File with the SFTP public access key]
This public key must be installed for access to the SFTP through the "retargetly" user.
Retargetly should receive the following information:
Protocol: SFTP
User: retargetly
Host: [host address]
Port: [port number]
- File Generation with a specific format
The client must generate files periodically and publish them within the SFTP. The files need to have a standard format so that the system can read them correctly:
- Filename
Each generated file must be placed in a directory with the following structure:
[YYYYMMDD] / [file.tsv.gz]
Considerations:
The file name may start with any string of characters the client wishes, but it must end with ".tsv.gz".
The date used for the YYYYMMDD directory must be the date on which the files were created and uploaded, even if the folder contains files with data from different days.
The files must be valid TSV and compressed in GZIP format, so the extension at the end is "tsv.gz".
Valid Examples:
20181021 / custom_name_0000.tsv.gz
20181021 / custom_name_0001.tsv.gz
20181021 / another_file.tsv.gz
Examples of invalid directories and file names:
21 / rely_custom_name_0000.tsv.gz -> incorrect directory name
2018/10/21 / rely_custom_name_0000.tsv -> incorrect directory name and file extension
2018_10_21 / rely_custom_name_0000.tsv.gz -> incorrect directory name
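The naming rules above can be sketched as a small helper. This is only an illustration: the function names are hypothetical, and it assumes that the spaces around the "/" in the examples are typographic and that the real path is simply YYYYMMDD/name.tsv.gz.

```python
import re
from datetime import date, datetime

# Directory: exactly 8 digits (YYYYMMDD). File name: free-form, ending in .tsv.gz.
_PATH_RE = re.compile(r"^(\d{8})/([^/]+\.tsv\.gz)$")
_NAME_RE = re.compile(r"^[^/]+\.tsv\.gz$")


def build_remote_path(upload_date: date, file_name: str) -> str:
    """Return 'YYYYMMDD/<file_name>' for the date the file is created and uploaded."""
    if not _NAME_RE.match(file_name):
        raise ValueError(f"file name must end with .tsv.gz: {file_name!r}")
    return f"{upload_date:%Y%m%d}/{file_name}"


def is_valid_remote_path(path: str) -> bool:
    """Check the directory and extension rules described above."""
    match = _PATH_RE.match(path)
    if match is None:
        return False
    try:
        # Reject 8-digit strings that are not real calendar dates.
        datetime.strptime(match.group(1), "%Y%m%d")
    except ValueError:
        return False
    return True
```

For example, build_remote_path(date(2018, 10, 21), "custom_name_0000.tsv.gz") returns "20181021/custom_name_0000.tsv.gz", while the three invalid examples listed above are rejected by is_valid_remote_path.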
- File format
The content of each file must always follow a defined pattern:
[device type] [TAB] [user ID] [TAB] [comma separated attributes] [End of line character (\n)]
Possible values of the device type:
web (indicates that they are web cookies)
android (indicates that they are android devices)
ios (indicates that they are ios devices)
ml_raw (user's email)
ml_sh2 (user's email in sha256 format)
ml_sh5 (user's email in sha512 format)
mb_raw (user's cell phone)
mb_sh2 (user cell phone in sha256 format)
mb_sh5 (user cell phone in sha512 format)
nid_raw (national user ID)
nid_sh2 (national user identifier in sha256 format)
nid_sh5 (national user ID in sha512 format)
Valid Examples:
- ml_raw example@gmail.com property1, property2, property3
- ml_sh2 fc2ae4a1fb374548ea80556dc51ab3471a311231d8bffaa1dece31371bcceb62 property2, property3, property4
- ml_sh2 bea1debd1d52608ac7c0723aed5bd4ce3e821cffdde924bd760a64f40e550313 1b671a64-40d5-491e-99b0-da01ff1f3341 property6, property7, property8
Clarification: the device type, user ID and attributes are separated by a TAB character, also referenced as \t. Each line ends with the end of line character, also referenced as enter or \n. The carriage return character, also referenced as \r, is not valid.
Clarification 2: If e-mail addresses, cell phone numbers or national identifiers are sent in sha256 or sha512 format, they must be normalized as follows before the hashing algorithm is applied:
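As an illustration of the line pattern, the following sketch builds records and compresses them with GZIP. The names format_record and encode_batch are hypothetical, and two details are assumptions rather than documented requirements: attributes are joined with a bare comma (no spaces), and the content is encoded as UTF-8.

```python
import gzip
from typing import Iterable, Sequence, Tuple

# Device types listed in this document.
DEVICE_TYPES = frozenset({
    "web", "android", "ios",
    "ml_raw", "ml_sh2", "ml_sh5",
    "mb_raw", "mb_sh2", "mb_sh5",
    "nid_raw", "nid_sh2", "nid_sh5",
})

Record = Tuple[str, str, Sequence[str]]


def format_record(device_type: str, user_id: str, attributes: Sequence[str]) -> str:
    """Build one line: device type, TAB, user ID, TAB, attributes, newline."""
    if device_type not in DEVICE_TYPES:
        raise ValueError(f"unknown device type: {device_type!r}")
    if not user_id or not attributes:
        raise ValueError("user ID and at least one attribute are required")
    # Assumption: attributes are joined with a bare comma, without spaces.
    fields = [device_type, user_id, ",".join(attributes)]
    for field in fields:
        if any(ch in field for ch in "\t\r\n"):
            raise ValueError(f"field contains TAB, CR or LF: {field!r}")
    return "\t".join(fields) + "\n"


def encode_batch(records: Iterable[Record]) -> bytes:
    """Return the gzip-compressed TSV content for the given records."""
    text = "".join(format_record(*record) for record in records)
    # Assumption: the file content is encoded as UTF-8.
    return gzip.compress(text.encode("utf-8"))
```

The bytes returned by encode_batch can be written to a file opened in binary mode ("wb") whose name ends in ".tsv.gz". Building the lines explicitly with "\n" avoids any platform-specific newline translation that could introduce \r characters.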
- E-mails: all lowercase. Example: example@gmail.com
- National identifier: digits only; any other character must be removed. Example: 34848988
- Cell phone number: digits only, in the format (country code) (area code without the leading 0) (number). Example: 541151190123
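The normalization rules above could be applied before hashing roughly as follows. This is a sketch under stated assumptions: the function names are hypothetical, the value is encoded as UTF-8 before hashing, the digest is sent as lowercase hexadecimal (as in the examples in this document), and surrounding whitespace is stripped from e-mails, which the rules do not mention explicitly. The phone helper only removes non-digit characters; the caller is still responsible for supplying the country code and the area code without its leading 0.

```python
import hashlib
import re

_NON_DIGITS = re.compile(r"[^0-9]")


def normalize_email(email: str) -> str:
    """Lowercase the e-mail (stripping whitespace is an extra assumption)."""
    return email.strip().lower()


def normalize_national_id(value: str) -> str:
    """Keep digits only."""
    return _NON_DIGITS.sub("", value)


def normalize_phone(value: str) -> str:
    """Keep digits only; the input must already follow
    (country code)(area code without 0)(number)."""
    return _NON_DIGITS.sub("", value)


def hash_identifier(normalized: str, algorithm: str = "sha256") -> str:
    """Return the lowercase hex digest of an already normalized identifier."""
    if algorithm not in ("sha256", "sha512"):
        raise ValueError(f"unsupported algorithm: {algorithm!r}")
    return hashlib.new(algorithm, normalized.encode("utf-8")).hexdigest()
```

A value for an ml_sh2 record would then be hash_identifier(normalize_email(address)), and a value for an ml_sh5 record would pass algorithm="sha512".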
Invalid examples:
ml_raw, example@gmail.com, property1, property2, property3 -> the character that separates the device type, the user ID and the attributes is not TAB
property1, property2, property3 example@gmail.com ml_raw -> incorrect data order
example@gmail.com -> there is no device type or attributes loaded for this user.
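A client-side check run before uploading can catch the kinds of errors shown in the invalid examples. The sketch below is illustrative only: validate_line is a hypothetical helper, not part of the DMP, and it checks just the structure described in this document.

```python
DEVICE_TYPES = frozenset({
    "web", "android", "ios",
    "ml_raw", "ml_sh2", "ml_sh5",
    "mb_raw", "mb_sh2", "mb_sh5",
    "nid_raw", "nid_sh2", "nid_sh5",
})


def validate_line(line: str) -> list:
    """Return a list of problems found in one line; an empty list means it looks valid."""
    problems = []
    if "\r" in line:
        problems.append("carriage return (\\r) is not allowed")
    if not line.endswith("\n"):
        problems.append("line does not end with \\n")
    body = line[:-1] if line.endswith("\n") else line
    fields = body.split("\t")
    if len(fields) != 3:
        problems.append(f"expected 3 TAB-separated fields, found {len(fields)}")
        return problems
    device_type, user_id, attributes = fields
    if device_type not in DEVICE_TYPES:
        problems.append(f"unknown device type: {device_type!r}")
    if not user_id.strip():
        problems.append("empty user ID")
    if not attributes.strip():
        problems.append("no attributes")
    return problems
```

Each of the three invalid examples above produces at least one problem: the first and third have a single field because they contain no TAB, and the second has an unknown device type in the first position.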
- Batch File Sent to the SFTP repository:
Once the file format has been defined, the file generation process must comply with the following policies:
- Each file must not exceed 250MB.
- Files must be kept for at least 30 days after their generation date.
- After those 30 days, they can be deleted.
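These policies can be checked with two small helpers. This is a hedged sketch: the names are hypothetical, the 250MB limit is assumed to apply to the uploaded .tsv.gz file and is read conservatively as 250,000,000 bytes, and a file is treated as deletable only once strictly more than 30 days have passed since the date in its directory name.

```python
from datetime import date, datetime, timedelta

# Assumption: stricter decimal reading of 250MB, applied to the compressed file.
MAX_FILE_BYTES = 250 * 1000 * 1000
RETENTION_DAYS = 30


def within_size_limit(size_in_bytes: int) -> bool:
    """True if a file of this size respects the 250MB policy."""
    return size_in_bytes <= MAX_FILE_BYTES


def can_delete(directory_name: str, today: date) -> bool:
    """True once more than 30 days have passed since the YYYYMMDD directory date."""
    created = datetime.strptime(directory_name, "%Y%m%d").date()
    return today - created > timedelta(days=RETENTION_DAYS)
```

With these definitions, can_delete("20181021", date(2018, 11, 20)) is False (exactly 30 days) and becomes True on the following day.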
- Obtaining the status of each uploaded file
The status of each file uploaded to the SFTP can be obtained. A file has 3 possible states:
- In process
- Failed
- Successful
These 3 states are reported through a file named after the file being ingested, with one of the following extensions appended:
processing -> file without content
failed -> file with an error message
success -> file with the processing results
Example:
If the system is processing the file 20181021 / custom_name_0000.tsv.gz, the following file will be present in the same folder:
20181021 / custom_name_0000.tsv.gz.processing
Once processing has finished, the .processing file is deleted and, if the execution was successful, this file is created:
20181021 / custom_name_0000.tsv.gz.success
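Given a listing of the file names in a dated folder (obtained with any SFTP client), the status of an uploaded file could be derived as in the sketch below. The helper name is hypothetical. The ".failed" suffix is inferred by analogy with the ".processing" and ".success" examples, and the "pending" value for a file that has no status file yet is an assumption that is not part of the documented states.

```python
from typing import Iterable

# Suffix appended to the data file name -> state described in this document.
STATUS_BY_SUFFIX = {
    ".success": "successful",
    ".failed": "failed",
    ".processing": "in process",
}


def file_status(data_file: str, directory_listing: Iterable[str]) -> str:
    """Return the state of data_file based on the status files found next to it."""
    names = set(directory_listing)
    for suffix, status in STATUS_BY_SUFFIX.items():
        if data_file + suffix in names:
            return status
    # Assumption: no status file means the system has not picked the file up yet.
    return "pending"
```

For the example above, a listing that contains custom_name_0000.tsv.gz.processing yields "in process", and one that contains custom_name_0000.tsv.gz.success yields "successful".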
- Taxonomy generation:
The client must send Rdesk@retargetly.comt a list of the properties that will be received for its users. Retargetly will then return the same list with the corresponding Segment IDs within the DMP platform attached.