About Data Pumps
The Advantage Data Pump system allows data to be exported as a simple CSV file on a regular schedule, with the results output to one of two destinations:
- A local folder on the computer hosting the Advantage Cloud Sync Service, or
- An Amazon S3 bucket (files will be compressed; see Requirements below for additional information).
This system is intended for programmatic consumption of data, and thus may not be suitable for typical reporting use cases.
Requirements
To use the data pump system, your location must be using the latest Long-Term Support (LTS) version of Advantage.
If you wish to use the Amazon S3 bucket destination option, you will be responsible for creating your own Amazon AWS account to host the files. You will also be solely responsible for the configuration, maintenance, and costs of operating your AWS account. Some recommendations for the configuration can be found at the end of this document. Note that files uploaded to an Amazon S3 bucket will be compressed (GZIP). This cannot be disabled.
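Because S3 uploads are always GZIP-compressed, any program consuming them has to decompress the file before parsing the CSV. The following is a minimal sketch of that step using only the Python standard library. It assumes the export is UTF-8 text with a header row; the column names and sample values are hypothetical and exist only to make the example runnable.

```python
import csv
import gzip
import io


def read_gzip_csv(raw_bytes: bytes) -> list[dict[str, str]]:
    """Decompress a GZIP payload and parse it as CSV with a header row."""
    # Assumption: the export is UTF-8; "utf-8-sig" also tolerates a leading BOM.
    text = gzip.decompress(raw_bytes).decode("utf-8-sig")
    return list(csv.DictReader(io.StringIO(text)))


# Hypothetical export contents, compressed here only to simulate a downloaded file.
sample = gzip.compress(b"CustomerId,LastVisit\r\n101,2024-01-05\r\n102,2024-01-06\r\n")

rows = read_gzip_csv(sample)
for row in rows:
    print(row["CustomerId"], row["LastVisit"])
```

In practice the bytes would come from whatever S3 client you already use to download the object; the parsing step stays the same.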
If you wish to use the local folder option, please ensure that the SYSTEM user has write access to the designated folder.
Data
You will need to work with CenterEdge Support to outline the data that you’d like to export. A good place to start when describing the data you want is our existing reports. Because CenterEdge is already familiar with these reports, it will be easier for us to identify the correct information if you point to where it appears on an existing report or reports.
Constraints
Data pumps shouldn't be used for accounting-level data extraction unless you can guarantee that they will run during idle hours.
Please keep in mind that complex data formats and relationships may not translate well to a CSV format.
In particular, data with a one-to-many relationship can result in more exported rows than you may expect. For example, a Family Membership with several members can be exported as a single row if it includes just information about the membership itself. If, however, you wish to export the membership data and the names of each customer (one membership to many customers), you will see the membership data duplicated for each customer that is exported. This is not a hard and fast constraint, but something that will likely be discussed as you outline the data you wish to export.
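The duplication described above can be undone on the consuming side by grouping rows on the repeated membership columns. The sketch below illustrates this with the Python standard library. The column names (MembershipId, MembershipType, CustomerName) and the sample rows are hypothetical; an actual export will use whatever columns you agree on with CenterEdge Support.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export: one row per customer, with membership columns repeated.
EXPORT = """MembershipId,MembershipType,CustomerName
M-1,Family,Ana Ruiz
M-1,Family,Leo Ruiz
M-1,Family,Mia Ruiz
M-2,Individual,Sam Cole
"""


def group_members(csv_text: str) -> dict[tuple[str, str], list[str]]:
    """Collapse duplicated membership rows into one entry per membership."""
    members: dict[tuple[str, str], list[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["MembershipId"], row["MembershipType"])
        members[key].append(row["CustomerName"])
    return dict(members)


grouped = group_members(EXPORT)
for (membership_id, membership_type), names in grouped.items():
    print(membership_id, membership_type, names)
```

Here the three-member Family Membership occupies three rows in the file but becomes a single entry with three names after grouping.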
You should also consider that some data is not exportable, such as customer pictures or extremely long text fields. Certain other data is not accessible to the data pump system.
Finally, please note that some data is sensitive in nature. Great care should be taken to avoid exporting personally identifiable information about customers or employees, or to secure such data if an export is absolutely necessary.
Options
Scheduling
Data pumps are scheduled with two settings: StartTime, the time at which the data pump runs for the first time each day, and IntervalMinutes, how often it runs after the StartTime. StartTime is always interpreted as UTC (not local time). For example:
StartTime | IntervalMinutes | Behavior |
---|---|---|
03:00 | 1440 | Executes once per day at 3am UTC |
00:00 | 60 | Executes every hour on the hour |
00:30 | 60 | Executes every hour on the half hour |
03:00 | 120 | Executes every two hours starting at 3am UTC. Will wrap and run at 1am UTC as well. |
The maximum interval for a data pump is 1,440 minutes, or one day. The minimum interval is 5 minutes.
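To check a planned schedule before requesting it, the run times implied by a StartTime and IntervalMinutes pair can be listed with a short script. This is only a sketch of the behavior shown in the table, not the scheduler Advantage actually uses. The function name is hypothetical, and the handling of intervals that do not divide evenly into 1,440 minutes is an assumption, since this article does not describe that case.

```python
from datetime import time

MINUTES_PER_DAY = 1440


def daily_run_times(start_time: str, interval_minutes: int) -> list[time]:
    """List the UTC times at which a data pump would run within one day."""
    if not 5 <= interval_minutes <= MINUTES_PER_DAY:
        raise ValueError("IntervalMinutes must be between 5 and 1440")
    hours, minutes = map(int, start_time.split(":"))
    start = hours * 60 + minutes
    # Assumption: only whole intervals are counted when the interval does
    # not divide evenly into a day.
    runs = MINUTES_PER_DAY // interval_minutes
    # Wrap past midnight so that, e.g., 03:00 every 120 minutes includes 01:00.
    offsets = sorted((start + i * interval_minutes) % MINUTES_PER_DAY for i in range(runs))
    return [time(m // 60, m % 60) for m in offsets]


for run in daily_run_times("03:00", 120):
    print(run.strftime("%H:%M"), "UTC")
```

For a StartTime of 03:00 and an interval of 120 minutes, the list contains twelve times, including the wrapped 01:00 UTC run from the last row of the table.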
It is important to consider load implications on the server when deciding how frequently to run data pumps. Excessive use can diminish the performance of your Advantage software.
File Name
You can specify any Windows-compatible file name for your data pump. Optionally, you may request a date or time be appended to the file name.
Note that, without a date or time, previous exports will be overwritten by each new export. This can be used to limit the amount of storage required for your data pumps. On the other hand, dated file names can allow you to reference data at various points in time.
Filters
Most exported data can be restricted to specific categories or times. For example, an export of all customers might be limited to just those customers that have visited in the last 24 hours. This should be discussed with CenterEdge Support while outlining your data export.
Amazon S3 Bucket Recommendations
In general, your Amazon AWS account should be configured with:
- An S3 Bucket
- A dedicated user account for CenterEdge, with programmatic access to Amazon
- A user policy allowing only write access to the S3 Bucket
The "Block all public access" option on the S3 Bucket should be enabled. You can implement other bucket policies to secure the data as you see fit.
The user account does not need access to the AWS Console, just programmatic access. You will need to supply CenterEdge with the Access Key ID and Secret Access Key for this user (we won't need the username or any other details).
The user policy will require the following permissions for the CenterEdge user:
- PutObject
- AbortMultipartUpload
- ListMultipartUploadParts
- ListBucketMultipartUploads
Any other permissions are unnecessary and not recommended. When creating the policy, make sure the designated resource includes a wildcard (*) so that it applies to everything in the bucket. Ex: "Resource": "arn:aws:s3:::your-bucket-name/*"
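As an illustration, the permissions listed above could be assembled into a policy document like the one this sketch prints. It is a starting point under stated assumptions rather than a definitive policy. The function name is hypothetical, and the bucket name is the placeholder from the example above. The bucket ARN without the wildcard is also included as an assumption, because AWS documents s3:ListBucketMultipartUploads as a bucket-level action; this article itself only calls for the wildcard resource.

```python
import json


def build_write_only_policy(bucket_name: str) -> dict:
    """Build an IAM policy document granting only the upload-related actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:AbortMultipartUpload",
                    "s3:ListMultipartUploadParts",
                    "s3:ListBucketMultipartUploads",
                ],
                "Resource": [
                    # Assumption: bucket-level ARN for ListBucketMultipartUploads.
                    f"arn:aws:s3:::{bucket_name}",
                    # Wildcard resource covering every object in the bucket.
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
            }
        ],
    }


policy = build_write_only_policy("your-bucket-name")
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the policy editor when creating the user policy. Review it with your AWS administrator before applying it.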
Since Amazon S3 buckets are priced based on the amount of data stored, you should consider other factors that might increase the cost of operating your bucket. When configuring your bucket, the option for “Bucket Versioning” should be avoided if you only wish to keep a single copy of each upload. Additionally, omitting a date or time from your File Name will ensure previous files are overwritten.
If you are unsure about any of this information, you should consult with an Amazon AWS expert for assistance with configuring your account.