Basic Concepts

datasqill uses transformations to manipulate data. The developer captures the transformation logic as code, and datasqill modules support this by automating recurring transformation operations. datasqill also executes the captured transformations and provides scheduling and monitoring.

The Transformation

The Transformation is the core entity in datasqill. A transformation contains the transformation code entered by the developer (e.g. SQL) and is executed using the functionality of the selected module. A datasqill Module automates recurring transformation operations. At runtime, the module logic converts the developer's transformation code into the effective transformation command.

A typical example is the Insert from Select module. The developer enters the following transformation code:

SELECT part_key
     , SUM(l_extendedprice * (1 - l_discount)) AS revenue_eur
  FROM stage.lineitem
 GROUP BY 1

At runtime, the module converts this code into the following statement:

INSERT INTO data_mart.f_profit (
       part_key
     , revenue_eur
     )
SELECT part_key
     , revenue_eur
  FROM (
SELECT part_key
     , SUM(l_extendedprice * (1 - l_discount)) AS revenue_eur
  FROM stage.lineitem
 GROUP BY 1
) src
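
The wrapping step above can be illustrated with a small Python sketch. This is not datasqill's actual module implementation; the function name build_insert_from_select and its parameters are hypothetical, and the sketch assumes that the column aliases in the developer's SELECT already match the target column names.

```python
def build_insert_from_select(target: str, columns: list[str], select_sql: str) -> str:
    """Wrap developer-written SELECT code in an INSERT statement.

    Hypothetical helper for illustration only; it assumes the SELECT
    exposes columns whose names match the target columns.
    """
    if not columns:
        raise ValueError("at least one target column is required")
    # Format the column list in the leading-comma style used in this document.
    column_list = "\n     , ".join(columns)
    return (
        f"INSERT INTO {target} (\n"
        f"       {column_list}\n"
        f"     )\n"
        f"SELECT {column_list}\n"
        f"  FROM (\n"
        f"{select_sql.strip()}\n"
        f") src"
    )


# The developer's transformation code, embedded unchanged as a subquery.
developer_code = """
SELECT part_key
     , SUM(l_extendedprice * (1 - l_discount)) AS revenue_eur
  FROM stage.lineitem
 GROUP BY 1
"""

statement = build_insert_from_select(
    "data_mart.f_profit", ["part_key", "revenue_eur"], developer_code
)
print(statement)
```

The developer's code stays untouched inside the subquery, so the same wrapper could be reused for any SELECT whose output columns line up with the target table.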

Sources and Targets

Each transformation can have one or more sources and targets. A source is an object that the transformation requires, e.g. a source table in the database. A target is an object that the transformation writes to or creates, e.g. a target table in the database.

Dependencies

datasqill recognizes dependencies between transformations and automatically takes them into account when determining the execution order. If transformation A uses an object that is written by transformation B, transformation A waits until transformation B has completed. Dependencies exist only between transformations, but they are derived from the objects the transformations use.
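
As a minimal sketch of this idea, the following Python example derives dependencies from sources and targets and computes a valid execution order. It is an illustration under stated assumptions, not datasqill's scheduler: the transformation names, the object names other than stage.lineitem and data_mart.f_profit, and the function execution_order are all hypothetical. It uses graphlib.TopologicalSorter from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical transformations: name -> (sources read, targets written).
transformations = {
    "load_profit": ({"stage.lineitem"}, {"data_mart.f_profit"}),
    "report_profit": ({"data_mart.f_profit"}, {"report.profit_summary"}),
    "load_lineitem": ({"raw.lineitem"}, {"stage.lineitem"}),
}


def execution_order(transformations):
    """Return transformation names in an order that respects object-based dependencies."""
    # Map each object to the transformations that write it.
    writers = {}
    for name, (_, targets) in transformations.items():
        for obj in targets:
            writers.setdefault(obj, set()).add(name)

    # A transformation depends on every other transformation
    # that writes one of the objects it uses as a source.
    graph = {
        name: {w for obj in sources for w in writers.get(obj, ()) if w != name}
        for name, (sources, _) in transformations.items()
    }

    # static_order() yields each node only after all of its predecessors.
    return list(TopologicalSorter(graph).static_order())


order = execution_order(transformations)
print(order)
```

Note that the graph contains only transformations as nodes; the objects appear solely as the means of discovering which transformation must wait for which.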