Transfer

The "Transfer between Databases" module reads data from various source tables of a source database with the help of a SQL query and writes it to a target table in the target database. The SQL query is defined by the datasqill developer via the GUI. All data objects used in the query must be in the same database.

Name Meaning
Module Transfer
Module Class DsModRemote
Type Java
Purpose Transfer data between 2 databases e.g. Staging
Transformation Code SQL query
Sources Tables in the source database
Targets Tables in the database

Description

The "Transfer between Databases" module is controlled by a SQL query that the datasqill developer defines. The module reads data from the source tables with the help of this query and writes it to a target table. In contrast to modules that operate within a database, in this module rows are read from the source database and written to the target database.

The following simple example shows how to transfer data with the help of the module. First, the SQL query used:

SELECT c_custkey AS key
     , c_name AS name
  FROM staging.customer

If the target schema is "demo" and the target table is "d_customer", the module generates the following SQL for inserting into the target database:

INSERT INTO demo.d_customer (key, name) VALUES(?,?)

As can be seen in the example, the column names of the target table must be specified as aliases when the source columns have different names. In the example, the target table "d_customer" has the columns "key" and "name". This applies especially when using literals or functions to generate column values, as in this example:

SELECT c_custkey AS key
     , c_name AS name
     , 'de' AS country
     , CURRENT_DATE() AS createdate
  FROM staging.customer

Data sources and data targets of the module are database tables. Sources and targets can be in different databases. The module supports multiple source tables but only one target table.

Data Sources

The sources of the module are tables that are read via the query defined by the datasqill developer. For these tables, the datasqill runtime user must have read rights. All source tables used in the SQL query must be connected to the module input in the datasqill GUI in the graphical data model.

Data Target

The target of the module is a single table. For this, the datasqill runtime user must have write rights. The target table must be connected to the module output in the datasqill GUI in the graphical data model.

Attributes

Delete before insert

There are different attributes for optionally emptying the table completely or partially before writing. These can be optionally activated during installation

Truncate Before

If the boolean attribute Truncate Before is offered, then the other attributes for emptying the table are suppressed. If this attribute is activated, the table is completely emptied beforehand. If the target database offers a procedure for TRUNCATE TABLE, it is used. Otherwise all rows are deleted with the DELETE command.

Delete Before

If the boolean attribute Delete Before is offered, then the textbox Delete Condition and the boolean attribute Source Condition also become available. If this attribute is activated, deletion is performed according to different strategies:

  • If Source Condition is activated, then a Select statement is expected in Delete Condition that can be executed in the source database. All found rows are then deleted from the target table. Deletion is thus done per Java row by row
  • If Delete Condition is empty, the same procedure as with Truncate Before is used
  • Otherwise deletion is from the target database and an amendment to the DELETE statement is expected in Delete Condition. The WHERE keyword must be specified. Deletion is thus done for all data to be deleted with one command The Delete Condition is currently not validated!

Loop Query

A Select statement can be specified here. If the attribute Execute loop query in source db is activated, then the query is executed in the source database, otherwise in the target database. The columns returned by the Loop Query can be used as variables in the main query. The main query is executed once per row in the Loop Query

Validate main query

If activated, the main query is validated

Source database ID

In cases such as loop query from the target database (i.e. Execute loop query in source db not activated), it must be specified which database should be used for the source query. The database ID must be specified here

Fetch Size and Write Size

Fetch Size is the size of the read buffer in rows and Write Size the size of the write buffer in rows. A good balance between data throughput and memory usage must be found. With queries that have many and large columns, memory usage increases and can reach the limit of available memory. It should also be considered that the (virtual) machine on which datasqill runs has sufficient memory; especially when many processes use this module simultaneously.

Examples

Transferring data 1:1 from a source to a target database

All data from a table should be copied from a source database to the target database in the same structure. For this, a similar table must be created in the target database, as well as a mapping that performs the copy operation.

Assume there is the following table in a Postgres database:

CREATE TABLE artikel (
    artikel_nr character varying(30) NOT NULL
  , bezeichnung character varying(60)
  , preis numeric(14,2)
  , PRIMARY KEY (artikel_nr)
);

This statement can be viewed when you create an object in datasqill that points to this table and then click the "Show Table Create Script" button.

In the target database (here Exasol) a table with the same columns is created:

CREATE TABLE artikel (
    artikel_nr VARCHAR(30) UTF8 NOT NULL
  , bezeichnung VARCHAR(60) UTF8
  , preis DOUBLE
  , PRIMARY KEY (artikel_nr)
);

Now connect the source object in datasqill to a new action and this to a new object that points to the Exasol target table (refresh the database in the object first once). The Select statement can be copied from "Show Table Create Script":

SELECT artikel_nr
     , bezeichnung
     , preis
  FROM softquadrat.artikel

The action then looks like this: copy_one_to_one You can now regularly copy the data with this transformation.