The "Transfer between Databases" module reads data from various source tables of a source database with the help of a SQL query and writes it to a target table in the target database. The SQL query is defined by the datasqill developer via the GUI. All data objects used in the query must be in the same database.
| Name | Meaning |
|---|---|
| Module | Transfer |
| Module Class | DsModRemote |
| Type | Java |
| Purpose | Transfer data between 2 databases e.g. Staging |
| Transformation Code | SQL query |
| Sources | Tables in the source database |
| Targets | Tables in the database |
The "Transfer between Databases" module is controlled by a SQL query that the datasqill developer defines. The module reads data from the source tables with the help of this query and writes it to a target table. In contrast to modules that operate within a database, in this module rows are read from the source database and written to the target database.
The following simple example shows how to transfer data with the help of the module. First, the SQL query used:
SELECT c_custkey AS key
, c_name AS name
FROM staging.customer
If the target schema is "demo" and the target table is "d_customer", the module generates the following SQL for inserting into the target database:
INSERT INTO demo.d_customer (key, name) VALUES(?,?)
As can be seen in the example, the column names of the target table must be specified as aliases when the source columns have different names. In the example, the target table "d_customer" has the columns "key" and "name". This applies especially when using literals or functions to generate column values, as in this example:
SELECT c_custkey AS key
, c_name AS name
, 'de' AS country
, CURRENT_DATE() AS createdate
FROM staging.customer
Data sources and data targets of the module are database tables. Sources and targets can be in different databases. The module supports multiple source tables but only one target table.
The sources of the module are tables that are read via the query defined by the datasqill developer. For these tables, the datasqill runtime user must have read rights. All source tables used in the SQL query must be connected to the module input in the datasqill GUI in the graphical data model.
The target of the module is a single table. For this, the datasqill runtime user must have write rights. The target table must be connected to the module output in the datasqill GUI in the graphical data model.
There are different attributes for optionally emptying the table completely or partially before writing. These can be optionally activated during installation
If the boolean attribute Truncate Before is offered, then the other attributes for emptying the table are suppressed. If this attribute is activated, the table is completely emptied beforehand. If the target database offers a procedure for TRUNCATE TABLE, it is used. Otherwise all rows are deleted with the DELETE command.
If the boolean attribute Delete Before is offered, then the textbox Delete Condition and the boolean attribute Source Condition also become available. If this attribute is activated, deletion is performed according to different strategies:
A Select statement can be specified here. If the attribute Execute loop query in source db is activated, then the query is executed in the source database, otherwise in the target database. The columns returned by the Loop Query can be used as variables in the main query. The main query is executed once per row in the Loop Query
If activated, the main query is validated
In cases such as loop query from the target database (i.e. Execute loop query in source db not activated), it must be specified which database should be used for the source query. The database ID must be specified here
Fetch Size is the size of the read buffer in rows and Write Size the size of the write buffer in rows. A good balance between data throughput and memory usage must be found. With queries that have many and large columns, memory usage increases and can reach the limit of available memory. It should also be considered that the (virtual) machine on which datasqill runs has sufficient memory; especially when many processes use this module simultaneously.
All data from a table should be copied from a source database to the target database in the same structure. For this, a similar table must be created in the target database, as well as a mapping that performs the copy operation.
Assume there is the following table in a Postgres database:
CREATE TABLE artikel (
artikel_nr character varying(30) NOT NULL
, bezeichnung character varying(60)
, preis numeric(14,2)
, PRIMARY KEY (artikel_nr)
);
This statement can be viewed when you create an object in datasqill that points to this table and then click the "Show Table Create Script" button.
In the target database (here Exasol) a table with the same columns is created:
CREATE TABLE artikel (
artikel_nr VARCHAR(30) UTF8 NOT NULL
, bezeichnung VARCHAR(60) UTF8
, preis DOUBLE
, PRIMARY KEY (artikel_nr)
);
Now connect the source object in datasqill to a new action and this to a new object that points to the Exasol target table (refresh the database in the object first once). The Select statement can be copied from "Show Table Create Script":
SELECT artikel_nr
, bezeichnung
, preis
FROM softquadrat.artikel
The action then looks like this:
You can now regularly copy the data with this transformation.