SQL Parser

Currently, the drivers based on the SQL parser support only reading via SQL SELECT statements.

SQL Syntax

square brackets --> Optional

vertical bar --> Alternative

select

SELECT <expression-1> [ AS <identifier-1> ]
     , <expression-2> [ AS <identifier-2> ]
       ...
     , <expression-n> [ AS <identifier-n> ]
 [
   FROM <identifier>.FILES() | <identifier>[.<identifier>] [tableColumns] [tableOptions]
   [ WHERE <expression> ]
   [ FILTER <filter-text> [ ENDFILTER [ GROUPBY <groupby-text> [ ENDGROUPBY ] ] ] ]
 ]

There are the following variants for the table specification:

No FROM Section

No file is required. The query returns one row and the result consists of the specified expressions based on constants.

FROM <identifier>.FILES()

  • The specified identifier points to a directory.
  • The result returns one row per found file.
  • This syntax is only supported by file-based JDBC drivers.

The metadata of the pseudo-columns can then be used.

FROM <identifier>

  • With the Excel driver this syntax is not allowed
  • The specified identifier points to a csv/json file, or to a Salesforce object.

As a rule, the filename must be specified in double quotes, since the period for the file extension is not part of an unprotected identifier.

FROM <identifier-1>.<identifier-2>

  • With the CSV and JSON driver the 1st identifier is ignored
  • With the Excel driver the 1st identifier specifies the Excel file
  • With the Salesforce driver the 1st identifier is the schema
  • With the CSV and JSON driver the 2nd identifier contains the filename
  • With the Excel driver the 2nd identifier provides the worksheet name
  • With the Salesforce driver the 2nd identifier is the object

As a rule, the filename must be specified in double quotes, since the period for the file extension is not part of an unprotected identifier.

tableColumns

COLUMNS ( <identifier-1>, ... <identifier-n> ) [ HEADLINE <integer> ]

The optional specification

HEADLINE <integer>

can only be used with the CSV and Excel drivers.

tableOptions

in any order:

SEPARATED BY <single character in single quotes>
QUOTED BY <single character in single quotes>
ENCODING <string in single quotes>

If SEPARATED BY is not specified, a comma is used.

If QUOTED BY is not specified, the double quote is used.

If ENCODING is not specified, UTF-8 is used.

Identifiers

Identifiers are used for table names, column names in the source or as aliases.

Identifiers in double quotes are treated case-sensitively and allow special characters.

Identifiers without double quotes are converted to lowercase.

When comparing column names in CSV and Excel files to the column headers, the case is ignored (case insensitive).

Constants

Numbers

Constant numbers are of data type BIGINT. They are integers and have a value range from -263 to 263-1.

Constants are specified exclusively via a sequence of digits.

Character Strings

Character strings can be of any length and have the data type VARCHAR.

Constant character strings are enclosed in single quotes.

To include a single quote in a character string, two single quotes must be specified.

Character strings can also be multiline.

SELECT 'Karens'' Backstube', '!hallo
Du da!';

Output

 column_1 column_2
-------- --------
Karens' Backstube !hallo
Du da!

Null Value

NULL

The null value can be specified as a constant via the NULL keyword.

True

TRUE

The boolean value True is specified with the TRUE keyword.

False

FALSE

The boolean value False is specified with the FALSE keyword.

Pseudo-Columns

ROWNUMBER

The constant ROWNUMBER corresponds to the data row in the source.

If the optional HEADLINE is specified, then the 1st row number corresponds to the 1st row after the header row. Otherwise it is the 1st row in the source.

Directory Name

Only with file-based JDBC drivers. The constant DIRECTORY is the directory in which the file is located.

Filename

Only with file-based JDBC drivers. The constant FILENAME is the filename.

Expressions

Expressions are combinations of constant values, column names from the source, pseudo-columns, operators, case distinctions or built-in functions.

Instead of an expression, the character * can be specified. Then no alias is allowed.

The priorities correspond to ANSI-SQL.

To change the priority, expressions can be enclosed in parentheses. The expression within a parenthesis pair is evaluated before the outer operations.