Incremental Crawling of Database Sources in Data Cube

When connecting to a database with Data Cube, you have the option to crawl the data source incrementally. When incremental crawling is selected, only the values that have changed in the database since the previous crawl are updated in the data source. You must provide the following additional information when you select incremental crawling for databases:

  • Incremental Crawling

    Select to crawl only updated values in the database. If incremental crawling is enabled, you must enter a Primary Key, a Delta SQL Query, and a Delta Import SQL Query for the data source. If incremental crawling is not enabled, every crawl retrieves the entire dataset returned by the SQL Query.

    If selected, the first crawl of the database selects all of the data returned by the SQL Query, and subsequent crawls update the data according to the Delta SQL Query and the Delta Import SQL Query.

  • Primary Key

    Specifies the column that is the primary key of the database table. Required if Incremental Crawling is selected.

  • Delta SQL Query

    A SQL query that defines the criteria for determining which rows have been updated since the previous crawl. Required if Incremental Crawling is selected.

  • Delta Import SQL Query

    A SQL query that selects the data to import from the database for the rows that match the criteria in the Delta SQL Query. Required if Incremental Crawling is selected.
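The interaction of the three queries can be sketched as follows. This is a minimal simulation using an in-memory SQLite table, not Data Cube itself; the function names (full_crawl, delta_crawl), the table schema, and the sample rows are all illustrative assumptions.

```python
import sqlite3

# Hypothetical stand-in for the ORDERITEM table used in the examples below.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ORDERITEM ("
    "OrderItemID TEXT PRIMARY KEY, ProductID TEXT, "
    "OrderReceivedTime TEXT, Quantity INTEGER, Price REAL)"
)
conn.executemany(
    "INSERT INTO ORDERITEM VALUES (?, ?, ?, ?, ?)",
    [
        ("A1", "P1", "2016-09-07 07:00:00", 1, 9.99),
        ("A2", "P2", "2016-09-07 09:30:00", 2, 4.50),
    ],
)

def full_crawl():
    # First crawl: the plain SQL Query pulls every row.
    return conn.execute("SELECT * FROM ORDERITEM").fetchall()

def delta_crawl(last_index_time):
    # Subsequent crawls: the Delta SQL Query returns only the primary keys
    # of rows changed since the previous crawl finished...
    changed = conn.execute(
        "SELECT OrderItemID FROM ORDERITEM WHERE OrderReceivedTime > ?",
        (last_index_time,),
    ).fetchall()
    # ...and the Delta Import SQL Query re-fetches each changed row by its key.
    return [
        conn.execute(
            "SELECT * FROM ORDERITEM WHERE OrderItemID = ?", (key,)
        ).fetchone()
        for (key,) in changed
    ]

print(len(full_crawl()))                        # 2: the first crawl sees both rows
print(len(delta_crawl("2016-09-07 08:11:42")))  # 1: only A2 changed since then
```

The key-then-fetch split mirrors why both a Delta SQL Query and a Delta Import SQL Query are required: the first identifies what changed, the second retrieves the full records for only those keys.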

Example Queries

The following are example queries for configuring incremental crawling of database sources in Data Cube.

Note: The following examples use a data field named last_index_time, which records the time at which the previous crawl completed. This field has a datetime data type in the local time zone of the Analytics Engine, formatted as YYYY-MM-DD HH:MM:SS. For example: 2016-09-07 08:11:42.
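If you generate comparison timestamps yourself, they must match that exact format. A quick sketch of producing it in Python (the specific datetime value is only an example):

```python
from datetime import datetime

# Format a local datetime as YYYY-MM-DD HH:MM:SS, matching last_index_time.
stamp = datetime(2016, 9, 7, 8, 11, 42).strftime("%Y-%m-%d %H:%M:%S")
print(stamp)  # 2016-09-07 08:11:42
```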

For information about connecting to database sources, see Connecting to a Database with Data Cube.

Example 1: Crawling Order Details that have Changed Since the Previous Crawl

The following example contains configuration information for a database table named ORDERITEM that updates the data for orders that have changed since the previous crawl.

  • SQL Query

    SELECT OrderItemID, ProductID, OrderReceivedTime, Quantity, Price, ShippingPrice
    FROM ORDERITEM

  • Primary Key

    OrderItemID

  • Delta SQL Query

    SELECT OrderItemID
    FROM ORDERITEM
    WHERE CONVERT(VARCHAR, CAST(OrderReceivedTime AS datetime), 20) > '${dih.last_index_time}'

  • Delta Import SQL Query

    SELECT OrderItemID, ProductID, OrderReceivedTime, Quantity, Price, ShippingPrice
    FROM ORDERITEM
    WHERE OrderItemID = '${dih.delta.OrderItemID}'

Example 2: Crawling Order Details from the Past 24 Hours

The following example contains configuration information for a database table named ORDERITEM that updates the data for orders that have occurred in the past 24 hours.

  • SQL Query

    SELECT OrderItemID, ProductID, OrderReceivedTime, Quantity, Price, ShippingPrice
    FROM ORDERITEM

  • Primary Key

    OrderItemID

  • Delta SQL Query

    SELECT OrderItemID
    FROM ORDERITEM
    WHERE OrderReceivedTime > DATEADD(DAY, -1, GETDATE())

  • Delta Import SQL Query

    SELECT OrderItemID, ProductID, OrderReceivedTime, Quantity, Price, ShippingPrice
    FROM ORDERITEM
    WHERE OrderItemID = '${dih.delta.OrderItemID}'
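Note that DATEADD and GETDATE in the Delta SQL Query above are SQL Server functions; other databases express the same 24-hour window differently. A minimal sketch of the equivalent filter in SQLite (table and column names mirror the example, and the sample rows are illustrative):

```python
import sqlite3

# In SQLite, datetime('now', '-1 day') plays the role of
# DATEADD(DAY, -1, GETDATE()) in SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ORDERITEM (OrderItemID TEXT, OrderReceivedTime TEXT)"
)
conn.execute(
    "INSERT INTO ORDERITEM VALUES "
    "('A1', datetime('now', '-2 days')), ('A2', datetime('now', '-1 hour'))"
)
recent = conn.execute(
    "SELECT OrderItemID FROM ORDERITEM "
    "WHERE OrderReceivedTime > datetime('now', '-1 day')"
).fetchall()
print(recent)  # only the order from the past 24 hours
```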