CData Elasticsearch Driver Features & Differentiators

Elasticsearch is a distributed, RESTful, full-text search engine designed to store, index, retrieve, and manage document-oriented or semi-structured data. Common uses for Elasticsearch range from building a simple search engine for a web site or document collection, to supporting auto-completion, analytics, AI and cognitive computing workloads.

Because of the schemaless / NoSQL nature of Elasticsearch, data management is often a challenge. Through REST APIs users can access the search capabilities and features of Elasticsearch, but that requires custom development to connect it with other applications and services. The other option of using the SQL API and vendor-supplied libraries offers limited read-only SQL query capability and still only offers a small subset of tool integrations.

The CData Elasticsearch drivers offer the best of both worlds. The CData Drivers expand on the capabilities of the REST and SQL APIs, providing standards-compliant SQL-92 interfaces for data management, and the support for standards like ODBC / JDBC / ADO.NET provides near-universal seamless tool integration. The effectiveness of this methodology is best represented by our knowledge base of Elasticsearch integrations, which is only a small sample of what is possible.

In this article, we explore how CData drivers:

Access to All of Elasticsearch

CData Drivers for Elasticsearch provide unmatched access to Elasticsearch with many features beyond those available from the native drivers. The distinctive features for enabling access to Elasticsearch are outlined below:

Connects to the SQL endpoint and REST endpoint (as opposed to only the SQL endpoint) Enables SQL access to Elasticsearch v2.2 and above (compared to v6.3 and above) Compatible with Open Source Elasticsearch subscriptions in addition to other commercially available subscriptions Returns the relevance score as a _score column for queries without explicitly requiring the SCORE() function Allows querying multiple indices (details below) Querying Multiple Indices Elasticsearch SQL supports querying multiple indices (or tables) through pattern matching. The CData Drivers support querying multiple indices, but for both the REST and SQL endpoints (as opposed to only the SQL endpoint). Multiple indices can be queried using any of the following formats:

Result Source SQL Statement REST API Endpoint All available indices SELECT FROM [_all] /_all/_search A list of specific indices SELECT FROM [index1, index2, index3] /index1%2Cindex2%2Cindex3/_search Indices matching a wildcard pattern SELECT FROM [index] /index/_search Indices mathing a pattern (with exclusions) SELECT FROM [index, -index3] /index%2C-index/_search

SQL Support

The CData Drivers for Elasticsearch fully support standard SQL queries, whether crafted manually for custom applications and visualizations or generated automatically by a BI, reporting, or ETL tool.

Full CRUD Support

Since the CData Drivers include support for both the SQL and REST API endpoints, users can submit SELECT, INSERT, UPDATE, and DELETE statements to easily read from and write to Elasticsearch with CData Drivers.

Operation SQL Statement Endpoint(s) Used Query all documents in an index SELECT * FROM [myIndex]; REST, SQL Create a new document in an index INSERT INTO [myIndex] (myField1, myField2, myField3, ...) VALUES ('myValue1', 'myValue2', 'myValue3', ...); REST Update an existing document in an index UPDATE [myIndex] SET myField1 = 'myNewValue1' WHERE myId = '1'; REST DELETE an existing document from an index DELETE FROM [myIndex] WHERE myId = '1'; REST

Filtering

CData Drivers allow for filtering using LIKE and = operators, as opposed to only supporting LIKE operators, offering more precise querying. This functionality is useful, but can still lead to imprecise results. For example, Elasticsearch may return records that contain the same words but in a different order: a search performed using the value "blue sky" will return a record with "sky blue." The Query Type logic built-in to the CData drivers identifies columns as Analyzed or Non-Analyzed and the drivers issue the best possible Elasticsearch query based on the specific operator and the search value.

WHERE Clause Column Type Elasticsearch Query Type column = 'value' Analyzed Query String Query column = 'value with spaces' Analyzed Match Phrase Query column LIKE 'v_lu%' Analyzed Query String with wildcards column = 'value' Non-Analyzed Query String Query* column = 'value with spaces' Non-Analyzed Wildcard Query column LIKE 'v_lu%' Non-Analyzed Wildcard Query with wildcards

Non-analyzed columns can be matched case-insensitive, so four popular cases are checked: 'myValue' OR 'MYVALUE' OR 'myvalue' OR 'Myvalue'

JOIN Support

In Elasticsearch, the join datatype creates a parent/child relation within documents of the same index. CData Drivers can leverage the join datatypes to split related tables and enable SQL JOIN queries across those parent/child relationships. If you need to JOIN across indexes, the built-in SQL engine can perform a client-side, in-memory JOIN to eliminate restrictions on how you work with your data. Using the Elasticsearch example , we can build related tables using the join datatype.

For example, if we have the following documents in the sample index, we get different results from a query, depending on whether we use the native drivers or the CData drivers.

Example Index: sample view source { "_index": "sample", "_type": "doc", "_id": "2", "_score": 1, "_source": { "text": "What is your favorite motorcycle?", "question_id": { "name": "question" } } }, { "_index": "sample", "_type": "doc", "_id": "1", "_score": 1, "_source": { "text": "What is your favorite company?", "question_id": { "name": "question" } } }, { "_index": "sample", "_type": "doc", "_id": "3", "_score": 1, "_routing": "1", "_source": { "text": "CData Software", "question_id": { "name": "answer", "parent": "1" } } }, { "_index": "sample", "_type": "doc", "_id": "4", "_score": 1, "_routing": "1", "_source": { "text": "Elastic", "question_id": { "name": "answer", "parent": "1" } } }, { "_index": "sample", "_type": "doc", "_id": "5", "_score": 1, "_routing": "1", "_source": { "text": "Suzuki V-Strom 650", "question_id": { "name": "answer", "parent": "2" } } }, { "_index": "sample", "_type": "doc", "_id": "6", "_score": 1, "_routing": "1", "_source": { "text": "Yamaha FZ6", "question_id": { "name": "answer", "parent": "2" } } } The native drivers expose a single table, sample, with a single column, text, but with no preservation of the relationship between the documents. The CData Drivers expose two tables, sample_question and sample_answer based on the relationship between the documents in the index.

Native Drivers When querying the data, the native Elasticsearch driver returns a single table with one column, 'text':

Query Result (sample) text What is your favorite company? What is your favorite motorcycle? CData Software Elastic Suzuki V-Strom 650 Yamaha FZ6 CData Drivers In contrast, CData Drivers expose two separate related tables, with several columns, including foreign keys to define the relationships:

Parent (sample_question) _id _score text 1 0.0 What is your favorite company? 2 0.0 What is your favorite motorcycle? Child (sample_answer) _id sample_question_id _score text 3 1 0.0 CData Software 4 1 0.0 Elastic 5 2 0.0 Suzuki V-Strom 650 6 2 0.0 Yamaha FZ6

Support for JSON Structures

CData Drivers provide support for querying JSON structures, like arrays and nested JSON objects, which can often be found in Elasticsearch records. With CData, users will be able to get exactly the data they want from Elasticsearch, thanks to built-in schema discovery and JSON structure flattening.

Array Support

Native Elasticsearch drivers do not (at the time of writing) support querying data found in arrays. CData Drivers not only support accessing the arrays (as raw JSON data), but also support flattening the arrays to drill down into the relevant data. Consider the following JSON array:

{ ... "coords" : [35.91,-79.06], ... } CData Elasticsearch Drivers expose the columns coords coords.0 and coords.1 (sample results below).

coords coords.0 coords.1 [35.92,-79.06] 35.91 -79.06

Nested Type Support

Data stored for use with Elasticsearch often contains nested JSON objects. Without support for these objects, native drivers will miss a lot of available data in their queries. CData Drivers support searching nested objects and include functionality to flatten the nested objects to atomize the data. Consider the following JSON object:

{ "manager": { "name": "Ms. Manager", "age": 35, "location": "Flagship Store", ... } } CData Elasticsearch Drivers expose manager as a column, along with a separate column for each nested element: manager.name, manager.age, manager.location (sample results below):

manager manager.name manager.age manager.location { "name": "Ms. Manager", "age": 35, "location": "Flagship Store", ... } Ms. Manager 35 Flagship Store

Platform Support

Outside of the version and subscription restrictions, one of the largest limitations of the native Elasticsearch drivers is that they only support the JDBC and ODBC standards. In comparison, CData offers connectivity across a variety of data-centric standards like ODBC(Windows, Linux, Mac), JDBC, and ADO.NET and directly within tools like Excel, SSIS, BizTalk , and PowerShell . Whether you are using tools and applications for BI, analytics, reporting, and ETL or building custom applications, CData drivers provide live access to Elasticsearch data wherever you need it.

Performance Comparison

The CData JDBC Driver is able to read data more than twice as fast as the native driver. The CData engineers built a driver that makes better use of client-side resources to work with large datasets quickly.

JDBC Query Times by Company (in milliseconds) Rows Queried CData Software Native 25,000 1,016.9 (+35%) 1,375.4 ~10,000,000 199,577 (+155%) 508,338

For more information, refer to our performance comparison article.

Other Features

Beyond these major features and key differentiators, CData Drivers for Elasticsearch come with additional features, that are included across all of the full range of data sources that CData supports.

Collaborative query processing - a powerful, built-in SQL engine pushes supported complex queries (filtering, aggregations, JOINS, etc.) down to Elasticsearch for server-side processing and features client-side processing for unsupported queries Active development & maintenance — changes to the Elasticsearch API are implemented and supported as soon as possible, keeping up with the latest versions and updates World-class support — our Support Team is available to answer your questions via email, live chat, and phone.

Free Trials & More Information

You will find the latest version of our Elasticsearch drivers here and all of our drivers include a free, 30-day trial. If you want to learn more about using our drivers in your existing tools or applications, you can read more in our Knowledge Base.

Jerod Johnson's Blog