How to Work with the Talend Couchbase Components
While the whole world is shifting towards big data, NoSQL has become a crucial technology in the data management industry. The need to move and transform data between traditional and modern systems has likewise become mission critical for data-driven businesses. This data movement might feed a new data warehouse project, migrate existing data from a traditional RDBMS to a new NoSQL platform, or add new transformations to existing jobs.
Talend offers a diverse range of big data components to suit each data integration purpose. It also provides connectivity to leading NoSQL databases such as Couchbase, Cassandra, MongoDB, HBase, Neo4j, Apache CouchDB, and Riak. Using Talend to manage unstructured data in a NoSQL scenario doesn’t require any specialized knowledge of NoSQL databases. In short, Talend is a big umbrella providing connectors for all kinds of data movement and transformation.
What is NoSQL?
NoSQL stands for "Not only SQL." It describes a movement toward data stores that do not use the relational model. The fundamental shift is in how the data is stored. To store customer details in an RDBMS, for example, you would normalize the information into tables and then use server-side or reporting code to reassemble it into its original shape. In a NoSQL document store, you simply store the customer details as they are. NoSQL is schema-free, which means you don’t need to design your tables and structure up front – you can start storing values immediately. Values are stored as documents, and queries that span documents are handled with MapReduce: a map/reduce pair builds a ‘view’ (similar to a result set) that contains a subset of the overall data.
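To make the contrast concrete, here is a small Python sketch (an illustration only, not Couchbase code, and the customer data is invented): the same customer is one self-describing document in a document store, whereas an RDBMS splits it across normalized tables that must be re-joined later.

```python
import json

# Document model: the customer is stored as one self-describing JSON document.
customer_doc = {
    "customer_id": "john",
    "name": "John Doe",
    "orders": [
        {"order_id": 1, "item": "laptop"},
        {"order_id": 2, "item": "mouse"},
    ],
}

# Relational model: the same data is normalized into separate tables
# and re-joined at query time to reconstruct the original shape.
customers_table = [("john", "John Doe")]
orders_table = [(1, "john", "laptop"), (2, "john", "mouse")]

# The document round-trips as-is -- no joins needed to get it back.
print(json.dumps(customer_doc))
```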
Couchbase Server is a NoSQL database. It is designed with a distributed architecture for performance, scalability, and availability. It enables developers to build applications more easily and quickly by combining the power of SQL with the flexibility of JSON.
Talend & Couchbase Server
Talend enables you to manage and transform data between Couchbase Server, a NoSQL document database, and any other relational or big data system. This integration also allows you to efficiently build richer reports and analytics on the data stored in Couchbase, utilizing the power of Couchbase’s pre-computed indexes and aggregates.
What Components Can You Use?
Talend offers the following components to work with Couchbase Server.
- tCouchbaseConnection : This component opens a connection to a Couchbase bucket so that subsequent components can reuse it for their transactions.
- tCouchbaseInput : This component queries documents from the Couchbase database, fetching them either by their unique key or through views.
- tCouchbaseOutput : This component inserts, updates, upserts, or deletes documents in the Couchbase database based on incoming flat data from a file, a database table, and so on. Documents are stored as key/value pairs, where the value can be JSON or binary data.
- tCouchbaseClose : This component closes the connection to the Couchbase bucket once all transactions are done, guaranteeing their integrity.
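The four components map onto a simple open → write/read → close lifecycle. The sketch below emulates that flow in plain Python with an in-memory dictionary standing in for a bucket; the class and method names are illustrative stand-ins, not the actual Talend or Couchbase SDK API.

```python
class FakeBucket:
    """In-memory stand-in for a Couchbase bucket (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.docs = {}        # key -> JSON-like value
        self.open = True      # tCouchbaseConnection opens the bucket

    def upsert(self, key, value):
        # tCouchbaseOutput: insert or update a document by key
        self.docs[key] = value

    def get(self, key):
        # tCouchbaseInput: fetch a document by its unique key
        return self.docs.get(key)

    def close(self):
        # tCouchbaseClose: release the connection when the job is done
        self.open = False


bucket = FakeBucket("default")                        # tCouchbaseConnection
bucket.upsert("john", {"feedback": "great product"})  # tCouchbaseOutput
doc = bucket.get("john")                              # tCouchbaseInput
bucket.close()                                        # tCouchbaseClose
print(doc)
```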
How Does it Work?
Talend’s Couchbase input and output connectors allow you to manage and transform your data. To bring data from other sources into Couchbase, the tCouchbaseOutput connector takes incoming data streams and transforms them into JSON documents before they are stored in Couchbase; you define which data fields need to be transformed into JSON attributes. Similarly, to export data from Couchbase to other targets, the tCouchbaseInput connector uses the schema mapping specified by the user to read JSON documents and transform them into the target data format; you have the flexibility to define which attributes in your JSON document are exported and transformed. For this blog, I have created two simple jobs; more complex scenarios can be tackled with Talend as well.
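The import direction can be pictured as a schema mapping applied row by row. This Python sketch (the field names, delimiter, and sample rows are hypothetical) turns flat delimited records into keyed JSON documents, which is conceptually what the output connector does:

```python
import json

# Flat incoming rows, e.g. read from a .txt or CSV file (sample data).
rows = [
    "123|Alice|loved the support",
    "john|John Doe|shipping was slow",
]

# User-defined mapping: which positional field becomes which JSON attribute.
schema = ["customer_id", "customer_name", "feedback"]

def row_to_document(row, schema, sep="|"):
    """Map one flat record onto a JSON document using the schema."""
    return dict(zip(schema, row.split(sep)))

# Key each document by its customer_id, as a document store would.
documents = {doc["customer_id"]: doc
             for doc in (row_to_document(r, schema) for r in rows)}
print(json.dumps(documents["john"]))
```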
Couchbase and Talend Jobs: Create a Document
The first job reads a .txt file containing unstructured data and creates a document from it. The input file consists of customer feedback, and the customer_id is not of a single data type: it contains characters, numbers, and special characters, as shown below. In the traditional approach, we would have started by creating a surrogate key (to serve as the primary key) for customer_id. With Couchbase, however, we can store it as-is.
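Because Couchbase document keys are simply strings, those mixed customer_id values can serve directly as document keys with no surrogate-key step. A minimal sketch (the sample IDs and feedback are invented for illustration):

```python
# Mixed-shape customer IDs from the feedback file, used directly as keys.
feedback = {
    "123":     {"feedback": "quick delivery"},
    "6534672": {"feedback": "good value"},
    "john":    {"feedback": "friendly support"},
    "a-42#x":  {"feedback": "easy returns"},  # special characters work too
}

# Every ID, whatever its shape, addresses exactly one document.
for key in feedback:
    assert isinstance(key, str)
print(sorted(feedback))
```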
The overall job looks like the image given below. tCouchbaseConnection opens a connection to the Couchbase server. Once the connection is established, the input file is read and a few transformations are applied in the tMap component, after which the data is written to a document in Couchbase Server.
The example job uses the default bucket, and the tCouchbaseOutput settings look like this:
Note that the JSON configuration is very important, as it defines how your document will be stored. In this example, the JSON configuration is similar to what is shown below.
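As an illustration of why this configuration matters, the two hypothetical mappings below produce differently shaped documents from the same input row (the field names are examples, not the actual settings):

```python
import json

# One flat input row from the feedback file (sample data).
row = {"customer_id": "john", "customer_name": "John Doe", "feedback": "great"}

# Flat configuration: each column becomes a top-level JSON attribute.
flat_doc = dict(row)

# Nested configuration: the same columns grouped under a parent attribute.
nested_doc = {"customer": {"name": row["customer_name"],
                           "feedback": row["feedback"]}}

print(json.dumps(flat_doc))
print(json.dumps(nested_doc))
```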
Once the job runs successfully, you can log in to Couchbase and check that the document has been created.
Couchbase and Talend Jobs: Read a Document
This job reads the document created by the previous job. There are two ways of reading documents.
- Using the key: the unique IDs of the documents stored in the Couchbase database. In our example, this could be 123, 6534672, or john.
- Using the views: select this check box to retrieve document information according to the map/reduce functions and other settings. The schema here has three pre-defined fields: Id, Key, and Value. Id holds the document ID, Key holds the information specified by the key of the map function, and Value holds the information specified by the value of the map function.
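The Id/Key/Value schema mirrors what a view’s map function emits. The sketch below emulates that in Python over a couple of sample documents (real Couchbase map functions are written in JavaScript and call emit(key, value); this is a conceptual stand-in only):

```python
# Documents as stored in the bucket: key -> document (sample data).
stored = {
    "123":  {"customer_name": "Alice",    "feedback": "quick delivery"},
    "john": {"customer_name": "John Doe", "feedback": "friendly support"},
}

def map_function(doc_id, doc):
    """Emulates emit(key, value) in a Couchbase view's map function."""
    yield doc_id, doc["customer_name"]

# Each view row carries the three pre-defined fields: Id, Key, and Value.
view_rows = [
    {"Id": doc_id, "Key": key, "Value": value}
    for doc_id, doc in stored.items()
    for key, value in map_function(doc_id, doc)
]
print(view_rows)
```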
The job given below reads the document using the key. The key must be specified in the settings.
The tCouchbaseInput settings in the job are given below.
Once the job runs successfully, the output matching the filter given in the settings is displayed in the console.
The next job reads the document using the views.
Change the settings in tCouchbaseInput as shown below. Here I am creating a view, ‘customer_view’, with customer_id, customer_name, and feedback columns.
Save and run the job. Then go to the Couchbase console and check that the view and its result set have been created. This view can further be used in other jobs or for ad-hoc queries.
BigData Dimension is a leading provider of cloud and on-premise solutions for BigData Lake Analytics, Cloud Data Lake Analytics, Talend Custom Solutions, Data Replication, Data Quality, Master Data Management (MDM), Business Analytics, and custom mobile, application, and web solutions. BigData Dimension equips organizations with cutting-edge technology and analytics capabilities, all integrated by our market-leading professionals. Through our Data Analytics expertise, we enable our customers to see the right information to make the decisions they need to make on a daily basis. We excel in out-of-the-box thinking to answer your toughest business challenges.
You’ve already invested in a Talend project, or perhaps you already have a Talend solution implemented but aren’t utilizing its full power. To get the full value of the product, you need to have the solution implemented by industry experts.
At BigData Dimension, we have experience spanning over a decade integrating technologies around Data Analytics. As far as Talend goes, we’re one of the few best-of-breed Talend-focused systems integrators in the entire world. So when it comes to your Talend deployment and getting the most out of it, we’re here for you with unmatched expertise.
Our work covers many different industries including Healthcare, Travel, Education, Telecommunications, Retail, Finance, and Human Resources.
We offer flexible delivery models to meet your needs and budget, including onshore and offshore resources. We can deploy and scale our talented experts within two weeks.
- Full requirements analysis of your infrastructure
- Implementation, deployment, training, and ongoing services both cloud-based and/or on-premise
- BigData Management by Talend: Leverage Talend Big Data and its built-in extensions for NoSQL, Hadoop, and MapReduce. This can be done either on-premise or in the cloud to meet your requirements around Data Quality, Data Integration, and Data Mastery
- Cloud Integration and Data Replication: We specialize in integrating and replicating data into Redshift, Azure, Vertica, and other data warehousing technologies through customized revolutionary products and processes.
- ETL / Data Integration and Conversion: Ask us about our groundbreaking product for ETL-DW! Our experience and custom products we’ve built for ETL-DI through Talend will give you a new level of speed and scalability
- Data Quality by Talend: From mapping, profiling, and establishing data quality rules, we’ll help you get the right support mechanisms setup for your enterprise
- Integrate Your Applications: Talend Enterprise Service Bus can be leveraged for your enterprise’s data integration strategy, allowing you to tie together many different data-related technologies, and get them to all talk and work together
- Master Data Management by Talend: We provide end-to-end capabilities and experience to master your data through architecting and deploying Talend MDM. We tailor the deployment to drive the best result for your specific industry – Retail, Financial, Healthcare, Insurance, Technology, Travel, Telecommunications, and others
- Business Process Management: Our expertise in Talend Open Studio will lead the way for your organization’s overall BPM strategy
As a leading systems integrator with years of expertise integrating numerous IT technologies, we help you work smarter, not harder, and at a better Total Cost of Ownership. Our resources are based throughout the United States and around the world, with subject matter expertise across numerous industries and in solving IT and business challenges.
We blend all types of data and transform it into meaningful insights by creating high performance Big Data Lakes, MDM, BI, Cloud, and Mobility Solutions.
CloudCDC is equipped with an intuitive and user-friendly interface. Within a couple of clicks, you can load, transfer, and replicate data to any platform without any hassle – no code or scripts to worry about.
• Build Data Lake on AWS, Azure and Hadoop
• Continuous Real Time Data Sync.
• Click-to-replicate user interface.
• Automated Integration & Data Type Mapping.
• Automated Schema Build.
• Codeless Development Environment.
CONTACT THE EXPERTS AT BIGDATA DIMENSION FOR YOUR CLOUDCDC, TALEND, DATA ANALYTICS, AND BIG DATA NEEDS. CONTACT US TODAY TO LEARN MORE!