Achieve better performance with an efficient lookup input option in Talend Spark Streaming

Description

Talend provides two options to deal with lookup in Spark streaming Jobs: a simple input component (for example: tMongoDBInput) or a lookup input component (tMongoDBLookupInput). Using a lookup input component can give a significant uplift in performance and code optimization for a Spark streaming Job.

Instead of loading the entire lookup data set, Talend provides an option unique to streaming Jobs: query only the smaller chunk of lookup data that the current input needs, which saves a great deal of time and yields highly performant Jobs.

By Definition

Lookup components such as tMongoDBLookupInput, tJDBCLookupInput, and others provided by Talend execute a database query with a strictly defined order that must correspond to the schema definition.

A lookup component passes the extracted data to tMap in order to provide the lookup data to the main flow. It must be directly connected to a tMap component, and this tMap must use Reload at each row or Reload at each row (cache) for the lookup flow.

The tricky part here is to understand the usage of the Reload at each row functionality of the Talend tMap component, and how it can be integrated with the lookup component.

Example

Below is an example of how we have used a tJDBCLookupInput component with tMap in a Talend Spark Streaming Job.

 

  1. At the tMap level, make sure the lookup is set up with Reload at each row, and that an expression for the globalMap Key is defined as well.
  2. At the lookup input component level, make sure the Query option filters on the globalMap Key defined in tMap (in our example, a where condition on extract.consumer_id). This is key for making sure the lookup component only fetches the data needed for processing at that point in time.
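The idea behind the two steps above can be sketched outside Talend. The following is a conceptual Python illustration only, not Talend-generated code: in a real Job the Query field is a Java string expression that reads the key from globalMap. The in-memory SQLite table consumers, its segment column, and the helper lookup_for_row are all hypothetical names chosen for the example.

```python
import sqlite3

# Hypothetical lookup table standing in for the real lookup database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE consumers (consumer_id TEXT PRIMARY KEY, segment TEXT)")
conn.executemany(
    "INSERT INTO consumers VALUES (?, ?)",
    [("c1", "gold"), ("c2", "silver"), ("c3", "bronze")],
)


def lookup_for_row(row):
    """Fetch only the lookup rows that match the current main-flow row."""
    # Plays the role of the globalMap key that tMap sets for this row.
    global_map = {"consumer_id": row["consumer_id"]}
    cursor = conn.execute(
        "SELECT consumer_id, segment FROM consumers WHERE consumer_id = ?",
        (global_map["consumer_id"],),
    )
    return cursor.fetchall()


# Simulated micro-batch from the stream: each row triggers a narrow query
# instead of loading the whole consumers table up front.
stream_rows = [
    {"consumer_id": "c2", "amount": 10},
    {"consumer_id": "c3", "amount": 5},
]
enriched = [(row, lookup_for_row(row)) for row in stream_rows]
```

The trade-off is one small query per row (or per cached key) against one large load of the full lookup table, which tends to pay off when the lookup table is much larger than the set of keys seen in a micro-batch.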

Summary

As we have seen, these small changes can make our Streaming Jobs more effective and performant. As there will always be multiple ways to implement a Talend ETL Job, the ability to understand the nuances that make them more efficient is an integral part of being a data engineer.

For more information, reach out to us at: solutions@thinkartha.com

Author: Siddartha Rao Chennur

This article is also published on Talend Community:
Source: https://community.talend.com/s/article/Achieve-better-performance-with-an-efficient-lookup-input-option-in-Talend-Spark-Streaming

Quick Start Guide: Talend and Docker

Enterprise deployment work is notorious for being hidebound and slow to react to change. With many organizations adopting Docker and container services, it becomes easy to incorporate their Talend deployment life cycle into their existing Docker and container services, creating a more unified deployment platform to be shared across various applications within an organization.

This article is intended as a quick start guide on how to generate Talend Jobs as Docker images using a Docker service that is on a remote host.

To make handling Docker images easier to understand, a few topics below draw comparisons between sh/bat scripts and Docker images.

Setting up your Docker for remote build

Talend Studio needs to connect to a Docker service to be able to generate a Docker image.

The Docker service can run on the machine where Talend Studio is installed, or on a remote host. The remote-build setup is needed only if Talend Studio and Docker are running on different hosts; skip it if Docker runs on the same machine as Talend Studio.

Building a Docker Image from Talend Studio v7.1 or Greater

In v7.1, Talend introduced the fabric8 Maven plugin to generate a Docker image directly from Talend Studio.

Using Talend Studio, we can build a Docker image that is stored in a local Docker repository. Alternatively, we can build and publish a Docker image to any registry of our choice.

Let us look at both options:

Build the Docker Image from Talend Studio

  1. Right-click on the Job and navigate to the Build Job option.
  2. Under Build type, select Docker Image.
  3. Choose the appropriate context and log4j level.
  4. Under Docker Options, select Local if Docker and Studio are installed on the same host, or select Remote if your Docker service is running on a different host from the one where Talend Studio is installed. In our example, we enabled Docker for a remote build via TCP on port 2375:

     tcp://dockerhostIP:2375

  5. Once this is done, your Docker image is built and stored in the Docker repository, in our example on host 2.
  6. Log in to the Docker host, in our example host 2, and execute the command docker images. You should be able to view the image we just built.

Build and Publish the Docker Image to the Registry from Talend Studio

Talend Studio can also build a Docker image and publish it to any registry, where Kubernetes or another container service can pick it up. In our example, we have set up an AWS ECR registry.

  1. Right-click on the Job name and navigate to the Publish option.
  2. Select the Export Type Docker Image.
  3. Under Docker Options, provide the Docker host and port details as discussed in the previous topics. Give the necessary details of the registry and Docker image name:

     Image Name = Repository Name
     Image Tag = Jobname_Version
     Username = AccessKeyId (AWS)
     Password = Secret (AWS)

  4. Once this is done, navigate to AWS ECR and you should be able to search for and find the image.

Running Docker Images vs Shell or Bat scripts

With Talend, we are all accustomed to either .sh or .bat scripts, so to better understand how to run Docker images, let’s cover various aspects in detail below, such as how to pass runtime parameters and how to mount volumes.

Passing Run Time Parameters to a Docker Image

To run the Docker image that is in your Docker repository (a Talend Job built as a Docker image):

  1. List all the Docker images by running the command docker images.
  2. In this example, we run the image madhav_tmc/tlogrow with the tag latest, which uses a tWarn component to print a message. Part of the message comes from the context variable param.
  3. Run the Docker image, passing a value to the context variable param at runtime:

     docker run madhav_tmc/tlogrow:latest --context_param param="Hello TalendDocker"

In the log, we can see the value passed to the Docker image at runtime.
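When a Job takes several context variables, it can help to assemble the docker run arguments programmatically so that values containing spaces are passed as a single argument. The following is a minimal Python sketch, assuming only the --context_param name=value form shown above; the helper name docker_run_command is hypothetical and not part of Talend or Docker.

```python
import shlex


def docker_run_command(image, context_params):
    """Build the argument list for running a Talend Job image with context overrides."""
    args = ["docker", "run", image]
    for name, value in context_params.items():
        # Each override is passed as: --context_param name=value
        args += ["--context_param", f"{name}={value}"]
    return args


cmd = docker_run_command(
    "madhav_tmc/tlogrow:latest",
    {"param": "Hello TalendDocker"},
)

# Shell-safe rendering of the command, useful for logging or copy-paste.
printable = shlex.join(cmd)
```

On a host with Docker available, the list in cmd could be handed to subprocess.run(cmd, check=True); passing a list rather than a single string avoids shell quoting problems for values such as "Hello TalendDocker".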

Talend Cloud & AMC Web UI: Hybrid approach

What is AMC?

Talend Activity Monitoring Console (AMC) is an add-on tool integrated into Talend Studio and Talend Administration Center for monitoring Talend Jobs and projects. It helps Talend product administrators and users achieve enhanced resource management and improved process performance through a convenient graphical interface and a supervising tool. It provides detailed monitoring capabilities that can be used to consolidate the collected activity monitoring information, understand the underlying component and Job interaction, prevent faults that could be unexpectedly generated, and support system management decisions.

In general, the functionalities are:

  • Batch process monitoring
  • Log information about each execution of a DI Job
  • Jobs can automatically write information to the AMC DB or File
  • TAC and Studio can access information within the AMC DB or File through the Activity Monitoring Console GUI

The Talend Activity Monitoring Console interface consists of the following views:

  • Jobs view
  • History and Detailed history views
  • Meter log view
  • Main chart view
  • Job Volume view
  • Logged Events view
  • Error report view
  • Threshold Charts view

This article is intended for Talend Cloud customers who want to leverage the AMC web UI to monitor Talend Jobs. As many existing Talend on-prem customers are used to the AMC Web UI, with more customers migrating to Talend Cloud we can take a hybrid approach by using an amc.war file from the Talend on-prem version to host AMC as a standalone tool.

It is recommended to use custom dashboards on top of an AMC database if you are looking for more advanced or custom metrics than those offered by the AMC web UI.

Steps to host AMC as a Standalone Tool on Apache Tomcat

  1. Install the Apache Tomcat Service.
  2. Contact Talend for access, and download the amc.war file.
  3. Place the amc.war file under the tomcat Install Dir/webapps folder.
  4. Restart the Tomcat service.
  5. Once Tomcat is started, create a folder under the webapps directory named amc.
  6. Download the JDBC driver JAR file for the database that will store the AMC data.
  7. Place that JAR under tomcat Install Dir/webapps/amc/WEB-INF/plugins/org.talend.amc.libraries_7.3.1.20190624_1017/lib/ext.
  8. Restart Tomcat.
  9. Navigate to the URL http://ip:port/amc/rap?startup=amc, for example http://localhost:8080/amc/rap?startup=amc. This should take us to the AMC web page.

 

Conclusion

The AMC Web UI from Talend is a plug-and-play way to monitor Talend Jobs. Many on-premises customers already have access to the AMC web UI; hosting AMC as a standalone tool alongside Talend Cloud, as a kind of hybrid approach, gives cloud customers the same web UI that on-prem customers have.

For more information, reach out to us at: solutions@thinkartha.com

Author: Madhav Nalla

This article is also published on Talend Community:
Source: https://community.talend.com/s/article/Talend-Cloud-AMC-Web-UI-Hybrid-approach-jJdLO 

Talend Studio Best Practices – Increase Studio Performance and Settings

Let’s discuss Talend Studio best practices, issues, fixes, and recommendations at the Studio level.

Increase Talend Studio memory:
To increase your Talend Studio memory, go to the Talend installation directory and change the -Xms and -Xmx values in the Studio .ini file based on your system memory. On 64-bit Windows, the file is named Talend-Studio-win-x86_64.ini.
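As an illustration of the change described above, here is a small Python sketch that rewrites the two heap settings in the text of an .ini file. It assumes the file uses the Eclipse-style launcher layout with one JVM option per line (for example -Xms512m and -Xmx1536m after -vmargs); the sample contents, the chosen sizes, and the helper name set_heap are hypothetical, so check your own file before applying anything like this.

```python
import re


def set_heap(ini_text, xms, xmx):
    """Return ini_text with the -Xms and -Xmx lines replaced by the given sizes."""
    ini_text = re.sub(r"(?m)^-Xms\S+$", f"-Xms{xms}", ini_text)
    ini_text = re.sub(r"(?m)^-Xmx\S+$", f"-Xmx{xmx}", ini_text)
    return ini_text


# Hypothetical file contents; a real Studio .ini file contains more options.
sample = "-vmargs\n-Xms512m\n-Xmx1536m\n"

# Example sizes only: pick values that fit the memory available on your machine.
updated = set_heap(sample, "1024m", "4096m")
```

Studio has to be restarted for new heap settings to take effect, and -Xmx should stay comfortably below the physical memory of the machine.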

If you are using Talend Cloud:
Set up reference projects from Talend Studio – this feature is no longer available in Talend TAC.

Notice: Reference projects are now managed in Studio. Read the Release Notes for how to migrate. If you want to remove a reference, use the DeleteProjectReference operation in MetaServlet.
Go to File -> Project Settings -> click Reference projects -> add a new reference -> select the project and the corresponding branch, then click +.
Note: Don’t forget to click the + symbol.

How to Change GroupId in Talend Studio

• The default Group Id in Talend Studio is org.example.projectname.
• You may need to change it, in our case to com.arthasolutions. To make this change in Talend, go to File -> Project Settings -> Build -> Deployment and update the group ID.

Publish Snapshot or Releases
To deploy a snapshot, open Talend Studio and open the Talend Job.
• Go to the Job tab and navigate to the Deployment tab -> select the Use Snapshot check box.

How to make sure your Talend Studio is pointing to the right snapshots or releases?

Go to Talend Studio -> Window -> Preferences -> Talend -> Artifact Repository -> Repository Settings
• Now check the Repository Settings tab.
• You should see all the repository settings pre-populated.
• In 7.2.1, all these settings come from TAC itself.
• If you ever want to change these default settings, click Use Customized Settings and enter the default release repo and default snapshot repo.

Install all Additional Packages – Talend studio

The code of method (Map<String,Object>) is exceeding the 65535 bytes limit

• In Talend 7.2.1, some Jobs may fail in code compilation with an error of “65535 byte code”. This may happen in some specific Job designs where the code generation is already at the 65 KB limit.
• You can prevent this error by adding the following parameter to the config.ini file located in the configuration folder under your Talend Studio installation directory:
deactivate_extended_component_

This article is published on Talend Community:
https://community.talend.com/s/feed/0D53p00007vCnuMCAS

Fastest MDM Rollout

Introduction

Carhartt Inc., a prominent clothing and sporting goods company, has been manufacturing and supplying workwear since 1889. The company has grown over the years and has embraced various channels to sell its products. To this end, separate IT systems were built for each of the sales channels, leaving an assortment of systems such as AS400, SAP, cloud SaaS applications like Cloud twist, and simple Excel files. While the growth and sales from multiple channels helped Carhartt diversify its revenue and gain global reach, the company was facing challenges in terms of operational inefficiency, poor customer support, bloated marketing expenses, minimal analytics, and a non-uniform customer experience.

So what was the best way for Carhartt to address these challenges? The best solution was to shift from a multi-channel approach to an omnichannel retail experience. Carhartt reached out to Talend to implement a Master Data Management (MDM) solution that would create a rich and satisfying shopping experience across all channels and give greater control over master data assets. Talend’s implementation partner, Artha Solutions, focused on Carhartt’s data integration and complete data management. The solution gave Carhartt a 360° view and delivered high data accuracy for operational transactions and analytical reporting. From requirements to implementation, Artha Solutions implemented the complex MDM solution in 4 months flat, something that is unheard of in the MDM world. The benefit of the central rules implemented by Artha became clear when data for over 4 million customers was brought onto the MDM platform: a total of 50,000 duplicate customers were identified and removed from the system in the first 6 hours.

While multi-channel e-commerce and retail line up perfectly with how customers already shop, adopting an internal multi-channel approach could be disastrous:

  • Difficult for data management and analysis
  • Difficult to differentiate each marketing channel
  • Does not put focus on creating a consistent shopping experience
  • Unavailability of real time data
  • Unable to apply common rules on data due to lack of central data

Omni Channel Experience

Important data or information keeps piling up over time, which becomes redundant and needs to be managed. However, when the point of heightened redundancy is reached, businesses start to suffer from operational and efficiency challenges. This leads to a dire need for data accuracy, coherence, accessibility, and uniqueness. Searchable storage becomes very crucial for all types of businesses, highlighting the significance and criticality of master data management.

  • Single source of truth
  • Improved sales
  • Better data collection
  • Enhanced productivity

MDM Implementation Team

Businesses today struggle to become agile by implementing information systems that support and facilitate changing business requirements. As a result, the management of information about products, customers, etc. has become increasingly important. Additionally, business organizations have systems in place to store and retrieve this data. However, many disparate systems store information, which leads to overlapping, redundant, and inconsistent data. So, where do users go to get valid, accurate, consistent data?

Master Data Management! However, an MDM solution is only as good as the team that implements it. MDM implementation requires expertise across a wide range of areas, such as Talend DI, DQ, Talend MDM, and Talend Data Prep. Master data management (MDM) is critical to achieving effective information governance. Failure to accurately manage information has been the root cause of several incidents, including the leak of sensitive information.

Why is Master Data Management Important?

  • Allows smooth functioning of the organization
  • Derives meaning out of assembled data
  • Improves operations and efficiency
  • Reduces process errors by consolidating and controlling data
  • Removes duplications
  • Avoids missed opportunities and dissatisfied customers
  • Data consistency

 

Data Governance Breakfast with Talend and Artha Solutions

We invite you to join Artha Solutions & Talend at the Data Governance Breakfast seminar to discover how a comprehensive data governance framework can help you meet business objectives, mitigate risk, and address compliance challenges.

We’ll cover how to:

  • Balance data accessibility and control with a pragmatic 3-step approach
  • Implement agile governance for better customer experience
  • Kick-start your data governance initiatives to realize the value in 100 days

What is the Agenda?

8:00 AM – Registration and Hot Breakfast
8:30 AM – Introduction by Jean-Michel Franco, Senior Product Marketing Director- Data Governance
8:45 AM – Artha Solutions Interactive Presentation and Demo
10:15 AM – Networking
11:00 AM – Conclusion

When is the event?

Tuesday 21 May, 2019
From: 8:00 A.M. – 11:00 A.M.

Where is the event?

The Hilton Sydney, 488 George Street,
Sydney NSW 2000.

Register: https://info.talend.com/ArthaSolutionsBreakfastSyd_21May_reg.html?utm_medium=email&utm_source=outbound&utm_campaign=seminar

Data Governance Breakfast in Melbourne with Talend and Artha Solutions

We invite you to join Artha Solutions & Talend at the Data Governance Breakfast seminar to discover how a comprehensive data governance framework can help you meet business objectives, mitigate risk, and address compliance challenges.

You can learn how to:

  • Balance data accessibility and control with a pragmatic 3-step approach
  • Implement agile governance for better customer experience
  • Kick-start your data governance initiatives to realise value in 100 days

What is the Agenda?

8:00 AM – Registration and Hot Breakfast
8:30 AM – Introduction by Jean-Michel Franco, Senior Product Marketing Director- Data Governance
8:45 AM – Artha Solutions Interactive Presentation and Demo
10:15 AM – Networking
11:00 AM – Conclusion

When is the event?

Wednesday 22 May, 2019
From: 8:00 A.M. – 11:00 A.M.

Where is the event?

QT Melbourne, 133 Russell Street
Melbourne VIC 3000

Register Here

Artha honored with “Talend Partner of the Year 2020 Award”

Artha Solutions honored with “Talend Partner of the Year 2020 Award”

 

Artha Solutions received the Talend Partner of the Year Award in the “System Integrator of the Year 2020” category.

Artha was recognized as a leader in Cloud and Big Data Innovation in the 2020 Partner of the Year Awards. Winners were revealed in a ceremony during the company’s annual sales kickoff conference, Talend Engage, which took place this year in New Orleans.

The Partner of the Year winners are judged using a range of criteria including ACV booking of net new deals, creativity and innovation, project scope and complexity, as well as overall business value achieved for customers in cloud and big data.

Read More:
https://www.talend.com/about-us/press-releases/talend-recognizes-leaders-in-cloud-and-big-data-innovation-with-2020-partner-of-the-year-awards/

Artha Solutions honored with Global Award from Talend

Channel Partner Supporting Talend Core Values – Global Award 

Artha is proud to announce that we were honored with the Channel Partner Supporting Talend Core Values – Global Award from Talend at the Talend Partner Excellence Awards. This recognition demonstrates innovation in cloud and big data services and technologies, helping customers be more agile, effective and data-driven.

Srinivas Poddutoori (Vice President, Artha Solutions) received the award from Mike Tuchen (CEO, Talend) during the annual Talend Engage conference taking place this year at The Broadmoor Hotel in Colorado Springs, Colorado.

The Talend Partner Excellence winners are judged using a range of criteria including creativity and innovation, project scope and complexity, as well as overall business value achieved for customers in the cloud and big data.

Talend Data Masters Award to PT Bank Danamon Indonesia, Tbk.

Artha Solutions is glad to announce that our client PT Bank Danamon Indonesia Tbk (BDI) won the Talend Data Masters Award. Artha Solutions provided BDI with digital transformation services to improve customer experience and customer engagement, thus improving overall business for the bank.

Artha initiated several transformation engagements using the Talend Platform for data transformation and effective data governance. During the process, Artha identified several challenges with the existing technology, such as:

  • The speed at which the data can be aggregated using the new age Big Data Technologies
  • Ability to meet the current and scale to future business growth
  • Options to expand beyond ETL to a complete platform that provides ETL, DQ, big data, and data governance

Artha provided solutions to these challenges, such as:

  • Migration of the existing platform to Talend
  • Implementing Talend Big Data capabilities to their fullest potential
  • Deploying a solution architecture that can meet current digital transformation initiatives and handle future expansion requirements