Achieve better performance with an efficient lookup input option in Talend Spark Streaming

Description

Talend provides two options for handling lookups in Spark Streaming Jobs: a simple input component (for example, tMongoDBInput) or a lookup input component (for example, tMongoDBLookupInput). Using a lookup input component can substantially improve the performance of a Spark Streaming Job and help optimize it.

Rather than loading the entire lookup data set, a lookup input component lets a streaming Job query only the smaller subset of lookup data it needs, which saves considerable time and yields better-performing Jobs.

By Definition

Lookup components like tMongoDBLookupInput, tJDBCLookupInput, and others provided by Talend execute a database query with a strictly defined order that must correspond to the schema definition.

A lookup component passes the extracted data to tMap to provide the lookup data to the main flow. It must be directly connected to a tMap component, and that tMap must use Reload at each row or Reload at each row (cache) for the lookup flow.

The tricky part here is to understand the usage of the Reload at each row functionality of the Talend tMap component, and how it can be integrated with the lookup component.
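
To make the two reload modes more concrete, here is a minimal plain-Java sketch of the idea. It is not Talend's generated code: the class name, the in-memory stand-in for the lookup table, and the method names are all hypothetical. It only illustrates that Reload at each row runs one keyed query per incoming row, while Reload at each row (cache) reuses the result for keys it has already resolved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReloadAtEachRowSketch {
    // Hypothetical stand-in for the lookup table; in a real Job this lives in a database.
    static final Map<String, String> LOOKUP_TABLE = new HashMap<>();
    // Counts how many keyed queries were issued, to compare the two modes.
    static int queriesRun = 0;

    static {
        LOOKUP_TABLE.put("C-1", "Alice");
        LOOKUP_TABLE.put("C-2", "Bob");
        LOOKUP_TABLE.put("C-3", "Carol");
    }

    // Simulates a keyed lookup query: only the row for this key is fetched.
    static String queryByKey(String consumerId) {
        queriesRun++;
        return LOOKUP_TABLE.get(consumerId);
    }

    // "Reload at each row": one keyed query for every incoming row.
    static List<String> joinReloadEachRow(List<String> mainFlow) {
        List<String> out = new ArrayList<>();
        for (String id : mainFlow) {
            out.add(id + "=" + queryByKey(id));
        }
        return out;
    }

    // "Reload at each row (cache)": keys already resolved are served from a cache.
    static List<String> joinReloadEachRowCached(List<String> mainFlow) {
        Map<String, String> cache = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String id : mainFlow) {
            String name = cache.computeIfAbsent(id, ReloadAtEachRowSketch::queryByKey);
            out.add(id + "=" + name);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> mainFlow = Arrays.asList("C-1", "C-2", "C-1");

        queriesRun = 0;
        System.out.println(joinReloadEachRow(mainFlow) + " queries=" + queriesRun);

        queriesRun = 0;
        System.out.println(joinReloadEachRowCached(mainFlow) + " queries=" + queriesRun);
    }
}
```

In this sketch the three incoming rows trigger three queries without the cache and two with it, because the key C-1 repeats. Neither mode ever reads the row for C-3, which is the point of querying by key instead of loading the whole table.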

Example

Below is an example of how we have used a tJDBCLookupInput component with tMap in a Talend Spark Streaming Job.

At the tMap level, make sure the lookup flow is set to Reload at each row and that an expression is defined for the globalMap Key.
At the lookup input component level, make sure the Query option filters on that globalMap key (in this example, a WHERE condition on the key that tMap populates from extract.consumer_id). This ensures that the lookup component fetches only the data needed for processing at that point in time.
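
As a rough illustration of the Query setting described above, the sketch below builds a query string in the style of a Talend Query field expression, which is a Java string that can read values from globalMap. The key name "consumer_id", the table name consumers, and the column consumer_name are assumptions made for this example and are not taken from the original Job; globalMap is simulated here with an ordinary HashMap.

```java
import java.util.HashMap;
import java.util.Map;

public class LookupQuerySketch {
    // Builds the lookup query from the key that tMap stored in globalMap.
    // Table, column, and key names are hypothetical placeholders.
    static String buildLookupQuery(Map<String, Object> globalMap) {
        return "SELECT consumer_id, consumer_name FROM consumers WHERE consumer_id = '"
                + (String) globalMap.get("consumer_id") + "'";
    }

    public static void main(String[] args) {
        // Simulates tMap writing extract.consumer_id into globalMap for the current row.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("consumer_id", "C-1001");

        // The lookup component would then run this query for that row only.
        System.out.println(buildLookupQuery(globalMap));
    }
}
```

Because the WHERE clause is driven by the current row's key, each lookup returns only the matching rows instead of the full table. Concatenating values into SQL is shown only because it mirrors how such an expression is typically written; values from untrusted sources would need validation or escaping.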

Summary

As we have seen, these small changes in our Streaming Jobs can make our ETL Jobs more effective and performant. Because a Talend ETL Job can always be implemented in more than one way, understanding the nuances that make one implementation more efficient than another is an integral part of being a data engineer.

For more information, reach out to us at: solutions@thinkartha.com

Author: Siddartha Rao Chennur

This article is also published on Talend Community:
Source: https://community.talend.com/s/article/Achieve-better-performance-with-an-efficient-lookup-input-option-in-Talend-Spark-Streaming

