Achieve better performance with an efficient lookup input option in Talend Spark Streaming

Description

Talend provides two options for handling lookups in Spark Streaming Jobs: a plain input component (for example, tMongoDBInput) or a lookup input component (tMongoDBLookupInput). Using a lookup input component delivers a significant uplift in performance and code optimization for any Spark Streaming Job.

Instead of loading the entire dataset from the lookup source, Talend provides a unique option for streaming Jobs: querying only the small chunk of input data needed for the lookup, thereby saving an enormous amount of time and producing highly performant Jobs.
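The cost difference can be sketched outside Talend. The class below is a minimal, hypothetical illustration (not Talend-generated code): a full load copies every row of the lookup source per micro-batch, while a keyed lookup fetches only the row matching the current main-flow key.

```java
import java.util.HashMap;
import java.util.Map;

public class LookupSketch {
    // Simulated lookup source, standing in for a MongoDB or JDBC table.
    static final Map<String, String> DB = new HashMap<>();
    static {
        for (int i = 0; i < 1000; i++) DB.put("c" + i, "profile-" + i);
    }

    /** Full load: copies every row, like a plain input component (tMongoDBInput). */
    static int rowsFetchedByFullLoad(String key) {
        Map<String, String> copy = new HashMap<>(DB); // entire table pulled per micro-batch
        copy.get(key);                                // only one row is actually needed
        return DB.size();                             // rows transferred: the whole table
    }

    /** Keyed lookup: fetches only the matching row, like a lookup input
        component driven by tMap's Reload at each row. */
    static int rowsFetchedByKeyedLookup(String key) {
        DB.get(key);  // equivalent to WHERE consumer_id = <current key>
        return 1;     // rows transferred: one
    }

    public static void main(String[] args) {
        System.out.println(rowsFetchedByFullLoad("c42")
                + " rows vs " + rowsFetchedByKeyedLookup("c42") + " row");
    }
}
```

The gap widens with table size: a 1,000-row lookup table costs 1,000 transferred rows per micro-batch in the first approach and one row per record in the second.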

By Definition

Lookup components such as tMongoDBLookupInput, tJDBCLookupInput, and others provided by Talend execute a database query whose column order must strictly correspond to the schema definition.

The component passes the extracted data to tMap, which provides the lookup data to the main flow. It must be connected directly to a tMap component, and that tMap must use Reload at each row or Reload at each row (cache) for the lookup flow.

The tricky part here is to understand the usage of the Reload at each row functionality of the Talend tMap component, and how it can be integrated with the lookup component.

Example

Below is an example of how a tJDBCLookupInput component can be used with tMap in a Talend Spark Streaming Job.


  1. At the tMap level, make sure the tMap lookup flow is set to Reload at each row, and that an expression for the globalMap Key is defined.
  2. At the lookup input component level, make sure the Query option references the globalMap Key defined in tMap in its WHERE condition (for example, on extract.consumer_id). This is key to ensuring the lookup component fetches only the data needed for processing at that point in time.
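In the tJDBCLookupInput Query field, the globalMap key set by tMap can be referenced through Talend's globalMap.get(...) call. A hedged sketch of such a query expression (the table and column names are hypothetical, not taken from the original Job):

```java
// tJDBCLookupInput > Query: a Java string expression whose selected columns
// must match the component schema in order. "extract.consumer_id" is the
// globalMap Key defined in tMap's Reload at each row settings.
"SELECT consumer_id, consumer_name, consumer_segment " +
"FROM consumer_profile " +
"WHERE consumer_id = '" + ((String) globalMap.get("extract.consumer_id")) + "'"
```

With Reload at each row, Talend re-executes this query for every incoming main-flow record, so only the matching row is fetched per record instead of the whole table per micro-batch.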

Summary

As we have seen, these small changes can make Streaming ETL Jobs noticeably more effective and performant. As there will always be multiple ways to implement a Talend ETL Job, understanding the nuances that make them more efficient is an integral part of being a data engineer.

For more information, reach out to us at: solutions@thinkartha.com

Author: Siddartha Rao Chennur

This article was also published on the Talend Community:
Source: https://community.talend.com/s/article/Achieve-better-performance-with-an-efficient-lookup-input-option-in-Talend-Spark-Streaming
