Achieve better performance with an efficient lookup input option in Talend Spark Streaming

Description

Talend provides two options for handling lookups in Spark Streaming Jobs: a simple input component (for example, tMongoDBInput) or a lookup input component (for example, tMongoDBLookupInput). Using a lookup input component can significantly improve the performance and generated code of a Spark Streaming Job.

Instead of loading the entire dataset from the lookup source, Talend provides an option unique to streaming Jobs: querying only the small chunk of lookup data needed at that moment, saving an enormous amount of time and producing highly performant Jobs.

By Definition

Lookup components such as tMongoDBLookupInput, tJDBCLookupInput, and others provided by Talend execute a database query whose result columns must strictly correspond, in order, to the component's schema definition.

The component passes the extracted data to tMap, which provides the lookup data to the main flow. It must be connected directly to a tMap component, and that tMap must use Reload at each row or Reload at each row (cache) for the lookup flow.

The tricky part is understanding how the Reload at each row feature of the tMap component works, and how it integrates with the lookup component.
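As a conceptual illustration only (plain Java, not actual Talend-generated code; the key name consumer_id and the helper lookupConsumerName are hypothetical), the Reload at each row mechanism can be sketched as follows: for every row arriving on the main flow, tMap publishes a key to globalMap, and the lookup input re-executes its query filtered by that key.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch of tMap's "Reload at each row" behavior. In a real Job,
// the lookup input component runs a database query filtered by the globalMap
// key; here a plain Map stands in for both globalMap and the lookup source.
public class ReloadAtEachRowSketch {

    // Stand-in for Talend's globalMap, which shares values between components.
    static final Map<String, Object> globalMap = new HashMap<>();

    // Hypothetical lookup: fetches only the row matching the current key,
    // rather than the whole lookup table.
    static String lookupConsumerName(String consumerId, Map<String, String> lookupSource) {
        return lookupSource.get(consumerId);
    }

    public static void main(String[] args) {
        Map<String, String> lookupSource = Map.of("c1", "Alice", "c2", "Bob");
        List<String> mainFlow = List.of("c1", "c2", "c1");

        for (String consumerId : mainFlow) {
            // 1. tMap sets the globalMap key from the current main-flow row.
            globalMap.put("consumer_id", consumerId);
            // 2. The lookup component re-executes its query using that key.
            String name = lookupConsumerName((String) globalMap.get("consumer_id"), lookupSource);
            System.out.println(consumerId + " -> " + name);
        }
    }
}
```

The point of the sketch is that the lookup is re-evaluated per row against a single key, which is why the lookup source only ever returns the rows actually needed.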

Example

Below is an example of how we have used a tJDBCLookupInput component with tMap in a Talend Spark Streaming Job.


  1. At the tMap level, make sure the tMap for the lookup flow is set to Reload at each row, and that an expression for the globalMap Key is defined.
  2. At the lookup input component level, make sure the Query option filters on the globalMap Key defined in tMap (for example, a WHERE condition on extract.consumer_id). This is key to ensuring the lookup component fetches only the data needed for processing at that point in time.
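The steps above hinge on the query expression typed into the lookup component's Query field. The following sketch shows the general shape of such an expression (the table and column names consumers, consumer_id, and consumer_name are hypothetical, not taken from the original Job), wrapped in a minimal runnable class:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the kind of Java string expression entered in the lookup input
// component's Query field. The WHERE clause restricts the lookup to the
// single key that tMap placed into globalMap for the current row.
public class LookupQuerySketch {

    // Stand-in for Talend's globalMap.
    static final Map<String, Object> globalMap = new HashMap<>();

    static String buildLookupQuery() {
        // Only the rows matching the current key are fetched, not the table.
        return "SELECT consumer_id, consumer_name FROM consumers"
             + " WHERE consumer_id = '" + globalMap.get("consumer_id") + "'";
    }

    public static void main(String[] args) {
        globalMap.put("consumer_id", "c42"); // set by tMap for each incoming row
        System.out.println(buildLookupQuery());
    }
}
```

In Talend Studio itself only the concatenated string expression goes into the Query field; the surrounding class exists here purely to make the sketch self-contained.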

Summary

As we have seen, these minute changes in our Streaming Jobs can make our ETL Jobs more effective and performant. As there will always be multiple implementations of a Talend ETL Job, the ability to understand the nuances in making them more efficient is an integral part of being a data engineer.

For more information, reach out to us at: solutions@thinkartha.com

Author: Siddartha Rao Chennur

This article was also published on the Talend Community:
Source: https://community.talend.com/s/article/Achieve-better-performance-with-an-efficient-lookup-input-option-in-Talend-Spark-Streaming
