Achieve better performance with an efficient lookup input option in Talend Spark Streaming

Description

Talend provides two options for handling lookups in Spark Streaming Jobs: a simple input component (for example, tMongoDBInput) or a lookup input component (for example, tMongoDBLookupInput). Using a lookup input component can significantly improve the performance and generated code of a Spark Streaming Job.

Instead of loading the entire dataset from the lookup source, Talend provides an option unique to streaming Jobs: querying only the small chunk of lookup data needed at that moment, saving an enormous amount of time and producing highly performant Jobs.

By Definition

Lookup components such as tMongoDBLookupInput, tJDBCLookupInput, and others provided by Talend execute a database query whose result columns must strictly correspond, in order, to the component's schema definition.

The component passes the extracted data to tMap, which provides the lookup data to the main flow. It must be connected directly to a tMap component, and that tMap must use Reload at each row or Reload at each row (cache) for the lookup flow.

The tricky part is understanding how the Reload at each row feature of the tMap component works, and how it integrates with the lookup component.
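As a conceptual illustration only (plain Java, not actual Talend-generated code; the key name consumer_id and the helper lookupConsumerName are hypothetical), the Reload at each row mechanism can be sketched as follows: for every row arriving on the main flow, tMap publishes a key to globalMap, and the lookup input re-executes its query filtered by that key.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch of tMap's "Reload at each row" behavior. In a real Job,
// the lookup input component runs a database query filtered by the globalMap
// key; here a plain Map stands in for both globalMap and the lookup source.
public class ReloadAtEachRowSketch {

    // Stand-in for Talend's globalMap, which shares values between components.
    static final Map<String, Object> globalMap = new HashMap<>();

    // Hypothetical lookup: fetches only the row matching the current key,
    // rather than the whole lookup table.
    static String lookupConsumerName(String consumerId, Map<String, String> lookupSource) {
        return lookupSource.get(consumerId);
    }

    public static void main(String[] args) {
        Map<String, String> lookupSource = Map.of("c1", "Alice", "c2", "Bob");
        List<String> mainFlow = List.of("c1", "c2", "c1");

        for (String consumerId : mainFlow) {
            // 1. tMap sets the globalMap key from the current main-flow row.
            globalMap.put("consumer_id", consumerId);
            // 2. The lookup component re-executes its query using that key.
            String name = lookupConsumerName((String) globalMap.get("consumer_id"), lookupSource);
            System.out.println(consumerId + " -> " + name);
        }
    }
}
```

The point of the sketch is that the lookup is re-evaluated per row against a single key, which is why the lookup source only ever returns the rows actually needed.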

Example

Below is an example of how we have used a tJDBCLookupInput component with tMap in a Talend Spark Streaming Job.


  1. At the tMap level, make sure the tMap for the lookup flow is set to Reload at each row, and that an expression for the globalMap Key is defined.
  2. At the lookup input component level, make sure the Query option filters on the globalMap Key defined in tMap (for example, a WHERE condition on extract.consumer_id). This is key to ensuring the lookup component fetches only the data needed for processing at that point in time.
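The steps above hinge on the query expression typed into the lookup component's Query field. The following sketch shows the general shape of such an expression (the table and column names consumers, consumer_id, and consumer_name are hypothetical, not taken from the original Job), wrapped in a minimal runnable class:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the kind of Java string expression entered in the lookup input
// component's Query field. The WHERE clause restricts the lookup to the
// single key that tMap placed into globalMap for the current row.
public class LookupQuerySketch {

    // Stand-in for Talend's globalMap.
    static final Map<String, Object> globalMap = new HashMap<>();

    static String buildLookupQuery() {
        // Only the rows matching the current key are fetched, not the table.
        return "SELECT consumer_id, consumer_name FROM consumers"
             + " WHERE consumer_id = '" + globalMap.get("consumer_id") + "'";
    }

    public static void main(String[] args) {
        globalMap.put("consumer_id", "c42"); // set by tMap for each incoming row
        System.out.println(buildLookupQuery());
    }
}
```

In Talend Studio itself only the concatenated string expression goes into the Query field; the surrounding class exists here purely to make the sketch self-contained.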

Summary

As we have seen, these minute changes in our Streaming Jobs can make our ETL Jobs more effective and performant. As there will always be multiple implementations of a Talend ETL Job, the ability to understand the nuances in making them more efficient is an integral part of being a data engineer.

For more information, reach out to us at: solutions@thinkartha.com

Author: Siddartha Rao Chennur

This article was also published on the Talend Community:
Source: https://community.talend.com/s/article/Achieve-better-performance-with-an-efficient-lookup-input-option-in-Talend-Spark-Streaming
