
KTU Researchers Develop a Model That Improves Machine Understanding of the Real World

Important | 2026-03-26

What if technology, such as self-driving cars, drones, or intelligent navigation systems, could understand the world the way we do – not just seeing shapes, but recognising meaning? A person waiting at a crosswalk, a bicycle left on the pavement, or a dog running across a yard – for us, these distinctions are instant. For systems that rely on data, they have long been a challenge.

Today, that is beginning to change. One of the key technologies behind this is 3D point cloud analysis.

Rytis Maskeliūnas

“Imagine taking millions of precise laser measurements of a physical space, like a street, a forest, or an entire city, and stitching them together to create a detailed three-dimensional map made up of individual points. This is known as a 3D point cloud. The technology used to analyse it focuses on helping computers understand the shapes of objects in the map and interpret their context within the scene,” explains Kaunas University of Technology (KTU) professor Rytis Maskeliūnas.

From Detecting Pedestrians to Mapping Entire Cities

Although most people rarely think about it, early forms of this technology are already embedded in everyday life. “An average person regularly encounters the underlying 3D data and technologies similar to those described in our work without even realising it,” notes KTU researcher Dr Sarmad Maqsood.

Modern vehicles rely on such systems to enable functions like automatic emergency braking or adaptive cruise control, distinguishing between pedestrians, vehicles, and road boundaries. However, reliability remains a challenge in complex or low-visibility conditions.

3D point cloud data is also increasingly used to construct detailed digital models of cities. These models support urban planning, infrastructure monitoring, and environmental analysis, forming the basis for so-called digital twins – virtual representations of real-world environments that can be continuously updated and used to monitor changes over time.

Yet, according to Maqsood, this understanding does not come easily. “Computers face significant difficulties in analysing 3D point clouds primarily because this data type is inherently irregular, unstructured, and massive,” he explains. The researcher notes that the data is uneven – dense for nearby objects and sparse for distant ones – while important elements such as pedestrians may appear far less frequently than dominant classes like roads or buildings.
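
The class imbalance Maqsood describes can be illustrated with a small sketch. The scene below is hypothetical (90 road points, 8 building points, 2 pedestrian points), and inverse-frequency weighting is only one common way to stop rare classes from being drowned out during training; the article does not say which weighting scheme, if any, the KTU model uses.

```python
import math
from collections import Counter

def inverse_frequency_weights(labels):
    """Return one weight per class; rarer classes get larger weights."""
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    # total / (n_classes * count) equals 1.0 for every class when classes are balanced
    return {cls: total / (n_classes * n) for cls, n in counts.items()}

def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of per-point negative log-likelihoods.

    probs is a list of dicts mapping class name to predicted probability.
    """
    weight_sum = sum(weights[y] for y in labels)
    loss = sum(-weights[y] * math.log(p[y]) for p, y in zip(probs, labels))
    return loss / weight_sum

# Hypothetical labelled scene (assumption, not data from the study)
labels = ["road"] * 90 + ["building"] * 8 + ["pedestrian"] * 2
weights = inverse_frequency_weights(labels)

# A classifier that always guesses uniformly over the three classes
uniform = [{"road": 1 / 3, "building": 1 / 3, "pedestrian": 1 / 3}] * len(labels)
loss = weighted_cross_entropy(uniform, labels, weights)
```

With these weights, a mistake on one of the two pedestrian points costs far more than a mistake on one of the ninety road points, which is the effect a training procedure needs if rare but safety-critical classes are to matter.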

Sarmad Maqsood

These challenges are not only technical, but also practical. Processing millions of data points in real time requires significant computational power, while ensuring accuracy remains critical in safety-sensitive applications. Noise, occlusions, and the need to balance speed with precision further complicate reliable 3D analysis.

To address these challenges, KTU researchers have developed a new model that combines multiple ways of analysing 3D data into a single, more effective system. Instead of focusing only on local details or global structure, it integrates both perspectives simultaneously, allowing machines to interpret complex environments more reliably. The model combines advanced transformer-based analysis, a method that captures relationships across the entire scene rather than isolated regions, with mechanisms that prioritise important but less frequent features, enabling it to better handle imbalanced data.
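
The two ideas in this paragraph, scene-wide attention and the reweighting of features, can be sketched in a few lines. This is a toy illustration, not the authors' model: it uses plain scaled dot-product self-attention with no learned projections, and a squeeze-and-excitation style channel gate with no learned weights is assumed here as one common way to reweight features. The function names and the three-point input are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def global_self_attention(feats):
    """Each point's output is a weighted mix of every point's features.

    Queries, keys and values are the raw features (identity projections),
    which is a simplification of a real transformer layer.
    """
    d = len(feats[0])
    out = []
    for q in feats:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in feats]
        w = softmax(scores)
        out.append([sum(wj * v[c] for wj, v in zip(w, feats)) for c in range(d)])
    return out

def channel_gate(feats):
    """Scale each feature channel by a sigmoid of its scene-wide mean."""
    n, d = len(feats), len(feats[0])
    means = [sum(f[c] for f in feats) / n for c in range(d)]   # "squeeze"
    gates = [1.0 / (1.0 + math.exp(-m)) for m in means]        # "excite"
    return [[f[c] * gates[c] for c in range(d)] for f in feats]

# Three hypothetical points with two feature channels each
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = global_self_attention(feats)
fused = channel_gate(attended)
```

Because the attention weights for each point sum to one, every attended feature is an average over the whole scene rather than over a local neighbourhood; the gate then rescales whole channels based on scene-level statistics.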

A Solution That Works Even When Data Is Incomplete

“Imagine you have a massive, messy 3D puzzle made of millions of points that needs to be sorted into meaningful objects like roads, trees, and pedestrians. Our model acts like a highly intelligent and efficient puzzle-solver,” says KTU scientist Maskeliūnas. By analysing relationships across the entire scene while also emphasising less frequent but important features, the system improves the detection of small or partially visible objects that earlier approaches might miss.

This becomes particularly important in real-world situations. For example, an autonomous vehicle approaching an intersection at dusk may only detect a few data points from a partially obscured pedestrian. “Instead of missing this information, the model interprets it in context – relating sparse signals to surrounding elements such as a pole or a crosswalk – and identifies the presence of a person even when the data is incomplete. This ability to interpret context from limited information could significantly improve safety in autonomous systems,” says Maskeliūnas.

The model is also efficient: it processes complex scenes in just over two seconds per frame while maintaining high accuracy. “Beyond segmentation accuracy, a key achievement is the demonstration of an efficient, unified pipeline,” adds Maqsood, noting that the system integrates compression and transmission without losing essential detail, allowing large-scale 3D data to be processed and transmitted efficiently in near real time.

Looking ahead, the potential applications extend far beyond today’s use cases. From delivery drones navigating unpredictable environments to robots operating in search-and-rescue missions, reliable 3D understanding is becoming increasingly important. Even less obvious fields could benefit – such as archaeology, where sparse data must be reconstructed into meaningful structures, or forensic science, where subtle spatial details can be critical. The technology could also support advanced augmented reality applications, where digital content is seamlessly integrated into complex physical environments.

At a broader level, these advancements could fundamentally reshape how our environments are understood and managed. What once seemed like science fiction is steadily becoming reality – machines are learning not only to see the world, but also to understand it.

The research is described in the article “Hybrid attention-based PTv3-SE model for efficient point cloud segmentation”.