AI Pre-labeling Example Display of MindFlow SEED Platform





At present, the main way to achieve Artificial Intelligence (AI) is machine learning, especially deep learning.

The core value of machine learning lies in analyzing known data with specific algorithms to identify patterns hidden in it, and then making predictions and decisions independently or helping users make them.

The prerequisite for realizing the value of machine learning is a large volume of analyzable, structured datasets for training, adjusting, and improving the algorithm model. Once the model has been refined, it can be used to unlock the true value of enterprise data. As a result, providing high-quality basic data services for training and optimizing machine learning algorithms has become one of the hot spots in the AI industry.

At this stage, the AI basic data service industry depends mainly on human labor. Data annotators manually produce a wide variety of high-quality structured datasets for training AI algorithm models.

However, as AI applications rapidly expand into specific scenarios, this traditional reliance on human labor exposes many problems in execution efficiency and output quality. Improving human-machine collaboration and paying attention to what AI can do for the basic data service industry itself are key to raising the industry's data productivity in the next stage.

By introducing AI at different data processing stages, such as AI screening in the collection stage, pre-labeling in the labeling stage, and AI quality inspection in the review stage, the overall efficiency of business execution can be effectively improved and its dependence on human labor reduced, which indirectly improves data quality.
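One common way to combine AI pre-labels with human work is to route each item by the model's confidence. The sketch below is only an illustration of that idea, not a description of how the SEED platform works: the `PreLabel` fields, the `route` function, and the 0.9 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PreLabel:
    item_id: str
    label: str
    confidence: float  # hypothetical model score in [0, 1]


def route(pre_labels, threshold=0.9):
    """Split pre-labels into a quick-review queue and a manual-labeling queue.

    The threshold is an assumed tuning parameter, not a platform setting.
    """
    quick_review, manual = [], []
    for p in pre_labels:
        if p.confidence >= threshold:
            quick_review.append(p)  # annotator only confirms or tweaks
        else:
            manual.append(p)        # annotator labels from scratch
    return quick_review, manual
```

With a scheme like this, annotators spend most of their time on the items the model is least sure about, while confident pre-labels only need a quick check.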

We take the MindFlow SEED data service platform as an example to show the various AI pre-labeling functions it provides in the labeling process.

1. Boxed Object Detection

Box labeling is one of the most common labeling types in the field of data annotation. It can be subdivided into two types: 2D box selection and 3D cuboid box selection.

It is often used in scenarios such as autonomous driving, new retail, and AI education, mainly to label objects such as vehicles and people in images.

In the traditional execution mode, box selection is done entirely by hand, which demands high proficiency and strong image understanding from annotators.

Manual Labeling Effect

The SEED data service platform provides different AI-assisted auto-boxing functions for different scenario segments. The platform's algorithm automatically completes object detection, snaps boxes to object edges in one click, and doubles labeling efficiency:

SEED Platform Auto-Boxed Object Detection
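To make the 2D box label concrete, the sketch below represents a box by its corner coordinates and computes intersection over union (IoU), a standard measure of how closely two boxes agree. Comparing an AI pre-labeled box with the annotator's corrected box is one possible use; the `Box` type and `iou` function are illustrative assumptions, not part of the SEED platform.

```python
from typing import NamedTuple


class Box(NamedTuple):
    # Pixel coordinates of the top-left and bottom-right corners.
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned 2D boxes."""
    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(a.x_max, b.x_max) - max(a.x_min, b.x_min))
    inter_h = max(0.0, min(a.y_max, b.y_max) - max(a.y_min, b.y_min))
    inter = inter_w * inter_h
    area_a = (a.x_max - a.x_min) * (a.y_max - a.y_min)
    area_b = (b.x_max - b.x_min) * (b.y_max - b.y_min)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

An IoU close to 1 means the annotator barely had to adjust the pre-labeled box; a low IoU means the pre-label needed substantial correction.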

2. Polygon Image Segmentation

Polygon labeling can be divided, according to the number of labeled objects, into single-object polygon labeling and semantic segmentation of panoramic images. Panoramic semantic segmentation is widely used in autonomous driving, unmanned aerial vehicles, and other fields.

In traditional execution, annotators need to manually delineate the boundary of the object, which is time-consuming and makes pixel-level precision hard to achieve:

The SEED data service platform provides a complete set of high-precision image segmentation assistance functions, which achieve pixel-level automatic edge fitting and improve labeling efficiency more than tenfold:

SEED Platform Automatic Image Segmentation
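The relationship between a polygon label and a pixel-level mask can be sketched as follows. A pixel is counted as inside the object when its center lies inside the polygon, tested with the standard ray-casting rule. This is a minimal pure-Python illustration with hypothetical function names; it says nothing about the platform's own edge-fitting algorithm.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is point (x, y) inside the polygon?

    `polygon` is an ordered list of (x, y) vertices.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_to_mask(polygon, width, height):
    """Rasterize a polygon into a height x width grid of booleans."""
    return [
        [point_in_polygon(col + 0.5, row + 0.5, polygon) for col in range(width)]
        for row in range(height)
    ]
```

The more vertices an annotator places along a curved boundary, the closer the resulting mask follows the true object outline, which is why manual pixel-level precision is so costly.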

3. OCR Automatic Recognition Transcription

OCR transcription converts the text content of an image into tagged text information that is used to train and invoke image text recognition algorithms.

In traditional execution, annotators need to type out the text in the image manually:

The advanced OCR automatic recognition and transcription function provided by the SEED data service platform recognizes and transcribes text automatically, freeing annotators' hands:

SEED Platform OCR Automatic Recognition Transcription
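As an illustration of what "tagged text information" might look like, the sketch below stores each transcribed text region with its location and its origin, then serializes the result to JSON. The schema (`TextRegion`, the `source` field, the key names) is a hypothetical example, not the SEED platform's export format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class TextRegion:
    box: list    # [x_min, y_min, x_max, y_max] in pixels
    text: str    # transcribed content of the region
    source: str  # "ai" for a pre-label, "human" once an annotator has edited it


def to_json(image_name, regions):
    """Serialize the transcription of one image to a JSON string."""
    record = {"image": image_name, "regions": [asdict(r) for r in regions]}
    # ensure_ascii=False keeps non-Latin text readable in the output.
    return json.dumps(record, ensure_ascii=False)
```

Recording the source of each region makes it possible to tell afterwards how much of the OCR pre-labeling survived human review.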

4. 3D Point Cloud Object Detection

3D laser point cloud data provides accurate three-dimensional imagery for autonomous driving and is one of the common data types used in the environment perception and decision-planning modules of autonomous driving systems.

Unlike 2D images, laser point cloud data is three-dimensional: the X-axis, Y-axis, and Z-axis boundaries must each be labeled, which places high demands on annotators' three-dimensional spatial imagination.

In traditional execution mode, the annotator needs to label the X-axis, Y-axis, and Z-axis boundaries in turn:

The 3D object detection and auto-fitting function provided by the SEED data service platform fits boxes automatically in 3D space, reduces the direct impact of an annotator's personal ability on labeling results, and improves overall labeling efficiency:

3D Point Cloud Object Detection on SEED Platform
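The X-axis, Y-axis, and Z-axis boundaries mentioned above can be illustrated with the simplest form of box fitting: given the points that belong to one object, take the minimum and maximum along each axis. This is a simplified sketch with a hypothetical function name. It ignores object orientation, which 3D boxes for vehicles usually include, and it is not the platform's fitting algorithm.

```python
def fit_axis_aligned_box(points):
    """Fit the tightest axis-aligned 3D box around a set of points.

    `points` is a non-empty iterable of (x, y, z) tuples.
    Returns the (min, max) boundary along each axis.
    """
    points = list(points)
    if not points:
        raise ValueError("cannot fit a box to an empty point set")
    xs, ys, zs = zip(*points)
    return {
        "x": (min(xs), max(xs)),
        "y": (min(ys), max(ys)),
        "z": (min(zs), max(zs)),
    }
```

A single stray point moves a boundary in this scheme, which hints at why manual boundary labeling in sparse, noisy point clouds is sensitive to the annotator's judgment.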

In addition to the AI pre-labeling functions shown above, the MindFlow SEED data service platform also provides AI-assisted labeling functions such as ASR automatic speech transcription, structured text detection, and 3D point cloud automatic segmentation. It covers data types including image, speech, text, and point cloud, and brings out the advantages of human-machine collaboration in improving efficiency and data quality.

However, it should be noted that AI pre-labeling currently plays an ancillary role and does not fully replace humans in all data labeling operations. Still, as a useful step forward for the basic data service industry, algorithmic preprocessing will find wide application in the future and may even become a distinct competitive edge in the fine-grained management of the AI basic data service industry.