DOCUMENT PREPARATION FOR GENAI

Semantic DocPrep

Vertesia's agentic API service converts complex documents to XML for Retrieval-Augmented Generation (RAG)

SEMANTIC DOCUMENT PREPARATION

Prevent LLM hallucinations with semantic document preparation

Large language models (LLMs) often struggle to understand PDFs and other complex documents that contain images, tables, charts, graphs, and similar hard-to-process elements.

That is why we developed a new approach to document preparation: it converts complex documents into richly structured XML, preserving the exact original text while adding the semantic understanding that makes document content truly machine-readable.

Example: LVMH financial results

# FINANCIAL HIGHLIGHTS

## Revenue

(EUR millions)
![img-0.jpeg](img-0.jpeg)

## Profit from recurring operations

(EUR millions)
![img-1.jpeg](img-1.jpeg)

| Change in revenue by business group (EUR millions and percentages) | 2024 | 2023 | 2024/2023 change: published | 2024/2023 change: organic (a) | 2022 |
| :--: | :--: | :--: | :--: | :--: | :--: |
| Wines and Spirits | 5,862 | 6,602 | -11% | -8% | 7,099 |
| Fashion and Leather Goods | 41,060 | 42,169 | -3% | -1% | 38,648 |
| Perfumes and Cosmetics | 8,418 | 8,271 | 2% | 4% | 7,722 |
| Watches and Jewelry | 10,577 | 10,902 | -3% | -2% | 10,581 |
| Selective Retailing | 18,262 | 17,885 | 2% | 6% | 14,852 |
| Other activities and eliminations | 504 | 324 | - | - | 281 |
| Total | 84,683 | 86,153 | -2% | 1% | 79,184 |

(a) On a constant consolidation scope and currency basis. The net impact of exchange rate fluctuations on Group revenue was -2% and the net impact of changes in the scope of consolidation was -1%. The principles used to determine the net impact of exchange rate fluctuations on the revenue of entities reporting in foreign currencies and
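Once the table is in structured form, its figures can be checked programmatically. The sketch below is only an illustration and is not part of the Vertesia API. It copies the 2024 and 2023 revenue figures (EUR millions) from the table, recomputes the published change column, and confirms that the business-group rows add up to the reported totals.

```python
# Revenue by business group in EUR millions, copied from the table above: (2024, 2023).
REVENUE = {
    "Wines and Spirits": (5_862, 6_602),
    "Fashion and Leather Goods": (41_060, 42_169),
    "Perfumes and Cosmetics": (8_418, 8_271),
    "Watches and Jewelry": (10_577, 10_902),
    "Selective Retailing": (18_262, 17_885),
    "Other activities and eliminations": (504, 324),
}
TOTAL_2024, TOTAL_2023 = 84_683, 86_153


def published_change(current: int, previous: int) -> int:
    """Year-over-year change in percent, rounded to the nearest whole number."""
    return round((current - previous) / previous * 100)


def rows_match_totals() -> bool:
    """Check that the business-group rows add up to the reported totals."""
    sum_2024 = sum(current for current, _ in REVENUE.values())
    sum_2023 = sum(previous for _, previous in REVENUE.values())
    return sum_2024 == TOTAL_2024 and sum_2023 == TOTAL_2023


if __name__ == "__main__":
    for group, (current, previous) in REVENUE.items():
        if group == "Other activities and eliminations":
            continue  # the table reports no change for this row
        print(f"{group}: {published_change(current, previous):+d}%")
    print(f"Total: {published_change(TOTAL_2024, TOTAL_2023):+d}%")
    print("Rows match totals:", rows_match_totals())
```

The recomputed values (-11%, -3%, +2%, -3%, +2%, and -2% for the total) match the published column. The organic column cannot be reproduced from the table alone because it depends on scope and currency adjustments.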

Flat text extraction of the same page (revenue chart values and the change-in-revenue-by-business-group table, cut off after the Watches and Jewelry row)

Intrinsic content referencing

Eliminate AI hallucinations by never rewriting or altering the original text
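One way to picture this guarantee is as a check that every piece of text in the XML output appears verbatim in the source. The sketch below assumes a hypothetical XML layout: the `document`, `heading`, `row`, and `cell` tags are illustrative, not Vertesia's actual schema. It uses Python's standard `xml.etree.ElementTree` and flags any fragment that was rewritten instead of copied.

```python
import xml.etree.ElementTree as ET


def unreferenced_fragments(source_text: str, xml_output: str) -> list[str]:
    """Return text fragments in the XML output that do not occur verbatim in the source.

    An empty list means every fragment was copied from the original text.
    """
    root = ET.fromstring(xml_output)
    missing = []
    # itertext() yields element text and tail text in document order.
    for fragment in root.itertext():
        fragment = fragment.strip()
        if fragment and fragment not in source_text:
            missing.append(fragment)
    return missing


# Hypothetical example: source text and two candidate XML conversions.
SOURCE = "Revenue\nWines and Spirits 5,862\nFashion and Leather Goods 41,060"
FAITHFUL = (
    "<document><heading id='h1'>Revenue</heading>"
    "<row id='r1'><cell>Wines and Spirits</cell><cell>5,862</cell></row></document>"
)
REWRITTEN = (
    "<document><heading id='h1'>Revenue</heading>"
    "<row id='r1'><cell>Wine &amp; Spirits</cell><cell>5,862</cell></row></document>"
)

if __name__ == "__main__":
    print(unreferenced_fragments(SOURCE, FAITHFUL))   # []
    print(unreferenced_fragments(SOURCE, REWRITTEN))  # ['Wine & Spirits']
```

A substring test is a simplification. A stricter check would also confirm that fragments appear in the same order and position as in the source.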

Table normalization

Normalize tables into a consistent format through a dedicated API
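The page does not describe how normalization works internally, because the service identifies tables and maps their columns itself. The sketch below is a much simpler, rule-based illustration of the same goal. It maps varying source headers onto a fixed target format and converts numeric cells. The target column names and header aliases are hypothetical.

```python
import re
from typing import Dict, List, Optional, Sequence, Union

Cell = Union[str, float]

# Hypothetical target format: each target column lists the source headers it accepts.
TARGET_COLUMNS: Dict[str, Sequence[str]] = {
    "business_group": ["business group", "segment"],
    "revenue_2024": ["2024"],
    "revenue_2023": ["2023"],
}


def normalize_header(header: str) -> str:
    """Lower-case a header and collapse internal whitespace."""
    return re.sub(r"\s+", " ", header).strip().lower()


def to_number(value: str) -> Optional[float]:
    """Parse cells such as '41,060' or '-3%'; return None for non-numeric text."""
    try:
        return float(value.replace(",", "").replace("%", "").strip())
    except ValueError:
        return None


def adapt_table(header: List[str], rows: List[List[str]]) -> List[Dict[str, Cell]]:
    """Map source columns onto TARGET_COLUMNS, converting numeric cells to floats."""
    normalized = [normalize_header(h) for h in header]
    positions: Dict[str, int] = {}
    for target, aliases in TARGET_COLUMNS.items():
        for alias in aliases:
            if alias in normalized:
                positions[target] = normalized.index(alias)
                break
    adapted = []
    for row in rows:
        record: Dict[str, Cell] = {}
        for target, index in positions.items():
            number = to_number(row[index])
            record[target] = row[index].strip() if number is None else number
        adapted.append(record)
    return adapted


if __name__ == "__main__":
    header = ["Business  group", "2024", "2023", "Published change"]
    rows = [["Wines and Spirits", "5,862", "6,602", "-11%"]]
    print(adapt_table(header, rows))
```

Columns that have no target, such as "Published change" here, are dropped. That is one possible design choice for a fixed output format.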

Structured content extraction

Perform targeted extraction of specific content types with full preservation of relationships

Hierarchy preservation

Create proper parent-child relationships between document elements
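Nesting lets a consumer recover the context of any element, for example which sections enclose a given table. The following is a minimal sketch using Python's standard library. It assumes a hypothetical XML layout in which sections nest and every element carries an `id`.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML: tag names and id values are illustrative, not Vertesia's schema.
DOC = """<document id="d1">
  <section id="s1">
    <heading id="h1">Financial highlights</heading>
    <section id="s2">
      <heading id="h2">Revenue</heading>
      <table id="t1"/>
    </section>
  </section>
</document>"""


def ancestor_ids(xml_text: str, target_id: str) -> list[str]:
    """Return the ids of an element's ancestors, outermost first."""
    root = ET.fromstring(xml_text)
    # ElementTree elements do not reference their parents, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    target = root.find(f".//*[@id='{target_id}']")
    if target is None:
        raise KeyError(target_id)
    chain = []
    node = parent_of.get(target)
    while node is not None:
        chain.append(node.get("id"))
        node = parent_of.get(node)
    return chain[::-1]


if __name__ == "__main__":
    print(ancestor_ids(DOC, "t1"))  # ['d1', 's1', 's2']
```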

Explicit tagging

Assign an explicit ID to every tag so that downstream operations, such as insertions, can target elements accurately
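Stable IDs mean a downstream step can address an element directly, without relying on fragile text matching. The following is a minimal sketch of an insertion keyed on an ID, using Python's standard `xml.etree.ElementTree`. The `document`, `p`, and `note` tags and the helper name `insert_after` are hypothetical.

```python
import xml.etree.ElementTree as ET


def insert_after(root: ET.Element, target_id: str, new_element: ET.Element) -> None:
    """Insert new_element as the next sibling of the element whose id is target_id."""
    for parent in root.iter():
        for index, child in enumerate(parent):
            if child.get("id") == target_id:
                parent.insert(index + 1, new_element)
                return  # stop immediately so the tree is not iterated after modification
    raise KeyError(f"no element with id {target_id!r}")


if __name__ == "__main__":
    # Hypothetical document; tag names and ids are illustrative only.
    doc = ET.fromstring(
        '<document id="d1"><p id="p1">First paragraph.</p><p id="p2">Second paragraph.</p></document>'
    )
    note = ET.Element("note", {"id": "n1"})
    note.text = "Reviewer comment."
    insert_after(doc, "p1", note)
    print([child.get("id") for child in doc])  # ['p1', 'n1', 'p2']
```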

Content filtering

Give users fine-grained control over what is sent to the LLM, enabling more efficient processing
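In the simplest case, filtering means pruning element types that the model does not need before the XML goes into a prompt. The following is a minimal sketch under assumptions: tag names such as `image` and `table` are hypothetical, and the page does not document how Vertesia exposes this control.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; tag names are illustrative only.
SAMPLE = (
    '<document><p id="p1">Revenue fell 2%.</p>'
    '<image id="i1" src="img-0.jpeg"/>'
    '<table id="t1"><row id="r1"/></table></document>'
)


def filter_content(xml_text: str, drop_tags: set[str]) -> str:
    """Return the XML with every element whose tag is in drop_tags removed."""
    root = ET.fromstring(xml_text)
    # Snapshot the tree first so removing children doesn't disturb the iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in drop_tags:
                # ElementTree stores trailing text in child.tail, so it is dropped too.
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(filter_content(SAMPLE, {"image"}))
```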

Stateful reprocessing

Reprocess failed runs with stateful, automatic retries, and fail over dynamically to alternate models when needed
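The page does not describe how reprocessing is implemented. The following is a minimal sketch of the general pattern, assuming per-page conversion, a caller-supplied `convert(page, model)` function, and hypothetical model names. Completed pages are recorded in a state dictionary so that a rerun only touches pages that failed, and each page is retried on one model before failing over to the next.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical signature: convert(page, model) returns XML or raises on failure.
Converter = Callable[[str, str], str]


def process_pages(
    pages: Sequence[str],
    models: Sequence[str],
    convert: Converter,
    state: Dict[str, str],
    retries_per_model: int = 2,
) -> List[str]:
    """Convert each page not already in `state`, failing over between models.

    `state` maps page -> converted output and is updated in place, so calling
    the function again with the same dict only reprocesses pages that failed.
    Returns the pages that still failed after every model was tried.
    """
    failed = []
    for page in pages:
        if page in state:
            continue  # completed in an earlier run
        for model in models:
            for _ in range(retries_per_model):
                try:
                    state[page] = convert(page, model)
                    break  # success: stop retrying this model
                except Exception:  # a real client would catch narrower error types
                    continue
            if page in state:
                break  # success: no need to fail over
        else:
            failed.append(page)  # no model produced a result
    return failed


if __name__ == "__main__":
    def flaky_convert(page: str, model: str) -> str:
        # Hypothetical model names: "model-a" always times out, "model-b" works.
        if model == "model-a":
            raise TimeoutError(f"{model} timed out on {page}")
        return f"<page id='{page}'/>"

    state: Dict[str, str] = {}
    print(process_pages(["p1", "p2"], ["model-a", "model-b"], flaky_convert, state))  # []
    print(state)
```

Keeping the state outside the function is the design choice that makes reprocessing cheap: a second call with the same dictionary skips every page that already succeeded.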

XML output

Use Extensible Markup Language (XML), a proven standard for transporting data that LLMs readily understand

/analyze

Trigger content analysis for an object

/status

Get the status of a previously requested analysis

/results

Retrieve the result of an analysis once it is completed. The response will contain the XML conversion of the object.

/xml

Fetch the object's corresponding XML string once the analysis is completed

/tables

Fetch the object's table content once the analysis is completed

/images

Retrieve information about the images that are embedded in a PDF

/annotated

Get a rendition of the PDF annotated with block outlines and IDs

/adapt_tables

Transform tables within a PDF into the format of your choice. The service identifies the relevant tables and maps their columns to the requested format.

/adapt_tables/:runId

Retrieve the adapted tables when processing is complete
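The endpoints above suggest a trigger, poll, and fetch workflow. The sketch below shows that flow under several assumptions: the per-object base path, the HTTP methods, the JSON shape of `/status` responses, and the `completed`/`failed` status values are all hypothetical. The transport is injected so that any HTTP client, or an offline fake, can be plugged in. Check the Vertesia documentation for the actual request and response formats.

```python
import time
from typing import Any, Callable, Dict, Optional

# Hypothetical transport: send(method, path, json_body) -> decoded response.
# The base path, authentication, and response fields are assumptions, not documented API.
Transport = Callable[[str, str, Optional[Dict[str, Any]]], Any]


class DocPrepClient:
    """Minimal sketch of the analyze -> status -> results workflow for one object."""

    def __init__(self, send: Transport, object_path: str) -> None:
        self._send = send
        self._base = object_path.rstrip("/")  # hypothetical, e.g. "/objects/<id>"

    def _path(self, endpoint: str) -> str:
        return f"{self._base}/{endpoint}"

    def analyze(self) -> Any:
        """Trigger content analysis for the object."""
        return self._send("POST", self._path("analyze"), {})

    def wait_until_done(
        self,
        interval_s: float = 5.0,
        timeout_s: float = 600.0,
        sleep: Callable[[float], None] = time.sleep,
    ) -> str:
        """Poll /status until the analysis finishes or the timeout is reached."""
        waited = 0.0
        while True:
            status = self._send("GET", self._path("status"), None).get("status")
            if status in ("completed", "failed"):  # assumed status values
                return status
            if waited >= timeout_s:
                raise TimeoutError(f"analysis still {status!r} after {waited:.0f}s")
            sleep(interval_s)
            waited += interval_s

    def get(self, endpoint: str) -> Any:
        """Fetch a completed artifact: 'results', 'xml', 'tables', 'images', or 'annotated'."""
        return self._send("GET", self._path(endpoint), None)

    def adapted_tables(self, run_id: str) -> Any:
        """Retrieve tables from an earlier /adapt_tables run."""
        return self._send("GET", self._path(f"adapt_tables/{run_id}"), None)


if __name__ == "__main__":
    # Offline stand-in for an HTTP client, returning canned responses.
    statuses = iter(["processing", "completed"])

    def fake_send(method: str, path: str, body: Optional[Dict[str, Any]]) -> Any:
        print(method, path)
        if path.endswith("/status"):
            return {"status": next(statuses)}
        return {"xml": "<document/>"} if path.endswith("/xml") else {}

    client = DocPrepClient(fake_send, "/objects/123")
    client.analyze()
    print(client.wait_until_done(sleep=lambda _: None))
    print(client.get("xml"))
```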

No hardware investment

Eliminate the need to purchase, maintain, or scale specialized hardware

No model management

Skip the complexity of running and tuning your own GenAI models

Automated failover

The system automatically switches between models to maximize reliability

AGENTIC PREPROCESSING FOR RAG

Document Preparation as a Service

Intrinsic content referencing

We do not allow models to rewrite or alter the original text of the document

Because the original text is never altered, this approach keeps GenAI hallucinations out of the document text, which is particularly valuable in regulated industries where unintended rewrites can have significant consequences

Extensive API endpoints

Our API endpoints expose the service for transforming content into XML

Flexible, enterprise-grade service

Deploy on AWS, Google Cloud, Azure, or any private cloud infrastructure

Our production-ready service offers a curated set of pre-qualified GenAI models optimized for each element of document preparation

Get started today

Our pricing is designed to grow with you as you scale your use of our cloud service. Pricing starts at around 65% of the AWS Textract price, so in addition to the cost savings you get greater performance and more accurate, more usable results.

For more information, check out the 'Getting Started' documentation