DOCUMENT PREPARATION FOR GENAI
Semantic DocPrep
Vertesia's agentic API service converts complex documents to XML for Retrieval-Augmented Generation (RAG)
SEMANTIC DOCUMENT PREPARATION
Prevent LLM hallucinations with semantic document preparation
Large language models (LLMs) often struggle to understand PDFs and other complex documents that contain images, tables, charts, graphs, and other difficult-to-process elements.
That is why we developed an approach to document preparation that converts complex documents into richly structured XML. It preserves the exact original text while adding semantic understanding that makes document content machine-readable.
# FINANCIAL HIGHLIGHTS
## Revenue
(EUR millions; chart, 2022 and 2023)
## Profit from recurring operations
(EUR millions; chart, 2022 and 2023)
| Change in revenue by business group (EUR millions and percentages) | 2024 | 2023 | 2024/2023 change, published | 2024/2023 change, organic (a) | 2022 |
| :--: | :--: | :--: | :--: | :--: | :--: |
| Wines and Spirits | 5,862 | 6,602 | -11% | -8% | 7,099 |
| Fashion and Leather Goods | 41,060 | 42,169 | -3% | -1% | 38,648 |
| Perfumes and Cosmetics | 8,418 | 8,271 | 2% | 4% | 7,722 |
| Watches and Jewelry | 10,577 | 10,902 | -3% | -2% | 10,581 |
| Selective Retailing | 18,262 | 17,885 | 2% | 6% | 14,852 |
| Other activities and eliminations | 504 | 324 | - | - | 281 |
| Total | 84,683 | 86,153 | -2% | 1% | 79,184 |

(a) On a constant consolidation scope and currency basis. The net impact of exchange rate fluctuations on Group revenue was -2% and the net impact of changes in the scope of consolidation was -1%. The principles used to determine the net impact of exchange rate fluctuations on the revenue of entities reporting in foreign currencies and
AGENTIC PREPROCESSING FOR RAG
Document Preparation as a Service
Available as a scalable and secure Cloud service, Vertesia’s patent-pending Semantic DocPrep rapidly converts PDFs and other complex documents to a machine-readable XML format. Our agentic service uses several different LLMs to process complex documents efficiently and accurately.
Intrinsic content referencing
Table normalization
Structured content extraction
Hierarchy preservation
Explicit tagging
Content filtering
Stateful reprocessing
XML output
We do not allow models to rewrite or alter the original text of the document.
Our approach eliminates GenAI hallucinations, which is particularly valuable in regulated industries, where unintended rewrites can have significant consequences.
Extensive API endpoints
Our API endpoints expose the service for transforming content into XML files.
/analyze
Trigger content analysis for an object
/status
Get the status of a previously requested analysis
/results
Retrieve the result of an analysis once it is completed. The response will contain the XML conversion of the object.
/xml
Fetch the object's corresponding XML string once the analysis is completed
/tables
Fetch the object's table content once the analysis is completed
/images
Retrieve information about the images that are embedded in a PDF
/annotated
Get a rendition of the PDF annotated with block outlines and IDs
/adapt_tables
Transform tables within a PDF to the format of your choice. The service will identify relevant tables and map columns to the requested format.
/adapt_tables/:runId
Retrieve the adapted tables when processing is complete
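The endpoints above suggest an asynchronous flow: trigger an analysis, poll its status, then fetch the XML. The following is a minimal sketch of that flow, not Vertesia's client. The source lists only the endpoint paths, so the HTTP methods, the `object_id` parameter, the `status` and `xml` response fields, and the `completed` and `failed` status values are all assumptions made for illustration. The transport is injected so the sketch does not assume a host or an authentication scheme; a stub stands in for the real service.

```python
import time
from typing import Callable, Dict

# A transport is any function (method, path, params) -> decoded JSON dict.
# Injecting it keeps the sketch independent of host, auth, and HTTP library.
Transport = Callable[[str, str, Dict[str, str]], dict]


def convert_to_xml(
    object_id: str,
    call: Transport,
    poll_seconds: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Run /analyze, poll /status, then return the string from /xml."""
    params = {"object_id": object_id}  # assumed way of naming the object
    call("POST", "/analyze", params)   # trigger content analysis

    for _ in range(max_polls):
        state = call("GET", "/status", params).get("status")  # assumed field
        if state == "completed":                               # assumed value
            return call("GET", "/xml", params)["xml"]          # assumed field
        if state == "failed":                                  # assumed value
            raise RuntimeError(f"analysis failed for {object_id}")
        sleep(poll_seconds)
    raise TimeoutError(f"{object_id} not finished after {max_polls} polls")


def make_stub_transport(polls_before_done: int) -> Transport:
    """In-memory stand-in for the service, for trying the flow offline."""
    remaining = {"polls": polls_before_done}

    def call(method: str, path: str, params: Dict[str, str]) -> dict:
        if path == "/analyze":
            return {"accepted": True}
        if path == "/status":
            if remaining["polls"] > 0:
                remaining["polls"] -= 1
                return {"status": "running"}
            return {"status": "completed"}
        if path == "/xml":
            return {"xml": "<document/>"}
        raise ValueError(f"unexpected path {path}")

    return call


if __name__ == "__main__":
    # Two "running" replies, then "completed"; sleeping is skipped for the demo.
    print(convert_to_xml("doc-1", make_stub_transport(2), sleep=lambda s: None))
```

To call a real deployment, replace the stub with a function that sends the request over HTTP and returns the decoded JSON, using the request and response shapes given in the official API reference.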
Flexible, enterprise-grade service
Deploy on AWS, Google Cloud, Azure, or any private cloud infrastructure
No hardware investment
No model management
Skip the complexity of running and tuning your own GenAI models
Automated failover
System intelligently switches between models for maximum reliability
Our production-ready service offers a curated set of pre-qualified GenAI models optimized for each element of document preparation
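Automated failover between models can be pictured as an ordered fallback: try the preferred model and move to the next one when a call fails. The sketch below shows that generic pattern only. It is not a description of how the service is implemented, and the model functions and names are hypothetical stand-ins.

```python
from typing import Callable, List, Sequence, Tuple

# A "model" here is any callable that takes a task and returns text.
Model = Callable[[str], str]


def run_with_failover(task: str, models: Sequence[Tuple[str, Model]]) -> Tuple[str, str]:
    """Try each (name, model) in order; return (name, output) of the first success."""
    failures: List[str] = []
    for name, model in models:
        try:
            return name, model(task)
        except Exception as exc:  # broad on purpose: any failure moves to the next model
            failures.append(f"{name}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(failures))


def flaky_model(task: str) -> str:
    """Hypothetical primary model that is currently unavailable."""
    raise ConnectionError("model unavailable")


def backup_model(task: str) -> str:
    """Hypothetical secondary model that succeeds."""
    return f"processed {task}"


if __name__ == "__main__":
    print(run_with_failover("page 1", [("primary", flaky_model), ("backup", backup_model)]))
```

Returning the name of the model that answered makes it possible to log which fallback was used for each request.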
Get started today
Our pricing is designed to grow with you as you scale your use of our Cloud service. Pricing starts at around 65% of the price of AWS Textract, so you get cost savings along with greater performance and more accurate, more usable results.
Pricing example:
A 14-page PDF has 3 pages with multiple images and 2 pages with multiple tables:
All pages: $0.001 x 14 = $0.014
Pages with multiple images: $0.002 x 3 = $0.006
Pages with multiple tables: $0.003 x 2 = $0.006
Total price to process the complete PDF = $0.026
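The arithmetic in the pricing example can be reproduced with a small estimator. This is a sketch under the assumption that the three per-page rates shown in the example are additive; the rates are read off the example and are not an official price list. `Decimal` is used so the fractions of a cent add up exactly.

```python
from decimal import Decimal

# Per-page rates inferred from the worked example; not an official price list.
BASE_RATE = Decimal("0.001")   # applied to every page
IMAGE_RATE = Decimal("0.002")  # added for each page with multiple images
TABLE_RATE = Decimal("0.003")  # added for each page with multiple tables


def estimate_price(total_pages: int, image_pages: int, table_pages: int) -> Decimal:
    """Estimate the price in dollars of processing one PDF."""
    if min(total_pages, image_pages, table_pages) < 0:
        raise ValueError("page counts must be non-negative")
    if image_pages > total_pages or table_pages > total_pages:
        raise ValueError("image and table page counts cannot exceed total pages")
    return (
        BASE_RATE * total_pages
        + IMAGE_RATE * image_pages
        + TABLE_RATE * table_pages
    )


if __name__ == "__main__":
    # The 14-page PDF from the example: 3 image pages, 2 table pages.
    print(estimate_price(14, 3, 2))  # prints 0.026
```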