PREVENT LLM HALLUCINATIONS
Semantic DocPrep prepares your content for AI
Preserve the semantic context of your documents for more reliable and accurate results.
LLMs struggle to understand PDFs that contain images, tables, charts, and graphs. Semantic DocPrep – document preprocessing for RAG – turns complex PDFs into clear, structured, machine-readable XML for accurate retrieval in RAG workflows.

# FINANCIAL HIGHLIGHTS
## Revenue
(EUR millions)

[Chart: 2022, 2023]

## Profit from recurring operations
(EUR millions)

[Chart: 2022, 2023]

| Change in revenue by business group (EUR millions and percentages) | 2024 | 2023 | 2024/2023 change, published | 2024/2023 change, organic (a) | 2022 |
| :--: | :--: | :--: | :--: | :--: | :--: |
| Wines and Spirits | 5,862 | 6,602 | -11% | -8% | 7,099 |
| Fashion and Leather Goods | 41,060 | 42,169 | -3% | -1% | 38,648 |
| Perfumes and Cosmetics | 8,418 | 8,271 | 2% | 4% | 7,722 |
| Watches and Jewelry | 10,577 | 10,902 | -3% | -2% | 10,581 |
| Selective Retailing | 18,262 | 17,885 | 2% | 6% | 14,852 |
| Other activities and eliminations | 504 | 324 | - | - | 281 |
| Total | 84,683 | 86,153 | -2% | 1% | 79,184 |

(a) On a constant consolidation scope and currency basis. The net impact of exchange rate fluctuations on Group revenue was -2% and the net impact of changes in the scope of consolidation was -1%. The principles used to determine the net impact of exchange rate fluctuations on the revenue of entities reporting in foreign currencies and
What is Semantic DocPrep?
Semantic DocPrep is a secure, scalable, cloud-based API service available in the Vertesia platform. With a free trial and flexible pricing, it’s easy to convert even the most complex PDF into an LLM-friendly format.
Intrinsic content referencing
Table normalization
Structured content extraction
Hierarchy preservation
Explicit tagging
Content filtering
Stateful reprocessing
Extensible Markup Language (XML) output
Enterprise-grade service
Work with a curated set of pre-qualified GenAI models optimized for each element of document preparation. Deploy on AWS, Google Cloud, Azure, or any private cloud infrastructure.
No hardware investment
No model management
Skip the complexity of running and tuning your own GenAI models.
Automated failover
Intelligently switch between models for maximum reliability.
Flexible endpoints for custom content structuring
Access our range of API endpoints to transform and organize your content exactly how you need it – from structured XML and Markdown to custom document formats.
/analyze
Trigger content analysis for an object.
/status
Get the status of a previously requested analysis.
/results
Retrieve the result of an analysis once it is completed.
/xml
Fetch the object's corresponding XML once the analysis is completed.
/tables
Fetch the object's table content once the analysis is completed.
/images
Retrieve information about the images that are embedded in a PDF.
/annotated
Get a rendition of the PDF annotated with block outlines and IDs.
/adapt_tables
Identify relevant tables and map columns to transform the format.
/adapt_tables/:runId
Retrieve the adapted tables when processing is complete.
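The analyze, status, and results endpoints above suggest a trigger-then-poll workflow. The following is a minimal sketch of that loop, not the service's documented client: the `request` callable, the `object_path` prefix, the HTTP methods, the `"status"` field, and the `"completed"` / `"failed"` values are all assumptions made for illustration. Check the API reference for the real paths, authentication, and response shapes.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical transport: (HTTP method, path) -> decoded JSON body.
# In practice this would wrap an authenticated HTTP client.
Request = Callable[[str, str], Dict[str, Any]]


def analyze_and_fetch(
    request: Request,
    object_path: str,
    poll_seconds: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Trigger an analysis, poll its status, then retrieve the results.

    `object_path` is an assumed URL prefix identifying the object;
    the endpoint names (/analyze, /status, /results) come from the list above.
    """
    # Assumption: the analysis is started with a POST.
    request("POST", f"{object_path}/analyze")

    for _ in range(max_polls):
        status = request("GET", f"{object_path}/status")
        # Assumption: the status payload has a "status" field with these values.
        state = status.get("status")
        if state == "completed":
            return request("GET", f"{object_path}/results")
        if state == "failed":
            raise RuntimeError(f"analysis failed for {object_path}")
        sleep(poll_seconds)

    raise TimeoutError(f"analysis for {object_path} not finished after {max_polls} polls")
```

Passing the transport in as a function keeps the loop independent of any particular HTTP library and makes it easy to exercise with a stub. The same pattern would apply to `/xml`, `/tables`, or `/adapt_tables/:runId` by changing the final path.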
Only pay for what you use
Starting at around 35% cheaper than AWS Textract, you’ll spend less while getting great performance and accurate results.
PDF Pages
Pricing is per page.
- 1–1,000 = FREE
- 1,001–1,000,000 = $0.001 per page
- 1,000,001+ = $0.0008 per page
Embedded images and tables are priced separately.
Images
Pricing is per page with one or more images.
- 1–1,000 = FREE
- 1,001–1,000,000 = $0.002 per page
- 1,000,001+ = $0.0016 per page
Pricing example:
One page with two images: $0.001 (page) + $0.002 (page with images) = $0.003
Tables
Pricing is per page with one or more tables.
- 1–1,000 = FREE
- 1,001–1,000,000 = $0.003 per page
- 1,000,001+ = $0.0024 per page
Pricing example:
One page with three tables: $0.001 (page) + $0.003 (page with tables) = $0.004
Pricing example (full document, at the 1,001–1,000,000 rates):
A 14-page PDF has 3 pages with multiple images and 2 pages with multiple tables:
Pages: $0.001 x 14 = $0.014
Pages with images: $0.002 x 3 = $0.006
Pages with tables: $0.003 x 2 = $0.006
Total price to process the complete PDF = $0.026
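To make the arithmetic above reproducible, here is a small sketch of a cost estimator built from the published rates. It rests on two assumptions that the price list does not spell out: that each unit is billed at the rate of the tier it falls into (so only usage beyond a threshold gets the next rate), and that pages, pages with images, and pages with tables are counted against separate tier counters. The function names and the `used` parameter are illustrative, not part of the service.

```python
from decimal import Decimal

# (inclusive upper bound of the tier, USD per unit); None means no upper bound.
# Rates are taken from the price list above.
TIERS = {
    "page": [(1_000, Decimal("0")), (1_000_000, Decimal("0.001")), (None, Decimal("0.0008"))],
    "image_page": [(1_000, Decimal("0")), (1_000_000, Decimal("0.002")), (None, Decimal("0.0016"))],
    "table_page": [(1_000, Decimal("0")), (1_000_000, Decimal("0.003")), (None, Decimal("0.0024"))],
}


def tier_cost(kind: str, already_used: int, count: int) -> Decimal:
    """Cost of `count` more units of `kind` after `already_used` units.

    Assumption: each unit is charged at the rate of the tier it lands in.
    """
    total = Decimal("0")
    end = already_used + count
    lower = 0  # number of units covered by the previous tiers
    for upper, rate in TIERS[kind]:
        hi = end if upper is None else min(end, upper)
        lo = max(already_used, lower)
        if hi > lo:
            total += rate * (hi - lo)
        if upper is None:
            break
        lower = upper
    return total


def document_cost(pages: int, image_pages: int, table_pages: int,
                  used: tuple = (0, 0, 0)) -> Decimal:
    """Estimate the cost of one PDF.

    `used` holds prior usage as (pages, pages with images, pages with tables);
    treating these as separate counters is an assumption.
    """
    return (
        tier_cost("page", used[0], pages)
        + tier_cost("image_page", used[1], image_pages)
        + tier_cost("table_page", used[2], table_pages)
    )


# The 14-page example, with the free tiers already used up:
print(document_cost(14, 3, 2, used=(1_000, 1_000, 1_000)))  # 0.026
```

`Decimal` is used instead of floats so that sub-cent rates add up exactly. With `used=(0, 0, 0)` the same document falls entirely inside the free tiers and costs nothing under these assumptions.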
For more information on converting PDFs to XML, see the 'Getting Started' documentation.

