{"id":30900,"date":"2025-08-29T12:52:20","date_gmt":"2025-08-29T12:52:20","guid":{"rendered":"https:\/\/parashift.io\/?p=27877"},"modified":"2025-11-30T22:47:20","modified_gmt":"2025-11-30T22:47:20","slug":"generative-vs-discriminative-ai-how-parashift-makes-document-automation-smarter","status":"publish","type":"post","link":"https:\/\/parashift.ai\/en\/generative-vs-discriminative-ai-how-parashift-makes-document-automation-smarter\/","title":{"rendered":"Generative vs. Discriminative AI: How Parashift makes Document Automation smarter"},"content":{"rendered":"\n<p><em>This article was written by our ML team.<\/em><\/p>\n\n\n\n<p class=\"has-black-color has-text-color has-link-color wp-elements-5b317c8050270fd6f28a2f4d5ea164df\">This is the beginning of a multi-part blog mini-series in which we shed some light on how we use ML \/ AI at <a href=\"https:\/\/parashift.ai\/why-parashift-idp-is-different\/\">Parashift<\/a> to solve intelligent document understanding problems.<\/p>\n\n\n\n<p class=\"has-black-color has-text-color has-link-color wp-elements-7c8e3d43f04754e901609ebaf3a699d3\">The first two parts are a short introduction to general key topics and concepts in AI, not specifically focused on Parashift. We start by discussing the difference between <strong>discriminative<\/strong> and <strong>generative models<\/strong>. In the third part we will talk about our current efforts related to&nbsp;<strong>grounding LLM answers in documents<\/strong>. The term&nbsp;grounding&nbsp;in this context means the ability to link the answer of an LLM to explicit tokens in the document. 
The fourth part touches on&nbsp;more use cases&nbsp;of both methods in our product, and finally the last part is an&nbsp;outlook&nbsp;into where we want to go next.<\/p>\n\n\n\n<div class=\"wp-block-spacer\" style=\"height: 49px;\" aria-hidden=\"true\">\u00a0<\/div>\n\n\n\n<h4 class=\"wp-block-heading has-black-color has-text-color has-link-color wp-elements-44d64ed967ec18414ba6ec7259b61f70\" id=\"h-discriminative-models-vs-generative-ai\">Discriminative Models vs Generative AI<\/h4>\n\n\n\n<p class=\"has-black-color has-text-color has-link-color wp-elements-7246cff5ab1922c86d61ddf02a248757\">First, we examine two different types of AI models and start with an introduction to highlight conceptual differences. As we will see, those differences influence how and where we want to use those models in practice.<\/p>\n\n\n\n<div class=\"wp-block-spacer\" style=\"height: 36px;\" aria-hidden=\"true\">\u00a0<\/div>\n\n\n\n<h5 class=\"wp-block-heading has-black-color has-text-color has-link-color wp-elements-f04a1ff3278b1d9d4e05a427e045e108\" id=\"h-discriminative-models\">Discriminative Models<\/h5>\n\n\n\n<p class=\"has-black-color has-text-color has-link-color wp-elements-88f38f3069ea73b471125664dd93d314\">These models learn how to discriminate \/ differentiate between &#8220;things&#8221;. Think of them as &#8216;labeling machines&#8217; to which you show a thing and the machine will assign a specific label and a number to it. The label(s) determine what that thing was classified as and the number reflects the confidence in the label. A common example is an image classifier that identifies animals. If this model is trained on, say, 100 different animal species, all it can do is look at images and assign the best-matching label. The model cannot come up with a new label but is restricted (and biased) by the labels it was exposed to during training. 
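The &#8216;labeling machine&#8217; idea above can be sketched in a few lines. This is a purely illustrative toy, not any real Parashift model: the label set and raw scores below are invented, and the point is only that the classifier turns raw scores into confidences and must pick from a fixed set of labels.

```python
import math

# Hypothetical labels fixed at training time; the model can never
# invent a label outside this list.
LABELS = ["cat", "dog", "giraffe"]

def softmax(logits):
    """Turn raw model scores into a probability distribution."""
    exps = [math.exp(x - max(logits)) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits):
    """Return the best-matching label and the confidence in it."""
    probs = softmax(logits)
    best = max(range(len(LABELS)), key=lambda i: probs[i])
    return LABELS[best], probs[best]

# Made-up raw scores a trained model might produce for one image.
label, confidence = classify([0.5, 2.3, 0.1])
print(label, round(confidence, 2))  # → dog 0.78
```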
Such models are also referred to as classifiers.<\/p>\n\n\n\n<p class=\"has-black-color has-text-color has-link-color wp-elements-15d5683b889d56b79d98e943d234d655\">Many types of IDP tasks can be formulated in a way that a discriminative model can be trained to solve them. The approach always boils down to assigning a label to something. Use cases differ in the kind of label assigned and the &#8220;thing&#8221; it is assigned to. In the following examples we continuously increase the granularity of those label assignments and see how this solves different tasks.<\/p>\n\n\n\n<div class=\"wp-block-spacer\" style=\"height: 29px;\" aria-hidden=\"true\">\u00a0<\/div>\n\n\n\n<h6 class=\"wp-block-heading has-black-color has-text-color has-link-color wp-elements-9f1ceaff900ee04b26a9b81503af929e\" id=\"h-document-classification-one-label-per-document\"><strong>Document Classification (one label per document)<\/strong><\/h6>\n\n\n\n<p class=\"has-black-color has-text-color has-link-color wp-elements-73bc3ead2db388a086848e60e9ed38bf\">This is the most straightforward use case: during training the model is fed documents (and labels) of different types to learn how to distinguish them. Here the&nbsp;<code>labels<\/code>&nbsp;are all the document-types we need to distinguish (e.g. &#8216;invoice&#8217;, &#8216;passport&#8217;, &#8216;gym membership card&#8217; etc.). Despite the task being well defined and straightforward, there is quite a lot of creative freedom in how the model is trained to perform the task. 
That could be based on visual features, on textual features, or a combination of both, to name a few design decisions.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/parashift.ai\/wp-content\/uploads\/2025\/08\/pdc-classification-1024x580.png\" alt=\"Document classification in the Parashift Platform\" class=\"wp-image-27878\"\/><\/figure>\n\n\n\n<div class=\"wp-block-spacer\" style=\"height: 49px;\" aria-hidden=\"true\">\u00a0<\/div>\n\n\n\n<h6 class=\"wp-block-heading has-black-color has-text-color has-link-color wp-elements-c1a7511dc81f9286790e834ff7646b58\" id=\"h-page-sequence-separation-one-label-per-page\"><strong>Page Sequence Separation (one label per page)<\/strong><\/h6>\n\n\n\n<p class=\"has-black-color has-text-color has-link-color wp-elements-453dbfc2b7ddd2fd75aec934e0995202\">Now we start from an ordered sequence of pages and the task is to evaluate which pages belong together to form an actual document. This is useful if one scans physical mail, resulting in one large PDF containing all documents. We opted to train a model that predicts two labels:&nbsp;this is a first page&nbsp;or&nbsp;this is a last page. Based on these predictions we can then infer where to split the page sequence into individual documents. 
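The splitting step can be sketched as follows. This is a simplified illustration, not Parashift's production logic: we assume a hypothetical model has already emitted, for each page, two boolean predictions ("is a first page", "is a last page"), and we only show how a split can be inferred from them.

```python
def split_documents(pages, is_first, is_last):
    """Group an ordered page sequence into documents.

    pages    : list of page identifiers
    is_first : per-page bools ("this is a first page")
    is_last  : per-page bools ("this is a last page")
    """
    documents, current = [], []
    for page, first, last in zip(pages, is_first, is_last):
        if first and current:   # a new document starts here -> close the old one
            documents.append(current)
            current = []
        current.append(page)
        if last:                # this document ends here
            documents.append(current)
            current = []
    if current:                 # flush any trailing pages
        documents.append(current)
    return documents

# 6 scanned pages forming 3 documents.
pages = [1, 2, 3, 4, 5, 6]
is_first = [True, False, True, False, False, True]
is_last = [False, True, False, False, True, True]
print(split_documents(pages, is_first, is_last))  # → [[1, 2], [3, 4, 5], [6]]
```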
See below a schematic example of a page sequence consisting of 6 pages that results in 3 individual documents.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/parashift.ai\/wp-content\/uploads\/2025\/08\/Parashift_Page-Separation-1024x406.png\" alt=\"Page separation in the Parashift Platform\" class=\"wp-image-27880\"\/><\/figure>\n\n\n\n<div class=\"wp-block-spacer\" style=\"height: 49px;\" aria-hidden=\"true\">\u00a0<\/div>\n\n\n\n<h6 class=\"wp-block-heading has-black-color has-text-color has-link-color wp-elements-4b804d5630e66aaec2469a979b3a1be9\" id=\"h-information-extraction-one-label-per-token\"><strong>Information Extraction (one label per token)<\/strong><\/h6>\n\n\n\n<p class=\"has-black-color has-text-color has-link-color wp-elements-b45178acb04ab7d56d7d66edee0d3a65\">Now let&#8217;s see how to apply discriminative models to extract information such as the&nbsp;sender address,&nbsp;payable amounts, or&nbsp;delivery dates. If we look at this problem through the lens of a classifier, we must build a system that assigns labels only to specific parts of the document. For example, it can assign the label&nbsp;company name&nbsp;to the document text &#8220;Parashift AG&#8221;. As a further step, one can train a model to predict which of these tokens belong together (so-called link prediction). 
So the model has to decide if &#8220;Parashift AG&#8221; is one entity or if it is two separate ones, &#8220;Parashift&#8221; &amp; &#8220;AG&#8221;.<\/p>\n\n\n\n<div class=\"wp-block-spacer\" style=\"height: 49px;\" aria-hidden=\"true\">\u00a0<\/div>\n\n\n\n<h6 class=\"wp-block-heading has-black-color has-text-color has-link-color wp-elements-706e2ca382bb988cd2507c09fb32d783\" id=\"h-embeddings-and-tokenization\"><strong>Embeddings and Tokenization<\/strong><\/h6>\n\n\n\n<p class=\"has-black-color has-text-color has-link-color wp-elements-e4617aa067a1f7624a602b73d13a457c\">Now we know how to conceptually formulate those IDP tasks for a discriminative model. There are at least two more fundamental concepts to understand to get a clearer picture of the workings of such models:&nbsp;embeddings&nbsp;&amp;&nbsp;tokenization. Since the task boils down to &#8220;assign label to thing&#8221; we need a way to meaningfully represent the &#8220;things&#8221; so the models can distinguish them (where a thing is a document, a page, or tokens of a document). This is where&nbsp;embeddings&nbsp;come into play.<\/p>\n\n\n\n<p class=\"has-black-color has-text-color has-link-color wp-elements-9320a3d265e786b132b67a71cb06fb9a\">&nbsp;<\/p>\n\n\n\n<p><strong>Embeddings<\/strong><\/p>\n\n\n\n<p class=\"has-black-color has-text-color has-link-color wp-elements-87a801e62bf69a701a543de97b26c45d\">AI models do not work with what we humans understand as&nbsp;images,&nbsp;audio,&nbsp;documents,&nbsp;text&nbsp;or&nbsp;tokens&nbsp;directly; they only understand numbers. The act of transforming real-world objects into &#8220;a list of numbers&#8221; is called embedding or vectorization. 
The important aspect is to figure out a way such that those numbers have meaning for the problem you attempt to solve.<\/p>\n\n\n\n<div class=\"wp-block-spacer\" style=\"height: 31px;\" aria-hidden=\"true\">\u00a0<\/div>\n\n\n\n<p><strong>RGB as example<\/strong><\/p>\n\n\n\n<p class=\"has-black-color has-text-color has-link-color wp-elements-64194983cc6d1f3d8ebb2c12f0e930a0\">This section is not tied in any way to IDP; it is just a generic example of how we can encode information into a list of numbers. A nice visual example of using numbers to represent information is colors. On your screen the color of each pixel can be represented as three&nbsp;RGB-values&nbsp;(where R: red, G: green, B: blue). We can look at this as a type of embedding consisting of 3 numbers representing the&nbsp;[R, G, B]&nbsp;values. For our example here we decide that each number takes on values from 0..1 (pixels are usually represented with numbers from 0..255, but we make our lives easier with 0..1). &#8220;Red&#8221; is represented as&nbsp;[1, 0, 0], green as&nbsp;[0, 1, 0]&nbsp;and&nbsp;[0, 0, 1]&nbsp;corresponds to blue.<\/p>\n\n\n\n<p class=\"has-black-color has-text-color has-link-color wp-elements-f26f5adb323d2657c9ddb67165ff826b\">This enables us to express ANY color as a combination of these 3 numbers, for example&nbsp;yellow = red + green -&gt; [1, 1, 0]. We pick the RGB example only because it is easy to visualize for humans (as color). 
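The RGB arithmetic from the text, written out as a tiny runnable sketch. The `mix` helper is our own addition for illustration; it simply adds two color vectors channel by channel and clamps each channel to the 0..1 range used above.

```python
# Colors as 3-number "embeddings" [R, G, B] with values in 0..1.
red   = [1.0, 0.0, 0.0]
green = [0.0, 1.0, 0.0]
blue  = [0.0, 0.0, 1.0]

def mix(a, b):
    """Combine two color embeddings, clamping each channel to 0..1."""
    return [min(1.0, x + y) for x, y in zip(a, b)]

yellow = mix(red, green)
print(yellow)  # → [1.0, 1.0, 0.0], i.e. yellow = red + green
```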
Modern LLMs like ChatGPT, Claude or Llama use embeddings that are significantly larger than our example, typically in the range of a few thousand numbers per embedding!<\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis: 25%;\">\u00a0<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:50%\">\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/parashift.ai\/wp-content\/uploads\/2025\/08\/RGB_no-bg-1024x1024.png\" alt=\"\" class=\"wp-image-27911\"\/><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis: 25%;\">\u00a0<\/div>\n<\/div>\n\n\n\n<p class=\"has-black-color has-text-color has-link-color wp-elements-05823edeb60a8aacdd8ebb156b107c8b\">With this in mind, assume we build a token embedding model, and the model is only allowed to use 3 numbers for representing the meaning of a token (such that it is easy to visualize). Looking at the output of the model below, we see that &#8216;addresses&#8217; have a reddish tone, numbers appear green and dates have a blue note to them. How the model learned this is unclear; at this point it is purely an observation. Upon further inspection we find exceptions to that observation, namely both the street numbers and zip codes in addresses are orange, not green. What this means is that our model is smart enough to change the embedding of a token based on its context (= the tokens surrounding it). This means the embeddings are&nbsp;dynamic. 
In contrast,&nbsp;static&nbsp;embeddings are fixed lookups where for example the token&nbsp;9155&nbsp;will always be assigned the same embedding, no matter the context.<\/p>\n\n\n\n<p class=\"has-black-color has-text-color has-link-color wp-elements-74595b1eeffecb25483f8f19a89bc1bc\">We will see how at Parashift we leverage&nbsp;static&nbsp;and&nbsp;dynamic&nbsp;embeddings when it comes to grounding LLM answers in the actual document in the third part of this blog series. By&nbsp;grounding&nbsp;we mean &#8220;to identify which tokens in the document make up the answer&#8221;, which allows us to provide users with exact coordinates that belong to the answer (down to the token level, not just paragraph level). Static embeddings are limited in their expressiveness as they, for example, fail to distinguish homonyms (words that have the same spelling but different meaning based on context). An example would be the word&nbsp;lead&nbsp;in: &#8220;My actions lead to consequences&#8221; vs. &#8220;I don&#8217;t like lead in my drinking water&#8221;.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/parashift.ai\/wp-content\/uploads\/2025\/08\/invoices-1024x695.png\" alt=\"AI Extraction in the Parashift Platform\" class=\"wp-image-27884\"\/><\/figure>\n\n\n\n<div class=\"wp-block-spacer\" style=\"height: 49px;\" aria-hidden=\"true\">\u00a0<\/div>\n\n\n\n<p class=\"has-black-color has-text-color has-link-color wp-elements-46df6176dbab79bfb9ab3507b4d3f698\"><strong>Tokenization<\/strong><\/p>\n\n\n\n<p class=\"has-black-color has-text-color has-link-color wp-elements-45aade071a5d261ab1d56b730bc328fb\">Another crucial step for turning information extraction into a classification problem is to divide the document into smaller parts. These can be paragraphs, lines, words or tokens. Then we can train models to assign labels to those &#8216;parts&#8217;. 
The distinction between a word and a token is a bit blurry here, but as a first approximation think of a token as &#8220;short words or sub-words&#8221;.<\/p>\n\n\n\n<p class=\"has-black-color has-text-color has-link-color wp-elements-1419f5667d019f3b52f3a3a08f059a7c\">Dividing a document or text into tokens is done by a tokenizer, and it is important to note that there is no agreed-upon method for how to best split a text into tokens. Below is a visualization of how the&nbsp;GPT-4&nbsp;tokenizer splits up an example sentence. Tokens are visualized as colored rectangles. You can also go to this&nbsp;<strong><em><a href=\"https:\/\/huggingface.co\/spaces\/Xenova\/the-tokenizer-playground\" target=\"_blank\" rel=\"noopener\">interactive tool<\/a>&nbsp;<\/em><\/strong>and play with different tokenizers yourself. Once the document is tokenized, we can assign labels to the tokens. One can see how assigning labels such as&nbsp;sender-name,&nbsp;sender-street,&nbsp;sender-house-number&nbsp;can finally serve for information extraction.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/parashift.ai\/wp-content\/uploads\/2025\/08\/giraffe-tokens.png\" alt=\"\" class=\"wp-image-27887\"\/><\/figure>\n\n\n\n<p class=\"has-black-color has-text-color has-link-color wp-elements-cf08ed7b7cb7f34d191521b2c5029acd\">To close the loop back to information extraction, look at the embeddings above: one can imagine training another model that takes those embeddings and uses them to assign labels to the tokens one is interested in. For this we train a separate model on different training examples to figure out the relationship between the embeddings and the task the model is actually required to solve. With such an embedding model it would be easy to extract information about &#8220;addresses&#8221; or &#8220;dates&#8221;. 
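To make this concrete, here is a deliberately naive sketch of such a downstream model: a nearest-prototype classifier over the 3-number token embeddings from the color example. The label prototypes and token embeddings below are invented for illustration; a real downstream model would learn this mapping from training examples rather than use hand-picked vectors.

```python
# Hand-picked label prototypes in the toy 3-number embedding space
# (reddish ~ addresses, green ~ numbers, blue ~ dates, as in the text).
PROTOTYPES = {
    "address": [0.9, 0.2, 0.1],
    "number":  [0.1, 0.9, 0.1],
    "date":    [0.1, 0.2, 0.9],
}

def distance(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def label_token(embedding):
    """Assign the label whose prototype lies nearest to the embedding."""
    return min(PROTOTYPES, key=lambda lab: distance(PROTOTYPES[lab], embedding))

# Hypothetical embeddings an embedding model might produce for 3 tokens.
tokens = {
    "Hauptstrasse": [0.8, 0.3, 0.2],
    "9155":         [0.2, 0.8, 0.1],
    "29.08.2025":   [0.2, 0.1, 0.8],
}
for token, emb in tokens.items():
    print(token, "->", label_token(emb))
```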
If we are after information beyond these two cases, we are probably out of luck since the model cannot distinguish most of the other tokens from each other (they all have the same color).<\/p>\n\n\n\n<p class=\"has-black-color has-text-color has-link-color wp-elements-5ac7b9145d6992e036c892efebb40bb0\">This highlights the need for training embeddings that are generic enough to solve a wide range of potential downstream tasks but at the same time capture enough information to make it easier on the following classification models.<\/p>\n\n\n\n<div class=\"wp-block-spacer\" style=\"height: 49px;\" aria-hidden=\"true\">\u00a0<\/div>\n\n\n\n<h4 class=\"wp-block-heading has-black-color has-text-color has-link-color wp-elements-5098fa4c2f49163238a103f9598e146a\" id=\"h-summary-of-discriminative-models\">Summary of discriminative models<\/h4>\n\n\n\n<p class=\"has-black-color has-text-color has-link-color wp-elements-8dd305ff4253395b8de7fe8869544e2a\">These models come with a few pros and cons when using them to solve tasks in real-world applications.<\/p>\n\n\n\n<div class=\"wp-block-spacer\" style=\"height: 31px;\" aria-hidden=\"true\">\u00a0<\/div>\n\n\n\n<h6 class=\"wp-block-heading has-black-color has-text-color has-link-color wp-elements-fd79818155365de41f21263efb06f216\" id=\"h-the-pros\">The Pros<\/h6>\n\n\n\n<ul class=\"wp-block-list\">\n<li>By predicting a specific label for each token in the document, it is straightforward to pinpoint the tokens in the document that correspond to the extracted information.<\/li>\n<\/ul>\n\n\n\n<div class=\"wp-block-spacer\" style=\"height: 16px;\" aria-hidden=\"true\">\u00a0<\/div>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Every prediction comes with a &#8216;confidence score&#8217; which reflects how sure the model is that it has found the correct value. 
By calibrating the confidences with actual benchmarks on any given task one can get a reliable measure of how trustworthy a model&#8217;s predictions actually are.<\/li>\n<\/ul>\n\n\n\n<div class=\"wp-block-spacer\" style=\"height: 16px;\" aria-hidden=\"true\">\u00a0<\/div>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Discriminative models are usually on the small side, especially when comparing them to current LLMs. They typically range from a few million to a few hundred million parameters and can be trained in a few minutes to hours on a single GPU. This makes it feasible to train multiple specialized models tailored to specific tasks.<\/li>\n<\/ul>\n\n\n\n<div class=\"wp-block-spacer\" style=\"height: 16px;\" aria-hidden=\"true\">\u00a0<\/div>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The small size of these models also comes with the advantage of &#8216;speed&#8217;, where we measure speed as the time it takes for a model to generate an interpretable answer. Typical values here in terms of document understanding would be in the order of milliseconds for a handful of pages. So small documents can be processed in &lt; 1 second.<\/li>\n<\/ul>\n\n\n\n<div class=\"wp-block-spacer\" style=\"height: 16px;\" aria-hidden=\"true\">\u00a0<\/div>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Privacy: By their very nature, these models cannot reveal sensitive training data between customers. The only thing that these models do is add a label to existing text \/ tokens in a document. 
So there is no way a model could directly reveal, for example, addresses that were in the training data.<\/li>\n<\/ul>\n\n\n\n<div class=\"wp-block-spacer\" style=\"height: 16px;\" aria-hidden=\"true\">\u00a0<\/div>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A lot of the &#8216;heavy lifting&#8217; can be done by the embedding models, whose embeddings can then be shared across multiple downstream tasks.<\/li>\n<\/ul>\n\n\n\n<div class=\"wp-block-spacer\" style=\"height: 49px;\" aria-hidden=\"true\">\u00a0<\/div>\n\n\n\n<h6 class=\"wp-block-heading has-black-color has-text-color has-link-color wp-elements-55b79b7372ee00126c1acdcc124065ce\" id=\"h-the-cons\">The Cons<\/h6>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A fundamental limitation is that they cannot naturally generalize their answers to unseen problems, even if the underlying embedding model is powerful enough to generalize to more problems than we trained our downstream model on. For each new problem we have to either extend the training of an existing model or train a new one specialized in the new task. Since the advent of LLMs and their astonishing generalization capabilities, this approach seems almost antiquated. We would not want to train a new model from scratch if we now also wanted to extract phone numbers from documents.<\/li>\n<\/ul>\n\n\n\n<div class=\"wp-block-spacer\" style=\"height: 16px;\" aria-hidden=\"true\">\u00a0<\/div>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Assigning labels is a fundamentally less general way of solving problems than the generative model approach. There are many use cases that simply cannot be tackled with purely discriminative models.<\/li>\n<\/ul>\n\n\n\n<div class=\"wp-block-spacer\" style=\"height: 36px;\" aria-hidden=\"true\">\u00a0<\/div>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-spacer\" style=\"height: 36px;\" aria-hidden=\"true\">\u00a0<\/div>\n\n\n\n<p>This is just the start of the conversation. 
If you\u2019d like to explore how Parashift can support your automation journey, don\u2019t hesitate to <strong><a href=\"https:\/\/parashift.ai\/en\/contact\/\"><em>contact us<\/em><\/a><\/strong>.<\/p>\n\n\n\n<div class=\"wp-block-buttons is-content-justification-center is-layout-flex wp-container-core-buttons-is-layout-16018d1d wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button is-style-outline is-style-outline--1\"><a class=\"wp-block-button__link wp-element-button\" href=\"https:\/\/parashift.ai\/book-demo\/\">Try Parashift free<\/a><\/div>\n<\/div>\n\n\n\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Not all AI is created equal. The models behind the scenes decide whether your document automation saves you time\u2026 or creates new headaches. Our ML team wrote an article that breaks it down. <\/p>\n","protected":false},"author":5,"featured_media":27884,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[74],"tags":[],"class_list":["post-30900","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-parashift-platform"],"_links":{"self":[{"href":"https:\/\/parashift.ai\/en\/wp-json\/wp\/v2\/posts\/30900","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/parashift.ai\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/parashift.ai\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/parashift.ai\/en\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/parashift.ai\/en\/wp-json\/wp\/v2\/comments?post=30900"}],"version-history":[{"count":2,"href":"https:\/\/parashift.ai\/en\/wp-json\/wp\/v2\/posts\/30900\/revisions"}],"predecessor-version":[{"id":31337,"href":"https:\/\/parashift.ai\/en\/wp-json\/wp\/v2\/posts\/30900\/revisions\/31337"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/parashift.ai\/en\/wp-jso
n\/wp\/v2\/elementor_library\/27884"}],"wp:attachment":[{"href":"https:\/\/parashift.ai\/en\/wp-json\/wp\/v2\/media?parent=30900"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/parashift.ai\/en\/wp-json\/wp\/v2\/categories?post=30900"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/parashift.ai\/en\/wp-json\/wp\/v2\/tags?post=30900"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}