Data & Knowledge Engineering 70 (2011) 483–503
Contents lists available at ScienceDirect
Journal homepage: www.elsevier.com/locate/datak

Recommendation-based editor for business process modeling

Agnes Koschmider a,⁎, Thomas Hornung b, Andreas Oberweis a
a Institute of Applied Informatics and Formal Description Methods, Karlsruhe Institute of Technology, Germany
b Institute of Computer Science, Albert-Ludwigs University, Freiburg, Germany
⁎ Corresponding author.
E-mail addresses: agnes.koschmider@kit.edu (A. Koschmider), hornungt@informatik.uni-freiburg.de (T. Hornung), andreas.oberweis@kit.edu (A. Oberweis).
0169-023X/$ – see front matter © 2011 Elsevier B.V. All rights reserved.
doi:10.1016/j.datak.2011.02.002

Article history: Received 7 January 2009; Received in revised form 7 February 2011; Accepted 7 February 2011; Available online 24 February 2011

Keywords: Recommender system; Process model search; Indexing; Ranking function; Process modeling support

Abstract

To ensure proper and efficient modeling of business processes, it is important to support users of process editors adequately. With only minimal modeling support, the productivity of novice business process modelers may be low when starting process modeling. In this article, we present a theoretically sound and empirically validated recommendation-based modeling support system, which covers different aspects of business process modeling. We consider basic functionality, such as an intuitive search interface, as well as advanced concepts like patterns observed in other users' preferences. Additionally, we propose a multitude of interaction possibilities with the recommendation system, e.g., different metrics that can be used in isolation or an overall recommender component that combines several sub metrics into one comprehensive score. We validate a prototype implementation of the recommendation system with exhaustive user experiments based on real-life process models. To our knowledge, this is the only comprehensive recommendation system for business process modeling that is available.

© 2011 Elsevier B.V. All rights reserved.

1. Introduction

Although most business process models nowadays are created with the help of graphic editors, the learning curve for inexperienced users is still very steep [61]. Pure awareness of the modeling language syntax is often insufficient. A profound working knowledge of the user is required in order to efficiently and effectively apply a modeling language in practice. This is confirmed by [8], who posit that the main driver for successful process modeling is the user's modeling expertise. To increase user productivity most of the currently available modeling tools focus on providing a repository of graphical symbols and advanced visualization techniques. However, there is room for improvement, and a full-fledged modeling support system should focus on retaining high fidelity to the user's modeling intentions.

1.1. Problem description

Currently, business process modelers can choose between a variety of formal and semi-formal modeling languages and standards, e.g., [4,50,69], for which there exists a multitude of different modeling tools. Usually, these tools provide a simple repository of graphical symbols, which represent the building blocks of the underlying modeling formalisms. However, during process modeling there is a lack of specific user support, i.e., no suggestions are provided by the system on how to finish appropriately an already started business process model. New support tools that assist the user at modeling time are required to improve the quality of process models and to increase the productivity of the modeler.

One of the main problems of suggesting appropriate process models to the user is to detect her modeling intention. A similar problem is tackled by recommender systems. Here, user preferences and opinions from individual users are collected and
aggregated. This information is then used to suggest appropriate items, e.g., books from a fixed collection to new users. For a recommendation system in the business process modeling domain, the following aspects are essential:
• the user's modeling intentions,
• the modeling context, and
• the modeling history of a related community of users.

The next section introduces a running example that is used in the remainder of the paper.

1.2. Running example

Let the following scenario be given as shown in Fig. 1. The user wants to model a process describing the handling of order requests. Her intention is to model this process from a customer perspective (this means that technical details can be neglected). Via a query interface, she can search for process model fragments concerning customer requests. The results of the query are displayed according to a ranking function, and she can then insert the desired process model fragment into the active workspace which best matches her modeling intention. Subsequently, if she is uncertain as to how to complete the process model she has two options: she can either search again via the query interface for fitting process model activities, or she can invoke the recommender component which automatically suggests appropriate process model parts for completing the model. Unlike the query component, the recommender component can only be invoked after she has already started modeling the business process.

In our example, the user has opted for the recommender component, which suggests (among others) the CustomerOrder process model for completion. If the user decides to insert this recommendation in her workspace, she can configure this process model by inserting or deleting elements. Finally, she can store the modified process model version in a process repository for future process model reuse.

Fig. 1. User interaction scenario for finding an appropriate process model part.

1.3. Contributions

This paper describes an empirically validated business process modeling editor, which assists users twofold in purpose-oriented modeling of processes. First, the user can search via a query interface for business process models or process model parts (logically coherent groups of elements belonging together, e.g., approval, billing or assembly). The user can save time by reusing already existing process model parts. Second, we use an automatic tagging mechanism in order to unveil the modeling intention of a user at process modeling time and to better meet the user's model requirements.

We validated the support system with two experiments using real-life business processes modeled with Petri nets and a prototype implementation. For the validation, Petri nets were serialized to PNML [72]. The first experiment focused on the feasibility and usefulness of the modeling support system. This experiment confirmed that users are willing to follow up recommendations and prefer reusing elements rather than modeling elements from scratch. In the second experiment, the focus was on the evaluation of the benefits in using the recommendation system. This evaluation confirmed that the recommendation
system is equally useful for different types of users and that the quality of the recommendations improves over time through user feedback. The evaluation results highlight additional benefits of the modeling support tool:
• the tagging-based system increases the quality of the process models by highlighting the corresponding process model parts that violate the correctness criteria (e.g., structural deadlocks, which occur if an alternative flow initiated by an OR-split is synchronized by an AND-join in a process model),
• the system overcomes the limitation of a controlled vocabulary for labeling process elements, and
• the process fragments are used with process vocabularies that might differ from the vocabulary used in the currently edited business process model.

The functionalities of the recommendation system have been implemented for Petri net-based process models. However, the generality of our approach makes it possible to apply the presented methods also for business processes modeled with other languages.

The remainder of the paper is structured as follows. Section 2 presents a tagging algorithm and the creation of the process repository index. In Section 3 we describe two modes of the search interface. Additionally, we extend the search functionality in order to consider relevant process models in case they do not conform exactly to the specified query. The cumulative ranking function and the complete recommendation algorithm are illustrated in Section 4. The theoretical underpinning for two empirical studies is presented in Section 5. The findings of the studies are reported in Section 6. Section 7 compares our approach with related work, and Section 8 concludes the paper with an outlook on future research.

2. Tagging of business process models

Each business process model in the repository is associated with metadata, which is used for the recommendation and search functionality. As foundation, we use an Information Retrieval-based tagging approach over the process activity/state descriptions. For more elaborate queries, we identify a set of relevant criteria which can be provided by users for each process model. The intuition is that new process models in the repository can be found based on the automatically acquired metadata (automatic tagging, cf. Section 2.1); over time, different users can refine the available information about a process, thus enabling higher-level queries (manual tagging, cf. Section 2.2). In the remainder of this paper we use the term 'tag' to identify a single word occurring in a metadata criterion.

2.1. Automatic tagging

The motivation of Information Retrieval is to be able to find an item (normally a text document) by providing the search engine with only a few keywords that adequately capture the intended information need. For this, first the significant keywords or tags that describe the desired item need to be acquired. Additionally, it is valuable if a rating can be imposed on these tags; e.g., for a text document this is usually done by counting the number of occurrences of each word in the text and assigning the word with the highest frequency the highest rating. This automatic indexing or tagging of documents is typically the basis for efficient retrieval by a search engine. While the assignment of tags to items is straightforward in the case of text documents (just use the top-k words with the highest frequency after stop word removal), it is less obvious how to identify the salient concepts of business process models and convert them into appropriate tags for later retrieval. In the following, we present our automatic tagging approach that is geared towards identifying the most descriptive tags for business process models.

The tag extraction and scoring for business process models is inspired by the Term and Document Frequency measure, which is relatively efficient to compute (cf. [23]). Each place and transition in a Petri net representation of a business process model is labeled with a description that specifies the purpose of each process activity or state, respectively. This means we can regard each word of these descriptions as a tag candidate for the business process model (or item, respectively). More generally, we associate with each business process model a tag characterization of the form $[a_1 \rightarrow T_1, \ldots, a_n \rightarrow T_n]$, where the $a_i$ reflect the attributes of the process model that are searchable later on (cf. Section 3), $T_i$ is the set of associated indexed tags, and $n$ is the number of indexed attributes of the business process model. For each business process model we index the place and transition descriptions plus additional metadata criteria, as described in the remainder of this section. This allows us to use standard Information Retrieval techniques (cf. [58]) to build up an index over business process models.

We first remove common English words from the set of tag candidates because they appear so often in a typical natural language corpus that they do not convey any meaning specific to the business process. This phenomenon is often referred to as Zipf's law, which states that the frequency of any word is inversely proportional to its rank in the frequency table [76]. After stop word removal, each keyword is assigned a tag score for this business process model based on a modified version of the tf*idf (term frequency*inverse document frequency) metric:

\[
\mathrm{TagScore}(t_i) := \frac{\mathrm{TF}(t_i)}{\sum_{j=1}^{N} t_j} \times \log\left(\frac{|P|}{|\{p_j \mid t_i \text{ is a tag in } p_j\}|}\right).
\]

Here, $\mathrm{TF}(t_i)$ is the frequency of the tag $t_i$ in transition or place labels, $N$ is the total number of distinct tag candidates (after stop word removal), $|P|$ denotes the total number of indexed business process models, and $|\{p_j \mid t_i \text{ is a tag in } p_j\}|$ is the number of business
process models, where the tag $t_i$ appears. The purpose of the idf part $\log\left(|P| / |\{p_j \mid t_i \text{ is a tag in } p_j\}|\right)$ is to decrease the impact of common words that all business process models have in common.

[26] observed that people often use a surprisingly great variety of words to refer to the same thing. In order to bridge the gap between different modeling vocabularies, we determine for each keyword the set of synonyms via an extended structure of WordNet (http://wordnet.princeton.edu/) and assign the same tag score to each word in the synonym set. For this practical purpose, we index for each relevant attribute two versions: one with WordNet information included and one without. Based on the extended structure of tags, the system can also determine homonyms (two terms having the same pronunciation, but with different meaning) and tags with different abstraction levels. To uncover homonyms and different abstraction levels of tags, we use the similarity measures presented in [24].

We can always extract the abovementioned tags from the business process model because they form an intrinsic part of the business process model. In the next section, we present a set of additional metadata criteria with which the user can enhance the tag characterization of a process model. Each metadata criterion corresponds to an attribute of the tag characterization and thus can be searched either in isolation or in conjunction with other attributes, such as the default attributes mentioned above. Additionally, both the search and the recommender component can work solely on automatically acquired data.

2.2. Manual tagging

Apart from the whole process model, users can identify coherent parts within a business process model, e.g., order approval, complaints handling, or order receipt. We index these parts in the same way as if they were regular business process models and additionally store a pointer to the business process model with which they are associated. For example, for a business process model that consists of three distinct process model parts, we would include four tag characterizations in our index: one for the whole process model, and one for each of the three parts as well. Fig. 2 shows the process model description window via which the insertion of the following metadata criteria is allowed:
• Process name: each business process model or part can be identified with a describing label (e.g., customer order),
• Purpose: the purpose fulfilled by this process model (part): analysis, documentation, execution or re-engineering (if required, the user can annotate more purpose criteria),
• Objective description: the objective fulfilled by this process model (e.g., modeling handling of an order request),
• Process description: a textual description of the process model (part), and
• Property: a property that results from practical modeling experiences, e.g., standard denotes a standardized process. If required, the user can introduce more annotation properties.

Fig. 2. Process description window.
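The automatic tag scoring of Section 2.1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop word list, the function names and the toy repository are hypothetical, and the denominator of the TagScore formula is read here as the summed frequency of all N tag candidates of the model, which is an assumption. WordNet synonym expansion is omitted.

```python
import math
from collections import Counter

# Hypothetical stop word list; the paper does not name the list it uses.
STOP_WORDS = {"a", "an", "and", "for", "is", "of", "the", "to"}


def tag_candidates(labels):
    """Split place/transition labels into lowercase words without stop words."""
    return [w for label in labels for w in label.lower().split()
            if w not in STOP_WORDS]


def tag_scores(model_name, repository):
    """Score the tags of one indexed model against the whole repository.

    repository maps a model name to the list of its place/transition labels.
    """
    tf = Counter(tag_candidates(repository[model_name]))
    # Assumed reading of the denominator: total frequency of all tag candidates.
    total = sum(tf.values())
    tag_sets = [set(tag_candidates(labels)) for labels in repository.values()]
    scores = {}
    for tag, freq in tf.items():
        # Number of indexed models in which the tag appears (at least 1,
        # because the scored model itself is part of the repository).
        df = sum(1 for tags in tag_sets if tag in tags)
        scores[tag] = (freq / total) * math.log(len(tag_sets) / df)
    return scores


# Toy repository with made-up labels, for illustration only.
repository = {
    "CustomerOrder": ["receive order", "check order", "approve order"],
    "Billing": ["create invoice", "send invoice"],
    "Complaint": ["receive complaint", "check complaint"],
}
scores = tag_scores("CustomerOrder", repository)
for tag, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{tag}: {score:.3f}")
```

In this toy setting, a tag that is frequent in one model but rare across the repository (here "order") ranks above tags shared by several models (here "receive" and "check"), which is the effect the idf part is meant to achieve.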
These metadata criteria are annotated for process models or process model parts. In addition to this annotation, the user can annotate each process model activity with the following metadata:
• cost: the costs for the design of a process activity, and
• quality: the quality of the design of a process activity.
Each of these metadata types is indexed and can be used for the query-based retrieval of business process models and process model parts, as presented in the next section.

Example. Consider a process model part dealing with order approval. After the automatic tagging phase the corresponding tag characterization looks as follows:

[description→{"order", "approval", …}, …, 1st_el→{"check", "order", …}].

Now, the user additionally tags the attribute purpose, yielding the final tag characterization:

[description→{…}, …, 1st_el→{…}, purpose→{"execution"}].

3. Searching for process model parts

The modeler can use the search functionality at any stage of process modeling. She can choose whether she wants to search for process model parts, whole business process models, or both. For practical reasons, we divide our search interface into two modes: 1) the most common search options and the underlying retrieval model, as discussed in Section 3.1, and 2) the more elaborate and less frequently used options, as presented in Section 3.2. In Section 3.3 we introduce a method to suggest related process models even though they do not conform exactly to the specified query.

3.1. Basic search

The implementation of the index and search functionality is based on the open source search engine Lucene.3 Lucene's data model is based on so-called documents, which contain fields that have a name and a set of associated values that can be indexed, i.e.
to align it with the introduced syntax for tag characterizations, a Lucene document can be represented as [f1→V1, …, fn→Vn], where the fi are the names of the fields and the Vi the associated sets of values. Thus, the definition of tag characterizations can be mapped one-to-one to Lucene documents: each attribute ai is mapped to a field name fi and the set of associated tags Ti is mapped to the set of values Vi. The tag characterizations, i.e. the business process models, can then later be searched by providing search keywords for one attribute in isolation or for several attributes at the same time. The results of a query are scored by a mixture of the Vector Space Model and the Boolean Model (cf. [58]).

Continuing our example from Section 1.2, the user is searching for both process model parts and entire business process models describing customer approvals and orders. In this context she decides to use the query interface, as shown in Fig. 3. Additionally, the user activates WordNet in order to suggest process models whose process model objects have been labeled with respect to a different vocabulary. For each of the free text fields the user can use the standard Boolean operators AND, OR, and NOT. Additionally, she can pose wildcard queries and perform fuzzy searches based on the Levenshtein distance (edit distance) [18].

The quality of the search results correlates positively with the metadata criteria introduced in the previous section. In our example, the search for a model with a documentation purpose thus requires a corresponding annotation of models beforehand. The first element and last element search fields were not mentioned earlier; they are acquired automatically from the labels of the first and last element(s) of the process model (part), which are converted into the corresponding attributes of the tag characterization.
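The field-based retrieval and the fuzzy matching described above can be illustrated with a small, self-contained sketch. It does not use Lucene and does not reproduce Lucene's scoring; it only assumes that a tag characterization is a mapping from field names to sets of values, that a query names one keyword per field, and that all queried fields must match (AND semantics). The function names and the second index entry ("invoice archive") are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters are equal)
            ))
        prev = cur
    return prev[-1]


def matches(values, keyword, max_edits=0):
    """True if some value is within max_edits edits of the keyword."""
    return any(levenshtein(v, keyword) <= max_edits for v in values)


def search(index, query, max_edits=0):
    """Names of entries in which every queried field matches its keyword."""
    return [
        name for name, doc in index.items()
        if all(matches(doc.get(f, set()), kw, max_edits) for f, kw in query.items())
    ]


# Two illustrative tag characterizations (field name -> set of values).
index = {
    "order approval": {
        "description": {"order", "approval"},
        "1st_el": {"check", "order"},
        "purpose": {"execution"},
    },
    "invoice archive": {
        "description": {"invoice", "archive"},
        "purpose": {"documentation"},
    },
}

exact = search(index, {"purpose": "documentation"})
fuzzy = search(index, {"description": "aproval"}, max_edits=1)
combined = search(index, {"description": "order", "purpose": "execution"})
```

In this sketch the misspelled keyword "aproval" only finds the order approval part when one edit is tolerated, and the purpose query only succeeds because the purpose attribute was annotated beforehand, which mirrors the dependence of result quality on the metadata.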
The advantage of the first and last element search criteria is that modelers can search for a specific input and output of the process model. For instance, the modeler may be interested in process models that start with send request.

3.2. Extended search

To cover the range of possible designs and not to overlook variants of business processes, the process modeler can activate optional search criteria that further limit the query results (see Fig. 4).

3.2.1. Process design cost and quality

To calculate the cost and the quality of a process design we adapted the functions presented in [32] for a generic process design. A process design P, a business process model in our context, is a set of ordered pairs $(a_i, \theta_i)$:

$P = \{(a_i, \theta_i)\}$

3 http://lucene.apache.org.