The Enterprise Knowledge Graph is a disruptive platform that combines emerging Big Data and Graph technologies to reinvent knowledge management inside organizations. This platform aims to organize and distribute the organization’s knowledge and to make it centralized and universally accessible to every employee. The Enterprise Knowledge Graph is a central place to structure, simplify and connect the knowledge of an organization. By removing complexity, the knowledge graph brings more transparency, openness and simplicity into organizations. That leads to democratized communications and empowers individuals to share knowledge and to make decisions based on comprehensive knowledge. This platform can change the way we work, challenge the traditional hierarchical approach to getting work done and help to unleash human potential!

Problem

“Today knowledge has power. It controls access to opportunity and advancement.” ~ Peter Drucker

As we all know, the nature of business is changing rapidly. The economy relies more and more on data, information and knowledge, and these are the most important resources of organizations as well. In order to engage all employees in knowledge sharing and collaboration, many organizations adopt the emerging social networking trend and offer new types of internal tools, including social networks, collaboration networks, wikis, blogging systems and document and media shares. However, these applications generate an overwhelming amount of mostly unstructured enterprise data hoarded in various isolated systems across the organization. The result is an increasingly complex infrastructure consisting of many data silos with redundant, duplicated and outdated information. This makes it complicated to find the right information and get valuable insights. Today’s organizations struggle to manage the huge amount of data in a way that enables efficient tools for searching, analyzing, sharing and filtering. With too many emails and information channels, information overload has become a serious problem. Employees cannot just focus on their work; instead, they lose time doing work about work.

What we need is a new platform to accommodate the increased data management needs, to tackle information flow, communication and data infrastructure issues and to enable the next generation of more efficient tools for searching, sharing, filtering and analyzing data. We need one central place to structure, simplify and connect the knowledge of an organization. No human is able to read and analyze this immense amount of ever-growing knowledge. This raises the question: why not use emerging Big Data technologies on an organization’s most valuable resource, its knowledge?

Solution

Do you want to spend less time searching and asking for the right information and instead focus on the creative part of your work? Do you ever feel that you have wasted too much time on emails and meetings just to get one small missing piece of information? The Enterprise Knowledge Graph (EKG) offers a solution and could truly transform the way knowledge workers do their work by improving the information flow through transparency and simplicity. Furthermore, it is designed from the ground up to help people work together and find the right information as fast as possible. The Enterprise Knowledge Graph is a next-generation platform for organizing and distributing the organization’s knowledge and making it centralized and universally accessible, so that this knowledge can achieve its full potential. In the context of the growing Social Media and Big Data trends, this platform is an ideal fit for the evolving knowledge economy. The EKG offers the entire infrastructure to aggregate, store, organize, interconnect, analyze and visualize the increasing amount of enterprise data.

Bold words … but how does this graph thing actually work?

Every company is organized in a unique network of clearly identifiable entities with valuable connections between them. Some examples of entities are:

employee, content (e.g. documents, presentations, wiki pages), product, location (e.g. building, city), project, department, group, event, process, customer, company-related term …

The Enterprise Knowledge Graph aims to replicate this unique network with most of its relevant entities into a graph database (organizational and content structure). Moreover, the EKG uses advanced data analysis technologies, machine-learning, graph operations and semantic methods to enhance the data quality and transform unstructured data into knowledge. Therefore, this platform can be used for turning straw into gold by aggregating, analyzing and interconnecting the enterprise data across multiple disparate data stores. By using Big Data technologies such as text analysis, new implicit connections can be formed between entities. This is a key feature of the EKG, since it reveals additional insights (that would not be discovered otherwise). The graph allows machines to understand the organization structure and almost everything the organization is working on. The EKG provides a unified and transparent information structure and keeps users from drowning in an ocean of unconnected and meaningless information.
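To make the idea of entities and connections concrete, here is a minimal in-memory sketch of such a graph. It is only an illustration and not the EKG implementation: the class names, the relation labels ("authored", "mentions") and the sample entities are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A node in the graph; `kind` and `props` are illustrative fields."""
    id: str
    kind: str                                  # e.g. "employee", "document", "project"
    props: dict = field(default_factory=dict)


class KnowledgeGraph:
    """Tiny undirected, labelled graph kept in memory."""

    def __init__(self):
        self.entities = {}
        self.edges = defaultdict(set)          # entity id -> {(relation, other id)}

    def add_entity(self, entity):
        self.entities[entity.id] = entity

    def connect(self, source, relation, target):
        # Store the edge on both ends so it can be found from either entity.
        self.edges[source].add((relation, target))
        self.edges[target].add((relation, source))

    def neighbours(self, entity_id, relation=None):
        # relation=None returns all neighbours, otherwise only that relation.
        return sorted(t for r, t in self.edges[entity_id] if relation in (None, r))


graph = KnowledgeGraph()
graph.add_entity(Entity("e1", "employee", {"name": "Alice"}))
graph.add_entity(Entity("d1", "document", {"title": "Roadmap"}))
graph.add_entity(Entity("p1", "project", {"name": "Apollo"}))
graph.connect("e1", "authored", "d1")          # explicit connection from a source system
graph.connect("d1", "mentions", "p1")          # implicit connection, e.g. found by text analysis
print(graph.neighbours("d1"))                  # ['e1', 'p1']
```

The second edge stands for the kind of implicit connection the text describes: nobody recorded that the document belongs to the project, but an analysis step could add that link.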

Let’s get into some more technical details:

The Enterprise Knowledge Graph combines some of SAP’s hottest technologies such as HANA, HANA Text Analysis, HANA Graph Engine, Fuzzy Search and HANA Predictive Analytics. In fact, the EKG uses the HANA Graph Engine (an additional layer on top of the HANA database) and leverages other technologies in HANA to analyze and structure the data. A graph is a highly flexible and schema-less data structure and, therefore, capable of adapting to all kinds of changes. The flexibility of the graph structure to adapt to changes can help companies to scale and distribute globally and still maintain a transparent and simple structure. The EKG also contains a comprehensive corporate taxonomy (acronyms, product names, glossary entries and other terms) in order to understand the organization and analyze the content.

The data in the knowledge graph is aggregated from many internal as well as external data sources. It is crucial to only include internally public data and to exclude any private or confidential information. Furthermore, the graph does not store the actual content of a document but rather some important metadata, including the path to the document. This ensures that the EKG is not a collection of replicated data from other systems, but rather a collection of connected entities linking to the actual content.
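The following sketch illustrates this "metadata plus path" idea under a few assumptions: the field names, the visibility label "internal_public" and the sample records are hypothetical and only serve the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentNode:
    """What the graph keeps for a document: metadata and a link, no body text."""
    title: str
    author: str
    path: str            # where the actual content lives in the source system


def to_graph_nodes(source_records):
    """Keep internally public records only and drop the document body."""
    nodes = []
    for record in source_records:
        if record.get("visibility") != "internal_public":
            continue     # private or confidential records never enter the graph
        nodes.append(DocumentNode(record["title"], record["author"], record["path"]))
    return nodes


records = [
    {"title": "Roadmap", "author": "alice", "path": "/docs/roadmap",
     "visibility": "internal_public", "body": "long document text ..."},
    {"title": "Salary review", "author": "bob", "path": "/hr/review",
     "visibility": "confidential", "body": "..."},
]
for node in to_graph_nodes(records):
    print(node)
```

Only the first record becomes a node, and the node carries no copy of the document text.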

To sum up, the Enterprise Knowledge Graph shall not be a replacement for existing knowledge management systems, data warehouses and other data stores, but rather a complement to these systems. Further, the EKG acts as an interconnecting and bridging component between existing tools and harmonizes the way knowledge is accessed.

Good to know … but what can I do with this knowledge graph?

There is a vast number of innovative use cases and applications that could be built on top of the Enterprise Knowledge Graph platform. This platform is not focused on replacing any tool, but rather on connecting and enhancing existing ones as well as enabling the next generation of innovative applications for knowledge management. Here are a few examples:

Enterprise Recommendation Engine: Filtering and recommending relevant information (e.g. content, experts, news) based on the context of the user in the knowledge graph (e.g. interest, expertise, other relations). Relevant knowledge therefore reaches the employee without the employee having to search for it or even know about its existence. As a result, with such a system ‘knowledge seeks people’ rather than ‘people seek knowledge’. Further, it reduces information overload.
Duplication Detector: Find and prevent duplicate content, projects and work.
Intelligent Business Assistant: A smart work assistant that knows about the employee’s context (connections, tasks and created content) and delivers the right information to the right people at the right time.
Enhanced Enterprise Search Features: Personalized search (by using the employee’s context from the graph), query suggestions, summarized information panel (like in Google Search for well known entities) and semantic search.
Social Network Analysis: Obtain real and valuable insights into the organization's knowledge and structure (e.g. what topics are trending in a specific location or around you, how good the connections between support and development of a specific product are, how information and knowledge spread inside the organization).
Enhanced Employee Profile: Enhance employee's online profile by adding information from the knowledge graph such as contributions from various sources, related employees and mined expertise & interests.
Expert Locator: Identifies experts by analyzing their contributions (expertise and interest mining). Employees can search for experts on specific topics.
Enterprise Gamification System: Calculates a score based on the knowledge activity (contributions) across multiple employee network tools in the organization. Therefore, active employees in knowledge sharing can get rewards.
Automatic Tagging: Utilizing data analysis and machine learning technologies to automatically annotate any entity in the knowledge graph with company-related tags from a predefined dictionary.
Context Awareness in Applications: Utilize the context information from the knowledge graph to support and react to the user.
Offering easy and consistent access to organization’s knowledge empowers employees to imagine and create new innovative applications on top of the EKG platform.
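As one illustration of how small such an application can be once the data is connected, here is a hedged sketch of the Expert Locator idea. It simply counts tagged contributions per employee; the input format, the function name and the sample data are assumptions for this example, and a real expertise-mining component would be more involved.

```python
from collections import Counter


def locate_experts(contributions, topic, top_n=3):
    """Rank employees by how many of their contributions carry `topic`.

    `contributions` is an iterable of (employee, set_of_tags) pairs.
    """
    counts = Counter()
    for employee, tags in contributions:
        if topic in tags:
            counts[employee] += 1
    return counts.most_common(top_n)


contributions = [
    ("alice", {"hana", "graph"}),
    ("bob", {"hana"}),
    ("alice", {"hana"}),
    ("carol", {"ui"}),
]
print(locate_experts(contributions, "hana"))   # [('alice', 2), ('bob', 1)]
```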

These applications could ultimately increase employees’ productivity, motivation, collaboration and satisfaction. The following section provides a more detailed explanation of the Enhanced Enterprise Search:

Most people use search engines, such as Google or Bing, as an entry point for most activities on the Internet. However, most organizations do not provide search tools with similar capabilities inside the enterprise Intranet. We believe that it is mandatory to have one single entry point to search, explore and find anything related to the organization. The Enterprise Knowledge Graph has the potential to add powerful functionality to current enterprise search. The knowledge graph provides information about related entities and can be used to find and explore experts, content or other specific entities (e.g. products, processes, locations). Further, the structured knowledge in the EKG can be utilized for advanced search capabilities such as natural language processing, and to provide semantic capabilities by understanding the contextual meaning of a search query. In particular, the EKG provides the context of the user (for personalized results) and the semantic meaning and context of mentioned entities. This makes the EKG an essential building block for implementing advanced semantic capabilities in enterprise search systems.
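A rough sketch of what "personalized results" could mean in practice: results from an existing search back-end are re-ranked with a small boost for every topic they share with the user's context from the graph. The scoring formula, the boost value and the field names are assumptions for illustration, not a description of how the EKG ranks results.

```python
def personalize(results, user_topics, boost=0.5):
    """Re-rank search results using the topics linked to the user in the graph.

    `results` is a list of dicts with 'title', 'score' and 'topics' keys.
    """
    def adjusted(result):
        overlap = len(set(result["topics"]) & set(user_topics))
        return result["score"] + boost * overlap

    return sorted(results, key=adjusted, reverse=True)


results = [
    {"title": "Generic HANA intro", "score": 1.0, "topics": ["hana"]},
    {"title": "Graph engine notes", "score": 0.8, "topics": ["hana", "graph"]},
]
ranked = personalize(results, user_topics={"hana", "graph"})
print([r["title"] for r in ranked])   # ['Graph engine notes', 'Generic HANA intro']
```

Without any user context the original order of the search back-end is preserved, which matches the idea that the graph enhances an existing search rather than replacing it.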

In the shown mock-up, the search query is resolved with the classic list of search results as well as a horizontal list, on top of the classic search results, containing structured and relevant entities. By selecting an entity, summarized and detailed information about the searched entity and related connections to other entities are displayed in the information panel (on the right). The goal is that employees would be able to use this information to resolve their search query without navigating to other sites for assembling the information themselves. This concept is comparable to the knowledge graph display integrated into Google Search. The horizontal list and the information panel are embedded components provided by the Enterprise Knowledge Graph and thus the enterprise search does not require any change in the search back-end. In addition, an editing functionality directly on the information panel would allow employees to maintain and enhance the graph data.

Practical Impact

By removing complexity, breaking down barriers and connecting employees among each other and with entities, responsiveness and engagement increase. This ultimately brings more transparency, openness and simplicity into organizations. That leads to democratized communications and empowers individuals to make decisions based on comprehensive knowledge. Further, it provides one central repository to connect the organization’s knowledge and structure.

Moreover, it encourages all employees to share knowledge with the organization, because they know that the graph will recommend the knowledge to the right person. A transparent and responsive organization is the key factor to enable knowledge sharing. The Enterprise Knowledge Graph makes organizations possible where …

… influence comes from sharing information, not from hoarding it.

… power is the product of contribution rather than position.

… communities form spontaneously around shared interests.

… coordination happens without centralized management.

… the wisdom of the many trumps the authority of the few.

The Enterprise Knowledge Graph is a disruptive platform that combines emerging Big Data technologies to reinvent knowledge management inside organizations. Therefore, we believe that the EKG can change the way we work, challenge the traditional hierarchical approach to get work done and help to unleash human potential!

Challenges

The successful integration of the EKG into the employee network depends on some success-critical factors and faces several challenges.

An infrastructure of several social media and knowledge management tools as well as enough accessible datasets is a substantial foundation for the EKG. This platform can only aggregate and interconnect data that are already available in various systems.

The data quality of these data sources is also an essential factor. In particular, the EKG benefits hugely from non-redundant, up-to-date and complete data sets. Further, some entity types should be replicated as a preferably complete collection, such as a full list of employees (from an address book).

After the aggregation, data maintenance mechanisms are essential to ensure high data quality, keep the knowledge graph up to date and prevent data redundancy. To solve this challenge, we suggest combining two approaches: automatic maintenance by software agents, which regularly re-aggregate and update the data from connected data sources, and manual maintenance by crowdsourcing, which allows employees to collaboratively maintain, correct and enhance the knowledge graph data in a Wikipedia-like interface.
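The automatic part of this maintenance could look roughly like the sketch below: an agent revisits entities whose last synchronization is older than a threshold and overwrites their metadata if the source reports something different. The entity layout, the one-day threshold and the `fetch_latest` callback are hypothetical and stand in for real connectors.

```python
from datetime import datetime, timedelta


def refresh_stale(entities, fetch_latest, now, max_age=timedelta(days=1)):
    """Re-fetch entities that were not synchronized within `max_age`.

    Returns the ids of entities whose metadata actually changed.
    """
    updated = []
    for entity in entities:
        if now - entity["synced_at"] < max_age:
            continue                              # still fresh, skip
        latest = fetch_latest(entity["source"], entity["id"])
        if latest != entity["metadata"]:
            entity["metadata"] = latest
            updated.append(entity["id"])
        entity["synced_at"] = now                 # mark as checked either way
    return updated


def fake_fetch(source, entity_id):
    # Stand-in for a call to the real source system.
    return {"title": "New title"}


now = datetime(2024, 1, 10)
entities = [
    {"id": "doc-1", "source": "wiki", "metadata": {"title": "Old title"},
     "synced_at": datetime(2024, 1, 1)},
    {"id": "doc-2", "source": "wiki", "metadata": {"title": "Fresh"},
     "synced_at": datetime(2024, 1, 10)},
]
print(refresh_stale(entities, fake_fetch, now))   # ['doc-1']
```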

Moreover, the Enterprise Knowledge Graph also requires a comprehensive dictionary (collection of organization-related terms), otherwise it is difficult to understand work-related content. If the organization does not provide a manually maintained dictionary, this taxonomy could also be created automatically by using machine-learning techniques such as topic modelling.

Much of the knowledge inside companies still resides in the minds of employees. Therefore, it is crucial to shape a work environment where people are willing to be more transparent about their knowledge, to engage in knowledge sharing and to participate in enterprise social media applications. This means shifting conversations from the isolated email environment into an internal social network where everybody can profit from them. This is important, since the EKG only aggregates internally public information and therefore depends on shared data.

The aggregation, analysis and interconnection of personal data from disparate data sources is an essential concept of the Enterprise Knowledge Graph. However, this is in conflict with the privacy regulations of many countries and companies. To comply with these regulations, appropriate data protection and privacy mechanisms have to be established in this platform. We suggest implementing them in the core data access operations with a combination of industry-standard security technologies. As a result, only identifiable employees with valid authentication are able to access and consume specific information. Those restrictions shall be aligned with the general interests of employees. The aggregation should be restricted to internally public data (information any employee inside the company is authorized to access). Private information (e.g. emails, chat transcripts) as well as sensitive personal information shall not be aggregated into the knowledge graph without the explicit consent and control of the respective user. This platform shall also provide privacy control mechanisms that allow employees to inspect, modify, block or delete their personal information and associations at any time. Further, we suggest assembling a strong set of policies to ensure full transparency about the type of collected data and the planned use cases.
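A minimal sketch of how these rules could be expressed as a gate in front of the aggregation. The visibility label, the record fields and the `PrivacyPolicy` class are assumptions for illustration; a real implementation would sit on top of the company's authentication and authorization infrastructure.

```python
from dataclasses import dataclass, field


@dataclass
class PrivacyPolicy:
    consents: set = field(default_factory=set)   # (owner, source) pairs with explicit opt-in
    blocked: set = field(default_factory=set)    # record ids an owner asked to exclude

    def may_aggregate(self, record):
        """Decide whether a record may enter the knowledge graph."""
        if record["id"] in self.blocked:
            return False                         # the owner's block always wins
        if record["visibility"] == "internal_public":
            return True
        # Private or sensitive data requires the owner's explicit consent.
        return (record["owner"], record["source"]) in self.consents


policy = PrivacyPolicy(consents={("alice", "chat")}, blocked={"post-7"})
print(policy.may_aggregate({"id": "mail-1", "owner": "bob", "source": "email",
                            "visibility": "private"}))          # False
print(policy.may_aggregate({"id": "chat-3", "owner": "alice", "source": "chat",
                            "visibility": "private"}))          # True
print(policy.may_aggregate({"id": "post-7", "owner": "carol", "source": "wiki",
                            "visibility": "internal_public"}))  # False
```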

First Steps

For the implementation of the Enterprise Knowledge Graph platform, we defined the following requirements:

As a whole, a flexible architecture that enables the aggregation, storage, querying, processing and analysis of Big Data in a graph structure is the core prerequisite of this platform. Scalability to millions of nodes and edges for aggregating, storing and analyzing the data is essential. The data store has to be optimized for graph operations and should use a flexible and extensible data schema for organizing the information. The data should be stored in a machine-understandable form, allowing computers to traverse and reason on this structured knowledge. Easy integration of, and extensibility for, various types and formats of internal as well as external data sources is required. Further, the extraction and mapping functionality of the data aggregation shall handle both structured and unstructured information. In addition, the data aggregation should be able to monitor and regularly update entities, properties and associations by requesting the newest representation from the respective data source, and to re-aggregate if new data is available. The combination of various data analysis technologies makes it possible to constantly interconnect entities, enhance the data quality and transform unstructured data into knowledge. Furthermore, a recommendation component is needed that distributes and recommends the proper information from the knowledge graph to the right user based on a specific context. Moreover, the platform shall provide easy, comprehensive and consistent access to the data in the knowledge graph via APIs, so that applications and extensions can be built on this structured information. Users should be able to visualize, explore and collaboratively edit the information in the knowledge graph. Therefore, a graphical user interface that enables these kinds of interactions is a required component of this platform.

The platform architecture of our first EKG prototype is separated into several modules as illustrated below:

The Graph Store is the core component of the Enterprise Knowledge Graph with the task to store the interconnected data. The Graph Store is implemented with HANA Graph, an additional layer on top of the HANA database optimized for graph operations. HANA Graph aims at providing a platform for efficiently managing, integrating, and analyzing structured, semi-structured, and unstructured information. An advantage of HANA Graph is that the schema of the data is not completely defined upfront in a rigid way, but evolves as new entities are created, new properties are added, and new associations between entities are established. Therefore, the schema can be changed at any time and entities do not have to match a fixed schema. HANA Graph has been tested and evaluated with several million nodes and edges. For data transactions, other modules use WIPE (the data manipulation and query language of HANA Graph) to store, update and retrieve data in the Graph Store. In addition, SQL can be used for managing the data directly in the underlying HANA database.
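The evolving-schema idea can be illustrated independently of HANA Graph with a small in-memory sketch. It deliberately does not use WIPE or any HANA API; the `FlexibleStore` class and its methods are hypothetical and only show how a schema can be derived from the data instead of being fixed upfront.

```python
from collections import defaultdict


class FlexibleStore:
    """Nodes carry arbitrary properties; the schema is whatever has been seen so far."""

    def __init__(self):
        self.nodes = {}
        self.schema = defaultdict(set)     # entity type -> property names observed

    def upsert(self, node_id, node_type, **props):
        node = self.nodes.setdefault(node_id, {"type": node_type})
        node.update(props)                 # new properties are simply added
        self.schema[node_type].update(props)
        return node


store = FlexibleStore()
store.upsert("e1", "employee", name="Alice")
store.upsert("e2", "employee", name="Bob", location="Walldorf")   # new property appears later
store.upsert("e1", "employee", expertise=["graph"])               # existing node is extended
print(sorted(store.schema["employee"]))    # ['expertise', 'location', 'name']
```

Neither the first nor the second employee had to match a predefined set of columns, which is the property the text highlights.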

The main task of the Data Aggregation Module is to connect to multiple internal and external data sources and to extract relevant structured and unstructured data from different data formats. For this purpose, the Data Aggregation Module provides a toolkit for handling various types of data sources, data formats and authentication methods. This module is implemented as an external Java framework (not integrated into HANA), which makes it possible to easily parallelize the aggregation and maintenance tasks on different servers. The combination of various libraries (Apache Tika, Jsoup, Gson, Rome, Bliki, Scribe, XOM and more) makes it possible to process and extract relevant information from the following data formats: JSON, XML, RSS, Wiki Markup, CSV, spreadsheets, HTML and many other document formats. To access internal data stored in the enterprise Intranet, the module uses SSO authentication; to retrieve data from some external sources, OAuth 2.0 authentication is supported as well. With these capabilities, the Aggregation Module is able to connect to about 30 internal and external data sources. After the aggregation, the unstructured data can also be analyzed by the Data Analysis Module to gain additional insights and transform it into structured information. In the next step, the entity and its metadata have to be mapped onto a partly defined enterprise data schema. Finally, the mapped entity is inserted into the Graph Store by using WIPE. The Data Aggregation Module also provides maintenance functionality that regularly selects and retrieves entities from the Graph Store and updates their metadata and relations by requesting the newest representation from the respective data source. This monitoring and re-aggregation of data from connected data sources is essential for an always up-to-date graph.
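The extract, map and insert steps can be sketched in a few lines. The actual module is a Java framework built on the libraries named above; the sketch below is a simplified stand-in that uses only the Python standard library, handles just JSON and CSV, and appends to a plain list instead of writing to the Graph Store. The field mapping and the sample sources are assumptions.

```python
import csv
import io
import json

# Hypothetical mapping from source-specific field names to a shared schema.
FIELD_MAP = {"name": "title", "headline": "title", "url": "path", "link": "path"}


def parse(raw, fmt):
    """Turn raw source text into a list of flat records."""
    if fmt == "json":
        data = json.loads(raw)
        return data if isinstance(data, list) else [data]
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(raw)))
    raise ValueError(f"unsupported format: {fmt}")


def map_record(record, entity_type):
    """Rename known fields; unknown fields are kept under their original name."""
    mapped = {"type": entity_type}
    for key, value in record.items():
        mapped[FIELD_MAP.get(key, key)] = value
    return mapped


def aggregate(sources):
    """sources: iterable of (raw_text, format, entity_type) triples."""
    graph_store = []                       # stand-in for the real Graph Store
    for raw, fmt, entity_type in sources:
        for record in parse(raw, fmt):
            graph_store.append(map_record(record, entity_type))
    return graph_store


sources = [
    ('[{"name": "Roadmap", "url": "/docs/roadmap"}]', "json", "document"),
    ("headline,link\nRelease notes,/wiki/release\n", "csv", "wikipage"),
]
for entity in aggregate(sources):
    print(entity)
```

Both sources end up with the same `title` and `path` properties, even though they used different field names and formats.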

The Data Analysis Module offers several technologies for data mining, natural language processing, machine learning and text analysis to analyze unstructured, semi-structured and structured data. Since most information in companies is stored in the form of unstructured text (e.g. documents, social network content, emails), natural language processing and text analysis are essential techniques for the EKG. SAP HANA provides a built-in text analysis toolkit that includes a suite of analyzers to construct text analysis applications for analyzing unstructured data. These analyzers make it possible to perform extraction and various forms of natural language processing on unstructured text. The extraction capability makes it possible to identify and extract well-known entities. The current Data Analysis Module utilizes the extraction analyzer from the HANA text analysis toolkit with a customized configuration. In particular, the analyzer is configured to only include terms defined in a custom dictionary for the extraction process. The custom dictionary is a user-defined repository of terms generated from the corporate taxonomy (dictionary with company-related terms). In addition, the EKG utilizes some predictive algorithms from the HANA Predictive Analysis Library for preprocessing (apriori), social network analysis (link prediction) and clustering (anomaly detection).
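Dictionary-restricted extraction can be approximated with a regular expression, as in the sketch below. This is not the HANA text analysis toolkit and does none of its linguistic processing; it only shows the principle of extracting exactly those terms that appear in a custom dictionary. The dictionary contents and the function names are assumptions.

```python
import re


def build_matcher(terms):
    # Longest terms first, so "HANA Graph" wins over its prefix "HANA".
    ordered = sorted(terms, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(term) for term in ordered) + r")\b"
    return re.compile(pattern, re.IGNORECASE)


def extract_terms(text, terms):
    """Return dictionary terms found in `text`, in order of first appearance."""
    canonical = {term.lower(): term for term in terms}
    found = []
    for match in build_matcher(terms).finditer(text):
        term = canonical[match.group(0).lower()]
        if term not in found:
            found.append(term)
    return found


dictionary = ["HANA", "HANA Graph", "EKG"]
text = "The EKG stores entities in HANA Graph and queries HANA directly."
print(extract_terms(text, dictionary))   # ['EKG', 'HANA Graph', 'HANA']
```

The extracted terms could then be attached to the analyzed entity as tags or turned into connections to the corresponding term entities.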

The Recommendation Module bundles various technologies and algorithms to generate context-based recommendations from the knowledge graph. Moreover, this module includes a hybrid recommendation method that uses graph algorithms to cluster, detect, rank and search entities in the EKG. The Recommendation Module implements this functionality with stored procedures that are using WIPE and SQL statements for data transactions with the Graph Store. The Knowledge Access Module utilizes those stored procedures to expose recommendation functionality via APIs to applications.
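As a simple example of a graph-based recommendation, the sketch below ranks entities that are two hops away from a user by the number of neighbours they share with that user. This common-neighbours heuristic is a stand-in chosen for illustration; it is not the hybrid method of the Recommendation Module, and it runs on a plain dictionary rather than on stored procedures in HANA.

```python
from collections import Counter


def recommend(adjacency, user, top_n=3):
    """Rank entities two hops away from `user` by number of shared neighbours.

    `adjacency` maps each entity id to the set of ids it is connected to.
    """
    direct = adjacency.get(user, set())
    scores = Counter()
    for neighbour in direct:
        for candidate in adjacency.get(neighbour, set()):
            if candidate != user and candidate not in direct:
                scores[candidate] += 1
    # Sort by score (descending), then by id so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [candidate for candidate, _ in ranked[:top_n]]


adjacency = {
    "alice": {"topic:graph", "topic:hana"},
    "topic:graph": {"alice", "doc1", "doc2"},
    "topic:hana": {"alice", "doc1"},
    "doc1": {"topic:graph", "topic:hana"},
    "doc2": {"topic:graph"},
}
print(recommend(adjacency, "alice"))   # ['doc1', 'doc2']
```

Here `doc1` is recommended first because it is connected to both of the user's topics, while `doc2` shares only one.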

The Knowledge Access Module exposes a collection of REST APIs to access and insert information in the EKG. Applications can leverage these APIs to dynamically access and explore the knowledge graph data. These APIs are implemented on the XSEngine, an application and web server integrated into HANA that makes it possible to build REST services and lightweight web pages. The benefit of the XSEngine is that it has an integrated and straightforward way to connect to HANA databases. The Knowledge Access Module is also highly important for security and privacy aspects. In fact, this module shall implement security mechanisms and privacy control at the core of all data transactions and is thus essential for complying with existing regulations. In addition, this module utilizes the Fuzzy Search in HANA to implement fault-tolerant search functionality for the knowledge graph.
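To show what fault-tolerant search means for the user, here is a small sketch based on `difflib` from the Python standard library. It is a stand-in for illustration only and unrelated to the HANA Fuzzy Search implementation; the entity labels and the cutoff value are assumptions.

```python
import difflib


def fuzzy_search(query, labels, limit=3, cutoff=0.6):
    """Return entity labels that approximately match `query`, best match first."""
    by_lower = {label.lower(): label for label in labels}   # case-insensitive lookup
    hits = difflib.get_close_matches(query.lower(), list(by_lower), n=limit, cutoff=cutoff)
    return [by_lower[hit] for hit in hits]


labels = ["Knowledge Graph", "Graph Engine", "Predictive Analytics"]
print(fuzzy_search("knowlege graf", labels))
```

A misspelled query such as "knowlege graf" still resolves to the "Knowledge Graph" entity, while a query that resembles none of the labels returns an empty list.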

Graphical user interfaces for exploring and editing the data in the knowledge graph are important components of the EKG infrastructure. The EKG Explorer, shown below, is a web-based graph visualization tool with an interactive and responsive user interface to visualize, explore and search information in the Enterprise Knowledge Graph. The visualization in the EKG Explorer is powered by sigma.js, a JavaScript library dedicated to graph drawing. There are many use cases for visualization tools, such as information flow modeling, social network exploration and duplication discovery.
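A visualization front end needs the graph data in a form it can draw. The sketch below exports nodes and edges as a JSON document with `nodes` and `edges` arrays and places the nodes on a circle as a trivial initial layout. The key names (`id`, `label`, `x`, `y`, `size`, `source`, `target`) are an assumption modeled on common node-link formats; the exact structure a given sigma.js version expects should be checked against its documentation.

```python
import json
import math


def to_node_link_json(nodes, edges):
    """nodes: list of (id, label) pairs; edges: list of (source_id, target_id) pairs."""
    count = max(len(nodes), 1)
    payload = {"nodes": [], "edges": []}
    for index, (node_id, label) in enumerate(nodes):
        angle = 2 * math.pi * index / count      # spread nodes evenly on a unit circle
        payload["nodes"].append({
            "id": node_id,
            "label": label,
            "x": round(math.cos(angle), 3),
            "y": round(math.sin(angle), 3),
            "size": 1,
        })
    for index, (source, target) in enumerate(edges):
        payload["edges"].append({"id": f"e{index}", "source": source, "target": target})
    return json.dumps(payload, indent=2)


print(to_node_link_json(
    nodes=[("e1", "Alice"), ("d1", "Roadmap"), ("p1", "Apollo")],
    edges=[("e1", "d1"), ("d1", "p1")],
))
```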