
Title

Concept Searching 

CompanyUrl

http://www.conceptsearching.com 

Summary

Highly accurate automatic semantic metadata generation, automated classification, and rich taxonomy tools to improve search and enable content to drive business processes.

SPG Sponsor Type

Standard 

Tab1 Name

Overview 

Tab1 Contents

Founded in 2002, Concept Searching is the only statistical metadata generation and classification software company in the world that uses concept extraction and compound term processing to significantly improve access to unstructured information.  The product was launched in 2003 and is delivered using a simple Web Services API with all data exchanged in XML and all document metadata held in an open relational database. 

conceptClassifier for SharePoint extends the capabilities of SharePoint and Microsoft Search products. It offers a technologically unique solution that addresses content and metadata management issues in enterprises and, at the same time, improves the overall search experience for knowledge workers.

Headquartered in the U.K. with offices in the U.S. and South Africa, Concept Searching solves the problem of finding, organizing, and managing information capital.

Tab2 Name

conceptClassifier 

Tab2 Contents

Concept Searching’s conceptClassifier for SharePoint technology platform provides advanced semantic metadata generation, compound term processing, automatic classification and taxonomy management, fully integrated with Microsoft technologies including SharePoint, Office, Exchange, Microsoft search products and FAST ESP.

The technology platform leverages compound term processing and stores the automatically generated semantic metadata directly within SharePoint properties, fully integrating the classification schemas with Microsoft Search products.  conceptClassifier for SharePoint greatly increases precision and relevancy and offers users either a faceted or a browsable taxonomy search experience.

Capabilities:

  • Downloadable in 30 minutes - no programming required
  • Fully SOA compliant, delivered as Web Parts, based on open standards
  • Runs natively in SharePoint
  • Does not need a separate index
  • Fully integrated with Content Types
  • Automatic Content Update during classification based on organizationally defined content
  • Fully respects inherent SharePoint security
  • Automated classification and automatic semantic metadata generation
  • Classification technology uses concept extraction and compound term processing
  • Automatic classification from within Office and Exchange
  • Automated classification from SharePoint, file stores and web sites
  • Robust suite of taxonomy tools proven to reduce the time to build and maintain taxonomies by 80%
  • Simple intuitive interface designed for the Subject Matter Expert to rapidly build and maintain corporate taxonomies
  • Automated Taxonomy Load feature to import industry standard taxonomies
  • Preview functionality with highlighted concepts from the search interface without launching the originating application
  • Provides a single search interface to SharePoint, internal repositories and web sites

Tab3 Name

Solutions 

Tab3 Contents

With the exponential increase in unstructured information, enterprises are seeking new ways not only to improve the search and retrieval process but also to manage, capitalize on, and leverage their information assets to improve organizational performance.  Classification is a critical component in managing enterprise content.  Content is the lifeblood of an enterprise: it is created by business processes, changed through business processes, and ultimately drives those processes.  Enterprises should be looking for a way not only to improve search outcomes but, fundamentally, to improve business outcomes.

 

conceptClassifier is unique in that it is a technology, not an application.  With native integration in SharePoint and Microsoft technologies, conceptClassifier can be used by any application that requires metadata.  The ability to accurately and automatically capture semantic metadata enables content to become a business driver that improves organizational performance, compliance, and data security.

 

As a horizontal solution, conceptClassifier is being used in Legal and eDiscovery, Records Management, Life Sciences, and the Public Sector, and to identify unknown potential privacy and data exposures.

 

Our clients have deployed the technology to solve a variety of content challenges including: support of a 24/7 Call Center, as a piece of a global proposal generation application, as a digital forensic and eDiscovery tool, as a research tool, to deliver brokered search across internal and external repositories, and for 'Know How' and 'Know Who' applications.

 

 

 

Tab4 Name

The Technology 

Tab4 Contents

Compound Term Processing

Concept Searching’s unique Compound Term Processing performs matching on the basis of compound terms as opposed to keywords.  Compound terms are built by combining two (or more) simple terms.  For example, ‘triple’ is a single word term but ‘triple heart bypass’ is a compound term.  The ambiguity in single word terms results in inefficient search.  For example, does the word ‘triple’ mean three or is it a baseball term?  Does ‘heart’ mean an organ or a center?  Is ‘bypass’ a highway or does it mean to avoid?  A traditional search query would return all documents that contain the word ‘triple’, all documents that contain ‘heart’, and all documents that contain ‘bypass’.

 

By identifying and forming compound (multi-word) terms and placing these in the search engine’s index, the search can be performed with a greater degree of accuracy because the ambiguity inherent in single words is no longer a problem.  A search for ‘survival rate after triple heart bypass’ will locate documents about this topic even if the precise phrase is not contained in any of the documents.
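
The idea of indexing compound terms rather than single words can be sketched in a few lines of Python. This is only a rough illustration, assuming that plain runs of two or three adjacent words stand in for compound terms; it is not Concept Searching's statistical method, and the function names and sample documents are hypothetical.

```python
from collections import defaultdict

def compound_terms(text, max_len=3):
    """Return every run of 2..max_len adjacent words as a compound term."""
    words = text.lower().split()
    terms = set()
    for n in range(2, max_len + 1):
        for i in range(len(words) - n + 1):
            terms.add(" ".join(words[i:i + n]))
    return terms

def build_index(docs):
    """Map each compound term to the ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in compound_terms(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Rank documents by the number of compound terms shared with the query."""
    scores = defaultdict(int)
    for term in compound_terms(query):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical sample documents, one per sense of the ambiguous words.
docs = {
    "cardiology": "patients recover well after a triple heart bypass operation",
    "baseball": "he hit a triple in the heart of the order",
    "roads": "the new highway bypass opened in triple digit heat",
}
index = build_index(docs)
results = search(index, "survival rate after triple heart bypass")
print(results)
```

In this sketch only the cardiology document is returned, even though it does not contain the full query phrase, because it shares the compound terms 'triple heart', 'heart bypass', and 'triple heart bypass' with the query. The baseball and highway documents share only single words with the query and are therefore not matched.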

 

The Metadata Issue

Metadata generation is a growing concern in large enterprises.  A comprehensive approach requires more than syntactic metadata (e.g., date, author, title), and requiring end users to add rich metadata is haphazard and subjective at best.  Because Concept Searching’s technology is not restricted to keyword identification, compound term metadata can be generated automatically when the content is either created or ingested.  Concept-based metadata generation extracts, from a document or corpus of documents, the compound terms and keywords that are highly correlated to a particular concept.  Because they reflect the most significant patterns in the text, these compound terms can then be used to generate non-subjective metadata based on an understanding of conceptual meaning.

 

The ability to identify ‘concepts in context’ generates far richer metadata, improving precision and relevancy in the information retrieval process.  Meta-tags are automatically added to the properties field of each document, making the document more valuable to the organization because it is easier to retrieve with Microsoft Search products, which use keywords and metadata to retrieve information.

 

Precision versus Recall

Precision and recall are the two key performance measures for information retrieval.  Precision measures how many of the retrieved items are relevant to the query.  Recall measures how many of the items relevant to the query are retrieved.  Yet most information retrieval technologies are less than 22% accurate for both precision and recall.  The ideal goal is to have them balanced.  Compound Term Processing has the ability to increase precision with no loss of recall.
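
As a small worked example of the two measures, the sketch below computes precision and recall for a single query using their standard definitions. The document ids and result list are invented for illustration and do not represent measured results from any product.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for one query.

    precision = relevant items retrieved / all items retrieved
    recall    = relevant items retrieved / all relevant items
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical query: four documents are truly relevant, and a keyword
# search returns eight documents, only two of which are relevant.
relevant = {"d1", "d2", "d3", "d4"}
keyword_results = ["d1", "d2", "d5", "d6", "d7", "d8", "d9", "d10"]
p, r = precision_recall(keyword_results, relevant)
print(p, r)  # 0.25 0.5
```

Here precision is 2/8 = 0.25 and recall is 2/4 = 0.5. Removing the six irrelevant results without losing the two relevant ones would raise precision to 1.0 while leaving recall unchanged, which is the kind of improvement the paragraph above describes.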

 

Managing Content

Taxonomy development and maintenance have traditionally been a laborious, costly, and ongoing challenge.  The most effective approach is rules-based categorization, which gives enterprises complete control of the rules-based descriptors unique to their organization.  Because all rules can be defined and managed, the error-prone results of the ‘training’ algorithms typically found in other approaches are eliminated.

 

A concept-based automatic classification process identifies, during indexing, the categories to which each document belongs.  Each category is identified by a unique descriptor and is associated with key descriptive words and/or phrases held in the database.  This approach enables a rapid implementation of a corporate taxonomy with all documents classified to multiple nodes at index time.  Ideally, the taxonomy can be used to browse the document collection or as a filter when running ad hoc searches.
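
The classification step described above can be sketched as follows, assuming a taxonomy held as a simple mapping from each category descriptor to its clue phrases. The category names, clue phrases, and plain substring matching are all hypothetical simplifications; the product's actual rules and weighting are not described in this profile.

```python
def classify(text, taxonomy):
    """Return every taxonomy node whose clue phrases appear in the text.

    The result maps each matching node to the clues that were found, so a
    single document can be classified to several nodes at once.
    """
    # Lowercase and collapse whitespace so clues match across line breaks.
    normalized = " ".join(text.lower().split())
    matches = {}
    for node, clues in taxonomy.items():
        # Naive substring test; a real system would match on term boundaries.
        found = [clue for clue in clues if clue in normalized]
        if found:
            matches[node] = found
    return matches

# Hypothetical taxonomy: descriptor -> descriptive phrases for that category.
taxonomy = {
    "Medicine/Cardiology": ["heart bypass", "cardiac surgery"],
    "Legal/eDiscovery": ["legal hold", "document review"],
    "Records Management": ["retention schedule", "document review"],
}
doc = "Document review found a retention schedule covering the legal hold."
nodes = classify(doc, taxonomy)
print(sorted(nodes))
```

The sample document is assigned to both 'Legal/eDiscovery' and 'Records Management', illustrating classification to multiple nodes. Because the rules are explicit data, a Subject Matter Expert could change a clue phrase and immediately see which documents move between nodes.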

 

An easy-to-use taxonomy and automatic classification tool creates the framework to classify content, based on concepts, to one or more nodes in the taxonomy.  Features that enable Subject Matter Experts to interact with the taxonomy can simplify ongoing maintenance.  For example, automatically generating compound term clues from the document corpus, dynamically showing the effect of changes on the taxonomy, and weighting classes by the influence of parent, child, and sibling nodes can reduce taxonomy development and ongoing maintenance effort by 66% to 80%.

Tab5 Name

Why We Joined 

Tab5 Contents

Technology in isolation does not solve business challenges or help organizations accomplish business goals.  Only when technology is transformed into a driver of business processes can real benefits be achieved.  Ultimately, it depends on people whose skills and expertise convert technology into the solutions that achieve organizational objectives.

 

Concept Searching believes that the sharing, contribution, and aggregation of knowledge result in innovation and insight.  The value of SharePointGovernance.org transcends physical boundaries and creates a global community that engages participants to contribute, learn, and communicate with others who are seeking deeper knowledge of how to use the SharePoint platform to drive business efficiencies.

Version: 23.0 
Created at 6/23/2009 1:36 PM  by CORRIDORWEB01\administrator 
Last modified at 6/30/2009 4:59 PM  by Michael Dwyer