Content Intelligence Services on ECM_v6.5 demo system


This document describes the steps to demonstrate Content Intelligence Services (CIS) on the ECM_v6.5
image. CIS is loaded on the Content Server machine.

Sample documents are located in the D:/Demos/CIS folder on the AppServer machine (corpc).

This script is recommended for xCP_Platform_65_v7; valid for xCP_Extended_65_v7, IA_DocSci_65_v7,
IA_Captiva_65_v7, IA_SP_65_v7, IA_Captiva_SP_65_v7, IA_CenterStage_65_v7, IA_MyD_65_v7,
IA_SP_MyD_65_v7, IA_DocSci_SP_65_v7.


Contents
CIS Training Resources
Overall Preparation and Demo Flow
Demo preparation
   Start the CIS service.
   CIS Configuration
   Creating a new Document Set
Demo Flow 1: Processing English with Existing Taxonomy
Demo Flow 2: Processing French Content with Existing Taxonomy
Demo Flow 3: Importing an Intellisophic Taxonomy
Demo Flow 4: Creating a New Taxonomy (Multilingual)
   Determining Taxonomy, Categories and Rules
   New Taxonomy Creation
Demo Flow 5: Updating Multiple Attributes
Demo Tips and Tricks
   Creating Sample Content
   Immediate Processing
Troubleshooting
   Log Files
   Synchronization
   Is CIS Running?
   Content Not Getting Tagged
   Tagging Seems Incorrect




                                                                                  Created by: Irina Ryabchuk, EMC Demo Team
                                                                        Updated by: Mary Ann Fisher, Friend of EMC Demo Team
Overview
The CIS 6.5 script details the steps for preparing and demonstrating CIS as a stand-alone product. The
same steps may be followed to demonstrate a total solution in which attributes and folder links are
quietly assigned with CIS running in the background.

The script focuses on the most common demonstration scenarios, including the new 6.5 support for
multilingual content. Please read the documentation and view the online training for more
features, functions and benefits of CIS.


CIS Training Resources
A recorded training session is available for becoming familiar with the CIS technology and with tips
for demonstrating CIS:

     Subject:                      CIS Update- SE Training
     Recording URL:                https://www.livemeeting.com/cc/emc/view
     Recording ID:                 6N7C7G
     Attendee Key:                 emc

Documentation is available from the Subscribenet download site or in dm_notes under the
/Documentation Library/Current Documentation folder. Useful guides are Content Intelligence
Services.pdf and Content_Intelligence_Services_65_Administration_Guide.pdf.


Overall Preparation and Demo Flow
Most of the time spent with CIS demonstrations is in the preparation. For convenience, a fully
demonstrable CIS system, in two languages, has been configured. If these preconfigured elements are
not sufficient, the overall flow of the preparation is:

Prep steps:
   1. Log into Documentum Administrator.
   2. Create a cabinet/folder to be used as the CIS DropZone. Note that CIS can run against the
        cabinet or folder level.
   3. Create a Category class.
   4. Create a top level taxonomy that points to the Class.
   5. Create categories and sub-categories under the taxonomy.
   6. Create a Document Set that points to the cabinet/folder created above.
   7. Create and test sample content that is imported to the DropZone.
   8. Fine tune the taxonomy (thresholds, categories, subcategories, rules) to ensure content is
        tagged.

Demo steps:

    1.   Demo may be done in DA, Webtop or DAM.
    2.   Import sample content to the DropZone.
    3.   Display attribute automatically updated.
    4.   Show links to category folders.

Details for accomplishing these steps are provided below.
Demo preparation
Start the CIS service.

The CIS service is set to manual and must be started.
    1. Log into the Content Server machine.
    2. Launch the Services window.
    3. Start EMC Documentum Content Intelligence Services.
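
The same step can be scripted. The following is a minimal sketch, assuming it runs on the Content
Server machine with administrator rights and that the service display name is exactly the one shown
in step 3. The helper names build_start_command and start_service are hypothetical and are not part
of CIS.

```python
import subprocess

# Display name as written in this script; confirm it in the Services window.
CIS_SERVICE_NAME = "EMC Documentum Content Intelligence Services"


def build_start_command(service_name=CIS_SERVICE_NAME):
    """Return the Windows 'net start' command for the given service."""
    return ["net", "start", service_name]


def start_service(service_name=CIS_SERVICE_NAME):
    """Start the service; return True when 'net start' exits with code 0."""
    result = subprocess.run(
        build_start_command(service_name), capture_output=True, text=True
    )
    return result.returncode == 0
```

Calling start_service() from an elevated prompt should have the same effect as starting the service
by hand. Note that net start also returns a non-zero code when the service is already running.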




CIS Configuration

CIS is configured to link to category folders and to update document attributes. CIS is preconfigured
on the demo system and is fully operational once the service is started. To view or change these
settings:
   1. Log into DA at http://corpc:8080/da
   2. Navigate to the Administration/Content Intelligence node in Documentum Administrator. From
        this node you can view and manage the configured classes, taxonomies and document sets:




                                      Figure 1. CIS Node




The demo system is configured with three taxonomies: two imported English Intellisophic
taxonomies of Biotechnology and Organizational Development, and one manually created French
taxonomy of Beaux Arts. Corresponding English and French Document Sets and Drop Zones are also
configured.
                                     Figure 2. Taxonomy Listing




                                 Figure 3. Available Document Sets




                          Figure 4. DropZones under the /Content cabinet

If ‘keywords’ is the attribute that will be automatically updated, it is best to demonstrate CIS with
the ‘keywords’ column appearing in the columns view. Click the columns icon to add the ‘keywords’
column to the view. This way you will be able to see the keywords applied to the document by CIS
without going to Properties:
Creating a new Document Set

Follow this preparation step if you do not want to use the existing document set.

Create the destination Cabinet/Folder.
In the Document Sets section of the Content Intelligence Node, click New>Document Set:




When the properties screen appears, enter a name for the Document Set and select English as the
language.
Click Next to enter the Document Set Builder properties. Click on “look in” and select the
Cabinet/Folder you created above. Note that CIS will process at either the Cabinet level or the Folder
level. Leave the type as dm_sysobject, even if custom document types are to be processed.
Click Next to move to the Processing tab. Make the document set “active,” set a date to schedule the
first run and click on “Production.” By using the Production server, a single document set will process all
taxonomies.




Once the properties are saved, highlight the new Document Set and synchronize it.




Demo Flow 1: Processing English with Existing Taxonomy


Navigate to Cabinets/Content/CIS DropZone and import any of the English documents.
Sample content is located on the AppServer machine in D:/Demos/CIS.




CIS is configured to run at frequent intervals, so it should take only a short while before the
content is linked to the category folders. To speed up the process, select the documents and
choose Tools > Submit for Categorization:
After the document has been run through CIS, you will see the keywords added to its
properties:




and new locations for the document:
Demo Flow 2: Processing French Content with Existing Taxonomy
   1. Navigate to Cabinets/Content/CIS Francais:




   2. Import the file from the D:/Demos/CIS/French samples folder on the corpc machine:




   3. After the document is run through CIS, you can see the new keywords and locations:




Note that the keywords are also in French.
Demo Flow 3: Importing an Intellisophic Taxonomy
The taxonomy should be imported to the Content Server machine, where CIS is installed.

1. Download the taxonomy zip file from Subscribenet. Navigate to Documentum Content Intelligence
Services, version 6.5.

2. Unzip the file in C:\Documentum\jboss4.2.0\server\DctmServer_CIS\deploy\cis.ear\doc\tef\samples.

The script for importing an Intellisophic taxonomy is on the Content Server machine:
C:\Documentum\jboss4.2.0\server\DctmServer_CIS\deploy\cis.ear\doc\tef\tef2repository.bat.

3. In a DOS window, navigate to
C:\Documentum\jboss4.2.0\server\DctmServer_CIS\deploy\cis.ear\doc\tef.

4. Enter the command:
tef2repository.bat -TefFile:sample/<filename>
where <filename> is the name of the taxonomy file, including the xml extension. Note that this example
assumes the taxonomy is in the relative path “sample”; if you unzipped the file into a different
folder in step 2, use that folder name instead.
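
If several taxonomies are to be imported, the command line can be assembled in a small script. This
is a sketch only, assuming the option is typed with a plain hyphen (-TefFile) and that the script
runs on the Content Server machine. The helper names build_import_command and import_taxonomy, and
the file name used in the usage note, are hypothetical.

```python
import subprocess

# Directory that holds tef2repository.bat, as given in this script.
TEF_DIR = r"C:\Documentum\jboss4.2.0\server\DctmServer_CIS\deploy\cis.ear\doc\tef"


def build_import_command(filename, subfolder="sample"):
    """Build the tef2repository command line for one taxonomy XML file."""
    if not filename.lower().endswith(".xml"):
        raise ValueError("taxonomy file name must include the .xml extension")
    return ["tef2repository.bat", "-TefFile:{}/{}".format(subfolder, filename)]


def import_taxonomy(filename, subfolder="sample"):
    """Run the import from the tef directory and return the exit code."""
    # list2cmdline quotes arguments that contain spaces; shell=True lets
    # Windows run the .bat file through cmd.exe.
    command = subprocess.list2cmdline(build_import_command(filename, subfolder))
    return subprocess.run(command, cwd=TEF_DIR, shell=True).returncode
```

For example, import_taxonomy("my_taxonomy.xml") would run tef2repository.bat
-TefFile:sample/my_taxonomy.xml from the tef directory.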

Once the taxonomy is imported, log into DA and navigate to the Content Intelligence Node.
Click on “Category Class,” highlight the newly imported Class and open the Properties box.
In the Classification Attribute field, enter the repeating attribute, such as “keywords,” that will be
automatically updated as CIS processes content. Note that this can be a custom attribute, but it MUST
be a repeating attribute.
Click on the Default Values tab. Select include category name as evidence term and use stemming.
Depending on the level of automatic attribution, select either “evidence from child” or “evidence from
parent.” By selecting “parent,” the full hierarchy of categories will be updated in the attribute field and
linked to the folder categories. By selecting “child,” only the matching category name will be applied.
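
The difference between the two settings, as described above, can be pictured with a short sketch.
This only illustrates the outcome described in this script and is not how CIS is implemented; the
function name values_to_apply and the sample category path are hypothetical.

```python
def values_to_apply(category_path, evidence_mode):
    """Return the category names written to the repeating attribute.

    category_path lists the matched category's ancestors from the top of
    the taxonomy down to the matched category itself.
    """
    if evidence_mode == "parent":
        # Full hierarchy: every ancestor plus the matched category.
        return list(category_path)
    if evidence_mode == "child":
        # Only the matching category name.
        return [category_path[-1]]
    raise ValueError("evidence_mode must be 'parent' or 'child'")
```

With a hypothetical path of Arts, Dance, Ballet, the “parent” setting would yield all three names,
while the “child” setting would yield only Ballet.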

Next, click on the Taxonomies node and highlight the newly imported taxonomy. Open the Properties
page.
Add ‘dmadmin’ as the taxonomy owner, which will send any pending assignments to dmadmin’s My
Categories.
Select ‘online’ for the taxonomy.
After the properties are set, synchronize the taxonomy for both production and test servers.




Sample content may now be imported to the /Content/CIS DropZone folder.




Demo Flow 4: Creating a New Taxonomy (Multilingual)

Determining Taxonomy, Categories and Rules

When creating a new taxonomy for a customer, it is best to have a taxonomy that fits the content
and/or the business. If the customer has provided sample content for the demonstration, read through
the files and look for patterns. What words or phrases are repeated? Which words and phrases are
a repeating theme and therefore a category, and which indicate the classification and are
therefore rules within a category? This analysis will determine the structure of the taxonomy.

If the customer has not provided any sample content, consider the nature of the business or the
department to whom you are demonstrating. For example, if the company is an Oil & Gas company, use
Google or Wikipedia to find relevant topics and build the taxonomy accordingly. You can also use
content found in Google results or in Wikipedia to create sample content. See the Demo Tips and
Tricks section for details.

New Taxonomy Creation

Step 1. Create the Class.
In Documentum Administrator, navigate to the Content Intelligence Node and the Category Class link.
Select File>New>Class.
Enter a new class name and the attribute to be automatically updated. The Class name should be
something that bridges the taxonomy(ies) under it. In this example, the Class is “Arts,” which could
contain individual taxonomies of Music, Painting, Dance, etc.

The attribute may be a standard dm_document attribute or a custom attribute. However, the attribute
MUST be repeating.




Click Next to complete the Class properties. It is best to select stemming to most accurately assess
the content and to select evidence from “parent,” which will apply both parent and child category
matches to the repeating attribute values.
Step 2. Taxonomy Creation

Once the new Class is created, navigate to the Taxonomies link in the Content Intelligence node.
Note that multiple taxonomies may use the same Class (that is the purpose of a Class), and these
taxonomies may be in different languages. Each language will need its own Document Set, defined to
be in that language.

Select File>New>Taxonomy

   1. On the Attributes Tab, enter the name of the new taxonomy.
   2. Select the taxonomy owner, which will be the user who will see any pending (borderline)
      documents and have the ability to approve/reject the content for classification. This is not the
      same as the object owner.
   3. Select the Class that was created in Step 1.
   4. Change the state from Offline to Online, to make the taxonomy available and to start the
      automatic creation of the Category Folder structure.
   5. Select the Taxonomy Language from the pull down list.
   6. Leave the threshold levels at 80/40.
Click Next and Finish. As we will use Document Sets to process content, we need not enter any property
rules here, even if a custom document type is to be used.
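
The 80/40 thresholds in item 6 can be pictured with a short sketch. It assumes that scores run from
0 to 100, that 80 is the level at which a document is assigned to a category, and that 40 is the
level at which it becomes a pending (borderline) document for the taxonomy owner to review. The
function name, the parameter names and the treatment of scores exactly on a boundary are
assumptions, not taken from the CIS documentation.

```python
def classify_score(score, on_target=80, candidate=40):
    """Map a confidence score to the outcome implied by two thresholds."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= on_target:
        # High enough to be assigned automatically.
        return "assigned"
    if score >= candidate:
        # Borderline: waits for the taxonomy owner to approve or reject.
        return "pending"
    return "not assigned"
```

Under these assumptions, lowering both numbers moves more documents from “not assigned” to
“pending” and from “pending” to “assigned.”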

Step 3. Create Categories

   1. Double click on the newly created taxonomy to start adding categories. We will only add top
      level categories here. However, categories may be as deeply nested as necessary.
   2. Select File>New>Category.
   3. Enter a Category Name and Language. The rest defaults from the choices made at the
      taxonomy level.
   4. Add as many categories as desired.
An optional step is to add rules to the category. The rules will help identify content to a category, but
only the category name will be set as an attribute value.

Double click on the newly created category.
Click on the whistle icon.




Click on “Add a Simple Term.” This may be a single word or phrase.
Enter a word or phrase. CIS is not case-sensitive and will process all variations of the word/phrase.
Indicate the level of confidence from the pulldown list. Most common is “HIGH.”
Continue adding rules as necessary. There is no save/ok button; the rules are saved as you enter them.

When completed, return to the top level taxonomy and highlight the new taxonomy.
Synchronize the new taxonomy with the Content Intelligence synchronize command.




Step 4. Create New Document Set

Two document sets are pre-configured in the demo system: one English and one French. If a new
document set is desired, follow the steps in the Preparation section above. Note that each taxonomy
language will need a Document Set defined for that language.

Step 5. Non-English Taxonomy

The example above focused on an English taxonomy. With CIS 6.5, it is possible to have a single Category
Class with taxonomies in multiple languages.

Follow the same steps as for the English “Dance” example, selecting an alternate language from the
taxonomy language pull down list. For example, the following will create a German taxonomy under the
same Arts Class as the English Dance taxonomy.
Add a German Music category:
And create a German Document set:
Because German was selected as the language for the document set, CIS will expect and process
German content in the given cabinet/folder.


Demo Flow 5: Updating Multiple Attributes
Some demonstrations require multiple attributes to be updated against the same content. For example,
using the theme of the Arts, a customer may want to see the type of Art (Dance, Music, Painting) in the
keywords field, and have the name of the artist in the Authors field. This can be accomplished by
creating two Category Classes where the attribute to be updated is indicated, and having separate
taxonomies that seek different types of information (taxonomy one looks for type of art, taxonomy
number two looks for the artists’ names).

Note that the attribute can be a custom attribute, such as “artists.” For this document, we will use
“authors” as the attribute to hold the artists’ names. Also note that any standard or custom attribute
entered in the Class must be a repeating attribute.

In the Category Class link of the Content Intelligence node in DA, select File>New>Class.
Enter the information, using the secondary attribute.
Complete the default values as in the New Taxonomy process, checking off Include Stemming and Use
Evidence of Parent.

We now have two classes that update different repeating attributes.




Create a new taxonomy, as described in the New Taxonomy section above, selecting the new Class just
created.
Add categories that map to the artists’ names:




Synchronize the top level taxonomies.
Synchronize the Document Set that will process the content.
Import the content that contains the category terms for both the attributes. In this example, this will be
the artist names and the genre of art.
Both keywords and authors were updated in the same document set.
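
The relationship between the two classes, their taxonomies and the attributes they update can be
sketched as a small data model. This is an illustration only: the class name “Artists,” the taxonomy
names and the artist name are hypothetical, and the plain substring match stands in for the far
richer analysis that CIS performs.

```python
# Each Category Class names the repeating attribute it updates.
CLASS_ATTRIBUTE = {"Arts": "keywords", "Artists": "authors"}

# Each taxonomy belongs to one class and holds its category names.
TAXONOMIES = {
    "Art Types": {"class": "Arts", "categories": ["Dance", "Music", "Painting"]},
    "Artist Names": {"class": "Artists", "categories": ["Claude Monet"]},
}


def tag_document(text):
    """Return {attribute: [matched category names]} for one document."""
    lowered = text.lower()
    updates = {}
    for taxonomy in TAXONOMIES.values():
        attribute = CLASS_ATTRIBUTE[taxonomy["class"]]
        for category in taxonomy["categories"]:
            # Naive case-insensitive substring match, for illustration only.
            if category.lower() in lowered:
                updates.setdefault(attribute, []).append(category)
    return updates
```

A document that mentions both a type of art and an artist ends up with values in both attributes,
because each taxonomy writes to the attribute named by its own class.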


Demo Tips and Tricks

Creating Sample Content

If the customer has not provided sample content, wikipedia.org is an excellent source from which to
copy and paste content in many languages. Simply enter one of the category names in the Wikipedia
search. Ensure that the word or phrase appears commonly throughout the displayed article. If so, copy
and paste the content into Word.

The document is now ready to be imported and processed with CIS.

Repeat this copy and paste step a number of times in order to have content that may be newly imported
during the live demonstration.
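
Before importing, it can help to check how often the category term actually appears in the pasted
text. The following is a rough sketch with hypothetical function names. It counts exact whole-word
matches only, so it will undercount compared with CIS when stemming is enabled.

```python
import re


def term_count(text, term):
    """Count case-insensitive, whole-word occurrences of term in text."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def paragraph_coverage(text, term):
    """Return the fraction of non-empty paragraphs that mention the term."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    if not paragraphs:
        return 0.0
    hits = sum(1 for p in paragraphs if term_count(p, term) > 0)
    return hits / len(paragraphs)
```

A term that appears only once or twice in a long article, or only in a small share of its
paragraphs, is a sign that the sample may not reach the taxonomy thresholds.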




Immediate Processing

During the testing phase of CIS, there may be some fine tuning of the taxonomy. Rather than wait for
the next interval of the Document Set to elapse, you can clear assignments and start processing. By
selecting “all” for the cleared assignments, the system will remove the attribute values, the category
folder links, and the flag that indicates the content was already processed.

The “start processing” command may also be useful during the demonstration, if immediate processing
is needed.
Troubleshooting

Log Files

All log files are located on the Content Server (corpb) machine in the CIS installation directory:
C:\Documentum\jboss4.2.0\server\DctmServer_CIS\log
The main log file to check is Server.log, which provides useful detail and error messages.
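
A quick way to pull the error messages out of the log is sketched below. It assumes that error
entries contain the text ERROR, which is typical of JBoss logs but is not confirmed by this script;
the function names are hypothetical.

```python
from pathlib import Path

# Log location as given in this script.
LOG_FILE = Path(r"C:\Documentum\jboss4.2.0\server\DctmServer_CIS\log") / "Server.log"


def find_error_lines(lines, marker="ERROR"):
    """Return (line_number, text) pairs for lines that contain the marker."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if marker in line
    ]


def scan_log(path=LOG_FILE, marker="ERROR"):
    """Read the log file and return its matching lines."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_error_lines(handle, marker)
```

Run scan_log() on the Content Server machine after a failed synchronization or Document Set run to
list the lines worth reading first.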

Synchronization

The most common cause of error is an unsynchronized taxonomy or document set. If any updates have
been made to either, synchronize both taxonomy(ies) and the document set(s).


Is CIS Running?

Signs that CIS is not running include a Document Set that does not run or remains in the run_now
state, a “connection refused” message, or synchronizations that do not complete.

Check that the CIS service on the Content Server machine is still up. Start it if necessary.
If CIS is running, bounce it by restarting the service on the Content Server machine.
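
A “connection refused” symptom can also be confirmed from another machine with a plain TCP check,
sketched below. The host name and port of the CIS server are not stated in this script, so both are
left as parameters that must be supplied for your environment; the function name is hypothetical.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolved host names.
        return False
```

A False result while the service shows as started suggests that a restart is needed or that the
port is blocked between the two machines.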


Content Not Getting Tagged

There are a number of reasons content is not tagged:

1. The content does not meet the threshold.

This may happen if the document is very small, only a sentence or two, or the document is very large
and the term does not appear very often. Check the pending queues in the MyCategories section of the
Content Intelligence node in DA. If the document is there, it was close but did not quite reach
the thresholds. Either adjust the thresholds on the taxonomy by lowering both numbers and retry the
content, or add the term more frequently to the content or add more rules to the category. Be sure
to synchronize after the change.

2. Inactive document set

Make sure that the document set is active and scheduled. Check the status of the last run, which is
displayed in the Document Set listing. If it does not say “completed,” there may be an error with the
run. Check the CIS log file.

To force a run, use the “Start Processing” command of Content Intelligence for the Document Set. Check
the number of documents processed in the last run. The estimate should equal the number of
documents in the DropZone, and the number processed will be the number that met the threshold. If
the numbers are inaccurate, check the CIS log.


Tagging Seems Incorrect

The keywords and locations are coming from all the taxonomies. There may be some categories in
other taxonomies affecting the outcome. Either change the additional taxonomy to “offline” through
the Taxonomy property screen, or delete the taxonomy.

There are a lot of extraneous numbers in the keywords. The Intellisophic taxonomies include numbers
as unique identifiers to ensure differentiation for categories with the same name in separate
taxonomies. There is no workaround other than to manually create a taxonomy.

				