A method and system are disclosed for identifying related entities based on the names of the entities. The method involves grouping one or more entities with similar names in a group. The names of the one or more entities are analyzed, and the entities are grouped based on the similarity between their names.
Method and System for Identifying Related Entities Based on Entities' Names
Disclosed is a method and system for identifying related entities based on the names of the entities. The method involves grouping one or more entities with similar names in a group. The names of the one or more entities are analyzed, and the entities are grouped based on the similarity between their names. An entity can be, for example, a file in a file system, a music file in a music library, a record in a database, or a resource in a multiple shared resource system.
Consider an exemplary scenario wherein the one or more entities are one or more files in a file system. The method analyzes one or more file names of the one or more files.
As illustrated in Fig. 1, the method begins analysis by removing version information contained in a file name of a file to obtain a group name for the file. Thereafter, one or more files with the same group name are grouped together.
Fig. 1
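The grouping step can be sketched as follows. This is a minimal illustration, not the disclosed implementation. The names `group_files` and `strip_version_suffix` are hypothetical, and `strip_version_suffix` is only a simplified stand-in for the group name derivation described in the following paragraphs.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def group_files(
    file_names: Iterable[str],
    group_name_of: Callable[[str], str],
) -> Dict[str, List[str]]:
    """Collect file names under the group name derived from each of them."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for file_name in file_names:
        groups[group_name_of(file_name)].append(file_name)
    return dict(groups)


def strip_version_suffix(file_name: str) -> str:
    """Hypothetical stand-in: keep the text before ' v', in upper case."""
    return file_name.split(" v")[0].upper()


groups = group_files(
    ["Plan v1.doc", "Plan v2.doc", "Budget v1.xls"],
    strip_version_suffix,
)
```

Passing the group name derivation in as a function keeps the grouping independent of how the names are analyzed, so the same sketch would apply to other entity types such as database records.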
As shown in Fig. 1, for each file name, special characters are either removed or changed to blanks. The special characters can be one or more of _|+-()[]{}/\"~`!@#$%^*?:;,'. For example, consider the file name [20080915]_Project Scorpion Business-Case ID55 v2.34 Rev F#2.xls. In accordance with the method, after the special characters are removed or replaced, the file name becomes 20080915 Project Scorpion Business Case ID55 v2 34 Rev F 2 xls.
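A minimal sketch of this step is given below, under several assumptions. Every special character is changed to a blank rather than removed. The period is treated as a special character because the example turns "v2.34" into "v2 34". The example file name is assumed to carry the date prefix "[20080915]_". The names `SPECIAL_CHARS` and `replace_special_chars` are hypothetical.

```python
# Special characters listed in the text. The trailing "." is an assumption
# drawn from the worked example, where "v2.34" becomes "v2 34".
SPECIAL_CHARS = "_|+-()[]{}/\\\"~`!@#$%^*?:;,'."

# Translation table mapping each special character to a blank.
_TO_BLANK = str.maketrans({ch: " " for ch in SPECIAL_CHARS})


def replace_special_chars(file_name: str) -> str:
    """Change every special character in the file name to a blank."""
    return file_name.translate(_TO_BLANK)


# Example file name; the "[20080915]_" prefix is assumed.
cleaned = replace_special_chars(
    "[20080915]_Project Scorpion Business-Case ID55 v2.34 Rev F#2.xls"
)
```

The result can still contain leading blanks and runs of blanks, which the next step collapses.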
Thereafter, the modified file name is translated to upper case and multiple spaces in the modified file name are removed to produce a set of single words. Accordingly, the file name is further modified to 20080915 PROJECT SCORPION BUSINESS CASE ID55 V2 34 REV F 2 XLS.
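As a brief sketch, the upper-casing and the splitting into single words can be done in one pass. The function name `to_words` is hypothetical.

```python
from typing import List


def to_words(name: str) -> List[str]:
    """Upper-case the name and split it on runs of whitespace."""
    # str.split() with no argument discards leading, trailing,
    # and repeated blanks, so no empty words are produced.
    return name.upper().split()


words = to_words(" 20080915  Project Scorpion Business Case ID55 v2 34 Rev F 2 xls")
```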
Moving on, one or more digits present in the file name may optionally be removed to further modify the file name. Subsequently, one or more single character words from the set of words may be removed. As a result, the file name is refined as PROJECT SCORPION BUSINESS CASE ID55 REV XLS.
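The two removals can be sketched as shown below, under a stated assumption: "digits" is read as words made up entirely of digits, because ID55 survives in the example. The text does not state how the example also loses V2 at this stage, so this sketch leaves such mixed words in place. The function name and the `remove_digits` flag are hypothetical.

```python
from typing import List


def drop_digit_and_single_char_words(
    words: List[str], remove_digits: bool = True
) -> List[str]:
    """Drop all-digit words (optional) and single-character words."""
    kept = []
    for word in words:
        if remove_digits and word.isdigit():
            continue  # assumed reading: only purely numeric words are dropped
        if len(word) == 1:
            continue  # single-character words such as "F"
        kept.append(word)
    return kept


refined = drop_digit_and_single_char_words(
    "20080915 PROJECT SCORPION BUSINESS CASE ID55 V2 34 REV F 2 XLS".split()
)
```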
In an instance, one or more words which belong to a predefined set of special words may also be removed from the set of single words in the file name. For example, the predefined set of special words may include one or more of JAN, JANUARY, FEB, FEBRUARY, MAR, MARCH, APR, APRIL, MAY, JUN, JUNE, JUL, JULY, AUG, AUGUST, SEP, SEPT, SEPTEMBER, OCT, OCTOBER, NOV, NOVEMBER, DEC, DECEMBER, V, VER, VERSION, REV, REVISION, and ITERATION.
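A minimal sketch of the special word filter follows, using the example set from the text. The names `SPECIAL_WORDS` and `drop_special_words` are hypothetical, and the sketch assumes the words have already been converted to upper case.

```python
from typing import List

# Example set of special words taken from the text.
SPECIAL_WORDS = frozenset({
    "JAN", "JANUARY", "FEB", "FEBRUARY", "MAR", "MARCH", "APR", "APRIL",
    "MAY", "JUN", "JUNE", "JUL", "JULY", "AUG", "AUGUST", "SEP", "SEPT",
    "SEPTEMBER", "OCT", "OCTOBER", "NOV", "NOVEMBER", "DEC", "DECEMBER",
    "V", "VER", "VERSION", "REV", "REVISION", "ITERATION",
})


def drop_special_words(words: List[str]) -> List[str]:
    """Remove every word that belongs to the predefined special set."""
    return [word for word in words if word not in SPECIAL_WORDS]


final_words = drop_special_words(
    "PROJECT SCORPION BUSINESS CASE ID55 REV XLS".split()
)
```

Because the match is on whole words, a word such as REVIEW is kept even though it starts with REV.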
Taking into account the above steps, the file name with final modification is PROJECT SCORPION BUSINESS CASE ID55 XLS. In accordance with the method, the fi...