
Built for Java™

Built for .NET™

Features

Simian runs under any Java 2 (1.4 or higher) Java Virtual Machine (JVM) and any .NET 1.1 or higher environment, meaning Simian can be run on anything from Windows, macOS and Linux to z/OS.

The distribution contains everything you need to be up and running in minutes:

Aslak Hellesoy has kindly donated a Maven plugin.

Neil Bartlett has kindly donated an Eclipse plugin.

Simian fully supports the following languages:

with partial support for the following languages:

If the file is not of a supported type, it is treated as plain text. This means that you can usually run Simian on just about any type of human-readable file with good results.

Ignores whitespace, curly braces, comments, imports, includes, package declarations, etc.

Supports the following processing options:

Option | Languages | Default | Possible values | Description
formatter | all | none | plain, xml, emacs, vs (visual studio), yaml, null | Specifies the format in which processing results will be produced.
threshold | all | 6 | integer >= 2 | Matches will contain at least the specified number of lines.
language | n/a | none | java, c#, cs, csharp, c, c++, cpp, cplusplus, js, javascript, cobol, abap, rb, ruby, vb, jsp, html, xml, groovy, asm390 | Assumes all files are in the specified language.
defaultLanguage | n/a | none | java, c#, cs, csharp, c, c++, cpp, cplusplus, js, javascript, cobol, abap, rb, ruby, vb, jsp, html, xml, groovy, asm390 | Assumes files are in the specified language if none can be inferred.
failOnDuplication | all | true | boolean | Causes the checker to fail the current process if duplication is detected.
reportDuplicateText | all | false | boolean | Prints the duplicate text in reports.
ignoreBlocks | all | none | string | Ignores all lines between specified START/END markers.
ignoreCurlyBraces | Java, C#, C, C++, JavaScript, Ruby, Groovy | false | boolean | Curly braces are ignored.
ignoreIdentifiers | Java, C#, C, C++, JavaScript, COBOL, Ruby, Groovy | false | boolean | Completely ignores all identifiers.
ignoreIdentifierCase | Java, C#, C, C++, JavaScript, COBOL, Ruby, Groovy | true | boolean | Matches identifiers irrespective of case. E.g. MyVariableName and myvariablename would both match.
ignoreRegions | C# | false | boolean | Ignores lines between #region/#endregion.
ignoreStrings | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL, Groovy | false | boolean | "abc" and "def" would both match.
ignoreStringCase | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL, Groovy | true | boolean | "Hello, World" and "HELLO, WORLD" would both match.
ignoreNumbers | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL, Groovy | false | boolean | int x = 1; and int x = 576; would both match.
ignoreCharacters | Java, C#, C, C++, JavaScript, COBOL, Ruby, Groovy | false | boolean | 'A' and 'Z' would both match.
ignoreCharacterCase | Java, C#, C, C++, JavaScript, COBOL, Ruby, Groovy | true | boolean | 'A' and 'a' would both match.
ignoreLiterals | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL, Groovy | false | boolean | 'A', "one" and 27.8 would all match.
ignoreSubtypeNames | Java, C, Groovy | false | boolean | BufferedReader, StringReader and Reader would all match.
ignoreModifiers | Java, C#, C, C++, JavaScript, Groovy | true | boolean | Ignores modifiers such as public, protected, static, etc.
ignoreVariableNames | Java, C, Groovy | false | boolean | Completely ignores variable names (field, parameter and local). E.g. int foo = 1; and int bar = 1; would both match.
balanceParentheses | Java, C#, C, C++, JavaScript, COBOL, Ruby, SQL, Groovy | false | boolean | Ensures that expressions inside parentheses that are split across multiple physical lines are considered as one.
balanceCurlyBraces | Ruby | false | boolean | Ensures that expressions inside curly braces that are split across multiple physical lines are considered as one.
balanceSquareBrackets | Java, C#, C, C++, JavaScript, Ruby, Groovy | false | boolean | Ensures that expressions inside square brackets that are split across multiple physical lines are considered as one.
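
To make the effect of a few of these options concrete, here is a minimal Python sketch of threshold-based duplicate detection. It is not Simian's implementation: the function names (normalise, find_duplicates), the regular expression and the hashing scheme are all assumptions made for illustration, and only three options (threshold, ignoreCurlyBraces, ignoreNumbers) are modelled.

```python
import hashlib
import re
from collections import defaultdict


def normalise(line, ignore_curly_braces=True, ignore_numbers=False):
    """Reduce a raw line to the text that takes part in matching."""
    if ignore_curly_braces:
        line = line.replace("{", "").replace("}", "")
    if ignore_numbers:
        # Replace every numeric literal with a placeholder so that
        # "int x = 1;" and "int x = 576;" compare equal.
        line = re.sub(r"\b\d+(\.\d+)?\b", "0", line)
    # Drop all whitespace; a line that ends up empty is not significant.
    return "".join(line.split())


def find_duplicates(files, threshold=6, **options):
    """Map a fingerprint to every (name, first_line, last_line) block that
    shares the same `threshold` consecutive significant lines.

    `files` is a dict of file name -> file contents (hypothetical input shape).
    """
    seen = defaultdict(list)
    for name, text in files.items():
        significant = []
        for number, raw in enumerate(text.splitlines(), start=1):
            norm = normalise(raw, **options)
            if norm:
                significant.append((number, norm))
        # Slide a window of `threshold` significant lines over the file.
        for start in range(len(significant) - threshold + 1):
            window = significant[start:start + threshold]
            joined = "\n".join(norm for _, norm in window)
            digest = hashlib.sha256(joined.encode("utf-8")).hexdigest()
            seen[digest].append((name, window[0][0], window[-1][0]))
    return {fp: blocks for fp, blocks in seen.items() if len(blocks) > 1}
```

Because blank lines and brace-only lines are dropped before windowing, a block can span more raw lines than the threshold, which is consistent with ranges such as "Between lines 70 and 82" appearing for a 6-line match in the sample output further down. The sketch reports overlapping windows separately rather than merging them into the longest match.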

Recognises the following file extensions/language options:

Language | Extensions
java | java
c sharp | cs, c#, csharp
c | c, h, m
cpp | cpp, c++, hpp, cplusplus, inl
ruby | rb, ruby
cobol | cobol
abap | abap
xml | xml, xsl, xsd
jsp | jsp
asp | asp
javascript | js, javascript
html | html, htm
vb | vb, bas, cls, frm
lisp | lisp, lsp
groovy | groovy
text | this is the default when no appropriate language can be determined
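
The extension lookup with its plain-text fallback can be pictured roughly as follows. This is an illustrative Python sketch only: the names EXTENSIONS and infer_language are hypothetical, the case-insensitive comparison is an assumption, and the default_language parameter merely mimics what the defaultLanguage option is described as doing.

```python
import os

# Extension-to-language table copied from the list above.
EXTENSIONS = {
    "java": ["java"],
    "c sharp": ["cs", "c#", "csharp"],
    "c": ["c", "h", "m"],
    "cpp": ["cpp", "c++", "hpp", "cplusplus", "inl"],
    "ruby": ["rb", "ruby"],
    "cobol": ["cobol"],
    "abap": ["abap"],
    "xml": ["xml", "xsl", "xsd"],
    "jsp": ["jsp"],
    "asp": ["asp"],
    "javascript": ["js", "javascript"],
    "html": ["html", "htm"],
    "vb": ["vb", "bas", "cls", "frm"],
    "lisp": ["lisp", "lsp"],
    "groovy": ["groovy"],
}

# Invert the table so that a single extension maps to its language.
LANGUAGE_BY_EXTENSION = {
    ext: language for language, exts in EXTENSIONS.items() for ext in exts
}


def infer_language(path, default_language="text"):
    """Return the language for `path`, or `default_language` if unknown."""
    # Assumption: extensions are compared case-insensitively.
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    return LANGUAGE_BY_EXTENSION.get(ext, default_language)
```

For example, infer_language("src/Main.java") yields "java", while a file such as "notes.md" falls through to "text" unless another default is supplied.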

Sample Output

Here is an example of the standard output produced by Simian (version 2.6.0) when run against the JDK 9 source code:

Similarity Analyser 2.6.0 - http://www.harukizaemon.com/simian
Copyright (c) 2003-2018 Simon Harris.  All rights reserved.
Simian is not free unless used solely for non-commercial or evaluation purposes.
{failOnDuplication=true, ignoreCharacterCase=true, ignoreCurlyBraces=true, ignoreIdentifierCase=true, ignoreModifiers=true, ignoreStringCase=true, threshold=6}
Found 6 duplicate lines with fingerprint 2340e9b1e2419bcb5516a5a1d9037271 in the following files:
 Between lines 70 and 82 in com/sun/corba/se/PortableActivationIDL/_ServerProxyImplBase.java
 Between lines 70 and 82 in com/sun/corba/se/spi/activation/_ServerImplBase.java
 Between lines 90 and 102 in org/omg/CosNaming/BindingIteratorPOA.java
Found 6 duplicate lines with fingerprint e94fb8a8017a3d05048dcdfb8bce8dff in the following files:
 Between lines 101 and 111 in javax/swing/plaf/synth/SynthOptionPaneUI.java
 Between lines 96 and 106 in javax/swing/plaf/synth/SynthMenuBarUI.java
Found 6 duplicate lines with fingerprint 16485a9bd0994dc56f52735c2395a7b2 in the following files:
 Between lines 290 and 295 in java/time/zone/ZoneRules.java
 Between lines 234 and 239 in java/time/zone/ZoneRules.java
Found 6 duplicate lines with fingerprint 7ca74bcd5707431bd195c0d867f5767e in the following files:
 Between lines 380 and 398 in org/omg/DynamicAny/_DynFixedStub.java
 Between lines 463 and 481 in org/omg/DynamicAny/_DynSequenceStub.java
...
Found 233 duplicate lines with fingerprint 8bc044fa6e21987c76424535dbc1fe47 in the following files:
 Between lines 77 and 377 in javax/swing/plaf/nimbus/TextFieldPainter.java
 Between lines 77 and 377 in javax/swing/plaf/nimbus/PasswordFieldPainter.java
 Between lines 77 and 377 in javax/swing/plaf/nimbus/FormattedTextFieldPainter.java
Found 382 duplicate lines with fingerprint 922ba26b84cbbf0edfabb0e25189c3b4 in the following files:
 Between lines 81 and 482 in com/sun/org/apache/xalan/internal/res/XSLTErrorResources_sv.java
 Between lines 81 and 482 in com/sun/org/apache/xalan/internal/res/XSLTErrorResources_es.java
 Between lines 81 and 482 in com/sun/org/apache/xalan/internal/res/XSLTErrorResources_fr.java
 Between lines 81 and 482 in com/sun/org/apache/xalan/internal/res/XSLTErrorResources.java
 Between lines 81 and 482 in com/sun/org/apache/xalan/internal/res/XSLTErrorResources_ko.java
 Between lines 81 and 482 in com/sun/org/apache/xalan/internal/res/XSLTErrorResources_zh_CN.java
 Between lines 81 and 482 in com/sun/org/apache/xalan/internal/res/XSLTErrorResources_pt_BR.java
 Between lines 81 and 482 in com/sun/org/apache/xalan/internal/res/XSLTErrorResources_zh_TW.java
 Between lines 81 and 482 in com/sun/org/apache/xalan/internal/res/XSLTErrorResources_de.java
 Between lines 81 and 482 in com/sun/org/apache/xalan/internal/res/XSLTErrorResources_it.java
 Between lines 81 and 482 in com/sun/org/apache/xalan/internal/res/XSLTErrorResources_ja.java
Found 141070 duplicate lines in 12134 blocks in 2406 files
Processed a total of 775314 significant (2402974 raw) lines in 7714 files
Processing time: 4.818sec

* Results may vary depending on factors such as hardware used, number of duplicate lines, etc.
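
If the plain output needs to be consumed by another tool, the two line shapes shown above ("Found N duplicate lines with fingerprint ..." and "Between lines A and B in ...") are regular enough to parse. The following Python sketch assumes the output looks exactly like the 2.6.0 sample; the names Block and parse_report are hypothetical, and other versions or formatters may produce different text.

```python
import re
from typing import NamedTuple


class Block(NamedTuple):
    path: str
    first_line: int
    last_line: int


# Patterns derived from the sample output above, not from a format specification.
HEADER = re.compile(
    r"^Found (\d+) duplicate lines with fingerprint ([0-9a-f]+) "
    r"in the following files:$"
)
LOCATION = re.compile(r"^\s*Between lines (\d+) and (\d+) in (.+)$")


def parse_report(text):
    """Return a list of (line_count, fingerprint, [Block, ...]) tuples."""
    matches = []
    for line in text.splitlines():
        header = HEADER.match(line.strip())
        if header:
            matches.append((int(header.group(1)), header.group(2), []))
            continue
        location = LOCATION.match(line)
        if location and matches:
            first, last, path = location.groups()
            matches[-1][2].append(Block(path.strip(), int(first), int(last)))
    return matches
```

The closing summary lines ("Found 141070 duplicate lines in 12134 blocks in 2406 files" and so on) carry no fingerprint, so neither pattern matches them and they are skipped.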


Java and all Java-based marks are trademarks or registered trademarks of Oracle in the United States and other countries.

.NET and all .NET-based marks are trademarks or registered trademarks of Microsoft® in the United States and other countries.

Copyright (c) 2003-2018 Simon Harris. All rights reserved.