ASPEN console configuration options can be accessed from the Tools->Options menu item. The current list of supported options is shown in the image below.
Illustration 2: ASPEN console configuration options.
ASPEN console is modular software: it consists of a number of modules, called plugins, that perform specific system functions. Modules can be installed, uninstalled and updated dynamically using the ASPEN public update servers.
Illustration 3: List of installed modules.
The ASPEN parsing engine works by recursively traversing the tree of parsing rules. Each matched rule can extract a piece of information from the raw log and add it to the security event. A detailed parsing workflow follows below.
The parsing ruleset editor module is accessed using the Administration->Parsing rules->Parsing ruleset editor menu item.
Illustration 4: Parsing ruleset editor module.
ASPEN allows operators to create an arbitrary number of rulesets, for example one per data source or use case. Each instance of the parser process on the server is configured to use exactly one of these rulesets. All rulesets can be accessed and edited using the parsing ruleset editor module.
Rules are organized in hierarchical groups and are evaluated in order, starting from the first rule in the first group. Each rule is evaluated against the raw log being parsed and, if it matches, the actions specified by the rule are executed. Evaluation stops after the first successful match, unless the rule specifies otherwise.
Parsing rule is defined by following items:
A parsing rule matches the raw log if both the regular expression and all the meta rules match the raw log being parsed. If the rule doesn't contain a regular expression, the regular expression part always matches. Similarly, a rule without meta rules matches whenever the regular expression matches the raw log text.
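The matching condition described above can be sketched in Java as follows. This is only an illustration under stated assumptions, not ASPEN's implementation: the class name `RuleMatchSketch`, the representation of meta rules as predicates over a metadata map, the meta key `type`, and the choice to let the regular expression match anywhere in the text are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical model of a parsing rule; the names are illustrative, not ASPEN's API.
class RuleMatchSketch {
    final Pattern regex;                                  // null when the rule has no regular expression
    final List<Predicate<Map<String, String>>> metaRules; // may be empty

    RuleMatchSketch(String regex, List<Predicate<Map<String, String>>> metaRules) {
        this.regex = regex == null ? null : Pattern.compile(regex);
        this.metaRules = metaRules;
    }

    boolean matches(String rawLog, Map<String, String> meta) {
        // A missing regular expression always matches.
        // Assumption: the expression may match anywhere in the text (find).
        boolean regexOk = regex == null || regex.matcher(rawLog).find();
        // allMatch over an empty list is true, so a rule without meta rules
        // depends on the regular expression alone.
        boolean metaOk = metaRules.stream().allMatch(rule -> rule.test(meta));
        return regexOk && metaOk;
    }

    public static void main(String[] args) {
        // The meta key "type" is an assumption made for this example.
        Predicate<Map<String, String>> isWindows = m -> "MSWinEventLog".equals(m.get("type"));
        RuleMatchSketch rule = new RuleMatchSketch(null, List.of(isWindows));
        System.out.println(rule.matches("any text", Map.of("type", "MSWinEventLog")));
        System.out.println(rule.matches("any text", Map.of("type", "other")));
    }
}
```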
To give operators maximum flexibility, ASPEN features an expression language that can be used in meta rule expressions and assignment values.
The expression language supports the following constructs:
| Literal values | |
|---|---|
| "Hello World" | String |
| 123 | Integer |
| 6.0221415E+23 | Exponential notation |
| 0x7FFFFFFF | Hexadecimal format |
| Relational operators | |
| < | Less than |
| > | Greater than |
| <= | Less than or equal |
| >= | Greater than or equal |
| == | Equal |
| != | Not equal |
| Logical operators | |
| and | Conjunction |
| or | Disjunction |
| not | Negation |
| Mathematical operators | |
| + | Addition |
| - | Subtraction |
| * | Multiplication |
| / | Division |
| % | Modulus |
| ^ | Exponentiation |
Besides these constructs, certain objects can be accessed within the expression:
| Object | Description |
|---|---|
| meta | A map of keys and values representing additional (meta) data contained in the raw log. For example, raw logs received using syslog contain the following keys: |
| groups | This object can only be used in assignment value expressions. It is an array that contains the values matched by regular expression groups. groups[0] always contains the whole raw log text; groups[1] and onward contain only the part of the raw log text matched by the corresponding group. |
Besides these objects, the following functions can be invoked within the expression:
| Function | Description |
|---|---|
| tag | A security event can be tagged with zero or more tags. This function adds tags to the security event. Tags that are already present in the security event are not repeated. The function is invoked in assignment values like this: tag("tag1, tag2, tag3") |
| untag | Removes one or more tags from the security event. Invoked like this: untag("tag2, unknown") |
| retag | Strips all existing tags and applies only the tags supplied as parameters to the function. Invoked like this: retag("security, auth, login") |
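The semantics of the three tag functions can be pictured with a small Java sketch. It is a hypothetical stand-in, not ASPEN code: the class `TagSketch` is invented for illustration, and it assumes that tags are passed as one comma-separated string (as in the examples above) and that removing a tag which is not present is silently ignored.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-in for the tag set of a security event.
class TagSketch {
    private final Set<String> tags = new LinkedHashSet<>();

    // Splits "a, b, c" into trimmed, non-empty tag names.
    private static List<String> split(String csv) {
        return Arrays.stream(csv.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    void tag(String csv)   { tags.addAll(split(csv)); }     // a set never repeats a tag
    void untag(String csv) { tags.removeAll(split(csv)); }  // assumption: absent tags are ignored
    void retag(String csv) { tags.clear(); tag(csv); }      // drop everything, then apply the new tags
    Set<String> tags()     { return tags; }

    public static void main(String[] args) {
        TagSketch event = new TagSketch();
        event.tag("tag1, tag2, tag3");
        event.untag("tag2, unknown");
        System.out.println(event.tags());
        event.retag("security, auth, login");
        System.out.println(event.tags());
    }
}
```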
The parsing ruleset editor module also contains a regular expression debugger. This tool allows the operator to supply sample log text and test whether the regular expression matches the sample text.
If sample text is matched, the table below the regular expression will contain the list of matched groups and their values.
If sample text is not matched, the text will be colored indicating which parts of the text were matched (green) and which were not (red).
Illustration 5: Regular expressions debugger.
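The group table shown by the debugger corresponds to what `java.util.regex.Matcher` reports after a successful match. The following sketch lists the groups for a sample text; the class name `DebuggerSketch`, the sample log line and the use of `find()` are assumptions made for illustration and say nothing about how the debugger itself is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DebuggerSketch {
    // Returns group 0 (the whole match) followed by every capturing group,
    // or an empty list when the sample text does not match.
    static List<String> groups(String regex, String sample) {
        Matcher m = Pattern.compile(regex).matcher(sample);
        List<String> out = new ArrayList<>();
        if (m.find()) {
            for (int i = 0; i <= m.groupCount(); i++) {
                out.add(m.group(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Invented sample log text, used only to show the group listing.
        List<String> found = groups("user=(\\w+) ip=(\\S+)", "user=alice ip=10.0.0.1");
        for (int i = 0; i < found.size(); i++) {
            System.out.println("group " + i + ": " + found.get(i));
        }
    }
}
```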
Summary of regular-expression constructs:
| Construct | Matches |
|---|---|
| Characters | |
| x | The character x |
| \\ | The backslash character |
| \0n | The character with octal value 0n (0 <= n <= 7) |
| \0nn | The character with octal value 0nn (0 <= n <= 7) |
| \0mnn | The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7) |
| \xhh | The character with hexadecimal value 0xhh |
| \uhhhh | The character with hexadecimal value 0xhhhh |
| \x{h...h} | The character with hexadecimal value 0xh...h |
| \t | The tab character ('\u0009') |
| \n | The newline (line feed) character ('\u000A') |
| \r | The carriage-return character ('\u000D') |
| \f | The form-feed character ('\u000C') |
| \a | The alert (bell) character ('\u0007') |
| \e | The escape character ('\u001B') |
| \cx | The control character corresponding to x |
| Character classes | |
| [abc] | a, b, or c (simple class) |
| [^abc] | Any character except a, b, or c (negation) |
| [a-zA-Z] | a through z or A through Z, inclusive (range) |
| [a-d[m-p]] | a through d, or m through p: [a-dm-p] (union) |
| [a-z&&[def]] | d, e, or f (intersection) |
| [a-z&&[^bc]] | a through z, except for b and c: [ad-z] (subtraction) |
| [a-z&&[^m-p]] | a through z, and not m through p: [a-lq-z] (subtraction) |
| Predefined character classes | |
| . | Any character (may or may not match line terminators) |
| \d | A digit: [0-9] |
| \D | A non-digit: [^0-9] |
| \s | A whitespace character: [ \t\n\x0B\f\r] |
| \S | A non-whitespace character: [^\s] |
| \w | A word character: [a-zA-Z_0-9] |
| \W | A non-word character: [^\w] |
| POSIX character classes (US-ASCII only) | |
| \p{Lower} | A lower-case alphabetic character: [a-z] |
| \p{Upper} | An upper-case alphabetic character: [A-Z] |
| \p{ASCII} | All ASCII: [\x00-\x7F] |
| \p{Alpha} | An alphabetic character: [\p{Lower}\p{Upper}] |
| \p{Digit} | A decimal digit: [0-9] |
| \p{Alnum} | An alphanumeric character: [\p{Alpha}\p{Digit}] |
| \p{Punct} | Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ |
| \p{Graph} | A visible character: [\p{Alnum}\p{Punct}] |
| \p{Print} | A printable character: [\p{Graph}\x20] |
| \p{Blank} | A space or a tab: [ \t] |
| \p{Cntrl} | A control character: [\x00-\x1F\x7F] |
| \p{XDigit} | A hexadecimal digit: [0-9a-fA-F] |
| \p{Space} | A whitespace character: [ \t\n\x0B\f\r] |
| Classes for Unicode scripts, blocks, categories and binary properties | |
| \p{IsLatin} | A Latin script character (script) |
| \p{InGreek} | A character in the Greek block (block) |
| \p{Lu} | An uppercase letter (category) |
| \p{IsAlphabetic} | An alphabetic character (binary property) |
| \p{Sc} | A currency symbol |
| \P{InGreek} | Any character except one in the Greek block (negation) |
| [\p{L}&&[^\p{Lu}]] | Any letter except an uppercase letter (subtraction) |
| Boundary matchers | |
| ^ | The beginning of a line |
| $ | The end of a line |
| \b | A word boundary |
| \B | A non-word boundary |
| \A | The beginning of the input |
| \G | The end of the previous match |
| \Z | The end of the input but for the final terminator, if any |
| \z | The end of the input |
| Greedy quantifiers | |
| X? | X, once or not at all |
| X* | X, zero or more times |
| X+ | X, one or more times |
| X{n} | X, exactly n times |
| X{n,} | X, at least n times |
| X{n,m} | X, at least n but not more than m times |
| Reluctant quantifiers | |
| X?? | X, once or not at all |
| X*? | X, zero or more times |
| X+? | X, one or more times |
| X{n}? | X, exactly n times |
| X{n,}? | X, at least n times |
| X{n,m}? | X, at least n but not more than m times |
| Possessive quantifiers | |
| X?+ | X, once or not at all |
| X*+ | X, zero or more times |
| X++ | X, one or more times |
| X{n}+ | X, exactly n times |
| X{n,}+ | X, at least n times |
| X{n,m}+ | X, at least n but not more than m times |
| Logical operators | |
| XY | X followed by Y |
| X|Y | Either X or Y |
| (X) | X, as a capturing group |
| Back references | |
| \n | Whatever the nth capturing group matched |
| \k<name> | Whatever the named-capturing group "name" matched |
| Quotation | |
| \ | Nothing, but quotes the following character |
| \Q | Nothing, but quotes all characters until \E |
| \E | Nothing, but ends quoting started by \Q |
| Special constructs (named-capturing and non-capturing) | |
| (?<name>X) | X, as a named-capturing group |
| (?:X) | X, as a non-capturing group |
| (?idmsuxU-idmsuxU) | Nothing, but turns match flags i d m s u x U on - off |
| (?idmsux-idmsux:X) | X, as a non-capturing group with the given flags i d m s u x on - off |
| (?=X) | X, via zero-width positive lookahead |
| (?!X) | X, via zero-width negative lookahead |
| (?<=X) | X, via zero-width positive lookbehind |
| (?<!X) | X, via zero-width negative lookbehind |
| (?>X) | X, as an independent, non-capturing group |
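The three quantifier families in the table differ only in how they backtrack. The sketch below (class name `QuantifierSketch` and the input string are chosen for illustration) runs the same pattern shape in greedy, reluctant and possessive form against one input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierSketch {
    // Returns the first match of regex in input, or null when nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: .* takes everything, then backs off until "foo" fits at the end.
        System.out.println(firstMatch(".*foo", input));   // xfooxxxxxxfoo
        // Reluctant: .*? takes as little as possible, so the first "foo" ends the match.
        System.out.println(firstMatch(".*?foo", input));  // xfoo
        // Possessive: .*+ takes everything and never gives anything back, so "foo" cannot match.
        System.out.println(firstMatch(".*+foo", input));  // null
    }
}
```

In parsing rules, a reluctant quantifier is usually the safer choice for a capture that is followed by a literal delimiter, because it stops at the first occurrence of that delimiter.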
Backslashes, escapes, and quoting
The backslash character ('\') serves to introduce escaped constructs, as defined in the table above, as well as to quote characters that otherwise would be interpreted as unescaped constructs. Thus the expression \\ matches a single backslash and \{ matches a left brace.
It is an error to use a backslash prior to any alphabetic character that does not denote an escaped construct; these are reserved for future extensions to the regular-expression language. A backslash may be used prior to a non-alphabetic character regardless of whether that character is part of an unescaped construct.
Backslashes within string literals in Java source code are interpreted as required by The Java™ Language Specification as either Unicode escapes (section 3.3) or other character escapes (section 3.10.6). It is therefore necessary to double backslashes in string literals that represent regular expressions to protect them from interpretation by the Java bytecode compiler. The string literal "\b", for example, matches a single backspace character when interpreted as a regular expression, while "\\b" matches a word boundary. The string literal "\(hello\)" is illegal and leads to a compile-time error; in order to match the string (hello) the string literal "\\(hello\\)" must be used.
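The doubling rule only applies when a regular expression is written as a Java string literal; it is assumed here that expressions typed into the ruleset editor are entered with single backslashes. A short sketch of the Java side, with the class name `EscapeSketch` and the sample inputs invented for illustration:

```java
import java.util.regex.Pattern;

class EscapeSketch {
    // The Java literal "\\bcat\\b" reaches the regex engine as \bcat\b (word boundaries).
    static boolean wordBoundary(String input) {
        return Pattern.compile("\\bcat\\b").matcher(input).find();
    }

    // The Java literal "\\(hello\\)" reaches the engine as \(hello\), literal parentheses.
    static boolean literalParens(String input) {
        return Pattern.matches("\\(hello\\)", input);
    }

    // \Q...\E quotes everything in between, so + and . lose their special meaning.
    static boolean quoted(String input) {
        return Pattern.matches("\\Q1+1=2.\\E", input);
    }

    public static void main(String[] args) {
        System.out.println(wordBoundary("a cat sat"));    // true
        System.out.println(wordBoundary("concatenate"));  // false
        System.out.println(literalParens("(hello)"));     // true
        System.out.println(quoted("1+1=2."));             // true
        System.out.println(quoted("11=2x"));              // false
    }
}
```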
Character Classes
Character classes may appear within other character classes, and may be composed by the union operator (implicit) and the intersection operator (&&). The union operator denotes a class that contains every character that is in at least one of its operand classes. The intersection operator denotes a class that contains every character that is in both of its operand classes.
The precedence of character-class operators is as follows, from highest to lowest:
| 1 | Literal escape | \x |
| 2 | Grouping | [...] |
| 3 | Range | a-z |
| 4 | Union | [a-e][i-u] |
| 5 | Intersection | [a-z&&[aeiou]] |
Note that a different set of metacharacters are in effect inside a character class than outside a character class. For instance, the regular expression . loses its special meaning inside a character class, while the expression - becomes a range forming metacharacter.
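Union, intersection and subtraction can be checked one character at a time. The helper below is a minimal sketch (the class name `CharClassSketch` is hypothetical) that tests whether a single character belongs to a class expression taken from the table above.

```java
import java.util.regex.Pattern;

class CharClassSketch {
    // True when the single character c is matched by the character class expression.
    static boolean inClass(String classExpr, char c) {
        return Pattern.matches(classExpr, String.valueOf(c));
    }

    public static void main(String[] args) {
        System.out.println(inClass("[a-d[m-p]]", 'n'));    // union: true
        System.out.println(inClass("[a-z&&[def]]", 'a'));  // intersection: false
        System.out.println(inClass("[a-z&&[^bc]]", 'b'));  // subtraction: false
        // Inside a class the dot is a literal dot, not "any character".
        System.out.println(inClass("[.]", 'x'));           // false
        System.out.println(inClass("[.]", '.'));           // true
    }
}
```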
Line terminators
A line terminator is a one- or two-character sequence that marks the end of a line of the input character sequence. The following are recognized as line terminators:
If UNIX_LINES mode is activated, then the only line terminators recognized are newline characters.
The regular expression . matches any character except a line terminator unless the DOTALL flag is specified.
By default, the regular expressions ^ and $ ignore line terminators and only match at the beginning and the end, respectively, of the entire input sequence. If MULTILINE mode is activated then ^ matches at the beginning of input and after any line terminator except at the end of input. When in MULTILINE mode $ matches just before a line terminator or the end of the input sequence.
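The effect of the MULTILINE and DOTALL flags is easiest to see on a two-line input. The sketch below uses the `java.util.regex.Pattern` flag constants; the class name `LineModeSketch` and the sample input are chosen for illustration. The same flags can also be switched on inside the expression with the embedded forms (?m) and (?s).

```java
import java.util.regex.Pattern;

class LineModeSketch {
    // True when regex, compiled with the given flags, matches somewhere in input.
    static boolean find(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "one\ntwo";
        // Default: ^ matches only at the very beginning of the input.
        System.out.println(find("^two$", 0, input));                  // false
        // MULTILINE: ^ also matches right after a line terminator.
        System.out.println(find("^two$", Pattern.MULTILINE, input));  // true
        // Default: the dot does not match a line terminator.
        System.out.println(find("one.two", 0, input));                // false
        // DOTALL: the dot matches line terminators as well.
        System.out.println(find("one.two", Pattern.DOTALL, input));   // true
    }
}
```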
Groups and capturing
Group number
Capturing groups are numbered by counting their opening parentheses from left to right. In the expression ((A)(B(C))), for example, there are four such groups:
| 1 | ((A)(B(C))) |
| 2 | (A) |
| 3 | (B(C)) |
| 4 | (C) |
Group zero always stands for the entire expression.
Capturing groups are so named because, during a match, each subsequence of the input sequence that matches such a group is saved. The captured subsequence may be used later in the expression, via a back reference, and may also be retrieved from the matcher once the match operation is complete.
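The numbering can be verified directly with the expression from the text. In this sketch (class name `GroupNumberSketch` is hypothetical) the expression ((A)(B(C))) is matched against the input ABC and every group is read back from the matcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberSketch {
    // Returns groups 0..groupCount() for a full match, or an empty list.
    static List<String> groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        if (m.matches()) {
            for (int i = 0; i <= m.groupCount(); i++) {
                out.add(m.group(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Groups are numbered by the position of their opening parenthesis.
        System.out.println(groups("((A)(B(C)))", "ABC")); // [ABC, ABC, A, BC, C]
    }
}
```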
Group name
A capturing group can also be assigned a "name", a named-capturing group, and then be back-referenced later by the "name". Group names are composed of the following characters. The first character must be a letter.
A named-capturing group is still numbered as described in Group number.
The captured input associated with a group is always the subsequence that the group most recently matched. If a group is evaluated a second time because of quantification then its previously-captured value, if any, will be retained if the second evaluation fails. Matching the string "aba" against the expression (a(b)?)+, for example, leaves group two set to "b". All captured input is discarded at the beginning of each match.
Groups beginning with (? are either pure, non-capturing groups that do not capture text and do not count towards the group total, or named-capturing groups.
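Both points, the retained capture and access by name, can be tried out with a few lines of Java. The class name `NamedGroupSketch`, the group names `user` and `host` and the sample input are assumptions made for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupSketch {
    // Group two keeps the "b" captured in the first iteration, because the
    // optional (b)? does not match anything in the second iteration.
    static String retainedCapture() {
        Matcher m = Pattern.compile("(a(b)?)+").matcher("aba");
        return m.matches() ? m.group(2) : null;
    }

    // A named group is still numbered: "user" is also group 1.
    static String[] userByNameAndNumber(String input) {
        Matcher m = Pattern.compile("(?<user>\\w+)@(?<host>\\w+)").matcher(input);
        return m.matches() ? new String[] { m.group("user"), m.group(1) } : null;
    }

    public static void main(String[] args) {
        System.out.println(retainedCapture());                // b
        String[] user = userByNameAndNumber("alice@corp");
        System.out.println(user[0] + " " + user[1]);          // alice alice
    }
}
```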
Unicode support
A Unicode character can also be represented in a regular expression by using its hex notation (hexadecimal code point value) directly, as described in construct \x{...}. For example, the supplementary character U+2011F can be specified as \x{2011F} instead of the two consecutive Unicode escape sequences of the surrogate pair \uD840\uDD1F.
Unicode scripts, blocks, categories and binary properties are written with the \p and \P constructs as in Perl. \p{prop} matches if the input has the property prop, while \P{prop} does not match if the input has that property.
Scripts, blocks, categories and binary properties can be used both inside and outside of a character class.
Scripts are specified either with the prefix Is, as in IsHiragana, or by using the script keyword (or its short form sc) as in script=Hiragana or sc=Hiragana.
Blocks are specified with the prefix In, as in InMongolian, or by using the keyword block (or its short form blk) as in block=Mongolian or blk=Mongolian.
Categories may be specified with the optional prefix Is: Both \p{L} and \p{IsL} denote the category of Unicode letters. Same as scripts and blocks, categories can also be specified by using the keyword general_category (or its short form gc) as in general_category=Lu or gc=Lu.
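The script, block and category notations can be exercised on single code points. The sketch below is illustrative only: the class name `UnicodeSketch` and the chosen code points are assumptions, and the characters are passed as numeric code points so that the source stays ASCII-only.

```java
import java.util.regex.Pattern;

class UnicodeSketch {
    // True when the one-character string built from codePoint matches regex.
    static boolean matches(String regex, int codePoint) {
        return Pattern.matches(regex, new String(Character.toChars(codePoint)));
    }

    public static void main(String[] args) {
        System.out.println(matches("\\x{2011F}", 0x2011F));        // supplementary character by code point
        System.out.println(matches("\\p{IsCyrillic}", 0x0414));    // script, Is prefix
        System.out.println(matches("\\p{sc=Hiragana}", 0x3042));   // script, keyword form
        System.out.println(matches("\\p{InGreek}", 0x03B1));       // block
        System.out.println(matches("\\p{Lu}", 0x0414));            // category: uppercase letter
        System.out.println(matches("\\P{InGreek}", 'a'));          // negation
    }
}
```

Every call in `main` is expected to print true.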
Binary properties are specified with the prefix Is, as in IsAlphabetic. The supported binary properties by Pattern are
Predefined character classes and POSIX character classes are in conformance with the recommendation of Annex C: Compatibility Properties of Unicode Regular Expression, when the UNICODE_CHARACTER_CLASS flag is specified.
| Classes | Matches |
|---|---|
| \p{Lower} | A lowercase character: \p{IsLowercase} |
| \p{Upper} | An uppercase character: \p{IsUppercase} |
| \p{ASCII} | All ASCII: [\x00-\x7F] |
| \p{Alpha} | An alphabetic character: \p{IsAlphabetic} |
| \p{Digit} | A decimal digit character: \p{IsDigit} |
| \p{Alnum} | An alphanumeric character: [\p{IsAlphabetic}\p{IsDigit}] |
| \p{Punct} | A punctuation character: \p{IsPunctuation} |
| \p{Graph} | A visible character: [^\p{IsWhite_Space}\p{gc=Cc}\p{gc=Cs}\p{gc=Cn}] |
| \p{Print} | A printable character: [\p{Graph}\p{Blank}&&[^\p{Cntrl}]] |
| \p{Blank} | A space or a tab: [\p{IsWhite_Space}&&[^\p{gc=Zl}\p{gc=Zp}\x0a\x0b\x0c\x0d\x85]] |
| \p{Cntrl} | A control character: \p{gc=Cc} |
| \p{XDigit} | A hexadecimal digit: [\p{gc=Nd}\p{IsHex_Digit}] |
| \p{Space} | A whitespace character:\p{IsWhite_Space} |
| \d | A digit: \p{IsDigit} |
| \D | A non-digit: [^\d] |
| \s | A whitespace character: \p{IsWhite_Space} |
| \S | A non-whitespace character: [^\s] |
| \w | A word character: [\p{Alpha}\p{gc=Mn}\p{gc=Me}\p{gc=Mc}\p{Digit}\p{gc=Pc}] |
| \W | A non-word character: [^\w] |
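The difference matters for localized logs: without the flag, \w only covers ASCII letters, digits and the underscore. The sketch below compares both modes on a Cyrillic word; the class name `UnicodeClassSketch` is hypothetical and the word is written with Unicode escapes to keep the source ASCII-only. The embedded flag (?U) has the same effect as the `Pattern.UNICODE_CHARACTER_CLASS` constant.

```java
import java.util.regex.Pattern;

class UnicodeClassSketch {
    // True when s consists only of word characters under the chosen mode.
    static boolean isWord(String s, boolean unicodeClasses) {
        int flags = unicodeClasses ? Pattern.UNICODE_CHARACTER_CLASS : 0;
        return Pattern.compile("\\w+", flags).matcher(s).matches();
    }

    public static void main(String[] args) {
        // The Cyrillic word for "Domain", written with Unicode escapes.
        String domain = "\u0414\u043e\u043c\u0435\u043d";
        System.out.println(isWord(domain, false)); // false: default word class is ASCII-only
        System.out.println(isWord(domain, true));  // true: Unicode-aware word class
    }
}
```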
Windows Security Event 528 – Successful logon onto domain (Russian)
Let's analyze the complete flow of a source raw log and the parsed security event through the parsing engine and see what happens at each intermediate step.
Each raw log begins its journey through the parsing engine at the first rule in the first group of the parsing ruleset. The source raw log is accompanied on this journey by a new security event. At the start, the security event is completely empty, and it is the job of the parsing rules to populate it as the pair moves through the parsing ruleset.
In the case of IT security parsing ruleset, this starting position would look like this:
As was already discussed above, parsing engine works by matching incoming raw logs against a hierarchy of parsing rules. Each parsing rule contains two optional checks for matching raw logs:
Parsing rule matches the raw log if both of these checks successfully match the raw log.
If either of the two is missing (regular expression is empty or there are no active meta rules) then it will not be included and only the other check will determine if the parsing rule matches the raw log or not.
So, let's examine our first rule. We can see that it does not contain any meta rules, so, only the regular expression will determine the match.
The regular expression is [\S\s]+, which matches any sequence of one or more whitespace and non-whitespace characters; in effect, it matches any non-empty text, including text that spans several lines.
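A quick Java check shows why the catch-all rule uses [\S\s]+ and not .+: the class covers line terminators as well, while the dot does not by default. The class name `AnyTextSketch` and the sample text are assumptions made for illustration.

```java
import java.util.regex.Pattern;

class AnyTextSketch {
    // Whole-text match against the catch-all expression from the first rule.
    static boolean catchAll(String rawLog) {
        return Pattern.matches("[\\S\\s]+", rawLog);
    }

    // Whole-text match against the plain dot form, for comparison.
    static boolean dotPlus(String rawLog) {
        return Pattern.matches(".+", rawLog);
    }

    public static void main(String[] args) {
        String twoLines = "line one\nline two";
        System.out.println(catchAll(twoLines)); // true
        System.out.println(dotPlus(twoLines));  // false: the dot stops at the line terminator
        System.out.println(catchAll(""));       // false: + needs at least one character
    }
}
```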
With no meta rules and regular expression that matches any text, we have a rule that will match any raw log. What happens with those logs, once they are matched?
The parsing engine does two things when a parsing rule matches a raw log:
One more reminder, possible actions for parsing rules are:
Back to our rule: because the action is set to “Continue Parsing”, the engine will continue matching the raw log against the other rules in the Start group. Before that, the assignments from this rule will be executed, populating the security event with some basic data.
The next rule in the Start group is “Select Windows EventLogs”.
This is another rule with a regular expression that matches everything. This time, we have a meta rule that matches only raw logs of the “MSWinEventLog” type. Again, we have a number of assignments that populate the security event with additional data, and then a “Jump To” action that instructs the parsing engine to continue parsing the raw log with the rules from the “EventLogs” group. Clicking on the “Jump to EventLogs” button in the parsing ruleset editor will show us the rules in that group, and we can resume the analysis from there.
Going through the rules in this group, we can observe a pattern: the first rule is usually a “pass-through” rule (action set to Continue) with assignments that set common values for all rules in the group. It is followed by rules for specific events and by a final catch-all rule that either warns operators or simply silences (action set to Silent) unmatched events.
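The walk through the groups can be summarized as a small loop. The sketch below is a hypothetical model, not ASPEN's engine: the names `EngineSketch`, `Rule` and `Action` are invented, a single predicate stands in for the regular expression and meta rules, and it assumes that Silent discards the event and that parsing ends when a group runs out of rules. It does not guard against groups that jump to each other in a cycle.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical model of rule groups and actions; names are illustrative only.
class EngineSketch {
    enum Action { CONTINUE, JUMP_TO, STOP, SILENT }

    static class Rule {
        final Predicate<String> matcher;       // stands in for regex + meta rules
        final Map<String, String> assignments; // key -> value copied into the event
        final Action action;
        final String target;                   // group name, used by JUMP_TO only

        Rule(Predicate<String> matcher, Map<String, String> assignments, Action action, String target) {
            this.matcher = matcher;
            this.assignments = assignments;
            this.action = action;
            this.target = target;
        }
    }

    // Returns the populated event, or null when a SILENT rule discards it (assumption).
    static Map<String, String> parse(Map<String, List<Rule>> groups, String startGroup, String rawLog) {
        Map<String, String> event = new LinkedHashMap<>();
        List<Rule> rules = groups.get(startGroup);
        int i = 0;
        while (i < rules.size()) {
            Rule rule = rules.get(i);
            if (!rule.matcher.test(rawLog)) { i++; continue; }
            event.putAll(rule.assignments);    // assignments run before the action
            switch (rule.action) {
                case CONTINUE: i++; break;                                    // next rule, same group
                case JUMP_TO:  rules = groups.get(rule.target); i = 0; break; // first rule of target group
                case SILENT:   return null;
                default:       return event;                                  // STOP
            }
        }
        return event; // assumption: running out of rules ends parsing
    }

    public static void main(String[] args) {
        Map<String, List<Rule>> groups = new HashMap<>();
        groups.put("Start", List.of(
                new Rule(log -> true, Map.of("seen", "yes"), Action.CONTINUE, null),
                new Rule(log -> log.contains("MSWinEventLog"), Map.of(), Action.JUMP_TO, "EventLogs")));
        groups.put("EventLogs", List.of(
                new Rule(log -> log.contains("528"), Map.of("type", "logon"), Action.STOP, null),
                new Rule(log -> true, Map.of(), Action.SILENT, null)));
        System.out.println(parse(groups, "Start", "MSWinEventLog 528"));
    }
}
```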
Looking at the rule called “Security logs”, we can see an example of a rule that doesn't have a regular expression. The regular expression check is therefore skipped, and only the meta rules determine whether the rule matches the log.
Since there are no assignments, security event will remain as is and parsing will continue with rules in “Security” group.
Security group contains lots of rules for matching specific Windows Security EventLogs by event ID and other flags.
There are also rules that match multiple event IDs, which is fine, as long as the log text has the same format for all different IDs and can be parsed by the same regular expression.
For this analysis we will study in more detail the rule that matches Windows Security EventLog ID 528 – Successful logon onto domain, localized to the Russian language.
Again, we have three basic parts to this rule. First, we have a meta rule that matches only Windows Security EventLogs with event ID 528. Second, we have a regular expression that matches the log text, extracting interesting pieces of information using the regular expression grouping mechanism (expressions such as “([\S ]+)”). Text that matches the part of the regular expression in parentheses is extracted into a dedicated group.
The third part consists of the assignments, which assign the values extracted from the log text to the security event. We can see here how groups matched by the regular expression are used as expressions that are assigned to keys in the security event. For example, looking at the second assignment, we can see that the key “username” in the category “who” in the security event will be set to the value of the first group. Let's look at the relevant part of the regular expression:
… Успешный вход в систему:\s+Пользователь:\s+([\S ]+?)\s+Домен…
Let's interpret this segment of the regular expression: match the literal text “Успешный вход в систему:” followed by one or more whitespace characters, then the literal text “Пользователь:” followed by one or more whitespace characters, then take the shortest sequence of one or more non-whitespace or space characters that is followed by whitespace and the literal text “Домен”, and make a group of that sequence. Since this is the first group definition, it will be numbered 1 (groups[1] in the assignment expression). The same reasoning applies to the rest of the regular expression, with increasing group numbers.
NOTE: groups[0] always represents entire log text.
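The username capture can be reproduced in Java with a shortened form of the segment, starting at “Пользователь:”. This is a sketch under assumptions: the class name `LogonSegmentSketch` and the sample log text are invented, only part of the rule's expression is used, and the Cyrillic words are written with Unicode escapes to keep the source ASCII-only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogonSegmentSketch {
    // Cyrillic literals are written as Unicode escapes to keep the source ASCII-only.
    static final String USER = "\u041f\u043e\u043b\u044c\u0437\u043e\u0432\u0430\u0442\u0435\u043b\u044c"; // word for "User"
    static final String DOMAIN = "\u0414\u043e\u043c\u0435\u043d";                                          // word for "Domain"

    // Shortened segment: User label, whitespace, reluctant capture, whitespace, Domain label.
    static final Pattern SEGMENT = Pattern.compile(USER + ":\\s+([\\S ]+?)\\s+" + DOMAIN);

    // Returns the captured username (group 1), or null when the segment is absent.
    static String username(String rawLog) {
        Matcher m = SEGMENT.matcher(rawLog);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Invented sample text; real event 528 logs contain many more fields.
        String rawLog = USER + ":\tIvan Petrov\t" + DOMAIN + ":\tCORP";
        System.out.println(username(rawLog)); // Ivan Petrov
    }
}
```

The class [\S ] admits spaces but not tabs or newlines, which is why a username containing a space is still captured whole while the field separator ends the capture.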
After the assignments are executed, this rule's action will continue parsing the log with the rules in the group “Logon Type”.
Looking at these rules we can see that none of them has a regular expression. They are using meta rules to match previously assigned extra.event_id field against pre-defined values and assign descriptive text to the value of what.type field in the security event. Since all the rules have a “Stop parsing” action, regardless of which rule is matched, parsing will stop here.