Implementation
Hardened Appliance
INDICA can be implemented as a so-called hardened appliance: the application is bundled with a secured system that takes care of its own security updates. These updates are sourced from the standard system and kernel sources. In addition, INDICA provides its own updates, which are delivered through api.indica.nl and tools.indica.nl. These systems are regularly penetration (pen) tested by INDICA and its clients, and regular audits and reviews are scheduled. INDICA is ISO 9001 and ISO 27001 certified. The automatic update process can be turned off, but that may have consequences for future upgrades, changes and support.
Monitoring agents can be installed on the appliance if supported.
Ubuntu Installation
If scalability or more flexibility is required, INDICA may be installed on multiple Ubuntu servers. Some of the components (such as ZooKeeper or Solr) can be installed on any Linux flavour. Systems can be managed locally if so desired.
The INDICA software suite contains many components, which run on the appliance or on one or more systems.
Spiders
The first part of the INDICA system consists of multiple spiders that retrieve data from the different sources. These spiders discover data, from files and emails among others, in the connected sources. The most commonly used spiders are those for fileshares, email, SharePoint, databases and websites. Spiders discover data and send it to the parsers.
Fileshares
Files from fileshares are sent to the parsers by the spiders, together with all available metadata. This metadata contains the location, date/time and rights, i.e. the Access Control Lists (ACLs).
To be able to read this information, INDICA needs access:
- Service Account
- Firewall access to the fileshare
- Read rights and listing rights to the share and file structure (take care with special rights!)
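As an illustration of what a fileshare spider collects, the following is a minimal sketch, not INDICA's actual implementation. It assumes the share is already mounted at a local path, and the function name `discover_files` and the record fields are hypothetical. POSIX mode bits stand in for the ACLs a real share would expose.

```python
import os
import stat
from datetime import datetime, timezone


def discover_files(root):
    """Yield one metadata record per file found under `root` (hypothetical helper)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                # Missing read or listing rights: skip the file instead of aborting.
                continue
            yield {
                "location": path,
                "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
                "size": info.st_size,
                # Mode bits are only a stand-in for real ACLs in this sketch.
                "rights": stat.filemode(info.st_mode),
            }
```

Files that the service account cannot read are silently skipped in this sketch, which is why the read and listing rights above matter: without them, data is simply never discovered.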
Email
Email can be connected in different ways, the most common being a direct connection to the web server through the Exchange Web Services (EWS) API. Other possibilities are reading PST or OST files directly, or connecting to POP3 or IMAP mailboxes. The latter are sometimes configured for 'always bcc' archiving. Because email protocols, and therefore mail servers, are not capable of supplying direct access to information, INDICA has to create and keep a copy of all emails and supply that copy to the indexer for searching. The email spiders send the entire email to the parser and keep a copy in the archive so that the original email can be supplied if necessary.
INDICA needs the following to spider email:
- A mailbox-enabled (service) account
- Read rights, impersonation rights
- Correct rights to the required mailboxes or Exchange environment
- Firewall access to the EWS interface
- https://support.indica.nl/solution/articles/36000021958-connecting-to-exchange-ews-o365-accounts
- https://support.indica.nl/solution/articles/36000021947-setting-up-exchange-servers
- https://support.indica.nl/solution/articles/36000021943-setting-up-access-to-office365
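The "keep a copy, then parse" behaviour described for email can be sketched as follows. This is an illustration under assumptions, not INDICA's code: the function name `process_email`, the dictionary used as an archive, and the SHA-256 archive key are all hypothetical. It only uses Python's standard `email` package and does not talk to EWS, POP3 or IMAP.

```python
import hashlib
from email import message_from_bytes, policy


def process_email(raw, archive):
    """Archive the raw message and return the fields a parser would receive."""
    # The unmodified bytes are kept so the original email can be supplied later.
    key = hashlib.sha256(raw).hexdigest()
    archive[key] = raw

    msg = message_from_bytes(raw, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    return {
        "archive_key": key,
        "subject": str(msg.get("subject", "")),
        "sender": str(msg.get("from", "")),
        "text": body.get_content() if body is not None else "",
    }
```

Storing the raw bytes before parsing means the archived copy is independent of how well the parser handles a given message.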
SharePoint
When using SharePoint, we distinguish between Hybrid or Office 365 implementations, which use the Graph API, and local 2013 or 2016 implementations, which use the SharePoint API.
The former offers more flexibility, so more information and metadata can be indexed by INDICA.
The latter has many shortcomings and is not being developed further by Microsoft. In most cases documents and rights can be indexed from SharePoint.
INDICA needs the following to spider SharePoint:
- A service account with SharePoint administrator rights
- 'Full read' rights to all required SharePoint sites
- Firewall access to the SharePoint environment
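For the Graph API route, the sketch below shows how requests for documents and their rights might be built. It is an assumption about which Microsoft Graph endpoints would be relevant (listing the children of a site's default document library, and listing the permissions on one item). The source does not state which calls INDICA makes. The helper names are hypothetical, the access token is assumed to have been obtained elsewhere, and the requests are only constructed, not sent.

```python
from urllib.parse import quote
from urllib.request import Request

GRAPH_ROOT = "https://graph.microsoft.com/v1.0"


def _authorized(url, access_token):
    # Graph expects an OAuth 2.0 bearer token in the Authorization header.
    return Request(url, headers={"Authorization": f"Bearer {access_token}"})


def list_children_request(site_id, access_token):
    """Build a request for the root folder of a site's default document library."""
    site = quote(site_id, safe=",")  # site IDs contain commas that must be kept
    return _authorized(f"{GRAPH_ROOT}/sites/{site}/drive/root/children", access_token)


def permissions_request(site_id, item_id, access_token):
    """Build a request for the permissions (rights) on a single item."""
    site = quote(site_id, safe=",")
    item = quote(item_id, safe="")
    return _authorized(
        f"{GRAPH_ROOT}/sites/{site}/drive/items/{item}/permissions", access_token
    )
```

Passing either request to `urllib.request.urlopen` would perform the call, which requires the firewall access and read rights listed above.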
Parsers
Parsers kick into action whenever they are needed to get data out of files. Email, Word, Excel, PDF and other files are converted to plain text. Images are run through optical character recognition (OCR) systems, and all metadata is extracted from sound files. INDICA contains many parsers and systems for the different kinds of files and documents, to collect as much information as possible. At the same time, thumbnails are created from documents.
Parsers forward the information to the indexers.
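The idea of one parser per file type, each producing plain text, can be sketched as a small dispatch table. This is a simplified illustration, not INDICA's parser set: the function names and the `PARSERS` registry are hypothetical, and only formats that Python's standard library can read (HTML, CSV, plain text) are covered. Office formats, PDF and OCR would need dedicated libraries.

```python
import csv
import io
from html.parser import HTMLParser
from pathlib import PurePath


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document, ignoring the markup."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def parse_html(raw):
    extractor = _TextExtractor()
    extractor.feed(raw.decode("utf-8", errors="replace"))
    extractor.close()
    return " ".join(extractor.chunks)


def parse_csv(raw):
    rows = csv.reader(io.StringIO(raw.decode("utf-8", errors="replace")))
    return "\n".join(" ".join(row) for row in rows)


def parse_text(raw):
    return raw.decode("utf-8", errors="replace")


# Hypothetical registry: file extension -> parser function.
PARSERS = {".html": parse_html, ".htm": parse_html, ".csv": parse_csv, ".txt": parse_text}


def to_plain_text(filename, raw):
    """Pick a parser by file extension and return the plain-text content."""
    suffix = PurePath(filename).suffix.lower()
    parser = PARSERS.get(suffix)
    if parser is None:
        raise ValueError(f"no parser registered for {suffix!r}")
    return parser(raw)
```

A registry like this keeps the spiders unaware of file formats: adding support for a new format means registering one more function.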
Indexers
Indexers collect and group all data, such as plain text content, metadata, source location, date, rights and other information. Separately, some preprocessing of the data takes place by applying multiple classifiers and NLP, including entity recognition. If needed, privacy and other tokens are also recognised at this stage. Every document is placed in the index along with all the extra information.
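A minimal sketch of this enrichment step is shown below. It is purely illustrative: the `IndexedDocument` structure, the keyword-based classifier and the regular expression for email addresses are assumptions standing in for INDICA's real classifiers, NLP and privacy-token recognition, which the source does not describe in detail.

```python
import re
from dataclasses import dataclass, field

# Simplified pattern for email addresses, used here as an example privacy token.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# Hypothetical keyword lists standing in for real classifiers.
KEYWORDS = {
    "finance": ("invoice", "payment"),
    "hr": ("salary", "contract"),
}


@dataclass
class IndexedDocument:
    location: str
    text: str
    rights: list
    tokens: dict = field(default_factory=dict)
    labels: list = field(default_factory=list)


def classify(text):
    """Return every label whose keywords occur in the text."""
    lowered = text.lower()
    return [label for label, words in KEYWORDS.items() if any(w in lowered for w in words)]


def build_document(location, text, rights):
    """Group content, rights and recognised tokens into one record for the index."""
    doc = IndexedDocument(location, text, list(rights))
    doc.tokens["email"] = sorted(set(EMAIL_RE.findall(text)))
    doc.labels = classify(text)
    return doc
```

Because the rights travel with the document record, a later query can be filtered on them without going back to the source system.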
Index
The index itself is a scalable system. By default it runs on the INDICA appliance, but it can easily be scaled up and out beyond the appliance by moving components out of it. The index contains all information that has been taken from the connected sources, and only that information. This information is not easily readable: only the graphical user interface and the connected API of INDICA can query the index. The index has been set up specifically for use by INDICA.
User Interface
The Graphical User Interface is a web application that consists of 3 parts. Users and groups for these parts can be maintained within INDICA, but usually the system is connected to Microsoft Active Directory, Azure AD, Amazon AD or another LDAP-based system, so that management of users and groups can follow the regular processes.
The user interface of INDICA is available through HTTP, using SSL if needed. Access to the interface is essential.
Admin interface
System administration is done through the Admin interface. Basic infrastructural and system settings can be changed here, including networking and Microsoft Active Directory settings, user management and case management.
Management interface
Case management is done by case managers through the Manage interface. Here sources can be connected, tags created, and audit and regular logging viewed. If needed, Privacy Tokens, exports, imports and other classification methods can also be managed here.
Regular user interface
Regular users and reviewers view data, dashboards and other related items here. Queries and part of the INDICA algorithms are executed in this part of the system. Much of it relies on JavaScript, so outdated browsers and overly strict access restrictions will limit functionality.
Updates, support and remote access
Running a recent version of INDICA is preferred to allow optimal access to features and possibilities of the system. Updates and patches are published regularly, and can be pushed to the systems automatically.
If desired, INDICA engineers can be given access to the system to provide adequate support and maintenance. INDICA contains a method of providing 'unattended' access, allowing for automatic system maintenance. This access can also be enabled on demand through the Admin interface. A secure connection will then be established between the INDICA connection server and the INDICA client system. This connection can only be established when an INDICA engineer actively enables it and the client system allows it, either automatically or one-time. The connection then allows the engineer to log on using known or supplied credentials. Its security is therefore multi-layered.
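The layered checks described above can be summarised in a small decision sketch. It models only the logic in the text (engineer enables, client allows automatically or one-time, then credentials are checked). The class and method names are hypothetical, and the sketch does not reflect how INDICA actually implements its connection server.

```python
from dataclasses import dataclass
from enum import Enum


class ClientPolicy(Enum):
    DENY = "deny"              # client system does not allow remote access
    ONE_TIME = "one-time"      # access enabled on demand for a single connection
    UNATTENDED = "unattended"  # access allowed automatically


@dataclass
class RemoteAccess:
    policy: ClientPolicy = ClientPolicy.DENY

    def request_session(self, engineer_enabled, credentials_ok):
        """Return True only if every layer agrees to the session."""
        # Layer 1: an engineer must actively enable the connection.
        if not engineer_enabled:
            return False
        # Layer 2: the client system must allow it.
        if self.policy is ClientPolicy.DENY:
            return False
        if self.policy is ClientPolicy.ONE_TIME:
            # A one-time grant is used up once the connection is established.
            self.policy = ClientPolicy.DENY
        # Layer 3: the engineer still has to log on with valid credentials.
        return credentials_ok
```

For example, a client configured with `ClientPolicy.ONE_TIME` accepts one enabled connection and rejects the next until access is granted again.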