Blog | Spider

Progressive loading of Timeline for massive data

April 21, 2022 · 2 min read

Timeout in Spider on production data :(

In the Flowbird production environment, Spider captures around 400 GB per day. Impressive!

But it is a challenge in itself! While the capture works great, the UI, map and statistics hit timeouts when loading the timeline over a whole day.

Progressive loading with composite aggregation

To improve the situation, I've added an option to the timeline (first) to load the data progressively, with pagination. It uses the composite aggregation of Elasticsearch, and the results update the existing timeline data whenever possible, instead of resetting the whole timeline every time.
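The pagination loop described above could be sketched as follows. This is a minimal illustration, not Spider's actual code: the `@timestamp` field, the one-minute interval, the page size, and the `search`, `onPage` and `isCancelled` functions are all assumptions. Only the `composite` aggregation with its `after` / `after_key` cursor comes from Elasticsearch itself.

```javascript
// Build one page of a composite aggregation request.
// Field name, interval and page size are assumptions for illustration.
function buildPageRequest(afterKey, pageSize = 100) {
  const composite = {
    size: pageSize,
    sources: [
      { time: { date_histogram: { field: "@timestamp", fixed_interval: "1m" } } },
    ],
  };
  if (afterKey) composite.after = afterKey; // resume after the previous page
  return { size: 0, aggs: { timeline: { composite } } };
}

// Load the timeline page by page.
// `search` is a hypothetical function that sends the body to Elasticsearch
// and resolves to the parsed response.
// `onPage` is called after each page so the UI can redraw progressively.
// `isCancelled` lets the caller stop between two pages.
async function loadTimeline(search, onPage, isCancelled = () => false) {
  const timeline = new Map(); // bucket start (ms) -> document count
  let afterKey;
  do {
    if (isCancelled()) break;
    const response = await search(buildPageRequest(afterKey));
    const { buckets, after_key } = response.aggregations.timeline;
    for (const bucket of buckets) {
      // Update existing entries in place instead of resetting the timeline.
      timeline.set(bucket.key.time, bucket.doc_count);
    }
    onPage(timeline);
    // An empty page means there is nothing left to fetch.
    afterKey = buckets.length > 0 ? after_key : undefined;
  } while (afterKey);
  return timeline;
}
```

Because each page is a separate request, cancelling is just a matter of not sending the next one, which is what the `isCancelled` check illustrates.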

The setting is activated by default and may be deactivated in the display settings.

Result: Pros & Cons mix

Pros

The progressive loading is really visible
It makes cancelling the expensive timeline loading query quite easy
It limits the load on the server when the user navigates quickly on the timeline

Drawback:

Loading the timeline data gets longer. Pagination implies many requests made to the server in sequence, with a request overhead for each call.

I think it is worth it.

Demo

Manage secured Elasticsearch

April 15, 2022 · One min read

Starting with Elasticsearch 8, security is active by default on the cluster:

User authentication
TLS with mutual auth between Elasticsearch nodes

To be ready for it, I upgraded all microservices using Elasticsearch to support all the authentication methods supported by the ES JavaScript client. Everything is managed by the central setup, which expects the Elasticsearch setup to require authentication.

TLS may also be used to connect to Elasticsearch, with self-signed certificates if needed.
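As an illustration of what such a setup can look like, here is a minimal sketch that turns a configuration object into options for the Elasticsearch JavaScript client (version 8). The shape of `config` (`url`, `apiKey`, `bearer`, `username`, `password`, `caCertificate`) is hypothetical and not Spider's actual setup format; `node`, `auth` and `tls` are the client's own option names.

```javascript
// Build the options object for the Elasticsearch JavaScript client (v8).
// The `config` fields are hypothetical names chosen for this example.
function buildClientOptions(config) {
  const options = { node: config.url };

  // Pick the first authentication method present in the configuration.
  if (config.apiKey) {
    options.auth = { apiKey: config.apiKey };
  } else if (config.bearer) {
    options.auth = { bearer: config.bearer };
  } else if (config.username) {
    options.auth = { username: config.username, password: config.password };
  }

  // Trust a self-signed or private CA certificate (PEM string or Buffer).
  if (config.caCertificate) {
    options.tls = { ca: config.caCertificate };
  }
  return options;
}

// Usage (requires the @elastic/elasticsearch package):
//   const { Client } = require("@elastic/elasticsearch");
//   const client = new Client(buildClientOptions(config));
```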

New parameter to protect against too big packetLots

April 10, 2022 · One min read

To protect servers and UI against communications that would be too big, a new protection exists:

When a TCP packetLot gets over a certain limit, the packets are marked as plTooBig so that parsing skips them.

Packet flag is shown on Packet details
Packet is colored in grey in TCP details
Parsed communications are marked as INCOMPLETE (since they miss packets), and capture status reflects the errors in parsing

This avoids loading too many consecutive packets in memory for parsing. The default value is 10 MB, which should be quite enough.
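The idea can be sketched as below. This is only an illustration under assumptions: the packet objects with a `size` field in bytes, the two helper functions, and the choice to flag every packet of an oversized lot are all hypothetical. Only the `plTooBig` flag name and the 10 MB default come from the text above.

```javascript
// Default limit, assuming "10 MB" means 10 * 1024 * 1024 bytes.
const DEFAULT_LIMIT_BYTES = 10 * 1024 * 1024;

// Flag every packet of a lot when the lot's total size exceeds the limit.
// `packets` is a hypothetical array of { id, size } objects (size in bytes).
function flagOversizedLot(packets, limitBytes = DEFAULT_LIMIT_BYTES) {
  const totalBytes = packets.reduce((sum, packet) => sum + packet.size, 0);
  const tooBig = totalBytes > limitBytes;
  return packets.map((packet) => ({ ...packet, plTooBig: tooBig }));
}

// Keep only the packets that the parser is allowed to load.
function parsablePackets(packets) {
  return packets.filter((packet) => !packet.plTooBig);
}
```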

Associated Whisperer version is 5.1.0.

New plugin hook - client-enrichment-plugin

April 8, 2022 · One min read

A new client-enrichment-plugin hook is available.

This plugin resolves the client identification extracted from JWT or Basic auth against an external system.

The enrichment is applied to:

Grid, and export
Filters
Details
Stats
And map!

Example

This sample plugin decodes Spider's own identifiers in JWT tokens to display the names of Whisperers and Users.

It is available here: https://gitlab.com/spider-plugins/spd-client-resolver

In grid & filters:

In map:

Plugin API

{
   inputs: {identification, mode},
   parameters: {},
   callbacks: {setDecodedClient, onShowInfo, onShowError, onShowWarning, onOpenResource},
   libs: {React}
}

identification: value of the identification to resolve (JWT sub field, or Basic auth login)
mode: 'REACT' or 'TEXT', depending on the expected output
onOpenResource({id, title, contentType, payload}): callback to open a downloaded payload in details panel. XML and JSON are supported.
- id: id of the resource, to manage breadcrumb
- title: displayed at the top of details panel
- contentType: application/json or application/xml are supported
- payload: the resource content (string)
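To make the API above more concrete, here is a minimal sketch of a plugin body. It rests on several assumptions that the description above does not confirm: that the plugin is a function receiving the object shown, that `setDecodedClient` accepts either a string or a React element depending on `mode`, and that doing nothing leaves the identification unresolved. The `DIRECTORY` lookup table is a hypothetical stand-in for the external system.

```javascript
// Hypothetical directory standing in for the external system.
const DIRECTORY = { "user-42": "Alice Martin" };

// Sketch of a client-enrichment plugin (entry point shape is an assumption).
function clientEnrichmentPlugin({ inputs, callbacks, libs }) {
  const { identification, mode } = inputs;
  const name = DIRECTORY[identification];

  // Unknown client: leave the identification as it is (assumed behavior).
  if (name === undefined) return;

  if (mode === "REACT") {
    // Build the element with the React instance provided by the host.
    const element = libs.React.createElement("span", { title: identification }, name);
    callbacks.setDecodedClient(element);
  } else {
    // 'TEXT' mode: plain string output.
    callbacks.setDecodedClient(name);
  }
}
```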

Logo change

March 20, 2022 · One min read

After several people told me that they found the Spider logo too 'evil' because of its strong eyes, I studied how to improve it.

I changed the position and look of the eyes so that the Spider now seems to look below it, watchful for what happens on its web.

Tell me what you think!

Anonymous statistics as a user choice

March 13, 2022 · One min read

Users may now choose on their own to anonymize their usage statistics. The option is available in the Settings panel.

The statistics are anonymized so that no link can be made between the statistics and the user. UserId and email are replaced by a UUID generated on the client side, which is regenerated at each user login and whenever the anonymous stats flag is changed.
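A minimal sketch of that behavior, under assumptions: the `createStatsIdentity` helper, the `userId` and `email` field names on events, and the injectable `generateId` function are all hypothetical. In a browser, `crypto.randomUUID()` is one standard way to generate the UUID on the client side.

```javascript
// Keep track of the identity attached to usage statistics.
// `generateId` is injectable; by default it uses the Web Crypto API.
function createStatsIdentity(user, generateId = () => globalThis.crypto.randomUUID()) {
  let anonymous = false;
  let anonymousId = generateId();

  return {
    // A new login always gets a fresh anonymous identifier.
    onLogin() {
      anonymousId = generateId();
    },
    // Changing the flag also regenerates the identifier.
    setAnonymous(flag) {
      if (flag !== anonymous) {
        anonymous = flag;
        anonymousId = generateId();
      }
    },
    // Attach either the real identity or the anonymous UUID to an event.
    tag(event) {
      return anonymous
        ? { ...event, userId: anonymousId, email: anonymousId }
        : { ...event, userId: user.id, email: user.email };
    },
  };
}
```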

Spider is getting more and more ready.

Another scaling limiting feature removed :)

March 13, 2022 · One min read

When too many TCP sessions or HTTP communications are parsed in the same minute, their number could exceed what Node.js or Redis can handle in a single call.

I couldn't see it before because I had to scale the parsing services with many more instances than now. The parsing services are now more efficient: each replica can handle much more load, but that is also how they reached this scaling limit!

After much study and not finding a way to simplify the data sent, I decided to... chunk the calls in pieces ;) Simple solution =)
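Chunking of this kind can be sketched as follows. The `chunk` and `sendInChunks` helpers, the `send` function and the default chunk size are hypothetical and chosen for illustration; they are not Spider's actual code or values.

```javascript
// Split an array into consecutive pieces of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks = [];
  for (let start = 0; start < items.length; start += size) {
    chunks.push(items.slice(start, start + size));
  }
  return chunks;
}

// Send the items in several smaller calls instead of a single huge one.
// `send` is a hypothetical async function performing one call.
async function sendInChunks(items, send, size = 1000) {
  for (const part of chunk(items, size)) {
    await send(part);
  }
}
```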

So now, big loads do not generate errors and are absorbed quite smoothly.

The latest statistics show that Spider processes 400 MB/min with only 8 CPU cores fully used :) Nice!

Consent validation

March 13, 2022 · One min read

I've just added Consent validation of Privacy terms.

This complies with the GDPR requirement to inform users about the private data collected and how it is processed.

Consent is mandatory to use Spider
User consent is saved on the server and requested again when the terms change

The date of consent and the terms can be consulted later on the new Help page (see next post).
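The rule "ask again when the terms change" can be sketched with a version check. The record shape (`termsVersion`, `consentedAt`) and both helper functions are hypothetical; how Spider actually identifies a version of the terms is not described here.

```javascript
// Decide whether the user must (re)validate the privacy terms.
// `savedConsent` is the record stored on the server, or null if none exists.
function needsConsent(savedConsent, currentTermsVersion) {
  return !savedConsent || savedConsent.termsVersion !== currentTermsVersion;
}

// Build the record to save when the user accepts the current terms.
function recordConsent(currentTermsVersion, now = new Date()) {
  return { termsVersion: currentTermsVersion, consentedAt: now.toISOString() };
}
```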

Grid link UX improvement

March 13, 2022 · One min read

While building training material, I found that the automatic filter applied when clicking the link icon in the grid was not using smartfilters.

I changed that quickly :) Now, from a /controlRights item in the grid to the fan-out display in the sequence diagram, you're only one click away!

New Help details

March 13, 2022 · One min read

Instead of only redirecting to https://spider-analyzer.io, the Help page now provides more information:

The classic About terms.
The Changelog, which moved here from its own independent details panel.
The list of Free and Open Source tools and libraries used with their licences.
- It takes a bit of time to... render ;)

The content is driven by a public JSON-LD manifest file, visible in the Manifest tab.
