The answer and challenges
For AI to work for us and alert us upfront, it needs good, high-quality, dependable data over time, and that data can be retrieved from our traditional logs whenever an event is triggered. Ping and SNMP only provide data at polling intervals of two or three minutes, which gives a blurred picture of reality; they won't tell us the current state or project future states from trends.
So the research began: what level of logs should we be collecting? Informational level. We were collecting logs from around 2,500 global devices, so we needed to scale server capacity accordingly, which isn't a problem in a large organization.
We were now collecting every informational-level log from our SD-WAN routers: SLA violations, CPU spikes on hardware, bandwidth threshold increases, configuration changes logged every second, and even NetFlow…because let's just agree that brownouts usually hide between "user" and "app," not within a single device.
The SD-WAN routers have SLA monitors configured for DNS, HTTPS, and SaaS applications, which acted as our synthetic emulators and created a log whenever an SLA was breached for a layer 7 service or any website felt "slow," letting us monitor layer 7 protocols from a router.
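To make the collection side concrete, here's a minimal sketch of a UDP syslog listener that tags the event types described above. It's a sketch under stated assumptions: the port number and the regex patterns are illustrative guesses, not the exact strings any particular SD-WAN vendor emits.

```python
import re
import socket

# Illustrative patterns for the event types we care about; real vendor
# message formats will differ, so treat these as placeholders.
EVENT_PATTERNS = {
    "sla_violation": re.compile(r"IP SLA|SLA.*(violation|breach)", re.I),
    "cpu_spike": re.compile(r"CPU.*(high|threshold|util)", re.I),
    "config_change": re.compile(r"CONFIG|configuration change", re.I),
    "bandwidth_threshold": re.compile(r"bandwidth.*(threshold|exceed)", re.I),
}

def classify(message: str) -> str:
    """Return a coarse event label for a raw syslog message."""
    for label, pattern in EVENT_PATTERNS.items():
        if pattern.search(message):
            return label
    return "other"

def run_collector(host: str = "0.0.0.0", port: int = 5514) -> None:
    """Listen for syslog over UDP and print (source, label, message) lines."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    while True:
        data, addr = sock.recvfrom(8192)
        message = data.decode("utf-8", errors="replace")
        print(addr[0], classify(message), message, sep=" | ")

if __name__ == "__main__":
    run_collector()
```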
From our RADIUS/TACACS servers, we were receiving logs on security violations on layer 2 ports and, occasionally, MAC flooding. Not just that: on our wireless infrastructure we even collected granular data like signal strength, SSID, channel bandwidth, and the number of clients per access point, all thanks to a vendor API that made quick work of it. Similarly, for our switches, we were collecting everything from layer 2 VLAN changes to OSPF convergence, from RADIUS server health to interface statistics.
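The wireless piece was API-driven rather than syslog-driven. The article doesn't name the vendor, so the sketch below is purely hypothetical: the controller URL, endpoint path, auth header, and field names are placeholders showing the shape of a per-access-point poll, not any real product's API.

```python
import requests

# Hypothetical wireless-controller REST endpoint; swap in the real vendor's
# base URL, path, token, and field names.
CONTROLLER = "https://wlc.example.internal"
HEADERS = {"Authorization": "Bearer <api-token>"}

def fetch_ap_metrics():
    """Return per-AP records: SSID, channel width, signal strength, client count."""
    resp = requests.get(
        f"{CONTROLLER}/api/v1/access-points", headers=HEADERS, timeout=10
    )
    resp.raise_for_status()
    records = []
    for ap in resp.json().get("access_points", []):
        records.append({
            "ap_name": ap.get("name"),
            "ssid": ap.get("ssid"),
            "channel_width_mhz": ap.get("channel_width"),
            "signal_strength_dbm": ap.get("rssi"),
            "client_count": ap.get("client_count"),
        })
    return records

if __name__ == "__main__":
    for record in fetch_ap_metrics():
        print(record)
```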
After all the heavy lifting, we managed to get all this data into a data lake, but it turned out to be more of a swamp: the data carried ten different timestamp formats and was not labeled correctly. And AI without labels is wishful thinking.
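Draining the swamp comes down to two chores: normalizing every record onto one UTC timestamp and attaching an explicit label before anything lands in the lake. Here's a minimal sketch of that step, assuming just three of the timestamp formats and a toy labeling rule; the real pipeline would cover all ten formats and a proper label taxonomy.

```python
from datetime import datetime, timezone

# Three example input formats; the article mentions ten, so extend this list.
TIMESTAMP_FORMATS = [
    "%Y-%m-%dT%H:%M:%S%z",   # e.g. 2023-05-01T14:03:22+0000
    "%b %d %H:%M:%S",        # classic syslog, e.g. May  1 14:03:22 (no year, no zone)
    "%Y/%m/%d %H:%M:%S",     # e.g. 2023/05/01 14:03:22
]

def normalize_timestamp(raw: str) -> str:
    """Parse a raw timestamp in any known format and return UTC ISO 8601."""
    for fmt in TIMESTAMP_FORMATS:
        try:
            parsed = datetime.strptime(raw, fmt)
        except ValueError:
            continue
        if parsed.year == 1900:
            # Classic syslog stamps carry no year; assume the current year.
            parsed = parsed.replace(year=datetime.now(timezone.utc).year)
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=timezone.utc)  # assume UTC when zone is absent
        return parsed.astimezone(timezone.utc).isoformat()
    raise ValueError(f"Unrecognized timestamp format: {raw!r}")

def label_record(record: dict) -> dict:
    """Attach a normalized timestamp and a training label to a raw log record."""
    return {
        **record,
        "timestamp_utc": normalize_timestamp(record["timestamp"]),
        # Toy rule for illustration; real labels need a proper taxonomy.
        "label": "brownout_symptom" if "SLA" in record.get("message", "") else "normal",
    }

if __name__ == "__main__":
    sample = {"timestamp": "2023/05/01 14:03:22", "message": "IP SLA 10 breach on HTTPS probe"}
    print(label_record(sample))
```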
