Streaming real-time data at scale and processing it efficiently is critical to cybersecurity organizations like SecurityScorecard. Jared Smith, Senior Director of Threat Intelligence, and Brandon Brown, Senior Staff Software Engineer, Data Platform at SecurityScorecard, discuss their journey from RabbitMQ to open-source Apache Kafka® for stream processing, and why turning to fully managed Kafka on Confluent Cloud was the right choice for building real-time data pipelines at scale.
SecurityScorecard mines data from dozens of digital sources to discover security risks and flaws with the potential to expose their clients’ data. This includes scanning and ingesting data from a large number of ports to identify suspicious IP addresses, exposed servers, out-of-date endpoints, malware-infected devices, and other potential cyber threats for more than 12 million companies worldwide.
To enable real-time stream processing across the organization, the team moved from RabbitMQ to open-source Kafka, processing massive amounts of data in milliseconds instead of weeks or months. This lets them detect risks to a website’s security posture quickly as security threats constantly evolve. The team had relied on batch pipelines to push data to and from Amazon S3, as well as expensive REST API-based communication to carry data between systems. They also spent significant time and resources on open-source Kafka upgrades on Amazon MSK.
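As a rough illustration of that shift, here is a minimal Kafka producer sketch that publishes a scan finding as an event the moment it is discovered, rather than batching results to S3. The topic name, record fields, and connection settings are assumptions for illustration, not details from the episode.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ScanFindingProducer {
    public static void main(String[] args) {
        // Placeholder connection settings; a real deployment would point at the
        // Confluent Cloud bootstrap servers and add SASL/SSL credentials.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Hypothetical finding: an exposed port discovered during a scan,
            // keyed by IP so related findings land in the same partition.
            String ip = "203.0.113.10";
            String finding = "{\"ip\":\"" + ip + "\",\"port\":3389,\"risk\":\"exposed-rdp\"}";
            producer.send(new ProducerRecord<>("scan-findings", ip, finding));
        } // close() flushes any buffered records before exiting
    }
}
```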
Self-maintaining the Kafka infrastructure increased operational overhead and drove up costs. In order to scale faster, govern data better, and ultimately lower the total cost of ownership (TCO), Brandon, who leads the organization’s Pipeline team, pivoted toward a fully managed, cloud-native approach for more scalable streaming data pipelines and for the development of a new Automatic Vendor Detection (AVD) product.
Jared and Brandon continue to leverage Confluent Cloud for use cases including using PostgreSQL and pushing data to downstream systems with CDC connectors, strengthening data governance and security as their streaming usage scales, and more.
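To give a flavor of the downstream side, the sketch below consumes the change events that a PostgreSQL CDC connector might write to a Kafka topic and hands them off for forwarding. The topic name, consumer group, and event format are assumptions, not specifics from SecurityScorecard’s pipeline.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ChangeEventForwarder {
    public static void main(String[] args) {
        // Placeholder connection settings, as in the producer sketch above.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "downstream-forwarder");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Assumed CDC-style topic name following the server.schema.table convention.
            consumer.subscribe(List.of("postgres.public.vendors"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // In practice a sink connector or stream processor would push
                    // these change events on to the downstream system.
                    System.out.printf("change event key=%s value=%s%n", record.key(), record.value());
                }
            }
        }
    }
}
```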
EPISODE LINKS
Ask Confluent #18: The Toughest Questions ft. Anna McDonald
Joining Forces with Spring Boot, Apache Kafka, and Kotlin ft. Josh Long
Building an Apache Kafka Center of Excellence Within Your Organization ft. Neil Buesing
Creating Your Own Kafka Improvement Proposal (KIP) as a Confluent Intern ft. Leah Thomas
Confluent Platform 6.0 | What's New in This Release + Updates
Using Event Modeling to Architect Event-Driven Information Systems ft. Bobby Calderwood
Using Apache Kafka as the Event-Driven System for 1,500 Microservices at Wix ft. Natan Silnitsky
Top 6 Things to Know About Apache Kafka ft. Gwen Shapira
5 Years of Event Streaming and Counting ft. Gwen Shapira, Ben Stopford, and Michael Noll
Championing Serverless Eventing at Google Cloud ft. Jay Smith
Disaster Recovery with Multi-Region Clusters in Confluent Platform ft. Anna McDonald and Mitch Henderson
Developer Advocacy (and Kafka Summit) in the Pandemic Era
Apache Kafka 2.6 - Overview of Latest Features, Updates, and KIPs
Testing ksqlDB Applications ft. Viktor Gamov
How to Measure the Business Value of Confluent Cloud ft. Lyndon Hedderly
Modernizing Inventory Management Technology ft. Sina Sojoodi and Rohit Kelapure
Fault Tolerance and High Availability in Kafka Streams and ksqlDB ft. Matthias J. Sax
Benchmarking Apache Kafka Latency at the 99th Percentile ft. Anna Povzner
Open Source Workflow Automation with Apache Kafka ft. Bernd Ruecker
Growing the Event Streaming Community During COVID-19 ft. Ale Murray