Overview

Z Collective is a group of academics, industry professionals, federal employees, and volunteers, all working together to solve some of cybersecurity’s most pressing challenges. We bring a diverse range of experiences, methodologies, disciplines, and approaches to our research and disseminate it in ways we deem most appropriate and helpful for our communities. Whether through publication, free and closed source models, free and open source models, conference talks and presentations, academic publishing, classroom instruction, or productization, we are passionate about getting the right solutions into the right hands in the right way.
At the heart of our research is the desire to build right-sized AI and ML models, services, and solutions to solve cybersecurity challenges. Our research interests include:
- Cloud native security
- Network security
- Supply chain security
- Fraud and abuse detection
- Disinformation and propaganda detection
- Hacker forum analytics
If you’re interested in learning more, collaborating with us, or using our research in your organization, please don’t hesitate to reach out.
What is User and Entity Behavior Analytics?
User and Entity Behavior Analytics (UEBA) studies how users interact with devices, services, and authentication mechanisms within a system or organization. Its primary goal is to establish users’ baseline behaviors so that unusual activity stands out, such as suspicious login locations, unexpected resource access, or unusual patterns across user groups.
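As a rough illustration of the baseline idea, the sketch below records the countries each user has logged in from and flags a login from a country not seen before. The event format and the function names `build_baseline` and `is_unusual_login` are assumptions made for this example, not part of any particular UEBA product.

```python
from collections import defaultdict

def build_baseline(events):
    """Map each user to the set of countries seen in historical logins."""
    baseline = defaultdict(set)
    for user, country in events:
        baseline[user].add(country)
    return baseline

def is_unusual_login(baseline, user, country):
    """Flag a login whose country never appears in the user's baseline."""
    return country not in baseline.get(user, set())

# Hypothetical login history as (user, country) pairs.
history = [("alice", "US"), ("alice", "US"), ("alice", "CA"), ("bob", "DE")]
baseline = build_baseline(history)

print(is_unusual_login(baseline, "alice", "US"))  # False
print(is_unusual_login(baseline, "alice", "RU"))  # True
```

Note that a user with no history at all is also flagged under this rule. A real system would likely weigh many more signals, such as time of day, device, and peer-group behavior.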
How is UEBA used in cybersecurity?
In cybersecurity, UEBA helps detect various threats: malicious insiders, compromised accounts, excessive access permissions, and coordinated malicious activities.
What techniques and methodologies can be used for UEBA?
User Behavior Analytics (UBA), a related field commonly used in marketing, helps understand system activity patterns. UBA employs techniques such as the Apriori algorithm, Bayesian modeling, and FP-trees for behavioral cohorting: analyzing how certain actions predict future behaviors.
For example, UBA might examine whether users who engage with freemium products n times in one day are more likely to become paying customers. This analysis falls under behavioral cohorting methodology.
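The freemium question can be framed as an association rule, "engaged at least n times in one day, therefore paid", and scored with the support, confidence, and lift measures used in Apriori-style mining. The sketch below computes those three numbers directly; the field names `max_daily_sessions` and `paid` and the sample data are hypothetical.

```python
def rule_metrics(users, n):
    """Support, confidence, and lift for: engaged >= n times in a day -> paid."""
    total = len(users)
    antecedent = [u for u in users if u["max_daily_sessions"] >= n]
    both = [u for u in antecedent if u["paid"]]
    paid_rate = sum(u["paid"] for u in users) / total
    support = len(both) / total
    confidence = len(both) / len(antecedent) if antecedent else 0.0
    lift = confidence / paid_rate if paid_rate else 0.0
    return support, confidence, lift

# Hypothetical users: most sessions in any single day, and whether they paid.
users = [
    {"max_daily_sessions": 5, "paid": True},
    {"max_daily_sessions": 4, "paid": True},
    {"max_daily_sessions": 3, "paid": False},
    {"max_daily_sessions": 1, "paid": False},
    {"max_daily_sessions": 1, "paid": True},
    {"max_daily_sessions": 0, "paid": False},
    {"max_daily_sessions": 2, "paid": False},
    {"max_daily_sessions": 6, "paid": True},
]

support, confidence, lift = rule_metrics(users, n=3)
print(support, confidence, lift)  # 0.375 0.75 1.5
```

A lift above 1 means the engaged cohort converts more often than users overall. In this toy data the cohort converts at 1.5 times the overall rate.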
UEBA builds on these techniques while introducing newer approaches: deep learning, Bayesian deep learning, graph-based methods, and probabilistic methods. Recent advances in deep learning have sparked interest in mapping user activity across devices to detect anomalies and make predictions. Key goals of deep learning approaches include:
- Creating high-quality embeddings of the user activity
- Clustering user activity
- Finding outliers
- Determining patterns in baseline user activity
- Enabling explainability
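To make the outlier-finding goal concrete, here is a minimal sketch that assumes activity embeddings have already been produced by some model. It flags any vector whose distance from the centroid is several times the median distance. The two-dimensional vectors, the function name `centroid_outliers`, and the `ratio` parameter are illustrative assumptions; real embeddings have many more dimensions, and production systems typically use more robust methods.

```python
import math
from statistics import mean, median

def centroid_outliers(embeddings, ratio=3.0):
    """Return indices whose distance to the centroid exceeds ratio * median distance."""
    dims = range(len(embeddings[0]))
    centroid = tuple(mean([vec[d] for vec in embeddings]) for d in dims)
    distances = [math.dist(vec, centroid) for vec in embeddings]
    cutoff = ratio * median(distances)
    return [i for i, dist in enumerate(distances) if dist > cutoff]

# Toy 2-D stand-ins for learned user-activity embeddings.
embeddings = [
    (0.10, 0.20), (0.20, 0.10), (0.15, 0.15),
    (0.10, 0.10), (0.20, 0.20), (3.00, 3.00),
]
print(centroid_outliers(embeddings))  # [5]
```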
How is our research helping the field?
Z Collective brings UEBA expertise to organizations focused on preventing coordinated abuse and fraud in everyday technology. If you’d like to learn more, implement our research, or collaborate with us to advance UEBA, please reach out!
Z Collective engages in emerging and state-of-the-art natural language processing (NLP) research, including developing novel large language models (LLMs) and small language models (SLMs), and leveraging foundation models in safe and appropriate ways. We apply our NLP research to several domains, including disinformation and propaganda detection, hacker forum analytics, and log analysis. We collaborate with threat intelligence partners, academic institutions, and industry to bring right-sized NLP solutions to bear on key threats.
Our NLP capabilities have been built over half a decade of work, ranging from novel embeddings and text representation techniques to custom models that capture both lexical and phrasal semantics.
If your organization has a challenging cybersecurity problem that requires processing and analyzing large amounts of data and deriving actionable insights from it, get in touch with us!
What are time series?
Time series are ways of modeling events or actions over time. Many of us use time series every day: investigating CPU usage on our computers when they’re getting slow, reviewing stock market trends, and looking at weather forecasts over the course of several months.
How are time series used in cybersecurity?
Time series are used in cybersecurity to track changes in system behavior over time. They can model resource utilization (for example, the load on a hypervisor over the course of a day), the number of users registering on a platform over the course of a month, or the number of failed authentication requests made to an API or service. Modeling data as time series is valuable because it allows us to detect anomalies that may indicate an attack, such as a sudden increase in network traffic or a change in user behavior.
What are anomalies in time series?
Although time series are extremely useful for understanding baseline behavior of users, systems, and services, they are perhaps most useful for detecting anomalies. Anomalies in time series are data points that deviate significantly from the expected pattern, and can be indicative of attacks, unwanted behavior, service degradation, and more.
Several challenges exist when using time series to detect anomalous events. One is determining whether an anomaly is a single point or a sequence of points. This can be difficult, as both types of anomalies can be caused by various factors, and a methodology that detects point anomalies may not be suited to detecting sequence anomalies, and vice versa.
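The distinction can be illustrated with a small sketch. It assumes some detector has already produced one flag per time step, then groups consecutive flags into runs: a run of length one is treated as a point anomaly and a longer run as a sequence anomaly. The function name `classify_runs` and the length-one cutoff are assumptions for this example.

```python
from itertools import groupby

def classify_runs(flags):
    """Group consecutive flagged steps; length-1 runs are 'point', longer ones 'sequence'."""
    runs = []
    index = 0
    for flagged, group in groupby(flags):
        length = len(list(group))
        if flagged:
            kind = "point" if length == 1 else "sequence"
            runs.append((kind, index, index + length - 1))
        index += length
    return runs

# Hypothetical per-step detector output.
flags = [False, True, False, False, True, True, True, False]
print(classify_runs(flags))
# [('point', 1, 1), ('sequence', 4, 6)]
```

This only labels anomalies after the fact. It does not address the harder problem that a detector tuned for isolated spikes may never raise the flags that make up a slow, sustained deviation.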
How do you know which method to use?
Another challenge is determining which anomaly detection method to use. There are many different methods available, and each has its own strengths and weaknesses.
Dynamic thresholding
Choosing a threshold is a further challenge, because what constitutes an anomaly can change over time as system behavior or the environment changes. A dynamic threshold therefore has to be re-estimated from recent data rather than fixed once.
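One simple way to let a threshold move with the data is to recompute it from a sliding window. The sketch below flags a point when it exceeds the mean plus k standard deviations of the preceding window. The window size, the value of k, and the failed-login counts are all assumptions chosen for illustration.

```python
from statistics import mean, stdev

def dynamic_threshold_anomalies(series, window=5, k=3.0):
    """Flag points exceeding mean + k * stdev of the preceding `window` points."""
    anomalies = []
    for i in range(window, len(series)):
        recent = series[i - window:i]
        threshold = mean(recent) + k * stdev(recent)
        if series[i] > threshold:
            anomalies.append(i)
    return anomalies

# Hypothetical failed-login counts per hour; the level shifts at index 7.
failed_logins = [4, 5, 6, 5, 4, 5, 6, 50, 52, 51, 49, 50, 51]
print(dynamic_threshold_anomalies(failed_logins))  # [7]
```

Only the first jump is flagged. Once the window fills with values around 50, the threshold has adapted and the new level is treated as normal. Whether that adaptation is desirable depends on the situation: it suppresses repeated alerts after a legitimate change, but it can also hide a sustained attack.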
How does one detect anomalies in time series?
Anomaly detection in time series data is a non-trivial task, and decades of research have been conducted in pursuit of better techniques. It can be done using various methods, including statistical models, machine learning algorithms, and deep learning models.
Statistical models
Statistical models can be used to identify anomalies by comparing observed data with the model’s predictions. For example, an ARIMA model can be used to forecast future values and identify deviations from the forecast.
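The forecast-and-compare idea can be sketched without a full ARIMA implementation. The example below swaps in a much simpler first-order autoregressive model, x[t] = c + phi * x[t-1], fitted by least squares on a clean training window. It then flags new points whose one-step forecast error exceeds k times the standard deviation of the training residuals. This is a stand-in for ARIMA, not ARIMA itself, and the function names, data, and k are assumptions.

```python
from statistics import mean, stdev

def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]."""
    prev, nxt = series[:-1], series[1:]
    mean_prev, mean_next = mean(prev), mean(nxt)
    sxy = sum((p - mean_prev) * (n - mean_next) for p, n in zip(prev, nxt))
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    phi = sxy / sxx
    c = mean_next - phi * mean_prev
    return c, phi

def forecast_anomalies(train, new, k=3.0):
    """Indices in `new` whose one-step forecast error exceeds k * residual stdev."""
    c, phi = fit_ar1(train)
    residuals = [n - (c + phi * p) for p, n in zip(train[:-1], train[1:])]
    sigma = stdev(residuals)
    flagged = []
    prev = train[-1]
    for i, actual in enumerate(new):
        if abs(actual - (c + phi * prev)) > k * sigma:
            flagged.append(i)
        prev = actual  # the next forecast is conditioned on the observed value
    return flagged

train = [10, 11, 10, 12, 11, 10, 11, 12, 10, 11]  # hypothetical clean history
new = [10, 11, 25, 11, 10]                        # hypothetical new observations
print(forecast_anomalies(train, new))  # [2, 3]
```

The spike at index 2 is flagged, and so is the normal-looking value at index 3, because its forecast was conditioned on the anomalous observation. This echo effect is one reason forecast-based detectors often replace flagged values with the forecast before predicting the next step.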
Deep learning models
Deep learning models, such as SR-CNN, can be used to learn patterns in time series data and identify anomalies that deviate from these patterns.
GPT models
GPT models can also be used for anomaly detection by analyzing the context of the data and identifying anomalies that are inconsistent with the expected context.
Z Collective conducts ongoing time series anomaly detection research, cataloging and testing emerging techniques and benchmarking established ones. We’ve established deep methodological expertise in both deep learning and statistical methods. If your organization is looking for guidance and expertise in time series anomaly detection methodology, don’t hesitate to get in touch.
Z Collective collaborates with industry, threat intel, and government to advance critically needed network security initiatives by building novel AI/ML architectures and models for pressing challenges. Specifically, we have worked to create models that detect domains used for C2 communications by malware, achieving higher accuracy and precision than state-of-the-art approaches. We have also conducted research on detecting data exfiltration over DNS and on eBPF-based network anomaly detection, and we are advancing service mesh-based detection of threats in cloud native environments.
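For readers unfamiliar with this problem space, one widely used input feature for spotting algorithmically generated or data-carrying domain names is the character entropy of a DNS label. The sketch below computes it for the longest label of a domain. It is a generic illustration, not a description of Z Collective’s models, and the function names and example domains are assumptions.

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Shannon entropy of a string, in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def longest_label_entropy(domain):
    """Entropy of the longest dot-separated label in a domain name."""
    labels = domain.lower().rstrip(".").split(".")
    return shannon_entropy(max(labels, key=len))

# Hypothetical lookups: an ordinary name and a random-looking subdomain.
ordinary = longest_label_entropy("www.example.com")
random_looking = longest_label_entropy("x7f3k9q2vb8zt1mw.example.com")
print(round(ordinary, 2), round(random_looking, 2))
```

Entropy alone produces many false positives, since content delivery networks and cloud services also use random-looking names. In practice it would be one feature among many feeding a trained model.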
If your organization is interested in implementing threat detection models that can analyze vast amounts of network traffic, we would be happy to speak with you!
At the heart of our research interests is applying AI/ML to cloud native security. These efforts include detecting anomalies in production Kubernetes clusters and analyzing service mesh data to create dynamic rules for analysts, reducing toil and the reliance on static, heuristics-based rules for network security.
We are always interested in partnering with industry to advance this critically needed work. If your organization runs and maintains production Kubernetes clusters and is interested in pioneering AI/ML solutions for cloud native security, please drop us a line.