Day 4: PegasosQSVC & detection of vulnerable IoT security cameras
In our initial article, we delved into Quantum Support Vector Machines (QSVM) before meticulously dissecting their intricate anatomy in the subsequent piece. The third article took us on a journey through hyperparameter tuning, leveraging the powerful capabilities of model tracking using tools like Weights and Biases. A fundamental insight gleaned was that the Quantum Kernel is indeed the heart of QSVM. It necessitates the integration of quantum feature maps such as the PauliFeatureMap, ZFeatureMap, and ZZFeatureMap.
As we dug deeper, we ran into two problems caused by the iterative nature of optimizers. First, it lengthens the model development process. Second, it makes it harder to run on Noisy Intermediate-Scale Quantum (NISQ) hardware, our best option right now, through cloud-based systems, because long-running iterative jobs are prone to stability issues.
This makes us wonder: Is there a better optimizer that needs fewer repeats? This brings us to Pegasos, short for Primal Estimated sub-GrAdient SOlver for SVM. That’s what we’re going to talk about today.
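To make the idea concrete, here is a minimal sketch of the kernelized Pegasos update in plain Python. It is an illustration written for this article, not Qiskit's implementation: the function names, the toy dataset, and the linear kernel (standing in for a quantum kernel) are all assumptions chosen for readability. At each step a single training sample is drawn at random, and its coefficient is incremented only when that sample violates the margin.

```python
import random


def linear_kernel(a, b):
    """Plain dot product; stands in for a quantum kernel in this sketch."""
    return sum(ai * bi for ai, bi in zip(a, b))


def pegasos_fit(X, y, kernel, lam, num_steps, seed=0):
    """Kernelized Pegasos: returns one non-negative count per training sample.

    X: list of feature vectors, y: labels in {-1, +1},
    lam: regularization strength, num_steps: number of stochastic updates.
    """
    rng = random.Random(seed)
    alphas = [0] * len(X)
    for t in range(1, num_steps + 1):
        i = rng.randrange(len(X))  # pick one training sample at random
        # Weighted kernel sum over the samples that have been updated so far
        weighted = sum(
            alphas[j] * y[j] * kernel(X[i], X[j])
            for j in range(len(X))
            if alphas[j] > 0
        )
        # Sub-gradient step: bump alpha_i only if sample i violates the margin
        if y[i] * weighted / (lam * t) < 1:
            alphas[i] += 1
    return alphas


def pegasos_predict(x, X, y, alphas, kernel):
    """Sign of the weighted kernel sum; the 1/(lam*T) scale does not change it."""
    score = sum(a * yj * kernel(x, xj) for a, yj, xj in zip(alphas, y, X))
    return 1 if score >= 0 else -1


# Toy, hand-made data (hypothetical): two well-separated clusters
X_toy = [[2, 2], [2, 3], [3, 2], [-2, -2], [-2, -3], [-3, -2]]
y_toy = [1, 1, 1, -1, -1, -1]

alphas = pegasos_fit(X_toy, y_toy, linear_kernel, lam=0.1, num_steps=50)
predictions = [pegasos_predict(x, X_toy, y_toy, alphas, linear_kernel) for x in X_toy]
print(predictions)
```

The number of steps is fixed up front, so the training cost is set by `num_steps` rather than by an optimizer's convergence loop. In this sketch `lam` is the regularization strength; as far as we understand the library documentation, the `C` hyperparameter of PegasosQSVC is inversely related to it.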
PegasosQSVC is now available in Qiskit Machine Learning, and that’s a big step forward. In this article, we set up a scenario like the ones cyberdefense experts routinely handle and show how the algorithm can help. Think about this: a vulnerability in camera systems can be exploited. It’s not a hypothetical problem; it’s a real issue in today’s fast-changing world of cybersecurity.
We’re looking for an ML algorithm that can explore a large IoT network (like a city, state, or country) and find security cameras based on their unique traffic features. After finding them, the next important step is to add them to your list, get them ready for crucial updates, and fix any issues. Finding every security camera is a big job, so we suggest making a binary classifier algorithm. This algorithm can look at network traffic and easily tell if the traffic is coming from a camera. Then, it can start other defense actions.
We will use our collected data to build a benchmark that shows how well the PegasosQSVC algorithm works for this task. This is a valuable tool to have because CCTV networks that span cities and countries are always at risk of being attacked or spied on, and detecting vulnerable devices quickly and correctly could make a big difference.
We got our example data from the Machine Learning for Cybersecurity Cookbook, Chapter 05. If you want to refer to it or use it again, you can find the train and test datasets here.
Step 1 - Classic benchmark and dimension reduction
Let's run this cell to make a benchmark with classic XGBoost before feature reduction:
!wget https://dgadata.blob.core.windows.net/dga/iot_devices_test.csv
!wget https://dgadata.blob.core.windows.net/dga/iot_devices_train.csv
import pandas as pd
from sklearn import preprocessing
from xgboost import XGBClassifier
from sklearn.metrics import confusion_matrix
training_data = pd.read_csv("iot_devices_train.csv")
testing_data = pd.read_csv("iot_devices_test.csv")
X_train, y_train = (
    training_data.loc[:, training_data.columns != "is_hackable"].values,
    training_data["is_hackable"],
)
X_test, y_test = (
    testing_data.loc[:, testing_data.columns != "is_hackable"].values,
    testing_data["is_hackable"],
)
testing_data["is_hackable"].unique()
# Encode the class labels as integers
le = preprocessing.LabelEncoder()
le.fit(training_data["is_hackable"].unique())
y_train_encoded = le.transform(y_train)
y_test_encoded = le.transform(y_test)
# Train the baseline XGBoost classifier on all features
model = XGBClassifier()
model.fit(X_train, y_train_encoded)
accuracy = model.score(X_test, y_test_encoded)
print(f"accuracy of XGBoost={accuracy}")
# Predict the test data
y_pred = model.predict(X_test)
# Generate confusion matrix
cm = confusion_matrix(y_test_encoded, y_pred)
cm
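Accuracy alone can hide how a binary classifier fails, which is why the confusion matrix is printed as well. As a small illustration, the sketch below derives precision and recall from a 2x2 matrix laid out the way scikit-learn's `confusion_matrix` returns it (rows are true labels, columns are predicted labels). The helper name is ours, and the counts are invented purely for the example; they are not results from this dataset.

```python
def precision_recall(cm):
    """Precision and recall for the positive class (label 1).

    cm is a 2x2 nested list in scikit-learn layout:
    [[true_neg, false_pos],
     [false_neg, true_pos]]
    """
    (tn, fp), (fn, tp) = cm
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts, NOT taken from the IoT dataset
example_cm = [[90, 10],
              [5, 45]]
precision, recall = precision_recall(example_cm)
print(f"precision={precision:.3f}, recall={recall:.3f}")
```

For a detection task like ours, recall on whichever label marks the devices of interest is worth watching closely, since a missed device never reaches the patching list.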
To use PegasosQSVC effectively, we need to cut the number of features from 297 down to a much smaller number, such as 40 or fewer (the code below keeps the top 10). This is because current quantum simulators and devices limit how many qubits we can use. This isn’t a big problem for cybersecurity analytics, though, because reducing the features can even make a tool like XGBoost work better:
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import VarianceThreshold, mutual_info_classif, SelectKBest
from sklearn.preprocessing import StandardScaler, LabelEncoder
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
# Load the train and test data
train_data = np.genfromtxt('iot_devices_train.csv', delimiter=',', dtype=None, encoding=None, skip_header=1)
test_data = np.genfromtxt('iot_devices_test.csv', delimiter=',', dtype=None, encoding=None, skip_header=1)
# Define the feature and target variables
X_train = np.array([list(row)[:-1] for row in train_data], dtype=float)
y_train = np.array([row[-1] for row in train_data])
X_test = np.array([list(row)[:-1] for row in test_data], dtype=float)
y_test = np.array([row[-1] for row in test_data])
# Encode the target variables
le = LabelEncoder()
le.fit(np.unique(y_train))
y_train = le.transform(y_train)
y_test = le.transform(y_test)
# Preprocessing data: Normalize data
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Remove constant features
selector = VarianceThreshold()
X_train = selector.fit_transform(X_train)
X_test = selector.transform(X_test)
# Fit a random forest to inspect feature importances
# (for reference only; the selection below relies on mutual information)
clf = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)
clf.fit(X_train, y_train)
importances = clf.feature_importances_
# Select the top 10 features using mutual info
selector = SelectKBest(mutual_info_classif, k=10)
X_train = selector.fit_transform(X_train, y_train)
X_test = selector.transform(X_test)
# Train the XGBoost model
model = XGBClassifier()
model.fit(X_train, y_train)
# Predict the test data
y_pred = model.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
# Generate confusion matrix
cm = confusion_matrix(y_test, y_pred)
print("Accuracy:", accuracy)
print("Confusion Matrix:")
print(cm)
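The qubit limit mentioned above is easy to quantify for simulators. Assuming a dense statevector simulation with one complex128 amplitude (16 bytes) per basis state, and one qubit per feature, memory grows as 2^n. The helpers below are our own back-of-the-envelope sketch and are not part of Qiskit:

```python
def statevector_bytes(num_qubits, bytes_per_amplitude=16):
    """Memory needed to hold a dense statevector of num_qubits qubits."""
    return (2 ** num_qubits) * bytes_per_amplitude


def human_readable(num_bytes):
    """Format a byte count using binary prefixes."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    value = float(num_bytes)
    for unit in units:
        if value < 1024 or unit == units[-1]:
            return f"{value:.1f} {unit}"
        value /= 1024


for n in (10, 20, 30, 40):
    print(f"{n} qubits -> {human_readable(statevector_bytes(n))}")
```

Under these assumptions, 10 qubits fit in kilobytes while 40 qubits already need terabytes, and a dense statevector for all 297 original features is far out of reach. That is why we reduce the feature set before building the quantum kernel.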
Step 2 - Using Quantum Pegasos or QSVC
First, we set hyperparameters:
# number of qubits is equal to the number of features
num_samples, num_features = X_train.shape
num_qubits = num_features
# number of steps performed during the training procedure
tau = 100
# regularization parameter
C = 1000
Now we use the PegasosQSVC API. We are, of course, still in the ideal world of a noise-free simulator, without transpiling to circuits for real quantum hardware:
from qiskit.circuit.library import ZFeatureMap
# algorithm_globals lives in qiskit.utils on pre-1.0 Qiskit releases;
# adjust this import to match the version you have installed
from qiskit.utils import algorithm_globals
from qiskit_machine_learning.kernels import FidelityQuantumKernel
from qiskit_machine_learning.algorithms import PegasosQSVC
from sklearn.metrics import confusion_matrix
# Fix the random seed so the run is reproducible
algorithm_globals.random_seed = 12345
feature_map = ZFeatureMap(feature_dimension=num_qubits, reps=1)
qkernel = FidelityQuantumKernel(feature_map=feature_map)
pegasos_qsvc = PegasosQSVC(quantum_kernel=qkernel, C=C, num_steps=tau)
# Training
pegasos_qsvc.fit(X_train, y_train)
# Testing
y_pred = pegasos_qsvc.predict(X_test)
pegasos_score = pegasos_qsvc.score(X_test, y_test)
# Generate confusion matrix
cm = confusion_matrix(y_test, y_pred)
print(f"PegasosQSVC Accuracy: {pegasos_score}")
print("Confusion Matrix:")
print(cm)
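To build intuition for what the kernel above measures, the sketch below computes the fidelity |⟨φ(x)|φ(y)⟩|² by hand. It assumes that ZFeatureMap with reps=1 applies a Hadamard followed by a phase gate P(2·x_i) on each qubit, which is our reading of the Qiskit circuit library. Under that assumption the encoded state is a product state and the kernel reduces to a product of cos²(x_i − y_i). The function names are ours, and no Qiskit call is involved.

```python
import cmath
import math


def z_feature_state(x):
    """Per-qubit amplitudes after H then P(2*x_i), starting from |0>."""
    inv_sqrt2 = 1 / math.sqrt(2)
    return [(inv_sqrt2, inv_sqrt2 * cmath.exp(2j * xi)) for xi in x]


def fidelity_kernel(x, y):
    """|<phi(x)|phi(y)>|^2, computed qubit by qubit for the product state."""
    overlap = 1 + 0j
    for (a0, a1), (b0, b1) in zip(z_feature_state(x), z_feature_state(y)):
        overlap *= a0.conjugate() * b0 + a1.conjugate() * b1
    return abs(overlap) ** 2


def fidelity_kernel_closed_form(x, y):
    """Same quantity written as a product of cos^2(x_i - y_i)."""
    return math.prod(math.cos(xi - yi) ** 2 for xi, yi in zip(x, y))


# Arbitrary illustrative inputs (hypothetical, not from the dataset)
x = [0.3, -1.2, 0.7]
y = [0.1, 0.4, -0.5]
print(fidelity_kernel(x, y), fidelity_kernel_closed_form(x, y))
```

Because cos² is periodic, feature values that are far apart can look similar to this kernel, which is one reason the scaling applied to the inputs matters.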
Once this model has been tuned and deployed behind an inference endpoint, we can quickly identify which devices within a large metropolitan area network are likely to be security cameras. The decision is based on their traffic patterns, and the flagged devices are then shortlisted for security hardening.
In our forthcoming article, we will delve into the intricacies of the Variational Quantum Classifier. Keep following our journey as we explore Quantum Cybersecurity Analytics.
Remember to follow our LinkedIn and Twitter accounts, where we will continue to demonstrate how hybrid quantum machine learning is poised to revolutionize the sphere of cyber defense.