Madjid Tehrani

Navigating Quantum Cybersecurity Analytics: A Daily Exploration by CyberSec-DMS

Updated: May 5

Day 2: Anatomy of QSVM


Welcome back to the bustling digital metropolis, where quantum technology is arming our defenses against nefarious botnets! In our previous note, we introduced you to Quantum Support Vector Machines (QSVM) and the exciting promise they hold for detecting the hidden world of botmasters and zombies. Imagine if those super-powered police cars we talked about earlier had a few unexpected quirks: perhaps they require special fuel that’s difficult to find, or their advanced systems only work in specific weather conditions. On the surface, these vehicles are superior in every way, but when we peek under the hood, we might discover why they are not always the perfect solution.

In this note, we’ll pull back the curtain on QSVM, diving deep into its mechanics and exploring why it might not be as powerful as we hoped.

First of all, QSVM is a kernel SVM. Let’s examine what happens inside an SVM using some randomly generated data; this will show how SVM works and what role the kernel trick plays.

# Step 1: Generate random data with two different distributions:
#         - "clouds" creates linearly separable Gaussian clouds.
#         - "circle" creates a non-linearly separable circular distribution.

import numpy as np
import matplotlib.pyplot as plt

def clouds(num_points=100):
    centers = [(1, 1), (-1, -1)]
    spreads = [0.5, 0.7]
    labels = [-1, 1]
    X = np.vstack([np.random.multivariate_normal(center, spread * np.eye(2), num_points)
                   for center, spread in zip(centers, spreads)])
    y = np.hstack([[label] * num_points for label in labels])
    return X, y

def circle(num_points=120):
    points = 1 - 2 * np.random.random((num_points, 2))
    radius = 0.6
    labels = [1 if np.linalg.norm(point) > radius else -1 for point in points]
    return points, labels

def plot_points(points_list, labels_list, size=(12, 6)):
    fig, axes = plt.subplots(1, 2, figsize=size)
    for points, labels, ax in zip(points_list, labels_list, axes):
        colors = ["red" if label == 1 else "royalblue" for label in labels]
        ax.scatter(points[:, 0], points[:, 1], color=colors)
        ax.set_xlabel("$x_1$")
        ax.set_ylabel("$x_2$")
    plt.show()

points_clouds, labels_clouds = clouds(100)
points_circle, labels_circle = circle()
plot_points([points_clouds, points_circle], [labels_clouds, labels_circle])

It's evident that the cloud dataset is separable by a straight line, but the same is not true for the circle dataset.


The cloud dataset (left) and the circle dataset (right) require different approaches.
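To see this in practice, here is a minimal sketch (reusing the points_circle and labels_circle arrays generated above) that fits a plain linear SVM on the circle data; because no straight line separates the two classes, it can do little better than predicting the majority class:

from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# A linear kernel cannot separate the circular distribution in two dimensions.
linear_2d = SVC(kernel="linear")
linear_2d.fit(points_circle, labels_circle)
print("Linear SVM accuracy on the circle data:",
      accuracy_score(labels_circle, linear_2d.predict(points_circle)))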

Here is the code that separates the cloud dataset, achieving 96% accuracy, along with the resulting plot:

from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
import numpy as np

# Training the SVM model
model = SVC(kernel='linear') # Assuming a linear kernel
model.fit(points_clouds, labels_clouds)

# Predicting on the training data
sk_predict = model.predict(points_clouds)

# Calculating accuracy
accuracy = accuracy_score(labels_clouds, sk_predict)
print("Accuracy:", accuracy)

# Plotting the points
colors = ["red" if label == 1 else "royalblue" for label in labels_clouds]
plt.figure(figsize=(10, 9))
plt.scatter(points_clouds[:, 0], points_clouds[:, 1], color=colors)

# Plotting the decision boundary
ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()
xx, yy = np.meshgrid(np.linspace(xlim[0], xlim[1], 50),
                     np.linspace(ylim[0], ylim[1], 50))
Z = model.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.contour(xx, yy, Z, levels=[0], linewidths=2, colors='black')
plt.show()


This occurs as a result of solving the soft-margin SVM optimization problem:

minimize over w and b:   ||w||² + C · Σ_i max(0, 1 − y_i · (w · x_i + b)),   with predictions given by ŷ = sign(w · x + b)

Where:

  • x is a vector of features for a single training example

  • w is a vector of weights for each feature

  • b is the bias term

  • y is the target output for a single training example, which can be -1 or 1

  • C is a hyperparameter that controls the trade-off between maximizing the margin and minimizing the classification error

  • The sign function returns -1 if the argument is negative, 0 if it is zero, and 1 if it is positive

  • The ||·||² notation represents the squared L2 norm of a vector

The optimization problem seeks the values of w and b that minimize the loss function, subject to the constraint that all training examples are classified correctly or lie on the correct side of the decision boundary. We enforce this constraint by adding a penalty term with weight C (set to 1e5 in the code below); a larger value of C places more emphasis on correct classification.



The code below demonstrates how this mathematical concept can be translated into Python:

from qiskit.algorithms.optimizers import L_BFGS_B

# Definition of the loss function: squared norm of w plus a heavily weighted hinge penalty
def loss(support_vector, X, y, penalty=1e5):
    w = support_vector[1:]
    b = support_vector[0]
    norm = np.linalg.norm(w) ** 2
    constraint = sum(max(0, 1 - y[i] * (w.dot(X[i]) + b)) for i, _ in enumerate(y))
    return norm + penalty * constraint

# Definition of the optimizer
optimizer = L_BFGS_B()
result = optimizer.minimize(lambda sv: loss(sv, points_clouds, labels_clouds), x0=np.random.random(3))
support_vector = result.x

# Definition of the classifier
def classify(point, support_vector):
    w = support_vector[1:]
    b = support_vector[0]
    return np.sign(w.dot(point) + b)

Now, we can predict, plot, and obtain the same results that we observed with the standard SVM function.


predicted = [classify(point, support_vector) for point in points_clouds]
colors = ["red" if label == 1 else "royalblue" for label in labels_clouds]
markers = ["o" if label == predicted_label else "x" for label, predicted_label in zip(labels_clouds, predicted)]
plt.figure(figsize=(8, 6))

for point, marker, color in zip(points_clouds, markers, colors):
    plt.scatter(point[0], point[1], color=color, marker=marker)

b, w = support_vector[0], support_vector[1:]
x1 = np.linspace(-3, 3, num=100)
x2 = -(w[0] * x1 + b) / w[1]
plt.plot(x1, x2, "k-")
plt.show()


You may have observed that we used an optimizer named L-BFGS-B. Qiskit offers many other optimizers as well, such as COBYLA, SPSA, SLSQP, NELDER_MEAD, and ADAM.
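For instance, here is a minimal sketch (reusing the loss function and cloud data defined above) of swapping in the COBYLA optimizer:

from qiskit.algorithms.optimizers import COBYLA

# Same objective as before; only the optimizer changes.
cobyla = COBYLA(maxiter=500)
result_cobyla = cobyla.minimize(lambda sv: loss(sv, points_clouds, labels_clouds),
                                x0=np.random.random(3))
print("COBYLA solution:", result_cobyla.x)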



We can also use other loss functions, such as the one sketched below.
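As one illustration (a sketch, not the only option), the hinge term in our loss can be replaced by a squared-hinge term, which penalizes margin violations quadratically:

# Squared-hinge variant of the loss function defined earlier (illustrative example).
def squared_hinge_loss(support_vector, X, y, penalty=1e5):
    w = support_vector[1:]
    b = support_vector[0]
    norm = np.linalg.norm(w) ** 2
    constraint = sum(max(0, 1 - y[i] * (w.dot(X[i]) + b)) ** 2 for i, _ in enumerate(y))
    return norm + penalty * constraint

result_sq = optimizer.minimize(lambda sv: squared_hinge_loss(sv, points_clouds, labels_clouds),
                               x0=np.random.random(3))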


It’s time to see how to deal with the circle dataset. One idea is to increase the dimensionality of the data using a feature map:

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

def feature_map(x):
    return np.array([x[0], x[1], x[0]**2 + x[1]**2])

embedded_points = np.array([feature_map(point) for point in points_circle])
colors = ["red" if label == 1 else "royalblue" for label in labels_circle]

fig = plt.figure(figsize=(10, 10))
ax = fig.add_subplot(projection='3d')
ax.scatter(embedded_points[:, 0], embedded_points[:, 1], embedded_points[:, 2], color=colors)
ax.view_init(10, 20)
plt.show()

This demonstrates that mapping the data into a higher-dimensional space can make it linearly separable.
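As a quick check (a sketch reusing the embedded_points computed above), fitting an ordinary linear SVM in the new three-dimensional space should now separate the circle data with high accuracy:

from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# In three dimensions the circle data becomes (approximately) linearly separable.
linear_3d = SVC(kernel="linear")
linear_3d.fit(embedded_points, labels_circle)
print("Accuracy in the 3-D feature space:",
      accuracy_score(labels_circle, linear_3d.predict(embedded_points)))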


This is exactly what we expect the quantum kernel to do: it uses a quantum feature map that embeds the data in a Hilbert space.


# QSVM Feature Map
from qiskit.circuit.library import ZZFeatureMap, ZFeatureMap, PauliFeatureMap

num_qubits = 2
x = np.random.random(num_qubits)
data = ZZFeatureMap(feature_dimension=num_qubits, reps=1, entanglement="linear")
data.assign_parameters(x, inplace=True)
data.decompose().draw("mpl", style="iqx", scale=1.4)

The ZZFeatureMap we coded above is a special case of the PauliFeatureMap, restricted to the Z and ZZ Pauli terms.
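To make that relationship concrete, here is a small sketch (the variable names are ours) that builds both circuits for the same sample point and prints their decompositions, which should coincide:

from qiskit.circuit.library import ZZFeatureMap, PauliFeatureMap

# ZZFeatureMap is equivalent to a PauliFeatureMap using only the Z and ZZ terms.
zz_map = ZZFeatureMap(feature_dimension=2, reps=1)
pauli_map = PauliFeatureMap(feature_dimension=2, reps=1, paulis=["Z", "ZZ"])

sample = [0.3, 0.7]
print(zz_map.assign_parameters(sample).decompose())
print(pauli_map.assign_parameters(sample).decompose())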



To implement the quantum kernel and understand its mechanism, we propose the code below:

from qiskit import transpile, BasicAer, QuantumCircuit
from qiskit.circuit.library import ZZFeatureMap

backend = BasicAer.get_backend("qasm_simulator")
shots = 1024
dimension = 2
feature_map = ZZFeatureMap(dimension, reps=1)

def evaluate_kernel(x_i, x_j):
    # Encode x_i, then apply the inverse encoding of x_j; the kernel value is
    # estimated as the probability of measuring the all-zeros state,
    # i.e. |<phi(x_j)|phi(x_i)>|^2.
    circuit = QuantumCircuit(dimension)
    circuit.compose(feature_map.assign_parameters(x_i), inplace=True)
    circuit.compose(feature_map.assign_parameters(x_j).inverse(), inplace=True)
    circuit.measure_all()
    transpiled = transpile(circuit, backend)
    job = backend.run(transpiled, shots=shots)
    result = job.result()
    counts = result.get_counts(transpiled)
    return counts.get("0" * dimension, 0) / shots

To evaluate our function and compare it with Qiskit's standard QuantumKernel implementation, we can use the code below:

# Let's compare our implementation with the original implementation
evaluate_kernel(points_circle[2], points_circle[3])

The above method gives us a result of 0.3154296875, while the standard implementation using the code below yields 0.296875. There is a minor difference between the two results.

from qiskit_machine_learning.kernels import QuantumKernel

kernel = QuantumKernel(feature_map, quantum_instance=backend)
kernel.evaluate(points_circle[2], points_circle[3])

The difference in values may stem from shot noise (each estimate is based on 1,024 measurement samples), as well as from implementation details in Qiskit that are not accounted for in our version. In any case, the discrepancy is not significant, as the t-test results below show:

from qiskit_machine_learning.kernels import QuantumKernel
from scipy.stats import ttest_ind
import random

# Define the feature map
feature_map_qiskit = ZZFeatureMap(dimension, reps=1)

# Define the QuantumKernel using Qiskit's implementation
kernel = QuantumKernel(feature_map=feature_map_qiskit, quantum_instance=backend)

# Generate 5 random pairs of indices
indices = random.sample(range(len(points_circle)), 10)
pairs = [(indices[i], indices[i+1]) for i in range(0, len(indices), 2)]

# Evaluate the kernels using both methods
custom_kernels = [evaluate_kernel(points_circle[i], points_circle[j]) for i, j in pairs]
qiskit_kernels = [kernel.evaluate(points_circle[i], points_circle[j]) for i, j in pairs]

# Perform a t-test to compare the results
t_stat, p_val = ttest_ind(custom_kernels, qiskit_kernels)
print("Custom kernels:", custom_kernels)
print("Qiskit kernels:", qiskit_kernels)
print("t-statistic:", t_stat)
print("p-value:", p_val)

# Inference
alpha = 0.05
if p_val < alpha:
    print("The two methods produce statistically significantly different results (p < 0.05).")
else:
    print("There is no statistically significant difference between the two methods (p >= 0.05).")

So, now that we know a quantum kernel maps the data into a larger space where separation is easier, let’s use this quantum kernel with an SVM. We’ll refer to this combination as QSVM:

qsvm = SVC(kernel=kernel.evaluate)
qsvm.fit(points_circle, labels_circle)
predicted = qsvm.predict(points_circle)

The accuracy of this technique is 98.33%, calculated using the following code:

from sklearn.metrics import accuracy_score
accuracy = accuracy_score(labels_circle, predicted)
print("Accuracy:", accuracy)

We can identify the points that are classified incorrectly using the code below:

markers = ["o" if label == predicted_label else "x" for label, predicted_label in zip(labels_circle, predicted)]
plt.figure(figsize=(6, 6))
for point, marker, color in zip(points_circle.tolist(), markers, colors):
    plt.scatter(point[0], point[1], color=color, marker=marker)
plt.show()


Now, we must answer two questions to determine whether our police car can be as fast as promised:

  1. Under what conditions can QSVM attain the same or better accuracy than SVM?

  2. If we can meet those conditions, will it result in a quantum advantage?


In the next note, we will show how to use model tracking tools like Weights & Biases (W&B) to track quantum algorithms like QSVM and lay the foundation for moving on to the next algorithm: PegasosQSVC!


Stay with us, and don’t forget to follow our LinkedIn and Twitter, where we will show how hybrid quantum machine learning will change the realm of cyber defense.
