pandora.embedding_comparison module

class pandora.embedding_comparison.BatchEmbeddingComparison(embeddings: List[Embedding])

Bases: object

Class structure for comparing three or more Embedding results. All comparisons are conducted pairwise for all unique pairs of embeddings.

This class provides methods for comparing the Embeddings based on all samples, for comparing the K-Means clustering results, and for computing sample support values.

BatchEmbeddingComparison makes use of pairwise EmbeddingComparison objects for comparing results pairwise.

Parameters:
embeddings: List[Embedding]

List of embeddings to compare

Attributes:
embeddings: List[Embedding]

List of embeddings to compare

Methods

compare([threads])

Compares all embeddings pairwise and returns the average of the resulting pairwise Pandora stability scores.

compare_clustering(kmeans_k[, threads])

Compares all embeddings pairwise and returns the average of the resulting pairwise Pandora cluster stability scores.

get_pairwise_cluster_stabilities(kmeans_k[, ...])

Computes the pairwise Pandora cluster stability scores for all unique pairs of self.embeddings and stores them in a pandas Series.

get_pairwise_stabilities([threads])

Computes the pairwise Pandora stability scores for all unique pairs of self.embeddings and stores them in a pandas Series.

get_sample_support_values([threads])

Computes the sample support value for each sample with respect to all self.embeddings.

Raises:
PandoraException
  • If fewer than three embeddings are passed.

compare(threads: int | None = None) → float

Compares all embeddings pairwise and returns the average of the resulting pairwise Pandora stability scores.

See EmbeddingComparison.compare for more details on how the pairwise Pandora stability is computed.

Parameters:
threads: int, default=None

Number of threads to use for the computation. Default is to use all available system threads.

Returns:
float

Average of pairwise Pandora stability scores. This value is between 0 and 1 with higher values indicating a higher stability.
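The "all unique pairs, then average" scheme described above can be illustrated with a minimal, standard-library sketch. It is not Pandora's implementation: the helper names pairwise_scores and average_pairwise_score and the toy_score function are hypothetical, and plain numbers stand in for Embedding objects.

```python
from itertools import combinations
from statistics import mean
from typing import Callable, Dict, Sequence, Tuple


def pairwise_scores(
    items: Sequence, score: Callable[[object, object], float]
) -> Dict[Tuple[int, int], float]:
    """Score every unique pair (i, j) with i < j, keyed by the index pair."""
    return {
        (i, j): score(items[i], items[j])
        for i, j in combinations(range(len(items)), 2)
    }


def average_pairwise_score(
    items: Sequence, score: Callable[[object, object], float]
) -> float:
    """Average the scores of all unique pairs."""
    return mean(pairwise_scores(items, score).values())


def toy_score(a: float, b: float) -> float:
    """Hypothetical stand-in for a pairwise stability score in [0, 1]."""
    return min(a, b) / max(a, b)


# Assumption: three toy "embeddings" represented by plain numbers.
toy_items = [1.0, 2.0, 4.0]
scores = pairwise_scores(toy_items, toy_score)
average = average_pairwise_score(toy_items, toy_score)
```

With n embeddings this evaluates n * (n - 1) / 2 pairs, which is why at least three embeddings give more than a single comparison.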

compare_clustering(kmeans_k: int, threads: int | None = None) → float

Compares all embeddings pairwise and returns the average of the resulting pairwise Pandora cluster stability scores.

See EmbeddingComparison.compare_clustering for more details on how the pairwise Pandora cluster stability is computed.

Parameters:
kmeans_k: int

Number k of clusters to use for K-Means clustering.

threads: int, default=None

Number of threads to use for the computation. Default is to use all available system threads.

Returns:
float

Average of pairwise Pandora cluster stability scores. This value is between 0 and 1 with higher values indicating a higher stability.

get_pairwise_cluster_stabilities(kmeans_k: int, threads: int | None = None) → DataFrame

Computes the pairwise Pandora cluster stability scores for all unique pairs of self.embeddings and stores them in a pandas Series.

Parameters:
kmeans_k: int

Number k of clusters to use for K-Means clustering.

threads: int, default=None

Number of threads to use for the computation. Default is to use all available system threads.

Returns:
pd.Series

pandas Series containing the pairwise cluster stability scores for all unique pairs of self.embeddings. The Series is named "pandora_cluster_stability" and is indexed by the index pairs of the compared embeddings. An example result:

(0, 1)    0.93
(0, 2)    0.79
(1, 2)    0.71
Name: pandora_cluster_stability, dtype: float64

Each value is between 0 and 1 with higher values indicating a higher stability.

get_pairwise_stabilities(threads: int | None = None) → Series

Computes the pairwise Pandora stability scores for all unique pairs of self.embeddings and stores them in a pandas Series.

Parameters:
threads: int, default=None

Number of threads to use for the computation. Default is to use all available system threads.

Returns:
pd.Series

pandas Series containing the pairwise stability scores for all unique pairs of self.embeddings. The Series is named "pandora_stability" and is indexed by the index pairs of the compared embeddings. An example result:

(0, 1)    0.93
(0, 2)    0.79
(1, 2)    0.71
Name: pandora_stability, dtype: float64

Each value is between 0 and 1 with higher values indicating a higher stability.

get_sample_support_values(threads: int | None = None) → Series

Computes the sample support value for each sample with respect to all self.embeddings.

The sample support value of a sample is computed as 1 - dispersion across all embeddings, where dispersion is measured using the Gini coefficient. Support values are computed for all samples in the union of the sample IDs of all self.embeddings.
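The "1 - dispersion" formula can be made concrete with a small, standard-library sketch of the Gini coefficient. Which per-sample quantities enter the coefficient is not specified here, so the sketch assumes a list of non-negative values per sample (for example, distances between that sample's projections in different embeddings). The function names gini_coefficient and support_value are hypothetical and do not reflect Pandora's internals.

```python
from typing import Sequence


def gini_coefficient(values: Sequence[float]) -> float:
    """Gini coefficient of non-negative values: 0 means perfectly even.

    Uses the mean-absolute-difference form
    G = sum_i sum_j |x_i - x_j| / (2 * n * sum(x)).
    """
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        # Convention chosen for this sketch: no data or all zeros means no dispersion.
        return 0.0
    abs_diff_sum = sum(abs(a - b) for a in values for b in values)
    return abs_diff_sum / (2 * n * total)


def support_value(values: Sequence[float]) -> float:
    """Support as 1 - dispersion, with dispersion measured by the Gini coefficient."""
    return 1.0 - gini_coefficient(values)


# Assumption: hypothetical per-sample values, not real Pandora output.
even_support = support_value([1.0, 1.0, 1.0, 1.0])    # identical values, no dispersion
uneven_support = support_value([0.0, 0.0, 0.0, 1.0])  # one outlying value
```

Under this definition, a sample whose values agree across embeddings gets a support value close to 1, while a sample with strongly uneven values gets a lower one.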

Parameters:
threads: int, default=None

Number of threads to use for the computation. Default is to use all available system threads.

Returns:
pd.Series

pandas Series containing the support values for all samples across all pairwise embedding comparisons. Each row corresponds to a sample, with the sample ID as index and the Pandora support value (PSV) as value. The Series is named "PSV".

class pandora.embedding_comparison.EmbeddingComparison(comparable: Embedding, reference: Embedding)

Bases: object

Class structure for comparing two Embedding results.

This class provides methods for comparing both Embeddings based on all samples, for comparing the K-Means clustering results, and for computing sample support values.

On initialization, comparable and reference are both reduced to the samples present in both Embeddings. Procrustes Analysis is then applied to transform comparable towards reference: it scales, translates, rotates, and reflects comparable so that its sample projections match the projections in reference as closely as possible.

Note that for comparing Embedding results, the sample IDs are used to ensure the correct comparison of projections. If an error occurs during initialization, this is most likely due to incorrect sample IDs.
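The Procrustes Analysis step described above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not Pandora's implementation: the function name procrustes_sketch is hypothetical, the inputs are plain coordinate arrays rather than Embedding objects, and the rows are assumed to be already filtered and ordered so that row i refers to the same sample in both arrays.

```python
import numpy as np


def procrustes_sketch(comparable, reference):
    """Match `comparable` to `reference` by translation, scaling, rotation/reflection.

    Returns (transformed_comparable, standardized_reference, disparity).
    Assumption: both inputs have shape (n_samples, n_dims) with aligned rows.
    """
    comp = np.asarray(comparable, dtype=float)
    ref = np.asarray(reference, dtype=float)

    # Translation: center both point sets at the origin.
    comp = comp - comp.mean(axis=0)
    ref = ref - ref.mean(axis=0)

    # Scaling: normalize both to unit Frobenius norm (this standardizes the reference).
    comp = comp / np.linalg.norm(comp)
    ref = ref / np.linalg.norm(ref)

    # Rotation/reflection: the orthogonal matrix that best maps comp onto ref
    # is U @ Vt, taken from the SVD of comp.T @ ref.
    u, singular_values, vt = np.linalg.svd(comp.T @ ref)
    rotation = u @ vt

    # Optimal residual scale for the unit-norm comparable.
    scale = singular_values.sum()
    transformed = comp @ rotation * scale

    # Disparity: sum of squared distances after matching.
    disparity = float(np.sum((ref - transformed) ** 2))
    return transformed, ref, disparity


# Assumption: toy 2D coordinates; the comparable is a rotated, scaled,
# and shifted copy of the reference, so the match should be near perfect.
theta = np.pi / 6
rot = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
toy_reference = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
toy_comparable = 2.5 * toy_reference @ rot + np.array([5.0, -3.0])
matched, standardized, toy_disparity = procrustes_sketch(toy_comparable, toy_reference)
```

Because both point sets are normalized before matching, the disparity in this sketch lies between 0 (perfect match) and 1.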

Parameters:
comparable: Embedding

Embedding object to compare

reference: Embedding

Embedding object to transform comparable towards

Attributes:
comparable: Embedding

comparable Embedding object after sample filtering and Procrustes Transformation.

reference: Embedding

reference Embedding object after sample filtering and Procrustes Transformation.

sample_ids: pd.Series[str]

pd.Series containing the sample IDs present in both Embedding objects

Methods

compare()

Computes the Pandora stability between self.comparable and self.reference using Procrustes Analysis.

compare_clustering([kmeans_k])

Computes the Pandora cluster stability between self.comparable and self.reference.

Raises:
PandoraException
  • If either comparable or reference is not an Embedding object.

compare() → float

Computes the Pandora stability between self.comparable and self.reference using Procrustes Analysis.

Returns:
float

Similarity score on a scale of 0 (entirely different) to 1 (identical) measuring the similarity of self.comparable and self.reference.

compare_clustering(kmeans_k: int | None = None) → float

Computes the Pandora cluster stability between self.comparable and self.reference.

Compares the assigned cluster labels based on K-Means clustering on both embeddings.

Parameters:
kmeans_k: int, default=None

Number k of clusters to use for K-Means clustering. If not set, the optimal number of clusters is determined automatically using self.reference.

Returns:
float

The Fowlkes-Mallows score of cluster similarity between the clustering results of self.reference and self.comparable. The score ranges from 0 (entirely distinct) to 1 (identical).
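For readers unfamiliar with the Fowlkes-Mallows score, the following standard-library sketch computes it by counting sample pairs. It illustrates the score itself and is not taken from Pandora, which may rely on a library implementation; the function name fowlkes_mallows is hypothetical, and the two label lists are assumed to be aligned by sample.

```python
from itertools import combinations
from math import sqrt
from typing import Sequence


def fowlkes_mallows(labels_a: Sequence[int], labels_b: Sequence[int]) -> float:
    """Fowlkes-Mallows score: TP / sqrt((TP + FP) * (TP + FN)) over sample pairs.

    TP: pairs in the same cluster in both clusterings.
    FP: pairs in the same cluster in A only.
    FN: pairs in the same cluster in B only.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("Both label lists must cover the same samples.")

    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            tp += 1
        elif same_a:
            fp += 1
        elif same_b:
            fn += 1

    if tp == 0:
        # Convention chosen for this sketch: no co-clustered pairs in common means 0.
        return 0.0
    return tp / sqrt((tp + fp) * (tp + fn))


# Assumption: toy cluster labels. The score ignores label names, so a pure
# relabeling of the same partition still counts as identical.
relabeled = fowlkes_mallows([0, 0, 1, 1], [1, 1, 0, 0])
disjoint = fowlkes_mallows([0, 0, 1, 1], [0, 1, 0, 1])
partial = fowlkes_mallows([0, 0, 0, 1], [0, 0, 1, 1])
```

The pair-counting form makes clear why the score depends only on which samples are grouped together, not on the cluster labels themselves.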

pandora.embedding_comparison.match_and_transform(comparable: Embedding, reference: Embedding) → Tuple[Embedding, Embedding, float]

Uses Procrustes Analysis to find a transformation matrix that most closely matches comparable to reference and transforms comparable.

Parameters:
comparable: Embedding

The Embedding that should be transformed

reference: Embedding

The Embedding that comparable should be transformed towards

Returns:
transformed_comparable: Embedding

Transformed comparable Embedding, created by matching comparable to reference as closely as possible using Procrustes Analysis.

standardized_reference: Embedding

Standardized reference Embedding. The reference is standardized during the matching procedure by Procrustes Analysis.

disparity: float

The sum of squared distances between the transformed comparable and the standardized reference Embeddings.

Raises:
PandoraException
  • Mismatch in sample IDs between comparable and reference (identical sample IDs required for comparison).

  • Mismatch in number of samples or PCs in comparable and reference.

  • No samples left after clipping. This is most likely caused by incorrect annotations of sample IDs.