pandora.embedding_comparison module
- class pandora.embedding_comparison.BatchEmbeddingComparison(embeddings: List[Embedding])
Bases: object
Class structure for comparing three or more Embedding results. All comparisons are conducted pairwise for all unique pairs of embeddings.
This class provides methods for comparing the Embeddings based on all samples, for comparing the K-Means clustering results, and for computing sample support values.
BatchEmbeddingComparison makes use of pairwise EmbeddingComparison objects for comparing results pairwise.
- Parameters:
- embeddings: List[Embedding]
List of embeddings to compare
- Attributes:
- embeddings: List[Embedding]
List of embeddings to compare
Methods
compare([threads]): Compares all embeddings pairwise and returns the average of the resulting pairwise Pandora stability scores.
compare_clustering(kmeans_k[, threads]): Compares all embeddings pairwise and returns the average of the resulting pairwise Pandora cluster stability scores.
get_pairwise_cluster_stabilities(kmeans_k[, ...]): Computes the pairwise Pandora cluster stability scores for all unique pairs of self.embeddings and stores them in a pandas Series.
get_pairwise_stabilities([threads]): Computes the pairwise Pandora stability scores for all unique pairs of self.embeddings and stores them in a pandas Series.
get_sample_support_values([threads]): Computes the sample support value for each sample with respect to all self.embeddings.
- Raises:
- PandoraException
If fewer than three embeddings are passed.
- compare(threads: int | None = None) → float
Compares all embeddings pairwise and returns the average of the resulting pairwise Pandora stability scores.
See EmbeddingComparison::compare for more details on how the pairwise Pandora stability is computed.
- Parameters:
- threads: int, default=None
Number of threads to use for the computation. Default is to use all available system threads.
- Returns:
- float
Average of pairwise Pandora stability scores. This value is between 0 and 1 with higher values indicating a higher stability.
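The "all unique pairs, then average" scheme can be sketched in plain Python. This is a minimal illustration, not pandora's implementation: `pairwise_scores`, `average_pairwise_score`, and `toy_score` are hypothetical names, and the toy score only stands in for the pairwise Pandora stability.

```python
from itertools import combinations
from statistics import mean
from typing import Callable, Dict, Sequence, Tuple


def pairwise_scores(items: Sequence, score: Callable) -> Dict[Tuple[int, int], float]:
    """Score every unique index pair (i, j) with i < j."""
    return {
        (i, j): score(items[i], items[j])
        for i, j in combinations(range(len(items)), 2)
    }


def average_pairwise_score(items: Sequence, score: Callable) -> float:
    """Average the scores of all unique pairs."""
    return mean(pairwise_scores(items, score).values())


def toy_score(a: float, b: float) -> float:
    """Hypothetical stand-in score: 1 minus the absolute difference."""
    return 1.0 - abs(a - b)


# Three toy "embeddings" (plain numbers here, purely for illustration).
toy_items = [0.0, 0.1, 0.3]
scores = pairwise_scores(toy_items, toy_score)
average = average_pairwise_score(toy_items, toy_score)
print(sorted(scores))  # the three unique pairs: (0, 1), (0, 2), (1, 2)
print(average)
```

With n embeddings this evaluates n * (n - 1) / 2 pairs, so the cost of a batch comparison grows quadratically in the number of embeddings.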
- compare_clustering(kmeans_k: int, threads: int | None = None) → float
Compares all embeddings pairwise and returns the average of the resulting pairwise Pandora cluster stability scores.
See EmbeddingComparison::compare_clustering for more details on how the pairwise Pandora cluster stability is computed.
- Parameters:
- kmeans_k: int
Number k of clusters to use for K-Means clustering.
- threads: int, default=None
Number of threads to use for the computation. Default is to use all available system threads.
- Returns:
- float
Average of pairwise Pandora cluster stability scores. This value is between 0 and 1 with higher values indicating a higher stability.
- get_pairwise_cluster_stabilities(kmeans_k: int, threads: int | None = None) → DataFrame
Computes the pairwise Pandora cluster stability scores for all unique pairs of self.embeddings and stores them in a pandas Series.
- Parameters:
- kmeans_k: int
Number k of clusters to use for K-Means clustering.
- threads: int, default=None
Number of threads to use for the computation. Default is to use all available system threads.
- Returns:
- pd.Series
Pandas Series containing the pairwise cluster stability scores for all unique pairs of self.embeddings. The resulting Series is named "pandora_cluster_stability" and has the indices of the pairwise comparisons as index. For example, a result may look like this:
    (0, 1)    0.93
    (0, 2)    0.79
    (1, 2)    0.71
    Name: pandora_cluster_stability, dtype: float64
Each value is between 0 and 1 with higher values indicating a higher stability.
- get_pairwise_stabilities(threads: int | None = None) → Series
Computes the pairwise Pandora stability scores for all unique pairs of self.embeddings and stores them in a pandas Series.
- Parameters:
- threads: int, default=None
Number of threads to use for the computation. Default is to use all available system threads.
- Returns:
- pd.Series
Pandas Series containing the pairwise stability scores for all unique pairs of self.embeddings. The resulting Series is named "pandora_stability" and has the indices of the pairwise comparisons as index. For example, a result may look like this:
    (0, 1)    0.93
    (0, 2)    0.79
    (1, 2)    0.71
    Name: pandora_stability, dtype: float64
Each value is between 0 and 1 with higher values indicating a higher stability.
- get_sample_support_values(threads: int | None = None) → Series
Computes the sample support value for each sample with respect to all self.embeddings.
The sample support value per sample is computed as 1 - dispersion across all embeddings, where dispersion is computed using the Gini coefficient. The support values are computed for all samples in the union of all sample IDs of all self.embeddings.
- Parameters:
- threads: int, default=None
Number of threads to use for the computation. Default is to use all available system threads.
- Returns:
- pd.Series
Pandas Series containing the support values for all samples across all pairwise embedding comparisons. Each row corresponds to a sample, with the sample IDs as indices and the Pandora support value (PSV) as value. The name of the Series is set to "PSV".
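The "1 - dispersion" idea can be illustrated with a plain-Python Gini coefficient. This is a sketch under stated assumptions: the source does not say which per-sample quantities pandora feeds into the Gini coefficient, so the input values below are hypothetical non-negative measurements, and `gini_coefficient` and `support_value` are illustrative names rather than pandora functions.

```python
from typing import Sequence


def gini_coefficient(values: Sequence[float]) -> float:
    """Gini coefficient of non-negative values via the mean absolute difference.

    Returns 0.0 for perfectly equal values; approaches 1.0 as the total
    concentrates in a single value.
    """
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0  # no dispersion measurable (assumption: treat as fully equal)
    abs_diff_sum = sum(abs(a - b) for a in values for b in values)
    mean_value = total / n
    return abs_diff_sum / (2 * n * n * mean_value)


def support_value(values: Sequence[float]) -> float:
    """Support as 1 - dispersion, with dispersion measured by the Gini coefficient."""
    return 1.0 - gini_coefficient(values)


# Hypothetical per-sample measurements across several embeddings.
consistent = [2.0, 2.0, 2.0]
inconsistent = [0.0, 0.0, 0.0, 1.0]
print(support_value(consistent))
print(support_value(inconsistent))
```

Equal values yield a support of 1, while values concentrated in a single outlier push the support towards 0.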
- class pandora.embedding_comparison.EmbeddingComparison(comparable: Embedding, reference: Embedding)
Bases: object
Class structure for comparing two Embedding results.
This class provides methods for comparing both Embeddings based on all samples, for comparing the K-Means clustering results, and for computing sample support values.
On initialization, comparable and reference are both reduced to contain only samples present in both Embeddings. Procrustes Analysis is then applied, transforming comparable towards reference: it applies scaling, translation, rotation, and reflection to comparable, aiming to match all sample projections as closely as possible to the projections in reference.
Note that for comparing Embedding results, the sample IDs are used to ensure the correct comparison of projections. If an error occurs during initialization, this is most likely due to incorrect sample IDs.
- Parameters:
- comparable: Embedding
Embedding object to compare
- reference: Embedding
Embedding object to transform comparable towards
- Attributes:
- comparable: Embedding
comparable Embedding object after sample filtering and Procrustes Transformation.
- reference: Embedding
reference Embedding object after sample filtering and Procrustes Transformation.
- sample_ids: pd.Series[str]
pd.Series containing the sample IDs present in both Embedding objects
Methods
compare(): Computes the Pandora stability between self.comparable and self.reference using Procrustes Analysis.
compare_clustering([kmeans_k]): Computes the Pandora cluster stability between self.comparable and self.reference.
- Raises:
- PandoraException
If either comparable or reference is not an Embedding object.
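The reduction to shared samples described for initialization can be sketched as follows. This is an illustration only: plain dicts mapping sample IDs to projections stand in for Embedding objects, and `keep_shared_samples` is a hypothetical helper, not part of pandora.

```python
from typing import Dict, List, Tuple

Projection = List[float]


def keep_shared_samples(
    comparable: Dict[str, Projection],
    reference: Dict[str, Projection],
) -> Tuple[List[str], List[Projection], List[Projection]]:
    """Keep only samples present in both inputs, in one common order."""
    shared_ids = sorted(set(comparable) & set(reference))
    if not shared_ids:
        # Mirrors the documented failure mode: usually wrong sample IDs.
        raise ValueError("No samples left after filtering; check the sample IDs.")
    return (
        shared_ids,
        [comparable[sample_id] for sample_id in shared_ids],
        [reference[sample_id] for sample_id in shared_ids],
    )


# Toy projections keyed by sample ID (hypothetical data).
comparable = {"s1": [0.1, 0.2], "s2": [0.3, 0.4], "s3": [0.5, 0.6]}
reference = {"s2": [0.3, 0.5], "s3": [0.4, 0.6], "s4": [0.9, 0.9]}

sample_ids, comparable_rows, reference_rows = keep_shared_samples(comparable, reference)
print(sample_ids)
```

Aligning both inputs to one sorted ID list guarantees that row i of each array refers to the same sample, which is what a row-wise comparison of projections requires.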
- compare() → float
Computes the Pandora stability between self.comparable and self.reference using Procrustes Analysis.
- Returns:
- float
Similarity score on a scale of 0 (entirely different) to 1 (identical) measuring the similarity of self.comparable and self.reference.
- compare_clustering(kmeans_k: int | None = None) → float
Computes the Pandora cluster stability between self.comparable and self.reference.
Compares the assigned cluster labels based on K-Means clustering on both embeddings.
- Parameters:
- kmeans_k: int, default=None
Number k of clusters to use for K-Means clustering. If not set, the optimal number of clusters is determined automatically using self.reference.
- Returns:
- float
The Fowlkes-Mallows score measuring the similarity between the clustering results of self.reference and self.comparable. The score ranges from 0 (entirely distinct) to 1 (identical).
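For readers unfamiliar with the Fowlkes-Mallows score, the following plain-Python sketch computes it from pair counts. It illustrates the general definition only; `fowlkes_mallows` is a hypothetical helper, and the label lists stand in for K-Means cluster assignments of the same samples in two embeddings.

```python
from itertools import combinations
from math import sqrt
from typing import Hashable, Sequence


def fowlkes_mallows(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Fowlkes-Mallows score: TP / sqrt((TP + FP) * (TP + FN)) over sample pairs."""
    if len(labels_a) != len(labels_b):
        raise ValueError("Both labelings must cover the same samples.")
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            tp += 1  # pair clustered together in both labelings
        elif same_a:
            fn += 1  # together in A only
        elif same_b:
            fp += 1  # together in B only
    if tp == 0:
        return 0.0
    return tp / sqrt((tp + fp) * (tp + fn))


# Cluster label names are arbitrary: a pure relabeling is a perfect match.
relabeled = fowlkes_mallows([0, 0, 1, 1], [1, 1, 0, 0])
# No pair is grouped together in both labelings.
disjoint = fowlkes_mallows([0, 0, 1, 1], [0, 1, 0, 1])
# Partial agreement.
partial = fowlkes_mallows([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1])
print(relabeled, disjoint, partial)
```

Because the score depends only on which pairs of samples share a cluster, it is invariant to how the clusters are numbered, which matters when the two clusterings are computed independently.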
- pandora.embedding_comparison.match_and_transform(comparable: Embedding, reference: Embedding) → Tuple[Embedding, Embedding, float]
Uses Procrustes Analysis to find a transformation matrix that most closely matches comparable to reference and transforms comparable.
- Parameters:
- comparable: Embedding
The Embedding that should be transformed
- reference: Embedding
The Embedding that comparable should be transformed towards
- Returns:
- transformed_comparable: Embedding
Transformed comparable Embedding, created by matching comparable to reference as closely as possible using Procrustes Analysis.
- standardized_reference: Embedding
Standardized reference Embedding; reference is standardized during the matching procedure by Procrustes Analysis.
- disparity: float
The sum of squared distances between the transformed comparable and transformed reference Embeddings.
- Raises:
- PandoraException
Mismatch in sample IDs between comparable and reference (identical sample IDs required for comparison).
Mismatch in number of samples or PCs in comparable and reference.
No samples left after clipping. This is most likely caused by incorrect annotations of sample IDs.
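The three return values described above resemble what SciPy's `scipy.spatial.procrustes` produces, so that function is used here as a stand-in. This is an assumption for illustration: the source does not state which Procrustes implementation pandora uses, and raw NumPy arrays replace Embedding objects.

```python
import numpy as np
from scipy.spatial import procrustes

# Toy reference projections: 10 samples in 2 dimensions (hypothetical data).
rng = np.random.default_rng(0)
reference = rng.normal(size=(10, 2))

# Build a comparable that is the reference after rotation, scaling, and translation.
theta = np.pi / 3
rotation = np.array(
    [[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]]
)
comparable = 2.5 * reference @ rotation + np.array([4.0, -1.0])

# procrustes standardizes the first matrix and fits the second one to it.
standardized_reference, transformed_comparable, disparity = procrustes(
    reference, comparable
)
print(disparity)  # close to 0: the two point sets differ only by a similarity transform

# Adding noise to the comparable yields a strictly positive disparity.
noisy_comparable = comparable + rng.normal(scale=0.5, size=comparable.shape)
_, _, noisy_disparity = procrustes(reference, noisy_comparable)
print(noisy_disparity)
```

Both inputs must have the same shape and rows must refer to the same samples in the same order, which is consistent with the sample ID and sample count checks listed under Raises.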