Planted Matching in Hypergraphs

Question

Cross posted from MSE

I’m working on the planted perfect matching problem in random $k$-uniform hypergraphs $k \ge 3$, and I’m stuck on rigorizing the impossibility (lower bound) side of what looks like the information-theoretic threshold. I’d be very grateful for references or structural ideas.

Model

Let $k \ge 3$ and $n \in \mathbb{N}$. Consider a $k$-uniform hypergraph $H$ on $N = kn$ vertices, either:

$k$-partite with parts of size $n$, or
a non-partite $k$-uniform model (both versions behave similarly for my purposes).

We generate $H$ by:

Planted perfect matching $M^*$: a fixed perfect matching of size $n$ (partitioning the vertices).
Noise edges $U$: we either
- include each potential $k$-tuple as a noise edge independently with probability
  $$ p = \frac{d}{n^{k-1}} $$ (so $\mathbb{E}|U| \approx dn$), or
- equivalently, condition on $|U| = m = dn$.

We only observe the unlabeled hypergraph $$ H = M^* \cup U, $$ and the inference problem is to recover the planted matching $M^*$.

What is already known / what I have

For a fixed $\alpha \in (0,1)$, call a perfect matching $M$ a $t$-swap matching (or “at distance $t$ from $M^*$”) if $$ |M \triangle M^*| = 2t, $$ i.e., $M$ uses exactly $t = \alpha n$ noise edges and $n - t$ planted edges.

Using a first-moment / union-bound argument over all $t$-swap matchings (for linear $t$), I compute an explicit rate function and get a threshold of the form $$ d^*(k) \approx e^{k-1}-1 \quad (k \to \infty), $$ such that (roughly):

If $d < d^*(k)$, then for each fixed $\alpha$, the expected number of $t$-swap matchings is $\exp\{-c(\alpha,k)\,n\}$; summing over $t$ shows that with high probability $M^*$ is the unique perfect matching. This is the achievability (upper) bound.

I am now trying to prove the converse:

If $d > d^*(k)$, then exact recovery of $M^*$ is impossible, because there are exponentially many alternative perfect matchings indistinguishable from $M^*$.

Approach: Kahn-style reverse deletion and the obstruction

I’m trying to adapt Jeff Kahn’s reverse-deletion martingale method from his proof of Shamir’s problem.

Start with the full noise set $U$ and delete noise edges in a uniform random order: $$ e_1, e_2, \dots, e_{|U|}. $$
Let $$ H_i = M^* \cup (U \setminus \{e_1,\dots,e_i\}) $$ be the hypergraph after $i$ deletions.

Fix a deviation $t = \alpha n$. Let $\mathcal{F}_t(H_i)$ be the set of $t$-swap perfect matchings in $H_i$, and define $$ A_i := |\mathcal{F}_t(H_i)|. $$

When we delete $e_i$, a fraction of the matchings are killed: $$ A_i = A_{i-1}(1 - \xi_i), $$ where $\xi_i$ is the “heaviness” of edge $e_i$: the fraction of $t$-swap matchings that contain $e_i$.

Conditionally, one can compute $$ \gamma_i := \mathbb{E}[\xi_i \mid H_{i-1}] = \frac{t}{m_i}, $$ where $m_i = |U \cap E(H_{i-1})|$. Summing $\gamma_i$ over a certain “log–linear” window $m \in [dn, \theta n \log n]$, one exactly recovers the exponent from the first-moment threshold $d^*(k)$. So if we could replace $\log(1-\xi_i)$ by $-\xi_i$, we would match the upper bound.

The problem is controlling the second-order loss $$ Z := \sum_i \big(\log(1-\xi_i) + \xi_i\big) \le 0, $$ which is essentially dictated by the sum of squares $\sum_i \xi_i^2$.

Define $$ D_m := \log(1 - \bar{p}_m) - \frac{1}{m}\sum_e \log(1 - p_e), $$ where:

$p_e$ is the inclusion probability of edge $e$ under the uniform measure on $\mathcal{F}_t(H_m)$, and
$\bar{p}_m = t/m$.

Then one has $$ \mathbb{E}[\log(1-\xi_i)\mid H_m] = \log(1 - \bar{p}_m) - D_m, $$ so $$ Z = -\sum_m D_m. $$

Analytically, using a Jensen / Taylor argument for $f(x) = \log(1-x)$, one can show $$ D_m \;\lesssim\; \operatorname{Var}_m(p_e), $$ where the variance is over a uniform random edge $e$ in $H_m$.

Thus, what I really need is a bound of the form $$ \sum_m D_m \;\le\; K_k\,n $$ for some $K_k = K(k)$ (ideally decaying like $1/d$ as $d$ increases). A sufficient condition would be something like $$ \operatorname{Var}_m(p_e) \ll \bar{p}_m^2 $$ uniformly in $m$ in the $[dn, \theta n \log n]$ window, or, more concretely, a “no heavy edges” statement: no edge is contained in more than a constant multiple of the typical fraction $\bar{p}_m = t/m$ of $t$-swap matchings.

This is precisely where I get stuck: I don’t know how to control the edge marginals $p_e$ in this conditional planted ensemble.

I’ve been working on this for a while; another professor recently told me the Kahn-style martingale approach is probably doomed in this sparse regime, but I’d really like to understand whether there is already something in the literature that rules it out or, conversely, a technique (SSM, cluster expansion, switching, …) that could give the needed variance bounds. As a sanity check, I’ve simulated the $k=3$ case with $n$ from $[50,100]$ using a SAT solver to test existence of alternate perfect matchings; empirically the threshold where the probability of an alternate matching jumps from $\sim 0.5$ to $\sim 1$ matches the $d^*(3)$ I computed, but of course that’s not a proof.

The issue with black-boxing JKV is the planted constraint, they prove concentration for the total number of perfect matchings in a random hypergraph. I need to count 𝐴𝑡 (matchings at distance 𝑡 from the planted 𝑀∗ ).In the sparse regime (𝑑≈const ), the JKV "flatness" bounds don't automatically transfer to the conditional measure on 𝑡 -swaps. My concern is that the planted structure might induce local condensation (heavy edges) specifically among the 𝑡 -swaps, even if the total matching space is well-behaved. Also I have tried some second moment methods to upgrade first moment analysis to w.h.p but that has failed. This is the plot Im talking about

Stack Exchange Network

Planted Matching in Hypergraphs

Model

What is already known / what I have

Approach: Kahn-style reverse deletion and the obstruction

0

You must log in to answer this question.

Planted Matching in Hypergraphs

Model

What is already known / what I have

Approach: Kahn-style reverse deletion and the obstruction

0

You must log in to answer this question.

Related