EVENT DETAILS
Traditional database management systems (DBMSs) rely on data-dependent observability, where the optimizer utilizes selectivities and intermediate cardinalities to select efficient execution plans. As privacy concerns increase and regulatory requirements are enforced, privacy-preserving DBMSs lose access to this information. While secure query execution becomes feasible, it often incurs high computational costs. An additional challenge arises in production environments. In multi-tenant cloud platforms, when a tenant experiences a slow query, developers would typically re-execute the query on the original data to diagnose performance regressions. However, confidentiality requirements prevent this approach, leading to genuine but unreproducible performance regressions. This thesis proposes that database metadata can serve as a substrate to address both challenges. From the perspective of DBMS builders, public schema constraints and protocol-visible information can substitute for the private statistics used by conventional optimizers. From the perspective of DBMS operators, differentially private releases of physical metadata can reproduce execution behavior on substitute datasets.
My prior work develops three systems based on this principle. Alchemy derives a circuit-aware cost optimizer for oblivious SQL using public schema constraints. HAMMER extends the principle beyond a single privacy primitive, routing public operators to plaintext, slot-wise arithmetic to fully homomorphic encryption (FHE) on GPUs, and control-flow-heavy operators to secure multi-party computation (MPC). ScanTwin generates differentially private sketches of Parquet footers, allowing scan-level performance regressions to be reproduced on synthetic data without accessing tenant records. Building on these results, the proposed thesis extends ScanTwin to PerfTwin, releasing operator-level differentially private sketches for additional operators to enable reproducing full-pipeline performance regressions. Overall, the goal is to demonstrate that, despite privacy constraints, the database stack can be efficiently built and reliably operated from safe metadata alone.
TIME Tuesday, May 26, 2026, 1:00 PM - 3:00 PM
LOCATION Mudd 3001, Mudd Hall (formerly Seeley G. Mudd Library)
CONTACT Wynante R Charles wynante.charles@northwestern.edu
CALENDAR Department of Computer Science (CS)