Learnable function
A learnable function is a mathematical function that can be learned from data, typically through a machine learning algorithm, to minimize errors and perform a specific task. In the context of statistical learning theory, this refers to a function class where an algorithm can identify a specific function that best approximates the underlying pattern or distribution within the data.
In the domain of database systems, specifically within the emerging field of AI-powered database systems, the learnable function is a general concept representing a paradigm shift: traditional, hard-coded system heuristics are replaced by parameterized functions, enabling the DBMS to learn, adapt, and govern based on the data it manages. Instead of relying on static rules designed by human experts, database components use learnable functions to adapt dynamically to specific data distributions and workloads.
Definition
Fundamentally, a learnable function can be formalized as a mapping $f_\theta: X \to Y$, parameterized by a set of weights $\theta$. The goal of the learning process is to find the optimal parameters $\theta^*$ that minimize a specific loss function $L$ over a dataset $D = \{(x_i, y_i)\}_{i=1}^{N}$:

$$\theta^* = \arg\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} L(f_\theta(x_i), y_i)$$

In this framework:
- $x \in X$ represents the input features.
- $y \in Y$ represents the target output or label.
- $f_\theta$ is the hypothesis or model.
- The "learnability" of the function ensures that as the size of the dataset $N$ increases, the empirical risk converges to the true expected risk, allowing the function to generalize to unseen data.
Formulation in AI-powered Database Systems
In the context of AI-powered database systems, a traditional hard-coded heuristic is replaced by a learnable function $f_\theta$. This reformulation treats database internals as approximation problems:
- Input $x$: Database states, query syntax, or runtime statistics, e.g., lock contention levels, data cardinality.
- Output $y$: System actions or estimates, e.g., estimated selectivity, optimal concurrency action.
- Optimization: The function parameters $\theta$ are tuned via methods such as supervised learning or reinforcement learning (a minimal training sketch follows this list).
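The sketch below assumes a supervised setting in which (feature, selectivity) pairs have been logged from past query executions. The feature names and the least-squares model are illustrative assumptions, not a reference implementation:

```python
import numpy as np

# Hypothetical training data logged from past executions: each x holds
# (predicate lower bound, predicate upper bound, column null fraction),
# and y is the selectivity observed at runtime. All names are illustrative.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(500, 3))
y = np.clip(X[:, 1] - X[:, 0], 0.0, 1.0)  # stand-in for logged selectivities

# Supervised fit of a linear learnable function f_theta via least squares.
X_design = np.hstack([np.ones((500, 1)), X])
theta, *_ = np.linalg.lstsq(X_design, y, rcond=None)

def estimate_selectivity(lo: float, hi: float, null_frac: float) -> float:
    """Replaces a static heuristic (e.g., a fixed default guess) with f_theta."""
    return float(np.array([1.0, lo, hi, null_frac]) @ theta)

print(estimate_selectivity(0.2, 0.7, 0.05))  # learned estimate for a predicate
```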
Implementations in Database Systems
- Lookup Tables / Discrete Mappings: For low-latency requirements, e.g., concurrency control, $f_\theta$ can be implemented as a learned lookup table where $x$ is mapped to discrete buckets and $\theta$ represents the stored actions in those buckets (see the sketch after this list).
- Classical ML Models: Linear models and tree-based models are often used for cost estimation, where interpretability and training speed are balanced.
- Deep Learning: Neural networks are employed for high-dimensional mappings, such as cardinality estimation over complex joins, where $x$ is a vector embedding of the SQL query.
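The lookup-table variant can be sketched as follows. The bucket count, feature names (contention, hotness), and action set are hypothetical; a real system would fill the table by offline tuning rather than the placeholder policy used here:

```python
import numpy as np

# A learned lookup table: state features are discretized into buckets and
# each bucket stores the action that performed best during training.
N_BUCKETS = 8
ACTIONS = ["optimistic", "lock", "abort-early"]

# table[contention_bucket][hotness_bucket] -> index into ACTIONS.
# Filled with a placeholder policy; a real table is tuned offline.
table = np.zeros((N_BUCKETS, N_BUCKETS), dtype=int)
table[N_BUCKETS // 2:, :] = 1   # high contention -> pessimistic locking
table[-1, -1] = 2               # extreme contention on hot keys -> abort early

def bucket(value, lo=0.0, hi=1.0):
    """Map a normalized feature to one of N_BUCKETS discrete buckets."""
    idx = int((value - lo) / (hi - lo) * N_BUCKETS)
    return min(max(idx, 0), N_BUCKETS - 1)

def policy(contention, hotness):
    """O(1) inference: two bucket computations and one table read."""
    return ACTIONS[table[bucket(contention), bucket(hotness)]]

print(policy(0.9, 0.95))  # -> "abort-early"
```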
Applications in Learnable Database Components
Learned Concurrency Control
Concurrency control algorithms ensure transaction isolation. Traditional algorithms like two-phase locking or optimistic concurrency control perform well in specific scenarios but fail to generalize. Recent research proposes treating concurrency control as a learnable function.
In such a model, the CC policy can be regarded as an agent function $y = f_\theta(x)$:
- Input $x$: Features such as data hotness, dependency graph depth, and operation types.
- Output $y$: A combination of conflict detection and resolution mechanisms (a minimal sketch follows this list).
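A minimal sketch of such an agent function, assuming a linear scorer over three state features. The feature names, action set, and random weights are assumptions standing in for a policy that would, in practice, be trained with reinforcement learning against rewards such as throughput or abort rate:

```python
import numpy as np

# Candidate conflict-handling actions (illustrative, not from a real system).
ACTIONS = ["wait", "abort", "proceed-optimistically"]

# One weight vector per action; random weights stand in for trained ones.
rng = np.random.default_rng(2)
W = rng.normal(size=(len(ACTIONS), 3))

def agent(data_hotness: float, dep_graph_depth: float, is_write: bool) -> str:
    """x -> y: map transaction state features to a conflict-handling action."""
    x = np.array([data_hotness, dep_graph_depth, float(is_write)])
    scores = W @ x  # one score per candidate action
    return ACTIONS[int(np.argmax(scores))]

print(agent(data_hotness=0.8, dep_graph_depth=3, is_write=True))
```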
Learned Query Optimization
In query optimization, the learnable function typically replaces the cost model or the cardinality estimator:
- Cardinality Estimation: Deep learning models can capture correlations between columns that histograms fail to model (a toy comparison follows this list).
- Cost Modeling: Learning the latency of physical operators on specific hardware.
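The correlation point can be made concrete with a toy example: two strongly correlated columns for which the classical independence assumption (multiplying per-column selectivities) underestimates cardinality, while a simple model fit on logged (predicate, cardinality) pairs tracks the truth. The data, query shape, and least-squares model are all illustrative assumptions:

```python
import numpy as np

# Columns a and b are strongly correlated, violating independence.
rng = np.random.default_rng(3)
n = 10_000
a = rng.uniform(0, 1, n)
b = a + rng.normal(0, 0.02, n)  # b tracks a closely

def true_card(lo, hi):
    """Actual rows matching: lo < a < hi AND lo < b < hi."""
    return int(np.sum((a > lo) & (a < hi) & (b > lo) & (b < hi)))

def independence_estimate(lo, hi):
    """Classical estimate: product of per-column selectivities."""
    sel_a = np.mean((a > lo) & (a < hi))
    sel_b = np.mean((b > lo) & (b < hi))
    return sel_a * sel_b * n

# A minimal learned estimator: fit cardinality as a function of the
# predicate lower bound on logged training queries of fixed width 0.3.
bounds = rng.uniform(0, 0.7, size=(200, 1))
X_train = np.hstack([np.ones_like(bounds), bounds])
y_train = np.array([true_card(lo, lo + 0.3) for lo in bounds[:, 0]])
theta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

lo = 0.4
print("true:", true_card(lo, lo + 0.3))                       # ~3000
print("independence:", round(independence_estimate(lo, lo + 0.3)))  # ~900
print("learned:", round(float(np.array([1.0, lo]) @ theta)))  # ~3000
```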
Theoretical Perspectives
Learnability vs. Complexity Trade-offs
While learnable functions offer adaptability, they are subject to the no free lunch theorem. A function class that is too complex, e.g., a large neural network, may overfit to a specific workload history and fail to generalize when the workload drifts. In contrast, a class that is too simple, e.g., linear regression, may fail to capture the non-linear performance characteristics of modern hardware. This trade-off can be expressed mathematically in the decomposition of risk:

$$\text{Total Error} = \text{Approximation Error} + \text{Estimation Error}$$
Effective AI-powered database systems must balance the approximation error against the estimation error.
The Bitter Lesson and Scalability
The move toward learnable functions in databases can be discussed from the perspective of "The Bitter Lesson" by Richard Sutton, which argues that general methods that leverage computation ultimately outperform methods that rely on human knowledge. In databases, a handcrafted cost model is limited by its designer's foresight. A learnable function, by contrast, scales with the availability of computation and training data, theoretically allowing the database to asymptotically approach the performance of an optimal system, often described as an oracle machine with perfect information.
History and Development
The study of learnable functions emerged from the intersection of biology and cybernetics in the mid-20th century, driven by the need to understand how organisms adapt to complex environments. Early research focused on perceptron learning algorithms for linear threshold operations and on learning in random networks. In the 1960s and 1970s, researchers such as Marvin Minsky, Oliver Selfridge, and Seymour Papert laid the groundwork for modern machine learning by contrasting the immense complexity of general Boolean functions with the learnability of specialized classes. Concurrently, classical approximation theories for continuous real-valued functions were developed further to give mathematical rigor to the concept of function reconstruction. A subsequent formalization occurred in 1973, when Andrzej Ehrenfeucht and Jan Mycielski introduced the concept of -continuity and addressing functions. Their work on the interpolation of functions over a measure space formalized how a learner can efficiently search for a small number of relevant features within a high-dimensional input. This perspective has also extended to neuromorphic engineering, where research demonstrates how functional specialization emerges directly from network structure.

In the contemporary landscape of artificial intelligence and database systems, the concept of learnable functions has transitioned from theoretical classification toward the autonomous optimization of core system internals, such as query optimization and concurrency control. A prominent example is the CCaaLF framework, which treats concurrency control as a learnable agent function that dynamically maps real-time database states to specific conflict-handling actions.