[mlir][Affine] Add support for multi-store producer fusion This patch adds support for producer-consumer fusion scenarios with multiple producer stores to the AffineLoopFusion pass. The patch introduces some changes to the producer-consumer algorithm, including: * For a given consumer loop, producer-consumer fusion iterates over its producer candidates until a fixed point is reached. * Producer candidates are gathered beforehand for each iteration of the consumer loop and visited in reverse program order (not strictly guaranteed) to maximize the number of loops fused per iteration. In general, these changes were needed to simplify the multi-store producer support and remove some of the workarounds that were introduced in the past to support more fusion cases under the single-store producer limitation. This patch also preserves the existing functionality of AffineLoopFusion with one minor change in behavior. Producer-consumer fusion didn't fuse scenarios with escaping memrefs and multiple outgoing edges (from a single store). Multi-store producer scenarios will usually (always?) have multiple outgoing edges so we couldn't fuse any with escaping memrefs, which would greatly limit the applicability of this new feature. Therefore, the patch enables fusion for these scenarios. Please, see modified tests for specific details. Reviewed By: andydavis1, bondhugula Differential Revision: https://reviews.llvm.org/D92876

- [D] mlir/include/mlir/Analysis/AffineStructures.h

[MLIR] Add support for extracting an integer sample point (if one exists) from an unbounded FlatAffineConstraints. With this, we have complete support for finding integer sample points in FlatAffineConstraints. Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D95047

- [D] mlir/include/mlir/Analysis/AffineStructures.h

Revert "[mlir][Affine] Add support for multi-store producer fusion" This reverts commit 7dd198852b4db52ae22242dfeda4eccda83aa8b2. ASAN issue.

- [D] mlir/include/mlir/Analysis/AffineStructures.h

[mlir][Affine] Add support for multi-store producer fusion This patch adds support for producer-consumer fusion scenarios with multiple producer stores to the AffineLoopFusion pass. The patch introduces some changes to the producer-consumer algorithm, including: * For a given consumer loop, producer-consumer fusion iterates over its producer candidates until a fixed point is reached. * Producer candidates are gathered beforehand for each iteration of the consumer loop and visited in reverse program order (not strictly guaranteed) to maximize the number of loops fused per iteration. In general, these changes were needed to simplify the multi-store producer support and remove some of the workarounds that were introduced in the past to support more fusion cases under the single-store producer limitation. This patch also preserves the existing functionality of AffineLoopFusion with one minor change in behavior. Producer-consumer fusion didn't fuse scenarios with escaping memrefs and multiple outgoing edges (from a single store). Multi-store producer scenarios will usually (always?) have multiple outgoing edges so we couldn't fuse any with escaping memrefs, which would greatly limit the applicability of this new feature. Therefore, the patch enables fusion for these scenarios. Please, see modified tests for specific details. Reviewed By: andydavis1, bondhugula Differential Revision: https://reviews.llvm.org/D92876

- [D] mlir/include/mlir/Analysis/AffineStructures.h

Support emptiness checks for unbounded FlatAffineConstraints. With this, we have complete support for emptiness checks. This also paves the way for future support to check if two FlatAffineConstraints are equal. Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D94272

- [D] mlir/include/mlir/Analysis/AffineStructures.h

Introduce subtraction for FlatAffineConstraints Subtraction is a foundational arithmetic operation that is often used when computing, for example, data transfer sets or cache hits. Since the result of subtraction need not be a convex polytope, a new class `PresburgerSet` is introduced to represent unions of convex polytopes. Reviewed By: ftynse, bondhugula Differential Revision: https://reviews.llvm.org/D87068

- [D] mlir/include/mlir/Analysis/AffineStructures.h

[MLIR] Turns swapId into a FlatAffineConstraints member func `swapId` used to be a static function in `AffineStructures.cpp`. This diff makes it accessible from the external world by turning it into a member function of `FlatAffineConstraints`. This will be very helpful for other projects that need to manipulate the content of `FlatAffineConstraints`. Differential Revision: https://reviews.llvm.org/D87766

- [D] mlir/include/mlir/Analysis/AffineStructures.h

[MLIR] Redundancy detection for FlatAffineConstraints using Simplex This patch adds the capability to perform constraint redundancy checks for `FlatAffineConstraints` using `Simplex`, via a new member function `FlatAffineConstraints::removeRedundantConstraints`. The pre-existing redundancy detection algorithm runs a full rational emptiness check for each inequality separately for checking redundancy. Leveraging the existing `Simplex` infrastructure, in this patch we have an algorithm for redundancy checks that can check each constraint by performing pivots on the tableau, which provides an alternative to running Fourier-Motzkin elimination for each constraint separately. Differential Revision: https://reviews.llvm.org/D84935

- [D] mlir/include/mlir/Analysis/AffineStructures.h

[MLIR] Consider AffineIfOp when getting the index set of an Op wrapped in nested loops This diff attempts to resolve the TODO in `getOpIndexSet` (formerly known as `getInstIndexSet`), which states "Add support to handle IfInsts surronding `op`". Major changes in this diff: 1. Overload `getIndexSet`. The overloaded version considers both `AffineForOp` and `AffineIfOp`. 2. The `getInstIndexSet` is updated accordingly: its name is changed to `getOpIndexSet` and its implementation is based on a new API `getIVs` instead of `getLoopIVs`. 3. Add `addAffineIfOpDomain` to `FlatAffineConstraints`, which extracts new constraints from the integer set of `AffineIfOp` and merges it to the current constraint system. 4. Update how a `Value` is determined as dim or symbol for `ValuePositionMap` in `buildDimAndSymbolPositionMaps`. Differential Revision: https://reviews.llvm.org/D84698

- [D] mlir/include/mlir/Analysis/AffineStructures.h

[mlir][NFC] Remove usernames and google bug numbers from TODO comments. These were largely leftover from when MLIR was a google project, and don't really follow LLVM guidelines.

- [D] mlir/include/mlir/Analysis/AffineStructures.h

[MLIR] Exact integer emptiness checks for FlatAffineConstraints This patch adds the capability to perform exact integer emptiness checks for FlatAffineConstraints using the General Basis Reduction algorithm (GBR). Previously, only a heuristic was available for emptiness checks, which was not guaranteed to always give a conclusive result. This patch adds a `Simplex` class, which can be constructed using a `FlatAffineConstraints`, and can find an integer sample point (if one exists) using the GBR algorithm. Additionally, it adds two classes `Matrix` and `Fraction`, which are used by `Simplex`. The integer emptiness check functionality can be accessed through the new `FlatAffineConstraints::isIntegerEmpty()` function, which runs the existing heuristic first and, if that proves to be inconclusive, runs the GBR algorithm to produce a conclusive result. Differential Revision: https://reviews.llvm.org/D80860

- [D] mlir/include/mlir/Analysis/AffineStructures.h

[MLIR] fix/update affine data copy utility for max/min bounds Fix point-wise copy generation to work with bounds that have max/min. Change structure of copy loop nest to use absolute loop indices and subtracting base from the indexes of the fast buffers. Update supporting utilities: Fix FlatAffineConstraints::getLowerAndUpperBound to look at equalities as well and for a missing division. Update unionBoundingBox to not discard common constraints (leads to a tighter system). Update MemRefRegion::getConstantBoundingSizeAndShape to add memref dimension constraints. Run removeTrivialRedundancy at the end of MemRefRegion::compute. Run single iteration loop promotion and load/store canonicalization after affine data copy (in its test pass as well). Differential Revision: https://reviews.llvm.org/D77320

- [D] mlir/include/mlir/Analysis/AffineStructures.h

[MLIR] Introduce full/partial tile separation using if/else This patch introduces a utility to separate full tiles from partial tiles when tiling affine loop nests where trip counts are unknown or where tile sizes don't divide trip counts. A conditional guard is generated to separate out the full tile (with constant trip count loops) into the then block of an 'affine.if' and the partial tile to the else block. The separation allows the 'then' block (which has constant trip count loops) to be optimized better subsequently: for eg. for unroll-and-jam, register tiling, vectorization without leading to cleanup code, or to offload to accelerators. Among techniques from the literature, the if/else based separation leads to the most compact cleanup code for multi-dimensional cases (because a single version is used to model all partial tiles). INPUT affine.for %i0 = 0 to %M { affine.for %i1 = 0 to %N { "foo"() : () -> () } } OUTPUT AFTER TILING W/O SEPARATION map0 = affine_map<(d0) -> (d0)> map1 = affine_map<(d0)[s0] -> (d0 + 32, s0)> affine.for %arg2 = 0 to %M step 32 { affine.for %arg3 = 0 to %N step 32 { affine.for %arg4 = #map0(%arg2) to min #map1(%arg2)[%M] { affine.for %arg5 = #map0(%arg3) to min #map1(%arg3)[%N] { "foo"() : () -> () } } } } OUTPUT AFTER TILING WITH SEPARATION map0 = affine_map<(d0) -> (d0)> map1 = affine_map<(d0) -> (d0 + 32)> map2 = affine_map<(d0)[s0] -> (d0 + 32, s0)> #set0 = affine_set<(d0, d1)[s0, s1] : (-d0 + s0 - 32 >= 0, -d1 + s1 - 32 >= 0)> affine.for %arg2 = 0 to %M step 32 { affine.for %arg3 = 0 to %N step 32 { affine.if #set0(%arg2, %arg3)[%M, %N] { // Full tile. affine.for %arg4 = #map0(%arg2) to #map1(%arg2) { affine.for %arg5 = #map0(%arg3) to #map1(%arg3) { "foo"() : () -> () } } } else { // Partial tile. affine.for %arg4 = #map0(%arg2) to min #map2(%arg2)[%M] { affine.for %arg5 = #map0(%arg3) to min #map2(%arg3)[%N] { "foo"() : () -> () } } } } } The separation is tested via a cmd line flag on the loop tiling pass. The utility itself allows one to pass in any band of contiguously nested loops, and can be used by other transforms/utilities. The current implementation works for hyperrectangular loop nests. Signed-off-by: Uday Bondhugula <uday@polymagelabs.com> Differential Revision: https://reviews.llvm.org/D76700

- [D] mlir/include/mlir/Analysis/AffineStructures.h

[MLIR] Add flat affine constraints method to round trip integer set - add method to get back an integer set from flat affine constraints; this allows a round trip - use this to complete the simplification of integer sets in -simplify-affine-structures - update FlatAffineConstraints::removeTrivialRedundancy to also do GCD tightening and normalize by GCD (while still keeping it linear time). Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>

- [D] mlir/include/mlir/Analysis/AffineStructures.h

[MLIR][NFC] flat affine constraints - refactor to share, renames - refactor to remove duplicate code - some renaming / comment updates for readability Signed-off-by: Uday Bondhugula <uday@polymagelabs.com> Differential Revision: https://reviews.llvm.org/D76667

- [D] mlir/include/mlir/Analysis/AffineStructures.h

[mlir][NFC] modernize / clean up some loop transform utils, affine analysis utils Summary: - remove stale declarations on flat affine constraints - avoid allocating small vectors where possible - clean up code comments, rename some variables Differential Revision: https://reviews.llvm.org/D76117

- [D] mlir/include/mlir/Analysis/AffineStructures.h

[mlir] NFC: remove IntegerValueSet / MutableIntegerSet Summary: - these are unused and really not needed now given flat affine constraints Differential Revision: https://reviews.llvm.org/D75792

- [D] mlir/include/mlir/Analysis/AffineStructures.h

[mlir][NFC] remove stray decl of toAffineExpr, rename for readability Summary: - remove stray toAffineExpr decl in affine analysis (name duplicate of mlir::toAffineExpr) - rename mlir::toAffineExpr for better readability - related NFC changes Signed-off-by: Uday Bondhugula <uday@polymagelabs.com> Differential Revision: https://reviews.llvm.org/D75694

- [D] mlir/include/mlir/Analysis/AffineStructures.h

[MLIR][Affine][NFC] Remove obsolete and ambiguous definitions Summary: Looks like a refactor that was never completed. This change removes some unused and ambiguous definitions. Reviewed By: bondhugula, nicolasvasilache, rriddle Differential Revision: https://reviews.llvm.org/D75586

- [D] mlir/include/mlir/Analysis/AffineStructures.h

[MLIR][Affine] NFC: Move AffineValueMap and MutableAffineMap Summary: The `AffineValueMap` is moved into `Dialect/AffineOps` to prevent a cyclic dependency between `Analysis` and `Dialect/AffineOps`. Reviewers: bondhugula, herhut, nicolasvasilache, rriddle, mehdi_amini Reviewed By: rriddle, mehdi_amini Subscribers: mgorny, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, Joonsoo, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74277

- [D] mlir/include/mlir/Analysis/AffineStructures.h