Designing an Infinite Depth Folder Structure and Troubleshooting

Spring Boot

Language :

In this post, I will document the process of designing a folder and file tree structure that supports infinite depth while developing a note-taking service.

Initial Business Requirements

  • Folders must support infinite depth creation.
  • Users must be able to arbitrarily sort (custom order) folders and files.

Initially, I intended to fully control the tree structure at the database level. However, after balancing the system resources (DB CPU, memory) with the business requirements, I refactored the architecture.

Initial Design: Reviewing PostgreSQL ltree

When analyzing the initial business requirements, the most challenging backend mission was 'finding and controlling all N-depth descendant nodes (children, grandchildren, etc.) under a specific folder at once.'

For example, imagine a scenario where you need to move an entire folder to another location or search for a file keyword exclusively within a specific folder.

If we use a standard Adjacency List structure with a parent_id, we would have to rely on recursive queries in the DB or endure the N+1 query problem.

To solve this, I actively considered the Materialized Path pattern and PostgreSQL's ltree extension, which stores the path as a string and utilizes a GiST multi-dimensional index to quickly scan sub-trees at O(log N) speed.

Pros and Cons of ltree

It is optimal for fast Read operations (searching a specific node's sub-tree). However, whenever the tree structure is modified (Moved), it triggers heavy string operations and massive Write Locks.

Initial ltree-based DDL

PostgreSQL

SQL

CREATE EXTENSION IF NOT EXISTS ltree;

CREATE TABLE note_folder (
    id BIGINT PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    path ltree NOT NULL, -- e.g., '1.5.12' (ID path from root to current node)
    sort_order INT NOT NULL -- Custom Ordering defined by the user
);
CREATE INDEX idx_folder_path_gist ON note_folder USING GIST (path);

Initial ltree-based Entity Structure

NoteFolder.class

Java

@Entity
public class NoteFolder {
    @Id
    private Long id; // ID generation (e.g., TSID) is required before INSERT to assemble the path
    
    private String name;
    
    @Column(columnDefinition = "ltree", nullable = false)
    private String path;
}

However, this structure carried a fatal write overhead: every time a folder was moved, the paths of tens of thousands of its child folders/files had to be bulk-updated.

Design Change: Compromising Requirements and Architecture Shift

Attempting to control the tree perfectly at the database level placed a heavy burden on system resources (DB CPU, memory). Consequently, I had to negotiate the business requirements.

1. Abandoning Custom Ordering

Allowing users to arbitrarily specify the order of folders/files via drag-and-drop triggers massive sort_order update queries every time.

Referring to cases like Obsidian, Evernote, and Windows/Mac OS file systems, I consulted with the product planning team and simplified this to "Sort by Name / Creation Date".

2. Changing the Response Structure (Loading the entire tree at once)

After discussing with the frontend developer, we decided not to lazy-load child folders upon clicking to reduce API calls. Instead, we agreed to return the user's entire 'Nested Tree' at once upon initial entry, allowing the client to manage the state.

Because of these two decisions, the need to use ltree at the DB level to slice and search specific sub-trees completely disappeared.

I finalized MySQL as our RDBMS, and the data structure reverted to the lightest and most intuitive Adjacency List (parent_id).

MySQL

SQL

-- Folder Table (Path is removed, only parent_id remains)
CREATE TABLE note_folder (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    member_id BIGINT NOT NULL,
    parent_id BIGINT COMMENT 'NULL이면 최상위 루트 폴더',
    name VARCHAR(255) NOT NULL
);
CREATE INDEX idx_folder_member_parent ON note_folder (member_id, parent_id);

-- Note (File) Table
CREATE TABLE note_file (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    member_id BIGINT NOT NULL,
    note_folder_id BIGINT NULL COMMENT 'NULL이면 최상위 루트 파일',
    title VARCHAR(255) NOT NULL
);
CREATE INDEX idx_file_member_folder ON note_file (member_id, note_folder_id);

With the path string gone, the complexity of having to pre-generate distributed IDs (TSID) in memory during data insertion was eliminated, allowing us to use a simple single INSERT structure.

Minimizing DB Operations and In-Memory Tree Assembly

The biggest hurdle was that we had to return the entire data as a Nested Tree format: { child: [ { child: [] } ] }.

Attempting to build this tree via RDBMS queries imposes a heavy load on the DB CPU due to JOINs and ORDER BY operations.

Therefore, we adopted a strategy where the DB only fetches a flat list, and the Spring Boot application's memory (HashMap) constructs and sorts the tree with O(N) time complexity.

Considering the maximum expected number of folders and files a single user could create, we determined that loading and assembling this data in the application memory was safe for server resources.

NoteTreeReadService.class

Java

@Service
@RequiredArgsConstructor
public class NoteTreeReadService {

    private final FolderRepository folderRepository;
    private final FileRepository fileRepository;
    private final NoteOptionRepository noteOptionRepository;

    @Transactional(readOnly = true)
    public List<NoteDirListDto> getFullTree(long memberId) {
        // 1. Load user-specific sorting options
        boolean isAscending = noteOptionRepository.findById(memberId)
                .map(NoteOption::getIsAscending).orElse(true);
        
        // 2. Minimize DB operations: Batch load flat data utilizing indexes
        List<NoteFolder> folders = folderRepository.findByMemberId(memberId);
        List<NoteFile> files = fileRepository.findByMemberId(memberId);

        Map<Long, NoteDirListDto> nodeMap = new HashMap<>();
        List<NoteDirListDto> rootNodes = new ArrayList<>();

        // 3. Map folder nodes (Type: DIR)
        folders.forEach(f -> nodeMap.put(
            f.getId(), 
            NoteDirListDto.builder()
                .id(f.getId())
                .name(f.getName())
                .type("DIR")
                .build()
        ));

        // 4. Assemble the folder tree
        folders.forEach(f -> {
            NoteDirListDto current = nodeMap.get(f.getId());
            if (f.getParentId() == null) {
                rootNodes.add(current); // Root folder
            } else {
                NoteDirListDto parent = nodeMap.get(f.getParentId());
                if (parent != null) parent.getChild().add(current);
                else rootNodes.add(current); // Orphaned nodes are integrated into the root
            }
        });

        // 5. Map and assemble file nodes
        files.forEach(file -> {
            NoteDirListDto fileDto = NoteDirListDto.builder()
                .id(file.getId())
                .name(file.getTitle())
                .type("FILE")
                .build();
            
            if (file.getNoteFolderId() == null) {
                rootNodes.add(fileDto); // Root file
            } else {
                NoteDirListDto parent = nodeMap.get(file.getNoteFolderId());
                if (parent != null) parent.getChild().add(fileDto);
            }
        });

        // 6. In-memory sort (Utilizing Timsort O(N log N))
        sortTree(rootNodes, isAscending);

        return rootNodes;
    }

    // Offloading to WAS's Timsort operation to prevent DB ORDER BY overhead
    private void sortTree(List<NoteDirListDto> nodes, boolean isAscending) {
        nodes.sort((a, b) -> {
            int typeCompare = a.getType().compareTo(b.getType()); // DIR 우선 정렬
            if (typeCompare != 0) return typeCompare;
            int nameCompare = a.getName().compareToIgnoreCase(b.getName());
            return isAscending ? nameCompare : -nameCompare;
        });
        
        nodes.forEach(node -> {
            if (!node.getChild().isEmpty()) sortTree(node.getChild(), isAscending);
        });
    }
}

Through this approach, the merging and sorting of tens of thousands of nodes are safely completed in the application memory within 10 to 20 milliseconds.

Side Note: JavaScript Precision Loss

During frontend integration, a critical bug emerged: a normal Long type ID from the backend (1234567890123456789) was deformed in the browser into a value with precision loss at the end (1234567890123456770).

This occurred because the backend's ID value exceeded the maximum safe integer (16 digits) that JavaScript's Number type (IEEE 754) can safely represent.

Since transmitting incorrect IDs can lead to severe malfunctions, such as modifying another user's data, we resolved this by serializing the ID field of the response DTO into a String.

Java

public class NoteDirDto {
    // Passed to the frontend as a string ("id": "12345...") to prevent precision loss
    @JsonSerialize(using = ToStringSerializer.class)
    private Long id; 
    // ...
}

Conclusion

By letting go of over-engineering (ltree, custom ordering) and focusing on the essence (sorting by name, hierarchical rendering), we were able to make the system architecture much simpler and more robust.

I gained valuable insights as a backend developer throughout the process of solving a complex problem using the simplest data structure (HashMap) and queries.

민갤

Back-End Developer

백엔드 개발자입니다.