
AI dev startups are struggling with one problem, and I solved it (with a POC)

by kannthu on 7/1/2024, 1:51:12 PM with 1 comment

*TL;DR*

Over a month ago, I posted about a really hard problem that I "accidentally" solved (https://news.ycombinator.com/item?id=40460084).

The problem is resolving cross-file references across multiple programming languages. With those references resolved, I can generate a graph representation of the codebase.

*Why do you need a graph representation of the codebase?*

- To understand how code references other code

- To track how data is passed around
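To make the two use cases concrete, here is a minimal sketch of what a reference graph lets you answer. It assumes a toy graph where each definition maps to the definitions its body uses; the definition names (`views.checkout`, `settings.API_KEY`, etc.) are hypothetical and not taken from the dataset. Following edges forward answers "what does this code depend on?", and following the inverted edges answers "what is affected if this changes?".

```python
from collections import defaultdict, deque

# Hypothetical toy graph: each key is a definition, each value lists the
# definitions its body uses. Names are illustrative only.
uses: dict[str, list[str]] = {
    "views.checkout": ["models.Customer.charge", "settings.API_KEY"],
    "models.Customer.charge": ["client.create_charge", "settings.API_KEY"],
    "client.create_charge": ["settings.API_KEY"],
    "settings.API_KEY": [],
}


def invert(graph: dict[str, list[str]]) -> dict[str, list[str]]:
    """Build reverse edges: definition -> definitions that reference it."""
    referenced_by: dict[str, list[str]] = defaultdict(list)
    for src, targets in graph.items():
        for dst in targets:
            referenced_by[dst].append(src)
    return referenced_by


def reachable(graph: dict[str, list[str]], start: str) -> set[str]:
    """Breadth-first search: every definition reachable from start (excluding start)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen


if __name__ == "__main__":
    # What does checkout depend on, directly or transitively?
    print(sorted(reachable(uses, "views.checkout")))
    # Which definitions are affected if API_KEY changes?
    print(sorted(reachable(invert(uses), "settings.API_KEY")))
```

The same traversal works for data flow if the edges carry "passes value to" instead of "uses".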

I generated references for the repo https://github.com/dj-stripe/dj-stripe; here is the output as a gist: https://gist.githubusercontent.com/kannthu/6e1bdd2781d2e0a6ded30844d61f089e/raw/f1fa4bc0f34891834ce13ac256eec12f6cc671e1/dj-stripe-references.json

The gist is a big JSON blob that contains definitions from the repository.

Definitions are:

- top-level functions

- classes

- methods and public properties

- top-level variables

- exports
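For a single Python file, collecting these kinds of definitions is the easy, per-file part; the hard part described in the post is resolving references *across* files. As a rough illustration only (this is not the author's implementation), Python's standard `ast` module can list top-level functions, classes, methods and top-level variables with their line ranges. Public properties and exports (e.g. `__all__`) are left out of this sketch for brevity.

```python
import ast


def collect_definitions(source: str, path: str) -> list[dict]:
    """Return top-level functions, classes, methods and top-level variables."""
    tree = ast.parse(source)
    defs: list[dict] = []

    def add(kind: str, name: str, node: ast.AST) -> None:
        defs.append({
            "kind": kind,
            "name": name,
            "path": path,
            # 1-based inclusive line range; end_lineno requires Python 3.8+
            "range": (node.lineno, node.end_lineno),
            "snippet": ast.get_source_segment(source, node),
        })

    for node in tree.body:  # only module-level statements
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            add("function", node.name, node)
        elif isinstance(node, ast.ClassDef):
            add("class", node.name, node)
            for item in node.body:  # methods defined directly in the class body
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    add("method", f"{node.name}.{item.name}", item)
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    add("variable", target.id, node)
        elif isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name):
            add("variable", node.target.id, node)
    return defs


# Hypothetical sample module used for demonstration.
SAMPLE = '''
API_VERSION = "2020-08-27"

def sync(obj):
    return obj

class Customer:
    def charge(self, amount):
        return sync(amount)
'''

if __name__ == "__main__":
    for d in collect_definitions(SAMPLE, "example.py"):
        print(d["kind"], d["name"], d["range"])
```

Note that this only finds definitions; linking `sync(amount)` inside `Customer.charge` back to `sync` (especially when `sync` lives in another file and is imported) is the cross-file resolution step.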

Each definition contains:

- Snippet, path, and range within the file

- "references" - a list of places where the definition is used

- "expressions" - a list of resolved references (variables, functions, and classes) that are used within the body of the definition
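One way to consume records like these is to load them into typed objects. The sketch below assumes a simplified JSON shape; the field names (`name`, `snippet`, `location`, `start_line`, `end_line`, `references`, `expressions`) are hypothetical and may not match the actual gist, so adjust them to the real keys before use.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Location:
    path: str
    start_line: int
    end_line: int


@dataclass
class Definition:
    name: str
    snippet: str
    location: Location
    # Places where this definition is used.
    references: list[Location] = field(default_factory=list)
    # Names of resolved definitions used inside this definition's body.
    expressions: list[str] = field(default_factory=list)


def load_definitions(raw: str) -> list[Definition]:
    """Parse a JSON array of definition records (assumed schema)."""
    out: list[Definition] = []
    for item in json.loads(raw):
        out.append(Definition(
            name=item["name"],
            snippet=item["snippet"],
            location=Location(**item["location"]),
            references=[Location(**r) for r in item.get("references", [])],
            expressions=list(item.get("expressions", [])),
        ))
    return out


# Hypothetical record in the assumed shape.
SAMPLE_JSON = '''
[
  {
    "name": "Customer.charge",
    "snippet": "def charge(self, amount): ...",
    "location": {"path": "models.py", "start_line": 10, "end_line": 14},
    "references": [{"path": "views.py", "start_line": 42, "end_line": 42}],
    "expressions": ["sync", "API_VERSION"]
  }
]
'''

if __name__ == "__main__":
    for d in load_definitions(SAMPLE_JSON):
        print(d.name, d.location.path, len(d.references), d.expressions)
```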

*How can this data be useful?*

If you are building code generation, code intelligence, or code review products, your product needs to understand codebases written in many programming languages at once. The more accurate the context you feed to an LLM, the better the output you get, and building this in-house is really expensive and resource-intensive.
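As a sketch of the "feed accurate context to an LLM" idea, assuming each definition record has a `path`, a `snippet`, and an `expressions` list of names it uses (hypothetical keys), you can start from the definition you care about and pull in the snippets it depends on, breadth-first, until a hop limit or character budget is reached. The parameters `max_depth` and `max_chars` are illustrative; a real system would likely budget in tokens rather than characters.

```python
def build_context(defs: dict[str, dict], target: str,
                  max_depth: int = 2, max_chars: int = 4000) -> str:
    """Concatenate the target's snippet with snippets of definitions it uses,
    breadth-first, stopping after max_depth hops or before exceeding max_chars."""
    parts: list[str] = []
    seen = {target}
    frontier = [target]
    used = 0
    for _ in range(max_depth + 1):  # depth 0 is the target itself
        next_frontier: list[str] = []
        for name in frontier:
            d = defs.get(name)
            if d is None:  # unresolved name (e.g. third-party code): skip it
                continue
            block = f"# {d['path']}\n{d['snippet']}\n"
            if used + len(block) > max_chars:
                return "".join(parts)  # budget exhausted
            parts.append(block)
            used += len(block)
            for dep in d.get("expressions", []):
                if dep not in seen:
                    seen.add(dep)
                    next_frontier.append(dep)
        frontier = next_frontier
    return "".join(parts)


# Hypothetical definitions keyed by name.
DEFS = {
    "checkout": {"path": "views.py", "snippet": "def checkout(): charge()", "expressions": ["charge"]},
    "charge": {"path": "models.py", "snippet": "def charge(): create()", "expressions": ["create"]},
    "create": {"path": "client.py", "snippet": "def create(): ...", "expressions": []},
}

if __name__ == "__main__":
    print(build_context(DEFS, "checkout", max_depth=1))
```

Breadth-first order puts the most directly relevant code first, so truncating at the budget drops the most distant dependencies.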

Let me know if this is interesting to any of you.

by kannthu on 7/1/2024, 1:52:52 PM
Clickable links:
- https://news.ycombinator.com/item?id=40460084
- https://github.com/dj-stripe/dj-stripe
- https://gist.githubusercontent.com/kannthu/6e1bdd2781d2e0a6d...