Charts are among the best storytelling tools for data scientists and engineers, but there is another kind of graph worth knowing: graphs of code. These graphs represent a program's structure and execution flow, and they are widely used in machine learning projects. Thanks to the work of researchers at Google Research, you can now build such graphs directly from Python source code for use in machine learning. The python_graphs library can produce several kinds of graph: Abstract Syntax Tree (AST), Control Flow Graph (CFG), Data Flow Graph, Interprocedural Control Flow Graph (ICFG), span-mapped graph, and composite "program graphs." The library either constructs these graphs directly from code or gives you the tools to build other varieties yourself.

Graph representations of programs are a standard tool in machine learning research, the most common being abstract syntax trees (ASTs), and several research papers use them. A typical syntax-based graph has an AST backbone with control flow, data flow, and syntax information layered on top of it. Other graph-building systems rely on external tools such as CodeQL, which can introduce compile errors or other failures because they depend on information that is not present in the code itself. Google researchers created python_graphs to improve on this: it needs no such external source, so it avoids those defects.
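To make the idea of an AST backbone concrete, here is a minimal sketch that uses only Python's standard ast module, not python_graphs itself. The helper name ast_edges and the sample function are made up for illustration; the sketch simply lists the parent-to-child links that would form the backbone of a syntax-based graph.

```python
import ast

# Hypothetical sample program, used only for illustration.
SOURCE = """
def total(values):
    result = 0
    for v in values:
        result += v
    return result
"""

def ast_edges(source):
    """Return (parent_type, child_type) pairs for every parent-child link in the AST."""
    tree = ast.parse(source)
    edges = []
    for parent in ast.walk(tree):
        for child in ast.iter_child_nodes(parent):
            edges.append((type(parent).__name__, type(child).__name__))
    return edges

edges = ast_edges(SOURCE)
for parent, child in edges[:8]:
    print(f"{parent} -> {child}")
```

Control flow and data flow information would then be added as extra edges between these same nodes.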

Control flow graphs show the flow of code execution, and each node in the graph represents a basic block of code. In addition to these, the python_graphs library can create statement-level control flow graphs, where each node represents a single statement. The library also creates program graphs, which are graphs with an abstract syntax tree as a backbone; each node in the program graph corresponds to one node in the AST.
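As a rough illustration of a statement-level control flow graph, the sketch below builds one with the standard ast module. It is not the python_graphs implementation: build_cfg is a hypothetical helper that assumes a single function and handles only plain statements, if/else, while, and return (no break, continue, try, or while/else). Nodes are identified by line number, and "exit" marks leaving the function.

```python
import ast

# Hypothetical sample program, used only for illustration.
SOURCE = """\
def collatz_steps(n):
    steps = 0
    while n != 1:
        if n % 2 == 0:
            n = n // 2
        else:
            n = 3 * n + 1
        steps += 1
    return steps
"""

def build_cfg(source):
    """Return a set of (from_line, to_line) edges for the first function in source."""
    func = ast.parse(source).body[0]
    edges = set()

    def visit_block(stmts, preds):
        # preds holds the lines of statements whose control flows into the next statement.
        for stmt in stmts:
            for p in preds:
                edges.add((p, stmt.lineno))
            if isinstance(stmt, ast.If):
                then_exits = visit_block(stmt.body, [stmt.lineno])
                else_exits = visit_block(stmt.orelse, [stmt.lineno])
                preds = then_exits + else_exits
            elif isinstance(stmt, ast.While):
                for p in visit_block(stmt.body, [stmt.lineno]):
                    edges.add((p, stmt.lineno))  # back edge to the loop test
                preds = [stmt.lineno]            # the loop exits from its test
            elif isinstance(stmt, ast.Return):
                edges.add((stmt.lineno, "exit"))
                preds = []                       # nothing falls through a return
            else:
                preds = [stmt.lineno]
        return preds

    for p in visit_block(func.body, []):
        edges.add((p, "exit"))
    return edges

cfg = build_cfg(SOURCE)
for src, dst in sorted(cfg, key=lambda e: (e[0], str(e[1]))):
    print(f"line {src} -> {dst}")
```

Grouping consecutive statements that have no branches between them would turn this statement-level graph into a graph of basic blocks.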

python_graphs also supports alternative composite program graphs, which let the user choose the nodes and edges used to construct the graph. Interprocedural control flow graphs connect multiple functions rather than representing a single function. The data flow graph shows the dependencies in the program: the nodes represent the locations of variable accesses, and the edges represent the relationships between those accesses. Span-mapped graphs are tokenized graphs designed to be useful for machine learning applications. Two tokenizations are available: per node and whole program. In the whole-program tokenization, you tokenize the entire program and then use python_graphs to create the graph of the program. In the per-node tokenization, the program is divided into parts, and those parts are arranged automatically according to the node they belong to.
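The data flow idea, where nodes are variable accesses and edges relate them, can be sketched with the standard ast module. This is an illustration under simplifying assumptions, not the library's own analysis: last_write_edges is a hypothetical helper that handles only straight-line code made of simple assignments, and it links each read of a variable to the most recent write of the same name.

```python
import ast

# Hypothetical straight-line program, used only for illustration.
SOURCE = """\
x = 1
y = x + 2
x = y * x
z = x
"""

def last_write_edges(source):
    """Return (name, write_line, read_line) triples linking reads to their last write."""
    tree = ast.parse(source)
    last_write = {}  # variable name -> line of its most recent write
    edges = []
    for stmt in tree.body:
        names = [n for n in ast.walk(stmt) if isinstance(n, ast.Name)]
        # Reads in a statement are evaluated before its own writes take effect.
        for n in names:
            if isinstance(n.ctx, ast.Load) and n.id in last_write:
                edges.append((n.id, last_write[n.id], n.lineno))
        for n in names:
            if isinstance(n.ctx, ast.Store):
                last_write[n.id] = n.lineno
    return edges

flow = last_write_edges(SOURCE)
for name, written_at, read_at in flow:
    print(f"{name}: written on line {written_at}, read on line {read_at}")
```

Handling branches and loops requires tracking several possible last writes per variable, which is where a control flow graph becomes necessary.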

Although this library is intended to make life easier for data scientists and researchers around the world, it has its limitations. One is that it analyzes Python: static analysis works best on code whose behavior can be determined without running it, whereas Python is a highly dynamic language. The library therefore performs a best-effort analysis and cannot guarantee that the result is 100% correct.
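A small example shows why any static analysis of Python can only be best effort. The sketch below uses the standard ast module and a made-up function, configure, that binds a variable whose name is only known at run time. A static scan for written names (the hypothetical helper statically_written_names) never sees that variable, even though the code runs fine.

```python
import ast

# Hypothetical sample program, used only for illustration.
SOURCE = """\
def configure(settings):
    for key, value in settings.items():
        globals()[key] = value   # binds a name chosen at run time
    return timeout               # statically, nothing ever writes 'timeout'
"""

def statically_written_names(source):
    """Collect names that are visibly assigned or declared as parameters."""
    tree = ast.parse(source)
    written = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            written.add(node.id)
        elif isinstance(node, ast.arg):
            written.add(node.arg)
    return written

written = statically_written_names(SOURCE)
print(sorted(written))
print("timeout" in written)  # False: the write is invisible to a static scan

# At run time the function works, because the name is created dynamically.
namespace = {}
exec(SOURCE, namespace)
print(namespace["configure"]({"timeout": 30}))
```

A data flow graph built from this source would have no edge into the read of timeout, which is exactly the kind of gap a best-effort analysis has to accept.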

Like any other library, this one has both drawbacks and benefits, but it exists for the specific purpose of making programmers' lives easier. Despite its shortcomings, it does more for the task it was created for than its predecessors did, and the people behind it deserve credit for that.

This article was written as a research summary by Marktechpost Staff, based on the research paper 'A Library for Representing Python Programs as Graphs for Machine Learning'. All credit for this research goes to the researchers on this project. Check out the paper and GitHub link.



Asif Razzak is an AI journalist and co-founder of Marktechpost, LLC. He is a visionary, entrepreneur and engineer who strives to use the power of artificial intelligence for good.

Asif’s latest venture is developing an AI media platform (Marktechpost) that will revolutionize the way people can find relevant news related to AI, data science and machine learning.



CMU and Google Researchers Open-Source ‘python_graphs’, a Library for Representing Python Programs as Graphs for Machine Learning Research
