Arguments are typically evaluated by how persuasive they appear on the surface. This makes intuitive sense: it is easier to form a holistic judgement than to separate logical structure from factual content. But surface-level evaluation blends together three distinct questions: whether an argument is formally valid, whether its premises provide strong inductive support, and whether those premises are actually true. Standard evaluation rarely gives a definitive answer on any one of them.
The goal of this project is to build a system that takes natural language arguments and assesses these three aspects programmatically. The program first classifies each argument as deductive, inductive, or mixed based on its inferential structure. It then reduces the argument to its smallest atomic steps, highlighting any hidden assumptions and mapping the general flow of logic. Each step can then be stress-tested through adversarial debate, revealing weaknesses that single-pass evaluation tends to miss.
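As a rough illustration, the classification and decomposition stages might be shaped like this. Everything here is a hypothetical sketch: the cue-word heuristic stands in for the real structural classifier, and sentence splitting stands in for genuine atomic decomposition.

```python
from dataclasses import dataclass, field

# Illustrative placeholder cues; the actual system classifies by
# inferential structure, not surface keywords.
DEDUCTIVE_CUES = {"therefore", "necessarily", "it follows that"}
INDUCTIVE_CUES = {"probably", "likely", "most", "tends to"}

@dataclass
class AtomicStep:
    claim: str
    hidden_assumptions: list = field(default_factory=list)

@dataclass
class Analysis:
    kind: str               # "deductive", "inductive", or "mixed"
    steps: list

def classify(argument: str) -> str:
    """Crude cue-word heuristic standing in for structural classification."""
    text = argument.lower()
    ded = any(cue in text for cue in DEDUCTIVE_CUES)
    ind = any(cue in text for cue in INDUCTIVE_CUES)
    if ded and ind:
        return "mixed"
    return "deductive" if ded else "inductive" if ind else "mixed"

def decompose(argument: str) -> Analysis:
    """Split into sentence-level steps; real decomposition would be finer-grained."""
    steps = [AtomicStep(s.strip()) for s in argument.split(".") if s.strip()]
    return Analysis(kind=classify(argument), steps=steps)

example = "Most startups fail. This startup is typical. Therefore it will probably fail."
analysis = decompose(example)
```

Here the example argument mixes a statistical premise with a deductive connective, so the sketch labels it "mixed" and yields three atomic steps, each of which could then be handed to the adversarial stress-testing stage.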
I'm continuing to develop this with a longer-term goal: decomposing multiple arguments within a field, synthesizing them into their strongest and most common form, and systematically testing for counterexamples and edge cases. My hope is that this will improve argument formation and potentially enable counterexample-based investment research.