
TTE Agent documentation

Protocol-first software for defensible longitudinal causal studies.

TTE Agent helps investigators design, audit, and report observational longitudinal analyses before modeling begins. It is built for target trial emulation, causal guardrails, reproducible audit trails, and manuscript-grade appendices.

Figure: TTE Agent workflow. Protocol, data, guardrails, reporting, and approved modeling linked by a signed audit trail. Stages: Protocol (YAML, design first); Data profile (support checks); Red flags (block or warn); Reports (appendices); Approved run (optional library). Signed audit trail: protocol, diagnostics, plan, output, warnings.

Python 3.9+ · Deterministic agent layer · Target trial emulation · Longitudinal mediation guardrails · Version 0.1.0

Overview

What TTE Agent adds

TTE Agent is not just a modeling script. It is a protocol, audit, planning, reporting, and guarded-orchestration layer around serious observational causal research.

Protocol intelligence

Requires eligibility, time zero, exposure strategies, assignment emulation, censoring, outcome, follow-up, estimand, covariate timing, positivity, and sensitivity analysis before modeling.

Data-support audit

Profiles required variables, missingness, time support, exposure and outcome counts, mediator availability, and temporal ordering.
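
To make the idea concrete, here is a minimal sketch of this kind of profile using only the Python standard library. The helper names `profile_columns` and `time_is_increasing` and the returned layout are hypothetical and are not the TTE Agent API; the column names are borrowed from the quick start, and the rows are invented for illustration.

```python
from collections import Counter


def profile_columns(rows, required):
    """Summarize missingness and value counts for each required column.

    Hypothetical helper for illustration only; not part of TTE Agent.
    """
    n = len(rows)
    profile = {}
    for col in required:
        values = [row.get(col) for row in rows]
        observed = [v for v in values if v is not None]
        profile[col] = {
            "missing_fraction": (n - len(observed)) / n if n else 1.0,
            "counts": dict(Counter(observed)),
        }
    return profile


def time_is_increasing(rows, id_col, time_col):
    """Check that time strictly increases within each person, in row order."""
    last_seen = {}
    for row in rows:
        person, t = row[id_col], row[time_col]
        if person in last_seen and t <= last_seen[person]:
            return False
        last_seen[person] = t
    return True


# Invented toy rows; None marks a missing value.
rows = [
    {"person_id": 1, "visit_index": 0, "ggt_high": 1, "incident_t2d_next": 0},
    {"person_id": 1, "visit_index": 1, "ggt_high": 1, "incident_t2d_next": 1},
    {"person_id": 2, "visit_index": 0, "ggt_high": 0, "incident_t2d_next": 0},
    {"person_id": 2, "visit_index": 1, "ggt_high": None, "incident_t2d_next": 0},
]

profile = profile_columns(rows, ["ggt_high", "incident_t2d_next"])
ordered = time_is_increasing(rows, "person_id", "visit_index")
print(profile["ggt_high"]["missing_fraction"], ordered)
```

A real profile would also need to cover time support and mediator availability; this sketch only shows the shape of the missingness, count, and ordering checks.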

Scientific guardrails

Flags post-exposure adjustment, mediator/collider mistakes, ambiguous index dates, immortal time risk, outcome leakage, positivity failure, sparse events, unstable weights, and estimand-model mismatch.
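
One of these checks, post-exposure adjustment, can be sketched in a few lines. The example below assumes that each covariate carries a timing label of `"pre_exposure"` or `"post_exposure"`; that labeling scheme, the function name `flag_post_exposure_adjustment`, and the `smoking` variable are hypothetical and are not taken from the TTE Agent API.

```python
def flag_post_exposure_adjustment(adjustment_set, covariate_timing):
    """Return adjustment variables not confirmed as measured before exposure.

    Variables with unknown timing are flagged too, as a conservative default.
    """
    return [
        name
        for name in adjustment_set
        if covariate_timing.get(name) != "pre_exposure"
    ]


# Assumed timing labels; hba1c_next is the mediator in the quick-start example.
covariate_timing = {
    "age": "pre_exposure",
    "sex": "pre_exposure",
    "bmi": "pre_exposure",
    "hba1c_next": "post_exposure",
}

# "smoking" is a made-up variable with no recorded timing.
flagged = flag_post_exposure_adjustment(
    ["age", "sex", "bmi", "hba1c_next", "smoking"], covariate_timing
)
print(flagged)
```

Treating unknown timing as a flag rather than a pass is a design choice in this sketch: it forces the investigator to state when each adjustment variable was measured.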

Manuscript-grade reporting

Generates target trial tables, STROBE-style checklists, causal assumptions tables, diagnostics summaries, sensitivity registries, and limitations language.

Scientific boundary

TTE Agent does not prove causal identification, does not replace investigator judgment, and does not automatically estimate natural direct or indirect effects. Its value is to make assumptions, design choices, and modeling permissions explicit before estimates are produced.

Installation

Local development workflow

The public package release is planned to accompany the scientific manuscript. Until then, this documentation reflects the current repository workflow.

Locked development environment

python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -r requirements-dev.lock
python -m pip install -e .
python -m pytest tests -q

Command-line entry point

tte-agent --help

python -m tte.cli profile \
  --data person_interval.csv \
  --output dataset_profile.md

Quick start

Run an audit before modeling

The central workflow is deliberately conservative: draft or load a protocol, profile the data, run deterministic checks, then decide whether modeling is scientifically appropriate.

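import pandas as pd
from tte import TTEAgent

# Load the person-interval dataset.
data = pd.read_csv("person_interval.csv")
agent = TTEAgent()

# Draft a protocol specification by declaring the role of each column.
spec = agent.draft_protocol(
    analysis_name="ggt_hba1c_t2d",
    id_col="person_id",
    time_col="visit_index",
    exposure_col="ggt_high",
    mediator_col="hba1c_next",
    outcome_col="incident_t2d_next",
    baseline_covariates=["age", "sex"],
    time_varying_covariates=["bmi", "sbp", "hba1c", "ggt"],
)

# Run the deterministic audit, then review the report and the suggested next actions.
result = agent.audit(data, spec=spec)
print(result.report)
print(result.next_actions)

# Write the audit artifacts under "audit_output", using the analysis name as the prefix.
agent.write_artifacts(result, "audit_output", prefix="ggt_hba1c_t2d")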
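
For readers without a dataset at hand, the sketch below builds a tiny synthetic table with the column names used in the quick start. The one-row-per-person-per-visit layout is an assumption inferred from the file name and the `id_col` and `time_col` arguments, and every value is invented. The helper `to_csv_text` is hypothetical.

```python
import csv
import io

# Column names taken from the quick-start protocol draft.
COLUMNS = [
    "person_id", "visit_index", "age", "sex", "bmi", "sbp",
    "hba1c", "ggt", "ggt_high", "hba1c_next", "incident_t2d_next",
]

# Invented rows: two people, two visits each (assumed layout).
ROWS = [
    [1, 0, 52, "F", 24.1, 118, 5.4, 22, 0, 5.5, 0],
    [1, 1, 53, "F", 24.3, 120, 5.5, 25, 0, 5.5, 0],
    [2, 0, 47, "M", 27.8, 131, 5.8, 71, 1, 6.1, 0],
    [2, 1, 48, "M", 28.0, 133, 6.1, 80, 1, 6.6, 1],
]


def to_csv_text(columns, rows):
    """Serialize a header and rows to CSV text in memory."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv_text(COLUMNS, ROWS)
print(csv_text)
```

Writing `csv_text` to `person_interval.csv` gives a file with the same column names as the quick start, though four rows are far too few for a meaningful audit.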

Runnable examples

Try fixed public scenarios in the browser

These examples are safe, deterministic demonstrations of the current agent behavior. They do not upload data and do not run arbitrary code.

Complete protocol audit

A synthetic longitudinal panel with a complete target-trial protocol.


Protocol schema

Questions the agent requires before modeling

The schema forces design elements that are often left implicit in observational analyses.

| Protocol element | Why it matters |
| --- | --- |
| Eligibility | Defines who could enter the emulated trial and prevents hidden selection rules. |
| Time zero | Anchors exposure, mediator, outcome, censoring, and follow-up to a common origin. |
| Treatment strategies | Specifies the exposure regimes being compared. |
| Assignment emulation | States how observational assignment approximates randomized allocation and what exchangeability assumptions are needed. |
| Censoring definition | Clarifies loss to follow-up, competing data processes, and censoring assumptions. |
| Outcome definition | Prevents leakage and makes event timing auditable. |
| Estimand and contrast | Keeps total, direct, indirect, interventional, and predictive analyses distinct. |
| Covariate timing | Separates baseline covariates from post-exposure variables and mediators. |
| Positivity assumptions | Requires support for the strategies in relevant covariate histories. |
| Sensitivity analyses | Makes unmeasured confounding, measurement error, and modeling choices visible. |
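
A completeness gate over the elements listed above might look like the following sketch. The key names, the `"TODO"` placeholder convention, and the function `unanswered_elements` are assumptions made for illustration; the actual protocol schema and its field names may differ.

```python
# Assumed key names, one per protocol element listed above.
REQUIRED_ELEMENTS = (
    "eligibility",
    "time_zero",
    "treatment_strategies",
    "assignment_emulation",
    "censoring_definition",
    "outcome_definition",
    "estimand_and_contrast",
    "covariate_timing",
    "positivity_assumptions",
    "sensitivity_analyses",
)

# Assumed marker for a decision the investigator has not yet made.
PLACEHOLDER = "TODO"


def unanswered_elements(protocol):
    """Return required elements that are absent, empty, or still placeholders."""
    unanswered = []
    for key in REQUIRED_ELEMENTS:
        value = protocol.get(key)
        if not value or value == PLACEHOLDER:
            unanswered.append(key)
    return unanswered


# Invented partial draft.
draft = {
    "eligibility": "Adults without the outcome at baseline",
    "time_zero": "First visit at which eligibility is met",
    "treatment_strategies": ["exposed at baseline", "unexposed at baseline"],
    "outcome_definition": PLACEHOLDER,
}

unanswered = unanswered_elements(draft)
ready_for_modeling = not unanswered
print(unanswered)
```

The gate only checks that each question has been answered. Whether an answer is scientifically adequate remains the investigator's judgment.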

Scientific guardrails

Red flags are first-class outputs

The agent separates blockers from cautions so that investigators know what must be fixed before modeling and what must remain visible during interpretation.

  - Post-exposure adjustment in total-effect models
  - Mediator or collider role mistakes
  - Ambiguous index dates
  - Immortal time risk
  - Outcome leakage
  - Missing censoring definition
  - Positivity failure
  - Sparse events
  - Unstable weights
  - Estimand-model mismatch
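
The separation of blockers from cautions can be sketched with a small data structure. The `RedFlag` class, the severity labels, and the assignment of particular flags to particular severities are hypothetical; the source does not state which flags block modeling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedFlag:
    code: str
    severity: str  # "blocker" or "caution" (assumed labels)
    message: str


def split_flags(flags):
    """Separate flags that must be fixed from flags that must stay visible."""
    blockers = [f for f in flags if f.severity == "blocker"]
    cautions = [f for f in flags if f.severity == "caution"]
    return blockers, cautions


def modeling_permitted(flags):
    """Modeling is permitted only when no flag is a blocker."""
    return not any(f.severity == "blocker" for f in flags)


# Illustrative severities only; not the agent's actual classification.
flags = [
    RedFlag("missing_censoring_definition", "blocker",
            "Protocol does not define censoring."),
    RedFlag("sparse_events", "caution",
            "Few outcome events; estimates may be imprecise."),
]

blockers, cautions = split_flags(flags)
print(modeling_permitted(flags), [f.code for f in blockers])
```

Cautions do not stop the workflow in this sketch, but they stay attached to the result so they can be reported alongside any estimate.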

Reporting

Reproducibility appendices for manuscripts

The reporting layer is designed to make manuscript drafting more transparent, not automatic. Investigators must review every generated statement before publication.

Target trial table

Eligibility, strategies, assignment, follow-up, censoring, outcome, and estimand.

STROBE-style checklist

Observational reporting items shaped for longitudinal causal studies.

Causal assumptions table

Exchangeability, consistency, positivity, censoring, and mediation-specific assumptions.

Diagnostics table

Data support, missingness, temporal order, sparse cells, and red flags.
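
A sparse-cell check of the kind summarized in this table can be sketched as a count of rows per stratum and exposure level. The function name `sparse_cells`, the `min_count` threshold, and the toy rows are assumptions for illustration, not the agent's actual diagnostics.

```python
from collections import Counter


def sparse_cells(rows, stratum_col, exposure_col, exposure_levels, min_count=5):
    """List (stratum, exposure level, count) cells with fewer than min_count rows.

    A count of zero means the stratum has no support for that exposure level.
    """
    counts = Counter((r[stratum_col], r[exposure_col]) for r in rows)
    strata = sorted({r[stratum_col] for r in rows})
    flagged = []
    for stratum in strata:
        for level in exposure_levels:
            n = counts.get((stratum, level), 0)
            if n < min_count:
                flagged.append((stratum, level, n))
    return flagged


# Invented toy rows: no exposed observations among "M".
rows = [
    {"sex": "F", "ggt_high": 0},
    {"sex": "F", "ggt_high": 0},
    {"sex": "F", "ggt_high": 1},
    {"sex": "M", "ggt_high": 0},
    {"sex": "M", "ggt_high": 0},
]

# min_count=1 flags only empty cells in this tiny example.
flagged = sparse_cells(rows, "sex", "ggt_high", exposure_levels=[0, 1], min_count=1)
print(flagged)
```

An empty cell is direct evidence against positivity in that stratum, while a merely small cell is a warning about precision.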

Weight summaries

Distribution summaries and instability warnings when weights are present.
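
As a rough sketch of such a summary, the example below reports basic distribution statistics and the Kish effective sample size, (sum of weights)^2 / (sum of squared weights). The function name `summarize_weights` and both warning thresholds are illustrative assumptions, not values used by TTE Agent.

```python
import statistics


def summarize_weights(weights, max_threshold=10.0, min_ess_fraction=0.5):
    """Summarize a weight distribution and collect instability warnings.

    Thresholds are illustrative defaults, not recommendations.
    """
    n = len(weights)
    total = sum(weights)
    # Kish effective sample size.
    ess = total ** 2 / sum(w * w for w in weights)
    summary = {
        "n": n,
        "mean": statistics.fmean(weights),
        "min": min(weights),
        "max": max(weights),
        "effective_sample_size": ess,
    }
    warnings = []
    if summary["max"] > max_threshold:
        warnings.append(f"Maximum weight {summary['max']:.1f} exceeds {max_threshold}.")
    if ess < min_ess_fraction * n:
        warnings.append(f"Effective sample size {ess:.1f} is below {min_ess_fraction:.0%} of n.")
    return summary, warnings


# Invented weights with one extreme value.
summary, warnings = summarize_weights([1.0, 1.0, 1.0, 1.0, 16.0])
print(summary)
print(warnings)
```

A single extreme weight triggers both warnings here, which is the pattern that usually prompts a closer look at positivity or model specification.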

Sensitivity registry

Planned robustness checks before interpretation begins.

API reference

Core public interfaces

The current API is small by design. It exposes deterministic building blocks that a future conversational agent or web app can call safely.

TTEAgent

Facade for protocol drafting, data profiling, audit, next actions, analysis plan, reporting, artifact writing, and approved library-backed runs.

create_protocol_template

Creates a conservative protocol template with explicit placeholders for investigator decisions.

run_protocol_audit

Runs deterministic data-support checks and returns report-ready audit results.

run_external_validation_benchmarks

Runs generic synthetic benchmarks for audit behavior across valid, positivity-failure, and sparse-event settings.

Scientific status

Implemented, conservative, and future work

The current software is already useful as a reproducibility and scientific guardrail layer. The estimator layer should grow slowly, with explicit validation.

| Area | Current status | Scientific interpretation |
| --- | --- | --- |
| Protocol gate | Implemented | Blocks modeling until key target-trial questions are answered. |
| Data audit | Implemented | Checks support, timing, missingness, and sparse cells. |
| Red flags | Implemented | Detects common causal design failures before modeling. |
| Reporting appendices | Implemented | Produces deterministic manuscript drafting aids. |
| Library integration | Guarded | Allowed only after audit approval, with signed artifacts. |
| Natural direct and indirect effects | Conservative boundary | Requires assumptions beyond software execution; not automatically asserted. |
| Interventional effects | Future estimator work | Important next direction when exposure-induced mediator-outcome confounding is present. |
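
The documentation does not specify how artifacts are signed. As one possible approach, the sketch below signs a canonical JSON serialization of an audit record with HMAC-SHA256 from the Python standard library. The record fields mirror the audit-trail components named in the workflow figure, while the function names, the key handling, and the choice of HMAC are assumptions.

```python
import hashlib
import hmac
import json


def sign_audit_record(record, key):
    """Return a hex HMAC-SHA256 signature over a canonical JSON serialization."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_audit_record(record, key, signature):
    """Check a signature using a constant-time comparison."""
    return hmac.compare_digest(sign_audit_record(record, key), signature)


# Invented record; field names follow the audit-trail components in the figure.
record = {
    "protocol": "ggt_hba1c_t2d",
    "diagnostics": {"missing_fraction": 0.25},
    "plan": "audit only",
    "output": None,
    "warnings": ["sparse_events"],
}

key = b"demo-key-not-for-production"  # placeholder secret for illustration
signature = sign_audit_record(record, key)

# Any change to the record invalidates the signature.
tampered = dict(record, warnings=[])
print(verify_audit_record(record, key, signature))
print(verify_audit_record(tampered, key, signature))
```

Sorting keys before serialization keeps the signature stable across dictionary orderings, so the same record always verifies.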

Roadmap

Path toward a mature scientific agent

  1. Documentation release: maintain examples, API references, scientific boundaries, and reproducibility recipes.
  2. Safe web demos: add fixed, parameter-bounded examples that run server-side without arbitrary code execution.
  3. Benchmark expansion: support multiple dataset structures and synthetic causal scenarios beyond JMDC-style panels.
  4. Estimator validation: add estimator modules only after simulation-based checks, diagnostics, and reporting standards are clear.
  5. Agent interface: introduce LLM assistance only around question-asking, protocol drafting, and report explanation, with deterministic tools as the source of truth.