Controlling Language Difficulty in Dialogues with Linguistic Features

Dilaprix: A metric for quantifying and regulating language difficulty in dialogues using linguistic features.

📌 Introduction

The Dialogue Language Proficiency Index (Dilaprix) is a composite metric that evaluates the linguistic complexity of dialogue utterances based on three categories of features:

Readability features (e.g., Flesch-Kincaid Grade Level)
Syntactic features (e.g., syntactic tree depth)
Lexical features (e.g., simple word ratio)

Dilaprix enables fine-grained control over language difficulty, which is useful for educational dialogue systems, accessibility tools, and language learning applications.

📊 Example: Utterances vs. Dilaprix Scores

Utterance	Dilaprix
Thank you for coming, Lily. Do you like meat?	0.08
Thank you for coming, Lily. I appreciate your help in the kitchen. To start with, do you like meat?	0.30
Thank you for coming, Lily. I appreciate your help in the kitchen. To better understand your preferences, may I ask: do you like meat?	0.55
Ah, excellent, Lily, for you to grace us with your presence in the kitchen. Now, to delve into a gastronomical inquiry: do you have an affinity for meat?	0.81

🔍 Lower Dilaprix = simpler language; Higher Dilaprix = more complex language

🧠 Linguistic Features

Dilaprix integrates the following 11 features:

Readability

Flesch Reading Ease ($F_R$): Higher = easier to read.
Flesch-Kincaid Grade Level ($F_G$): US grade level estimate.
Gunning Fog Index ($G_F$): Based on sentence length and complex words (≥3 syllables).
Coleman-Liau Index ($C_L$): Uses character counts instead of syllables.
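
For reference, the four readability scores follow standard published formulas. The sketch below computes them from raw counts; the function names, and the choice to pass counts in directly rather than count syllables from text, are illustrative assumptions and not this package's API.

```python
def flesch_reading_ease(words, sentences, syllables):
    # Higher scores indicate easier text.
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    # Approximate US school grade needed to read the text.
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def gunning_fog(words, sentences, complex_words):
    # complex_words: number of words with three or more syllables.
    return 0.4 * ((words / sentences) + 100.0 * (complex_words / words))


def coleman_liau(letters, words, sentences):
    # Uses letters and sentences per 100 words instead of syllables.
    letters_per_100 = letters / words * 100.0
    sentences_per_100 = sentences / words * 100.0
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8
```

Short dialogue turns have few sentences and words, so these scores can swing widely between similar utterances; that is one reason to combine them with syntactic and lexical features rather than rely on any single one.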

Syntax

Tree Depth ($T_D$): Max depth of syntactic parse trees.
Leaf Node Count ($L_N$): Max number of leaf nodes in any sentence.
Non-terminal Diversity ($N_D$): Unique non-terminal tags in parse trees.
Subtree Complexity ($S_C$): Max number of sub-trees per sentence.
Utterance Length ($U_L$): Total tokens.
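
As a rough illustration of the four tree-based features, the sketch below reads bracketed constituency parses and takes the maximum of each statistic over the sentences of an utterance. The parser, the helper names, and the exact counting conventions (depth counted in labeled nodes, part-of-speech tags counted as non-terminals, one sub-tree per labeled node) are assumptions made for this example; the package may count differently.

```python
def parse_bracketed(text):
    """Parse a bracketed tree such as '(S (NP (PRP I)) ...)'.

    Returns nested (label, children) tuples; leaf words are plain strings.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token != "(":
            return token  # a leaf word
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            children.append(read())
        pos += 1  # consume the closing ")"
        return (label, children)

    return read()


def tree_stats(node):
    """Return (depth, leaf_count, labels, subtree_count) for one tree."""
    if isinstance(node, str):
        return 0, 1, set(), 0
    label, children = node
    depth, leaves, labels, subtrees = 0, 0, {label}, 1
    for child in children:
        d, n, tags, s = tree_stats(child)
        depth = max(depth, d)
        leaves += n
        labels |= tags
        subtrees += s
    return depth + 1, leaves, labels, subtrees


def syntactic_features(parses):
    """Take the per-sentence maximum of each statistic."""
    stats = [tree_stats(parse_bracketed(p)) for p in parses]
    return {
        "tree_depth": max(s[0] for s in stats),
        "leaf_node_count": max(s[1] for s in stats),
        "non_terminal_diversity": max(len(s[2]) for s in stats),
        "subtree_complexity": max(s[3] for s in stats),
    }


# Hand-written parses for two short sentences (illustrative only).
example = syntactic_features([
    "(S (VP (VB Thank) (NP (PRP you))))",
    "(S (NP (PRP I)) (VP (VBP like) (NP (NN meat))))",
])
```

In practice the bracketed strings would come from a constituency parser; they are written by hand here so the example stays self-contained.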

Lexicon

Simple Word Ratio ($S_W$): Proportion of words in a simple vocabulary list.
Intermediate Word Ratio ($I_W$): Proportion in an intermediate vocabulary list.
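
The two lexical ratios can be sketched as a simple vocabulary lookup. The word lists below are tiny made-up placeholders, the regex tokenizer is a simplification, and treating the intermediate ratio as cumulative (simple plus intermediate words, as the prompt example later in this README describes it) is an assumption.

```python
import re

# Hypothetical vocabulary lists; real lists would be far larger.
SIMPLE_WORDS = {"do", "you", "like"}
INTERMEDIATE_WORDS = {"meat"}


def vocab_ratio(text, vocab):
    """Fraction of word tokens in `text` that appear in `vocab`."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(word in vocab for word in words) / len(words)


utterance = "Do you like meat?"
simple_ratio = vocab_ratio(utterance, SIMPLE_WORDS)
intermediate_ratio = vocab_ratio(utterance, SIMPLE_WORDS | INTERMEDIATE_WORDS)
```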

📐 Dilaprix Formula

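The final score is computed as:

$$\text{Dilaprix} = \frac{1}{|\mathcal{X}|} \sum_{i \in \mathcal{X}} s_i, \qquad s_i = \begin{cases} 1 - \text{clamp}\left(\frac{x_i - \alpha_i}{\beta_i - \alpha_i}, 0, 1\right) & \text{if } i \in \mathcal{X}' \\ \text{clamp}\left(\frac{x_i - \alpha_i}{\beta_i - \alpha_i}, 0, 1\right) & \text{otherwise} \end{cases}$$

Where:

$x_i$: value of feature $i \in \mathcal{X}$ for the utterance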
$\mathcal{X} = \{F_R, F_G, G_F, C_L, T_D, L_N, N_D, S_C, U_L, S_W, I_W\}$
$\mathcal{X}' = \{F_R, S_W, I_W\}$: features inversely related to difficulty
$\alpha_i$, $\beta_i$: 5th and 95th percentiles from a textbook dialogue corpus (used for robust normalization)
$\text{clamp}(v, 0, 1)$: ensures output stays in $[0, 1]$
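
The definitions above can be combined as in the sketch below. It assumes the per-feature scores are averaged with equal weight, and the percentile bounds used in the example are made-up placeholders, not values from the textbook dialogue corpus.

```python
# Features that decrease as difficulty increases, so their score is flipped.
INVERSE_FEATURES = {"F_R", "S_W", "I_W"}


def clamp(value, low=0.0, high=1.0):
    """Restrict `value` to the closed interval [low, high]."""
    return max(low, min(high, value))


def dilaprix(features, bounds):
    """Average the percentile-normalized feature scores.

    features: feature name -> raw value for one utterance.
    bounds:   feature name -> (alpha, beta), the 5th and 95th percentiles.
    """
    scores = []
    for name, value in features.items():
        alpha, beta = bounds[name]
        score = clamp((value - alpha) / (beta - alpha))
        if name in INVERSE_FEATURES:
            score = 1.0 - score
        scores.append(score)
    return sum(scores) / len(scores)


# Placeholder bounds for three of the eleven features.
example_bounds = {"F_G": (0.0, 10.0), "F_R": (50.0, 100.0), "U_L": (2.0, 30.0)}
example_score = dilaprix({"F_G": 5.0, "F_R": 90.0, "U_L": 100.0}, example_bounds)
```

Normalizing against the 5th and 95th percentiles rather than the minimum and maximum keeps a single extreme utterance in the reference corpus from compressing every other score; values outside that range are simply clamped to 0 or 1.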

🚀 Get Started

Installation

cd language_difficulty_control
pip install -e .

Usage

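from language_difficulty_control import LinguisticAnalyzer

# Analyze a single utterance; the result maps feature names to values
analyzer = LinguisticAnalyzer()
features = analyzer("Hello! How are you today?")

# The composite score is stored under the "dilaprix" key
dilaprix = features["dilaprix"]
print(f"Dilaprix: {dilaprix:.2f}")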

Output

Dilaprix: 0.06

Language Proficiency Controlled Dialogue Prompt Example

[flesch_reading_ease] for the Flesch-Kincaid Reading Ease;
[flesch_kincaid_grade_level] for Flesch-Kincaid Grade Level;
[gunning_fog] for the Gunning Fog Index;
[coleman_liau] for the Coleman Liau Index;
[tree_depth] The max Depth of the Constituency Parsing Trees of the sentences in your response;
[leaf_node_count] The max number of leaf nodes of the Constituency Parsing Trees of the sentences in your response;
[non_terminal_diversity] The max number of unique tags of the Constituency Parsing Trees of the sentences in your response;
[subtree_complexity] The max number of sub-trees of the Constituency Parsing Trees of the sentences in your response;
[utterance_length] the number of words in your response;
[simple_words_ratio] the ratio of simple words in your response;
[intermediate_words_ratio] the ratio of simple and intermediate words in your response.

You are given a context and dialogue tasks, and are asked to play a role to continue the following conversation naturally.

[DIALOGUE TASKS]
1. Ask Anna if she can play the piano
2. Ask Anna if she can ride a bike
[CURRENT DIALOGUE TASK]
2. Ask Anna if she can ride a bike
[CONTEXT]
Ming: Hi Anna, can you play the piano?
Anna: Yes, I can.
Your reply should consist of two parts:

1. First part should respond to the user kindly based on the context;
2. Second part should carry out the [CURRENT DIALOGUE TASK].

Additionally, your response should abide by the following linguistic features:
[flesch_reading_ease] 86.42
[flesch_kincaid_grade_level] 3.07
[gunning_fog] 3.0
[coleman_liau] 2.99
[tree_depth] 9
[leaf_node_count] 10
[non_terminal_diversity] 14
[subtree_complexity] 22
[utterance_length] 18
[simple_words_ratio] 0.8
[intermediate_words_ratio] 1.0
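
The feature-constraint block at the end of the prompt can be generated from a dictionary of target values. The helper below is a hypothetical sketch (it is not part of this package) that reuses the bracketed keys from the example above and emits them in the same order.

```python
# Keys in the order they appear in the prompt example.
FEATURE_ORDER = [
    "flesch_reading_ease",
    "flesch_kincaid_grade_level",
    "gunning_fog",
    "coleman_liau",
    "tree_depth",
    "leaf_node_count",
    "non_terminal_diversity",
    "subtree_complexity",
    "utterance_length",
    "simple_words_ratio",
    "intermediate_words_ratio",
]


def format_constraints(targets):
    """Render target feature values as '[name] value' lines."""
    missing = [key for key in FEATURE_ORDER if key not in targets]
    if missing:
        raise KeyError(f"missing target features: {missing}")
    return "\n".join(f"[{key}] {targets[key]}" for key in FEATURE_ORDER)


targets = {
    "flesch_reading_ease": 86.42,
    "flesch_kincaid_grade_level": 3.07,
    "gunning_fog": 3.0,
    "coleman_liau": 2.99,
    "tree_depth": 9,
    "leaf_node_count": 10,
    "non_terminal_diversity": 14,
    "subtree_complexity": 22,
    "utterance_length": 18,
    "simple_words_ratio": 0.8,
    "intermediate_words_ratio": 1.0,
}
constraint_block = format_constraints(targets)
```

Keeping the order fixed means every generated prompt presents the features the same way, which makes prompts easier to compare across difficulty levels.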

📚 Citation

@misc{xu2025controllinglanguagedifficultydialogues,
      title={Controlling Language Difficulty in Dialogues with Linguistic Features},
      author={Shuyao Xu and Wenguang Wang and Handong Gao and Wei Kang and Long Qin and Weizhi Wang},
      year={2025},
      eprint={2509.14545},
      archivePrefix={arXiv},
      url={https://arxiv.org/abs/2509.14545},
}

📄 License

This project is licensed under the Apache License – see the LICENSE file for details.
