MarketTone / Fin-Qwen

Overview

The goal is to make a local 8B model more reliable at understanding short financial posts, social-market slang, sarcasm, tickers, dilution pressure, and risk signals while producing JSON that downstream systems can parse.

Structured sentiment output

The model outputs a fixed JSON contract with sentiment_score, reasoning, tickers, and risk_flag.

Financial social context

The task covers meme language, cash burn, ATM capacity, dilution, rug/scam language, and leverage risk.

Teacher distillation

A teacher model creates structured labels and reasoning traces, then SFT distills the behavior into Qwen3-8B.

Evaluation discipline

The page separates clean eval from hard eval and states that metrics measure alignment with teacher labels, not human gold labels.

Problem

Financial social posts are often short but semantically dense. A single sentence can contain tickers, sarcasm, company finance facts, retail slang, and implicit risk. General models may read surface-level positive wording as bullish and miss financing pressure or high-leverage risk.

Sarcasm: "Totally fine" may be bearish when it appears near cash burn, dilution, or a selloff.

Ticker ambiguity: The model must extract only stock or crypto symbols that are actually mentioned in the text, not ordinary words that happen to look like symbols.

Meme language: Terms such as diamond hands, rug, bull trap, and dead cat bounce cannot be interpreted literally.

Mixed financial facts: A post may combine positive and negative facts, such as a revenue beat with weaker margin guidance.
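The ticker-ambiguity point can be illustrated with a small grounding check. This is a minimal sketch, not the project's resolver: the helper name `grounded_tickers` and the matching rule (a cashtag in any case, or a bare all-uppercase word of one to five letters) are assumptions made for illustration.

```python
import re

# Assumed pattern: an optional "$" followed by 1-5 letters, not glued to
# neighbouring letters, digits, or another "$".
_CANDIDATE = re.compile(r"(?<![A-Za-z0-9$])(\$?)([A-Za-z]{1,5})(?![A-Za-z0-9])")


def grounded_tickers(text, predicted):
    """Keep only predicted symbols that literally appear in `text`.

    A symbol counts as mentioned when it is written as a cashtag ("$gme")
    or as a bare all-uppercase token ("IONQ").
    """
    mentioned = set()
    for cashtag, symbol in _CANDIDATE.findall(text):
        if cashtag or symbol.isupper():
            mentioned.add(symbol.upper())
    return [t for t in predicted if t.upper() in mentioned]


post = "$IONQ expanding the ATM after that cash burn. Totally fine."
print(grounded_tickers(post, ["IONQ", "FINE"]))
```

A surface check like this only removes symbols that never appear in the input. It cannot tell that ATM is an offering type and not a ticker, and it ignores symbols containing dots or digits; deciding those cases would need a symbol list or the model's own context reading.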

Schema

Fin-Qwen frames the task as structured generation. The model must emit a fixed JSON schema so the output can be consumed by monitoring, retrieval, alerting, or visualization systems.

{
  "sentiment_score": 2,
  "reasoning": "The text is bearish because the ATM expansion after cash burn suggests potential dilution pressure.",
  "tickers": ["IONQ"],
  "risk_flag": true
}

Field	Meaning
sentiment_score	A 1-5 sentiment score, where 1 is strongly bearish and 5 is strongly bullish.
reasoning	An explanation of the financial meaning, sarcasm, risk, and context.
tickers	Stock or crypto symbols that are actually mentioned in the input text.
risk_flag	Whether the text contains clear risk signals such as dilution, liquidation risk, rug/scam language, or cash pressure.
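A downstream consumer could enforce the contract with a check along these lines. This is a sketch under stated assumptions: the function name `parse_output` is hypothetical, the score is assumed to be an integer, and extra fields are rejected, none of which the page specifies.

```python
import json

REQUIRED_FIELDS = {"sentiment_score", "reasoning", "tickers", "risk_flag"}


def parse_output(raw):
    """Parse one model response and check it against the four-field contract.

    Raises ValueError with a short message when the contract is violated.
    """
    try:
        record = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc.msg}") from exc
    if not isinstance(record, dict):
        raise ValueError("top-level value must be an object")
    if set(record) != REQUIRED_FIELDS:
        diff = sorted(set(record) ^ REQUIRED_FIELDS)
        raise ValueError(f"missing or unexpected fields: {diff}")

    score = record["sentiment_score"]
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 5:
        raise ValueError("sentiment_score must be an integer from 1 to 5")
    if not isinstance(record["reasoning"], str) or not record["reasoning"].strip():
        raise ValueError("reasoning must be a non-empty string")
    tickers = record["tickers"]
    if not isinstance(tickers, list) or not all(isinstance(t, str) for t in tickers):
        raise ValueError("tickers must be a list of strings")
    if not isinstance(record["risk_flag"], bool):
        raise ValueError("risk_flag must be true or false")
    return record
```

Raising on the first violation keeps the failure reason specific, which helps when counting how often each kind of schema error occurs during evaluation.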

Pipeline

The training pipeline starts with public financial text and hard-case examples, then moves through cleaning, teacher annotation, SFT data construction, QLoRA fine-tuning, evaluation, and a local web demo comparison.

Raw financial text: StockTwits-style posts, financial sentences, FiQA-style text, and hard-case examples.
Cleaning / deduplication: Clean, deduplicate, and filter English text to reduce noise and repeated samples.
Teacher annotation: The teacher generates sentiment, reasoning, tickers, and risk_flag outputs.
SFT JSONL: Build instruction / input / output training records for supervised fine-tuning.
Qwen3-8B QLoRA fine-tuning: Train with 4-bit loading, LoRA adapters, Unsloth, PEFT, and TRL SFTTrainer.
Clean Eval + Hard Eval: Evaluate both normal financial text and difficult cases such as sarcasm, dilution, leverage, and AI hype.
Web Demo: Compare Base Qwen3-8B, Fin-Qwen, and the teacher on the same input.
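The SFT JSONL step might look roughly like the following. The three keys (instruction, input, output) come from the pipeline description; the instruction wording and the helper name `to_sft_record` are placeholders, since the page does not show the actual prompt.

```python
import json

# Hypothetical instruction text; the project's real prompt is not published here.
INSTRUCTION = (
    "Analyze the financial post and return JSON with "
    "sentiment_score, reasoning, tickers, and risk_flag."
)


def to_sft_record(text, teacher_label):
    """Serialize one training example as a single JSONL line."""
    record = {
        "instruction": INSTRUCTION,
        "input": text,
        # The target is stored as a JSON string, so the text the model
        # learns to generate is exactly what a parser will later read.
        "output": json.dumps(teacher_label, ensure_ascii=False),
    }
    return json.dumps(record, ensure_ascii=False)


label = {
    "sentiment_score": 2,
    "reasoning": "ATM expansion after cash burn suggests dilution pressure.",
    "tickers": ["IONQ"],
    "risk_flag": True,
}
print(to_sft_record("$IONQ expanding the ATM again. Totally fine.", label))
```

Writing one JSON object per line means the file can be streamed, and `json.dumps` escapes any newline inside a post so each record stays on its own line.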

Data

The public page uses a conservative disclosure level: it describes data source categories, construction steps, and evaluation sizes without exposing raw training content or bulk teacher outputs.

StockTwits-style social posts: Short posts, cashtags, social-market tone, and retail-investor language.

Financial PhraseBank-style sentences: Company finance news and factual financial sentence patterns.

FiQA-style sentiment text: Financial sentiment expressions from question-answering and comment-style text.

Reviewed hard examples: Cases covering sarcasm, dilution, rug/scam language, leverage, AI hype, ticker ambiguity, and mixed facts.

Data processing includes cleaning, deduplication, English filtering, teacher annotation, and train/test overlap checks.
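Deduplication and the train/test overlap check can both be built on one text fingerprint. The sketch below assumes exact matching after lowercasing and whitespace collapsing; the page does not say which normalization or matching rule the project uses, and the function names are illustrative.

```python
import hashlib
import re


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def fingerprint(text):
    """Stable hash of the normalized text."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def deduplicate(texts):
    """Keep the first occurrence of each normalized text, preserving order."""
    seen, kept = set(), []
    for t in texts:
        key = fingerprint(t)
        if key not in seen:
            seen.add(key)
            kept.append(t)
    return kept


def overlap(train, test):
    """Return test texts whose normalized form also appears in train."""
    train_keys = {fingerprint(t) for t in train}
    return [t for t in test if fingerprint(t) in train_keys]
```

Exact fingerprints miss near-duplicates such as the same post with a different ticker or an added emoji. Template-constructed hard cases are especially prone to that, so a stricter check might compare token n-grams as well.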

Results

The comparison uses Base Qwen3-8B with a schema prompt as the baseline. Higher is better for weighted F1; lower is better for MAE. These metrics measure alignment with teacher labels, not human gold labels.

Clean Weighted F1: 0.8220 -> 0.8840

Hard Weighted F1: 0.7382 -> 0.8528

Clean MAE: 0.3850 -> 0.2650

Hard MAE: 0.5325 -> 0.3247

Eval	Metric	Base Qwen3-8B	Fin-Qwen	Change
Clean Eval	Weighted F1	0.8220	0.8840	+0.0620
Clean Eval	MAE	0.3850	0.2650	-0.1200
Hard Eval	Weighted F1	0.7382	0.8528	+0.1146
Hard Eval	MAE	0.5325	0.3247	-0.2078

Clean Eval contains 200 teacher-labeled examples. Hard Eval contains 385 reviewed hard examples.
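For readers unfamiliar with the two metrics, the following standard-library sketch shows how they could be computed against teacher labels. It assumes weighted F1 treats the five score values as classes and MAE treats them as numbers; the page does not state the exact label mapping, so read this as one plausible reading and not the project's evaluation script.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's true count."""
    support = Counter(y_true)
    total = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        total += f1 * n_true
    return total / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average distance between teacher score and model score."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


teacher = [1, 2, 2, 4, 5]  # toy labels, not project data
student = [1, 2, 3, 4, 4]
print(weighted_f1(teacher, student), mean_absolute_error(teacher, student))
```

The two numbers answer different questions: weighted F1 penalizes any mismatch equally, while MAE distinguishes a prediction that is off by one point from one that flips bearish to bullish.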

Challenges

Challenge	Problem	Solution
Structured output stability	Free-form text is hard for downstream systems to parse reliably.	Use a fixed JSON schema, SFT data formatting, parsers, and field validation.
Financial context understanding	Terms such as ATM offering, cash burn, rug, and AI pivot can be misread if interpreted literally.	Use teacher reasoning and hard-case examples to reinforce financial semantics and risk boundaries.
Hard-case evaluation	Only testing ordinary samples can hide failures around sarcasm, dilution, and ticker ambiguity.	Build reviewed hard examples and run train/test overlap checks.
Low-VRAM training	Full fine-tuning of an 8B model is memory-intensive.	Use 4-bit loading, LoRA adapters, and gradient accumulation for domain fine-tuning.
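The low-VRAM row can be made concrete with back-of-the-envelope arithmetic. The sketch below counts weight storage only and assumes a round 8e9 parameters; activations, LoRA adapter weights, optimizer state, and quantization constants all add to the real footprint. The batch sizes in the usage line are arbitrary examples.

```python
def weight_memory_gib(n_params, bits_per_param):
    """Memory needed to hold the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


def effective_batch_size(per_device_batch, accumulation_steps, n_devices=1):
    """Examples contributing to each optimizer step."""
    return per_device_batch * accumulation_steps * n_devices


N_PARAMS = 8e9  # assumption: round figure for an "8B" model

print(f"16-bit weights: {weight_memory_gib(N_PARAMS, 16):.1f} GiB")
print(f" 4-bit weights: {weight_memory_gib(N_PARAMS, 4):.1f} GiB")
print("effective batch:", effective_batch_size(2, 8))
```

Under these assumptions the weights shrink from roughly 14.9 GiB at 16-bit to roughly 3.7 GiB at 4-bit, and gradient accumulation lets a small per-step batch behave like a larger one at the cost of more forward and backward passes per update.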

Limits

Teacher labels are not human gold labels, so the metrics should be read as agreement with teacher judgments.
The hard-case set includes template-constructed samples, which are useful for stress testing but do not fully represent the real social-media distribution.
The model can still fail on extremely short posts, missing context, or comments that depend heavily on community background.

Next steps

Build a small human-labeled evaluation set.
Add more real financial social-media hard cases.
Split risk_flag into finer categories such as dilution, leverage, scam, and distress.
Improve the ticker resolver to reduce ticker hallucination and omission.
Publish an online demo, model card, and technical blog.

MarketTone: Fine-Tuned Qwen for Social Market Sentiment