Utilities
This section covers utility functions and helper modules used throughout the pipeline.
Common Utilities
General-purpose utility functions.
read_in_chunks(file_path, mode, chunk_size=4096)
async
read a binary file in chunks.
Source code in eve/utils.py
25 26 27 28 29 30 31 | |
HTTP Utils
HTTP client utilities for server-based processing.
Common HTTP utilities for making API calls across the pipeline.
make_openrouter_request(api_key, model, prompt, max_tokens=1000, temperature=0.1)
async
Make a request to OpenRouter API for LLM completion.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
api_key
|
str
|
OpenRouter API key |
required |
model
|
str
|
Model name to use |
required |
prompt
|
str
|
The prompt to send |
required |
max_tokens
|
int
|
Maximum tokens in response |
1000
|
temperature
|
float
|
Temperature for response generation |
0.1
|
Returns:
| Type | Description |
|---|---|
Optional[str]
|
Response content or None if request failed |
Source code in eve/common/http_utils.py
46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 | |
post_request(url, headers, data, timeout=30)
async
Make an async POST request and return JSON response.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
url
|
str
|
The URL to make the request to |
required |
headers
|
Dict[str, str]
|
Request headers |
required |
data
|
Dict[str, Any]
|
Request data to send as JSON |
required |
timeout
|
int
|
Request timeout in seconds |
30
|
Returns:
| Type | Description |
|---|---|
Optional[Dict[str, Any]]
|
Response JSON data or None if request failed |
Source code in eve/common/http_utils.py
10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 | |
Regex Patterns
Common regular expression patterns used throughout the pipeline.
Common regex patterns used across the pipeline.
clean_doubled_backslashes(text)
Clean up doubled backslashes in LaTeX content.
Source code in eve/common/regex_patterns.py
76 77 78 | |
extract_html_meta_tags(html_content)
Extract metadata from HTML meta tags.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
html_content
|
str
|
HTML content as string |
required |
Returns:
| Type | Description |
|---|---|
dict[str, str]
|
Dictionary containing extracted meta tag information |
Source code in eve/common/regex_patterns.py
139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 | |
extract_html_title(html_content)
Extract title from HTML content.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
html_content
|
str
|
HTML content as string |
required |
Returns:
| Type | Description |
|---|---|
str
|
Extracted and cleaned title, or None if not found |
Source code in eve/common/regex_patterns.py
112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 | |
extract_json_ld_count(html_content)
Count JSON-LD structured data blocks in HTML.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
html_content
|
str
|
HTML content as string |
required |
Returns:
| Type | Description |
|---|---|
int
|
Number of JSON-LD script blocks found |
Source code in eve/common/regex_patterns.py
173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 | |
fix_ocr_digit_letter_spacing(text)
Fix OCR issues where digits are concatenated with letters.
Source code in eve/common/regex_patterns.py
99 100 101 | |
get_latex_formula_patterns()
Get all LaTeX formula patterns in a dictionary.
Returns:
| Type | Description |
|---|---|
dict[str, Pattern[str]]
|
Dictionary mapping pattern names to compiled regex patterns |
Source code in eve/common/regex_patterns.py
60 61 62 63 64 65 66 67 68 69 70 71 72 73 | |
normalize_excessive_newlines(text)
Replace 3+ consecutive newlines with exactly 2.
Source code in eve/common/regex_patterns.py
81 82 83 | |
remove_nougat_artifacts(text)
Remove Nougat-specific warning and error artifacts.
Source code in eve/common/regex_patterns.py
104 105 106 107 108 109 | |
remove_single_symbol_lines(text)
Remove lines that contain only a single symbol or punctuation.
Source code in eve/common/regex_patterns.py
86 87 88 89 90 91 92 93 94 95 96 | |
Prompts
Prompt templates used in LLM-based processing.
Common prompts used across the pipeline.
get_latex_correction_prompt(formula_type, error_message, formula, context)
Generate a LaTeX correction prompt.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
formula_type
|
str
|
Type of LaTeX formula (inline, display, etc.) |
required |
error_message
|
str
|
The error message from LaTeX compilation |
required |
formula
|
str
|
The problematic formula |
required |
context
|
str
|
Surrounding context for better understanding |
required |
Returns:
| Type | Description |
|---|---|
str
|
Formatted prompt string |
Source code in eve/common/prompts.py
15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 | |
Logging
Logging configuration and utilities.