Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees
2022 · Open Access · DOI: https://doi.org/10.48550/arxiv.2206.01299
Communication compression is a crucial technique for modern distributed learning systems to alleviate their communication bottlenecks over slower networks. Despite recent intensive studies of gradient compression for data-parallel-style training, compressing the activations of models trained with pipeline parallelism is still an open problem. In this paper, we propose AC-SGD, a novel activation compression algorithm for communication-efficient pipeline-parallel training over slow networks. Unlike previous efforts in activation compression, which compress activation values directly, AC-SGD compresses the changes of the activations. This allows us to show, to the best of our knowledge for the first time, that one can still achieve an $O(1/\sqrt{T})$ convergence rate for non-convex objectives under activation compression, without making assumptions on gradient unbiasedness that do not hold for deep learning models with non-linear activation functions. We then show that AC-SGD can be optimized and implemented efficiently, without additional end-to-end runtime overhead. We evaluate AC-SGD by fine-tuning language models with up to 1.5 billion parameters, compressing activations to 2-4 bits. AC-SGD provides up to 4.3X end-to-end speed-up in slower networks, without sacrificing model quality. Moreover, we show that AC-SGD can be combined with state-of-the-art gradient compression algorithms to enable "end-to-end communication compression": all communications between machines, including model gradients, forward activations, and backward gradients, are compressed into lower precision. This provides up to 4.9X end-to-end speed-up, without sacrificing model quality.
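The core idea in the abstract, compressing the change of an activation rather than the activation itself, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper's code: the `DeltaChannel` class, the per-sample buffer keyed by `sample_id`, the uniform symmetric quantizer, and the choice to send the first visit in full precision are all hypothetical. The sketch only shows how a sender and a receiver can keep identical buffers so that low-bit deltas reconstruct the same activation on both sides.

```python
from typing import Dict, List, Tuple


def quantize(values: List[float], bits: int) -> Tuple[List[int], float]:
    """Uniform symmetric quantization to signed integer codes (an assumed scheme)."""
    levels = 2 ** (bits - 1) - 1  # e.g. 7 positive levels for 4 bits
    scale = max((abs(v) for v in values), default=0.0) / levels
    if scale == 0.0:
        return [0] * len(values), 0.0
    return [round(v / scale) for v in values], scale


class DeltaChannel:
    """One end of a pipeline link (hypothetical helper).

    The sender and the receiver each hold their own instance; both buffers
    are updated with the same dequantized delta, so they stay identical.
    """

    def __init__(self, bits: int) -> None:
        self.bits = bits
        # sample id -> last activation as reconstructed on both ends
        self.buffer: Dict[int, List[float]] = {}

    def encode(self, sample_id: int, activation: List[float]) -> tuple:
        if sample_id not in self.buffer:
            # Assumption: the first time a sample is seen, send it uncompressed.
            self.buffer[sample_id] = list(activation)
            return ("full", list(activation))
        prev = self.buffer[sample_id]
        delta = [a - p for a, p in zip(activation, prev)]
        codes, scale = quantize(delta, self.bits)
        # Update the local buffer with exactly what the receiver will reconstruct.
        self.buffer[sample_id] = [p + c * scale for p, c in zip(prev, codes)]
        return ("delta", codes, scale)

    def decode(self, sample_id: int, message: tuple) -> List[float]:
        if message[0] == "full":
            self.buffer[sample_id] = list(message[1])
        else:
            _, codes, scale = message
            prev = self.buffer[sample_id]
            self.buffer[sample_id] = [p + c * scale for p, c in zip(prev, codes)]
        return list(self.buffer[sample_id])


if __name__ == "__main__":
    sender, receiver = DeltaChannel(bits=4), DeltaChannel(bits=4)
    for activation in ([1.0, -2.0, 0.5], [1.1, -1.9, 0.4]):  # two "epochs"
        message = sender.encode(0, activation)
        print(message[0], receiver.decode(0, message))
```

The intuition behind sending deltas is that, as fine-tuning stabilizes, the activation for a given sample changes less between visits, so the quantization range shrinks and the same number of bits gives a smaller absolute error. Whether that holds in a given setup depends on the model and learning rate; the sketch does not demonstrate it.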
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2206.01299, https://arxiv.org/pdf/2206.01299
- OA Status: green
- Cited By: 2
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4281945007
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4281945007 (Canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2206.01299 (Digital Object Identifier)
- Title: Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees (Work title)
- Type: preprint (OpenAlex work type)
- Language: en (Primary language)
- Publication year: 2022 (Year of publication)
- Publication date: 2022-06-02 (Full publication date if available)
- Authors: Jue Wang, Binhang Yuan, Luka Rimanić, Yongjun He, Tri Dao, Beidi Chen, Christopher Ré, Ce Zhang (List of authors in order)
- Landing page: https://arxiv.org/abs/2206.01299 (Publisher landing page)
- PDF URL: https://arxiv.org/pdf/2206.01299 (Direct link to full text PDF)
- Open access: Yes (Whether a free full text is available)
- OA status: green (Open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2206.01299 (Direct OA link when available)
- Concepts: Computer science, Pipeline (software), Overhead (engineering), Compression (physics), Data compression, Speedup, Convergence (economics), End-to-end principle, Data compression ratio, Parallel computing, Algorithm, Computer engineering, Image compression, Artificial intelligence, Operating system, Economic growth, Economics, Programming language, Image (mathematics), Image processing, Composite material, Materials science (Top concepts (fields/topics) attached by OpenAlex)
- Cited by: 2 (Total citation count in OpenAlex)
- Citations by year (recent): 2024: 1, 2023: 1 (Per-year citation counts (last 5 years))
- Related works (count): 10 (Other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4281945007 |
| doi | https://doi.org/10.48550/arxiv.2206.01299 |
| ids.doi | https://doi.org/10.48550/arxiv.2206.01299 |
| ids.openalex | https://openalex.org/W4281945007 |
| fwci | |
| type | preprint |
| title | Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11612 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9993000030517578 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Stochastic Gradient Optimization Techniques |
| topics[1].id | https://openalex.org/T12676 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9947999715805054 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Machine Learning and ELM |
| topics[2].id | https://openalex.org/T10036 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9944000244140625 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1707 |
| topics[2].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[2].display_name | Advanced Neural Network Applications |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.7834814190864563 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C43521106 |
| concepts[1].level | 2 |
| concepts[1].score | 0.6533254384994507 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q2165493 |
| concepts[1].display_name | Pipeline (software) |
| concepts[2].id | https://openalex.org/C2779960059 |
| concepts[2].level | 2 |
| concepts[2].score | 0.6246106624603271 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q7113681 |
| concepts[2].display_name | Overhead (engineering) |
| concepts[3].id | https://openalex.org/C180016635 |
| concepts[3].level | 2 |
| concepts[3].score | 0.5654808282852173 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q2712821 |
| concepts[3].display_name | Compression (physics) |
| concepts[4].id | https://openalex.org/C78548338 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5279839634895325 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q2493 |
| concepts[4].display_name | Data compression |
| concepts[5].id | https://openalex.org/C68339613 |
| concepts[5].level | 2 |
| concepts[5].score | 0.48392659425735474 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q1549489 |
| concepts[5].display_name | Speedup |
| concepts[6].id | https://openalex.org/C2777303404 |
| concepts[6].level | 2 |
| concepts[6].score | 0.46355903148651123 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q759757 |
| concepts[6].display_name | Convergence (economics) |
| concepts[7].id | https://openalex.org/C74296488 |
| concepts[7].level | 2 |
| concepts[7].score | 0.45743754506111145 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q2527392 |
| concepts[7].display_name | End-to-end principle |
| concepts[8].id | https://openalex.org/C94835093 |
| concepts[8].level | 5 |
| concepts[8].score | 0.4524648189544678 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q3113333 |
| concepts[8].display_name | Data compression ratio |
| concepts[9].id | https://openalex.org/C173608175 |
| concepts[9].level | 1 |
| concepts[9].score | 0.42241156101226807 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q232661 |
| concepts[9].display_name | Parallel computing |
| concepts[10].id | https://openalex.org/C11413529 |
| concepts[10].level | 1 |
| concepts[10].score | 0.3561360836029053 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q8366 |
| concepts[10].display_name | Algorithm |
| concepts[11].id | https://openalex.org/C113775141 |
| concepts[11].level | 1 |
| concepts[11].score | 0.3273807764053345 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q428691 |
| concepts[11].display_name | Computer engineering |
| concepts[12].id | https://openalex.org/C13481523 |
| concepts[12].level | 4 |
| concepts[12].score | 0.2912136912345886 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q412438 |
| concepts[12].display_name | Image compression |
| concepts[13].id | https://openalex.org/C154945302 |
| concepts[13].level | 1 |
| concepts[13].score | 0.27527889609336853 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[13].display_name | Artificial intelligence |
| concepts[14].id | https://openalex.org/C111919701 |
| concepts[14].level | 1 |
| concepts[14].score | 0.0 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q9135 |
| concepts[14].display_name | Operating system |
| concepts[15].id | https://openalex.org/C50522688 |
| concepts[15].level | 1 |
| concepts[15].score | 0.0 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q189833 |
| concepts[15].display_name | Economic growth |
| concepts[16].id | https://openalex.org/C162324750 |
| concepts[16].level | 0 |
| concepts[16].score | 0.0 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q8134 |
| concepts[16].display_name | Economics |
| concepts[17].id | https://openalex.org/C199360897 |
| concepts[17].level | 1 |
| concepts[17].score | 0.0 |
| concepts[17].wikidata | https://www.wikidata.org/wiki/Q9143 |
| concepts[17].display_name | Programming language |
| concepts[18].id | https://openalex.org/C115961682 |
| concepts[18].level | 2 |
| concepts[18].score | 0.0 |
| concepts[18].wikidata | https://www.wikidata.org/wiki/Q860623 |
| concepts[18].display_name | Image (mathematics) |
| concepts[19].id | https://openalex.org/C9417928 |
| concepts[19].level | 3 |
| concepts[19].score | 0.0 |
| concepts[19].wikidata | https://www.wikidata.org/wiki/Q1070689 |
| concepts[19].display_name | Image processing |
| concepts[20].id | https://openalex.org/C159985019 |
| concepts[20].level | 1 |
| concepts[20].score | 0.0 |
| concepts[20].wikidata | https://www.wikidata.org/wiki/Q181790 |
| concepts[20].display_name | Composite material |
| concepts[21].id | https://openalex.org/C192562407 |
| concepts[21].level | 0 |
| concepts[21].score | 0.0 |
| concepts[21].wikidata | https://www.wikidata.org/wiki/Q228736 |
| concepts[21].display_name | Materials science |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.7834814190864563 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/pipeline |
| keywords[1].score | 0.6533254384994507 |
| keywords[1].display_name | Pipeline (software) |
| keywords[2].id | https://openalex.org/keywords/overhead |
| keywords[2].score | 0.6246106624603271 |
| keywords[2].display_name | Overhead (engineering) |
| keywords[3].id | https://openalex.org/keywords/compression |
| keywords[3].score | 0.5654808282852173 |
| keywords[3].display_name | Compression (physics) |
| keywords[4].id | https://openalex.org/keywords/data-compression |
| keywords[4].score | 0.5279839634895325 |
| keywords[4].display_name | Data compression |
| keywords[5].id | https://openalex.org/keywords/speedup |
| keywords[5].score | 0.48392659425735474 |
| keywords[5].display_name | Speedup |
| keywords[6].id | https://openalex.org/keywords/convergence |
| keywords[6].score | 0.46355903148651123 |
| keywords[6].display_name | Convergence (economics) |
| keywords[7].id | https://openalex.org/keywords/end-to-end-principle |
| keywords[7].score | 0.45743754506111145 |
| keywords[7].display_name | End-to-end principle |
| keywords[8].id | https://openalex.org/keywords/data-compression-ratio |
| keywords[8].score | 0.4524648189544678 |
| keywords[8].display_name | Data compression ratio |
| keywords[9].id | https://openalex.org/keywords/parallel-computing |
| keywords[9].score | 0.42241156101226807 |
| keywords[9].display_name | Parallel computing |
| keywords[10].id | https://openalex.org/keywords/algorithm |
| keywords[10].score | 0.3561360836029053 |
| keywords[10].display_name | Algorithm |
| keywords[11].id | https://openalex.org/keywords/computer-engineering |
| keywords[11].score | 0.3273807764053345 |
| keywords[11].display_name | Computer engineering |
| keywords[12].id | https://openalex.org/keywords/image-compression |
| keywords[12].score | 0.2912136912345886 |
| keywords[12].display_name | Image compression |
| keywords[13].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[13].score | 0.27527889609336853 |
| keywords[13].display_name | Artificial intelligence |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2206.01299 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2206.01299 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2206.01299 |
| locations[1].id | doi:10.48550/arxiv.2206.01299 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2206.01299 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5100440604 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-3641-3136 |
| authorships[0].author.display_name | Jue Wang |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Wang, Jue |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5002684888 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-3188-2769 |
| authorships[1].author.display_name | Binhang Yuan |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Yuan, Binhang |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5020051246 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Luka Rimanić |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Rimanic, Luka |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5101590823 |
| authorships[3].author.orcid | https://orcid.org/0009-0002-5901-0388 |
| authorships[3].author.display_name | Yongjun He |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | He, Yongjun |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5091734792 |
| authorships[4].author.orcid | |
| authorships[4].author.display_name | Tri Dao |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Dao, Tri |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5031842648 |
| authorships[5].author.orcid | https://orcid.org/0000-0002-7586-1855 |
| authorships[5].author.display_name | Beidi Chen |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Chen, Beidi |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5103852640 |
| authorships[6].author.orcid | |
| authorships[6].author.display_name | Christopher Ré |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Re, Christopher |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5100383731 |
| authorships[7].author.orcid | https://orcid.org/0000-0002-8105-7505 |
| authorships[7].author.display_name | Ce Zhang |
| authorships[7].author_position | last |
| authorships[7].raw_author_name | Zhang, Ce |
| authorships[7].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2206.01299 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2022-06-13T00:00:00 |
| display_name | Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11612 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9993000030517578 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Stochastic Gradient Optimization Techniques |
| related_works | https://openalex.org/W624280404, https://openalex.org/W2207317090, https://openalex.org/W4200153455, https://openalex.org/W3003571078, https://openalex.org/W2186754325, https://openalex.org/W4322614724, https://openalex.org/W4300548572, https://openalex.org/W2552401318, https://openalex.org/W2366039184, https://openalex.org/W2226540757 |
| cited_by_count | 2 |
| counts_by_year[0].year | 2024 |
| counts_by_year[0].cited_by_count | 1 |
| counts_by_year[1].year | 2023 |
| counts_by_year[1].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2206.01299 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2206.01299 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2206.01299 |
| primary_location.id | pmh:oai:arXiv.org:2206.01299 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2206.01299 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2206.01299 |
| publication_date | 2022-06-02 |
| publication_year | 2022 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 3, 50 |
| abstract_inverted_index.In | 44 |
| abstract_inverted_index.an | 41 |
| abstract_inverted_index.be | 135, 182 |
| abstract_inverted_index.do | 119 |
| abstract_inverted_index.in | 67, 168 |
| abstract_inverted_index.is | 2, 39 |
| abstract_inverted_index.of | 23, 71, 80, 91 |
| abstract_inverted_index.on | 115 |
| abstract_inverted_index.to | 11, 86, 88, 147, 153, 159, 164, 189, 213 |
| abstract_inverted_index.up | 152, 163, 212 |
| abstract_inverted_index.us | 85 |
| abstract_inverted_index.we | 47, 176 |
| abstract_inverted_index.1.5 | 154 |
| abstract_inverted_index.2-4 | 160 |
| abstract_inverted_index.All | 194 |
| abstract_inverted_index.and | 137, 203 |
| abstract_inverted_index.are | 206 |
| abstract_inverted_index.can | 100, 134, 181 |
| abstract_inverted_index.for | 6, 26, 33, 55, 94, 106, 122 |
| abstract_inverted_index.not | 120 |
| abstract_inverted_index.one | 99 |
| abstract_inverted_index.our | 92 |
| abstract_inverted_index.the | 31, 78, 81, 89, 95 |
| abstract_inverted_index.4.3X | 165 |
| abstract_inverted_index.4.9X | 214 |
| abstract_inverted_index.This | 83 |
| abstract_inverted_index.also | 177 |
| abstract_inverted_index.best | 90 |
| abstract_inverted_index.data | 27 |
| abstract_inverted_index.deep | 123 |
| abstract_inverted_index.from | 64 |
| abstract_inverted_index.hold | 121 |
| abstract_inverted_index.into | 208 |
| abstract_inverted_index.open | 42 |
| abstract_inverted_index.over | 16, 60 |
| abstract_inverted_index.rate | 105 |
| abstract_inverted_index.show | 131, 178 |
| abstract_inverted_index.slow | 61 |
| abstract_inverted_index.that | 98, 118, 132, 179 |
| abstract_inverted_index.then | 130 |
| abstract_inverted_index.this | 45 |
| abstract_inverted_index.with | 36, 126, 151, 184 |
| abstract_inverted_index.first | 96 |
| abstract_inverted_index.lower | 209 |
| abstract_inverted_index.model | 173, 199, 219 |
| abstract_inverted_index.novel | 51 |
| abstract_inverted_index.show, | 87 |
| abstract_inverted_index.still | 40, 101 |
| abstract_inverted_index.their | 13 |
| abstract_inverted_index.time, | 97 |
| abstract_inverted_index.under | 109 |
| abstract_inverted_index.AC-SGD | 76, 133, 146, 180 |
| abstract_inverted_index.allows | 84 |
| abstract_inverted_index.enable | 190 |
| abstract_inverted_index.making | 113 |
| abstract_inverted_index.models | 34, 125, 150 |
| abstract_inverted_index.modern | 7 |
| abstract_inverted_index.paper, | 46 |
| abstract_inverted_index.recent | 20 |
| abstract_inverted_index.slower | 17, 169 |
| abstract_inverted_index.values | 74 |
| abstract_inverted_index.AC-SGD, | 49 |
| abstract_inverted_index.Despite | 19 |
| abstract_inverted_index.achieve | 102 |
| abstract_inverted_index.between | 196 |
| abstract_inverted_index.billion | 155 |
| abstract_inverted_index.changes | 79 |
| abstract_inverted_index.crucial | 4 |
| abstract_inverted_index.efforts | 66 |
| abstract_inverted_index.forward | 201 |
| abstract_inverted_index.instead | 70 |
| abstract_inverted_index.propose | 48 |
| abstract_inverted_index.runtime | 143 |
| abstract_inverted_index.studies | 22 |
| abstract_inverted_index.systems | 10 |
| abstract_inverted_index.trained | 35 |
| abstract_inverted_index.without | 112, 140, 171, 217 |
| abstract_inverted_index.backward | 204 |
| abstract_inverted_index.combined | 183 |
| abstract_inverted_index.gradient | 24, 116, 186 |
| abstract_inverted_index.language | 149 |
| abstract_inverted_index.learning | 9, 124 |
| abstract_inverted_index.pipeline | 37, 57 |
| abstract_inverted_index.previous | 65 |
| abstract_inverted_index.problem. | 43 |
| abstract_inverted_index.provides | 162, 211 |
| abstract_inverted_index.quality. | 174, 220 |
| abstract_inverted_index.speed-up | 167 |
| abstract_inverted_index.training | 59 |
| abstract_inverted_index.Different | 63 |
| abstract_inverted_index.Moreover, | 175 |
| abstract_inverted_index.algorithm | 54 |
| abstract_inverted_index.alleviate | 12 |
| abstract_inverted_index.directly, | 75 |
| abstract_inverted_index.evaluated | 145 |
| abstract_inverted_index.fine-tune | 148 |
| abstract_inverted_index.gradients | 205 |
| abstract_inverted_index.including | 198 |
| abstract_inverted_index.intensive | 21 |
| abstract_inverted_index.knowledge | 93 |
| abstract_inverted_index.machines, | 197 |
| abstract_inverted_index.networks, | 170 |
| abstract_inverted_index.networks. | 18, 62 |
| abstract_inverted_index.optimized | 136 |
| abstract_inverted_index.speed-up, | 216 |
| abstract_inverted_index.technique | 5 |
| abstract_inverted_index.training, | 29 |
| abstract_inverted_index.activation | 52, 68, 73, 110, 128 |
| abstract_inverted_index.additional | 141 |
| abstract_inverted_index.algorithms | 188 |
| abstract_inverted_index.compressed | 207 |
| abstract_inverted_index.compresses | 77 |
| abstract_inverted_index.end-to-end | 142, 166, 215 |
| abstract_inverted_index.gradients, | 200 |
| abstract_inverted_index.non-convex | 107 |
| abstract_inverted_index.non-linear | 127 |
| abstract_inverted_index.objectives | 108 |
| abstract_inverted_index."end-to-end | 191 |
| abstract_inverted_index.activations | 32, 158 |
| abstract_inverted_index.assumptions | 114 |
| abstract_inverted_index.bits.AC-SGD | 161 |
| abstract_inverted_index.bottlenecks | 15 |
| abstract_inverted_index.compressing | 30, 72, 157 |
| abstract_inverted_index.compression | 1, 25, 53, 187 |
| abstract_inverted_index.convergence | 104 |
| abstract_inverted_index.distributed | 8 |
| abstract_inverted_index.implemented | 138 |
| abstract_inverted_index.overhead.We | 144 |
| abstract_inverted_index.parallelism | 38, 58 |
| abstract_inverted_index.parameters, | 156 |
| abstract_inverted_index.sacrificing | 172, 218 |
| abstract_inverted_index.activations, | 202 |
| abstract_inverted_index.activations. | 82 |
| abstract_inverted_index.compression, | 69, 111 |
| abstract_inverted_index.compression: | 193 |
| abstract_inverted_index.efficiently, | 139 |
| abstract_inverted_index.functions.We | 129 |
| abstract_inverted_index.unbiasedness | 117 |
| abstract_inverted_index.Communication | 0 |
| abstract_inverted_index.communication | 14, 192 |
| abstract_inverted_index.communications | 195 |
| abstract_inverted_index.parallel-style | 28 |
| abstract_inverted_index.precision.This | 210 |
| abstract_inverted_index.$O(1/\sqrt{T})$ | 103 |
| abstract_inverted_index.state-of-the-art | 185 |
| abstract_inverted_index.communication-efficient | 56 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 8 |
| citation_normalized_percentile |