I attended PyCon this weekend. Here's what I learnt.
LLMs
- "AI engineers" are a thing. ML engineers optimize inference. AI engineers build agents and chains.
- Local models (llama2-70b, llama.cpp) are easy to run. The LocalLLaMA subreddit and the HuggingFace MTEB leaderboard are worth tracking. Local text embeddings can outperform OpenAI's.
- Alternatives to LangChain and LlamaIndex have emerged, such as Haystack and Microsoft's Semantic Kernel.
- Automate regression tests when changing models. Create and validate them using multiple LLMs.
- ReLM - a RegEx for LLMs - helps in this.
- RAG context order matters, and the effect varies by LLM: OpenAI models tend to ignore the middle of the context, while local LLMs tend to use only the end. Use cross-encoders to re-rank.
- Optimize these metrics:
  - Time to first token (TTFT): How long does it take to generate the first token?
  - Query time (for RAG): Quantize the model, use GPUs, and cache (both responses AND queries).
  - Indexing time: Quantize the embeddings themselves. Compress to fewer dimensions or even binary (still gets to 95%+ accuracy). Use sparse vectors.
- In production, plan for:
  - Retries on failure. How long to wait? How many times to retry?
  - Fallback to local models or cache
  - Logging & tracing errors
- Special purpose LLMs are smaller but general purpose LLMs will be used more, trained more, and likely outperform them
- Use case and RoI identification is the biggest challenge for business leaders
- Almost no enterprise has monetized LLM apps yet
- Projects tend to ignore intangible risks / costs by simply not thinking about them, but these can be anticipated and quantified
- To build a team:
  - Encourage questions and curiosity - more than answers or certainty
  - Train on concepts (Word2Vec, attention, transformers, RLHF), tools, use-cases and productionization
- Dashtoon is making comics via Gen AI. They have a team in Bangalore, and it might be worth exploring ComicGen with them.
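Going back to the production checklist above (retries, fallback, logging), here is a minimal sketch of how the three might fit together. It is not from any talk: `call_with_retries`, `flaky_remote_model` and `local_model` are hypothetical names, and the retry count and delays are arbitrary assumptions that answer "how long to wait?" and "how many times?" with exponential backoff.

```python
import logging
import time

logger = logging.getLogger("llm_client")


def call_with_retries(primary, fallback, prompt,
                      max_retries=3, base_delay=0.5, sleep=time.sleep):
    """Call primary(prompt); retry with exponential backoff, then fall back."""
    for attempt in range(1, max_retries + 1):
        try:
            return primary(prompt)
        except Exception as exc:  # real code should catch only retryable errors
            logger.warning("attempt %d/%d failed: %s", attempt, max_retries, exc)
            if attempt < max_retries:
                # Wait base_delay, then 2x, 4x, ... between attempts
                sleep(base_delay * 2 ** (attempt - 1))
    logger.error("primary failed %d times; using fallback", max_retries)
    return fallback(prompt)


# Hypothetical stand-ins for a remote API and a local model (or a cache lookup)
def flaky_remote_model(prompt):
    raise TimeoutError("simulated outage")


def local_model(prompt):
    return f"[local] {prompt}"


# Skip real sleeping in the demo by passing a no-op sleep function
answer = call_with_retries(flaky_remote_model, local_model, "hello",
                           sleep=lambda seconds: None)
print(answer)
```

Passing `sleep` in as a parameter keeps the backoff schedule testable without actually waiting.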
Python
- Most Python web frameworks use ASGI. Tornado, which Jupyter and Gramex run on, is the notable exception.
  - ASGI chains/wraps async functions with (scope, receive, send) to build extensible web apps
  - They're all built on top of asyncio.Protocol
- Uvicorn and Hypercorn are the base ASGI servers. Uvicorn uses callbacks, which are efficient but less readable. Hypercorn uses streaming, which is more readable but slightly slower.
- Uvicorn sends HTTP headers to the client immediately, before the body is computed. Clients may benefit from this, but if body generation then fails, the headers already sent could be incorrect.
- Uvicorn uses h11 to parse HTTP. It's pure Python, unlike httptools which is faster.
- Starlette builds on Uvicorn and has an ASGI middleware stack, tasks that handle Keep-Alive, lifespan handlers, and more
- Pydantic parses and validates inputs. (Unlike JSON schema, it also converts data types.) It's now a funded company
- Uvicorn and FastAPI are close enough to be released as 1.0, but Starlette is not
- Python and package security is a PSF focus area. They hired Seth Larson and Mike Fiedler.
- Python now uses sigstore to sign Python and PyPI packages, which are logged in Rekor for auditability.
- pip-audit scans Python packages for known vulnerabilities (like npm audit).
- Rust is a fast, safe, concurrent language. PyO3 is a Rust binding for Python and maturin can build scaffolding for Python packages written in Rust.
- Having a 2-language strategy may not be organizationally scalable, but Rust may be a good language to learn
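To make the (scope, receive, send) note above concrete, here is a small sketch of an ASGI app and a middleware that wraps it, using only the standard library. The message types (`http.response.start`, `http.response.body`) come from the ASGI spec; `add_header`, `run_once` and the pared-down `scope` dict are my own illustrative assumptions, and a real server such as Uvicorn passes a much fuller scope.

```python
import asyncio


async def app(scope, receive, send):
    """Innermost ASGI app: always replies with a plain-text body."""
    assert scope["type"] == "http"
    # Headers go out first, as a separate message from the body
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"hello"})


def add_header(inner, name, value):
    """Middleware: wrap an ASGI app and append a header to its response."""
    async def wrapped(scope, receive, send):
        async def send_with_header(message):
            if message["type"] == "http.response.start":
                headers = list(message.get("headers", [])) + [(name, value)]
                message = {**message, "headers": headers}
            await send(message)
        # Chain: the inner app talks to our send wrapper, not the server
        await inner(scope, receive, send_with_header)
    return wrapped


async def run_once(asgi_app):
    """Drive the app without a server, collecting the messages it sends."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    # Simplified scope for illustration only
    scope = {"type": "http", "method": "GET", "path": "/", "headers": []}
    await asgi_app(scope, receive, send)
    return sent


messages = asyncio.run(run_once(add_header(app, b"x-demo", b"1")))
print(messages)
```

Because every layer has the same three-argument signature, middleware can be stacked in any order. The separate start and body messages also show why headers can reach the client before the body exists.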
DevOps & Cloud
Some of the DevOps domains and tools I heard mentioned were:
- Infrastructure as code: Terraform, Pulumi
- Monitoring & alerting: Datadog, Prometheus, Grafana, Loki, Pyroscope, OpenTelemetry
  - Note: I'm surprised I didn't hear Sentry mentioned
- CI/CD pipelines: No specific tools, though GitHub Actions was the default
- Scheduled data pipelines are a good use case for serverless functions like AWS Lambda. They're easy to deploy, maintain and log, especially when deployed with Terraform and when the source and data are on the cloud. They can be chained via notifications (e.g. SNS) or web API calls.
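As a rough sketch of the chaining idea above, this is what one pipeline step might look like as a Lambda handler triggered by SNS. The `Records[].Sns.Message` layout is the standard SNS-to-Lambda event shape; `transform`, the `rows` payload format, and the sample event are hypothetical, and the publish-to-next-topic step is only described in a comment rather than implemented.

```python
import json


def transform(rows):
    """Hypothetical pipeline step: keep rows with a positive amount."""
    return [row for row in rows if row.get("amount", 0) > 0]


def lambda_handler(event, context):
    """Entry point for a Lambda function subscribed to an SNS topic."""
    results = []
    for record in event["Records"]:
        # SNS delivers the message as a string; we assume the upstream
        # step published JSON with a "rows" list
        payload = json.loads(record["Sns"]["Message"])
        results.extend(transform(payload["rows"]))
    # A real step would now publish `results` to the next SNS topic
    # (for example with boto3) to trigger the following function.
    return {"processed": len(results), "rows": results}


# Local smoke test with a hand-built event, no AWS account needed
sample_event = {
    "Records": [
        {"Sns": {"Message": json.dumps({"rows": [{"amount": 5}, {"amount": -1}]})}}
    ]
}
output = lambda_handler(sample_event, None)
print(output)
```

Keeping the transformation in a plain function makes each step easy to test locally before deploying it with Terraform.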
People
- Bowrna shared what she learnt while contributing to open source:
  - "I learned not to ask permission to ask questions. Instead, to write a detailed question that someone can answer offline without needing clarifications." Just for that one skill, open source contributions are valuable. This is a talk worth sharing.
  - "After putting your thoughts into words, your thoughts become clear. I learnt that from writing documentation."
  - "Code is a liability. The best code is zero lines."
  - (Then I stopped noting, listened, congratulated her on the best PyCon talk, and invited her to speak at Gramener.)
- Jessica Green said the German Govt helped retrain her on Python. Governments re-skilling people on technology is a powerful lever.
- Marcelo Trylesinski works remotely out of coffee shops