📓 misc projects

 

💰 2022.09.16 - film cameras + redispatcher updates

 
Another short update today. I was sick for much of last week and busy otherwise, so I didn’t spend much time working on the image editor. I did have time to make updates on two other projects, though.
 
First off, I finally cleaned up redispatcher - a small open source library I made to help with processing distributed/background/asynchronous/whatever you want to call it work in Python. It’s similar to Celery or dramatiq but with a much smaller footprint, and with a very declarative, intellisense-ful, and strongly typed API. It’s backed by Redis as its message broker. I’ve had it hanging in a half-updated state since April, so the past day or two I’ve spent just cleaning things up, reorganizing the API, updating documentation, etc. I still have a ways to go to bring the documentation up to spec so that it covers everything I want it to, including a nifty little monitoring script that you can use to view stats on worker queues, publish/ack rates, etc. Anyway, if you’re interested in such a library and don’t want to mess with setting up RabbitMQ or take on the overhead of something like Celery, check out redispatcher.
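To give a flavor of what I mean by declarative and typed - and to be clear, this is just an illustrative sketch, not redispatcher’s actual API (the docs have the real thing) - the idea is that a message is a typed model shared by publishers and consumers, with an async handler consuming it off a Redis-backed queue:

# Illustrative only - not redispatcher's actual API. A message is a typed
# (pydantic) model, so publishers and consumers share one schema and your
# editor can autocomplete its fields on both sides.
from pydantic import BaseModel


class SendWelcomeEmail(BaseModel):
    user_id: int
    email: str


async def handle_send_welcome_email(message: SendWelcomeEmail) -> None:
    # In a real setup this would run inside a worker process that pops
    # messages off a Redis-backed queue and acks them when done.
    print(f"sending welcome email to {message.email}")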
 
 
A few months back, as price inflation of everything from groceries to used cars started hitting the front page of the news, I wanted to figure out what the deal was with analog camera prices.
 
 
 

 

🚴 2022.08.28 - CitiBike, an addendum

 
 
Sadly, they no longer provide Bike ID in recent datasets 😕 It would’ve been really cool to, for example, follow a single bike around over this past month. Though up until (at least) 2016 they did include it, so there may still be something worthwhile there, just not as current.
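If I do poke at the old data, following one bike around should only take a few lines of pandas. A quick sketch - the file name and exact column names here assume the 2016-era CSV schema, so treat them as guesses:

# Follow a single bike through a month of the older trip data.
# Assumes a 2016-era monthly CSV, which still had a "bikeid" column -
# the file name and column names may differ.
import pandas as pd

trips = pd.read_csv("201606-citibike-tripdata.csv", parse_dates=["starttime"])

# Pick the bike with the most trips that month and order its trips
bike_id = trips["bikeid"].value_counts().idxmax()
bike = trips[trips["bikeid"] == bike_id].sort_values("starttime")

# Print the bike's hops from station to station over the month
for _, trip in bike.iterrows():
    print(f"{trip['starttime']:%m-%d %H:%M}  {trip['start station name']} -> {trip['end station name']}")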
 

 

🚴 2022.08.28 - CitiBike

 
I live in NYC. I’m sure many of you have heard a tale or stereotype of someone’s taxi getting stolen out from under them by someone else in a rush. That’s never happened to me… with a taxi. But goddamnit it’s happened to me so many times with a Citi Bike dock.
 
I’m really curious about what Citi Bike trends look like over the course of a day and a week. I’m interested to see how work commutes shift the bike distribution across the boroughs each morning and afternoon. Most importantly for my purposes, though: I want to know how long, on average, I’d have to wait for a dock to open up when a station is full.
 
So, to find out, I set up a little script to record the number of bikes, e-bikes, and docks at each of the 1,703 Citi Bike stations. It currently runs as a daemon, polling every 5 minutes, and I’ll be letting it run for about the next week or two. Once I’ve collected enough data, I’m going to try out some fun data visualizations - so, stay tuned for that! Unfortunately, this API (as far as I can tell) doesn’t return the IDs of the bikes at each station. It would’ve been super cool to actually track the migration of individual bikes. I’ll dig through their GraphQL schema some more to see if it’s at all possible.
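For the dock-wait question specifically, the 5-minute snapshots should be enough for a rough first estimate. Here’s a sketch of the kind of analysis I have in mind - assuming the statuses get dumped to a CSV; the file and column names below are made up:

# Rough dock-wait estimate from the 5-minute snapshots. Assumes the
# collected statuses are exported to a CSV with (hypothetical) columns:
# station_id, docks_available, run_time.
import pandas as pd

df = pd.read_csv("station_status.csv", parse_dates=["run_time"])
df = df.sort_values(["station_id", "run_time"])

waits: list[float] = []
for _, group in df.groupby("station_id"):
    # Snapshot times at which the station had at least one open dock
    open_times = group.loc[group["docks_available"] > 0, "run_time"]
    # For every snapshot where the station was full, measure how long
    # until the next snapshot with a free dock (5-minute resolution)
    for t in group.loc[group["docks_available"] == 0, "run_time"]:
        later = open_times[open_times > t]
        if not later.empty:
            waits.append((later.iloc[0] - t).total_seconds() / 60)

print(f"average wait for a dock: {sum(waits) / len(waits):.1f} minutes")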
 
You can check out the project on my GitHub here. I’ll be updating it with data-vis tools/scripts, the raw data I collect, and of course the visualization results themselves.
 
The “scraper” is really simple and only took a few minutes to put together since, luckily, you only need to make one HTTP request to CitiBike’s GraphQL API.
 
import asyncio
from datetime import datetime, timedelta

import httpx
import uvloop
from p3orm import Porm

from citibike.models.db import Run, Station, StationStatus
from citibike.models.gql import CitiBikeResponse
from citibike.settings import Settings

# One POST to CitiBike's GraphQL API returns the live status of every station
request = httpx.Request(
    "POST",
    "https://account.citibikenyc.com/bikesharefe-gql",
    json={
        "query": "query {supply {stations {stationId stationName siteId bikesAvailable ebikesAvailable bikeDocksAvailable location {lat lng}}}}"
    },
)

run_count = 1


async def run():
    global run_count

    run_time = datetime.utcnow()
    # UTC-4 so the log line reads in New York time
    print(f"""starting run {run_count} @ {(run_time - timedelta(hours=4)).strftime("%c")}""")

    await Porm.connect(dsn=Settings.DATABASE_URL)
    run = await Run.insert_one(Run(time=run_time))

    async with httpx.AsyncClient() as client:
        response = await client.send(request)

    data = CitiBikeResponse.parse_obj(response.json())

    for cb_station in data.data.supply.stations:
        # Create the station row the first time we see it
        station = await Station.fetch_first(Station.citibike_id == cb_station.station_id)
        if not station:
            station = await Station.insert_one(
                Station(
                    citibike_id=cb_station.station_id,
                    site_id=cb_station.site_id,
                    name=cb_station.station_name,
                    latitude=cb_station.location.lat,
                    longitude=cb_station.location.lng,
                )
            )

        # Record this run's snapshot of the station's bikes and docks
        await StationStatus.insert_one(
            StationStatus(
                station_id=station.id,
                bikes_available=cb_station.bikes_available,
                ebikes_available=cb_station.ebikes_available,
                docks_available=cb_station.bike_docks_available,
                run_id=run.id,
            )
        )

    await Porm.disconnect()


async def daemon():
    global run_count
    while True:
        # Kick off a run in the background, then wait out the interval
        asyncio.ensure_future(run())
        await asyncio.sleep(5 * 60)
        run_count += 1


if __name__ == "__main__":
    uvloop.install()
    asyncio.run(daemon())
 
I use a loop to spit out asyncio futures that run in the background, waiting 5 minutes before starting the next one - this way I get an exact 5-minute interval between the start of each run, regardless of how long run() takes to complete. I’m using p3orm as my ORM of choice (give it a try if you agree with its philosophy) to persist each station and its status.
 
I’ve got it running in a simple tmux session I’m keeping open on my 7-year-old DigitalOcean droplet. It’d have been more professional to set it up as a proper systemd service, but tmux was quicker and it’s like 2AM. Looking forward to getting some insights on this in a couple of weeks and spitting out some sweet r/DataIsBeautiful GIFs.
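For completeness, the “proper” version would just be a small unit file along these lines - the paths and module name below are made up, not the actual deployment:

# /etc/systemd/system/citibike-scraper.service - a sketch; the working
# directory, venv path, and module name are assumptions.
[Unit]
Description=Citi Bike station status scraper
After=network-online.target

[Service]
WorkingDirectory=/home/raf/citibike
ExecStart=/home/raf/citibike/.venv/bin/python -m citibike.scraper
Restart=on-failure

[Install]
WantedBy=multi-user.target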
 

 

📓 2022.08.27 - devlog CDN

 
Notion’s not a great image CDN: it just stores images in S3 at full size, so the images I upload here take a while to fetch and draw to the screen. Because of this, I spun up a quick “CDN” to stand in front of my uploaded images.
 
It’s just a simple FastAPI route hosted on Vercel that intercepts an image request (the original image’s URL comes in as a query parameter), re-encodes it as an 80%-quality JPEG, caches the result in memory, and returns it. That cuts the size ~10-fold for some of the larger screenshots I’ve posted and significantly speeds up fetching.
 
The app itself is super simple.
 
from io import BytesIO

import httpx
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from PIL.Image import open as pil_open

app = FastAPI()

# Naive in-memory cache: image URL -> re-encoded JPEG bytes
MEMORY: dict[str, BytesIO] = {}


@app.get("/")
async def get_resource(image_url: str):
    if image_url in MEMORY:
        buffer = MEMORY[image_url]
    else:
        # Fetch the original full-size image (e.g. from Notion's S3)
        async with httpx.AsyncClient() as client:
            response = await client.get(image_url)

        # Re-encode as JPEG at 80% quality and cache the result
        image = pil_open(BytesIO(response.content))
        image = image.convert("RGB")

        buffer = BytesIO()
        image.save(buffer, format="jpeg", quality=80)
        MEMORY[image_url] = buffer

    buffer.seek(0)
    return StreamingResponse(buffer, media_type="image/jpeg")
 
Unfortunately, there’s still ~1s of waiting before the image is actually served by the FastAPI app.
I’ll look into that another time.
 
 
 

📓 2022.08.16 - initial commit

I work on a lot of side projects of all shapes and sizes and I’ve been meaning to create a devlog, but I was stuck on the question of “how” to build it. Using something like Medium or handwriting HTML/JSX seemed either too blasé or too tedious. I was intrigued by options like Hugo and Gatsby, but I wanted something that neatly tied into my existing personal site.
 
I’ve been a fan of Notion for a few months now, particularly its rich editing features and display components, so I decided to use Notion as my CMS. Turns out (of course) there are people already doing this, and there are some great libraries (notably, but not exclusively) coming out of NotionX.
 
They’ve created a set of libraries that can
  • Fetch content from a Notion page using their “private” API and
  • Render (virtually) every available block with a React component
 
Setting this up was really straightforward. I was already running this page with Next.js on Vercel. Is that overkill? Yeah, probably. But I like playing around with frameworks, and I was tired of hosting this on a self-managed VPS. Anyway, back to the topic at hand: getting this working with my current stack was dead simple. Here’s the code for my devlog.tsx page:
 
import { GetStaticProps, GetStaticPropsContext } from 'next'
import { NotionAPI } from 'notion-client'
import { ExtendedRecordMap } from 'notion-types'
import { NotionRenderer } from 'react-notion-x'
import NextImage from 'next/image'
import NextLink from 'next/link'
import styled from '@emotion/styled'

const devlogPageId = 'redacted even though it doesnt really matter'

interface Props {
  recordMap: ExtendedRecordMap
}

export default function BlogPage (props: Props): React.ReactElement {
  return (
    <Wrapper>
      <NotionRenderer
        recordMap={props.recordMap}
        fullPage={false}
        darkMode={false}
        rootPageId={devlogPageId}
        rootDomain='raf.al'
        previewImages
        components={{ nextLink: NextLink, nextImage: NextImage }}
        mapPageUrl={(pageId) =>
          pageId.replace(/-/g, '') === devlogPageId
            ? '/devlog'
            : `/devlog/${pageId.replace(/-/g, '')}`}
      />
    </Wrapper>
  )
}

export const getStaticProps: GetStaticProps = async (context: GetStaticPropsContext) => {
  const notion = new NotionAPI({
    activeUser: process.env.NOTION_USER,
    authToken: process.env.NOTION_TOKEN
  })

  const recordMap = await notion.getPage(devlogPageId)

  return {
    props: {
      recordMap: recordMap
    },
    revalidate: 10
  }
}

const Wrapper = styled.div`
  min-height: 100vh;
  width: 100%;
`
 
This fetches the contents from Notion’s API server-side using getStaticProps. We use getStaticProps because this content is the same for everyone, which means Next.js can easily cache the built page. Additionally, the function returns revalidate: 10, which tells Next.js to treat the cached page as stale after 10 seconds and regenerate it in the background.
 
That’s it for now. Tomorrow I’m back to work on my film scan editor.